Generic documents are documents which conform to the "Generic" API in CMR. These are JSON documents validated against a known schema file.
Within the Generic config.json file there is a section called IndexConfiguration. This contains two settings:
- AllowAppending: When set to true, then multiple Indexes with the same
Namevalue will be appended together to create one larger value. Otherwise the last one in the config file will be the setting used. - AdditionalKeywords: List of simple fields to be added to the
keywordfield for general searching. By default CMR uses: LongName, Version, Description, RelatedURLs.
Indexes:
- Description: Human readable description of field. Shows up in some logs
- Field: jq like path to field data
- Name: Field Name
- Mapping: The Elastic field type
- token: text-field-mapping
- string: string-field-mapping
- int: int-field-mapping
- date: date-field-mapping
- Indexer:
- default (none): direct, one-to-one mapping
- simple-array-field: index a sub field of an array element
- complex-fields-only: Complex indexer field that handles both single objects and arrays, and formats using field values only (not field names)
- complex-field: takes a list of sub fields and combines them
If adding a new document, you will need to update the defconf variable by either setting an ENV for global change, or by updating the default value in /common-lib/src/cmr/common/config.clj. The format for this value is either JSON for an ENV variable or a clojure map if setting directly in the default attribute of the defconfig like this:
(defconfig approved-pipeline-documents
{:default {:grid ["0.0.1"]
:data-quality-summary ["1.0.0"]
:order-option ["1.0.0"]
:service-entry ["1.0.0"]
:service-option ["1.0.0"]
:visualization ["1.0.0"]}
:parser #(json/parse-string % true)})
When setting in an ENV or in AWS, use the JSON format:
"{\"grid\": [\"0.0.1\"],
\"data-quality-summary\": [\"1.0.0\"],
\"order-option\": [\"1.0.0\"],
\"service-entry\": [\"1.0.0\"],
\"service-option\": [\"1.0.0\"]
\"visualization\": [\"1.0.0\"]}"
Each setting consists of a key, which is the name for the Generic which must be unique, and a list of version numbers. These values must match parts of a file system path under "schemas". For example, the order-option value must resolve to:
./{CMR-Root}/schemas/order-option/v0.0.1/
README.md
config.json
metadata.json
schema.json
CMR will search for Generic definitions using the lower case value of the key (name) and the version number prefixed with a "v". Inside the directory there must be 4 files, three of which are directly read by CMR:
- README.md - for humans
- config.json - Search/Index/Validation configuration settings, must comply with [Config Schema][schema-config].
- metadata.json - sample record, may be called by system-int-tests
- schema.json - A schema document conforming to JSON Schema.
Run CMR as normal, however if you wish to confirm which schemas are configured, look for the following in the logs:
Generic documents pipeline supports:
Followed by a list of configured Generic Documents and the supported versions.
Creating:
- Create a new directory in the EMFD Generics repository.
- In the directory create a README.md file
- Create a
CHANGELOG.MDfile and populate like other formats do - Create a directory with the semantic version number prefixed with the letter
v- Version 1.0.0 would be
v1.0.0. - Information on [Semantic Versioning][semver]
- Version 1.0.0 would be
- Create at least a metadata.json and schema.json file inside the version number.
- Commit, Get approved, Merge
- Copy all files to the
schemasdirectory under CMR - call
cmr setup devto update schemas in all projects and must be called for a change to show up- NOTE:
(user/reset)within the repl will also trigger a copy of files
- NOTE:
Updating is much the same as start, create a new version folder, populate it.
DON'T forget to update the change log!
Commit and distribute to CMR as done in the addition steps.
The API endpoints are generated from a pre-configured list of document types. The type is singular in ingest and plural in search. The examples below use the "order-option"/"order-options" type, with a native ID of "order-option-1".
curl -v -XPOST -H "$TOKEN" -H "Content-Type:application/vnd.nasa.cmr.umm+json" "https://cmr.earthdata.nasa.gov/ingest/order-option/order-option-1?provider=PROV1" -d @order-option-1.json
curl -v -XPUT -H "$TOKEN" -H "Content-Type:application/vnd.nasa.cmr.umm+json" "https://cmr.earthdata.nasa.gov/ingest/order-option/order-option-1?provider=PROV1" -d @order-option-1.json
curl -v -XGET -H "$TOKEN" -H "Content-Type:application/vnd.nasa.cmr.umm+json" "https://cmr.earthdata.nasa.gov/ingest/order-option/order-option-1?provider=PROV1"
curl -v -XDELETE -H "$TOKEN" "https://cmr.earthdata.nasa.gov/ingest/order-option/order-option-1?provider=PROV1"
curl -v -H "$TOKEN" https://cmr.earthdata.nasa.gov/search/concepts/OO1200000002-PROV1
curl -v -H "$TOKEN" https://cmr.earthdata.nasa.gov/search/order-options?name="With%20Browse"
curl -v -H "$TOKEN" https://cmr.earthdata.nasa.gov/search/order-options.json?name="With%20Browse"
curl -v -H "$TOKEN" https://cmr.earthdata.nasa.gov/search/order-options.json?provider="PROV1"
curl -v -H "$TOKEN" https://cmr.earthdata.nasa.gov/search/order-options.json?concept_id="OO1200000002-PROV1"
Note also that concept IDs begin with a prefix unique to that document type. (Above see "OO" for order option)
curl -v -H "$TOKEN" https://cmr.earthdata.nasa.gov/bootstrap/bulk_index/grids/
curl -v -H "$TOKEN" https://cmr.earthdata.nasa.gov/bootstrap/bulk_index/grids/PROV1
Copyright © 2014-2022 United States Government as represented by the Administrator of the National Aeronautics and Space Administration. All Rights Reserved.
[schema-config]: https://git.earthdata.nasa.gov/projects/EMFD/repos/otherschemas/browse/Config " "Generics Configuration definition" [semver]: https://semver.org "Information on semantic versioning"