Conversation
for more information, see https://pre-commit.ci
| @@ -0,0 +1,34 @@ | |||
| locals { | |||
| build_config = jsondecode(file("${path.module}/../build-config.json")) | |||
There was a problem hiding this comment.
It appears this build_config is unused
mikedorfman
left a comment
There was a problem hiding this comment.
I haven't had a chance to dive too deeply, but I added a few comments to start with. I'll take a closer look later or possibly tomorrow - in the meantime feel free to address the comments.
| type = string | ||
| } | ||
|
|
||
| # The location of footprint dataset-config file. Ex. s3://my-internal/datset-config/ |
There was a problem hiding this comment.
This comment doesn't appear to match this file
| type = list(string) | ||
| } | ||
|
|
||
| variable "region" { |
There was a problem hiding this comment.
It appears this is unused
| type = string | ||
| } | ||
|
|
||
| variable "app_name" { |
There was a problem hiding this comment.
It appears this is unused
|
|
||
| source = "http://github.com/downloadpath/cma_to_cnm_module.zip" | ||
| prefix = var.prefix | ||
| region = var.region |
There was a problem hiding this comment.
This region should be removed if the variables.tf is updated to remove it.
| A dict representing input and copied files. See schemas/output.json for more information. | ||
|
|
||
| """ | ||
| LOGGER.debug(event) |
There was a problem hiding this comment.
Consider removing the debug of the whole event unless it's needed for something specific.
| if "granules" not in input.keys(): | ||
| raise Exception('"granules" is missing from input') |
There was a problem hiding this comment.
This check can be removed since schema validation will fail in this case
| if "granules" not in input.keys(): | ||
| raise Exception('"granules" is missing from input') | ||
| # Building the URI from info provided by provider since the granule itself might not have it | ||
| uri = f"{meta_provider['protocol']}://{meta_provider['host']}/" |
There was a problem hiding this comment.
In the schema, protocol and host are not required fields. Either update the schema to make them required or update handling here for when they are missing.
| + granule_file.name | ||
| if granule_file.path is not None | ||
| else uri + granule_file.name, | ||
| size=granule_file.size or 0, |
There was a problem hiding this comment.
Instead of 0 here, should we calculate the actual size of the file? Or throw an error? I'd be worried about setting this to 0 since that won't be accurate.
| receivedTime=now_aware, | ||
| processCompleteTime=now_aware, | ||
| submissionTime=now_aware, | ||
| identifier=str(uuid.uuid4()), |
There was a problem hiding this comment.
When returning a CNM-R, this identifier must match that of the original CNM-S - what are your thoughts on providing a config value for CNM-S identifier that's populated here instead of generating a new one? I'm not sure whether there's a use-case for a randomized identifier string in the CNM-R.
There was a problem hiding this comment.
hmm, wish to talk to you more about this one.
I assume CNM-R means CNMResponse. CNM-S means CNMRequet
since we are creating a bunch of CNM Request message here then leverage your CNM Sender Lambda to send the CNM, we are creating the identifier here.
" what are your thoughts on providing a config value for CNM-S identifier that's populated here instead of generating a new one"
do you mean
the identifier could be coming from upstream as an optional field in the config object then:
- if config.cnm_identifier is not None, then make use of it
- Otherwise, generate a identifier here?
There was a problem hiding this comment.
I assume CNM-R means CNMResponse. CNM-S means CNMRequet
Correct
do you mean
the identifier could be coming from upstream as an optional field in the config object then:
if config.cnm_identifier is not None, then make use of it
Otherwise, generate a identifier here?
Close, except my proposal would be to make this a required config field for the task. I took a look at the ICD to sanity-check my understanding and in table 2-4, the identifier field definition is "The generated unique identifier for the CNM-S message." (eg the identifier the provider sent us in their CNM message) and the notes state "Must be kept from the original CNM-S." - so I think this cannot be generated by the task in this way, it has to be provided to the task.
| The result of the cumulus task. See schemas/output.json for more information. | ||
|
|
||
| """ | ||
| return run_cumulus_task(task, event, context) |
There was a problem hiding this comment.
In order to validate the schemas, consider passing in the fourth schemas arg to run_cumulus_task similar to https://github.com/nasa/cumulus/blob/master/tasks/granule-invalidator/src/main.py#L33
There was a problem hiding this comment.
schemas containing, input, output and config
since input/output are forced by Pydantic, maybe I could only pass in
schemas['config'] = path to config schema?
There was a problem hiding this comment.
If you have jsonschema files for all three (input, output and config), I think it makes sense to include all three in the run_cumulus_task call to be consistent with other tasks and to assure the jsonschema for the input/output don't drift from what's reflected in the task. But we can talk more about that today in our standup per my comment here to make sure we're all handling validation in the same way.
yenes56
left a comment
There was a problem hiding this comment.
Could you please review 2 of my comments/questions related to
- identifier
- schema validation as 4th parameter
Thanks!
?
|
|
||
| resource "aws_lambda_function" "cma_to_cnm" { | ||
| filename = "${path.module}/../dist/final/lambda.zip" | ||
| function_name = "${var.prefix}-CMAToCNM" |
There was a problem hiding this comment.
Looking at other tasks that have acronyms in the name (such as PDR) it seems that only the first letter of the acronym is capitalized. So to be consistent with the naming of other core tasks I would suggest naming it like this:
| function_name = "${var.prefix}-CMAToCNM" | |
| function_name = "${var.prefix}-CmaToCnm" |
| vpc_config { | ||
| subnet_ids = var.subnet_ids | ||
| security_group_ids = var.security_group_ids | ||
| } |
There was a problem hiding this comment.
Take a look at the vpc_config section of cnm-response, cnm-to-cma, or granule-invalidator and copy from there. It's got a slightly different setup that allows for the vpc_config to be optional.
| tags = local.tags | ||
| } | ||
|
|
||
| resource "aws_cloudwatch_log_group" "cma_to_cnm" { |
There was a problem hiding this comment.
A note here, I was doing more research on this and I found a recommendation that the lambda function should have a depends_on on the log group so that terraform knows to create the log group before the lambda. Otherwise it could be possible for terraform to create the lambda first, which auto-creates a log group with this name and then terraform fails to create the log group due to the resource already existing.
However, that change also requires moving the lambda function name to a local otherwise you get a cyclic dependency between the two resources.
I was working on adding those changes here: #4268
| # Config Content | ||
| meta_provider = config.get("provider", []) | ||
| meta_collection = config.get("collection") | ||
| meta_cumulus = config.get("cumulus_meta", {}) |
There was a problem hiding this comment.
I might suggest passing the items that you need from the cumulus_meta section through the config explicitly rather than passing in the entire cumulus_meta section. So that would be state_machine and execution_name by the looks of things. That somehow feels a bit cleaner to me, but feel free to disagree if you have a good reason for passing the entire cumulus_meta section!
| granules: List[models_cma.Granule] = granule_model.granules | ||
| cnm_json_dicts: List[dict] = [] | ||
| for granule in granules: | ||
| LOGGER.debug("granuleId: {}", granule.granuleId) |
There was a problem hiding this comment.
I would suggest thinking about what messages are appropriate for what log level. I think it would be useful to have at least some INFO level messages as well so that the task will output something useful in production.
| { | ||
| "name": "@cumulus/cma-to-cnm", | ||
| "version": "1.0.0", | ||
| "description": "CMA 2 CNM task", |
There was a problem hiding this comment.
I think we should be consistent about the choice to use 'to' or '2'. So I'd suggest:
| "description": "CMA 2 CNM task", | |
| "description": "CMA to CNM task", |
|
|
||
|
|
||
| This task is designed as a modular component. You can load it into your infrastructure using the following Terraform example: | ||
| ```angular2html |
There was a problem hiding this comment.
I don't think angular2html is the right language here. I think you want hcl for terraform code if I recall correctly.
| ```angular2html | |
| ```hcl |
|
|
||
| Example of workflow configuraton: | ||
|
|
||
| ```angular2html |
There was a problem hiding this comment.
| ```angular2html | |
| ```json |
| The output key : cnm_list should be mapped to the payload key in the workflow's output. | ||
| Workflow developer could choose to the original cnm message to be stored in meta.cnm key. | ||
| Example workflow: | ||
| ```angular2html |
There was a problem hiding this comment.
I'd also suggest running your json through an autoformatter before pasting it in here. There's a chance if you have an autoformatter set up in your editor, that putting the correct language here will enable the autoformatter to work on the fenced code as well.
| ```angular2html | |
| ```json |
| - About json schema compiling to Plaint Python classes: | ||
| - used datamodel-code-generator tool: https://koxudaxi.github.io/datamodel-code-generator/ to generate pydantic models from json schema files. | ||
| - example below | ||
| ```angular2html |
There was a problem hiding this comment.
Either shell or no language probably would be the best options.
Summary: Summary of changes
Addresses CUMULUS-4381: cma-to-cnm task
Changes
Migrate cma-to-cnm from PODAAC repo to cumulus repo and code change based on Pydantic
build script
terraform
Code and deployment script in consistent with cnm-response and granule-invalidator
PR Checklist
📝 Note:
For most pull requests, please Squash and merge to maintain a clean and readable commit history.