Skip to content

Dy/cumulus 4381 cma to cnm#4271

Open
yenes56 wants to merge 16 commits intomasterfrom
dy/CUMULUS-4381-cmaToCNM
Open

Dy/cumulus 4381 cma to cnm#4271
yenes56 wants to merge 16 commits intomasterfrom
dy/CUMULUS-4381-cmaToCNM

Conversation

@yenes56
Copy link
Copy Markdown
Contributor

@yenes56 yenes56 commented Feb 20, 2026

Summary: Summary of changes

Addresses CUMULUS-4381: cma-to-cnm task

Changes

Migrate cma-to-cnm from PODAAC repo to cumulus repo and code change based on Pydantic
build script
terraform
Code and deployment script in consistent with cnm-response and granule-invalidator

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

📝 Note:
For most pull requests, please Squash and merge to maintain a clean and readable commit history.

Comment thread tasks/cma-to-cnm/deploy/main.tf Outdated
@@ -0,0 +1,34 @@
locals {
build_config = jsondecode(file("${path.module}/../build-config.json"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this build_config is unused

Copy link
Copy Markdown
Contributor

@mikedorfman mikedorfman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to dive too deeply, but I added a few comments to start with. I'll take a closer look later or possibly tomorrow - in the meantime feel free to address the comments.

Comment thread tasks/cma-to-cnm/deploy/variables.tf Outdated
type = string
}

# The location of footprint dataset-config file. Ex. s3://my-internal/datset-config/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't appear to match this file

Comment thread tasks/cma-to-cnm/deploy/variables.tf Outdated
type = list(string)
}

variable "region" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this is unused

Comment thread tasks/cma-to-cnm/deploy/variables.tf Outdated
type = string
}

variable "app_name" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this is unused

Comment thread tasks/cma-to-cnm/README.md Outdated

source = "http://github.com/downloadpath/cma_to_cnm_module.zip"
prefix = var.prefix
region = var.region
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This region should be removed if the variables.tf is updated to remove it.

A dict representing input and copied files. See schemas/output.json for more information.

"""
LOGGER.debug(event)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing the debug of the whole event unless it's needed for something specific.

Comment on lines +42 to +43
if "granules" not in input.keys():
raise Exception('"granules" is missing from input')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check can be removed since schema validation will fail in this case

if "granules" not in input.keys():
raise Exception('"granules" is missing from input')
# Building the URI from info provided by provider since the granule itself might not have it
uri = f"{meta_provider['protocol']}://{meta_provider['host']}/"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the schema, protocol and host are not required fields. Either update the schema to make them required or update handling here for when they are missing.

+ granule_file.name
if granule_file.path is not None
else uri + granule_file.name,
size=granule_file.size or 0,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of 0 here, should we calculate the actual size of the file? Or throw an error? I'd be worried about setting this to 0 since that won't be accurate.

receivedTime=now_aware,
processCompleteTime=now_aware,
submissionTime=now_aware,
identifier=str(uuid.uuid4()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When returning a CNM-R, this identifier must match that of the original CNM-S - what are your thoughts on providing a config value for CNM-S identifier that's populated here instead of generating a new one? I'm not sure whether there's a use-case for a randomized identifier string in the CNM-R.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, wish to talk to you more about this one.
I assume CNM-R means CNMResponse. CNM-S means CNMRequet

since we are creating a bunch of CNM Request message here then leverage your CNM Sender Lambda to send the CNM, we are creating the identifier here.

" what are your thoughts on providing a config value for CNM-S identifier that's populated here instead of generating a new one"

do you mean

the identifier could be coming from upstream as an optional field in the config object then:

  • if config.cnm_identifier is not None, then make use of it
  • Otherwise, generate a identifier here?

Copy link
Copy Markdown
Contributor

@mikedorfman mikedorfman Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume CNM-R means CNMResponse. CNM-S means CNMRequet

Correct

do you mean

the identifier could be coming from upstream as an optional field in the config object then:

if config.cnm_identifier is not None, then make use of it
Otherwise, generate a identifier here?

Close, except my proposal would be to make this a required config field for the task. I took a look at the ICD to sanity-check my understanding and in table 2-4, the identifier field definition is "The generated unique identifier for the CNM-S message." (eg the identifier the provider sent us in their CNM message) and the notes state "Must be kept from the original CNM-S." - so I think this cannot be generated by the task in this way, it has to be provided to the task.

The result of the cumulus task. See schemas/output.json for more information.

"""
return run_cumulus_task(task, event, context)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to validate the schemas, consider passing in the fourth schemas arg to run_cumulus_task similar to https://github.com/nasa/cumulus/blob/master/tasks/granule-invalidator/src/main.py#L33

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schemas containing, input, output and config
since input/output are forced by Pydantic, maybe I could only pass in
schemas['config'] = path to config schema?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have jsonschema files for all three (input, output and config), I think it makes sense to include all three in the run_cumulus_task call to be consistent with other tasks and to assure the jsonschema for the input/output don't drift from what's reflected in the task. But we can talk more about that today in our standup per my comment here to make sure we're all handling validation in the same way.

Copy link
Copy Markdown
Contributor Author

@yenes56 yenes56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikedorfman

Could you please review 2 of my comments/questions related to

  • identifier
  • schema validation as 4th parameter

Thanks!

?

Comment thread tasks/cma-to-cnm/deploy/main.tf Outdated

resource "aws_lambda_function" "cma_to_cnm" {
filename = "${path.module}/../dist/final/lambda.zip"
function_name = "${var.prefix}-CMAToCNM"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at other tasks that have acronyms in the name (such as PDR) it seems that only the first letter of the acronym is capitalized. So to be consistent with the naming of other core tasks I would suggest naming it like this:

Suggested change
function_name = "${var.prefix}-CMAToCNM"
function_name = "${var.prefix}-CmaToCnm"

Comment on lines +22 to +25
vpc_config {
subnet_ids = var.subnet_ids
security_group_ids = var.security_group_ids
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the vpc_config section of cnm-response, cnm-to-cma, or granule-invalidator and copy from there. It's got a slightly different setup that allows for the vpc_config to be optional.

tags = local.tags
}

resource "aws_cloudwatch_log_group" "cma_to_cnm" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note here, I was doing more research on this and I found a recommendation that the lambda function should have a depends_on on the log group so that terraform knows to create the log group before the lambda. Otherwise it could be possible for terraform to create the lambda first, which auto-creates a log group with this name and then terraform fails to create the log group due to the resource already existing.

However, that change also requires moving the lambda function name to a local otherwise you get a cyclic dependency between the two resources.

I was working on adding those changes here: #4268

# Config Content
meta_provider = config.get("provider", [])
meta_collection = config.get("collection")
meta_cumulus = config.get("cumulus_meta", {})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest passing the items that you need from the cumulus_meta section through the config explicitly rather than passing in the entire cumulus_meta section. So that would be state_machine and execution_name by the looks of things. That somehow feels a bit cleaner to me, but feel free to disagree if you have a good reason for passing the entire cumulus_meta section!

granules: List[models_cma.Granule] = granule_model.granules
cnm_json_dicts: List[dict] = []
for granule in granules:
LOGGER.debug("granuleId: {}", granule.granuleId)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest thinking about what messages are appropriate for what log level. I think it would be useful to have at least some INFO level messages as well so that the task will output something useful in production.

{
"name": "@cumulus/cma-to-cnm",
"version": "1.0.0",
"description": "CMA 2 CNM task",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be consistent about the choice to use 'to' or '2'. So I'd suggest:

Suggested change
"description": "CMA 2 CNM task",
"description": "CMA to CNM task",



This task is designed as a modular component. You can load it into your infrastructure using the following Terraform example:
```angular2html
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think angular2html is the right language here. I think you want hcl for terraform code if I recall correctly.

Suggested change
```angular2html
```hcl


Example of workflow configuraton:

```angular2html
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```angular2html
```json

The output key : cnm_list should be mapped to the payload key in the workflow's output.
Workflow developer could choose to the original cnm message to be stored in meta.cnm key.
Example workflow:
```angular2html
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also suggest running your json through an autoformatter before pasting it in here. There's a chance if you have an autoformatter set up in your editor, that putting the correct language here will enable the autoformatter to work on the fenced code as well.

Suggested change
```angular2html
```json

- About json schema compiling to Plaint Python classes:
- used datamodel-code-generator tool: https://koxudaxi.github.io/datamodel-code-generator/ to generate pydantic models from json schema files.
- example below
```angular2html
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either shell or no language probably would be the best options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants