Dy/cumulus 4381 cma to cnm by yenes56 · Pull Request #4271 · nasa/cumulus

yenes56 · 2026-02-20T01:42:50Z

Summary: Summary of changes

Changes

Migrate cma-to-cnm from PODAAC repo to cumulus repo and code change based on Pydantic
build script
terraform
Code and deployment script in consistent with cnm-response and granule-invalidator

PR Checklist

Update CHANGELOG
Unit tests
Ad-hoc testing - Deploy changes and test manually
Integration tests

📝 Note:
For most pull requests, please Squash and merge to maintain a clean and readable commit history.

for more information, see https://pre-commit.ci

mikedorfman · 2026-02-23T15:32:00Z

@@ -0,0 +1,34 @@
+locals {
+  build_config = jsondecode(file("${path.module}/../build-config.json"))


It appears this build_config is unused

mikedorfman

I haven't had a chance to dive too deeply, but I added a few comments to start with. I'll take a closer look later or possibly tomorrow - in the meantime feel free to address the comments.

mikedorfman · 2026-02-23T15:33:00Z

+  type = string
+}
+
+# The location of footprint dataset-config file. Ex. s3://my-internal/datset-config/


This comment doesn't appear to match this file

mikedorfman · 2026-02-23T15:33:37Z

+  type = list(string)
+}
+
+variable "region" {


It appears this is unused

mikedorfman · 2026-02-23T15:33:40Z

+  type = string
+}
+
+variable "app_name" {


It appears this is unused

mikedorfman · 2026-02-23T15:34:22Z

+
+   source = "http://github.com/downloadpath/cma_to_cnm_module.zip"
+   prefix = var.prefix
+   region = var.region


This region should be removed if the variables.tf is updated to remove it.

mikedorfman · 2026-02-23T15:38:10Z

+        A dict representing input and copied files. See schemas/output.json for more information.
+
+    """
+    LOGGER.debug(event)


Consider removing the debug of the whole event unless it's needed for something specific.

mikedorfman · 2026-02-23T15:43:19Z

+    if "granules" not in input.keys():
+        raise Exception('"granules" is missing from input')


This check can be removed since schema validation will fail in this case

mikedorfman · 2026-02-23T15:44:36Z

+    if "granules" not in input.keys():
+        raise Exception('"granules" is missing from input')
+    # Building the URI from info provided by provider since the granule itself might not have it
+    uri = f"{meta_provider['protocol']}://{meta_provider['host']}/"


In the schema, protocol and host are not required fields. Either update the schema to make them required or update handling here for when they are missing.

mikedorfman · 2026-02-23T15:49:37Z

+                    + granule_file.name
+                    if granule_file.path is not None
+                    else uri + granule_file.name,
+                    size=granule_file.size or 0,


Instead of 0 here, should we calculate the actual size of the file? Or throw an error? I'd be worried about setting this to 0 since that won't be accurate.

mikedorfman · 2026-02-23T15:58:06Z

+                receivedTime=now_aware,
+                processCompleteTime=now_aware,
+                submissionTime=now_aware,
+                identifier=str(uuid.uuid4()),


When returning a CNM-R, this identifier must match that of the original CNM-S - what are your thoughts on providing a config value for CNM-S identifier that's populated here instead of generating a new one? I'm not sure whether there's a use-case for a randomized identifier string in the CNM-R.

hmm, wish to talk to you more about this one.
I assume CNM-R means CNMResponse. CNM-S means CNMRequet

since we are creating a bunch of CNM Request message here then leverage your CNM Sender Lambda to send the CNM, we are creating the identifier here.

" what are your thoughts on providing a config value for CNM-S identifier that's populated here instead of generating a new one"

do you mean

the identifier could be coming from upstream as an optional field in the config object then:

if config.cnm_identifier is not None, then make use of it

Otherwise, generate a identifier here?

I assume CNM-R means CNMResponse. CNM-S means CNMRequet

Correct

do you mean

the identifier could be coming from upstream as an optional field in the config object then:

if config.cnm_identifier is not None, then make use of it
Otherwise, generate a identifier here?

Close, except my proposal would be to make this a required config field for the task. I took a look at the ICD to sanity-check my understanding and in table 2-4, the identifier field definition is "The generated unique identifier for the CNM-S message." (eg the identifier the provider sent us in their CNM message) and the notes state "Must be kept from the original CNM-S." - so I think this cannot be generated by the task in this way, it has to be provided to the task.

mikedorfman · 2026-02-23T16:00:27Z

+        The result of the cumulus task. See schemas/output.json for more information.
+
+    """
+    return run_cumulus_task(task, event, context)


In order to validate the schemas, consider passing in the fourth schemas arg to run_cumulus_task similar to https://github.com/nasa/cumulus/blob/master/tasks/granule-invalidator/src/main.py#L33

schemas containing, input, output and config
since input/output are forced by Pydantic, maybe I could only pass in
schemas['config'] = path to config schema?

If you have jsonschema files for all three (input, output and config), I think it makes sense to include all three in the run_cumulus_task call to be consistent with other tasks and to assure the jsonschema for the input/output don't drift from what's reflected in the task. But we can talk more about that today in our standup per my comment here to make sure we're all handling validation in the same way.

yenes56

@mikedorfman

Could you please review 2 of my comments/questions related to

identifier
schema validation as 4th parameter

Thanks!

?

reweeden · 2026-02-24T17:31:49Z

+
+resource "aws_lambda_function" "cma_to_cnm" {
+  filename         = "${path.module}/../dist/final/lambda.zip"
+  function_name    = "${var.prefix}-CMAToCNM"


Looking at other tasks that have acronyms in the name (such as PDR) it seems that only the first letter of the acronym is capitalized. So to be consistent with the naming of other core tasks I would suggest naming it like this:

Suggested change

function_name = "${var.prefix}-CMAToCNM"

function_name = "${var.prefix}-CmaToCnm"

reweeden · 2026-02-24T17:33:24Z

+  vpc_config {
+    subnet_ids         = var.subnet_ids
+    security_group_ids = var.security_group_ids
+  }


Take a look at the vpc_config section of cnm-response, cnm-to-cma, or granule-invalidator and copy from there. It's got a slightly different setup that allows for the vpc_config to be optional.

reweeden · 2026-02-24T17:36:03Z

+  tags = local.tags
+}
+
+resource "aws_cloudwatch_log_group" "cma_to_cnm" {


A note here, I was doing more research on this and I found a recommendation that the lambda function should have a depends_on on the log group so that terraform knows to create the log group before the lambda. Otherwise it could be possible for terraform to create the lambda first, which auto-creates a log group with this name and then terraform fails to create the log group due to the resource already existing.

However, that change also requires moving the lambda function name to a local otherwise you get a cyclic dependency between the two resources.

I was working on adding those changes here: #4268

reweeden · 2026-02-24T17:39:55Z

+    # Config Content
+    meta_provider = config.get("provider", [])
+    meta_collection = config.get("collection")
+    meta_cumulus = config.get("cumulus_meta", {})


I might suggest passing the items that you need from the cumulus_meta section through the config explicitly rather than passing in the entire cumulus_meta section. So that would be state_machine and execution_name by the looks of things. That somehow feels a bit cleaner to me, but feel free to disagree if you have a good reason for passing the entire cumulus_meta section!

reweeden · 2026-02-24T17:41:24Z

+        granules: List[models_cma.Granule] = granule_model.granules
+        cnm_json_dicts: List[dict] = []
+        for granule in granules:
+            LOGGER.debug("granuleId: {}", granule.granuleId)


I would suggest thinking about what messages are appropriate for what log level. I think it would be useful to have at least some INFO level messages as well so that the task will output something useful in production.

reweeden · 2026-02-24T17:58:11Z

+{
+  "name": "@cumulus/cma-to-cnm",
+  "version": "1.0.0",
+  "description": "CMA 2 CNM task",


I think we should be consistent about the choice to use 'to' or '2'. So I'd suggest:

Suggested change

"description": "CMA 2 CNM task",

"description": "CMA to CNM task",

reweeden · 2026-02-24T17:59:52Z

+
+
+This task is designed as a modular component. You can load it into your infrastructure using the following Terraform example:
+```angular2html


I don't think angular2html is the right language here. I think you want hcl for terraform code if I recall correctly.

Suggested change

```angular2html

```hcl

reweeden · 2026-02-24T18:00:16Z

+
+Example of workflow configuraton:
+
+```angular2html


Suggested change

```angular2html

```json

reweeden · 2026-02-24T18:01:20Z

+The output key : cnm_list should be mapped to the payload key in the workflow's output.
+Workflow developer could choose to the original cnm message to be stored in meta.cnm key.
+Example workflow:
+```angular2html


I'd also suggest running your json through an autoformatter before pasting it in here. There's a chance if you have an autoformatter set up in your editor, that putting the correct language here will enable the autoformatter to work on the fenced code as well.

Suggested change

```angular2html

```json

reweeden · 2026-02-24T18:02:14Z

+- About json schema compiling to Plaint Python classes:
+   - used datamodel-code-generator tool: https://koxudaxi.github.io/datamodel-code-generator/ to generate pydantic models from json schema files.
+   - example below
+```angular2html


Either shell or no language probably would be the best options.

…to dy/CUMULUS-4381-cmaToCNM

eyen added 11 commits February 4, 2026 19:15

initial commit. Passed unit test.

1695999

Merge branch 'master' into dy/CUMULUS-4332-cmaToCNM

ead6832

ruff check and refromat

b376aea

Merge branch 'master' into dy/CUMULUS-4381-cmaToCNM

62d1f31

re-write with input schema and input pydantic models

77ce901

README correction

a392c81

terraform cleanup

baba0f3

schema for the config object

3def1f5

refactor code for consistency

11536e6

Merge branch 'master' into dy/CUMULUS-4381-cmaToCNM

932d247

CUMULUS-4381 CHANGELOG

2e6f48b

yenes56 assigned reweeden and mikedorfman Feb 20, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

f84a7d5

for more information, see https://pre-commit.ci

yenes56 assigned bhazuka Feb 20, 2026

mikedorfman reviewed Feb 23, 2026

View reviewed changes

yenes56 commented Feb 24, 2026

View reviewed changes

reweeden unassigned bhazuka, mikedorfman and reweeden Feb 24, 2026

reweeden requested review from bhazuka and reweeden February 24, 2026 17:30

reweeden reviewed Feb 24, 2026

View reviewed changes

eyen added 3 commits February 25, 2026 16:54

remove build-config.json and address code consistency comments

edfc488

Merge branch 'dy/CUMULUS-4381-cmaToCNM' of github.com:nasa/cumulus in…

d75e927

…to dy/CUMULUS-4381-cmaToCNM

Merge branch 'master' into dy/CUMULUS-4381-cmaToCNM

004345b

code refactory

ed03747

		@@ -0,0 +1,34 @@
		locals {
		build_config = jsondecode(file("${path.module}/../build-config.json"))

		if "granules" not in input.keys():
		raise Exception('"granules" is missing from input')

	function_name = "${var.prefix}-CMAToCNM"
	function_name = "${var.prefix}-CmaToCnm"

	"description": "CMA 2 CNM task",
	"description": "CMA to CNM task",



		This task is designed as a modular component. You can load it into your infrastructure using the following Terraform example:
		```angular2html

Conversation

yenes56 commented Feb 20, 2026

Changes

PR Checklist

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mikedorfman left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mikedorfman Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yenes56 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mikedorfman Feb 24, 2026 •

edited

Loading