Skip to content

Generalize deploy steps and add TOA xarray data structure#91

Open
MariaRocco wants to merge 12 commits into
emit-sds:mainfrom
MariaRocco:generalize-deploy-steps
Open

Generalize deploy steps and add TOA xarray data structure#91
MariaRocco wants to merge 12 commits into
emit-sds:mainfrom
MariaRocco:generalize-deploy-steps

Conversation

@MariaRocco
Copy link
Copy Markdown

Summary of changes:

  • Pulls out and adds functions for smaller general workflows that can be reused
  • adds a new way to start from a TOA xarray dataset and run inference as opposed to providing radiance asset paths.

Specific changes

spectf/dataset.py

  • New ToaDataset class that initializes the settings/parameters needed for downstream workflows
  • RasterDatasetTOA inherits from ToaDataset
  • New XarrayDatasetTOA class that inherits from ToaDataset that takes an input of an xarray top of atmosphere

spectf_cloud/deploy/infra_setup.py

  • adds a new open_model_arch_spec that parses and loads the model spec

spectf_cloud/deploy/deploy_pt.py

  • make the outfp input optional - if nothing is provided - no file will be saved
  • return the inference mask from the deploy_pt workflow
  • move core model setup to initialize_pt_model function
  • move core model run to run_pt_inference_model function
  • Uses the new initialize_pt_model and run_pt_inference_model in the deplot_pt function
  • Adds a new deploy_pt_from_toa function that takes an xarray dataset as an input instead of filepaths
  • uses the more generic open_model_arch_spec

spectf_cloud/deploy/deploy_trt.py

  • make the outfp input optional - if nothing is provided - no file will be saved
  • return the inference mask from the deploy_pt workflow
  • move core model setup + run to run_trt_inference_model function
  • Uses the new run_trt_inference_model in the deploy_trt function

other notes

I think there are some further optimizations we can do such as converting the pt/trt models into classes - they do similar things and this can be further improved using class structure. I'll make an issue for some of the ideas I had as I was looking through.

Let me know your thoughts!

@pgbrodrick
Copy link
Copy Markdown
Member

Thanks for this PR @MariaRocco .

General thoughts below.

xarray stuff:
Presently xarray isn't a dependency - meaning to accept these changes we'd need to add it (modify the .toml). Before we go there though, can you help me on the requirement/need? My quick read is that the class addition here is basically a thin wrapper to support a native xarray input...but it seems this could be equally served by just casting when providing the input. I'm curious if I missed something or if you had other intents.

deploy modifications:
I see the desire here, and this makes sense. Approach-wise, my gut would be to add a new function that returns the array-based version, and then just call this inside the current deploy functions and add the output file write - this would mitigate downstream changes to currently operating code, while still giving the same functionality. Curious on @michaelkiper's thoughts though.

On style:
I don't believe we have an enforced or stated styleguide recommendation for this repo, so your changes are fair game. However they do make the PR harder to review as it's more challenging to isolate actual code changes. I would suggest minimizing these when possible to facilitate faster reviews.

@MariaRocco
Copy link
Copy Markdown
Author

Hey @pgbrodrick - thanks of the feedback!

for the xarray portion:
we could change this to expect a numpy array instead of xarray - let me know if that's your preference and I will update. I went with xarray because that's the format our data is in and is a fairly common format for data cubes/data arrays that allows dimensions to be more descriptive - which would be nice for instance for ensuring access to the right dimension order when unpacking /reshaping the toa data. Numpy arrays will work too.

for the deploy portions:
the implementation in this PR shouldn't impact any downstream usage/workflows - the input format and outputs are all the same - just if no output filepath is provided - no file will get saved. Let me know if I'm missing something - but the way this is written now, any existing downstream usage of this code will have the same outputs.

on style: yeah that's definitely an oops - just a habit but you're right it does make review more cumbersome. I'll avoid that in the future.

Comment thread spectf/dataset.py Outdated
Comment thread spectf_cloud/deploy/deploy_pt.py Outdated
Comment thread spectf_cloud/deploy/infra_setup.py Outdated
Comment thread spectf_cloud/deploy/infra_setup.py
Comment thread spectf_cloud/deploy/infra_setup.py Outdated
Comment thread spectf_cloud/deploy/deploy_pt.py Outdated
Comment thread spectf_cloud/deploy/deploy_pt.py Outdated
Comment thread spectf_cloud/deploy/deploy_trt.py Outdated
import numpy as np
from osgeo import gdal

from spectf_cloud.deploy.infra_setup import open_model_arch_spec
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nitpick] Please move this import to be with the other spectf_cloud imports. (ideally here)

Comment thread spectf_cloud/deploy/deploy_trt.py Outdated
return padded_batch


def run_trt_inference_model(dataset, engine, inference, device_):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] Is this function part of your pipeline? I don't see any reference to the XArray dataset in this change.

If it is, I'd recommend adding it in.

@michaelkiper
Copy link
Copy Markdown
Collaborator

@MariaRocco Thank you for submitting a PR! We appreciate contributions.
Do you have a specific reason as to why returning a 2D matrix would be a useful contribution for you? Do you have some supplementary subroutine that would ingest this mask as a 2D matrix?

On XArray:
Yes, if you're able to make it use Numpy arrays as well, that'd be preferred. It'd help us use one less dependancy.

On Deployment Modifications:
@pgbrodrick yes, I agree that there is likely a less-invasive solution to getting the inputs from XArray and suppressing a file output, but I don't think that this implementation is unreasonable. If we're valuing PR turnaround time, it's looking fine to me. Adding in a new CLI flag/function parameter would be better, but not a killer if we value time.

@MariaRocco could you walk me through exactly how you use these functions? What is the parent process and what is the output subroutine you're looking to improve with these changes? That'd help me get a bigger picture. Thanks.

On Style:
Yes, I'd probably prefer to get rid of all the style related changes - they crowed out the intention of the PR. However, they're not a killer.

Ideally, we'd use an idealized version of black formatting, but for now, a good rule is simply to make it legible.

@michaelkiper
Copy link
Copy Markdown
Collaborator

I've left some comments on the PR. Once those are implemented, I'll test it to make sure functionality is the same.

Comment thread spectf/dataset.py Outdated
def __init__(
self,
toa: xr.DataArray,
rm_bands: List[List[int]] | None = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type hint format made the 3.9 build fail - please use Optional instead.

======================================================================
ERROR: test_train_cli_commands (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_train_cli_commands
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.25/x64/lib/python3.9/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/opt/hostedtoolcache/Python/3.9.25/x64/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/home/runner/work/SpecTf/SpecTf/tests/test_train_cli_commands.py", line 12, in <module>
    from spectf_cloud.train import train
  File "/home/runner/work/SpecTf/SpecTf/spectf_cloud/train.py", line 22, in <module>
    from spectf.dataset import SpectraDataset
  File "/home/runner/work/SpecTf/SpecTf/spectf/dataset.py", line 137, in <module>
    class XarrayDatasetTOA(ToaDataset):
  File "/home/runner/work/SpecTf/SpecTf/spectf/dataset.py", line 146, in XarrayDatasetTOA
    rm_bands: List[List[int]] | None = None,
TypeError: unsupported operand type(s) for |: '_GenericAlias' and 'NoneType'

@MariaRocco
Copy link
Copy Markdown
Author

Excellent - thank you both for the review!

I am actually about to leave on 2 week vacation - so I just wanted to open the PR for @jakehlee (and y'all) to review while I'm out. I'm so happy you were able to take a look quickly :) but I won't be able to make these updates until I am back. Sorry about that.

In the meantime - our workflow is:

  • As we process tanager radiance data, we have a method for calculating TOA as an xarray dataset
  • this TOA (as an xarray dataset at the moment) we pass to deploy_pt_from_toa along with weights that our ML scientist has generated using tanager training datasets
  • we take the 2d array inference output and then save it as a cloud optimized geotiff, both ortho-rectified and unotho'd- so we do something very similar to what is done with the make_geotiff method - but returning the dataset allows us to add band definitions/other metadata we typically add to our geotiffs - so it just allows us to have slightly more control over the format + metadata in the output files.

One other note related to xarray - another benefit it adds - is instead of the band definitions being passed as a separate input variable - they can be part of the dataset - which just ensures the bands/toa dataset are the right sizes + aligned. We can still pass this data in as 2 separate numpy arrays, but I think there are benefits to using an xarray dataset.

@MariaRocco
Copy link
Copy Markdown
Author

Alrighty - I am back from vacation - thanks for your patience. I believe I addressed all the comments - let me know if I missed anything though

@michaelkiper
Copy link
Copy Markdown
Collaborator

Alrighty - I am back from vacation - thanks for your patience. I believe I addressed all the comments - let me know if I missed anything though

@MariaRocco sweet! Thanks for the commits. I'll take a look.

@MariaRocco
Copy link
Copy Markdown
Author

@michaelkiper and @pgbrodrick just wanted to touch base on this - have you been able to take a look / test these updates?

@michaelkiper
Copy link
Copy Markdown
Collaborator

@michaelkiper and @pgbrodrick just wanted to touch base on this - have you been able to take a look / test these updates?

@MariaRocco I haven't been able to yet. I can get them done by Friday for you though.

Comment thread spectf_cloud/deploy/deploy_pt.py Outdated
Comment thread spectf_cloud/deploy/deploy_trt.py Outdated
Comment thread spectf_cloud/deploy/deploy_trt.py
@michaelkiper
Copy link
Copy Markdown
Collaborator

@MariaRocco I was able to test the new commit you made. There are 3 critical changes that do have to be updated before we can merge this PR - should be very simple, I laid them out in these 3 comments:

Once those changes are made, I'm fine with approving this PR.

Hash showing equivalence between generating a tif with the old dev branch using PT/TRT, and this new PR branch using PT/TRT:

(SpecTf) [<> SpecTf]$ md5sum new-trt.tif
7b9840abc02f39cece7e91c773973bfa  new-trt.tif
(SpecTf) [<> SpecTf]$ md5sum old-trt.tif
7b9840abc02f39cece7e91c773973bfa  old-trt.tif
(SpecTf) [<>SpecTf]$ md5sum old-pt.tif
7b9840abc02f39cece7e91c773973bfa  old-pt.tif
(SpecTf) [<> SpecTf]$ md5sum new-pt.tif
7b9840abc02f39cece7e91c773973bfa  new-pt.tif

@MariaRocco
Copy link
Copy Markdown
Author

Excellent - just made those changes :) thanks for the review!

@michaelkiper
Copy link
Copy Markdown
Collaborator

Excellent - just made those changes :) thanks for the review!

Perfect! All good with me. I can approve - @pgbrodrick do you sign off as well?

@pgbrodrick
Copy link
Copy Markdown
Member

pgbrodrick commented Mar 31, 2026

Thanks for the careful review @michaelkiper , and for the PR @MariaRocco !

Tests seem to pass with current ops workflows without any issues as well. Merging once workflow completes.

@pgbrodrick
Copy link
Copy Markdown
Member

@MariaRocco , I opened a tiny PR to clear the 3.9 typing issue flagged by the test. Once you accept we can merge this in.

@jakehlee
Copy link
Copy Markdown
Collaborator

jakehlee commented May 23, 2026

@MariaRocco apologies for leaving this open for as long as it has, let's get this merged in ASAP.

EDIT: we'll wait for regular business hours on the 26th to avoid merging on a late Friday night :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants