Generalize deploy steps and add TOA xarray data structure#91
Generalize deploy steps and add TOA xarray data structure#91MariaRocco wants to merge 12 commits into
Conversation
|
Thanks for this PR @MariaRocco . General thoughts below. xarray stuff: deploy modifications: On style: |
|
Hey @pgbrodrick - thanks of the feedback! for the xarray portion: for the deploy portions: on style: yeah that's definitely an oops - just a habit but you're right it does make review more cumbersome. I'll avoid that in the future. |
| import numpy as np | ||
| from osgeo import gdal | ||
|
|
||
| from spectf_cloud.deploy.infra_setup import open_model_arch_spec |
There was a problem hiding this comment.
[Nitpick] Please move this import to be with the other spectf_cloud imports. (ideally here)
| return padded_batch | ||
|
|
||
|
|
||
| def run_trt_inference_model(dataset, engine, inference, device_): |
There was a problem hiding this comment.
[Question] Is this function part of your pipeline? I don't see any reference to the XArray dataset in this change.
If it is, I'd recommend adding it in.
|
@MariaRocco Thank you for submitting a PR! We appreciate contributions. On XArray: On Deployment Modifications: @MariaRocco could you walk me through exactly how you use these functions? What is the parent process and what is the output subroutine you're looking to improve with these changes? That'd help me get a bigger picture. Thanks. On Style: Ideally, we'd use an idealized version of |
|
I've left some comments on the PR. Once those are implemented, I'll test it to make sure functionality is the same. |
| def __init__( | ||
| self, | ||
| toa: xr.DataArray, | ||
| rm_bands: List[List[int]] | None = None, |
There was a problem hiding this comment.
This type hint format made the 3.9 build fail - please use Optional instead.
======================================================================
ERROR: test_train_cli_commands (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_train_cli_commands
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.9.25/x64/lib/python3.9/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/opt/hostedtoolcache/Python/3.9.25/x64/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/home/runner/work/SpecTf/SpecTf/tests/test_train_cli_commands.py", line 12, in <module>
from spectf_cloud.train import train
File "/home/runner/work/SpecTf/SpecTf/spectf_cloud/train.py", line 22, in <module>
from spectf.dataset import SpectraDataset
File "/home/runner/work/SpecTf/SpecTf/spectf/dataset.py", line 137, in <module>
class XarrayDatasetTOA(ToaDataset):
File "/home/runner/work/SpecTf/SpecTf/spectf/dataset.py", line 146, in XarrayDatasetTOA
rm_bands: List[List[int]] | None = None,
TypeError: unsupported operand type(s) for |: '_GenericAlias' and 'NoneType'
|
Excellent - thank you both for the review! I am actually about to leave on 2 week vacation - so I just wanted to open the PR for @jakehlee (and y'all) to review while I'm out. I'm so happy you were able to take a look quickly :) but I won't be able to make these updates until I am back. Sorry about that. In the meantime - our workflow is:
One other note related to xarray - another benefit it adds - is instead of the band definitions being passed as a separate input variable - they can be part of the dataset - which just ensures the bands/toa dataset are the right sizes + aligned. We can still pass this data in as 2 separate numpy arrays, but I think there are benefits to using an xarray dataset. |
|
Alrighty - I am back from vacation - thanks for your patience. I believe I addressed all the comments - let me know if I missed anything though |
@MariaRocco sweet! Thanks for the commits. I'll take a look. |
|
@michaelkiper and @pgbrodrick just wanted to touch base on this - have you been able to take a look / test these updates? |
@MariaRocco I haven't been able to yet. I can get them done by Friday for you though. |
|
@MariaRocco I was able to test the new commit you made. There are 3 critical changes that do have to be updated before we can merge this PR - should be very simple, I laid them out in these 3 comments:
Once those changes are made, I'm fine with approving this PR. Hash showing equivalence between generating a tif with the old |
|
Excellent - just made those changes :) thanks for the review! |
Perfect! All good with me. I can approve - @pgbrodrick do you sign off as well? |
|
Thanks for the careful review @michaelkiper , and for the PR @MariaRocco ! Tests seem to pass with current ops workflows without any issues as well. Merging once workflow completes. |
|
@MariaRocco , I opened a tiny PR to clear the 3.9 typing issue flagged by the test. Once you accept we can merge this in. |
typing change
|
@MariaRocco apologies for leaving this open for as long as it has, let's get this merged in ASAP. EDIT: we'll wait for regular business hours on the 26th to avoid merging on a late Friday night :) |
Summary of changes:
Specific changes
spectf/dataset.py
spectf_cloud/deploy/infra_setup.py
open_model_arch_specthat parses and loads the model specspectf_cloud/deploy/deploy_pt.py
outfpinput optional - if nothing is provided - no file will be saveddeploy_ptworkflowinitialize_pt_modelfunctionrun_pt_inference_modelfunctioninitialize_pt_modelandrun_pt_inference_modelin thedeplot_ptfunctiondeploy_pt_from_toafunction that takes an xarray dataset as an input instead of filepathsopen_model_arch_specspectf_cloud/deploy/deploy_trt.py
outfpinput optional - if nothing is provided - no file will be saveddeploy_ptworkflowrun_trt_inference_modelfunctionrun_trt_inference_modelin thedeploy_trtfunctionother notes
I think there are some further optimizations we can do such as converting the pt/trt models into classes - they do similar things and this can be further improved using class structure. I'll make an issue for some of the ideas I had as I was looking through.
Let me know your thoughts!