Conversation
```python
# Validate
return DatasetConfig.model_validate(dataset_config)
```
load_from_path gives us the ability to load configuration from path instead of the module
I'm trying to get my bearings again; I thought the existing load_dataset_config could load from a path already:

```
14:from aodn_cloud_optimised.lib.config import load_dataset_config
121: dataset_ardc_netcdf_config = load_dataset_config(DATASET_CONFIG_NC_ARDC_JSON)
133: dataset_soop_sst_netcdf_config = load_dataset_config(
```
You are correct.

When I opened this task I must have mistakenly assumed there was tight coupling to loading from the aodn_cloud_optimised config dir; it actually exists in dataflow-orchestration here:

```python
dataset_config = load_dataset_config(
    str(files("aodn_cloud_optimised.config.dataset").joinpath(dataset_config))
)
```

Anyway, the refactoring of all these classes into separate files makes this a lot easier to maintain and extend. Moving the load logic into the class and returning instances of the class from the load function are also improvements that:
- Make it easier to track and update load logic (now part of the class functionality)
- Allow extension later, e.g. loading config from external config stores such as Prefect/AWS.
Pull request overview
This PR upgrades key Python dependencies and refactors dataset configuration validation into a dedicated aodn_cloud_optimised/bin/config package, while also fixing package-resource access to use importlib.resources.as_file for safer config file handling.
Changes:
- Upgrade `h5py` and add `pydantic` as a first-class dependency.
- Switch `load_variable_from_config` to use `importlib.resources.as_file` when reading `common.json`.
- Extract the large, inline Pydantic config schema from `generic_cloud_optimised_creation.py` into separate modules under `aodn_cloud_optimised/bin/config/`.
Reviewed changes
Copilot reviewed 2 out of 17 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Updates dependency pins and adds pydantic dependency. |
| aodn_cloud_optimised/lib/config.py | Uses as_file() to safely access packaged common.json. |
| aodn_cloud_optimised/bin/generic_cloud_optimised_creation.py | Removes inline schema classes and imports new config models. |
| aodn_cloud_optimised/bin/config/__init__.py | Exposes new config models as a package. |
| aodn_cloud_optimised/bin/config/cluster_config.py | Defines Pydantic model for cluster configuration. |
| aodn_cloud_optimised/bin/config/coiled_cluster_options.py | Defines Pydantic model for Coiled cluster options. |
| aodn_cloud_optimised/bin/config/worker_options.py | Defines Pydantic model for worker sizing options. |
| aodn_cloud_optimised/bin/config/ec2_cluster_options.py | Defines Pydantic model for EC2 cluster options. |
| aodn_cloud_optimised/bin/config/ec2_adapt_options.py | Defines Pydantic model for EC2 adapt (min/max) options. |
| aodn_cloud_optimised/bin/config/path_config.py | Defines and validates dataset input path configuration. |
| aodn_cloud_optimised/bin/config/run_settings.py | Defines and validates run-level settings and S3 option blocks. |
| aodn_cloud_optimised/bin/config/csv_config_model.py | Validates mutually exclusive pandas vs polars CSV reader config. |
| aodn_cloud_optimised/bin/config/parquet_schema_transformation.py | Validates parquet schema transformation blocks. |
| aodn_cloud_optimised/bin/config/zarr_schema_transformation.py | Validates zarr schema transformation blocks. |
| aodn_cloud_optimised/bin/config/dataset_config.py | Defines the top-level dataset config model and file-loading helpers. |
@lbesnard I started work on that external config load ability, starting with a refactor of the config classes. Looks like it is failing some tests in the actions, but they don't fail locally, so I'm a bit confused.
Could be less confusing if all the pydantic functions were in a separate folder.
Maybe try using https://github.com/nektos/act to help debug the failing tests? There are definitely a lot; this one is a bit weird.
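For reference, a typical `act` workflow for debugging CI locally might look like the following (job names are assumptions; they depend on this repo's actual workflow files, and `act` requires Docker):

```shell
# List the jobs defined under .github/workflows
act -l

# Run the default push event locally
act

# Run only a single job, e.g. one with the id "test" (adjust to the real job id)
act -j test
```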
@lbesnard I think I pinpointed the issue with the h5py 3.13 upgrade; it looks like the default timestamp export was microseconds rather than nanoseconds in the new version, which caused the unit-test failures. I rolled h5py back to 3.11 and everything looks fine now. Also chucked the config models into a separate folder as suggested :)
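A minimal illustration of the kind of mismatch described above, assuming the failing tests compared timestamp arrays whose default resolution changed between library versions (microseconds vs nanoseconds):

```python
import numpy as np

# The same instant exported at two different resolutions.
ts_us = np.array(["2024-01-01T00:00:00"], dtype="datetime64[us]")
ts_ns = ts_us.astype("datetime64[ns]")

# Value-wise the timestamps are equal...
print(ts_us[0] == ts_ns[0])        # True
# ...but dtype-strict comparisons (common in dataset round-trip tests) differ.
print(ts_us.dtype == ts_ns.dtype)  # False
```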
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #256      +/-   ##
==========================================
+ Coverage   65.00%   67.48%    +2.48%
==========================================
  Files          28       41       +13
  Lines        5404     5678      +274
==========================================
+ Hits         3513     3832      +319
+ Misses       1891     1846       -45
==========================================
```

☔ View full report in Codecov by Sentry.
```python
    ),
)
# TODO: Implement aws_opendata_registry model for validation
aws_opendata_registry: dict | None = pydantic.Field(
```
Added aws_opendata_registry. There are some checks for empty values, but they now skip aws_opendata_registry, so they pass.
```python
parent_config: str | None = pydantic.Field(
    default=None,
    description="Name of the parent config to load on top of. May be recursive.",
)
```
Also, `logger_name`, `handler_class` and `parent_config` are all added with defaults. This allows the `DatasetConfig` class to pick them up if they exist.
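The recursive `parent_config` behaviour described above could be resolved roughly as follows. This is a hypothetical sketch: `load_raw_config`, the in-memory `store`, and the "child keys win" merge policy are all assumptions, not the real implementation.

```python
def load_raw_config(name: str, store: dict[str, dict]) -> dict:
    """Stand-in for reading a raw config dict by name."""
    return dict(store[name])


def resolve_config(name: str, store: dict[str, dict]) -> dict:
    """Merge a config on top of its parent chain, recursing until no parent."""
    raw = load_raw_config(name, store)
    parent_name = raw.pop("parent_config", None)
    if parent_name is None:
        return raw
    merged = resolve_config(parent_name, store)  # may recurse further up
    merged.update(raw)  # child keys override parent keys
    return merged


store = {
    "base": {"logger_name": "base_logger", "cluster": "small"},
    "child": {"parent_config": "base", "cluster": "large"},
}
print(resolve_config("child", store))
# {'logger_name': 'base_logger', 'cluster': 'large'}
```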
```diff
@@ -75,11 +75,6 @@
             }
         }
     },
```
The spatial_extent is properly captured elsewhere in the config now
```diff
@@ -93,11 +93,9 @@ def load_variable_from_config(variable_name) -> str:
     :raises KeyError: If the variable is not found in the configuration file.
     """
     # Obtain the file path using the context manager
```
Could consider using the import directly, e.g.

```python
pathlib.Path(aodn_cloud_optimised.config.__path__[0]) / "common.json"
```

Kind of cleaner than the importlib file/as_file.
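For comparison, both approaches side by side. Since `aodn_cloud_optimised` may not be installed here, the stdlib `json` package stands in for `aodn_cloud_optimised.config`, and `__init__.py` stands in for `common.json`; that substitution is the only assumption.

```python
import importlib.resources
import pathlib

import json as example_pkg  # stand-in for aodn_cloud_optimised.config

# Approach used in the PR: as_file materialises a concrete filesystem path
# for the duration of the context manager, even if the package data ships
# inside a zip/wheel.
resource = importlib.resources.files("json") / "__init__.py"
with importlib.resources.as_file(resource) as path:
    found_via_as_file = path.exists()

# Suggested alternative: go via __path__. Shorter, but assumes the package
# lives on a real filesystem.
direct = pathlib.Path(example_pkg.__path__[0]) / "__init__.py"
found_via_path = direct.exists()

print(found_via_as_file, found_via_path)  # True True
```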
Pull request overview
Copilot reviewed 8 out of 24 changed files in this pull request and generated 1 comment.
@copilot can you check that nothing got missed from the previous
What is that file for? Where is it used?

This routes the path correctly for udunits on macOS.
```python
# Config files that intentionally contain "FILL UP MANUALLY" placeholders and
# are expected to fail DatasetConfig validation — excluded from the compat sweep.
_PLACEHOLDER_CONFIGS = {
```
From what I understand here, you're running the tests directly on the configs available in config/dataset. The load_from_cloud_optimised_directory function should be mocked or pointed to test configs available in the test/resources path. The config files you're pointing to right now may not have placeholders in the future.
Currently this dynamically loads all configs in the directory except for the _PLACEHOLDER_CONFIGS; these are skipped because they contain "FILL UP MANUALLY". The intent of the test is to verify that the legacy loads match the new load method for all configs that are valid to load.
Yeah, but that's my point. Later you have:

```python
def test_all_configs_load_without_error(self):
    for config_path in self._loadable_configs():
        with self.subTest(config=config_path.name):
            DatasetConfig.load_from_cloud_optimised_directory(config_path.name)
```

which IMO should not be in the unittests. This is more for pre-commit, see
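The reviewer's suggestion of redirecting the sweep to fixtures could look roughly like this. The helper names (`config_directory`, `loadable_configs`) are hypothetical stand-ins for the real test plumbing; only the `unittest.mock.patch` redirection technique is the point.

```python
import json
import pathlib
import tempfile
from unittest import mock


def config_directory() -> pathlib.Path:
    """Stand-in for the helper that locates the packaged config/dataset dir."""
    raise NotImplementedError("would return the live config directory")


def loadable_configs() -> list[pathlib.Path]:
    """Stand-in for _loadable_configs: every *.json under the config dir."""
    return sorted(config_directory().glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    fixture = pathlib.Path(tmp) / "good.json"
    fixture.write_text(json.dumps({"dataset_name": "x"}))
    # Redirect the directory lookup so the sweep only ever sees controlled
    # fixtures, never the live config/dataset contents.
    with mock.patch(f"{__name__}.config_directory",
                    return_value=pathlib.Path(tmp)):
        names = [p.name for p in loadable_configs()]

print(names)  # ['good.json']
```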
…eck) The bug fix from 075c557 corrected `any('.parquet' in f for f in filter)` to `'.parquet' in filter` in `generic_cloud_optimised_creation.py`, but after the refactor moved `PathConfig` to `bin/config/model/path_config.py`, the old buggy code remained there. This commit ports the fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
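A quick demonstration of why the original check was wrong, assuming `filter` arrives as a suffix string like `".parquet"` (the name shadows the builtin, mirroring the original code): iterating a string yields single characters, so the `any(...)` form can never match the multi-character substring.

```python
filter = ".parquet"  # assumed input; shadows the builtin as in the PR code

buggy = any(".parquet" in f for f in filter)  # f is '.', 'p', 'a', ... -> always False
fixed = ".parquet" in filter                  # proper substring test -> True

print(buggy, fixed)  # False True
```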
…patibility `@model_validator(mode='after')` on a classmethod-style function `(cls, values)` is deprecated since Pydantic v2.12 and will be removed in v3.0. Changed `validate_cross_fields` to an instance method using `self`, as required by the Pydantic v2.12 API. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
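The migrated style looks like this: a `mode="after"` validator takes `self` and returns `self`. The model and its fields are illustrative, not the real `RunSettings`.

```python
import pydantic


class RunSettings(pydantic.BaseModel):  # illustrative model, not the real one
    min_workers: int
    max_workers: int

    @pydantic.model_validator(mode="after")
    def validate_cross_fields(self) -> "RunSettings":
        # Instance-method style required by Pydantic v2.12+.
        if self.min_workers > self.max_workers:
            raise ValueError("min_workers must not exceed max_workers")
        return self


ok = RunSettings(min_workers=1, max_workers=4)

try:
    RunSettings(min_workers=5, max_workers=2)
    rejected = False
except pydantic.ValidationError:
    rejected = True

print(ok.max_workers, rejected)  # 4 True
```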
e78df6f to 3311314 (compare)
@thommodin see my latest comment above re testing all config files, which I don't think should happen as part of the unittests. Anyway, I'll merge this PR as we need to move forward.
This pull request updates dependency versions and improves configuration file handling in the codebase. The most significant changes are the upgrade of the `h5py` library, the addition of `pydantic` as a dependency, and a fix to the way configuration files are accessed using `importlib.resources`.

Dataset Config Refactor

Refactored DatasetConfig into separate files and added loading from disk instead of the module.

Dependency updates:
- Upgraded `h5py` from version `3.11.0` to `3.13.0` in `pyproject.toml`, which allows installation on macOS. Use `3.13` for dev, then roll back for publication and allow unit tests in CICD. Obviously, take special care around errors and ensure they are related to the nano/microsecond issues.
- Added `pydantic>=2.12.5` to the dependencies in `pyproject.toml` for data validation and settings management.

Configuration handling improvements:
- Refactored the `importlib.resources` usage to `as_file` for safer and more correct file handling, and updated the context manager in `load_variable_from_config` to use `as_file` when accessing the `common.json` configuration file. This change improves compatibility and reliability when working with package resources. [1] [2]

Delete uv lock

`uv.lock` snuck into the main branch somehow(?). Deleted it...