Support multidimensional coordinates for bounds and regridding#820
Support multidimensional coordinates for bounds and regridding#820tomvothecoder wants to merge 12 commits intomainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #820 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 18 18
Lines 1989 2007 +18
=========================================
+ Hits 1989 2007 +18 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Notes
|
|
Thanks for pushing this forward @jasonb5! This PR seems to work for the example in #816, but I think it is failing for a related case (curvilinear->rectilinear): wget https://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Amon/tas/gr/v20180917/tas_Amon_CNRM-CM6-1_historical_r1i1p1f2_gr_185001-201412.nc
wget https://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/SImon/siconc/gn/v20180917/siconc_SImon_CNRM-CM6-1_historical_r1i1p1f2_gn_185001-201412.nc# %% imports
import xcdat as xc
# %% parameters
fn_source = 'siconc_SImon_CNRM-CM6-1_historical_r1i1p1f2_gn_185001-201412.nc'
fn_target = 'tas_Amon_CNRM-CM6-1_historical_r1i1p1f2_gr_185001-201412.nc'
# %% get source / target grids
ds = xc.open_dataset(fn_source)
ngrid = xc.open_dataset(fn_target)
# %% try regridding with xcdat
dsr = ds.regridder.horizontal('tmp2m', ngrid, tool="xesmf", method="conservative_normed", periodic=True)
I can get this to work with: import xesmf as xe
regridder = xe.Regridder(ds, ngrid, "conservative_normed", periodic=True, ignore_degenerate=True)
dsr = regridder(ds) |
27ee94b to
36aa432
Compare
|
@pochedls Could you give this another try? Thanks! |
|
This is working for both of the cases I raised (code below for my reference), but I think it isn't generically picking up the coordinates. For example, CESM2 has the following structure
Which produces the following error when regridding:
Is there a more generic way to handle these auxiliary coordinates (x/nj/etc)? Perlmutter code for reference. import xsearch as xs
import xcdat as xc
p = list(xs.findPaths('historical', 'siconc', 'mon', model='CNRM-CM6-1', member = 'r1i1p1f2').keys())[0]
p2 = list(xs.findPaths('historical', 'tas', 'mon', model='CNRM-CM6-1', member = 'r1i1p1f2').keys())[0]
ds = xc.open_mfdataset(p)
ngrid = xc.open_mfdataset(p2)
dsr = ds.regridder.horizontal('siconc', ngrid, tool="xesmf", method="conservative_normed", periodic=True) |
|
That actually caught an edge case I didn't consider. If the |
Thanks @jasonb5 – I think curvilinear regridding is seemingly full of edge cases. I appreciate your effort in making this robust (and I'm happy you can quickly understand what is going on). |
|
@pochedls There's more to this issue. My original fix may not be correct. I'm working on a new fix and will push it soon. |
b4d0557 to
048a07a
Compare
|
@pochedls Try the latest. The CESM example will fail because it has two possible |
FYI – this is working for CNRM-CM6-1 and the |
|
@pochedls Go ahead and try again. All three cases should work this time. |
|
@jasonb5 – This is working for all the examples I provided. I'm good with moving forward with this PR (fixes the issues raised), but I could also test more models if that would be helpful. |
|
@pochedls Thanks! |
There was a problem hiding this comment.
Pull request overview
Adds initial support for datasets where horizontal coordinates (e.g., lat/lon) are multidimensional (curvilinear or otherwise not represented as 1D dimension coordinates), primarily to unblock horizontal regridding with xESMF for the #816 case.
Changes:
- Introduces a
supports_multidimcapability flag on regridder implementations and threads it through the regridder accessor. - Extends
get_dim_coords()with an optionalmultidimmode to discover non-index coordinate variables. - Skips automatic bounds generation in
_obj_to_grid_ds()when operating in multidimensional-coordinate mode.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| xcdat/regridder/base.py | Adds a regridder capability hook (supports_multidim / can_handle_multidim()). |
| xcdat/regridder/xesmf.py | Marks xESMF regridder as able to handle multidimensional coordinates. |
| xcdat/regridder/regrid2.py | Marks Regrid2 regridder as not supporting multidimensional coordinates. |
| xcdat/regridder/xgcm.py | Marks xgcm regridder as not supporting multidimensional coordinates. |
| xcdat/regridder/accessor.py | Passes multidim capability into grid extraction; disables bounds generation in multidim mode. |
| xcdat/axis.py | Adds multidim option to coordinate discovery logic in get_dim_coords(). |
Comments suppressed due to low confidence (1)
xcdat/axis.py:116
get_dim_coords()docstring/Notes say “Multidimensional coordinates are ignored.”, but the newmultidimparameter changes that behavior. Please update the docstring to document themultidimparameter and clarify how multidimensional coordinates are handled when it’s enabled vs disabled.
"""Gets the dimension coordinates for an axis.
This function uses ``cf_xarray`` to attempt to map the axis to its
dimension coordinates by interpreting the CF axis and coordinate names
found in the coordinate attributes. Refer to [1]_ for a list of CF axis and
coordinate names that can be interpreted by ``cf_xarray``.
If ``obj`` is an ``xr.Dataset,``, this function can return a single
dimension coordinate variable as an ``xr.DataArray`` or multiple dimension
coordinate variables in an ``xr Dataset``. If ``obj`` is an ``xr.DataArray``,
this function should return a single dimension coordinate variable as an
``xr.DataArray``.
Parameters
----------
obj : xr.Dataset | xr.DataArray
The Dataset or DataArray object.
axis : CFAxisKey
The CF axis key ("X", "Y", "T", "Z").
Returns
-------
xr.Dataset | xr.DataArray
A Dataset of dimension coordinate variables or a DataArray for
the single dimension coordinate variable.
Raises
------
ValueError
If the ``obj`` is an ``xr.DataArray`` and more than one dimension is
mapped to the same axis.
KeyError
If no dimension coordinate variables were found for the ``axis``.
Notes
-----
Multidimensional coordinates are ignored.
|
|
||
| for axis, has_bounds in axis_has_bounds.items(): | ||
| if not has_bounds: | ||
| # FIXME: Line 313 --Can't add bounds for multidimensional coordinates |
There was a problem hiding this comment.
Is this #FIXME still relevant or should it be a #TODO? Also exact line number should be removed.
There was a problem hiding this comment.
No longer relevant, will remove.
| supports_multidim: bool | ||
|
|
||
| @classmethod | ||
| def can_handle_multidim(cls) -> bool: | ||
| return cls.supports_multidim | ||
|
|
There was a problem hiding this comment.
Fixed, defaulting to False.
| if multidim: | ||
| # multidimensional coordinates cannot be indexes, use all coords | ||
| index_keys = list([y for x in obj.cf.coordinates.values() for y in x]) | ||
| else: | ||
| # Get the object's index keys, with each being a dimension. | ||
| # NOTE: xarray does not include multidimensional coordinates as index keys. | ||
| # Example: ["lat", "lon", "time"] | ||
| index_keys = list(obj.indexes.keys()) |
There was a problem hiding this comment.
Good point, this is more robust than only using obj.cf.coordinates.
There was a problem hiding this comment.
Combined cf.coordinates and cf.axes, should have complete coverage.
| input_grid = _get_input_grid( | ||
| self._ds, data_var, ["X", "Y"], multidim=regrid_tool.can_handle_multidim() | ||
| ) |
There was a problem hiding this comment.
Test needed.
| # Multidimensional coordinates bounds generation is not supported | ||
| if multidim: | ||
| return output_ds |
There was a problem hiding this comment.
I think Copilot is referencing the logic before this conditional in _obj_to_grid_ds():
xcdat/xcdat/regridder/accessor.py
Lines 281 to 305 in d13a72a
|
@jasonb5 I tagged Copilot for review and it caught some things. Can you review its suggestions and decide how and where to take action? Thanks. |
Set a default value for supports_multidim so subclasses are not required to explicitly declare it. Added tests to verify the default and override behavior.
The multidim=True branch only used obj.cf.coordinates to build index_keys, missing coordinates discoverable only via obj.cf.axes. Combine both sources to avoid incorrect KeyError when an axis coordinate exists but is not in cf.coordinates.
Exercises ds.regridder.horizontal(..., tool='xesmf') on a dataset with 2D lat/lon coordinates and no 1D index coord vars, ensuring the multidim code path correctly discovers coordinates via CF axis metadata.
Datasets with non-standard 2D coordinate names (e.g., nav_lat/nav_lon) that are only CF-discoverable via axis metadata were failing because _get_axis_coord_and_bounds called get_dim_coords without multidim=True. Pass the flag through so the full coord discovery path works.
|
@pochedls @tomvothecoder Made the changes, might one to do one more pass before we merge. |
|
This is failing for the CESM2 case again (works for CNRM-CM6-1 and the original example issue).
|
Description
Checklist
If applicable: