Skip to content

Support multidimensional coordinates for bounds and regridding#820

Open
tomvothecoder wants to merge 12 commits intomainfrom
bug/816-multidim-coords
Open

Support multidimensional coordinates for bounds and regridding#820
tomvothecoder wants to merge 12 commits intomainfrom
bug/816-multidim-coords

Conversation

@tomvothecoder
Copy link
Copy Markdown
Collaborator

@tomvothecoder tomvothecoder commented Dec 3, 2025

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (baf02ae) to head (e0d344c).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #820   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines         1989      2007   +18     
=========================================
+ Hits          1989      2007   +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread xcdat/regridder/accessor.py Outdated
@tomvothecoder tomvothecoder marked this pull request as ready for review February 25, 2026 18:12
@tomvothecoder
Copy link
Copy Markdown
Collaborator Author

Notes

  • Add guard for regrid2 since it not supported

@pochedls
Copy link
Copy Markdown
Collaborator

Thanks for pushing this forward @jasonb5! This PR seems to work for the example in #816, but I think it is failing for a related case (curvilinear->rectilinear):

wget https://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Amon/tas/gr/v20180917/tas_Amon_CNRM-CM6-1_historical_r1i1p1f2_gr_185001-201412.nc
wget https://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/SImon/siconc/gn/v20180917/siconc_SImon_CNRM-CM6-1_historical_r1i1p1f2_gn_185001-201412.nc
# %% imports
import xcdat as xc

# %% parameters
fn_source = 'siconc_SImon_CNRM-CM6-1_historical_r1i1p1f2_gn_185001-201412.nc'
fn_target = 'tas_Amon_CNRM-CM6-1_historical_r1i1p1f2_gr_185001-201412.nc'

# %% get source / target grids
ds = xc.open_dataset(fn_source)
ngrid = xc.open_dataset(fn_target)

# %% try regridding with xcdat
dsr = ds.regridder.horizontal('tmp2m', ngrid, tool="xesmf", method="conservative_normed", periodic=True)

KeyError: "No 'X' axis dimension coordinate variables were found in the xarray object. Make sure dimension coordinate variables exist, they are one dimensional, and their CF 'axis' or 'standard_name' attrs are correctly set."

I can get this to work with:

import xesmf as xe

regridder = xe.Regridder(ds, ngrid, "conservative_normed", periodic=True, ignore_degenerate=True)                                            
dsr = regridder(ds)

@jasonb5 jasonb5 force-pushed the bug/816-multidim-coords branch from 27ee94b to 36aa432 Compare March 14, 2026 07:23
@jasonb5
Copy link
Copy Markdown
Collaborator

jasonb5 commented Mar 17, 2026

@pochedls Could you give this another try? Thanks!

@pochedls
Copy link
Copy Markdown
Collaborator

This is working for both of the cases I raised (code below for my reference), but I think it isn't generically picking up the coordinates.

For example, CESM2 has the following structure

Dimensions: (time: 1980, nj: 384, ni: 320, d2: 2, nvertices: 4, bnds: 2)
Coordinates:

  • time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
  • nj (nj) int32 2kB 1 2 3 4 5 6 7 8 ... 378 379 380 381 382 383 384
  • ni (ni) int32 1kB 1 2 3 4 5 6 7 8 ... 314 315 316 317 318 319 320
    lat (nj, ni) float64 983kB dask.array<chunksize=(384, 320), meta=np.ndarray>
    lon (nj, ni) float64 983kB dask.array<chunksize=(384, 320), meta=np.ndarray>
    Dimensions without coordinates: d2, nvertices, bnds
    Data variables:
    siconc (time, nj, ni) float32 973MB dask.array<chunksize=(1, 384, 320), meta=np.ndarray>

Which produces the following error when regridding:

ValueError: This DataArray has more than one dimension ['lon', 'nj'] mapped to the 'X' axis, which is an unexpected behavior. Try dropping extraneous dimensions from the DataArray first (might affect data shape).

Is there a more generic way to handle these auxiliary coordinates (x/nj/etc)?

Perlmutter code for reference.

import xsearch as xs
import xcdat as xc

p = list(xs.findPaths('historical', 'siconc', 'mon', model='CNRM-CM6-1', member = 'r1i1p1f2').keys())[0]
p2 = list(xs.findPaths('historical', 'tas', 'mon', model='CNRM-CM6-1', member = 'r1i1p1f2').keys())[0]

ds = xc.open_mfdataset(p)
ngrid = xc.open_mfdataset(p2)
dsr = ds.regridder.horizontal('siconc', ngrid, tool="xesmf", method="conservative_normed", periodic=True)

@jasonb5
Copy link
Copy Markdown
Collaborator

jasonb5 commented Mar 18, 2026

That actually caught an edge case I didn't consider. If the multidim flag is set it should return a matched coordinate no matter what. I'll push a fix and have you retest. Thanks!

@pochedls
Copy link
Copy Markdown
Collaborator

That actually caught an edge case I didn't consider. If the multidim flag is set it should return a matched coordinate no matter what. I'll push a fix and have you retest. Thanks!

Thanks @jasonb5 – I think curvilinear regridding is seemingly full of edge cases. I appreciate your effort in making this robust (and I'm happy you can quickly understand what is going on).

@jasonb5
Copy link
Copy Markdown
Collaborator

jasonb5 commented Mar 19, 2026

@pochedls There's more to this issue. My original fix may not be correct. I'm working on a new fix and will push it soon.

@jasonb5 jasonb5 force-pushed the bug/816-multidim-coords branch from b4d0557 to 048a07a Compare March 20, 2026 05:04
@jasonb5
Copy link
Copy Markdown
Collaborator

jasonb5 commented Mar 20, 2026

@pochedls Try the latest. The CESM example will fail because it has two possible X coordinates ('nj', 'lon') so xCDAT can differentiate. It looks like cf_xarray can figure out what are the correct coordinates. I may switch ds.coords to ds.cf.coordinates to fix this. Will update after some testing.

@pochedls
Copy link
Copy Markdown
Collaborator

@pochedls Try the latest. The CESM example will fail because it has two possible X coordinates ('nj', 'lon') so xCDAT can differentiate. It looks like cf_xarray can figure out what are the correct coordinates. I may switch ds.coords to ds.cf.coordinates to fix this. Will update after some testing.

FYI – this is working for CNRM-CM6-1 and the source.nc data in the original bug report. It is failing for CESM2 (as you expected). Once you update for CESM2, I can try a broader set of models

@jasonb5
Copy link
Copy Markdown
Collaborator

jasonb5 commented Apr 3, 2026

@pochedls Go ahead and try again. All three cases should work this time.

@pochedls
Copy link
Copy Markdown
Collaborator

pochedls commented Apr 6, 2026

@jasonb5 – This is working for all the examples I provided. I'm good with moving forward with this PR (fixes the issues raised), but I could also test more models if that would be helpful.

@jasonb5
Copy link
Copy Markdown
Collaborator

jasonb5 commented Apr 15, 2026

@pochedls Thanks!
@tomvothecoder Do you want to review before we merge?

@tomvothecoder
Copy link
Copy Markdown
Collaborator Author

Thanks @jasonb5 for the work and @pochedls for testing. I will review and merge early next week.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds initial support for datasets where horizontal coordinates (e.g., lat/lon) are multidimensional (curvilinear or otherwise not represented as 1D dimension coordinates), primarily to unblock horizontal regridding with xESMF for the #816 case.

Changes:

  • Introduces a supports_multidim capability flag on regridder implementations and threads it through the regridder accessor.
  • Extends get_dim_coords() with an optional multidim mode to discover non-index coordinate variables.
  • Skips automatic bounds generation in _obj_to_grid_ds() when operating in multidimensional-coordinate mode.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
xcdat/regridder/base.py Adds a regridder capability hook (supports_multidim / can_handle_multidim()).
xcdat/regridder/xesmf.py Marks xESMF regridder as able to handle multidimensional coordinates.
xcdat/regridder/regrid2.py Marks Regrid2 regridder as not supporting multidimensional coordinates.
xcdat/regridder/xgcm.py Marks xgcm regridder as not supporting multidimensional coordinates.
xcdat/regridder/accessor.py Passes multidim capability into grid extraction; disables bounds generation in multidim mode.
xcdat/axis.py Adds multidim option to coordinate discovery logic in get_dim_coords().
Comments suppressed due to low confidence (1)

xcdat/axis.py:116

  • get_dim_coords() docstring/Notes say “Multidimensional coordinates are ignored.”, but the new multidim parameter changes that behavior. Please update the docstring to document the multidim parameter and clarify how multidimensional coordinates are handled when it’s enabled vs disabled.
    """Gets the dimension coordinates for an axis.

    This function uses ``cf_xarray`` to attempt to map the axis to its
    dimension coordinates by interpreting the CF axis and coordinate names
    found in the coordinate attributes. Refer to [1]_ for a list of CF axis and
    coordinate names that can be interpreted by ``cf_xarray``.

    If ``obj`` is an ``xr.Dataset,``, this function can return a single
    dimension coordinate variable as an ``xr.DataArray`` or multiple dimension
    coordinate variables in an ``xr Dataset``. If ``obj`` is an ``xr.DataArray``,
    this function should return a single dimension coordinate variable as an
    ``xr.DataArray``.

    Parameters
    ----------
    obj : xr.Dataset | xr.DataArray
        The Dataset or DataArray object.
    axis : CFAxisKey
        The CF axis key ("X", "Y", "T", "Z").

    Returns
    -------
    xr.Dataset | xr.DataArray
        A Dataset of dimension coordinate variables or a DataArray for
        the single dimension coordinate variable.

    Raises
    ------
    ValueError
        If the ``obj`` is an ``xr.DataArray`` and more than one dimension is
        mapped to the same axis.
    KeyError
        If no dimension coordinate variables were found for the ``axis``.

    Notes
    -----
    Multidimensional coordinates are ignored.

Comment thread xcdat/regridder/accessor.py Outdated

for axis, has_bounds in axis_has_bounds.items():
if not has_bounds:
# FIXME: Line 313 --Can't add bounds for multidimensional coordinates
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this #FIXME still relevant or should it be a #TODO? Also exact line number should be removed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer relevant, will remove.

Comment thread xcdat/regridder/base.py Outdated
Comment on lines +98 to +103
supports_multidim: bool

@classmethod
def can_handle_multidim(cls) -> bool:
return cls.supports_multidim

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, defaulting to False.

Comment thread xcdat/axis.py
Comment on lines +121 to +128
if multidim:
# multidimensional coordinates cannot be indexes, use all coords
index_keys = list([y for x in obj.cf.coordinates.values() for y in x])
else:
# Get the object's index keys, with each being a dimension.
# NOTE: xarray does not include multidimensional coordinates as index keys.
# Example: ["lat", "lon", "time"]
index_keys = list(obj.indexes.keys())
Copy link
Copy Markdown
Collaborator Author

@tomvothecoder tomvothecoder Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this is more robust than only using obj.cf.coordinates.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined cf.coordinates and cf.axes, should have complete coverage.

Comment on lines +170 to +172
input_grid = _get_input_grid(
self._ds, data_var, ["X", "Y"], multidim=regrid_tool.can_handle_multidim()
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test needed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added missing test.

Comment on lines +307 to +309
# Multidimensional coordinates bounds generation is not supported
if multidim:
return output_ds
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Copilot is referencing the logic before this conditional in _obj_to_grid_ds():

axis_names: list[CFAxisKey] = ["X", "Y", "Z"]
axis_coords: dict[str, xr.DataArray] = {}
axis_bounds: dict[str, xr.DataArray] = {}
axis_has_bounds: dict[CFAxisKey, bool] = {}
with xr.set_options(keep_attrs=True):
for axis in axis_names:
coord, bounds = _get_axis_coord_and_bounds(obj, axis)
if coord is not None:
axis_coords[str(coord.name)] = coord
if bounds is not None:
axis_bounds[str(bounds.name)] = bounds
axis_has_bounds[axis] = True
else:
axis_has_bounds[axis] = False
# Create a new dataset with coordinates and bounds
output_ds = xr.Dataset(
coords=axis_coords,
data_vars=axis_bounds,
attrs=obj.attrs,
)

@tomvothecoder
Copy link
Copy Markdown
Collaborator Author

@jasonb5 I tagged Copilot for review and it caught some things. Can you review its suggestions and decide how and where to take action? Thanks.

jasonb5 added 5 commits May 1, 2026 11:22
Set a default value for supports_multidim so subclasses are not required
to explicitly declare it. Added tests to verify the default and override
behavior.
The multidim=True branch only used obj.cf.coordinates to build
index_keys, missing coordinates discoverable only via obj.cf.axes.
Combine both sources to avoid incorrect KeyError when an axis
coordinate exists but is not in cf.coordinates.
Exercises ds.regridder.horizontal(..., tool='xesmf') on a dataset with
2D lat/lon coordinates and no 1D index coord vars, ensuring the
multidim code path correctly discovers coordinates via CF axis metadata.
Datasets with non-standard 2D coordinate names (e.g., nav_lat/nav_lon)
that are only CF-discoverable via axis metadata were failing because
_get_axis_coord_and_bounds called get_dim_coords without multidim=True.
Pass the flag through so the full coord discovery path works.
@jasonb5
Copy link
Copy Markdown
Collaborator

jasonb5 commented May 4, 2026

@pochedls @tomvothecoder Made the changes, might one to do one more pass before we merge.

@pochedls
Copy link
Copy Markdown
Collaborator

pochedls commented May 5, 2026

This is failing for the CESM2 case again (works for CNRM-CM6-1 and the original example issue).

import xcdat as xc

p = '/global/cfs/projectdirs/m4931/gsharing/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/SImon/siconc/gn/v20190308/' 
p2 = '/global/cfs/projectdirs/m4931/gsharing/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn/v20190308/'

ds = xc.open_mfdataset(p)
ngrid = xc.open_mfdataset(p2)
dsr = ds.regridder.horizontal('siconc', ngrid, tool="xesmf", method="conservative_normed", periodic=True)

ValueError: This DataArray has more than one dimension ['lon', 'nj'] mapped to the 'X' axis, which is an unexpected behavior. Try dropping extraneous dimensions from the DataArray first (might affect data shape).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

[Bug]: Metadata regridding issues related to multidimensional coords

4 participants