Sunset kerchunk and support all of virtualizarr parsers with a new top level API #1272

Merged
betolink merged 23 commits into earthaccess-dev:main from betolink:virtualizarr-parsers on Mar 31, 2026
@betolink betolink commented Mar 26, 2026

…solidated API to virtualize() and removed consolidate_metadata()

Description

Addresses #1271 by consolidating consolidate_metadata, open_virtual_mfdataset and open_virtual_dataset into a single virtualize() API. Users can control which parser earthaccess uses, but there is a hierarchy: dmrpp first, then HDF, then the rest.

We may need a couple more methods in case our users decide to use VirtualiZarr directly but don't want to deal with configuring Obstore/Icechunk. Not required for this PR, IMO.
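
The parser hierarchy described above (dmrpp first, then HDF, then the rest, with an explicit user choice taking precedence) could be sketched roughly as follows. This is only an illustration: `PARSER_PRIORITY`, `choose_parser`, and the parser name strings are hypothetical and are not taken from the earthaccess or VirtualiZarr code in this PR.

```python
from typing import Optional, Sequence

# Hypothetical priority order, mirroring the PR description:
# dmrpp first, then HDF, then everything else.
PARSER_PRIORITY = ("dmrpp", "hdf")


def choose_parser(available: Sequence[str], requested: Optional[str] = None) -> str:
    """Pick a parser name; an explicit request wins over the default order."""
    if not available:
        raise ValueError("no parsers available")
    if requested is not None:
        if requested not in available:
            raise ValueError(
                f"parser {requested!r} is not available: {list(available)}"
            )
        return requested
    for name in PARSER_PRIORITY:
        if name in available:
            return name
    # "The rest": fall back to the first remaining parser, in the order given.
    return available[0]


if __name__ == "__main__":
    print(choose_parser(["netcdf3", "hdf", "dmrpp"]))
    print(choose_parser(["netcdf3", "hdf", "dmrpp"], requested="netcdf3"))
```

Keeping the priority in a single tuple makes the default order easy to audit, while the `requested` argument preserves the user override that the description mentions.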


"Ready for review" checklist

  • Open PR as draft
  • Please review our Pull Request Guide
  • Mark "ready for review" after following instructions in the guide

Merge checklist

  • PR title is descriptive
  • PR body contains links to related and resolved issues (e.g. closes #1)
  • If needed, CHANGELOG.md updated
  • If needed, docs and/or README.md updated
  • If needed, unit tests added
  • All checks passing (comment pre-commit.ci autofix if pre-commit is failing)
  • At least one approval

📚 Documentation preview 📚: https://earthaccess--1272.org.readthedocs.build/en/1272/

github-actions Bot commented Mar 26, 2026

Binder 👈 Launch a binder notebook on this branch for commit 7054bb1

Binder 👈 Launch a binder notebook on this branch for commit eab2365

Binder 👈 Launch a binder notebook on this branch for commit 67fd706

Binder 👈 Launch a binder notebook on this branch for commit ee15996

Binder 👈 Launch a binder notebook on this branch for commit 896ac4e

Binder 👈 Launch a binder notebook on this branch for commit fd2d152

Binder 👈 Launch a binder notebook on this branch for commit bf178d3

Binder 👈 Launch a binder notebook on this branch for commit 4eac74b

Binder 👈 Launch a binder notebook on this branch for commit 768d81e

Binder 👈 Launch a binder notebook on this branch for commit 8dc248a

Binder 👈 Launch a binder notebook on this branch for commit c74eaca

Binder 👈 Launch a binder notebook on this branch for commit 9a09ae5

Binder 👈 Launch a binder notebook on this branch for commit e33e37f

@betolink betolink marked this pull request as ready for review March 26, 2026 18:46
chuckwondo previously approved these changes Mar 30, 2026
@chuckwondo chuckwondo left a comment

@betolink, this looks great. I have only a few minor suggestions, but I'm approving, as nothing is a show-stopper from my perspective.

Comment thread earthaccess/virtual/_credentials.py Outdated
Comment thread earthaccess/virtual/_credentials.py Outdated
Comment thread earthaccess/virtual/_credentials.py Outdated
Comment thread earthaccess/virtual/_credentials.py

if access == "direct":
    credentials_endpoint, region = get_granule_credentials_endpoint_and_region(
        granules[0]

The list of granules could be empty, so I suggest handling this case with a pertinent ValueError; otherwise, the user will get a less helpful IndexError here.

@betolink (Contributor, Author) replied:

Done. I'm not sure the Obstore registry instantiation is the best place for this check, but for now... if the granules are not an empty list and they have data links, in theory they could be virtualized. Later we can use a custom "flag" if there is one in CMR about QA, e.g. "virtualizable".
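
A minimal sketch of the kind of check discussed in this thread, assuming granule objects expose a `data_links()` method that returns a list of URLs. The `validate_granules` helper and the `FakeGranule` stand-in are hypothetical and are not the code merged in this PR.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Sequence


class HasDataLinks(Protocol):
    """Assumed minimal granule interface: a data_links() method."""

    def data_links(self) -> List[str]: ...


@dataclass
class FakeGranule:
    """Stand-in for a real granule object, used only for illustration."""

    links: List[str] = field(default_factory=list)

    def data_links(self) -> List[str]:
        return list(self.links)


def validate_granules(granules: Sequence[HasDataLinks]) -> None:
    """Raise a descriptive ValueError instead of letting granules[0] fail."""
    if len(granules) == 0:
        raise ValueError("granules is empty; at least one granule is required")
    # Collect the positions of granules that have no data links at all.
    missing = [i for i, g in enumerate(granules) if not g.data_links()]
    if missing:
        raise ValueError(f"granules at positions {missing} have no data links")


if __name__ == "__main__":
    validate_granules([FakeGranule(["https://example.com/a.nc"])])  # passes
    try:
        validate_granules([])
    except ValueError as err:
        print(err)
```

Running the check before indexing `granules[0]` turns the bare IndexError into a message that tells the user what to fix.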

Comment thread earthaccess/virtual/_parser.py Outdated
Comment thread earthaccess/virtual/_parser.py Outdated
Comment thread earthaccess/virtual/core.py
@betolink

@chuckwondo since I updated the code with the granule validation for Obstore, I think I need your approval again =)


@danielfromearth danielfromearth left a comment

Awesome work, @betolink! This is such a good step in making the virtualization process more streamlined :)

@betolink betolink merged commit feed410 into earthaccess-dev:main Mar 31, 2026
12 checks passed

3 participants