Skip to content

Rewrite download infrastructure with pooch#1182

Merged
msimberg merged 10 commits intomainfrom
copilot/rewrite-download-infrastructure
Apr 22, 2026
Merged

Rewrite download infrastructure with pooch#1182
msimberg merged 10 commits intomainfrom
copilot/rewrite-download-infrastructure

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 30, 2026

Use pooch (https://www.fatiando.org/pooch/latest/) for downloads. Also downloads tarballs to $TMPDIR by default. Tarballs are deleted on successful extraction.

@jenkins-apn
Copy link
Copy Markdown

Hi there, this is jenkins continuous integration...
Do you want me to verify this patch?

1 similar comment
@jenkins-apn
Copy link
Copy Markdown

Hi there, this is jenkins continuous integration...
Do you want me to verify this patch?

Copilot AI linked an issue Mar 30, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits March 30, 2026 10:34
- Replace `wget` library with `pooch` in data_handling.py
- Use `pooch.retrieve()` with `pooch.Untar(extract_dir=".")` for downloading
  and extracting tar archives
- Remove manual temp directory management and tarfile handling
- Remove the private `_perform_download()` helper function
- Update dependency in model/testing/pyproject.toml: wget>=3.2 -> pooch>=1.7.0
- Update root pyproject.toml: wget>=3.2 -> pooch>=1.7.0
- Update uv.lock to reflect dependency changes

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/97eae63c-abce-4bf1-90d6-1f7c34967217

Co-authored-by: jcanton <5622559+jcanton@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@msimberg msimberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle I think this already looks good because it's a simple change (I've very briefly tested it locally as well).

However, if copilot (not mentioning you, yet) is motivated, I think there are some adjacent changes that could be made.

Comment thread model/testing/src/icon4py/model/testing/data_handling.py Outdated
Comment thread model/testing/src/icon4py/model/testing/data_handling.py Outdated
Thread known_hash parameter through download_and_extract(),
download_test_data(), _download_grid_file(), and muphys test fixtures.
All hashes are None for now with TODO(msimberg) reminders.

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/9803a0a8-5d2a-40c0-9350-bb28ca915902

Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
Copilot AI requested a review from msimberg April 15, 2026 09:16
Comment thread model/testing/src/icon4py/model/testing/definitions.py Outdated
…nces

- Add known_hash=None with TODO(msimberg) to every GridDescription instance
- Add known_hashes dict field to Experiment dataclass (per-rank: 1, 2, 4)
- Add known_hashes={1: None, 2: None, 4: None} with TODO(msimberg) to every Experiment instance
- Thread known_hashes through _download_ser_data in datatest.py

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/6dcb0164-6d9b-4703-8785-37167685647b

Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
Copilot AI requested a review from msimberg April 15, 2026 10:47
Comment thread model/testing/src/icon4py/model/testing/fixtures/datatest.py Outdated
Comment thread model/testing/src/icon4py/model/testing/data_handling.py Outdated
Comment thread model/testing/src/icon4py/model/testing/definitions.py Outdated
Comment thread model/testing/src/icon4py/model/testing/definitions.py Outdated
… values

- Remove default=None from GridDescription.known_hash, MuphysExperiment.known_hash
- Remove default_factory from Experiment.known_hashes, reorder before version field
- Remove default=None from download_and_extract() and download_test_data() known_hash param
- Use [] instead of .get() for known_hashes lookup in _download_ser_data
- Add explicit known_hash=None TODO(msimberg) to all MuphysExperiment instances

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/80900bd2-f4e9-4d60-8600-a3543d7c8cbd

Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
@msimberg
Copy link
Copy Markdown
Contributor

We need to check that muphys downloads also still work with these changes.

@msimberg msimberg changed the title [WIP] Rewrite download infrastructure with pooch Rewrite download infrastructure with pooch Apr 20, 2026
@msimberg msimberg requested a review from nfarabullini April 20, 2026 09:28
@msimberg msimberg marked this pull request as ready for review April 20, 2026 09:28
@github-actions
Copy link
Copy Markdown

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

Copy link
Copy Markdown
Contributor

@jcanton jcanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice and clean

@jcanton
Copy link
Copy Markdown
Contributor

jcanton commented Apr 21, 2026

@msimberg did you test it manually already?

@msimberg
Copy link
Copy Markdown
Contributor

@msimberg did you test it manually already?

Not yet...

Copy link
Copy Markdown
Contributor

@msimberg msimberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wiped the grid file cache for github workflows and they were downloaded successfully. Jobs are using the new cache again.

I also tried this locally and was able to interrupt both a download and an extraction. For the interrupted extraction the downloaded tarball was reused.

Using automatic download in CSCS CI will be tested separately.

@msimberg
Copy link
Copy Markdown
Contributor

cscs-ci run default

@msimberg
Copy link
Copy Markdown
Contributor

cscs-ci run distributed

@msimberg msimberg merged commit 7d9b9ea into main Apr 22, 2026
102 checks passed
@msimberg msimberg deleted the copilot/rewrite-download-infrastructure branch April 22, 2026 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rewrite download infrastructure with pooch

4 participants