FixDataQueryDeps — Overhaul dependency model, env setup tooling, CI, and release workflow #266

Merged
lbesnard merged 23 commits into main from FixDataQueryDeps on Apr 8, 2026

@lbesnard (Collaborator) commented Mar 31, 2026

What changed

Bug Fix

  • aodn_cloud_optimised/__init__.py: Removed the top-level DataQuery import entirely. DataQuery must now be imported directly (from aodn_cloud_optimised.lib.DataQuery import GetAodn). This fixes the import-time failure after a plain poetry install (no extras), which was caused by missing optional dependencies.

Dependency Architecture (pyproject.toml)

  • Migrated from [tool.poetry.group.dev.dependencies] to PEP 621 [project.optional-dependencies] with four named extras:
    • notebooks — DataQuery + visualisation libs (cartopy, matplotlib, seaborn, folium, tqdm, papermill, …)
    • tests — pytest, coverage, moto
    • docs — Sphinx and related tools
    • dev — contributor tooling (poetry, pre-commit, ipdb, virtualenv)
  • python-levenshtein / fuzzywuzzy (DataQuery-only) moved from core → notebooks extra.
  • Python version tightened to >=3.11,<3.13.
  • poetry-core bumped to 2.3.2.
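
As a rough illustration of the extras model above, the following sketch lists the extras that an installed distribution declares. It relies only on the standard library; the function name declared_extras is made up for this example, and the distribution name is taken from the install log later in this thread.

```python
from importlib import metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the extras an installed distribution declares, or [] if it is not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # Each [project.optional-dependencies] group is published as a Provides-Extra field.
    return sorted(meta.get_all("Provides-Extra") or [])


# Distribution name as printed by Poetry in this thread; the output depends on
# what is installed in the current environment.
print(declared_extras("aodn-cloud-optimised"))
```

If the package is installed from this branch, the output would be expected to include the four extras named above, though that has not been verified here.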

New: .poetry-version

Single source of truth for the Poetry version used across all CI workflows, Dockerfile, and environment.yml.
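
A minimal sketch of how a script could read that file, assuming it holds the version on its own line and that any checklist text is written as lines starting with "#". The exact file layout is not shown in this PR description, so both assumptions and the function names are hypothetical.

```python
from pathlib import Path


def parse_poetry_version(text: str) -> str:
    """Return the first non-blank line that is not a '#' comment (assumed format)."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    raise ValueError("no version found in .poetry-version content")


def read_poetry_version(path: str = ".poetry-version") -> str:
    """Read the version from the file at the project root."""
    return parse_poetry_version(Path(path).read_text(encoding="utf-8"))
```

Keeping the parsing separate from the file access makes the format assumption easy to check in isolation.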

New: Makefile

Convenience targets for environment management:

  • make core / make notebooks / make tests / make docs / make dev / make clean

Each target calls poetry sync --extras "..." and correctly locates the global poetry binary even when invoked from inside an active venv.

New: setup_miniforge_venvs.sh

Bash script to create named conda/mamba environments (one per install mode: core, notebooks, tests, docs, dev, all). Alternative to the Makefile for contributors who prefer named conda environments.

New: poetry.toml

Ensures Poetry always creates .venv inside the project root (virtualenvs.in-project = true).

CI Workflows

  • build.yml: Reads Poetry version from .poetry-version; removed unreliable venv caching; uses make tests for install; added 3-tier wheel verification (Production / Notebooks / Dev modes) each installing the built wheel and validating real imports.
  • pre-commit.yml: Reads Poetry version from .poetry-version; uses make dev instead of bare poetry install.
  • release.yml (major overhaul):
    • Trigger changed from on: release: created → on: workflow_dispatch with a bump_type choice (patch / minor / major).
    • Fully automated pipeline: tests → version bump → build frozen + unfrozen wheel + sdist → verify → git commit + tag + push → GitHub Release with all artifacts.
    • if: ${{ !env.ACT }} guards allow end-to-end local testing with act without pushing or creating releases.
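
The same guard can be mirrored in any helper script the workflow calls. The sketch below assumes such a script exists (none is shown in this PR) and relies on act setting the ACT environment variable for local runs; the function names are hypothetical.

```python
import os


def running_under_act(environ=None) -> bool:
    """True when the ACT variable is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return bool(env.get("ACT"))


def should_publish(environ=None) -> bool:
    """Counterpart of `if: ${{ !env.ACT }}`: publish only outside local act runs."""
    return not running_under_act(environ)
```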

Documentation

  • docs/development/release.rst (new): Full documentation for the automated release workflow, wheel types, and local testing with act.
  • docs/development/installation.rst (rewritten): Covers PEP 621 extras table, Makefile workflow, setup_miniforge_venvs.sh, and core-only install.
  • README.md: New Core install section, updated Development section.
  • notebooks/README.md: Two local setup options — Poetry (make notebooks) and Mamba/Conda (setup_miniforge_venvs.sh notebooks).

Other

  • environment.yml, Dockerfile: Updated to use .poetry-version.
  • requirements.txt / notebooks/requirements.txt: Regenerated after pyproject.toml restructure.
  • poetry.lock: Regenerated with Poetry 2.3.x.
  • .gitignore: Added notebooks/Untitled.ipynb and artifacts/.
  • aodn_cloud_optimised/bin/validate_data_query.py: Added shebang, minor cleanup.

Copilot AI review requested due to automatic review settings March 31, 2026 02:21
Copilot AI (Contributor) left a comment

Pull request overview

This PR aims to fix broken production installs (Poetry install without dev dependencies) caused by DataQuery being imported at package import-time, and adds CI checks to validate the library in both “prod-only” and “with dev” dependency modes.

Changes:

  • Make the Poetry dev dependency group explicitly optional.
  • Change aodn_cloud_optimised package initialization to tolerate missing dev-only dependencies by guarding DataQuery.
  • Update CI build workflow to verify imports/behavior in both production-only and dev modes.

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Marks Poetry dev group as optional to support prod-only installs. |
| poetry.lock | Updates lock metadata hash after dependency/group changes. |
| notebooks/requirements.txt | Refreshes exported notebook requirements (new deps/markers). |
| aodn_cloud_optimised/bin/validate_data_query.py | Adds shebang and removes stray blank line in version extraction loop. |
| aodn_cloud_optimised/__init__.py | Wraps DataQuery import with a fallback stub to allow prod installs without dev deps. |
| .github/workflows/build.yml | Adds prod/dev-mode verification steps after building. |

Comment thread aodn_cloud_optimised/__init__.py Outdated
Comment on lines +6 to +12
try:
    # Attempt to load the dev-only module
    from .lib import DataQuery

    __all__.append("DataQuery")
except (ImportError, ModuleNotFoundError):
    # Define a guard function that only raises an error if called
Copilot AI commented Mar 31, 2026

The try/except around from .lib import DataQuery catches any ImportError, which can silently mask real bugs inside aodn_cloud_optimised.lib.DataQuery (or its imports) even when dev deps are installed, and then replace it with the placeholder. Consider narrowing the handling (e.g., only fallback for missing optional deps / specific ModuleNotFoundError cases) and/or preserving the original exception as the cause so failures are diagnosable.
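
One way to read this suggestion is sketched below. It is not the code that was merged (the final PR removed the import from __init__.py altogether); load_or_stub and the dependency set passed to it are hypothetical names used only to illustrate narrowing the fallback and keeping the original exception as the cause.

```python
import importlib


def load_or_stub(module_name, optional_deps):
    """Import module_name; fall back to a stub only if a known optional dep is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        missing = (exc.name or "").split(".")[0]
        if missing not in optional_deps:
            # Not one of the expected optional packages: surface the real bug.
            raise
        original = exc  # keep a reference; `exc` is cleared when the except block ends

        def _unavailable(*args, **kwargs):
            # Chain the original error so the failure stays diagnosable.
            raise ImportError(
                f"{module_name} requires the optional dependency {missing!r}"
            ) from original

        return _unavailable
```

With this shape, a typo or a broken import inside the module still fails loudly, while a missing optional package produces a stub that only raises when it is actually called.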

Comment thread aodn_cloud_optimised/__init__.py Outdated
Comment thread .github/workflows/build.yml Outdated
Comment on lines +96 to +105
# - name: Verify build
#   run: |
#     pip install dist/*.whl --no-deps

- name: Verify build (Production Mode)
  run: |
    # 1. Install ONLY production dependencies (no dev tools)
    # --sync ensures the environment matches the lock file exactly
    poetry install --without dev --sync

Copilot AI commented Mar 31, 2026

The new verification steps no longer install/test the built wheel artifact (the previous pip install dist/*.whl --no-deps check is commented out). As written, this validates poetry install from the repo source, not the package produced by poetry build, so packaging issues in the wheel/sdist could slip through. Consider restoring a wheel install test in an isolated env for both production and dev scenarios.
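
A wheel install test in an isolated environment could look roughly like the sketch below. It only builds the command lines, so it can be inspected without running anything; the function name, the temporary venv path, and the wheel filename are placeholders, not values from the repository.

```python
import os
import sys


def wheel_check_commands(wheel, venv_dir, modules):
    """Return argv lists that create a venv, install one wheel, and import modules."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    venv_python = os.path.join(venv_dir, bin_dir, "python")
    commands = [
        [sys.executable, "-m", "venv", venv_dir],
        # Pass a concrete wheel path: a dist/*.whl glob is not expanded without a shell.
        [venv_python, "-m", "pip", "install", wheel],
    ]
    commands += [[venv_python, "-c", f"import {name}"] for name in modules]
    return commands
```

Each command can then be run with subprocess.run(cmd, check=True), so the first failing install or import stops the check.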

Comment thread .github/workflows/build.yml Outdated
lbesnard and others added 13 commits March 31, 2026 13:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sion as source of truth

- Add .poetry-version file (2.3.3) with full sync checklist for all locations
- GHA workflows (build, pre-commit, release) now read version from .poetry-version
  via 'Read Poetry version' step — no manual edits needed in workflows on bump
- Upgrade poetry 2.2.1 → 2.3.3 across all sync points:
  environment.yml, pyproject.toml dev extras, .pre-commit-config.yaml rev:
- Fix stale mismatch: pre-commit.yml had poetry 2.3.3 in snok, now reads from file
- Pin pre-commit 4.5.1 exactly (was >=4.0.1) in pyproject.toml
- Regenerate poetry.lock with poetry 2.3.3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@ccef564). Learn more about missing BASE report.
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #266   +/-   ##
=======================================
  Coverage        ?   65.89%           
=======================================
  Files           ?       28           
  Lines           ?     5324           
  Branches        ?        0           
=======================================
  Hits            ?     3508           
  Misses          ?     1816           
  Partials        ?        0           

lbesnard and others added 6 commits April 2, 2026 12:52
Describes the workflow_dispatch-based release.yml:
- how to trigger patch/minor/major bumps from GitHub Actions UI
- what each workflow step does (test, bump, frozen/unfrozen wheels, tag, release)
- frozen vs unfrozen wheel explanation and install instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… → --extras

- Remove hardcoded ENV POETRY_VERSION=2.2.1; COPY .poetry-version and read
  it via shell substitution in the RUN step — Dockerfile now auto-syncs on bump
- Fix broken 'poetry install --with dev' (old group syntax, removed in Phase 2)
  → 'poetry install --extras "notebooks tests docs dev"'
- Update .poetry-version comment to list Dockerfile as auto-read (no manual sync)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…miniforge_venvs.sh

Replace stale mamba env create + AodnCloudOptimisedQuery instructions with:
- Option 1: make notebooks (Poetry venv)
- Option 2: ./setup_miniforge_venvs.sh notebooks (conda/mamba → AodnCloudOptimised_notebooks)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lbesnard (Collaborator, Author) commented Apr 2, 2026

@copilot create a thorough summary of what was changed in this PR, many new features

Copilot AI (Contributor) commented Apr 2, 2026

Here is a thorough summary of all changes in this PR:


PR Summary — FixDataQueryDeps

🐛 Root Bug Fix

aodn_cloud_optimised/__init__.py — The top-level package __init__.py previously imported DataQuery at load time. Because DataQuery depends on optional libraries (e.g. fuzzywuzzy, cartopy, matplotlib) that are not part of the core install, poetry install (without dev/notebook extras) would fail. The fix removes all imports from __init__.py entirely. DataQuery must now be imported directly: from aodn_cloud_optimised.lib.DataQuery import GetAodn.
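
Before importing DataQuery in a core-only environment, a caller could check which optional libraries are absent. The sketch below uses only the standard library; the tuple lists the three libraries named as examples above and is not a complete list of DataQuery's requirements.

```python
import importlib.util

# Example optional libraries taken from the summary above (not exhaustive).
DATAQUERY_DEPS = ("fuzzywuzzy", "cartopy", "matplotlib")


def missing_modules(names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(DATAQUERY_DEPS)
if missing:
    print("DataQuery extras not installed:", ", ".join(missing))
```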


📦 Dependency Architecture Overhaul — pyproject.toml

The biggest structural change: dev/optional dependencies were migrated from [tool.poetry.group.dev.dependencies] to PEP 621 [project.optional-dependencies], split into four named extras:

| Extra | Contents | When to use |
| --- | --- | --- |
| notebooks | DataQuery + viz libs (cartopy, matplotlib, seaborn, folium, tqdm, papermill, …) | Jupyter notebooks, DataQuery API |
| tests | pytest, coverage, moto | Running the test suite |
| docs | Sphinx, sphinx-rtd-theme, asciinema | Building documentation |
| dev | poetry, pre-commit, ipdb, virtualenv | Contributor tooling |

  • python-levenshtein and fuzzywuzzy (DataQuery-only deps) moved from core → notebooks extra.
  • Python version constraint tightened to >=3.11,<3.13 (avoids source builds for h5py/netCDF4).
  • poetry-core bumped to 2.3.2.

🆕 New File: .poetry-version

A single source of truth for the Poetry version used across all CI workflows, Dockerfile, and environment.yml. All workflows now read from this file instead of hardcoding the version.


🆕 New File: Makefile

Convenience targets for environment management:

make core       # core data-processing only
make notebooks  # core + DataQuery/Jupyter
make tests      # core + tests + notebooks (CI)
make docs       # core + Sphinx
make dev        # everything — full contributor setup
make clean      # wipe .venv and reinstall

Each target calls poetry sync --extras "..." and correctly finds the global poetry binary even when run from inside an active venv.


🆕 New File: setup_miniforge_venvs.sh

A bash script for contributors who prefer named conda/mamba environments over Poetry's single .venv. Creates one named environment per install mode (e.g. AodnCloudOptimised_dev, AodnCloudOptimised_notebooks).

./setup_miniforge_venvs.sh dev        # full contributor setup
./setup_miniforge_venvs.sh notebooks  # notebooks only
./setup_miniforge_venvs.sh all        # create all environments

🆕 New File: poetry.toml

Ensures poetry always creates the .venv inside the project root (virtualenvs.in-project = true), so IDE tooling and make targets always find the right environment.


⚙️ CI Workflows

build.yml (updated):

  • Reads Poetry version from .poetry-version.
  • Removed unreliable venv caching.
  • Uses make tests (= poetry sync --extras "tests notebooks") for dependency install.
  • Added 3-tier wheel verification: Production Mode (core only), Notebooks Mode, and Dev Mode, each installing the built wheel and validating real imports.
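
The "validating real imports" part of each tier might be scripted along these lines. The helper below is an illustration only: the workflow's real check is not shown in this thread, and the per-tier module lists are assumptions built from names that appear in this PR.

```python
import importlib


def verify_imports(module_names):
    """Try to import each module and return {name: error} for the ones that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time failure, not just missing modules
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical tier definitions; adjust to the modules each install mode should provide.
PRODUCTION_MODULES = ["aodn_cloud_optimised"]
NOTEBOOKS_MODULES = PRODUCTION_MODULES + ["aodn_cloud_optimised.lib.DataQuery"]
```

A CI step could call verify_imports for the tier it just installed and exit non-zero when the returned dict is not empty.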

pre-commit.yml (updated):

  • Reads Poetry version from .poetry-version.
  • Uses make dev for full contributor install instead of bare poetry install.

release.yml (major overhaul):

  • Changed trigger from on: release: created → on: workflow_dispatch with a bump_type choice (patch / minor / major). No manual tag creation needed anymore.
  • Fully automated pipeline: tests → version bump (Python script, no Poetry CLI needed) → build (frozen + unfrozen wheel + sdist) → verify → git commit + tag + push → GitHub Release with all artifacts.
  • Produces two wheels in dist-artifacts/: a frozen wheel (all deps pinned via poetry-plugin-freeze) and an unfrozen wheel (flexible ranges).
  • Steps guarded with if: ${{ !env.ACT }} so the full pipeline can be tested locally with act without pushing or creating releases.
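
The version bump step is described as a Python script; that script is not included in this thread, so the following is only a guess at the core logic for a plain MAJOR.MINOR.PATCH version string, with a hypothetical function name.

```python
def bump_version(version: str, bump_type: str) -> str:
    """Return version bumped by 'patch', 'minor', or 'major'."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump_type == "major":
        return f"{major + 1}.0.0"
    if bump_type == "minor":
        return f"{major}.{minor + 1}.0"
    if bump_type == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump_type: {bump_type!r}")


print(bump_version("0.1.67", "patch"))  # prints 0.1.68
```

Pre-release or local version suffixes are not handled; a real script would need to decide how to treat them.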

📄 Documentation

docs/development/release.rst (new file):
Full documentation for the new automated release process, covering the workflow trigger, each step, the two wheel types, local testing with act, and manual version bump instructions.

docs/development/installation.rst (major rewrite):

  • New Dependency Extras table.
  • New Makefile (recommended) section.
  • setup_miniforge_venvs.sh section replacing old conda/mamba instructions.
  • Core-only install section.
  • Updated poetry export commands for the new extras model.

README.md (updated):

  • New Core install section with make core.
  • Updated Development section pointing to make dev / setup_miniforge_venvs.sh dev.

notebooks/README.md (updated):

  • Two local setup options: Poetry (make notebooks) and Mamba/Conda (setup_miniforge_venvs.sh notebooks).

🔧 Other Changes

  • environment.yml: Updated to Poetry 2.3.3; removed -r requirements.txt (handled by extras now).
  • Dockerfile: Reads Poetry version from .poetry-version at build time.
  • requirements.txt / notebooks/requirements.txt: Regenerated after pyproject.toml restructure.
  • poetry.lock: Regenerated with Poetry 2.3.x to match updated pyproject.toml.
  • .gitignore: Added notebooks/Untitled.ipynb and artifacts/.
  • aodn_cloud_optimised/bin/validate_data_query.py: Added shebang, minor cleanup.

Copilot AI changed the title from "FixDataQueryDeps" to "FixDataQueryDeps — Overhaul dependency model, env setup tooling, CI, and release workflow" on Apr 2, 2026
@thommodin (Contributor) commented

Looks like maybe nbformat got missed as a dependency?

make core
env -u VIRTUAL_ENV PATH="/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/pmk/env/global/bin:/Library/Apple/usr/bin:/usr/local/go/bin:/opt/podman/bin:/Users/thommodin/.cargo/bin:/Users/thommodin/.lmstudio/bin" /opt/homebrew/bin/poetry sync
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: aodn-cloud-optimised (0.1.67)

poetry run cloud_optimised_create_dataset_config -f s3://aodn-dataflow-dev/thomas.galindo/processing/stored/dugong/dugong.parquet -b aodn-dataflow-dev -c parquet
Traceback (most recent call last):
  File "/Users/thommodin/dev/aodn_cloud_optimised/.venv/bin/cloud_optimised_create_dataset_config", line 3, in <module>
    from aodn_cloud_optimised.bin.create_dataset_config import main
  File "/Users/thommodin/dev/aodn_cloud_optimised/aodn_cloud_optimised/bin/create_dataset_config.py", line 48, in <module>
    import nbformat
ModuleNotFoundError: No module named 'nbformat'

Fixed by running make dev instead of make core
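
To find out which extras pull in a package such as nbformat, the requirement strings published in the package metadata can be scanned for extra markers. The sketch below is a simplified parser with invented helper names, and the sample list is made-up data for illustration; it does not claim to reflect the project's actual dependency groups.

```python
import re


def _normalise(name):
    """Normalise a project name for comparison (case and separator insensitive)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def extras_providing(package, requirement_strings):
    """Return the sorted extras whose marker gates a requirement on `package`."""
    wanted = _normalise(package)
    extras = set()
    for req in requirement_strings:
        name_match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if not name_match or _normalise(name_match.group(1)) != wanted:
            continue
        extras.update(re.findall(r"""extra\s*==\s*["']([^"']+)["']""", req))
    return sorted(extras)


# Hypothetical requirement strings in the style of package metadata.
sample = [
    "boto3>=1.0",
    'nbformat>=5 ; extra == "notebooks"',
    'pytest ; extra == "tests"',
]
print(extras_providing("nbformat", sample))  # prints ['notebooks'] for this sample
```

For an installed package, importlib.metadata.requires("aodn-cloud-optimised") returns the real requirement strings (or None), which can be passed in place of the sample list.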

@lbesnard merged commit 797a713 into main on Apr 8, 2026
7 checks passed
@lbesnard deleted the FixDataQueryDeps branch on Apr 8, 2026 04:15