
feat(gdal): pre-built tarball install, LFS platform, release workflow #28

Merged: mjohns-databricks merged 4 commits into main from gdal-init-packaging on May 20, 2026
Conversation

@mjohns-databricks
Collaborator

Summary

  • Replace the 10–15 min on-cluster PPA install with a ~30–90 sec install from a CI-built platform tarball; trust binds via a 4-layer chain (CI-side UBUNTUGIS_FPR fingerprint pin → PR review of the LFS-committed tarball → outer .sha256 sidecar in UC Volume → inner SHA256SUMS per-file manifest).
  • Slowly-changing platform tarball (~90 MB) committed under resources/static/geobrix-gdal-platform-noble.tar.gz via Git LFS; the PR that updates it is the single human-review checkpoint for every cluster that subsequently installs. Bundle excludes -dev / autotools build-time helpers — only runtime libs (libgdal37, libproj25, libgeos, proj-data, libhdf5, libnetcdf, libspatialite, source-compiled GDAL Python wheel, pinned pip toolchain, libgdalalljni.so).
  • New package-geobrix-artifacts.yml GitHub workflow grafts the per-release JAR onto the committed platform tarball, builds the Python wheel from a slim hash-pinned requirements-build.in, and attaches six artifacts to a tag (JAR, wheel, GDAL tarball, sidecar, init script, docs zip). workflow_dispatch input hardened against command injection.
  • New scripts/geobrix-gdal-init.sh (tarball-based runtime init). Sidecar-pin trust model — no hash hardcoded in the script; operator stages *.tar.gz + .sha256 in UC Volume, init verifies. Per-host persistent logging at $VOL_DIR/_init_logs/<cluster_id>/<hostname>/init_<ts>.log (writes to /tmp then bulk-cp's on exit to sidestep S3-no-append). Step breadcrumbs (_NN_<step>.txt) throughout for failure forensics even when the EXIT trap is bypassed. No apt-get install -fy fallback — pinned-trust model would be defeated. pip installs are --no-index --find-links= against bundled wheels only.
  • Legacy on-cluster PPA install renamed to scripts/geobrix-gdal-init-ppa.sh and kept for platform-tarball bootstrapping / debugging; ~15-min cold-start cost flagged in the header.
  • scripts/build-gdal-artifacts.sh: local Docker-based tool (run inside ubuntu:24.04) that produces the platform tarball; reviewer reproduces this locally to verify the committed bytes.
  • Docs: installation.mdx rewritten for the tarball flow with troubleshooting for the new persistent logging; security.mdx updated for the 4-layer trust chain + Git LFS PR-review anchor; developers.mdx gets a new Git LFS section; limitations.mdx adds x86_64-only callout + PROJ 9.4.1 vs DBR PROJ 9.7.1 coexistence note; beta-release-notes.mdx adds a v0.3.0 bullet.
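
The command-injection hardening mentioned for the workflow_dispatch input usually has two halves: the workflow passes the input through `env:` instead of interpolating it into the script text, and the script validates the value before using it. A minimal sketch of the shell half follows. The function name `validate_ref` and the allowed character set are illustrative assumptions, not the workflow's actual rule.

```shell
# Hypothetical helper (not from the PR): accept only ref names built from
# letters, digits, '.', '_', '/' and '-'.
validate_ref() (
  ref="$1"
  # Reject empty values and values starting with '-' (option injection).
  case "$ref" in
    ''|-*) exit 1 ;;
  esac
  # Reject any character outside the allowed set.
  case "$ref" in
    *[!A-Za-z0-9._/-]*) exit 1 ;;
  esac
  exit 0
)
```

A step could then call `validate_ref "$REF_INPUT" || exit 1`, where `REF_INPUT` is an assumed name for the env-bound input.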

Test plan

  • Reproduce the platform tarball locally per resources/static/README.md (docker run --rm --platform linux/amd64 ubuntu:24.04 …) and compare the resulting sha256 to the committed sidecar (0e9057e94f0ba7a8cdc37b2ace5dd12176661ca810d998bd9e415fa1e7fac670).
  • Verify the LFS pointer is intact: git show HEAD:resources/static/geobrix-gdal-platform-noble.tar.gz should be 3 lines (version, oid, size), not the 90 MB binary.
  • Cluster smoke test on DBR 17.3 LTS:
    • Upload the tarball + sidecar to a test UC Volume; upload the matching geobrix-gdal-init.sh to a workspace files path; configure cluster init to point at the script.
    • Confirm cluster launches in ~30–90 sec (vs ~15 min for geobrix-gdal-init-ppa.sh).
    • Verify per-host logs land at $VOL_DIR/_init_logs/<cluster_id>/<hostname>/init_<ts>.log and breadcrumbs _01_started.txt through _13_script_complete.txt are all present.
    • Run the diagnostic from installation.mdx's "Reading init logs" section to confirm GDAL stack: dpkg -l | grep -E '^ii\s+(libgdal|libproj|libgeos|gdal-bin|gdal-data|python3-gdal)', python -c "from osgeo import gdal; print(gdal.__version__)", env | grep -i proj.
    • Verify DBR's PROJ at /databricks/native/proj-data is untouched (per limitations.mdx note).
  • Optionally dry-run the release workflow against a test tag: gh workflow run package-geobrix-artifacts.yml -f ref=gdal-init-packaging -f attach_to_tag=<test-tag>.
  • Review the doc changes in preview: gbx:docs:dev.
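
The LFS pointer check in the test plan can be scripted. The sketch below assumes the standard Git LFS pointer layout (a `version` line, an `oid sha256:` line, and a `size` line); the function name `is_lfs_pointer` is hypothetical.

```shell
# Hypothetical helper: succeed only if stdin looks like a Git LFS pointer.
is_lfs_pointer() (
  # Pointer files are tiny, so reading the first 1 KiB is enough.
  content=$(head -c 1024)
  lines=$(printf '%s\n' "$content" | wc -l)
  [ "$lines" -eq 3 ] || exit 1
  printf '%s\n' "$content" | sed -n '1p' \
    | grep -q '^version https://git-lfs\.github\.com/spec/v1$' || exit 1
  printf '%s\n' "$content" | sed -n '2p' \
    | grep -Eq '^oid sha256:[0-9a-f]{64}$' || exit 1
  printf '%s\n' "$content" | sed -n '3p' \
    | grep -Eq '^size [0-9]+$' || exit 1
)
```

Usage: `git show HEAD:resources/static/geobrix-gdal-platform-noble.tar.gz | is_lfs_pointer && echo "pointer intact"`.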

This pull request and its description were written by Isaac.

Michael Johns added 2 commits May 20, 2026 08:38
…se-packaging workflow

Replace the on-cluster PPA install (10-15 min cold cluster start) with a
CI-built platform bundle (~30-90 sec) reused across releases.

Architecture
- resources/static/geobrix-gdal-platform-noble.tar.gz (Git LFS): slowly-
  changing platform layer (PPA-resolved runtime .debs for libgdal37,
  libproj25, libgeos, proj-data, libhdf5, libnetcdf, libspatialite;
  source-compiled GDAL Python wheel; pinned pip toolchain;
  libgdalalljni.so). Rebuilt only on GDAL_PPA_VERSION bumps or Ubuntu
  LTS changes. PR that updates it is the single human-review checkpoint
  for what ships to every cluster.
- scripts/build-gdal-artifacts.sh: local Docker-based tool (run inside
  ubuntu:24.04) that produces the platform tarball with fingerprint-pinned
  UbuntuGIS PPA key check (UBUNTUGIS_FPR). Requests libgdal37 (runtime),
  not libgdal-dev, so the bundle walks only the runtime side of apt's dep
  graph; -dev/autotools build-time helpers are filtered defensively.
- .github/workflows/package-geobrix-artifacts.yml: per-release workflow
  that grafts the JAR onto the committed platform tarball, builds the
  Python wheel from a slim hash-pinned requirements-build.in, and
  attaches six artifacts (JAR, wheel, GDAL tarball, sidecar, init script,
  docs zip) to a tag via gh release upload. workflow_dispatch input is
  bound to env: per command-injection-hardening guidance.
- scripts/geobrix-gdal-init.sh (new): tarball-based runtime init. Sidecar-
  pin trust model (no hash hardcoded in script — operator stages
  *.tar.gz + .sha256 in UC Volume, init verifies). Per-host persistent
  logging at $VOL_DIR/_init_logs/$DB_CLUSTER_ID/$(hostname)/init_*.log
  (writes to /tmp then bulk-cp's on exit to sidestep S3-no-append).
  Step breadcrumbs (_NN_<step>.txt) throughout for failure forensics
  even if EXIT trap skipped. No apt-get install -fy fallback (would
  defeat SHA256-pinned trust). pip installs --no-index --find-links=
  against bundled wheels only.
- scripts/geobrix-gdal-init-ppa.sh (renamed from geobrix-gdal-init.sh):
  legacy on-cluster PPA install path kept for bundle bootstrapping and
  debugging; ~15-min path is flagged in the header.
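
The logging approach described for the init script (buffer in /tmp, copy once on exit, plus per-step breadcrumbs) can be sketched as below. This is an illustration under assumptions, not the script's actual code; `run_with_persistent_log` and `breadcrumb` are hypothetical names.

```shell
# Sketch only: function and argument names are hypothetical.

# run_with_persistent_log <dest_file> <command...>
# Buffers output in a local temp file and copies it to <dest_file> in a
# single write when the subshell exits, whatever the exit status.
run_with_persistent_log() (
  dest_file="$1"
  shift
  local_log=$(mktemp "${TMPDIR:-/tmp}/init_log.XXXXXX") || exit 1
  mkdir -p "$(dirname "$dest_file")" || exit 1
  trap 'cp "$local_log" "$dest_file"; rm -f "$local_log"' EXIT
  "$@" >>"$local_log" 2>&1
)

# breadcrumb <dir> <NN> <step>
# Writes one small timestamped file per step, directly to the destination,
# so progress is visible even if the EXIT trap never fires.
breadcrumb() {
  date -u +%Y-%m-%dT%H:%M:%SZ > "$1/_${2}_${3}.txt"
}
```

Writing the log once at exit avoids appending to object storage, while the breadcrumbs are whole-file writes and so are safe to create as each step completes.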

Trust chain (defense in depth)
1. UBUNTUGIS_FPR fingerprint pin gates what enters the bundle at build
   time.
2. PR review of resources/static/ commit: reviewer reproduces the build
   locally and verifies sha256 matches the committed sidecar.
3. <tarball>.sha256 outer sidecar (operator-staged in UC Volume) gates
   what enters the cluster's extraction step.
4. SHA256SUMS per-file manifest inside the tarball gates what gets
   installed.
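
Layers 3 and 4 of the chain come down to two digest checks. A minimal sketch, assuming GNU coreutils `sha256sum`, a sidecar whose first field is the hex digest, and a `SHA256SUMS` file with paths relative to the extraction root (all assumptions; the function names are hypothetical):

```shell
# verify_outer <tarball> <sidecar>
# Compares the tarball's digest with the first field of the sidecar.
verify_outer() (
  expected=$(awk 'NR==1 {print $1}' "$2") || exit 1
  actual=$(sha256sum "$1" | awk '{print $1}') || exit 1
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
)

# verify_inner <extracted_dir>
# Checks every file listed in the per-file manifest.
verify_inner() (
  cd "$1" || exit 1
  sha256sum -c SHA256SUMS >/dev/null
)
```

An init script following this model would stop before extraction if `verify_outer` fails, and before any install step if `verify_inner` fails.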

Docs
- installation.mdx: rewrote install steps for the tarball flow; added
  troubleshooting subsection for persistent init logs and breadcrumbs.
- security.mdx: documented the 4-layer trust chain including the Git
  LFS PR-review anchor; expanded "build on this foundation" guidance.
- developers.mdx: new "Git LFS" section covering install / clone /
  update procedures.
- limitations.mdx: x86_64-only callout; PROJ version skew note (cluster
  carries our PROJ 9.4.1 alongside DBR's PROJ 9.7.1 at distinct paths).
- beta-release-notes.mdx: 0.3.0 bullet for the new bundle architecture.
- resources/static/README.md: rebuild recipe for the platform tarball.

Co-authored-by: Isaac

Two edits to scripts/geobrix-gdal-init-ppa.sh that were dropped from the
prior commit because `git mv` pre-staged the rename before these edits:

- LEGACY PATH banner at the top documenting the ~15-min cold-start cost
  and pointing readers at scripts/geobrix-gdal-init.sh as the preferred
  path.
- setuptools 74.0.0 → 80.9.0 to match requirements-ci.in; required to
  parse GDAL 3.11+ sdist's PEP 639 SPDX license string.

Co-authored-by: Isaac

…unch

The previous design wrote `init_<timestamp>.log` per launch, which meant
the per-host directory under $VOL_DIR/_init_logs/<cluster-id>/<hostname>/
grew a new file every cluster launch — fine for forensics but not what
we want for operational hygiene.

Switch to stable filenames so each launch overwrites in place:
  init.log              — full output
  _NN_<step>.txt        — step breadcrumbs
  _99_trap_*.txt        — trap-fired markers

Also run `rm -f *.txt *.log` in the per-host dir at script start as a
safety net (only the host's own subdir; sibling host subdirs are left
alone so parallel-running worker inits don't lose their writes).

Updates the init script's troubleshooting header and the matching
"Reading init logs" subsection in installation.mdx.

Co-authored-by: Isaac
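
The per-host reset described in this commit can be sketched as follows. The function name `reset_host_log_dir` and its argument order are hypothetical; only the directory layout comes from the commit message.

```shell
# reset_host_log_dir <vol_dir> <cluster_id> <hostname>
# Clears this host's previous log and breadcrumb files, leaves sibling
# host directories alone, and prints the per-host directory path.
reset_host_log_dir() (
  host_dir="$1/_init_logs/$2/$3"
  mkdir -p "$host_dir" || exit 1
  # rm -f ignores patterns that match nothing.
  rm -f "$host_dir"/*.txt "$host_dir"/*.log
  printf '%s\n' "$host_dir"
)
```

Usage: `LOG_DIR=$(reset_host_log_dir "$VOL_DIR" "$DB_CLUSTER_ID" "$(hostname)")`, after which the script writes `init.log` and the `_NN_<step>.txt` files under `$LOG_DIR`.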
…e-safe)

The previous design only cleared each host's own subdir on launch. For
ephemeral / autoscaling clusters where worker hostnames change between
launches, that leaves orphaned hostname subdirs from prior launches
forever — directories never empty out, even though file contents do.

The driver now runs `rm -rf $VOL_DIR/_init_logs/$DB_CLUSTER_ID` at startup,
giving the cluster a fully clean slate. Workers leave peer subdirs
alone (would race with the still-running driver during the initial
startup window, and with other workers' parallel inits during
autoscale). Detection via DB_IS_DRIVER env var, which Databricks sets
on every cluster-scoped init.

Co-authored-by: Isaac
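
A sketch of the driver-only cleanup, assuming `DB_IS_DRIVER` carries the value `TRUE` on the driver node. The function name `clean_cluster_logs` and the empty-variable guard are illustrative additions, not taken from the script.

```shell
# clean_cluster_logs <vol_dir>
# The driver removes the whole per-cluster log tree; workers do nothing.
clean_cluster_logs() (
  vol_dir="$1"
  # Never build an rm -rf path from empty pieces.
  [ -n "$vol_dir" ] || exit 1
  [ -n "${DB_CLUSTER_ID:-}" ] || exit 1
  if [ "${DB_IS_DRIVER:-}" = "TRUE" ]; then
    rm -rf "$vol_dir/_init_logs/$DB_CLUSTER_ID"
  fi
)
```

Guarding against an empty `DB_CLUSTER_ID` matters here because the path would otherwise collapse to `$VOL_DIR/_init_logs/` and wipe every cluster's logs.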
@mjohns-databricks mjohns-databricks merged commit 8b9e10d into main May 20, 2026
7 checks passed
mjohns-databricks pushed a commit that referenced this pull request May 20, 2026
Missing companion lockfile for python/geobrix/requirements-build.in
(added in #28 with the package-geobrix-artifacts.yml workflow). The
workflow's hash-pinned wheel-build step `pip install --require-hashes
-r python/geobrix/requirements-build.txt` errored on its first
v0.3.0 run with "Could not open requirements file".

Generated via:
  cd python/geobrix
  uv pip compile --generate-hashes --python-version 3.12 \
      --output-file requirements-build.txt requirements-build.in

Co-authored-by: Isaac
mjohns-databricks added a commit that referenced this pull request May 20, 2026
#30)

* chore(python): regenerate requirements-build.txt hash-pinned lockfile

Missing companion lockfile for python/geobrix/requirements-build.in
(added in #28 with the package-geobrix-artifacts.yml workflow). The
workflow's hash-pinned wheel-build step `pip install --require-hashes
-r python/geobrix/requirements-build.txt` errored on its first
v0.3.0 run with "Could not open requirements file".

Generated via:
  cd python/geobrix
  uv pip compile --generate-hashes --python-version 3.12 \
      --output-file requirements-build.txt requirements-build.in

Co-authored-by: Isaac

* ci(package-geobrix-artifacts): fail-fast preflights for LFS + lockfile

Two operator-facing preflight steps that surface actionable errors
instead of opaque failures, motivated by the two failed runs of this
workflow against v0.3.0:

1. LFS preflight. actions/checkout@v6 with `lfs: true` errors out with
   a bare `Object does not exist on the server: [404]` when an LFS
   pointer is committed but the bytes were never uploaded to GitHub
   LFS storage. Split the LFS handling out of actions/checkout (now
   `lfs: false`) and run `git lfs pull` in a follow-up step that prints
   the exact `git lfs push origin <branch-or-tag> --all` recovery
   command on failure. Common when a repo is the first to use LFS at
   the org level and storage hasn't been primed.

2. Lockfile preflight. The hash-pinned wheel-build step
   `pip install --require-hashes -r python/geobrix/requirements-build.txt`
   previously errored with pip's generic `Could not open requirements
   file` when the lockfile was missing. New step checks for the file
   up front and prints the `uv pip compile --generate-hashes` command
   to regenerate it.

Co-authored-by: Isaac
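
The lockfile preflight amounts to an existence check with an actionable message. A minimal sketch, with the hypothetical function name `preflight_lockfile`; the recovery command is the one quoted in the commit message above.

```shell
# preflight_lockfile <path>
# Fails early with a recovery hint if the lockfile is missing or empty.
preflight_lockfile() (
  lockfile="$1"
  if [ ! -s "$lockfile" ]; then
    echo "ERROR: $lockfile is missing or empty." >&2
    echo "Regenerate it with:" >&2
    echo "  cd python/geobrix" >&2
    echo "  uv pip compile --generate-hashes --python-version 3.12 \\" >&2
    echo "      --output-file requirements-build.txt requirements-build.in" >&2
    exit 1
  fi
)
```

Running this before `pip install --require-hashes` replaces pip's generic "Could not open requirements file" with a message that says how to fix the problem.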

---------

Co-authored-by: Michael Johns <user.name>
@mjohns-databricks mjohns-databricks deleted the gdal-init-packaging branch May 20, 2026 22:44