
Local ci:testmock and GHA CI have diverged env-var surfaces #958

@jakebromberg

Description


Problem

Per #164, GHA CI deliberately runs the integration suite via host Node processes, with env taken from the Start services step in .github/workflows/test.yml. The local npm run ci:testmock path still uses dev_env/docker-compose.yml, with env from .env plus the ci backend service block. These two env surfaces have diverged: an env var added to one can silently be missing from the other.

#955 is the canonical incident — G4's process-wide LML limiter (LML_CLIENT_MAX_CONCURRENT / LML_CLIENT_RATE_PER_MIN) needed test-env overrides; the unit test author for G4 added the override to the unit harness only, and neither CI surface got it. The integration suite drained the bucket and metadata-lml.spec.js started timing out. The fix in #957 added the env vars to both surfaces by hand, with a source-grep test (tests/unit/scripts/lml-limiter-test-env.test.ts) pinning each independently — the only safeguard against future drift today.

The .env shape required for the local docker-compose path is also undocumented (TEST_HOST, AUTH_USERNAME, AUTH_PASSWORD, DB_USERNAME/DB_PASSWORD/DB_NAME). Debugging the divergence during BS#955 surfaced 5 separate env mismatches before tests could run.
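Until that shape is documented (or eliminated), a local preflight check could at least name every missing key up front instead of surfacing mismatches one failure at a time. This is a minimal sketch, assuming a simple `KEY=value` .env with no multiline values, inline comments, or `${VAR}` expansion. `parseDotenv` and `missingEnvKeys` are hypothetical helpers, not existing repo code; the key list is the one given above.

```typescript
// Keys the issue lists as required for the local docker-compose path.
const REQUIRED_LOCAL_ENV = [
  "TEST_HOST",
  "AUTH_USERNAME",
  "AUTH_PASSWORD",
  "DB_USERNAME",
  "DB_PASSWORD",
  "DB_NAME",
] as const;

// Parse simple KEY=value lines. Assumption: no multiline values, no inline
// comments, no ${VAR} expansion. Blank lines and # comment lines are skipped.
function parseDotenv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    // Tolerate an optional leading `export `.
    const body = line.startsWith("export ") ? line.slice("export ".length).trim() : line;
    const eq = body.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = body.slice(0, eq).trim();
    let value = body.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const q = value.charAt(0);
    if (value.length >= 2 && (q === '"' || q === "'") && value.endsWith(q)) {
      value = value.slice(1, -1);
    }
    vars.set(key, value);
  }
  return vars;
}

// Return the required keys that are absent or empty, in the order given.
function missingEnvKeys(text: string, required: readonly string[]): string[] {
  const vars = parseDotenv(text);
  return required.filter((key) => (vars.get(key) ?? "") === "");
}

// Usage: an incomplete .env (values are made up).
const example = [
  "TEST_HOST=localhost",
  "AUTH_USERNAME=test",
  "AUTH_PASSWORD=",
  "DB_USERNAME=postgres",
].join("\n");
console.log(missingEnvKeys(example, REQUIRED_LOCAL_ENV));
// → ["AUTH_PASSWORD", "DB_PASSWORD", "DB_NAME"]
```

Treating an empty value as missing is a deliberate choice here: `AUTH_PASSWORD=` would otherwise pass the check and fail later at connect time.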

End state

Pick one of the following (or propose an alternative):

  1. Mechanical guard — a CI check that diffs the env-var surface between Start services and the ci-profile backend env block; alarms on divergence (or requires an opt-out comment per var). Smallest change, preserves dual-path.
  2. Shared source of truth — factor the env into a .env.ci (or YAML fragment) that both ci:testmock's docker-compose and Start services consume. Eliminates the divergence at the root.
  3. Retire ci:testmock — since GHA is what gates merges, drop the local docker-based path. Move local repro to a host-process script that mirrors the workflow. Largest change but eliminates the entire problem class.
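The core of option 1 is a set difference between the two surfaces. Below is a minimal sketch in TypeScript, since the existing pin test is a `.ts` file. It assumes the env-var names for each surface have already been extracted. The per-var opt-out comment is modeled as a plain list of names, and `diffEnvSurfaces` / `formatDrift` are hypothetical helpers, not existing code.

```typescript
// Env-var names present on only one of the two CI surfaces.
interface EnvDrift {
  onlyInWorkflow: string[]; // in test.yml Start services, missing from compose
  onlyInCompose: string[]; // in the compose ci backend block, missing from test.yml
}

// Compare the two name lists. `optOut` holds names deliberately set on one
// surface only (standing in for the per-var opt-out comment).
function diffEnvSurfaces(
  workflowVars: Iterable<string>,
  composeVars: Iterable<string>,
  optOut: Iterable<string> = [],
): EnvDrift {
  const workflow = new Set(workflowVars);
  const compose = new Set(composeVars);
  const skip = new Set(optOut);
  const onlyIn = (a: Set<string>, b: Set<string>): string[] =>
    Array.from(a)
      .filter((name) => !b.has(name) && !skip.has(name))
      .sort();
  return {
    onlyInWorkflow: onlyIn(workflow, compose),
    onlyInCompose: onlyIn(compose, workflow),
  };
}

// Human-readable report, or null when the surfaces agree.
function formatDrift(drift: EnvDrift): string | null {
  const lines = [
    ...drift.onlyInWorkflow.map((n) => `only in test.yml Start services: ${n}`),
    ...drift.onlyInCompose.map((n) => `only in docker-compose ci backend: ${n}`),
  ];
  return lines.length === 0 ? null : lines.join("\n");
}

// Usage: a made-up drift where only the workflow gained the rate-limit var.
// NODE_ENV is an illustrative name, not taken from either file.
const report = formatDrift(
  diffEnvSurfaces(
    ["NODE_ENV", "LML_CLIENT_MAX_CONCURRENT", "LML_CLIENT_RATE_PER_MIN"],
    ["NODE_ENV", "LML_CLIENT_MAX_CONCURRENT"],
  ),
);
// A CI script would print the report and exit non-zero when it is non-null.
console.log(report);
// → only in test.yml Start services: LML_CLIENT_RATE_PER_MIN
```

Comparing names only (not values) keeps the check robust to intentional value differences between host-process and container runs, such as hostnames.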

Files

  • .github/workflows/test.yml: Start services env block (lines 343-370 at HEAD).
  • dev_env/docker-compose.yml: ci backend service env (lines 157-225 at HEAD).
  • scripts/ci-env.sh, scripts/ci-test.sh: local-only path scaffolding.
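Pulling the var names out of the two YAML files is the fiddly part of any mechanical guard. This is a naive, line-based sketch (hypothetical `extractEnvBlock`, not a YAML parser). It finds the first `env:` or `environment:` key at or after an anchor line and collects `KEY: value`, `- KEY=value`, and `- KEY` entries. The `# env-drift: ignore` marker is an assumed opt-out convention that neither file uses today. A production check would more likely load both files with a real YAML library.

```typescript
// Assumed opt-out marker; neither file uses this convention today.
const OPT_OUT_MARKER = "env-drift: ignore";

// Matches an `env:` (workflow) or `environment:` (compose) key line.
const ENV_KEY_LINE = /^\s*(?:env|environment):\s*(?:#.*)?$/;

// Matches `KEY: value`, `- KEY=value`, or `- KEY` (optionally followed by a comment).
const ENTRY = /^(?:-\s*)?([A-Za-z_][A-Za-z0-9_]*)\s*(?:[:=#]|$)/;

interface EnvBlock {
  names: string[]; // every var declared in the block
  optOut: string[]; // the subset whose line carries OPT_OUT_MARKER
}

// Column of the first non-whitespace character (line length if blank).
const indentOf = (line: string): number => line.search(/\S|$/);

// Collect var names from the first env:/environment: block at or after the
// first line containing `anchor`. Line-based, not a YAML parser: flow maps and
// aliases are ignored, vars supplied via env_file are not seen, and if the
// anchored step has no env block it will pick up the next one.
function extractEnvBlock(yamlText: string, anchor: string): EnvBlock {
  const lines = yamlText.split(/\r?\n/);
  const block: EnvBlock = { names: [], optOut: [] };
  let i = lines.findIndex((line) => line.includes(anchor));
  if (i < 0) return block;
  while (i < lines.length && !ENV_KEY_LINE.test(lines[i])) i++;
  if (i >= lines.length) return block;
  const parentIndent = indentOf(lines[i]);
  for (i += 1; i < lines.length; i++) {
    const line = lines[i];
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    if (indentOf(line) <= parentIndent) break; // dedent: left the env block
    const match = ENTRY.exec(trimmed);
    if (match === null) continue;
    block.names.push(match[1]);
    if (line.includes(OPT_OUT_MARKER)) block.optOut.push(match[1]);
  }
  return block;
}

// Usage: a made-up workflow step (names, values, and script are illustrative only).
const workflowSnippet = [
  "      - name: Start services",
  "        env:",
  "          NODE_ENV: test",
  "          LML_CLIENT_MAX_CONCURRENT: 50",
  "          DEBUG_SQL: 1 # env-drift: ignore",
  "        run: ./start-services.sh",
].join("\n");
console.log(extractEnvBlock(workflowSnippet, "name: Start services"));
// names: NODE_ENV, LML_CLIENT_MAX_CONCURRENT, DEBUG_SQL; optOut: DEBUG_SQL
```

Keying the search off an anchor string (a step name or service name) rather than the line ranges listed above keeps the check from breaking every time those files are edited.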

Constraints

  • Local repro must remain useful for new contributors. If retiring ci:testmock, the replacement must be documented in CLAUDE.md / README.md.
  • BS#164's "host process model on CI" decision is not in scope to revisit.

Acceptance criteria

  • Adding a new env var to one CI surface either applies to the other automatically OR fails a CI check.
  • The .env shape required for local repro is either eliminated or documented in CLAUDE.md / README.md.
  • tests/unit/scripts/lml-limiter-test-env.test.ts can be deleted (the broader mechanism obviates the per-PR pin).

Related

  • #164 — the deliberate divergence decision (closed 2026-03-08).
  • #955 — the incident that exposed the gap.
  • #957 — the manual fix this ticket would obsolete.
  • #207 — related discussion on dynamic service startup in CI.

Metadata


Assignees: No one assigned
Labels: chore (Maintenance and housekeeping), ci (CI/CD and testing infrastructure)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
