Skip to content

Pin env-var surface parity between docker-compose ci backend and GHA workflow (BS#958)#964

Merged
jakebromberg merged 3 commits into
mainfrom
fix-958-env-surface-guard
May 19, 2026
Merged

Pin env-var surface parity between docker-compose ci backend and GHA workflow (BS#958)#964
jakebromberg merged 3 commits into
mainfrom
fix-958-env-surface-guard

Conversation

@jakebromberg
Copy link
Copy Markdown
Member

Closes #958.

Summary

tests/unit/scripts/ci-env-surface-parity.test.ts — source-grep test that extracts env-var KEYS from both CI surfaces and asserts any divergence falls in an explicit allowlist. Adding a new env var to one surface now fails CI unless mirrored or justified.

Why

Per #164, the GHA integration suite deliberately runs the backend as a host node process with env from .github/workflows/test.yml. npm run ci:testmock runs it as a docker container with env from dev_env/docker-compose.yml. The two env surfaces drift independently, and a new var landing in one but not the other can silently break the integration suite.

#955 was the most recent symptom: G4's LML_CLIENT_MAX_CONCURRENT + LML_CLIENT_RATE_PER_MIN overrides made it into the unit-test harness but neither CI surface, so the integration suite's bucket drained and metadata-lml.spec.js started timing out. The manual fix in #957 wired the limiter env vars to both sides with a per-PR source-grep test pinning the values. This PR generalizes the pattern so the next limiter knob (or any other new env var) doesn't need its own per-PR pin.

Scope choice

Of the three options listed in #958:

  1. Mechanical guard ← this PR. Smallest change, preserves the dual-path model.
  2. Shared source of truth (factor into .env.ci consumed by both).
  3. Retire ci:testmock entirely.

(2) and (3) are deferred — both are larger investments and the mechanical guard catches the bug class today.

How it works

The test asserts:

  • actualOnlyInCompose === EXPECTED_ONLY_IN_COMPOSE
  • actualOnlyInWorkflow === EXPECTED_ONLY_IN_WORKFLOW

The two allowlist arrays capture today's intentional divergence (18 compose-only keys, 9 workflow-only keys), each with a one-line // Why: comment — the deliberate-thought receipt. When the test fails, the dev either:

  • Mirrors the new var to the other surface (preferred), or
  • Adds it to the appropriate allowlist with a Why: justification.

What this does not do

  • Doesn't enforce value equality. LML_CLIENT_RATE_PER_MIN=60000 in workflow vs =10000 in compose would both pass this guard. The existing lml-limiter-test-env.test.ts value-pin therefore stays useful and is not retired.
  • Doesn't fix the existing divergences. Some entries on the allowlist (e.g. CATALOG_TRACK_SEARCH_* only in compose, BETTER_AUTH_AUDIENCE/ISSUER/JWKS_URL only in compose) may be genuine coverage gaps — but resolving each is its own ticket.

Test plan

  • Test passes on main (npx jest --testPathPatterns ci-env-surface-parity) — 2/2 pass.
  • Full unit suite — 1978/1978 pass.
  • Negative test: temporarily added FAKE_GUARD_TEST_KEY=value to the compose backend env; test failed with a clean diff showing the unauthorized key. File restored.
  • Typecheck + format:check clean.

Related

…workflow (BS#958)

Per BS#164, the GHA integration suite runs the backend as a host node process with env from `.github/workflows/test.yml`; `npm run ci:testmock` runs it as a docker container with env from `dev_env/docker-compose.yml`. The two env surfaces drift independently — BS#955 was the most recent symptom (G4's `LML_CLIENT_*` overrides made it into the unit harness but neither CI surface) and the manual fix in BS#957 pinned the limiter env vars on both sides with a per-PR source-grep test.

This commit generalizes that pattern. `tests/unit/scripts/ci-env-surface-parity.test.ts` extracts the env-key set from both surfaces and asserts that any divergence falls within an explicit allowlist. Each allowlist entry carries a one-line `Why:` justifying the divergence — the deliberate-thought receipt. Adding a new env var to one surface now fails CI unless either (a) the var is mirrored to the other or (b) the allowlist is amended with a justification.

The allowlist captures today's intentional divergence (18 compose-only keys, 9 workflow-only keys). The lml-limiter-test-env.test.ts value-pin remains useful and is not retired — this guard enforces key presence, not value equality.

Verified the test catches a regression: temporarily added a `FAKE_GUARD_TEST_KEY` to the compose backend env, ran the test, observed a clean diff failure showing the unauthorized key.
…eedback)

The first cut missed the `Integration-Tests:` job-level `env:` block (lines 266-281), which sets DB_HOST, DB_PORT, BETTER_AUTH_AUDIENCE/ISSUER/JWKS_URL, AUTH_PORT, BETTER_AUTH_SECRET, RUN_TESTS on the host-process backend. As a result the prior allowlist marked five vars as "only in compose" with `Why:` comments asserting that the workflow relied on backend defaults — but the workflow does set them, just in the job scope. That misinformation undermined the deliberate-thought-receipt purpose of the `Why:` lines.

Fixes:
- Split `extractWorkflowEnvKeys()` into three named extractors (`extractTopLevelEnv`, `extractIntegrationTestsJobEnv`, `extractStartServicesStepEnv`), unioned at the call site. Anchored the top-level extractor to the pre-`jobs:` slice so it can't accidentally match a job-level `env:` if file order changes.
- Fixed a slice-off-by-one in the job-env extractor: `slice(0, stepsIdx)` excluded the trailing newline of the last env line, silently dropping `BETTER_AUTH_SECRET`. Adjusted to `slice(0, stepsIdx + 1)` and documented the gotcha inline.
- Removed BETTER_AUTH_AUDIENCE/ISSUER/JWKS_URL, DB_HOST, DB_PORT from EXPECTED_ONLY_IN_COMPOSE (they're in the workflow job env).
- Added AUTH_PORT, BETTER_AUTH_SECRET, RUN_TESTS to EXPECTED_ONLY_IN_WORKFLOW (genuinely workflow-only, with accurate `Why:` lines).
- Header docstring updated to describe the three-scope union and dropped the stale reference to the BS#959 sibling test (not in this branch).
@jakebromberg jakebromberg merged commit 16dede6 into main May 19, 2026
5 checks passed
@jakebromberg jakebromberg deleted the fix-958-env-surface-guard branch May 19, 2026 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Local ci:testmock and GHA CI have diverged env-var surfaces

1 participant