feat(api,docs): e2e demo pipeline + showcase script (#128) #129

Merged: w7-mgfcode merged 9 commits into dev from feat/api-e2e-demo on May 14, 2026

@w7-mgfcode (Owner)

Closes #128. Implements PRP-15 / INITIAL-14.

Summary

  • make demo — single command that walks the published HTTP surface
    (precheck → reset? → seed → status → features → train ×3 → backtest ×3 →
    register → verify → agent → cleanup) in ~2-6 s on this host, well under
    the 180 s PRP budget. Final line: runs=3 winner=<model_type> alias=demo-production wall_clock=<t>s.
  • New demo_minimal seeder scenario preset (3 × 10 × 92 days, mild trend,
    no sparsity — backtest-friendly).
  • New scripts/run_demo.py (~600 lines) — async, RFC-7807-aware, single
    file driver. Type-checked under both mypy --strict and pyright --strict.
  • Top-level Makefile exposing demo, demo-quick, demo-clean, help.
  • Unit (tests/test_run_demo_unit.py, 32 cases) + integration
    (tests/test_e2e_demo.py, 2 cases marked @pytest.mark.integration).
  • Nightly CI workflow .github/workflows/e2e-nightly.yml (0 7 * * * cron
    + workflow_dispatch): informational only, NOT a required check on
    dev / main per the PRP.
  • Doc updates: README "Try it" block, DAILY-FLOW "First-Run Smoke" section,
    RUNBOOKS "make demo fails at step X" incident with 7-point diagnosis
    flow, REPO_MAP_INDEX rows for Makefile + scripts/run_demo.py.

Additive only — no schema changes, no migrations, no API edits.

Acceptance Criteria (PRP-15)

  • make demo exits 0 on a clean checkout + docker compose up -d
    (verified locally; integration test asserts the same).
  • Wall-clock ≤ 180 s on the reference laptop; soft-warn (no fail)
    if exceeded.
  • Final output line: runs=3 winner=<model_type> alias=demo-production wall_clock=<t>s.
  • GET /registry/aliases/demo-production returns the winning run_id.
  • GET /registry/runs/{winning_run_id}/verify returns verified=true.
  • The agent step either round-trips successfully or is skipped with
    ⏭️ (now also skips on any LLM-provider failure — see fix commit
    4e279f3).
  • tests/test_run_demo_unit.py (32 cases) + tests/test_e2e_demo.py
    (2 cases, integration-marked) both pass.
  • ruff check, ruff format --check, mypy --strict app/,
    pyright app/ all clean.
  • New .github/workflows/e2e-nightly.yml runs on cron + dispatch;
    NOT a required check.

Commit Graph

4e279f3 fix(api): harden run_demo for integration test + real DB (#128)
29a95d3 fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)
35fd438 docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)
acd2255 ci(repo): nightly e2e demo workflow (#128)
7f89720 test(api): unit + integration coverage for run_demo (#128)
47fe0c2 feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)
e0392ca feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)
005c189 feat(data): add demo_minimal scenario preset (#128)
532d968 docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

The two fix commits land on top because the problems they address only
surfaced once the integration test ran against real Postgres: Postgres
auto-increment IDs don't reset across delete/seed, the registry
artifact-uri root is separate from forecasting's, the uvicorn PIPE buffer
deadlocks on seed-step log volume, and the agent step's LLM key check
should match the configured provider. Each fix is documented in its
commit message.

Test plan

  • uv run ruff check . && uv run ruff format --check . clean
  • uv run mypy app/ clean (strict)
  • uv run pyright app/ clean (0 errors)
  • uv run mypy scripts/run_demo.py clean
  • uv run pyright scripts/run_demo.py clean (0 errors)
  • uv run pytest -m "not integration" — 1004 passed locally
  • uv run pytest -m integration tests/test_e2e_demo.py — 2 passed in 6.5 s
  • End-to-end: make demo exits 0, prints the canonical summary line
  • examples/e2e_smoke.sh untouched (regression guard)
  • No new env vars added to .env.example (verified — not needed)

Skipped during validation (manual ops require explicit approval)

  • The "destructive DELETE of all seeded data on the dev backend" step
    was blocked by the auto-approval classifier during interactive
    debugging — only the integration test's --reset path exercised it
    (where the test owns the wipe).

Risk notes (from INITIAL-14 / PRP-15)

  • Wall-clock budget 180 s: soft-warn only (per INITIAL-14 risks
    table). Reference run on this laptop: 2 s.
  • demo_minimal WAPE NaN trap: defended by preset tuning
    (noise_sigma=0.10, no sparsity) + the _select_winner NaN-skip
    branch + the integration smoke run that asserts winner= is populated.

docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count
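
The "needs >= 72 days" figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each fold's test window is exactly `horizon` days, folds do not overlap, and there is no gap between train and test; `min_days_required` is a hypothetical helper, not a function from the repository.

```python
from datetime import date


def min_days_required(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits every expanding-window fold."""
    # The first fold trains on min_train_size days; every fold then needs
    # its own horizon-day test window after the training data.
    return min_train_size + n_splits * horizon


required = min_days_required(n_splits=3, horizon=14, min_train_size=30)
# Both endpoints are inclusive, hence the +1.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = available - required
print(required, available, margin)
```

Under those assumptions the preset leaves 20 spare days beyond the minimum.
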

feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
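
The secrets-safe error handling described in the HttpClient bullet could look roughly like the sketch below. It is an illustration only: the real `StepError` in scripts/run_demo.py may be shaped differently, `step_error_from_problem` is a hypothetical helper, and `request_id` is assumed to be a top-level extension member of the problem+json body.

```python
from __future__ import annotations

from typing import Any, Mapping


class StepError(Exception):
    """Illustrative error type that carries only whitelisted RFC 7807 fields."""

    def __init__(self, status: int, title: str, detail: str, request_id: str | None) -> None:
        self.status = status
        self.title = title
        self.detail = detail
        self.request_id = request_id
        super().__init__(f"{status} {title}: {detail} (request_id={request_id})")


def step_error_from_problem(status: int, body: Mapping[str, Any]) -> StepError:
    # Read only the known keys. Any other member of the body is ignored, so
    # unexpected fields never reach the terminal or the logs.
    return StepError(
        status=status,
        title=str(body.get("title", "Unknown error")),
        detail=str(body.get("detail", "")),
        request_id=body.get("request_id"),
    )


problem = {
    "title": "Validation Error",
    "detail": "bad date",
    "request_id": "abc",
    "debug_token": "hunter2",  # stand-in for something that must not be echoed
}
err = step_error_from_problem(422, problem)
print(err)
```

Whitelisting fields, rather than filtering a dump of the whole body, means a new sensitive field added server-side stays hidden by default.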

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.
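
For the client-side `artifact_hash` mentioned in the registry handshake bullet, a streaming SHA-256 over the artifact file is one plausible approach. This is a sketch under that assumption; `sha256_of_file` and the chunk size are illustrative and not taken from the script.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large artifacts never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

This only works because, as the commit message notes, the script shares a filesystem with the API on a single host.
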

feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
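
The `_select_winner` cases listed above (lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None) can be illustrated with a small sketch. It assumes the input is one aggregated WAPE per model; the real function's signature is not shown in this PR, and the model names used here are hypothetical placeholders.

```python
import math
from typing import Mapping, Optional


def select_winner(wape_by_model: Mapping[str, float]) -> Optional[str]:
    """Return the model with the lowest non-NaN WAPE, or None if there is none."""
    # NaN compares false against everything, so it has to be filtered out
    # before min(); otherwise the result depends on iteration order.
    usable = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not usable:
        return None
    return min(usable, key=usable.__getitem__)


scores = {"model_a": 0.31, "model_b": 0.27, "model_c": float("nan")}
print(select_winner(scores))
```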

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time
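
The terminate/kill fallback used on teardown follows a common pattern, sketched below. It assumes a plain `subprocess.Popen` handle; `stop_process` and the 5 s grace period are illustrative, and a sleeping Python child stands in for uvicorn.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Ask the process to exit, then force-kill it if it ignores the request."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the child so it does not linger as a zombie
    return proc.returncode


# Stand-in for the uvicorn subprocess: a child that would otherwise run for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
```
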

ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
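
Polling /health against a deadline, as the workflow does before calling run_demo.py, can be sketched in Python as follows. The workflow itself may well implement this in shell; `wait_for_health`, the retry interval, and the per-request timeout are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, deadline_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll url until it answers 200 or the deadline passes."""
    stop_at = time.monotonic() + deadline_s  # monotonic: immune to clock changes
    while time.monotonic() < stop_at:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet, or returned an error
        time.sleep(interval_s)
    return False
```

Returning a boolean rather than raising lets the caller decide whether to upload logs before failing the job.
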

docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.
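
fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3 below). Items 4-5 are smaller adjustments
bundled into the same commit: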

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
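
Item 3's provider-aware key check could be sketched as below. Everything here is an assumption for illustration: the `provider:model` naming scheme, the provider prefixes, the env var names in the mapping, and the `llm_key_present` signature. The real `_llm_key_present` reads from `get_settings()`, which this sketch replaces with a plain mapping argument.

```python
from typing import Mapping

# Hypothetical mapping from model-name prefix to the key that provider needs.
_PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google-gla": "GEMINI_API_KEY",
}


def llm_key_present(default_model: str, keys: Mapping[str, str]) -> bool:
    """True only if the key for the configured model's provider is set."""
    provider = default_model.split(":", 1)[0]
    var = _PROVIDER_KEY_VARS.get(provider)
    # An unknown provider, or a known one with a missing or empty key, is a skip.
    return bool(var and keys.get(var))


keys = {"OPENAI_API_KEY": "sk-test", "ANTHROPIC_API_KEY": "sk-ant-test"}
print(llm_key_present("google-gla:gemini-1.5-flash", keys))
```

This reproduces the pre-fix failure mode: keys for two providers are present, yet the configured default model belongs to a third, so the step should skip rather than attempt the call.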

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
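
The pipe-buffer fix can be illustrated with a child process that writes much more than a typical 64 KB pipe buffer holds. With `stdout=subprocess.PIPE` and nobody reading, such a child would block on write; redirected to a file, it runs to completion. This is a minimal sketch, with a noisy Python child standing in for uvicorn and `TemporaryFile` standing in for the test's log file (the real test keeps the file on failure).

```python
import subprocess
import sys
import tempfile

# Child that emits about 1 MB, far beyond a typical 64 KB pipe buffer.
noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1_000_000)"]

with tempfile.TemporaryFile() as log:
    proc = subprocess.Popen(noisy, stdout=log, stderr=subprocess.STDOUT)
    # No reader is needed: the OS writes straight to the file, so the child
    # can never stall on a full buffer.
    returncode = proc.wait(timeout=30)
    log.seek(0)
    captured = len(log.read())
```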

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

@socket-security

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action Severity Alert
Warn Critical
Critical CVE: Authlib JWS JWK Header Injection: Signature Verification Bypass

CVE: GHSA-wvwj-cvrp-7pv5 Authlib JWS JWK Header Injection: Signature Verification Bypass (CRITICAL)

Affected versions: < 1.6.9

Patched version: 1.6.9

From: uv.lock (pypi/authlib@1.6.6)


Warn Critical
Critical CVE: FastMCP OpenAPI Provider has an SSRF & Path Traversal Vulnerability

CVE: GHSA-vv7q-7jx5-f767 FastMCP OpenAPI Provider has an SSRF & Path Traversal Vulnerability (CRITICAL)

Affected versions: < 3.2.0

Patched version: 3.2.0

From: uv.lock (pypi/fastmcp@2.14.4)


@w7-mgfcode (Owner, Author)

The two Socket-flagged CVEs (`authlib < 1.6.9`, `fastmcp < 3.2.0`) are pre-existing on `dev` — this PR's diff has no `uv.lock` row, so #128 doesn't introduce them.

Tracked separately:

#129 can be reviewed and merged independently.

@w7-mgfcode w7-mgfcode merged commit 1b4447d into dev May 14, 2026
8 checks passed
w7-mgfcode added a commit that referenced this pull request May 14, 2026
… (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.
w7-mgfcode added a commit that referenced this pull request May 17, 2026
* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.
w7-mgfcode added a commit that referenced this pull request May 18, 2026
* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
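
The heuristic spec above can be traced with a small sketch. It is not the MarkdownGenerator code: days are plain list indices, the spike test is read as a relative jump against the prior day, `window_days` is a hypothetical stand-in for the markdown window length, and refresh detection is assumed to pause while a markdown window is active.

```python
from typing import List

SPIKE_THRESHOLD = 0.3  # a 30% day-over-day rise counts as a refresh


def markdown_start_days(
    on_hand: List[int],
    age_days_threshold: int,
    min_units_remaining: int,
    window_days: int,
) -> List[int]:
    """Return the day indices on which the age_days trigger would fire."""
    fires: List[int] = []
    anchor = 0  # most recent refresh; day 0 until a refresh is observed
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        if cur > prev and cur - prev >= SPIKE_THRESHOLD * prev:
            anchor = t  # restock detected, the age clock restarts
        age = t - anchor
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(t)
            # The window covers days t .. t + window_days - 1, so the new
            # anchor is the day after it ends; skip ahead to that day.
            anchor = t + window_days
            t = anchor
            continue
        t += 1
    return fires


steady = [100, 98, 96, 94, 92, 90, 88, 86, 84, 82]
restocked = [100, 95, 130, 125, 120, 115]
print(markdown_start_days(steady, 3, 10, 2))
print(markdown_start_days(restocked, 3, 10, 2))
```

In the `restocked` series the jump from 95 to 130 resets the age clock, which delays the first fire compared with a series that only declines.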

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3 below; items 4-5 are smaller follow-ups):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
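
Item 2 above (copy the artifact into the registry root, record a registry-relative URI) can be sketched roughly as below. The function name `stage_artifact` and the `<run_id>/<filename>` layout are assumptions made for illustration; the real script reads its roots from settings and may lay files out differently.

```python
import hashlib
import shutil
from pathlib import Path


def stage_artifact(source: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy `source` under `registry_root`; return (registry-relative URI, sha256 hex)."""
    # Hypothetical layout: one directory per run inside the registry root.
    target_dir = registry_root / run_id
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    # Hash the copied file so the digest describes what the registry will read.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    # A relative POSIX path resolves against the registry's own root at verify time.
    return target.relative_to(registry_root).as_posix(), digest
```

Recording a path relative to the registry root is what lets a verify step resolve the artifact without knowing where training originally wrote it.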

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
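
The pipe-buffer fix can be illustrated with a small sketch. It assumes a short-lived child process rather than a background uvicorn, and `run_with_log_file` / `cleanup_log` are hypothetical helper names, not the test's own fixtures.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run `cmd` with stdout and stderr sent to a temp file; return (exit code, log path)."""
    with tempfile.NamedTemporaryFile(
        mode="wb", prefix="server-", suffix=".log", delete=False
    ) as log:
        # A file never fills up the way an unread pipe does, so the child
        # cannot block on write no matter how much it logs.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            returncode = proc.wait()
    return returncode, Path(log.name)


def cleanup_log(log_path: Path, test_failed: bool) -> None:
    """Keep the log only when the test failed, so postmortems can read it."""
    if not test_failed:
        log_path.unlink(missing_ok=True)


if __name__ == "__main__":
    # The child writes well past a typical 64 KiB pipe buffer.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
    code, path = run_with_log_file(child)
    print(code, path.stat().st_size)
    cleanup_log(path, test_failed=False)
```

With `stdout=subprocess.PIPE` and no reader draining the pipe, the same child could stall once the buffer filled, which matches the hang described above.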

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
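
A minimal sketch of the single-flight guard, assuming the route layer translates the exception into the 409 response. `AlreadyRunning`, `run_single_flight`, and the problem-details field values are hypothetical; only the module-level `asyncio.Lock` and the RFC 7807 409 come from the description above.

```python
import asyncio
from typing import Awaitable, Callable

# One lock for the whole module: at most one pipeline run at a time.
_run_lock = asyncio.Lock()


class AlreadyRunning(Exception):
    """Raised when a run is requested while another one holds the lock."""

    def problem(self) -> dict[str, object]:
        # RFC 7807 problem-details members; the values are illustrative.
        return {
            "type": "about:blank",
            "title": "Conflict",
            "status": 409,
            "detail": "A demo run is already in progress.",
        }


async def run_single_flight(pipeline: Callable[[], Awaitable[str]]) -> str:
    """Run `pipeline` unless another run is in flight; never queue behind it."""
    # No await sits between this check and the acquire below, so on a single
    # event loop no other task can slip in between the two.
    if _run_lock.locked():
        raise AlreadyRunning()
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of waiting on the lock keeps a second caller from silently queuing a destructive reset behind the first run.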

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.
w7-mgfcode added a commit that referenced this pull request May 18, 2026
…ts (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since
  `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
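
The heuristic spec above can be sketched on a single (store, product) series as follows. This is an illustration under stated assumptions, not the MarkdownGenerator code: `age_days_fire_days` and its parameters are hypothetical names, days are plain list indices, a restock from zero is treated as a refresh, and days inside a markdown window are skipped entirely.

```python
from typing import Sequence


def age_days_fire_days(
    on_hand: Sequence[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,
    spike_threshold: float = 0.3,
) -> list[int]:
    """Return the day indices on which the age trigger would fire."""
    if markdown_window_days < 1:
        raise ValueError("markdown_window_days must be >= 1")
    fire_days: list[int] = []
    anchor = 0  # day of the most recent refresh; day 0 until one is observed
    day = 0
    while day < len(on_hand):
        if day > 0:
            prev, cur = on_hand[day - 1], on_hand[day]
            # Assumption: restocking an empty shelf counts as a refresh.
            refreshed = cur > 0 if prev == 0 else (cur - prev) / prev >= spike_threshold
            if refreshed:
                anchor = day
        age = day - anchor
        if age >= age_days_threshold and on_hand[day] >= min_units_remaining:
            fire_days.append(day)
            # Reset the anchor to the day after the markdown window ends,
            # which rules out back-to-back fires.
            anchor = day + markdown_window_days
            day = anchor
            continue
        day += 1
    return fire_days


if __name__ == "__main__":
    flat = [100] * 10
    print(age_days_fire_days(flat, age_days_threshold=3,
                             min_units_remaining=10, markdown_window_days=2))
```

The function draws no random numbers, which is the property the zero-rng-draw invariant relies on.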

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
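
The ">= 72 days" figure follows from min_train_size + n_splits × horizon = 30 + 3 × 14 = 72. A small sketch of that arithmetic, assuming back-to-back test windows with no gap; the fold placement in `expanding_folds` (windows packed against the end of the series) is an assumption, not necessarily how the backtesting slice places them.

```python
from datetime import date


def min_days_required(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits every test window after the first training set."""
    return min_train_size + n_splits * horizon


def expanding_folds(n_days: int, n_splits: int, horizon: int) -> list[tuple[int, int]]:
    """Return (test_start, test_end) index pairs; training data is [0, test_start)."""
    folds = []
    for k in range(n_splits):
        test_end = n_days - (n_splits - 1 - k) * horizon
        folds.append((test_end - horizon, test_end))
    return folds


if __name__ == "__main__":
    # 2024-10-01..2024-12-31 inclusive
    n_days = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
    needed = min_days_required(n_splits=3, horizon=14, min_train_size=30)
    print(n_days, needed, n_days - needed)  # 92 72 20
    print(expanding_folds(n_days, n_splits=3, horizon=14))
    # [(50, 64), (64, 78), (78, 92)]
```

Under this placement the smallest training set has 50 days, comfortably above the min_train_size of 30.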

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
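
Winner selection as described in the last bullet might look like the sketch below. It assumes the aggregate is the mean over non-NaN folds and that a model with only NaN folds is dropped; `select_winner` and the model names in the usage example are hypothetical, not the script's `_select_winner`.

```python
import math
from typing import Mapping, Optional, Sequence


def select_winner(fold_wapes: Mapping[str, Sequence[float]]) -> Optional[str]:
    """Return the model with the lowest mean WAPE over its non-NaN folds, or None."""
    aggregated: dict[str, float] = {}
    for model_type, folds in fold_wapes.items():
        usable = [w for w in folds if not math.isnan(w)]
        if usable:  # a model whose folds are all NaN cannot win
            aggregated[model_type] = sum(usable) / len(usable)
    if not aggregated:
        return None
    return min(aggregated, key=lambda m: aggregated[m])


if __name__ == "__main__":
    nan = float("nan")
    print(select_winner({
        "model_a": [0.20, nan, 0.40],   # mean over usable folds is about 0.30
        "model_b": [0.25, 0.25, 0.25],  # mean 0.25, lowest
        "model_c": [nan, nan, nan],     # excluded
    }))  # model_b
```

Filtering NaN before taking the minimum matters because comparisons against NaN are always false, so a naive `min` over raw values can return an arbitrary model.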

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* chore(main): release 0.2.10 (#135)
w7-mgfcode added a commit that referenced this pull request May 18, 2026
#158)

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3 below). Items 4 and 5 are smaller follow-up
adjustments made in the same pass:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
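
The provider-aware key check from item 3 can be sketched as follows. This is an illustration only: the `provider:model` id format, the provider prefixes, and the key names in `PROVIDER_KEY_NAMES` are assumptions, and `llm_key_present` is a hypothetical stand-in for the script's `_llm_key_present`.

```python
from collections.abc import Mapping

# Assumed mapping from provider prefix to the key name that provider needs.
PROVIDER_KEY_NAMES: dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google-gla": "GEMINI_API_KEY",
}


def llm_key_present(default_model: str, keys: Mapping[str, str]) -> bool:
    """True only if the key for the configured model's provider is set."""
    provider, sep, _model = default_model.partition(":")
    if not sep:
        return False  # no provider prefix: cannot tell which key is needed
    key_name = PROVIDER_KEY_NAMES.get(provider)
    return bool(key_name and keys.get(key_name, "").strip())
```

With only OpenAI and Anthropic keys configured and a Gemini default model, this returns False, so the agent step would soft-skip instead of failing at chat time.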

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
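
A minimal sketch of the file-redirect pattern described above. `run_with_log_file` is a hypothetical helper, not the test fixture itself; it waits for the child to exit, whereas the fixture keeps uvicorn running in the background.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run `cmd` with stdout+stderr sent to a temp file instead of a pipe.

    A file exerts no back-pressure, so a chatty child cannot block on
    write the way it can when nobody drains subprocess.PIPE.
    """
    log = tempfile.NamedTemporaryFile(prefix="child-", suffix=".log", delete=False)
    with log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # do not leave the child behind on timeout
            proc.wait()
            raise
    return returncode, Path(log.name)
```

The caller decides whether to delete the returned log path; keeping it only on failure matches the postmortem-friendly cleanup noted above.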

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
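
The single-flight guard can be sketched as follows. This is an assumption-laden illustration rather than the slice's code: `DemoBusyError` and `run_single_flight` are hypothetical names, and the route layer is assumed to map the error to the RFC 7807 409.

```python
import asyncio
from collections.abc import Awaitable, Callable


class DemoBusyError(Exception):
    """Hypothetical error a route layer could translate into a 409."""


_run_lock = asyncio.Lock()  # module-level: one run per process


async def run_single_flight(pipeline: Callable[[], Awaitable[str]]) -> str:
    """Run `pipeline` unless another run holds the lock; never queue."""
    if _run_lock.locked():
        raise DemoBusyError("a demo run is already in progress")
    # Acquiring a free asyncio.Lock does not yield to the event loop, so no
    # other task can slip in between the check above and this acquisition.
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of waiting keeps a second caller from silently queuing a destructive reset behind the first run.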

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
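
The heuristic above can be sketched as a pure function. This is not the MarkdownGenerator code: the function name, the `markdown_window_days` parameter, and the treatment of a rise from an empty shelf are assumptions made for illustration.

```python
SPIKE_THRESHOLD = 0.3  # a 30% day-over-day rise counts as a refresh


def age_days_fire_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,
) -> list[int]:
    """Return the day indices on which an age-based markdown would start."""
    fires: list[int] = []
    anchor = 0  # most recent refresh; stays at day 0 until one is observed
    for t, qty in enumerate(on_hand):
        if t > 0:
            prev = on_hand[t - 1]
            # Assumption: any rise from an empty shelf counts as a refresh.
            rose = (qty - prev) >= SPIKE_THRESHOLD * prev if prev > 0 else qty > 0
            if rose and t > anchor:
                anchor = t
        age = t - anchor
        if age >= age_days_threshold and qty >= min_units_remaining:
            fires.append(t)
            # Restart the age clock on the day after the markdown window ends.
            anchor = t + markdown_window_days
    return fires
```

The function draws no random numbers, which is the property the zero-rng-draw invariant relies on.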

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
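
The 72-day floor and the 92-day span can be checked with a few lines. This sketch assumes the folds are non-overlapping test windows placed directly after the initial training window with no gap; the helper name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(
    n_splits: int, horizon: int, min_train_size: int
) -> int:
    """Initial training window plus one test window of `horizon` days per split."""
    return min_train_size + n_splits * horizon


required = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
# Both endpoints are inclusive, hence the + 1.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = available - required
```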

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3 below). Items 4 and 5 are smaller follow-up
adjustments made in the same pass:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
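
A minimal sketch of the NaN/inf coercion, assuming a helper shaped like `_finite()`; the name `finite` and the `default` parameter are hypothetical. Strict JSON has no NaN or Infinity tokens, which is why the value has to be replaced before it is stored.

```python
import json
import math


def finite(value: float, default: float = 0.0) -> float:
    """Replace NaN and +/-inf with `default` so the value is strict-JSON safe."""
    return value if math.isfinite(value) else default


# allow_nan=False makes json.dumps enforce strict JSON, so it would raise
# ValueError here if the NaN had not been coerced first.
payload = json.dumps({"stability_index": finite(float("nan"))}, allow_nan=False)
```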

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.
w7-mgfcode added a commit that referenced this pull request May 18, 2026
* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
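
The 72-day figure can be reproduced with a few lines of arithmetic. The
sketch below assumes an expanding window with non-overlapping test folds
and no gap between train and test; the function name is hypothetical.

```python
from datetime import date


def required_days(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits every fold (assumed fold layout)."""
    return min_train_size + n_splits * horizon


needed = required_days(n_splits=3, horizon=14, min_train_size=30)
# Inclusive day count for 2024-10-01..2024-12-31.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = available - needed
print(needed, available, margin)
```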

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (the httpx default of 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.
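
The exit-code contract (precheck failure exits 2, any other failure
exits 1, green exits 0) can be sketched as follows. The function name
and the (step, ok) tuple shape are assumptions for the example, not the
script's actual types.

```python
def exit_code(outcomes: list[tuple[str, bool]]) -> int:
    """Map ordered (step_name, ok) outcomes to a process exit code."""
    for step, ok in outcomes:
        if not ok:
            # A failed precheck means the environment is not ready,
            # which is reported separately from a pipeline failure.
            return 2 if step == "precheck" else 1
    return 0


green = exit_code([("precheck", True), ("seed", True)])
no_api = exit_code([("precheck", False)])
broken = exit_code([("precheck", True), ("seed", False)])
```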

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
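
The StepError formatting case can be illustrated with a small exception
class. `ProblemError` is a hypothetical stand-in for StepError, and the
message layout is an assumption; the point is that only title, detail,
and request_id are echoed, never the rest of the body.

```python
class ProblemError(Exception):
    """Carries selected RFC 7807 fields; the raw body is not retained."""

    def __init__(self, status_code: int, problem: dict[str, object]) -> None:
        self.status_code = status_code
        self.title = str(problem.get("title", "unknown error"))
        self.detail = str(problem.get("detail", ""))
        self.request_id = str(problem.get("request_id", ""))
        super().__init__(
            f"HTTP {status_code}: {self.title}: {self.detail} "
            f"(request_id={self.request_id})"
        )


err = ProblemError(
    422,
    {
        "title": "Validation failed",
        "detail": "bad date",
        "request_id": "req-1",
        "debug": "password=hunter2",  # extra member that must not leak
    },
)
message = str(err)
```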

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
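
The /health polling step follows a common poll-until-deadline pattern.
The workflow itself may well do this in shell; the Python sketch below
only illustrates the idea, with a hypothetical function name and an
injectable `probe` so it runs without a server.

```python
import time
from collections.abc import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or deadline_s has elapsed."""
    deadline = time.monotonic() + deadline_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False  # caller should fail the job and upload logs
        sleep(interval_s)


# Simulated server that becomes healthy on the third probe.
answers = iter([False, False, True])
came_up = wait_until_healthy(lambda: next(answers), sleep=lambda _: None)
```

In a real job, `probe` would wrap an HTTP GET against /health and return
True on a 200 response.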

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; this bumps it to 7 and adds a demo_minimal name
membership check, matching the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3), along with a timeout bump and a cosmetic
fix (items 4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with Anthropic/OpenAI keys but
   a Gemini default model failed hard at chat time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
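
Item 2 (copy the artifact into the registry root, record a
registry-relative URI, hash it client-side) can be sketched like this.
The `<run_id>/<filename>` layout, the function name, and the file names
are assumptions for the example; the real script reads its roots from
settings.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def stage_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy src under registry_root and return (relative_uri, sha256_hex)."""
    dest_dir = registry_root / run_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # Relative to the registry root, so verify can resolve it there.
    return dest.relative_to(registry_root).as_posix(), digest


with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "forecast_artifacts"
    registry_root = Path(tmp) / "registry"
    train_dir.mkdir()
    model_file = train_dir / "model.joblib"
    model_file.write_bytes(b"fake model bytes")
    artifact_uri, artifact_hash = stage_artifact(model_file, registry_root, "run-1")
    staged_exists = (registry_root / artifact_uri).is_file()
```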

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
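
The pipe-buffer fix can be reproduced in isolation. In this sketch the
child process is a stand-in for uvicorn and the helper name is
hypothetical: it writes about 200 KB to stdout, more than a 64-KB pipe
holds, yet finishes because its output goes to a file.

```python
import os
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str]) -> tuple[int, str]:
    """Run cmd to completion with stdout+stderr redirected to a temp file."""
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait()
    return returncode, log.name


# With stdout=subprocess.PIPE and nobody reading, this child would block
# once the pipe buffer fills; writing to a file never blocks that way.
chatty = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
returncode, log_path = run_with_log_file(chatty)
log_size = os.path.getsize(log_path)
os.unlink(log_path)  # a real test would keep the file when it failed
```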

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count
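
The "needs >= 72 days" figure can be checked with a few lines of arithmetic. The sketch below assumes an expanding-window backtest with contiguous, non-overlapping test folds and no gap between train and test; the helper name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(
    n_splits: int, horizon: int, min_train_size: int
) -> int:
    """Smallest series length that fits every fold (assumes gap = 0)."""
    return min_train_size + n_splits * horizon


required = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
# Inclusive day count of the preset's date range.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1

print(required, available, available - required)  # 72 92 20
```

The 20 spare days are the margin the commit message refers to.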

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
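
The secrets-safe error surfacing described in the HttpClient bullet could look roughly like the sketch below. It is an assumption-laden illustration: the constructor signature, the fallback title, and the `step_error_from_problem` helper are invented for the example, and only `title`, `detail`, and `request_id` are taken from the description above.

```python
from __future__ import annotations

from typing import Any


class StepError(Exception):
    """Carries only the RFC 7807 fields that are safe to print."""

    def __init__(
        self, status: int, title: str, detail: str, request_id: str | None = None
    ) -> None:
        super().__init__(f"{status} {title}: {detail} (request_id={request_id})")
        self.status = status
        self.title = title
        self.detail = detail
        self.request_id = request_id


def step_error_from_problem(status: int, body: Any) -> StepError:
    """Build a StepError from a parsed problem+json body.

    Only title / detail / request_id are read; every other member of the
    body is ignored, so the raw response is never echoed.
    """
    problem = body if isinstance(body, dict) else {}
    request_id = problem.get("request_id")
    return StepError(
        status=status,
        title=str(problem.get("title", "HTTP error")),
        detail=str(problem.get("detail", "")),
        request_id=None if request_id is None else str(request_id),
    )


err = step_error_from_problem(
    409,
    {"title": "Conflict", "detail": "run exists", "request_id": "abc", "debug": "s3cr3t"},
)
print(err)  # 409 Conflict: run exists (request_id=abc)
```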

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
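
The `_select_winner` cases listed above (lowest WAPE, NaN skip, all-NaN and empty both returning None) can be captured in a few lines. This is a sketch, not the script's function: the input shape (a mapping from model type to aggregated WAPE) and the model names in the usage line are assumptions.

```python
from __future__ import annotations

import math


def select_winner(wape_by_model: dict[str, float]) -> str | None:
    """Return the model with the lowest non-NaN WAPE, or None if there is none."""
    finite = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not finite:
        return None
    return min(finite, key=lambda m: finite[m])


# Hypothetical model names; the NaN entry is skipped rather than compared.
print(select_winner({"naive": 0.31, "seasonal_naive": float("nan"), "moving_average": 0.27}))
# moving_average
```

Filtering NaN before calling `min` matters because every comparison against NaN is false, so a NaN left in the mapping can win or lose depending on iteration order.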

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3 below), alongside two smaller adjustments
(items 4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
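
Item 2 above (copy the artifact under the registry root, record a registry-relative URI) can be sketched as follows, together with the client-side sha256 mentioned for the registry handshake. The function name and the `<run_id>/<filename>` layout are assumptions for illustration; the real layout under `settings.registry_artifact_root` is not shown in this log.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def stage_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy src under registry_root and return (relative_uri, sha256_hex)."""
    dest_dir = registry_root / run_id  # hypothetical layout
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    # Hash the copied bytes so the digest matches what the registry will read.
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # A root-relative URI is what a verify step resolving against its own
    # root would be able to find.
    return dest.relative_to(registry_root).as_posix(), digest
```

Returning a relative URI keeps the record valid if the registry root itself is relocated.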

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
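
The pipe-buffer hang and its fix can be reproduced in isolation. The sketch below is not the test fixture itself; the helper name and file prefix are assumptions. It sends a child process's output to a temp file, so the child can never block on a full pipe that nobody is draining.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float) -> tuple[int, Path]:
    """Run cmd with stdout/stderr sent to a temp file; return (exit code, log path)."""
    log = tempfile.NamedTemporaryFile(prefix="server-", suffix=".log", delete=False)
    with log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Terminate first, then kill if the child ignores the signal.
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
            raise
    return returncode, Path(log.name)


# A child that writes far more than a typical 64 KB pipe buffer still exits.
code, log_path = run_with_log_file(
    [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"], 30.0
)
print(code, log_path.stat().st_size)  # 0 200000
```

With `stdout=subprocess.PIPE` and no reader, the same child would stall once the buffer filled, which matches the hang described above.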

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
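
A minimal sketch of the single-flight guard, assuming Python 3.10+ and a reject-rather-than-queue policy. The exception type and function name are hypothetical; in the slice itself the rejection is surfaced as an RFC 7807 409 rather than a bare exception.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class DemoAlreadyRunning(Exception):
    """Raised when a run is requested while another is in flight."""


_run_lock = asyncio.Lock()  # module-level: shared by every caller in the process


async def run_single_flight(pipeline: Callable[[], Awaitable[T]]) -> T:
    # No await sits between this check and the acquire below, and acquiring
    # a free lock does not yield, so the check cannot race on one event loop.
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of waiting means a second caller gets an immediate answer, which is what a 409 response needs.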

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
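
The NaN/inf coercion is small enough to show in full. This is a sketch of the idea under a hypothetical name, not the `_finite()` helper itself; it uses strict JSON encoding as a stand-in for the JSONB constraint.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Coerce NaN and +/-inf to 0.0; pass finite values through unchanged."""
    return float(value) if math.isfinite(value) else 0.0


raw = {"wape_mean": 0.27, "stability_index": float("nan")}
safe = {key: finite_or_zero(val) for key, val in raw.items()}

# Strict JSON has no NaN token; allow_nan=False makes json.dumps enforce that.
print(json.dumps(safe, allow_nan=False))
# {"wape_mean": 0.27, "stability_index": 0.0}
```

The trade-off is that a genuine 0.0 and a coerced NaN become indistinguishable downstream.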

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.
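
The shape of the fixed lookup can be illustrated with the standard-library `sqlite3` module. This sketch only mirrors the query (filter, order by `created_at` descending, take one row); it is not the SQLAlchemy code, and the table and column names are assumptions.

```python
from __future__ import annotations

import sqlite3


def find_latest_duplicate(conn: sqlite3.Connection, config_hash: str) -> str | None:
    """Return the newest non-archived run_id for config_hash, or None."""
    row = conn.execute(
        "SELECT run_id FROM model_run "
        "WHERE config_hash = ? AND status != 'archived' "
        "ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_run (run_id TEXT, config_hash TEXT, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO model_run VALUES (?, ?, ?, ?)",
    [
        ("r1", "h", "success", "2026-05-01T00:00:00"),
        ("r2", "h", "success", "2026-05-02T00:00:00"),
        ("r3", "h", "archived", "2026-05-03T00:00:00"),
    ],
)
print(find_latest_duplicate(conn, "h"))  # r2
```

Two live rows share the hash here, which is the situation where a "one or none" assertion fails and a "first row" lookup succeeds.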

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(main): release 0.2.11 (#159)
w7-mgfcode added a commit that referenced this pull request May 18, 2026
…178)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3 below), alongside two smaller adjustments
(items 4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
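
A minimal sketch of the single-flight guard, assuming Python 3.10+. `DemoBusyError` and `run_single_flight` are hypothetical names; in the slice the busy case would be mapped to the RFC 7807 409 response.

```python
import asyncio

_run_lock = asyncio.Lock()  # module-level: at most one demo run per process


class DemoBusyError(Exception):
    """Hypothetical error a route handler would translate into a 409."""


async def run_single_flight(work_seconds: float) -> str:
    # There is no await between the locked() check and the acquisition of
    # an unlocked lock, so another task cannot slip in between them.
    if _run_lock.locked():
        raise DemoBusyError("a demo run is already in progress")
    async with _run_lock:
        await asyncio.sleep(work_seconds)  # stand-in for the pipeline
        return "ok"


async def main() -> list[str]:
    results = await asyncio.gather(
        run_single_flight(0.05),
        run_single_flight(0.05),
        return_exceptions=True,
    )
    return [r if isinstance(r, str) else type(r).__name__ for r in results]
```

Rejecting instead of queueing means a second caller gets an immediate answer rather than waiting behind a run it did not start.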

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since
  `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
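
The heuristic spec above can be sketched as follows. This is an illustration under stated assumptions, not the MarkdownGenerator code: `markdown_window_days` is a hypothetical parameter for the length of one markdown window, days inside a window are skipped entirely, and a restock from zero is treated as a refresh.

```python
def age_days_fire_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,  # hypothetical: length of one markdown window
    spike_threshold: float = 0.3,
) -> list[int]:
    """Return the day indices at which the age trigger would fire."""
    fires: list[int] = []
    anchor = 0  # index of the latest refresh; day 0 until one is observed
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        rose = cur - prev
        # Refresh: on-hand rose by at least spike_threshold vs the prior day.
        if rose > 0 and rose >= spike_threshold * prev:
            anchor = t
        age = t - anchor
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(t)
            # Restart the age clock on the day after the window ends.
            anchor = t + markdown_window_days
            t = anchor
            continue
        t += 1
    return fires
```

No random draws are involved, which is consistent with the zero-rng-draw invariant mentioned above.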

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
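
The 72-day floor follows from simple arithmetic, sketched below on the assumption that the expanding backtest uses non-overlapping test folds with no gap; `required_days` is a hypothetical helper, not part of the seeder.

```python
from datetime import date


def required_days(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Minimum series length: the first training window plus one
    horizon-sized test fold per split (no gap, no overlap assumed)."""
    return min_train_size + n_splits * horizon


# Inclusive span of the preset's date range.
span_days = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = span_days - required_days(n_splits=3, horizon=14, min_train_size=30)
```

Here `required_days(3, 14, 30)` is 30 + 3 * 14 = 72, and the October to December range spans 31 + 30 + 31 = 92 days, leaving a 20-day margin.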

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
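
A minimal sketch of the secrets-safe error described in the HttpClient bullet. It assumes `request_id` arrives as an extension member of the problem+json body; the constructor signature and fallback strings are illustrative, not the script's actual ones.

```python
from typing import Any


class StepError(Exception):
    """Carries only title / detail / request_id from a problem+json body."""

    def __init__(self, status: int, problem: dict[str, Any]) -> None:
        self.status = status
        self.title = str(problem.get("title", "unknown error"))
        self.detail = str(problem.get("detail", ""))
        self.request_id = str(problem.get("request_id", "n/a"))
        # Only the three whitelisted fields are echoed; any other member of
        # the body is dropped so it cannot leak into logs.
        super().__init__(
            f"HTTP {status}: {self.title}: {self.detail} "
            f"(request_id={self.request_id})"
        )
```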

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
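
The winner-selection cases listed above (lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None) can be sketched like this. The function name, the input shape (one aggregated WAPE per model), and the model labels in the usage line are assumptions for illustration.

```python
from __future__ import annotations

import math


def select_winner(wape_by_model: dict[str, float]) -> str | None:
    """Return the model with the lowest non-NaN WAPE, or None if none qualify."""
    finite = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not finite:
        return None  # empty input or every score was NaN
    return min(finite, key=finite.__getitem__)
```

For example, `select_winner({"naive": 0.31, "seasonal_naive": float("nan"), "moving_average": 0.27})` picks `"moving_average"`.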

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
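
The /health polling with a deadline could be expressed as the sketch below. The workflow itself may well do this in shell; `wait_until_healthy` and its `probe` callable are hypothetical, with `probe` standing in for a GET /health check.

```python
import time
from collections.abc import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns True or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the server is still booting
        if time.monotonic() >= end:
            return False
        sleep(interval_s)
```

Using a monotonic clock keeps the deadline immune to wall-clock adjustments on the runner.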

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bump it to 7 and add a demo_minimal name
membership check, matching the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3), along with two smaller adjustments (items 4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
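
A minimal sketch of the NaN/inf coercion, assuming the helper takes a float and a default; the name and signature here are illustrative rather than the exact `_finite()` in the jobs service.

```python
import json
import math


def finite(value: float, default: float = 0.0) -> float:
    """Return value unchanged when finite, else the default."""
    return value if math.isfinite(value) else default


# NaN and Infinity are not valid JSON, so strict serialization would reject
# them; coercing first keeps the payload storable.
payload = {"stability_index": finite(float("nan")), "wape_mean": finite(0.27)}
encoded = json.dumps(payload, allow_nan=False)
```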

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.
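
The lookup shape can be illustrated with plain sqlite3, as a sketch only: the real fix uses the project's ORM query with scalars().first(), and the table and column names below are assumptions made for the example.

```python
from __future__ import annotations

import sqlite3


def find_duplicate(conn: sqlite3.Connection, config_hash: str) -> int | None:
    """Return the id of the newest non-archived run with this hash, if any."""
    row = conn.execute(
        "SELECT id FROM model_run "
        "WHERE config_hash = ? AND status != 'archived' "
        "ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()
    # fetchone() tolerates any number of matches; it never raises on
    # multiple rows the way a strict single-row accessor would.
    return row[0] if row else None
```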

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(repo): back-merge main into dev after v0.2.11 (#160) (#161)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since
  `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
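
The "/health polled with a 30 s deadline" step follows a common poll-until-ready pattern. The sketch below shows that pattern in Python. The `probe` callable and the function name are hypothetical, and the workflow itself may implement the loop differently (for example in shell).

```python
import time
from collections.abc import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Call probe() until it returns True or the deadline passes.

    probe is assumed to wrap a GET /health request and return True on 200.
    Uses a monotonic clock so wall-clock adjustments cannot extend the wait.
    """
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```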

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Several failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all are
now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
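
A rough sketch of the step_register fix in item 2: copy the artifact under the registry root, then record a registry-relative URI together with a client-side sha256. The function name `publish_artifact` and the `<run_id>/<filename>` layout are assumptions, not the script's actual code.

```python
import hashlib
import shutil
from pathlib import Path


def publish_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy src into the registry root and return (relative_uri, sha256_hex).

    Assumption: artifacts live under <registry_root>/<run_id>/<filename>.
    The returned URI is relative to registry_root, so a verifier that
    resolves URIs against its own root can locate the file.
    """
    dest_dir = registry_root / run_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    # Hash the copied file so the digest describes what the registry will read.
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    return dest.relative_to(registry_root).as_posix(), digest
```

Reading the whole file into memory is acceptable for small demo models; a chunked read would suit larger artifacts.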

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
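
The pipe-buffer hang can be avoided as sketched below: the child process writes to a temp file instead of `subprocess.PIPE`, so it never blocks on a full pipe that nobody drains. The helper name and its timeout defaults are illustrative assumptions; the real test fixture manages a long-running uvicorn process rather than waiting for exit.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run cmd with stdout+stderr redirected to a temp file.

    Returns (returncode, log_path). The log file is kept on disk so the
    caller decides whether to delete it or keep it for a postmortem.
    """
    with tempfile.NamedTemporaryFile(
        prefix="server-", suffix=".log", delete=False
    ) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.terminate()  # polite shutdown first
            try:
                returncode = proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # fall back to a hard kill
                returncode = proc.wait()
    return returncode, Path(log.name)
```

For example, `run_with_log_file([sys.executable, "-c", "print('hello')"])` returns exit code 0 and the path of a log file containing the child's output.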

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
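
A minimal sketch of the single-flight guard, assuming a plain exception stands in for the RFC 7807 409 response. `DemoAlreadyRunning` and `run_pipeline_once` are hypothetical names, not the slice's real API.

```python
import asyncio
from collections.abc import Awaitable, Callable


class DemoAlreadyRunning(Exception):
    """Stand-in for the 409 problem+json response a route would return."""


# Module-level lock: at most one pipeline run per process.
_run_lock = asyncio.Lock()


async def run_pipeline_once(pipeline: Callable[[], Awaitable[str]]) -> str:
    """Run pipeline() unless another run already holds the lock.

    The locked() check and the acquire happen with no await in between,
    so two coroutines on the same event loop cannot both pass the check.
    """
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting the second caller immediately, rather than queueing it behind the lock, is what turns the lock into a single-flight guard instead of a serializer.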

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
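
The heuristic spec above can be sketched for a single (store, product) series as follows. Day indices replace real dates, and `markdown_window_days`, the function name, and the handling of a rise from zero stock are assumptions for illustration only.

```python
SPIKE_THRESHOLD = 0.3  # a 30% day-over-day jump counts as a refresh


def age_days_fire_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,
) -> list[int]:
    """Return the day indices on which the age_days trigger fires.

    Deterministic: no random draws are made.
    """
    fire_days: list[int] = []
    anchor = 0  # index of the most recent refresh; defaults to the first day
    for t, qty in enumerate(on_hand):
        if t > 0:
            prev = on_hand[t - 1]
            rise = qty - prev
            # Assumption: any rise from an empty shelf counts as a refresh.
            if rise > 0 and (prev == 0 or rise / prev >= SPIKE_THRESHOLD):
                anchor = max(anchor, t)
        age = t - anchor  # negative while a markdown window is still open
        if age >= age_days_threshold and qty >= min_units_remaining:
            fire_days.append(t)
            # Reset the anchor to the day after the markdown window ends,
            # which rules out back-to-back fires.
            anchor = t + markdown_window_days
    return fire_days
```

With a flat series of 20 days at 100 units, a threshold of 5, and a 3-day window, the trigger fires on day 5, the anchor moves to day 8, and the next fire lands on day 13.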

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
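
The 72-day floor follows from min_train_size + n_splits × horizon = 30 + 3 × 14 = 72. The sketch below works this out for an expanding-window split. It assumes no gap between train and test and a step equal to the horizon; the project's actual splitter may anchor folds differently.

```python
def expanding_splits(
    n_days: int, n_splits: int, horizon: int, min_train_size: int
) -> list[tuple[range, range]]:
    """Return (train_days, test_days) index ranges for each fold.

    Fold k trains on the first min_train_size + k * horizon days and
    tests on the following `horizon` days.
    """
    required = min_train_size + n_splits * horizon
    if n_days < required:
        raise ValueError(f"need at least {required} days, got {n_days}")
    splits: list[tuple[range, range]] = []
    for k in range(n_splits):
        train_end = min_train_size + k * horizon
        splits.append((range(0, train_end), range(train_end, train_end + horizon)))
    return splits
```

Under these assumptions, `expanding_splits(92, 3, 14, 30)` yields three folds whose last test window ends at day index 71, leaving 20 of the 92 days as margin.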

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Several failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all are
now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4.…
@w7-mgfcode w7-mgfcode deleted the feat/api-e2e-demo branch May 18, 2026 14:20
w7-mgfcode added a commit that referenced this pull request May 18, 2026
…191) (#192)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Several failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all are
now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
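
The pipe-buffer hang can be illustrated with a small stand-in. This is a sketch, not the test's actual fixture: the child below is a hypothetical noisy process that writes about 200 KB to stdout, which is more than a typical 64 KB pipe buffer. Redirecting stdout to a file lets the child finish even though nobody reads its output while it runs:

```python
import subprocess
import sys
import tempfile

# Hypothetical noisy child: writes ~200 KB to stdout.
CHILD = "import sys; sys.stdout.write('x' * 200_000)"


def run_with_log_file() -> int:
    """Run the child with stdout redirected to a temp file; return bytes logged."""
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        # With stdout=subprocess.PIPE and no reader, this wait could block
        # once the pipe buffer fills; a file has no such limit.
        proc.wait(timeout=30)
        log.seek(0)
        return len(log.read())
```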

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
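
The single-flight behavior can be sketched with plain asyncio. The function name, return shape, and message text below are assumptions for illustration; the real slice returns an RFC 7807 response through the web framework:

```python
import asyncio
from typing import Awaitable, Callable, Tuple

_run_lock = asyncio.Lock()  # module-level: at most one demo run at a time


async def run_demo_once(pipeline: Callable[[], Awaitable[str]]) -> Tuple[int, str]:
    """Return (status_code, body); 409 if a run is already in flight."""
    if _run_lock.locked():
        # Reject instead of queueing, so callers never wait behind another run.
        return 409, "demo run already in progress"
    async with _run_lock:
        return 200, await pipeline()
```

Checking `locked()` before acquiring is safe here because there is no suspension point between the check and an uncontended acquire on a single event loop.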

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]` if
  no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
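
The heuristic spec above can be sketched as a pure function over one (store, product) series. This is an illustration under stated assumptions, not the MarkdownGenerator code: the function name, the `markdown_window_days` parameter, treating any rise from zero as a refresh, and ignoring spikes inside an active markdown window are all hypothetical choices.

```python
from typing import List


def age_days_fire_days(
    on_hand: List[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,  # hypothetical: length of one markdown window
    spike_threshold: float = 0.3,
) -> List[int]:
    """Return the day indices on which the age trigger fires."""
    fires: List[int] = []
    anchor = 0  # day the age clock counts from; starts at dates[0]
    for t in range(1, len(on_hand)):
        if t < anchor:
            continue  # still inside the previous markdown window
        prev, cur = on_hand[t - 1], on_hand[t]
        if cur > prev and cur >= prev * (1 + spike_threshold):
            anchor = t  # refresh: stock jumped, so the age clock restarts
            continue
        age = t - anchor
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(t)
            # Window covers days t .. t + window - 1; restart the day after.
            anchor = t + markdown_window_days
    return fires
```

No random draws are involved, which is consistent with the zero-rng-draw invariant mentioned in the decision rationale.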

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
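
The 72-day figure can be checked with a short calculation. It assumes the expanding backtest uses consecutive, non-overlapping test windows with no gap, which matches the stated number but is not confirmed by the backtesting code; the function name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits the first training window plus every test fold."""
    return min_train_size + n_splits * horizon


required = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
# Inclusive day count for 2024-10-01..2024-12-31.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = available - required
```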

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with an explicit 60 s
  timeout (httpx's 5 s default is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
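
A minimal sketch of the winner-selection rule, assuming the input is a mapping from model type to its aggregated WAPE. The function name and the model names in the usage note are placeholders, and skipping a model whose aggregate is NaN is one reading of "skipping NaN folds":

```python
import math
from typing import Dict, Optional


def select_winner(wape_by_model: Dict[str, float]) -> Optional[str]:
    """Return the model with the lowest non-NaN WAPE, or None if there is none."""
    finite = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not finite:
        return None  # empty input or every score was NaN
    return min(finite, key=finite.__getitem__)
```

For example, `select_winner({"model_a": 0.31, "model_b": float("nan"), "model_c": 0.24})` picks `"model_c"`. Filtering NaN first matters because comparisons against NaN are always false, so a bare `min` could return a NaN-scored model depending on ordering.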

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
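
The StepError behavior exercised by these cases can be sketched as follows. The class shape and the helper `step_error_from_problem` are assumptions for illustration; only the three echoed fields (title, detail, request_id) come from the description above.

```python
from typing import Any, Dict, Optional


class StepError(Exception):
    """Carries only the safe fields of an RFC 7807 problem+json body."""

    def __init__(self, status: int, title: str, detail: str,
                 request_id: Optional[str] = None) -> None:
        self.status = status
        self.title = title
        self.detail = detail
        self.request_id = request_id
        rid = f" request_id={request_id}" if request_id else ""
        super().__init__(f"{status} {title}: {detail}{rid}")


def step_error_from_problem(status: int, body: Dict[str, Any]) -> StepError:
    # Pick the named fields explicitly; any other keys in the body are
    # dropped, so the raw response never reaches logs or terminal output.
    return StepError(
        status=status,
        title=str(body.get("title", "Unknown error")),
        detail=str(body.get("detail", "")),
        request_id=body.get("request_id"),
    )
```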

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
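
The "/health polled with a 30 s deadline" step follows a common poll-until-deadline pattern. The sketch below shows that pattern in Python with hypothetical names; it is not the workflow's actual step, whose implementation is not shown in this message.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET on url answers 200; any connection error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return False


def wait_until_healthy(check: Callable[[], bool], deadline_s: float = 30.0,
                       interval_s: float = 0.5) -> bool:
    """Call check() until it succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s  # monotonic: immune to wall-clock jumps
    while time.monotonic() < end:
        if check():
            return True
        time.sleep(interval_s)
    return False
```

A caller would pass something like `lambda: http_ok(base_url + "/health")`, where `base_url` is whatever host and port uvicorn was started on.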

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

* fix(api): harden run_demo for integration test + real DB (#128)

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

* feat(ui): add showcase page streaming the live demo pipeline (#132)

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

* docs(docs): document the demo slice and showcase page (#132)

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
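
The NaN/inf coercion can be sketched in a few lines. The function name here is a stand-in for `_finite()`, whose actual signature is not shown in this message:

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Coerce NaN and +/-inf to 0.0 so the value serializes as strict JSON."""
    return float(value) if math.isfinite(value) else 0.0


# Strict JSON has no NaN literal; allow_nan=False makes that failure explicit.
try:
    json.dumps({"stability_index": float("nan")}, allow_nan=False)
    strict_json_accepts_nan = True
except ValueError:
    strict_json_accepts_nan = False

safe_payload = json.dumps({"stability_index": finite_or_zero(float("nan"))}, allow_nan=False)
```

Coercing to 0.0 keeps the payload storable, at the cost of making "not computable" indistinguishable from a true zero.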

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.
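
The lookup change can be illustrated with the standard-library sqlite3 module as a stand-in for the SQLAlchemy query. The table layout and sample rows below are assumptions, and the non-archived filter is omitted for brevity:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model_run (run_id TEXT, config_hash TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO model_run VALUES (?, ?, ?)",
    [
        # Three intentional duplicates sharing one config hash (sample data).
        ("run-a", "h1", "2026-05-01T10:00:00"),
        ("run-b", "h1", "2026-05-02T10:00:00"),
        ("run-c", "h1", "2026-05-03T10:00:00"),
    ],
)


def find_duplicate(db: sqlite3.Connection, config_hash: str) -> Optional[str]:
    """Return the most recent run with this config hash, tolerating many matches."""
    row = db.execute(
        "SELECT run_id FROM model_run WHERE config_hash = ? "
        "ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()
    return row[0] if row else None
```

Ordering plus `LIMIT 1` turns "assert exactly one match" into "take the newest match", which is what the detect policy needs once duplicates exist by design.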

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(repo): back-merge main into dev after v0.2.11 (#160) (#161)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
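
The secrets-safe error surfacing described above can be sketched as follows. This is an illustration only, not the script's actual class: the constructor signature and the ` | ` separator are assumptions, and `request_id` is assumed to be a top-level extension member of the problem+json body.

```python
from typing import Any


class StepError(Exception):
    """Hypothetical stand-in for the script's typed error."""

    def __init__(self, status: int, problem: dict[str, Any]) -> None:
        # Copy only whitelisted fields; the raw body is never stored,
        # so nothing else in the response can leak into logs.
        self.status = status
        self.title = str(problem.get("title", "unknown error"))
        self.detail = str(problem.get("detail", ""))
        self.request_id = str(problem.get("request_id", ""))
        super().__init__(self._format())

    def _format(self) -> str:
        parts = [f"{self.status} {self.title}"]
        if self.detail:
            parts.append(self.detail)
        if self.request_id:
            parts.append(f"request_id={self.request_id}")
        return " | ".join(parts)


body = {
    "title": "Conflict",
    "detail": "run already exists",
    "request_id": "abc123",
    "debug": "password=hunter2",  # must never reach the message
}
err = StepError(409, body)
print(err)
```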

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
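
The four `_select_winner` cases listed above can be illustrated with a small sketch. It assumes the aggregate is the mean of the non-NaN fold WAPEs (the source only says "aggregated"), and the model names are placeholders rather than the repo's actual `model_type` values.

```python
import math
from typing import Optional


def select_winner(fold_wapes: dict[str, list[float]]) -> Optional[str]:
    """Return the model with the lowest mean WAPE, ignoring NaN folds."""
    scores: dict[str, float] = {}
    for model_type, wapes in fold_wapes.items():
        valid = [w for w in wapes if not math.isnan(w)]
        if valid:  # a model whose folds are all NaN is not a candidate
            scores[model_type] = sum(valid) / len(valid)
    if not scores:
        return None
    return min(scores, key=lambda m: scores[m])


nan = float("nan")
lowest = select_winner({"model_a": [0.30, 0.32], "model_b": [0.20, 0.22]})
nan_skip = select_winner({"model_a": [nan, 0.10], "model_b": [0.20, 0.22]})
all_nan = select_winner({"model_a": [nan], "model_b": [nan, nan]})
empty = select_winner({})
print(lowest, nan_skip, all_nan, empty)
```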

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time
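
The reachability skip and the `uv` lookup can be sketched with the standard library alone. The helper names below are hypothetical; in the real test the `False` branch would presumably feed a `pytest.skip(...)` call.

```python
import shutil
import socket
from typing import Optional


def port_open(host: str, port: int, timeout_s: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def resolve_executable(name: str) -> Optional[str]:
    """Absolute path of `name` on PATH, or None if it is not installed.

    Passing the resolved path to subprocess avoids a PATH lookup at exec
    time, which is what ruff's S607 rule warns about.
    """
    return shutil.which(name)


uv_path = resolve_executable("uv")  # None on hosts without uv
postgres_up = port_open("127.0.0.1", 5433)
```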

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
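
The /health polling with a deadline can be expressed as a small loop. The workflow itself most likely does this in shell; the Python rendering below is only a sketch, with the probe injected as a callable so the example stays self-contained (in practice it would wrap an HTTP GET against /health).

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Call `probe` until it returns True or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if probe():
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


# Simulated server that becomes healthy on the third poll.
attempts = iter([False, False, True])
ok = wait_until_healthy(lambda: next(attempts), deadline_s=5.0, interval_s=0.0)

# A server that never comes up gives False once the deadline is reached.
never = wait_until_healthy(lambda: False, deadline_s=0.0, interval_s=0.0)
print(ok, never)
```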

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3), alongside two smaller adjustments (items 4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, a .env with Anthropic/OpenAI keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
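
Item 2 (copy the artifact into the registry root, record a registry-relative URI, hash it client-side) can be sketched as below. The function name, the per-run subdirectory layout, and the file name are assumptions for illustration; only the copy + relative URI + sha256 idea comes from the commit.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def stage_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy `src` under the registry root; return (relative_uri, sha256)."""
    dest_dir = registry_root / run_id  # assumed layout: one folder per run
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # The URI is relative to the registry root, so the verify endpoint can
    # resolve it against its own root rather than the training output dir.
    return dest.relative_to(registry_root).as_posix(), digest


with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train_out"
    registry_root = Path(tmp) / "registry"
    train_dir.mkdir()
    model_file = train_dir / "model.bin"
    model_file.write_bytes(b"model")

    uri, artifact_hash = stage_artifact(model_file, registry_root, "run-1")
    staged_exists = (registry_root / uri).exists()

print(uri, staged_exists)
```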

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
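
The pipe-buffer hang is easy to reproduce in isolation. The sketch below writes roughly 256 KiB from a child process into a temp file; with `stdout=subprocess.PIPE` and nobody reading, the same child would block once the pipe filled and `wait()` would time out. A `TemporaryFile` is used here for brevity, whereas the test keeps a named log file so it can survive a failure.

```python
import subprocess
import sys
import tempfile

# Child that emits more output than a typical 64 KiB pipe buffer holds.
CHILD = "import sys; sys.stdout.write('x' * 262144)"

with tempfile.TemporaryFile() as log:
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=log,  # a file never applies back-pressure the way a pipe does
        stderr=subprocess.STDOUT,
    )
    returncode = proc.wait(timeout=30)
    log.seek(0)
    captured = len(log.read())

print(returncode, captured)
```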

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
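
The single-flight guard can be sketched as follows. This is a hedged illustration, not the slice's code: `DemoBusyError` and `run_demo_once` are hypothetical names, and the mapping from the exception to an RFC 7807 409 response is left to the web layer.

```python
import asyncio
from typing import Awaitable, Callable

_run_lock = asyncio.Lock()  # module-level: one demo run per process


class DemoBusyError(Exception):
    status_code = 409


async def run_demo_once(work: Callable[[], Awaitable[str]]) -> str:
    # Reject instead of queueing: a second caller gets an error right away.
    # There is no await between the check and the acquire, so on a single
    # event loop no other task can slip in between them.
    if _run_lock.locked():
        raise DemoBusyError("a demo run is already in progress")
    async with _run_lock:
        return await work()


async def _demo() -> list[str]:
    started, release = asyncio.Event(), asyncio.Event()

    async def slow() -> str:
        started.set()
        await release.wait()
        return "pass"

    first = asyncio.create_task(run_demo_once(slow))
    await started.wait()  # the first run now holds the lock
    results: list[str] = []
    try:
        await run_demo_once(slow)
    except DemoBusyError as exc:
        results.append(str(exc.status_code))
    release.set()
    results.append(await first)
    return results


outcome = asyncio.run(_demo())
print(outcome)
```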

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]` if
  no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
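
The four rules above can be traced with a short sketch over a plain list of daily quantities. This is an illustration under stated assumptions, not the generator's code: the function name and the `markdown_days` window length are hypothetical, a rise from zero stock is treated as a refresh, and a spike inside a markdown window does not move the anchor backwards.

```python
def age_days_fires(
    on_hand: list[int],
    age_threshold: int,
    min_units: int,
    markdown_days: int,
    spike: float = 0.3,
) -> list[int]:
    """Return the day indices at which the age_days trigger fires."""
    fires: list[int] = []
    anchor = 0  # index of the latest refresh; day 0 until one is observed
    for t in range(len(on_hand)):
        if t > 0:
            rose = on_hand[t] - on_hand[t - 1]
            # Refresh: stock rose by at least `spike` relative to yesterday.
            if rose > 0 and rose >= spike * on_hand[t - 1] and t > anchor:
                anchor = t
        age = t - anchor  # negative while a markdown window is active
        if age >= age_threshold and on_hand[t] >= min_units:
            fires.append(t)
            anchor = t + markdown_days  # day after the window ends
    return fires


steady = age_days_fires([100, 95, 90, 85, 80, 75, 70, 65, 60, 55], 3, 10, 2)
spiked = age_days_fires([100, 90, 130, 120, 110, 100], 3, 10, 2)
low = age_days_fires([5, 5, 5, 5, 5], 2, 10, 2)
print(steady, spiked, low)
```

No random draws are involved, which matches the zero-rng-draw invariant mentioned above.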

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

w7-mgfcode added a commit that referenced this pull request May 18, 2026
* feat: cut v0.2.13 — explorer interactivity, knowledge & guide pages (#191) (#192)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
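
Item 3's provider-aware key check could look roughly like the sketch below. Everything in it is hypothetical: the `provider:model` string format, the `PROVIDER_ENV_VARS` mapping (including the Google variable name), and passing the environment in as a mapping. The real `_llm_key_present` reads from settings and may differ.

```python
from typing import Mapping

# Hypothetical provider-prefix -> API-key variable mapping (assumption).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def llm_key_present(default_model: str, env: Mapping[str, str]) -> bool:
    """True only if the key for the *configured* provider is set."""
    provider = default_model.split(":", 1)[0].lower()
    for prefix, var in PROVIDER_ENV_VARS.items():
        if provider.startswith(prefix):
            return bool(env.get(var, "").strip())
    # Unknown provider: treat as "no key" so the agent step soft-skips.
    return False
```

With this shape, an environment holding only OpenAI and Anthropic keys reports no usable key for a Gemini default model, so the step skips instead of failing at chat time.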

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
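
The pipe-buffer hang can be avoided with the pattern sketched here: send the child's output to a file, which never blocks the writer, instead of an unread `subprocess.PIPE`. This is a simplified illustration with a short-lived child; `run_with_log_file` and `NOISY_CHILD` are made-up names, and the real test manages a long-running uvicorn process.

```python
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, str]:
    """Run cmd with stdout/stderr redirected to a temp file, then read it back."""
    with tempfile.TemporaryFile(mode="w+b") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            raise
        log.seek(0)
        return returncode, log.read().decode(errors="replace")


# A child that writes ~1 MB, far more than a typical 64-KB pipe buffer.
NOISY_CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1_000_000)"]
```

Had the same child been started with `stdout=subprocess.PIPE` and then only waited on, it would block on write once the buffer filled, which matches the symptom described above.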

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
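
A sketch of the single-flight guard, assuming a plain exception that the route layer would translate into the RFC 7807 409. `DemoAlreadyRunning` and `run_single_flight` are illustrative names, not the slice's actual API.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class DemoAlreadyRunning(Exception):
    """Raised when a run is in progress; a route would map this to HTTP 409."""


_run_lock = asyncio.Lock()  # module-level: one run per process


async def run_single_flight(pipeline: Callable[[], Awaitable[T]]) -> T:
    # Reject instead of queueing. There is no await between the check and
    # the acquire, so on a single event loop no other task can slip in.
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting rather than waiting keeps the second caller from silently inheriting a long pipeline run; the lock only protects a single process, so multiple workers would need a different mechanism.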

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
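
The heuristic spec above can be sketched as follows. This is an illustration under stated assumptions, not the `MarkdownGenerator` code: days are list indices, `window_days` (the markdown window length) is a hypothetical parameter, and a rise from zero stock is not counted as a refresh because the relative jump is undefined.

```python
def find_markdown_starts(
    on_hand: list[int],
    age_threshold: int,
    min_units: int,
    window_days: int,
    spike: float = 0.3,
) -> list[int]:
    """Return the day indices on which an age_days markdown would fire."""
    starts: list[int] = []
    anchor = 0  # last refresh; defaults to the first day (dates[0])
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        # Refresh: stock rose by at least `spike` (30%) vs the prior day.
        if prev > 0 and cur - prev >= spike * prev:
            anchor = t
        age = t - anchor
        # Fire only on aged stock that is still on the shelf.
        if age >= age_threshold and cur >= min_units:
            starts.append(t)
            # Re-anchor to the day after the window [t, t + window_days - 1].
            anchor = t + window_days
            t = anchor
            continue
        t += 1
    return starts
```

No random draws are involved, which is consistent with the zero-rng-draw invariant mentioned in the decision rationale.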

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
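
The ">= 72 days" figure follows from 30 + 3 × 14. A small sketch of that arithmetic, assuming no gap between train and test and test windows that step by exactly one horizon (the real splitter may anchor folds differently; both function names are illustrative):

```python
def min_days_for_expanding_backtest(
    n_splits: int, horizon: int, min_train_size: int
) -> int:
    """Smallest series length that fits every fold."""
    return min_train_size + n_splits * horizon


def expanding_folds(
    n_days: int, n_splits: int, horizon: int, min_train_size: int
) -> list[tuple[range, range]]:
    """(train, test) index ranges, anchored so the last test ends at n_days."""
    needed = min_days_for_expanding_backtest(n_splits, horizon, min_train_size)
    if n_days < needed:
        raise ValueError(f"need >= {needed} days, got {n_days}")
    folds = []
    for k in range(n_splits):
        train_end = n_days - (n_splits - k) * horizon
        folds.append((range(0, train_end), range(train_end, train_end + horizon)))
    return folds
```

With 92 days, the first fold under these assumptions trains on 50 days, which is the margin over `min_train_size=30` that the preset relies on.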

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
(items 1-3) are now closed, alongside two smaller adjustments (items
4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
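
The NaN/inf coercion could be as small as the sketch below. `finite_or_zero` is an illustrative stand-in for `_finite()`, whose exact signature is not shown here.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Coerce NaN and +/-inf to 0.0 so the value is valid strict JSON."""
    return float(value) if math.isfinite(value) else 0.0


# Strict JSON has no NaN token; allow_nan=False makes json.dumps enforce that.
payload = {"stability_index": finite_or_zero(float("nan"))}
encoded = json.dumps(payload, allow_nan=False)
```

Mapping to 0.0 keeps the stored result parseable, at the cost of making "undefined" indistinguishable from a real zero in the stored value.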

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(repo): back-merge main into dev after v0.2.11 (#160) (#161)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
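
The "/health polled with a 30 s deadline" step can be illustrated with a deadline-bounded polling loop. This is a Python sketch of the idea only; the workflow itself may well do this in shell. The function name `wait_until_healthy`, the 0.5 s interval, and the injectable `probe` callable are assumptions made so the loop can be exercised without a running server.

```python
import time
from collections.abc import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or deadline_s has elapsed."""
    stop_at = time.monotonic() + deadline_s
    while True:
        if probe():
            return True
        if time.monotonic() >= stop_at:
            # Deadline reached without a healthy response.
            return False
        sleep(interval_s)
```

In a real job, `probe` would wrap an HTTP GET against `/health` and return True on a 200 response.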

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; this commit bumps it to 7 and adds the
demo_minimal name membership check, matching the service-layer +
config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3), alongside two smaller adjustments (items
4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
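
Item 2 above (copy the artifact into the registry root, record a registry-relative URI, hash it client-side with sha256) can be sketched as follows. The function name `stage_artifact` and the `<run_id>/<filename>` layout are assumptions for illustration; the commit message does not specify how the script names the copied file.

```python
import hashlib
import shutil
from pathlib import Path


def stage_artifact(source: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy a trained artifact under registry_root; return (relative_uri, sha256_hex)."""
    target_dir = registry_root / run_id  # hypothetical layout
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    # Hash the copied bytes so the recorded digest matches what verify will read.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    # The URI is relative to the registry root, since verify resolves it there.
    relative_uri = target.relative_to(registry_root).as_posix()
    return relative_uri, digest
```

Recording a relative URI is what lets the verify endpoint resolve the file against its own root instead of the training output directory.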

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
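
The pipe-buffer hang described above is a general subprocess hazard: a child writing to `subprocess.PIPE` blocks once the OS buffer fills if nothing drains it. A minimal sketch of the file-redirect alternative follows; the helper name `run_with_log_file` is hypothetical and the real test fixture manages a long-lived uvicorn process rather than waiting for exit.

```python
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str]) -> tuple[int, str]:
    """Run cmd with stdout/stderr sent to a temp file; return (exit_code, log_path)."""
    with tempfile.NamedTemporaryFile("w+", suffix=".log", delete=False) as log:
        # A regular file never applies backpressure the way an unread pipe does.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait()
        log_path = log.name
    return returncode, log_path


# A child that prints well over 64 KB completes without anyone reading its output.
exit_code, log_path = run_with_log_file(
    [sys.executable, "-c", "print('x' * 200_000)"]
)
```

With `stdout=subprocess.PIPE` and no reader, the same `proc.wait()` call could block indefinitely, which is the failure mode the fix removes.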

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
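
The single-flight behavior can be sketched with a module-level `asyncio.Lock` that rejects, rather than queues, a second caller. The names `run_pipeline_once` and `AlreadyRunning` are hypothetical; in the actual slice the rejection is surfaced as an RFC 7807 409 response, which this sketch only models as an exception.

```python
import asyncio
from collections.abc import Awaitable, Callable

_run_lock = asyncio.Lock()  # module-level: one pipeline run per process


class AlreadyRunning(Exception):
    """Raised when a run is requested while another holds the lock (maps to HTTP 409)."""


async def run_pipeline_once(work: Callable[[], Awaitable[str]]) -> str:
    # No await between the check and the acquire, so on a single event loop
    # another task cannot slip in between the two.
    if _run_lock.locked():
        raise AlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await work()
```

Rejecting immediately keeps the second caller from silently waiting behind a run that may take a long time.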

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]`
  if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
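
The heuristic spec above can be read as the following sketch over one (store, product) series. It is an interpretation under stated assumptions, not the MarkdownGenerator code: the function name, the `markdown_window_days` parameter, treating a rise from zero stock as a refresh, and ignoring spikes that occur inside an active markdown window are all assumptions.

```python
SPIKE_THRESHOLD = 0.3  # a 30% day-over-day rise counts as a refresh


def age_days_fire_indices(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,  # hypothetical: length of one markdown window
) -> list[int]:
    """Return the day indices at which the age_days trigger would fire."""
    fires: list[int] = []
    anchor = 0  # index of the most recent refresh; starts at dates[0]
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        if cur > prev and cur - prev >= SPIKE_THRESHOLD * prev:
            anchor = t  # restock observed, so the age clock restarts
        age = t - anchor
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(t)
            # Reset the anchor to the day after the markdown window ends.
            anchor = t + markdown_window_days
            t = anchor
            continue
        t += 1
    return fires
```

The sketch draws no random numbers, in line with the zero-rng-draw invariant mentioned in the decision rationale.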

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3), alongside two smaller adjustments (items
4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthr…
w7-mgfcode added a commit that referenced this pull request May 18, 2026
…#202) (#203)

* feat: cut v0.2.13 — explorer interactivity, knowledge & guide pages (#191) (#192)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn (items 1-3
below); all three are now closed, along with two smaller adjustments
(items 4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
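
Item 2 above (copy the artifact into the registry root and record a registry-relative URI) can be sketched as follows. The function name `stage_artifact` and the `<run_id>/<filename>` layout are assumptions; only the idea of a relative URI plus a client-side sha256 comes from the commit messages.

```python
import hashlib
import shutil
from pathlib import Path


def stage_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy `src` under the registry root; return (relative_uri, sha256_hex)."""
    dest = registry_root / run_id / src.name  # layout is an assumption
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # The URI is relative to the registry root, so a verifier that resolves
    # artifact_uri against that root finds the copied file.
    return dest.relative_to(registry_root).as_posix(), digest
```

Hashing the copied file rather than the source means the recorded hash describes exactly the bytes the registry will later read.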

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
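
A minimal sketch of the pipe-buffer fix: the child's output goes to a temp file, so it can never block on a full pipe that nobody is draining. The helper name `run_with_log_file` is hypothetical, and unlike the real test it waits for the child instead of running it in the background.

```python
import subprocess
import tempfile


def run_with_log_file(cmd: list[str]) -> tuple[int, str]:
    """Run `cmd` with stdout/stderr redirected to a temp file; return (rc, log_path)."""
    with tempfile.NamedTemporaryFile(mode="w+b", suffix=".log", delete=False) as log:
        # A file has no fixed-size buffer the way subprocess.PIPE does,
        # so heavy log output cannot stall the child.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait()
    return returncode, log.name
```

The caller decides whether to delete the log afterwards, which matches the "keep it only on failure" cleanup described above.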

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
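
The single-flight behaviour can be sketched as below, assuming Python 3.10+ (where a module-level asyncio.Lock is not bound to a loop at creation). `DemoBusyError` and `run_single_flight` are hypothetical names; in the real slice the rejection is presumably translated into the RFC 7807 409 response.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class DemoBusyError(Exception):
    """Raised when a run is requested while another one is in progress."""


_run_lock = asyncio.Lock()  # module-level: shared by every request


async def run_single_flight(pipeline: Callable[[], Awaitable[T]]) -> T:
    # No await sits between the check and the acquire, so within one event
    # loop a second caller cannot slip in between them.
    if _run_lock.locked():
        raise DemoBusyError("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of queueing keeps a second caller from silently waiting behind a pipeline that resets and reseeds the database.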

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]`
  if no refresh has been observed yet).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
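
The heuristic spec above can be sketched on a single (store, product) series as follows. This is an illustration, not the MarkdownGenerator code: days are list indices, `window_days` is a hypothetical parameter for the markdown window length, and treating a rise from zero stock as a refresh is an assumption.

```python
SPIKE_THRESHOLD = 0.3  # relative day-over-day rise that counts as a refresh


def markdown_start_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units: int,
    window_days: int,
) -> list[int]:
    """Return the day indices on which an age-based markdown would start."""
    fires: list[int] = []
    anchor = 0  # day of the most recent refresh (day 0 if none observed)
    t = 0
    while t < len(on_hand):
        if t > 0:
            prev, cur = on_hand[t - 1], on_hand[t]
            rose = cur > 0 if prev == 0 else (cur - prev) / prev >= SPIKE_THRESHOLD
            if rose:
                anchor = t  # restock: the age clock restarts
        if t - anchor >= age_days_threshold and on_hand[t] >= min_units:
            fires.append(t)
            # Re-anchor to the day after the window ends, so the next fire
            # needs a full threshold of age again.
            anchor = t + window_days
            t = anchor
            continue
        t += 1
    return fires
```

The function draws no random numbers, which is consistent with the zero-rng-draw invariant mentioned in the rationale.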

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
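
The 72-day floor follows from simple arithmetic if the test windows are laid end to end after the initial training window with no gap (an assumption that matches the figure quoted above). The function name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(n_splits: int, horizon: int, min_train_size: int) -> int:
    # Initial training window plus one test window per split.
    return min_train_size + n_splits * horizon


needed = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1  # inclusive range
margin = available - needed
```

With the preset's values this gives 72 days needed against 92 available, a 20-day margin.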

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with an explicit 60 s
  timeout (httpx's 5 s default is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
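
The NaN/inf coercion can be sketched in a few lines. `finite_or_zero` is a hypothetical stand-in for the `_finite()` helper, whose exact signature is not shown here.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Return `value` unchanged if it is finite, else 0.0."""
    return value if math.isfinite(value) else 0.0


# Strict JSON has no NaN literal, so coerce before serialising.
payload = {"stability_index": finite_or_zero(float("nan"))}
encoded = json.dumps(payload, allow_nan=False)
```

Passing `allow_nan=False` makes the standard-library encoder raise on any NaN that slips through, which is a cheap guard in tests.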

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(repo): back-merge main into dev after v0.2.11 (#160) (#161)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
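
The 72-day figure can be reproduced with a short calculation. This is a sketch that assumes the folds are laid out back to back with a step equal to the horizon and no gap; the helper name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits n_splits test windows after the first train window."""
    return min_train_size + n_splits * horizon


# Inclusive day count of the demo_minimal date range.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
required = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
margin = available - required
```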

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
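
As an illustration of the winner-selection rule in the last bullet, here is a minimal sketch. It assumes each model has already been reduced to a single aggregated WAPE and that a NaN aggregate disqualifies the model; the function name and the model names in the usage lines are hypothetical, not taken from `scripts/run_demo.py`.

```python
from __future__ import annotations

import math


def select_winner(wape_by_model: dict[str, float]) -> str | None:
    """Return the model_type with the lowest non-NaN WAPE, or None if there is none."""
    candidates = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not candidates:
        return None
    return min(candidates, key=candidates.__getitem__)


# Hypothetical model names and scores, for illustration only.
winner = select_winner({"model_a": 0.21, "model_b": float("nan"), "model_c": 0.18})
```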

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
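
The StepError behavior covered above (echo title / detail / request_id, never the raw body) could look roughly like the sketch below. The constructor signature and message layout are assumptions, not the real class in `scripts/run_demo.py`.

```python
from __future__ import annotations

from typing import Any


class StepError(Exception):
    """Error built from an RFC 7807 problem+json body (hypothetical shape)."""

    def __init__(self, status_code: int, problem: dict[str, Any]) -> None:
        self.status_code = status_code
        # Copy only the whitelisted members; any other field in the body
        # (which might carry secrets) is never stored or echoed.
        self.title = str(problem.get("title", "unknown error"))
        self.detail = str(problem.get("detail", ""))
        self.request_id = str(problem.get("request_id", "n/a"))
        super().__init__(
            f"HTTP {status_code}: {self.title}: {self.detail} (request_id={self.request_id})"
        )


# Illustrative body; the "debug" member stands in for anything sensitive.
err = StepError(
    409,
    {"title": "Conflict", "detail": "run already active",
     "request_id": "req-123", "debug": "password=hunter2"},
)
```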

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn (items 1-3
below); items 4-5 are smaller follow-ups. All five are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
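
To make item 3 concrete, a provider-aware key check might look like the sketch below. It assumes the configured model id has a `provider:model` shape and that secrets arrive as a mapping (for example from a settings object rather than `os.environ`). The table, the function name, and the Google entry are hypothetical; only the OpenAI and Anthropic variable names appear in this log.

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical provider-prefix -> key-variable table.
PROVIDER_KEY_VARS: dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google-gla": "GEMINI_API_KEY",  # assumption, not confirmed by the source
}


def llm_key_present(default_model: str, secrets: Mapping[str, str]) -> bool:
    """True only if the key for the provider of default_model is set and non-empty."""
    provider, sep, _model = default_model.partition(":")
    if not sep:
        return False  # no provider prefix, so no key can be matched
    var = PROVIDER_KEY_VARS.get(provider)
    return bool(var and secrets.get(var))
```

With this shape, an OpenAI key alone does not satisfy a Gemini default model, so the agent step can be skipped up front instead of failing at chat time.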

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
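
The pipe-buffer hang described above can be avoided with a pattern like the following sketch. It uses a throwaway child process that writes 1 MiB to stdout in place of uvicorn; the helper name is hypothetical and the real test fixture may differ.

```python
from __future__ import annotations

import os
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float = 60.0) -> tuple[int, str]:
    """Run cmd with stdout/stderr redirected to a temp file; return (exit code, log path)."""
    # A file never fills up the way an unread subprocess.PIPE does, so the
    # child cannot block on write while the parent is busy elsewhere.
    with tempfile.NamedTemporaryFile("w+b", suffix=".log", delete=False) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        code = proc.wait(timeout=timeout_s)
    return code, log.name


# Stand-in for a chatty server: emits far more than a typical 64 KB pipe buffer.
noisy_child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]
exit_code, log_path = run_with_log_file(noisy_child)
log_size = os.path.getsize(log_path)
os.unlink(log_path)  # a real fixture might keep the file when the test failed
```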

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)
