
feat: release v0.2.14 — UI interactivity, AI admin console, agent reliability fixes #201

Merged
w7-mgfcode merged 36 commits into main from dev on May 18, 2026

@w7-mgfcode commented May 18, 2026

Release: dev → main (v0.2.14)

Cuts the next release off dev. Current released version is 0.2.13; the changes below are feat: + fix: so release-please will open a Release PR on main bumping PATCH → v0.2.14 (pre-1.0 config).

31 commits since v0.2.13 — 9 feat, 12 fix, 6 docs, 4 chore.

Highlights

Features

  • feat(ui) — Visualize Demand Planner page + interactive Forecast/Backtest pages (PRP-22)
  • feat(ui) — Explorer interactivity: detail views for stores/products/runs/jobs, run comparison, artifact verify, sortable tables (PRP-20, PRP-21)
  • feat(ui) — Knowledge page + Agent Guide page (PRP-19)
  • feat(api,ui) — AI model admin console with runtime model swap + Ollama support
  • feat(api,ui) — in-product demo Showcase page + e2e demo pipeline (PRP-15, PRP-17)
  • feat(data) — markdown-generator age_days trigger

Fixes

  • fix(agents) x7 — agent reliability: FallbackModel wiring, tool-name correctness, tool-error recovery, retry-attempts, message-history round-trip, tool-retry crash handling
  • fix(ui) x4 — chart series rendering, TimeSeriesChart stroke, forecast/backtest result parsing
  • fix(registry) — tolerate multiple _find_duplicate matches
  • fix(data) — anchor seeded data window to the current date
  • fix(jobs) — backtest job result retains fold metrics/stability/baselines

Docs

  • AGENTS.md + llms.txt agent-memory layer; INITIAL docs relocated to PRPs/INITIAL/; DEV_GUIDE onboarding filled; CLAUDE.md / README refreshes

Merge instructions

Per docs/_base/RUNBOOKS.md -> "release-please skipped the bump after a dev -> main merge": this PR is titled feat: so the merge-commit subject bumps the version regardless of merge method. Merge via the GitHub web UI, or gh pr merge --merge (the feat: title makes it safe). After merge, release-please opens the chore(main): release 0.2.14 PR on main — merge that to tag v0.2.14.

Summary by CodeRabbit

  • New Features

    • Explorer pages now feature click-through detail routes for stores, products, runs, and jobs; run comparison view with profile and metrics diff
    • Demand Planner aggregates completed forecasts into multi-SKU demand/inventory tables with lead-time selection
    • Knowledge page enables browsing indexed content with semantic search
    • Agent Guide displays agent capabilities, tools, and configured session limits
    • Interactive forecast and backtest execution available in-page
    • CSV export and server-side sorting across all explorer tables
    • Artifact integrity verification for model runs
    • New analytics endpoints for time-series sales data and inventory status
  • Documentation

    • Product roadmap and feature specifications added
    • Optional feature concepts documented


w7-mgfcode and others added 30 commits May 14, 2026 06:11
#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
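
The heuristic spec above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `MarkdownGenerator`: the names `is_refresh` and `find_markdown_starts`, the `markdown_duration_days` parameter, and the choice to treat any restock of an empty shelf as a refresh are hypothetical.

```python
from datetime import date, timedelta

SPIKE_THRESHOLD = 0.3  # mirrors _AGE_DAYS_SPIKE_THRESHOLD (30% day-over-day jump)


def is_refresh(prev_qty: int, qty: int) -> bool:
    """True when stock rose by at least SPIKE_THRESHOLD versus the prior day."""
    if prev_qty <= 0:
        return qty > 0  # assumption: any restock of an empty shelf counts
    return (qty - prev_qty) / prev_qty >= SPIKE_THRESHOLD


def find_markdown_starts(
    dates: list[date],
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_duration_days: int,  # hypothetical: length of one markdown window
) -> list[date]:
    """Return the days on which an age-based markdown would start."""
    starts: list[date] = []
    if not dates:
        return starts
    anchor = dates[0]  # no refresh observed yet, so age counts from dates[0]
    for i, day in enumerate(dates):
        if day < anchor:
            continue  # still inside the previous markdown window
        if i > 0 and is_refresh(on_hand[i - 1], on_hand[i]):
            anchor = day  # a spike restarts the age clock
            continue
        age = (day - anchor).days
        if age >= age_days_threshold and on_hand[i] >= min_units_remaining:
            starts.append(day)
            # Reset to the day after the window ends: no back-to-back fires.
            anchor = day + timedelta(days=markdown_duration_days)
    return starts


days = [date(2024, 10, 1) + timedelta(days=i) for i in range(10)]
flat_starts = find_markdown_starts(days, [100] * 10, 3, 10, 2)
```

The function draws no random numbers, which is the property the commit message relies on to keep the zero-rng-draw invariant.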

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.
* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
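
The 72-day floor can be reproduced with a little arithmetic. The sketch below assumes the expanding backtest steps each fold forward by exactly `horizon` days with no gap between train and test; the helper name is hypothetical and the real splitter may count differently.

```python
from datetime import date


def min_days_for_expanding_backtest(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits n_splits non-overlapping test windows.

    Assumption: the first fold trains on min_train_size days and every fold
    adds one test window of `horizon` days.
    """
    return min_train_size + n_splits * horizon


# Inclusive day count of the demo_minimal window.
window_days = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
required_days = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
margin_days = window_days - required_days
```

Under these assumptions the window is 92 days, the requirement is 30 + 3 x 14 = 72 days, and the margin is 20 days.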

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
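
The "/health polled with a 30 s deadline" step could be expressed in Python along these lines. This is a sketch only: the workflow may well do this in shell, and `wait_until_healthy` and `http_probe` are hypothetical names. The probe is injectable so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or non-2xx response


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if probe():
            return True
        if time.monotonic() >= end:
            return False
        sleep(interval_s)
```

A caller would pass something like `lambda: http_probe("http://localhost:8123/health")`, where the port follows the precondition stated elsewhere in this PR.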

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn (items 1-3
below); all three are now closed. Items 4-5 are smaller follow-up
adjustments:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
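
Item 2 (copy the artifact into the registry root, then record a registry-relative URI and a client-side hash) can be sketched as below. The function name `stage_artifact` and the per-run subdirectory layout are assumptions made for illustration; only the use of sha256 and a registry-relative URI comes from the commit messages.

```python
import hashlib
import shutil
from pathlib import Path


def stage_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy a trained-model file under the registry root.

    Returns (registry-relative URI, sha256 hex digest of the copied file).
    """
    dest_dir = registry_root / run_id  # hypothetical layout: one folder per run
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # The URI is relative so that verify can resolve it against its own root.
    return dest.relative_to(registry_root).as_posix(), digest
```

Hashing the copied file rather than the source keeps the recorded digest tied to the bytes the verify endpoint will actually read.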

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
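
The pipe-buffer fix can be illustrated with a small standalone sketch. It is not the test fixture itself; `run_with_log_file` is a hypothetical helper that sends a child's output to a temp file so the child can never block on a full pipe that nobody reads.

```python
import subprocess
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float) -> tuple[int, str]:
    """Run `cmd` with stdout and stderr redirected to a temp log file.

    Returns (exit code, path to the log). The log is kept on disk so the
    caller decides whether to delete it or keep it for a postmortem.
    """
    with tempfile.NamedTemporaryFile("w+", suffix=".log", delete=False) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.terminate()  # polite shutdown first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # fall back to a hard kill
                proc.wait()
            raise
    return returncode, log.name
```

With `stdout=subprocess.PIPE` and no reader, a child that writes more than the pipe capacity would stall inside `write()`; a regular file has no such limit.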

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.
… (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.
* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
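
The single-flight guard might be sketched like this. The names `DemoAlreadyRunning`, `run_single_flight`, and `conflict_problem` are hypothetical, and the mapping to an HTTP response is shown as a plain dict rather than through the web framework, so the example stays dependency-free.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

_run_lock = asyncio.Lock()  # module-level: one pipeline run per process


class DemoAlreadyRunning(Exception):
    """Raised when a run is requested while another one holds the lock."""


async def run_single_flight(pipeline: Callable[[], Awaitable[T]]) -> T:
    """Run `pipeline` unless another run is already in flight."""
    # No await sits between the check and the acquire, so on a single event
    # loop another task cannot slip in between them.
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()


def conflict_problem(detail: str) -> dict[str, object]:
    """Build an RFC 7807 problem body for the 409 case."""
    return {"type": "about:blank", "title": "Conflict", "status": 409, "detail": detail}
```

Rejecting immediately instead of queueing on the lock matches the described behavior, where a concurrent caller gets a 409 rather than waiting for the first run to finish.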

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.
…ts (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)
…t vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.
…lines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
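
The NaN/inf coercion can be illustrated with a short sketch. `finite_or_zero` is a hypothetical stand-in for `_finite()`, and the metric values are made up for the example.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Coerce NaN and +/-inf to 0.0 so the value survives strict JSON encoding."""
    return float(value) if math.isfinite(value) else 0.0


# Illustrative values only; stability is NaN when there are fewer than two folds.
raw = {"wape_mean": 0.182, "stability_index": float("nan")}
safe = {key: finite_or_zero(val) for key, val in raw.items()}

# allow_nan=False makes json.dumps raise on NaN/inf, mirroring a strict JSON store.
encoded = json.dumps(safe, allow_nan=False)
```

Encoding `raw` directly with `allow_nan=False` would raise `ValueError`, which is the failure mode the coercion avoids.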

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.
…lt (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.
)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.
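
The query shape of the fix can be sketched with the standard-library `sqlite3` module. This is a stand-in for the SQLAlchemy lookup, and the table layout, column names and rows below are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_run ("
    " id INTEGER PRIMARY KEY, config_hash TEXT, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO model_run (config_hash, status, created_at) VALUES (?, ?, ?)",
    [
        ("abc123", "success", "2026-05-01T10:00:00"),
        ("abc123", "success", "2026-05-02T10:00:00"),
        ("abc123", "archived", "2026-05-03T10:00:00"),
    ],
)


def find_duplicate(config_hash: str):
    """Return the newest non-archived run for a hash, or None if there is none."""
    return conn.execute(
        "SELECT id, created_at FROM model_run"
        " WHERE config_hash = ? AND status != 'archived'"
        " ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()
```

Taking the first row of an ordered, limited result tolerates any number of matches, whereas a "one or none" accessor treats a second match as an error.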

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.
#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.
…155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.
* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]`
  if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining`, so an empty shelf is never marked down.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
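
The heuristic spec above can be sketched as a single pass over one (store, product) series. This is a simplified illustration, not the generator's code: `age_days_fire_days` and `markdown_window_days` are hypothetical names, a rise from zero stock is assumed to count as a refresh, and refreshes inside a markdown window are ignored.

```python
def age_days_fire_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,
    spike_threshold: float = 0.3,
) -> list[int]:
    """Return the day indices on which the age trigger would fire."""
    fires: list[int] = []
    anchor = 0  # the age clock starts at the first observed day
    day = 1
    while day < len(on_hand):
        prev, cur = on_hand[day - 1], on_hand[day]
        # A refresh is a relative jump of at least spike_threshold vs the prior day.
        # Assumption: any rise from zero stock also counts as a refresh.
        if prev == 0:
            refreshed = cur > 0
        else:
            refreshed = (cur - prev) / prev >= spike_threshold
        if refreshed:
            anchor = day
        age = day - anchor
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(day)
            # Re-anchor to the day after the markdown window ends, then skip there.
            anchor = day + markdown_window_days
            day = anchor
            continue
        day += 1
    return fires
```

No random draws are involved, so the same inventory series always yields the same fire days, in line with the determinism the commit calls out.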

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
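
The 72-day figure follows from simple arithmetic, sketched below. The formula assumes folds advance by one horizon with no gap and that the first fold trains on exactly `min_train_size` days; the real splitter may differ, and the function name is hypothetical.

```python
def min_days_for_expanding_backtest(
    n_splits: int, horizon: int, min_train_size: int
) -> int:
    """Fewest daily rows needed so every fold has a full train and test window."""
    return min_train_size + n_splits * horizon


needed = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
margin = 92 - needed  # the preset's 92-day window minus the minimum
```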

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
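
Winner selection, as described in the last bullet, can be sketched like this. `select_winner` and the model names are hypothetical, and the input is assumed to be one aggregated WAPE per model.

```python
import math
from typing import Optional


def select_winner(wape_by_model: dict[str, float]) -> Optional[str]:
    """Pick the model with the lowest WAPE, ignoring NaN scores."""
    scored = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not scored:
        return None  # empty input or all-NaN: there is no winner to register
    return min(scored, key=scored.__getitem__)
```

Filtering NaN before calling `min` matters because comparisons against NaN are always false, so a NaN score could otherwise win or lose depending on dict order.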

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time
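
The "skip if Postgres isn't reachable" guard can be approximated with a plain TCP probe. This is a sketch under the assumption that reachability means "the port accepts a connection"; `port_open` is a hypothetical helper, not the test's own.

```python
import socket


def port_open(host: str, port: int, timeout_s: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

In a pytest module this could back a `skipif` marker, for example skipping when `port_open("localhost", 5433)` is false.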

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
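
The "/health polled with a 30 s deadline" step amounts to a poll-until-ready loop. The workflow itself is YAML; the Python below is only a sketch of the same idea, with `wait_until` and its parameters as hypothetical names.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool], deadline_s: float = 30.0, interval_s: float = 0.5
) -> bool:
    """Poll is_ready until it returns True or deadline_s elapses."""
    give_up_at = time.monotonic() + deadline_s
    while time.monotonic() < give_up_at:
        if is_ready():
            return True
        time.sleep(interval_s)
    return False
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments on the runner.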

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bump it to 7 and add a demo_minimal name
membership check, matching the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn (items 1-3
below); all three are now closed. Items 4-5 are smaller follow-ups:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
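
The provider-aware key check from item 3 can be sketched as below. Everything here is an assumption for illustration: the `provider:model` id format, the prefix-to-field mapping, and the field names are hypothetical and not taken from the project's settings.

```python
from typing import Mapping

# Hypothetical mapping from a model-id prefix to the settings field holding its key.
_PROVIDER_KEY_FIELD = {
    "openai": "openai_api_key",
    "anthropic": "anthropic_api_key",
    "google-gla": "google_api_key",
}


def llm_key_present(default_model: str, settings: Mapping[str, str]) -> bool:
    """True only if the key for the configured model's own provider is set."""
    provider = default_model.split(":", 1)[0]
    field = _PROVIDER_KEY_FIELD.get(provider)
    return bool(field and settings.get(field))
```

Checking only the configured provider's key is what lets the agent step soft-skip when, for instance, OpenAI and Anthropic keys exist but the default model belongs to a third provider.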

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bump it to 7 and add the demo_minimal name
membership check, matching the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed (items 1-3). Items 4-5 are smaller follow-ups from the
same run:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
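
Item 2 can be sketched roughly as follows, assuming the registry resolves URIs relative to its own root directory. The helper name and the flat destination layout are hypothetical; the sha256 digest is included only to show how a client-side hash could be taken from the staged copy.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple


def stage_artifact(src: Path, registry_root: Path) -> Tuple[str, str]:
    """Copy src under registry_root; return (registry-relative URI, sha256 hex)."""
    registry_root.mkdir(parents=True, exist_ok=True)
    dest = registry_root / src.name  # hypothetical layout: flat, same file name
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # The URI is recorded relative to the registry root, not the training dir.
    return dest.relative_to(registry_root).as_posix(), digest
```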

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
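
A small illustration of the pipe-buffer fix, assuming a child process that writes more output than a typical 64-KB pipe can hold. The child command and function name are stand-ins for the uvicorn subprocess; only the stdout-to-file pattern is the point.

```python
import subprocess
import sys
import tempfile

# Stand-in child: writes ~200 KB to stdout, well past a 64-KB pipe buffer.
CHILD = "import sys; sys.stdout.write('x' * 200_000)"


def run_with_log_file() -> int:
    """Run the child with stdout sent to a temp file; return bytes logged."""
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdout=log,  # a file never fills up the way an unread PIPE does
            stderr=subprocess.STDOUT,
        )
        proc.wait(timeout=30)
        log.seek(0)
        return len(log.read())
```

With `stdout=subprocess.PIPE` and nobody reading the pipe, the same `wait()` could block once the buffer is full.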

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
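
The single-flight guard could look roughly like this. It is a sketch, not the slice's code: the function names are hypothetical, the sleep stands in for the pipeline, and the integer 409 stands in for the RFC 7807 response. It assumes Python 3.10+, where a module-level `asyncio.Lock` is not bound to an event loop at creation.

```python
import asyncio
from typing import List

_run_lock = asyncio.Lock()  # module-level: at most one demo run at a time


async def run_demo_once(work_seconds: float = 0.05) -> int:
    """Return 200 after running, or 409 if another run already holds the lock."""
    if _run_lock.locked():
        return 409  # reject instead of queueing behind the active run
    async with _run_lock:
        await asyncio.sleep(work_seconds)  # placeholder for the pipeline steps
        return 200


async def main() -> List[int]:
    # Two concurrent callers: one runs, the other is turned away.
    return sorted(await asyncio.gather(run_demo_once(), run_demo_once()))
```

There is no await between the `locked()` check and the acquire, so the check cannot race with another coroutine on the same loop.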

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
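
The heuristic above can be sketched as a single pass over one (store, product) series. This is an illustration under assumptions: the function name and the `markdown_days` window length are hypothetical, day indices replace dates, and any rise from zero stock is treated as a refresh.

```python
from typing import List


def age_days_fire_days(
    on_hand: List[int],
    age_days_threshold: int,
    min_units: int,
    markdown_days: int,  # hypothetical: length of one markdown window
    spike_threshold: float = 0.3,
) -> List[int]:
    """Return the day indices on which the age_days trigger would fire."""
    fires: List[int] = []
    anchor = 0  # most recent refresh; day 0 until a refresh is observed
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        rose = cur - prev
        # Assumption: a rise from zero stock always counts as a refresh.
        if rose > 0 and (prev == 0 or rose / prev >= spike_threshold):
            anchor = t
        if t - anchor >= age_days_threshold and cur >= min_units:
            fires.append(t)
            anchor = t + markdown_days  # day after the markdown window ends
            t = anchor  # skip the window, so no back-to-back fires
            continue
        t += 1
    return fires
```

No random draws are involved, so the output depends only on the input series.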

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
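
The 72-day figure can be checked with a few lines, assuming non-overlapping test windows and no gap between train and test (the helper name is illustrative):

```python
from datetime import date


def min_days_required(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Days needed so every expanding fold gets a full test horizon."""
    return min_train_size + n_splits * horizon


# Inclusive length of the demo_minimal window.
window_days = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
required = min_days_required(n_splits=3, horizon=14, min_train_size=30)
margin = window_days - required
```

Here `required` is 30 + 3 x 14 = 72 and `window_days` is 92, leaving a 20-day margin.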

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
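
A rough sketch of the secrets-safe error shape described for HttpClient, assuming the problem+json body carries `title`, `detail`, and a `request_id` extension member. The constructor signature and the helper name are hypothetical.

```python
import json


class StepError(Exception):
    """Pipeline-step failure carrying only selected problem+json fields."""

    def __init__(self, status: int, title: str, detail: str, request_id: str) -> None:
        self.status = status
        self.title = title
        self.detail = detail
        self.request_id = request_id
        super().__init__(
            f"HTTP {status}: {title} | {detail} (request_id={request_id})"
        )


def step_error_from_response(status: int, body: bytes) -> StepError:
    """Build a StepError from a response body without echoing the body itself."""
    try:
        problem = json.loads(body)
    except ValueError:  # not JSON (or not decodable): fall back to defaults
        problem = {}
    if not isinstance(problem, dict):
        problem = {}
    # Only the three whitelisted fields are copied; everything else is dropped.
    return StepError(
        status,
        title=str(problem.get("title", "HTTP error")),
        detail=str(problem.get("detail", "")),
        request_id=str(problem.get("request_id", "unknown")),
    )
```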

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
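
The coercion is small enough to sketch; the name below is illustrative rather than the actual `_finite()` body. Strict JSON has no NaN or Infinity tokens, which is why the value has to be replaced before the result is stored.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Map NaN and +/-inf to 0.0; pass finite values through unchanged."""
    return value if math.isfinite(value) else 0.0


# With fewer than two folds the stability index is NaN; coerce before dumping.
payload = {"stability_index": finite_or_zero(float("nan"))}
encoded = json.dumps(payload, allow_nan=False)  # would raise on a raw NaN
```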

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.
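
The lookup shape can be illustrated with the standard-library sqlite3 module in place of the project's SQLAlchemy session. The table and column names are assumptions made for the example; the point is that ordering by created_at and taking one row never fails when several duplicates exist.

```python
import sqlite3
from typing import Optional


def make_registry() -> sqlite3.Connection:
    """In-memory stand-in for the model_run table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE model_run ("
        "id INTEGER PRIMARY KEY, config_hash TEXT, "
        "created_at TEXT, archived INTEGER DEFAULT 0)"
    )
    return conn


def find_duplicate(conn: sqlite3.Connection, config_hash: str) -> Optional[int]:
    """Return the id of the newest non-archived run with this hash, if any."""
    row = conn.execute(
        "SELECT id FROM model_run "
        "WHERE config_hash = ? AND archived = 0 "
        "ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()  # at most one row, however many duplicates exist
    return None if row is None else row[0]
```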

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(main): release 0.2.11 (#159)
…163)

* feat(api,ui): add AI model admin console with Ollama support (#162)

* fix(db): register AppConfig model in alembic env for schema-drift check (#162)

A casual message to the Experiment agent could crash the WebSocket
stream with a raw 'Tool ... exceeded max retries count of 1' error
when the model produced an invalid tool call.

- Catch PydanticAI's UnexpectedModelBehavior in stream_chat and chat;
  surface a clean, recoverable error event / message instead of
  leaking the internal exception string.
- Make tool_compare_backtest_results tolerant of missing/empty args
  (return a self-correcting hint) so a malformed call no longer burns
  the retry budget and crashes the run.
- Add a conversational-fallback line to the experiment system prompt
  so greetings are answered without invoking workflow tools.
- Add regression tests for both the chat and stream-chat paths.
…e adapter (#166) (#167)

* fix(agents): round-trip agent message history through pydantic-ai type adapter (#166)

Multi-turn agent chat crashed with a dict-has-no-attribute-conversation_id
error: _deserialize_messages returned the raw stored dicts unchanged, but
PydanticAI 1.96 requires real ModelMessage objects (it accesses
msg.conversation_id on every history item).

- _serialize_messages now uses ModelMessagesTypeAdapter.dump_python
  (mode=json), so stored history can be round-tripped.
- _deserialize_messages uses ModelMessagesTypeAdapter.validate_python,
  degrading to an empty history (with a warning) when stored data
  predates this format instead of crashing the run.
- Replace the serialization tests with a real round-trip test and a
  legacy-format fallback test.

* fix(agents): broaden deserialize-failure handling and log session id (#166)

Address code-review feedback on the message-history round-trip fix:

- _deserialize_messages now catches any Exception, not only
  ValidationError, so a malformed stored record (wrong shape, type
  errors) can never crash an otherwise-valid agent run.
- The warning logs exc_info (full type, message, traceback) instead of
  just str(e), and includes session_id so a failure can be correlated
  with the specific stored record.
- Add a regression test that a non-ValidationError adapter failure also
  degrades to an empty history.
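
The degrade-to-empty behaviour described in the two commits above can be sketched roughly as follows. This is an illustration only, not the project's code: `validate` stands in for `ModelMessagesTypeAdapter.validate_python`, and the function name, logger name, and log text are assumptions.

```python
import logging
from collections.abc import Callable
from typing import Any

logger = logging.getLogger("agents.history")  # hypothetical logger name


def deserialize_or_empty(
    raw: list[dict[str, Any]],
    validate: Callable[[list[dict[str, Any]]], list[Any]],
    session_id: str,
) -> list[Any]:
    """Return the validated history, or [] when the stored record is unreadable."""
    try:
        return validate(raw)
    except Exception:  # deliberately broad: wrong shape, type errors, and so on
        logger.warning(
            "stored message history unreadable, starting with empty history "
            "(session_id=%s)",
            session_id,
            exc_info=True,  # records exception type, message, and traceback
        )
        return []
```

Catching `Exception` rather than only a validation error trades strictness for availability: a corrupt record costs the session its history but not the run.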
… (#171)

Settings.agent_retry_attempts (default 3, set in .env) was never passed
to the PydanticAI Agent constructor, so both agents silently used the
framework default of 1. Agent runs failed with "Exceeded maximum output
retries (1)" — a weaker model got only one attempt to emit a valid
structured ExperimentReport / RAGAnswer.

- Add get_agent_retries() helper in agents/base.py.
- Pass output_retries and tool_retries to the experiment and
  rag_assistant Agent constructors (PydanticAI 1.96 deprecated the
  combined retries kwarg in favour of the two explicit ones).
- Add tests asserting both agents are built with the configured budget.
…ptedOutput (#172, #173) (#174)

* fix(agents): run agent tool calls sequentially over the shared session (#172)

When the model emitted multiple DB-touching tool calls in one turn,
PydanticAI executed them concurrently. Every agent tool shares the
single AgentDeps.db AsyncSession, and SQLAlchemy forbids concurrent
operations on one session, so the run failed intermittently with
"InvalidRequestError: concurrent operations are not permitted".

- Wrap agent.run() (chat) and agent.run_stream() (stream_chat) in
  Agent.parallel_tool_call_execution_mode("sequential").
- Add a regression test asserting chat() runs under sequential mode.

* fix(agents): use PromptedOutput so weaker models can produce structured output (#173)

Both agents declared output_type as a plain model, which PydanticAI
serves via its default ToolOutput mode (the model must call a hidden
final_result tool). Weaker/local models answer in plain prose instead,
PydanticAI rejects it as json_invalid, and the run fails with
"Exceeded maximum output retries".

- Wrap the experiment and rag_assistant output_type in PromptedOutput,
  which places the JSON schema in the prompt and parses the model's
  text reply. Works for local and cloud models alike.
- Add tests asserting both agents build with a PromptedOutputSchema.

* refactor(agents): centralize sequential-tool-execution policy and harden agent tests (#173)

Addresses code-review feedback on PR #174:

- Extract the duplicated Agent.parallel_tool_call_execution_mode("sequential")
  wrapping from chat() and stream_chat() into a _sequential_tool_execution()
  helper, so the issue-#172 execution-mode policy lives in one place.
- Replace test reliance on private internals. test_base.py no longer asserts
  agent._output_schema's class-name string; it now verifies PromptedOutput
  behaviorally via the public FunctionModel test double (no final_result
  output tool registered, plain-text JSON reply parsed into the schema).
- test_service.py drops the private _parallel_execution_mode_ctx_var import
  and asserts the public Agent.parallel_tool_call_execution_mode API instead.
- Add test_stream_chat_runs_tools_sequentially mirroring the chat() test so
  the streaming path is covered against issue #172 regressions.
…175) (#177)

Two coupled robustness fixes for the agent layer, both surfaced by a
capture_run_messages diagnostic. The changes share base.py and
experiment.py, so they land in one commit.

#175 — the experiment prompt named tools as run_backtest / list_runs /
compare_backtest_results, but the registered tools are tool_-prefixed
(tool_run_backtest, ...). Weaker models trusted the prompt and called
unknown tool names. TOOL_USAGE_INSTRUCTIONS and the EXPERIMENT_SYSTEM_PROMPT
workflow now use the exact registered names.

#176 — a tool raising a plain exception aborted the whole run (observed:
ValueError "No data found for store=..."). New recoverable() decorator
wraps every async DB-touching tool so an expected ValueError becomes a
ModelRetry the model can correct from; other exceptions still propagate.

- Add recoverable() to agents/base.py; decorate the 6 experiment tools
  and the 2 rag_assistant tools (tool_plain pure tools left alone).
- Tests: prompt names use tool_* ; recoverable converts ValueError to
  ModelRetry, passes other exceptions through, is transparent on success.
Absorb the v0.2.11 release commits and the Dependabot CI bumps (#168
setup-uv 8.1.0, #169 codeql-action 4.35.5) that landed on main.

Conflicts resolved (all add/add — dev added the #162/#163 config slice
that main lacked, plus the setup-uv pin):
- app/main.py — keep dev's config_router wiring + apply_overrides_on_startup
- docs/_base/API_CONTRACTS.md — keep dev's /config/* endpoint rows
- frontend/src/types/api.ts — keep dev's AI-model config types
- .github/workflows/e2e-nightly.yml — take main's setup-uv v8.1.0 pin

Clears the conflict on PR #178 (dev → main).
Absorb the release-please v0.2.12 version bump (.release-please-manifest.json,
pyproject.toml, CHANGELOG.md) so dev tracks main. Clean merge, no conflicts.
…#183) (#184)

* fix(agents): wire FallbackModel so a primary 503 retries the fallback (#183)

* test(agents): assert FallbackModel wiring order and primary fail-fast (#183)
* feat(api): expose agent session limits on GET /config/ai (#185)

* feat(ui): add knowledge and agent guide pages with nav (#185)

* test(ui): cover knowledge-utils pure helpers (#185)

* docs(docs): document the knowledge and agent guide pages (#185)
… charts, cross-filtering (PRP-20) (#188)

* feat(analytics): add GET /analytics/timeseries aggregated sales endpoint (#187)

* feat(dimensions): add sort_by/sort_order to store and product listings (#187)

* feat(ui): add explorer detail pages, sortable tables, and sales charts (#187)

* test(ui): cover the csv-export pure helper (#187)

* docs(docs): document the explorer interactivity extension (#187)
…mparison, verify, sorting (PRP-21) (#190)

* feat(registry): add sort_by/sort_order to model-run listing (#189)

* feat(jobs): add sort_by/sort_order to job listing (#189)

* test(registry,jobs): cover list-endpoint sorting (#189)

* feat(ui): add run/job detail and run-comparison pages (#189)

* feat(ui): make Runs and Jobs tables interactive (#189)

* docs(docs): document explorer runs/jobs interactivity (#189)
* feat: cut v0.2.13 — explorer interactivity, knowledge & guide pages (#191) (#192)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]`
  if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
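
The heuristic spec above can be sketched as a small pure function. This is a simplified reading of the spec, not the MarkdownGenerator code: the function name, the `window_days` parameter (length of the markdown window), treating any rise from zero as a refresh, and ignoring refreshes inside an active window are all assumptions.

```python
def age_days_fire_days(
    on_hand: list[float],
    age_days_threshold: int,
    min_units_remaining: float,
    window_days: int,
    spike_threshold: float = 0.3,
) -> list[int]:
    """Return the day indices on which the age_days trigger would fire."""
    fires: list[int] = []
    anchor = 0  # most recent refresh; day 0 stands in when none has been seen
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        rise = cur - prev
        # Refresh: relative jump of at least spike_threshold vs the prior day.
        if rise > 0 and (prev == 0 or rise / prev >= spike_threshold):
            anchor = t
        age = t - anchor
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(t)
            anchor = t + window_days  # day after the window [t, t + window_days - 1]
            t = anchor  # days inside the window are not evaluated
            continue
        t += 1
    return fires
```

The function draws no random numbers, which matches the determinism requirement stated in the rationale.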

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
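
The 72-day figure can be reproduced with a short calculation. The formula below is an assumption that happens to match the stated number (test folds of `horizon` days laid end to end after the minimum training window, with no gap); the function name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(
    n_splits: int, horizon: int, min_train_size: int
) -> int:
    """Smallest series length that fits every fold under the stated assumption."""
    return min_train_size + n_splits * horizon


required = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
# Inclusive day count of the preset window 2024-10-01..2024-12-31.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = available - required
```
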

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with an explicit 60 s
  timeout (the httpx default of 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
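
A hedged sketch of the StepError idea tested above: keep only the whitelisted RFC 7807 members and never echo the rest of the body. The constructor signature and message format are assumptions, not the script's actual API.

```python
from typing import Any


class StepError(Exception):
    """Error built from a problem+json body, exposing only safe fields."""

    def __init__(self, status_code: int, problem: dict[str, Any]) -> None:
        self.status_code = status_code
        # Only these three members are ever read; anything else in the
        # response body is dropped, so it cannot leak into logs or output.
        self.title = str(problem.get("title", "unknown error"))
        self.detail = str(problem.get("detail", ""))
        self.request_id = str(problem.get("request_id", "n/a"))
        super().__init__(
            f"HTTP {status_code} {self.title}: {self.detail} "
            f"(request_id={self.request_id})"
        )
```
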

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures (items 1-3) surfaced when first running the
integration test against docker-compose Postgres + a freshly booted
uvicorn; all three are now closed. Items 4 and 5 are smaller follow-up
adjustments:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. DEFAULT_TIMEOUT_S: bump from 60 s to 120 s because /seeder/generate
   for demo_minimal can take 60-90 s on slower laptops once the
   inventory, price, and promotion inserts are included.
5. step_seed detail string: read the 'sales' key (singular) from
   GenerateResult.records_created, not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
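
The pipe-buffer hang and its fix can be illustrated with a small sketch. It assumes nothing about the real test fixture; the helper name and timeout are hypothetical. With `stdout=subprocess.PIPE` and nobody reading, a chatty child blocks once the OS pipe buffer fills; a regular file has no such limit.

```python
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, str]:
    """Run cmd with stdout/stderr sent to a temp file instead of a pipe."""
    with tempfile.TemporaryFile(mode="w+b") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait(timeout=timeout_s)  # safe: no pipe to fill up
        log.seek(0)
        return returncode, log.read().decode(errors="replace")


if __name__ == "__main__":
    # The child writes well over 64 KB, which would stall on an unread pipe.
    code, output = run_with_log_file(
        [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
    )
    print(code, len(output))
```
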

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
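
A minimal sketch of single-flight enforcement with a module-level asyncio.Lock, under the assumption that a second caller is rejected immediately rather than queued. `DemoAlreadyRunning` and `run_single_flight` are hypothetical names; in the slice, the rejection would be mapped to the RFC 7807 409 response.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class DemoAlreadyRunning(Exception):
    """Hypothetical error a route handler would translate into a 409."""


_run_lock = asyncio.Lock()  # module-level: shared by every request


async def run_single_flight(pipeline: Callable[[], Awaitable[T]]) -> T:
    """Run pipeline unless another run currently holds the lock."""
    # There is no await between the check and the acquire, so on a single
    # event loop no other task can slip in between the two.
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of waiting keeps a second client from silently queueing behind a long-running pipeline.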

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

* feat(data): add demo_minimal scenario preset (#128)

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

* test(api): unit + integration coverage for run_demo (#128)

* ci(repo): nightly e2e demo workflow (#128)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Several real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all of them
are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
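
Item 2 above (copy the artifact under the registry root, record a registry-relative URI) might look roughly like the sketch below. `stage_artifact` and the per-run subdirectory layout are assumptions made for illustration; only the idea of copying into the registry's own root and hashing with sha256 comes from the commit text.

```python
import hashlib
import shutil
from pathlib import Path


def stage_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy src under registry_root; return (registry-relative URI, sha256 hex)."""
    # Hypothetical layout: one subdirectory per run id.
    dest_dir = registry_root / run_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # A relative URI resolves against the registry root, not the training dir.
    return dest.relative_to(registry_root).as_posix(), digest
```

Because the returned URI is relative to the registry root, a verify step that joins it onto that root finds the copied file rather than looking in the training output directory.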

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
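
The pipe-buffer hang is easy to reproduce in isolation. The sketch below is not the project's test code: it uses a stand-in child process that prints about 1 MB, and sends its stdout to a temp file so that nothing has to drain a pipe.

```python
import subprocess
import sys
import tempfile

# Stand-in for a chatty server: writes ~1 MB to stdout, far more than a
# typical 64 KB pipe buffer can hold.
NOISY_CHILD = "import sys; sys.stdout.write('x' * 1_000_000)"


def run_with_log_file(timeout_s: float = 30.0) -> tuple[int, int]:
    """Run the noisy child with stdout in a temp file; return (exit_code, bytes_logged)."""
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(
            [sys.executable, "-c", NOISY_CHILD],
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        # The child writes straight to the file, so it never blocks on a full pipe.
        exit_code = proc.wait(timeout=timeout_s)
        log.seek(0)
        return exit_code, len(log.read())
```

With `stdout=subprocess.PIPE` and no reader, the same `proc.wait()` would block once the buffer fills, which matches the hang described above.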

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
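
A minimal sketch of the single-flight guard, assuming the handler checks the lock before acquiring it. `run_demo_once` is a hypothetical stand-in that returns a bare status code instead of a real RFC 7807 response.

```python
import asyncio

_run_lock = asyncio.Lock()  # module-level: at most one demo run at a time


async def run_demo_once(work_s: float = 0.05) -> int:
    """Return 200 after a run, or 409 if another run already holds the lock."""
    # No await sits between the check and the acquire, so within one event
    # loop the two steps cannot be interleaved by another task.
    if _run_lock.locked():
        return 409
    async with _run_lock:
        await asyncio.sleep(work_s)  # stand-in for the pipeline steps
        return 200


async def main() -> list[int]:
    # Two concurrent callers: the first wins, the second is rejected.
    return await asyncio.gather(run_demo_once(), run_demo_once())
```

Rejecting the second caller instead of queueing it keeps two pipelines from resetting and seeding the same database back to back.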

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
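
The NaN/inf coercion is small enough to sketch in full. This is an illustration of the idea, not the body of `_finite()` itself; the `default` parameter is an assumption.

```python
import json
import math


def finite(value: float, default: float = 0.0) -> float:
    """Return value unless it is NaN or +/-inf, in which case return default."""
    return value if math.isfinite(value) else default


# Strict JSON has no NaN token, so an uncoerced NaN would fail to serialize here.
payload = json.dumps({"stability_index": finite(float("nan"))}, allow_nan=False)
```

The trade-off is that a coerced 0.0 is indistinguishable from a genuine zero, which is why logging missing or non-finite metrics separately is useful.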

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.
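
The query shape can be shown with the standard-library `sqlite3` module instead of SQLAlchemy. The table layout below is a toy assumption (in particular the `config_hash` column name); only the "order by created_at desc, LIMIT 1" lookup mirrors the fix.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_run (id INTEGER PRIMARY KEY, config_hash TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO model_run (config_hash, created_at) VALUES (?, ?)",
    [
        ("abc", "2026-05-01T10:00:00"),
        ("abc", "2026-05-02T10:00:00"),  # intentional duplicate, newer
        ("xyz", "2026-05-03T10:00:00"),
    ],
)


def find_duplicate(config_hash: str) -> Optional[int]:
    """Return the id of the newest run with this hash, or None if there is none."""
    row = conn.execute(
        "SELECT id FROM model_run WHERE config_hash = ? "
        "ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()
    return None if row is None else row[0]
```

Taking the first row of an ordered result never raises when several rows match, unlike an accessor that asserts at most one match.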

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(repo): back-merge main into dev after v0.2.11 (#160) (#161)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
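
One possible reading of the heuristic spec, written over day indices rather than dates. The function name, the `window_days` parameter, and the treatment of a rise from zero stock as a refresh are all assumptions for illustration; the actual generator may differ.

```python
def age_days_fire_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    window_days: int,  # hypothetical: length of one markdown window in days
    spike_threshold: float = 0.3,
) -> list[int]:
    """Return the day indices on which the age_days trigger would fire."""
    fires: list[int] = []
    anchor = 0  # index of the most recent refresh (day 0 if none seen yet)
    for t in range(1, len(on_hand)):
        if t < anchor:
            continue  # still inside the previous markdown window
        prev, cur = on_hand[t - 1], on_hand[t]
        if prev > 0:
            refreshed = (cur - prev) / prev >= spike_threshold
        else:
            refreshed = cur > 0  # assumption: restocking from zero is a refresh
        if refreshed:
            anchor = t
        age = t - anchor
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(t)
            anchor = t + window_days  # age clock restarts after the window ends
    return fires
```

The sketch draws no random numbers, so it is consistent with the determinism requirement stated in the rationale.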

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
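
The ">= 72 days" figure follows from simple arithmetic, sketched below under the assumption that the folds are non-overlapping test windows with no gap after the initial training window. The function name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(
    n_splits: int, horizon: int, min_train_size: int
) -> int:
    """Smallest series length that fits min_train_size plus n_splits test windows."""
    return min_train_size + n_splits * horizon


required = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
# Inclusive day count of the preset's date range.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = available - required
```

With these inputs `required` is 72 and `available` is 92, leaving a 20-day margin.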

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
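
Winner selection can be sketched as a NaN-aware minimum. This version filters NaN at the aggregated level, and the model names in the usage line are placeholders; neither detail is confirmed by the commit text.

```python
import math
from typing import Optional


def select_winner(wape_by_model: dict[str, float]) -> Optional[str]:
    """Return the model with the lowest non-NaN WAPE, or None if there is none."""
    candidates = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not candidates:
        return None
    return min(candidates, key=candidates.__getitem__)


winner = select_winner(
    {"naive": 0.31, "seasonal_naive": 0.22, "moving_average": float("nan")}
)
```

Filtering before calling `min` matters because comparisons against NaN are always false, so an unfiltered `min` can return a different winner depending on the order of the inputs.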

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key
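
The "format RFC 7807 without leaking the raw body" behaviour could be approximated as below. This is a sketch, not the script's `StepError`: the constructor signature and the `request_id` extension member are assumptions.

```python
from typing import Any


class StepError(Exception):
    """Error built from a problem+json body, keeping only whitelisted fields."""

    def __init__(self, status: int, problem: dict[str, Any]) -> None:
        self.status = status
        self.title = str(problem.get("title", "unknown error"))
        self.detail = str(problem.get("detail", ""))
        self.request_id = str(problem.get("request_id", "n/a"))
        # Only the fields above reach the message; other members are dropped.
        super().__init__(
            f"HTTP {status} {self.title}: {self.detail} "
            f"(request_id={self.request_id})"
        )


err = StepError(
    409,
    {
        "title": "Conflict",
        "detail": "demo already running",
        "request_id": "req-1",
        "debug": "secret-token",  # extra member that must not be echoed
    },
)
```

Building the message from named fields, rather than from the response text, means a body that happens to contain credentials cannot end up in logs.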

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Several real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all of them
are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
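
The provider-aware key check in item 3 might be pictured like this. Everything here is an assumption for illustration: the `provider:model` naming, the prefix-to-variable mapping, and the `env` argument (the script is described as reading through the settings object, not a raw mapping).

```python
from typing import Mapping

# Hypothetical mapping from model-name prefix to the variable holding its key.
_PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google-gla": "GEMINI_API_KEY",
}


def llm_key_present(default_model: str, env: Mapping[str, str]) -> bool:
    """True only if the key for the configured model's own provider is set."""
    provider = default_model.split(":", 1)[0]
    var = _PROVIDER_KEY_VARS.get(provider)
    return bool(var and env.get(var))


env = {"OPENAI_API_KEY": "sk-x", "ANTHROPIC_API_KEY": "sk-y"}
gemini_ready = llm_key_present("google-gla:gemini-2.0-flash", env)
```

Here `gemini_ready` is False even though two other provider keys are set, which is the case that used to fail hard at chat time and would now be soft-skipped.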

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
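
The 72-day minimum can be reproduced with a small calculation. This is a sketch that assumes non-overlapping test folds with no gap between train and test; the helper name is hypothetical:

```python
from datetime import date


def min_days_for_expanding_backtest(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits every fold (assumes no gap, no overlap)."""
    return min_train_size + n_splits * horizon


# Window length of the preset, counting both endpoints.
window_days = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
required_days = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
margin_days = window_days - required_days
```

With the numbers above, `required_days` is 72 and `window_days` is 92, leaving a 20-day margin.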

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthr…
…st pages (PRP-22) (#196)

* feat(analytics): add inventory-status endpoint (#195)

* test(analytics): cover inventory-status endpoint (#195)

* feat(ui): add demand-planner data layer — types, hook, demand-utils (#195)

* feat(ui): add Visualize Demand Planner page (#195)

* feat(ui): add prediction-interval band to TimeSeriesChart (#195)

* feat(ui): make Forecast and Backtest pages interactive (#195)

* docs(docs): document Visualize demand planner + interactivity (#195)

@sourcery-ai sourcery-ai Bot left a comment

Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

@coderabbitai

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Implements new analytics endpoints and sorting, dynamic seeding, agent model fallback, and extensive frontend work: interactive Explorer details, Demand Planner, URL-driven tables, CSV export, and Knowledge and Guide pages. Updates docs/PRPs, bumps the version to 0.2.13, and adds comprehensive tests.

Changes

Release bump and documentation suite

| Layer | File(s) | Summary |
| --- | --- | --- |
| Release metadata bump | `.release-please-manifest.json`, `pyproject.toml`, `CHANGELOG.md` | Version updated to 0.2.13 with changelog entry. |
| AGENTS and CLAUDE operational guides | `AGENTS.md`, `CLAUDE.md` | Adds universal agent brief; simplifies CLAUDE to index referencing AGENTS and rules. |
| PRP specifications for features | `PRPs/PRP-19*`, `PRPs/PRP-20*`, `PRPs/PRP-21*`, `PRPs/PRP-22*` | Adds detailed planning docs for Knowledge/Guide, Explorer interactivity, Runs/Jobs interactivity, and Demand Planner. |
| README, ADRs, base docs, optional features, and LLM index | `README.md`, `docs/**/*`, `llms.txt` | Updates features/API contracts, fixes links, adds optional-features docs, and LLMs index. |

Agent model fallback wiring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Fallback builder and imports | `app/features/agents/agents/base.py` | Adds `build_agent_model_with_fallback`. |
| Apply fallback to agents | `app/features/agents/agents/*` | Uses fallback in experiment and RAG agents. |
| Unit tests for fallback behavior | `app/features/agents/tests/test_base.py` | Covers wrapping, error, and degeneration paths. |

Analytics: timeseries and inventory-status

| Layer | File(s) | Summary |
| --- | --- | --- |
| Routes and request surface | `app/features/analytics/routes.py` | Adds GET `/analytics/timeseries` and `/inventory-status`. |
| Pydantic response models | `app/features/analytics/schemas.py` | Introduces timeseries and inventory models. |
| Service implementations | `app/features/analytics/service.py` | Implements compute methods for new endpoints. |
| Integration fixtures and route tests | `app/features/analytics/tests/*` | Async fixtures and endpoint coverage. |
| Schema unit tests | `app/features/analytics/tests/test_schemas.py` | Validates model construction and constraints. |

Dimensions sorting (stores/products)

| Layer | File(s) | Summary |
| --- | --- | --- |
| Routes query params and service ordering | `app/features/dimensions/*` | Adds allow-listed sorting in routes/services. |
| Fixtures and sort behavior tests | `app/features/dimensions/tests/*` | Integration tests verifying ordering and fallbacks. |

Jobs and Run Registry sorting + tests

| Layer | File(s) | Summary |
| --- | --- | --- |
| Jobs sorting surface and service | `app/features/jobs/*` | Adds allow-listed sorting and applies in service. |
| Jobs API integration tests and fixtures | `app/features/jobs/tests/*` | Covers listing, sorting, invalid inputs, and get-by-id. |
| Registry sorting surface, service, and tests | `app/features/registry/*` | Adds sorting to runs and verifies behavior. |

Dynamic seeding windows and demo pipeline

| Layer | File(s) | Summary |
| --- | --- | --- |
| Seeder config defaults and scenarios | `app/shared/seeder/config.py` | Adds today-anchored defaults and updates scenarios. |
| Seeder feature schemas/service and tests | `app/features/seeder/*` | Switches defaults; adjusts scenarios and tests. |
| Demo pipeline/script updates and showcase tests | `app/features/demo/pipeline.py`, `scripts/*`, `tests/*` | Computes rolling windows and updates tests accordingly. |

Frontend: routes, components, hooks, utils, and pages

| Layer | File(s) | Summary |
| --- | --- | --- |
| Routing and navigation constants | `frontend/src/App.tsx`, `frontend/src/lib/constants.ts` | Adds detail/compare/knowledge/guide/demand routes and nav. |
| Charts, data-table, and common components | `frontend/src/components/**/*` | Adds revenue chart, interval band, view options, row click, and JsonBlock. |
| Hooks for analytics, lifecycle, inventory, rag, and lists | `frontend/src/hooks/**/*` | Adds/extends hooks (timeseries, lifecycle, inventory, sorting, retrieve, verify). |
| CSV/knowledge/demand utils and API types | `frontend/src/lib/*`, `frontend/src/types/api.ts` | Adds CSV export, demand/knowledge utils with tests; expands API types. |
| Explorer lists, details, and comparison | `frontend/src/pages/explorer/*` | URL-driven tables with CSV; adds store/product details, run detail/compare, job detail. |
| Visualize: Forecast, Backtest, and Demand Planner | `frontend/src/pages/visualize/*` | Enables in-page jobs, CSV, interval bands; adds Demand Planner page. |
| Knowledge and Guide pages; Chat CTA | `frontend/src/pages/*` | Adds Knowledge and Guide pages; chat screen links to Guide. |

Sequence Diagram(s)

sequenceDiagram
  participant Browser
  participant FastAPI as AnalyticsRoutes
  participant AnalyticsService
  participant Postgres
  Browser->>FastAPI: GET /analytics/inventory-status?store_id&product_id
  FastAPI->>AnalyticsService: compute_inventory_status(filters)
  AnalyticsService->>Postgres: SELECT DISTINCT ON (store_id, product_id) latest snapshots
  Postgres-->>AnalyticsService: rows
  AnalyticsService-->>FastAPI: InventoryStatusResponse
  FastAPI-->>Browser: 200 JSON

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

autorelease: pending

Suggested reviewers

  • w7-learn
  • w7-l7ab

Poem

A rabbit taps the keys with cheer,
New charts and routes now all appear.
Timeseries sings, inventories chime,
Agents fail—fallbacks keep time.
Seeds shift with sun, not yester-year—
CSVs for all to hear.
Guide and Knowledge—hop right here! 🐇✨

…#202) (#203)

* feat: cut v0.2.13 — explorer interactivity, knowledge & guide pages (#191) (#192)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining`, so an empty shelf is never marked down.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
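
The client-side `artifact_hash` mentioned above could be computed along these lines. This is a sketch with a hypothetical helper name; it only works because the script and the API see the same filesystem:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is a 64-character lowercase hex string.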

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn (items 1-3
below); all three are now closed, along with two smaller adjustments
(items 4-5):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.
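
Item 2 (copying the artifact into the registry root and recording a registry-relative URI) can be sketched as below. The per-run subdirectory layout and the function name are assumptions; only the two-roots problem comes from the description above:

```python
import shutil
from pathlib import Path


def copy_into_registry(artifact: Path, registry_root: Path, run_id: str) -> str:
    """Copy an artifact under the registry root and return its root-relative URI."""
    dest_dir = registry_root / run_id  # hypothetical layout: one folder per run
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / artifact.name
    shutil.copy2(artifact, dest)  # copy2 also preserves file metadata
    # A relative URI lets the verify endpoint resolve it against its own root.
    return dest.relative_to(registry_root).as_posix()
```

Recording an absolute path from the training directory is what produced the 404 described above, since verify resolves the URI against a different root.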

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
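
The pipe-buffer hang and its fix can be reproduced in isolation. This sketch uses a hypothetical helper and an anonymous temp file that is always deleted, unlike the test, which keeps the log on failure:

```python
import subprocess
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, str]:
    """Run a child process with stdout/stderr redirected to a temp file.

    With stdout=subprocess.PIPE and nobody reading, a chatty child blocks once
    the OS pipe buffer fills; a regular file has no such limit.
    """
    with tempfile.TemporaryFile(mode="w+b") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait(timeout=timeout_s)
        log.seek(0)  # the child shares the file offset, so rewind before reading
        return returncode, log.read().decode(errors="replace")
```

A child that writes well over 64 KB completes normally here, whereas the same `wait()` against an unread pipe would stall until the timeout.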

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
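
The single-flight guard can be sketched with the standard library alone. The exception and function names are hypothetical, and the real slice maps the conflict to an RFC 7807 409 response rather than raising a bare exception:

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class DemoAlreadyRunning(Exception):
    """Raised when a run is requested while another is in flight (409 in HTTP terms)."""


_run_lock = asyncio.Lock()  # module-level: shared by every request in the process


async def run_single_flight(pipeline: Callable[[], Awaitable[T]]) -> T:
    """Run the pipeline unless one is already running; never queue behind it."""
    # No await occurs between this check and the acquire below, so within one
    # event loop another task cannot slip in between the two.
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of waiting matters here: a second caller gets an immediate conflict rather than silently triggering a second reset-and-seed cycle once the first run finishes.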

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]`
  if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
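
A minimal sketch of the heuristic above, not the MarkdownGenerator code itself. The function name, the `markdown_window_days` parameter, and the reading of the spike threshold as a relative jump over the prior day are assumptions for illustration.

```python
from typing import List, Sequence

# Assumed meaning of _AGE_DAYS_SPIKE_THRESHOLD: a >= 30% rise vs the prior day.
SPIKE_THRESHOLD = 0.3


def age_trigger_fire_days(
    on_hand: Sequence[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,  # hypothetical: length of one markdown window
) -> List[int]:
    """Return the day indices on which the age trigger would fire."""
    fires: List[int] = []
    anchor = 0  # last refresh; dates[0] until a refresh is observed
    for t in range(len(on_hand)):
        if t < anchor:
            # Still inside a markdown window; the age clock restarts at `anchor`.
            continue
        if t > 0:
            prev = on_hand[t - 1]
            # A refresh is a rise of at least SPIKE_THRESHOLD relative to prev.
            if on_hand[t] > prev and on_hand[t] - prev >= SPIKE_THRESHOLD * prev:
                anchor = t
        age = t - anchor
        # Never mark down an (almost) empty shelf.
        if age >= age_days_threshold and on_hand[t] >= min_units_remaining:
            fires.append(t)
            # Reset to the day after the markdown window ends.
            anchor = t + markdown_window_days
    return fires
```

No random numbers are drawn, which matches the deterministic, zero-rng-draw property the commit calls out.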

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
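
The 72-day figure can be reproduced with a short calculation. This is a sketch that assumes non-overlapping test folds with no gap between train and test; the helper name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(
    n_splits: int, horizon: int, min_train_size: int
) -> int:
    # Assumption: the first fold trains on min_train_size days and each
    # of the n_splits folds consumes `horizon` fresh test days.
    return min_train_size + n_splits * horizon


# Inclusive day count for the demo_minimal window.
available_days = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
required_days = min_days_for_expanding_backtest(
    n_splits=3, horizon=14, min_train_size=30
)
margin_days = available_days - required_days
```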

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
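
Winner selection as described in the last bullet could look roughly like this. It is a sketch, not the script's `_select_winner`: the function name, the input shape (model type mapped to aggregated WAPE), and the model names in the usage note are assumptions.

```python
import math
from typing import Dict, Optional


def select_winner(wape_by_model: Dict[str, float]) -> Optional[str]:
    """Pick the model with the lowest aggregated WAPE, ignoring NaN entries."""
    finite = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not finite:
        # Empty input or all-NaN: there is no winner to register.
        return None
    return min(finite, key=lambda m: finite[m])
```

For example, `select_winner({"naive": 0.31, "seasonal_naive": float("nan"), "moving_average": 0.27})` picks `"moving_average"`.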

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bump it to 7 and add the demo_minimal name
membership check, matching the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

The following issues surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all are now
closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
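
The pipe-buffer hang and its fix can be sketched in isolation. A child process that writes more than the pipe buffer to an unread `subprocess.PIPE` blocks on write; sending its output to a file avoids that. The helper name below is hypothetical, and the child is a stand-in for uvicorn.

```python
import subprocess
import sys
import tempfile
from typing import List, Tuple


def run_with_log_file(cmd: List[str], timeout_s: float = 60.0) -> Tuple[int, bytes]:
    """Run cmd with stdout/stderr sent to a temp file instead of a pipe."""
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        # wait() is safe here: the child writes to a file, so it can never
        # block on a full pipe that nobody is draining.
        returncode = proc.wait(timeout=timeout_s)
        log.seek(0)
        return returncode, log.read()


# Stand-in for a chatty server: writes well over 64 KB to stdout.
noisy_child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
```

Calling `proc.wait()` on the same child with `stdout=subprocess.PIPE` and no reader is the pattern that can hang once the buffer fills.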

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
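
A minimal sketch of single-flight behind a module-level asyncio.Lock, assuming a second caller is rejected immediately rather than queued. The exception class and function names are hypothetical; the real slice maps the rejection to an RFC 7807 409 response.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

_run_lock = asyncio.Lock()  # module-level: one demo run per process


class DemoBusyError(Exception):
    """Raised when a run is already in flight (would map to HTTP 409)."""

    def to_problem(self) -> Dict[str, Any]:
        # Shape of an RFC 7807 problem document; field values are illustrative.
        return {
            "type": "about:blank",
            "title": "Conflict",
            "status": 409,
            "detail": "A demo run is already in progress.",
        }


async def run_single_flight(pipeline: Callable[[], Awaitable[Any]]) -> Any:
    # No await sits between the check and the acquire, so on a single
    # event loop two callers cannot both pass the check.
    if _run_lock.locked():
        raise DemoBusyError()
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of awaiting the lock keeps a second request from silently queuing a second full pipeline run.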

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)

* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)

* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)

* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)

* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and
Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win
the cascade, so at runtime --chart-N is a full colour. The chart components
still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid
CSS, so recharts fell back to a black fill/stroke — invisible on the dark
theme.

Reference var(--chart-N) directly in backtest-folds-chart.tsx and
time-series-chart.tsx. Verified in a browser: the backtest per-fold bars
and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest — which computes
per-fold metrics, stability indices and a naive/seasonal baseline comparison
— but stored only four aggregated values and discarded the rest. The
dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean,
stability_index}, fold_metrics[] and baseline_comparison, so it showed
"0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the
contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0
so the result stays JSONB-safe (stability is NaN with fewer than two folds).
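
The NaN coercion can be illustrated with a few lines. This is a sketch of the idea only; the helper name is an assumption and the real `_finite()` may differ in signature.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Coerce NaN and +/-inf to 0.0 so the value serializes as strict JSON."""
    return float(value) if math.isfinite(value) else 0.0


# With a single fold the stability index is NaN, which strict JSON rejects.
raw = {"wape_mean": 0.27, "stability_index": float("nan")}
safe = {key: finite_or_zero(val) for key, val in raw.items()}
payload = json.dumps(safe, allow_nan=False)  # raises ValueError on NaN/inf
```

The trade-off is that a coerced 0.0 is indistinguishable from a real zero in the stored result.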

Add app/features/jobs/tests/test_service.py with unit coverage for the
shaping logic: fold metrics, *_mean keys, stability, baseline comparison,
the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline
  stability metric into _STABILITY_METRIC, so the hardcoded keys live in one
  documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the
  backtest response, so a future rename in the backtesting service fails loud
  instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged,
  and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed
predict job. It read job.result.predictions with field `predicted`, but
POST /jobs (job_type="predict") returns job.result.forecasts with field
`forecast`. forecastData was therefore always undefined and the page fell
through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast"
to TimeSeriesChart (which already supports a configurable data key).

Verified in a browser: entering a completed predict job ID now renders the
14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are
created intentionally, so multiple non-archived model_run rows can share one
config hash. _find_duplicate used scalar_one_or_none(), which raised
MultipleResultsFound once two duplicates existed — POST /registry/runs then
returned HTTP 500. This made the demo/Showcase register step fail
deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it
returns the most recent matching run instead of asserting a single match.

Add an integration regression test that POSTs an identical run three times
under the detect policy and asserts all three return 201.
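
The lookup change can be sketched with the standard-library sqlite3 module standing in for SQLAlchemy and Postgres. The `config_hash` and `status` column names and the function name are assumptions; only the "newest match, LIMIT 1" shape comes from the commit.

```python
import sqlite3
from typing import Optional


def find_duplicate(conn: sqlite3.Connection, config_hash: str) -> Optional[int]:
    """Return the id of the most recent non-archived run with this hash."""
    row = conn.execute(
        "SELECT id FROM model_run "
        "WHERE config_hash = ? AND status != 'archived' "
        "ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()
    # fetchone() tolerates any number of matches, unlike a lookup that
    # asserts at most one row exists.
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_run (id INTEGER PRIMARY KEY, config_hash TEXT, "
    "status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO model_run (config_hash, status, created_at) VALUES (?, ?, ?)",
    [
        ("abc", "success", "2026-01-01T00:00:00"),
        ("abc", "success", "2026-01-02T00:00:00"),
        ("abc", "success", "2026-01-03T00:00:00"),
    ],
)
```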

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] /
[predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS
variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)"
and stroke="var(--color-predicted)". The forecast page passes
predictedKey="forecast", so the injected variable is --color-forecast;
var(--color-predicted) was undefined, the stroke was invalid, and SVG fell
back to its initial value `none` — the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} /
stroke={`var(--color-${predictedKey})`}.

Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so
users had to already know the ID. Add a JobPicker component: a dropdown of
completed jobs of the relevant type (predict / backtest), newest first, with
each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both
  forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows
  immediately without interaction.

No backend change — GET /jobs?job_type=&status=completed already exists.
Verified in a browser on both pages.

* chore(repo): back-merge main into dev after v0.2.11 (#160) (#161)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset — the
route-level assertion in TestListScenarios.test_returns_scenarios still
expected 6 scenarios; bumping to 7 and adding the demo_minimal name
membership check to match the service-layer + config-layer tests
already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test
against docker-compose Postgres + a freshly booted uvicorn; all three
are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from
   /dimensions/stores + /dimensions/products instead of hardcoding 1.
   Postgres auto-increment does NOT reset after delete, so the freshly
   seeded IDs are NOT 1 (they were ~150-260 on this branch after a few
   delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's
   own root (settings.registry_artifact_root) and record a registry-
   relative URI. The registry verify endpoint resolves artifact_uri
   against its own root, which is separate from where /forecasting/train
   writes (settings.forecast_model_artifacts_dir). Pre-fix, verify
   returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure
   (invalid key, model unavailable, 5xx), and make _llm_key_present
   provider-aware so it matches the right env var to the configured
   agent_default_model. Pre-fix, an .env with anthropic/openai keys but
   a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate
   for demo_minimal can spend 60-90 s on slower laptops once you
   include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales'
   (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE.
  The seeder + structlog produce enough INFO log volume to fill a
  64-KB pipe buffer; once full, uvicorn blocks on write and seeder
  requests hang for the full --timeout. Verified locally: integration
  suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed
  (postmortem-friendly).
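
A minimal sketch of the redirect described above, assuming a generic child process rather than the actual uvicorn fixture; `run_with_log_file` is a hypothetical helper name. Because the child writes to a file, nothing has to drain a pipe and the child cannot block on a full buffer.

```python
import subprocess
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float) -> tuple[int, str]:
    """Run cmd with stdout/stderr sent to a temp file instead of a pipe.

    Returns (exit code, log path). The log is kept on disk so the caller
    can decide whether to delete it or keep it for a postmortem.
    """
    with tempfile.NamedTemporaryFile("w+", suffix=".log", delete=False) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait(timeout=timeout_s)
    return returncode, log.name
```

With `stdout=subprocess.PIPE` and no reader, the same `proc.wait()` call can hang once the child has written more than the pipe buffer holds.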

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s
  default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s,
exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib  1.6.6  -> 1.7.2  (clears GHSA-wvwj-cvrp-7pv5 — JWS signature
  verification bypass; patched at >= 1.6.9)
- fastmcp  2.14.4 -> 3.2.4  (clears GHSA-vv7q-7jx5-f767 — OpenAPI
  Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are
pre-existing on dev (not introduced by #128).

Wider scope — read before merging:
`uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers
a full re-resolve of the dependency graph. Because dev's uv.lock had
drifted from pyproject.toml (the project's constraint envelope had
loosened over time), this single command also brings the lockfile in
sync with current pyproject.toml. Net diff: 243 insertions / 369
deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic    0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio   1.20.0 -> 1.27.2
- alembic      1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus
  exporter, pydocket, redis, rsa, sortedcontainers — these were
  transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev  -> green
- ruff check .         -> clean
- mypy --strict app/   -> 192 files clean
- pyright app/         -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/`
package whose RECORD files don't always materialize on first install
when uv replaces an older `griffe` dist in the same sync. A clean
venv install (which CI does via `uv sync --frozen`) is unaffected;
local devs who upgrade in place may need a one-shot
`uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It
drives the published API surface in-process via httpx.ASGITransport (no
cross-slice imports, satisfying the vertical-slice rule) and streams one
StepEvent per pipeline step: precheck -> reset -> seed -> status -> features
-> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup.

A module-level asyncio.Lock enforces single-flight; concurrent runs get an
RFC 7807 409. The orchestration is a faithful in-process port of
scripts/run_demo.py (PR #129). Implements PRP-17.
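
The single-flight guard can be sketched with plain asyncio. This is an illustration under assumptions, not the slice's code: `DemoAlreadyRunning` and `run_single_flight` are hypothetical names, and the mapping of the exception to an RFC 7807 409 is left to the route layer.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class DemoAlreadyRunning(Exception):
    """Hypothetical error that a route layer could map to a 409 response."""


# Module-level lock: one per process, shared by every caller.
_run_lock = asyncio.Lock()


async def run_single_flight(pipeline: Callable[[], Awaitable[T]]) -> T:
    """Run pipeline() unless another run currently holds the lock."""
    # No await happens between this check and the uncontended acquire below,
    # so two callers on the same event loop cannot both pass the check.
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of queueing means a second caller gets an immediate answer rather than waiting behind a multi-step pipeline.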

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing,
winner selection, and fail-fast; route tests cover POST /demo/run (200 +
409) and the WS /demo/stream handler. The integration test seeds
demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to
/demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders
the 11 pipeline steps as live status cards: glyph, detail, duration, the
backtest per-model WAPE breakdown with the winner highlighted, and a
pass/fail summary banner.

Also block-scopes a pre-existing no-case-declarations lint error in
chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a
test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the
pure event reducer (idle -> running -> pass transitions, summary assembly,
error phase) and a renderHook smoke test.

The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented
fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the
/demo/run + /demo/stream rows and a WebSocket Events section in
API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and
REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema
column, no Alembic migration, no FIFO cohort tracking — the trigger
self-reads the existing per-(store, product) on_hand_qty series and
fires when inventory has been "unrefreshed" past
`cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to
  `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO
  cohort tracking in `InventorySnapshotGenerator`. No downstream
  consumer reads this column today — adding it for one generator
  trigger violates the "don't design for hypothetical future
  requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator,
  deterministic (preserves the zero-rng-draw regression invariant),
  and additive (no migration, no model change). 354 LOC net, all
  inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >=
  `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no
  refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >=
  `markdown_min_units_remaining` — never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown
  window ends, so back-to-back fires can't happen and the next age
  clock starts from a "clear shelf" baseline.
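
The heuristic spec above can be sketched over a plain list of daily on-hand quantities. This is an illustrative reading of the four rules, not the MarkdownGenerator implementation: the function name, the `markdown_window_days` parameter, and the choice to treat a rise from zero as a refresh are assumptions.

```python
def markdown_fire_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,
    spike_threshold: float = 0.3,
) -> list[int]:
    """Return the day indices on which the age trigger would fire."""
    fires: list[int] = []
    anchor = 0  # index of the most recent refresh; day 0 until one is seen
    for t, qty in enumerate(on_hand):
        if t > 0:
            prev = on_hand[t - 1]
            # Refresh: quantity rose by at least spike_threshold vs the prior day.
            if qty > prev and (qty - prev) >= spike_threshold * prev:
                # max() keeps a post-fire anchor that lies in the future.
                anchor = max(anchor, t)
        if t < anchor:
            continue  # still inside a markdown window opened by an earlier fire
        age = t - anchor
        if age >= age_days_threshold and qty >= min_units_remaining:
            fires.append(t)
            # Restart the age clock on the day after the markdown window ends.
            anchor = t + markdown_window_days
    return fires
```

The function draws no random numbers, which is the property the zero-rng-draw invariant depends on.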

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg
`inventory_records: list[dict[str, Any]] | None = None` which `core.py`
passes through from `InventorySnapshotGenerator`. Disabled-path and
non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete
`NotImplementedError` test. Coverage: no-records defensive, threshold
not-met, threshold met, spike resets age, post-fire reset avoids
back-to-back, low-inventory skip, unknown-product skip, rng
non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked
in #128. Implementation commits follow on this branch.

- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics,
  open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6
  commits, additive only — no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores ×
ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy
an expanding backtest with n_splits=3, horizon=14, min_train_size=30
(needs >= 72 days, 92 leaves margin), small enough to keep the demo
loop comfortable on a laptop.
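
The 72-day figure can be checked with a few lines of arithmetic. The sketch assumes an expanding window with consecutive, non-overlapping test folds and no gap between train and test; the helper name is hypothetical.

```python
from datetime import date


def min_days_for_expanding_backtest(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits every fold (assumes no gap between folds)."""
    return min_train_size + n_splits * horizon


# Inclusive day count of the preset's date range.
window_days = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
required = min_days_for_expanding_backtest(n_splits=3, horizon=14, min_train_size=30)
print(window_days, required, window_days - required)  # 92 72 20
```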

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10,
modest promotion + stockout probabilities) so backtest WAPE stays
non-NaN across all three baseline models.

- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface
(precheck -> reset? -> seed -> status -> features -> train x3 ->
backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the
shape of scripts/seed_random.py and scripts/check_db.py.

- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s
  timeout (default 5 s is too short for /seeder/generate); surfaces
  RFC 7807 problem+json bodies as a typed StepError that echoes
  title / detail / request_id (never the raw body — secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by
  default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail
  outcomes; precheck failure exits 2, any other failure exits 1,
  green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor
  ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings()
  to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success
  transition and the wire alias "model_config" (not "model_config_data");
  artifact_hash is computed client-side via sha256 since we share the
  FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.
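
Winner selection as described in the last bullet can be sketched like this. The aggregation (mean over non-NaN folds) and the model names in the test data are assumptions; the script's `_select_winner` may aggregate differently.

```python
from __future__ import annotations

import math


def select_winner(fold_wapes: dict[str, list[float]]) -> str | None:
    """Return the model with the lowest mean WAPE, ignoring NaN folds.

    Returns None when there are no models or every fold is NaN.
    """
    best_name: str | None = None
    best_score = math.inf
    for name, folds in fold_wapes.items():
        valid = [w for w in folds if not math.isnan(w)]
        if not valid:
            continue  # a model with only NaN folds cannot win
        score = sum(valid) / len(valid)
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

NaN has to be filtered explicitly: comparisons against NaN are always false, so a plain `min()` over raw scores would return an order-dependent result.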

Also adds scripts/__init__.py so tests can `import scripts.run_demo`
without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end
demo with one command. Recipes mirror the three modes the script
supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo        — docker compose up -d + alembic + run_demo
- demo-quick  — run_demo --skip-seed (no compose/migration touch)
- demo-clean  — full reset (--reset) before seeding
- help        — default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions.
Preconditions (Postgres on :5433, uvicorn on :8123) documented in
the help block; the script itself enforces them via the precheck
step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects
  unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure
  / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking
  the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates;
  features sends cutoff_date as ISO; train fires three model_types in
  parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 +
  canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid
  PATH-dependent exec at test time
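
The terminate/kill fallback used on teardown follows a common pattern, sketched here with a hypothetical `stop_process` helper and an assumed grace period.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Ask the process to exit, then force-kill it if it ignores the request."""
    proc.terminate()  # polite request first
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # the process did not exit within the grace period
        return proc.wait()
```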

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py
against a fresh Postgres+pgvector service every night at 07:00 UTC
(plus on-demand via workflow_dispatch). Catches regressions in the
documented end-to-end pipeline before they bleed into the per-PR
gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally
NOT a required status check on dev or main. Flake-budget lives in
the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent — agent step auto-skips via
  ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so
  postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)
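
The "/health polled with a 30 s deadline" step can be sketched as a deadline loop. The workflow itself may well do this in shell; the Python below only illustrates the logic, and `wait_until_healthy` with its injectable `probe` callable is a hypothetical helper.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Call probe() until it returns True or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if probe():
            return True
        if time.monotonic() >= end:
            return False  # caller should fail the job and upload logs
        time.sleep(interval_s)
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments on the runner.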

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl
  /health verification; shows the canonical final-line summary so
  reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section
  documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common
  Incidents entry with a 7-point diagnosis flow keyed to the script's
  step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows
  added to the Document Index table.

Purely additive; no existing content removed or renamed.


@coderabbitai coderabbitai Bot left a comment

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
frontend/src/components/data-table/data-table.tsx (1)

101-107: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use visible columns for skeleton and empty state to prevent misalignment with column visibility.

When enableColumnVisibility is true, hiding columns causes skeleton rows and the empty state to span more cells than visible headers, creating visual misalignment. Use table.getVisibleLeafColumns() for both the skeleton loading rows (lines 101-107) and the empty-state colSpan (line 131) to match the actual visible columns, consistent with how row data uses row.getVisibleCells().

Proposed fix
+  const visibleColumns = table.getVisibleLeafColumns()
...
               Array.from({ length: pagination.pageSize }).map((_, i) => (
                 <TableRow key={i}>
-                  {columns.map((_, j) => (
+                  {visibleColumns.map((_, j) => (
                     <TableCell key={j}>
                       <Skeleton className="h-4 w-full" />
                     </TableCell>
                   ))}
                 </TableRow>
               ))
...
-                  colSpan={columns.length}
+                  colSpan={visibleColumns.length}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/data-table/data-table.tsx` around lines 101 - 107,
The skeleton rows and empty-state colSpan currently iterate over the full
columns array causing misalignment when column visibility is enabled; update the
skeleton map and the empty-state colspan to use table.getVisibleLeafColumns()
(instead of columns) and size it to table.getVisibleLeafColumns().length so the
Skeleton TableCell rendering (inside TableRow) and the empty-state colSpan match
the actual visible columns just like row.getVisibleCells() does.
frontend/src/pages/explorer/jobs.tsx (1)

66-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add error handling to the handleCancelJob mutation call.

The mutateAsync call can reject when the API request fails, and this rejection is unhandled. This will result in an unhandled promise rejection warning in the browser console. The useCancelJob hook has no error handler, and the onClick event doesn't suppress or catch the rejection.

Suggested fix
 const handleCancelJob = async (jobId: string) => {
-  await cancelJob.mutateAsync(jobId)
+  try {
+    await cancelJob.mutateAsync(jobId)
+  } catch (error) {
+    console.error('Failed to cancel job:', error)
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/explorer/jobs.tsx` around lines 66 - 68, The
handleCancelJob function currently awaits cancelJob.mutateAsync(jobId) without
handling rejections; wrap the mutation call in a try/catch (inside
handleCancelJob) to catch errors from cancelJob.mutateAsync and surface them
(e.g., show a user notification or call cancelJob.onError) and optionally
rethrow or suppress as appropriate; reference the handleCancelJob function and
the cancelJob (useCancelJob) hook to locate where to add the try/catch and error
handling logic.
🟠 Major comments (21)
.release-please-manifest.json-2-2 (1)

2-2: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release target version is behind the stated PR objective.

Line 2 sets the manifest to 0.2.13, but this PR is scoped as the 0.2.14 release cut. Please bump this to 0.2.14 to keep release-please state aligned.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.release-please-manifest.json at line 2, The release manifest currently pins
the release target to ".": "0.2.13" but the PR is intended to cut 0.2.14; update
the manifest entry by changing the version string from "0.2.13" to "0.2.14" so
the .release-please-manifest.json matches the PR objective.
CHANGELOG.md-3-9 (1)

3-9: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Changelog version header does not match this release PR target.

Lines 3-9 document 0.2.13, while the release objective is 0.2.14. Please update this section/version link to 0.2.14 so release notes map to the correct tag.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` around lines 3 - 9, Update the release header and comparison
link from 0.2.13 to 0.2.14: change the "##
[0.2.13](...compare/v0.2.12...v0.2.13) (2026-05-18)" line to reference "0.2.14"
and the correct compare range (e.g., ...v0.2.13...v0.2.14) so the section title
and URL match the intended release; locate the header string "##
[0.2.13](https://github.com/w7-mgfcode/ForecastLabAI/compare/v0.2.12...v0.2.13)"
and update the version numbers and compare target accordingly.
app/features/analytics/routes.py-333-335 (1)

333-335: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use RFC 7807 error responses on the new time-series route path.

get_timeseries currently inherits raw HTTPException string errors from validate_date_range, so invalid date ranges on this new endpoint won’t return the standard application/problem+json shape.

As per coding guidelines app/**/*.py: “Use RFC 7807 application/problem+json for error responses via app/core/problem_details.py. Never use bare HTTPException with raw strings or ad-hoc error shapes.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/analytics/routes.py` around lines 333 - 335, get_timeseries is
calling validate_date_range which raises raw HTTPException strings; catch
validation errors and convert them into RFC 7807 problem responses using the
helper in app/core/problem_details.py so the endpoint returns
application/problem+json. Specifically, wrap the call to validate_date_range
inside a try/except in the get_timeseries handler (or update validate_date_range
to raise the problem helper), and when catching the validation error
create/raise the RFC 7807 problem (using the factory in
app/core/problem_details.py) with appropriate status and detail instead of
propagating the raw HTTPException string.
app/features/analytics/tests/conftest.py-128-136 (1)

128-136: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Scope teardown deletes to test-owned rows only.

Lines 128 and 129 currently delete all InventorySnapshotDaily and SalesDaily rows, and Lines 133–135 remove a shared calendar range. This can wipe non-test data and make integration tests order-dependent.

💡 Suggested direction
-from sqlalchemy import delete
+from sqlalchemy import delete, select
...
-            await session.execute(delete(InventorySnapshotDaily))
-            await session.execute(delete(SalesDaily))
+            test_store_ids = select(Store.id).where(Store.code.like("TEST-%"))
+            test_product_ids = select(Product.id).where(Product.sku.like("TEST-%"))
+
+            await session.execute(
+                delete(InventorySnapshotDaily).where(
+                    InventorySnapshotDaily.store_id.in_(test_store_ids),
+                    InventorySnapshotDaily.product_id.in_(test_product_ids),
+                )
+            )
+            await session.execute(
+                delete(SalesDaily).where(
+                    SalesDaily.store_id.in_(test_store_ids),
+                    SalesDaily.product_id.in_(test_product_ids),
+                )
+            )
             await session.execute(delete(Product).where(Product.sku.like("TEST-%")))
             await session.execute(delete(Store).where(Store.code.like("TEST-%")))
-            await session.execute(
-                delete(Calendar).where(
-                    (Calendar.date >= date(2024, 1, 1)) & (Calendar.date <= date(2024, 4, 29))
-                )
-            )
+            # Avoid deleting shared Calendar rows in integration DB.

As per coding guidelines app/**/tests/*.py: “Integration tests must be marked with @pytest.mark.integration and run against real docker-compose Postgres. Never mock the database in integration tests.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/analytics/tests/conftest.py` around lines 128 - 136, The
teardown currently wipes entire tables via
session.execute(delete(InventorySnapshotDaily)) and delete(SalesDaily) and
removes a shared calendar range via delete(Calendar)...date between, which can
remove non-test data; change those deletes to target only test-owned rows: for
InventorySnapshotDaily and SalesDaily add a WHERE that ties them to test
products/stores (e.g., join or filter by Product.sku.like("TEST-%") or
Store.code.like("TEST-%") when deleting), and for Calendar change the
delete(Calendar).where(...) to only delete dates that are exclusively created by
tests (e.g., restrict to dates in the range AND NOT EXISTS(...) checks against
other production-linked tables or only delete calendar rows that have a test
marker), keeping the existing deletes for Product and Store
(Product.sku.like("TEST-%"), Store.code.like("TEST-%")) unchanged.
app/features/dimensions/service.py-104-113 (1)

104-113: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a stable secondary sort key before offset pagination.

Sorting by non-unique columns can reorder tied rows between requests, which makes offset pagination unstable (duplicates/misses across pages). Add a deterministic tie-breaker (Store.code / Product.sku) to every sorted query.

Proposed fix
-        sort_column = _STORE_SORT_COLUMNS.get(sort_by) if sort_by else None
-        if sort_column is not None:
-            order_by = sort_column.desc() if sort_order == "desc" else sort_column.asc()
-        else:
-            order_by = Store.code.asc()
+        sort_column = _STORE_SORT_COLUMNS.get(sort_by) if sort_by else None
+        if sort_column is not None:
+            primary_order = sort_column.desc() if sort_order == "desc" else sort_column.asc()
+            stmt = stmt.order_by(primary_order, Store.code.asc())
+        else:
+            stmt = stmt.order_by(Store.code.asc())
@@
-        stmt = stmt.order_by(order_by).offset(offset).limit(page_size)
+        stmt = stmt.offset(offset).limit(page_size)
@@
-        sort_column = _PRODUCT_SORT_COLUMNS.get(sort_by) if sort_by else None
-        if sort_column is not None:
-            order_by = sort_column.desc() if sort_order == "desc" else sort_column.asc()
-        else:
-            order_by = Product.sku.asc()
+        sort_column = _PRODUCT_SORT_COLUMNS.get(sort_by) if sort_by else None
+        if sort_column is not None:
+            primary_order = sort_column.desc() if sort_order == "desc" else sort_column.asc()
+            stmt = stmt.order_by(primary_order, Product.sku.asc())
+        else:
+            stmt = stmt.order_by(Product.sku.asc())
@@
-        stmt = stmt.order_by(order_by).offset(offset).limit(page_size)
+        stmt = stmt.offset(offset).limit(page_size)

Also applies to: 229-238

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/dimensions/service.py` around lines 104 - 113, The current
ordering uses only a requested sort column (via _STORE_SORT_COLUMNS and sort_by)
which can be non-unique and causes unstable offset pagination; update the logic
that builds order_by (the block that sets sort_column and order_by and the
stmt.order_by call) to always append a deterministic tie-breaker (Store.code for
store queries and Product.sku for product queries) as a secondary sort key so
that ORDER BY is stable before offset/limit are applied; ensure the tie-breaker
is added in both the store query block (using sort_column.desc()/asc() then
.nulls_last()/etc if needed) and the analogous product query block (lines
~229-238) so stmt.order_by receives a tuple/list of primary and secondary
ordering expressions.
app/features/jobs/service.py-256-267 (1)

256-267: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make job sorting deterministic for paginated reads.

Current single-column ordering can reorder tied rows between requests, causing duplicates/misses with offset pagination. Add stable secondary keys (e.g., created_at, job_id).

Proposed fix
-        sort_column = _JOB_SORT_COLUMNS.get(sort_by) if sort_by else None
-        if sort_column is not None:
-            order_by = sort_column.desc() if sort_order == "desc" else sort_column.asc()
-        else:
-            order_by = Job.created_at.desc()
+        sort_column = _JOB_SORT_COLUMNS.get(sort_by) if sort_by else None
+        if sort_column is not None:
+            primary_order = sort_column.desc() if sort_order == "desc" else sort_column.asc()
+            stmt = stmt.order_by(primary_order, Job.created_at.desc(), Job.job_id.asc())
+        else:
+            stmt = stmt.order_by(Job.created_at.desc(), Job.job_id.asc())
@@
-        stmt = stmt.order_by(order_by).offset(offset).limit(page_size)
+        stmt = stmt.offset(offset).limit(page_size)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/jobs/service.py` around lines 256 - 267, The current ordering
(sort_column or Job.created_at) can yield non-deterministic ties across pages;
update the ordering logic around _JOB_SORT_COLUMNS.get(sort_by), sort_column,
sort_order and stmt.order_by to append stable secondary keys (e.g.,
Job.created_at and Job.job_id) so tie-breaks are deterministic. Concretely,
build order_by as a list/tuple: if sort_column exists use primary =
sort_column.asc()/desc() per sort_order then append Job.created_at.(same
direction) and Job.job_id.(same direction); if no sort_column use
Job.created_at.desc() then Job.job_id.desc(); finally call
stmt.order_by(*order_by).offset(...).limit(...).
app/features/dimensions/tests/conftest.py-69-72 (1)

69-72: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope SalesDaily cleanup to test rows only.

Deleting the entire sales_daily table in fixture teardown can corrupt state for other integration tests sharing the same DB.

Proposed fix
-from sqlalchemy import delete
+from sqlalchemy import delete, select
@@
-            await session.execute(delete(SalesDaily))
+            test_store_ids = select(Store.id).where(Store.code.like("TEST-%"))
+            test_product_ids = select(Product.id).where(Product.sku.like("TEST-%"))
+            await session.execute(
+                delete(SalesDaily).where(
+                    SalesDaily.store_id.in_(test_store_ids),
+                    SalesDaily.product_id.in_(test_product_ids),
+                )
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/dimensions/tests/conftest.py` around lines 69 - 72, The teardown
currently deletes all rows from SalesDaily; change the delete(SalesDaily) call
to only remove test rows by filtering SalesDaily to entries linked to test
Products/Stores (e.g., use
delete(SalesDaily).where(SalesDaily.product_id.in_(select(Product.id).where(Product.sku.like("TEST-%")))
) or similarly filter by store_id against Store.code.like("TEST-%")) so only
rows related to Product/Store test fixtures are removed; update the
session.execute call that references delete(SalesDaily) accordingly and keep the
existing Product/Store delete filters.
app/features/registry/service.py-317-327 (1)

317-327: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add deterministic tie-breakers to paginated sorting.

Line 321 sorts by potentially non-unique fields, and Line 327 uses offset pagination. This can reorder ties between requests, causing duplicate/missing rows across pages.

💡 Suggested fix
-        sort_column = _RUN_SORT_COLUMNS.get(sort_by) if sort_by else None
-        if sort_column is not None:
-            order_by = sort_column.desc() if sort_order == "desc" else sort_column.asc()
-        else:
-            order_by = ModelRun.created_at.desc()
+        sort_column = _RUN_SORT_COLUMNS.get(sort_by) if sort_by else None
+        if sort_column is not None:
+            primary_order = sort_column.desc() if sort_order == "desc" else sort_column.asc()
+            tie_breaker = ModelRun.run_id.desc() if sort_order == "desc" else ModelRun.run_id.asc()
+            stmt = stmt.order_by(primary_order, tie_breaker)
+        else:
+            stmt = stmt.order_by(ModelRun.created_at.desc(), ModelRun.run_id.desc())
@@
-        stmt = stmt.order_by(order_by).offset(offset).limit(page_size)
+        stmt = stmt.offset(offset).limit(page_size)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/registry/service.py` around lines 317 - 327, The current
ordering can produce nondeterministic ties; update the ordering logic around
sort_column/_RUN_SORT_COLUMNS and stmt.order_by(...) to always append a
deterministic tie-breaker (e.g., ModelRun.id) after the primary sort so
pagination is stable. Specifically, when building order_by for both the
allow-listed column and the default (ModelRun.created_at), add a secondary
ordering on ModelRun.id with the same direction as the primary (use .desc() when
sort_order == "desc", else .asc()) so stmt.order_by(order_by,
ModelRun.id.<direction>) (or chain order_by calls) guarantees stable, repeatable
pagination across requests.
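The effect of a tie-breaker on offset pagination can be reproduced in memory. The sketch below is an analogy in TypeScript, not the SQLAlchemy query from the PR: `RunRow` and `pageOfRuns` are hypothetical names, and the differing input order stands in for a database returning tied rows in unspecified order.

```typescript
// Hypothetical row shape: `id` is unique, `status` is not.
interface RunRow {
  id: number
  status: string
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0
}

// Sort by the non-unique column first, then by the unique id in the same
// direction, and only then apply offset/limit.
function pageOfRuns(
  rows: RunRow[],
  page: number,
  pageSize: number,
  order: 'asc' | 'desc',
): RunRow[] {
  const direction = order === 'desc' ? -1 : 1
  const sorted = [...rows].sort((a, b) => {
    const primary = compareStrings(a.status, b.status)
    const tieBreak = a.id - b.id
    return direction * (primary !== 0 ? primary : tieBreak)
  })
  const offset = (page - 1) * pageSize
  return sorted.slice(offset, offset + pageSize)
}
```

Because every pair of rows now compares as strictly ordered, the page contents depend only on the data and not on the order in which rows arrive.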
frontend/src/components/data-table/data-table.tsx-115-117 (1)

115-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make clickable rows keyboard-accessible.

Lines 115-117 add mouse click behavior only; keyboard users can’t activate row actions.

Proposed fix
                 <TableRow
                   key={row.id}
                   data-state={row.getIsSelected() && 'selected'}
                   onClick={onRowClick ? () => onRowClick(row.original) : undefined}
+                  onKeyDown={
+                    onRowClick
+                      ? (e) => {
+                          if (e.key === 'Enter' || e.key === ' ') {
+                            e.preventDefault()
+                            onRowClick(row.original)
+                          }
+                        }
+                      : undefined
+                  }
+                  tabIndex={onRowClick ? 0 : undefined}
+                  role={onRowClick ? 'button' : undefined}
                   className={cn(onRowClick && 'cursor-pointer')}
                 >
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/data-table/data-table.tsx` around lines 115 - 117,
The clickable row currently only handles mouse clicks (onClick={onRowClick ? ()
=> onRowClick(row.original) : undefined}), which is not keyboard-accessible;
update the row JSX (the element using onRowClick and className={cn(onRowClick &&
'cursor-pointer')}) to add keyboard accessibility: give it tabIndex={0} and
role="button" when onRowClick is present, and add an onKeyDown handler that
invokes onRowClick(row.original) when Enter or Space is pressed (prevent default
for Space to avoid page scroll). Keep the existing onClick and cursor-pointer
class and ensure the handler checks onRowClick before calling it.
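The key-handling rule in this finding can be pulled into a small helper so it is testable without rendering the table. This is a minimal sketch: `isActivationKey` and `makeRowKeyHandler` are hypothetical names that do not appear in the PR, and the event type is reduced to the two members the handler uses.

```typescript
// Minimal event shape; a React.KeyboardEvent satisfies it structurally.
interface KeyLikeEvent {
  key: string
  preventDefault: () => void
}

// Enter and Space activate a native button, so a row with role="button"
// should respond to the same keys.
function isActivationKey(key: string): boolean {
  return key === 'Enter' || key === ' '
}

// Returns undefined when no click handler is supplied, so the row stays
// non-interactive in that case.
function makeRowKeyHandler<T>(
  row: T,
  onRowClick?: (row: T) => void,
): ((event: KeyLikeEvent) => void) | undefined {
  if (!onRowClick) return undefined
  return (event) => {
    if (isActivationKey(event.key)) {
      event.preventDefault() // stops Space from scrolling the page
      onRowClick(row)
    }
  }
}
```

In the row JSX this would be wired as `onKeyDown={makeRowKeyHandler(row.original, onRowClick)}`, next to the existing `onClick`.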
frontend/src/components/charts/revenue-bar-chart.tsx-44-44 (1)

44-44: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace runtime-generated Tailwind class with inline style to ensure dynamic height is applied.

On line 44, h-[${height}px] uses a template literal for the height value, which Tailwind cannot statically discover during the build process. This causes the class to be missing in production, collapsing the chart.

Use an inline style instead:

Proposed fix
-        <ChartContainer config={chartConfig} className={`h-[${height}px] w-full`}>
+        <ChartContainer config={chartConfig} className="w-full" style={{ height }}>

Note: The same pattern appears in time-series-chart.tsx and backtest-folds-chart.tsx and should also be fixed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/charts/revenue-bar-chart.tsx` at line 44, The dynamic
Tailwind class h-[${height}px] can't be resolved at build time; change the
ChartContainer usage in revenue-bar-chart.tsx to remove the runtime-generated
h-[] token and instead pass the height via an inline style (e.g., style={{
height: `${height}px` }}) while keeping the remaining className (like "w-full");
apply the same change to the corresponding ChartContainer usages in
time-series-chart.tsx and backtest-folds-chart.tsx, referencing the height
prop/variable and preserving chartConfig and other props.
frontend/src/components/charts/time-series-chart.tsx-73-73 (1)

73-73: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use inline style for dynamic chart height.

On Line 73, h-[${height}px] uses a dynamic template string that Tailwind's static scanner cannot process at build time, risking the class being dropped in production. Use an inline style prop instead.

Proposed fix
-        <ChartContainer config={chartConfig} className={`h-[${height}px] w-full`}>
+        <ChartContainer config={chartConfig} className="w-full" style={{ height }}>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/charts/time-series-chart.tsx` at line 73, The dynamic
Tailwind class h-[${height}px] on the ChartContainer will be stripped by
Tailwind's static scanner; replace the dynamic height class with an inline style
on the ChartContainer (use the existing height variable to set style={{ height:
`${height}px` }}) and keep static classes like "w-full" in className; update the
ChartContainer JSX (the props passed to ChartContainer where chartConfig and
className are used) to remove the dynamic template string and apply the inline
style instead.
frontend/src/lib/demand-utils.ts-89-91 (1)

89-91: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure latest inventory snapshot is selected deterministically.

Lines 89-91 currently keep whichever duplicate grain appears last in array order, which can select stale inventory data instead of the latest snapshot.

🧭 Proposed fix
-  const inventoryByGrain = new Map(
-    inventory.map((item) => [`${item.store_id}:${item.product_id}`, item]),
-  )
+  const inventoryByGrain = new Map<string, InventoryStatusItem>()
+  for (const item of inventory) {
+    const key = `${item.store_id}:${item.product_id}`
+    const current = inventoryByGrain.get(key)
+    if (!current || item.date > current.date) {
+      inventoryByGrain.set(key, item)
+    }
+  }

Also applies to: 105-105

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/lib/demand-utils.ts` around lines 89 - 91, The current
inventoryByGrain Map (created via inventory.map and new Map) can keep a
non-deterministic/stale entry when multiple items share the same
`${store_id}:${product_id}` key; instead, iterate inventory and for each key
compare snapshot_at (or other timestamp field) and replace the Map entry only if
the incoming item is newer so the Map always holds the latest snapshot; apply
the same deterministic "keep newest by snapshot_at" logic to the other similar
Map at the second occurrence (the code using the same inventory.map pattern
around line 105) and reference the inventoryByGrain Map and the
`${item.store_id}:${item.product_id}` key when updating.
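A standalone version of the "keep newest per grain" logic is sketched below. The item shape is an assumption: the finding mentions both `date` and `snapshot_at`, so the field name here (`date`, an ISO-8601 string) is illustrative and should be replaced with whatever timestamp `InventoryStatusItem` actually carries. `latestByGrain` is a hypothetical helper name.

```typescript
// Assumed shape: `date` is an ISO-8601 string (YYYY-MM-DD), so plain string
// comparison orders snapshots chronologically.
interface InventorySnapshot {
  store_id: number
  product_id: number
  date: string
  on_hand: number
}

function latestByGrain(items: InventorySnapshot[]): Map<string, InventorySnapshot> {
  const byGrain = new Map<string, InventorySnapshot>()
  for (const item of items) {
    const key = `${item.store_id}:${item.product_id}`
    const current = byGrain.get(key)
    // Replace the stored entry only when the incoming one is strictly newer.
    if (!current || item.date > current.date) {
      byGrain.set(key, item)
    }
  }
  return byGrain
}
```

With the strict comparison, two snapshots that share a date resolve to the first one seen; if that case can occur, a secondary key would be needed to make the choice fully deterministic.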
frontend/src/lib/demand-utils.ts-46-46 (1)

46-46: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use ceiling for reorder quantity to avoid under-ordering.

Line 46 uses Math.round, which can round small positive shortfalls down to 0 and leave unmet demand.

📦 Proposed fix
-  return Math.max(0, Math.round(leadTimeDemand - onHand - (onOrder ?? 0)))
+  return Math.max(0, Math.ceil(leadTimeDemand - onHand - (onOrder ?? 0)))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/lib/demand-utils.ts` at line 46, The reorder quantity
calculation currently uses Math.round and can under-order when shortfalls are
small; change the expression in the function that returns the computed reorder
(the line returning Math.max(0, Math.round(leadTimeDemand - onHand - (onOrder ??
0)))) to use Math.ceil instead of Math.round so any positive shortfall is
rounded up, while keeping the Math.max(0, ...) guard and the (onOrder ?? 0)
nullish fallback unchanged.
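A worked case makes the difference concrete: with a lead-time demand of 10.4 units and 10 on hand, the shortfall is about 0.4, which `Math.round` turns into 0 and `Math.ceil` turns into 1. The sketch below restates the fixed expression as a standalone function; the name `reorderQuantity` and its signature are assumptions, since the PR only shows the return line.

```typescript
// Hypothetical wrapper around the one-line expression from the finding.
function reorderQuantity(
  leadTimeDemand: number,
  onHand: number,
  onOrder?: number | null,
): number {
  // Any positive shortfall rounds up to a whole unit; surpluses clamp to 0.
  return Math.max(0, Math.ceil(leadTimeDemand - onHand - (onOrder ?? 0)))
}
```

One trade-off to keep in mind: `Math.ceil` also rounds up floating-point residue, so a shortfall that is mathematically zero but evaluates to a tiny positive number yields 1. If the inputs are computed floats, rounding the shortfall to a fixed number of decimals before applying the ceiling may be worth considering.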
frontend/src/lib/csv-export.ts-8-13 (1)

8-13: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mitigate CSV formula injection in exported fields.

Lines 8-13 escape RFC 4180 delimiters but still allow values that begin with a formula character (=, +, -, @) to be evaluated as formulas when the file is opened in a spreadsheet app.

🔒 Proposed fix
 function quoteField(value: unknown): string {
-  const str = value === null || value === undefined ? '' : String(value)
-  if (/[",\r\n]/.test(str)) {
-    return `"${str.replace(/"/g, '""')}"`
+  const raw = value === null || value === undefined ? '' : String(value)
+  const neutralized = /^[=+\-@]/.test(raw) ? `'${raw}` : raw
+  if (/[",\r\n]/.test(neutralized)) {
+    return `"${neutralized.replace(/"/g, '""')}"`
   }
-  return str
+  return neutralized
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/lib/csv-export.ts` around lines 8 - 13, The quoteField function
currently escapes RFC4180 delimiters but doesn't mitigate CSV formula injection;
update quoteField to detect values that start with any of the formula-trigger
characters (=, +, -, @) after trimming leading whitespace and, for such values,
prefix with a single quote (') or another safe marker before performing the
existing double-quote escaping logic so spreadsheets treat them as plain text;
ensure you still handle null/undefined and apply the existing /"/g replacement
and enclosing quotes when delimiters are present.
frontend/src/pages/explorer/job-detail.tsx-76-81 (1)

76-81: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle cancel mutation failures explicitly.

mutateAsync can reject when the API call fails. Without a try/catch, the rejection becomes an unhandled promise rejection when triggered from the dialog action.

Suggested fix
 async function handleCancel() {
-  await cancelJob.mutateAsync(jobId)
-  // useCancelJob invalidates ['jobs']; refresh this detail query explicitly
-  // so the page reflects the cancelled status immediately.
-  void queryClient.invalidateQueries({ queryKey: ['jobs', jobId] })
+  try {
+    await cancelJob.mutateAsync(jobId)
+    // useCancelJob invalidates ['jobs']; refresh this detail query explicitly
+    // so the page reflects the cancelled status immediately.
+    await queryClient.invalidateQueries({ queryKey: ['jobs', jobId] })
+  } catch (error) {
+    console.error('Failed to cancel job:', error)
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/explorer/job-detail.tsx` around lines 76 - 81, The
handleCancel function calls cancelJob.mutateAsync(jobId) without handling
rejections; wrap the mutateAsync call in a try/catch inside handleCancel, call
queryClient.invalidateQueries({ queryKey: ['jobs', jobId] }) in the success path
(or finally if you still want refresh regardless), and surface the error in the
catch (e.g., show a user-facing error/toast or set local error state) so mutate
failures do not become unhandled promise rejections; update handleCancel to
reference cancelJob.mutateAsync, queryClient.invalidateQueries, and any UI error
reporting helper you use.
frontend/src/pages/visualize/backtest.tsx-88-104 (1)

88-104: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce n_splits / test_size bounds before enabling run.

The button can submit with nSplits=0 or testSize=0 (or other invalid values) because formReady doesn’t validate numeric bounds, and input parsing currently allows zero fallback.

Suggested fix
-const formReady = !!storeId && !!productId && !!dateRange?.from && !!dateRange?.to
+const validSplits = Number.isInteger(nSplits) && nSplits >= 2
+const validTestSize = Number.isInteger(testSize) && testSize >= 1
+const formReady =
+  !!storeId && !!productId && !!dateRange?.from && !!dateRange?.to && validSplits && validTestSize

- onChange={(event) => setNSplits(Number(event.target.value) || 0)}
+ onChange={(event) => setNSplits(Math.max(0, Number(event.target.value) || 0))}

- onChange={(event) => setTestSize(Number(event.target.value) || 0)}
+ onChange={(event) => setTestSize(Math.max(0, Number(event.target.value) || 0))}

Also applies to: 185-199, 203-203

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/visualize/backtest.tsx` around lines 88 - 104, formReady
currently only checks presence of values so the form can submit with invalid
numeric values; update formReady and submit validation to enforce numeric bounds
(e.g., nSplits > 0 and testSize > 0 and within expected range) and ensure input
parsing doesn't fall back to zero. Specifically, adjust the formReady expression
and add an explicit pre-submit validation in handleRunBacktest to check nSplits
and testSize (and convert/parse them safely), setRunError with a clear message
and return if out of bounds, and/or disable the Run button when the numeric
constraints fail so createJob.mutateAsync is never called with invalid n_splits
or test_size.
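One way to share the bounds between the disabled state of the button and the pre-submit check is a single validator that returns an error message or `null`. This is a sketch under assumptions: `BacktestForm` and `backtestFormError` are hypothetical names, and the minimums (2 splits, test size 1) are taken from the suggested fix rather than from the backend schema, which should be treated as the source of truth.

```typescript
// Hypothetical form shape; the date range is reduced to two optional strings.
interface BacktestForm {
  storeId?: number
  productId?: number
  from?: string
  to?: string
  nSplits: number
  testSize: number
}

// Assumed bounds, copied from the suggested fix.
const MIN_SPLITS = 2
const MIN_TEST_SIZE = 1

function backtestFormError(form: BacktestForm): string | null {
  if (!form.storeId || !form.productId || !form.from || !form.to) {
    return 'Select a store, a product, and a date range.'
  }
  // Number.isInteger rejects NaN, so an unparsable input fails here too.
  if (!Number.isInteger(form.nSplits) || form.nSplits < MIN_SPLITS) {
    return `n_splits must be an integer >= ${MIN_SPLITS}.`
  }
  if (!Number.isInteger(form.testSize) || form.testSize < MIN_TEST_SIZE) {
    return `test_size must be an integer >= ${MIN_TEST_SIZE}.`
  }
  return null
}
```

The page could then derive `formReady` as `backtestFormError(form) === null` and pass a non-null message to `setRunError` before returning early from `handleRunBacktest`.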
frontend/src/pages/visualize/demand.tsx-83-86 (1)

83-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use keyboard-accessible controls for sortable headers and row selection.

Interactive behavior is attached to non-interactive table elements (TableHead, TableRow) via onClick only. Sorting and row drill-in should be operable by keyboard with proper semantics.

Suggested fix
-<TableHead ... onClick={() => onSort(columnKey)}>
-  <span ...>...</span>
-</TableHead>
+<TableHead className={cn(numeric && 'text-right')}>
+  <button
+    type="button"
+    onClick={() => onSort(columnKey)}
+    className={cn('inline-flex items-center gap-1', numeric && 'flex-row-reverse')}
+    aria-label={`Sort by ${label}`}
+  >
+    ...
+  </button>
+</TableHead>

-<TableRow key={row.jobId} onClick={() => setSelectedJobId(row.jobId)} className="cursor-pointer ...">
+<TableRow
+  key={row.jobId}
+  role="button"
+  tabIndex={0}
+  onClick={() => setSelectedJobId(row.jobId)}
+  onKeyDown={(e) => {
+    if (e.key === 'Enter' || e.key === ' ') {
+      e.preventDefault()
+      setSelectedJobId(row.jobId)
+    }
+  }}
+  className="cursor-pointer ..."
+>

Also applies to: 304-307

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/visualize/demand.tsx` around lines 83 - 86, The table
headers and rows (e.g., TableHead and TableRow) have interactive behavior wired
only to onClick (calling onSort(columnKey) for headers and the row drill-in
handler), which is not keyboard-accessible; update the header and row components
to be operable via keyboard by adding tabIndex={0}, a semantic role (e.g.,
role="button" for headers or appropriate row/rowheader roles), key handlers that
call the same handlers on Enter/Space (e.g., onKeyDown invoking
onSort(columnKey) for TableHead), and relevant ARIA attributes (e.g., aria-sort
on sortable headers and aria-selected on selectable rows) so screen readers and
keyboard users can discover and trigger sorting and drill-in. Ensure the same
props/handlers used in onClick are reused in onKeyDown to avoid divergent
behavior.
frontend/src/pages/explorer/sales.tsx-24-29 (1)

24-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard URL parsing for dimension, store_id, and product_id.

Current parsing accepts invalid values (dimension=foo, store_id=abc) and can propagate NaN into filters/requests. This can render invalid chips and trigger invalid drilldown/timeseries calls.

Suggested fix
+const VALID_DIMENSIONS: DrilldownDimension[] = ['store', 'product', 'category', 'region', 'date']
+const parsePositiveInt = (v: string | null): number | undefined => {
+  if (!v) return undefined
+  const n = Number(v)
+  return Number.isInteger(n) && n > 0 ? n : undefined
+}
-const dimension = (searchParams.get('dimension') as DrilldownDimension | null) ?? 'store'
+const dimensionParam = searchParams.get('dimension')
+const dimension: DrilldownDimension =
+  dimensionParam && VALID_DIMENSIONS.includes(dimensionParam as DrilldownDimension)
+    ? (dimensionParam as DrilldownDimension)
+    : 'store'
-const storeIdParam = searchParams.get('store_id')
-const productIdParam = searchParams.get('product_id')
-const storeId = storeIdParam ? Number(storeIdParam) : undefined
-const productId = productIdParam ? Number(productIdParam) : undefined
+const storeId = parsePositiveInt(searchParams.get('store_id'))
+const productId = parsePositiveInt(searchParams.get('product_id'))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/explorer/sales.tsx` around lines 24 - 29, The URL parsing
currently trusts searchParams and can produce invalid values (e.g.,
dimension="foo", store_id="abc") which lead to NaN or invalid enum values;
update the parsing around dimension, storeId and productId: validate that the
retrieved dimension string is one of the allowed DrilldownDimension values
before assigning (fallback to 'store' or undefined), and parse
store_id/product_id with a robust numeric guard (e.g., parseInt/Number and check
isFinite/isNaN) so that non-numeric inputs yield undefined instead of NaN;
change the logic where dimension, storeId and productId are assigned (the
constants named dimension, storeIdParam/productIdParam, storeId, productId) to
perform these checks and ensure downstream filters/drilldown calls only receive
validated values.
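The two guards can be written as pure functions over the raw `string | null` that `searchParams.get(...)` returns, which makes them easy to test in isolation. In this sketch the list of dimensions is copied from the suggested fix and is an assumption about the real `DrilldownDimension` type; `parseDimension` is a hypothetical helper name.

```typescript
// Assumed allow-list, copied from the suggested fix.
const VALID_DIMENSIONS = ['store', 'product', 'category', 'region', 'date'] as const
type DrilldownDimension = (typeof VALID_DIMENSIONS)[number]

function isDimension(value: string): value is DrilldownDimension {
  return VALID_DIMENSIONS.some((d) => d === value)
}

// Unknown or missing values fall back to the default instead of leaking through.
function parseDimension(
  value: string | null,
  fallback: DrilldownDimension = 'store',
): DrilldownDimension {
  return value !== null && isDimension(value) ? value : fallback
}

// Non-numeric, fractional, zero, and negative inputs all yield undefined,
// so NaN never reaches filters or requests.
function parsePositiveInt(value: string | null): number | undefined {
  if (!value) return undefined
  const n = Number(value)
  return Number.isInteger(n) && n > 0 ? n : undefined
}
```

`Number()` accepts forms such as `"1e2"` or `"0x10"`, which pass the integer check; if only plain decimal digits should be valid, a `/^\d+$/` test before the conversion is a stricter option.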
frontend/src/pages/explorer/runs.tsx-93-111 (1)

93-111: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate URL query params before issuing API calls.

page, status, model_type, and sort_by are taken from the URL and passed through without runtime validation. Malformed/shared links can push invalid values (e.g., negative page, unknown status/sort key) and break fetch behavior.

Suggested fix
+const ALLOWED_STATUS: RunStatus[] = ['pending', 'running', 'success', 'failed', 'archived']
+const ALLOWED_SORT = new Set(['status', 'model_type', 'store_id', 'product_id', 'created_at'])
+const ALLOWED_MODEL = new Set(['naive', 'seasonal_naive', 'moving_average', 'lightgbm'])
+
-const modelType = searchParams.get('model_type') ?? undefined
-const status = searchParams.get('status') ?? undefined
-const page = Number(searchParams.get('page')) || 1
-const sortBy = searchParams.get('sort_by') ?? undefined
+const modelTypeRaw = searchParams.get('model_type') ?? undefined
+const modelType = modelTypeRaw && ALLOWED_MODEL.has(modelTypeRaw) ? modelTypeRaw : undefined
+const statusRaw = searchParams.get('status') ?? undefined
+const status = statusRaw && ALLOWED_STATUS.includes(statusRaw as RunStatus) ? (statusRaw as RunStatus) : undefined
+const pageRaw = Number(searchParams.get('page'))
+const page = Number.isInteger(pageRaw) && pageRaw > 0 ? pageRaw : 1
+const sortByRaw = searchParams.get('sort_by') ?? undefined
+const sortBy = sortByRaw && ALLOWED_SORT.has(sortByRaw) ? sortByRaw : undefined
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/explorer/runs.tsx` around lines 93 - 111, The URL query
params (page, status, model_type, sort_by, sort_order) are used directly for API
calls; validate and normalize them before building pagination/sorting and
calling useRuns: ensure page is a positive integer (fallback to 1 on NaN/<=0),
clamp pageSize via pagination.pageSize, constrain status to a known RunStatus
set (map unknown values to undefined), whitelist allowed sortBy keys (set to
undefined if invalid) and only allow sortOrder when sortBy is valid, and ensure
modelType is one of the expected values or undefined; update the variables
modelType, status, page, sortBy, sortOrder, pagination, sorting accordingly
before passing them into useRuns so malformed URLs can't break fetching.
frontend/src/pages/explorer/stores.tsx-63-84 (1)

63-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitize URL-derived table params before querying.

page, region, store_type, and sort_by are consumed directly from query params. Invalid values can produce unstable pagination/sort behavior and bad backend requests from manually edited URLs.

Suggested fix
+const ALLOWED_SORT = new Set(['code', 'name', 'region', 'city', 'store_type'])
+const ALLOWED_REGION = new Set(['North', 'South', 'East', 'West'])
+const ALLOWED_TYPE = new Set(['Supermarket', 'Convenience', 'Hypermarket'])
+
 const search = searchParams.get('search') ?? ''
-const region = searchParams.get('region') ?? undefined
-const storeType = searchParams.get('store_type') ?? undefined
-const page = Number(searchParams.get('page')) || 1
-const sortBy = searchParams.get('sort_by') ?? undefined
+const regionRaw = searchParams.get('region') ?? undefined
+const region = regionRaw && ALLOWED_REGION.has(regionRaw) ? regionRaw : undefined
+const storeTypeRaw = searchParams.get('store_type') ?? undefined
+const storeType = storeTypeRaw && ALLOWED_TYPE.has(storeTypeRaw) ? storeTypeRaw : undefined
+const pageRaw = Number(searchParams.get('page'))
+const page = Number.isInteger(pageRaw) && pageRaw > 0 ? pageRaw : 1
+const sortByRaw = searchParams.get('sort_by') ?? undefined
+const sortBy = sortByRaw && ALLOWED_SORT.has(sortByRaw) ? sortByRaw : undefined
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/explorer/stores.tsx` around lines 63 - 84, Validate and
sanitize URL-derived params before using them: parse and clamp page into a
positive integer (default 1) and set pagination.pageIndex = Math.max(0, page-1);
whitelist region, storeType, and sortBy against allowed constants (e.g.,
REGION_OPTIONS, STORE_TYPE_OPTIONS, SORTABLE_FIELDS) and fall back to undefined
if not allowed; derive sortOrder only if sortBy is valid; keep the existing
search length check. Update the code around variables page, region, storeType,
sortBy, sortOrder and the useStores(...) call to use these sanitized values and
ensure PaginationState and SortingState use the cleaned page/pageIndex and
sorting only when sortBy is valid.
pyproject.toml-3-3 (1)

3-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release version is out of sync with this PR’s stated target.

Line 3 still sets 0.2.13, while this release PR target is 0.2.14. Please bump here (and keep release metadata/changelog aligned) before merge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` at line 3, Update the package version in pyproject.toml from
"0.2.13" to "0.2.14" so it matches the PR target; ensure the release metadata
and changelog entries are updated to reflect 0.2.14 as well (verify the version
string in the version = "..." line and any related release notes/comments).
🟡 Minor comments (7)
app/features/analytics/routes.py-302-313 (1)

302-313: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add positive bounds for new store_id/product_id query params.

The new endpoint params accept negative integers; constrain them at the boundary (ge=1) to reject invalid IDs early.

✅ Suggested patch
-    store_id: int | None = Query(
+    store_id: int | None = Query(
         None,
+        ge=1,
         description="Filter by store ID. Use GET /dimensions/stores to find valid IDs.",
     ),
-    product_id: int | None = Query(
+    product_id: int | None = Query(
         None,
+        ge=1,
         description="Filter by product ID. Use GET /dimensions/products to find valid IDs.",
     ),
-    store_id: int | None = Query(
+    store_id: int | None = Query(
         None,
+        ge=1,
         description="Filter by store ID. Use GET /dimensions/stores to find valid IDs.",
     ),
-    product_id: int | None = Query(
+    product_id: int | None = Query(
         None,
+        ge=1,
         description="Filter by product ID. Use GET /dimensions/products to find valid IDs.",
     ),
As per coding guidelines `app/**/*.py`: “Validate all inputs at boundaries using Pydantic v2. Apply validation to HTTP requests, agent tools, and seeder configurations.”

Also applies to: 381-388

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/analytics/routes.py` around lines 302 - 313, The Query
parameters store_id and product_id in app/features/analytics/routes.py currently
accept negative integers; update their Query definitions (the store_id and
product_id parameters) to enforce positive bounds by adding ge=1 to both
Query(...) calls (apply the same change to the other occurrences around the
381-388 region) so Pydantic v2 validates IDs as >=1 and rejects invalid negative
values at the boundary.
app/features/analytics/tests/conftest.py-157-157 (1)

157-157: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t clear unrelated dependency overrides in teardown.

Line 157 clears all overrides, which can erase overrides set by other fixtures. Remove only the get_db override.

✅ Minimal fix
-    app.dependency_overrides.clear()
+    app.dependency_overrides.pop(get_db, None)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/analytics/tests/conftest.py` at line 157, The teardown currently
calls app.dependency_overrides.clear(), which removes all app overrides; change
it to only remove the get_db override by calling
app.dependency_overrides.pop(get_db, None) (the dictionary is keyed by the
dependency callable, not by the string "get_db") so other fixtures' overrides
remain intact; locate the teardown in conftest.py where
app.dependency_overrides.clear() is used and replace that call accordingly.
app/features/config/tests/test_service.py-103-109 (1)

103-109: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restore Settings mutations after the test.

Lines 103–109 mutate the cached singleton and do not restore it, which can make later tests order-dependent.

🔧 Minimal stabilization
         settings = get_settings()
-        settings.agent_max_tool_calls = 7
-        settings.agent_timeout_seconds = 99
-        settings.agent_retry_attempts = 2
-        settings.agent_session_ttl_minutes = 45
-        settings.agent_require_approval = ["create_alias"]
-
-        config = await service.get_effective_config(_mock_db())
+        prev = (
+            settings.agent_max_tool_calls,
+            settings.agent_timeout_seconds,
+            settings.agent_retry_attempts,
+            settings.agent_session_ttl_minutes,
+            list(settings.agent_require_approval),
+        )
+        try:
+            settings.agent_max_tool_calls = 7
+            settings.agent_timeout_seconds = 99
+            settings.agent_retry_attempts = 2
+            settings.agent_session_ttl_minutes = 45
+            settings.agent_require_approval = ["create_alias"]
+            config = await service.get_effective_config(_mock_db())
+        finally:
+            (
+                settings.agent_max_tool_calls,
+                settings.agent_timeout_seconds,
+                settings.agent_retry_attempts,
+                settings.agent_session_ttl_minutes,
+                settings.agent_require_approval,
+            ) = prev
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/config/tests/test_service.py` around lines 103 - 109, The test
mutates the global Settings singleton returned by get_settings() (e.g.,
settings.agent_max_tool_calls, agent_timeout_seconds, agent_retry_attempts,
agent_session_ttl_minutes, agent_require_approval) and does not restore original
values; update the test to capture the original settings (or clone the Settings
object) before changing them and restore the originals in a finally/teardown
block (or use monkeypatch to set attributes and undo them) so the cached
singleton is returned to its previous state after the test.
frontend/src/pages/explorer/jobs.tsx-45-51 (1)

45-51: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Sanitize page from URL before building pagination state.

Number(searchParams.get('page')) || 1 accepts negative values (for example -3), which produces invalid pagination/query input.

Suggested fix
-  const page = Number(searchParams.get('page')) || 1
+  const rawPage = Number(searchParams.get('page'))
+  const page = Number.isInteger(rawPage) && rawPage > 0 ? rawPage : 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/explorer/jobs.tsx` around lines 45 - 51, The current page
parsing (using page = Number(searchParams.get('page')) || 1) can yield negative
or non-integer values; sanitize the URL page value before building the
PaginationState by parsing it as an integer (e.g., parseInt or Number), treating
NaN as 1, clamping to a minimum of 1, and then computing pageIndex =
sanitizedPage - 1 so pageIndex is never negative; update the code around page,
pagination, and DEFAULT_PAGE_SIZE to use this sanitizedPage variable.
frontend/src/pages/explorer/products.tsx-65-71 (1)

65-71: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clamp URL page to a positive integer.

Current parsing accepts negative values, which can drive invalid pagination and request params.

Suggested fix
-  const page = Number(searchParams.get('page')) || 1
+  const rawPage = Number(searchParams.get('page'))
+  const page = Number.isInteger(rawPage) && rawPage > 0 ? rawPage : 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/explorer/products.tsx` around lines 65 - 71, The page
value from searchParams.get('page') can be negative; parse it as an integer
(e.g., via Number.parseInt or Number()) and clamp it to a minimum of 1 before
computing pageIndex so pagination never becomes negative—update the logic around
the page variable (where page is set) and ensure the PaginationState calculation
(pageIndex: page - 1, pageSize: DEFAULT_PAGE_SIZE) uses the clamped positive
page.
frontend/src/pages/guide.tsx-192-201 (1)

192-201: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid permanent loading skeleton when config is unavailable.

This branch shows a skeleton whenever config is missing, including after loading has finished with an error. Show a clear unavailable state once configLoading is false.

Suggested fix
  {config ? (
    config.agent_require_approval.map((tool) => (
      ...
    ))
- ) : (
+ ) : configLoading ? (
    <Skeleton className="h-5 w-40" />
+ ) : (
+   <span className="text-muted-foreground">Unavailable right now.</span>
  )}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/guide.tsx` around lines 192 - 201, The UI currently
renders a permanent Skeleton when config is falsy; change the rendering so that
Skeleton is shown only while configLoading is true, and when configLoading is
false but config is still missing render an explicit "Unavailable" state (e.g.,
a plain Badge/label or message) instead of the Skeleton; update the JSX around
config, configLoading and config.agent_require_approval (the map over
agent_require_approval) to conditionally render: if configLoading -> Skeleton,
else if config -> map agent_require_approval, else -> an "Unavailable"
badge/message so users see a clear error state.
frontend/src/pages/visualize/forecast.tsx-51-54 (1)

51-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Defensively validate result.forecasts before using array methods.

At Line 51, the cast does not enforce runtime shape. If job.result.forecasts is non-array, forecastData?.some(...) at Line 52 can throw. Please gate with Array.isArray(...) before deriving hasBounds and rendering.

Suggested fix
-  const forecastData = job?.result?.forecasts as ForecastPoint[] | undefined
-  const hasBounds = !!forecastData?.some(
+  const forecastDataRaw = job?.result?.forecasts
+  const forecastData: ForecastPoint[] | undefined = Array.isArray(forecastDataRaw)
+    ? (forecastDataRaw as ForecastPoint[])
+    : undefined
+  const hasBounds = !!forecastData?.some(
     (point) => point.lower_bound != null && point.upper_bound != null,
   )

Also applies to: 192-193

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/visualize/forecast.tsx` around lines 51 - 54, The code
casts job?.result?.forecasts to ForecastPoint[] (forecastData) but doesn't check
runtime shape, so calling forecastData?.some(...) can throw if forecasts isn't
an array; update the logic that computes hasBounds (and the similar logic at
lines ~192-193) to first guard with Array.isArray(job?.result?.forecasts) (or
Array.isArray(forecastData)) before using .some, e.g., derive forecastData only
when the array check passes and then compute hasBounds safely; reference
symbols: job?.result?.forecasts, forecastData, and hasBounds.
🧹 Nitpick comments (5)
app/features/jobs/tests/conftest.py (1)

74-74: ⚡ Quick win

Use targeted dependency-override cleanup.

app.dependency_overrides.clear() can remove overrides registered by other fixtures; remove only get_db here.

Proposed fix
-    app.dependency_overrides.clear()
+    app.dependency_overrides.pop(get_db, None)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/jobs/tests/conftest.py` at line 74, Replace the blanket cleanup
app.dependency_overrides.clear() with targeted removal of only the get_db
override: locate the place that currently calls app.dependency_overrides.clear()
in the test fixture and change it to app.dependency_overrides.pop(get_db, None),
keyed by the get_db callable itself (not the string "get_db"), so other
fixtures' overrides remain intact; the None default avoids a KeyError when the
override is absent.
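
The difference between the two cleanup strategies can be sketched with a plain dict standing in for `app.dependency_overrides`, which FastAPI keeps as a dict keyed by the dependency callable. The functions `get_db`, `get_settings`, `fake_db`, and `fake_settings` below are hypothetical placeholders, not the project's actual fixtures.

```python
from typing import Callable, Dict

# Stand-in for FastAPI's app.dependency_overrides: a dict keyed by the
# dependency callable itself, not by its name.
dependency_overrides: Dict[Callable, Callable] = {}


def get_db():  # hypothetical dependency owned by this fixture
    return "real-db"


def get_settings():  # hypothetical dependency owned by another fixture
    return "real-settings"


def fake_db():
    return "test-db"


def fake_settings():
    return "test-settings"


def teardown_blanket() -> None:
    # Removes every override, including ones this fixture did not register.
    dependency_overrides.clear()


def teardown_targeted() -> None:
    # Removes only get_db; the None default avoids KeyError if it is absent.
    dependency_overrides.pop(get_db, None)


# Targeted teardown leaves the other fixture's override in place.
dependency_overrides[get_db] = fake_db
dependency_overrides[get_settings] = fake_settings
teardown_targeted()
remaining_after_targeted = dict(dependency_overrides)

# Blanket teardown wipes everything.
dependency_overrides[get_db] = fake_db
teardown_blanket()
remaining_after_blanket = dict(dependency_overrides)
```

After the targeted teardown only the `get_settings` override remains, while the blanket teardown leaves the dict empty. This is why `clear()` is risky when several fixtures share one app instance.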
app/features/dimensions/tests/conftest.py (1)

98-98: ⚡ Quick win

Avoid clearing unrelated dependency overrides in teardown.

Use a targeted removal for get_db; clear() can remove overrides owned by other fixtures/tests.

Proposed fix
-    app.dependency_overrides.clear()
+    app.dependency_overrides.pop(get_db, None)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/dimensions/tests/conftest.py` at line 98, The teardown currently
calls app.dependency_overrides.clear(), which indiscriminately removes overrides
for other tests; instead remove only the override for get_db by deleting or
popping app.dependency_overrides[get_db] (or using pop(get_db, None)) in the
teardown so only the get_db override registered by this fixture is removed and
other overrides remain intact.
app/features/seeder/service.py (1)

324-326: ⚡ Quick win

Anchor scenario windows to a single today value.

Line 324 and Line 325 read the clock separately; around UTC midnight this can produce inconsistent window spans. Derive year_ago from today to keep ranges deterministic.

💡 Suggested refactor
 from app.shared.seeder.config import (
+    DEFAULT_SEED_SPAN_DAYS,
     DEMO_MINIMAL_SPAN_DAYS,
@@
-    default_seed_end_date,
-    default_seed_start_date,
+    default_seed_end_date,
 )
@@
-    today = default_seed_end_date()
-    year_ago = default_seed_start_date()
+    today = default_seed_end_date()
+    year_ago = today - timedelta(days=DEFAULT_SEED_SPAN_DAYS)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/features/seeder/service.py` around lines 324 - 326, Read the clock once
into today (today = default_seed_end_date()) and derive year_ago from that
single value instead of calling default_seed_start_date(), which reads the clock
again (e.g. year_ago = today - timedelta(days=DEFAULT_SEED_SPAN_DAYS)), then
compute demo_start = today - timedelta(days=DEMO_MINIMAL_SPAN_DAYS); this
ensures today, year_ago and demo_start are all anchored to the same date and
avoids cross-midnight inconsistencies.
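
A minimal sketch of the single-clock-read pattern described above. The names `default_seed_end_date`, `DEFAULT_SEED_SPAN_DAYS`, and `DEMO_MINIMAL_SPAN_DAYS` come from the suggested refactor; their values, the helper body, and the `scenario_windows` function are assumptions made for illustration only.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Dict, Optional

# Assumed values for illustration; the real constants live in the project's config.
DEFAULT_SEED_SPAN_DAYS = 365
DEMO_MINIMAL_SPAN_DAYS = 90


def default_seed_end_date() -> date:
    # Hypothetical stand-in for the shared helper: one clock read, in UTC.
    return datetime.now(timezone.utc).date()


def scenario_windows(today: Optional[date] = None) -> Dict[str, date]:
    """Derive every window boundary from a single 'today' value."""
    if today is None:
        today = default_seed_end_date()
    return {
        "today": today,
        "year_ago": today - timedelta(days=DEFAULT_SEED_SPAN_DAYS),
        "demo_start": today - timedelta(days=DEMO_MINIMAL_SPAN_DAYS),
    }


# Passing an explicit date makes the windows reproducible in tests.
windows = scenario_windows(date(2026, 5, 18))
```

Because every boundary is computed from the same `today`, the span between `today` and `year_ago` is always exactly `DEFAULT_SEED_SPAN_DAYS`, regardless of when the function runs.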
frontend/src/hooks/index.ts (1)

1-13: ⚡ Quick win

Re-export use-inventory from the hooks barrel for API consistency.

useInventoryStatus is introduced in this PR but not exported here, which makes the shared hooks surface inconsistent.

Suggested change
 export * from './use-drilldowns'
 export * from './use-timeseries'
 export * from './use-lifecycle-curve'
+export * from './use-inventory'
 export * from './use-runs'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/hooks/index.ts` around lines 1 - 13, Export the new hook from
the hooks barrel so the public API is consistent: add an export for the module
that defines useInventoryStatus (referencing the symbol useInventoryStatus and
its module, e.g., './use-inventory') to the list of re-exports in the hooks
index (the file that currently re-exports use-stores, use-products, etc.),
ensuring the hook is available to consumers.
scripts/run_demo.py (1)

416-418: ⚡ Quick win

Use shared seeder date helpers instead of reimplementing “today” logic.

This script computes its own end date, while related test/setup paths use default_seed_end_date()/shared span config. Reusing the shared helper avoids drift and timezone edge mismatches.

Suggested fix
-from datetime import UTC, date, datetime, timedelta
+from datetime import date, timedelta
+from app.shared.seeder.config import default_seed_end_date
...
-    seed_end = datetime.now(UTC).date()
+    seed_end = default_seed_end_date()
     seed_start = seed_end - timedelta(days=DEMO_SEED_SPAN_DAYS)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/run_demo.py` around lines 416 - 418, Replace the ad-hoc "today"
computation (seed_end = datetime.now(UTC).date(); seed_start = seed_end -
timedelta(days=DEMO_SEED_SPAN_DAYS)) with the shared helper
default_seed_end_date(); call default_seed_end_date() to set seed_end and then
compute seed_start by subtracting DEMO_SEED_SPAN_DAYS (or the shared span
constant if available) so run_demo.py uses the same timezone/edge-case logic as
other seeders and tests.

chore: back-merge main into dev to clear v0.2.13 release-PR conflicts (#202)
@w7-mgfcode w7-mgfcode merged commit 1ffd482 into main May 18, 2026
12 checks passed