diff --git a/.gitignore b/.gitignore index bdb8bf25..d62a81e3 100644 --- a/.gitignore +++ b/.gitignore @@ -43,3 +43,6 @@ artifacts/ .agents/ .handoffs/ HANDOFF.md + +# Local CI / dogfood logs and screenshots (per-session, never committed) +.ci-logs/ diff --git a/PRPs/PRP-31-mlzoo-d-frontend-registry-explainability.md b/PRPs/PRP-31-mlzoo-d-frontend-registry-explainability.md new file mode 100644 index 00000000..4437f4e9 --- /dev/null +++ b/PRPs/PRP-31-mlzoo-d-frontend-registry-explainability.md @@ -0,0 +1,1538 @@ +# PRP-31 — MLZOO-D — Frontend, Registry, and Explainability Polish + +> **Sequence position:** Final MLZOO PRP. A (PRP-29) → B (PRP-30) → B.2 (PRP-MLZOO-B.2) +> → C1 (PRP-MLZOO-C1) → C2 (PRP-MLZOO-C2) all merged in v0.2.16. This PRP is the +> surfacing layer: it exposes the advanced-model metadata the backend already +> captures (and a small slice it does not yet) in the React dashboard, plus a +> minimal "feature importance" hook for the three feature-aware model families. + +## Purpose + +Make Advanced ML Model Zoo capabilities **discoverable in the product**. Today, a +user who trains a LightGBM, XGBoost, regression, or prophet_like model sees the +same flat `model_type` string in the runs explorer that a `naive` baseline +produces — even though the registry already stores richer config, dependency +versions, and (inside the joblib bundle) the canonical 14-column feature frame +and the fitted estimator's learned importances. This PRP closes that gap with +the smallest possible surface: one new backend endpoint, one computed registry +field, one new React panel, and additive in-place edits to four existing pages. + +This PRP follows the staging contract in `PRPs/INITIAL/INITIAL-MLZOO-index.md` +("D. Frontend / registry / explainability") and the original brief at +`PRPs/INITIAL/INITIAL-MLZOO-D-frontend-registry-explainability.md`. + +## What this PRP already inherits (DO NOT re-build) + +The backend mechanics this PRP surfaces are **already shipped in v0.2.16**. +Do not re-implement any of: + +- **The canonical feature frame** (`app/shared/feature_frames/contract.py:80-98`) + — 14 columns: 4 target lags (`lag_1`, `lag_7`, `lag_14`, `lag_28`), 6 calendar + signals (`dow_sin`, `dow_cos`, `month_sin`, `month_cos`, `is_weekend`, + `is_month_end`), 4 exogenous (`price_factor`, `promo_active`, `is_holiday`, + `days_since_launch`). Use as-is; do not propose adding or removing columns + in MLZOO-D. +- **The four feature-aware forecasters** with `requires_features: ClassVar[bool] = True`: + `RegressionForecaster` (`app/features/forecasting/models.py:483`), + `LightGBMForecaster` (`models.py:625`), `XGBoostForecaster` (`models.py:787`), + `ProphetLikeForecaster` (`models.py:950`). All four already persist + `feature_columns` into `bundle.metadata` at train time + (`app/features/forecasting/service.py:233-238`). +- **Runtime info capture** (`app/features/registry/service.py:84-139`) — Python, + sklearn, numpy, pandas, joblib, plus conditional `lightgbm_version` / + `xgboost_version` already in the `runtime_info` JSONB. +- **Bundle deserialization** (`app/features/forecasting/persistence.py:136-176`) + — `load_model_bundle(path)` is the proven entry point for any "load the + fitted model, inspect it" workflow. Mirror the `/registry/runs/{run_id}/verify` + precedent (`app/features/registry/routes.py:327`) for the new endpoint. +- **RFC 7807 error envelope** via `app/core/exceptions.py:BadRequestError` / + `DatabaseError` — the existing `forecastlab_exception_handler` serializes + these to `application/problem+json`. Never raise bare `HTTPException(400, "...")` + (PR #253 review fix; v0.2.16 closed the last holes). +- **Explanation panel** (`frontend/src/components/explainability/explanation-panel.tsx`) + + hooks (`frontend/src/hooks/use-explanations.ts`) — baseline-only by design + per `PRPs/PRP-28-forecast-explainability-driver-attribution.md`. **Do not + extend the existing panel** to advanced models; build a sibling panel. +- **Run-detail, run-compare, runs explorer, forecast viz, backtest viz** — + existing structure (`frontend/src/pages/explorer/{runs,run-detail,run-compare}.tsx`, + `frontend/src/pages/visualize/{forecast,backtest}.tsx`). Insertions only, + never rewrites. + +## DEPENDS ON — read before starting + +```yaml +# Prerequisite PRPs (all merged; read for context, not action) +- PRPs/PRP-29-feature-aware-forecasting-foundation.md # FeatureFrameContract; ClassVar requires_features +- PRPs/PRP-30-lightgbm-first-advanced-model.md # LightGBM contract; runtime_info shape +- PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md # Per-fold X_train/X_future leakage-safe split +- PRPs/PRP-MLZOO-C1-xgboost-model.md # XGBoost mirror of LightGBM +- PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md # Additive Ridge pipeline; decompose() +- PRPs/PRP-28-forecast-explainability-driver-attribution.md # Existing baseline ExplanationPanel +# Foundation INITIAL +- PRPs/INITIAL/INITIAL-MLZOO-index.md # Sequencing rule: keep D additive, no model families +- PRPs/INITIAL/INITIAL-MLZOO-D-frontend-registry-explainability.md # The brief this PRP implements + +# Files that anchor every implementation decision below +- app/features/forecasting/models.py # feature_importances_ / coef_ extraction +- app/features/forecasting/persistence.py # load_model_bundle() entry point +- app/features/forecasting/service.py # extra_metadata population at line 233 +- app/features/registry/routes.py # /verify precedent at line 327 +- app/features/registry/schemas.py # RunResponse + alias="model_config" +- frontend/src/components/explainability/explanation-panel.tsx # the layout pattern to mirror +- frontend/src/components/explainability/explanation-panel.test.tsx # the test scaffolding to mirror +- frontend/src/pages/explorer/run-detail.tsx # insertion target (line 156-189 region) +- frontend/src/pages/explorer/runs.tsx # MODEL_TYPES allow-list at line 21 +``` + +## Goal + +Surface advanced-model metadata in five places — runs explorer, run detail, +run compare, forecast viz, backtest viz — without changing the data model, +the migration set, the agent surface, or the model contracts. + +Deliverables (each is independently verifiable): + +1. **Two backend endpoints** — both forecasting-scoped, mirroring PRP-28's + `/explain/runs/{run_id}` + `/explain/jobs/{job_id}` shape exactly: + - `GET /forecasting/runs/{run_id}/feature-metadata` — registry-UUID-keyed + (consumed by run-detail / run-compare pages). + - `GET /forecasting/jobs/{job_id}/feature-metadata` — job-keyed sibling + (consumed by `forecast.tsx`, which only has the `job_id` and whose + `job.result.run_id` is the **artifact key**, NOT a registry UUID — + see Critical-fix note below and memory `scenario-run-id-vs-registry-run-id`). + Both return canonical feature columns + learned feature importance + (tree models) or signed coefficients (prophet_like additive). RFC 7807 + errors throughout: 400 for baseline-family runs, 404 for missing run/job, + 422 for runs in `pending`/`running`/`failed` status or missing artifacts + (a successful `archived` run with an intact artifact still returns 200). +2. **One computed registry field** — `model_family: Literal["baseline","tree","additive"]` + on `RunResponse`, derived from `model_type` (no DB column, no migration). +3. **One frontend panel** — `` mirroring the structure + of the existing `` but rendering ranked importance bars + plus the model-family caveat ("model-derived, not causal"). +4. **In-place additions** to the five pages (column, badge, card insertion, + hook wiring). No layout rewrites. +5. **Docs touch-up** — one new subsection in `docs/user-guide/feature-reference.md` + explaining what advanced-model metadata the dashboard exposes and the + correlation-vs-causation caveat. + +## Why + +- INITIAL-MLZOO-D is the last unshipped piece of the MLZOO sequence. Backend + contracts have been frozen across A/B/B.2/C1/C2; the dashboard hasn't moved. +- Operators currently have no way to see WHICH features drove a trained advanced + model. Two LightGBM runs against the same store/product/window with different + hyperparameters look identical in the runs table. +- The existing `` (PRP-28) is rule-based and baseline-only — + it correctly returns 400 for `lightgbm` / `xgboost` / `regression` / + `prophet_like`. Until MLZOO-D ships, there is **no advanced-model + introspection** in the UI at all. +- The "model_family" categorisation is implicit today (the agent reports note + no enum exists). Surfacing it once, deterministically, lets every page show + a consistent Baseline / Tree / Additive badge without 5 ad-hoc string maps. + +## What + +### Backend additions + +1. New module `app/features/forecasting/feature_metadata.py`: + - `extract_feature_importance(model: BaseForecaster, feature_columns: list[str]) + -> list[FeatureImportanceItem]` — branches on instance type: + - `LightGBMForecaster` / `XGBoostForecaster` / `RegressionForecaster` → + `estimator.feature_importances_` (always non-negative, tree-based) + - `ProphetLikeForecaster` → `pipeline.named_steps["ridge"].coef_` (signed) + - Any other type → raise `ValueError` (used by the route to emit 400) + - Returns items sorted by `|importance|` descending; each item carries + `name: str`, `importance: float`, `kind: Literal["tree","linear_coef"]`, + `rank: int` (1-indexed). + +2. New endpoint `GET /forecasting/runs/{run_id}/feature-metadata` in + `app/features/forecasting/routes.py`: + - Looks up the run via the registry service (cross-slice: forecasting + already imports registry via the train flow). + - 404 if the run doesn't exist; 400 (RFC 7807 via `BadRequestError`) if the + run is for a baseline model; 422 if `artifact_uri is None` or the run is + `pending` / `running` / `failed`. + - On success: `load_model_bundle(artifact_uri)`, then call + `extract_feature_importance(bundle.model, bundle.metadata["feature_columns"])`. + - Returns `FeatureMetadataResponse` (Pydantic schema, see Data Models). + +3. Computed field `model_family` on `RunResponse` + (`app/features/registry/schemas.py`): + - `model_family: ModelFamily` populated via a Pydantic `@computed_field` or + `model_validator(mode="after")`, derived from `model_type`. + - Map: `{naive, seasonal_naive, moving_average} → baseline`; + `{regression, lightgbm, xgboost} → tree`; `{prophet_like} → additive`. + - Unknown types return `baseline` and emit a `logger.warning` (forward-compat + for future model families before this map is updated). + +4. Docstring drift fix in `app/features/forecasting/routes.py:33-37` — + list `xgboost`, `regression`, `prophet_like` alongside `lightgbm` in the + `/forecasting/train` description (currently only `lightgbm` is named). + Bundle into the same commit as the new endpoint. + +### Frontend additions + +1. **Types** (`frontend/src/types/api.ts`): + - Add `ModelFamily = 'baseline' | 'tree' | 'additive'`. + - Add `model_family: ModelFamily` to `ModelRun` and `RunResponse`-shaped + types. + - Add `FeatureImportanceItem`, `FeatureMetadataResponse` matching the + backend schemas exactly. + +2. **Hooks** (`frontend/src/hooks/use-feature-metadata.ts` — new file, two sibling exports): + - `useRunFeatureMetadata(runId: string, enabled: boolean)` → + queryKey `['feature-metadata', 'run', runId]`, URL + `/forecasting/runs/${runId}/feature-metadata`. + - `useJobFeatureMetadata(jobId: string, enabled: boolean)` → + queryKey `['feature-metadata', 'job', jobId]`, URL + `/forecasting/jobs/${jobId}/feature-metadata`. **Required** for + `forecast.tsx`, which holds only a `job_id` (not a registry UUID). + - Both use `retry: false` because a 400 (baseline family) / 404 / 422 is + a final answer, not a transient. Mirror `useRunExplanation` + + `useJobExplanation` (PRP-28) exactly. + +3. **Panel** (`frontend/src/components/explainability/feature-importance-panel.tsx` — new): + - Card shell with title "Feature Importance" + description that names the + model family. + - Body: horizontal bar list (top N by importance, default 14 — exactly the + canonical feature count). For `kind === "linear_coef"`, show the sign + (red/green) and label the column as "Coefficient" instead of "Importance". + - Caveat footer: "Importance is model-derived. It reflects how much each + feature reduced the model's training error — not real-world causation." + - Same error-handling shape as `ExplanationPanel`: neutral muted card for + `ApiError.status === 400` (non-feature-aware) or 422 (no artifact), + destructive card for unexpected. + +4. **Test** (`feature-importance-panel.test.tsx` — new): + - Mirror the fixture pattern from `explanation-panel.test.tsx:7-30`. + - Cases: renders ranked tree-model items; renders signed coefficients with + direction icons; loading state; 400 neutral message; 422 "artifact not + available" message; empty `features` list message. + +5. **Badge helper** (`frontend/src/components/common/model-family-badge.tsx` — new): + - `` → shadcn `` with a + deterministic `variant` mapping: `baseline` → `secondary`, `tree` → + `default`, `additive` → `outline`. Pure derivation, no API call. + +6. **Pages — in-place additive edits only:** + + a. `frontend/src/pages/explorer/runs.tsx`: + - Extend the `MODEL_TYPES` allow-list (line 21) to + `['naive', 'seasonal_naive', 'moving_average', 'regression', 'lightgbm', 'xgboost', 'prophet_like']`. + - Insert a new column after `model_type` (line 47) that renders + ``. + - Add `model_family` to `csvColumns` (line 83-92). + + b. `frontend/src/pages/explorer/run-detail.tsx`: + - Add `` next to the model-type field in the profile + card (around line 95-143). + - New `` titled "Feature Metadata" between the existing "Metrics" + card (line 156-164) and the "Forecast Explanation" panel (line 166-170) + — but render it ONLY when `run.model_family !== 'baseline'`. Inside: + a small list of `bundle.metadata.feature_columns` (the 14 canonical + names, fetched from the new endpoint). + - Render `` immediately below the existing + `` (around line 170), fed by + `useRunFeatureMetadata(runId, run.model_family !== 'baseline')`. + + c. `frontend/src/pages/explorer/run-compare.tsx`: + - Add a "Family" row to the Profile table (lines 145-213): two + `` cells side by side. + - Add a new collapsible Card after the "Metrics diff" card (line 231-266) + titled "Feature Importance (Run A vs Run B)" rendering two + ``s side by side in a `grid-cols-2` (mobile: + stacked). Both fed by `useRunFeatureMetadata(_, enabled)` — pass + `enabled: false` for both when families differ so no fetch fires. + + d. `frontend/src/pages/visualize/forecast.tsx`: + - **CRITICAL:** `trainJob.result.run_id` is the forecast-artifact key, + NOT a registry UUID — see Task 17 and memory + `scenario-run-id-vs-registry-run-id`. Use `useJobFeatureMetadata(trainJobId, ...)` + keyed by the **job_id**, not by the artifact key. + - Render a compact `` + `` in + a `` near the existing `` (line 256-263) + — collapsed by default to preserve scan flow. + + e. `frontend/src/pages/visualize/backtest.tsx`: + - Extend `MODEL_OPTIONS` (line 52) to add `lightgbm`, `xgboost`, + `regression`, `prophet_like`. The B.2 backtest path is fully + feature-aware on the backend; this is the last gating that prevented + operators from running an advanced-model backtest from the UI. + - No other change in this PRP (no per-model importance comparison; + deferred — see "Anti-Patterns to Avoid"). + +### Docs additions + +1. New "Advanced Model Metadata" subsection in + `docs/user-guide/feature-reference.md`: + - One paragraph naming the four feature-aware families and what the + dashboard exposes (family badge, feature columns, importance / coefs). + - One paragraph stating the correlation-vs-causation caveat verbatim: + "Feature importance is model-derived. It reflects how much each feature + reduced the model's training error — not real-world causation." + +2. One-line mention in `docs/user-guide/dashboard-guide.md` linking to the + new subsection from wherever it discusses the runs explorer / run detail. + +3. **No new ADR.** No new external dependency. No new core path. SHAP / LIME + remain out of scope (PRP-28 § "Adding SHAP later needs its own PRP + ADR"; + honour that). + +### Success Criteria + +- [ ] `GET /forecasting/runs/{run_id}/feature-metadata` returns 200 with a + sorted `features: list[FeatureImportanceItem]` for a successful + LightGBM / XGBoost / regression / prophet_like run. +- [ ] `GET /forecasting/jobs/{job_id}/feature-metadata` returns 200 with the + same payload shape for a completed `train` job whose underlying model + is non-baseline; 400 for a `predict` / non-completed / baseline-trained + job; 404 for an unknown `job_id`. +- [ ] Both endpoints return RFC 7807 `application/problem+json` 400 for + baseline runs, 404 for missing run/job, 422 for runs whose + `artifact_uri is None`, runs in `pending`/`running`/`failed` status, + `ModuleNotFoundError` raised by `joblib.load` (missing ml-* extra), + and `FileNotFoundError` raised by `load_model_bundle` (artifact file + deleted or moved on disk while the registry/job row lives on). + Archived runs with intact artifacts still return 200. +- [ ] The 422 type URI is `UNPROCESSABLE_ENTITY` (not `VALIDATION_ERROR`) so + consumers can disambiguate state-prevented operations from input + validation failures. +- [ ] `RunResponse.model_family` is populated for every run returned by + `/registry/runs` and `/registry/runs/{run_id}` without a second DB query + and without an Alembic migration. +- [ ] The runs explorer table shows a `Family` Badge column; filtering by + `?model_type=lightgbm|xgboost|prophet_like|regression` works + end-to-end (no 422 from the URL-param allow-list). +- [ ] The run detail page renders ``, a "Feature Metadata" + card listing the 14 canonical columns, and `` + for a successful LightGBM run; renders nothing extra (no error, + no empty card) for a successful baseline run. +- [ ] The run compare page renders two ``s side by + side when both selected runs share `model_family`; renders a single + explanatory message ("cross-family comparison not supported") when + they differ. +- [ ] The forecast viz page renders a collapsible `` + tied to the train run, when the train run's family is not baseline. +- [ ] The backtest viz page lets the user pick `lightgbm`, `xgboost`, + `regression`, or `prophet_like` and runs the backtest successfully. +- [ ] `` renders signed direction (`+` green / `-` red) + for prophet_like coefficients and positive-only bars for tree-model + importances. +- [ ] Validation gates pass: `ruff check`, `ruff format --check`, `mypy --strict`, + `pyright --strict`, `pytest -v -m "not integration"`, + `pytest -v -m integration`, `pnpm tsc --noEmit`, `pnpm lint`, + `pnpm test --run`. +- [ ] Browser dogfood via `webapp-testing` skill: dashboard renders all five + pages cleanly against a database containing at least one trained run + per family (use `make demo` plus a manual LightGBM train; the v0.2.16 + contract makes this reachable with a single curl). + +## All Needed Context + +### Documentation & References + +```yaml +# ─── Library docs (sections, not just domains) ───────────────────────────── +- url: https://lightgbm.readthedocs.io/en/v4.5.0/pythonapi/lightgbm.LGBMRegressor.html + why: feature_importances_ attribute semantics ("gain" by default; non-negative ndarray) + critical: | + LGBMRegressor.feature_importances_ is shape (n_features,), integer "split" + by default but exposed as float when importance_type="gain" is set. + The model_factory does NOT override importance_type — confirm what + ships and document the kind explicitly in the response JSON. + +- url: https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRegressor.feature_importances_ + why: XGBoost mirrors LightGBM's API for feature_importances_; same shape, default importance_type='weight' + critical: | + Like LightGBM, document the importance_type in the response so consumers + know whether they're seeing "gain", "weight", or "cover". + +- url: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge + why: Ridge.coef_ is a signed ndarray. The prophet_like pipeline's coef_ lives at pipeline.named_steps["ridge"].coef_ + critical: | + Sign carries directional information that Importance does NOT — the panel + MUST surface it (green/+ vs red/−). Do NOT take abs() before display. + +- url: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#sklearn.ensemble.HistGradientBoostingRegressor.feature_importances_ + why: RegressionForecaster wraps HistGradientBoostingRegressor; same .feature_importances_ pattern as LightGBM + critical: | + HistGBR's feature_importances_ is "permutation-based on training data" + by default — slower than LightGBM/XGBoost's tree-split impurity but + still non-negative. + +- url: https://tanstack.com/query/v5/docs/framework/react/guides/queries + why: retry: false pattern for endpoints where 4xx is a final answer + critical: Mirror frontend/src/hooks/use-explanations.ts:10-17 verbatim + +- url: https://ui.shadcn.com/docs/components/badge + why: Badge variants (default | secondary | destructive | outline) — used by ModelFamilyBadge + +- url: https://ui.shadcn.com/docs/components/collapsible + why: Used on forecast.tsx to keep the importance panel collapsed by default + +- url: https://recharts.org/en-US/api/BarChart + why: Used by frontend/src/components/charts/revenue-bar-chart.tsx — template for the horizontal importance bars + +# ─── In-repo files (the actual patterns to mirror) ───────────────────────── +- file: app/features/registry/routes.py + lines: 327-410 + why: /verify endpoint — exact precedent for "load the artifact, return JSON" pattern, including 404 + integrity error handling + +- file: app/features/forecasting/persistence.py + lines: 136-176 + why: load_model_bundle() — the only sanctioned bundle-deserialization entry point; do not re-implement + +- file: app/features/forecasting/models.py + lines: 1094-1100 + why: prophet_like's pipeline.named_steps["ridge"].coef_ is the ONLY signed-coefficient extraction path; ridge.coef_ is shape (n_features,) + +- file: app/features/forecasting/service.py + lines: 222-238 + why: shows that bundle.metadata["feature_columns"] is already populated for every feature-aware train; the new endpoint reads this back + +- file: app/features/registry/schemas.py + lines: 65-92, 109-138 + why: RunResponse Pydantic v2 alias pattern (model_config_data aliased to "model_config"); ConfigDict(populate_by_name=True) + critical: | + Pydantic v2 strict mode applies on FastAPI request bodies; date/datetime + fields need Field(strict=False, ...) overrides — but RunResponse is a + RESPONSE, not a request, so this gotcha does NOT apply to the new types + here. (Confirm with .claude/rules/security-patterns.md § Pydantic v2 strict mode.) + +- file: app/features/registry/tests/test_routes.py + why: existing route-test pattern; use as the spec for test_routes_feature_metadata.py + +- file: app/core/exceptions.py + why: BadRequestError, DatabaseError — the only RFC 7807-handled exception classes; v0.2.16 PR #253 closed the last holes that used bare HTTPException + +- file: frontend/src/components/explainability/explanation-panel.tsx + lines: 22-26, 78-92, 104-125, 136-216 + why: exact layout pattern to mirror — PanelShell, ApiError 400 → neutral; everything else → destructive + +- file: frontend/src/components/explainability/explanation-panel.test.tsx + why: vitest scaffolding (fixture shape, render assertions); copy structure verbatim + +- file: frontend/src/hooks/use-explanations.ts + lines: 10-17 + why: useRunExplanation pattern — retry: false; queryKey shape; enabled gating + +- file: frontend/src/types/api.ts + lines: 170-217, 873-903 + why: where to add ModelFamily / FeatureImportanceItem / FeatureMetadataResponse + +- file: frontend/src/pages/explorer/runs.tsx + lines: 21, 24-81, 83-92, 101 + why: MODEL_TYPES allow-list extension point; ColumnDef pattern; csvColumns extension point + +- file: frontend/src/pages/explorer/run-detail.tsx + lines: 88-211 + why: insertion zones (profile card, metrics, explanation; the new card slots in between) + +- file: frontend/src/pages/explorer/run-compare.tsx + lines: 28-50, 82-85, 143-267 + why: DeltaCell pattern (sign-only coloring), URL-param ?a=&b=, side-by-side table layout + +- file: frontend/src/pages/visualize/forecast.tsx + lines: 36-79, 256-263 + why: train→predict job flow; where ExplanationPanel currently renders + +- file: frontend/src/pages/visualize/backtest.tsx + lines: 52-56 + why: MODEL_OPTIONS allow-list — the only file to touch for the backtest extension + +- file: frontend/src/components/charts/revenue-bar-chart.tsx + why: closest existing horizontal-bar precedent (single Bar series; deterministic palette via --chart-N) + +# ─── Project rules (read once before starting) ───────────────────────────── +- file: .claude/rules/security-patterns.md + why: RFC 7807 envelope; Pydantic v2 strict mode policy (applies to request bodies only) + +- file: .claude/rules/test-requirements.md + why: every new module/endpoint/component needs a matching test + +- file: .claude/rules/ui-design.md + why: webapp-testing skill required for browser dogfood; do not hand-roll UI when a skill applies + +- file: .claude/rules/shadcn-ui.md + why: use the shadcn MCP for new components; the Badge / Collapsible / Card primitives needed here are already installed + +- file: docs/_base/SECURITY.md + section: "Pydantic v2 strict mode on FastAPI request bodies" + why: confirms the response-only types in this PRP do not need Field(strict=False, ...) + +- file: docs/_base/DOMAIN_MODEL.md + why: confirms "the registry's RunResponse is the surface MLZOO-D extends"; ubiquitous-language match for "model family", "feature importance" +``` + +### Current Codebase tree (relevant slice) + +```text +app/features/forecasting/ +├── models.py # BaseForecaster, the four feature-aware models +├── persistence.py # ModelBundle, load_model_bundle, save_model_bundle +├── routes.py # /forecasting/train, /forecasting/predict +├── schemas.py # TrainRequest, PredictRequest, ModelConfigBase + subtypes +├── service.py # ForecastingService (train pulls feature_columns into bundle.metadata) +└── tests/ + ├── test_lightgbm_forecaster.py + ├── test_xgboost_forecaster.py + ├── test_prophet_like_forecaster.py + ├── test_regression_forecaster.py + ├── test_routes.py + └── test_service.py + +app/features/registry/ +├── models.py # ModelRun, DeploymentAlias ORM +├── routes.py # /registry/runs CRUD, /verify, /aliases +├── schemas.py # RunCreate, RunResponse (alias="model_config"), RunCompareResponse +├── service.py # RegistryService, runtime_info capture +└── tests/ + ├── test_routes.py + └── test_service.py + +frontend/src/ +├── components/ +│ ├── common/ +│ │ ├── json-block.tsx +│ │ └── status-badge.tsx +│ ├── charts/ +│ │ ├── revenue-bar-chart.tsx +│ │ └── ... (kpi-card, multi-series-chart, time-series-chart) +│ └── explainability/ +│ ├── explanation-panel.tsx +│ └── explanation-panel.test.tsx +├── hooks/ +│ ├── use-explanations.ts +│ └── use-runs.ts +├── pages/ +│ ├── explorer/ +│ │ ├── runs.tsx +│ │ ├── run-detail.tsx +│ │ └── run-compare.tsx +│ └── visualize/ +│ ├── forecast.tsx +│ └── backtest.tsx +└── types/api.ts +``` + +### Desired Codebase tree (additions) + +```text +app/core/ +└── exceptions.py # MODIFIED — adds UnprocessableEntityError(ForecastLabError) for 422 resource-state path + +app/features/forecasting/ +├── feature_metadata.py # NEW — extract_feature_importance(); ModelFamily classifier; importance_type_for() +├── routes.py # MODIFIED — adds GET /forecasting/runs/{run_id}/feature-metadata + GET /forecasting/jobs/{job_id}/feature-metadata +├── schemas.py # MODIFIED — adds FeatureImportanceItem, FeatureMetadataResponse, ModelFamily +└── tests/ + ├── test_feature_metadata.py # NEW — unit tests for extract_feature_importance + └── test_routes_feature_metadata.py # NEW — route tests for BOTH endpoints (200, 400, 404, 422 paths) + +app/features/registry/ +└── schemas.py # MODIFIED — adds model_family computed field to RunResponse + +frontend/src/ +├── components/ +│ ├── common/ +│ │ ├── model-family-badge.tsx # NEW +│ │ └── model-family-badge.test.tsx # NEW — closes test-requirements.md gap (M3) +│ └── explainability/ +│ ├── feature-importance-panel.tsx # NEW +│ └── feature-importance-panel.test.tsx # NEW +├── hooks/ +│ └── use-feature-metadata.ts # NEW — exports useRunFeatureMetadata + useJobFeatureMetadata +├── pages/ +│ ├── explorer/ +│ │ ├── runs.tsx # MODIFIED — column + allow-list +│ │ ├── run-detail.tsx # MODIFIED — badge + cards +│ │ └── run-compare.tsx # MODIFIED — family row + side-by-side +│ └── visualize/ +│ ├── forecast.tsx # MODIFIED — collapsible panel +│ └── backtest.tsx # MODIFIED — MODEL_OPTIONS allow-list +└── types/api.ts # MODIFIED — new types + +docs/user-guide/ +├── dashboard-guide.md # MODIFIED — one-line cross-link +└── feature-reference.md # MODIFIED — new "Advanced Model Metadata" subsection +``` + +### Files to MODIFY + +| Path | What changes | +|------|--------------| +| `app/core/exceptions.py` | + `UnprocessableEntityError(ForecastLabError)` class (status_code=422, code="UNPROCESSABLE_ENTITY") | +| `app/features/forecasting/feature_metadata.py` | NEW — `extract_feature_importance`, `model_family_for(model_type)`, `importance_type_for` | +| `app/features/forecasting/routes.py` | + `GET /forecasting/runs/{run_id}/feature-metadata` AND + `GET /forecasting/jobs/{job_id}/feature-metadata`; doc-string drift fix (lines 33-37) | +| `app/features/forecasting/schemas.py` | + `ModelFamily` enum, `FeatureImportanceItem`, `FeatureMetadataResponse` | +| `app/features/forecasting/service.py` | + `get_feature_metadata_for_run` AND `get_feature_metadata_for_job` methods | +| `app/features/forecasting/tests/test_feature_metadata.py` | NEW — unit tests for the extractor | +| `app/features/forecasting/tests/test_routes_feature_metadata.py` | NEW — route tests | +| `app/features/registry/schemas.py` | + `model_family` computed field on `RunResponse` | +| `app/features/registry/tests/test_schemas.py` | + tests for the computed field | +| `frontend/src/types/api.ts` | + types: `ModelFamily`, `FeatureImportanceItem`, `FeatureMetadataResponse`; extend `ModelRun` | +| `frontend/src/lib/api.ts` | (no change — generic `api()` already handles the new endpoint) | +| `frontend/src/hooks/use-feature-metadata.ts` | NEW | +| `frontend/src/components/common/model-family-badge.tsx` | NEW | +| `frontend/src/components/common/model-family-badge.test.tsx` | NEW — closes test-requirements gap flagged by prp-quality-agent | +| `frontend/src/components/explainability/feature-importance-panel.tsx` | NEW | +| `frontend/src/components/explainability/feature-importance-panel.test.tsx` | NEW | +| `frontend/src/pages/explorer/runs.tsx` | + `model_family` column, MODEL_TYPES allow-list extension, csvColumns update | +| `frontend/src/pages/explorer/run-detail.tsx` | + `` in profile, "Feature Metadata" card, `` | +| `frontend/src/pages/explorer/run-compare.tsx` | + "Family" row in profile; new collapsible side-by-side panel card | +| `frontend/src/pages/visualize/forecast.tsx` | + collapsible importance panel tied to train run | +| `frontend/src/pages/visualize/backtest.tsx` | + MODEL_OPTIONS allow-list extension | +| `docs/user-guide/feature-reference.md` | + "Advanced Model Metadata" subsection | +| `docs/user-guide/dashboard-guide.md` | + one-line cross-link | + +### DECISIONS LOCKED + +1. **[DECISION LOCKED] No new column on `model_run`; no Alembic migration.** + The MLZOO-D surface is read-only and additive. Feature columns live in + `bundle.metadata` already (`service.py:233-238`); feature importance lives + on the fitted estimator object. Lazy extraction via a new endpoint that + `load_model_bundle()`s the artifact is exactly the pattern `/verify` + already uses. A migration would add a backfill story we do not need. + +2. **[DECISION LOCKED] Endpoint lives in the forecasting slice, not registry.** + `GET /forecasting/runs/{run_id}/feature-metadata` — not `/registry/runs/...`. + The forecasting slice OWNS the bundle format (it knows about `ModelBundle`, + the four feature-aware classes, and the `coef_` vs `feature_importances_` + branch). Putting the endpoint in registry would require either a cross-slice + import of `forecasting.persistence` or duplicate bundle-deserialization + code. The new forecasting → registry import (calling `RegistryService.get_run` + from within forecasting) is itself NEW — `app/features/forecasting/` does + not import from `app/features/registry/` today. The precedent for this + read-only direction lives in `app/features/explainability/service.py:57` + (`from app.features.registry.models import ModelRun` — read-only data + contract; see the module docstring for the rationale). Mirror that + pattern: import `RegistryService` (or `ModelRun` directly) read-only + and never call mutating registry methods from forecasting. The reverse + direction (registry → forecasting) does not exist and we are not + introducing it, so the import graph stays one-way. + +3. **[DECISION LOCKED] `model_family` is computed, not persisted.** + A Pydantic `@computed_field` (or `model_validator(mode="after")`) on + `RunResponse` derives `model_family` from `model_type` at serialization + time. No DB column, no migration, no backfill. Future model families + require updating one map in `feature_metadata.py` and one enum in + `forecasting/schemas.py`; existing rows pick up the new value + automatically on next read. + +4. **[DECISION LOCKED] One panel, two display modes.** + `` renders BOTH tree-model positive-only bars AND + prophet_like signed coefficients. The mode is selected by the + `kind: "tree" | "linear_coef"` field on each `FeatureImportanceItem`. + Two separate components would force every consumer page to know the + family up-front before rendering — duplicating the family map and + coupling the panel to the model taxonomy. One component reads `kind` + and labels axes / colors accordingly. + +5. **[DECISION LOCKED] No SHAP, no LIME, no permutation-importance recompute.** + PRP-28 explicitly defers SHAP to its own future PRP + ADR (PRP-28 § + "Adding SHAP later needs its own PRP + ADR"). MLZOO-D surfaces only + what the trained estimator already exposes as a first-class attribute + (`.feature_importances_`, `.coef_`). Permutation importance on hold-out + would require a feature frame and a leakage-safe split — a substantially + larger surface that belongs in a follow-on PRP. + +6. **[DECISION LOCKED] No agent-tool exposure of feature importance in MLZOO-D.** + The chat agent's tool surface (`app/features/agents/tools/*`) is OUT OF + SCOPE. Exposing feature importance to the experiment agent is its own + decision: it changes the surface the human-in-the-loop approval gate + guards, and the conversational ergonomics need their own design. Defer + to a separate PRP (`agents`-scoped). + +7. **[DECISION LOCKED] Backtest viz extension is allow-list-only.** + Extending `MODEL_OPTIONS` in `backtest.tsx` is the bare minimum to let + operators run feature-aware backtests from the UI (B.2 made the backend + ready). Per-model side-by-side fold comparison, multi-family ranking, + and per-fold feature-importance heatmaps are explicitly OUT OF SCOPE — + they are full features in their own right, not a polish PRP's surface. + +8. **[DECISION LOCKED] Cross-slice run-compare is read-only.** + When two compared runs have different `model_family`, the compare page + shows a single "cross-family comparison not supported" message in the + feature-importance card — it does NOT render two panels with a "doesn't + make sense" warning, and it does NOT auto-rank features by some shared + abstract measure. Cross-family metric comparison (MAE, sMAPE) still works + in the existing metrics-diff card because those metrics are absolute. + +### Known Gotchas + +```python +# CRITICAL: LightGBM and XGBoost extras are OPTIONAL. +# pyproject.toml's [project.optional-dependencies] gates lightgbm / xgboost +# behind ml-lightgbm and ml-xgboost extras. The ForecastLabAI wrapper class +# (LightGBMForecaster / XGBoostForecaster) is unconditionally importable — +# the lazy `import lightgbm as lgb` / `import xgboost as xgb` lives inside +# fit() at models.py:704, 861, wrapped in try/except since v0.2.16 PR #253. +# +# BUT the joblib bundle for a lightgbm/xgboost run contains the +# scikit-learn-compatible *estimator* (LGBMRegressor / XGBRegressor) +# pickled. `joblib.load(bundle_path)` deserializes that estimator and +# CAN raise `ModuleNotFoundError: No module named 'lightgbm'` (or xgboost) +# at unpickle time, BEFORE control returns to extract_feature_importance. +# +# Handle this at the route boundary: wrap `load_model_bundle(...)` in +# `try/except ModuleNotFoundError as exc → raise +# UnprocessableEntityError(f"Model artifact requires the {pkg} extra; reinstall with --extra ml-{pkg}") from exc`. +# The 422 surfaces a clear remediation hint to the operator; do NOT 500. +# +# extract_feature_importance itself MUST not import lightgbm / xgboost at +# module scope. It branches on `isinstance(model, LightGBMForecaster)` etc. +# — those forecaster wrapper classes are always importable. Reading +# `.feature_importances_` off the (already-unpickled) estimator instance +# does NOT require re-importing the library. + +# CRITICAL: ProphetLikeForecaster.coef_ lives at `pipeline.named_steps["ridge"].coef_`, +# NOT at `model.coef_`. The model attribute is the wrapper class; `.coef_` +# requires drilling into the sklearn Pipeline. The forecaster already does +# this for its decompose() method (models.py:1094-1098) — mirror that code +# path; do not re-implement. + +# CRITICAL: LGBMRegressor.feature_importances_ default importance_type is +# "split" (integer counts of splits per feature). For dashboarding "what +# drove the model", "gain" is more useful — but changing the default would +# alter the persisted contract. MLZOO-D documents whatever the trained +# estimator exposes by reading model.booster_.params.get("importance_type") +# when present and falling back to "split" in the response JSON. We do NOT +# override the default at training time in this PRP. + +# CRITICAL: RunResponse uses Pydantic v2 aliases. +# model_config_data is the internal attribute name; alias="model_config" is +# the wire field name (registry/schemas.py:117-118). A new computed_field +# named "model_family" must NOT collide with an existing alias and MUST +# include `ConfigDict(populate_by_name=True)` propagation. Test by JSON- +# serializing a RunResponse and asserting both "model_config" and +# "model_family" appear as top-level keys. + +# CRITICAL: FastAPI strict-mode policy applies to REQUEST bodies, not responses. +# `FeatureMetadataResponse` is a response model; its date/datetime fields (if +# any are added later) do not need Field(strict=False, ...). See +# docs/_base/SECURITY.md § "Pydantic v2 strict mode on FastAPI request bodies". + +# GOTCHA: load_model_bundle resolves paths against `forecast_model_artifacts_dir`. +# persistence.py:136 accepts `path: str | Path, base_dir: str | Path | None = None`. +# Pass `base_dir=settings.forecast_model_artifacts_dir` explicitly so the +# resolver's path-traversal guard applies (security-patterns.md § +# "File operations" requires canonicalization at boundaries). The existing +# /forecasting/predict call site at service.py:365 canonicalizes upstream +# instead — that pattern is NOT the one to mirror here; this endpoint +# receives only the run_id (or job_id) and never trusts an arbitrary path +# from the user, so the base_dir form is both safer and shorter. + +# GOTCHA: model_run.status state machine is pending → running → success | failed → archived. +# A run with status=archived but a still-existing artifact CAN have its +# feature-metadata extracted. Return 200 for archived; return 422 only for +# pending / running / failed (which have no usable artifact). Document this +# in the route docstring so it's clear the gate is "is there a usable +# artifact", not "is this the current alias". + +# GOTCHA: TanStack Query keys must include runId so cache doesn't leak between runs. +# Mirror `['explanations', 'run', runId]` (use-explanations.ts:11) with +# `['feature-metadata', runId]`. retry: false because 400 / 422 are final answers. + +# GOTCHA: ApiError 400 vs 422 surfacing in the panel. +# 400 = "this run's family doesn't support feature importance" — render +# neutral muted message ("Feature importance is available for tree and +# additive models only."). +# 422 = "this run has no usable artifact yet" — render the same neutral +# style but a different message ("Feature importance is available once +# training completes and the model artifact is saved."). +# 404 / 5xx = render destructive (red) ErrorDisplay. +# Mirror the if-branch shape from explanation-panel.tsx:104-125. + +# GOTCHA: Run-compare cross-family — DO render the card, DO NOT fetch. +# When run_a.model_family !== run_b.model_family, render the new collapsible +# card with a single-line muted message inside ("Feature-importance +# comparison is only meaningful when both runs share a model family."). +# Call useRunFeatureMetadata with `enabled: false` for both to avoid the +# requests altogether. This keeps the page deterministic and avoids +# triggering a 400 burst that would clutter the network tab. + +# GOTCHA: pnpm test --run is the right CI invocation (vitest run). +# `pnpm test` (no --run) starts watch mode and hangs CI. package.json:11 +# defines `"test": "vitest run"` so plain `pnpm test` works locally, but +# the validation gate must use `pnpm test --run` to be unambiguous. +``` + +## Implementation Blueprint + +### Data models + +```python +# app/features/forecasting/schemas.py — additions + +from enum import Enum +from typing import Literal +from pydantic import BaseModel, ConfigDict, Field + + +class ModelFamily(str, Enum): + """Classifier for advanced-model UI surfacing. + + Derived from model_type; not persisted in the DB. Surfaced on RunResponse + via a computed field and consumed by the dashboard for the family Badge + and the feature-importance panel routing. + """ + + BASELINE = "baseline" # naive, seasonal_naive, moving_average + TREE = "tree" # regression (HistGBR), lightgbm, xgboost + ADDITIVE = "additive" # prophet_like (Ridge pipeline) + + +class FeatureImportanceItem(BaseModel): + """One row of model-derived feature importance, ready for the dashboard.""" + + model_config = ConfigDict(extra="forbid") + + name: str = Field(..., description="Canonical feature column name (e.g. 'lag_7').") + importance: float = Field( + ..., + description=( + "For tree models: estimator.feature_importances_ value " + "(non-negative). For additive models: pipeline.named_steps['ridge'].coef_ " + "value (signed; the sign carries directional information)." + ), + ) + kind: Literal["tree", "linear_coef"] = Field( + ..., + description="Determines display semantics: 'tree' → magnitude bar; " + "'linear_coef' → signed bar with direction icon.", + ) + rank: int = Field(..., ge=1, description="1-indexed rank by |importance| desc.") + + +class FeatureMetadataResponse(BaseModel): + """The /forecasting/runs/{run_id}/feature-metadata response.""" + + model_config = ConfigDict(extra="forbid") + + run_id: str + model_type: str + model_family: ModelFamily + feature_columns: list[str] = Field( + ..., + description="The canonical feature frame the model consumed at training time. " + "Always 14 columns for v0.2.16 feature-aware models (see " + "app/shared/feature_frames/contract.py:80).", + ) + features: list[FeatureImportanceItem] = Field( + ..., + description="Sorted by |importance| descending; len == len(feature_columns).", + ) + importance_type: str | None = Field( + default=None, + description="For LightGBM/XGBoost: the booster's importance_type " + "('split' / 'gain' / 'weight' / 'cover' depending on the lib's " + "default). For HistGBR: 'permutation'. For prophet_like: 'ridge_coef'. " + "Always populated so consumers know what the numbers mean.", + ) +``` + +```python +# app/features/registry/schemas.py — addition (around line 109-138) + +from app.features.forecasting.schemas import ModelFamily + +class RunResponse(BaseModel): + # ... existing fields unchanged ... + + @computed_field # type: ignore[prop-decorator] + @property + def model_family(self) -> ModelFamily: + """Derived from model_type. See app/features/forecasting/feature_metadata.py + for the canonical map. Unknown types log a warning and return BASELINE.""" + from app.features.forecasting.feature_metadata import model_family_for + return model_family_for(self.model_type) +``` + +### Tasks (dependency-ordered) + +```yaml +# ════════ STEP 1 — Backend foundation: extractor + family classifier ════════ + +Task 1 — CREATE app/features/forecasting/feature_metadata.py: + - PURPOSE: pure-function extraction; no I/O; no FastAPI; no DB. + - DEFINE model_family_for(model_type: str) -> ModelFamily (the map; unknown → BASELINE + logger.warning) + - DEFINE extract_feature_importance(model, feature_columns) -> list[FeatureImportanceItem] + - isinstance check on the four feature-aware classes (import them locally) + - branch on (LightGBM | XGBoost | Regression) → .feature_importances_, kind="tree" + - branch on ProphetLike → pipeline.named_steps["ridge"].coef_, kind="linear_coef" + - else → raise ValueError("model_type 'X' is not feature-aware") + - sort by abs(importance) desc; assign rank 1..N + - DEFINE importance_type_for(model) -> str | None (LightGBM booster.params['importance_type'] or default; HistGBR → "permutation"; ProphetLike → "ridge_coef") + - VALIDATE: uv run ruff check app/features/forecasting/feature_metadata.py + - VALIDATE: uv run mypy app/features/forecasting/feature_metadata.py + - VALIDATE: uv run pyright app/features/forecasting/feature_metadata.py + +Task 2 — CREATE app/features/forecasting/tests/test_feature_metadata.py: + - PURPOSE: unit-test the extractor with concrete fitted instances of each class. + - MIRROR pattern from: app/features/forecasting/tests/test_lightgbm_forecaster.py + - FIXTURES: small synthetic (y, X) frames; fit each forecaster; assert: + - LightGBM/XGBoost/Regression: every importance >= 0; sum > 0; len == len(feature_columns); rank monotonically increasing + - ProphetLike: at least one negative coefficient possible; sign is preserved; |coef| sort still produces correct ranks + - Unknown model class → ValueError with substring "not feature-aware" + - model_family_for: every Literal in ModelType maps to exactly one family + - VALIDATE: uv run pytest -v app/features/forecasting/tests/test_feature_metadata.py + +# ════════ STEP 2 — Backend schemas + route ════════ + +Task 3 — MODIFY app/features/forecasting/schemas.py: + - ADD ModelFamily enum (str, Enum) + - ADD FeatureImportanceItem, FeatureMetadataResponse (definitions in Data Models above) + - PRESERVE all existing exports + - VALIDATE: uv run mypy app/features/forecasting/schemas.py + +Task 4 — MODIFY app/features/registry/schemas.py: + - IMPORT ModelFamily from app.features.forecasting.schemas + - ADD model_family computed_field on RunResponse (definition in Data Models above) + - GOTCHA: ConfigDict already has populate_by_name=True (line 112); no change needed + - ADD model_family to RunListResponse items via the same computed field (auto-propagates) + - VALIDATE: uv run mypy app/features/registry/schemas.py + - VALIDATE: uv run pyright app/features/registry/schemas.py + +Task 5 — MODIFY app/features/registry/tests/test_schemas.py: + - ADD test asserting RunResponse(...).model_dump()["model_family"] == ModelFamily.TREE for model_type="lightgbm" + - ADD parametrized test covering all 7 model_types and the unknown case + - VALIDATE: uv run pytest -v app/features/registry/tests/test_schemas.py + +Task 6 — MODIFY app/features/forecasting/service.py: + - ADD method ForecastingService.get_feature_metadata_for_run(run_id: str, db: AsyncSession) -> FeatureMetadataResponse + - Look up the registry run via RegistryService.get_run(db, run_id) + - If None → raise NotFoundError(f"Model run not found: {run_id}") (NotFoundError already exists in app/core/exceptions.py and emits RFC 7807 404 via forecastlab_exception_handler) + - If model_family_for(run.model_type) == ModelFamily.BASELINE → raise BadRequestError("Feature metadata is available for tree and additive families only; this run is a baseline model.") + - If run.artifact_uri is None or run.status not in ('success', 'archived') → raise UnprocessableEntityError("Run has no usable artifact yet; status={status}, artifact_uri={present|absent}") + - Wrap `load_model_bundle(run.artifact_uri, base_dir=settings.forecast_model_artifacts_dir)` in try/except (ModuleNotFoundError, FileNotFoundError) → raise UnprocessableEntityError: + - ModuleNotFoundError → "Model artifact requires the ml-{pkg} extra; reinstall the backend with the extra enabled." (See "Optional ML extras" gotcha) + - FileNotFoundError → "Model artifact file is missing from disk: {path}. The registry row references an artifact that has been deleted or moved." (Same 422 type URI; the registry row is stale but the request shape is valid.) + - feature_columns = bundle.metadata["feature_columns"] (always populated for feature-aware bundles per service.py:233-238) + - features = extract_feature_importance(bundle.model, feature_columns) + - importance_type = importance_type_for(bundle.model) + - return FeatureMetadataResponse(run_id=run.run_id, model_type=run.model_type, model_family=model_family_for(run.model_type), feature_columns=feature_columns, features=features, importance_type=importance_type) + - ADD method ForecastingService.get_feature_metadata_for_job(job_id: str, db: AsyncSession) -> FeatureMetadataResponse + - PURPOSE: mirror PRP-28's /explain/jobs/{job_id} shape — forecast.tsx only has the job_id; trainJob.result.run_id is the **artifact key** (uuid.uuid4().hex[:12], see service.py:270), NOT the registry UUID + - Look up the job via JobsService.get_job(db, job_id) + - If None → raise NotFoundError(f"Job not found: {job_id}") + - If job.job_type != 'train' or job.status != 'completed' → raise BadRequestError("Feature metadata can only be derived from a completed train job; got job_type={...}, status={...}") + - bundle_path_str = job.result.get("model_path") (CRITICAL: use the `model_path` field, NOT `run_id`. `_execute_train` in jobs/service.py:517 already populates `model_path` with the full path including `.joblib` suffix. Reconstructing the path from `run_id` would fail because `load_model_bundle` (persistence.py:154,173) calls `path.exists()` verbatim — it does NOT auto-inject `.joblib` like `save_model_bundle` does (persistence.py:88-89)) + - If not bundle_path_str → raise UnprocessableEntityError("Train job result missing model_path; job may have failed mid-write") + - bundle_path = Path(bundle_path_str) + - Wrap `load_model_bundle(bundle_path, base_dir=settings.forecast_model_artifacts_dir)` in the same try/except (ModuleNotFoundError, FileNotFoundError) → UnprocessableEntityError pair as above (same messages, FileNotFoundError especially likely on the job-keyed path because predict-job cleanup or manual `rm` of stale artifacts is common while the job row lives on) + - Derive model_type from bundle.config.model_type + - If model_family_for(bundle.config.model_type) == ModelFamily.BASELINE → raise BadRequestError(...) (same message as above) + - Same extraction path; the response's `run_id` field is populated with the **artifact key** parsed from the bundle path stem (documented in the FeatureMetadataResponse field description; consumers MUST NOT treat this as a registry UUID when the source is a job) + - PRESERVE: every existing public method on ForecastingService + - VALIDATE: uv run mypy app/features/forecasting/service.py + - VALIDATE: uv run pyright app/features/forecasting/service.py + +Task 6.5 — MODIFY app/core/exceptions.py: + - PURPOSE: the v0.2.16 PR #253 hygiene fix forbids `HTTPException(422, "raw string")`; the existing ForecastLabError subclasses cover 400/404/409/422-Pydantic/500 but NOT 422-resource-state. Add one. + - ADD: class UnprocessableEntityError(ForecastLabError) with status_code=422, code="UNPROCESSABLE_ENTITY", error_type_uri matching the ERROR_TYPES["UNPROCESSABLE_ENTITY"] convention (add the constant if not present) + - VERIFY: forecastlab_exception_handler already serializes ForecastLabError subclasses to application/problem+json (it does — that's the whole pattern); no handler wiring needed + - NB: do NOT collide semantics with the existing ValidationError (status_code=422, code="VALIDATION_ERROR") — that one is for Pydantic input failures. UnprocessableEntityError is for "request is well-formed but the resource is in a state where the operation can't proceed" (no artifact, missing optional extra, etc.). Different `code` → different `type` URI → consumers can disambiguate. + - VALIDATE: uv run mypy app/core/exceptions.py + - VALIDATE: uv run pytest -v app/core/tests/ (existing exception tests must still pass) + +Task 7 — MODIFY app/features/forecasting/routes.py: + - ADD GET /forecasting/runs/{run_id}/feature-metadata + - response_model=FeatureMetadataResponse + - dependency: db = Depends(get_db) + - body: `return await service.get_feature_metadata_for_run(run_id, db)` + - Do NOT wrap with try/except in the route — let the ForecastLabError subclasses (NotFoundError 404, BadRequestError 400, UnprocessableEntityError 422) flow straight through to forecastlab_exception_handler which serializes them as application/problem+json. Mirror the shape of `/explain/runs/{run_id}` (app/features/explainability/routes.py:108-121) — that endpoint does the same thing (catches only ValueError → BadRequestError and SQLAlchemyError → DatabaseError). + - DO catch SQLAlchemyError → DatabaseError for DB-level surprise paths (mirror explainability/routes.py:113-118). + - FIX doc-string drift at lines 33-37: list xgboost, regression, prophet_like alongside lightgbm in the /forecasting/train description + - VALIDATE: uv run mypy app/features/forecasting/routes.py + - VALIDATE: uv run pyright app/features/forecasting/routes.py + +Task 7.5 — ADD sibling endpoint in app/features/forecasting/routes.py: + - PURPOSE: forecast.tsx has only the job_id, not a registry run_id. The Critical-fix flagged by prp-quality-agent. + - ADD GET /forecasting/jobs/{job_id}/feature-metadata + - response_model=FeatureMetadataResponse + - body: `return await service.get_feature_metadata_for_job(job_id, db)` + - Same exception-flow contract as Task 7 (no try/except in the route except SQLAlchemyError → DatabaseError; the ForecastLabError subclasses flow through to the RFC 7807 handler). + - Docstring: cite PRP-28's `/explain/jobs/{job_id}` (app/features/explainability/routes.py:124) as the structural twin so future maintainers see the pair. + - VALIDATE: uv run mypy app/features/forecasting/routes.py + +Task 8 — CREATE app/features/forecasting/tests/test_routes_feature_metadata.py: + - MIRROR pattern from: app/features/explainability/tests/test_routes.py (PRP-28 already shipped the /explain/runs/{id} + /explain/jobs/{id} test pair — copy the structure) + - CASES (cover BOTH endpoints — same matrix for each): + - 200 success for a fitted LightGBM run (assert content-type, sorted importances, kind='tree', importance_type populated) + - 200 success for a fitted ProphetLike run (assert at least one negative coefficient renders intact, kind='linear_coef', importance_type='ridge_coef') + - 400 application/problem+json for a baseline run / a non-train job (assert content-type = application/problem+json, type URI is BAD_REQUEST, detail contains "baseline" or "train job") + - 404 application/problem+json for unknown run_id / job_id (type URI is NOT_FOUND) + - 422 application/problem+json for pending / running / failed run; for an artifact_uri = None on a success-status run; for a ModuleNotFoundError during load_model_bundle (mock joblib.load to raise); for a FileNotFoundError during load_model_bundle (point artifact_uri at a non-existent path, or set job.result["model_path"] to a path the artifacts dir doesn't contain) — type URI is UNPROCESSABLE_ENTITY (NOT the existing VALIDATION_ERROR) + - 422 specifically for the job endpoint: a train job in `pending` / `failed` status returns 400 (not a completed train), and a `predict` job returns 400 (wrong job_type) + - VALIDATE: uv run pytest -v app/features/forecasting/tests/test_routes_feature_metadata.py + - VALIDATE: uv run pytest -v -m integration app/features/forecasting/tests/test_routes_feature_metadata.py + +# ════════ STEP 3 — Frontend types + hook + panel ════════ + +Task 9 — MODIFY frontend/src/types/api.ts: + - ADD around line 170: + export type ModelFamily = 'baseline' | 'tree' | 'additive' + - EXTEND ModelRun interface to include `model_family: ModelFamily` + - ADD after the ModelRun group: + export interface FeatureImportanceItem { + name: string + importance: number + kind: 'tree' | 'linear_coef' + rank: number + } + export interface FeatureMetadataResponse { + run_id: string + model_type: string + model_family: ModelFamily + feature_columns: string[] + features: FeatureImportanceItem[] + importance_type: string | null + } + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 10 — CREATE frontend/src/hooks/use-feature-metadata.ts: + - MIRROR verbatim from: frontend/src/hooks/use-explanations.ts:10-17 (useRunExplanation pattern) + - Export TWO sibling hooks (mirroring useRunExplanation + useJobExplanation from PRP-28): + 1. useRunFeatureMetadata(runId: string, enabled = true) + - Query key: ['feature-metadata', 'run', runId] + - URL: `/forecasting/runs/${runId}/feature-metadata` + - retry: false (400/404/422 are all final answers) + 2. useJobFeatureMetadata(jobId: string, enabled = true) + - Query key: ['feature-metadata', 'job', jobId] + - URL: `/forecasting/jobs/${jobId}/feature-metadata` + - retry: false + - Both return useQuery; both gated by `enabled && !!{id}` exactly like useRunExplanation + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 11 — CREATE frontend/src/components/common/model-family-badge.tsx: + - PURPOSE: pure derivation; no hooks + - export function ModelFamilyBadge({ family }: { family: ModelFamily }) — returns shadcn with: + baseline → 'secondary'; tree → 'default'; additive → 'outline' + - Use Lucide icons (TreePine for tree, LineChart for additive, Activity for baseline) at h-3 w-3 with the existing flex-gap pattern from StatusBadge + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 11.5 — CREATE frontend/src/components/common/model-family-badge.test.tsx: + - PURPOSE: closes the test-requirements.md gap flagged by prp-quality-agent (M3). + - MIRROR scaffolding from: frontend/src/components/explainability/explanation-panel.test.tsx + - CASES (3 trivial): + - renders 'tree' family with the expected variant text + TreePine icon + - renders 'additive' family with the LineChart icon + - renders 'baseline' family with the Activity icon + 'secondary' variant + - VALIDATE: cd frontend && pnpm test --run src/components/common/model-family-badge.test.tsx + +Task 12 — CREATE frontend/src/components/explainability/feature-importance-panel.tsx: + - MIRROR layout from: frontend/src/components/explainability/explanation-panel.tsx + - PanelShell variant: title "Feature Importance", description tied to family ("Tree-based gain importance (model-derived)." | "Ridge regression coefficients (signed; sign indicates direction).") + - Body: ordered list of features (use Card + a flex list, OR re-use recharts via a thin wrapper — KEEP IT SIMPLE: a CSS-grid table is acceptable; do not introduce a new chart dependency) + - For kind='tree': horizontal bar with width = (importance / max) × 100%, neutral color + - For kind='linear_coef': horizontal bar with width = (|coef| / max) × 100%, color by sign (text-success for positive, text-destructive for negative), TrendingUp/TrendingDown icon + - Show feature name (mono font, w-40 left-aligned), bar, numeric value (right-aligned, w-20) + - Caveat footer (border-top, text-muted-foreground): "Importance is model-derived. It reflects how much each feature reduced training error — not real-world causation." + - Error handling: + - 400 → neutral muted Info card with "Feature importance is available for tree and additive model families only." + - 422 → neutral muted Info card with "Feature importance is available once training completes and the artifact is saved." + - 404 / 5xx → destructive AlertTriangle ErrorDisplay + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 13 — CREATE frontend/src/components/explainability/feature-importance-panel.test.tsx: + - MIRROR scaffolding from: frontend/src/components/explainability/explanation-panel.test.tsx:1-66 + - FIXTURES: sampleTreeImportance (kind='tree', 14 items), sampleLinearCoef (kind='linear_coef', 14 items including 2 negative) + - CASES: + - renders tree items with positive bars + - renders linear items with signed colors and direction icons + - loading state + - 400 → neutral baseline message + - 422 → neutral artifact-missing message + - empty features list message + - VALIDATE: cd frontend && pnpm test --run frontend/src/components/explainability/feature-importance-panel.test.tsx + +# ════════ STEP 4 — Page wiring (additive, in-place) ════════ + +Task 14 — MODIFY frontend/src/pages/explorer/runs.tsx: + - EXTEND line 21 MODEL_TYPES → ['naive', 'seasonal_naive', 'moving_average', 'regression', 'lightgbm', 'xgboost', 'prophet_like'] + - INSERT new ColumnDef after the model_type column (after line 47): + { + accessorKey: 'model_family', + header: 'Family', + enableSorting: false, + cell: ({ row }) => , + } + - EXTEND csvColumns (line 83-92) with { key: 'model_family', header: 'Family' } + - PRESERVE: all existing URL-param logic; the existing useRuns hook already returns model_family via the backend computed field + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 15 — MODIFY frontend/src/pages/explorer/run-detail.tsx: + - IMPORT ModelFamilyBadge, FeatureImportancePanel, useRunFeatureMetadata + - In the profile card (lines 88-143), add a immediately after the model_type Field cell + - After the existing "Metrics" card (line 156-164), INSERT a new titled "Feature Metadata": + - Only render when run.model_family !== 'baseline' + - Feed by featureMetadata = useRunFeatureMetadata(runId, run.model_family !== 'baseline') + - Body: list the 14 feature columns from featureMetadata.data?.feature_columns (use the existing list pattern; one-line per column, monospace, comma-separated chips OK) + - Show featureMetadata.data?.importance_type at the bottom as a small muted tag + - INSERT immediately below the existing (around line 170), gated by run.model_family !== 'baseline' + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 16 — MODIFY frontend/src/pages/explorer/run-compare.tsx: + - IMPORT ModelFamilyBadge, FeatureImportancePanel, useRunFeatureMetadata + - In the Profile table (lines 145-213), add a new row labeled "Family" with two cells (run.run_a.model_family / run.run_b.model_family) + - After the "Metrics diff" card (line 231-266), INSERT a new titled "Feature Importance" wrapping a (default open): + - sameFamily = run_a.model_family === run_b.model_family && run_a.model_family !== 'baseline' + - When !sameFamily: render a single muted line ("Feature-importance comparison is only meaningful when both runs share a non-baseline family."); call useRunFeatureMetadata(_, enabled: false) for both + - When sameFamily: render two s in a grid-cols-2 (md:grid-cols-2, base: grid-cols-1) + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 17 — MODIFY frontend/src/pages/visualize/forecast.tsx: + - CRITICAL: forecast.tsx has NO access to a registry run_id. `trainJob.result.run_id` + (line 48-49) is the **forecast-artifact key** (uuid.uuid4().hex[:12], see + forecasting/service.py:270) saved as `model_{id}.joblib`. It is NOT the + registry model_run.run_id (full UUID). Calling + `useRunFeatureMetadata(trainJob.result.run_id, …)` would 404 because the + backend treats `{run_id}` as a registry UUID. This is the bug prp-quality-agent + flagged (Critical #1) and the trap recorded in memory `scenario-run-id-vs-registry-run-id`. + - CORRECT WIRING: use the job-based sibling hook + endpoint (Tasks 7.5 + 10). + - IMPORT useJobFeatureMetadata, FeatureImportancePanel, ModelFamilyBadge from their respective new files. (Do NOT import useRun — it is not used in forecast.tsx today and is not needed: the job-based endpoint reads model_type / family from the bundle directly.) + - Wire the panel keyed on the loaded predict job's source train job: + const trainJobMetadata = useJobFeatureMetadata(trainJobId, !!trainJobId) + const trainFamily = trainJobMetadata.data?.model_family ?? null + - Below the existing for the predict job (lines 256-263), INSERT a containing: + - Trigger: "Model details" + (the badge renders only when trainJobMetadata.data is populated, to avoid flashing 'baseline' before the fetch resolves) + - Content: + - The panel's 400-handler will render the neutral message when the train job is a baseline model, so we do not gate the Collapsible's existence on family — TanStack Query handles the empty/error states. + - VALIDATE: cd frontend && pnpm tsc --noEmit + +Task 18 — MODIFY frontend/src/pages/visualize/backtest.tsx: + - EXTEND MODEL_OPTIONS at line 52: + [ + { value: 'naive', label: 'Naive' }, + { value: 'seasonal_naive', label: 'Seasonal Naive' }, + { value: 'moving_average', label: 'Moving Average' }, + { value: 'regression', label: 'Regression (HistGBR)' }, + { value: 'lightgbm', label: 'LightGBM' }, + { value: 'xgboost', label: 'XGBoost' }, + { value: 'prophet_like', label: 'Prophet-like (additive)' }, + ] + - NO OTHER CHANGE in MLZOO-D (per DECISIONS LOCKED #7) + - VALIDATE: cd frontend && pnpm tsc --noEmit + +# ════════ STEP 5 — Docs ════════ + +Task 19 — MODIFY docs/user-guide/feature-reference.md: + - ADD new ## section titled "Advanced Model Metadata" + - One paragraph naming the families (baseline / tree / additive) and what the dashboard shows (badge, feature columns, importance / coefs) + - One paragraph stating the correlation-vs-causation caveat VERBATIM (so support replies can quote it): + "Feature importance is model-derived. It reflects how much each feature reduced the model's training error — not real-world causation. Two products with similar importance profiles are not necessarily driven by the same business factors." + - Cross-link the relevant page sections (run detail, run compare, forecast viz) + - VALIDATE: markdownlint-cli2 if installed; otherwise eyeball the rendered doc + +Task 20 — MODIFY docs/user-guide/dashboard-guide.md: + - ADD one-line cross-link to the new feature-reference section from the runs-explorer / run-detail discussion (find the appropriate H2) + - VALIDATE: visual inspection + +# ════════ STEP 6 — Browser dogfood (per .claude/rules/ui-design.md) ════════ + +Task 21 — DOGFOOD via the webapp-testing skill: + - PRECONDITION: docker compose up -d && uv run alembic upgrade head && make demo (gives one each of naive / seasonal_naive / moving_average); then curl /forecasting/train with model_type=lightgbm (and optionally xgboost / prophet_like) against the same store/product to fill the four advanced families. + - SCENARIO 1 — Runs explorer: filter by model_type=lightgbm; assert the Family column renders with the "tree" badge. + - SCENARIO 2 — Run detail (lightgbm run): assert the Feature Metadata card lists the 14 columns; the importance panel renders 14 bars, sorted. + - SCENARIO 3 — Run detail (baseline run): assert neither the Feature Metadata card nor the FeatureImportancePanel renders. + - SCENARIO 4 — Run compare (two lightgbm runs): assert two panels render side by side. + - SCENARIO 5 — Run compare (lightgbm vs naive): assert the cross-family muted message renders; assert no /feature-metadata requests fire (check network tab). + - SCENARIO 6 — Forecast viz: select a completed train job whose model_type is lightgbm; expand the "Model details" collapsible; assert the FeatureImportancePanel renders importance bars from the new /forecasting/jobs/{job_id}/feature-metadata endpoint (NOT from /runs/.../). Check the Network tab to confirm the job-keyed URL is hit, not a 404-ing run-keyed URL. + - SCENARIO 6b — Forecast viz, baseline train job: select a completed train job whose model_type is naive; expand the collapsible; assert the panel renders the neutral 400 message ("Feature importance is available for tree and additive model families only"). + - SCENARIO 7 — Backtest viz: pick "Prophet-like (additive)"; submit; assert the backtest completes and fold metrics render. + - VALIDATE: capture screenshots for the PR review +``` + +### Per-task pseudocode (only where the task needs more than the YAML above) + +```python +# Task 1 — extract_feature_importance (pure function) + +from app.features.forecasting.models import ( + LightGBMForecaster, XGBoostForecaster, + RegressionForecaster, ProphetLikeForecaster, +) + +def extract_feature_importance( + model: BaseForecaster, + feature_columns: list[str], +) -> list[FeatureImportanceItem]: + """Pure: no I/O, no FastAPI, no DB. Raises ValueError for non-feature-aware classes.""" + if isinstance(model, (LightGBMForecaster, XGBoostForecaster, RegressionForecaster)): + raw = np.asarray(model._estimator.feature_importances_, dtype=np.float64) # private attr by convention + kind: Literal["tree", "linear_coef"] = "tree" + elif isinstance(model, ProphetLikeForecaster): + # MIRROR models.py:1094-1098 — drill into the Pipeline + ridge = model._estimator.named_steps["ridge"] + raw = np.asarray(ridge.coef_, dtype=np.float64) + kind = "linear_coef" + else: + raise ValueError(f"model_type '{type(model).__name__}' is not feature-aware") + + if len(raw) != len(feature_columns): + raise ValueError( + f"feature_columns length mismatch: importance vector has {len(raw)} elements, " + f"feature_columns has {len(feature_columns)}" + ) + + # Sort by absolute magnitude desc; preserve sign in the value for linear_coef. + indices_by_magnitude = np.argsort(-np.abs(raw)) + items = [ + FeatureImportanceItem( + name=feature_columns[i], + importance=float(raw[i]), + kind=kind, + rank=rank, + ) + for rank, i in enumerate(indices_by_magnitude, start=1) + ] + return items +``` + +```python +# Task 7 — /forecasting/runs/{run_id}/feature-metadata route + +@router.get( + "/runs/{run_id}/feature-metadata", + response_model=FeatureMetadataResponse, + status_code=status.HTTP_200_OK, + summary="Extract feature columns + learned importance for an advanced-model run", + description=""" +Returns the canonical 14-column feature frame the model consumed and the +fitted estimator's `feature_importances_` (tree models) or `coef_` (additive +prophet_like). Loads the saved joblib artifact lazily. + +**Error semantics (RFC 7807 application/problem+json):** +- 400 — model_family is 'baseline' (no learned importance to extract) +- 404 — run_id not found +- 422 — run has no artifact_uri yet, or status is pending/running/failed +- 500 — bundle file missing on disk (storage corruption) +""", +) +async def get_run_feature_metadata( + run_id: str, + db: AsyncSession = Depends(get_db), +) -> FeatureMetadataResponse: + # Mirror app/features/explainability/routes.py:108-121 — the service + # raises ForecastLabError subclasses (NotFoundError / BadRequestError / + # UnprocessableEntityError) directly; forecastlab_exception_handler + # serializes each to application/problem+json with the right status. + # Catch only the DB-layer surprise path; everything else flows through. + service = ForecastingService(get_settings(), db) + try: + return await service.get_feature_metadata_for_run(run_id, db) + except SQLAlchemyError as exc: + logger.error("forecasting.feature_metadata_db_error", run_id=run_id, error=str(exc), exc_info=True) + raise DatabaseError( + message="Failed to load feature metadata", + details={"error": str(exc)}, + ) from exc + + +@router.get( + "/jobs/{job_id}/feature-metadata", + response_model=FeatureMetadataResponse, + status_code=status.HTTP_200_OK, + summary="Extract feature columns + learned importance for a completed train job", + description=""" +The job-keyed sibling of `/forecasting/runs/{run_id}/feature-metadata`, +exactly matching the shape of `/explain/jobs/{job_id}` (PRP-28). Use this +endpoint when the caller has only a `job_id` — for example, the dashboard's +forecast viz page (`frontend/src/pages/visualize/forecast.tsx`). + +The route reads `job.result.run_id` (the **forecast-artifact key**, not a +registry UUID) and loads the bundle directly from +`{forecast_model_artifacts_dir}/model_{artifact_id}`. + +**Error semantics (RFC 7807 application/problem+json):** +- 400 — the job is not a completed train job, or the trained model is baseline +- 404 — job_id not found +- 422 — bundle missing or ML extra not installed at unpickle time +""", +) +async def get_job_feature_metadata( + job_id: str, + db: AsyncSession = Depends(get_db), +) -> FeatureMetadataResponse: + service = ForecastingService(get_settings(), db) + try: + return await service.get_feature_metadata_for_job(job_id, db) + except SQLAlchemyError as exc: + logger.error("forecasting.feature_metadata_db_error", job_id=job_id, error=str(exc), exc_info=True) + raise DatabaseError( + message="Failed to load feature metadata", + details={"error": str(exc)}, + ) from exc +``` + +```tsx +// Task 12 — FeatureImportancePanel (sketch; mirror explanation-panel.tsx layout) + +export function FeatureImportancePanel({ + data, + isLoading, + error, +}: FeatureImportancePanelProps) { + if (isLoading) return + if (error) { + const apiError = error instanceof ApiError ? error : null + if (apiError?.status === 400) { + return + } + if (apiError?.status === 422) { + return + } + return + } + if (!data) return null + + const maxAbs = Math.max(...data.features.map((f) => Math.abs(f.importance))) + return ( + +
    + {data.features.map((f) => ( + + ))} +
+ +
+ ) +} +``` + +### Integration Points + +```yaml +ROUTES: + - add to: app/main.py + - pattern: (no change — forecasting_router already registered at line 143; the new GET is a sub-route) + +CONFIG: + - no new env var + - no change to .env.example + +DATABASE: + - no migration + +PYDANTIC IMPORTS: + - app/features/registry/schemas.py imports ModelFamily from app/features/forecasting/schemas.py + - this is a downstream→upstream import: registry → forecasting + - app/features/forecasting NEVER imports app/features/registry/schemas.py for ModelFamily + - (NEW cross-slice import: app/features/forecasting/service.py imports RegistryService from app/features/registry/service.py for the new get_feature_metadata_for_run method. This direction is new to forecasting; it mirrors the read-only pattern already established in app/features/explainability/service.py:57 — see DECISIONS LOCKED #2.) + +FRONTEND ROUTES: + - no new React Router route — all changes are in-place page edits +``` + +## Validation Loop + +### Level 0 — Environment + +```bash +docker compose up -d +uv sync --extra dev --extra ml-lightgbm --extra ml-xgboost +uv run alembic upgrade head +cd frontend && corepack enable pnpm && pnpm install && cd .. +``` + +### Level 1 — Syntax & Style + +```bash +uv run ruff check app/features/forecasting/feature_metadata.py \ + app/features/forecasting/routes.py \ + app/features/forecasting/schemas.py \ + app/features/forecasting/service.py \ + app/features/registry/schemas.py +uv run ruff format --check app/ + +cd frontend && pnpm lint && cd .. +``` + +### Level 2 — Type Checks + +```bash +uv run mypy app/ +uv run pyright app/ + +cd frontend && pnpm tsc --noEmit && cd .. +``` + +### Level 3 — Unit Tests + +```bash +uv run pytest -v app/features/forecasting/tests/test_feature_metadata.py +uv run pytest -v app/features/forecasting/tests/test_routes_feature_metadata.py +uv run pytest -v app/features/registry/tests/test_schemas.py +uv run pytest -v -m "not integration" + +cd frontend && pnpm test --run \ + frontend/src/components/explainability/feature-importance-panel.test.tsx \ + && cd .. +cd frontend && pnpm test --run && cd .. # whole frontend suite +``` + +### Level 4 — Integration Tests + +```bash +docker compose up -d +uv run pytest -v -m integration app/features/forecasting/tests/test_routes_feature_metadata.py +uv run pytest -v -m integration +``` + +### Level 5 — Manual Dogfood (per .claude/rules/ui-design.md) + +```bash +# 1. Seed and train across all four feature-aware families +make demo # naive / seasonal_naive / moving_average +curl -s -X POST localhost:8123/forecasting/train \ + -H 'content-type: application/json' \ + -d '{"store_id":1,"product_id":1,"train_start_date":"2024-01-01","train_end_date":"2024-06-30","model_config":{"model_type":"lightgbm"}}' | jq +# repeat with model_type in {"xgboost","regression","prophet_like"} (lightgbm/xgboost require the extras enabled in pyproject.toml) + +# 2. Drive the dashboard with the webapp-testing skill +# (see Task 21 SCENARIOS 1-7 above) + +# 3. Verify the new endpoint by hand +RUN_ID=$(curl -s 'localhost:8123/registry/runs?model_type=lightgbm&page_size=1' | jq -r '.runs[0].run_id') +curl -s "localhost:8123/forecasting/runs/${RUN_ID}/feature-metadata" | jq + +JOB_ID=$(curl -s 'localhost:8123/jobs?job_type=train&status=completed&page_size=1' | jq -r '.jobs[0].job_id') +curl -s "localhost:8123/forecasting/jobs/${JOB_ID}/feature-metadata" | jq +``` + +## Final Validation Checklist + +- [ ] `uv run ruff check .` clean +- [ ] `uv run ruff format --check .` clean +- [ ] `uv run mypy app/` clean +- [ ] `uv run pyright app/` clean +- [ ] `uv run pytest -v -m "not integration"` green +- [ ] `uv run pytest -v -m integration` green +- [ ] `cd frontend && pnpm tsc --noEmit && pnpm lint && pnpm test --run` green +- [ ] `GET /forecasting/runs/{run_id}/feature-metadata` returns 200 for one fitted run per family (LightGBM, XGBoost, regression, prophet_like) +- [ ] `GET /forecasting/jobs/{job_id}/feature-metadata` returns 200 for the same set of runs reached via their training job_id +- [ ] Both endpoints return RFC 7807 problem+json for 400 (baseline / wrong job_type), 404 (missing run/job), 422 (no artifact / missing ml-* extra) +- [ ] `GET /registry/runs/{run_id}` and `GET /registry/runs` both serialize `model_family` on every item (verify with `curl … | jq '.runs[0].model_family'`) +- [ ] Runs explorer Family column renders with the expected Badge variant for every model_type +- [ ] Run detail renders ModelFamilyBadge + Feature Metadata card + FeatureImportancePanel only when family != baseline +- [ ] Run compare renders side-by-side panels for same-family pairs and a muted single-line message for cross-family pairs (without firing requests) +- [ ] Forecast viz Collapsible expands to show the FeatureImportancePanel tied to the train run; collapses by default +- [ ] Backtest viz lets the user pick any of the seven model types and runs successfully on each (B.2's feature-aware split handles tree/additive correctly) +- [ ] No new Alembic migration in `alembic/versions/` +- [ ] No new dependency in `pyproject.toml` (LightGBM / XGBoost extras unchanged; no SHAP, no LIME) +- [ ] `docs/user-guide/feature-reference.md` contains the "Advanced Model Metadata" section with the verbatim correlation-vs-causation caveat +- [ ] `wc -l CLAUDE.md` ≤ 150 (no change expected in MLZOO-D) +- [ ] CHANGELOG entry will be auto-generated by release-please from the `feat(forecast,ui): …` commit on merge + +## Anti-Patterns to Avoid + +- ❌ **Do NOT add `model_family` as a real column on `model_run`.** Computed + field on `RunResponse` is the locked design — see DECISIONS LOCKED #3. A + column would force a migration and a backfill story we do not need. +- ❌ **Do NOT extend the existing `` to feature importance.** + PRP-28 named it "rule-based driver attribution" deliberately. Build a + sibling panel; tests stay isolated and the baseline-only 400 path stays + meaningful — see DECISIONS LOCKED #4. +- ❌ **Do NOT introduce SHAP, LIME, or a permutation-importance recompute.** + SHAP is explicitly deferred (PRP-28). Permutation-importance is a full + feature (needs holdout splits, leakage proofs) — out of scope. +- ❌ **Do NOT add an agent tool that exposes feature importance** to the chat + agent — that's a HITL approval-surface widening (`agent_require_approval`) + and a separate PRP — see DECISIONS LOCKED #6. +- ❌ **Do NOT extend `backtest.tsx` beyond the MODEL_OPTIONS allow-list.** + Per-fold per-model importance heatmaps, multi-family ranking — those are + full features — DECISIONS LOCKED #7. +- ❌ **Do NOT use `verify=False`** on any HTTP client added here; no new + external integration is needed (security-patterns.md hard rule). +- ❌ **Do NOT raise `HTTPException(400, "raw string")`** — every error path + must flow through `BadRequestError` / `DatabaseError` / a 422 dedicated + exception so the registered handler emits RFC 7807. PR #253 closed the + last holes; do not reopen them. +- ❌ **Do NOT include a Co-Authored-By or "Generated with" trailer.** + Hook `.claude/hooks/check-commit-format.sh` enforces this; do not bypass. +- ❌ **Do NOT touch `app/features/featuresets/tests/test_leakage.py`** to + make any new test pass. It is the leakage spec. +- ❌ **Do NOT import `lightgbm` or `xgboost` at module scope** in + `feature_metadata.py`. The extras are optional; importing at module load + would break installs that don't have them, even though the + `feature_metadata` module is loaded on every API startup. Read + `.feature_importances_` off the unpickled object — no import required. +- ❌ **Do NOT pre-load `bundle.model` in `useRunFeatureMetadata` even when + the family is baseline.** Pass `enabled: run.model_family !== 'baseline'` + to TanStack Query so the request never fires; the panel renders nothing. +- ❌ **Do NOT raise `HTTPException(404, ...)` from the new route** — use + `NotFoundError(message=...)` from `app/core/exceptions.py` (line 75 + precedent: `class NotFoundError(ForecastLabError): status_code = 404`). + The handler at `forecastlab_exception_handler` emits the RFC 7807 + envelope. Plain `HTTPException` does NOT. +- ❌ **Do NOT reuse `ValidationError` for the 422 path.** The existing + `ValidationError` (status_code=422, code="VALIDATION_ERROR") is for + Pydantic input failures. The new `UnprocessableEntityError` + (code="UNPROCESSABLE_ENTITY") signals "resource state prevents the + operation" — a different semantic. Consumers and tests disambiguate via + the `type` URI in the problem+json body. +- ❌ **Do NOT call `useRunFeatureMetadata(trainJob.result.run_id, ...)` from + `forecast.tsx`.** That field is the **artifact key** (12-char hex), not a + registry UUID. Use `useJobFeatureMetadata(trainJobId, ...)` instead. The + job-based endpoint resolves the artifact path internally. Recorded in + memory `[[scenario-run-id-vs-registry-run-id]]`. +- ❌ **Do NOT use `dangerouslySetInnerHTML`** anywhere — render + `feature_columns` as plain text or chips. + +## Open Questions — ALL RESOLVED + +1. **Where does the endpoint live — registry or forecasting?** + → forecasting (DECISIONS LOCKED #2). Forecasting owns the bundle format. +2. **Is `model_family` a DB column or a computed field?** + → computed field (DECISIONS LOCKED #3). No migration, no backfill. +3. **One panel or two?** + → one panel, branching on `kind` (DECISIONS LOCKED #4). +4. **Does MLZOO-D add SHAP / LIME?** + → no, deferred (DECISIONS LOCKED #5; PRP-28 already deferred SHAP). +5. **Does MLZOO-D widen the agent's tool surface?** + → no, deferred to a separate agents-scoped PRP (DECISIONS LOCKED #6). +6. **How far does backtest.tsx change?** + → MODEL_OPTIONS allow-list only (DECISIONS LOCKED #7). +7. **What happens when run-compare's two runs differ in family?** + → render a single muted message; no fetch (DECISIONS LOCKED #8). +8. **Does the panel show signed coefficients for prophet_like?** + → yes; sign is preserved end-to-end (data model: `importance: float` + signed; panel: green/red + direction icon when `kind === 'linear_coef'`). + +## Confidence Score + +**8 / 10** for one-pass implementation success (post-revision; prp-quality-agent +flagged two Critical issues in v1 — ID-namespace mismatch on `forecast.tsx` and +a `HTTPException(422)` anti-pattern that would have re-opened v0.2.16 PR #253's +hygiene fix. Both are now resolved by Tasks 6.5, 7.5, 10's two sibling hooks, +and the rewritten Task 17 + Anti-Patterns block). + +Rationale: every backend mechanic this PRP surfaces is already shipped and +verified in v0.2.16 — the four feature-aware classes, the bundle metadata +write at train time, the `runtime_info` capture, the `load_model_bundle` +deserialization, and the `/verify` endpoint that this new endpoint mirrors. +The two-endpoint pair (`/runs/{id}` + `/jobs/{id}`) is a verbatim mirror of +PRP-28's `/explain/runs/{id}` + `/explain/jobs/{id}` split, which already +ships and is test-covered. Every error path flows through a `ForecastLabError` +subclass into `forecastlab_exception_handler` — no `HTTPException` in the +new code. The frontend insertion points are precisely located (file:line +cited throughout) and use only shadcn primitives that are already installed +and test-covered. No migration, no new external dependency, no new agent +tool, no change to leakage tests. + +The −2 risk is concentrated in two places: **(a) the prophet_like signed- +coefficient display** — the panel must preserve sign through extraction +(NumPy ndarray), Pydantic response serialization (float, not abs), TanStack +Query (untouched), and React render (don't `Math.abs` in the bar-width +calc when computing `widthPercent = abs(value)/maxAbs * 100`, but DO keep +the original sign for the color/icon branch); the mitigation is the `kind` +discriminator on the wire and the matching test case in +`feature-importance-panel.test.tsx`. **(b) browser dogfood** (Task 21) — +the new collapsible on `forecast.tsx`, the conditional render gates on +`run-detail.tsx`, the cross-family branch on `run-compare.tsx`, and now +the additional `SCENARIO 6b` (baseline-train-job 400 path) each have a +state-machine flavor that catches up only in a real browser; the eight +dogfood SCENARIOS are designed to exercise every branch. Both risks are +gate-caught (vitest + webapp-testing skill); neither requires a redesign +to recover. diff --git a/PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md b/PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md index 105a4075..2d025b32 100644 --- a/PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md +++ b/PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md @@ -183,7 +183,7 @@ PRP may add an `assumptions` policy; v1 ships exactly one. ### Current Codebase tree (relevant — all already exist) -``` +```text app/ shared/feature_frames/ __init__.py # re-exports the contract surface @@ -204,7 +204,7 @@ app/ ### Desired Codebase tree — files to ADD -``` +```text app/shared/feature_frames/ rows.py # build_historical_feature_rows (promoted), # build_future_feature_rows (NEW, leakage-safe) @@ -216,7 +216,7 @@ PRPs/ ### Files to MODIFY (all additive or behaviour-preserving) -``` +```text app/shared/feature_frames/__init__.py # export the two row builders app/shared/feature_frames/tests/test_leakage.py # ADD build_future_feature_rows leakage spec app/shared/feature_frames/tests/test_contract.py # ADD: rows.py imports nothing from app/features diff --git a/PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md b/PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md index 9114b8c9..04efa782 100644 --- a/PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md +++ b/PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md @@ -20,7 +20,7 @@ regressor contributions. > are intentionally **separate branches and separate review units** — never combine them. > They are additive and order-independent; whichever merges second rebases cleanly (see > "Sibling-PRP integration" below). - +> > **Naming honesty.** The model is "Prophet-**like**", never "Prophet". It deliberately > approximates Prophet's *additive decomposition* shape using a linear model over > already-engineered features. It does **not** add the real `prophet`/Stan dependency and diff --git a/README.md b/README.md index 12faaa6a..be623f9f 100644 --- a/README.md +++ b/README.md @@ -397,7 +397,7 @@ curl -X POST http://localhost:8123/backtesting/run \ When `include_baselines=true`, automatically compares against naive and seasonal_naive models. **Feature-Aware Models:** -`regression`, `lightgbm`, and `xgboost` models can be backtested too — set +`regression`, `lightgbm`, `xgboost`, and `prophet_like` models can be backtested too — set `model_config_main.model_type` accordingly. Each fold builds a leakage-safe per-fold feature matrix (`min_train_size >= 30` required); the result carries `feature_aware: true` and `exogenous_policy: "observed"`. diff --git a/app/core/exceptions.py b/app/core/exceptions.py index 260d2fee..ef1561fb 100644 --- a/app/core/exceptions.py +++ b/app/core/exceptions.py @@ -170,6 +170,35 @@ def __init__( ) +class UnprocessableEntityError(ForecastLabError): + """Resource-state 422 error. + + Use when the request itself is well-formed and routable, but the targeted + resource is in a state that prevents the operation from completing — e.g., + a registry run with no artifact saved yet, a saved bundle whose pickle + references an optional ML extra that is not installed, or a bundle file + that has been deleted from disk while the registry row lives on. + + Distinct from :class:`ValidationError` (``code="VALIDATION_ERROR"``), which + is for Pydantic input failures. Consumers and tests disambiguate the two + 422s via the ``type`` URI in the RFC 7807 problem+json body. + """ + + error_type_uri: str = ERROR_TYPES["UNPROCESSABLE_ENTITY"] + + def __init__( + self, + message: str = "Resource state prevents the operation", + details: dict[str, Any] | None = None, + ) -> None: + super().__init__( + message=message, + code="UNPROCESSABLE_ENTITY", + status_code=422, + details=details, + ) + + # ============================================================================= # Exception Handlers (RFC 7807) # ============================================================================= diff --git a/app/core/problem_details.py b/app/core/problem_details.py index 2fcd71cf..2a24ecb7 100644 --- a/app/core/problem_details.py +++ b/app/core/problem_details.py @@ -26,6 +26,7 @@ ERROR_TYPES = { "NOT_FOUND": f"{ERROR_TYPE_BASE}/not-found", "VALIDATION_ERROR": f"{ERROR_TYPE_BASE}/validation", + "UNPROCESSABLE_ENTITY": f"{ERROR_TYPE_BASE}/unprocessable-entity", "DATABASE_ERROR": f"{ERROR_TYPE_BASE}/database", "CONFLICT": f"{ERROR_TYPE_BASE}/conflict", "UNAUTHORIZED": f"{ERROR_TYPE_BASE}/unauthorized", diff --git a/app/features/backtesting/routes.py b/app/features/backtesting/routes.py index 3971bf85..15a8d0e9 100644 --- a/app/features/backtesting/routes.py +++ b/app/features/backtesting/routes.py @@ -8,12 +8,12 @@ import time -from fastapi import APIRouter, Depends, HTTPException, status +from fastapi import APIRouter, Depends, status from sqlalchemy.exc import SQLAlchemyError from sqlalchemy.ext.asyncio import AsyncSession from app.core.database import get_db -from app.core.exceptions import DatabaseError +from app.core.exceptions import BadRequestError, DatabaseError from app.core.logging import get_logger from app.features.backtesting.schemas import BacktestRequest, BacktestResponse from app.features.backtesting.service import BacktestingService @@ -71,7 +71,7 @@ async def run_backtest( BacktestResponse with all results. Raises: - HTTPException: If validation fails or insufficient data. + BadRequestError: If validation fails or insufficient data. DatabaseError: If database operation fails. """ start_time = time.perf_counter() @@ -118,10 +118,7 @@ async def run_backtest( error=str(e), error_type=type(e).__name__, ) - raise HTTPException( - status_code=status.HTTP_400_BAD_REQUEST, - detail=str(e), - ) from e + raise BadRequestError(message=str(e)) from e except SQLAlchemyError as e: logger.error( diff --git a/app/features/backtesting/tests/test_routes_integration.py b/app/features/backtesting/tests/test_routes_integration.py index a0533146..444087ee 100644 --- a/app/features/backtesting/tests/test_routes_integration.py +++ b/app/features/backtesting/tests/test_routes_integration.py @@ -474,4 +474,9 @@ async def test_run_backtest_regression_rejects_small_min_train_size( ) assert response.status_code == 400 - assert "at least 30" in response.text + assert response.headers["content-type"].startswith("application/problem+json") + payload = response.json() + for key in ("type", "title", "status", "detail"): + assert key in payload + assert payload["status"] == 400 + assert "at least 30" in payload["detail"] diff --git a/app/features/forecasting/feature_metadata.py b/app/features/forecasting/feature_metadata.py new file mode 100644 index 00000000..479c098b --- /dev/null +++ b/app/features/forecasting/feature_metadata.py @@ -0,0 +1,228 @@ +"""Pure-function feature-importance extraction for advanced-model runs. + +This module surfaces what every feature-aware forecaster (PRP-29 / PRP-30 / +PRP-MLZOO-C1 / PRP-MLZOO-C2) already exposes on its fitted estimator +(``.feature_importances_`` for the tree models, ``coef_`` for the additive +prophet_like Ridge) into a JSON-ready :class:`FeatureImportanceItem` list the +dashboard can render. + +No I/O, no FastAPI, no DB — pure functions. The lightgbm and xgboost packages +are NOT imported at module scope: the wrapper classes +(:class:`LightGBMForecaster` etc.) are always importable from +``forecasting.models`` because the lazy ``import`` lives inside ``fit``, and +reading ``.feature_importances_`` off an *already-unpickled* estimator does +not require re-importing the library. +""" + +from __future__ import annotations + +from typing import TYPE_CHECKING, Any + +import numpy as np +import structlog + +from app.features.forecasting.models import ( + LightGBMForecaster, + ProphetLikeForecaster, + RegressionForecaster, + XGBoostForecaster, +) +from app.features.forecasting.schemas import FeatureImportanceItem, ModelFamily + +if TYPE_CHECKING: + from app.features.forecasting.models import BaseForecaster + +logger = structlog.get_logger(__name__) + + +# Canonical map: model_type string → ModelFamily. Unknown types log a warning +# and classify as BASELINE (forward-compatible for new families before this +# map is updated). Keep in sync with the ``ModelType`` Literal in +# ``forecasting/models.py`` (line 1133-1135). +_MODEL_FAMILY_MAP: dict[str, ModelFamily] = { + "naive": ModelFamily.BASELINE, + "seasonal_naive": ModelFamily.BASELINE, + "moving_average": ModelFamily.BASELINE, + "regression": ModelFamily.TREE, + "lightgbm": ModelFamily.TREE, + "xgboost": ModelFamily.TREE, + "prophet_like": ModelFamily.ADDITIVE, +} + + +def model_family_for(model_type: str) -> ModelFamily: + """Return the :class:`ModelFamily` for a given ``model_type`` string. + + Unknown types log a warning and return :attr:`ModelFamily.BASELINE` so a + new model registered in :mod:`forecasting.models` before this map is + updated does not raise — it just shows up in the dashboard as a baseline + until the map catches up. + """ + family = _MODEL_FAMILY_MAP.get(model_type) + if family is None: + logger.warning( + "forecasting.unknown_model_family", + model_type=model_type, + fallback=ModelFamily.BASELINE.value, + ) + return ModelFamily.BASELINE + return family + + +class FeatureImportanceUnavailableError(ValueError): + """The estimator does not expose a usable feature-importance vector. + + Subclass of :class:`ValueError` so existing ``except ValueError`` clauses + keep working; the service layer uses ``isinstance`` to map this specific + case to a 422 :class:`~app.core.exceptions.UnprocessableEntityError` + (resource-state) instead of the 400 it maps other ``ValueError`` raises + to. The clearest example is + ``sklearn.ensemble.HistGradientBoostingRegressor`` — a histogram-based + booster that, unlike its tree cousin :class:`GradientBoostingRegressor`, + does NOT expose ``feature_importances_``. + """ + + +def importance_type_for(model: BaseForecaster) -> str | None: + """Return the importance kind the dashboard should label the values with. + + - LightGBM: ``importance_type`` attribute (default ``'split'``). + - XGBoost: ``importance_type`` attribute (default ``'weight'`` — but the + sklearn-API wrapper exposes ``None`` on a freshly-constructed instance, + so we fall back to the documented XGBoost default). + - :class:`RegressionForecaster` (``HistGradientBoostingRegressor``): + ``'permutation'`` — the only honest label we can give a HistGBR fit + whose ``feature_importances_`` we never get to read (see + :class:`FeatureImportanceUnavailableError`). + - :class:`ProphetLikeForecaster`: ``'ridge_coef'``. + - Anything else: ``None``. + """ + if isinstance(model, LightGBMForecaster): + return getattr(model._estimator, "importance_type", None) or "split" # pyright: ignore[reportPrivateUsage] + if isinstance(model, XGBoostForecaster): + return getattr(model._estimator, "importance_type", None) or "weight" # pyright: ignore[reportPrivateUsage] + if isinstance(model, RegressionForecaster): + return "permutation" + if isinstance(model, ProphetLikeForecaster): + return "ridge_coef" + return None + + +def extract_feature_importance( + model: BaseForecaster, + feature_columns: list[str], +) -> list[FeatureImportanceItem]: + """Extract a sorted :class:`FeatureImportanceItem` list from a fitted model. + + Branches on the concrete forecaster class: + + - :class:`LightGBMForecaster` / :class:`XGBoostForecaster` / + :class:`RegressionForecaster` → ``estimator.feature_importances_`` + (non-negative; ``kind='tree'``). + - :class:`ProphetLikeForecaster` → ``pipeline.named_steps['ridge'].coef_`` + (signed; ``kind='linear_coef'``). The sign carries directional + information and MUST be preserved end-to-end. + + Items are sorted by ``|importance|`` descending and 1-indexed ``rank``. + + Args: + model: A fitted feature-aware forecaster. Caller must ensure the + estimator is fitted; this function does not check. + feature_columns: The canonical column order the model was trained on + (always 14 for v0.2.16 — see + ``app/shared/feature_frames/contract.py``). + + Returns: + A list of :class:`FeatureImportanceItem`, length matching + ``feature_columns``, sorted by ``|importance|`` desc with 1-indexed + ``rank``. + + Raises: + ValueError: If ``model`` is not one of the four feature-aware classes, + or if the importance vector length does not match + ``feature_columns``. + """ + raw: np.ndarray[Any, np.dtype[np.floating[Any]]] + kind: str + if isinstance(model, (LightGBMForecaster, XGBoostForecaster, RegressionForecaster)): + estimator: Any = model._estimator # pyright: ignore[reportPrivateUsage] + # HistGradientBoostingRegressor (RegressionForecaster's wrapped + # estimator) does NOT expose ``feature_importances_`` — it is the one + # tree-family booster that doesn't. Surface this as a dedicated + # 422-eligible error rather than the generic AttributeError sklearn + # would otherwise throw, so the route can render a clear + # remediation hint. + if not hasattr(estimator, "feature_importances_"): + raise FeatureImportanceUnavailableError( + f"Feature importance is not available for " + f"{type(estimator).__name__}. Histogram-based boosters " + "(HistGradientBoostingRegressor) do not expose " + "feature_importances_; use lightgbm or xgboost for native " + "tree-importance, or prophet_like for signed coefficients." + ) + raw = np.asarray(estimator.feature_importances_, dtype=np.float64) + kind = "tree" + elif isinstance(model, ProphetLikeForecaster): + # MIRROR models.py:1094-1098 — drill into the Ridge step of the + # Pipeline. The `.coef_` here is signed, shape (n_kept_features,). + # + # GOTCHA: sklearn's `SimpleImputer(strategy="median")` (default + # ``keep_empty_features=False`` since 1.2) DROPS columns whose + # training values are all-NaN. The downstream Ridge therefore + # learns one fewer coefficient than the bundle's input contract + # (``feature_columns``) advertises. We realign by inspecting the + # imputer's ``statistics_`` (length = n_features_in_, NaN entries + # mark dropped columns) and padding the coef vector back to the + # full input width with 0.0 for dropped columns — an honest + # "the model assigned no weight to this feature because the + # training data had no observed values for it" signal. + estimator = model._estimator # pyright: ignore[reportPrivateUsage] + imputer: Any = estimator.named_steps["impute"] + ridge: Any = estimator.named_steps["ridge"] + coef = np.asarray(ridge.coef_, dtype=np.float64) + stats = np.asarray(imputer.statistics_, dtype=np.float64) + kept_mask = ~np.isnan(stats) + if coef.shape[0] == stats.shape[0]: + # No columns were dropped by the imputer — coef already aligns. + raw = coef + elif int(kept_mask.sum()) == coef.shape[0]: + # Pad coef back to the full input width, 0.0 for dropped columns. + raw = np.zeros(stats.shape[0], dtype=np.float64) + raw[kept_mask] = coef + else: + raise FeatureImportanceUnavailableError( + f"ProphetLike coefficient/imputer alignment failed: ridge.coef_ " + f"has {coef.shape[0]} entries, imputer kept " + f"{int(kept_mask.sum())} of {stats.shape[0]} columns. " + "The bundle metadata cannot be reconciled with the fitted " + "estimator's shape; re-train the model with a recent " + "scikit-learn version." + ) + kind = "linear_coef" + else: + raise ValueError( + f"model_type '{type(model).__name__}' is not feature-aware; " + "feature importance is available for LightGBM, XGBoost, " + "RegressionForecaster, and ProphetLikeForecaster only." + ) + + if len(raw) != len(feature_columns): + raise ValueError( + f"feature_columns length mismatch: importance vector has {len(raw)} " + f"elements, feature_columns has {len(feature_columns)}" + ) + + # Sort by absolute magnitude descending; preserve sign in the value for + # linear_coef. argsort returns ascending indices, so we negate the abs + # vector to flip the order in one pass. + indices_by_magnitude = np.argsort(-np.abs(raw)) + items: list[FeatureImportanceItem] = [ + FeatureImportanceItem( + name=feature_columns[int(idx)], + importance=float(raw[int(idx)]), + kind="tree" if kind == "tree" else "linear_coef", + rank=rank, + ) + for rank, idx in enumerate(indices_by_magnitude, start=1) + ] + return items diff --git a/app/features/forecasting/models.py b/app/features/forecasting/models.py index 1c43ea32..07be30a5 100644 --- a/app/features/forecasting/models.py +++ b/app/features/forecasting/models.py @@ -700,8 +700,15 @@ def fit( f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" ) # LAZY import — the optional ``ml-lightgbm`` dependency is only needed - # the first time a LightGBM model is actually fitted. - import lightgbm as lgb + # the first time a LightGBM model is actually fitted. A missing package + # becomes a ValueError so the route returns a controlled 400, not a 500. + try: + import lightgbm as lgb + except ImportError as exc: + raise ValueError( + "LightGBM is not installed. Install the optional 'ml-lightgbm' " + "extra (e.g. `uv sync --extra ml-lightgbm`)." + ) from exc estimator: Any = lgb.LGBMRegressor( n_estimators=self.n_estimators, @@ -857,8 +864,15 @@ def fit( f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" ) # LAZY import — the optional ``ml-xgboost`` dependency is only needed - # the first time an XGBoost model is actually fitted. - import xgboost as xgb + # the first time an XGBoost model is actually fitted. A missing package + # becomes a ValueError so the route returns a controlled 400, not a 500. + try: + import xgboost as xgb + except ImportError as exc: + raise ValueError( + "XGBoost is not installed. Install the optional 'ml-xgboost' " + "extra (e.g. `uv sync --extra ml-xgboost`)." + ) from exc estimator: Any = xgb.XGBRegressor( n_estimators=self.n_estimators, diff --git a/app/features/forecasting/routes.py b/app/features/forecasting/routes.py index 9d84d003..d122ab6f 100644 --- a/app/features/forecasting/routes.py +++ b/app/features/forecasting/routes.py @@ -6,9 +6,10 @@ from app.core.config import get_settings from app.core.database import get_db -from app.core.exceptions import DatabaseError +from app.core.exceptions import BadRequestError, DatabaseError from app.core.logging import get_logger from app.features.forecasting.schemas import ( + FeatureMetadataResponse, PredictRequest, PredictResponse, TrainRequest, @@ -33,7 +34,14 @@ - `naive`: Predicts last observed value for all horizons - `seasonal_naive`: Predicts value from same season in previous cycle - `moving_average`: Predicts mean of last N observations -- `lightgbm`: LightGBM regressor (feature-flagged, disabled by default) +- `regression`: HistGradientBoostingRegressor over the canonical 14-column + feature frame (always available; no optional extra) +- `lightgbm`: LightGBM regressor (feature-aware; requires the `ml-lightgbm` + extra at install time AND `forecast_enable_lightgbm=True` at runtime) +- `xgboost`: XGBoost regressor (feature-aware; requires the `ml-xgboost` + extra at install time AND `forecast_enable_xgboost=True` at runtime) +- `prophet_like`: Ridge-based additive forecaster — trend + seasonality + + holiday-regressor decomposition (always available; pure scikit-learn) **Persistence:** Trained models are saved as ModelBundle files containing: - The fitted model @@ -66,16 +74,14 @@ async def train_model( # Check if LightGBM is enabled if request.config.model_type == "lightgbm" and not settings.forecast_enable_lightgbm: - raise HTTPException( - status_code=status.HTTP_400_BAD_REQUEST, - detail="LightGBM is disabled. Set forecast_enable_lightgbm=True in settings.", + raise BadRequestError( + message="LightGBM is disabled. Set forecast_enable_lightgbm=True in settings.", ) # Check if XGBoost is enabled if request.config.model_type == "xgboost" and not settings.forecast_enable_xgboost: - raise HTTPException( - status_code=status.HTTP_400_BAD_REQUEST, - detail="XGBoost is disabled. Set forecast_enable_xgboost=True in settings.", + raise BadRequestError( + message="XGBoost is disabled. Set forecast_enable_xgboost=True in settings.", ) logger.info( @@ -229,3 +235,140 @@ async def predict( status_code=status.HTTP_400_BAD_REQUEST, detail=str(e), ) from e + + +# ─── MLZOO-D / PRP-31 — feature-metadata endpoints ─────────────────────────── +# +# Two sibling routes (run-keyed + job-keyed) that mirror PRP-28's +# /explain/runs/{run_id} + /explain/jobs/{job_id} shape exactly. Each returns +# the canonical 14-column feature frame the model consumed and its native +# feature_importances_ (tree) or coef_ (additive prophet_like). All error +# paths flow through ForecastLabError subclasses (NotFoundError 404, +# BadRequestError 400, UnprocessableEntityError 422) into +# forecastlab_exception_handler which serializes RFC 7807 problem+json — no +# bare HTTPException, no raw 500. + + +@router.get( + "/runs/{run_id}/feature-metadata", + response_model=FeatureMetadataResponse, + status_code=status.HTTP_200_OK, + summary="Extract feature columns + learned importance for an advanced-model run", + description=""" +Returns the canonical 14-column feature frame the model consumed and the +fitted estimator's `feature_importances_` (tree models) or `coef_` (additive +prophet_like). Loads the saved joblib artifact lazily — the +`/registry/runs/{run_id}/verify` precedent for "load and inspect". + +**Error semantics (RFC 7807 application/problem+json):** +- 400 BAD_REQUEST — model_family is `baseline` (no learned importance to extract) +- 404 NOT_FOUND — run_id not found +- 422 UNPROCESSABLE_ENTITY — run has no artifact_uri yet, status is + pending/running/failed, the artifact file has been deleted from disk + (`FileNotFoundError`), `joblib.load` raised `ModuleNotFoundError` because + an optional `ml-*` extra is not installed, or the underlying estimator + does not expose `feature_importances_` (`HistGradientBoostingRegressor`). +- 500 — DB layer surprise (`SQLAlchemyError`) — mapped to `DatabaseError`. + +Archived runs whose artifact files are intact still return 200. +""", +) +async def get_run_feature_metadata( + run_id: str, + db: AsyncSession = Depends(get_db), +) -> FeatureMetadataResponse: + """Extract feature metadata for a registry-tracked run. + + Args: + run_id: Registry ``model_run.run_id`` (full UUID). + db: Async database session from dependency. + + Returns: + :class:`FeatureMetadataResponse` for a non-baseline run. + + Raises: + NotFoundError: When no run matches ``run_id``. + BadRequestError: For a baseline-family run. + UnprocessableEntityError: For a run with no usable artifact / missing + ``ml-*`` extra / deleted artifact / HistGBR-no-importance gap. + DatabaseError: For DB-layer surprises. + """ + try: + return await ForecastingService().get_feature_metadata_for_run(db, run_id) + except SQLAlchemyError as exc: + logger.error( + "forecasting.feature_metadata_db_error", + run_id=run_id, + error=str(exc), + exc_info=True, + ) + raise DatabaseError( + message="Failed to load feature metadata", + details={"error": str(exc)}, + ) from exc + + +@router.get( + "/jobs/{job_id}/feature-metadata", + response_model=FeatureMetadataResponse, + status_code=status.HTTP_200_OK, + summary="Extract feature columns + learned importance for a completed train job", + description=""" +The job-keyed sibling of `/forecasting/runs/{run_id}/feature-metadata`, +exactly mirroring the shape of `/explain/jobs/{job_id}` (PRP-28). Use this +endpoint when the caller has only a `job_id` — for example, the dashboard's +forecast viz page (`frontend/src/pages/visualize/forecast.tsx`), whose +`trainJob.result.run_id` is the **forecast-artifact key** +(`uuid.uuid4().hex[:12]`, see `forecasting/service.py:270`), NOT a registry +UUID. + +The route reads `job.result["model_path"]` (the full `.joblib` path written +by `_execute_train` at `jobs/service.py:517`) and loads the bundle directly. + +**Error semantics (RFC 7807 application/problem+json):** +- 400 BAD_REQUEST — the job is not a completed train job, or the trained + model is baseline-family. +- 404 NOT_FOUND — job_id not found. +- 422 UNPROCESSABLE_ENTITY — `model_path` missing from the train-job result, + the artifact file is missing from disk (`FileNotFoundError`), an `ml-*` + extra is not installed at unpickle time (`ModuleNotFoundError`), or the + underlying estimator does not expose `feature_importances_` + (`HistGradientBoostingRegressor`). +- 500 — DB layer surprise (`SQLAlchemyError`) — mapped to `DatabaseError`. +""", +) +async def get_job_feature_metadata( + job_id: str, + db: AsyncSession = Depends(get_db), +) -> FeatureMetadataResponse: + """Extract feature metadata for a completed train job. + + Args: + job_id: Job identifier returned by ``POST /jobs``. + db: Async database session from dependency. + + Returns: + :class:`FeatureMetadataResponse` whose ``run_id`` field is the + forecast-artifact key (12-char hex), NOT a registry UUID. + + Raises: + NotFoundError: When no job matches ``job_id``. + BadRequestError: For a non-completed-train job, or a baseline-trained + job. + UnprocessableEntityError: For a missing ``model_path``, deleted + artifact, missing ``ml-*`` extra, or HistGBR-no-importance gap. + DatabaseError: For DB-layer surprises. + """ + try: + return await ForecastingService().get_feature_metadata_for_job(db, job_id) + except SQLAlchemyError as exc: + logger.error( + "forecasting.feature_metadata_db_error", + job_id=job_id, + error=str(exc), + exc_info=True, + ) + raise DatabaseError( + message="Failed to load feature metadata", + details={"error": str(exc)}, + ) from exc diff --git a/app/features/forecasting/schemas.py b/app/features/forecasting/schemas.py index 9219ab21..23308e39 100644 --- a/app/features/forecasting/schemas.py +++ b/app/features/forecasting/schemas.py @@ -10,6 +10,7 @@ import hashlib from datetime import date as date_type +from enum import Enum from typing import Literal from pydantic import BaseModel, ConfigDict, Field, field_validator @@ -411,3 +412,94 @@ class PredictResponse(BaseModel): config_hash: str horizon: int duration_ms: float + + +# ============================================================================= +# Model Family + Feature Metadata Schemas (MLZOO-D / PRP-31) +# ============================================================================= + + +class ModelFamily(str, Enum): + """Classifier for advanced-model UI surfacing. + + Derived from ``model_type``; not persisted in the DB. Surfaced on + ``RunResponse`` via a computed field and consumed by the dashboard for the + family Badge and the feature-importance panel routing. Unknown model types + classify as ``BASELINE`` (forward-compatible for new families before the + map in ``feature_metadata.py`` is updated). + """ + + BASELINE = "baseline" # naive, seasonal_naive, moving_average + TREE = "tree" # regression (HistGBR), lightgbm, xgboost + ADDITIVE = "additive" # prophet_like (Ridge pipeline) + + +class FeatureImportanceItem(BaseModel): + """One row of model-derived feature importance, ready for the dashboard.""" + + model_config = ConfigDict(extra="forbid") + + name: str = Field(..., description="Canonical feature column name (e.g. 'lag_7').") + importance: float = Field( + ..., + description=( + "For tree models: estimator.feature_importances_ value " + "(non-negative). For additive models: " + "pipeline.named_steps['ridge'].coef_ value (signed; the sign " + "carries directional information and MUST be preserved)." + ), + ) + kind: Literal["tree", "linear_coef"] = Field( + ..., + description=( + "Display semantics: 'tree' → magnitude bar; " + "'linear_coef' → signed bar with direction icon." + ), + ) + rank: int = Field(..., ge=1, description="1-indexed rank by |importance| desc.") + + +class FeatureMetadataResponse(BaseModel): + """The /forecasting/runs/{run_id}/feature-metadata response. + + Also the /forecasting/jobs/{job_id}/feature-metadata response. When sourced + from a job, ``run_id`` is the **artifact key** parsed from the bundle file + name (12-char hex), NOT a registry ``model_run.run_id``. Consumers MUST NOT + join this back to the registry by that field when the source was a job. + """ + + model_config = ConfigDict(extra="forbid") + + run_id: str = Field( + ..., + description=( + "Source identifier. For the runs-keyed endpoint: the registry " + "run_id. For the jobs-keyed endpoint: the forecast-artifact key " + "(``uuid.uuid4().hex[:12]``) parsed from the bundle path stem." + ), + ) + model_type: str + model_family: ModelFamily + feature_columns: list[str] = Field( + ..., + description=( + "The canonical feature frame the model consumed at training time. " + "Always 14 columns for v0.2.16 feature-aware models (see " + "app/shared/feature_frames/contract.py:80)." + ), + ) + features: list[FeatureImportanceItem] = Field( + ..., + description="Sorted by |importance| descending; len == len(feature_columns).", + ) + importance_type: str | None = Field( + default=None, + description=( + "For LightGBM / XGBoost: the booster's ``importance_type`` " + "('split' / 'gain' / 'weight' / 'cover' depending on the library " + "default; ForecastLabAI does not override at training time). " + "For HistGBR-based RegressionForecaster: 'permutation'. " + "For prophet_like: 'ridge_coef'. Always populated so consumers " + "know what the numbers mean." + ), + ) diff --git a/app/features/forecasting/service.py b/app/features/forecasting/service.py index ca706803..dc69269a 100644 --- a/app/features/forecasting/service.py +++ b/app/features/forecasting/service.py @@ -17,7 +17,7 @@ from datetime import UTC, datetime, timedelta from datetime import date as date_type from pathlib import Path -from typing import TYPE_CHECKING, Any +from typing import TYPE_CHECKING, Any, cast import numpy as np import structlog @@ -25,7 +25,18 @@ from sqlalchemy.ext.asyncio import AsyncSession from app.core.config import get_settings +from app.core.exceptions import ( + BadRequestError, + NotFoundError, + UnprocessableEntityError, +) from app.features.data_platform.models import Calendar, Product, Promotion, SalesDaily +from app.features.forecasting.feature_metadata import ( + FeatureImportanceUnavailableError, + extract_feature_importance, + importance_type_for, + model_family_for, +) from app.features.forecasting.models import model_factory from app.features.forecasting.persistence import ( ModelBundle, @@ -33,11 +44,21 @@ save_model_bundle, ) from app.features.forecasting.schemas import ( + FeatureMetadataResponse, ForecastPoint, ModelConfig, + ModelFamily, PredictResponse, TrainResponse, ) + +# NOTE: ``RegistryService`` / ``JobService`` and their status enums are imported +# LAZILY inside the feature-metadata methods below. Importing them at module +# scope would close a cycle with ``registry.schemas`` (which eagerly imports +# ``ModelFamily`` from the forecasting slice for the ``model_family`` computed +# field on ``RunResponse``). The explainability slice avoids the same trap by +# importing only ``registry.models`` (a read-only ORM contract); we keep the +# import-graph one-way by deferring our service-level imports. from app.shared.feature_frames import ( HISTORY_TAIL_DAYS, build_historical_feature_rows, @@ -624,3 +645,265 @@ async def _build_regression_features( launch_date_iso=launch_date.isoformat() if launch_date is not None else None, n_observations=len(dates), ) + + # ------------------------------------------------------------------ # + # MLZOO-D / PRP-31 — feature-metadata extraction + # + # Two sibling entry points: one keyed by registry ``run_id`` (consumed by + # the runs explorer / run-detail / run-compare pages), one keyed by + # train-``job_id`` (consumed by ``forecast.tsx``, which only ever has a + # job_id — its ``trainJob.result.run_id`` is the **forecast-artifact key** + # ``uuid.uuid4().hex[:12]``, NOT a registry UUID; see service.py:270 and + # memory ``scenario-run-id-vs-registry-run-id``). + # + # Both methods load the saved joblib bundle lazily (mirroring the + # ``/registry/runs/{run_id}/verify`` precedent) and surface the four + # error categories as RFC 7807 problem+json via the existing + # :class:`ForecastLabError` subclasses: + # + # - 404 :class:`NotFoundError` — unknown run/job + # - 400 :class:`BadRequestError` — baseline family / wrong job type + # - 422 :class:`UnprocessableEntityError` — no artifact yet, ML extra + # missing at unpickle time, artifact file deleted off disk, or the + # specific HistGBR ``feature_importances_`` gap + # ------------------------------------------------------------------ # + + async def get_feature_metadata_for_run( + self, + db: AsyncSession, + run_id: str, + ) -> FeatureMetadataResponse: + """Extract the canonical feature columns + learned importances for a + registry run. + + Args: + db: Async database session. + run_id: Registry ``model_run.run_id`` (full UUID). + + Returns: + A :class:`FeatureMetadataResponse` — see schema for field detail. + + Raises: + NotFoundError: When no run matches ``run_id``. + BadRequestError: When the run's model family is baseline (the + run has no native learned importance). + UnprocessableEntityError: When the run has no artifact yet + (``pending`` / ``running`` / ``failed`` status or + ``artifact_uri is None``), when ``joblib.load`` raises + :class:`ModuleNotFoundError` (missing ``ml-*`` extra), + when the artifact file is missing from disk + (:class:`FileNotFoundError`), or when the underlying + estimator does not expose + ``feature_importances_`` (``HistGradientBoostingRegressor``). + """ + # Lazy cross-slice imports — see module-level NOTE. + from app.features.registry.schemas import RunStatus + from app.features.registry.service import RegistryService + + run = await RegistryService().get_run(db, run_id) + if run is None: + raise NotFoundError(message=f"Model run not found: {run_id}") + + if model_family_for(run.model_type) is ModelFamily.BASELINE: + raise BadRequestError( + message=( + "Feature metadata is available for tree and additive " + "families only; this run is a baseline model " + f"({run.model_type})." + ), + ) + + if run.artifact_uri is None or run.status not in ( + RunStatus.SUCCESS, + RunStatus.ARCHIVED, + ): + artifact_present = "absent" if run.artifact_uri is None else "present" + raise UnprocessableEntityError( + message=( + "Run has no usable artifact yet; status=" + f"{run.status.value}, artifact_uri={artifact_present}." + ), + ) + + try: + bundle = load_model_bundle( + run.artifact_uri, + base_dir=self.settings.forecast_model_artifacts_dir, + ) + except ModuleNotFoundError as exc: + raise UnprocessableEntityError( + message=( + "Model artifact requires an optional ML extra that is not " + f"installed: {exc.name}. Reinstall the backend with the " + f"relevant extra enabled (e.g. `uv sync --extra ml-{exc.name}`)." + ), + ) from exc + except FileNotFoundError as exc: + raise UnprocessableEntityError( + message=( + f"Model artifact file is missing from disk: " + f"{run.artifact_uri}. The registry row references an " + "artifact that has been deleted or moved." + ), + ) from exc + + return self._build_metadata_response( + source_id=run.run_id, + model_type=run.model_type, + bundle=bundle, + ) + + async def get_feature_metadata_for_job( + self, + db: AsyncSession, + job_id: str, + ) -> FeatureMetadataResponse: + """Extract the canonical feature columns + learned importances for a + completed train job. + + Mirrors PRP-28's ``/explain/jobs/{job_id}`` shape exactly so future + maintainers see the run-keyed / job-keyed pair as a structural twin. + + Args: + db: Async database session. + job_id: Job identifier returned by ``POST /jobs``. + + Returns: + A :class:`FeatureMetadataResponse` whose ``run_id`` field carries + the **forecast-artifact key** (12-char hex) parsed from the bundle + path stem — NOT a registry UUID. Documented on the schema field. + + Raises: + NotFoundError: When no job matches ``job_id``. + BadRequestError: When the job is not a completed train job, or + when the underlying trained model is a baseline family. + UnprocessableEntityError: When the train job's + ``result["model_path"]`` is missing or points at a path that + ``load_model_bundle`` can no longer find, or when the + ``ml-*`` extra is missing at unpickle time. + """ + # Lazy cross-slice imports — see module-level NOTE. + from app.features.jobs.models import JobStatus, JobType + from app.features.jobs.service import JobService + + job = await JobService().get_job(db, job_id) + if job is None: + raise NotFoundError(message=f"Job not found: {job_id}") + + if job.job_type != JobType.TRAIN or job.status != JobStatus.COMPLETED: + raise BadRequestError( + message=( + "Feature metadata can only be derived from a completed " + f"train job; got job_type={job.job_type.value}, " + f"status={job.status.value}." + ), + ) + + # CRITICAL: read ``model_path`` (the full ``.joblib`` path written by + # ``_execute_train`` in jobs/service.py:517) — NOT ``run_id`` (the + # artifact key). ``load_model_bundle`` does not auto-inject the + # ``.joblib`` suffix, so reconstructing the path from ``run_id`` + # would FileNotFoundError. See persistence.py:154,173 (verbatim + # ``path.exists()`` in load) vs persistence.py:88-89 (auto-`.joblib` + # in save) for the asymmetry. + result = job.result or {} + bundle_path_str = result.get("model_path") + if not bundle_path_str: + raise UnprocessableEntityError( + message=( + "Train job result is missing model_path; the job may " + "have failed mid-write or the result schema changed." + ), + ) + + try: + bundle = load_model_bundle( + bundle_path_str, + base_dir=self.settings.forecast_model_artifacts_dir, + ) + except ModuleNotFoundError as exc: + raise UnprocessableEntityError( + message=( + "Model artifact requires an optional ML extra that is not " + f"installed: {exc.name}. Reinstall the backend with the " + f"relevant extra enabled (e.g. `uv sync --extra ml-{exc.name}`)." + ), + ) from exc + except FileNotFoundError as exc: + raise UnprocessableEntityError( + message=( + f"Model artifact file is missing from disk: " + f"{bundle_path_str}. The job row references an artifact " + "that has been deleted or moved." + ), + ) from exc + + bundle_model_type = bundle.config.model_type + if model_family_for(bundle_model_type) is ModelFamily.BASELINE: + raise BadRequestError( + message=( + "Feature metadata is available for tree and additive " + "families only; this train job's underlying model is a " + f"baseline ({bundle_model_type})." + ), + ) + + # Parse the artifact key from the bundle's filename stem so callers + # have a stable id for the response. The bundle file is named + # ``model_{artifact_id}.joblib`` (forecasting/service.py:270-272). + artifact_stem = Path(bundle_path_str).stem + artifact_id = artifact_stem.removeprefix("model_") or artifact_stem + + return self._build_metadata_response( + source_id=artifact_id, + model_type=bundle_model_type, + bundle=bundle, + ) + + def _build_metadata_response( + self, + *, + source_id: str, + model_type: str, + bundle: ModelBundle, + ) -> FeatureMetadataResponse: + """Shared response-assembly path for the two feature-metadata endpoints. + + Wraps the importance-extraction call so the + :class:`FeatureImportanceUnavailableError` raised for the + ``HistGradientBoostingRegressor`` gap is translated to a 422 with a + clear remediation hint, while the generic ``ValueError`` raised for a + truly non-feature-aware estimator becomes the existing 400 contract. + """ + feature_columns_raw = bundle.metadata.get("feature_columns") + if not isinstance(feature_columns_raw, list): + raise UnprocessableEntityError( + message=( + "Bundle metadata is missing feature_columns; this run was " + "trained before MLZOO-A's feature-frame contract landed " + "(or with a custom feature builder)." + ), + ) + # The metadata dict is typed ``dict[str, object]``; cast each entry to + # str to satisfy the response schema's stricter contract. + feature_columns: list[str] = [str(c) for c in cast(list[object], feature_columns_raw)] + + try: + features = extract_feature_importance(bundle.model, feature_columns) + except FeatureImportanceUnavailableError as exc: + raise UnprocessableEntityError(message=str(exc)) from exc + except ValueError as exc: + # ValueError here means "not feature-aware" (already gated by the + # family check above, but defence-in-depth). + raise BadRequestError(message=str(exc)) from exc + + importance_type = importance_type_for(bundle.model) + + return FeatureMetadataResponse( + run_id=source_id, + model_type=model_type, + model_family=model_family_for(model_type), + feature_columns=feature_columns, + features=features, + importance_type=importance_type, + ) diff --git a/app/features/forecasting/tests/test_feature_metadata.py b/app/features/forecasting/tests/test_feature_metadata.py new file mode 100644 index 00000000..618bcc91 --- /dev/null +++ b/app/features/forecasting/tests/test_feature_metadata.py @@ -0,0 +1,267 @@ +"""Unit tests for the feature-importance extractor (MLZOO-D / PRP-31). + +These tests fit small, deterministic instances of every feature-aware +forecaster, then assert the extractor returns a sorted, well-shaped +:class:`FeatureImportanceItem` list. The LightGBM and XGBoost tests guard the +optional extras via ``pytest.importorskip`` so the file always runs on a +minimal install — the RegressionForecaster and ProphetLikeForecaster paths +have no optional dependency. + +The leakage spec ``app/features/featuresets/tests/test_leakage.py`` is the +hard invariant; nothing here touches it. +""" + +from __future__ import annotations + +from typing import Any, cast, get_args + +import numpy as np +import pytest + +from app.features.forecasting.feature_metadata import ( + _MODEL_FAMILY_MAP, + FeatureImportanceUnavailableError, + extract_feature_importance, + importance_type_for, + model_family_for, +) +from app.features.forecasting.models import ( + BaseForecaster, + LightGBMForecaster, + ModelType, + MovingAverageForecaster, + NaiveForecaster, + ProphetLikeForecaster, + RegressionForecaster, + SeasonalNaiveForecaster, + XGBoostForecaster, +) +from app.features.forecasting.schemas import ModelFamily +from app.shared.feature_frames import canonical_feature_columns + +FloatArray = np.ndarray[Any, np.dtype[np.floating[Any]]] + +_N_FEATURES = len(canonical_feature_columns()) # 14 + + +# --------------------------------------------------------------------------- +# Synthetic fixtures +# --------------------------------------------------------------------------- + + +def _synthetic_data( + n: int = 120, n_features: int = _N_FEATURES, seed: int = 0 +) -> tuple[FloatArray, FloatArray]: + """Build a synthetic (target, feature-matrix) pair with a couple of + informative columns so the importance vector is not all zero.""" + rng = np.random.default_rng(seed) + features = rng.normal(size=(n, n_features)).astype(np.float64) + # Mix signed coefficients so the prophet_like coef_ vector has both signs + target = ( + 50.0 + + 5.0 * features[:, 0] + - 3.0 * features[:, 1] + + 2.0 * features[:, 5] + + rng.normal(scale=0.5, size=n) + ).astype(np.float64) + return features, target + + +def _feature_columns() -> list[str]: + return list(canonical_feature_columns()) + + +# --------------------------------------------------------------------------- +# model_family_for +# --------------------------------------------------------------------------- + + +def test_model_family_for_maps_baseline_types_to_baseline() -> None: + for mt in ("naive", "seasonal_naive", "moving_average"): + assert model_family_for(mt) == ModelFamily.BASELINE + + +def test_model_family_for_maps_tree_types_to_tree() -> None: + for mt in ("regression", "lightgbm", "xgboost"): + assert model_family_for(mt) == ModelFamily.TREE + + +def test_model_family_for_maps_prophet_like_to_additive() -> None: + assert model_family_for("prophet_like") == ModelFamily.ADDITIVE + + +def test_model_family_for_unknown_returns_baseline() -> None: + """An unknown model_type logs a warning and degrades to BASELINE.""" + assert model_family_for("future_arima_v9") == ModelFamily.BASELINE + + +def test_model_family_map_covers_every_known_model_type() -> None: + """Every Literal in ``ModelType`` is reachable in the family map. + + Catches drift: if a new model type is added to ``forecasting/models.py`` + without updating ``_MODEL_FAMILY_MAP``, this fails immediately and the + dashboard would otherwise silently classify it as BASELINE. + """ + declared = set(get_args(ModelType)) + assert declared <= set(_MODEL_FAMILY_MAP.keys()), ( + f"ModelType declares {declared - set(_MODEL_FAMILY_MAP.keys())} " + "but _MODEL_FAMILY_MAP doesn't" + ) + + +# --------------------------------------------------------------------------- +# importance_type_for +# --------------------------------------------------------------------------- + + +def test_importance_type_for_regression_is_permutation() -> None: + features, target = _synthetic_data() + model = RegressionForecaster().fit(target, features) + assert importance_type_for(model) == "permutation" + + +def test_importance_type_for_prophet_like_is_ridge_coef() -> None: + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + assert importance_type_for(model) == "ridge_coef" + + +def test_importance_type_for_baseline_is_none() -> None: + """Baselines are not feature-aware; the function returns None for them.""" + naive = NaiveForecaster() + seasonal = SeasonalNaiveForecaster() + ma = MovingAverageForecaster() + for m in (naive, seasonal, ma): + assert importance_type_for(cast(BaseForecaster, m)) is None + + +@pytest.mark.parametrize("extra", ["lightgbm", "xgboost"]) +def test_importance_type_for_tree_extras(extra: str) -> None: + """LGBM / XGBoost expose ``importance_type``; XGB falls back to 'weight'. + + LightGBM's sklearn wrapper sets ``importance_type='split'`` by default. + XGBoost's wrapper exposes ``importance_type=None`` on a freshly-built + estimator (the internal default is ``'weight'``), so the helper falls + back to the documented XGBoost default rather than returning ``None``. + """ + pytest.importorskip(extra) + features, target = _synthetic_data() + model: BaseForecaster + expected_default: str + if extra == "lightgbm": + model = LightGBMForecaster().fit(target, features) + expected_default = "split" + else: + model = XGBoostForecaster().fit(target, features) + expected_default = "weight" + kind = importance_type_for(model) + assert kind == expected_default + + +# --------------------------------------------------------------------------- +# extract_feature_importance — happy paths +# --------------------------------------------------------------------------- + + +def test_extract_regression_raises_unavailable() -> None: + """HistGradientBoostingRegressor does not expose ``feature_importances_``. + + Unlike the older ``GradientBoostingRegressor``, the histogram-based booster + sklearn 1.x ships does not implement the attribute. The extractor surfaces + this as :class:`FeatureImportanceUnavailableError` (a ``ValueError`` + subclass) so the service layer can map it to a 422 with a clear + remediation hint, rather than letting an opaque AttributeError reach the + handler. + """ + features, target = _synthetic_data() + model = RegressionForecaster().fit(target, features) + with pytest.raises( + FeatureImportanceUnavailableError, + match="HistGradientBoostingRegressor", + ): + extract_feature_importance(model, _feature_columns()) + + +def test_extract_prophet_like_preserves_sign() -> None: + """Ridge coefficients can be negative; the extractor must preserve sign.""" + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + items = extract_feature_importance(model, _feature_columns()) + + assert len(items) == _N_FEATURES + assert all(item.kind == "linear_coef" for item in items) + # At least one negative coefficient must exist for a target with negative + # weight on feature index 1; if not, the extractor probably called abs(). + assert any(item.importance < 0.0 for item in items), ( + "Expected at least one negative coefficient; sign was probably stripped" + ) + # Sorted by |importance| desc. + abs_seq = [abs(item.importance) for item in items] + assert abs_seq == sorted(abs_seq, reverse=True) + # Ranks are 1-indexed and monotonic. + assert [item.rank for item in items] == list(range(1, _N_FEATURES + 1)) + + +@pytest.mark.parametrize("extra", ["lightgbm", "xgboost"]) +def test_extract_tree_models_match_feature_columns(extra: str) -> None: + """LightGBM / XGBoost extractors are guarded by ``importorskip``.""" + pytest.importorskip(extra) + features, target = _synthetic_data() + model: BaseForecaster + if extra == "lightgbm": + model = LightGBMForecaster().fit(target, features) + else: + model = XGBoostForecaster().fit(target, features) + + items = extract_feature_importance(model, _feature_columns()) + assert len(items) == _N_FEATURES + assert all(item.kind == "tree" for item in items) + assert all(item.importance >= 0.0 for item in items) + abs_seq = [abs(item.importance) for item in items] + assert abs_seq == sorted(abs_seq, reverse=True) + + +# --------------------------------------------------------------------------- +# extract_feature_importance — error paths +# --------------------------------------------------------------------------- + + +def test_extract_raises_on_non_feature_aware_model() -> None: + """Baseline forecasters are explicitly rejected with a clear message.""" + naive = NaiveForecaster() + with pytest.raises(ValueError, match="not feature-aware"): + extract_feature_importance(cast(BaseForecaster, naive), _feature_columns()) + + +def test_extract_raises_on_feature_column_length_mismatch() -> None: + """The length-mismatch guard fires for an estimator that DOES expose + ``feature_importances_`` (ProphetLike here — RegressionForecaster fails + earlier with FeatureImportanceUnavailableError).""" + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + short_cols = _feature_columns()[:-3] + with pytest.raises(ValueError, match="length mismatch"): + extract_feature_importance(model, short_cols) + + +def test_extract_prophet_like_pads_dropped_imputer_columns() -> None: + """Regression test for the SimpleImputer-drops-empty-column edge case. + + sklearn 1.2+'s ``SimpleImputer`` (default ``keep_empty_features=False``) + drops columns whose training values are all NaN. In production this + fires whenever ``days_since_launch`` has no launch date — the imputer + silently strips column index 13 and Ridge learns a shape-(13,) coef. + The extractor must realign the coef vector to the full 14-column + input contract by padding the dropped column with 0.0; otherwise the + response 400s with a misleading "length mismatch" message. + """ + features, target = _synthetic_data() + features[:, 13] = np.nan # mark the last column as all-NaN + model = ProphetLikeForecaster().fit(target, features) + items = extract_feature_importance(model, _feature_columns()) + # The padded result preserves the full input contract. + assert len(items) == _N_FEATURES + # The dropped column appears with importance == 0 (model assigned no + # weight because it had no observed values). + dropped = next(item for item in items if item.name == _feature_columns()[13]) + assert dropped.importance == 0.0 diff --git a/app/features/forecasting/tests/test_routes_feature_metadata.py b/app/features/forecasting/tests/test_routes_feature_metadata.py new file mode 100644 index 00000000..118416f5 --- /dev/null +++ b/app/features/forecasting/tests/test_routes_feature_metadata.py @@ -0,0 +1,565 @@ +"""Unit route tests for the two MLZOO-D feature-metadata endpoints. + +The matrix mirrors PRP-28's ``/explain/runs/{id}`` + ``/explain/jobs/{id}`` +test pair: 200 success for tree + additive runs, 400 for baseline / wrong +job type, 404 for missing run/job, 422 for resource-state issues (no +artifact, missing extra, deleted artifact, HistGBR-no-importance gap). +Every error path asserts the RFC 7807 ``application/problem+json`` +envelope so the panel and tests can disambiguate state-prevented operations +from input-validation failures via the ``type`` URI. + +Each test monkeypatches the relevant service-layer call (``RegistryService``, +``JobService``, ``load_model_bundle``) so the routes are exercised over the +real HTTP boundary without a real database. +""" + +from __future__ import annotations + +from collections.abc import AsyncGenerator +from contextlib import asynccontextmanager +from datetime import UTC, date, datetime +from typing import Any +from unittest.mock import AsyncMock + +import numpy as np +import pytest +from httpx import ASGITransport, AsyncClient + +from app.core.database import get_db +from app.features.forecasting import service as service_module +from app.features.forecasting.models import ( + LightGBMForecaster, + ProphetLikeForecaster, + RegressionForecaster, +) +from app.features.forecasting.persistence import ModelBundle +from app.features.forecasting.schemas import ( + LightGBMModelConfig, + NaiveModelConfig, + ProphetLikeModelConfig, + RegressionModelConfig, +) +from app.features.jobs.models import JobStatus, JobType +from app.features.jobs.schemas import JobResponse +from app.features.jobs.service import JobService +from app.features.registry.schemas import RunResponse, RunStatus +from app.features.registry.service import RegistryService +from app.main import app +from app.shared.feature_frames import canonical_feature_columns + +_FEATURE_COLUMNS = list(canonical_feature_columns()) +_N_FEATURES = len(_FEATURE_COLUMNS) + + +# --------------------------------------------------------------------------- +# Shared fixtures +# --------------------------------------------------------------------------- + + +@asynccontextmanager +async def _client() -> AsyncGenerator[AsyncClient, None]: + """Yield an ``AsyncClient`` against the real ``app`` with ``get_db`` stubbed. + + The DB session is never actually used by the feature-metadata service — + every call goes through ``RegistryService`` / ``JobService`` / + ``load_model_bundle``, all monkeypatched in each test. + """ + db = AsyncMock() + + async def override_get_db() -> AsyncGenerator[AsyncMock, None]: + yield db + + app.dependency_overrides[get_db] = override_get_db + try: + async with AsyncClient( + transport=ASGITransport(app=app), + base_url="http://test", + ) as ac: + yield ac + finally: + app.dependency_overrides.pop(get_db, None) + + +def _assert_problem_detail(body: dict[str, Any], expected_status: int) -> None: + """Assert RFC 7807 envelope shape and the right status code.""" + for key in ("type", "title", "status", "detail"): + assert key in body, f"missing RFC 7807 field: {key}" + assert body["status"] == expected_status + + +def _now() -> datetime: + return datetime(2024, 1, 1, 0, 0, 0, tzinfo=UTC) + + +def _make_run( + *, + run_id: str = "run-abc", + model_type: str = "lightgbm", + status: RunStatus = RunStatus.SUCCESS, + artifact_uri: str | None = "/var/forecast-mock/model.joblib", +) -> RunResponse: + return RunResponse.model_validate( + { + "run_id": run_id, + "status": status, + "model_type": model_type, + "model_config_data": {"model_type": model_type}, + "config_hash": "deadbeefdeadbeef", + "data_window_start": date(2024, 1, 1), + "data_window_end": date(2024, 1, 31), + "store_id": 1, + "product_id": 1, + "artifact_uri": artifact_uri, + "created_at": _now(), + "updated_at": _now(), + } + ) + + +def _make_job( + *, + job_id: str = "job-abc", + job_type: JobType = JobType.TRAIN, + status: JobStatus = JobStatus.COMPLETED, + result: dict[str, Any] | None = None, +) -> JobResponse: + return JobResponse.model_validate( + { + "job_id": job_id, + "job_type": job_type, + "status": status, + "params": {"model_type": "lightgbm"}, + "result": result + if result is not None + else {"model_path": "/var/forecast-mock/model.joblib"}, + "created_at": _now(), + "updated_at": _now(), + } + ) + + +def _fitted_lightgbm_bundle() -> ModelBundle: + """Build a real fitted LightGBM bundle for the 200 success path.""" + pytest.importorskip("lightgbm") + rng = np.random.default_rng(0) + x = rng.normal(size=(60, _N_FEATURES)).astype(np.float64) + y = (3.0 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(size=60)).astype(np.float64) + model = LightGBMForecaster(n_estimators=20).fit(y, x) + return ModelBundle( + model=model, + config=LightGBMModelConfig(model_type="lightgbm"), + metadata={"feature_columns": list(_FEATURE_COLUMNS)}, + ) + + +def _fitted_prophet_like_bundle() -> ModelBundle: + rng = np.random.default_rng(1) + x = rng.normal(size=(60, _N_FEATURES)).astype(np.float64) + y = (3.0 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(size=60)).astype(np.float64) + model = ProphetLikeForecaster().fit(y, x) + return ModelBundle( + model=model, + config=ProphetLikeModelConfig(model_type="prophet_like"), + metadata={"feature_columns": list(_FEATURE_COLUMNS)}, + ) + + +def _fitted_regression_bundle() -> ModelBundle: + rng = np.random.default_rng(2) + x = rng.normal(size=(60, _N_FEATURES)).astype(np.float64) + y = (3.0 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(size=60)).astype(np.float64) + model = RegressionForecaster().fit(y, x) + return ModelBundle( + model=model, + config=RegressionModelConfig(model_type="regression"), + metadata={"feature_columns": list(_FEATURE_COLUMNS)}, + ) + + +# --------------------------------------------------------------------------- +# /forecasting/runs/{run_id}/feature-metadata +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_200_for_lightgbm( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """A fitted LightGBM run returns a sorted importance list and 'tree' kind.""" + pytest.importorskip("lightgbm") + bundle = _fitted_lightgbm_bundle() + monkeypatch.setattr( + RegistryService, "get_run", AsyncMock(return_value=_make_run(model_type="lightgbm")) + ) + monkeypatch.setattr(service_module, "load_model_bundle", lambda *a, **kw: bundle) + + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-abc/feature-metadata") + + assert response.status_code == 200 + body = response.json() + assert body["run_id"] == "run-abc" + assert body["model_type"] == "lightgbm" + assert body["model_family"] == "tree" + assert body["feature_columns"] == _FEATURE_COLUMNS + assert len(body["features"]) == _N_FEATURES + # All tree-kind items are non-negative and sorted by |importance| desc. + assert all(item["kind"] == "tree" for item in body["features"]) + assert all(item["importance"] >= 0.0 for item in body["features"]) + abs_seq = [abs(item["importance"]) for item in body["features"]] + assert abs_seq == sorted(abs_seq, reverse=True) + # importance_type is populated (LightGBM default = "split"). + assert body["importance_type"] == "split" + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_200_for_prophet_like( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """A fitted prophet_like run preserves negative coefficients.""" + bundle = _fitted_prophet_like_bundle() + monkeypatch.setattr( + RegistryService, "get_run", AsyncMock(return_value=_make_run(model_type="prophet_like")) + ) + monkeypatch.setattr(service_module, "load_model_bundle", lambda *a, **kw: bundle) + + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-abc/feature-metadata") + + assert response.status_code == 200 + body = response.json() + assert body["model_family"] == "additive" + assert body["importance_type"] == "ridge_coef" + assert all(item["kind"] == "linear_coef" for item in body["features"]) + # The synthetic target weights feature index 1 negatively — at least one + # coefficient must be negative if sign was preserved. + assert any(item["importance"] < 0 for item in body["features"]) + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_404_for_missing_run( + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setattr(RegistryService, "get_run", AsyncMock(return_value=None)) + async with _client() as ac: + response = await ac.get("/forecasting/runs/does-not-exist/feature-metadata") + assert response.status_code == 404 + body = response.json() + _assert_problem_detail(body, 404) + assert body["code"] == "NOT_FOUND" + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_400_for_baseline( + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setattr( + RegistryService, "get_run", AsyncMock(return_value=_make_run(model_type="naive")) + ) + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-naive/feature-metadata") + assert response.status_code == 400 + body = response.json() + _assert_problem_detail(body, 400) + assert body["code"] == "BAD_REQUEST" + assert "baseline" in body["detail"] + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_422_when_artifact_missing( + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setattr( + RegistryService, + "get_run", + AsyncMock( + return_value=_make_run( + model_type="lightgbm", + status=RunStatus.SUCCESS, + artifact_uri=None, + ) + ), + ) + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-noart/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + # MUST NOT collide with the Pydantic-input VALIDATION_ERROR 422. + assert body["code"] != "VALIDATION_ERROR" + + +@pytest.mark.asyncio +@pytest.mark.parametrize("status", [RunStatus.PENDING, RunStatus.RUNNING, RunStatus.FAILED]) +async def test_get_run_metadata_returns_422_for_non_success_status( + monkeypatch: pytest.MonkeyPatch, status: RunStatus +) -> None: + monkeypatch.setattr( + RegistryService, + "get_run", + AsyncMock(return_value=_make_run(model_type="lightgbm", status=status)), + ) + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-pending/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_422_for_missing_ml_extra( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """``joblib.load`` raises ModuleNotFoundError when an ml-* extra is missing.""" + monkeypatch.setattr( + RegistryService, + "get_run", + AsyncMock(return_value=_make_run(model_type="lightgbm")), + ) + + def _raise(*a: Any, **kw: Any) -> None: + raise ModuleNotFoundError("No module named 'lightgbm'", name="lightgbm") + + monkeypatch.setattr(service_module, "load_model_bundle", _raise) + + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-noextra/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + assert "ml-lightgbm" in body["detail"] + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_422_for_deleted_artifact( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """``load_model_bundle`` raises FileNotFoundError when the file is gone.""" + monkeypatch.setattr( + RegistryService, + "get_run", + AsyncMock(return_value=_make_run(model_type="lightgbm")), + ) + + def _raise(*a: Any, **kw: Any) -> None: + raise FileNotFoundError("/var/forecast-mock/model.joblib") + + monkeypatch.setattr(service_module, "load_model_bundle", _raise) + + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-deleted/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + assert "missing from disk" in body["detail"] + + +@pytest.mark.asyncio +async def test_get_run_metadata_returns_422_for_regression_no_importance( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """RegressionForecaster's HistGBR doesn't expose feature_importances_.""" + bundle = _fitted_regression_bundle() + monkeypatch.setattr( + RegistryService, + "get_run", + AsyncMock(return_value=_make_run(model_type="regression")), + ) + monkeypatch.setattr(service_module, "load_model_bundle", lambda *a, **kw: bundle) + + async with _client() as ac: + response = await ac.get("/forecasting/runs/run-reg/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + assert "HistGradientBoostingRegressor" in body["detail"] + + +# --------------------------------------------------------------------------- +# /forecasting/jobs/{job_id}/feature-metadata +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_get_job_metadata_returns_200_for_completed_train( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """A completed train job returns the importance list keyed by the artifact key.""" + bundle = _fitted_prophet_like_bundle() + monkeypatch.setattr( + JobService, + "get_job", + AsyncMock( + return_value=_make_job( + result={"model_path": "/var/forecast-mock/forecast/model_abc123def456.joblib"} + ) + ), + ) + monkeypatch.setattr(service_module, "load_model_bundle", lambda *a, **kw: bundle) + + async with _client() as ac: + response = await ac.get("/forecasting/jobs/job-abc/feature-metadata") + + assert response.status_code == 200 + body = response.json() + # The artifact key is parsed from the bundle file stem (NOT a registry UUID). + assert body["run_id"] == "abc123def456" + assert body["model_family"] == "additive" + + +@pytest.mark.asyncio +async def test_get_job_metadata_returns_404_for_missing_job( + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setattr(JobService, "get_job", AsyncMock(return_value=None)) + async with _client() as ac: + response = await ac.get("/forecasting/jobs/missing/feature-metadata") + assert response.status_code == 404 + body = response.json() + _assert_problem_detail(body, 404) + assert body["code"] == "NOT_FOUND" + + +@pytest.mark.asyncio +@pytest.mark.parametrize( + ("job_type", "status_value", "detail_substring"), + [ + (JobType.PREDICT, JobStatus.COMPLETED, "completed"), + (JobType.TRAIN, JobStatus.PENDING, "completed"), + (JobType.TRAIN, JobStatus.FAILED, "completed"), + ], +) +async def test_get_job_metadata_returns_400_for_wrong_job_state( + monkeypatch: pytest.MonkeyPatch, + job_type: JobType, + status_value: JobStatus, + detail_substring: str, +) -> None: + monkeypatch.setattr( + JobService, + "get_job", + AsyncMock(return_value=_make_job(job_type=job_type, status=status_value, result=None)), + ) + async with _client() as ac: + response = await ac.get("/forecasting/jobs/job-x/feature-metadata") + assert response.status_code == 400 + body = response.json() + _assert_problem_detail(body, 400) + assert body["code"] == "BAD_REQUEST" + assert detail_substring in body["detail"] + + +@pytest.mark.asyncio +async def test_get_job_metadata_returns_400_for_baseline_trained_job( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Defence-in-depth: a completed train job whose underlying model is + baseline returns 400, not a 200 with no importance.""" + rng = np.random.default_rng(3) + from app.features.forecasting.models import NaiveForecaster + + y = rng.normal(size=30).astype(np.float64) + baseline_model = NaiveForecaster() + baseline_model.fit(y) + baseline_bundle = ModelBundle( + model=baseline_model, + config=NaiveModelConfig(model_type="naive"), + metadata={"feature_columns": list(_FEATURE_COLUMNS)}, + ) + monkeypatch.setattr( + JobService, + "get_job", + AsyncMock( + return_value=_make_job( + result={"model_path": "/var/forecast-mock/forecast/model_baseline.joblib"} + ) + ), + ) + monkeypatch.setattr(service_module, "load_model_bundle", lambda *a, **kw: baseline_bundle) + + async with _client() as ac: + response = await ac.get("/forecasting/jobs/job-baseline/feature-metadata") + assert response.status_code == 400 + body = response.json() + _assert_problem_detail(body, 400) + assert body["code"] == "BAD_REQUEST" + assert "baseline" in body["detail"] + + +@pytest.mark.asyncio +async def test_get_job_metadata_returns_422_for_missing_model_path( + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setattr( + JobService, + "get_job", + AsyncMock(return_value=_make_job(result={})), # no model_path key + ) + async with _client() as ac: + response = await ac.get("/forecasting/jobs/job-bad/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + assert "model_path" in body["detail"] + + +@pytest.mark.asyncio +async def test_get_job_metadata_returns_422_for_deleted_artifact( + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setattr( + JobService, + "get_job", + AsyncMock( + return_value=_make_job( + result={"model_path": "/var/forecast-mock/forecast/model_gone.joblib"} + ) + ), + ) + + def _raise(*a: Any, **kw: Any) -> None: + raise FileNotFoundError("/var/forecast-mock/forecast/model_gone.joblib") + + monkeypatch.setattr(service_module, "load_model_bundle", _raise) + + async with _client() as ac: + response = await ac.get("/forecasting/jobs/job-gone/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + assert "missing from disk" in body["detail"] + + +@pytest.mark.asyncio +async def test_get_job_metadata_returns_422_for_missing_ml_extra( + monkeypatch: pytest.MonkeyPatch, +) -> None: + monkeypatch.setattr( + JobService, + "get_job", + AsyncMock( + return_value=_make_job( + result={"model_path": "/var/forecast-mock/forecast/model_x.joblib"} + ) + ), + ) + + def _raise(*a: Any, **kw: Any) -> None: + raise ModuleNotFoundError("No module named 'xgboost'", name="xgboost") + + monkeypatch.setattr(service_module, "load_model_bundle", _raise) + + async with _client() as ac: + response = await ac.get("/forecasting/jobs/job-noext/feature-metadata") + assert response.status_code == 422 + body = response.json() + _assert_problem_detail(body, 422) + assert body["code"] == "UNPROCESSABLE_ENTITY" + assert "ml-xgboost" in body["detail"] diff --git a/app/features/jobs/schemas.py b/app/features/jobs/schemas.py index b7e9663d..716ec3b4 100644 --- a/app/features/jobs/schemas.py +++ b/app/features/jobs/schemas.py @@ -25,7 +25,8 @@ class JobCreate(BaseModel): **Job Types and Required Params**: - **train**: Train a forecasting model - - `model_type`: Required - 'naive', 'seasonal_naive', 'moving_average', 'regression'. + - `model_type`: Required - 'naive', 'seasonal_naive', 'moving_average', + 'regression', 'lightgbm', 'xgboost', 'prophet_like'. - `store_id`: Required - Store ID from /dimensions/stores - `product_id`: Required - Product ID from /dimensions/products - `start_date`: Required - Training data start (YYYY-MM-DD) diff --git a/app/features/registry/schemas.py b/app/features/registry/schemas.py index 97d0ddf1..61a16c19 100644 --- a/app/features/registry/schemas.py +++ b/app/features/registry/schemas.py @@ -15,7 +15,16 @@ from enum import Enum from typing import Any -from pydantic import BaseModel, ConfigDict, Field, field_validator +from pydantic import BaseModel, ConfigDict, Field, computed_field, field_validator + +# Pydantic v2 resolves a ``@computed_field``'s return-type annotation at +# validation time, so ``ModelFamily`` must be a real runtime import here. +# To avoid the cycle this introduces with the forecasting slice (whose +# ``service.py`` imports ``RegistryService``), the forecasting slice's +# cross-slice imports of ``RegistryService`` / ``JobService`` / status enums +# are LAZY (inside the methods that use them). See +# ``app/features/forecasting/service.py`` for the matching contract. +from app.features.forecasting.schemas import ModelFamily class RunStatus(str, Enum): @@ -107,7 +116,14 @@ class RunUpdate(BaseModel): class RunResponse(BaseModel): - """Run details response.""" + """Run details response. + + ``model_family`` is a computed field derived from ``model_type`` at + serialization time — no DB column, no Alembic migration, no backfill. + See ``app/features/forecasting/feature_metadata.py:model_family_for`` for + the canonical map. Unknown model types log a warning and return + ``ModelFamily.BASELINE``. + """ model_config = ConfigDict(from_attributes=True, populate_by_name=True) @@ -136,6 +152,19 @@ class RunResponse(BaseModel): created_at: datetime updated_at: datetime + @computed_field # type: ignore[prop-decorator] + @property + def model_family(self) -> ModelFamily: + """Computed family label derived from ``model_type``. + + Imported lazily to avoid a hard cycle between + ``registry.schemas`` and ``forecasting.feature_metadata`` at module + import time. + """ + from app.features.forecasting.feature_metadata import model_family_for + + return model_family_for(self.model_type) + class RunListResponse(BaseModel): """Paginated list of runs.""" diff --git a/app/features/registry/tests/test_schemas.py b/app/features/registry/tests/test_schemas.py index 459531d7..a1a714e4 100644 --- a/app/features/registry/tests/test_schemas.py +++ b/app/features/registry/tests/test_schemas.py @@ -1,15 +1,18 @@ """Unit tests for registry schemas.""" -from datetime import date +from datetime import UTC, date, datetime +from typing import ClassVar import pytest from pydantic import ValidationError +from app.features.forecasting.schemas import ModelFamily from app.features.registry.schemas import ( VALID_TRANSITIONS, AgentContext, AliasCreate, RunCreate, + RunResponse, RunStatus, RuntimeInfo, RunUpdate, @@ -381,3 +384,76 @@ def test_validate_description_max_length(self) -> None: with pytest.raises(ValidationError) as exc_info: AliasCreate(alias_name="test", run_id="x", description="x" * 501) assert "description" in str(exc_info.value) + + +class TestRunResponseModelFamily: + """Tests for the computed ``model_family`` field on ``RunResponse``. + + MLZOO-D / PRP-31: ``model_family`` is derived from ``model_type`` at + serialization time via a Pydantic ``@computed_field``. No DB column, no + migration, no backfill. Unknown types degrade to ``BASELINE`` and log a + warning (forward-compat). + """ + + _BASE_FIELDS: ClassVar[dict[str, object]] = { + "run_id": "abc123", + "status": RunStatus.SUCCESS, + "model_config_data": {"model_type": "naive"}, + "config_hash": "deadbeefdeadbeef", + "data_window_start": date(2024, 1, 1), + "data_window_end": date(2024, 1, 31), + "store_id": 1, + "product_id": 1, + "created_at": datetime(2024, 1, 1, 0, 0, 0, tzinfo=UTC), + "updated_at": datetime(2024, 1, 2, 0, 0, 0, tzinfo=UTC), + } + + def _make_response(self, model_type: str) -> RunResponse: + fields = {**self._BASE_FIELDS, "model_type": model_type} + return RunResponse.model_validate(fields) + + @pytest.mark.parametrize( + ("model_type", "expected"), + [ + ("naive", ModelFamily.BASELINE), + ("seasonal_naive", ModelFamily.BASELINE), + ("moving_average", ModelFamily.BASELINE), + ("regression", ModelFamily.TREE), + ("lightgbm", ModelFamily.TREE), + ("xgboost", ModelFamily.TREE), + ("prophet_like", ModelFamily.ADDITIVE), + ], + ) + def test_model_family_computed_for_every_known_type( + self, model_type: str, expected: ModelFamily + ) -> None: + """All seven canonical model_types map to the expected ModelFamily.""" + response = self._make_response(model_type) + assert response.model_family == expected + + def test_model_family_unknown_type_falls_back_to_baseline(self) -> None: + """An unknown model_type degrades to BASELINE (logs a warning). + + Forward-compat: a model_type added to ``forecasting/models.py`` before + the family map is updated should not crash the registry — it just + shows up in the dashboard as a baseline until the map catches up. + """ + response = self._make_response("future_arima_v9") + assert response.model_family == ModelFamily.BASELINE + + def test_model_family_serializes_alongside_model_config_alias(self) -> None: + """Both ``model_config`` (aliased) and ``model_family`` are top-level + keys on the serialized dict — no collision with the alias.""" + response = self._make_response("lightgbm") + dumped = response.model_dump(by_alias=True) + assert dumped["model_config"] == {"model_type": "naive"} + assert dumped["model_family"] == ModelFamily.TREE + # Both alias and computed field present: + assert "model_config" in dumped + assert "model_family" in dumped + + def test_model_family_propagates_to_serialized_json(self) -> None: + """model_family round-trips through model_dump_json with the str value.""" + response = self._make_response("prophet_like") + json_str = response.model_dump_json(by_alias=True) + assert '"model_family":"additive"' in json_str diff --git a/app/shared/feature_frames/rows.py b/app/shared/feature_frames/rows.py index 9aed8adc..5d40c970 100644 --- a/app/shared/feature_frames/rows.py +++ b/app/shared/feature_frames/rows.py @@ -82,7 +82,23 @@ def build_historical_feature_rows( Row-major feature matrix ``[n_observations][n_features]``; ``NaN`` marks a lag whose source day precedes the series, and ``days_since_launch`` when the product has no launch date. + + Raises: + ValueError: When ``dates``, ``quantities``, and ``prices`` do not all + share the same length, or ``baseline_price`` is not finite and > 0. """ + n_dates = len(dates) + if len(quantities) != n_dates or len(prices) != n_dates: + raise ValueError( + "build_historical_feature_rows: dates, quantities, and prices must have " + f"equal lengths (got dates={n_dates}, quantities={len(quantities)}, " + f"prices={len(prices)})" + ) + if not math.isfinite(baseline_price) or baseline_price <= 0.0: + raise ValueError( + "build_historical_feature_rows: baseline_price must be finite and > 0, " + f"got {baseline_price!r}" + ) calendar_columns = build_calendar_columns(dates) rows: list[list[float]] = [] for index, day in enumerate(dates): @@ -158,7 +174,8 @@ def build_future_feature_rows( Raises: ValueError: When ``gap`` is negative, ``test_prices`` does not align - with ``test_dates``, or a canonical column cannot be sourced. + with ``test_dates``, ``baseline_price`` is not finite and > 0, or a + canonical column cannot be sourced. """ horizon = len(test_dates) if gap < 0: @@ -168,6 +185,11 @@ def build_future_feature_rows( f"build_future_feature_rows: test_prices has {len(test_prices)} entries " f"but test_dates has {horizon} — they must align" ) + if not math.isfinite(baseline_price) or baseline_price <= 0.0: + raise ValueError( + "build_future_feature_rows: baseline_price must be finite and > 0, " + f"got {baseline_price!r}" + ) # Lags: build for gap + horizon days, then drop the gap lead-in so row j # corresponds to test day j. NaN-where-future is enforced by the builder. diff --git a/docs/_base/ARCHITECTURE.md b/docs/_base/ARCHITECTURE.md index 0b6fbb88..2c087069 100644 --- a/docs/_base/ARCHITECTURE.md +++ b/docs/_base/ARCHITECTURE.md @@ -68,6 +68,30 @@ ForecastLabAI repo | RAG embeddings | RAG service → OpenAI / Ollama | HTTPS | `OPENAI_API_KEY` env or Ollama LAN URL | | Agent LLM calls | Agents → Anthropic/OpenAI/Gemini | HTTPS | provider API key env | +### Cross-slice read-only import pattern + +When a feature slice needs to call a service method or read a schema from a +**different** feature slice (e.g., `forecasting/service.py` → `RegistryService`): + +- Import at the **call site** (inside the method), not at module scope, IF + any of these are true: + - The upstream slice's `schemas.py` imports a type from this slice + - The downstream slice is loaded by `alembic/env.py` at migration time + - The import would close an SQLAlchemy registry cycle + +- Prefer importing the **service class** over the ORM model — calls go through + the public surface, not the persistence layer. + +- Document the lazy import with a single-line NOTE comment at the top of the + file naming the cycle it breaks. + +Existing precedents: +- `app/features/explainability/service.py:57` — read-only `ModelRun` import +- `app/features/forecasting/service.py` — lazy `RegistryService` / `JobService` / + `RunStatus` imports inside `get_feature_metadata_for_*` methods (added by + PRP-31; required because `RunResponse.model_family` computed_field closes + the cycle at alembic cold-boot) + ## Deployment Flow (Causal Chain) ``` diff --git a/docs/user-guide/dashboard-guide.md b/docs/user-guide/dashboard-guide.md index 83aec0bd..114aab0d 100644 --- a/docs/user-guide/dashboard-guide.md +++ b/docs/user-guide/dashboard-guide.md @@ -37,9 +37,15 @@ row opens a detail page. its **detail page**: profile, KPIs, revenue and lifecycle-demand curves, and a top-stores drilldown. - **Model Runs** (`/explorer/runs`) — every trained model tracked in the registry. - A run **detail page** shows its configuration, metrics, and runtime info as JSON, - cross-links to the store/product, an artifact-integrity check, and a compare link. - Two runs can be compared side by side (config diff + metrics diff with deltas). + A **Family** badge column distinguishes baseline, tree, and additive models at + a glance. The run **detail page** shows configuration, metrics, runtime info, + cross-links to the store/product, an artifact-integrity check, a compare link, + and (for non-baseline runs) the canonical feature columns plus a feature + importance panel — see + [Advanced Model Metadata](./feature-reference.md#advanced-model-metadata) in the + Feature Reference for the data model and error semantics. Two runs can be + compared side by side (config diff, metrics diff with deltas, and same-family + feature importance side-by-side). - **Jobs** (`/explorer/jobs`) — submitted train/predict/backtest jobs. A job **detail page** shows parameters, result JSON, error details, the linked run, a cancel action, and live status polling. diff --git a/docs/user-guide/feature-reference.md b/docs/user-guide/feature-reference.md index 91e1ba51..71e2d21b 100644 --- a/docs/user-guide/feature-reference.md +++ b/docs/user-guide/feature-reference.md @@ -53,13 +53,61 @@ features never use information from the future. Trains demand-forecasting models and generates predictions. -- `POST /forecasting/train` — train a model. Supported model types: `naive`, - `seasonal_naive`, `moving_average` (baselines) and `lightgbm` (machine learning). +- `POST /forecasting/train` — train a model. Supported model types: + - **Baselines**: `naive`, `seasonal_naive`, `moving_average` — always available. + - **Tree (feature-aware)**: `regression` (HistGradientBoostingRegressor, always + available), `lightgbm` (requires the `ml-lightgbm` extra), `xgboost` (requires + the `ml-xgboost` extra). + - **Additive (feature-aware)**: `prophet_like` — a Ridge regressor over the + canonical 14-column feature frame; always available. - `POST /forecasting/predict` — generate horizon predictions from a trained model. +- `GET /forecasting/runs/{run_id}/feature-metadata` — return the canonical + feature columns the trained model consumed and the fitted estimator's native + feature importance (tree models) or signed coefficients (additive + `prophet_like`). See **Advanced Model Metadata** below. +- `GET /forecasting/jobs/{job_id}/feature-metadata` — the job-keyed sibling of + the run endpoint; use it from the forecast viz page, which only holds a + `job_id`. The three baselines exist as honest comparison points — a machine-learning model is only worth using if it beats them. +### Advanced Model Metadata + +Every model returned by `/registry/runs` carries a computed `model_family` field +— `baseline` (naive / seasonal_naive / moving_average), `tree` (regression / +lightgbm / xgboost), or `additive` (prophet_like). The dashboard surfaces this +in four places: a **Family** badge column on the runs explorer, a badge + cards +on the run detail page (the 14 canonical feature columns and a feature +importance panel), a side-by-side comparison on the run compare page (rendered +only when both runs share a non-baseline family), and a collapsible importance +panel on the forecast viz page tied to the train job. + +For a `tree`-family run the panel renders non-negative bars whose length +reflects relative magnitude (the boosters' native `feature_importances_` +attribute — LightGBM's `'split'`, XGBoost's `'weight'`, etc.; the label is +shown at the top of the panel). For an `additive` (`prophet_like`) run the +panel preserves the **sign** of the Ridge coefficient — a positive coefficient +renders green with a `TrendingUp` icon; a negative one renders red with +`TrendingDown`. + +> **Correlation, not causation.** Feature importance is model-derived. It +> reflects how much each feature reduced the model's training error — not +> real-world causation. Two products with similar importance profiles are not +> necessarily driven by the same business factors. + +Three error semantics map cleanly to RFC 7807 `application/problem+json`: +- **400 `BAD_REQUEST`** — the run is a baseline (no native importance vector + exists). The panel renders a neutral muted message. +- **404 `NOT_FOUND`** — the run or job is not in the registry. +- **422 `UNPROCESSABLE_ENTITY`** — the run has no artifact yet + (`pending` / `running` / `failed`), the artifact file has been deleted from + disk, an optional `ml-*` extra is not installed at unpickle time, or the + underlying estimator does not expose `feature_importances_` (sklearn's + `HistGradientBoostingRegressor`, used by `regression`, does not). + The 422 type URI is `UNPROCESSABLE_ENTITY` (distinct from `VALIDATION_ERROR`, + which is reserved for input failures). + ## Backtesting Measures how accurate a model would have been, using time-series cross-validation. diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md index f3109d4b..4f371c2a 100644 --- a/docs/user-guide/getting-started.md +++ b/docs/user-guide/getting-started.md @@ -92,6 +92,9 @@ You can also watch the same pipeline run live in the browser on the **Showcase** migrations are applied (`uv run alembic upgrade head`). - **API keys** — the AI agent and RAG features need `OPENAI_API_KEY` and/or `ANTHROPIC_API_KEY` set in `.env`. Forecasting and the dashboard work without them. +- **Browser dogfood / UI verification** — run `./scripts/dogfood-browser.sh` to + verify Playwright + snap chromium are ready for headless dashboard exercises; + pass a Python file path to execute it through the prepared environment. ## Next Steps diff --git a/examples/models/feature_frame_contract.md b/examples/models/feature_frame_contract.md index 5e408fe6..13bced7a 100644 --- a/examples/models/feature_frame_contract.md +++ b/examples/models/feature_frame_contract.md @@ -52,7 +52,7 @@ same order*: `canonical_feature_columns()` returns these **14 columns, in this order**: -``` +```text lag_1, lag_7, lag_14, lag_28, dow_sin, dow_cos, month_sin, month_cos, is_weekend, is_month_end, price_factor, promo_active, is_holiday, days_since_launch diff --git a/examples/models/model_interface.md b/examples/models/model_interface.md index cd6a1424..3f06fce1 100644 --- a/examples/models/model_interface.md +++ b/examples/models/model_interface.md @@ -238,7 +238,7 @@ Predicts the average of the last `window_size` observations. ### Regression Forecaster -``` +```text ŷ[t+h] = HistGradientBoostingRegressor.predict(X[t+h]) ``` @@ -248,7 +248,7 @@ it REQUIRES a feature frame — see [`feature_frame_contract.md`](feature_frame_ ### LightGBM Forecaster -``` +```text ŷ[t+h] = LGBMRegressor.predict(X[t+h]) ``` @@ -260,7 +260,7 @@ behind the `ml-lightgbm` extra and the `forecast_enable_lightgbm` flag. ### XGBoost Forecaster -``` +```text ŷ[t+h] = XGBRegressor.predict(X[t+h]) ``` @@ -273,7 +273,7 @@ fixed `random_state`, no stochastic subsampling), and NaN-tolerant ### Prophet-like Forecaster -``` +```text ŷ[t+h] = intercept + trend[t+h] + seasonality[t+h] + holiday_regressor[t+h] ``` diff --git a/frontend/src/components/common/model-family-badge.test.tsx b/frontend/src/components/common/model-family-badge.test.tsx new file mode 100644 index 00000000..6749c461 --- /dev/null +++ b/frontend/src/components/common/model-family-badge.test.tsx @@ -0,0 +1,35 @@ +import { afterEach, describe, expect, it } from 'vitest' +import { cleanup, render, screen } from '@testing-library/react' +import { ModelFamilyBadge } from './model-family-badge' + +afterEach(cleanup) + +// MLZOO-D / PRP-31 — closes the test-requirements.md gap flagged by +// prp-quality-agent (M3). Three trivial cases: one per family. + +describe('ModelFamilyBadge', () => { + it('renders the baseline family with the secondary variant', () => { + render() + const badge = screen.getByTestId('model-family-badge') + expect(badge).toBeTruthy() + expect(badge.getAttribute('data-family')).toBe('baseline') + expect(badge.className).toContain('bg-secondary') + expect(screen.getByText('Baseline')).toBeTruthy() + }) + + it('renders the tree family with the default (primary) variant', () => { + render() + const badge = screen.getByTestId('model-family-badge') + expect(badge.getAttribute('data-family')).toBe('tree') + expect(badge.className).toContain('bg-primary') + expect(screen.getByText('Tree')).toBeTruthy() + }) + + it('renders the additive family with the outline variant', () => { + render() + const badge = screen.getByTestId('model-family-badge') + expect(badge.getAttribute('data-family')).toBe('additive') + expect(badge.className).toContain('border-border') + expect(screen.getByText('Additive')).toBeTruthy() + }) +}) diff --git a/frontend/src/components/common/model-family-badge.tsx b/frontend/src/components/common/model-family-badge.tsx new file mode 100644 index 00000000..b62a6555 --- /dev/null +++ b/frontend/src/components/common/model-family-badge.tsx @@ -0,0 +1,55 @@ +import { Activity, LineChart, TreePine } from 'lucide-react' +import { Badge } from '@/components/ui/badge' +import { cn } from '@/lib/utils' +import type { ModelFamily } from '@/types/api' + +/** + * MLZOO-D / PRP-31 — Family badge rendered on the runs explorer, run detail, + * run compare and forecast viz pages. Pure derivation; no hooks, no fetch. + * + * Variant + icon map: + * baseline → secondary + Activity (the simple "what-just-happened" family) + * tree → default + TreePine (the gradient-boosted family) + * additive → outline + LineChart (the prophet_like Ridge family) + */ + +interface ModelFamilyBadgeProps { + family: ModelFamily + className?: string +} + +const FAMILY_LABEL: Record = { + baseline: 'Baseline', + tree: 'Tree', + additive: 'Additive', +} + +const FAMILY_VARIANT: Record = { + baseline: 'secondary', + tree: 'default', + additive: 'outline', +} + +const FAMILY_ICON: Record< + ModelFamily, + typeof Activity | typeof TreePine | typeof LineChart +> = { + baseline: Activity, + tree: TreePine, + additive: LineChart, +} + +export function ModelFamilyBadge({ family, className }: ModelFamilyBadgeProps) { + const Icon = FAMILY_ICON[family] + return ( + + + {FAMILY_LABEL[family]} + + ) +} diff --git a/frontend/src/components/explainability/feature-importance-panel.test.tsx b/frontend/src/components/explainability/feature-importance-panel.test.tsx new file mode 100644 index 00000000..acf3f3f9 --- /dev/null +++ b/frontend/src/components/explainability/feature-importance-panel.test.tsx @@ -0,0 +1,110 @@ +import { afterEach, describe, expect, it } from 'vitest' +import { cleanup, render, screen } from '@testing-library/react' +import { ApiError } from '@/lib/api' +import { FeatureImportancePanel } from './feature-importance-panel' + +afterEach(cleanup) +import type { + FeatureImportanceItem, + FeatureMetadataResponse, +} from '@/types/api' + +const treeItems: FeatureImportanceItem[] = [ + { name: 'price_factor', importance: 120.5, kind: 'tree', rank: 1 }, + { name: 'lag_7', importance: 80.2, kind: 'tree', rank: 2 }, + { name: 'is_holiday', importance: 30.1, kind: 'tree', rank: 3 }, +] + +const linearItems: FeatureImportanceItem[] = [ + { name: 'price_factor', importance: 2.3, kind: 'linear_coef', rank: 1 }, + { name: 'is_holiday', importance: -1.7, kind: 'linear_coef', rank: 2 }, + { name: 'promo_active', importance: 0.4, kind: 'linear_coef', rank: 3 }, +] + +const sampleTree: FeatureMetadataResponse = { + run_id: 'run-tree', + model_type: 'lightgbm', + model_family: 'tree', + feature_columns: treeItems.map((i) => i.name), + features: treeItems, + importance_type: 'split', +} + +const sampleLinear: FeatureMetadataResponse = { + run_id: 'run-additive', + model_type: 'prophet_like', + model_family: 'additive', + feature_columns: linearItems.map((i) => i.name), + features: linearItems, + importance_type: 'ridge_coef', +} + +describe('FeatureImportancePanel', () => { + it('renders tree-kind items with positive bars and primary colour', () => { + render() + expect(screen.getByText('Feature Importance')).toBeTruthy() + const rows = screen.getAllByTestId('feature-importance-row') + expect(rows.length).toBe(treeItems.length) + // All rows are tree-kind. + rows.forEach((row) => { + expect(row.getAttribute('data-kind')).toBe('tree') + }) + // The importance_type tag renders ("split" for the LightGBM default). + expect(screen.getByText('split')).toBeTruthy() + // The verbatim correlation-vs-causation caveat is present. + expect( + screen.getByText(/Importance is model-derived. It reflects how much each feature/), + ).toBeTruthy() + }) + + it('renders linear-coef items with signed values', () => { + render() + const rows = screen.getAllByTestId('feature-importance-row') + rows.forEach((row) => { + expect(row.getAttribute('data-kind')).toBe('linear_coef') + }) + // The negative coefficient renders with its sign preserved. + expect(screen.getByText(/-1\.700/)).toBeTruthy() + // The positive coefficients render with the right magnitude. + expect(screen.getByText(/2\.300/)).toBeTruthy() + // The additive family description is present. + expect( + screen.getByText(/Additive Ridge coefficients\. Sign indicates direction/), + ).toBeTruthy() + }) + + it('renders a loading state', () => { + render() + expect(screen.getByText(/Loading feature importance/)).toBeTruthy() + }) + + it('renders a neutral message for a 400 (baseline family)', () => { + const apiError = new ApiError('baseline only', 400) + render() + expect(screen.getByText(/tree and additive model families only/)).toBeTruthy() + }) + + it('renders a neutral message for a 422 (no artifact / missing extra)', () => { + const apiError = new ApiError('no artifact', 422) + render() + expect( + screen.getByText(/available once training completes and the artifact is saved/), + ).toBeTruthy() + }) + + it('renders a destructive message for an unexpected error', () => { + render() + expect(screen.getByText('boom')).toBeTruthy() + }) + + it('renders an empty-features neutral message', () => { + render( + , + ) + expect( + screen.getByText(/No feature importance values are available/), + ).toBeTruthy() + }) +}) diff --git a/frontend/src/components/explainability/feature-importance-panel.tsx b/frontend/src/components/explainability/feature-importance-panel.tsx new file mode 100644 index 00000000..175ea1e6 --- /dev/null +++ b/frontend/src/components/explainability/feature-importance-panel.tsx @@ -0,0 +1,204 @@ +import type { ReactNode } from 'react' +import { + AlertTriangle, + BarChart3, + Info, + Minus, + TrendingDown, + TrendingUp, +} from 'lucide-react' +import { ApiError, formatNumber, getErrorMessage } from '@/lib/api' +import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card' +import { LoadingState } from '@/components/common/loading-state' +import { ModelFamilyBadge } from '@/components/common/model-family-badge' +import { cn } from '@/lib/utils' +import type { FeatureImportanceItem, FeatureMetadataResponse } from '@/types/api' + +// MLZOO-D / PRP-31 — sibling of ExplanationPanel (PRP-28). One component, +// two display modes: 'tree' (positive-only bars, neutral colour) and +// 'linear_coef' (signed bars with direction colour + icon). The mode is +// selected per item by FeatureImportanceItem.kind so a single panel can +// render any feature-aware family without the consumer page knowing the +// family up front (DECISIONS LOCKED #4). + +interface FeatureImportancePanelProps { + data?: FeatureMetadataResponse + isLoading?: boolean + error?: unknown +} + +const FAMILY_DESCRIPTION: Record = { + tree: 'Tree-based feature importance (model-derived). Bar length = relative magnitude.', + additive: + 'Additive Ridge coefficients. Sign indicates direction; bar length = |coefficient|.', + baseline: 'Feature importance is not available for baseline models.', +} + +function PanelShell({ + family, + importanceType, + children, +}: { + family?: FeatureMetadataResponse['model_family'] + importanceType?: string | null + children: ReactNode +}) { + return ( + + + + + Feature Importance + {family ? : null} + + + {family ? FAMILY_DESCRIPTION[family] : 'Feature importance for the trained model.'} + {importanceType ? ( + + {importanceType} + + ) : null} + + + {children} + + ) +} + +function NeutralMessage({ message }: { message: string }) { + return ( +
+ + {message} +
+ ) +} + +function DestructiveMessage({ message }: { message: string }) { + return ( +
+ + {message} +
+ ) +} + +function ImportanceRow({ + item, + maxAbs, +}: { + item: FeatureImportanceItem + maxAbs: number +}) { + const widthPercent = maxAbs > 0 ? (Math.abs(item.importance) / maxAbs) * 100 : 0 + const isLinear = item.kind === 'linear_coef' + const isPositive = item.importance >= 0 + const barColour = isLinear + ? isPositive + ? 'bg-success/70' + : 'bg-destructive/70' + : 'bg-primary/60' + const textColour = isLinear + ? isPositive + ? 'text-success' + : 'text-destructive' + : 'text-foreground' + const Icon = isLinear ? (isPositive ? TrendingUp : TrendingDown) : Minus + return ( +
  • + + {item.name} + +
    +
    +
    + + + {formatNumber(item.importance, 3)} + +
  • + ) +} + +/** Error-branch helper: pick the right shell + message for an `ApiError`. */ +function errorPanel(error: unknown): ReactNode { + if (error instanceof ApiError) { + if (error.status === 400) { + return ( + + + + ) + } + if (error.status === 422) { + return ( + + + + ) + } + } + return ( + + + + ) +} + +export function FeatureImportancePanel({ + data, + isLoading, + error, +}: FeatureImportancePanelProps) { + if (isLoading) { + return ( + + + + ) + } + + if (error) { + return errorPanel(error) + } + + if (!data) { + return null + } + + if (data.features.length === 0) { + return ( + + + + ) + } + + const maxAbs = data.features.reduce( + (acc, item) => Math.max(acc, Math.abs(item.importance)), + 0, + ) + + return ( + +
      + {data.features.map((item) => ( + + ))} +
    +

    + Importance is model-derived. It reflects how much each feature reduced + the model's training error — not real-world causation. +

    +
    + ) +} diff --git a/frontend/src/hooks/use-feature-metadata.ts b/frontend/src/hooks/use-feature-metadata.ts new file mode 100644 index 00000000..cb47933e --- /dev/null +++ b/frontend/src/hooks/use-feature-metadata.ts @@ -0,0 +1,42 @@ +import { useQuery } from '@tanstack/react-query' +import { api } from '@/lib/api' +import type { FeatureMetadataResponse } from '@/types/api' + +/** + * Load feature columns + learned importance for a registry run (MLZOO-D). + * + * Disabled until `runId` is set; `retry: false` because every error path + * — 400 (baseline family), 404 (missing run), 422 (no artifact / missing + * extra / HistGBR-no-importance gap) — is a final answer, not a transient. + * Mirrors `useRunExplanation` (PRP-28) exactly so future maintainers see + * the run-keyed/job-keyed pair as a structural twin. + */ +export function useRunFeatureMetadata(runId: string, enabled = true) { + return useQuery({ + queryKey: ['feature-metadata', 'run', runId], + queryFn: () => + api(`/forecasting/runs/${runId}/feature-metadata`), + enabled: enabled && !!runId, + retry: false, + }) +} + +/** + * Load feature columns + learned importance for a completed train job. + * + * Use this — not {@link useRunFeatureMetadata} — when the caller has only a + * `job_id`. `trainJob.result.run_id` is the **forecast-artifact key** + * (`uuid.uuid4().hex[:12]`), NOT a registry UUID; calling + * `useRunFeatureMetadata(trainJob.result.run_id, ...)` would 404 because the + * `/forecasting/runs/{run_id}` endpoint treats its path param as a registry + * UUID. Recorded in memory `[[scenario-run-id-vs-registry-run-id]]`. + */ +export function useJobFeatureMetadata(jobId: string, enabled = true) { + return useQuery({ + queryKey: ['feature-metadata', 'job', jobId], + queryFn: () => + api(`/forecasting/jobs/${jobId}/feature-metadata`), + enabled: enabled && !!jobId, + retry: false, + }) +} diff --git a/frontend/src/pages/explorer/run-compare.tsx b/frontend/src/pages/explorer/run-compare.tsx index 3aff8672..fc9ff285 100644 --- a/frontend/src/pages/explorer/run-compare.tsx +++ b/frontend/src/pages/explorer/run-compare.tsx @@ -2,9 +2,12 @@ import { Link, useSearchParams } from 'react-router-dom' import { format } from 'date-fns' import { ArrowDown, ArrowLeft, ArrowUp } from 'lucide-react' import { useRuns, useCompareRuns } from '@/hooks/use-runs' +import { useRunFeatureMetadata } from '@/hooks/use-feature-metadata' +import { FeatureImportancePanel } from '@/components/explainability/feature-importance-panel' import { JsonBlock } from '@/components/common/json-block' import { ErrorDisplay } from '@/components/common/error-display' import { LoadingState } from '@/components/common/loading-state' +import { ModelFamilyBadge } from '@/components/common/model-family-badge' import { StatusBadge } from '@/components/common/status-badge' import { getStatusVariant } from '@/lib/status-utils' import { Button } from '@/components/ui/button' @@ -87,6 +90,15 @@ export default function RunComparePage() { const runsQuery = useRuns({ page: 1, pageSize: 100 }) const compareQuery = useCompareRuns(a, b, !!a && !!b) + // MLZOO-D / PRP-31 — feature-importance side-by-side. Cross-family pairs + // render a muted message and DO NOT fetch (the muted muted message lives + // in `sameFamily === false` branch; both hooks gated on `enabled: sameFamily`). + const familyA = compareQuery.data?.run_a.model_family + const familyB = compareQuery.data?.run_b.model_family + const sameFamily = !!familyA && familyA === familyB && familyA !== 'baseline' + const metaA = useRunFeatureMetadata(a, sameFamily) + const metaB = useRunFeatureMetadata(b, sameFamily) + function selectRun(slot: 'a' | 'b', runId: string) { setParams((prev) => { const next = new URLSearchParams(prev) @@ -171,6 +183,15 @@ export default function RunComparePage() { {comparison.run_a.model_type} {comparison.run_b.model_type} + + Family + + + + + + + Status @@ -264,6 +285,38 @@ export default function RunComparePage() { )}
    + + + + Feature Importance (Run A vs Run B) + + Native importance / coefficients side by side. Cross-family + comparisons are not rendered because magnitudes are not + comparable between tree and additive models. + + + + {sameFamily ? ( +
    + + +
    + ) : ( +

    + Feature-importance comparison is only meaningful when both + runs share a non-baseline model family. +

    + )} +
    +
    )} diff --git a/frontend/src/pages/explorer/run-detail.tsx b/frontend/src/pages/explorer/run-detail.tsx index f769e07f..c3a0a366 100644 --- a/frontend/src/pages/explorer/run-detail.tsx +++ b/frontend/src/pages/explorer/run-detail.tsx @@ -11,10 +11,13 @@ import { } from 'lucide-react' import { useRun, useVerifyArtifact } from '@/hooks/use-runs' import { useRunExplanation } from '@/hooks/use-explanations' +import { useRunFeatureMetadata } from '@/hooks/use-feature-metadata' import { ExplanationPanel } from '@/components/explainability/explanation-panel' +import { FeatureImportancePanel } from '@/components/explainability/feature-importance-panel' import { JsonBlock } from '@/components/common/json-block' import { ErrorDisplay } from '@/components/common/error-display' import { LoadingState } from '@/components/common/loading-state' +import { ModelFamilyBadge } from '@/components/common/model-family-badge' import { StatusBadge } from '@/components/common/status-badge' import { getStatusVariant } from '@/lib/status-utils' import { Button } from '@/components/ui/button' @@ -46,6 +49,16 @@ export default function RunDetailPage() { // The explanation panel self-handles a 400 for non-baseline (lightgbm) runs. const explanationQuery = useRunExplanation(runId ?? '', !!runId) + // MLZOO-D / PRP-31 — load the new feature-metadata only for non-baseline + // runs. `enabled: false` makes TanStack Query skip the fetch entirely, so + // baseline runs render nothing extra and never see the 400 burst the panel + // would otherwise have to swallow. + const featureMetaQuery = useRunFeatureMetadata( + runId ?? '', + !!runId && runQuery.data?.model_family !== undefined && + runQuery.data?.model_family !== 'baseline', + ) + if (!runId) { return (
    @@ -88,6 +101,7 @@ export default function RunDetailPage() {

    {run.run_id}

    {run.status} +

    {run.model_type}

    @@ -107,6 +121,12 @@ export default function RunDetailPage() {
    +
    +
    Family
    +
    + +
    +
    Store
    @@ -169,6 +189,59 @@ export default function RunDetailPage() { error={explanationQuery.error} /> + {run.model_family !== 'baseline' && ( + <> + + + + Feature Metadata + + + + The canonical 14-column feature frame the model consumed at + training time. + + + + {featureMetaQuery.isLoading ? ( + + ) : featureMetaQuery.data ? ( + <> +
      + {featureMetaQuery.data.feature_columns.map((name) => ( +
    • + {name} +
    • + ))} +
    + {featureMetaQuery.data.importance_type && ( +

    + Importance type:{' '} + + {featureMetaQuery.data.importance_type} + +

    + )} + + ) : ( +

    + Feature metadata unavailable for this run. +

    + )} +
    +
    + + + + )} +
    diff --git a/frontend/src/pages/explorer/runs.tsx b/frontend/src/pages/explorer/runs.tsx index 5fc1bf63..367f379c 100644 --- a/frontend/src/pages/explorer/runs.tsx +++ b/frontend/src/pages/explorer/runs.tsx @@ -7,6 +7,7 @@ import { DataTable } from '@/components/data-table/data-table' import { DataTableToolbar } from '@/components/data-table/data-table-toolbar' import { DataTableColumnHeader } from '@/components/data-table/data-table-column-header' import { StatusBadge } from '@/components/common/status-badge' +import { ModelFamilyBadge } from '@/components/common/model-family-badge' import { getStatusVariant } from '@/lib/status-utils' import { ErrorDisplay } from '@/components/common/error-display' import { Button } from '@/components/ui/button' @@ -18,7 +19,15 @@ import { DEFAULT_PAGE_SIZE, ROUTES } from '@/lib/constants' // Allow-lists for the URL-driven filter/sort params. A hand-edited URL value // outside these is dropped (treated as "no filter") rather than blind-cast. const RUN_STATUSES: readonly RunStatus[] = ['pending', 'running', 'success', 'failed', 'archived'] -const MODEL_TYPES = ['naive', 'seasonal_naive', 'moving_average', 'lightgbm'] as const +const MODEL_TYPES = [ + 'naive', + 'seasonal_naive', + 'moving_average', + 'regression', + 'lightgbm', + 'xgboost', + 'prophet_like', +] as const const RUN_SORT_KEYS = ['created_at', 'model_type', 'status', 'store_id', 'product_id'] as const const columns: ColumnDef[] = [ @@ -45,6 +54,12 @@ const columns: ColumnDef[] = [ header: ({ column }) => , cell: ({ row }) => {row.original.model_type}, }, + { + accessorKey: 'model_family', + header: 'Family', + enableSorting: false, + cell: ({ row }) => , + }, { accessorKey: 'store_id', header: ({ column }) => , @@ -84,6 +99,7 @@ const csvColumns: CsvColumn[] = [ { key: 'run_id', header: 'Run ID' }, { key: 'status', header: 'Status' }, { key: 'model_type', header: 'Model Type' }, + { key: 'model_family', header: 'Family' }, { key: 'store_id', header: 'Store' }, { key: 'product_id', header: 'Product' }, { key: 'data_window_start', header: 'Data Window Start' }, @@ -193,7 +209,10 @@ export default function RunsExplorerPage() { { label: 'Naive', value: 'naive' }, { label: 'Seasonal Naive', value: 'seasonal_naive' }, { label: 'Moving Average', value: 'moving_average' }, + { label: 'Regression', value: 'regression' }, { label: 'LightGBM', value: 'lightgbm' }, + { label: 'XGBoost', value: 'xgboost' }, + { label: 'Prophet-like', value: 'prophet_like' }, ], }, { diff --git a/frontend/src/pages/visualize/backtest.tsx b/frontend/src/pages/visualize/backtest.tsx index 3e36d763..e25994ce 100644 --- a/frontend/src/pages/visualize/backtest.tsx +++ b/frontend/src/pages/visualize/backtest.tsx @@ -48,11 +48,18 @@ interface BacktestResult { } } -/** Forecasting models a backtest can run (LightGBM needs a trained run, excluded). */ +// MLZOO-D / PRP-31 — Feature-aware backtesting (B.2) made the four advanced +// families reachable from the UI. The allow-list now includes all seven +// canonical model types; see PRP-MLZOO-B.2 for the per-fold X_train/X_future +// split that keeps the feature-aware backtest leakage-safe. const MODEL_OPTIONS = [ { value: 'naive', label: 'Naive' }, { value: 'seasonal_naive', label: 'Seasonal Naive' }, { value: 'moving_average', label: 'Moving Average' }, + { value: 'regression', label: 'Regression (HistGBR)' }, + { value: 'lightgbm', label: 'LightGBM' }, + { value: 'xgboost', label: 'XGBoost' }, + { value: 'prophet_like', label: 'Prophet-like (additive)' }, ] const foldCsvColumns: CsvColumn[] = [ diff --git a/frontend/src/pages/visualize/forecast.tsx b/frontend/src/pages/visualize/forecast.tsx index bed6cfe9..4cdbdd58 100644 --- a/frontend/src/pages/visualize/forecast.tsx +++ b/frontend/src/pages/visualize/forecast.tsx @@ -3,7 +3,15 @@ import { Link } from 'react-router-dom' import { BarChart3, Download, ExternalLink, Loader2, Play } from 'lucide-react' import { useJob, useCreateJob } from '@/hooks/use-jobs' import { useJobExplanation } from '@/hooks/use-explanations' +import { useJobFeatureMetadata } from '@/hooks/use-feature-metadata' import { ExplanationPanel } from '@/components/explainability/explanation-panel' +import { FeatureImportancePanel } from '@/components/explainability/feature-importance-panel' +import { ModelFamilyBadge } from '@/components/common/model-family-badge' +import { + Collapsible, + CollapsibleContent, + CollapsibleTrigger, +} from '@/components/ui/collapsible' import { TimeSeriesChart } from '@/components/charts/time-series-chart' import { EmptyState } from '@/components/common/error-display' import { JobPicker } from '@/components/common/job-picker' @@ -64,6 +72,19 @@ export default function ForecastPage() { const isPredictDone = job?.status === 'completed' && job?.job_type === 'predict' const explanationQuery = useJobExplanation(job?.job_id ?? '', !!job && isPredictDone) + // MLZOO-D / PRP-31 — feature-importance for the train job that produced + // the loaded predict. CRITICAL: forecast.tsx never has a registry run_id; + // `trainJob.result.run_id` (line 49 above) is the FORECAST-ARTIFACT KEY + // (`uuid.uuid4().hex[:12]`, see `app/features/forecasting/service.py:270`), + // NOT a registry UUID. Calling `useRunFeatureMetadata(trainRunId, ...)` + // would 404 because `/forecasting/runs/{run_id}` treats `{run_id}` as a + // registry UUID. We use the job-keyed sibling + // (`/forecasting/jobs/{job_id}/feature-metadata`) which loads the bundle + // from `job.result.model_path` directly. Recorded in memory + // `[[scenario-run-id-vs-registry-run-id]]`. + const trainJobMetadata = useJobFeatureMetadata(trainJobId, !!trainJobId) + const trainFamily = trainJobMetadata.data?.model_family + async function handleRunForecast() { if (!trainRunId) return setRunError(null) @@ -261,6 +282,36 @@ export default function ForecastPage() { error={explanationQuery.error} /> )} + + {/* MLZOO-D — collapsible feature-importance panel for the train job. */} + {trainJobId && ( + + + + + + Model details + {trainFamily ? ( + + ) : null} + + + Expand to see the trained model's feature importance. + + + + + + + + + + + )} )} diff --git a/frontend/src/types/api.ts b/frontend/src/types/api.ts index b29b23f1..c38469fb 100644 --- a/frontend/src/types/api.ts +++ b/frontend/src/types/api.ts @@ -170,10 +170,17 @@ export interface LifecycleCurveResponse { // === Registry === export type RunStatus = 'pending' | 'running' | 'success' | 'failed' | 'archived' +// MLZOO-D / PRP-31: derived from model_type on the backend as a Pydantic +// computed_field (no DB column, no migration). Surfaced on every ModelRun +// response so the runs explorer, run detail, run compare, and forecast viz +// pages can render a consistent Family badge. +export type ModelFamily = 'baseline' | 'tree' | 'additive' + export interface ModelRun { run_id: string status: RunStatus model_type: string + model_family: ModelFamily model_config: Record feature_config: Record | null config_hash: string @@ -195,6 +202,26 @@ export interface ModelRun { updated_at: string } +// MLZOO-D / PRP-31: response shape for the two feature-metadata endpoints +// (/forecasting/runs/{run_id}/feature-metadata and the jobs/{job_id} sibling). +// When sourced from a job, `run_id` is the **artifact key** (12-char hex) +// parsed from the bundle file name, NOT a registry UUID. +export interface FeatureImportanceItem { + name: string + importance: number // signed for kind='linear_coef' (prophet_like ridge coef) + kind: 'tree' | 'linear_coef' + rank: number // 1-indexed by |importance| descending +} + +export interface FeatureMetadataResponse { + run_id: string + model_type: string + model_family: ModelFamily + feature_columns: string[] + features: FeatureImportanceItem[] + importance_type: string | null +} + export interface RunListResponse extends PaginatedResponse { runs: ModelRun[] } diff --git a/scripts/dogfood-browser.sh b/scripts/dogfood-browser.sh new file mode 100755 index 00000000..61add50c --- /dev/null +++ b/scripts/dogfood-browser.sh @@ -0,0 +1,63 @@ +#!/usr/bin/env bash +# scripts/dogfood-browser.sh — wrapper that prepares the environment for +# native-Python Playwright dogfooding against snap chromium. +# +# Background: on the maintainer's WSL host, the Playwright MCP and +# `npx playwright install` both fail; the only path that works is native +# Python Playwright pointed at `/snap/bin/chromium`. This script removes +# the rediscovery cost. See issue #262 and PRP-31 system review § +# "Update Browser-Dogfood automation". +# +# Usage: +# ./scripts/dogfood-browser.sh # verify prereqs + print launch snippet +# ./scripts/dogfood-browser.sh path/to/script.py # verify prereqs + exec the script with uv + +set -euo pipefail + +CHROMIUM_PATH="${CHROMIUM_PATH:-/snap/bin/chromium}" +LAUNCH_ARGS='["--no-sandbox", "--disable-setuid-sandbox", "--disable-dev-shm-usage"]' + +echo "scripts/dogfood-browser.sh" +echo "================================================================" + +# 1. snap chromium present? +if [ ! -x "$CHROMIUM_PATH" ]; then + echo "ERROR: $CHROMIUM_PATH not found or not executable" >&2 + echo "Install with: sudo snap install chromium" >&2 + exit 1 +fi +echo "OK chromium executable: $CHROMIUM_PATH" + +# 2. playwright importable from the project venv? +if ! uv run python -c "from playwright.sync_api import sync_playwright" >/dev/null 2>&1; then + echo "WARN playwright not importable from .venv — installing now" + uv pip install playwright +fi +echo "OK playwright importable: $(uv run python -c 'import playwright; print(playwright.__version__)' 2>/dev/null)" + +# 3. backend + frontend up? +if ! curl -s -m 3 http://localhost:8123/health >/dev/null 2>&1; then + echo "WARN uvicorn :8123 unreachable — start with 'uv run uvicorn app.main:app --port 8123'" +fi +if ! curl -s -m 3 -o /dev/null -w '%{http_code}' http://localhost:5173/ | grep -q 200 2>/dev/null; then + echo "WARN vite :5173 unreachable — start with 'cd frontend && ./node_modules/.bin/vite --host 0.0.0.0'" +fi + +echo "================================================================" +echo "Use this Playwright launch snippet in your dogfood script:" +echo "" +echo " from playwright.sync_api import sync_playwright" +echo " with sync_playwright() as p:" +echo " browser = p.chromium.launch(" +echo " headless=True," +echo " executable_path=\"$CHROMIUM_PATH\"," +echo " args=$LAUNCH_ARGS," +echo " )" +echo "" + +# Optional: exec a passed script +if [ "$#" -ge 1 ]; then + echo "================================================================" + echo "Executing: uv run python $*" + exec uv run python "$@" +fi diff --git a/uv.lock b/uv.lock index 3ac17729..7d54e978 100644 --- a/uv.lock +++ b/uv.lock @@ -821,7 +821,7 @@ wheels = [ [[package]] name = "forecastlabai" -version = "0.2.15" +version = "0.2.16" source = { editable = "." } dependencies = [ { name = "alembic" },