124 changes: 124 additions & 0 deletions PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md
@@ -0,0 +1,124 @@
# INITIAL-MLZOO-A-foundation-feature-frames.md - Feature-Aware Forecasting Foundation

## FEATURE:

Create the foundation for feature-aware forecasting in ForecastLabAI.

This is the first MLZOO PRP input and should become PRP-29. It must not implement LightGBM, XGBoost, Prophet-like models, frontend UI, explainability UI, hyperparameter search, or portfolio/global orchestration. Its job is to make the existing forecasting layer capable of supporting future advanced ML models without breaking current baseline forecasters.

Goals:

- Define a feature-aware forecasting contract that supports `fit(y, X=None)` and `predict(horizon, X=None)`.
- Preserve existing target-only baseline models: `naive`, `seasonal_naive`, and `moving_average`.
- Define historical training feature-frame requirements.
- Define future prediction feature-frame requirements.
- Add or document leakage-safe feature-frame generation rules.
- Add load-bearing leakage tests that prove future rows do not use future target values.
- Make future advanced models possible without adding their dependencies yet.
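One possible shape for the optional-`X` contract, sketched under assumptions: every model keeps the same `fit(y, X=None)` / `predict(horizon, X=None)` signature, and a hypothetical `requires_features` class flag decides whether `X` is mandatory. `ForecasterContract`, `FeatureFrame`, `MeanByFeatureForecaster`, and the flag are illustrative names, not existing ForecastLabAI code, and rows are plain dicts rather than DataFrames to keep the sketch dependency-free.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import ClassVar

# Illustrative feature frame: one dict per row, keyed by feature column name.
FeatureFrame = Sequence[Mapping[str, float]]


class ForecasterContract:
    """Hypothetical shared contract where X is optional in every model's signature."""

    # Feature-aware subclasses set this to True; target-only baselines keep False.
    requires_features: ClassVar[bool] = False

    def fit(self, y: Sequence[float], X: FeatureFrame | None = None) -> ForecasterContract:
        raise NotImplementedError

    def predict(self, horizon: int, X: FeatureFrame | None = None) -> list[float]:
        raise NotImplementedError

    def _check_X(self, X: FeatureFrame | None, expected_rows: int) -> None:
        """Require exactly one feature row per target (fit) or per step (predict)."""
        if not self.requires_features:
            return
        if X is None:
            raise ValueError(f"{type(self).__name__} requires a feature frame X")
        if len(X) != expected_rows:
            raise ValueError(f"expected {expected_rows} feature rows, got {len(X)}")


class NaiveForecaster(ForecasterContract):
    """Target-only baseline: repeats the last observed value and ignores X."""

    def fit(self, y, X=None):
        self.last_ = float(y[-1])
        return self

    def predict(self, horizon, X=None):
        return [self.last_] * horizon


class MeanByFeatureForecaster(ForecasterContract):
    """Toy feature-aware model: mean target per value of one feature column."""

    requires_features = True

    def __init__(self, column: str) -> None:
        self.column = column

    def fit(self, y, X=None):
        self._check_X(X, len(y))
        groups: dict[float, list[float]] = {}
        for target, row in zip(y, X):
            groups.setdefault(row[self.column], []).append(float(target))
        self.means_ = {key: sum(vals) / len(vals) for key, vals in groups.items()}
        return self

    def predict(self, horizon, X=None):
        self._check_X(X, horizon)
        # An unseen feature value raises KeyError instead of falling back to a default.
        return [self.means_[row[self.column]] for row in X]
```

With this shape, baselines stay target-only and simply ignore `X`, while a feature-aware model fails loudly when `X` is missing or has the wrong row count. Whether the flag lives on the existing `BaseForecaster` or on a separate `FeatureAwareForecaster` protocol remains an open decision for the PRP.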

Expected user value:

- ForecastLabAI gains a safe foundation for serious ML forecasting.
- Future LightGBM/XGBoost/Prophet-like work can build on a tested frame contract.
- Scenario simulation and explainability can later depend on a consistent feature-frame interface.

Recommended user story:

As a forecasting engineer,
I want a leakage-safe feature-frame contract for training and prediction,
So that advanced ML models can be added without breaking baseline models or leaking future data.

Out of scope:

- LightGBM implementation.
- XGBoost implementation.
- Prophet-like implementation.
- New database migrations unless absolutely required.
- Frontend pages.
- Agent tools.
- Hyperparameter search.

## EXAMPLES:

Read these before PRP creation:

- `docs/optional-features/05-advanced-ml-model-zoo.md`
  - Full feature vision and risks.

- `PRPs/INITIAL/INITIAL-5.md`
  - Existing forecasting model brief.

- `docs/PHASE/4-FORECASTING.md`
  - Current forecasting layer documentation.

- `app/features/forecasting/models.py`
  - Existing `BaseForecaster` and baseline model implementations.

- `app/features/forecasting/schemas.py`
  - Existing model config schemas and discriminated union pattern.

- `app/features/forecasting/service.py`
  - Existing train/predict orchestration.

- `app/features/forecasting/persistence.py`
  - Existing `ModelBundle` persistence.

- `app/features/featuresets/service.py`
  - Existing time-safe feature computation.

- `app/features/featuresets/schemas.py`
  - Feature configuration schemas.

- `app/features/featuresets/tests/test_leakage.py`
  - Existing leakage tests to mirror and extend.

- `app/features/backtesting/service.py`
  - Current backtesting integration points.

Potential example artifacts:

- `examples/models/feature_frame_contract.md`
  - Describes the historical and future frame shapes, required columns, and safe/unsafe feature classes.

## DOCUMENTATION:

- scikit-learn estimator interface conventions: https://scikit-learn.org/stable/developers/develop.html
- scikit-learn Pipeline composition: https://scikit-learn.org/stable/modules/compose.html
- scikit-learn TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html
- Pydantic documentation: https://docs.pydantic.dev/latest/
- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html

## OTHER CONSIDERATIONS:

This PRP is primarily about contracts and leakage safety.

Required decisions:

- How to represent feature-aware models without forcing every baseline model to require `X`.
- Whether to introduce a `FeatureAwareForecaster` protocol/base class or extend the existing base interface only.
- Where historical training frames are built.
- Where future prediction frames are built.
- Which feature classes are safe for future frames:
  - Safe: calendar features known in advance.
  - Conditionally safe: lag/rolling features generated from the historical tail and prior predictions.
  - Unsafe unless explicitly supplied: future price, promotion, inventory, markdown, and other exogenous signals.
- How to reject missing future features instead of silently filling them with misleading defaults.
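The three feature classes above could be enforced with a small validator. This sketch assumes a hypothetical prefix-based naming scheme (`calendar_`, `lag_`, `rolling_`, `price_`, `promo_`) and treats unknown columns as the strictest class. Conditionally safe lag/rolling columns are skipped because, under this assumption, they are generated inside the recursive prediction loop rather than supplied by the caller. The behavior it illustrates is rejection: a missing or `None` value raises instead of being filled with `0`.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from enum import Enum


class FeatureSafety(Enum):
    SAFE = "safe"  # calendar features known in advance
    CONDITIONAL = "conditional"  # lag/rolling built from history tail + prior predictions
    REQUIRES_SUPPLIED = "requires_supplied"  # price, promotion, inventory, markdown, exogenous


# Hypothetical prefix convention mapping column names to a safety class.
FEATURE_SAFETY: dict[str, FeatureSafety] = {
    "calendar_": FeatureSafety.SAFE,
    "lag_": FeatureSafety.CONDITIONAL,
    "rolling_": FeatureSafety.CONDITIONAL,
    "price_": FeatureSafety.REQUIRES_SUPPLIED,
    "promo_": FeatureSafety.REQUIRES_SUPPLIED,
}


def classify(column: str) -> FeatureSafety:
    for prefix, safety in FEATURE_SAFETY.items():
        if column.startswith(prefix):
            return safety
    # Unknown columns fall into the strictest class.
    return FeatureSafety.REQUIRES_SUPPLIED


def validate_future_frame(
    rows: Sequence[Mapping[str, float | None]],
    feature_columns: Sequence[str],
    horizon: int,
) -> None:
    """Reject incomplete future frames instead of filling defaults such as 0."""
    if len(rows) != horizon:
        raise ValueError(f"future frame has {len(rows)} rows, expected {horizon}")
    # Conditional columns are generated during recursive prediction, not supplied.
    supplied = [c for c in feature_columns if classify(c) is not FeatureSafety.CONDITIONAL]
    problems = [
        f"step {step}: {column} ({classify(column).value})"
        for step, row in enumerate(rows, start=1)
        for column in supplied
        if row.get(column) is None
    ]
    if problems:
        raise ValueError("missing future features: " + "; ".join(problems))
```

Note that a legitimate `0` passes; only absent or `None` values are rejected, which is the distinction the "no silent zeros" rule depends on.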

Validation expectations:

- Existing baseline forecasting tests still pass.
- New feature-frame contract tests exist.
- New leakage tests prove future target values are not used.
- Backtesting remains time-safe.
- `uv run pytest -q -m "not integration"` should pass.
- `uv run ruff check app tests` should pass for touched Python code.
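A perturbation-style leakage test is one way to make the "future target values are not used" expectation load-bearing: corrupt every target after the cutoff and assert that the training frame does not change. `build_training_frame` is a hypothetical stand-in for wherever the PRP decides training frames are built; the real test would call that code, in the spirit of the existing `test_leakage.py`.

```python
from __future__ import annotations

import datetime as dt


def build_training_frame(
    series: dict[dt.date, float], cutoff: dt.date, lags: tuple[int, ...] = (1, 7)
) -> list[dict[str, object]]:
    """Hypothetical builder: lag features for rows on or before `cutoff` only."""
    history = {d: v for d, v in series.items() if d <= cutoff}
    rows = []
    for day in sorted(history):
        row: dict[str, object] = {"date": day, "y": history[day]}
        for lag in lags:
            # Lags read only from `history`, so no value after the cutoff can leak in.
            row[f"lag_{lag}"] = history.get(day - dt.timedelta(days=lag))
        rows.append(row)
    return rows


def test_training_frame_ignores_future_targets() -> None:
    start = dt.date(2024, 1, 1)
    series = {start + dt.timedelta(days=i): float(i) for i in range(30)}
    cutoff = start + dt.timedelta(days=19)

    baseline = build_training_frame(series, cutoff)

    # Perturb every target after the cutoff; a leakage-safe builder must not notice.
    poisoned = {d: (v * 1000.0 if d > cutoff else v) for d, v in series.items()}
    assert build_training_frame(poisoned, cutoff) == baseline

    # No row may be dated after the cutoff.
    assert max(row["date"] for row in baseline) == cutoff
```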

Important gotchas:

- Do not break current target-only baseline forecasters.
- Do not add LightGBM or other heavy ML dependencies in this PRP.
- Do not silently convert unknown future exogenous values into zeros.
- Do not let training frames include rows after the cutoff date.
- Do not let future prediction frames read true future targets.
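For the conditionally safe lag features, a recursive loop like the following keeps future frames away from true future targets by construction: the function only ever receives history up to the cutoff, and each new lag is filled from the previous prediction. `recursive_forecast` and its `predict_one` callback are illustrative assumptions, not an existing ForecastLabAI API.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence


def recursive_forecast(
    history: Sequence[float],
    horizon: int,
    lags: Sequence[int],
    predict_one: Callable[[dict[str, float]], float],
) -> list[float]:
    """Build lag features step by step from the historical tail plus prior predictions.

    True future targets are never passed in, so they cannot be read.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    max_lag = max(lags)
    if len(history) < max_lag:
        raise ValueError(f"need at least {max_lag} historical values, got {len(history)}")
    # Keep only the tail needed for the largest lag.
    buffer = list(history[-max_lag:])
    predictions: list[float] = []
    for _ in range(horizon):
        features = {f"lag_{lag}": buffer[-lag] for lag in lags}
        y_hat = float(predict_one(features))
        predictions.append(y_hat)
        buffer.append(y_hat)  # the prediction, not an observed value, feeds later lags
    return predictions
```

The trade-off is that errors compound over the horizon, which is why these features are only "conditionally" safe: they are leakage-free, but their quality degrades with distance from the cutoff.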

101 changes: 101 additions & 0 deletions PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md
@@ -0,0 +1,101 @@
# INITIAL-MLZOO-B-lightgbm-first-model.md - LightGBM First Advanced Model

## FEATURE:

Add the first advanced feature-aware model to ForecastLabAI after the MLZOO foundation is merged.

Preferred model: LightGBM.

Fallback model: sklearn `HistGradientBoostingRegressor` or another sklearn-native gradient boosting model if LightGBM creates unacceptable dependency or CI risk.

This PRP depends on `INITIAL-MLZOO-A-foundation-feature-frames.md`, which must be implemented first.

Goals:

- Add one advanced model config schema.
- Add one feature-aware model implementation.
- Support deterministic training.
- Integrate with forecasting train/predict.
- Integrate with backtesting.
- Persist model metadata needed for reproducibility.
- Preserve all existing baseline model behavior.

Out of scope:

- XGBoost.
- Prophet-like models.
- Hyperparameter search.
- Portfolio/global models.
- Frontend model administration.
- Explainability UI.

## EXAMPLES:

Read these before PRP creation:

- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md`
  - Required prerequisite.

- `docs/optional-features/05-advanced-ml-model-zoo.md`
  - Full advanced model vision.

- `app/features/forecasting/models.py`
  - Model factory and baseline model patterns.

- `app/features/forecasting/schemas.py`
  - Model config schema patterns.

- `app/features/forecasting/service.py`
  - Training/prediction service integration.

- `app/features/forecasting/persistence.py`
  - Model bundle save/load behavior.

- `app/features/backtesting/service.py`
  - Backtesting orchestration.

- `app/features/registry/service.py`
  - Registry run metadata patterns.

Potential example artifacts:

- `examples/models/advanced_lightgbm.py`
  - Minimal training/prediction example.

## DOCUMENTATION:

- LightGBM documentation: https://lightgbm.readthedocs.io/
- LightGBM Python API: https://lightgbm.readthedocs.io/en/stable/Python-API.html
- LightGBM parameters: https://lightgbm.readthedocs.io/en/stable/Parameters.html
- scikit-learn HistGradientBoostingRegressor: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html
- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html
- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html
- Pydantic documentation: https://docs.pydantic.dev/latest/

## OTHER CONSIDERATIONS:

Dependency strategy is the main open risk.

Required decisions:

- Whether to add LightGBM as a hard dependency, optional dependency group, or defer to sklearn fallback.
- Exact advanced model config fields.
- How model dependency versions are captured in registry/runtime metadata.
- How prediction rejects missing future feature frames.
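If the optional-dependency or fallback route is chosen, backend selection and version capture could be combined in one place. This sketch uses the standard library's `importlib.util.find_spec` and `importlib.metadata.version`; the family labels and the `BackendInfo` record are hypothetical, and the detection functions are injectable so tests do not depend on what happens to be installed in CI.

```python
from __future__ import annotations

import importlib.metadata
import importlib.util
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendInfo:
    family: str   # hypothetical model-family label stored with the run
    package: str  # distribution name whose version is recorded
    version: str


def _is_installed(module: str) -> bool:
    return importlib.util.find_spec(module) is not None


def resolve_backend(
    prefer_lightgbm: bool = True,
    is_installed: Callable[[str], bool] = _is_installed,
    get_version: Callable[[str], str] = importlib.metadata.version,
) -> BackendInfo:
    """Pick LightGBM if available, else fall back to scikit-learn, recording the version."""
    if prefer_lightgbm and is_installed("lightgbm"):
        return BackendInfo("lightgbm", "lightgbm", get_version("lightgbm"))
    if is_installed("sklearn"):
        # The scikit-learn distribution name differs from its import name.
        return BackendInfo(
            "sklearn_hist_gradient_boosting", "scikit-learn", get_version("scikit-learn")
        )
    raise RuntimeError("no gradient boosting backend installed; install lightgbm or scikit-learn")
```

If LightGBM becomes a hard dependency instead, only the version-capture half of this is still needed.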

Recommended defaults:

- Use a fixed `random_state` from settings.
- Start with training on a single store/product series.
- Keep the first config conservative.
- Avoid hyperparameter search.
- Persist feature column order.
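Persisting feature column order matters because tree models address features by position. A minimal sketch, assuming a hypothetical metadata record stored alongside the existing `ModelBundle` (its real fields are not assumed here): the record round-trips through JSON, and prediction rows are reordered into the saved order, with any missing or extra column rejected rather than silently dropped or defaulted.

```python
from __future__ import annotations

import json
from collections.abc import Mapping, Sequence
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FeatureModelMetadata:
    """Hypothetical reproducibility record saved next to the fitted model."""

    model_family: str
    random_state: int
    feature_columns: tuple[str, ...]  # exact training order

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> FeatureModelMetadata:
        data = json.loads(payload)
        data["feature_columns"] = tuple(data["feature_columns"])  # JSON arrays load as lists
        return cls(**data)


def to_model_matrix(
    rows: Sequence[Mapping[str, float]], metadata: FeatureModelMetadata
) -> list[list[float]]:
    """Reorder prediction rows into the persisted column order, rejecting mismatches."""
    expected = set(metadata.feature_columns)
    matrix = []
    for row in rows:
        if set(row) != expected:
            missing = sorted(expected - set(row))
            extra = sorted(set(row) - expected)
            raise ValueError(f"feature mismatch: missing={missing} extra={extra}")
        matrix.append([float(row[c]) for c in metadata.feature_columns])
    return matrix
```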

Validation expectations:

- Config schema tests.
- Deterministic training tests.
- Save/load persistence tests.
- Backtesting integration test comparing the baseline and advanced model paths.
- Tests proving baselines still work unchanged.
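A deterministic-training test can follow a simple pattern: train twice with the same `random_state`, disturb the global RNG in between, and assert identical predictions. `BaggedMeanModel` is a toy stand-in so the sketch runs without LightGBM; for the real model, the LightGBM parameters page (for example `seed`, `deterministic`, and `num_threads`) is the place to confirm what reproducibility actually requires.

```python
from __future__ import annotations

import random
from collections.abc import Sequence


class BaggedMeanModel:
    """Toy stochastic model standing in for a seeded booster (hypothetical)."""

    def __init__(self, n_bags: int = 20, random_state: int = 0) -> None:
        self.n_bags = n_bags
        self.random_state = random_state

    def fit(self, y: Sequence[float]) -> BaggedMeanModel:
        # A local Random instance keeps results independent of global RNG state.
        rng = random.Random(self.random_state)
        bag_means = []
        for _ in range(self.n_bags):
            sample = [rng.choice(y) for _ in range(len(y))]
            bag_means.append(sum(sample) / len(sample))
        self.level_ = sum(bag_means) / len(bag_means)
        return self

    def predict(self, horizon: int) -> list[float]:
        return [self.level_] * horizon


def test_training_is_deterministic() -> None:
    y = [12.0, 15.0, 11.0, 19.0, 14.0, 17.0]
    first = BaggedMeanModel(random_state=42).fit(y).predict(7)
    random.seed(999)  # disturbing the global RNG must not change the result
    second = BaggedMeanModel(random_state=42).fit(y).predict(7)
    assert first == second
```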

59 changes: 59 additions & 0 deletions PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md
@@ -0,0 +1,59 @@
# INITIAL-MLZOO-C-xgboost-prophet-extensions.md - XGBoost and Prophet-like Extensions

## FEATURE:

Extend the Advanced ML Model Zoo after the feature-frame foundation and first advanced model path are stable.

This INITIAL is for later work, not PRP-29.

Goals:

- Add XGBoost as a second tree-based feature-aware model.
- Add a Prophet-like additive model path or choose the real Prophet dependency if justified.
- Support holiday/regressor-style features where appropriate.
- Add model-family-specific validation and metadata.

Out of scope:

- Foundation feature-frame work.
- First advanced model architecture.
- Frontend/explainability polish unless explicitly needed.
- Hyperparameter search unless separately scoped.

## EXAMPLES:

Read these before PRP creation:

- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md`
  - Foundation dependency.

- `PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md`
  - First advanced model pattern to follow.

- `app/features/forecasting/models.py`
  - Model factory and advanced model pattern.

- `app/features/forecasting/schemas.py`
  - Config schema pattern.

- `app/features/featuresets/service.py`
  - Regressor and calendar feature source.

## DOCUMENTATION:

- XGBoost documentation: https://xgboost.readthedocs.io/en/stable/
- XGBoost Python package documentation: https://xgboost.readthedocs.io/en/stable/python/
- XGBoost parameters: https://xgboost.readthedocs.io/en/stable/parameter.html
- Prophet documentation: https://facebook.github.io/prophet/docs/quick_start.html
- Prophet seasonality, holidays, and regressors: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html
- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html
- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html

## OTHER CONSIDERATIONS:

- XGBoost should mirror the first advanced model path where possible.
- Prophet-like work should be carefully evaluated because dependency weight and API shape differ from sklearn-style regressors.
- Real Prophet support should be chosen only if install/runtime constraints are acceptable.
- A lightweight additive sklearn model may be safer than the real Prophet dependency.
- Holiday/regressor support must use known-in-advance or explicitly supplied future values.
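To gauge whether a lightweight additive model would be enough, here is a Prophet-like sketch: ordinary least squares over a linear trend, weekday dummies, and a holiday indicator, solved with a small Gaussian elimination so it needs only the standard library. This is an assumption-laden stand-in, not Prophet's method (Prophet fits piecewise trends and Fourier-series seasonality via Stan), and `AdditiveForecaster` is a hypothetical name. `predict` requires holiday flags for every future day, matching the known-in-advance rule.

```python
from __future__ import annotations

import datetime as dt
from collections.abc import Sequence


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = col
        for r in range(col + 1, n):
            if abs(m[r][col]) > abs(m[pivot][col]):
                pivot = r
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class AdditiveForecaster:
    """Prophet-like sketch: linear trend + weekday seasonality + holiday regressor."""

    def _design_row(self, day: dt.date, holiday: bool) -> list[float]:
        t = float((day - self.origin_).days)
        # Monday is the baseline level; Tuesday..Sunday get one dummy each.
        weekday = [1.0 if day.weekday() == k else 0.0 for k in range(1, 7)]
        return [1.0, t, *weekday, 1.0 if holiday else 0.0]

    def fit(
        self, days: Sequence[dt.date], y: Sequence[float], holidays: Sequence[bool]
    ) -> AdditiveForecaster:
        self.origin_ = days[0]
        rows = [self._design_row(d, h) for d, h in zip(days, holidays)]
        k = len(rows[0])
        # Ordinary least squares via the normal equations (X^T X) beta = X^T y.
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
        xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
        self.coef_ = _solve(xtx, xty)
        return self

    def predict(self, days: Sequence[dt.date], holidays: Sequence[bool] | None) -> list[float]:
        # Holidays are known in advance, but the caller must still supply every flag.
        if holidays is None or len(holidays) != len(days):
            raise ValueError("future holiday flags must be supplied for every day")
        return [
            sum(c * x for c, x in zip(self.coef_, self._design_row(d, h)))
            for d, h in zip(days, holidays)
        ]
```

In a real implementation the hand-written solver would likely be replaced by a scikit-learn linear model; it is here only to keep the sketch self-contained.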

@@ -0,0 +1,69 @@
# INITIAL-MLZOO-D-frontend-registry-explainability.md - Frontend, Registry, and Explainability Polish

## FEATURE:

Expose Advanced ML Model Zoo capabilities in the product after backend model contracts and at least one advanced model are stable.

This INITIAL is for later work, not PRP-29.

Goals:

- Add model selection UI where useful.
- Surface advanced model metadata in run detail and comparison pages.
- Show feature config, feature columns, dependency versions, and model family metadata.
- Add basic feature importance or explanation hooks where available.
- Update docs/admin surfaces so operators understand advanced model constraints.

Out of scope:

- Core feature-frame foundation.
- First advanced model backend implementation.
- XGBoost/Prophet backend implementation.
- Full SHAP explainability unless separately scoped.

## EXAMPLES:

Read these before PRP creation:

- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md`
  - Foundation dependency.

- `PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md`
  - First advanced model dependency.

- `frontend/src/pages/explorer/runs.tsx`
  - Existing run table.

- `frontend/src/pages/explorer/run-detail.tsx`
  - Existing run detail surface.

- `frontend/src/pages/explorer/run-compare.tsx`
  - Existing comparison surface.

- `frontend/src/pages/visualize/forecast.tsx`
  - Forecast visualization page.

- `frontend/src/pages/visualize/backtest.tsx`
  - Backtest visualization page.

- `app/features/registry/schemas.py`
  - Backend response contracts for run metadata.

## DOCUMENTATION:

- React Router documentation: https://reactrouter.com/home
- TanStack Query documentation: https://tanstack.com/query/latest/docs/framework/react/overview
- TanStack Table documentation: https://tanstack.com/table/latest/docs/overview
- shadcn/ui documentation: https://ui.shadcn.com/docs
- Recharts documentation: https://recharts.org/en-US/
- SHAP documentation: https://shap.readthedocs.io/en/stable/
- scikit-learn permutation importance: https://scikit-learn.org/stable/modules/permutation_importance.html

## OTHER CONSIDERATIONS:

- Do not create frontend controls before backend contracts are stable.
- Avoid adding a large admin panel if run detail and comparison pages are enough.
- Keep advanced model metadata readable and compact.
- Feature importance must be clearly labeled as model-derived, not causal truth.
- Browser QA is required for all frontend additions.
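For the feature-importance hook, scikit-learn already provides `sklearn.inspection.permutation_importance`. The dependency-free sketch below shows the same idea (mean increase in error after shuffling one column) and tags its payload as model-derived so the UI can label it accordingly; the `kind` field and the payload shape are hypothetical, not an existing registry contract.

```python
from __future__ import annotations

import random
from collections.abc import Callable, Mapping, Sequence

Row = Mapping[str, float]


def permutation_importance(
    predict: Callable[[Sequence[Row]], list[float]],
    rows: Sequence[Row],
    y: Sequence[float],
    columns: Sequence[str],
    n_repeats: int = 5,
    random_state: int = 0,
) -> dict[str, object]:
    """Mean increase in MAE after shuffling each column (model-derived, not causal)."""

    def mae(pred: Sequence[float]) -> float:
        return sum(abs(p - t) for p, t in zip(pred, y)) / len(y)

    rng = random.Random(random_state)
    base = mae(predict(rows))
    scores: dict[str, float] = {}
    for column in columns:
        increases = []
        for _ in range(n_repeats):
            values = [row[column] for row in rows]
            rng.shuffle(values)
            shuffled = [{**row, column: v} for row, v in zip(rows, values)]
            increases.append(mae(predict(shuffled)) - base)
        scores[column] = sum(increases) / n_repeats
    return {"kind": "model_derived_permutation_importance", "metric": "mae", "scores": scores}
```

A score near zero only means this model does not rely on the column; it says nothing about whether the feature drives real demand.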
