Release: Phase 3 Feature Engineering by w7-mgfcode · Pull Request #25 · w7-mgfcode/ForecastLabAI

w7-mgfcode · 2026-01-31T23:34:39Z

Summary

Merges dev branch into main with Phase 3 Feature Engineering implementation.

Phase 3: Feature Engineering (PRP-4)

FeatureEngineeringService: Time-safe feature computation with CRITICAL leakage prevention
API Endpoints: POST /featuresets/compute, POST /featuresets/preview
Feature Types:
- Lag features (shift with positive values only)
- Rolling features (shift(1) before rolling to exclude current observation)
- Calendar features (cyclical sin/cos encoding)
- Exogenous features (price lags, price change, stockout flags)
Imputation Strategies: zero, ffill, bfill, mean, expanding_mean, drop
55 tests including comprehensive leakage prevention tests

Code Quality Fixes

Validate store_id/product_id presence (no silent defaults)
HTTP 400 for unsupported date types
Added time-safe expanding_mean imputation strategy
Fixed price_pct_change_7d leakage with shift(1)
Documentation improvements

Validation

✅ Ruff: All checks passed
✅ MyPy: 0 errors
✅ Pyright: 0 errors
✅ Pytest: 55 tests passed

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Comprehensive time-safe feature engineering: lags, rolling aggregations, calendar features, imputation, and multi-horizon support
- New compute and preview API endpoints for feature generation with cutoff enforcement and group isolation
Documentation
- Added Feature Engineering docs, architecture updates, README examples, and a Phase 3 feature-engineering writeup
Examples & Demos
- Demo script and example usage for compute/preview workflows
Tests
- Extensive unit and leakage tests for configs, service, routes, and edge cases
Chores
- New runtime dependencies and configuration settings for feature limits

_{✏️ Tip: You can customize this high-level summary in your review settings.}

* docs: update INITIAL-4 and INITIAL-5 with additional references - Add scikit-learn, mlforecast, and sktime documentation links - Add considerations for imputation logic, agent tooling, and computation overhead - Add model persistence documentation references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(featuresets): implement time-safe feature engineering layer Add complete feature engineering module with: - Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation) - FeatureEngineeringService with CRITICAL leakage prevention: - Lag features use positive shift() only - Rolling features use shift(1) BEFORE rolling to exclude current observation - Group-aware operations prevent cross-series leakage - Cutoff date filtering before any computation - FastAPI endpoints: POST /featuresets/compute and /featuresets/preview - Comprehensive test suite (55 tests) including leakage prevention tests - Example demo script Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update INITIAL-5.md * docs: update documentation for Phase 3 Feature Engineering - README.md: Add featuresets module to project structure and API endpoints - docs/ARCHITECTURE.md: Add Feature Engineering section (section 6) - docs/PHASE-index.md: Mark Phase 3 as completed with summary - docs/PHASE/3-FEATURE_ENGINEERING.md: Create detailed phase documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(featuresets): address code review feedback and prevent data leakage Routes: - Validate store_id/product_id presence (no silent defaults to 0) - Convert ValueError for unsupported date types to HTTP 400 Service: - Add expanding_mean imputation strategy (time-safe alternative) - Add warnings when bfill/mean strategies are used (leakage risk) - Fix price_pct_change_7d to use shift(1) before pct_change Schemas: - Add expanding_mean to ImputationConfig Literal type - Document time-safety of each imputation strategy - Fix PreviewFeaturesRequest docstring: GET → POST Documentation: - Convert bare URLs to markdown links in INITIAL-4.md, INITIAL-5.md - Fix PRP-4 to show POST for preview endpoint Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: format schemas.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

sourcery-ai

Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

socket-security · 2026-01-31T23:34:52Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	numpy@2.4.1
	pandas@3.0.0

View full report

coderabbitai · 2026-01-31T23:34:52Z

📝 Walkthrough

Walkthrough

Adds a time-safe feature engineering subsystem: Pydantic configs, a FeatureEngineeringService, data loader and compute helpers, FastAPI endpoints (/featuresets/compute, /featuresets/preview), extensive tests, docs, examples, and new pandas/numpy dependencies; integrates router and config settings into the app.

Changes

Cohort / File(s)	Summary
Documentation & Phase Records `INITIAL-4.md`, `INITIAL-5.md`, `README.md`, `docs/ARCHITECTURE.md`, `docs/PHASE-index.md`, `docs/PHASE/3-FEATURE_ENGINEERING.md`	Added and updated documentation describing the feature engineering design, API endpoints, leakage prevention, implementation details, and phase completion.
Public API & Package Exports `app/features/featuresets/__init__.py`	New package initializer re-exporting schemas, service, and result types for external use.
Schemas & Config `app/features/featuresets/schemas.py`, `app/core/config.py`	Introduced Pydantic schema hierarchy and config models (LagConfig, RollingConfig, CalendarConfig, ExogenousConfig, ImputationConfig, FeatureSetConfig), API models (Compute/Preview requests/responses), validation, hashing, and three new Settings fields (feature_max_lookback_days, feature_max_lag, feature_max_window).
Feature Engineering Core `app/features/featuresets/service.py`	Implemented FeatureEngineeringService, FeatureComputationResult, FeatureDataLoader, and compute_features_for_series with time-safe steps (cutoff enforcement, imputation, lag/rolling/calendar/exogenous features) and observability.
API Routes & Integration `app/features/featuresets/routes.py`, `app/main.py`	Added FastAPI routes POST /featuresets/compute and POST /featuresets/preview, request validation, DB loading integration, error translation, and registered the router in main app.
Tests & Fixtures `app/features/featuresets/tests/*`	Added test package, fixtures (conftest.py), and comprehensive tests: schema validation (test_schemas.py), service behavior (test_service.py), leakage/group isolation/cutoff tests (test_leakage.py), and routes tests scaffolding.
Examples & Demo `examples/compute_features_demo.py`	New demo script illustrating compute/preview API usage and a sample feature configuration.
Dependencies `pyproject.toml`	Added runtime deps: `pandas>=3.0.0`, `numpy>=2.4.1`; added dev stubs (`pandas-stubs`) in dependency groups.
Small docs/initial notes `INITIAL-4.md`	Added documentation links and other-considerations bullets (imputation, tooling, compute overhead) — documentation-only edits.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as API Routes
    participant Loader as FeatureData<br/>Loader
    participant DB as Database
    participant Service as FeatureEngineering<br/>Service

    Client->>API: POST /featuresets/compute (store_id, product_id, cutoff_date, config)
    activate API
    API->>Loader: load_sales_data(store_id, product_id, start_date, end_date)
    activate Loader
    Loader->>DB: Query sales & calendar
    DB-->>Loader: Return rows
    Loader-->>API: DataFrame
    deactivate Loader

    API->>Service: compute_features(df, cutoff_date)
    activate Service
    Service->>Service: _apply_imputation (group-aware)
    Service->>Service: _compute_lag_features (grouped shift)
    Service->>Service: _compute_rolling_features (shift(1) then agg)
    Service->>Service: _compute_calendar_features
    Service->>Service: _compute_exogenous_features
    Service-->>API: FeatureComputationResult (df, feature_columns, stats)
    deactivate Service

    API->>API: Serialize FeatureRow list, null_counts, duration_ms
    API-->>Client: ComputeFeaturesResponse
    deactivate API

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

w7-learn

Poem

🐰 Soft paws tap on timestamps’ rim,

I stitch past days so features brim,
Lags and windows, shifted right,
No peeks ahead — I keep the sight,
Rabbit hums: safe data, bright.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Release: Phase 3 Feature Engineering' clearly and concisely summarizes the primary change—delivering the complete Phase 3 Feature Engineering system as the main feature of this PR.
Docstring Coverage	✅ Passed	Docstring coverage is 98.88% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

w7-learn · 2026-01-31T23:40:10Z

@coderabbitai help

coderabbitai · 2026-01-31T23:40:17Z

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
- @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
- @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve to resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
You can also validate your configuration using the online YAML validator.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

Visit our status page to check the current availability of CodeRabbit.
Create a ticket on our support page for assistance with any issues or questions.
Visit our documentation site for detailed information on how to use CodeRabbit.
Join our Discord community to connect with other users and get help from the community.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🤖 Fix all issues with AI agents

In `@app/features/featuresets/service.py`:
- Around line 215-217: The loop that sets min_per incorrectly treats 0 as falsy;
change the expression in the loop (where min_per is assigned for each window in
config.windows) to explicitly check for None: use "config.min_periods if
config.min_periods is not None else window" (so an explicit 0 is preserved and
only None falls back to window), updating the assignment in the function/method
that iterates over config.windows and prepares arguments for pandas' rolling().

In `@app/features/featuresets/tests/test_schemas.py`:
- Around line 118-138: Update the TestImputationConfig.test_valid_strategies to
include the missing "expanding_mean" strategy in the ImputationConfig(...)
strategies dict and update the assertion to expect 6 strategies; locate the test
class TestImputationConfig and method test_valid_strategies and modify the
strategies mapping and the assert len(config.strategies) to reflect the six
supported strategies ("zero","ffill","bfill","mean","expanding_mean","drop").

In `@docs/PHASE/3-FEATURE_ENGINEERING.md`:
- Around line 3-6: Update the release reference string "Release: PR `#24`" to the
current PR number by replacing it with "Release: PR `#25`" in the
PHASE/3-FEATURE_ENGINEERING.md content (look for the exact text "Release: PR
`#24`"); ensure the surrounding markdown remains unchanged and commit the updated
file.

In `@PRPs/PRP-4-feature-engineering.md`:
- Around line 40-42: The PRP currently documents the `/featuresets/preview`
endpoint as GET but the implementation uses POST; update every occurrence in
this document to reflect POST (change the checklist line and any other mentions
of GET for `/featuresets/preview`) so the spec matches the implemented endpoint.
Ensure references to the preview endpoint (e.g., "/featuresets/preview", any
summary lines or bullets) use POST and adjust any example calls or verbs
accordingly.

🧹 Nitpick comments (7)

pyproject.toml (1)
164-167: Consider consolidating dev dependencies in a single location.

The [dependency-groups] section (PEP 735) adds pandas-stubs separately from the existing [project.optional-dependencies] dev group (lines 23-32). Having dev dependencies split across two sections could cause confusion.

Consider moving pandas-stubs to the existing dev optional-dependencies for consistency:
♻️ Suggested consolidation
 [project.optional-dependencies]
 dev = [
     "pytest>=8.3.0",
     "pytest-asyncio>=0.24.0",
     "pytest-cov>=6.0.0",
     "httpx>=0.28.0",
     "ruff>=0.8.0",
     "mypy>=1.13.0",
     "pyright>=1.1.390",
+    "pandas-stubs>=2.3.3.260113",
 ]
Then remove the [dependency-groups] section if no other tools rely on it.
app/features/featuresets/schemas.py (1)
177-184: Minor inconsistency: price_lags allows empty tuple unlike lags.

The LagConfig.lags validator requires at least one lag (line 75-76), but ExogenousConfig.price_lags validator only checks for positive values without rejecting an empty tuple. This allows price_lags=() which may be intentional if include_price=False, but could be confusing.

Consider adding a similar check if empty price_lags should be rejected when include_price=True:
♻️ Optional: Add conditional validation
     `@field_validator`("price_lags")
     `@classmethod`
     def validate_price_lags_positive(cls, v: tuple[int, ...]) -> tuple[int, ...]:
         """Ensure all price lags are positive."""
+        # Note: Empty tuple is allowed when include_price=False
         if any(lag <= 0 for lag in v):
             raise ValueError("All price lags must be positive integers")
         return v
Or add a model_validator to enforce non-empty price_lags when include_price=True.
app/features/featuresets/routes.py (2)

259-311: Extract shared row-conversion logic to avoid drift between compute and preview.

The row/feature extraction block is duplicated; any future fix (like the numeric handling above) will need to be applied twice. Consider a small helper to centralize this logic.

3-8: The current NaN and numeric handling works correctly with pandas 3.0.0.

Pandas 3.0.0's to_dict("records") converts NumPy scalars (e.g., np.int64, np.float64) to native Python int and float types, so they pass the isinstance(val, (int, float)) check without issue. The current code using isinstance(val, float) and math.isnan(val) correctly detects NaN values.

That said, using pd.isna(val) is slightly more robust for edge cases, and using numbers.Integral and numbers.Real from the numbers module is more semantically correct for numeric type checking. Consider these as optional improvements rather than bug fixes.
docs/PHASE-index.md (1)
147-152: Clarify imputation defaults vs full strategy set.

The bullet reads like the only imputation options are zero/ffill. Consider marking them as defaults and pointing to the config for the full set to avoid ambiguity.
✏️ Suggested wording
- **Imputation**: Zero-fill for sales, forward-fill for prices
+ **Imputation**: Default strategies like zero-fill for sales and forward-fill for prices (see ImputationConfig for full set)
docs/PHASE/3-FEATURE_ENGINEERING.md (1)
9-12: Tighten redundant wording in the executive summary.

Small wording tweak for clarity (matches the style lint hint).
✏️ Suggested wording
- ensuring that features computed at any cutoff date only use data available at that point in time.
+ ensuring that features computed at any cutoff date only use data available at that time.
docs/ARCHITECTURE.md (1)
207-210: Trim repeated “only” in the lag bullet.

Minor style cleanup per lint hint.
✏️ Suggested wording
- - **Lag features**: `shift(lag)` with positive lag only — ensures only past data accessed
+ - **Lag features**: `shift(lag)` with positive lag only — ensures past data is accessed

Service: - Fix min_periods falsy check to explicit None check (preserves 0) Tests: - Add expanding_mean to test_valid_strategies, expect 6 strategies Documentation: - Update PR reference from #24 to #25 - Fix all GET /featuresets/preview to POST in PRP-4 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In `@app/features/featuresets/service.py`:
- Around line 257-296: _compute_calendar_features currently ignores
config.include_is_holiday so the holiday flag never gets added to outputs;
modify the method to check config.include_is_holiday and when true set
result["is_holiday"] = dates["is_holiday"].astype(int) (or dates.dt.is_holiday
if using a datetime accessor) and append "is_holiday" to columns so the feature
is returned and included in stats; ensure you follow the same casting/typing
pattern used for is_weekend/is_month_end in the function.
- Around line 360-404: The _compute_exogenous_features method currently ignores
ExogenousConfig.include_promo and include_inventory; either implement those
feature computations or fail fast. Update _compute_exogenous_features to (A)
implement promo features when config.include_promo is true by reading the promo
column (e.g., "is_promo" or configured name), creating lagged promo columns
analogous to price (use groupby(self.entity_cols).shift(lag) for each
config.promo_lags and any pct/change or flag derivations), and (B) implement
inventory features when config.include_inventory is true by using the inventory
column (e.g., "on_hand" or configured name) to produce lagged inventory columns
and derived indicators (like inventory_pct_change or inventory_level_lag_N)
similar to how price/stockout are done; if you prefer validation instead, add an
explicit check at the start of _compute_exogenous_features that raises a
RuntimeError when config.include_promo or config.include_inventory is true but
no corresponding implementation/columns exist, referencing the ExogenousConfig
flags and the _compute_exogenous_features method so reviewers can locate
changes.

In `@docs/PHASE/3-FEATURE_ENGINEERING.md`:
- Around line 19-33: Update the documentation to include the missing
expanding_mean imputation strategy: in the section describing ImputationConfig
and the deliverables table listing strategies, add "expanding_mean" to the list
of strategies alongside zero, ffill, bfill, mean, drop, and update any
descriptive text to briefly note that expanding_mean computes a cumulative mean
per series (expanding window) as an imputation option; ensure the symbol
ImputationConfig and any mentions of FeatureSetConfig/ComputeFeaturesRequest
remain consistent after adding the new strategy.

In `@PRPs/PRP-4-feature-engineering.md`:
- Around line 773-775: The current pseudocode uses a falsy check when assigning
min_per inside the loop over config.windows (for window in config.windows) which
will override an explicit 0 in config.min_periods; change the condition to
explicitly test for None so min_per preserves 0 (i.e., set min_per to
config.min_periods if config.min_periods is not None, otherwise to window),
updating the assignment that defines min_per.
- Around line 1123-1144: The preview_features() example is calling the compute
endpoint and using lookback_days which doesn't match the Preview request schema;
change the POST URL from "/featuresets/compute" to "/featuresets/preview" and
update the JSON payload to conform to the PreviewRequest schema (remove or
replace "lookback_days" and any compute-specific fields with the
preview-specific fields expected by PreviewRequest), keeping the same
identifying caller function preview_features() so the example exercises the
preview endpoint and schema correctly.

coderabbitai · 2026-01-31T23:52:50Z

+        if config.include_day_of_week:
+            dow = dates.dt.dayofweek  # 0=Monday, 6=Sunday
+            if config.use_cyclical_encoding:
+                result["dow_sin"] = np.sin(2 * np.pi * dow / 7)
+                result["dow_cos"] = np.cos(2 * np.pi * dow / 7)
+                columns.extend(["dow_sin", "dow_cos"])
+            else:
+                result["day_of_week"] = dow
+                columns.append("day_of_week")
+
+        if config.include_month:
+            month = dates.dt.month
+            if config.use_cyclical_encoding:
+                result["month_sin"] = np.sin(2 * np.pi * month / 12)
+                result["month_cos"] = np.cos(2 * np.pi * month / 12)
+                columns.extend(["month_sin", "month_cos"])
+            else:
+                result["month"] = month
+                columns.append("month")
+
+        if config.include_quarter:
+            result["quarter"] = dates.dt.quarter
+            columns.append("quarter")
+
+        if config.include_year:
+            result["year"] = dates.dt.year
+            columns.append("year")
+
+        if config.include_is_weekend:
+            result["is_weekend"] = dates.dt.dayofweek.isin([5, 6]).astype(int)
+            columns.append("is_weekend")
+
+        if config.include_is_month_end:
+            result["is_month_end"] = dates.dt.is_month_end.astype(int)
+            columns.append("is_month_end")
+
+        # is_holiday would require calendar table lookup
+        # Handled separately if data is joined from Calendar table
+
+        return result, columns


⚠️ Potential issue | 🟠 Major

Honor include_is_holiday in feature outputs.
include_is_holiday is set in config and the calendar data can be merged, but _compute_calendar_features never adds the is_holiday column to feature_columns, so it’s omitted from responses and stats.

🔧 Suggested fix

if config.include_is_month_end: result["is_month_end"] = dates.dt.is_month_end.astype(int) columns.append("is_month_end") - # is_holiday would require calendar table lookup - # Handled separately if data is joined from Calendar table + if config.include_is_holiday and "is_holiday" in result.columns: + result["is_holiday"] = result["is_holiday"].astype(int) + columns.append("is_holiday")

🤖 Prompt for AI Agents

In `@app/features/featuresets/service.py` around lines 257 - 296, _compute_calendar_features currently ignores config.include_is_holiday so the holiday flag never gets added to outputs; modify the method to check config.include_is_holiday and when true set result["is_holiday"] = dates["is_holiday"].astype(int) (or dates.dt.is_holiday if using a datetime accessor) and append "is_holiday" to columns so the feature is returned and included in stats; ensure you follow the same casting/typing pattern used for is_weekend/is_month_end in the function.

coderabbitai · 2026-01-31T23:52:50Z

+    def _compute_exogenous_features(self, df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
+        """Compute exogenous features (price, promo, inventory).
+
+        CRITICAL: All exogenous features are lagged to prevent leakage.
+
+        Args:
+            df: Input dataframe with exogenous columns.
+
+        Returns:
+            Tuple of (dataframe with exogenous features, list of new column names).
+        """
+        config = self.config.exogenous_config
+        if config is None:
+            raise RuntimeError("_compute_exogenous_features called without exogenous_config")
+
+        result = df.copy()
+        columns: list[str] = []
+
+        # Price features (if price column exists)
+        if config.include_price and "unit_price" in df.columns:
+            for lag in config.price_lags:
+                col_name = f"price_lag_{lag}"
+                result[col_name] = df.groupby(self.entity_cols, observed=True)["unit_price"].shift(
+                    lag
+                )
+                columns.append(col_name)
+
+            if config.include_price_change:
+                # CRITICAL: shift(1) before pct_change to prevent using current price
+                # This computes: (price[t-1] - price[t-8]) / price[t-8]
+                # Without shift(1), it would use current price at t, causing leakage
+                result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[
+                    "unit_price"
+                ].transform(lambda x: x.shift(1).pct_change(periods=7))
+                columns.append("price_pct_change_7d")
+
+        # Stockout flag (if inventory column exists)
+        if config.include_stockout_flag and "is_stockout" in df.columns:
+            # Lagged stockout flag (yesterday's stockout)
+            result["stockout_lag_1"] = df.groupby(self.entity_cols, observed=True)[
+                "is_stockout"
+            ].shift(1)
+            columns.append("stockout_lag_1")
+
+        return result, columns


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Search for promo/inventory feature references and ExogenousConfig definition rg -n "include_promo|include_inventory|promo|inventory" app/features/featuresets -g '*.py'

Repository: w7-mgfcode/ForecastLabAI

Length of output: 911

🏁 Script executed:

#!/bin/bash # Find and examine ExogenousConfig definition fd -e py -x grep -l "class ExogenousConfig" {} \;

Repository: w7-mgfcode/ForecastLabAI

Length of output: 105

🏁 Script executed:

#!/bin/bash # Get the full path and examine ExogenousConfig to see what fields it defines rg -n "class ExogenousConfig" -A 50 app/features/featuresets -g '*.py'

Repository: w7-mgfcode/ForecastLabAI

Length of output: 4094

🏁 Script executed:

#!/bin/bash # Search for any handling of include_promo or include_inventory beyond what we've seen rg -n "include_promo|include_inventory" app/features/featuresets -g '*.py' -B 2 -A 2

Repository: w7-mgfcode/ForecastLabAI

Length of output: 990

🏁 Script executed:

#!/bin/bash # Check if there are any other methods in the service that might compute promo/inventory rg -n "def.*promo|def.*inventory" app/features/featuresets -g '*.py'

Repository: w7-mgfcode/ForecastLabAI

Length of output: 50

🏁 Script executed:

#!/bin/bash # Check if _compute_exogenous_features is called/tested and how these flags are used rg -n "_compute_exogenous_features" app/features/featuresets -g '*.py' -B 2 -A 2

Repository: w7-mgfcode/ForecastLabAI

Length of output: 1249

Implement promo and inventory feature computation or add validation to prevent silent misconfiguration.

ExogenousConfig exposes include_promo (defaults to True) and include_inventory flags, but _compute_exogenous_features() only implements price and stockout features. These flags are silently ignored with no error, creating a functional gap where users expecting promo/inventory features receive none. Either implement the missing feature computation or add explicit validation to raise an error if these flags are enabled.

🤖 Prompt for AI Agents

In `@app/features/featuresets/service.py` around lines 360 - 404, The _compute_exogenous_features method currently ignores ExogenousConfig.include_promo and include_inventory; either implement those feature computations or fail fast. Update _compute_exogenous_features to (A) implement promo features when config.include_promo is true by reading the promo column (e.g., "is_promo" or configured name), creating lagged promo columns analogous to price (use groupby(self.entity_cols).shift(lag) for each config.promo_lags and any pct/change or flag derivations), and (B) implement inventory features when config.include_inventory is true by using the inventory column (e.g., "on_hand" or configured name) to produce lagged inventory columns and derived indicators (like inventory_pct_change or inventory_level_lag_N) similar to how price/stockout are done; if you prefer validation instead, add an explicit check at the start of _compute_exogenous_features that raises a RuntimeError when config.include_promo or config.include_inventory is true but no corresponding implementation/columns exist, referencing the ExogenousConfig flags and the _compute_exogenous_features method so reviewers can locate changes.

coderabbitai · 2026-01-31T23:52:50Z

+### 1. Feature Configuration Schemas
+
+**File**: `app/features/featuresets/schemas.py`
+
+Pydantic v2 schemas with frozen configs for reproducibility:
+
+| Schema | Purpose |
+|--------|---------|
+| `LagConfig` | Lag feature configuration (lags, target_column, fill_value) |
+| `RollingConfig` | Rolling window configuration (windows, aggregations, min_periods) |
+| `CalendarConfig` | Calendar feature configuration (day_of_week, month, cyclical encoding) |
+| `ExogenousConfig` | Exogenous feature configuration (price_lags, stockout flags) |
+| `ImputationConfig` | Imputation strategies (zero, ffill, bfill, mean, drop) |
+| `FeatureSetConfig` | Complete feature configuration with config_hash() |
+| `ComputeFeaturesRequest` | API request schema with validation |


⚠️ Potential issue | 🟡 Minor

Document the new expanding_mean strategy.
The service supports expanding_mean, but the doc lists only five strategies in both the deliverables table and the Imputation section.

📝 Suggested edits

-| `ImputationConfig` | Imputation strategies (zero, ffill, bfill, mean, drop) | +| `ImputationConfig` | Imputation strategies (zero, ffill, bfill, mean, expanding_mean, drop) | -**Strategies**: `zero`, `ffill`, `bfill`, `mean`, `drop` +**Strategies**: `zero`, `ffill`, `bfill`, `mean`, `expanding_mean`, `drop`

Also applies to: 199-211

🤖 Prompt for AI Agents

In `@docs/PHASE/3-FEATURE_ENGINEERING.md` around lines 19 - 33, Update the documentation to include the missing expanding_mean imputation strategy: in the section describing ImputationConfig and the deliverables table listing strategies, add "expanding_mean" to the list of strategies alongside zero, ffill, bfill, mean, drop, and update any descriptive text to briefly note that expanding_mean computes a cumulative mean per series (expanding window) as an imputation option; ensure the symbol ImputationConfig and any mentions of FeatureSetConfig/ComputeFeaturesRequest remain consistent after adding the new strategy.

coderabbitai · 2026-01-31T23:52:50Z

+        for window in config.windows:
+            min_per = config.min_periods if config.min_periods else window
+


⚠️ Potential issue | 🟡 Minor

Preserve explicit min_periods=0 in pseudocode.
The PRP uses a falsy check, which would override an explicit 0. Align this snippet with the implemented is None check.

📝 Suggested edit

- min_per = config.min_periods if config.min_periods else window + min_per = window if config.min_periods is None else config.min_periods

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for window in config.windows:

min_per = config.min_periods if config.min_periods else window

for window in config.windows:

min_per = window if config.min_periods is None else config.min_periods

🤖 Prompt for AI Agents

In `@PRPs/PRP-4-feature-engineering.md` around lines 773 - 775, The current pseudocode uses a falsy check when assigning min_per inside the loop over config.windows (for window in config.windows) which will override an explicit 0 in config.min_periods; change the condition to explicitly test for None so min_per preserves 0 (i.e., set min_per to config.min_periods if config.min_periods is not None, otherwise to window), updating the assignment that defines min_per.

coderabbitai · 2026-01-31T23:52:50Z

+async def preview_features() -> None:
+    """Preview features for store 1, product 1."""
+    async with httpx.AsyncClient() as client:
+        response = await client.post(
+            "http://localhost:8123/featuresets/compute",
+            json={
+                "store_id": 1,
+                "product_id": 1,
+                "cutoff_date": "2024-01-31",
+                "lookback_days": 60,
+                "config": {
+                    "schema_version": "1.0",
+                    "name": "preview_test",
+                    "lag_config": {
+                        "schema_version": "1.0",
+                        "lags": [1, 7, 14],
+                    },
+                    "rolling_config": {
+                        "schema_version": "1.0",
+                        "windows": [7, 14],
+                        "aggregations": ["mean", "std"],
+                    },


⚠️ Potential issue | 🟡 Minor

Preview example should call the preview endpoint and schema.
The preview_features.py example posts to /featuresets/compute and uses lookback_days, which doesn’t match the Preview request schema.

📝 Suggested edit

- response = await client.post( - "http://localhost:8123/featuresets/compute", + response = await client.post( + "http://localhost:8123/featuresets/preview", json={ "store_id": 1, "product_id": 1, "cutoff_date": "2024-01-31", - "lookback_days": 60, + "sample_rows": 10, "config": {

🤖 Prompt for AI Agents

In `@PRPs/PRP-4-feature-engineering.md` around lines 1123 - 1144, The preview_features() example is calling the compute endpoint and using lookback_days which doesn't match the Preview request schema; change the POST URL from "/featuresets/compute" to "/featuresets/preview" and update the JSON payload to conform to the PreviewRequest schema (remove or replace "lookback_days" and any compute-specific fields with the preview-specific fields expected by PreviewRequest), keeping the same identifying caller function preview_features() so the example exercises the preview endpoint and schema correctly.

* feat(featuresets): implement time-safe feature engineering layer (#24) * docs: update INITIAL-4 and INITIAL-5 with additional references - Add scikit-learn, mlforecast, and sktime documentation links - Add considerations for imputation logic, agent tooling, and computation overhead - Add model persistence documentation references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(featuresets): implement time-safe feature engineering layer Add complete feature engineering module with: - Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation) - FeatureEngineeringService with CRITICAL leakage prevention: - Lag features use positive shift() only - Rolling features use shift(1) BEFORE rolling to exclude current observation - Group-aware operations prevent cross-series leakage - Cutoff date filtering before any computation - FastAPI endpoints: POST /featuresets/compute and /featuresets/preview - Comprehensive test suite (55 tests) including leakage prevention tests - Example demo script Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update INITIAL-5.md * docs: update documentation for Phase 3 Feature Engineering - README.md: Add featuresets module to project structure and API endpoints - docs/ARCHITECTURE.md: Add Feature Engineering section (section 6) - docs/PHASE-index.md: Mark Phase 3 as completed with summary - docs/PHASE/3-FEATURE_ENGINEERING.md: Create detailed phase documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(featuresets): address code review feedback and prevent data leakage Routes: - Validate store_id/product_id presence (no silent defaults to 0) - Convert ValueError for unsupported date types to HTTP 400 Service: - Add expanding_mean imputation strategy (time-safe alternative) - Add warnings when bfill/mean strategies are used (leakage risk) - Fix price_pct_change_7d to use shift(1) before pct_change Schemas: - Add expanding_mean to ImputationConfig Literal type - Document time-safety of each imputation strategy - Fix PreviewFeaturesRequest docstring: GET → POST Documentation: - Convert bare URLs to markdown links in INITIAL-4.md, INITIAL-5.md - Fix PRP-4 to show POST for preview endpoint Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: format schemas.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix: address code review feedback Service: - Fix min_periods falsy check to explicit None check (preserves 0) Tests: - Add expanding_mean to test_valid_strategies, expect 6 strategies Documentation: - Update PR reference from #24 to #25 - Fix all GET /featuresets/preview to POST in PRP-4 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update DAILY-FLOW.md for Phase 4 Forecasting (#27) * docs: update DAILY-FLOW.md for Phase 4 Forecasting Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add PRP-5 for Forecasting module Comprehensive PRP including: - Model zoo (naive, seasonal naive, moving average) - Unified BaseForecaster interface (fit/predict/serialize) - ModelBundle persistence with joblib - 15 ordered implementation tasks - 40+ test cases specified - Integration with FeatureEngineeringService Confidence: 8/10 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * feat(forecasting): implement baseline model zoo and unified interface (#28) * feat(forecasting): implement baseline model zoo and unified interface Add forecasting module (PRP-5) with: - BaseForecaster ABC with scikit-learn-style interface (fit/predict) - NaiveForecaster, SeasonalNaiveForecaster, MovingAverageForecaster - ModelBundle persistence with joblib serialization - POST /forecasting/train and /forecasting/predict endpoints - ForecastingService for orchestration - 81 unit tests covering schemas, models, persistence, and service - Example scripts demonstrating each baseline model - LightGBM placeholder (feature-flagged, not yet implemented) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: Update documentation for forecasting module (PRP-5) - Add forecasting API endpoints to README.md with examples - Update ARCHITECTURE.md with forecasting implementation details - Add scikit-learn and joblib to dependencies list - Add forecasting config variables to .env.example - Mark forecasting module as IMPLEMENTED in architecture docs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address CI lint and type check failures - Add type: ignore for intentional type mismatch in frozen config test - Add S101 ignore for examples/ to allow assert statements Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix(forecasting): add security validations and fix documentation Security improvements: - Add constructor validation for season_length >= 1 in SeasonalNaiveForecaster - Add constructor validation for window_size >= 1 in MovingAverageForecaster - Add path traversal prevention in ForecastingService.predict() - Validate .joblib extension and artifacts directory containment - Log rejection reasons for security auditing Test improvements: - Fix get_settings patching to wrap ForecastingService construction - Add tests for constructor validation - Add tests for path traversal and extension validation Documentation fixes: - Fix config parameter names in ARCHITECTURE.md (season_length, window_size) - Fix README example to use season_length instead of seasonal_period - Fix markdown issues in PRP-5 (code fences, ATX headings) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

sourcery-ai Bot reviewed Jan 31, 2026

View reviewed changes

coderabbitai Bot reviewed Jan 31, 2026

View reviewed changes

Comment thread app/features/featuresets/service.py

Comment thread app/features/featuresets/tests/test_schemas.py

Comment thread docs/PHASE/3-FEATURE_ENGINEERING.md

Comment thread PRPs/PRP-4-feature-engineering.md

w7-learn approved these changes Jan 31, 2026

View reviewed changes

w7-mgfcode merged commit 888c082 into main Jan 31, 2026
11 of 12 checks passed

coderabbitai Bot reviewed Jan 31, 2026

View reviewed changes

coderabbitai Bot mentioned this pull request Feb 2, 2026

Release: Phase 10 Dashboard + The Forge (Randomized Database Seeder) #67

Merged

5 tasks

This was referenced May 14, 2026

feat(release): cut v0.2.9 with phase 2 e2e + strict-mode + main protection (#108) #124

Merged

feat: release v0.2.10 — demo showcase page + e2e pipeline #134

Merged

		for window in config.windows:
		min_per = config.min_periods if config.min_periods else window

Conversation

w7-mgfcode commented Jan 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Phase 3: Feature Engineering (PRP-4)

Code Quality Fixes

Validation

Summary by CodeRabbit

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

socket-security Bot commented Jan 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jan 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

w7-learn commented Jan 31, 2026

Uh oh!

coderabbitai Bot commented Jan 31, 2026

Chat

CodeRabbit commands

Other keywords and placeholders

CodeRabbit configuration file (.coderabbit.yaml)

Status, support, documentation and community

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

w7-mgfcode commented Jan 31, 2026 •

edited by coderabbitai Bot

Loading

socket-security Bot commented Jan 31, 2026 •

edited

Loading

coderabbitai Bot commented Jan 31, 2026 •

edited

Loading

CodeRabbit configuration file (`.coderabbit.yaml`)