feat(featuresets): implement time-safe feature engineering layer by w7-mgfcode · Pull Request #23 · w7-mgfcode/ForecastLabAI

w7-mgfcode · 2026-01-31T23:10:14Z

Summary

Implement complete feature engineering module (PRP-4) with time-safe feature computation
Add FeatureEngineeringService with CRITICAL leakage prevention patterns
Create FastAPI endpoints for feature computation and preview
Comprehensive test suite with 55 tests including leakage prevention tests

Features

Lag features: Positive shift() only to prevent future data access
Rolling features: shift(1) BEFORE rolling to exclude current observation
Calendar features: Cyclical encoding (sin/cos) for day of week, month
Group isolation: Entity-aware groupby prevents cross-series leakage
Imputation strategies: Zero-fill for sales, forward-fill for prices
Cutoff enforcement: Data filtered before any computation

Files Added

app/features/featuresets/ - Complete module (schemas, service, routes)
app/features/featuresets/tests/ - 55 unit tests
examples/compute_features_demo.py - Demo script
PRPs/PRP-4-feature-engineering.md - PRP document

Test plan

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added feature engineering pipeline with time-safe computation and leakage prevention.
- Introduced /featuresets/compute and /featuresets/preview API endpoints for configurable feature generation.
- Support for lag, rolling window, calendar, and exogenous features with flexible imputation strategies.
Documentation
- Added example demo script demonstrating feature computation and preview workflows.
Dependencies
- Added pandas and numpy as runtime dependencies.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

- Add scikit-learn, mlforecast, and sktime documentation links - Add considerations for imputation logic, agent tooling, and computation overhead - Add model persistence documentation references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add complete feature engineering module with: - Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation) - FeatureEngineeringService with CRITICAL leakage prevention: - Lag features use positive shift() only - Rolling features use shift(1) BEFORE rolling to exclude current observation - Group-aware operations prevent cross-series leakage - Cutoff date filtering before any computation - FastAPI endpoints: POST /featuresets/compute and /featuresets/preview - Comprehensive test suite (55 tests) including leakage prevention tests - Example demo script Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

sourcery-ai

Sorry @w7-mgfcode, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

coderabbitai · 2026-01-31T23:10:28Z

📝 Walkthrough

Walkthrough

This PR introduces a production-ready, time-safe feature engineering vertical slice for ForecastLabAI. It adds configuration schemas for lag, rolling, calendar, exogenous, and imputation features; a FeatureEngineeringService implementing leakage-free feature computation; FastAPI endpoints for computing and previewing features; comprehensive tests; and example scripts. Dependencies on pandas and numpy are added.

Changes

Cohort / File(s)	Summary
Documentation & Configuration `INITIAL-4.md`, `INITIAL-5.md`, `app/core/config.py`	Added feature engineering documentation references and three new Settings fields (feature_max_lookback_days, feature_max_lag, feature_max_window) to enforce size constraints.
Feature Engineering Schemas `app/features/featuresets/schemas.py`, `app/features/featuresets/__init__.py`	Introduced Pydantic-based configuration schemas (FeatureSetConfig, LagConfig, RollingConfig, CalendarConfig, ExogenousConfig, ImputationConfig) with validation, immutability, and config-hash support; added API request/response schemas (ComputeFeaturesRequest/Response, PreviewFeaturesRequest) with FeatureRow data model.
Feature Engineering Service & Data Loading `app/features/featuresets/service.py`	Implemented FeatureEngineeringService with time-safe feature computation (cutoff handling, lag, rolling, calendar, exogenous, imputation); added FeatureDataLoader for database retrieval; introduced compute_features_for_series orchestration function.
API Routes & Integration `app/features/featuresets/routes.py`, `app/main.py`	Added FastAPI router with POST /featuresets/compute and GET /featuresets/preview endpoints; integrated router into main app; both endpoints handle data validation, NaN normalization, error mapping, and structured logging.
Tests & Fixtures `app/features/featuresets/tests/*`	Added conftest.py with time-series and config fixtures; comprehensive test suites for schemas (validation, defaults, immutability), service (lag, rolling, calendar, imputation, cutoff enforcement), leakage detection (cross-series isolation, future-data prevention), and route integration.
Examples & Demo Scripts `examples/compute_features_demo.py`	Added demonstration script showing feature engineering API usage via httpx, including sample config creation, preview and full computation invocations, error handling, and result parsing.
Dependencies `pyproject.toml`	Added pandas>=3.0.0 and numpy>=2.4.1 as runtime dependencies; introduced pandas-stubs>=2.3.3.260113 in dev dependency group.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Route as FastAPI Route
    participant DataLoader as FeatureDataLoader
    participant DB as Database
    participant Service as FeatureEngineeringService
    
    Client->>Route: POST /featuresets/compute (store_id, product_id, cutoff_date, config)
    Route->>Route: Validate ComputeFeaturesRequest
    Route->>DataLoader: load_sales_data(store_id, product_id, date_range)
    DataLoader->>DB: Query SalesDaily records
    DB-->>DataLoader: Return sales data
    Route->>DataLoader: load_calendar_data(date_range)
    DataLoader->>DB: Query Calendar records
    DB-->>DataLoader: Return calendar data
    Route->>Service: compute_features(merged_df, cutoff_date)
    Service->>Service: Sort & filter by cutoff_date
    Service->>Service: Apply imputation (group-aware)
    Service->>Service: Compute lag features (per-entity)
    Service->>Service: Compute rolling features (shifted)
    Service->>Service: Compute calendar features (with cyclical encoding)
    Service->>Service: Compute exogenous features
    Service->>Service: Collect stats (nulls, row counts)
    Service-->>Route: Return FeatureComputationResult
    Route->>Route: Convert DataFrame to FeatureRow list
    Route->>Route: Handle NaN/None normalization
    Route-->>Client: ComputeFeaturesResponse (rows, columns, hash, stats)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

chore: merge dev to main (Phase 1 Data Platform) #13: The feature engineering service directly depends on SalesDaily and Calendar models introduced in the data-platform PR, creating a code-level integration point.

Suggested reviewers

w7-learn

Poem

🐰 Features flow like clover in spring,
Lags and rolling means in a line,
Cutoff dates keep the leakage away,
Time-safe computing—a feature buffet! 🌾

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: implementing a time-safe feature engineering layer, which is the primary objective of this PR.
Docstring Coverage	✅ Passed	Docstring coverage is 98.88% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/prp-4-feature-engineering

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

socket-security · 2026-01-31T23:10:31Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	numpy@2.4.1
	pandas@3.0.0

View full report

coderabbitai

Actionable comments posted: 7

🤖 Fix all issues with AI agents

In `@app/features/featuresets/routes.py`:
- Around line 135-136: The code silently defaults missing store_id/product_id to
0 by using record.get("store_id", 0) / record.get("product_id", 0); update the
parsing logic that constructs the store_id/product_id (where record is used) to
explicitly validate presence and value (e.g., require the keys exist and that
int(record["store_id"]) >= 1 and int(record["product_id"]) >= 1), and on failure
raise a clear exception (ValueError) or log an error including the offending
record content instead of silently using 0 so corrupted rows are surfaced
immediately.
- Around line 129-130: The raise ValueError for unsupported date types (the
branch raising "Cannot extract date from {type(date_val)}") can surface as a
500; update the handler around that logic in app/features/featuresets/routes.py
so invalid date types are caught and translated into a client-facing error
(e.g., raise a ValidationError or an HTTP 400/BadRequest exception) that
includes the offending type and a clear message; locate the block that inspects
date_val, replace or wrap the ValueError with a caught/converted error response
(or re-raise a framework ValidationError/HTTPException) so callers receive a 4xx
with details instead of an unhandled 500.

In `@app/features/featuresets/service.py`:
- Around line 324-329: The bfill and mean strategies (in the branch where
strategy == "bfill" and strategy == "mean" that operate on result grouped by
self.entity_cols) leak future information; update the function/class docstring
to note this limitation, add an "expanding_mean" (or "expanding-mean") option
that computes groupby(...)[col].transform(lambda x:
x.expanding().mean().fillna(...)) so each row only uses past values, and in
configuration/validation logic emit a warning or error when users select "bfill"
or plain "mean" for time-series use to guide them toward "expanding_mean".
Ensure references to the symbols result, self.entity_cols, strategy, bfill,
mean, and expanding_mean are used so reviewers can locate the changes.
- Around line 362-367: The price_pct_change_7d calculation leaks by using the
current row's unit_price; change the pipeline to compute percent change on a
1-day-shifted series so only historical prices are used (i.e., take
df.groupby(self.entity_cols, observed=True)["unit_price"].shift(1) and then
.pct_change(periods=7) so the feature at time T uses prices up through T-1).
Keep the result key price_pct_change_7d and
columns.append("price_pct_change_7d") as before.

In `@INITIAL-4.md`:
- Around line 21-23: Replace the three bare URLs
(https://scikit-learn.org/stable/modules/compose.html,
https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast,
https://www.sktime.net/en/stable/api_reference/transformations.html) with
Markdown links using the [text](url) syntax (e.g., [scikit-learn compose
module](https://scikit-learn.org/stable/modules/compose.html)) so they are not
bare URLs and comply with markdownlint MD034; update the visible link text to a
concise descriptive label for each URL.

In `@INITIAL-5.md`:
- Around line 31-33: Replace the bare URLs with markdown links by converting
each plain URL into the [text](url) form; specifically update the three
occurrences (https://scikit-learn.org/stable/modules/compose.html,
https://scikit-learn.org/stable/glossary.html,
https://scikit-learn.org/stable/model_persistence.html) to descriptive link text
(e.g., [Scikit-learn: ColumnTransformer
documentation](https://scikit-learn.org/stable/modules/compose.html)) so the
file no longer triggers MD034; edit the lines containing those URLs in
INITIAL-5.md (look for the exact URL strings) and replace them with appropriate
markdown link syntax.

In `@PRPs/PRP-4-feature-engineering.md`:
- Around line 8-12: The PRP lists the preview endpoint as GET but the
implemented route uses router.post("/preview"); update the documentation entry
in PRP-4-feature-engineering.md to read "POST /featuresets/preview" instead of
"GET /featuresets/preview" so it matches the implemented route
(router.post("/preview")) and the API behavior.

🧹 Nitpick comments (13)

pyproject.toml (2)
163-167: Avoid dev-dependency drift between optional-dependencies and dependency-groups.
pandas-stubs only appears in dependency-groups.dev, so pip install -e .[dev] won’t include it. Consider adding it to the dev extra (or document that dependency-groups is the required path).
♻️ Suggested consolidation
 [project.optional-dependencies]
 dev = [
     "pytest>=8.3.0",
     "pytest-asyncio>=0.24.0",
     "pytest-cov>=6.0.0",
     "httpx>=0.28.0",
     "ruff>=0.8.0",
     "mypy>=1.13.0",
     "pyright>=1.1.390",
+    "pandas-stubs>=2.3.3.260113",
 ]
19-20: Consider upper-bounding pandas and numpy versions, and plan a staged upgrade for pandas 3.0.0.

The specified versions are available: pandas 3.0.0 (released Jan 21, 2026) and numpy 2.4.1 (released Jan 10, 2026) both support Python 3.12. However, pandas 3.0.0 release notes recommend upgrading to pandas 2.3.x first to clear deprecation warnings before migrating to 3.0.0. Adding upper bounds prevents unexpected major version upgrades and is a good practice.
♻️ Proposed guardrail version range
-    "pandas>=3.0.0",
-    "numpy>=2.4.1",
+    "pandas>=3.0.0,<4.0.0",
+    "numpy>=2.4.1,<3.0.0",
app/core/config.py (1)
38-41: Add Pydantic Field bounds for feature caps.
These caps act as guardrails; adding ge/le prevents invalid env overrides and documents intent.
♻️ Suggested change
-from pydantic_settings import BaseSettings, SettingsConfigDict
+from pydantic import Field
+from pydantic_settings import BaseSettings, SettingsConfigDict
@@
-    feature_max_lookback_days: int = 1095  # 3 years
-    feature_max_lag: int = 365
-    feature_max_window: int = 90
+    feature_max_lookback_days: int = Field(1095, ge=30, le=1825, description="Max lookback days")
+    feature_max_lag: int = Field(365, ge=1, le=730, description="Max lag value")
+    feature_max_window: int = Field(90, ge=1, le=365, description="Max rolling window")
app/features/featuresets/tests/conftest.py (1)
140-147: Consider adding schema_version to the nested LagConfig for consistency.
This keeps fixtures aligned if schema_version becomes required across configs.
♻️ Suggested tweak
-        lag_config=LagConfig(lags=(1,)),
+        lag_config=LagConfig(schema_version="1.0", lags=(1,)),
app/features/featuresets/tests/test_service.py (1)

234-248: Consider strengthening the ffill assertion.

The assertion non_null_count >= len(result.df) - 1 is permissive and may not catch subtle bugs. If the first row has a valid value, forward fill should result in zero NaNs. Consider asserting the exact expected behavior based on your test fixture's data.
app/features/featuresets/schemas.py (2)
177-183: Inconsistent empty tuple handling for price_lags.

Unlike LagConfig.validate_lags_positive and RollingConfig.validate_windows_positive, this validator doesn't reject an empty tuple. Consider adding a check for consistency, or document if empty price_lags is intentionally allowed (to disable price lagging while keeping other exogenous features).
♻️ Suggested fix for consistency
 `@field_validator`("price_lags")
 `@classmethod`
 def validate_price_lags_positive(cls, v: tuple[int, ...]) -> tuple[int, ...]:
     """Ensure all price lags are positive."""
+    if not v:
+        raise ValueError("At least one price lag must be specified")
     if any(lag <= 0 for lag in v):
         raise ValueError("All price lags must be positive integers")
     return v
186-207: Mutable dict in frozen model undermines immutability guarantee.

ImputationConfig.strategies is a dict, which is mutable. Although Pydantic v2's frozen=True prevents reassignment of the field itself (e.g., config.strategies = {...}), it does not prevent in-place mutation of the dict contents (e.g., config.strategies["new_col"] = "zero"). This is known as "faux immutability" and can undermine the intent of a frozen model.

Consider using an immutable mapping type (e.g., types.MappingProxyType or a frozendict library) or document this limitation.
app/features/featuresets/routes.py (1)
105-139: Extract shared record-to-FeatureRow conversion logic.

The record conversion logic (handling NaN/None in features and date extraction) is nearly identical between compute_features and preview_features. Consider extracting a helper function to reduce duplication and ensure consistent behavior.
♻️ Suggested helper extraction
def _convert_records_to_rows(
    records: list[dict],
    feature_columns: list[str],
    date_column: str,
) -> list[FeatureRow]:
    """Convert DataFrame records to FeatureRow objects."""
    rows: list[FeatureRow] = []
    for record in records:
        features: dict[str, float | int | None] = {}
        for col in feature_columns:
            val = record.get(col)
            if val is None or (isinstance(val, float) and math.isnan(val)):
                features[col] = None
            elif isinstance(val, (int, float)):
                features[col] = float(val) if isinstance(val, float) else int(val)
            else:
                features[col] = None

        date_val = record.get(date_column)
        if isinstance(date_val, pd.Timestamp):
            row_date = date_val.date()
        elif isinstance(date_val, date_type):
            row_date = date_val
        elif date_val is not None and hasattr(date_val, "date"):
            row_date = date_val.date()
        else:
            raise ValueError(f"Cannot extract date from {type(date_val)}")

        rows.append(
            FeatureRow(
                date=row_date,
                store_id=int(record.get("store_id", 0)),
                product_id=int(record.get("product_id", 0)),
                features=features,
            )
        )
    return rows
Also applies to: 243-276
examples/compute_features_demo.py (1)
22-24: Consider making API_BASE configurable.

Hardcoding localhost:8123 limits flexibility. Consider using an environment variable with a default for easier testing against different environments.
♻️ Suggested improvement
+import os
+
 # API configuration
-API_BASE = "http://localhost:8123"
+API_BASE = os.environ.get("FORECASTLAB_API_BASE", "http://localhost:8123")
 FEATURES_ENDPOINT = f"{API_BASE}/featuresets"
app/features/featuresets/service.py (4)
26-27: Unused TYPE_CHECKING block.

The TYPE_CHECKING block contains only pass and no conditional imports. Consider removing it or adding the intended type-only imports.
🔧 Proposed fix
-if TYPE_CHECKING:
-    pass
-
183-191: Inconsistent use of df vs result in lag computation.

Line 186-188 uses df (the original input) for the groupby/shift operation, but then assigns to result. This works because df and result have the same data at this point (result = df.copy()), but it's inconsistent with other methods like _compute_rolling_features which also use df. While functionally correct here, using result consistently would be clearer for maintainability.

267-275: Cyclical encoding for month may not correctly represent adjacency of December and January.

The current formula np.sin(2 * np.pi * month / 12) maps month values 1-12. However, this means:

January (1) → sin(2π/12) ≈ 0.5

December (12) → sin(2π) = 0

For proper cyclical encoding where December and January should be adjacent, consider using (month - 1) / 12 to map months to [0, 11/12]:
🔧 Proposed fix
         if config.include_month:
             month = dates.dt.month
             if config.use_cyclical_encoding:
-                result["month_sin"] = np.sin(2 * np.pi * month / 12)
-                result["month_cos"] = np.cos(2 * np.pi * month / 12)
+                result["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12)
+                result["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12)
                 columns.extend(["month_sin", "month_cos"])
386-444: Consider adding explicit error handling for database operations.

The load_sales_data method executes a database query but doesn't handle potential exceptions (e.g., connection failures, query timeouts). While the caller (compute_features_for_series) may handle this at a higher level, and the routes.py shows DatabaseError handling, documenting or handling database errors here could improve debuggability.

coderabbitai · 2026-01-31T23:18:17Z

+            else:
+                raise ValueError(f"Cannot extract date from {type(date_val)}")


⚠️ Potential issue | 🟡 Minor

Unhandled ValueError may leak as 500 error.

The ValueError raised here for invalid date types will propagate as an unhandled exception, resulting in a generic 500 error. Consider catching this and converting to a more informative error response (e.g., ValidationError or a custom error with details).

🛡️ Suggested error handling

+from app.core.exceptions import ValidationError ... else: - raise ValueError(f"Cannot extract date from {type(date_val)}") + raise ValidationError( + message=f"Cannot extract date from column '{request.config.date_column}'", + details={"date_type": str(type(date_val))}, + )

🤖 Prompt for AI Agents

In `@app/features/featuresets/routes.py` around lines 129 - 130, The raise ValueError for unsupported date types (the branch raising "Cannot extract date from {type(date_val)}") can surface as a 500; update the handler around that logic in app/features/featuresets/routes.py so invalid date types are caught and translated into a client-facing error (e.g., raise a ValidationError or an HTTP 400/BadRequest exception) that includes the offending type and a clear message; locate the block that inspects date_val, replace or wrap the ValueError with a caught/converted error response (or re-raise a framework ValidationError/HTTPException) so callers receive a 4xx with details instead of an unhandled 500.

coderabbitai · 2026-01-31T23:18:17Z

+                    store_id=int(record.get("store_id", 0)),
+                    product_id=int(record.get("product_id", 0)),


⚠️ Potential issue | 🟡 Minor

Silent default of 0 for missing store_id/product_id.

Using record.get("store_id", 0) and record.get("product_id", 0) silently defaults to 0 if these columns are missing. Since the API validates that store_id >= 1 and product_id >= 1, a 0 value would indicate corrupted data. Consider raising an error or logging a warning instead of silently defaulting.

🤖 Prompt for AI Agents

In `@app/features/featuresets/routes.py` around lines 135 - 136, The code silently defaults missing store_id/product_id to 0 by using record.get("store_id", 0) / record.get("product_id", 0); update the parsing logic that constructs the store_id/product_id (where record is used) to explicitly validate presence and value (e.g., require the keys exist and that int(record["store_id"]) >= 1 and int(record["product_id"]) >= 1), and on failure raise a clear exception (ValueError) or log an error including the offending record content instead of silently using 0 so corrupted rows are surfaced immediately.

coderabbitai · 2026-01-31T23:18:17Z

+            elif strategy == "bfill":
+                result[col] = result.groupby(self.entity_cols, observed=True)[col].bfill()
+            elif strategy == "mean":
+                result[col] = result.groupby(self.entity_cols, observed=True)[col].transform(
+                    lambda x: x.fillna(x.mean())
+                )


⚠️ Potential issue | 🟡 Minor

Potential time leakage in bfill and mean imputation strategies.

While the data is filtered to cutoff_date, these strategies can still cause subtle time leakage within the training window:

bfill: Fills missing values with future values from later rows in the same series

mean: Uses the mean of all values in the group, including those after the current row

For time-series forecasting, this means a row at time t could be imputed using information from time t+1, t+2, etc., which wouldn't be available at prediction time.

Consider:

Documenting this limitation in the docstring

Adding an "expanding mean" strategy that only uses past values

Warning users in configuration validation

🤖 Prompt for AI Agents

In `@app/features/featuresets/service.py` around lines 324 - 329, The bfill and mean strategies (in the branch where strategy == "bfill" and strategy == "mean" that operate on result grouped by self.entity_cols) leak future information; update the function/class docstring to note this limitation, add an "expanding_mean" (or "expanding-mean") option that computes groupby(...)[col].transform(lambda x: x.expanding().mean().fillna(...)) so each row only uses past values, and in configuration/validation logic emit a warning or error when users select "bfill" or plain "mean" for time-series use to guide them toward "expanding_mean". Ensure references to the symbols result, self.entity_cols, strategy, bfill, mean, and expanding_mean are used so reviewers can locate the changes.

coderabbitai · 2026-01-31T23:18:17Z

+            if config.include_price_change:
+                # Price change vs 7 days ago
+                result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[
+                    "unit_price"
+                ].pct_change(periods=7)
+                columns.append("price_pct_change_7d")


⚠️ Potential issue | 🟠 Major

Potential leakage in price_pct_change_7d calculation.

The current implementation uses pct_change(periods=7) directly, which compares the current price to the price 7 days ago. At prediction time for day T, you wouldn't know price_T—only price_{T-1} and earlier.

To maintain consistency with the time-safe approach used elsewhere (e.g., shift(1) before rolling), consider shifting before calculating percent change:

🛡️ Proposed fix

if config.include_price_change: - # Price change vs 7 days ago - result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[ + # Price change: yesterday's price vs 8 days ago (shifted to avoid leakage) + result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[ "unit_price" - ].pct_change(periods=7) + ].shift(1).pct_change(periods=7) columns.append("price_pct_change_7d")

🤖 Prompt for AI Agents

In `@app/features/featuresets/service.py` around lines 362 - 367, The price_pct_change_7d calculation leaks by using the current row's unit_price; change the pipeline to compute percent change on a 1-day-shifted series so only historical prices are used (i.e., take df.groupby(self.entity_cols, observed=True)["unit_price"].shift(1) and then .pct_change(periods=7) so the feature at time T uses prices up through T-1). Keep the result key price_pct_change_7d and columns.append("price_pct_change_7d") as before.

coderabbitai · 2026-01-31T23:18:17Z

+- https://scikit-learn.org/stable/modules/compose.html
+- https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast
+- https://www.sktime.net/en/stable/api_reference/transformations.html


⚠️ Potential issue | 🟡 Minor

Fix markdownlint MD034: replace bare URLs with links.
These links should be wrapped for lint compliance.

📝 Suggested fix

-- https://scikit-learn.org/stable/modules/compose.html -- https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast -- https://www.sktime.net/en/stable/api_reference/transformations.html +- [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html) +- [MLForecast blog](https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast) +- [sktime transformations API](https://www.sktime.net/en/stable/api_reference/transformations.html)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- https://scikit-learn.org/stable/modules/compose.html

- https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast

- https://www.sktime.net/en/stable/api_reference/transformations.html

- [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html)

- [MLForecast blog](https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast)

- [sktime transformations API](https://www.sktime.net/en/stable/api_reference/transformations.html)

🧰 Tools

🪛 markdownlint-cli2 (0.20.0)

[warning] 21-21: Bare URL used

(MD034, no-bare-urls)

[warning] 22-22: Bare URL used

(MD034, no-bare-urls)

[warning] 23-23: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents

In `@INITIAL-4.md` around lines 21 - 23, Replace the three bare URLs (https://scikit-learn.org/stable/modules/compose.html, https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast, https://www.sktime.net/en/stable/api_reference/transformations.html) with Markdown links using the [text](url) syntax (e.g., [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html)) so they are not bare URLs and comply with markdownlint MD034; update the visible link text to a concise descriptive label for each URL.

coderabbitai · 2026-01-31T23:18:17Z

+- https://scikit-learn.org/stable/modules/compose.html
+- https://scikit-learn.org/stable/glossary.html
+- https://scikit-learn.org/stable/model_persistence.html


⚠️ Potential issue | 🟡 Minor

Fix markdownlint MD034: replace bare URLs with links.
The linter warnings are valid; wrap these in link syntax.

📝 Suggested fix

-- https://scikit-learn.org/stable/modules/compose.html -- https://scikit-learn.org/stable/glossary.html -- https://scikit-learn.org/stable/model_persistence.html +- [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html) +- [scikit-learn glossary](https://scikit-learn.org/stable/glossary.html) +- [scikit-learn model persistence](https://scikit-learn.org/stable/model_persistence.html)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- https://scikit-learn.org/stable/modules/compose.html

- https://scikit-learn.org/stable/glossary.html

- https://scikit-learn.org/stable/model_persistence.html

- [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html)

- [scikit-learn glossary](https://scikit-learn.org/stable/glossary.html)

- [scikit-learn model persistence](https://scikit-learn.org/stable/model_persistence.html)

🧰 Tools

🪛 markdownlint-cli2 (0.20.0)

[warning] 31-31: Bare URL used

(MD034, no-bare-urls)

[warning] 32-32: Bare URL used

(MD034, no-bare-urls)

[warning] 33-33: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents

In `@INITIAL-5.md` around lines 31 - 33, Replace the bare URLs with markdown links by converting each plain URL into the [text](url) form; specifically update the three occurrences (https://scikit-learn.org/stable/modules/compose.html, https://scikit-learn.org/stable/glossary.html, https://scikit-learn.org/stable/model_persistence.html) to descriptive link text (e.g., [Scikit-learn: ColumnTransformer documentation](https://scikit-learn.org/stable/modules/compose.html)) so the file no longer triggers MD034; edit the lines containing those URLs in INITIAL-5.md (look for the exact URL strings) and replace them with appropriate markdown link syntax.

coderabbitai · 2026-01-31T23:18:17Z

+- `FeatureSetConfig` — Pydantic-based configuration schema with versioning
+- `FeatureEngineeringService` — Core service computing features with cutoff enforcement
+- `POST /featuresets/compute` — API endpoint to compute features for a series
+- `GET /featuresets/preview` — Preview feature output for debugging/exploration
+- Time-safety enforced at every level (lags, rolling, calendar)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify the actual HTTP method used for the preview endpoint rg -n "featuresets/preview" --type py

Repository: w7-mgfcode/ForecastLabAI

Length of output: 152

🏁 Script executed:

#!/bin/bash # Search for the preview endpoint route definition more broadly rg -n "preview" --type py -B 2 -A 2 | grep -E "(def|@|/featuresets)"

Repository: w7-mgfcode/ForecastLabAI

Length of output: 3366

Change preview endpoint from GET to POST in documentation.
The actual route implementation uses @router.post("/preview") (app/features/featuresets/routes.py:178), not GET. Update the PRP to reflect POST /featuresets/preview for consistency.

🤖 Prompt for AI Agents

In `@PRPs/PRP-4-feature-engineering.md` around lines 8 - 12, The PRP lists the preview endpoint as GET but the implemented route uses router.post("/preview"); update the documentation entry in PRP-4-feature-engineering.md to read "POST /featuresets/preview" instead of "GET /featuresets/preview" so it matches the implemented route (router.post("/preview")) and the API behavior.

w7-learn and others added 3 commits January 31, 2026 21:57

docs: update INITIAL-5.md

dcb4e7c

sourcery-ai Bot reviewed Jan 31, 2026

View reviewed changes

w7-mgfcode closed this Jan 31, 2026

coderabbitai Bot reviewed Jan 31, 2026

View reviewed changes

		else:
		raise ValueError(f"Cannot extract date from {type(date_val)}")

		store_id=int(record.get("store_id", 0)),
		product_id=int(record.get("product_id", 0)),

Conversation

w7-mgfcode commented Jan 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Features

Files Added

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Jan 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

socket-security Bot commented Jan 31, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

w7-mgfcode commented Jan 31, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jan 31, 2026 •

edited

Loading