feat(featuresets): implement time-safe feature engineering layer#23
feat(featuresets): implement time-safe feature engineering layer#23w7-mgfcode wants to merge 3 commits into
Conversation
- Add scikit-learn, mlforecast, and sktime documentation links - Add considerations for imputation logic, agent tooling, and computation overhead - Add model persistence documentation references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add complete feature engineering module with: - Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation) - FeatureEngineeringService with CRITICAL leakage prevention: - Lag features use positive shift() only - Rolling features use shift(1) BEFORE rolling to exclude current observation - Group-aware operations prevent cross-series leakage - Cutoff date filtering before any computation - FastAPI endpoints: POST /featuresets/compute and /featuresets/preview - Comprehensive test suite (55 tests) including leakage prevention tests - Example demo script Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Sorry @w7-mgfcode, you have reached your weekly rate limit of 500000 diff characters.
Please try again later or upgrade to continue using Sourcery
📝 WalkthroughWalkthroughThis PR introduces a production-ready, time-safe feature engineering vertical slice for ForecastLabAI. It adds configuration schemas for lag, rolling, calendar, exogenous, and imputation features; a FeatureEngineeringService implementing leakage-free feature computation; FastAPI endpoints for computing and previewing features; comprehensive tests; and example scripts. Dependencies on pandas and numpy are added. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Route as FastAPI Route
participant DataLoader as FeatureDataLoader
participant DB as Database
participant Service as FeatureEngineeringService
Client->>Route: POST /featuresets/compute (store_id, product_id, cutoff_date, config)
Route->>Route: Validate ComputeFeaturesRequest
Route->>DataLoader: load_sales_data(store_id, product_id, date_range)
DataLoader->>DB: Query SalesDaily records
DB-->>DataLoader: Return sales data
Route->>DataLoader: load_calendar_data(date_range)
DataLoader->>DB: Query Calendar records
DB-->>DataLoader: Return calendar data
Route->>Service: compute_features(merged_df, cutoff_date)
Service->>Service: Sort & filter by cutoff_date
Service->>Service: Apply imputation (group-aware)
Service->>Service: Compute lag features (per-entity)
Service->>Service: Compute rolling features (shifted)
Service->>Service: Compute calendar features (with cyclical encoding)
Service->>Service: Compute exogenous features
Service->>Service: Collect stats (nulls, row counts)
Service-->>Route: Return FeatureComputationResult
Route->>Route: Convert DataFrame to FeatureRow list
Route->>Route: Handle NaN/None normalization
Route-->>Client: ComputeFeaturesResponse (rows, columns, hash, stats)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@app/features/featuresets/routes.py`:
- Around line 135-136: The code silently defaults missing store_id/product_id to
0 by using record.get("store_id", 0) / record.get("product_id", 0); update the
parsing logic that constructs the store_id/product_id (where record is used) to
explicitly validate presence and value (e.g., require the keys exist and that
int(record["store_id"]) >= 1 and int(record["product_id"]) >= 1), and on failure
raise a clear exception (ValueError) or log an error including the offending
record content instead of silently using 0 so corrupted rows are surfaced
immediately.
- Around line 129-130: The raise ValueError for unsupported date types (the
branch raising "Cannot extract date from {type(date_val)}") can surface as a
500; update the handler around that logic in app/features/featuresets/routes.py
so invalid date types are caught and translated into a client-facing error
(e.g., raise a ValidationError or an HTTP 400/BadRequest exception) that
includes the offending type and a clear message; locate the block that inspects
date_val, replace or wrap the ValueError with a caught/converted error response
(or re-raise a framework ValidationError/HTTPException) so callers receive a 4xx
with details instead of an unhandled 500.
In `@app/features/featuresets/service.py`:
- Around line 324-329: The bfill and mean strategies (in the branch where
strategy == "bfill" and strategy == "mean" that operate on result grouped by
self.entity_cols) leak future information; update the function/class docstring
to note this limitation, add an "expanding_mean" (or "expanding-mean") option
that computes groupby(...)[col].transform(lambda x:
x.expanding().mean().fillna(...)) so each row only uses past values, and in
configuration/validation logic emit a warning or error when users select "bfill"
or plain "mean" for time-series use to guide them toward "expanding_mean".
Ensure references to the symbols result, self.entity_cols, strategy, bfill,
mean, and expanding_mean are used so reviewers can locate the changes.
- Around line 362-367: The price_pct_change_7d calculation leaks by using the
current row's unit_price; change the pipeline to compute percent change on a
1-day-shifted series so only historical prices are used (i.e., take
df.groupby(self.entity_cols, observed=True)["unit_price"].shift(1) and then
.pct_change(periods=7) so the feature at time T uses prices up through T-1).
Keep the result key price_pct_change_7d and
columns.append("price_pct_change_7d") as before.
In `@INITIAL-4.md`:
- Around line 21-23: Replace the three bare URLs
(https://scikit-learn.org/stable/modules/compose.html,
https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast,
https://www.sktime.net/en/stable/api_reference/transformations.html) with
Markdown links using the [text](url) syntax (e.g., [scikit-learn compose
module](https://scikit-learn.org/stable/modules/compose.html)) so they are not
bare URLs and comply with markdownlint MD034; update the visible link text to a
concise descriptive label for each URL.
In `@INITIAL-5.md`:
- Around line 31-33: Replace the bare URLs with markdown links by converting
each plain URL into the [text](url) form; specifically update the three
occurrences (https://scikit-learn.org/stable/modules/compose.html,
https://scikit-learn.org/stable/glossary.html,
https://scikit-learn.org/stable/model_persistence.html) to descriptive link text
(e.g., [Scikit-learn: ColumnTransformer
documentation](https://scikit-learn.org/stable/modules/compose.html)) so the
file no longer triggers MD034; edit the lines containing those URLs in
INITIAL-5.md (look for the exact URL strings) and replace them with appropriate
markdown link syntax.
In `@PRPs/PRP-4-feature-engineering.md`:
- Around line 8-12: The PRP lists the preview endpoint as GET but the
implemented route uses router.post("/preview"); update the documentation entry
in PRP-4-feature-engineering.md to read "POST /featuresets/preview" instead of
"GET /featuresets/preview" so it matches the implemented route
(router.post("/preview")) and the API behavior.
🧹 Nitpick comments (13)
pyproject.toml (2)
163-167: Avoid dev-dependency drift between optional-dependencies and dependency-groups.
pandas-stubsonly appears independency-groups.dev, sopip install -e .[dev]won’t include it. Consider adding it to the dev extra (or document that dependency-groups is the required path).♻️ Suggested consolidation
[project.optional-dependencies] dev = [ "pytest>=8.3.0", "pytest-asyncio>=0.24.0", "pytest-cov>=6.0.0", "httpx>=0.28.0", "ruff>=0.8.0", "mypy>=1.13.0", "pyright>=1.1.390", + "pandas-stubs>=2.3.3.260113", ]
19-20: Consider upper-bounding pandas and numpy versions, and plan a staged upgrade for pandas 3.0.0.The specified versions are available: pandas 3.0.0 (released Jan 21, 2026) and numpy 2.4.1 (released Jan 10, 2026) both support Python 3.12. However, pandas 3.0.0 release notes recommend upgrading to pandas 2.3.x first to clear deprecation warnings before migrating to 3.0.0. Adding upper bounds prevents unexpected major version upgrades and is a good practice.
♻️ Proposed guardrail version range
- "pandas>=3.0.0", - "numpy>=2.4.1", + "pandas>=3.0.0,<4.0.0", + "numpy>=2.4.1,<3.0.0",app/core/config.py (1)
38-41: Add Pydantic Field bounds for feature caps.
These caps act as guardrails; addingge/leprevents invalid env overrides and documents intent.♻️ Suggested change
-from pydantic_settings import BaseSettings, SettingsConfigDict +from pydantic import Field +from pydantic_settings import BaseSettings, SettingsConfigDict @@ - feature_max_lookback_days: int = 1095 # 3 years - feature_max_lag: int = 365 - feature_max_window: int = 90 + feature_max_lookback_days: int = Field(1095, ge=30, le=1825, description="Max lookback days") + feature_max_lag: int = Field(365, ge=1, le=730, description="Max lag value") + feature_max_window: int = Field(90, ge=1, le=365, description="Max rolling window")app/features/featuresets/tests/conftest.py (1)
140-147: Consider addingschema_versionto the nested LagConfig for consistency.
This keeps fixtures aligned if schema_version becomes required across configs.♻️ Suggested tweak
- lag_config=LagConfig(lags=(1,)), + lag_config=LagConfig(schema_version="1.0", lags=(1,)),app/features/featuresets/tests/test_service.py (1)
234-248: Consider strengthening the ffill assertion.The assertion
non_null_count >= len(result.df) - 1is permissive and may not catch subtle bugs. If the first row has a valid value, forward fill should result in zero NaNs. Consider asserting the exact expected behavior based on your test fixture's data.app/features/featuresets/schemas.py (2)
177-183: Inconsistent empty tuple handling forprice_lags.Unlike
LagConfig.validate_lags_positiveandRollingConfig.validate_windows_positive, this validator doesn't reject an empty tuple. Consider adding a check for consistency, or document if empty price_lags is intentionally allowed (to disable price lagging while keeping other exogenous features).♻️ Suggested fix for consistency
`@field_validator`("price_lags") `@classmethod` def validate_price_lags_positive(cls, v: tuple[int, ...]) -> tuple[int, ...]: """Ensure all price lags are positive.""" + if not v: + raise ValueError("At least one price lag must be specified") if any(lag <= 0 for lag in v): raise ValueError("All price lags must be positive integers") return v
186-207: Mutable dict in frozen model undermines immutability guarantee.
ImputationConfig.strategiesis adict, which is mutable. Although Pydantic v2'sfrozen=Trueprevents reassignment of the field itself (e.g.,config.strategies = {...}), it does not prevent in-place mutation of the dict contents (e.g.,config.strategies["new_col"] = "zero"). This is known as "faux immutability" and can undermine the intent of a frozen model.Consider using an immutable mapping type (e.g.,
types.MappingProxyTypeor afrozendictlibrary) or document this limitation.app/features/featuresets/routes.py (1)
105-139: Extract shared record-to-FeatureRow conversion logic.The record conversion logic (handling NaN/None in features and date extraction) is nearly identical between
compute_featuresandpreview_features. Consider extracting a helper function to reduce duplication and ensure consistent behavior.♻️ Suggested helper extraction
def _convert_records_to_rows( records: list[dict], feature_columns: list[str], date_column: str, ) -> list[FeatureRow]: """Convert DataFrame records to FeatureRow objects.""" rows: list[FeatureRow] = [] for record in records: features: dict[str, float | int | None] = {} for col in feature_columns: val = record.get(col) if val is None or (isinstance(val, float) and math.isnan(val)): features[col] = None elif isinstance(val, (int, float)): features[col] = float(val) if isinstance(val, float) else int(val) else: features[col] = None date_val = record.get(date_column) if isinstance(date_val, pd.Timestamp): row_date = date_val.date() elif isinstance(date_val, date_type): row_date = date_val elif date_val is not None and hasattr(date_val, "date"): row_date = date_val.date() else: raise ValueError(f"Cannot extract date from {type(date_val)}") rows.append( FeatureRow( date=row_date, store_id=int(record.get("store_id", 0)), product_id=int(record.get("product_id", 0)), features=features, ) ) return rowsAlso applies to: 243-276
examples/compute_features_demo.py (1)
22-24: Consider making API_BASE configurable.Hardcoding
localhost:8123limits flexibility. Consider using an environment variable with a default for easier testing against different environments.♻️ Suggested improvement
+import os + # API configuration -API_BASE = "http://localhost:8123" +API_BASE = os.environ.get("FORECASTLAB_API_BASE", "http://localhost:8123") FEATURES_ENDPOINT = f"{API_BASE}/featuresets"app/features/featuresets/service.py (4)
26-27: UnusedTYPE_CHECKINGblock.The
TYPE_CHECKINGblock contains onlypassand no conditional imports. Consider removing it or adding the intended type-only imports.🔧 Proposed fix
-if TYPE_CHECKING: - pass -
183-191: Inconsistent use ofdfvsresultin lag computation.Line 186-188 uses
df(the original input) for the groupby/shift operation, but then assigns toresult. This works becausedfandresulthave the same data at this point (result = df.copy()), but it's inconsistent with other methods like_compute_rolling_featureswhich also usedf. While functionally correct here, usingresultconsistently would be clearer for maintainability.
267-275: Cyclical encoding for month may not correctly represent adjacency of December and January.The current formula
np.sin(2 * np.pi * month / 12)maps month values 1-12. However, this means:
- January (1) → sin(2π/12) ≈ 0.5
- December (12) → sin(2π) = 0
For proper cyclical encoding where December and January should be adjacent, consider using
(month - 1) / 12to map months to [0, 11/12]:🔧 Proposed fix
if config.include_month: month = dates.dt.month if config.use_cyclical_encoding: - result["month_sin"] = np.sin(2 * np.pi * month / 12) - result["month_cos"] = np.cos(2 * np.pi * month / 12) + result["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12) + result["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12) columns.extend(["month_sin", "month_cos"])
386-444: Consider adding explicit error handling for database operations.The
load_sales_datamethod executes a database query but doesn't handle potential exceptions (e.g., connection failures, query timeouts). While the caller (compute_features_for_series) may handle this at a higher level, and the routes.py showsDatabaseErrorhandling, documenting or handling database errors here could improve debuggability.
| else: | ||
| raise ValueError(f"Cannot extract date from {type(date_val)}") |
There was a problem hiding this comment.
Unhandled ValueError may leak as 500 error.
The ValueError raised here for invalid date types will propagate as an unhandled exception, resulting in a generic 500 error. Consider catching this and converting to a more informative error response (e.g., ValidationError or a custom error with details).
🛡️ Suggested error handling
+from app.core.exceptions import ValidationError
...
else:
- raise ValueError(f"Cannot extract date from {type(date_val)}")
+ raise ValidationError(
+ message=f"Cannot extract date from column '{request.config.date_column}'",
+ details={"date_type": str(type(date_val))},
+ )🤖 Prompt for AI Agents
In `@app/features/featuresets/routes.py` around lines 129 - 130, The raise
ValueError for unsupported date types (the branch raising "Cannot extract date
from {type(date_val)}") can surface as a 500; update the handler around that
logic in app/features/featuresets/routes.py so invalid date types are caught and
translated into a client-facing error (e.g., raise a ValidationError or an HTTP
400/BadRequest exception) that includes the offending type and a clear message;
locate the block that inspects date_val, replace or wrap the ValueError with a
caught/converted error response (or re-raise a framework
ValidationError/HTTPException) so callers receive a 4xx with details instead of
an unhandled 500.
| store_id=int(record.get("store_id", 0)), | ||
| product_id=int(record.get("product_id", 0)), |
There was a problem hiding this comment.
Silent default of 0 for missing store_id/product_id.
Using record.get("store_id", 0) and record.get("product_id", 0) silently defaults to 0 if these columns are missing. Since the API validates that store_id >= 1 and product_id >= 1, a 0 value would indicate corrupted data. Consider raising an error or logging a warning instead of silently defaulting.
🤖 Prompt for AI Agents
In `@app/features/featuresets/routes.py` around lines 135 - 136, The code silently
defaults missing store_id/product_id to 0 by using record.get("store_id", 0) /
record.get("product_id", 0); update the parsing logic that constructs the
store_id/product_id (where record is used) to explicitly validate presence and
value (e.g., require the keys exist and that int(record["store_id"]) >= 1 and
int(record["product_id"]) >= 1), and on failure raise a clear exception
(ValueError) or log an error including the offending record content instead of
silently using 0 so corrupted rows are surfaced immediately.
| elif strategy == "bfill": | ||
| result[col] = result.groupby(self.entity_cols, observed=True)[col].bfill() | ||
| elif strategy == "mean": | ||
| result[col] = result.groupby(self.entity_cols, observed=True)[col].transform( | ||
| lambda x: x.fillna(x.mean()) | ||
| ) |
There was a problem hiding this comment.
Potential time leakage in bfill and mean imputation strategies.
While the data is filtered to cutoff_date, these strategies can still cause subtle time leakage within the training window:
bfill: Fills missing values with future values from later rows in the same seriesmean: Uses the mean of all values in the group, including those after the current row
For time-series forecasting, this means a row at time t could be imputed using information from time t+1, t+2, etc., which wouldn't be available at prediction time.
Consider:
- Documenting this limitation in the docstring
- Adding an "expanding mean" strategy that only uses past values
- Warning users in configuration validation
🤖 Prompt for AI Agents
In `@app/features/featuresets/service.py` around lines 324 - 329, The bfill and
mean strategies (in the branch where strategy == "bfill" and strategy == "mean"
that operate on result grouped by self.entity_cols) leak future information;
update the function/class docstring to note this limitation, add an
"expanding_mean" (or "expanding-mean") option that computes
groupby(...)[col].transform(lambda x: x.expanding().mean().fillna(...)) so each
row only uses past values, and in configuration/validation logic emit a warning
or error when users select "bfill" or plain "mean" for time-series use to guide
them toward "expanding_mean". Ensure references to the symbols result,
self.entity_cols, strategy, bfill, mean, and expanding_mean are used so
reviewers can locate the changes.
| if config.include_price_change: | ||
| # Price change vs 7 days ago | ||
| result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[ | ||
| "unit_price" | ||
| ].pct_change(periods=7) | ||
| columns.append("price_pct_change_7d") |
There was a problem hiding this comment.
Potential leakage in price_pct_change_7d calculation.
The current implementation uses pct_change(periods=7) directly, which compares the current price to the price 7 days ago. At prediction time for day T, you wouldn't know price_T—only price_{T-1} and earlier.
To maintain consistency with the time-safe approach used elsewhere (e.g., shift(1) before rolling), consider shifting before calculating percent change:
🛡️ Proposed fix
if config.include_price_change:
- # Price change vs 7 days ago
- result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[
+ # Price change: yesterday's price vs 8 days ago (shifted to avoid leakage)
+ result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[
"unit_price"
- ].pct_change(periods=7)
+ ].shift(1).pct_change(periods=7)
columns.append("price_pct_change_7d")🤖 Prompt for AI Agents
In `@app/features/featuresets/service.py` around lines 362 - 367, The
price_pct_change_7d calculation leaks by using the current row's unit_price;
change the pipeline to compute percent change on a 1-day-shifted series so only
historical prices are used (i.e., take df.groupby(self.entity_cols,
observed=True)["unit_price"].shift(1) and then .pct_change(periods=7) so the
feature at time T uses prices up through T-1). Keep the result key
price_pct_change_7d and columns.append("price_pct_change_7d") as before.
| - https://scikit-learn.org/stable/modules/compose.html | ||
| - https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast | ||
| - https://www.sktime.net/en/stable/api_reference/transformations.html |
There was a problem hiding this comment.
Fix markdownlint MD034: replace bare URLs with links.
These links should be wrapped for lint compliance.
📝 Suggested fix
-- https://scikit-learn.org/stable/modules/compose.html
-- https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast
-- https://www.sktime.net/en/stable/api_reference/transformations.html
+- [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html)
+- [MLForecast blog](https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast)
+- [sktime transformations API](https://www.sktime.net/en/stable/api_reference/transformations.html)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - https://scikit-learn.org/stable/modules/compose.html | |
| - https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast | |
| - https://www.sktime.net/en/stable/api_reference/transformations.html | |
| - [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html) | |
| - [MLForecast blog](https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast) | |
| - [sktime transformations API](https://www.sktime.net/en/stable/api_reference/transformations.html) |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 21-21: Bare URL used
(MD034, no-bare-urls)
[warning] 22-22: Bare URL used
(MD034, no-bare-urls)
[warning] 23-23: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In `@INITIAL-4.md` around lines 21 - 23, Replace the three bare URLs
(https://scikit-learn.org/stable/modules/compose.html,
https://www.nixtla.io/blog/automated-time-series-feature-engineering-with-mlforecast?utm_source=chatgpt.com#introduction-to-mlforecast,
https://www.sktime.net/en/stable/api_reference/transformations.html) with
Markdown links using the [text](url) syntax (e.g., [scikit-learn compose
module](https://scikit-learn.org/stable/modules/compose.html)) so they are not
bare URLs and comply with markdownlint MD034; update the visible link text to a
concise descriptive label for each URL.
| - https://scikit-learn.org/stable/modules/compose.html | ||
| - https://scikit-learn.org/stable/glossary.html | ||
| - https://scikit-learn.org/stable/model_persistence.html |
There was a problem hiding this comment.
Fix markdownlint MD034: replace bare URLs with links.
The linter warnings are valid; wrap these in link syntax.
📝 Suggested fix
-- https://scikit-learn.org/stable/modules/compose.html
-- https://scikit-learn.org/stable/glossary.html
-- https://scikit-learn.org/stable/model_persistence.html
+- [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html)
+- [scikit-learn glossary](https://scikit-learn.org/stable/glossary.html)
+- [scikit-learn model persistence](https://scikit-learn.org/stable/model_persistence.html)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - https://scikit-learn.org/stable/modules/compose.html | |
| - https://scikit-learn.org/stable/glossary.html | |
| - https://scikit-learn.org/stable/model_persistence.html | |
| - [scikit-learn compose module](https://scikit-learn.org/stable/modules/compose.html) | |
| - [scikit-learn glossary](https://scikit-learn.org/stable/glossary.html) | |
| - [scikit-learn model persistence](https://scikit-learn.org/stable/model_persistence.html) |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 31-31: Bare URL used
(MD034, no-bare-urls)
[warning] 32-32: Bare URL used
(MD034, no-bare-urls)
[warning] 33-33: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In `@INITIAL-5.md` around lines 31 - 33, Replace the bare URLs with markdown links
by converting each plain URL into the [text](url) form; specifically update the
three occurrences (https://scikit-learn.org/stable/modules/compose.html,
https://scikit-learn.org/stable/glossary.html,
https://scikit-learn.org/stable/model_persistence.html) to descriptive link text
(e.g., [Scikit-learn: ColumnTransformer
documentation](https://scikit-learn.org/stable/modules/compose.html)) so the
file no longer triggers MD034; edit the lines containing those URLs in
INITIAL-5.md (look for the exact URL strings) and replace them with appropriate
markdown link syntax.
| - `FeatureSetConfig` — Pydantic-based configuration schema with versioning | ||
| - `FeatureEngineeringService` — Core service computing features with cutoff enforcement | ||
| - `POST /featuresets/compute` — API endpoint to compute features for a series | ||
| - `GET /featuresets/preview` — Preview feature output for debugging/exploration | ||
| - Time-safety enforced at every level (lags, rolling, calendar) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the actual HTTP method used for the preview endpoint
rg -n "featuresets/preview" --type pyRepository: w7-mgfcode/ForecastLabAI
Length of output: 152
🏁 Script executed:
#!/bin/bash
# Search for the preview endpoint route definition more broadly
rg -n "preview" --type py -B 2 -A 2 | grep -E "(def|@|/featuresets)"Repository: w7-mgfcode/ForecastLabAI
Length of output: 3366
Change preview endpoint from GET to POST in documentation.
The actual route implementation uses @router.post("/preview") (app/features/featuresets/routes.py:178), not GET. Update the PRP to reflect POST /featuresets/preview for consistency.
🤖 Prompt for AI Agents
In `@PRPs/PRP-4-feature-engineering.md` around lines 8 - 12, The PRP lists the
preview endpoint as GET but the implemented route uses router.post("/preview");
update the documentation entry in PRP-4-feature-engineering.md to read "POST
/featuresets/preview" instead of "GET /featuresets/preview" so it matches the
implemented route (router.post("/preview")) and the API behavior.
Summary
Features
Files Added
app/features/featuresets/- Complete module (schemas, service, routes)app/features/featuresets/tests/- 55 unit testsexamples/compute_features_demo.py- Demo scriptPRPs/PRP-4-feature-engineering.md- PRP documentTest plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
/featuresets/computeand/featuresets/previewAPI endpoints for configurable feature generation.Documentation
Dependencies
✏️ Tip: You can customize this high-level summary in your review settings.