Release: Phase 3 Feature Engineering#25
Conversation
* docs: update INITIAL-4 and INITIAL-5 with additional references - Add scikit-learn, mlforecast, and sktime documentation links - Add considerations for imputation logic, agent tooling, and computation overhead - Add model persistence documentation references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(featuresets): implement time-safe feature engineering layer Add complete feature engineering module with: - Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation) - FeatureEngineeringService with CRITICAL leakage prevention: - Lag features use positive shift() only - Rolling features use shift(1) BEFORE rolling to exclude current observation - Group-aware operations prevent cross-series leakage - Cutoff date filtering before any computation - FastAPI endpoints: POST /featuresets/compute and /featuresets/preview - Comprehensive test suite (55 tests) including leakage prevention tests - Example demo script Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update INITIAL-5.md * docs: update documentation for Phase 3 Feature Engineering - README.md: Add featuresets module to project structure and API endpoints - docs/ARCHITECTURE.md: Add Feature Engineering section (section 6) - docs/PHASE-index.md: Mark Phase 3 as completed with summary - docs/PHASE/3-FEATURE_ENGINEERING.md: Create detailed phase documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(featuresets): address code review feedback and prevent data leakage Routes: - Validate store_id/product_id presence (no silent defaults to 0) - Convert ValueError for unsupported date types to HTTP 400 Service: - Add expanding_mean imputation strategy (time-safe alternative) - Add warnings when bfill/mean strategies are used (leakage risk) - Fix price_pct_change_7d to use shift(1) before pct_change Schemas: - Add expanding_mean to ImputationConfig Literal type - Document time-safety of each imputation strategy - Fix PreviewFeaturesRequest docstring: GET → POST Documentation: - Convert bare URLs to markdown links in INITIAL-4.md, INITIAL-5.md - Fix PRP-4 to show POST for preview endpoint Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: format schemas.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
📝 WalkthroughWalkthroughAdds a time-safe feature engineering subsystem: Pydantic configs, a FeatureEngineeringService, data loader and compute helpers, FastAPI endpoints (/featuresets/compute, /featuresets/preview), extensive tests, docs, examples, and new pandas/numpy dependencies; integrates router and config settings into the app. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant API as API Routes
participant Loader as FeatureData<br/>Loader
participant DB as Database
participant Service as FeatureEngineering<br/>Service
Client->>API: POST /featuresets/compute (store_id, product_id, cutoff_date, config)
activate API
API->>Loader: load_sales_data(store_id, product_id, start_date, end_date)
activate Loader
Loader->>DB: Query sales & calendar
DB-->>Loader: Return rows
Loader-->>API: DataFrame
deactivate Loader
API->>Service: compute_features(df, cutoff_date)
activate Service
Service->>Service: _apply_imputation (group-aware)
Service->>Service: _compute_lag_features (grouped shift)
Service->>Service: _compute_rolling_features (shift(1) then agg)
Service->>Service: _compute_calendar_features
Service->>Service: _compute_exogenous_features
Service-->>API: FeatureComputationResult (df, feature_columns, stats)
deactivate Service
API->>API: Serialize FeatureRow list, null_counts, duration_ms
API-->>Client: ComputeFeaturesResponse
deactivate API
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai help |
ChatThere are 3 ways to chat with CodeRabbit:
CodeRabbit commands
Other keywords and placeholders
CodeRabbit configuration file (
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@app/features/featuresets/service.py`:
- Around line 215-217: The loop that sets min_per incorrectly treats 0 as falsy;
change the expression in the loop (where min_per is assigned for each window in
config.windows) to explicitly check for None: use "config.min_periods if
config.min_periods is not None else window" (so an explicit 0 is preserved and
only None falls back to window), updating the assignment in the function/method
that iterates over config.windows and prepares arguments for pandas' rolling().
In `@app/features/featuresets/tests/test_schemas.py`:
- Around line 118-138: Update the TestImputationConfig.test_valid_strategies to
include the missing "expanding_mean" strategy in the ImputationConfig(...)
strategies dict and update the assertion to expect 6 strategies; locate the test
class TestImputationConfig and method test_valid_strategies and modify the
strategies mapping and the assert len(config.strategies) to reflect the six
supported strategies ("zero","ffill","bfill","mean","expanding_mean","drop").
In `@docs/PHASE/3-FEATURE_ENGINEERING.md`:
- Around line 3-6: Update the release reference string "Release: PR `#24`" to the
current PR number by replacing it with "Release: PR `#25`" in the
PHASE/3-FEATURE_ENGINEERING.md content (look for the exact text "Release: PR
`#24`"); ensure the surrounding markdown remains unchanged and commit the updated
file.
In `@PRPs/PRP-4-feature-engineering.md`:
- Around line 40-42: The PRP currently documents the `/featuresets/preview`
endpoint as GET but the implementation uses POST; update every occurrence in
this document to reflect POST (change the checklist line and any other mentions
of GET for `/featuresets/preview`) so the spec matches the implemented endpoint.
Ensure references to the preview endpoint (e.g., "/featuresets/preview", any
summary lines or bullets) use POST and adjust any example calls or verbs
accordingly.
🧹 Nitpick comments (7)
pyproject.toml (1)
164-167: Consider consolidating dev dependencies in a single location.The
[dependency-groups]section (PEP 735) addspandas-stubsseparately from the existing[project.optional-dependencies] devgroup (lines 23-32). Having dev dependencies split across two sections could cause confusion.Consider moving
pandas-stubsto the existingdevoptional-dependencies for consistency:♻️ Suggested consolidation
[project.optional-dependencies] dev = [ "pytest>=8.3.0", "pytest-asyncio>=0.24.0", "pytest-cov>=6.0.0", "httpx>=0.28.0", "ruff>=0.8.0", "mypy>=1.13.0", "pyright>=1.1.390", + "pandas-stubs>=2.3.3.260113", ]Then remove the
[dependency-groups]section if no other tools rely on it.app/features/featuresets/schemas.py (1)
177-184: Minor inconsistency:price_lagsallows empty tuple unlikelags.The
LagConfig.lagsvalidator requires at least one lag (line 75-76), butExogenousConfig.price_lagsvalidator only checks for positive values without rejecting an empty tuple. This allowsprice_lags=()which may be intentional ifinclude_price=False, but could be confusing.Consider adding a similar check if empty price_lags should be rejected when
include_price=True:♻️ Optional: Add conditional validation
`@field_validator`("price_lags") `@classmethod` def validate_price_lags_positive(cls, v: tuple[int, ...]) -> tuple[int, ...]: """Ensure all price lags are positive.""" + # Note: Empty tuple is allowed when include_price=False if any(lag <= 0 for lag in v): raise ValueError("All price lags must be positive integers") return vOr add a
model_validatorto enforce non-empty price_lags wheninclude_price=True.app/features/featuresets/routes.py (2)
259-311: Extract shared row-conversion logic to avoid drift between compute and preview.The row/feature extraction block is duplicated; any future fix (like the numeric handling above) will need to be applied twice. Consider a small helper to centralize this logic.
3-8: The current NaN and numeric handling works correctly with pandas 3.0.0.Pandas 3.0.0's
to_dict("records")converts NumPy scalars (e.g.,np.int64,np.float64) to native Pythonintandfloattypes, so they pass theisinstance(val, (int, float))check without issue. The current code usingisinstance(val, float) and math.isnan(val)correctly detects NaN values.That said, using
pd.isna(val)is slightly more robust for edge cases, and usingnumbers.Integralandnumbers.Realfrom thenumbersmodule is more semantically correct for numeric type checking. Consider these as optional improvements rather than bug fixes.docs/PHASE-index.md (1)
147-152: Clarify imputation defaults vs full strategy set.The bullet reads like the only imputation options are zero/ffill. Consider marking them as defaults and pointing to the config for the full set to avoid ambiguity.
✏️ Suggested wording
- **Imputation**: Zero-fill for sales, forward-fill for prices + **Imputation**: Default strategies like zero-fill for sales and forward-fill for prices (see ImputationConfig for full set)docs/PHASE/3-FEATURE_ENGINEERING.md (1)
9-12: Tighten redundant wording in the executive summary.Small wording tweak for clarity (matches the style lint hint).
✏️ Suggested wording
- ensuring that features computed at any cutoff date only use data available at that point in time. + ensuring that features computed at any cutoff date only use data available at that time.docs/ARCHITECTURE.md (1)
207-210: Trim repeated “only” in the lag bullet.Minor style cleanup per lint hint.
✏️ Suggested wording
- - **Lag features**: `shift(lag)` with positive lag only — ensures only past data accessed + - **Lag features**: `shift(lag)` with positive lag only — ensures past data is accessed
Service: - Fix min_periods falsy check to explicit None check (preserves 0) Tests: - Add expanding_mean to test_valid_strategies, expect 6 strategies Documentation: - Update PR reference from #24 to #25 - Fix all GET /featuresets/preview to POST in PRP-4 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Service: - Fix min_periods falsy check to explicit None check (preserves 0) Tests: - Add expanding_mean to test_valid_strategies, expect 6 strategies Documentation: - Update PR reference from #24 to #25 - Fix all GET /featuresets/preview to POST in PRP-4 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@app/features/featuresets/service.py`:
- Around line 257-296: _compute_calendar_features currently ignores
config.include_is_holiday so the holiday flag never gets added to outputs;
modify the method to check config.include_is_holiday and when true set
result["is_holiday"] = dates["is_holiday"].astype(int) (or dates.dt.is_holiday
if using a datetime accessor) and append "is_holiday" to columns so the feature
is returned and included in stats; ensure you follow the same casting/typing
pattern used for is_weekend/is_month_end in the function.
- Around line 360-404: The _compute_exogenous_features method currently ignores
ExogenousConfig.include_promo and include_inventory; either implement those
feature computations or fail fast. Update _compute_exogenous_features to (A)
implement promo features when config.include_promo is true by reading the promo
column (e.g., "is_promo" or configured name), creating lagged promo columns
analogous to price (use groupby(self.entity_cols).shift(lag) for each
config.promo_lags and any pct/change or flag derivations), and (B) implement
inventory features when config.include_inventory is true by using the inventory
column (e.g., "on_hand" or configured name) to produce lagged inventory columns
and derived indicators (like inventory_pct_change or inventory_level_lag_N)
similar to how price/stockout are done; if you prefer validation instead, add an
explicit check at the start of _compute_exogenous_features that raises a
RuntimeError when config.include_promo or config.include_inventory is true but
no corresponding implementation/columns exist, referencing the ExogenousConfig
flags and the _compute_exogenous_features method so reviewers can locate
changes.
In `@docs/PHASE/3-FEATURE_ENGINEERING.md`:
- Around line 19-33: Update the documentation to include the missing
expanding_mean imputation strategy: in the section describing ImputationConfig
and the deliverables table listing strategies, add "expanding_mean" to the list
of strategies alongside zero, ffill, bfill, mean, drop, and update any
descriptive text to briefly note that expanding_mean computes a cumulative mean
per series (expanding window) as an imputation option; ensure the symbol
ImputationConfig and any mentions of FeatureSetConfig/ComputeFeaturesRequest
remain consistent after adding the new strategy.
In `@PRPs/PRP-4-feature-engineering.md`:
- Around line 773-775: The current pseudocode uses a falsy check when assigning
min_per inside the loop over config.windows (for window in config.windows) which
will override an explicit 0 in config.min_periods; change the condition to
explicitly test for None so min_per preserves 0 (i.e., set min_per to
config.min_periods if config.min_periods is not None, otherwise to window),
updating the assignment that defines min_per.
- Around line 1123-1144: The preview_features() example is calling the compute
endpoint and using lookback_days which doesn't match the Preview request schema;
change the POST URL from "/featuresets/compute" to "/featuresets/preview" and
update the JSON payload to conform to the PreviewRequest schema (remove or
replace "lookback_days" and any compute-specific fields with the
preview-specific fields expected by PreviewRequest), keeping the same
identifying caller function preview_features() so the example exercises the
preview endpoint and schema correctly.
| if config.include_day_of_week: | ||
| dow = dates.dt.dayofweek # 0=Monday, 6=Sunday | ||
| if config.use_cyclical_encoding: | ||
| result["dow_sin"] = np.sin(2 * np.pi * dow / 7) | ||
| result["dow_cos"] = np.cos(2 * np.pi * dow / 7) | ||
| columns.extend(["dow_sin", "dow_cos"]) | ||
| else: | ||
| result["day_of_week"] = dow | ||
| columns.append("day_of_week") | ||
|
|
||
| if config.include_month: | ||
| month = dates.dt.month | ||
| if config.use_cyclical_encoding: | ||
| result["month_sin"] = np.sin(2 * np.pi * month / 12) | ||
| result["month_cos"] = np.cos(2 * np.pi * month / 12) | ||
| columns.extend(["month_sin", "month_cos"]) | ||
| else: | ||
| result["month"] = month | ||
| columns.append("month") | ||
|
|
||
| if config.include_quarter: | ||
| result["quarter"] = dates.dt.quarter | ||
| columns.append("quarter") | ||
|
|
||
| if config.include_year: | ||
| result["year"] = dates.dt.year | ||
| columns.append("year") | ||
|
|
||
| if config.include_is_weekend: | ||
| result["is_weekend"] = dates.dt.dayofweek.isin([5, 6]).astype(int) | ||
| columns.append("is_weekend") | ||
|
|
||
| if config.include_is_month_end: | ||
| result["is_month_end"] = dates.dt.is_month_end.astype(int) | ||
| columns.append("is_month_end") | ||
|
|
||
| # is_holiday would require calendar table lookup | ||
| # Handled separately if data is joined from Calendar table | ||
|
|
||
| return result, columns |
There was a problem hiding this comment.
Honor include_is_holiday in feature outputs.
include_is_holiday is set in config and the calendar data can be merged, but _compute_calendar_features never adds the is_holiday column to feature_columns, so it’s omitted from responses and stats.
🔧 Suggested fix
if config.include_is_month_end:
result["is_month_end"] = dates.dt.is_month_end.astype(int)
columns.append("is_month_end")
- # is_holiday would require calendar table lookup
- # Handled separately if data is joined from Calendar table
+ if config.include_is_holiday and "is_holiday" in result.columns:
+ result["is_holiday"] = result["is_holiday"].astype(int)
+ columns.append("is_holiday")🤖 Prompt for AI Agents
In `@app/features/featuresets/service.py` around lines 257 - 296,
_compute_calendar_features currently ignores config.include_is_holiday so the
holiday flag never gets added to outputs; modify the method to check
config.include_is_holiday and when true set result["is_holiday"] =
dates["is_holiday"].astype(int) (or dates.dt.is_holiday if using a datetime
accessor) and append "is_holiday" to columns so the feature is returned and
included in stats; ensure you follow the same casting/typing pattern used for
is_weekend/is_month_end in the function.
| def _compute_exogenous_features(self, df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]: | ||
| """Compute exogenous features (price, promo, inventory). | ||
|
|
||
| CRITICAL: All exogenous features are lagged to prevent leakage. | ||
|
|
||
| Args: | ||
| df: Input dataframe with exogenous columns. | ||
|
|
||
| Returns: | ||
| Tuple of (dataframe with exogenous features, list of new column names). | ||
| """ | ||
| config = self.config.exogenous_config | ||
| if config is None: | ||
| raise RuntimeError("_compute_exogenous_features called without exogenous_config") | ||
|
|
||
| result = df.copy() | ||
| columns: list[str] = [] | ||
|
|
||
| # Price features (if price column exists) | ||
| if config.include_price and "unit_price" in df.columns: | ||
| for lag in config.price_lags: | ||
| col_name = f"price_lag_{lag}" | ||
| result[col_name] = df.groupby(self.entity_cols, observed=True)["unit_price"].shift( | ||
| lag | ||
| ) | ||
| columns.append(col_name) | ||
|
|
||
| if config.include_price_change: | ||
| # CRITICAL: shift(1) before pct_change to prevent using current price | ||
| # This computes: (price[t-1] - price[t-8]) / price[t-8] | ||
| # Without shift(1), it would use current price at t, causing leakage | ||
| result["price_pct_change_7d"] = df.groupby(self.entity_cols, observed=True)[ | ||
| "unit_price" | ||
| ].transform(lambda x: x.shift(1).pct_change(periods=7)) | ||
| columns.append("price_pct_change_7d") | ||
|
|
||
| # Stockout flag (if inventory column exists) | ||
| if config.include_stockout_flag and "is_stockout" in df.columns: | ||
| # Lagged stockout flag (yesterday's stockout) | ||
| result["stockout_lag_1"] = df.groupby(self.entity_cols, observed=True)[ | ||
| "is_stockout" | ||
| ].shift(1) | ||
| columns.append("stockout_lag_1") | ||
|
|
||
| return result, columns |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for promo/inventory feature references and ExogenousConfig definition
rg -n "include_promo|include_inventory|promo|inventory" app/features/featuresets -g '*.py'Repository: w7-mgfcode/ForecastLabAI
Length of output: 911
🏁 Script executed:
#!/bin/bash
# Find and examine ExogenousConfig definition
fd -e py -x grep -l "class ExogenousConfig" {} \;Repository: w7-mgfcode/ForecastLabAI
Length of output: 105
🏁 Script executed:
#!/bin/bash
# Get the full path and examine ExogenousConfig to see what fields it defines
rg -n "class ExogenousConfig" -A 50 app/features/featuresets -g '*.py'Repository: w7-mgfcode/ForecastLabAI
Length of output: 4094
🏁 Script executed:
#!/bin/bash
# Search for any handling of include_promo or include_inventory beyond what we've seen
rg -n "include_promo|include_inventory" app/features/featuresets -g '*.py' -B 2 -A 2Repository: w7-mgfcode/ForecastLabAI
Length of output: 990
🏁 Script executed:
#!/bin/bash
# Check if there are any other methods in the service that might compute promo/inventory
rg -n "def.*promo|def.*inventory" app/features/featuresets -g '*.py'Repository: w7-mgfcode/ForecastLabAI
Length of output: 50
🏁 Script executed:
#!/bin/bash
# Check if _compute_exogenous_features is called/tested and how these flags are used
rg -n "_compute_exogenous_features" app/features/featuresets -g '*.py' -B 2 -A 2Repository: w7-mgfcode/ForecastLabAI
Length of output: 1249
Implement promo and inventory feature computation or add validation to prevent silent misconfiguration.
ExogenousConfig exposes include_promo (defaults to True) and include_inventory flags, but _compute_exogenous_features() only implements price and stockout features. These flags are silently ignored with no error, creating a functional gap where users expecting promo/inventory features receive none. Either implement the missing feature computation or add explicit validation to raise an error if these flags are enabled.
🤖 Prompt for AI Agents
In `@app/features/featuresets/service.py` around lines 360 - 404, The
_compute_exogenous_features method currently ignores
ExogenousConfig.include_promo and include_inventory; either implement those
feature computations or fail fast. Update _compute_exogenous_features to (A)
implement promo features when config.include_promo is true by reading the promo
column (e.g., "is_promo" or configured name), creating lagged promo columns
analogous to price (use groupby(self.entity_cols).shift(lag) for each
config.promo_lags and any pct/change or flag derivations), and (B) implement
inventory features when config.include_inventory is true by using the inventory
column (e.g., "on_hand" or configured name) to produce lagged inventory columns
and derived indicators (like inventory_pct_change or inventory_level_lag_N)
similar to how price/stockout are done; if you prefer validation instead, add an
explicit check at the start of _compute_exogenous_features that raises a
RuntimeError when config.include_promo or config.include_inventory is true but
no corresponding implementation/columns exist, referencing the ExogenousConfig
flags and the _compute_exogenous_features method so reviewers can locate
changes.
| ### 1. Feature Configuration Schemas | ||
|
|
||
| **File**: `app/features/featuresets/schemas.py` | ||
|
|
||
| Pydantic v2 schemas with frozen configs for reproducibility: | ||
|
|
||
| | Schema | Purpose | | ||
| |--------|---------| | ||
| | `LagConfig` | Lag feature configuration (lags, target_column, fill_value) | | ||
| | `RollingConfig` | Rolling window configuration (windows, aggregations, min_periods) | | ||
| | `CalendarConfig` | Calendar feature configuration (day_of_week, month, cyclical encoding) | | ||
| | `ExogenousConfig` | Exogenous feature configuration (price_lags, stockout flags) | | ||
| | `ImputationConfig` | Imputation strategies (zero, ffill, bfill, mean, drop) | | ||
| | `FeatureSetConfig` | Complete feature configuration with config_hash() | | ||
| | `ComputeFeaturesRequest` | API request schema with validation | |
There was a problem hiding this comment.
Document the new expanding_mean strategy.
The service supports expanding_mean, but the doc lists only five strategies in both the deliverables table and the Imputation section.
📝 Suggested edits
-| `ImputationConfig` | Imputation strategies (zero, ffill, bfill, mean, drop) |
+| `ImputationConfig` | Imputation strategies (zero, ffill, bfill, mean, expanding_mean, drop) |
-**Strategies**: `zero`, `ffill`, `bfill`, `mean`, `drop`
+**Strategies**: `zero`, `ffill`, `bfill`, `mean`, `expanding_mean`, `drop`Also applies to: 199-211
🤖 Prompt for AI Agents
In `@docs/PHASE/3-FEATURE_ENGINEERING.md` around lines 19 - 33, Update the
documentation to include the missing expanding_mean imputation strategy: in the
section describing ImputationConfig and the deliverables table listing
strategies, add "expanding_mean" to the list of strategies alongside zero,
ffill, bfill, mean, drop, and update any descriptive text to briefly note that
expanding_mean computes a cumulative mean per series (expanding window) as an
imputation option; ensure the symbol ImputationConfig and any mentions of
FeatureSetConfig/ComputeFeaturesRequest remain consistent after adding the new
strategy.
| for window in config.windows: | ||
| min_per = config.min_periods if config.min_periods else window | ||
|
|
There was a problem hiding this comment.
Preserve explicit min_periods=0 in pseudocode.
The PRP uses a falsy check, which would override an explicit 0. Align this snippet with the implemented is None check.
📝 Suggested edit
- min_per = config.min_periods if config.min_periods else window
+ min_per = window if config.min_periods is None else config.min_periods📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for window in config.windows: | |
| min_per = config.min_periods if config.min_periods else window | |
| for window in config.windows: | |
| min_per = window if config.min_periods is None else config.min_periods |
🤖 Prompt for AI Agents
In `@PRPs/PRP-4-feature-engineering.md` around lines 773 - 775, The current
pseudocode uses a falsy check when assigning min_per inside the loop over
config.windows (for window in config.windows) which will override an explicit 0
in config.min_periods; change the condition to explicitly test for None so
min_per preserves 0 (i.e., set min_per to config.min_periods if
config.min_periods is not None, otherwise to window), updating the assignment
that defines min_per.
| async def preview_features() -> None: | ||
| """Preview features for store 1, product 1.""" | ||
| async with httpx.AsyncClient() as client: | ||
| response = await client.post( | ||
| "http://localhost:8123/featuresets/compute", | ||
| json={ | ||
| "store_id": 1, | ||
| "product_id": 1, | ||
| "cutoff_date": "2024-01-31", | ||
| "lookback_days": 60, | ||
| "config": { | ||
| "schema_version": "1.0", | ||
| "name": "preview_test", | ||
| "lag_config": { | ||
| "schema_version": "1.0", | ||
| "lags": [1, 7, 14], | ||
| }, | ||
| "rolling_config": { | ||
| "schema_version": "1.0", | ||
| "windows": [7, 14], | ||
| "aggregations": ["mean", "std"], | ||
| }, |
There was a problem hiding this comment.
Preview example should call the preview endpoint and schema.
The preview_features.py example posts to /featuresets/compute and uses lookback_days, which doesn’t match the Preview request schema.
📝 Suggested edit
- response = await client.post(
- "http://localhost:8123/featuresets/compute",
+ response = await client.post(
+ "http://localhost:8123/featuresets/preview",
json={
"store_id": 1,
"product_id": 1,
"cutoff_date": "2024-01-31",
- "lookback_days": 60,
+ "sample_rows": 10,
"config": {🤖 Prompt for AI Agents
In `@PRPs/PRP-4-feature-engineering.md` around lines 1123 - 1144, The
preview_features() example is calling the compute endpoint and using
lookback_days which doesn't match the Preview request schema; change the POST
URL from "/featuresets/compute" to "/featuresets/preview" and update the JSON
payload to conform to the PreviewRequest schema (remove or replace
"lookback_days" and any compute-specific fields with the preview-specific fields
expected by PreviewRequest), keeping the same identifying caller function
preview_features() so the example exercises the preview endpoint and schema
correctly.
* feat(featuresets): implement time-safe feature engineering layer (#24) * docs: update INITIAL-4 and INITIAL-5 with additional references - Add scikit-learn, mlforecast, and sktime documentation links - Add considerations for imputation logic, agent tooling, and computation overhead - Add model persistence documentation references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(featuresets): implement time-safe feature engineering layer Add complete feature engineering module with: - Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation) - FeatureEngineeringService with CRITICAL leakage prevention: - Lag features use positive shift() only - Rolling features use shift(1) BEFORE rolling to exclude current observation - Group-aware operations prevent cross-series leakage - Cutoff date filtering before any computation - FastAPI endpoints: POST /featuresets/compute and /featuresets/preview - Comprehensive test suite (55 tests) including leakage prevention tests - Example demo script Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update INITIAL-5.md * docs: update documentation for Phase 3 Feature Engineering - README.md: Add featuresets module to project structure and API endpoints - docs/ARCHITECTURE.md: Add Feature Engineering section (section 6) - docs/PHASE-index.md: Mark Phase 3 as completed with summary - docs/PHASE/3-FEATURE_ENGINEERING.md: Create detailed phase documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(featuresets): address code review feedback and prevent data leakage Routes: - Validate store_id/product_id presence (no silent defaults to 0) - Convert ValueError for unsupported date types to HTTP 400 Service: - Add expanding_mean imputation strategy (time-safe alternative) - Add warnings when bfill/mean strategies are used (leakage risk) - Fix price_pct_change_7d to use shift(1) before pct_change Schemas: - Add expanding_mean to ImputationConfig Literal type - Document time-safety of each imputation strategy - Fix PreviewFeaturesRequest docstring: GET → POST Documentation: - Convert bare URLs to markdown links in INITIAL-4.md, INITIAL-5.md - Fix PRP-4 to show POST for preview endpoint Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: format schemas.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix: address code review feedback Service: - Fix min_periods falsy check to explicit None check (preserves 0) Tests: - Add expanding_mean to test_valid_strategies, expect 6 strategies Documentation: - Update PR reference from #24 to #25 - Fix all GET /featuresets/preview to POST in PRP-4 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update DAILY-FLOW.md for Phase 4 Forecasting (#27) * docs: update DAILY-FLOW.md for Phase 4 Forecasting Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add PRP-5 for Forecasting module Comprehensive PRP including: - Model zoo (naive, seasonal naive, moving average) - Unified BaseForecaster interface (fit/predict/serialize) - ModelBundle persistence with joblib - 15 ordered implementation tasks - 40+ test cases specified - Integration with FeatureEngineeringService Confidence: 8/10 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * feat(forecasting): implement baseline model zoo and unified interface (#28) * feat(forecasting): implement baseline model zoo and unified interface Add forecasting module (PRP-5) with: - BaseForecaster ABC with scikit-learn-style interface (fit/predict) - NaiveForecaster, SeasonalNaiveForecaster, MovingAverageForecaster - ModelBundle persistence with joblib serialization - POST /forecasting/train and /forecasting/predict endpoints - ForecastingService for orchestration - 81 unit tests covering schemas, models, persistence, and service - Example scripts demonstrating each baseline model - LightGBM placeholder (feature-flagged, not yet implemented) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: Update documentation for forecasting module (PRP-5) - Add forecasting API endpoints to README.md with examples - Update ARCHITECTURE.md with forecasting implementation details - Add scikit-learn and joblib to dependencies list - Add forecasting config variables to .env.example - Mark forecasting module as IMPLEMENTED in architecture docs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address CI lint and type check failures - Add type: ignore for intentional type mismatch in frozen config test - Add S101 ignore for examples/ to allow assert statements Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix(forecasting): add security validations and fix documentation Security improvements: - Add constructor validation for season_length >= 1 in SeasonalNaiveForecaster - Add constructor validation for window_size >= 1 in MovingAverageForecaster - Add path traversal prevention in ForecastingService.predict() - Validate .joblib extension and artifacts directory containment - Log rejection reasons for security auditing Test improvements: - Fix get_settings patching to wrap ForecastingService construction - Add tests for constructor validation - Add tests for path traversal and extension validation Documentation fixes: - Fix config parameter names in ARCHITECTURE.md (season_length, window_size) - Fix README example to use season_length instead of seasonal_period - Fix markdown issues in PRP-5 (code fences, ATX headings) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Merges dev branch into main with Phase 3 Feature Engineering implementation.
Phase 3: Feature Engineering (PRP-4)
POST /featuresets/compute,POST /featuresets/previewCode Quality Fixes
expanding_meanimputation strategyprice_pct_change_7dleakage with shift(1)Validation
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Examples & Demos
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.