feat(featuresets): implement time-safe feature engineering layer #24

Merged
w7-mgfcode merged 6 commits into dev from feat/prp-4-feature-engineering on Jan 31, 2026

@w7-mgfcode
Owner

Summary

  • Implement complete feature engineering module (PRP-4) with time-safe feature computation
  • Add FeatureEngineeringService with CRITICAL leakage prevention patterns
  • Create FastAPI endpoints for feature computation and preview
  • Add a comprehensive test suite of 55 tests, including leakage prevention tests

Features

  • Lag features: Positive shift() only to prevent future data access
  • Rolling features: shift(1) BEFORE rolling to exclude current observation
  • Calendar features: Cyclical encoding (sin/cos) for day of week, month
  • Group isolation: Entity-aware groupby prevents cross-series leakage
  • Imputation strategies: Zero-fill for sales, forward-fill for prices
  • Cutoff enforcement: Data filtered before any computation
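
As a rough illustration of the cyclical encoding mentioned for calendar features, the sketch below maps day of week and month onto the unit circle, so that Sunday sits next to Monday and December next to January. Only the sin/cos idea comes from the list above; the function names and output keys are hypothetical and are not taken from the module.

```python
import math
from datetime import date
from typing import Dict, Tuple


def cyclical_encode(value: int, period: int) -> Tuple[float, float]:
    """Map a periodic integer (0 .. period-1) to a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def calendar_features(day: date) -> Dict[str, float]:
    """Hypothetical helper: sin/cos features for day of week and month."""
    dow_sin, dow_cos = cyclical_encode(day.weekday(), 7)       # Monday = 0
    month_sin, month_cos = cyclical_encode(day.month - 1, 12)  # January = 0
    return {
        "dow_sin": dow_sin,
        "dow_cos": dow_cos,
        "month_sin": month_sin,
        "month_cos": month_cos,
    }


# 2026-01-05 is a Monday in January, so both angles are zero.
monday = calendar_features(date(2026, 1, 5))
```

Both components are needed: sine alone cannot distinguish two positions that are mirror images on the circle.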

Files Added

  • app/features/featuresets/ - Complete module (schemas, service, routes)
  • app/features/featuresets/tests/ - 55 unit tests
  • examples/compute_features_demo.py - Demo script
  • PRPs/PRP-4-feature-engineering.md - PRP document

Test plan

  • All 55 unit tests passing
  • Ruff linting: clean
  • MyPy: no issues
  • Pyright: 0 errors
  • Manual API testing with seeded data

🤖 Generated with Claude Code

w7-learn and others added 3 commits January 31, 2026 21:57
- Add scikit-learn, mlforecast, and sktime documentation links
- Add considerations for imputation logic, agent tooling, and computation overhead
- Add model persistence documentation references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add complete feature engineering module with:
- Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation)
- FeatureEngineeringService with CRITICAL leakage prevention:
  - Lag features use positive shift() only
  - Rolling features use shift(1) BEFORE rolling to exclude current observation
  - Group-aware operations prevent cross-series leakage
  - Cutoff date filtering before any computation
- FastAPI endpoints: POST /featuresets/compute and /featuresets/preview
- Comprehensive test suite (55 tests) including leakage prevention tests
- Example demo script

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
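
The four leakage-prevention rules in this commit message can be sketched with pandas as follows. This is a minimal sketch under stated assumptions, not the code in FeatureEngineeringService: the helper name, the column names (`date`, `store_id`, `sales`) and the default window are all made up for illustration.

```python
import pandas as pd


def add_time_safe_features(
    df: pd.DataFrame, cutoff: str, lag: int = 1, window: int = 3
) -> pd.DataFrame:
    """Hypothetical helper: lag and rolling-mean features without look-ahead."""
    if lag < 1:
        raise ValueError("lag must be positive; a negative shift reads the future")

    # Cutoff enforcement: drop rows after the cutoff before computing anything.
    out = df[df["date"] <= pd.Timestamp(cutoff)].copy()
    out = out.sort_values(["store_id", "date"]).reset_index(drop=True)

    # Group isolation: every operation runs inside a single store's series.
    by_store = out.groupby("store_id")["sales"]

    # Lag feature: positive shift only, so row t sees row t - lag.
    out[f"sales_lag_{lag}"] = by_store.shift(lag)

    # Rolling feature: shift(1) first, so the window ends at the previous row
    # and never includes the current observation.
    out[f"sales_roll_mean_{window}"] = by_store.transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out


frame = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2026-01-01", "2026-01-02", "2026-01-03", "2026-01-04"] * 2
        ),
        "store_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "sales": [10.0, 20.0, 30.0, 40.0, 100.0, 200.0, 300.0, 400.0],
    }
)
features = add_time_safe_features(frame, cutoff="2026-01-03")
```

In this toy frame the first row of store 2 has no lag value, which shows that nothing from store 1 leaks across the group boundary.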

@sourcery-ai sourcery-ai Bot left a comment

Sorry @w7-mgfcode, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@coderabbitai

coderabbitai Bot commented Jan 31, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@socket-security

socket-security Bot commented Jan 31, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | numpy@2.4.1 | 75 | 100 | 100 | 100 | 70 |
| Added | pandas@3.0.0 | 76 | 100 | 100 | 100 | 80 |

View full report

w7-learn and others added 3 commits January 31, 2026 23:21
- README.md: Add featuresets module to project structure and API endpoints
- docs/ARCHITECTURE.md: Add Feature Engineering section (section 6)
- docs/PHASE-index.md: Mark Phase 3 as completed with summary
- docs/PHASE/3-FEATURE_ENGINEERING.md: Create detailed phase documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Routes:
- Validate store_id/product_id presence (no silent defaults to 0)
- Convert ValueError for unsupported date types to HTTP 400

Service:
- Add expanding_mean imputation strategy (time-safe alternative)
- Add warnings when bfill/mean strategies are used (leakage risk)
- Fix price_pct_change_7d to use shift(1) before pct_change

Schemas:
- Add expanding_mean to ImputationConfig Literal type
- Document time-safety of each imputation strategy
- Fix PreviewFeaturesRequest docstring: GET → POST

Documentation:
- Convert bare URLs to markdown links in INITIAL-4.md, INITIAL-5.md
- Fix PRP-4 to show POST for preview endpoint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
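
The two Service changes in the review-feedback commit (an expanding-mean imputation and a shift(1) before pct_change) might look roughly like this in pandas. The function names are hypothetical. Building the fill value from the original observations only, not from previously imputed values, is an assumption made for this sketch.

```python
import pandas as pd


def impute_expanding_mean(series: pd.Series) -> pd.Series:
    """Fill each gap with the mean of strictly earlier observations."""
    # expanding().mean() at row t covers rows 0..t; shift(1) drops row t itself,
    # so the fill value for a gap never depends on that row or on later rows.
    past_mean = series.expanding(min_periods=1).mean().shift(1)
    return series.fillna(past_mean)


def lagged_pct_change(series: pd.Series, periods: int = 7) -> pd.Series:
    """Percent change over `periods` rows, measured up to the previous row."""
    # Shifting first means the value at row t compares row t-1 with row t-1-periods.
    return series.shift(1).pct_change(periods=periods)


sales = pd.Series([10.0, None, 30.0, None])
filled = impute_expanding_mean(sales)

prices = pd.Series([100.0, 110.0, 121.0, 133.1])
change = lagged_pct_change(prices, periods=1)
```

By contrast, a plain mean over the whole series, or a backward fill, would use later observations to fill earlier gaps, which is the leakage risk the added warnings point at.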
w7-mgfcode merged commit 8541553 into dev on Jan 31, 2026
8 checks passed
w7-mgfcode deleted the feat/prp-4-feature-engineering branch on January 31, 2026 23:32
w7-mgfcode pushed a commit that referenced this pull request Jan 31, 2026
Service:
- Fix min_periods falsy check to explicit None check (preserves 0)

Tests:
- Add expanding_mean to test_valid_strategies, expect 6 strategies

Documentation:
- Update PR reference from #24 to #25
- Fix all GET /featuresets/preview to POST in PRP-4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
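
The min_periods fix addresses a common Python pitfall: `0` is falsy, so a truthiness check silently replaces an explicit zero with the default. A minimal sketch of the difference follows; the function names and the fallback to the window size are assumptions for illustration, not the service's actual code.

```python
from typing import Optional


def resolve_min_periods_falsy(min_periods: Optional[int], window: int) -> int:
    """Buggy variant: an explicit 0 is treated as 'not set'."""
    return min_periods or window


def resolve_min_periods(min_periods: Optional[int], window: int) -> int:
    """Fixed variant: only None falls back to the default."""
    return window if min_periods is None else min_periods


explicit_zero_buggy = resolve_min_periods_falsy(0, window=7)  # 0 is lost
explicit_zero_fixed = resolve_min_periods(0, window=7)        # 0 is preserved
unset_fixed = resolve_min_periods(None, window=7)             # falls back to 7
```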
w7-mgfcode added a commit that referenced this pull request Jan 31, 2026
* feat(featuresets): implement time-safe feature engineering layer (#24)

* docs: update INITIAL-4 and INITIAL-5 with additional references

- Add scikit-learn, mlforecast, and sktime documentation links
- Add considerations for imputation logic, agent tooling, and computation overhead
- Add model persistence documentation references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(featuresets): implement time-safe feature engineering layer

Add complete feature engineering module with:
- Pydantic schemas for feature configuration (lag, rolling, calendar, exogenous, imputation)
- FeatureEngineeringService with CRITICAL leakage prevention:
  - Lag features use positive shift() only
  - Rolling features use shift(1) BEFORE rolling to exclude current observation
  - Group-aware operations prevent cross-series leakage
  - Cutoff date filtering before any computation
- FastAPI endpoints: POST /featuresets/compute and /featuresets/preview
- Comprehensive test suite (55 tests) including leakage prevention tests
- Example demo script

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update INITIAL-5.md

* docs: update documentation for Phase 3 Feature Engineering

- README.md: Add featuresets module to project structure and API endpoints
- docs/ARCHITECTURE.md: Add Feature Engineering section (section 6)
- docs/PHASE-index.md: Mark Phase 3 as completed with summary
- docs/PHASE/3-FEATURE_ENGINEERING.md: Create detailed phase documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(featuresets): address code review feedback and prevent data leakage

Routes:
- Validate store_id/product_id presence (no silent defaults to 0)
- Convert ValueError for unsupported date types to HTTP 400

Service:
- Add expanding_mean imputation strategy (time-safe alternative)
- Add warnings when bfill/mean strategies are used (leakage risk)
- Fix price_pct_change_7d to use shift(1) before pct_change

Schemas:
- Add expanding_mean to ImputationConfig Literal type
- Document time-safety of each imputation strategy
- Fix PreviewFeaturesRequest docstring: GET → POST

Documentation:
- Convert bare URLs to markdown links in INITIAL-4.md, INITIAL-5.md
- Fix PRP-4 to show POST for preview endpoint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: format schemas.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address code review feedback

Service:
- Fix min_periods falsy check to explicit None check (preserves 0)

Tests:
- Add expanding_mean to test_valid_strategies, expect 6 strategies

Documentation:
- Update PR reference from #24 to #25
- Fix all GET /featuresets/preview to POST in PRP-4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
w7-mgfcode added a commit that referenced this pull request Feb 1, 2026
* feat(featuresets): implement time-safe feature engineering layer (#24)

* fix: address code review feedback

* docs: update DAILY-FLOW.md for Phase 4 Forecasting (#27)

* docs: update DAILY-FLOW.md for Phase 4 Forecasting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add PRP-5 for Forecasting module

Comprehensive PRP including:
- Model zoo (naive, seasonal naive, moving average)
- Unified BaseForecaster interface (fit/predict/serialize)
- ModelBundle persistence with joblib
- 15 ordered implementation tasks
- 40+ test cases specified
- Integration with FeatureEngineeringService

Confidence: 8/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* feat(forecasting): implement baseline model zoo and unified interface (#28)

* feat(forecasting): implement baseline model zoo and unified interface

Add forecasting module (PRP-5) with:
- BaseForecaster ABC with scikit-learn-style interface (fit/predict)
- NaiveForecaster, SeasonalNaiveForecaster, MovingAverageForecaster
- ModelBundle persistence with joblib serialization
- POST /forecasting/train and /forecasting/predict endpoints
- ForecastingService for orchestration
- 81 unit tests covering schemas, models, persistence, and service
- Example scripts demonstrating each baseline model
- LightGBM placeholder (feature-flagged, not yet implemented)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
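
To make the fit/predict interface concrete, here is a small standard-library sketch of the three baselines. The class names and the `season_length` and `window_size` parameters appear in the commit messages; the method signatures, the list-based input, and the internals are assumptions made here and may differ from the real module.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class BaseForecaster(ABC):
    """Assumed shape of the unified interface: fit on history, predict a horizon."""

    @abstractmethod
    def fit(self, y: Sequence[float]) -> "BaseForecaster": ...

    @abstractmethod
    def predict(self, horizon: int) -> List[float]: ...


class NaiveForecaster(BaseForecaster):
    """Repeat the last observed value."""

    def fit(self, y: Sequence[float]) -> "NaiveForecaster":
        if len(y) == 0:
            raise ValueError("cannot fit on an empty series")
        self._last = float(y[-1])
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self._last] * horizon


class SeasonalNaiveForecaster(BaseForecaster):
    """Repeat the last full season."""

    def __init__(self, season_length: int = 7) -> None:
        if season_length < 1:
            raise ValueError("season_length must be >= 1")
        self.season_length = season_length

    def fit(self, y: Sequence[float]) -> "SeasonalNaiveForecaster":
        if len(y) < self.season_length:
            raise ValueError("need at least one full season of history")
        self._season = [float(v) for v in y[-self.season_length:]]
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self._season[i % self.season_length] for i in range(horizon)]


class MovingAverageForecaster(BaseForecaster):
    """Repeat the mean of the last window_size observations."""

    def __init__(self, window_size: int = 7) -> None:
        if window_size < 1:
            raise ValueError("window_size must be >= 1")
        self.window_size = window_size

    def fit(self, y: Sequence[float]) -> "MovingAverageForecaster":
        if len(y) == 0:
            raise ValueError("cannot fit on an empty series")
        tail = list(y[-self.window_size:])
        self._mean = sum(tail) / len(tail)
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self._mean] * horizon


history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
naive = NaiveForecaster().fit(history).predict(2)
seasonal = SeasonalNaiveForecaster(season_length=3).fit(history).predict(4)
moving = MovingAverageForecaster(window_size=3).fit(history).predict(2)
```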

* docs: Update documentation for forecasting module (PRP-5)

- Add forecasting API endpoints to README.md with examples
- Update ARCHITECTURE.md with forecasting implementation details
- Add scikit-learn and joblib to dependencies list
- Add forecasting config variables to .env.example
- Mark forecasting module as IMPLEMENTED in architecture docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address CI lint and type check failures

- Add type: ignore for intentional type mismatch in frozen config test
- Add S101 ignore for examples/ to allow assert statements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(forecasting): add security validations and fix documentation

Security improvements:
- Add constructor validation for season_length >= 1 in SeasonalNaiveForecaster
- Add constructor validation for window_size >= 1 in MovingAverageForecaster
- Add path traversal prevention in ForecastingService.predict()
- Validate .joblib extension and artifacts directory containment
- Log rejection reasons for security auditing

Test improvements:
- Fix get_settings patching to wrap ForecastingService construction
- Add tests for constructor validation
- Add tests for path traversal and extension validation

Documentation fixes:
- Fix config parameter names in ARCHITECTURE.md (season_length, window_size)
- Fix README example to use season_length instead of seasonal_period
- Fix markdown issues in PRP-5 (code fences, ATX headings)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
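
The path checks listed under "Security improvements" (a `.joblib` extension and containment in the artifacts directory) could be implemented along these lines with `pathlib`. This is a sketch under assumptions: the function name, error messages, and the `artifacts` directory name are hypothetical, and the logging of rejection reasons mentioned in the commit is omitted.

```python
from pathlib import Path


def resolve_model_path(artifacts_dir: str, requested: str) -> Path:
    """Hypothetical validator: return a safe absolute path or raise ValueError."""
    base = Path(artifacts_dir).resolve()
    # Resolve ".." segments and symlinks before checking containment.
    candidate = (base / requested).resolve()

    if candidate.suffix != ".joblib":
        raise ValueError("model path must end in .joblib")
    # After resolution, base must be an ancestor of the candidate. This also
    # rejects absolute paths, because base / "/abs/path" discards base.
    if base not in candidate.parents:
        raise ValueError("model path escapes the artifacts directory")
    return candidate


safe = resolve_model_path("artifacts", "model_v1.joblib")
```

Resolving before comparing matters: a string prefix check on the raw input would accept `artifacts/../secret.joblib`.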

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>