feat: Phase 9 Agentic Layer - Release v0.3.0 #56
…ents (#36)
* docs: expand INITIAL-7 with lifecycle, lineage, and artifact integrity details
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat(registry): implement model registry for run tracking and deployments
Add model registry feature (PRP-7) with:
- ORM models: ModelRun with JSONB columns (model_config, metrics, runtime_info), DeploymentAlias for mutable deployment pointers
- Storage: LocalFSProvider with SHA-256 integrity verification and path traversal prevention, abstract interface for future S3/GCS support
- Service: RegistryService with state machine validation, duplicate detection, config hashing, and run comparison
- API endpoints: CRUD for runs and aliases, artifact verification, run comparison with config/metrics diffs
- Database: Alembic migration with GIN indexes for JSONB containment queries
- Tests: 103 unit tests (schemas, storage, service) + 24 integration tests
- Example: registry_demo.py demonstrating full workflow
Run lifecycle: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED
Aliases can only point to SUCCESS runs for deployment safety.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: update documentation for model registry implementation
- README.md: Add registry to project structure, API endpoints section, and example reference
- docs/ARCHITECTURE.md: Update section 7.6 with full implementation details, add registry endpoints to section 8, mark Phase 1 complete
- docs/PHASE-index.md: Mark phases 4-6 as completed, add detailed completion entries for Forecasting, Backtesting, and Registry
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: add PHASE documentation for forecasting, backtesting, and registry
Create missing phase documentation files to complete the project's implementation records:
- 4-FORECASTING.md: Model zoo with BaseForecaster interface, train/predict endpoints, and joblib persistence
- 5-BACKTESTING.md: Time-series CV with expanding/sliding strategies, metrics calculation, and baseline comparisons
- 6-MODEL_REGISTRY.md: Run tracking with state machine, deployment aliases, and SHA-256 artifact integrity verification
Update PHASE-index.md to link to the new documentation files.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(registry): resolve type checking issues with Pydantic model_config alias
- Add pydantic.mypy plugin to pyproject.toml for proper Pydantic type checking
- Use model_config_data instead of model_config alias in tests to avoid collision with Pydantic's reserved model_config attribute
- Update _model_to_response to use model_validate() for proper alias handling
- Change docker-compose postgres port to 5433 to avoid conflicts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: resolve CI failures for registry PR
- Import registry models in alembic/env.py for schema validation
- Fix import order and remove extraneous f-strings in registry_demo.py
- Add type: ignore comments for frozen model tests with pydantic.mypy plugin
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: prevent db_session fixtures from dropping all tables
The data_platform and root conftest.py db_session fixtures were dropping all tables after each test, causing subsequent integration tests to fail when they couldn't find migrated tables.
Changes:
- Remove Base.metadata.drop_all from db_session fixtures
- Tests now rely on migrations for table creation
- Each test just rolls back its own changes
Also fixes ruff format issue in examples/registry_demo.py.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: add proper test data cleanup to db_session fixtures
Update data_platform and ingest test fixtures to clean up test data explicitly instead of dropping all tables or just rolling back.
- data_platform: delete test stores, products, calendar entries
- ingest: delete test stores, products, sales, calendar entries
This ensures test isolation while preserving migrated tables.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: use separate session for test cleanup to avoid transaction issues
When tests cause integrity errors, the session enters a failed state.
Use a fresh session for cleanup to avoid PendingRollbackError.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: use contextlib.suppress instead of try-except-pass
Replace try-except-pass patterns with contextlib.suppress to satisfy ruff S110 linting rule.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
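The run lifecycle and alias rule stated in the registry commit (PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED, aliases only on SUCCESS runs) could be enforced with a small transition table. The following is a minimal sketch, not the actual RegistryService code: the names `RunStatus`, `ALLOWED_TRANSITIONS`, `can_transition`, and `can_alias` are hypothetical, and the table encodes only the arrows written in the commit message.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Run states named in the commit message (enum name is hypothetical)."""

    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    ARCHIVED = "archived"


# Assumption: only the arrows in "PENDING -> RUNNING -> SUCCESS/FAILED -> ARCHIVED"
# are legal. The real service may allow additional transitions.
ALLOWED_TRANSITIONS = {
    RunStatus.PENDING: {RunStatus.RUNNING},
    RunStatus.RUNNING: {RunStatus.SUCCESS, RunStatus.FAILED},
    RunStatus.SUCCESS: {RunStatus.ARCHIVED},
    RunStatus.FAILED: {RunStatus.ARCHIVED},
    RunStatus.ARCHIVED: set(),
}


def can_transition(current: RunStatus, target: RunStatus) -> bool:
    """Return True if moving from `current` to `target` follows the lifecycle."""
    return target in ALLOWED_TRANSITIONS[current]


def can_alias(status: RunStatus) -> bool:
    """Deployment aliases may only point at SUCCESS runs."""
    return status is RunStatus.SUCCESS


if __name__ == "__main__":
    print(can_transition(RunStatus.PENDING, RunStatus.RUNNING))
    print(can_transition(RunStatus.PENDING, RunStatus.SUCCESS))
    print(can_alias(RunStatus.FAILED))
```

Keeping the rules in one table means a validation error can list the allowed targets for the current state instead of only rejecting the request.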
- Add date range filter to SalesDaily cleanup in ingest tests
- Enforce artifact_hash presence before verification in registry routes
- Compute SHA256 from saved file instead of source in storage
- Fix override_get_db to mirror production transaction semantics
- Filter DeploymentAlias cleanup to only test runs
- Update database port to 5433 in config and .env.example
- Add language identifiers to fenced code blocks (MD040)
- Fix table formatting for markdownlint MD060
- Update PR reference in PHASE/6-MODEL_REGISTRY.md
- Convert bare URLs to markdown links in INITIAL-7.md
- Wrap __init__.py in backticks in PRP-7
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
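Two of the storage points above, hashing the saved file rather than the source and rejecting paths that escape the storage root, can be illustrated together. This is a sketch using only the standard library. The function names (`resolve_inside`, `sha256_of_file`, `save_artifact`, `verify_artifact`) are hypothetical and do not come from LocalFSProvider.

```python
import hashlib
import shutil
from pathlib import Path


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting paths that escape it."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"path escapes storage root: {relative}")
    return candidate


def sha256_of_file(path: Path) -> str:
    """Stream the file in chunks so large artifacts are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_artifact(root: Path, source: Path, relative: str) -> str:
    """Copy `source` into the store and return the hash of the saved copy."""
    target = resolve_inside(root, relative)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    # Hash what was actually written, so a partial or corrupted copy
    # cannot be recorded with the hash of the pristine source.
    return sha256_of_file(target)


def verify_artifact(root: Path, relative: str, expected_hash: str) -> bool:
    """Recompute the stored file's hash and compare it with the recorded one."""
    return sha256_of_file(resolve_inside(root, relative)) == expected_hash
```

Hashing the saved copy makes the recorded hash describe the bytes that later verification will read.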
* chore: release v0.2.0 (#37) * feat(registry): implement model registry for run tracking and deployments (#36) * docs: expand INITIAL-7 with lifecycle, lineage, and artifact integrity details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(registry): implement model registry for run tracking and deployments Add model registry feature (PRP-7) with: - ORM models: ModelRun with JSONB columns (model_config, metrics, runtime_info), DeploymentAlias for mutable deployment pointers - Storage: LocalFSProvider with SHA-256 integrity verification and path traversal prevention, abstract interface for future S3/GCS support - Service: RegistryService with state machine validation, duplicate detection, config hashing, and run comparison - API endpoints: CRUD for runs and aliases, artifact verification, run comparison with config/metrics diffs - Database: Alembic migration with GIN indexes for JSONB containment queries - Tests: 103 unit tests (schemas, storage, service) + 24 integration tests - Example: registry_demo.py demonstrating full workflow Run lifecycle: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED Aliases can only point to SUCCESS runs for deployment safety. 
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update documentation for model registry implementation - README.md: Add registry to project structure, API endpoints section, and example reference - docs/ARCHITECTURE.md: Update section 7.6 with full implementation details, add registry endpoints to section 8, mark Phase 1 complete - docs/PHASE-index.md: Mark phases 4-6 as completed, add detailed completion entries for Forecasting, Backtesting, and Registry Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add PHASE documentation for forecasting, backtesting, and registry Create missing phase documentation files to complete the project's implementation records: - 4-FORECASTING.md: Model zoo with BaseForecaster interface, train/predict endpoints, and joblib persistence - 5-BACKTESTING.md: Time-series CV with expanding/sliding strategies, metrics calculation, and baseline comparisons - 6-MODEL_REGISTRY.md: Run tracking with state machine, deployment aliases, and SHA-256 artifact integrity verification Update PHASE-index.md to link to the new documentation files. 
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(registry): resolve type checking issues with Pydantic model_config alias - Add pydantic.mypy plugin to pyproject.toml for proper Pydantic type checking - Use model_config_data instead of model_config alias in tests to avoid collision with Pydantic's reserved model_config attribute - Update _model_to_response to use model_validate() for proper alias handling - Change docker-compose postgres port to 5433 to avoid conflicts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: resolve CI failures for registry PR - Import registry models in alembic/env.py for schema validation - Fix import order and remove extraneous f-strings in registry_demo.py - Add type: ignore comments for frozen model tests with pydantic.mypy plugin Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: prevent db_session fixtures from dropping all tables The data_platform and root conftest.py db_session fixtures were dropping all tables after each test, causing subsequent integration tests to fail when they couldn't find migrated tables. Changes: - Remove Base.metadata.drop_all from db_session fixtures - Tests now rely on migrations for table creation - Each test just rolls back its own changes Also fixes ruff format issue in examples/registry_demo.py. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: add proper test data cleanup to db_session fixtures Update data_platform and ingest test fixtures to clean up test data explicitly instead of dropping all tables or just rolling back. - data_platform: delete test stores, products, calendar entries - ingest: delete test stores, products, sales, calendar entries This ensures test isolation while preserving migrated tables. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: use separate session for test cleanup to avoid transaction issues When tests cause integrity errors, the session enters a failed state. 
Use a fresh session for cleanup to avoid PendingRollbackError. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: use contextlib.suppress instead of try-except-pass Replace try-except-pass patterns with contextlib.suppress to satisfy ruff S110 linting rule. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix: code improvements and documentation fixes - Add date range filter to SalesDaily cleanup in ingest tests - Enforce artifact_hash presence before verification in registry routes - Compute SHA256 from saved file instead of source in storage - Fix override_get_db to mirror production transaction semantics - Filter DeploymentAlias cleanup to only test runs - Update database port to 5433 in config and .env.example - Add language identifiers to fenced code blocks (MD040) - Fix table formatting for markdownlint MD060 - Update PR reference in PHASE/6-MODEL_REGISTRY.md - Convert bare URLs to markdown links in INITIAL-7.md - Wrap __init__.py in backticks in PRP-7 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * chore(main): release 0.2.0 (#38) Release-As: 0.2.0 Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * chore(main): release 0.2.0 (#39) * chore(main): release 0.2.0 * chore: trigger CI --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Gabe@w7dev <gabor@w7-7.net> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
sync: align dev with main after v0.2.0 release
* docs(initial-8): expand serving layer requirements
Add specifications for job-driven orchestration, dimension discovery endpoints, standardized API protocols (filtering/pagination), and agent-first API design patterns for LLM tool-calling optimization.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs(prp-8): add serving layer implementation spec
Comprehensive PRP for FastAPI serving layer including:
- Dimensions module for store/product discovery endpoints
- Analytics module for KPI/drilldown queries
- Jobs module for async-ready task orchestration
- RFC 7807 problem details for semantic error responses
- OpenAPI export optimization for LLM tool-calling
26 tasks with validation gates and 8.5/10 confidence score.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat(serving-layer): implement PRP-8 agent-first API design
Add RFC 7807 Problem Details for semantic error responses:
- ProblemDetail schema with type URIs and error codes
- application/problem+json content type
- Validation exception handler with field-level errors
Add dimensions module for store/product discovery:
- GET /dimensions/stores with pagination, filtering, search
- GET /dimensions/products with pagination, filtering, search
- LLM-optimized Field descriptions for tool-calling
Add analytics module for KPI aggregations:
- GET /analytics/kpis with date range and dimension filters
- GET /analytics/drilldowns for store/product/category/region/date
- Revenue share and ranking calculations
Add jobs module for async-ready task orchestration:
- POST /jobs for train/predict/backtest operations
- Job model with JSONB params/results
- Status transitions: pending → running → completed/failed
Integration:
- New settings: analytics_max_rows, jobs_retention_days
- Register routers in main.py
- Alembic migration for jobs table
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: update documentation for PRP-8 serving layer
Update README.md:
- Add dimensions, analytics, jobs modules to project structure
- Document new API endpoints with examples
- Add RFC 7807 error response documentation
Update docs/ARCHITECTURE.md:
- Mark serving layer section as implemented
- Add configuration settings for new modules
- Update roadmap with Phase-2 completion
Update docs/PHASE-index.md:
- Add Phase 7 (Serving Layer) as completed
- Update phase overview table
- Add version history entry
Create docs/PHASE/7-SERVING_LAYER.md:
- Comprehensive phase documentation
- API endpoint specifications
- Database schema and migration details
- Usage examples and test coverage
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* style: fix ruff formatting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
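For readers unfamiliar with RFC 7807, the sketch below shows the shape of a problem-details body with field-level validation errors. It is stdlib-only and not the project's Pydantic `ProblemDetail` schema. The `code` and `errors` extension members, the example type URI, the 422 status, and the `validation_problem` helper are all assumptions for illustration. Only `type`, `title`, `status`, `detail`, and `instance` are defined by the RFC.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

# Media type defined by RFC 7807 for JSON problem documents.
PROBLEM_JSON = "application/problem+json"


@dataclass
class ProblemDetail:
    """Standard RFC 7807 members plus two hypothetical extension members."""

    type: str
    title: str
    status: int
    detail: str
    instance: Optional[str] = None
    code: Optional[str] = None  # extension: machine-readable error code (assumed)
    errors: list = field(default_factory=list)  # extension: field-level errors (assumed)

    def to_body(self) -> dict:
        """Serialize, omitting members that were left unset."""
        return {k: v for k, v in asdict(self).items() if v is not None and v != []}


def validation_problem(field_errors: dict, instance: str) -> ProblemDetail:
    """Build a problem document from a {field name: message} mapping."""
    return ProblemDetail(
        type="https://example.com/problems/validation-error",  # placeholder URI
        title="Request validation failed",
        status=422,
        detail=f"{len(field_errors)} field(s) failed validation",
        instance=instance,
        code="VALIDATION_ERROR",
        errors=[{"field": name, "message": msg} for name, msg in field_errors.items()],
    )


if __name__ == "__main__":
    problem = validation_problem(
        {"start_date": "must be on or before end_date"}, "/analytics/kpis"
    )
    print("Content-Type:", PROBLEM_JSON)
    print(json.dumps(problem.to_body(), indent=2))
```

A stable `type` URI plus a short `code` lets an LLM tool-caller branch on the error kind without parsing the human-readable `detail` text.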
…ling
- Add validate_date_range helper to analytics routes for reusable date validation
- Apply date range validation to both get_kpis and get_drilldowns endpoints
- Fix total_revenue_all calculation to use full dataset before limiting
- Add run_id to train job result for downstream predict jobs
- Fix predict job to resolve run_id to model metadata from bundle
- Update test fixtures to use 32-char hex IDs per schema requirements
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
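The two analytics fixes in this commit can be sketched as follows. Only the name `validate_date_range` and the idea of computing `total_revenue_all` before limiting come from the commit message. The validation rules, the `max_days` parameter, the `InvalidDateRange` exception, and `top_n_with_share` are assumptions made for illustration.

```python
from datetime import date
from typing import Optional


class InvalidDateRange(ValueError):
    """Raised when a requested date range is not usable (hypothetical type)."""


def validate_date_range(
    start_date: date, end_date: date, max_days: Optional[int] = None
) -> None:
    """Shared check, reusable from both a KPI and a drilldown endpoint."""
    if start_date > end_date:
        raise InvalidDateRange("start_date must be on or before end_date")
    if max_days is not None and (end_date - start_date).days > max_days:
        raise InvalidDateRange(f"date range exceeds {max_days} days")


def top_n_with_share(revenue_by_key: dict, limit: int) -> list:
    """Rank keys by revenue and attach each row's share of total revenue."""
    # Sum over the full dataset BEFORE applying the limit. Summing only the
    # limited rows would inflate every share whenever rows are cut off.
    total_revenue_all = sum(revenue_by_key.values())
    ranked = sorted(revenue_by_key.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {
            "key": key,
            "revenue": revenue,
            "rank": rank,
            "revenue_share": revenue / total_revenue_all if total_revenue_all else 0.0,
        }
        for rank, (key, revenue) in enumerate(ranked[:limit], start=1)
    ]
```

With revenues of 60, 30, and 10 and a limit of 2, the top row's share is 0.6 of the full total. Had the total been taken over the two returned rows only, it would have come out as about 0.67.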
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
sync: merge main (v0.2.1) into dev
…IAL-9/10/11) (#47)
* docs: restructure INITIAL-9 into modular three-phase roadmap
Decompose monolithic INITIAL-9 into three specialized technical phases:
- INITIAL-9: RAG Knowledge Base ("The Memory")
  - pgvector + OpenAI embeddings
  - Markdown/OpenAPI-aware chunking
  - Semantic retrieval endpoints
- INITIAL-10: Agentic Layer ("The Brain")
  - PydanticAI agents (Experiment Orchestrator, RAG Assistant)
  - Tool orchestration with structured outputs
  - Human-in-the-loop approval workflow
- INITIAL-11: ForecastLab Dashboard ("The Face")
  - React 19 + Vite + shadcn/ui
  - TanStack Table/Query for data management
  - Recharts for time series visualization
  - Agent chat interface with streaming
Update PHASE-index.md and DAILY-FLOW.md to align with new structure.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs(prp): add PRP-9 RAG Knowledge Base implementation plan
Comprehensive PRP for INITIAL-9 RAG Knowledge Base feature:
- pgvector + SQLAlchemy 2.0 integration patterns
- Markdown-aware and OpenAPI-aware chunking
- Async OpenAI embeddings with batch processing
- HNSW index for cosine similarity search
- 15 ordered implementation tasks
- 5-level validation loop (syntax → types → unit → integration → smoke)
- Full ORM models and Pydantic schemas
- Known gotchas and anti-patterns documented
Confidence score: 8.5/10
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs(prp): add PRP-10 Agentic Layer implementation plan
Comprehensive PRP for INITIAL-10 Agentic Layer feature:
- PydanticAI agent framework integration
- Experiment Orchestrator Agent (backtest → compare → deploy)
- RAG Assistant Agent (query → retrieve → answer with citations)
- Human-in-the-loop approval workflow for sensitive actions
- WebSocket streaming for real-time token delivery
- Session persistence with JSONB message history
- 17 ordered implementation tasks
- Tool definitions for registry, backtesting, forecasting, RAG
- Full Pydantic schemas and ORM models
Confidence score: 7.5/10
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive PRP for INITIAL-11 (The Face) with:
- 24 implementation tasks across 6 phases
- React 19 + Vite + shadcn/ui + TanStack Table/Query
- TypeScript types matching all backend API schemas
- Reusable DataTable with server-side pagination
- TimeSeriesChart component with Recharts
- WebSocket hook for agent chat streaming
- Complete documentation links and gotchas
Confidence score: 7.5/10 (chat depends on INITIAL-10)
Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat(rag): implement PRP-9 RAG Knowledge Base with pgvector
Add RAG (Retrieval-Augmented Generation) knowledge base feature for
semantic document indexing and retrieval using PostgreSQL pgvector.
Key components:
- Document indexing with markdown-aware and OpenAPI-aware chunking
- Semantic retrieval using cosine similarity with configurable thresholds
- Idempotent re-indexing via SHA-256 content hash comparison
- OpenAI text-embedding-3-small for embeddings (1536 dimensions)
- HNSW index for fast approximate nearest neighbor search
API endpoints:
- POST /rag/index - Index documents with automatic chunking
- POST /rag/retrieve - Semantic search with relevance scoring
- GET /rag/sources - List indexed sources with statistics
- DELETE /rag/sources/{source_id} - Remove source and chunks
Includes:
- ORM models: DocumentSource, DocumentChunk with Vector column
- Pydantic v2 schemas with strict validation
- 68 unit tests + 14 integration tests
- Migration for pgvector extension and RAG tables
- Examples and environment configuration
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
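Two ideas from the commit above, idempotent re-indexing via a SHA-256 content hash and cosine-similarity retrieval with a threshold, can be shown with an in-memory stand-in. This is a sketch only: the real feature stores vectors in pgvector behind an HNSW index, and the `InMemoryIndex` class, its method names, and the default threshold are hypothetical.

```python
import hashlib
import math


def content_hash(text: str) -> str:
    """SHA-256 of the document text, used to detect unchanged content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Toy replacement for the database-backed index (one chunk per source)."""

    def __init__(self) -> None:
        self.hashes = {}  # source_id -> content hash of the last indexed text
        self.chunks = []  # (source_id, text, embedding) tuples

    def index(self, source_id: str, text: str, embedding: list) -> bool:
        """Return True if the source was (re)indexed, False if it was unchanged."""
        digest = content_hash(text)
        if self.hashes.get(source_id) == digest:
            return False  # same content: skip re-chunking and re-embedding
        self.chunks = [c for c in self.chunks if c[0] != source_id]
        self.chunks.append((source_id, text, embedding))
        self.hashes[source_id] = digest
        return True

    def retrieve(self, query_embedding: list, threshold: float = 0.7, top_k: int = 5) -> list:
        """Return (score, source_id, text) tuples at or above the threshold."""
        scored = [
            (cosine_similarity(query_embedding, emb), sid, text)
            for sid, text, emb in self.chunks
        ]
        hits = [item for item in scored if item[0] >= threshold]
        return sorted(hits, key=lambda item: item[0], reverse=True)[:top_k]
```

Comparing hashes before embedding means a repeated index request for unchanged content costs one hash computation rather than a round of embedding calls.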
* feat(rag): add Ollama embedding provider with OpenAI-compatible API
- Add EmbeddingProvider abstract base class with provider pattern
- Refactor existing OpenAI code to OpenAIEmbeddingProvider
- Add OllamaEmbeddingProvider using /v1/embeddings endpoint
- Supports configurable dimensions parameter
- Uses OpenAI-compatible response format
- Add config settings: rag_embedding_provider, ollama_base_url, ollama_embedding_model
- Add migration for dynamic embedding dimension support
- Update tests for both providers (25 tests)
Enables local/LAN embedding generation without OpenAI API dependency.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
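The provider pattern described above might look roughly like the sketch below. Assumptions are labeled: the method names, the synchronous interface, and the injected `post` callable are illustrative choices that keep the example runnable without a network, and the project's own providers may well be async. The request and response shapes follow the OpenAI embeddings format that the commit says the Ollama `/v1/embeddings` endpoint mirrors.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class EmbeddingProvider(ABC):
    """Common interface so callers do not care which backend produces vectors."""

    @abstractmethod
    def embed(self, texts: list) -> list:
        """Return one embedding vector per input text, in input order."""


class OpenAICompatibleProvider(EmbeddingProvider):
    """Works against any backend that speaks the OpenAI embeddings format."""

    def __init__(
        self,
        model: str,
        post: Callable[[str, dict], dict],  # hypothetical transport: (path, body) -> JSON
        dimensions: Optional[int] = None,
    ) -> None:
        self.model = model
        self._post = post
        self.dimensions = dimensions

    def embed(self, texts: list) -> list:
        body = {"model": self.model, "input": texts}
        if self.dimensions is not None:
            body["dimensions"] = self.dimensions  # optional, configurable size
        payload = self._post("/v1/embeddings", body)
        # Response items carry an "index"; sort so output order matches input.
        items = sorted(payload["data"], key=lambda item: item["index"])
        return [item["embedding"] for item in items]


if __name__ == "__main__":
    def fake_post(path: str, body: dict) -> dict:
        """Stand-in transport that returns a fixed-size zero vector per input."""
        size = body.get("dimensions", 4)
        return {
            "data": [
                {"index": i, "embedding": [0.0] * size}
                for i, _ in enumerate(body["input"])
            ]
        }

    provider = OpenAICompatibleProvider("example-model", fake_post, dimensions=3)
    print(provider.embed(["first", "second"]))
```

Injecting the transport keeps the provider testable offline. A real implementation would pass a function that issues the HTTP request against the configured base URL.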
* docs: add Ollama embedding provider documentation
- Update .env.example with Ollama configuration options
- Add RAG Knowledge Base section to README with:
- Embedding provider options (OpenAI/Ollama)
- Example index and retrieve requests
- Configuration examples for both providers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: add Phase 8 RAG Knowledge Base documentation
- Create docs/PHASE/8-RAG_KNOWLEDGE_BASE.md with full phase details
- Update docs/PHASE-index.md:
- Mark Phase 8 as Completed in overview table
- Add Phase 8 summary to Completed Phases section
- Add entry to Version History
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): add RAG models import to alembic env and format tests
- Add rag models import to alembic/env.py for schema validation
- Format test_embeddings.py to pass ruff format check
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Make migration deterministic by hardcoding dimension values instead of reading from environment (alembic migration)
- Add pyyaml dependency for YAML parsing in OpenAPI chunker
- Fix token count logging to capture original count before truncation
- Add path traversal protection to RAG service _read_content_from_path (mirrors registry/storage.py pattern)
- Fix markdown linting issues:
  - Add language tags to fenced code blocks (MD040)
  - Fix table pipe spacing (MD060)
- Fix index_docs.py to treat 200 same as 201 for idempotent responses
- Add test for path traversal protection
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
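Two of the smaller fixes in this list are easy to get wrong, so here is a brief sketch of each. The function names, the logger name, and the representation of tokens as a list of integers are hypothetical. The commit message states only the intent: log the original token count rather than the truncated one, and accept 200 alongside 201 when indexing is idempotent.

```python
import logging

logger = logging.getLogger("rag.embeddings")  # hypothetical logger name


def truncate_tokens(tokens: list, max_tokens: int) -> list:
    """Trim a token sequence to `max_tokens`, logging the pre-truncation size."""
    # Capture the count first. Reading len(tokens) after the slice would
    # always report max_tokens and hide how much input was dropped.
    original_count = len(tokens)
    if original_count > max_tokens:
        tokens = tokens[:max_tokens]
        logger.warning(
            "Truncated input from %d to %d tokens", original_count, len(tokens)
        )
    return tokens


def is_index_success(status_code: int) -> bool:
    """Treat both 201 (newly indexed) and 200 (already indexed) as success.

    Assumption: the index endpoint answers 200 when the content is unchanged,
    so a client script that retries should not count that as a failure.
    """
    return status_code in (200, 201)
```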
* feat: RAG Knowledge Base, Serving Layer, and Model Registry (#50) * feat(registry): implement model registry for run tracking and deployments (#36) * docs: expand INITIAL-7 with lifecycle, lineage, and artifact integrity details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(registry): implement model registry for run tracking and deployments Add model registry feature (PRP-7) with: - ORM models: ModelRun with JSONB columns (model_config, metrics, runtime_info), DeploymentAlias for mutable deployment pointers - Storage: LocalFSProvider with SHA-256 integrity verification and path traversal prevention, abstract interface for future S3/GCS support - Service: RegistryService with state machine validation, duplicate detection, config hashing, and run comparison - API endpoints: CRUD for runs and aliases, artifact verification, run comparison with config/metrics diffs - Database: Alembic migration with GIN indexes for JSONB containment queries - Tests: 103 unit tests (schemas, storage, service) + 24 integration tests - Example: registry_demo.py demonstrating full workflow Run lifecycle: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED Aliases can only point to SUCCESS runs for deployment safety. 
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update documentation for model registry implementation - README.md: Add registry to project structure, API endpoints section, and example reference - docs/ARCHITECTURE.md: Update section 7.6 with full implementation details, add registry endpoints to section 8, mark Phase 1 complete - docs/PHASE-index.md: Mark phases 4-6 as completed, add detailed completion entries for Forecasting, Backtesting, and Registry Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add PHASE documentation for forecasting, backtesting, and registry Create missing phase documentation files to complete the project's implementation records: - 4-FORECASTING.md: Model zoo with BaseForecaster interface, train/predict endpoints, and joblib persistence - 5-BACKTESTING.md: Time-series CV with expanding/sliding strategies, metrics calculation, and baseline comparisons - 6-MODEL_REGISTRY.md: Run tracking with state machine, deployment aliases, and SHA-256 artifact integrity verification Update PHASE-index.md to link to the new documentation files. 
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(registry): resolve type checking issues with Pydantic model_config alias - Add pydantic.mypy plugin to pyproject.toml for proper Pydantic type checking - Use model_config_data instead of model_config alias in tests to avoid collision with Pydantic's reserved model_config attribute - Update _model_to_response to use model_validate() for proper alias handling - Change docker-compose postgres port to 5433 to avoid conflicts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: resolve CI failures for registry PR - Import registry models in alembic/env.py for schema validation - Fix import order and remove extraneous f-strings in registry_demo.py - Add type: ignore comments for frozen model tests with pydantic.mypy plugin Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: prevent db_session fixtures from dropping all tables The data_platform and root conftest.py db_session fixtures were dropping all tables after each test, causing subsequent integration tests to fail when they couldn't find migrated tables. Changes: - Remove Base.metadata.drop_all from db_session fixtures - Tests now rely on migrations for table creation - Each test just rolls back its own changes Also fixes ruff format issue in examples/registry_demo.py. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: add proper test data cleanup to db_session fixtures Update data_platform and ingest test fixtures to clean up test data explicitly instead of dropping all tables or just rolling back. - data_platform: delete test stores, products, calendar entries - ingest: delete test stores, products, sales, calendar entries This ensures test isolation while preserving migrated tables. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: use separate session for test cleanup to avoid transaction issues When tests cause integrity errors, the session enters a failed state. 
Use a fresh session for cleanup to avoid PendingRollbackError. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: use contextlib.suppress instead of try-except-pass Replace try-except-pass patterns with contextlib.suppress to satisfy ruff S110 linting rule. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix: code improvements and documentation fixes - Add date range filter to SalesDaily cleanup in ingest tests - Enforce artifact_hash presence before verification in registry routes - Compute SHA256 from saved file instead of source in storage - Fix override_get_db to mirror production transaction semantics - Filter DeploymentAlias cleanup to only test runs - Update database port to 5433 in config and .env.example - Add language identifiers to fenced code blocks (MD040) - Fix table formatting for markdownlint MD060 - Update PR reference in PHASE/6-MODEL_REGISTRY.md - Convert bare URLs to markdown links in INITIAL-7.md - Wrap __init__.py in backticks in PRP-7 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * sync: update dev from phase-6 (#40) * chore: release v0.2.0 (#37) * feat(registry): implement model registry for run tracking and deployments (#36) * docs: expand INITIAL-7 with lifecycle, lineage, and artifact integrity details Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(registry): implement model registry for run tracking and deployments Add model registry feature (PRP-7) with: - ORM models: ModelRun with JSONB columns (model_config, metrics, runtime_info), DeploymentAlias for mutable deployment pointers - Storage: LocalFSProvider with SHA-256 integrity verification and path traversal prevention, abstract interface for future S3/GCS support - Service: RegistryService with state machine validation, duplicate detection, config hashing, and run comparison - API endpoints: CRUD for runs and aliases, artifact 
verification, run comparison with config/metrics diffs - Database: Alembic migration with GIN indexes for JSONB containment queries - Tests: 103 unit tests (schemas, storage, service) + 24 integration tests - Example: registry_demo.py demonstrating full workflow Run lifecycle: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED Aliases can only point to SUCCESS runs for deployment safety. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update documentation for model registry implementation - README.md: Add registry to project structure, API endpoints section, and example reference - docs/ARCHITECTURE.md: Update section 7.6 with full implementation details, add registry endpoints to section 8, mark Phase 1 complete - docs/PHASE-index.md: Mark phases 4-6 as completed, add detailed completion entries for Forecasting, Backtesting, and Registry Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add PHASE documentation for forecasting, backtesting, and registry Create missing phase documentation files to complete the project's implementation records: - 4-FORECASTING.md: Model zoo with BaseForecaster interface, train/predict endpoints, and joblib persistence - 5-BACKTESTING.md: Time-series CV with expanding/sliding strategies, metrics calculation, and baseline comparisons - 6-MODEL_REGISTRY.md: Run tracking with state machine, deployment aliases, and SHA-256 artifact integrity verification Update PHASE-index.md to link to the new documentation files. 
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(registry): resolve type checking issues with Pydantic model_config alias - Add pydantic.mypy plugin to pyproject.toml for proper Pydantic type checking - Use model_config_data instead of model_config alias in tests to avoid collision with Pydantic's reserved model_config attribute - Update _model_to_response to use model_validate() for proper alias handling - Change docker-compose postgres port to 5433 to avoid conflicts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: resolve CI failures for registry PR - Import registry models in alembic/env.py for schema validation - Fix import order and remove extraneous f-strings in registry_demo.py - Add type: ignore comments for frozen model tests with pydantic.mypy plugin Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: prevent db_session fixtures from dropping all tables The data_platform and root conftest.py db_session fixtures were dropping all tables after each test, causing subsequent integration tests to fail when they couldn't find migrated tables. Changes: - Remove Base.metadata.drop_all from db_session fixtures - Tests now rely on migrations for table creation - Each test just rolls back its own changes Also fixes ruff format issue in examples/registry_demo.py. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: add proper test data cleanup to db_session fixtures Update data_platform and ingest test fixtures to clean up test data explicitly instead of dropping all tables or just rolling back. - data_platform: delete test stores, products, calendar entries - ingest: delete test stores, products, sales, calendar entries This ensures test isolation while preserving migrated tables. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: use separate session for test cleanup to avoid transaction issues When tests cause integrity errors, the session enters a failed state. 
Use a fresh session for cleanup to avoid PendingRollbackError. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: use contextlib.suppress instead of try-except-pass Replace try-except-pass patterns with contextlib.suppress to satisfy ruff S110 linting rule. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix: code improvements and documentation fixes - Add date range filter to SalesDaily cleanup in ingest tests - Enforce artifact_hash presence before verification in registry routes - Compute SHA256 from saved file instead of source in storage - Fix override_get_db to mirror production transaction semantics - Filter DeploymentAlias cleanup to only test runs - Update database port to 5433 in config and .env.example - Add language identifiers to fenced code blocks (MD040) - Fix table formatting for markdownlint MD060 - Update PR reference in PHASE/6-MODEL_REGISTRY.md - Convert bare URLs to markdown links in INITIAL-7.md - Wrap __init__.py in backticks in PRP-7 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * chore(main): release 0.2.0 (#38) Release-As: 0.2.0 Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * chore(main): release 0.2.0 (#39) * chore(main): release 0.2.0 * chore: trigger CI --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Gabe@w7dev <gabor@w7-7.net> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * feat(serving-layer): implement PRP-8 agent-first API design (#42) * docs(initial-8): expand serving layer requirements Add specifications for 
job-driven orchestration, dimension discovery endpoints, standardized API protocols (filtering/pagination), and agent-first API design patterns for LLM tool-calling optimization. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs(prp-8): add serving layer implementation spec Comprehensive PRP for FastAPI serving layer including: - Dimensions module for store/product discovery endpoints - Analytics module for KPI/drilldown queries - Jobs module for async-ready task orchestration - RFC 7807 problem details for semantic error responses - OpenAPI export optimization for LLM tool-calling 26 tasks with validation gates and 8.5/10 confidence score. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(serving-layer): implement PRP-8 agent-first API design Add RFC 7807 Problem Details for semantic error responses: - ProblemDetail schema with type URIs and error codes - application/problem+json content type - Validation exception handler with field-level errors Add dimensions module for store/product discovery: - GET /dimensions/stores with pagination, filtering, search - GET /dimensions/products with pagination, filtering, search - LLM-optimized Field descriptions for tool-calling Add analytics module for KPI aggregations: - GET /analytics/kpis with date range and dimension filters - GET /analytics/drilldowns for store/product/category/region/date - Revenue share and ranking calculations Add jobs module for async-ready task orchestration: - POST /jobs for train/predict/backtest operations - Job model with JSONB params/results - Status transitions: pending → running → completed/failed Integration: - New settings: analytics_max_rows, jobs_retention_days - Register routers in main.py - Alembic migration for jobs table Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update documentation for PRP-8 serving layer Update README.md: - Add dimensions, analytics, jobs modules to project structure - Document new API endpoints with examples - Add 
RFC 7807 error response documentation Update docs/ARCHITECTURE.md: - Mark serving layer section as implemented - Add configuration settings for new modules - Update roadmap with Phase-2 completion Update docs/PHASE-index.md: - Add Phase 7 (Serving Layer) as completed - Update phase overview table - Add version history entry Create docs/PHASE/7-SERVING_LAYER.md: - Comprehensive phase documentation - API endpoint specifications - Database schema and migration details - Usage examples and test coverage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: fix ruff formatting Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix(serving-layer): improve analytics validation and jobs run_id handling - Add validate_date_range helper to analytics routes for reusable date validation - Apply date range validation to both get_kpis and get_drilldowns endpoints - Fix total_revenue_all calculation to use full dataset before limiting - Add run_id to train job result for downstream predict jobs - Fix predict job to resolve run_id to model metadata from bundle - Update test fixtures to use 32-char hex IDs per schema requirements Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: format jobs service Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: restructure roadmap into modular three-phase architecture (INITIAL-9/10/11) (#47) * docs: restructure INITIAL-9 into modular three-phase roadmap Decompose monolithic INITIAL-9 into three specialized technical phases: - INITIAL-9: RAG Knowledge Base ("The Memory") - pgvector + OpenAI embeddings - Markdown/OpenAPI-aware chunking - Semantic retrieval endpoints - INITIAL-10: Agentic Layer ("The Brain") - PydanticAI agents (Experiment Orchestrator, RAG Assistant) - Tool orchestration with structured outputs - Human-in-the-loop approval workflow - INITIAL-11: ForecastLab Dashboard ("The 
Face") - React 19 + Vite + shadcn/ui - TanStack Table/Query for data management - Recharts for time series visualization - Agent chat interface with streaming Update PHASE-index.md and DAILY-FLOW.md to align with new structure. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs(prp): add PRP-9 RAG Knowledge Base implementation plan Comprehensive PRP for INITIAL-9 RAG Knowledge Base feature: - pgvector + SQLAlchemy 2.0 integration patterns - Markdown-aware and OpenAPI-aware chunking - Async OpenAI embeddings with batch processing - HNSW index for cosine similarity search - 15 ordered implementation tasks - 5-level validation loop (syntax → types → unit → integration → smoke) - Full ORM models and Pydantic schemas - Known gotchas and anti-patterns documented Confidence score: 8.5/10 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs(prp): add PRP-10 Agentic Layer implementation plan Comprehensive PRP for INITIAL-10 Agentic Layer feature: - PydanticAI agent framework integration - Experiment Orchestrator Agent (backtest → compare → deploy) - RAG Assistant Agent (query → retrieve → answer with citations) - Human-in-the-loop approval workflow for sensitive actions - WebSocket streaming for real-time token delivery - Session persistence with JSONB message history - 17 ordered implementation tasks - Tool definitions for registry, backtesting, forecasting, RAG - Full Pydantic schemas and ORM models Confidence score: 7.5/10 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * docs(prp): add PRP-11 ForecastLab Dashboard implementation plan (#48) Comprehensive PRP for INITIAL-11 (The Face) with: - 24 implementation tasks across 6 phases - React 19 + Vite + shadcn/ui + TanStack Table/Query - TypeScript types matching all backend API schemas - Reusable DataTable with server-side pagination - TimeSeriesChart component with Recharts - 
WebSocket hook for agent chat streaming - Complete documentation links and gotchas Confidence score: 7.5/10 (chat depends on INITIAL-10) Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * feat(rag): implement PRP-9 RAG Knowledge Base with pgvector (#49) * feat(rag): implement PRP-9 RAG Knowledge Base with pgvector Add RAG (Retrieval-Augmented Generation) knowledge base feature for semantic document indexing and retrieval using PostgreSQL pgvector. Key components: - Document indexing with markdown-aware and OpenAPI-aware chunking - Semantic retrieval using cosine similarity with configurable thresholds - Idempotent re-indexing via SHA-256 content hash comparison - OpenAI text-embedding-3-small for embeddings (1536 dimensions) - HNSW index for fast approximate nearest neighbor search API endpoints: - POST /rag/index - Index documents with automatic chunking - POST /rag/retrieve - Semantic search with relevance scoring - GET /rag/sources - List indexed sources with statistics - DELETE /rag/sources/{source_id} - Remove source and chunks Includes: - ORM models: DocumentSource, DocumentChunk with Vector column - Pydantic v2 schemas with strict validation - 68 unit tests + 14 integration tests - Migration for pgvector extension and RAG tables - Examples and environment configuration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(rag): add Ollama embedding provider with OpenAI-compatible API - Add EmbeddingProvider abstract base class with provider pattern - Refactor existing OpenAI code to OpenAIEmbeddingProvider - Add OllamaEmbeddingProvider using /v1/embeddings endpoint - Supports configurable dimensions parameter - Uses OpenAI-compatible response format - Add config settings: rag_embedding_provider, ollama_base_url, ollama_embedding_model - Add migration for dynamic embedding dimension support - Update tests for both providers (25 tests) Enables local/LAN embedding generation without OpenAI API 
dependency. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add Ollama embedding provider documentation - Update .env.example with Ollama configuration options - Add RAG Knowledge Base section to README with: - Embedding provider options (OpenAI/Ollama) - Example index and retrieve requests - Configuration examples for both providers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add Phase 8 RAG Knowledge Base documentation - Create docs/PHASE/8-RAG_KNOWLEDGE_BASE.md with full phase details - Update docs/PHASE-index.md: - Mark Phase 8 as Completed in overview table - Add Phase 8 summary to Completed Phases section - Add entry to Version History Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(ci): add RAG models import to alembic env and format tests - Add rag models import to alembic/env.py for schema validation - Format test_embeddings.py to pass ruff format check Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix: address code review issues for RAG module and docs - Make migration deterministic by hardcoding dimension values instead of reading from environment (alembic migration) - Add pyyaml dependency for YAML parsing in OpenAPI chunker - Fix token count logging to capture original count before truncation - Add path traversal protection to RAG service _read_content_from_path (mirrors registry/storage.py pattern) - Fix markdown linting issues: - Add language tags to fenced code blocks (MD040) - Fix table pipe spacing (MD060) - Fix index_docs.py to treat 200 same as 201 for idempotent responses - Add test for path traversal protection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * ci: add 
release-please branch trigger and wire workflow_dispatch ref (#52) - Add 'release-please--branches--**' pattern to match actual release-please branch naming (e.g., release-please--branches--main--components--forecastlabai) - Add 'ref' input to workflow_dispatch with proper type declaration - Wire ref input to all checkout steps via CHECKOUT_REF env var - Use inputs.ref || github.ref for predictable fallback behavior - Update concurrency group to respect manual ref input Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * chore(main): release 0.2.2 (#51) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Gabor Szabo <168316277+w7-mgfcode@users.noreply.github.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
sync: merge main into dev after v0.2.2 release
Post Phase-9 review updates:
- Bump pydantic-ai from 0.1.0 to 1.48.0 (v1 stable release)
- Update Claude model identifier to claude-sonnet-4-5 format
- Add service method mappings for tool implementations
- Add mock_pydantic_ai_agent fixture pattern
- Increase confidence score from 7.5 to 8.0/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add full agentic layer for autonomous experiment orchestration and evidence-grounded Q&A:
- Add PydanticAI agents (experiment, rag_assistant) with lazy initialization
- Create agent tools for registry, backtesting, forecasting, and RAG
- Implement AgentService with session management and approval workflow
- Add REST routes and WebSocket streaming endpoint
- Create Alembic migration for agent_session table with JSONB storage
- Add 92 unit tests with full type checking coverage
- Update config with agent settings (provider, model, session TTL)

Human-in-the-loop approval required for create_alias and archive_run.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Mark Phase 9 as completed in PHASE-index.md with comprehensive summary
- Create new docs/PHASE/9-AGENTIC_LAYER.md with full implementation details
  - Executive summary, deliverables, and architecture highlights
  - Database schema (agent_session table)
  - Agent definitions (Experiment Orchestrator, RAG Assistant)
  - Tool modules (registry, backtesting, forecasting, RAG)
  - Service layer API, REST routes, and WebSocket streaming
  - Configuration settings and environment variables
  - Test coverage (92 unit tests) and validation results
  - Directory structure and next phase preparation
- Update README.md to include Agentic Layer
  - Add to Features section
  - Add comprehensive API endpoints section with examples
  - Update project structure to include agents/ and rag/ features

Phase 9 implements PydanticAI-based agents for autonomous experimentation and evidence-grounded Q&A with human-in-the-loop approval workflow.

Related: PR #55 (+7,835 additions, 92 unit tests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ayer)
- Update component diagram to include Agentic Layer and Agent Sessions
- Update backend layout to include rag/ and agents/ features
- Add comprehensive Section 9: RAG Knowledge Base (marked as IMPLEMENTED)
  - OpenAI and Ollama embedding providers
  - pgvector HNSW indexing, idempotent content hash
  - API endpoints, database schema, configuration
  - Location, tests, and migration details
- Add comprehensive Section 10: Agentic Layer (marked as IMPLEMENTED)
  - PydanticAI agents (Experiment Orchestrator, RAG Assistant)
  - Session management with JSONB message history
  - Human-in-the-loop approval workflow
  - WebSocket streaming architecture
  - Tool integration, database schema, configuration
  - Location, tests, and dependencies
- Update Section 11: Dashboard to include Agent Chat Interface
- Renumber Quality section from 11 to 12
- Update Section 13: Roadmap with completed phases 0-9
  - Detailed phase descriptions with PRP references
  - Phase 10 (Dashboard), 11 (ML Models), 12 (Production) as pending

Phase 8 (PRP-9) and Phase 9 (PRP-10) now fully documented in architecture.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix Ruff formatting issues:
- Reformat 7 files in app/features/agents/

Fix test failures:
- test_create_session_invalid_type: change expected status from 400 to 422 (Pydantic validation)
- test_health_with_agents: change expected status from 'healthy' to 'ok' (actual health endpoint response)

Fix schema validation:
- Import models in __init__.py to register AgentSession with SQLAlchemy metadata
- Prevents "relation agent_session does not exist" error in alembic check

All CI checks should now pass:
- Ruff format: ✅ 7 files reformatted
- Tests: ✅ 2 test assertions fixed
- Schema validation: ✅ Models properly registered

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The agents models were not being imported in alembic/env.py, causing Alembic to not detect the agent_session table definition. This led to schema validation failures where Alembic thought the table should be removed. Fix: Add agents models import to alembic/env.py alongside other feature model imports.
The models import is now exported via __all__, so it's considered used and doesn't need the noqa: F401 directive.
Add comprehensive Google Gemini model support to PydanticAI agents:
- Add google_api_key and agent_thinking_budget to Settings
- Add model identifier validation (provider:model-name format)
- Add fail-fast API key validation with clear error messages
- Update agent creation to validate API keys before initialization
- Support Gemini extended reasoning (thinking mode) for complex tasks

Supported providers:
- anthropic: Claude models (default)
- openai: GPT models (fallback)
- google-gla: Gemini via AI Studio (new)
- google-vertex: Gemini via Vertex AI (new)

Testing:
- Add 9 configuration validation tests
- All 101 agent tests pass
- Type checking (mypy + pyright) green
- Linting (ruff) green

Documentation:
- Update .env.example with Gemini configuration guide
- Update Phase 9 docs with multi-provider table and reasoning guide
- Zero breaking changes (backward compatible)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Apply ruff formatter to config and base agent files to fix CI lint check. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Merging PRP-10 Agentic Layer implementation with full commit history preserved.
Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters
Walkthrough
Adds a new Agentic Layer: env/config entries, PydanticAI agents (Experiment Orchestrator, RAG Assistant), tools (registry/backtesting/forecasting/RAG), DB migration + ORM for agent sessions, service orchestration, REST/WebSocket endpoints, package exports, tests, docs, and CI tweak.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Routes as FastAPI Routes
participant Service as AgentService
participant Agent as PydanticAI Agent
participant LLM as LLM Provider
participant Tools as Tools
participant DB as Database
Client->>Routes: POST /agents/sessions (agent_type, context)
Routes->>Service: create_session()
Service->>DB: INSERT agent_session
DB-->>Service: session_id
Service-->>Routes: SessionResponse
Routes-->>Client: 201 Created
Client->>Routes: POST /agents/sessions/{id}/chat (message)
Routes->>Service: chat(session_id, message)
Service->>DB: LOAD session + history
Service->>Agent: Run(message + history)
Agent->>LLM: prompt + tool context
LLM-->>Agent: token stream / final response
Agent->>Tools: Execute tool(...)
Tools->>DB: Query/update
DB-->>Tools: results
Tools-->>Agent: tool results
Agent-->>Service: ChatResult (messages, usage, pending_action?)
Service->>DB: UPDATE agent_session
Service-->>Routes: ChatResponse
Routes-->>Client: 200 OK
sequenceDiagram
participant Agent as PydanticAI Agent
participant Tool as Tool Function
participant Service as AgentService
participant Approval as Human Approval
participant DB as Database
Agent->>Tool: call create_alias(...)
Tool->>Service: requires_approval("create_alias")?
Service-->>Tool: true
Tool->>DB: write pending_action to agent_session
DB-->>Tool: pending_action persisted
Tool-->>Agent: approval_pending_message
Note over Approval,DB: Human reviews via /agents/{id}/approve
Approval->>Service: approve_action(session_id, approved=true)
Service->>DB: load session, clear pending_action when executed
Service->>Tool: re-run create_alias with approval
Tool->>DB: create alias / update run
DB-->>Tool: alias created
Tool-->>Agent: result
Service->>DB: update session history/status
sequenceDiagram
participant WSClient as WebSocket Client
participant WS as Agents WebSocket
participant Service as AgentService
participant Agent as PydanticAI Agent
participant LLM as LLM Provider
WSClient->>WS: connect + {session_id, message}
WS->>Service: stream_chat(session_id, message)
Service->>Agent: run with streaming enabled
Agent->>LLM: send prompt
LLM-->>Agent: token delta stream
Agent->>WS: emit TextDeltaEvent
WS-->>WSClient: {"type":"text_delta","delta":"..."}
Agent->>WS: emit ToolCallStartEvent / ToolCallEndEvent
WS-->>WSClient: tool events
Agent->>WS: emit CompleteEvent
WS-->>WSClient: {"type":"complete","usage":{...}}
WSClient->>WS: disconnect
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In @.env.example:
- Around line 56-92: Update the .env.example so the env var names map to
Settings fields: rename AGENT_MAX_TOOL_CALLS_PER_TURN to AGENT_MAX_TOOL_CALLS
(maps to agent_max_tool_calls) and rename AGENT_APPROVAL_REQUIRED_ACTIONS to
AGENT_REQUIRE_APPROVAL (maps to agent_require_approval). Also change the list
value to a JSON array string (e.g., ["create_alias","archive_run"]) and add a
short comment showing that JSON-array format is required for safe parsing;
remove or deprecate the old names to avoid confusion.
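The JSON-array format requested here can be illustrated with a small sketch. It assumes the variable name `AGENT_REQUIRE_APPROVAL` from the comment above; the helper `parse_approval_actions` is hypothetical and uses plain `json` rather than the project's actual Settings class.

```python
import json


def parse_approval_actions(raw: str) -> list[str]:
    """Parse a value such as '["create_alias","archive_run"]' into a list.

    A comma-separated string like 'create_alias,archive_run' is rejected,
    which is why the example file should document the JSON-array form.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"AGENT_REQUIRE_APPROVAL must be a JSON array, got {raw!r}"
        ) from exc
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("AGENT_REQUIRE_APPROVAL must be a JSON array of strings")
    return value
```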
In `@app/core/config.py`:
- Around line 122-146: The validator validate_model_identifier currently allows
identifiers like "anthropic:" because it only checks the provider; update
validate_model_identifier (decorated for agent_default_model and
agent_fallback_model) to split on ":" and ensure the part after the colon (model
name) is non-empty and not just whitespace, and raise a ValueError with a clear
message when the model-name is missing or blank; keep the existing provider
validation and error messages but add this guard immediately after splitting.
In `@app/features/agents/service.py`:
- Around line 468-486: The placeholder result for approved actions must be
replaced with real re-execution of the pending tool: locate the approval handler
around the ApprovalResponse creation (references: action_id, approved, result,
status, db.flush, ApprovalResponse) and call the existing executor used
elsewhere (e.g., the function that runs pending tools or a new helper like
execute_pending_action(action_id) that replays the original action payload),
capture its output into result, set status to "executed" on success or
"rejected"/"expired" on failure/timeouts, persist any state changes to the DB
before returning, and ensure exceptions are caught and logged so
ApprovalResponse returns actual execution output and correct status.
In `@app/features/agents/tools/backtesting_tools.py`:
- Around line 38-60: The docstring for the factory that returns
NaiveModelConfig, SeasonalNaiveModelConfig, or MovingAverageModelConfig
incorrectly lists "linear_regression"; update the docstring to enumerate the
actual supported model_type values ("naive", "seasonal_naive", "moving_average")
and adjust any parameter description (e.g., season_length) to match behavior, or
alternatively implement a LinearRegressionModelConfig and handle
"linear_regression" in the function; refer to the existing return branches
(NaiveModelConfig, SeasonalNaiveModelConfig, MovingAverageModelConfig) to locate
the code to change.
- Around line 246-266: The MAE percent-improvement calculation can divide by
zero; update the branches that compute pct_better (the blocks using
metrics_a/metrics_b, main_a/main_b and writing into
comparison["recommendation"]) to guard the denominator: before computing
pct_better check if the denominator (mae_b in the mae_a < mae_b branch and mae_a
in the mae_b < mae_a branch) is zero; if it is, produce a recommendation that
uses an absolute MAE difference or says the improvement is
"infinite/undetermined" instead of performing the division; apply the same safe
logic to both branches so no ZeroDivisionError can occur.
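The zero-denominator guard could be sketched as follows. The function name `mae_recommendation` and the message wording are hypothetical; the sketch only shows the shape of the check, with the larger MAE as the denominator in both branches as the comment describes.

```python
def mae_recommendation(mae_a: float, mae_b: float) -> str:
    """Describe which model has the lower MAE without risking division by zero."""
    if mae_a == mae_b:
        return "Models A and B have equal MAE"
    if mae_a < mae_b:
        winner, best, worst = "A", mae_a, mae_b
    else:
        winner, best, worst = "B", mae_b, mae_a
    if worst == 0:
        # Denominator is zero: report the absolute difference instead.
        return (
            f"Model {winner} has lower MAE by {worst - best:.4f} "
            "(percent improvement undetermined)"
        )
    pct_better = (worst - best) / worst * 100
    return f"Model {winner} has {pct_better:.1f}% lower MAE"
```

With non-negative MAE values the larger of two distinct values cannot be zero, so the guard mainly protects against malformed metrics.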
In `@app/features/agents/tools/forecasting_tools.py`:
- Around line 35-57: Update the function docstring to accurately reflect the
supported model_type options (naive, seasonal_naive, moving_average) and remove
the incorrect reference to linear_regression; also note that seasonal_naive
accepts a season_length (default 7) as used when creating
SeasonalNaiveModelConfig, and ensure the "Returns" and "Raises" sections remain
correct for NaiveModelConfig, SeasonalNaiveModelConfig, and
MovingAverageModelConfig.
- Around line 60-119: Add input guards in the agent tools before calling the
service: in train_model(), verify train_start_date <= train_end_date and
raise/return an error if not (do this right after logging and before
_create_model_config / service.train_model call); in predict(), verify horizon
is <= forecast_max_horizon (use the existing forecast_max_horizon setting or
constant) and raise/return an error if exceeded before calling
ForecastingService.predict. Also consider mirroring these checks in
ForecastingService (methods train_model and predict) so service-level validation
exists if agent tools are bypassed.
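The two guards requested above could look roughly like this. It is a minimal sketch: the helper names `check_train_window` and `check_horizon` are hypothetical, `max_horizon` stands in for the `forecast_max_horizon` setting, and the lower bound of 1 on `horizon` is an added assumption not stated in the comment.

```python
from datetime import date


def check_train_window(train_start_date: date, train_end_date: date) -> None:
    """Reject training windows whose start falls after their end."""
    if train_start_date > train_end_date:
        raise ValueError(
            f"train_start_date ({train_start_date}) must be on or before "
            f"train_end_date ({train_end_date})"
        )


def check_horizon(horizon: int, max_horizon: int) -> None:
    """Reject horizons outside 1..max_horizon before calling the service."""
    if horizon < 1 or horizon > max_horizon:
        raise ValueError(
            f"horizon must be between 1 and {max_horizon}, got {horizon}"
        )
```

Calling both helpers at the top of the tool functions, and again inside the service methods, gives the layered validation the comment suggests.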
In `@app/features/agents/tools/registry_tools.py`:
- Around line 73-88: The code does an unchecked conversion RunStatus(status)
which will raise an uncaught ValueError for invalid strings; wrap that
conversion in a try/except ValueError block in the function that uses
RegistryService.list_runs, validate the incoming status string before calling
RunStatus, and on failure raise a clear error (e.g., raise ValueError("Invalid
run status: '...'; valid values: [...]") or convert to an HTTP 400 via
HTTPException if this is an API layer). Update the code around the
RunStatus(status) line (and adjust callers of list_runs if needed) so invalid
status values are handled gracefully before calling service.list_runs and return
the same result.model_dump() on success.
In `@app/features/agents/websocket.py`:
- Around line 26-42: The current get_db_for_websocket yields a single
AsyncSession for the entire WebSocket lifetime which can cause stale data and
memory growth; change to a session-per-message approach by removing the
long-lived AsyncSession dependency from websocket_stream and instead obtain a
session maker (get_session_maker) inside websocket_stream and create/close an
async session for each incoming message (or periodically) using "async with
session_maker() as session" around message handling; alternatively provide a
dependency that yields a session factory/callable rather than a session so
websocket_stream can create short-lived sessions on demand (ensure each session
is closed to avoid accumulating ORM objects and stale state).
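The session-per-message pattern can be shown without a database. In this sketch `FakeSession` and `make_session_maker` are hypothetical stand-ins for SQLAlchemy's `AsyncSession` and a session maker; the point is only that each message opens its own short-lived session inside `async with`.

```python
import asyncio
from collections.abc import AsyncIterator, Callable, Iterable
from contextlib import AbstractAsyncContextManager, asynccontextmanager


class FakeSession:
    """Stand-in for an AsyncSession; records whether it was closed."""

    def __init__(self, session_id: int) -> None:
        self.session_id = session_id
        self.closed = False


def make_session_maker(
    created: list[FakeSession],
) -> Callable[[], AbstractAsyncContextManager[FakeSession]]:
    """Return a factory that opens a new FakeSession per call and closes it on exit."""

    @asynccontextmanager
    async def session_maker() -> AsyncIterator[FakeSession]:
        session = FakeSession(len(created))
        created.append(session)
        try:
            yield session
        finally:
            session.closed = True

    return session_maker


async def handle_messages(
    messages: Iterable[str],
    session_maker: Callable[[], AbstractAsyncContextManager[FakeSession]],
) -> list[str]:
    """Process each message with its own session instead of one long-lived one."""
    replies: list[str] = []
    for message in messages:
        async with session_maker() as session:
            replies.append(f"session {session.session_id} handled {message!r}")
    return replies
```

In the real handler the loop body would sit inside the WebSocket receive loop, so no ORM state survives from one message to the next.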
In `@docs/PHASE/9-AGENTIC_LAYER.md`:
- Around line 4-6: Update the PR reference string that currently points to
"https://github.com/w7-mgfcode/ForecastLabAI/pull/55" and the visible PR number
"#55" to the correct PR number "#56" (and corresponding URL "/pull/56") so the
PR link and the PR reference in the document match the objectives; look for the
PR link and the PR number near the header referencing PRP-10-agentic-layer.md
and INITIAL-10.md and replace both occurrences.
In `@README.md`:
- Around line 513-572: Update the README's Agentic Layer configuration to match
the defaults and names in app/core/config.py: change AGENT_TEMPERATURE from 0.0
to 0.1, AGENT_SESSION_TTL_MINUTES from 30 to 120, and
AGENT_APPROVAL_TIMEOUT_MINUTES from 5 to 60; replace the documented
AGENT_MODEL_NAME with the actual config key agent_default_model (and mention
agent_fallback_model as the fallback); add the missing configuration entries
agent_timeout_seconds, agent_retry_attempts, agent_retry_delay_seconds,
agent_max_sessions_per_user, and agent_enable_streaming to the examples and
defaults; and ensure all environment variable names and example values in the
README match the symbols used in app/core/config.py.
🧹 Nitpick comments (12)
pyproject.toml (1)
165-169: Scope Pyright's unknown-type relaxations to the agents package.
Global downgrades can mask unrelated issues. Using `executionEnvironments` with a `root` path targeting `app/features/agents` allows you to relax these checks only where needed while maintaining stricter type coverage elsewhere.
Example implementation
 [tool.pyright]
 include = ["app"]
@@
-# Agents feature uses PydanticAI which has partial type coverage
-# Relax unknown type checks for dynamic agent result handling
-reportUnknownVariableType = "warning"
-reportUnknownArgumentType = "warning"
-reportUnknownMemberType = "warning"
+executionEnvironments = [
+  { root = "app/features/agents",
+    reportUnknownVariableType = "warning",
+    reportUnknownArgumentType = "warning",
+    reportUnknownMemberType = "warning"
+  }
+]
app/features/agents/tests/test_models.py (1)
163-168: Test validates UUID format, not model behavior.
This test validates that `uuid.uuid4().hex` produces a 32-char hex string, which is testing Python's uuid module rather than the `AgentSession` model itself. Consider renaming to clarify intent or converting to a test that validates the model's `session_id` field constraints (e.g., length validation in the schema layer).
app/features/agents/agents/base.py (2)
1-14: Consider using the project's logging helper for consistency.
Line 14 uses `structlog.get_logger()` directly. The codebase has `app.core.logging.get_logger()` (see relevant snippet from `app/core/logging.py`, Lines 58-68) which provides consistent logger configuration. Consider using it for uniformity, though this is a minor consistency point.
56-82: Add defensive check for model identifier format.
Line 66 uses `model.split(":")[0]` which will return the full string if no colon exists, potentially causing silent misconfiguration. Consider adding validation:
🛡️ Proposed fix to validate model format
 def validate_api_key_for_model(model: str) -> None:
     ...
     settings = get_settings()
-    provider = model.split(":")[0]
+    if ":" not in model:
+        raise ValueError(
+            f"Invalid model identifier '{model}'. Expected format: 'provider:model-name'"
+        )
+    provider = model.split(":")[0]

     if provider == "anthropic" and not settings.anthropic_api_key:
app/features/agents/agents/experiment.py (2)
151-209: Add error handling for date parsing.
Lines 194-195 use `date.fromisoformat()` which raises `ValueError` on invalid format. Since LLMs may produce malformed dates, consider wrapping with error handling:
🛡️ Proposed fix for date parsing error handling
     # Parse date strings
-    start = date.fromisoformat(start_date)
-    end = date.fromisoformat(end_date)
+    try:
+        start = date.fromisoformat(start_date)
+        end = date.fromisoformat(end_date)
+    except ValueError as e:
+        return {
+            "error": f"Invalid date format: {e}. Expected YYYY-MM-DD.",
+            "start_date": start_date,
+            "end_date": end_date,
+        }

     return await run_backtest(
211-227: Tool using `@agent.tool_plain` does not track tool calls.
This tool uses `@agent.tool_plain` which doesn't receive `RunContext`, so it cannot increment `tool_call_count`. This is intentional for pure functions, but the metric tracking will undercount total tool invocations. If tracking is important for auditing, consider switching to `@agent.tool` and incrementing the counter.
app/features/agents/tests/test_tools.py (1)
177-228: Consider adding test coverage for `run_backtest`.
The `TestBacktestingTools` class only tests `compare_backtest_results`. Consider adding tests for the `run_backtest` function to verify it correctly builds configurations and calls `BacktestingService`.
app/features/agents/service.py (2)
565-584: Type signature mismatch in `_deserialize_messages`.
The return type is `list[ModelMessage]` but the implementation returns `list[dict[str, Any]]` directly. While the comment explains that PydanticAI accepts dicts, this type mismatch could cause confusion and type checker issues elsewhere.
Consider updating the return type to reflect the actual behavior:
 def _deserialize_messages(
     self,
     data: list[dict[str, Any]],
-) -> list[ModelMessage]:
+) -> list[dict[str, Any]] | list[ModelMessage]:
Or cast explicitly with documentation:
return cast(list[ModelMessage], data)  # PydanticAI accepts dicts as message history
256-259: Tool calls list is always empty in response.
Lines 256-259 initialize `tool_calls` as an empty list and the comment explains that PydanticAI doesn't expose tool call details in the result object. The `tool_calls` field in `ChatResponse` will always be empty, which may confuse API consumers expecting to see what tools were called.
Consider either documenting this limitation in the API response schema or removing the field until it can be populated.
app/features/agents/websocket.py (1)
134-160: Consider moving import to module level.
The `datetime` import at line 148 is inside the function. While functional, moving it to the module-level imports would be more consistent with Python conventions.
📝 Proposed fix
At the top of the file (around line 8):
 import json
 from collections.abc import AsyncGenerator
+from datetime import UTC, datetime
In the function:
-async def _send_error(
-    websocket: WebSocket,
-    error: str,
-    error_type: str = "unknown",
-    recoverable: bool = True,
-) -> None:
-    """Send error event to WebSocket client..."""
-    from datetime import UTC, datetime
-
-    await websocket.send_json(
+async def _send_error(
+    websocket: WebSocket,
+    error: str,
+    error_type: str = "unknown",
+    recoverable: bool = True,
+) -> None:
+    """Send error event to WebSocket client..."""
+    await websocket.send_json(
docs/PHASE/9-AGENTIC_LAYER.md (2)
125-133: Add language specifier to code block.
The code block for `ExperimentReport` is missing a language specifier. Static analysis flagged this (MD040).
📝 Proposed fix
 **Structured Output**: `ExperimentReport`
-```
+```python
 class ExperimentReport(BaseModel):
164-171: Add language specifier to code block.
The code block for `RAGAnswer` is also missing a language specifier.
📝 Proposed fix
 **Structured Output**: `RAGAnswer`
-```
+```python
 class RAGAnswer(BaseModel):
- .env.example: rename env vars to match Settings fields (AGENT_MAX_TOOL_CALLS, AGENT_REQUIRE_APPROVAL with JSON array format), update defaults to match config.py
- config.py: validate model name is non-empty in model identifier
- service.py: implement real action execution in approve_action instead of placeholder, add _execute_pending_action helper
- backtesting_tools.py: fix docstring model types, add zero division guards in compare_backtest_results
- forecasting_tools.py: fix docstring, add date range and horizon validation guards
- registry_tools.py: add RunStatus validation before enum conversion
- websocket.py: change to session-per-message pattern to prevent stale data and memory growth
- docs/PHASE/9-AGENTIC_LAYER.md: update PR reference from #55 to #56
- README.md: update Agentic Layer config to match config.py
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
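For readers wondering what a zero-division guard in a comparison like `compare_backtest_results` could look like, here is a minimal sketch. The helper name `pct_improvement` and the decision to return `None` for a zero baseline are assumptions made for illustration; the actual code in backtesting_tools.py may differ.

```python
from __future__ import annotations


def pct_improvement(baseline: float, candidate: float) -> float | None:
    """Percent improvement of candidate over baseline for a lower-is-better metric.

    Hypothetical helper: returns None when the baseline is zero, because the
    ratio is undefined there and dividing would raise ZeroDivisionError.
    """
    if baseline == 0:
        return None
    return (baseline - candidate) / abs(baseline) * 100.0
```

Returning `None` rather than `0.0` keeps "no meaningful comparison" distinguishable from "no improvement" in the tool output.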
GITHUB_TOKEN doesn't trigger workflows on commits/PRs it creates (GitHub security feature to prevent infinite loops). This causes release PRs to not have CI running automatically.
Solution: Use RELEASE_PAT if available, fall back to GITHUB_TOKEN.
To enable: Create a fine-grained PAT with contents:write and pull-requests:write permissions, then add it as the RELEASE_PAT secret.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@app/features/agents/service.py`:
- Around line 382-412: The streaming path in stream_chat() currently finalizes
the session without checking for approvals; mirror the approval detection from
chat() by inspecting final_result (the variable final_result from
result.get_data()) for approval triggers and, if detected, set session.state =
AWAITING_APPROVAL and persist pending approval metadata on session, then emit a
StreamEvent with event_type="approval_required" (include whatever approval
payload/chat metadata you use) before yielding the final "complete" event;
ensure you still update
session.last_activity/expires_at/message_history/total_tokens_used and call
await db.flush() after modifying the session so the approval state is persisted.
- Around line 262-283: The pending-action flow is dropping required arguments
and the approval trigger is unreachable; change the approval detection to look
for a structured pending_action on result.data (e.g., if hasattr(result_data,
"pending_action") and result_data.pending_action) instead of relying only on
approval_required, and when creating session.pending_action in the approval
branch populate "arguments" from result_data.pending_action (or extract
alias_name, run_id, description when present) so the shape matches what
_execute_pending_action expects; update any code paths that set pending_action
(e.g., tool_create_alias/tool_archive_run outputs) to include the arguments so
session.pending_action, SessionStatus.AWAITING_APPROVAL, and
_format_pending_action receive the real argument payload rather than an empty
{}.
In `@app/features/agents/tools/forecasting_tools.py`:
- Around line 183-188: Add a lower-bound guard for horizon in the validation
block that currently uses get_settings() and checks
settings.forecast_max_horizon: explicitly check that horizon is an integer > 0
(reject horizon <= 0) and raise a ValueError with a clear message (e.g.,
"horizon must be a positive integer") so the tool does not accept non-positive
horizons before proceeding to forecasting logic in forecasting_tools.py.
In `@docs/PHASE/9-AGENTIC_LAYER.md`:
- Around line 571-576: The examples section shows files under the
examples/agents/ directory but the Known Limitations section still claims no
examples exist; update the documentation so both are consistent by either
removing the examples/agents/ listing or, preferably, updating the Known
Limitations text to acknowledge the existing examples and optionally list the
three example files (experiment_demo.py, rag_query.http, websocket_client.py) so
the examples section and Known Limitations align; modify the examples block and
the Known Limitations paragraph in PHASE/9-AGENTIC_LAYER.md accordingly.
- Around line 578-579: The README contains a hard-coded diff summary text
("**Total Lines**: 7,835 additions, 89 deletions") that will become stale;
remove that literal line or replace it with a dynamically generated value
injected by CI or a pre-commit script; look for the exact string "**Total
Lines**:" in docs/PHASE/9-AGENTIC_LAYER.md and either delete the hard-coded
counts or change the content to reference an automated script/template that
computes additions/deletions at build time.
| examples/ | ||
| └── agents/ # (Planned) Usage examples | ||
| ├── experiment_demo.py | ||
| ├── rag_query.http | ||
| └── websocket_client.py | ||
| ``` |
Resolve examples section inconsistency.
The directory listing includes example files under examples/agents/, but the Known Limitations section says no examples exist. Please align these sections.
Also applies to: 603-604
🤖 Prompt for AI Agents
In `@docs/PHASE/9-AGENTIC_LAYER.md` around lines 571 - 576, The examples section
shows files under the examples/agents/ directory but the Known Limitations
section still claims no examples exist; update the documentation so both are
consistent by either removing the examples/agents/ listing or, preferably,
updating the Known Limitations text to acknowledge the existing examples and
optionally list the three example files (experiment_demo.py, rag_query.http,
websocket_client.py) so the examples section and Known Limitations align; modify
the examples block and the Known Limitations paragraph in
PHASE/9-AGENTIC_LAYER.md accordingly.
- Add approval detection to stream_chat() mirroring chat() logic
- Emit approval_required StreamEvent when human approval is needed
- Add horizon > 0 validation in predict forecasting tool
- Fix pending-action argument extraction for both dict and object types
- Update phase 9 docs: remove hard-coded line counts, fix examples section
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
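The horizon validation requested in the review (reject `horizon <= 0`, and respect `settings.forecast_max_horizon`) could be sketched roughly as follows. The function name `validate_horizon` is hypothetical, and `max_horizon` stands in for the settings value so the snippet runs on its own.

```python
def validate_horizon(horizon: int, max_horizon: int) -> int:
    """Reject non-positive or oversized horizons before any forecasting work.

    Hypothetical helper; `max_horizon` plays the role of
    settings.forecast_max_horizon.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(horizon, bool) or not isinstance(horizon, int):
        raise ValueError("horizon must be a positive integer")
    if horizon <= 0:
        raise ValueError("horizon must be a positive integer")
    if horizon > max_horizon:
        raise ValueError(f"horizon {horizon} exceeds maximum of {max_horizon}")
    return horizon
```

Checking the lower bound before the upper bound means a caller passing `0` or a negative value gets the "positive integer" message rather than silently passing the maximum check.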
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@app/features/agents/service.py`:
- Around line 752-767: The action handler for action_type "create_alias" and
"archive_run" is silently defaulting required fields to empty strings which
leads to opaque downstream errors; update the block handling action_type (the
branch that calls create_alias and archive_run) to validate required arguments
explicitly—check that arguments contains a non-empty "alias_name" before calling
create_alias and a non-empty "run_id" before calling either create_alias or
archive_run, and raise a clear ValueError (e.g., "Missing required argument:
alias_name" or "Missing required argument: run_id") if validation fails so
callers get precise error messages instead of downstream failures.
- Around line 722-729: The code constructs a PendingAction using
datetime.fromisoformat on pending["created_at"] and pending["expires_at"] which
will raise ValueError for empty or malformed strings; update the PendingAction
construction to parse those fields defensively (e.g., wrap
datetime.fromisoformat calls for created_at and expires_at in a small helper or
try/except) and fall back to a safe default (None or a sentinel like
datetime.min) when parsing fails so PendingAction(...) never raises on
bad/missing timestamps.
In `@docs/PHASE/9-AGENTIC_LAYER.md`:
- Around line 388-398: Update the incorrect model identifier strings in the
agentic layer docs: change the fallback/cost-optimized model identifier
`google-gla:gemini-3-flash` to `google-gla:gemini-3-flash-preview` and change
the reasoning-heavy model identifier `google-gla:gemini-2-5-pro` to
`google-gla:gemini-2.5-pro` so they match the PydanticAI v1.48.0 provider:model
format used elsewhere (refer to the table entries and the "Reasoning-heavy"
bullet).
🧹 Nitpick comments (3)
app/features/agents/tools/forecasting_tools.py (1)
15-29: Consider using the centralized `get_logger` for consistency.
The module uses `structlog.get_logger()` directly, but the project provides a configured logger via `app.core.logging.get_logger()`. Using the centralized helper ensures consistent logging configuration (e.g., request_id binding) across the codebase.
♻️ Suggested change
-import structlog
-from sqlalchemy.ext.asyncio import AsyncSession
+from sqlalchemy.ext.asyncio import AsyncSession
 from app.core.config import get_settings
+from app.core.logging import get_logger
 from app.features.forecasting.schemas import (
     ModelConfig,
     MovingAverageModelConfig,
@@ -27,7 +27,7 @@
 )
 from app.features.forecasting.service import ForecastingService
-logger = structlog.get_logger()
+logger = get_logger(__name__)
app/features/agents/service.py (2)
261-316: Consider extracting approval detection logic to reduce duplication.
The approval detection logic (lines 261-316) is nearly identical to the code in `stream_chat()` (lines 430-483). This violates the DRY principle and creates maintenance burden: any bug fix or enhancement would need to be applied in two places.
♻️ Suggested refactor
Extract the approval detection into a private helper method:
def _extract_approval_from_result(
    self,
    result_data: Any,
    now: datetime,
) -> tuple[bool, dict[str, Any] | None]:
    """Extract pending approval info from agent result.

    Returns:
        Tuple of (pending_approval, pending_action_dict or None).
    """
    if hasattr(result_data, "pending_action") and result_data.pending_action:
        pending_action_data = result_data.pending_action
        if isinstance(pending_action_data, dict):
            action_type = pending_action_data.get("action_type", "unknown")
            arguments = pending_action_data.get("arguments", {})
            description = pending_action_data.get(
                "description", f"Agent requested approval for {action_type}"
            )
        else:
            action_type = getattr(pending_action_data, "action_type", "unknown")
            arguments = getattr(pending_action_data, "arguments", {})
            description = getattr(
                pending_action_data,
                "description",
                f"Agent requested approval for {action_type}",
            )
        return True, {
            "action_id": uuid.uuid4().hex[:16],
            "action_type": action_type,
            "description": description,
            "arguments": arguments,
            "created_at": now.isoformat(),
            "expires_at": (
                now + timedelta(minutes=self.settings.agent_approval_timeout_minutes)
            ).isoformat(),
        }
    # Legacy fallback
    if hasattr(result_data, "approval_required") and result_data.approval_required:
        return True, {
            "action_id": uuid.uuid4().hex[:16],
            "action_type": "unknown",
            "description": "Agent requested approval for an action",
            "arguments": {},
            "created_at": now.isoformat(),
            "expires_at": (
                now + timedelta(minutes=self.settings.agent_approval_timeout_minutes)
            ).isoformat(),
        }
    return False, None
Then use in both `chat()` and `stream_chat()`:
pending_approval, pending_action_dict = self._extract_approval_from_result(result_data, now)
if pending_action_dict:
    session.pending_action = pending_action_dict
    session.status = SessionStatus.AWAITING_APPROVAL.value
    pending_action = self._format_pending_action(session.pending_action)
588-598: Execution failure status may be misleading.
When an approved action fails to execute (line 598), the status is set to `"rejected"`. This is semantically misleading: the action was approved but failed to execute. The `ApprovalResponse.status` type is `Literal["executed", "rejected", "expired"]`, which doesn't have a `"failed"` option.
Consider either:
- Adding `"failed"` to the `ApprovalResponse.status` literal type in schemas.py
- Documenting this semantic overload where "rejected" means "approved but failed"
return PendingAction(
    action_id=pending.get("action_id", ""),
    action_type=pending.get("action_type", ""),
    description=pending.get("description", ""),
    arguments=pending.get("arguments", {}),
    created_at=datetime.fromisoformat(pending.get("created_at", "")),
    expires_at=datetime.fromisoformat(pending.get("expires_at", "")),
)
Potential ValueError if datetime fields are empty or malformed.
Lines 727-728 call datetime.fromisoformat() on values obtained via .get("created_at", "") and .get("expires_at", ""). If these fields are missing or contain empty strings, datetime.fromisoformat("") will raise ValueError: Invalid isoformat string: ''.
While the code that creates pending_action dicts properly sets these fields, defensive programming would protect against malformed/legacy data.
🛡️ Proposed defensive fix
+ try:
+     created_at = datetime.fromisoformat(pending.get("created_at", ""))
+     expires_at = datetime.fromisoformat(pending.get("expires_at", ""))
+ except ValueError:
+     # Fallback for malformed data
+     now = datetime.now(UTC)
+     created_at = now
+     expires_at = now + timedelta(minutes=self.settings.agent_approval_timeout_minutes)
+
  return PendingAction(
      action_id=pending.get("action_id", ""),
      action_type=pending.get("action_type", ""),
      description=pending.get("description", ""),
      arguments=pending.get("arguments", {}),
-     created_at=datetime.fromisoformat(pending.get("created_at", "")),
-     expires_at=datetime.fromisoformat(pending.get("expires_at", "")),
+     created_at=created_at,
+     expires_at=expires_at,
  )
🤖 Prompt for AI Agents
In `@app/features/agents/service.py` around lines 722 - 729, The code constructs a
PendingAction using datetime.fromisoformat on pending["created_at"] and
pending["expires_at"] which will raise ValueError for empty or malformed
strings; update the PendingAction construction to parse those fields defensively
(e.g., wrap datetime.fromisoformat calls for created_at and expires_at in a
small helper or try/except) and fall back to a safe default (None or a sentinel
like datetime.min) when parsing fails so PendingAction(...) never raises on
bad/missing timestamps.
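The "small helper" variant mentioned in the prompt above could look something like this sketch. Both function names and the 30-minute default are assumptions for illustration; in the service the timeout would presumably come from `settings.agent_approval_timeout_minutes`.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def parse_iso_or_default(value: object, default: datetime) -> datetime:
    """Parse an ISO 8601 string, returning `default` for missing or malformed input."""
    if not isinstance(value, str):
        return default
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        # Covers "" as well as garbage strings.
        return default


def parse_pending_timestamps(
    pending: dict, timeout_minutes: int = 30  # hypothetical default
) -> tuple[datetime, datetime]:
    """Return (created_at, expires_at) without ever raising on bad timestamps."""
    now = datetime.now(timezone.utc)
    created_at = parse_iso_or_default(pending.get("created_at"), now)
    fallback_expiry = created_at + timedelta(minutes=timeout_minutes)
    expires_at = parse_iso_or_default(pending.get("expires_at"), fallback_expiry)
    return created_at, expires_at
```

Parsing the two fields independently means one malformed timestamp does not discard a valid value in the other, which the single try/except in the proposed fix would do.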
if action_type == "create_alias":
    alias_name = arguments.get("alias_name", "")
    run_id = arguments.get("run_id", "")
    description = arguments.get("description")
    return await create_alias(
        db=db,
        alias_name=alias_name,
        run_id=run_id,
        description=description,
    )
elif action_type == "archive_run":
    run_id = arguments.get("run_id", "")
    result = await archive_run(db=db, run_id=run_id)
    if result is None:
        raise ValueError(f"Run not found: {run_id}")
    return result
Missing validation for required arguments could cause confusing downstream errors.
Lines 753-754 default alias_name and run_id to empty strings. If these required arguments are missing from the pending action data, the downstream create_alias or archive_run calls will fail with less helpful error messages.
Consider validating required arguments upfront for clearer error reporting.
🛡️ Proposed validation
  if action_type == "create_alias":
      alias_name = arguments.get("alias_name", "")
      run_id = arguments.get("run_id", "")
+     if not alias_name or not run_id:
+         raise ValueError(
+             f"create_alias requires alias_name and run_id, got: {arguments}"
+         )
      description = arguments.get("description")
      return await create_alias(
          db=db,
          alias_name=alias_name,
          run_id=run_id,
          description=description,
      )
  elif action_type == "archive_run":
      run_id = arguments.get("run_id", "")
+     if not run_id:
+         raise ValueError(f"archive_run requires run_id, got: {arguments}")
      result = await archive_run(db=db, run_id=run_id)
🤖 Prompt for AI Agents
In `@app/features/agents/service.py` around lines 752 - 767, The action handler
for action_type "create_alias" and "archive_run" is silently defaulting required
fields to empty strings which leads to opaque downstream errors; update the
block handling action_type (the branch that calls create_alias and archive_run)
to validate required arguments explicitly—check that arguments contains a
non-empty "alias_name" before calling create_alias and a non-empty "run_id"
before calling either create_alias or archive_run, and raise a clear ValueError
(e.g., "Missing required argument: alias_name" or "Missing required argument:
run_id") if validation fails so callers get precise error messages instead of
downstream failures.
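A table-driven version of this validation is one way to produce the per-argument messages the prompt asks for. This is only a sketch: `REQUIRED_ARGUMENTS` and `require_arguments` are hypothetical names, and the required fields are taken from the two branches quoted above.

```python
from __future__ import annotations

from typing import Any

# Required argument names per action type, as implied by the handler above.
REQUIRED_ARGUMENTS: dict[str, tuple[str, ...]] = {
    "create_alias": ("alias_name", "run_id"),
    "archive_run": ("run_id",),
}


def require_arguments(action_type: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Raise a precise ValueError if a required argument is missing or empty."""
    if action_type not in REQUIRED_ARGUMENTS:
        raise ValueError(f"Unknown action type: {action_type}")
    for name in REQUIRED_ARGUMENTS[action_type]:
        if not arguments.get(name):
            raise ValueError(f"Missing required argument: {name}")
    return arguments
```

Calling `require_arguments(action_type, arguments)` at the top of the handler would let each branch read its arguments without the empty-string defaults.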
|----------|------------------------|----------------|-------|
| Anthropic Claude | `anthropic:claude-sonnet-4-5` | `ANTHROPIC_API_KEY` | Default, recommended for production |
| OpenAI GPT | `openai:gpt-4o` | `OPENAI_API_KEY` | Fallback model |
| Google Gemini (AI Studio) | `google-gla:gemini-3-flash` | `GOOGLE_API_KEY` | 60-70% cheaper than Gemini 2.5, 3x faster |
| Google Vertex AI | `google-vertex:gemini-*` | GCP Service Account | Enterprise deployments with Vertex AI |

**Model Selection Guide:**
- **Production**: `anthropic:claude-sonnet-4-5` (best balance of quality/speed/cost)
- **Cost-optimized**: `google-gla:gemini-3-flash` (fast, cheap, good quality)
- **Reasoning-heavy**: `google-gla:gemini-2-5-pro` with `agent_thinking_budget=4000`
- **Maximum quality**: `anthropic:claude-opus-4-5` (highest capability, slower)
🧩 Analysis chain
🌐 Web query:
PydanticAI v1.48.0 model identifiers documentation
💡 Result:
In PydanticAI v1.x (including v1.48.0), the documented “model identifier” string format you pass to `Agent(...)` is:
- `"<provider>:<model>"` (e.g. `openai:gpt-5`, `openrouter:google/gemini-2.5-pro-preview`). PydanticAI will infer the appropriate model class/provider/profile from this string. [1]
- If you’re using Pydantic AI Gateway, you can prefix with `"gateway/"`, e.g. `gateway/openai:gpt-5`. [2]
Documentation pages:
- Models & Providers → Overview (explains the `<provider>:<model>` identifier and auto-selection behavior). [1]
- Models & Providers → OpenAI (shows concrete identifier examples like `openai:gpt-5` and `gateway/openai:gpt-5`). [2]
Sources:
[1] (ai.pydantic.dev)
[2] (ai.pydantic.dev)
Citations:
🌐 Web query:
PydanticAI google-gla gemini model names anthropic claude openai
💡 Result:
In PydanticAI, the “model name” string is generally:
- `<provider>:<model_name>` (auto-infers the right model class/provider), e.g. `openai:gpt-5` or `anthropic:claude-sonnet-4-5` or `google-gla:gemini-2.5-pro`. [1][2][3]
- Or via Gateway: `gateway/<provider>:<model_name>`, e.g. `gateway/openai:gpt-5`, `gateway/anthropic:claude-sonnet-4-5`, `gateway/gemini:gemini-2.5-pro`. [1][2][3]
Google Gemini (Generative Language API = `google-gla`)
Use:
- `google-gla:gemini-2.5-pro` (example shown in docs) [1]
(You can also use Vertex: `google-vertex:gemini-2.5-pro`.) [1]
Anthropic Claude
Use:
- `anthropic:claude-sonnet-4-5` (example shown in docs) [2]
For AWS Bedrock-style IDs, you can pass the full Bedrock model id as the model name (example): `us.anthropic.claude-sonnet-4-5-20250929-v1:0` [2]
OpenAI
Use:
- `openai:gpt-5` (example shown in docs) [3][4]
Sources: [1] [2] [3] [4]
🏁 Script executed:
# Check if there's a repository or documentation in the current codebase
git ls-files | grep -i "pydantic\|model" | head -20
Repository: w7-mgfcode/ForecastLabAI
Length of output: 673
🌐 Web query:
google-gla gemini-3-flash pydantic ai 2026
💡 Result:
- `google-gla` in PydanticAI means “Google Generative Language API” (Gemini via `generativelanguage.googleapis.com`) and is used with a Google AI Studio API key (`GOOGLE_API_KEY`). [2]
- Gemini 3 Flash exists (Google has rolled it out as a faster/lower-latency “Flash” tier). [3]
- In PydanticAI, the Google integration (`pydantic_ai.models.google.GoogleModel`) lists Gemini 3 preview model names including `gemini-3-flash-preview` (plus `gemini-3-pro-preview`, etc.). [1]
Minimal PydanticAI example (GLA + Gemini 3 Flash preview)
# pip install "pydantic-ai-slim[google]"
import os
from pydantic_ai import Agent
os.environ["GOOGLE_API_KEY"] = "YOUR_AI_STUDIO_KEY"
agent = Agent("google-gla:gemini-3-flash-preview")
result = agent.run_sync("Give me a 3-bullet summary of PydanticAI.")
print(result.output)
This uses the `google-gla:{model_name}` shorthand shown in the PydanticAI Google docs. [2]
Sources: PydanticAI Google model API reference (model-name list incl. `gemini-3-flash-preview`) [1], PydanticAI Google provider docs (what `google-gla` is / API key setup / `google-gla:{model}` usage) [2], reporting on Gemini 3 Flash rollout [3].
🌐 Web query:
anthropic claude-opus-4-5 pydantic ai model
💡 Result:
To use Anthropic Claude Opus 4.5 in PydanticAI, pass the Anthropic model name `claude-opus-4-5` (Anthropic’s recommended Claude API identifier) to `AnthropicModel` (or use the Gateway shorthand). [1][2]
pip install "pydantic-ai-slim[anthropic]"
export ANTHROPIC_API_KEY="..."
Direct to Anthropic
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
agent = Agent(AnthropicModel("claude-opus-4-5"))
Via PydanticAI Gateway
from pydantic_ai import Agent
agent = Agent("gateway/anthropic:claude-opus-4-5")
Notes:
- PydanticAI’s Anthropic model type allows “latest listed” names but also accepts arbitrary valid Anthropic model strings (including date-stamped variants if you use them). [3]
🌐 Web query:
"gemini-2.5-pro" vs "gemini-2-5-pro" google pydantic ai
💡 Result:
- Correct model ID (what you pass to APIs / PydanticAI): `gemini-2.5-pro` (note the dot). Google’s Vertex AI docs list the Model ID as `gemini-2.5-pro`. [1] PydanticAI’s Google model docs and type hints also use `gemini-2.5-pro`. [2][3]
- `gemini-2-5-pro` (with extra hyphen): this shows up as a URL slug/path in Google Cloud documentation (e.g., `/models/gemini/2-5-pro`), but it is not the model ID you should send as the model name. [1]
PydanticAI examples (use the dotted name):
- `Agent("google-gla:gemini-2.5-pro")` [2]
- `Agent("google-vertex:gemini-2.5-pro")` [2]
Sources:
[1] Google Cloud Vertex AI Gemini 2.5 Pro docs (Model ID) (cloud.google.com)
[2] PydanticAI “Google (Gemini)” model docs (ai.pydantic.dev)
[3] PydanticAI API reference listing gemini-2.5-pro as a latest model name (ai.pydantic.dev)
Citations:
- 1: https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-pro?utm_source=openai
- 2: https://ai.pydantic.dev/models/gemini/?utm_source=openai
- 3: https://ai.pydantic.dev/api/models/gemini/?utm_source=openai
Correct model identifier formats in lines 391 and 397.
The model identifiers referenced have incorrect naming:
- Line 391: `google-gla:gemini-3-flash` should be `google-gla:gemini-3-flash-preview` (missing `-preview` suffix)
- Line 397: `google-gla:gemini-2-5-pro` should be `google-gla:gemini-2.5-pro` (use dot notation, not hyphens)
Line 398 `anthropic:claude-opus-4-5` is correct. These follow PydanticAI v1.48.0's `<provider>:<model>` identifier format.
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 388-388: Table column style
Table pipe is missing space to the right for style "compact"
(MD060, table-column-style)
[warning] 388-388: Table column style
Table pipe is missing space to the left for style "compact"
(MD060, table-column-style)
🤖 Prompt for AI Agents
In `@docs/PHASE/9-AGENTIC_LAYER.md` around lines 388 - 398, Update the incorrect
model identifier strings in the agentic layer docs: change the
fallback/cost-optimized model identifier `google-gla:gemini-3-flash` to
`google-gla:gemini-3-flash-preview` and change the reasoning-heavy model
identifier `google-gla:gemini-2-5-pro` to `google-gla:gemini-2.5-pro` so they
match the PydanticAI v1.48.0 provider:model format used elsewhere (refer to the
table entries and the "Reasoning-heavy" bullet).
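Since an earlier commit in this PR mentions validating that the model name is non-empty in config.py, a structural check of the `<provider>:<model>` format could be sketched as below. The function name `split_model_identifier` is hypothetical and this is not the project's validator. Note that a check like this only catches malformed strings; a well-formed but wrong name such as `google-gla:gemini-2-5-pro` would still pass.

```python
from __future__ import annotations


def split_model_identifier(identifier: str) -> tuple[str, str]:
    """Split '<provider>:<model>' and reject empty parts.

    Splits on the first colon only, so model names that themselves contain
    a colon (e.g. Bedrock-style ids ending in ':0') stay intact.
    """
    provider, sep, model = identifier.partition(":")
    if not sep or not provider.strip() or not model.strip():
        raise ValueError(
            f"Invalid model identifier {identifier!r}: expected '<provider>:<model>'"
        )
    return provider, model
```

Catching name-level mistakes like the two flagged above would additionally need a lookup against the provider's published model list, which this sketch deliberately leaves out.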
Release: Phase 9 Agentic Layer with PydanticAI
This PR merges the completed Phase 9 (Agentic Layer) from `dev` to `main` for automatic semantic release.
🎯 Summary
Implement autonomous AI agents for experiment orchestration and evidence-grounded Q&A using PydanticAI v1.48.0.
✨ Major Features
Phase 9: Agentic Layer (PRP-10)
- Experiment Orchestrator Agent: Autonomous model experimentation workflow
  - `list_models`, `run_backtest`, `compare_runs`, `create_alias`, `archive_run`
- RAG Assistant Agent: Evidence-grounded Q&A with citations
  - `retrieve_context`, `format_citation`
- Session Management: PostgreSQL JSONB storage for multi-turn conversations
- Human-in-the-Loop Approval: Blocks sensitive actions until approved
- WebSocket Streaming: Real-time token delivery for responsive UX
- Multi-Provider LLM Support:
- Gemini Extended Reasoning: Optional thinking mode for complex reasoning tasks
📊 Changes
`agent_session` table with JSONB message history
New Files
- `app/features/agents/` - Complete vertical slice (30 files)
- `alembic/versions/d6e0f2g3h456_create_agent_session_table.py`
- `docs/PHASE/9-AGENTIC_LAYER.md` (624 lines)
🔧 Configuration
New environment variables:
🧪 Testing
All validation gates pass:
📚 Documentation
- `docs/ARCHITECTURE.md` with Phase 9 section
- `docs/PHASE-index.md` with completion details
- `docs/PHASE/9-AGENTIC_LAYER.md` with full implementation guide
- `README.md` with agents API endpoints and examples
- `.env.example` with comprehensive configuration guide
🚀 Release Notes
Breaking Changes: None (backward compatible)
New Dependencies:
- `pydantic-ai>=1.48.0` - PydanticAI agent framework (v1 stable)
- `anthropic>=0.50.0` - Anthropic SDK for Claude
Commits Included: 25 commits from dev branch including:
📋 Checklist
🎉 Next Phase
Phase 10: ForecastLab Dashboard (PRP-11)
This PR will trigger automatic semantic versioning via release-please.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Configuration
Database
Documentation
Tests / Chores