feat: RAG Knowledge Base, Serving Layer, and Model Registry#50

Merged
w7-mgfcode merged 12 commits into main from dev
Feb 1, 2026

Conversation

@w7-mgfcode
Owner

@w7-mgfcode w7-mgfcode commented Feb 1, 2026

Summary

This PR merges the dev branch into main, bringing significant new features and improvements:

  • RAG Knowledge Base (PRP-9): pgvector-based semantic search with OpenAI/Ollama embedding providers, chunking strategies, and citation support
  • Serving Layer (PRP-8): Agent-first API design with dimensions, analytics, and jobs endpoints
  • Model Registry: Run tracking, artifact storage with SHA-256 verification, deployment aliases
  • Documentation: Restructured roadmap into modular three-phase architecture (INITIAL-9/10/11)

Key Changes

RAG Feature (app/features/rag/)

  • Embedding providers: OpenAI (default) and Ollama (local/LAN)
  • Markdown and OpenAPI chunking strategies
  • pgvector HNSW index for similarity search
  • Idempotent indexing via content hash
  • Full test coverage (unit + integration)
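To illustrate the heading-aware Markdown chunking listed above, here is a minimal sketch. It assumes chunks are split at ATX headings (`#` to `######`); `chunk_markdown` is a hypothetical name, not necessarily what `app/features/rag/chunkers.py` exposes, and the real chunker may also enforce token limits.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text: str) -> list[dict[str, str]]:
    """Split Markdown into one chunk per heading section (hypothetical helper)."""
    chunks: list[dict[str, str]] = []
    heading = ""          # title of the section currently being collected
    body: list[str] = []  # lines collected for that section

    def flush() -> None:
        content = "\n".join(body).strip()
        if heading or content:
            chunks.append({"heading": heading, "content": content})

    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            heading = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks
```

This sketch does not skip `#` lines inside fenced code blocks, which a production chunker would need to handle.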

Serving Layer Improvements

  • Analytics endpoints for KPI aggregations and drilldowns
  • Dimensions endpoints for store/product discovery
  • Jobs orchestration (synchronous execution for now, async-ready)
  • RFC 7807 Problem Details error responses
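For readers unfamiliar with RFC 7807, the sketch below shows the general shape of a Problem Details body. The member names `type`, `title`, `status`, and `detail` and the `application/problem+json` media type come from the RFC; the `problem_detail` helper and the `code` extension member are assumptions for illustration, not the project's `ProblemDetail` schema.

```python
from http import HTTPStatus

# Media type defined by RFC 7807 for problem responses.
PROBLEM_JSON = "application/problem+json"


def problem_detail(
    status: int,
    detail: str,
    type_uri: str = "about:blank",
    **extensions: object,
) -> dict[str, object]:
    """Build an RFC 7807 style body (hypothetical helper)."""
    body: dict[str, object] = {
        "type": type_uri,                     # URI identifying the problem type
        "title": HTTPStatus(status).phrase,   # short, human-readable summary
        "status": status,                     # HTTP status code
        "detail": detail,                     # explanation specific to this occurrence
    }
    body.update(extensions)                   # extension members, e.g. an error code
    return body
```

A handler would serialize this dict as JSON and send it with the `PROBLEM_JSON` content type.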

Model Registry

  • JSONB storage for model_config, metrics, runtime_info
  • State machine: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED
  • Deployment aliases for production model management
  • Duplicate detection (configurable: allow/deny/detect)
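The run lifecycle above can be sketched as a small transition table. This is an illustration only: it encodes exactly the arrows shown in the description and assumes no other transitions are allowed; `RunStatus`, `ALLOWED_TRANSITIONS`, `transition`, and `can_alias` are hypothetical names, not the registry's actual API.

```python
from enum import Enum


class RunStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    ARCHIVED = "archived"


# Assumed table: only PENDING -> RUNNING -> SUCCESS/FAILED -> ARCHIVED is encoded.
ALLOWED_TRANSITIONS: dict[RunStatus, frozenset[RunStatus]] = {
    RunStatus.PENDING: frozenset({RunStatus.RUNNING}),
    RunStatus.RUNNING: frozenset({RunStatus.SUCCESS, RunStatus.FAILED}),
    RunStatus.SUCCESS: frozenset({RunStatus.ARCHIVED}),
    RunStatus.FAILED: frozenset({RunStatus.ARCHIVED}),
    RunStatus.ARCHIVED: frozenset(),
}


def transition(current: RunStatus, target: RunStatus) -> RunStatus:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current.value} -> {target.value}")
    return target


def can_alias(status: RunStatus) -> bool:
    """Deployment aliases may only point at successful runs."""
    return status is RunStatus.SUCCESS
```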

Files Changed

  • 34 files changed, ~10,943 insertions

Test plan

  • All unit tests pass (pytest -m "not integration")
  • Integration tests pass with PostgreSQL (pytest -m integration)
  • Type checking passes (mypy, pyright)
  • Linting passes (ruff)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • RAG knowledge base: document indexing and semantic search with configurable embedding providers.
    • Agentic layer: autonomous multi-agent workflows for experiment orchestration and approvals.
    • ForecastLab Dashboard: data explorer, time-series visualizers, agent chat, and admin controls.
  • Documentation

    • Extensive architecture and usage docs, examples, and API request samples for RAG, agents, and dashboard.
  • Chores

    • Database schema, migrations, and environment configuration updated to support RAG and agents.

w7-mgfcode and others added 11 commits February 1, 2026 08:17
…ents (#36)

* docs: expand INITIAL-7 with lifecycle, lineage, and artifact integrity details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(registry): implement model registry for run tracking and deployments

Add model registry feature (PRP-7) with:

- ORM models: ModelRun with JSONB columns (model_config, metrics, runtime_info),
  DeploymentAlias for mutable deployment pointers
- Storage: LocalFSProvider with SHA-256 integrity verification and path
  traversal prevention, abstract interface for future S3/GCS support
- Service: RegistryService with state machine validation, duplicate
  detection, config hashing, and run comparison
- API endpoints: CRUD for runs and aliases, artifact verification,
  run comparison with config/metrics diffs
- Database: Alembic migration with GIN indexes for JSONB containment queries
- Tests: 103 unit tests (schemas, storage, service) + 24 integration tests
- Example: registry_demo.py demonstrating full workflow

Run lifecycle: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED
Aliases can only point to SUCCESS runs for deployment safety.
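A minimal sketch of SHA-256 artifact verification, assuming the registry stores a hex digest next to each artifact. `sha256_of_file` and `verify_artifact` are hypothetical names; the actual `LocalFSProvider` interface is not shown in this PR page.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large artifacts are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hash: str) -> bool:
    """Compare the stored hex digest with the digest of the file on disk."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sha256_of_file(path), expected_hash.lower())
```

Hashing the file that was actually written, rather than the upload source, means the recorded digest describes what is on disk.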

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update documentation for model registry implementation

- README.md: Add registry to project structure, API endpoints section,
  and example reference
- docs/ARCHITECTURE.md: Update section 7.6 with full implementation
  details, add registry endpoints to section 8, mark Phase 1 complete
- docs/PHASE-index.md: Mark phases 4-6 as completed, add detailed
  completion entries for Forecasting, Backtesting, and Registry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add PHASE documentation for forecasting, backtesting, and registry

Create missing phase documentation files to complete the project's
implementation records:

- 4-FORECASTING.md: Model zoo with BaseForecaster interface, train/predict
  endpoints, and joblib persistence
- 5-BACKTESTING.md: Time-series CV with expanding/sliding strategies,
  metrics calculation, and baseline comparisons
- 6-MODEL_REGISTRY.md: Run tracking with state machine, deployment aliases,
  and SHA-256 artifact integrity verification
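The difference between the expanding and sliding strategies can be sketched with index ranges. This is a simplified illustration under the assumption that folds are contiguous and of equal test size; `backtest_splits` is a hypothetical function, not the project's backtesting API.

```python
from collections.abc import Iterator


def backtest_splits(
    n_samples: int,
    n_splits: int,
    test_size: int,
    strategy: str = "expanding",
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges for time-series cross-validation."""
    if strategy not in {"expanding", "sliding"}:
        raise ValueError(f"unknown strategy: {strategy}")
    # Samples left for training in the first fold.
    initial_train = n_samples - n_splits * test_size
    if initial_train <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_splits):
        test_start = initial_train + fold * test_size
        # Expanding: training always starts at 0 and grows with each fold.
        # Sliding: training keeps a fixed length and moves forward.
        train_start = 0 if strategy == "expanding" else test_start - initial_train
        yield range(train_start, test_start), range(test_start, test_start + test_size)
```

In both strategies every training range ends where its test range begins, so no fold trains on future observations.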

Update PHASE-index.md to link to the new documentation files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(registry): resolve type checking issues with Pydantic model_config alias

- Add pydantic.mypy plugin to pyproject.toml for proper Pydantic type checking
- Use model_config_data instead of model_config alias in tests to avoid collision
  with Pydantic's reserved model_config attribute
- Update _model_to_response to use model_validate() for proper alias handling
- Change docker-compose postgres port to 5433 to avoid conflicts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve CI failures for registry PR

- Import registry models in alembic/env.py for schema validation
- Fix import order and remove extraneous f-strings in registry_demo.py
- Add type: ignore comments for frozen model tests with pydantic.mypy plugin

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: prevent db_session fixtures from dropping all tables

The data_platform and root conftest.py db_session fixtures were dropping
all tables after each test, causing subsequent integration tests to fail
when they couldn't find migrated tables.

Changes:
- Remove Base.metadata.drop_all from db_session fixtures
- Tests now rely on migrations for table creation
- Each test just rolls back its own changes

Also fixes ruff format issue in examples/registry_demo.py.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add proper test data cleanup to db_session fixtures

Update data_platform and ingest test fixtures to clean up test data
explicitly instead of dropping all tables or just rolling back.

- data_platform: delete test stores, products, calendar entries
- ingest: delete test stores, products, sales, calendar entries

This ensures test isolation while preserving migrated tables.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use separate session for test cleanup to avoid transaction issues

When tests cause integrity errors, the session enters a failed state.
Use a fresh session for cleanup to avoid PendingRollbackError.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use contextlib.suppress instead of try-except-pass

Replace try-except-pass patterns with contextlib.suppress to satisfy
ruff S110 linting rule.
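As a small illustration of the pattern, the sketch below replaces a `try`/`except`/`pass` with `contextlib.suppress`. The file-removal example and the `remove_if_present` name are assumptions; the exceptions actually suppressed in the test fixtures are not shown here.

```python
import contextlib
import os


def remove_if_present(path: str) -> None:
    """Delete a file, ignoring the case where it is already gone."""
    # Equivalent to try/except FileNotFoundError/pass, but the ignored
    # exception type is stated up front and other errors still propagate.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```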

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add date range filter to SalesDaily cleanup in ingest tests
- Enforce artifact_hash presence before verification in registry routes
- Compute SHA256 from saved file instead of source in storage
- Fix override_get_db to mirror production transaction semantics
- Filter DeploymentAlias cleanup to only test runs
- Update database port to 5433 in config and .env.example
- Add language identifiers to fenced code blocks (MD040)
- Fix table formatting for markdownlint MD060
- Update PR reference in PHASE/6-MODEL_REGISTRY.md
- Convert bare URLs to markdown links in INITIAL-7.md
- Wrap __init__.py in backticks in PRP-7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* chore: release v0.2.0 (#37)

* feat(registry): implement model registry for run tracking and deployments (#36)

* fix: code improvements and documentation fixes

- Add date range filter to SalesDaily cleanup in ingest tests
- Enforce artifact_hash presence before verification in registry routes
- Compute SHA256 from saved file instead of source in storage
- Fix override_get_db to mirror production transaction semantics
- Filter DeploymentAlias cleanup to only test runs
- Update database port to 5433 in config and .env.example
- Add language identifiers to fenced code blocks (MD040)
- Fix table formatting for markdownlint MD060
- Update PR reference in PHASE/6-MODEL_REGISTRY.md
- Convert bare URLs to markdown links in INITIAL-7.md
- Wrap __init__.py in backticks in PRP-7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* chore(main): release 0.2.0 (#38)

Release-As: 0.2.0

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* chore(main): release 0.2.0 (#39)

* chore(main): release 0.2.0

* chore: trigger CI

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gabe@w7dev <gabor@w7-7.net>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
sync: align dev with main after v0.2.0 release
* docs(initial-8): expand serving layer requirements

Add specifications for job-driven orchestration, dimension discovery
endpoints, standardized API protocols (filtering/pagination), and
agent-first API design patterns for LLM tool-calling optimization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prp-8): add serving layer implementation spec

Comprehensive PRP for FastAPI serving layer including:
- Dimensions module for store/product discovery endpoints
- Analytics module for KPI/drilldown queries
- Jobs module for async-ready task orchestration
- RFC 7807 problem details for semantic error responses
- OpenAPI export optimization for LLM tool-calling

26 tasks with validation gates and 8.5/10 confidence score.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(serving-layer): implement PRP-8 agent-first API design

Add RFC 7807 Problem Details for semantic error responses:
- ProblemDetail schema with type URIs and error codes
- application/problem+json content type
- Validation exception handler with field-level errors

Add dimensions module for store/product discovery:
- GET /dimensions/stores with pagination, filtering, search
- GET /dimensions/products with pagination, filtering, search
- LLM-optimized Field descriptions for tool-calling

Add analytics module for KPI aggregations:
- GET /analytics/kpis with date range and dimension filters
- GET /analytics/drilldowns for store/product/category/region/date
- Revenue share and ranking calculations

Add jobs module for async-ready task orchestration:
- POST /jobs for train/predict/backtest operations
- Job model with JSONB params/results
- Status transitions: pending → running → completed/failed

Integration:
- New settings: analytics_max_rows, jobs_retention_days
- Register routers in main.py
- Alembic migration for jobs table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update documentation for PRP-8 serving layer

Update README.md:
- Add dimensions, analytics, jobs modules to project structure
- Document new API endpoints with examples
- Add RFC 7807 error response documentation

Update docs/ARCHITECTURE.md:
- Mark serving layer section as implemented
- Add configuration settings for new modules
- Update roadmap with Phase-2 completion

Update docs/PHASE-index.md:
- Add Phase 7 (Serving Layer) as completed
- Update phase overview table
- Add version history entry

Create docs/PHASE/7-SERVING_LAYER.md:
- Comprehensive phase documentation
- API endpoint specifications
- Database schema and migration details
- Usage examples and test coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: fix ruff formatting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…ling

- Add validate_date_range helper to analytics routes for reusable date validation
- Apply date range validation to both get_kpis and get_drilldowns endpoints
- Fix total_revenue_all calculation to use full dataset before limiting
- Add run_id to train job result for downstream predict jobs
- Fix predict job to resolve run_id to model metadata from bundle
- Update test fixtures to use 32-char hex IDs per schema requirements
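The `total_revenue_all` fix can be illustrated as follows: the denominator for revenue share is computed over the full dataset, and the row limit is applied afterwards. This is a sketch under assumed data shapes; `top_revenue_shares` and the result keys are hypothetical, not the analytics service's schema.

```python
def top_revenue_shares(
    rows: list[tuple[str, float]],
    limit: int,
) -> list[dict[str, object]]:
    """Rank (key, revenue) rows and attach each row's share of total revenue."""
    # Total over ALL rows, so shares stay correct after truncation.
    total_revenue_all = sum(revenue for _, revenue in rows)
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        {
            "key": key,
            "revenue": revenue,
            "rank": rank,
            "revenue_share": revenue / total_revenue_all if total_revenue_all else 0.0,
        }
        for rank, (key, revenue) in enumerate(ranked[:limit], start=1)
    ]
```

If the total were summed after slicing, the shares of the returned rows would always add up to 1 regardless of how much revenue was cut off.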

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
sync: merge main (v0.2.1) into dev
…IAL-9/10/11) (#47)

* docs: restructure INITIAL-9 into modular three-phase roadmap

Decompose monolithic INITIAL-9 into three specialized technical phases:

- INITIAL-9: RAG Knowledge Base ("The Memory")
  - pgvector + OpenAI embeddings
  - Markdown/OpenAPI-aware chunking
  - Semantic retrieval endpoints

- INITIAL-10: Agentic Layer ("The Brain")
  - PydanticAI agents (Experiment Orchestrator, RAG Assistant)
  - Tool orchestration with structured outputs
  - Human-in-the-loop approval workflow

- INITIAL-11: ForecastLab Dashboard ("The Face")
  - React 19 + Vite + shadcn/ui
  - TanStack Table/Query for data management
  - Recharts for time series visualization
  - Agent chat interface with streaming

Update PHASE-index.md and DAILY-FLOW.md to align with new structure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prp): add PRP-9 RAG Knowledge Base implementation plan

Comprehensive PRP for INITIAL-9 RAG Knowledge Base feature:

- pgvector + SQLAlchemy 2.0 integration patterns
- Markdown-aware and OpenAPI-aware chunking
- Async OpenAI embeddings with batch processing
- HNSW index for cosine similarity search
- 15 ordered implementation tasks
- 5-level validation loop (syntax → types → unit → integration → smoke)
- Full ORM models and Pydantic schemas
- Known gotchas and anti-patterns documented

Confidence score: 8.5/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prp): add PRP-10 Agentic Layer implementation plan

Comprehensive PRP for INITIAL-10 Agentic Layer feature:

- PydanticAI agent framework integration
- Experiment Orchestrator Agent (backtest → compare → deploy)
- RAG Assistant Agent (query → retrieve → answer with citations)
- Human-in-the-loop approval workflow for sensitive actions
- WebSocket streaming for real-time token delivery
- Session persistence with JSONB message history
- 17 ordered implementation tasks
- Tool definitions for registry, backtesting, forecasting, RAG
- Full Pydantic schemas and ORM models

Confidence score: 7.5/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive PRP for INITIAL-11 (The Face) with:
- 24 implementation tasks across 6 phases
- React 19 + Vite + shadcn/ui + TanStack Table/Query
- TypeScript types matching all backend API schemas
- Reusable DataTable with server-side pagination
- TimeSeriesChart component with Recharts
- WebSocket hook for agent chat streaming
- Complete documentation links and gotchas

Confidence score: 7.5/10 (chat depends on INITIAL-10)

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat(rag): implement PRP-9 RAG Knowledge Base with pgvector

Add RAG (Retrieval-Augmented Generation) knowledge base feature for
semantic document indexing and retrieval using PostgreSQL pgvector.

Key components:
- Document indexing with markdown-aware and OpenAPI-aware chunking
- Semantic retrieval using cosine similarity with configurable thresholds
- Idempotent re-indexing via SHA-256 content hash comparison
- OpenAI text-embedding-3-small for embeddings (1536 dimensions)
- HNSW index for fast approximate nearest neighbor search
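To make the retrieval step concrete, here is a brute-force sketch of cosine-similarity search with a threshold and a top-k cut. It is an in-memory stand-in, not the pgvector query: the HNSW index performs an approximate search inside PostgreSQL, and pgvector's cosine operator works on distance (1 minus similarity). The function names and default values are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: list[float],
    chunks: dict[str, list[float]],
    top_k: int = 3,
    similarity_threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs at or above the threshold."""
    scored = [(chunk_id, cosine_similarity(query, emb)) for chunk_id, emb in chunks.items()]
    kept = [pair for pair in scored if pair[1] >= similarity_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]
```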

API endpoints:
- POST /rag/index - Index documents with automatic chunking
- POST /rag/retrieve - Semantic search with relevance scoring
- GET /rag/sources - List indexed sources with statistics
- DELETE /rag/sources/{source_id} - Remove source and chunks

Includes:
- ORM models: DocumentSource, DocumentChunk with Vector column
- Pydantic v2 schemas with strict validation
- 68 unit tests + 14 integration tests
- Migration for pgvector extension and RAG tables
- Examples and environment configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(rag): add Ollama embedding provider with OpenAI-compatible API

- Add EmbeddingProvider abstract base class with provider pattern
- Refactor existing OpenAI code to OpenAIEmbeddingProvider
- Add OllamaEmbeddingProvider using /v1/embeddings endpoint
  - Supports configurable dimensions parameter
  - Uses OpenAI-compatible response format
- Add config settings: rag_embedding_provider, ollama_base_url, ollama_embedding_model
- Add migration for dynamic embedding dimension support
- Update tests for both providers (25 tests)

Enables local/LAN embedding generation without OpenAI API dependency.
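The provider pattern described in this commit might look roughly like the sketch below. The method names `embed_texts` and `embed_query` follow the review's sequence diagrams; everything else, including the `dimension` property and the hash-based `FakeEmbeddingProvider`, is an assumption used so the example runs without a network call.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    """Interface that concrete providers (OpenAI, Ollama, ...) would implement."""

    @property
    @abstractmethod
    def dimension(self) -> int:
        """Length of every vector this provider returns."""

    @abstractmethod
    async def embed_texts(self, texts: list[str]) -> list[list[float]]:
        """Embed a batch of texts, one vector per input."""

    async def embed_query(self, query: str) -> list[float]:
        # Default: a query is a batch of one.
        return (await self.embed_texts([query]))[0]


class FakeEmbeddingProvider(EmbeddingProvider):
    """Deterministic stand-in for tests; NOT a real embedding model."""

    def __init__(self, dimension: int = 8) -> None:
        if not 1 <= dimension <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dimension must be between 1 and 32")
        self._dimension = dimension

    @property
    def dimension(self) -> int:
        return self._dimension

    async def embed_texts(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([byte / 255.0 for byte in digest[: self._dimension]])
        return vectors
```

Calling code depends only on `EmbeddingProvider`, so switching between a hosted and a local backend becomes a configuration choice rather than a code change.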

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add Ollama embedding provider documentation

- Update .env.example with Ollama configuration options
- Add RAG Knowledge Base section to README with:
  - Embedding provider options (OpenAI/Ollama)
  - Example index and retrieve requests
  - Configuration examples for both providers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add Phase 8 RAG Knowledge Base documentation

- Create docs/PHASE/8-RAG_KNOWLEDGE_BASE.md with full phase details
- Update docs/PHASE-index.md:
  - Mark Phase 8 as Completed in overview table
  - Add Phase 8 summary to Completed Phases section
  - Add entry to Version History

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): add RAG models import to alembic env and format tests

- Add rag models import to alembic/env.py for schema validation
- Format test_embeddings.py to pass ruff format check

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

@sourcery-ai sourcery-ai Bot left a comment

Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

@coderabbitai

coderabbitai Bot commented Feb 1, 2026

Walkthrough

Adds a new RAG (Retrieval-Augmented Generation) subsystem: database models and migrations with pgvector/HNSW, chunkers for Markdown/OpenAPI, pluggable embedding providers (OpenAI/Ollama), RAG service and API routes, tests, docs, examples, and configuration + FastAPI router integration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| RAG Core & API: app/features/rag/__init__.py, app/features/rag/routes.py, app/features/rag/service.py, app/features/rag/schemas.py, app/features/rag/models.py | Adds RAG router, service, Pydantic schemas, and SQLAlchemy models (DocumentSource, DocumentChunk) with pgvector embeddings and HNSW index; implements index/retrieve/list/delete flows and related errors. |
| Chunking & Embeddings: app/features/rag/chunkers.py, app/features/rag/embeddings.py | Implements Markdown and OpenAPI chunkers (heading- and endpoint-aware) and pluggable embedding providers (OpenAI async client and Ollama HTTP provider) with batching, retries, and singleton factory. |
| DB Migrations & Alembic: alembic/env.py, alembic/versions/b4c8d9e0f123_create_rag_tables.py, alembic/versions/c5d9e1f2g345_rag_dynamic_embedding_dimension.py | Adds pgvector extension and tables for sources/chunks with HNSW index; migration to adjust embedding dimension deterministically; imports rag models into alembic env. |
| Config & Env: .env.example, app/core/config.py, pyproject.toml | Adds RAG-related settings (provider, models, dimension, chunking, retrieval, HNSW params) and example env variables; adds RAG dependencies (pgvector, openai, tiktoken, httpx, pyyaml). |
| Tests: app/features/rag/tests/*, app/features/rag/tests/conftest.py | Extensive unit & integration tests for chunkers, embeddings, schemas, service, and routes with fixtures and mocked embedding service. |
| App Integration & Examples: app/main.py, examples/rag/index_docs.py, examples/rag/query.http, README.md | Registers RAG router in app, adds indexing/query examples and HTTP snippets, and documents RAG README section (duplicated block present). |
| Design & Roadmap Docs: INITIAL-9.md, INITIAL-10.md, INITIAL-11.md, PRPs/PRP-9-..., PRPs/PRP-10-..., PRPs/PRP-11-..., docs/* | Adds comprehensive design docs for RAG Knowledge Base, Agentic Layer, and ForecastLab Dashboard plus phase/roadmap updates. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RAGRouter as RAG Router
    participant RAGService as RAG Service
    participant EmbeddingProv as Embedding Provider
    participant Database

    Client->>RAGRouter: POST /rag/index (source_type, source_path, content)
    RAGRouter->>RAGService: index_document(request)
    RAGService->>RAGService: compute_content_hash()
    RAGService->>Database: find_existing_source(source_type, source_path)
    alt content unchanged
        Database-->>RAGService: existing_source (matching hash)
        RAGService-->>RAGRouter: IndexResponse(status="unchanged")
    else new/updated
        RAGService->>RAGService: chunk_content() 
        RAGService->>EmbeddingProv: embed_texts(chunk_texts)
        EmbeddingProv-->>RAGService: embeddings[]
        RAGService->>Database: upsert_source_and_chunks(embeddings)
        Database-->>RAGService: upsert_result
        RAGService-->>RAGRouter: IndexResponse(status="indexed"/"updated")
    end
    RAGRouter-->>Client: 201 IndexResponse
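
The idempotency branch in the indexing diagram above can be sketched with an in-memory store. The status strings `unchanged`, `indexed`, and `updated` come from the diagram; the dict-based store and the function signatures are assumptions standing in for the database lookup and upsert, and chunking and embedding are omitted.

```python
import hashlib


def compute_content_hash(content: str) -> str:
    """SHA-256 hex digest of the document text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def index_document(
    store: dict[tuple[str, str], str],
    source_type: str,
    source_path: str,
    content: str,
) -> str:
    """Return 'unchanged', 'indexed', or 'updated' for one document."""
    key = (source_type, source_path)
    new_hash = compute_content_hash(content)
    old_hash = store.get(key)
    if old_hash == new_hash:
        # Same content as last time: skip chunking and embedding entirely.
        return "unchanged"
    store[key] = new_hash  # a real service would re-chunk and re-embed here
    return "indexed" if old_hash is None else "updated"
```

Skipping unchanged documents avoids paying for embeddings again when the same files are re-submitted.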
sequenceDiagram
    participant Client
    participant RAGRouter as RAG Router
    participant RAGService as RAG Service
    participant EmbeddingProv as Embedding Provider
    participant Database

    Client->>RAGRouter: POST /rag/retrieve (query, top_k, similarity_threshold)
    RAGRouter->>RAGService: retrieve(request)
    RAGService->>EmbeddingProv: embed_query(query)
    EmbeddingProv-->>RAGService: query_embedding
    RAGService->>Database: search_similar_chunks(query_embedding, top_k)
    Database-->>RAGService: chunks_with_scores
    RAGService->>RAGService: filter_by_threshold(), format_results()
    RAGService-->>RAGRouter: RetrieveResponse(results, timings)
    RAGRouter-->>Client: 200 RetrieveResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

autorelease: pending

Suggested reviewers

  • w7-learn

Poem

🐰 A nibble of code, a thump of delight,

Chunks stitched with headings, embeddings take flight,
Memory in vectors, agents soon to play,
Dashboards will sparkle and guide the day! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the three main features being added: RAG Knowledge Base, Serving Layer, and Model Registry. It is specific, concise, and accurately represents the primary changes in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.14%, which is sufficient. The required threshold is 80.00%. |


@socket-security

socket-security Bot commented Feb 1, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | openai@2.16.0 | 96 | 100 | 100 | 100 | 100 |
| Added | pgvector@0.4.2 | 100 | 100 | 100 | 100 | 100 |

View full report

w7-learn
w7-learn previously approved these changes Feb 1, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@alembic/versions/c5d9e1f2g345_rag_dynamic_embedding_dimension.py`:
- Around line 26-51: The migration's upgrade() reads RAG_EMBEDDING_DIMENSION
from the environment making the migration non-deterministic; fix by making the
applied dimension explicit and reversible: either hardcode the intended
dimension in this migration (replace the env read in upgrade() and create a new
migration for any future dimension change) and keep downgrade() symmetric, or if
you must parameterize, have upgrade() write the applied dimension into a
persistent metadata table (create or update a small table via op.create_table /
op.execute) and modify downgrade() to read that recorded value to restore the
previous vector size; update references in this file to the functions upgrade(),
downgrade(), and the op.execute/op.create_index/op.drop_index calls so the
dimension used is deterministic and recorded.

In `@app/features/rag/chunkers.py`:
- Around line 406-420: The code uses PyYAML in the YAML fallback for parsing
OpenAPI specs (yaml.safe_load in the try/except that falls back to
MarkdownChunker().chunk), but PyYAML is not declared as a dependency; add
"pyyaml" to the project's dependencies in pyproject.toml (or equivalent
dependency manifest) so importing yaml at runtime is guaranteed to succeed and
YAML parsing is supported for the code path that calls yaml.safe_load and the
MarkdownChunker fallback.

In `@app/features/rag/embeddings.py`:
- Around line 156-217: In embed_texts, capture the original token count before
you truncate each text and use that value in the logger.warning call so the
pre-truncation token count is reported; specifically, call
self.count_tokens(text) once into a variable (e.g., original_token_count) before
calling truncate_to_tokens, update token_count after truncation, and pass
original_token_count to logger.warning instead of recomputing with the
already-truncated text (refer to embed_texts, truncate_to_tokens, count_tokens,
and the logger.warning call).
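
A sketch of the fix described above: count tokens once before truncating and log that value. Whitespace splitting stands in for the real tokenizer here, and `prepare_text` is a hypothetical wrapper; only `count_tokens`, `truncate_to_tokens`, and the `logger.warning` call are named in the review comment.

```python
import logging

logger = logging.getLogger(__name__)


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace-separated words, not real model tokens.
    return len(text.split())


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    return " ".join(text.split()[:max_tokens])


def prepare_text(text: str, max_tokens: int) -> tuple[str, int]:
    """Return (possibly truncated text, its token count)."""
    original_token_count = count_tokens(text)  # captured BEFORE truncation
    if original_token_count <= max_tokens:
        return text, original_token_count
    truncated = truncate_to_tokens(text, max_tokens)
    token_count = count_tokens(truncated)
    logger.warning(
        "Truncated text from %d to %d tokens", original_token_count, token_count
    )
    return truncated, token_count
```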

In `@app/features/rag/service.py`:
- Around line 84-99: The _read_content_from_path method currently reads any
user-provided source_path; add path traversal protection by resolving the
incoming path and ensuring it is contained within an allowed base directory
(e.g., a class-level or injected Path like base_dir) using
Path(source_path).resolve() and then calling .relative_to(base_dir.resolve()).
If relative_to raises ValueError (the resolved path is outside base_dir), raise
FileNotFoundError or a dedicated security exception (Python has no built-in
SecurityError); otherwise proceed to
read_text(encoding="utf-8"). Update references to Path and
_read_content_from_path accordingly and mirror the containment approach used in
app/features/registry/storage.py and app/features/forecasting/persistence.py.
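
The containment check could look roughly like the sketch below. It is not the project's implementation: `read_content_from_path` is written here as a free function, and joining the incoming path onto `base_dir` before resolving is an assumption of this sketch (the comment above resolves `source_path` directly).

```python
from pathlib import Path


def read_content_from_path(source_path: str, base_dir: Path) -> str:
    """Read a UTF-8 file only if it resolves to a location inside base_dir."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / source_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        # Report a missing file rather than revealing what exists outside base.
        raise FileNotFoundError(
            f"Path outside allowed directory: {source_path}"
        ) from None
    return candidate.read_text(encoding="utf-8")
```

An absolute `source_path` replaces `base` in the join, so it is rejected by the same `relative_to` check unless it already points inside the base directory.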

In `@docs/DAILY-FLOW.md`:
- Around line 165-175: The fenced code block showing the phase list (lines
containing "Phase 8: RAG Knowledge Base", "Phase 9: Agentic Layer", "Phase 10:
ForecastLab Dashboard") lacks a language tag and triggers MD040; update the
opening fence from ``` to ```text (i.e., add the text language tag) so the block
becomes a ```text fenced block to satisfy markdownlint and improve readability
tooling.

In `@examples/rag/index_docs.py`:
- Around line 68-86: The logic treating any non-201 response from the /rag/index
call as a failure causes idempotent 200 responses to be counted as failures;
update the response handling in the block that checks response.status_code so
that it treats 200 the same as 201 (parse response.json(), read result["status"]
and update unchanged/indexed/total_chunks/total_tokens accordingly) and only
increment failed and print the FAILED message for other status codes; look for
the response.status_code checks and the variables
unchanged/indexed/failed/total_chunks/total_tokens to locate where to adjust the
conditional.
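
One way to structure the tallying so that 200 and 201 share a branch is sketched below. `IndexStats` and `record_response` are hypothetical helpers, and the `"unchanged"` status value and the `chunks_created` / `tokens_processed` keys are assumptions about the response payload.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class IndexStats:
    indexed: int = 0
    unchanged: int = 0
    failed: int = 0
    total_chunks: int = 0
    total_tokens: int = 0


def record_response(
    stats: IndexStats, status_code: int, result: dict[str, Any] | None
) -> None:
    """Update counters for one /rag/index response."""
    if status_code in (200, 201) and result is not None:
        # 200 (idempotent re-index) and 201 (new index) are both successes.
        if result.get("status") == "unchanged":
            stats.unchanged += 1
        else:
            stats.indexed += 1
            stats.total_chunks += result.get("chunks_created", 0)
            stats.total_tokens += result.get("tokens_processed", 0)
    else:
        stats.failed += 1
```

Branching on the payload's status field rather than on the HTTP code keeps the counters correct even if the endpoint later returns 200 for a re-indexed document.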

In `@INITIAL-10.md`:
- Around line 55-112: Several fenced code blocks in the provided Markdown
snippet are missing language identifiers (MD040); update each ``` block to
include the appropriate language specifier (e.g., ```json for the JSON payload
and response, ```python for any Python examples, or ```text for plain text) so
the code fences are tagged consistently; search for the untagged triple-backtick
blocks shown in the diff (the request/response JSON blocks and the other blocks
called out in the comment) and add the correct language token to each opening
fence.
- Around line 17-23: Fix the Markdown table pipe spacing in INITIAL-10.md by
adding spaces around the pipe delimiters for both table blocks (e.g., the header
row, separator row, and each content row shown in the diff) so they conform to
MD060; update the first table (the one with "Component | Technology | Purpose")
and the other table referenced at lines 390-397 to use " | " spacing between
columns consistently.

In `@PRPs/PRP-10-agentic-layer.md`:
- Around line 36-43: The markdown table fails MD060 due to inconsistent pipe
spacing; fix each table row (header and data rows) so there is a single space
after and before every pipe delimiter (e.g., for rows containing `POST
/agents/experiment/run`, `POST /agents/experiment/approve`, `POST
/agents/rag/query`, `GET /agents/status/{session_id}`, `WS /agents/stream`)
ensuring uniform " | " spacing around cells and a matching header separator row
so the linter passes.
- Around line 107-163: Two fenced code blocks listing the project tree are
missing language tags; update both triple-backtick fences that enclose the
"app/…" tree and the "app/features/agents/…" tree to include a language (e.g.,
```text) so they comply with MD040 and improve highlighting; locate the two
blocks in PRP-10-agentic-layer.md that start with "app/" and
"app/features/agents/" and change their opening fences from ``` to ```text
(leave the content and closing fences unchanged).
🧹 Nitpick comments (15)
app/features/rag/chunkers.py (2)

302-337: Sentence splitting regex may have edge cases.

The regex (?<=[.!?])\s+ splits wherever sentence-ending punctuation is followed by whitespace, so it will incorrectly split after abbreviations such as "Dr. Smith" or "U.S.A. exports". Decimal numbers like "3.14 meters" are not affected, because no whitespace follows the period. This is acceptable for RAG chunking (minor semantic boundary issues), but consider documenting the limitation.
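
The behaviour of this pattern is easy to check in isolation. The snippet below is a small demonstration, not code from chunkers.py; `split_sentences` is a hypothetical wrapper around the same regex.

```python
import re

# Split on whitespace that follows ".", "!" or "?" (lookbehind keeps the punctuation).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    return SENTENCE_BOUNDARY.split(text)


print(split_sentences("Dr. Smith paid 3.14 dollars. Done!"))
# → ['Dr.', 'Smith paid 3.14 dollars.', 'Done!']
```

The abbreviation "Dr." is split off as its own fragment, while "3.14" stays intact because the period is followed by a digit rather than whitespace.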


597-601: Schema truncation may produce invalid JSON.

Truncating JSON to 500 characters with [:500] can cut mid-structure, producing invalid JSON in the chunk content. This is acceptable for RAG context (humans/LLMs can understand partial schemas), but consider truncating at a higher level or adding "..." to indicate truncation.

💡 Optional improvement
         try:
-            return json.dumps(schema, indent=2)[:500]  # Limit size
+            formatted = json.dumps(schema, indent=2)
+            if len(formatted) > 500:
+                return formatted[:497] + "..."
+            return formatted
         except (TypeError, ValueError):
-            return str(schema)[:500]
+            schema_str = str(schema)
+            if len(schema_str) > 500:
+                return schema_str[:497] + "..."
+            return schema_str
docs/PHASE/8-RAG_KNOWLEDGE_BASE.md (2)

268-273: Add language specifier to fenced code block.

The code block lacks a language specifier. Adding one improves syntax highlighting and satisfies markdownlint.

📝 Suggested fix
-```
+```text
 Ruff:    All checks passed
 MyPy:    0 errors (117 source files)
 Pyright: 0 errors
 Pytest:  82 unit tests passed + 14 integration tests

279-307: Add language specifier for directory structure code block.

📝 Suggested fix
-```
+```text
 app/
 ├── core/
 ...
app/features/rag/tests/test_service.py (1)

210-224: Consider adding test for list_sources with existing sources.

The test only covers the empty sources scenario. Adding a test with mocked sources would improve coverage of the response mapping logic.

📝 Additional test suggestion
`@pytest.mark.asyncio`
async def test_list_sources_with_data(self):
    """Test listing sources when sources exist."""
    service = RAGService()

    mock_source = MagicMock()
    mock_source.source_id = "src123"
    mock_source.source_type = "markdown"
    mock_source.source_path = "docs/test.md"
    mock_source.content_hash = "abc123"
    mock_source.indexed_at = datetime.now()
    mock_source.metadata_ = {}

    mock_db = AsyncMock()
    mock_result = MagicMock()
    mock_result.all.return_value = [(mock_source, 5)]  # source, chunk_count
    mock_db.execute = AsyncMock(return_value=mock_result)

    response = await service.list_sources(db=mock_db)

    assert response.total_sources == 1
    assert response.total_chunks == 5
    assert response.sources[0].source_id == "src123"
app/features/rag/tests/test_schemas.py (1)

211-243: Consider adding a test for negative relevance_score.

The test validates the upper bound (1.5 raises ValidationError) but doesn't test the lower bound. Based on the schema in app/features/rag/schemas.py (lines 89-113), relevance_score has ge=0.0, le=1.0 constraints.

💡 Suggested addition
# Add after line 243
def test_relevance_score_negative_rejected(self):
    """Test negative relevance score is rejected."""
    with pytest.raises(ValidationError):
        ChunkResult(
            chunk_id="c1",
            source_id="s1",
            source_path="test.md",
            source_type="markdown",
            content="test",
            relevance_score=-0.1,
        )
PRPs/PRP-9-rag-knowledge-base.md (2)

34-39: Minor markdown table formatting.

Static analysis flagged table column spacing inconsistencies (MD060). This is purely cosmetic and doesn't affect readability.

📝 Optional table formatting fix
-| Method | Path | Description |
-|--------|------|-------------|
-| `POST` | `/rag/index` | Index document (markdown/openapi) |
+| Method | Path                       | Description                       |
+| ------ | -------------------------- | --------------------------------- |
+| `POST` | `/rag/index`               | Index document (markdown/openapi) |

104-125: Add language specifier to fenced code blocks.

Static analysis flagged missing language specifiers (MD040). Adding a language hint improves syntax highlighting and accessibility.

📝 Suggested fix
-```
+```text
 app/
 ├── core/
app/features/rag/routes.py (1)

238-271: Consider adding error handling for list_sources.

Unlike other endpoints, list_sources doesn't wrap database errors. While less likely to fail, consistency would improve robustness.

♻️ Optional: Add SQLAlchemyError handling
 async def list_sources(
     db: AsyncSession = Depends(get_db),
 ) -> SourceListResponse:
     """List all indexed sources."""
     service = RAGService()
-    response = await service.list_sources(db=db)
+    try:
+        response = await service.list_sources(db=db)
+    except SQLAlchemyError as e:
+        logger.error(
+            "rag.list_sources_failed",
+            error=str(e),
+            error_type=type(e).__name__,
+            exc_info=True,
+        )
+        raise DatabaseError(
+            message="Failed to list sources",
+            details={"error": str(e)},
+        ) from e

     logger.info(
         "rag.list_sources_completed",
app/features/rag/service.py (1)

168-173: Redundant assignment after logging warning.

If chunks is empty, the warning is logged and then chunks = [] is assigned. The assignment is redundant because chunks is already an empty list returned by the chunker.

♻️ Minor cleanup
         if not chunks:
             logger.warning(
                 "rag.index_document_no_chunks",
                 source_path=request.source_path,
             )
-            chunks = []
app/features/rag/models.py (1)

31-32: Empty TYPE_CHECKING block.

The TYPE_CHECKING guard is imported but the block is empty. This can be removed unless forward references are planned.

♻️ Remove unused import
-from typing import TYPE_CHECKING, Any
+from typing import Any

-if TYPE_CHECKING:
-    pass
app/features/rag/tests/conftest.py (2)

241-251: Fixture source_id length is inconsistent with model constraint.

sample_document_source has source_id="test123456789012345678901234", which is 28 characters, but the model defines String(32) and expects a UUID hex (32 chars). While this may work for testing, it's inconsistent.

💡 Fix source_id length
 `@pytest.fixture`
 def sample_document_source() -> DocumentSource:
     """Sample DocumentSource ORM object for testing."""
     return DocumentSource(
-        source_id="test123456789012345678901234",
+        source_id="test1234567890123456789012345678",  # 32 chars
         source_type="markdown",

254-265: Fixture chunk_id length is also inconsistent.

sample_document_chunk has chunk_id="chunk12345678901234567890123", which is 28 characters instead of 32.

💡 Fix chunk_id length
 `@pytest.fixture`
 def sample_document_chunk() -> DocumentChunk:
     """Sample DocumentChunk ORM object for testing."""
     return DocumentChunk(
-        chunk_id="chunk12345678901234567890123",
+        chunk_id="chunk123456789012345678901234567",  # 32 chars
         source_id=1,
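
Hand-counted fixture IDs are easy to get wrong, so one option is to generate or validate them in code. The helpers below are hypothetical and only illustrate the idea; they assume the model's IDs follow the `uuid4().hex` format mentioned above.

```python
import uuid


def is_uuid_hex(value: str) -> bool:
    """True when value is exactly 32 lowercase hexadecimal characters."""
    return len(value) == 32 and all(c in "0123456789abcdef" for c in value)


def make_test_id() -> str:
    """Generate an ID in the same format as a UUID hex primary key."""
    return uuid.uuid4().hex


# Lengths are computed, not counted by hand.
print(len(make_test_id()), is_uuid_hex(make_test_id()))
print(len("test123456789012345678901234"), is_uuid_hex("test123456789012345678901234"))
```

Using `make_test_id()` in fixtures, or asserting `is_uuid_hex(...)` on literal IDs, would keep test data aligned with the String(32) column.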
app/features/rag/schemas.py (2)

46-65: Consider non-negative constraints for response metrics.

This is a small validation hardening to prevent negative counts or durations from leaking.

💡 Suggested validation additions
-    chunks_created: int
-    tokens_processed: int
-    duration_ms: float
+    chunks_created: int = Field(..., ge=0)
+    tokens_processed: int = Field(..., ge=0)
+    duration_ms: float = Field(..., ge=0)

116-129: Add non-negative bounds for timing/counters (optional).

Helps catch regressions if negative metrics ever appear.

💡 Suggested validation additions
-    query_embedding_time_ms: float
-    search_time_ms: float
-    total_chunks_searched: int
+    query_embedding_time_ms: float = Field(..., ge=0)
+    search_time_ms: float = Field(..., ge=0)
+    total_chunks_searched: int = Field(..., ge=0)

Comment thread alembic/versions/c5d9e1f2g345_rag_dynamic_embedding_dimension.py
Comment thread app/features/rag/chunkers.py
Comment thread app/features/rag/embeddings.py
Comment thread app/features/rag/service.py Outdated
Comment thread docs/DAILY-FLOW.md
Comment thread examples/rag/index_docs.py Outdated
Comment thread INITIAL-10.md
Comment thread INITIAL-10.md
Comment on lines +36 to +43
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/agents/experiment/run` | Execute experiment workflow |
| `POST` | `/agents/experiment/approve` | Approve pending action |
| `POST` | `/agents/rag/query` | Query with answer generation |
| `GET` | `/agents/status/{session_id}` | Check session status |
| `WS` | `/agents/stream` | WebSocket for streaming |

⚠️ Potential issue | 🟡 Minor

Fix table pipe spacing to satisfy MD060.

Keeps markdownlint clean and consistent table styling.

✅ Suggested fix
-| Method | Path | Description |
-|--------|------|-------------|
-| `POST` | `/agents/experiment/run` | Execute experiment workflow |
-| `POST` | `/agents/experiment/approve` | Approve pending action |
-| `POST` | `/agents/rag/query` | Query with answer generation |
-| `GET` | `/agents/status/{session_id}` | Check session status |
-| `WS` | `/agents/stream` | WebSocket for streaming |
+| Method | Path | Description |
+| ------ | ---- | ----------- |
+| `POST` | `/agents/experiment/run` | Execute experiment workflow |
+| `POST` | `/agents/experiment/approve` | Approve pending action |
+| `POST` | `/agents/rag/query` | Query with answer generation |
+| `GET` | `/agents/status/{session_id}` | Check session status |
+| `WS` | `/agents/stream` | WebSocket for streaming |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 37-37: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 37-37: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)



Comment thread PRPs/PRP-10-agentic-layer.md
- Make migration deterministic by hardcoding dimension values instead
  of reading from environment (alembic migration)
- Add pyyaml dependency for YAML parsing in OpenAPI chunker
- Fix token count logging to capture original count before truncation
- Add path traversal protection to RAG service _read_content_from_path
  (mirrors registry/storage.py pattern)
- Fix markdown linting issues:
  - Add language tags to fenced code blocks (MD040)
  - Fix table pipe spacing (MD060)
- Fix index_docs.py to treat 200 same as 201 for idempotent responses
- Add test for path traversal protection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Collaborator

@w7-learn w7-learn left a comment

The fixes have been pushed to the dev branch, and PR #50 now includes these changes.

Summary of Fixes Applied

| File | Fix |
| ---- | --- |
| alembic/versions/c5d9e1f2g345_...py | Made migration deterministic by hardcoding PREVIOUS_DIMENSION and TARGET_DIMENSION instead of reading from environment |
| pyproject.toml | Added pyyaml>=6.0.0 dependency for YAML parsing in OpenAPI chunker |
| app/features/rag/embeddings.py | Fixed token count logging to capture original count before truncation |
| app/features/rag/service.py | Added path traversal protection with injectable base_dir parameter (mirrors registry/storage.py pattern) |
| app/features/rag/tests/test_service.py | Updated test to use base_dir parameter; added path traversal test |
| docs/DAILY-FLOW.md | Added text language tag to fenced code block (MD040) |
| examples/rag/index_docs.py | Fixed response handling to treat 200 same as 201 for idempotent responses |
| INITIAL-10.md | Fixed code block language tags and table pipe spacing (MD040, MD060) |
| PRPs/PRP-10-agentic-layer.md | Fixed table pipe spacing and added language tags to code blocks |

All tests pass (83 unit tests) and type checking is clean (mypy + pyright).

@w7-mgfcode w7-mgfcode merged commit ad4388d into main Feb 1, 2026
13 of 14 checks passed

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@examples/rag/index_docs.py`:
- Around line 126-130: index_readme currently treats anything other than 201 as
failure; make it consistent with index_markdown_docs by accepting both 200 and
201 as successful responses. In the index_readme function, change the
response.status_code check to allow 200 or 201, parse response.json() for
success, and update the success print to use result['chunks_created'] and
result['status'] (or an appropriate field) similarly to index_markdown_docs so
previously-indexed/unchanged README returns a success message instead of "Failed
to index".
🧹 Nitpick comments (3)
PRPs/PRP-10-agentic-layer.md (1)

500-508: Update to latest stable dependency versions and consider exact pinning for production stability.

The doc's minimum versions (pydantic-ai≥0.1.0, anthropic≥0.40.0, websockets≥13.0) are significantly outdated. Latest stable releases are:

  • pydantic-ai: 1.47.0 (Jan 24, 2026)
  • anthropic: 0.76.0 (Jan 13, 2026)
  • websockets: 16.0 (Jan 10, 2026)

Given the rapid iteration of these libraries, pinning exact versions or using narrower ranges (e.g., ~=1.47.0) is recommended to ensure tested compatibility rather than relying on loose >= ranges that may pull in breaking changes.

app/features/rag/service.py (1)

553-612: Consider the trade-off in similarity search result limiting.

The query fetches top_k * 2 results then filters by threshold, which is a reasonable optimization. However, if many results fall below the threshold, the final result set could be smaller than top_k. This is acceptable behavior but worth documenting in the method docstring for clarity.

📝 Optional: Document the behavior
     async def _search_similar_chunks(
         self,
         db: AsyncSession,
         query_embedding: list[float],
         top_k: int,
         threshold: float,
         filters: dict[str, Any] | None,
     ) -> list[ChunkResult]:
         """Search for similar chunks using cosine distance.

         Args:
             db: Database session.
             query_embedding: Query embedding vector.
             top_k: Maximum results to return.
             threshold: Minimum similarity threshold.
             filters: Optional metadata filters.

         Returns:
-            List of chunk results with relevance scores.
+            List of chunk results with relevance scores. May return fewer than
+            top_k results if insufficient chunks meet the similarity threshold.
         """
app/features/rag/embeddings.py (1)

495-536: Consider async cleanup for the Ollama provider singleton.

The OllamaEmbeddingProvider has an async close() method, but reset_embedding_service() doesn't call it. If the singleton is reset while using Ollama, the HTTP client won't be properly closed.

♻️ Proposed fix to handle async cleanup
-def reset_embedding_service() -> None:
+async def reset_embedding_service() -> None:
     """Reset the singleton embedding provider.

     Useful for testing or reconfiguration.
     """
     global _embedding_provider
+    if _embedding_provider is not None:
+        if isinstance(_embedding_provider, OllamaEmbeddingProvider):
+            await _embedding_provider.close()
     _embedding_provider = None

Alternatively, keep the sync version and add a separate async cleanup function if the sync reset is needed for simpler test fixtures:

def reset_embedding_service() -> None:
    """Reset the singleton embedding provider (sync).

    Note: Does not close async resources. Use reset_embedding_service_async()
    for proper cleanup of OllamaEmbeddingProvider.
    """
    global _embedding_provider
    _embedding_provider = None


async def reset_embedding_service_async() -> None:
    """Reset the singleton embedding provider with async cleanup."""
    global _embedding_provider
    if _embedding_provider is not None:
        if isinstance(_embedding_provider, OllamaEmbeddingProvider):
            await _embedding_provider.close()
    _embedding_provider = None

Comment on lines +126 to +130
        if response.status_code == 201:
            result = response.json()
            print(f"README.md indexed: {result['chunks_created']} chunks ({result['status']})")
        else:
            print(f"Failed to index README.md: {response.status_code}")

⚠️ Potential issue | 🟡 Minor

Inconsistent response handling compared to index_markdown_docs.

The index_readme function only checks for status code 201, while index_markdown_docs (line 68) correctly handles both 200 and 201. If the README was previously indexed (unchanged content), the endpoint returns 200, causing this function to incorrectly report failure.

🐛 Proposed fix for consistent response handling
-        if response.status_code == 201:
+        if response.status_code in (200, 201):
             result = response.json()
             print(f"README.md indexed: {result['chunks_created']} chunks ({result['status']})")
         else:
             print(f"Failed to index README.md: {response.status_code}")

w7-mgfcode added a commit that referenced this pull request Feb 1, 2026
* feat: RAG Knowledge Base, Serving Layer, and Model Registry (#50)

* feat(registry): implement model registry for run tracking and deployments (#36)

* docs: expand INITIAL-7 with lifecycle, lineage, and artifact integrity details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(registry): implement model registry for run tracking and deployments

Add model registry feature (PRP-7) with:

- ORM models: ModelRun with JSONB columns (model_config, metrics, runtime_info),
  DeploymentAlias for mutable deployment pointers
- Storage: LocalFSProvider with SHA-256 integrity verification and path
  traversal prevention, abstract interface for future S3/GCS support
- Service: RegistryService with state machine validation, duplicate
  detection, config hashing, and run comparison
- API endpoints: CRUD for runs and aliases, artifact verification,
  run comparison with config/metrics diffs
- Database: Alembic migration with GIN indexes for JSONB containment queries
- Tests: 103 unit tests (schemas, storage, service) + 24 integration tests
- Example: registry_demo.py demonstrating full workflow

Run lifecycle: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED
Aliases can only point to SUCCESS runs for deployment safety.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update documentation for model registry implementation

- README.md: Add registry to project structure, API endpoints section,
  and example reference
- docs/ARCHITECTURE.md: Update section 7.6 with full implementation
  details, add registry endpoints to section 8, mark Phase 1 complete
- docs/PHASE-index.md: Mark phases 4-6 as completed, add detailed
  completion entries for Forecasting, Backtesting, and Registry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add PHASE documentation for forecasting, backtesting, and registry

Create missing phase documentation files to complete the project's
implementation records:

- 4-FORECASTING.md: Model zoo with BaseForecaster interface, train/predict
  endpoints, and joblib persistence
- 5-BACKTESTING.md: Time-series CV with expanding/sliding strategies,
  metrics calculation, and baseline comparisons
- 6-MODEL_REGISTRY.md: Run tracking with state machine, deployment aliases,
  and SHA-256 artifact integrity verification

Update PHASE-index.md to link to the new documentation files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(registry): resolve type checking issues with Pydantic model_config alias

- Add pydantic.mypy plugin to pyproject.toml for proper Pydantic type checking
- Use model_config_data instead of model_config alias in tests to avoid collision
  with Pydantic's reserved model_config attribute
- Update _model_to_response to use model_validate() for proper alias handling
- Change docker-compose postgres port to 5433 to avoid conflicts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve CI failures for registry PR

- Import registry models in alembic/env.py for schema validation
- Fix import order and remove extraneous f-strings in registry_demo.py
- Add type: ignore comments for frozen model tests with pydantic.mypy plugin

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: prevent db_session fixtures from dropping all tables

The data_platform and root conftest.py db_session fixtures were dropping
all tables after each test, causing subsequent integration tests to fail
when they couldn't find migrated tables.

Changes:
- Remove Base.metadata.drop_all from db_session fixtures
- Tests now rely on migrations for table creation
- Each test just rolls back its own changes

Also fixes ruff format issue in examples/registry_demo.py.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add proper test data cleanup to db_session fixtures

Update data_platform and ingest test fixtures to clean up test data
explicitly instead of dropping all tables or just rolling back.

- data_platform: delete test stores, products, calendar entries
- ingest: delete test stores, products, sales, calendar entries

This ensures test isolation while preserving migrated tables.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use separate session for test cleanup to avoid transaction issues

When tests cause integrity errors, the session enters a failed state.
Use a fresh session for cleanup to avoid PendingRollbackError.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use contextlib.suppress instead of try-except-pass

Replace try-except-pass patterns with contextlib.suppress to satisfy
ruff S110 linting rule.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix: code improvements and documentation fixes

- Add date range filter to SalesDaily cleanup in ingest tests
- Enforce artifact_hash presence before verification in registry routes
- Compute SHA256 from saved file instead of source in storage
- Fix override_get_db to mirror production transaction semantics
- Filter DeploymentAlias cleanup to only test runs
- Update database port to 5433 in config and .env.example
- Add language identifiers to fenced code blocks (MD040)
- Fix table formatting for markdownlint MD060
- Update PR reference in PHASE/6-MODEL_REGISTRY.md
- Convert bare URLs to markdown links in INITIAL-7.md
- Wrap __init__.py in backticks in PRP-7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* sync: update dev from phase-6 (#40)

* chore: release v0.2.0 (#37)

* feat(registry): implement model registry for run tracking and deployments (#36)

* docs: expand INITIAL-7 with lifecycle, lineage, and artifact integrity details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(registry): implement model registry for run tracking and deployments

Add model registry feature (PRP-7) with:

- ORM models: ModelRun with JSONB columns (model_config, metrics, runtime_info),
  DeploymentAlias for mutable deployment pointers
- Storage: LocalFSProvider with SHA-256 integrity verification and path
  traversal prevention, abstract interface for future S3/GCS support
- Service: RegistryService with state machine validation, duplicate
  detection, config hashing, and run comparison
- API endpoints: CRUD for runs and aliases, artifact verification,
  run comparison with config/metrics diffs
- Database: Alembic migration with GIN indexes for JSONB containment queries
- Tests: 103 unit tests (schemas, storage, service) + 24 integration tests
- Example: registry_demo.py demonstrating full workflow

Run lifecycle: PENDING → RUNNING → SUCCESS/FAILED → ARCHIVED
Aliases can only point to SUCCESS runs for deployment safety.
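
The lifecycle and alias rule above can be sketched as a small transition table. This is an assumption-laden illustration: the enum values, the `ALLOWED` table, and the helper names are hypothetical, and the real `RegistryService` may permit additional transitions.

```python
from enum import Enum


class RunStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    ARCHIVED = "archived"


# Assumed table, read directly off the lifecycle line in the commit message.
ALLOWED: dict[RunStatus, set[RunStatus]] = {
    RunStatus.PENDING: {RunStatus.RUNNING},
    RunStatus.RUNNING: {RunStatus.SUCCESS, RunStatus.FAILED},
    RunStatus.SUCCESS: {RunStatus.ARCHIVED},
    RunStatus.FAILED: {RunStatus.ARCHIVED},
    RunStatus.ARCHIVED: set(),
}


def can_transition(current: RunStatus, target: RunStatus) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in ALLOWED[current]


def can_alias(status: RunStatus) -> bool:
    """Deployment aliases may only point at successful runs."""
    return status is RunStatus.SUCCESS
```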

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update documentation for model registry implementation

- README.md: Add registry to project structure, API endpoints section,
  and example reference
- docs/ARCHITECTURE.md: Update section 7.6 with full implementation
  details, add registry endpoints to section 8, mark Phase 1 complete
- docs/PHASE-index.md: Mark phases 4-6 as completed, add detailed
  completion entries for Forecasting, Backtesting, and Registry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add PHASE documentation for forecasting, backtesting, and registry

Create missing phase documentation files to complete the project's
implementation records:

- 4-FORECASTING.md: Model zoo with BaseForecaster interface, train/predict
  endpoints, and joblib persistence
- 5-BACKTESTING.md: Time-series CV with expanding/sliding strategies,
  metrics calculation, and baseline comparisons
- 6-MODEL_REGISTRY.md: Run tracking with state machine, deployment aliases,
  and SHA-256 artifact integrity verification

Update PHASE-index.md to link to the new documentation files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(registry): resolve type checking issues with Pydantic model_config alias

- Add pydantic.mypy plugin to pyproject.toml for proper Pydantic type checking
- Use model_config_data instead of model_config alias in tests to avoid collision
  with Pydantic's reserved model_config attribute
- Update _model_to_response to use model_validate() for proper alias handling
- Change docker-compose postgres port to 5433 to avoid conflicts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve CI failures for registry PR

- Import registry models in alembic/env.py for schema validation
- Fix import order and remove extraneous f-strings in registry_demo.py
- Add type: ignore comments for frozen model tests with pydantic.mypy plugin

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: prevent db_session fixtures from dropping all tables

The data_platform and root conftest.py db_session fixtures were dropping
all tables after each test, causing subsequent integration tests to fail
when they couldn't find migrated tables.

Changes:
- Remove Base.metadata.drop_all from db_session fixtures
- Tests now rely on migrations for table creation
- Each test just rolls back its own changes

Also fixes ruff format issue in examples/registry_demo.py.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add proper test data cleanup to db_session fixtures

Update data_platform and ingest test fixtures to clean up test data
explicitly instead of dropping all tables or just rolling back.

- data_platform: delete test stores, products, calendar entries
- ingest: delete test stores, products, sales, calendar entries

This ensures test isolation while preserving migrated tables.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use separate session for test cleanup to avoid transaction issues

When tests cause integrity errors, the session enters a failed state.
Use a fresh session for cleanup to avoid PendingRollbackError.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use contextlib.suppress instead of try-except-pass

Replace try-except-pass patterns with contextlib.suppress to satisfy
ruff S110 linting rule.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix: code improvements and documentation fixes

- Add date range filter to SalesDaily cleanup in ingest tests
- Enforce artifact_hash presence before verification in registry routes
- Compute SHA256 from saved file instead of source in storage
- Fix override_get_db to mirror production transaction semantics
- Filter DeploymentAlias cleanup to only test runs
- Update database port to 5433 in config and .env.example
- Add language identifiers to fenced code blocks (MD040)
- Fix table formatting for markdownlint MD060
- Update PR reference in PHASE/6-MODEL_REGISTRY.md
- Convert bare URLs to markdown links in INITIAL-7.md
- Wrap __init__.py in backticks in PRP-7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* chore(main): release 0.2.0 (#38)

Release-As: 0.2.0

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* chore(main): release 0.2.0 (#39)

* chore(main): release 0.2.0

* chore: trigger CI

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gabe@w7dev <gabor@w7-7.net>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* feat(serving-layer): implement PRP-8 agent-first API design (#42)

* docs(initial-8): expand serving layer requirements

Add specifications for job-driven orchestration, dimension discovery
endpoints, standardized API protocols (filtering/pagination), and
agent-first API design patterns for LLM tool-calling optimization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prp-8): add serving layer implementation spec

Comprehensive PRP for FastAPI serving layer including:
- Dimensions module for store/product discovery endpoints
- Analytics module for KPI/drilldown queries
- Jobs module for async-ready task orchestration
- RFC 7807 problem details for semantic error responses
- OpenAPI export optimization for LLM tool-calling

26 tasks with validation gates and 8.5/10 confidence score.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(serving-layer): implement PRP-8 agent-first API design

Add RFC 7807 Problem Details for semantic error responses:
- ProblemDetail schema with type URIs and error codes
- application/problem+json content type
- Validation exception handler with field-level errors
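
As a rough sketch of what such a response could look like: the `type`, `title`, `status`, `detail`, and `instance` members come from RFC 7807, while the `errors` extension member, the example type URI, and the `to_response` helper are assumptions, not the project's actual `ProblemDetail` schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProblemDetail:
    type: str
    title: str
    status: int
    detail: str
    instance: Optional[str] = None
    # Hypothetical extension member carrying field-level validation errors.
    errors: list[dict[str, str]] = field(default_factory=list)


def to_response(problem: ProblemDetail) -> tuple[int, dict[str, str], str]:
    """Build (status, headers, body); unset optional members are omitted."""
    payload = {
        key: value
        for key, value in asdict(problem).items()
        if value is not None and value != []
    }
    headers = {"Content-Type": "application/problem+json"}
    return problem.status, headers, json.dumps(payload)


problem = ProblemDetail(
    type="https://example.com/problems/validation-error",  # placeholder URI
    title="Validation Error",
    status=422,
    detail="Request body failed validation.",
    errors=[{"field": "start_date", "message": "must be an ISO date"}],
)
status, headers, body = to_response(problem)
```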

Add dimensions module for store/product discovery:
- GET /dimensions/stores with pagination, filtering, search
- GET /dimensions/products with pagination, filtering, search
- LLM-optimized Field descriptions for tool-calling

Add analytics module for KPI aggregations:
- GET /analytics/kpis with date range and dimension filters
- GET /analytics/drilldowns for store/product/category/region/date
- Revenue share and ranking calculations

Add jobs module for async-ready task orchestration:
- POST /jobs for train/predict/backtest operations
- Job model with JSONB params/results
- Status transitions: pending → running → completed/failed

Integration:
- New settings: analytics_max_rows, jobs_retention_days
- Register routers in main.py
- Alembic migration for jobs table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update documentation for PRP-8 serving layer

Update README.md:
- Add dimensions, analytics, jobs modules to project structure
- Document new API endpoints with examples
- Add RFC 7807 error response documentation

Update docs/ARCHITECTURE.md:
- Mark serving layer section as implemented
- Add configuration settings for new modules
- Update roadmap with Phase-2 completion

Update docs/PHASE-index.md:
- Add Phase 7 (Serving Layer) as completed
- Update phase overview table
- Add version history entry

Create docs/PHASE/7-SERVING_LAYER.md:
- Comprehensive phase documentation
- API endpoint specifications
- Database schema and migration details
- Usage examples and test coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: fix ruff formatting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(serving-layer): improve analytics validation and jobs run_id handling

- Add validate_date_range helper to analytics routes for reusable date validation
- Apply date range validation to both get_kpis and get_drilldowns endpoints
- Fix total_revenue_all calculation to use full dataset before limiting
- Add run_id to train job result for downstream predict jobs
- Fix predict job to resolve run_id to model metadata from bundle
- Update test fixtures to use 32-char hex IDs per schema requirements
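
The `total_revenue_all` fix is easiest to see in a small sketch. The row shape and the `drilldown` function below are hypothetical; the point is only that the total is taken over every row before the limit is applied, so shares are relative to the full dataset rather than to the truncated page.

```python
def drilldown(rows: list[tuple[str, float]], limit: int) -> list[dict]:
    """Rank rows by revenue and attach each row's share of the full total."""
    # Sum over ALL rows first; summing after slicing would inflate shares.
    total_revenue_all = sum(revenue for _, revenue in rows)
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)

    result = []
    for rank, (key, revenue) in enumerate(ranked[:limit], start=1):
        share = revenue / total_revenue_all if total_revenue_all else 0.0
        result.append(
            {"key": key, "revenue": revenue, "rank": rank, "revenue_share": share}
        )
    return result
```

With three rows and `limit=2`, the two returned shares sum to less than 1, which is the intended result when a third row exists outside the page.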

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: format jobs service

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: restructure roadmap into modular three-phase architecture (INITIAL-9/10/11) (#47)

* docs: restructure INITIAL-9 into modular three-phase roadmap

Decompose monolithic INITIAL-9 into three specialized technical phases:

- INITIAL-9: RAG Knowledge Base ("The Memory")
  - pgvector + OpenAI embeddings
  - Markdown/OpenAPI-aware chunking
  - Semantic retrieval endpoints

- INITIAL-10: Agentic Layer ("The Brain")
  - PydanticAI agents (Experiment Orchestrator, RAG Assistant)
  - Tool orchestration with structured outputs
  - Human-in-the-loop approval workflow

- INITIAL-11: ForecastLab Dashboard ("The Face")
  - React 19 + Vite + shadcn/ui
  - TanStack Table/Query for data management
  - Recharts for time series visualization
  - Agent chat interface with streaming

Update PHASE-index.md and DAILY-FLOW.md to align with new structure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prp): add PRP-9 RAG Knowledge Base implementation plan

Comprehensive PRP for INITIAL-9 RAG Knowledge Base feature:

- pgvector + SQLAlchemy 2.0 integration patterns
- Markdown-aware and OpenAPI-aware chunking
- Async OpenAI embeddings with batch processing
- HNSW index for cosine similarity search
- 15 ordered implementation tasks
- 5-level validation loop (syntax → types → unit → integration → smoke)
- Full ORM models and Pydantic schemas
- Known gotchas and anti-patterns documented

Confidence score: 8.5/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prp): add PRP-10 Agentic Layer implementation plan

Comprehensive PRP for INITIAL-10 Agentic Layer feature:

- PydanticAI agent framework integration
- Experiment Orchestrator Agent (backtest → compare → deploy)
- RAG Assistant Agent (query → retrieve → answer with citations)
- Human-in-the-loop approval workflow for sensitive actions
- WebSocket streaming for real-time token delivery
- Session persistence with JSONB message history
- 17 ordered implementation tasks
- Tool definitions for registry, backtesting, forecasting, RAG
- Full Pydantic schemas and ORM models

Confidence score: 7.5/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prp): add PRP-11 ForecastLab Dashboard implementation plan (#48)

Comprehensive PRP for INITIAL-11 (The Face) with:
- 24 implementation tasks across 6 phases
- React 19 + Vite + shadcn/ui + TanStack Table/Query
- TypeScript types matching all backend API schemas
- Reusable DataTable with server-side pagination
- TimeSeriesChart component with Recharts
- WebSocket hook for agent chat streaming
- Complete documentation links and gotchas

Confidence score: 7.5/10 (chat depends on INITIAL-10)

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* feat(rag): implement PRP-9 RAG Knowledge Base with pgvector (#49)

* feat(rag): implement PRP-9 RAG Knowledge Base with pgvector

Add RAG (Retrieval-Augmented Generation) knowledge base feature for
semantic document indexing and retrieval using PostgreSQL pgvector.

Key components:
- Document indexing with markdown-aware and OpenAPI-aware chunking
- Semantic retrieval using cosine similarity with configurable thresholds
- Idempotent re-indexing via SHA-256 content hash comparison
- OpenAI text-embedding-3-small for embeddings (1536 dimensions)
- HNSW index for fast approximate nearest neighbor search
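
To illustrate threshold-based retrieval by cosine similarity, here is a pure-Python sketch. It is not how the feature is implemented: per the commit message the search runs in PostgreSQL through pgvector and its HNSW index, and the function names, default threshold, and in-memory `chunks` mapping below are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if degenerate)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: list[float],
    chunks: dict[str, list[float]],
    threshold: float = 0.7,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs with score >= threshold."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in chunks.items()]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]
```

An exhaustive scan like this is exact but linear in the number of chunks; an HNSW index trades a small amount of recall for much faster approximate search.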

API endpoints:
- POST /rag/index - Index documents with automatic chunking
- POST /rag/retrieve - Semantic search with relevance scoring
- GET /rag/sources - List indexed sources with statistics
- DELETE /rag/sources/{source_id} - Remove source and chunks

Includes:
- ORM models: DocumentSource, DocumentChunk with Vector column
- Pydantic v2 schemas with strict validation
- 68 unit tests + 14 integration tests
- Migration for pgvector extension and RAG tables
- Examples and environment configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(rag): add Ollama embedding provider with OpenAI-compatible API

- Add EmbeddingProvider abstract base class with provider pattern
- Refactor existing OpenAI code to OpenAIEmbeddingProvider
- Add OllamaEmbeddingProvider using /v1/embeddings endpoint
  - Supports configurable dimensions parameter
  - Uses OpenAI-compatible response format
- Add config settings: rag_embedding_provider, ollama_base_url, ollama_embedding_model
- Add migration for dynamic embedding dimension support
- Update tests for both providers (25 tests)

Enables local/LAN embedding generation without OpenAI API dependency.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add Ollama embedding provider documentation

- Update .env.example with Ollama configuration options
- Add RAG Knowledge Base section to README with:
  - Embedding provider options (OpenAI/Ollama)
  - Example index and retrieve requests
  - Configuration examples for both providers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add Phase 8 RAG Knowledge Base documentation

- Create docs/PHASE/8-RAG_KNOWLEDGE_BASE.md with full phase details
- Update docs/PHASE-index.md:
  - Mark Phase 8 as Completed in overview table
  - Add Phase 8 summary to Completed Phases section
  - Add entry to Version History

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): add RAG models import to alembic env and format tests

- Add rag models import to alembic/env.py for schema validation
- Format test_embeddings.py to pass ruff format check

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address code review issues for RAG module and docs

- Make migration deterministic by hardcoding dimension values instead
  of reading from environment (alembic migration)
- Add pyyaml dependency for YAML parsing in OpenAPI chunker
- Fix token count logging to capture original count before truncation
- Add path traversal protection to RAG service _read_content_from_path
  (mirrors registry/storage.py pattern)
- Fix markdown linting issues:
  - Add language tags to fenced code blocks (MD040)
  - Fix table pipe spacing (MD060)
- Fix index_docs.py to treat 200 the same as 201 for idempotent responses
- Add test for path traversal protection
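
A common way to implement the path traversal protection mentioned above is to resolve the requested path and confirm it stays under an allowed base directory. The sketch below assumes that approach; `resolve_within` is a hypothetical name and is not the actual body of `_read_content_from_path`.

```python
from pathlib import Path


def resolve_within(base_dir: Path, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = base_dir.resolve()
    # Joining with an absolute user_path discards base, and ".." segments are
    # collapsed by resolve(), so both cases are caught by the check below.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path}")
    return candidate
```

Resolving before comparing matters: a plain string prefix check on the unresolved path would accept inputs such as `docs/../../etc/passwd`.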

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* ci: add release-please branch trigger and wire workflow_dispatch ref (#52)

- Add 'release-please--branches--**' pattern to match actual release-please
  branch naming (e.g., release-please--branches--main--components--forecastlabai)
- Add 'ref' input to workflow_dispatch with proper type declaration
- Wire ref input to all checkout steps via CHECKOUT_REF env var
- Use inputs.ref || github.ref for predictable fallback behavior
- Update concurrency group to respect manual ref input

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* chore(main): release 0.2.2 (#51)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gabor Szabo <168316277+w7-mgfcode@users.noreply.github.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

2 participants