Skip to content

RAG Corpus Manager (MVP): one-click bulk-index of bundled project docs #214

@w7-mgfcode

Description

@w7-mgfcode

Problem

A fresh ForecastLabAI install has an empty RAG corpus (0 sources / 0 chunks). The RAG Assistant agent can cite nothing and the Knowledge page is a permanent empty state — despite the repo bundling ~115 markdown files under docs/, PRPs/, and the root.

Today the only ways to populate the corpus are (a) POST /rag/index once per file (~115 calls, each path pasted by hand) or (b) the seeder's synthetic 3-document scenario (throwaway test prose, not the real docs).

Proposed change (MVP)

Add one additive orchestration endpoint — POST /rag/index/project-docs — that discovers the bundled markdown under docs/**, PRPs/**, and a fixed root allow-list (README.md, AGENTS.md, CHANGELOG.md) and indexes each through the existing RAGService.index_document path (reusing its chunking, embedding, SHA-256 content-hash idempotency, and upsert). The Admin → RAG Sources tab gets an Index Project Docs button.

  • No Alembic migration (no schema change — category rides in the existing DocumentSource.metadata_ JSONB).
  • No new slice, no .env var, no app/main.py change.
  • Idempotent — re-runs return every file unchanged.

Scope

MVP only. Stale-detection, re-index, chunk-preview, and the Knowledge source-type filter are deferred to a follow-up ("Full Version").

Spec: docs/optional-features/01-rag-corpus-manager.md. Execution plan: PRPs/PRP-23-rag-corpus-manager.md.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions