Knowledge RAG

Local RAG MCP server for Claude Code. Hybrid search (semantic + BM25) over a personal document knowledge base using ChromaDB and Ollama embeddings.

Prerequisites

  • Python 3.11, 3.12, or 3.13. Python 3.14 and later are not supported due to unresolved ChromaDB compatibility issues.
  • Ollama with the nomic-embed-text model.
  • Claude Code.

Installation

macOS

brew install python@3.13 ollama
ollama serve &
ollama pull nomic-embed-text
git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
python3.13 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Linux (Debian/Ubuntu)

sudo apt install python3.13 python3.13-venv
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull nomic-embed-text
git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
python3.13 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Windows

Use WSL 2 and follow the Linux instructions above.

For native Windows, run install.ps1:

git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
.\install.ps1

MCP Configuration

Add the server to ~/.claude.json. Replace /path/to/knowledge-rag with the actual path.

macOS / Linux:

{
  "mcpServers": {
    "knowledge-rag": {
      "type": "stdio",
      "command": "/path/to/knowledge-rag/venv/bin/python3",
      "args": ["-m", "mcp_server.server"],
      "env": {
        "PYTHONUNBUFFERED": "1",
        "PYTHONPATH": "/path/to/knowledge-rag",
        "ANONYMIZED_TELEMETRY": "False"
      }
    }
  }
}

Windows (native):

{
  "mcpServers": {
    "knowledge-rag": {
      "type": "stdio",
      "command": "cmd",
      "args": ["/c", "cd /d C:\\path\\to\\knowledge-rag && .\\venv\\Scripts\\python.exe -m mcp_server.server"],
      "env": {
        "PYTHONUNBUFFERED": "1",
        "ANONYMIZED_TELEMETRY": "False"
      }
    }
  }
}

Environment variables:

| Variable | Purpose |
|---|---|
| PYTHONUNBUFFERED | Disables stdout buffering. Required: without it, JSON-RPC messages may not flush in time. |
| PYTHONPATH | Module search path. Required on macOS/Linux when the venv Python is invoked directly, without first changing into the project directory. |
| ANONYMIZED_TELEMETRY | Disables ChromaDB telemetry. Optional. |

Restart Claude Code after editing the configuration.

Usage

Place documents in documents/, organized by category subdirectories. Each subdirectory name becomes a category. New categories are created automatically.

documents/
├── laravel/
│   └── eloquent-tips.md
├── docker/
│   └── compose-patterns.md
├── security/
│   ├── redteam/
│   └── blueteam/
└── general/
    └── notes.txt

Supported formats: .md, .txt, .pdf, .py, .json.

Documents are indexed automatically on server startup when the index is empty. Use reindex_documents to rebuild.
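
The directory-to-category convention above can be sketched in a few lines of Python. This is an illustration only, not the project's ingestion code: `discover_documents` is a hypothetical helper, and placing files that sit directly in documents/ under "general" is an assumption.

```python
from pathlib import Path

# Extensions listed under "Supported formats" above.
SUPPORTED_EXTENSIONS = {".md", ".txt", ".pdf", ".py", ".json"}


def discover_documents(root: Path) -> dict[str, list[Path]]:
    """Group supported files under `root` by category.

    The category is the file's parent directory relative to `root`
    (e.g. "security/redteam"). Files directly in `root` fall under
    "general", which is an assumption made for this sketch.
    """
    by_category: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        parent = path.parent.relative_to(root)
        category = parent.as_posix() if parent.parts else "general"
        by_category.setdefault(category, []).append(path)
    return by_category
```

Calling `discover_documents(Path("documents"))` on the tree above would return one key per directory that contains at least one supported file.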

MCP Tools

| Tool | Description |
|---|---|
| search_knowledge | Hybrid semantic + BM25 search |
| get_document | Retrieve full document content |
| save_document | Save a new document and index it |
| reindex_documents | Rebuild the search index |
| list_categories | List categories with document counts |
| list_documents | List indexed documents |
| get_index_stats | Index statistics |

search_knowledge

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query text |
| max_results | int | 5 | Maximum results to return (1-20) |
| category | string | null | Category filter |
| hybrid_alpha | float | 0.5 | Search balance: 0.0 = keyword only, 1.0 = semantic only |
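
Claude Code builds tool calls itself, but it can help to see what a search_knowledge call looks like on the wire. The sketch below assumes the standard MCP JSON-RPC `tools/call` request shape. `build_search_request` is a hypothetical helper, and its range checks only mirror the limits in the table above; they are not taken from the server code.

```python
import json


def build_search_request(query, max_results=5, category=None,
                         hybrid_alpha=0.5, request_id=1):
    """Serialize a JSON-RPC 2.0 `tools/call` request for search_knowledge."""
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    if not 0.0 <= hybrid_alpha <= 1.0:
        raise ValueError("hybrid_alpha must be between 0.0 and 1.0")

    arguments = {
        "query": query,
        "max_results": max_results,
        "hybrid_alpha": hybrid_alpha,
    }
    # Omit the category filter entirely when none was requested.
    if category is not None:
        arguments["category"] = category

    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_knowledge", "arguments": arguments},
    })
```

For example, `build_search_request("CVE-2024-3094", category="security", hybrid_alpha=0.0)` describes a pure keyword search scoped to one category.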

get_document

| Parameter | Type | Description |
|---|---|---|
| filepath | string | Path to the document file |

save_document

| Parameter | Type | Default | Description |
|---|---|---|---|
| title | string | required | Document title (used as filename) |
| content | string | required | Document content in markdown format |
| category | string | "general" | Category subdirectory; new categories are auto-created |

reindex_documents

| Parameter | Type | Default | Description |
|---|---|---|---|
| force | bool | false | Clear existing index and rebuild from scratch |

list_categories

No parameters.

list_documents

| Parameter | Type | Description |
|---|---|---|
| category | string | Optional category filter |

get_index_stats

No parameters.

Search Tuning

| hybrid_alpha | Behavior | Use case |
|---|---|---|
| 0.0 | Pure BM25 keyword search | Exact terms, CVE IDs, tool names |
| 0.3 | Keyword-heavy hybrid | Technical queries with specific terms |
| 0.5 | Balanced (default) | General queries |
| 0.7 | Semantic-heavy hybrid | Conceptual queries, related topics |
| 1.0 | Pure semantic search | "How to..." questions, understanding intent |

Keyword routing runs before search. When query terms match configured keyword routes (word-boundary regex matching), results are filtered to the matching category. When multiple keywords match different categories, each category is scored by match count and the highest-scoring category wins.

How It Works

The search pipeline has four stages. First, keyword routing checks the query against configured routes using word-boundary regex. If a route matches, search is scoped to that category. Single-word routes use \b boundaries to prevent false positives (e.g., "api" does not match "RAPID"). Multi-word phrases use exact substring matching.

Second, ChromaDB performs vector similarity search using Ollama nomic-embed-text embeddings (768 dimensions). Third, the BM25 index performs exact term matching via the rank-bm25 library. The BM25 index is built lazily from ChromaDB data on the first query.

Fourth, Reciprocal Rank Fusion (RRF) with k=60 combines both rankings. Each result receives a weighted score: hybrid_alpha * 1/(k + semantic_rank) + (1 - hybrid_alpha) * 1/(k + bm25_rank). Results found by both methods are marked "hybrid" in output. Results from only one method are marked "semantic" or "keyword".
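
A minimal sketch of the weighted fusion formula above. It makes three assumptions that are not confirmed by this document: ranks are 1-based, a chunk missing from one ranking contributes nothing for that ranking, and chunk IDs are unique within each list. `fuse_rankings` is a hypothetical name.

```python
def fuse_rankings(semantic_ids, bm25_ids, hybrid_alpha=0.5, k=60):
    """Weighted Reciprocal Rank Fusion over two ranked lists of chunk IDs.

    Returns (chunk_id, score, source) tuples sorted by descending score,
    where source is "hybrid", "semantic", or "keyword".
    """
    scores = {}
    sources = {}
    for rank, chunk_id in enumerate(semantic_ids, start=1):
        scores[chunk_id] = hybrid_alpha / (k + rank)
        sources[chunk_id] = "semantic"
    for rank, chunk_id in enumerate(bm25_ids, start=1):
        contribution = (1.0 - hybrid_alpha) / (k + rank)
        scores[chunk_id] = scores.get(chunk_id, 0.0) + contribution
        # Seen in the semantic pass as well: found by both methods.
        sources[chunk_id] = "hybrid" if chunk_id in sources else "keyword"
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(chunk_id, scores[chunk_id], sources[chunk_id]) for chunk_id in ranked]
```

Because k=60 dominates the denominator, the difference between adjacent ranks is small, so a chunk that appears in both rankings usually outscores one that tops only a single ranking.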

Documents are chunked at 1000 characters with 200-character overlap, breaking at paragraph, sentence, or word boundaries. Embeddings are generated in parallel using a ThreadPoolExecutor with 4 workers.
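
The chunking and parallel embedding steps can be sketched like this. The boundary search (paragraph, then sentence, then word, looking only in the second half of each window) is one plausible reading of the description above, not the project's exact algorithm. `embed_fn` is a hypothetical callable standing in for the Ollama embedding request.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks, preferring natural break points."""
    if overlap >= chunk_size // 2:
        raise ValueError("overlap must be smaller than half the chunk size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Look for a paragraph, sentence, or word boundary in the
            # second half of the window so chunks do not become tiny.
            window_start = start + chunk_size // 2
            for separator in ("\n\n", ". ", " "):
                cut = text.rfind(separator, window_start, end)
                if cut != -1:
                    end = cut + len(separator)
                    break
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # step back so consecutive chunks overlap
    return chunks


def embed_chunks(chunks, embed_fn, workers=4):
    """Embed chunks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(embed_fn, chunks))
```

Threads suit this workload because each embedding call spends most of its time waiting on an HTTP response rather than computing.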

Configuration

Key settings in mcp_server/config.py:

| Setting | Default | Description |
|---|---|---|
| chunk_size | 1000 | Characters per chunk |
| chunk_overlap | 200 | Overlap between consecutive chunks |
| ollama_model | nomic-embed-text | Ollama embedding model name |
| ollama_base_url | http://localhost:11434 | Ollama API endpoint |
| collection_name | knowledge_base | ChromaDB collection name |
| default_results | 5 | Default search result count |
| max_results | 20 | Maximum allowed search results |

Keyword routes and category aliases are also defined in config.py. Add new routes to the keyword_routes dict. Add nested path aliases to category_aliases (e.g., "security/redteam": "redteam" maps the nested directory to a flat category name).
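
As an illustration, the two dicts might look like the following. The entries and the keyword-to-category shape of keyword_routes are assumptions; check config.py for the actual structure before editing. `resolve_category` is a hypothetical helper that shows how an alias lookup could behave.

```python
# Hypothetical shapes; the real structures in mcp_server/config.py may differ.
keyword_routes = {
    "eloquent": "laravel",
    "docker compose": "docker",
}

category_aliases = {
    "security/redteam": "redteam",
    "security/blueteam": "blueteam",
}


def resolve_category(relative_dir: str) -> str:
    """Map a documents/ subdirectory path to its flat category name.

    Paths without an alias are returned unchanged (after normalization).
    """
    normalized = relative_dir.replace("\\", "/").strip("/")
    return category_aliases.get(normalized, normalized)
```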

Project Structure

knowledge-rag/
├── mcp_server/
│   ├── __init__.py          # Version, exports
│   ├── config.py            # Settings, keyword routes, category aliases
│   ├── ingestion.py         # Document parsing, chunking
│   └── server.py            # MCP tools, ChromaDB, BM25, search engine
├── documents/               # Document storage (by category subdirectory)
├── data/
│   ├── chroma_db/           # ChromaDB vector database
│   └── index_metadata.json  # Index state cache
├── install.ps1              # Windows installer
├── requirements.txt         # Python dependencies
├── CHANGELOG.md
├── LICENSE
└── README.md

Troubleshooting

Ollama not running. Start with ollama serve. Verify connectivity:

curl http://localhost:11434/api/tags
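
The same check can be scripted to confirm that the embedding model has been pulled. This sketch assumes the /api/tags response lists models under a "models" key with a "name" field such as "nomic-embed-text:latest". `has_model` and `check_ollama` are hypothetical helper names.

```python
import json
import urllib.request


def has_model(tags_payload: dict, model: str = "nomic-embed-text") -> bool:
    """Return True if the /api/tags payload lists the given model."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    # Names usually carry a tag suffix, e.g. "nomic-embed-text:latest".
    return any(name == model or name.startswith(model + ":") for name in names)


def check_ollama(base_url: str = "http://localhost:11434") -> bool:
    """Query a running Ollama instance; raises URLError if it is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return has_model(json.load(response))
```

If `check_ollama()` returns False, run ollama pull nomic-embed-text and try again.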

Wrong Python version. Python 3.14 and later are not supported. Check the current version:

python3 --version

To target a specific version when creating a venv:

python3.13 -m venv venv
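
To check the interpreter inside an existing venv, a short script like this can help. `is_supported_python` is a hypothetical helper, and the accepted range simply mirrors the versions listed under Prerequisites.

```python
import sys


def is_supported_python(version=sys.version_info) -> bool:
    """Return True for Python 3.11 through 3.13 inclusive."""
    return (3, 11) <= tuple(version[:2]) <= (3, 13)


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```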

Empty search results. Confirm documents exist in documents/. Rebuild the index:

reindex_documents(force=true)

MCP server not loading. Verify ~/.claude.json is valid JSON. Check that the command path points to the correct venv Python. Run claude mcp list to confirm the connection. On macOS and Linux, ensure venv/bin/python has execute permission.
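
The first two checks can be automated with a small script. This is a sketch rather than part of the project: `check_mcp_config` is a hypothetical helper, and it inspects only the fields shown in the MCP Configuration section.

```python
import json
from pathlib import Path


def check_mcp_config(config_path: Path, server_name: str = "knowledge-rag") -> list[str]:
    """Return a list of problems found in the MCP configuration (empty if none)."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return [f"{config_path} does not exist"]
    except json.JSONDecodeError as exc:
        return [f"{config_path} is not valid JSON: {exc}"]

    server = config.get("mcpServers", {}).get(server_name)
    if server is None:
        return [f"no '{server_name}' entry under mcpServers"]

    problems = []
    command = Path(server.get("command", ""))
    # Only absolute commands can be checked on disk; "cmd" is resolved via PATH.
    if command.is_absolute() and not command.exists():
        problems.append(f"command not found: {command}")
    if server.get("env", {}).get("PYTHONUNBUFFERED") != "1":
        problems.append("PYTHONUNBUFFERED is not set to '1'")
    return problems
```

Running `check_mcp_config(Path.home() / ".claude.json")` returns an empty list when the entry looks consistent with the examples above.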

ModuleNotFoundError. The MCP configuration must use the venv Python, not the system Python. Activate the venv and install dependencies:

source venv/bin/activate
pip install -r requirements.txt

License

MIT License. See LICENSE.

Authors

Original author: Ailton Rocha (Lyon). Fork maintainer: Andrey Mishchenko.

Version 3.0.0.