Vectorless RAG for Code Repositories
Navigate your codebase like a human expert, using LLM reasoning instead of vector similarity.
Traditional RAG (Retrieval-Augmented Generation) for code has fundamental limitations:
| Problem | Description |
|---|---|
| ❌ Vector similarity ≠ code relevance | "login" and "logout" have similar embeddings, but they're completely different! |
| ❌ Chunking destroys structure | Splitting a class across chunks loses critical context |
| ❌ Can't follow call chains | "Who calls this function?" is nearly impossible to answer with vectors |
| ❌ No architecture understanding | Vectors don't know that `auth/` is for authentication |
CodeTree takes a different approach: it builds a hierarchical tree index of your codebase and uses LLM reasoning to navigate it, just like a human developer would:
- ✅ AST-based parsing preserves code structure
- ✅ LLM reasons about which files are relevant
- ✅ Understands module relationships and dependencies
- ✅ Can trace function calls across files
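To make the "AST-based parsing preserves code structure" point concrete, here is a minimal sketch using Python's standard `ast` module. It is an illustration only, not CodeTree's actual parser: the `outline` helper and the sample source are assumptions made up for this example.

```python
import ast

# Hypothetical sample file; the names are illustrative only.
SOURCE = '''
import os
from auth.login import authenticate

class UserService:
    def get_user(self, user_id):
        return authenticate(user_id)

def main():
    return UserService().get_user(1)
'''

def outline(source: str) -> dict:
    """Collect top-level imports, classes (with their methods), and functions."""
    tree = ast.parse(source)
    result = {"imports": [], "classes": {}, "functions": []}
    for node in tree.body:
        if isinstance(node, ast.Import):
            result["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports without a module name are recorded as "."
            result["imports"].append(node.module or ".")
        elif isinstance(node, ast.ClassDef):
            result["classes"][node.name] = [
                item.name for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result["functions"].append(node.name)
    return result

print(outline(SOURCE))
```

Unlike fixed-size chunking, the class and its methods stay together in one entry, which is the property the list above relies on.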
| Feature | Description |
|---|---|
| No Vector Database | Uses code structure + LLM reasoning instead of embedding similarity |
| AST-Based Indexing | Parses actual code structure: functions, classes, imports, dependencies |
| Cross-File Intelligence | Tracks imports, function calls, and dependencies across your entire codebase |
| Reasoning-Based Retrieval | LLM navigates the code tree like a human expert |
| Natural Language Queries | Ask questions in plain English |
| Privacy-First | Works with local models (Ollama), so your code never has to leave your machine |
| Feature | Vector RAG | CodeTree |
|---|---|---|
| Understands code structure | ❌ | ✅ |
| Cross-file references | ❌ | ✅ |
| "Who calls this function?" | ❌ | ✅ |
| No chunking headaches | ❌ | ✅ |
| Explainable retrieval | ❌ | ✅ |
| Works offline | β | |
| No vector DB needed | ❌ | ✅ |
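The "Who calls this function?" row is a structural question, and a syntax tree can answer it directly. The sketch below is a simplified illustration with Python's `ast` module, not CodeTree's implementation: `callers_of` and the in-memory `FILES` mapping are hypothetical, and matching is by name only, with no import resolution.

```python
import ast

# Hypothetical two-file project held in memory for the example.
FILES = {
    "src/auth/login.py": "def authenticate(user):\n    return bool(user)\n",
    "src/api/routes.py": (
        "from auth.login import authenticate\n"
        "def login(request):\n"
        "    return authenticate(request)\n"
        "def health():\n"
        "    return 'ok'\n"
    ),
}

def callers_of(target: str, files: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (path, enclosing function, line) for each call to `target`."""
    hits = []
    for path, source in files.items():
        for func in ast.walk(ast.parse(source)):
            if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            for node in ast.walk(func):
                if isinstance(node, ast.Call):
                    callee = node.func
                    # Plain calls expose .id, attribute calls (obj.method) expose .attr
                    name = getattr(callee, "id", None) or getattr(callee, "attr", None)
                    if name == target:
                        hits.append((path, func.name, node.lineno))
    return hits

print(callers_of("authenticate", FILES))
```

An embedding index has no equivalent of this lookup, because the caller and the callee usually live in different chunks.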
pip install codetree-rag
Or from source:
git clone https://github.com/toller892/Oh-Code-Rag.git
cd Oh-Code-Rag
pip install -e .
Set your LLM API key:
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
from codetree import CodeTree
# Index your repository
tree = CodeTree("/path/to/your/repo")
tree.build_index()
# Ask questions about the code
answer = tree.query("How does the authentication system work?")
print(answer)
# Index a repository
codetree index /path/to/repo
# Query the codebase
codetree query "Where is database connection handled?"
# Interactive chat mode
codetree chat
# Show code structure
codetree tree
# Find symbol references
codetree find "UserService"
Onboarding to New Codebases:
- "What's the overall architecture of this project?"
- "How do requests flow from API to database?"
- "Where should I add a new payment method?"
Code Review & Understanding:
- "What does the processOrder function do?"
- "Who calls the validateUser method?"
- "What happens if authentication fails?"
| Industry | Use Case | Example Query |
|---|---|---|
| FinTech | Audit & Compliance | "How is user data encrypted?" |
| Healthcare | Security Review | "Where is patient data accessed?" |
| E-commerce | Feature Development | "How does the cart system work?" |
| DevOps | Incident Response | "What services depend on Redis?" |
| Education | Code Learning | "Explain the MVC pattern in this app" |
- Legacy Code Migration: Understand old systems before rewriting
- Security Auditing: Find all database queries, API endpoints
- Documentation Generation: Auto-generate architecture docs
- Dependency Analysis: Map out service dependencies
Query:
from codetree import CodeTree
tree = CodeTree("./my-project")
tree.build_index()
answer = tree.query("What's the overall architecture? What are the core modules?")
print(answer)
Output:
## Project Architecture
This project follows a modular architecture with these core components:
1. **CodeTree (core.py)** - Main entry point
   - `build_index()`: Builds the code tree
   - `query()`: Natural language queries
   - `find()`: Symbol search
2. **CodeIndexer (indexer.py)** - Index construction
   - Recursively parses directories
   - Builds TreeNode hierarchy
3. **CodeParser (parser.py)** - AST parsing
   - Supports Python, JS, Go, Rust, Java
   - Extracts functions, classes, imports
4. **CodeRetriever (retriever.py)** - LLM-based retrieval
   - Two-stage: retrieve → answer
   - Uses reasoning prompts
## Data Flow
User Query → CodeTree → Retriever → LLM Reasoning → File Selection → Answer
Query:
refs = tree.find("authenticate")
print(refs)
Output:
Found 5 references to 'authenticate':
[function] src/auth/login.py:45 → authenticate
[function] src/auth/oauth.py:78 → authenticate_oauth
[import] src/api/middleware.py → from auth import authenticate
[import] src/api/routes.py → from auth.login import authenticate
[class] src/auth/base.py:12 → Authenticator
Query:
answer = tree.query("How does a user login request flow through the system?")
print(answer)
Output:
## Login Request Flow
1. **Entry Point**: `src/api/routes.py`
   - `@app.post("/login")` routes to `auth_service.authenticate()`
2. **Authentication**: `src/auth/service.py`
   - Validates credentials against database
   - Generates JWT token on success
3. **Database**: `src/db/users.py`
   - `get_user_by_email()` fetches user record
   - `verify_password()` checks hash
4. **Response**: Returns JWT token or 401 error
┌─────────────────────────────────────────────────────────────┐
│                          CodeTree                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  CodeParser ──────▶ CodeIndexer ──────▶ CodeIndex (JSON)    │
│  (AST Parse)        (Build Tree)        (Store)             │
│                                              │              │
│                                              ▼              │
│  Answer ◀────────── Retrieve ◀───────── CodeRetriever       │
│  (Markdown)         (Read Files)        (LLM Reasoning)     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
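The "Build Tree" and "Store" steps in the diagram can be pictured as nesting per-file symbols into a directory hierarchy and serializing it to JSON. The following is a minimal sketch under that assumption; the real index format is not documented here, and `build_tree` and the sample data are hypothetical.

```python
import json

def build_tree(symbols_by_file: dict[str, list[str]]) -> dict:
    """Nest file paths into a directory tree; each file maps to its symbols."""
    root: dict = {}
    for path, symbols in sorted(symbols_by_file.items()):
        node = root
        *dirs, filename = path.split("/")
        for d in dirs:
            # Create the directory node on first sight, then descend into it
            node = node.setdefault(d, {})
        node[filename] = symbols
    return root

# Hypothetical parser output: file path -> symbols found in that file.
parsed = {
    "src/auth/login.py": ["authenticate"],
    "src/auth/oauth.py": ["authenticate_oauth"],
    "src/api/routes.py": ["login"],
}

index = build_tree(parsed)
print(json.dumps(index, indent=2))
```

A compact outline like this is small enough to place in a prompt, which is what lets the retriever reason over the whole repository layout before opening any file.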
Stage 1: Reasoning-Based Navigation
User: "How does authentication work?"
        │
        ▼
┌─────────────────────────────────────────────────────────────┐
│  LLM analyzes code tree structure:                          │
│                                                             │
│  "Authentication relates to auth module...                  │
│   Let me check src/auth/ directory...                       │
│   login.py and oauth.py look relevant...                    │
│   Also need to check who imports these..."                  │
└─────────────────────────────────────────────────────────────┘
        │
        ▼
Selected Files: [src/auth/login.py, src/auth/oauth.py, ...]
Stage 2: Answer Generation
Read selected files → Generate comprehensive answer with code snippets
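The two stages above can be sketched as two model calls: one that sees only the tree outline and picks files, and one that answers from those files. This is an assumption-laden illustration, not CodeTree's retriever: `two_stage_query`, the prompt wording, and the `fake_llm` stub (used so the example runs offline) are all hypothetical.

```python
from typing import Callable

def two_stage_query(
    question: str,
    outline: str,
    files: dict[str, str],
    llm: Callable[[str], str],
) -> str:
    # Stage 1: the model sees only the tree outline and names the files to read.
    selection = llm(
        "Code tree:\n" + outline + "\n\nQuestion: " + question
        + "\nList the relevant file paths, one per line."
    )
    # Keep only paths that really exist, in case the model invents one.
    selected = [line.strip() for line in selection.splitlines() if line.strip() in files]
    # Stage 2: the model answers from the contents of the selected files only.
    context = "\n\n".join(f"# {path}\n{files[path]}" for path in selected)
    return llm("Context:\n" + context + "\n\nQuestion: " + question + "\nAnswer:")

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    if prompt.startswith("Code tree:"):
        return "src/auth/login.py\nsrc/does_not_exist.py"
    return "Answered from %d file(s)." % prompt.count("# src/")

files = {
    "src/auth/login.py": "def authenticate(user): ...",
    "src/api/routes.py": "def login(request): ...",
}
result = two_stage_query(
    "How does authentication work?", "src/auth/login.py\nsrc/api/routes.py", files, fake_llm
)
print(result)
```

Because stage 1 returns an explicit file list, the selection can be logged and inspected, which is one way to read the "explainable retrieval" claim.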
| Language | Extensions | Status |
|---|---|---|
| Python | .py, .pyi | ✅ Full |
| JavaScript | .js, .jsx, .mjs | ✅ Full |
| TypeScript | .ts, .tsx | ✅ Full |
| Go | .go | ✅ Full |
| Rust | .rs | ✅ Full |
| Java | .java | ✅ Full |
| C/C++ | .c, .cpp, .h | 🚧 Coming Soon |
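As a rough illustration of how the table above could drive parser selection, the sketch below maps the listed extensions to language names. The `LANGUAGE_BY_EXTENSION` table and `detect_language` helper are hypothetical and only mirror the fully supported rows.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the "Full" rows of the support table.
LANGUAGE_BY_EXTENSION = {
    ".py": "python", ".pyi": "python",
    ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".java": "java",
}

def detect_language(path: str) -> Optional[str]:
    """Return the language for a file path, or None if it is not supported."""
    return LANGUAGE_BY_EXTENSION.get(Path(path).suffix.lower())

print(detect_language("src/app.tsx"))
print(detect_language("native/main.c"))
```

Files that map to `None`, such as C/C++ sources in this sketch, would simply be skipped at index time.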
Create .codetree.yaml in your project:
# LLM Configuration
llm:
  provider: openai  # openai, anthropic, ollama
  model: gpt-4o
  temperature: 0.0
  max_tokens: 4096

# For local/private deployment
# llm:
#   provider: ollama
#   model: llama3
#   base_url: http://localhost:11434

# Index Settings
index:
  languages:
    - python
    - javascript
    - typescript
    - go
  exclude:
    - node_modules
    - __pycache__
    - .git
    - venv
    - dist
  max_file_size: 100000  # Skip files larger than 100KB

| Metric | Small Repo (<100 files) | Medium Repo (<1000 files) | Large Repo (<10000 files) |
|---|---|---|---|
| Index Time | < 5s | < 30s | < 5min |
| Index Size | < 100KB | < 1MB | < 10MB |
| Query Time | 2-5s | 3-8s | 5-15s |
Times depend on LLM provider latency
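Index time plausibly scales with the number of files parsed, which is what the `exclude` and `max_file_size` settings in `.codetree.yaml` limit. The sketch below shows one way such a filter could work; `iter_indexable_files` is a hypothetical helper, not part of CodeTree, and the values are copied from the sample configuration.

```python
import os
import tempfile
from pathlib import Path

# Values taken from the sample .codetree.yaml.
EXCLUDE = {"node_modules", "__pycache__", ".git", "venv", "dist"}
MAX_FILE_SIZE = 100_000  # bytes

def iter_indexable_files(root, exclude=EXCLUDE, max_file_size=MAX_FILE_SIZE):
    """Yield repo-relative paths, skipping excluded directories and oversized files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_size <= max_file_size:
                yield path.relative_to(root).as_posix()

# Build a throwaway repository to exercise the filter.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "node_modules").mkdir()
    (root / "src" / "app.py").write_text("print('hi')\n")
    (root / "src" / "big.py").write_text("#" * 200_000)  # over the size limit
    (root / "node_modules" / "lib.js").write_text("x = 1\n")
    found = sorted(iter_indexable_files(root))

print(found)
```

Only `src/app.py` survives: the oversized file and everything under `node_modules` never reach the parser.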
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas to contribute:
- Add language parsers (C++, Ruby, PHP, etc.)
- Improve test coverage
- Documentation and examples
- Performance optimizations
- CLI improvements
CodeTree works as an MCP (Model Context Protocol) server, compatible with Claude Desktop, Cline, Continue, and other MCP clients.
pip install codetree-mcp
Add to your Claude Desktop config:
{
  "mcpServers": {
    "codetree": {
      "command": "python",
      "args": ["/path/to/Oh-Code-Rag/mcp/server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here"
      }
    }
  }
}

| Tool | Description |
|---|---|
| `codetree_index` | Index a repository |
| `codetree_query` | Ask questions about code |
| `codetree_tree` | Show code structure |
| `codetree_find` | Find symbol references |
| `codetree_stats` | Get repo statistics |
See mcp/README.md for full documentation.
CodeTree also comes as a Clawdbot skill for AI assistant integration.
pip install codetree-skill
Or copy the skill/ folder to your Clawdbot skills directory:
cp -r skill/ ~/.clawdbot/skills/codetree/
# Index a repo
./scripts/codetree.sh index /path/to/repo
# Query code
./scripts/codetree.sh query /path/to/repo "How does auth work?"
# Show structure
./scripts/codetree.sh tree /path/to/repo
# Find symbol
./scripts/codetree.sh find /path/to/repo "UserService"
See skill/SKILL.md for full documentation.
MIT License - see LICENSE for details.
Inspired by PageIndex, a vectorless RAG system for documents.
If you find CodeTree useful, please give us a ⭐!