AI-powered code review and repository intelligence platform.
BugViper ingests your repositories into a Neo4j knowledge graph via Tree-sitter AST parsing, then sends a LangGraph-powered agent to review pull requests β finding bugs, security issues, and code quality problems with full codebase context. It also ships a Query interface for full-text and semantic code search, plus an AI chat agent that reasons directly over your graph.
Each indexed repository shows live stats derived directly from the Neo4j graph β file count, function count, class count, and import count. Repositories are indexed once and stay up to date via GitHub push webhooks.
Search any symbol name or keyword across the entire graph. Results are anchored to the exact source line with an inline peek viewer β expand up or down to read surrounding context without leaving the page.
When full-text isn't enough, semantic search embeds your query and returns results ranked by cosine similarity from Neo4j vector indexes. Useful for finding code by intent: "embedding model configuration" returns EmbeddingModelName, RoastResponse, and other conceptually related nodes at 73%, 69%, 68% similarity.
The Ask Agent page connects a ReAct LLM to your Neo4j graph. Ask natural language questions β the agent reasons across 13+ tool calls, cites source files, and shows the relevant code inline. Ask "What embedding do we use?" and it finds the embedder, explains the batch flow, and shows the actual Cypher query.
BugViper materialises your codebase as a property graph β 312 nodes and 336 relationships shown here for a single repository, spanning Function, Class, File, Module, Variable, and Repository node types. Explore it directly in Neo4j Browser or query it from the API.
When a PR is opened the BugViper bot posts a structured top-level comment with:
- Model used and actionable comment count
- Walkthrough table β every changed file and a one-line summary of what changed
- Impact Analysis and Positive Findings sections
Each issue is posted as an inline diff comment pinned to the exact line. Here the agent flagged a bare except Exception: that catches KeyboardInterrupt and SystemExit β severity Low, confidence 7/10 β and suggested a specific fix with a one-line code change you can commit directly from GitHub.
The same review run caught a Medium security issue: LLM error details (rate limits, model names, API keys) leaking into a user-facing response via str(e)[:100]. The agent suggested logging the error server-side and returning a clean fallback message, preventing accidental information disclosure.
When you add a repository, BugViper:
- Clones or downloads the repo
- Runs Tree-sitter parsers (17 languages) to produce ASTs
- Extracts
Function,Class,Variable,File,Module,Repositorynodes - Writes the graph to Neo4j with relationships:
CONTAINS,DEFINES,CALLS,IMPORTS,INHERITS - Calculates cyclomatic complexity at parse time for every function
- Optionally batch-embeds all nodes with
text-embedding-3-smallβ stores vectors in Neo4j vector indexes
GitHub
β
βΌ
Tree-sitter AST (17 languages)
β
Graph Builder ββββββββββΊ Neo4j (nodes + relationships)
β
Embedder (optional) βββββΊ Neo4j (vector indexes: 1536-dim cosine)
Neo4j's full-text search is backed by Apache Lucene β the same engine that powers Elasticsearch. BugViper creates two Lucene indexes at setup time:
| Index | Node types | Fields |
|---|---|---|
code_search |
Function, Class, Variable |
name, docstring, source_code |
file_content_search |
File |
source_code |
Two-tier search strategy (db/queries.py β search_code()):
User query
β
βββΊ Tier 1 β `code_search` Lucene index
β Simple identifiers β phrase search "parse_unified_diff"
β Special characters β AND-keywords token1 AND token2
β
βββΊ Tier 2 β fallback to `file_content_search` (if Tier 1 empty)
Searches raw file content line-by-line
Returns: path + line_number + matching line (no full source dump)
Searching parse_unified_diff hits the function node instantly by name. Searching "Authorization: Bearer" falls through to line-level file content search. Both paths return lean results; the Peek API (/code-finder/peek) then fetches a windowed view around any line on demand, keeping responses fast regardless of file size.
Lucene escaping is applied automatically: clean identifiers get phrase-quoted, anything with special characters is tokenised and joined with AND.
The review pipeline is a two-phase LangGraph graph (code_review_agent/agent/):
PR opened / comment trigger
β
βΌ
Build diff + context prompt
β
βββββββββΌβββββββββββββββββββββ
β Phase 1 β ReAct Explorer β
β LangGraph StateGraph β LLM + 19 Neo4j tools
β MAX_TOOL_ROUNDS = 6 β Stops deterministically
βββββββββββββ¬βββββββββββββββββ
β accumulated messages (diff + tool results)
βΌ
βββββββββββββΌβββββββββββββββββ
β Phase 2 β Synthesizer β
β Plain LLM call β JSON schema embedded in prompt
β Works on any OpenRouter β Robust JSON extraction (fence/prose/raw)
β model β
βββββββββββββ¬βββββββββββββββββ
β
Confidence filter β₯ 7 / 10
β
βΌ
Post inline GitHub comments
Phase 1 β ReAct Exploration
The agent receives the PR diff and iteratively calls tools against Neo4j to build context for the code under review. It is capped at 6 tool rounds using a tool_rounds counter in ReviewExplorerState β no reliance on LangGraph's recursion limit, so accumulated messages are always returned cleanly.
The agent has 19 tools:
| # | Tool | What it queries in Neo4j |
|---|---|---|
| 1 | search_code |
Lucene full-text across Function / Class / Variable / File |
| 2 | peek_code |
Line window from a file stored in the graph |
| 3 | semantic_search |
Vector similarity search (embeddings) |
| 4 | find_function |
Function node by exact or fuzzy name |
| 5 | find_class |
Class node by exact or fuzzy name |
| 6 | find_variable |
Variable by substring |
| 7 | find_by_content |
Symbol bodies containing a pattern |
| 8 | find_by_line |
Raw file content line-by-line |
| 9 | find_module |
Module / package and which files import it |
| 10 | find_imports |
Import statements referencing a module or alias |
| 11 | find_method_usages |
All callers of a function |
| 12 | find_callers |
Call chain tracing upstream |
| 13 | get_class_hierarchy |
Inheritance tree β parents and children |
| 14 | get_change_impact |
Blast radius: how many callers would break |
| 15 | get_complexity |
Cyclomatic complexity for a specific function |
| 16 | get_top_complex_functions |
Highest-risk functions in the repo |
| 17 | get_file_source |
Full file content from the graph |
| 18 | get_language_stats |
Per-language file / function / class counts |
| 19 | get_repo_stats |
Overall graph statistics |
Phase 2 β Structured Synthesis
After exploration, a second LLM call receives all accumulated messages plus a JSON schema embedded directly in the system prompt. The response is parsed robustly β handles code fences, prose wrapping, and raw JSON β so any model on OpenRouter works without needing structured-output API support.
Node types: Repository Β· File Β· Function Β· Class Β· Variable Β· Module
Relationships:
(Repository)-[:CONTAINS]βββΊ(File)
(File)-[:CONTAINS]ββββββββββΊ(Function | Class | Variable)
(File)-[:IMPORTS]βββββββββββΊ(Module)
(Class)-[:CONTAINS]βββββββββΊ(Function)
(Class)-[:INHERITS]βββββββββΊ(Class)
(Function)-[:CALLS]βββββββββΊ(Function)
| Component | Technology |
|---|---|
| API framework | FastAPI + Uvicorn |
| Package manager | uv |
| Database | Neo4j |
| Code parsing | Tree-sitter (17 languages) |
| AI / LLM | LangGraph + LangChain + OpenRouter |
| Embeddings | openai/text-embedding-3-small via OpenRouter |
| GitHub integration | PyGithub + GitHub App webhooks |
| Auth / user data | Firebase Admin SDK + Firestore |
| Observability | Logfire |
| Component | Technology |
|---|---|
| Framework | Next.js 16 (App Router) + React 19 |
| Language | TypeScript (strict mode) |
| Styling | TailwindCSS 4 + shadcn/ui (Radix primitives) |
| Icons | Lucide React |
api/ # FastAPI backend
βββ app.py # Entry point, CORS, router registration
βββ routers/
β βββ ingestion.py # POST /repository, /setup, /github
β βββ query.py # GET /search, /stats, /code-finder/*
β βββ repository.py # GET/DELETE repositories
β βββ webhook.py # POST /onPush, /onComment, /github
βββ services/
βββ review_service.py # PR review pipeline orchestration
βββ push_service.py # Incremental push handling
code_review_agent/ # LangGraph PR review agent
βββ agent/
β βββ review_graph.py # Phase 1: ReAct exploration graph
β βββ runner.py # Two-phase pipeline entry point
β βββ tools.py # 19 Neo4j query tools
β βββ prompts.py # System prompts
βββ models/
βββ agent_schemas.py # AgentFindings, ReviewResults, Issue
db/ # Neo4j database layer
βββ client.py # Connection management + retry
βββ ingestion.py # Graph ingestion service
βββ queries.py # CodeQueryService (search, stats, CRUD)
βββ schema.py # Constraints, Lucene indexes, CYPHER_QUERIES
ingestion/ # Code parsing & ingestion engine
βββ repo_ingestion_engine.py # Main orchestrator
βββ graph_builder.py # Graph construction from ASTs
βββ code_search.py # CodeFinder class
βββ languages/ # 17 per-language Tree-sitter parsers
common/ # Shared utilities
βββ embedder.py # Batch embedding via OpenRouter
βββ diff_parser.py # Unified diff parsing
βββ bugviper_firebase_service.py
frontend/ # Next.js 16 frontend
βββ app/(protected)/
β βββ query/ # Search + Analysis + CodeFinder + Review tabs
β βββ repositories/ # Repo management + ingestion
βββ lib/
βββ api.ts # All fetch wrappers
βββ auth-context.tsx
- Python 3.13+,
uv - Node.js 20+
- Neo4j (local or AuraDB)
- OpenRouter API key
uv sync
cp .env.example .env # fill in variables
uvicorn api.app:app --host 0.0.0.0 --port 8000 --reloadcd frontend && npm install && npm run dev # http://localhost:3000./start.sh # API + Frontend + Ngrok# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...
# LLM
OPENROUTER_API_KEY=...
REVIEW_MODEL=z-ai/glm-5 # any OpenRouter model
# GitHub App
GITHUB_APP_ID=...
GITHUB_PRIVATE_KEY_PATH=...
GITHUB_WEBHOOK_SECRET=...
# Firebase
SERVICE_FILE_LOC=path/to/service-account.json
# Optional
ENABLE_LOGFIRE=true
LOGFIRE_TOKEN=...
API_ALLOWED_ORIGINS=http://localhost:3000
INGESTION_SERVICE_URL= # empty = local; set = Cloud TasksCyclomatic complexity is stored on every Function node at ingestion time:
| Score | Risk |
|---|---|
| 1β5 | Simple |
| 6β10 | Moderate |
| 11β20 | Complex β refactor candidate |
| 20+ | High risk β bugs likely here |
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/ingest/repository |
Ingest a repository |
| POST | /api/v1/ingest/setup |
Init DB schema + indexes |
| GET | /api/v1/repos/ |
List all repositories |
| GET | /api/v1/query/search |
Full-text code search |
| GET | /api/v1/query/code-finder/function |
Find function by name |
| GET | /api/v1/query/code-finder/peek |
Peek lines around a file location |
| GET | /api/v1/query/code-finder/complexity/top |
Most complex functions |
| POST | /api/v1/query/diff-context |
Build RAG context for a diff |
| POST | /api/v1/webhook/github |
GitHub App webhook dispatcher |
Full API docs: /docs (Swagger) and /redoc when the server is running.
# Python
black . # format
ruff check . # lint
mypy . # type check
pytest # tests
pytest --cov # coverage
# Frontend
cd frontend && npm run lint && npm run build- Abstract review model β pluggable per-repo agent configs
- Improve incremental re-indexing on push
- Per-project CLAUDE.md / guidelines injected into review prompts
- Guardrails and output validation
- GitHub push, PR, and branch webhook coverage
- Auto-tag CLAUDE.md from ingested repo







