
MabudAlam/BugViper


BugViper

Status Python Next.js Neo4j

AI-powered code review and repository intelligence platform.

BugViper ingests your repositories into a Neo4j knowledge graph via Tree-sitter AST parsing, then sends a LangGraph-powered agent to review pull requests, finding bugs, security issues, and code quality problems with full codebase context. It also ships a Query interface for full-text and semantic code search, plus an AI chat agent that reasons directly over your graph.


Screenshots

Repository Dashboard

Repository Dashboard

Each indexed repository shows live stats derived directly from the Neo4j graph: file count, function count, class count, and import count. Repositories are indexed once and stay up to date via GitHub push webhooks.


Full-Text Code Search

Full Text Search

Search any symbol name or keyword across the entire graph. Results are anchored to the exact source line with an inline peek viewer; expand up or down to read surrounding context without leaving the page.


Semantic Code Search

Semantic Search

When full-text isn't enough, semantic search embeds your query and returns results ranked by cosine similarity from Neo4j vector indexes. Useful for finding code by intent: "embedding model configuration" returns EmbeddingModelName, RoastResponse, and other conceptually related nodes at 73%, 69%, 68% similarity.


Ask Agent: AI Chat Over Your Codebase

Ask Agent Chat

The Ask Agent page connects a ReAct LLM to your Neo4j graph. Ask a question in natural language and the agent reasons across 13+ tool calls, cites source files, and shows the relevant code inline. Ask "What embedding do we use?" and it finds the embedder, explains the batch flow, and shows the actual Cypher query.


The Knowledge Graph

Neo4j Knowledge Graph

BugViper materialises your codebase as a property graph (312 nodes and 336 relationships shown here for a single repository) spanning Function, Class, File, Module, Variable, and Repository node types. Explore it directly in Neo4j Browser or query it from the API.


PR Review: Summary & Walkthrough

PR Review Summary

When a PR is opened the BugViper bot posts a structured top-level comment with:

  • Model used and actionable comment count
  • Walkthrough table: every changed file and a one-line summary of what changed
  • Impact Analysis and Positive Findings sections

PR Review: Inline Bug Comment

Inline Bug Comment

Each issue is posted as an inline diff comment pinned to the exact line. Here the agent flagged a bare except Exception: that catches KeyboardInterrupt and SystemExit (severity Low, confidence 7/10) and suggested a specific fix with a one-line code change you can commit directly from GitHub.


PR Review: Inline Security Comment

Inline Security Comment

The same review run caught a Medium security issue: LLM error details (rate limits, model names, API keys) leaking into a user-facing response via str(e)[:100]. The agent suggested logging the error server-side and returning a clean fallback message, preventing accidental information disclosure.


How It Works

1. Ingestion: Building the Knowledge Graph

When you add a repository, BugViper:

  1. Clones or downloads the repo
  2. Runs Tree-sitter parsers (17 languages) to produce ASTs
  3. Extracts Function, Class, Variable, File, Module, Repository nodes
  4. Writes the graph to Neo4j with relationships: CONTAINS, DEFINES, CALLS, IMPORTS, INHERITS
  5. Calculates cyclomatic complexity at parse time for every function
  6. Optionally batch-embeds all nodes with text-embedding-3-small and stores the vectors in Neo4j vector indexes
    GitHub
       │
       ▼
  Tree-sitter AST (17 languages)
       │
  Graph Builder ─────────► Neo4j  (nodes + relationships)
       │
  Embedder (optional) ────► Neo4j  (vector indexes: 1536-dim cosine)
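
The extraction step can be illustrated with a minimal sketch. To stay self-contained it uses Python's standard `ast` module in place of Tree-sitter and handles Python source only, so it is a stand-in for the idea rather than BugViper's graph builder. The function name `extract_graph` and the tuple shapes are assumptions made for this example.

```python
import ast


def extract_graph(path: str, source: str):
    """Return (nodes, edges) for one Python file.

    nodes: list of (label, name) tuples
    edges: list of (source_name, relationship, target_name) tuples
    Hypothetical shapes; a real ingester would write these to Neo4j.
    """
    tree = ast.parse(source)
    nodes = [("File", path)]
    edges = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # "import a.b" lists modules in node.names;
            # "from a.b import c" carries the module in node.module.
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            else:
                modules = [node.module or "."]
            for module in modules:
                nodes.append(("Module", module))
                edges.append((path, "IMPORTS", module))
        elif isinstance(node, ast.ClassDef):
            nodes.append(("Class", node.name))
            edges.append((path, "CONTAINS", node.name))
            for base in node.bases:
                if isinstance(base, ast.Name):
                    edges.append((node.name, "INHERITS", base.id))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes.append(("Function", node.name))
            edges.append((path, "CONTAINS", node.name))
            # Record direct calls to plain names, e.g. helper(x).
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    edges.append((node.name, "CALLS", child.func.id))
    return nodes, edges
```

Only top-level definitions and calls to plain names are captured here; methods, attribute calls, and cross-file resolution are left out to keep the sketch short.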

2. Full-Text Search: Apache Lucene inside Neo4j

Neo4j's full-text search is backed by Apache Lucene, the same engine that powers Elasticsearch. BugViper creates two Lucene indexes at setup time:

| Index | Node types | Fields |
| --- | --- | --- |
| code_search | Function, Class, Variable | name, docstring, source_code |
| file_content_search | File | source_code |

Two-tier search strategy (db/queries.py → search_code()):

User query
    │
    ├─► Tier 1: `code_search` Lucene index
    │       Simple identifiers  →  phrase search   "parse_unified_diff"
    │       Special characters  →  AND-keywords    token1 AND token2
    │
    └─► Tier 2: fallback to `file_content_search`  (if Tier 1 empty)
            Searches raw file content line-by-line
            Returns: path + line_number + matching line  (no full source dump)

Searching parse_unified_diff hits the function node instantly by name. Searching "Authorization: Bearer" falls through to line-level file content search. Both paths return lean results; the Peek API (/code-finder/peek) then fetches a windowed view around any line on demand, keeping responses fast regardless of file size.
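
A windowed peek of this kind can be sketched in a few lines. The function name `peek` and the `above` / `below` parameters are assumptions for illustration; the real endpoint's parameters are not documented here.

```python
def peek(lines, line_number, above=3, below=3):
    """Return (start_line, window) around a 1-indexed line_number.

    The window is clamped to the file boundaries, so asking for context
    near the first or last line never raises an IndexError.
    """
    if not 1 <= line_number <= len(lines):
        raise ValueError("line_number is outside the file")
    start = max(1, line_number - above)
    end = min(len(lines), line_number + below)
    return start, lines[start - 1:end]
```

Returning the starting line number alongside the slice lets a viewer label each row and request a further window above or below it.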

Lucene escaping is applied automatically: clean identifiers get phrase-quoted, anything with special characters is tokenised and joined with AND.
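
The routing rule described above can be sketched as follows. This is an illustration under assumptions, not the code in db/queries.py: the name `build_lucene_query` and the exact tokenisation regex are hypothetical.

```python
import re

# A "clean identifier": letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_lucene_query(user_query: str) -> str:
    """Turn raw user input into a Lucene query string.

    Clean identifiers become a quoted phrase. Anything else is split into
    word tokens, which drops Lucene's special characters, and the tokens
    are joined with AND.
    """
    text = user_query.strip()
    if _IDENTIFIER.match(text):
        return f'"{text}"'
    tokens = re.findall(r"[A-Za-z0-9_]+", text)
    return " AND ".join(tokens)
```

With this sketch, `parse_unified_diff` becomes `"parse_unified_diff"` and `Authorization: Bearer` becomes `Authorization AND Bearer`. Because the tokeniser keeps only word characters, no separate escaping pass is needed in this simplified version.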


3. The PR Review Agent: How It Finds Issues

The review pipeline is a two-phase LangGraph graph (code_review_agent/agent/):

PR opened / comment trigger
          │
          ▼
   Build diff + context prompt
          │
  ┌───────▼────────────────────┐
  │  Phase 1: ReAct Explorer   │
  │  LangGraph StateGraph      │  LLM + 19 Neo4j tools
  │  MAX_TOOL_ROUNDS = 6       │  Stops deterministically
  └───────────┬────────────────┘
              │  accumulated messages (diff + tool results)
              ▼
  ┌───────────▼────────────────┐
  │  Phase 2: Synthesizer      │
  │  Plain LLM call            │  JSON schema embedded in prompt
  │  Works on any OpenRouter   │  Robust JSON extraction (fence/prose/raw)
  │  model                     │
  └───────────┬────────────────┘
              │
    Confidence filter  ≥ 7 / 10
              │
              ▼
   Post inline GitHub comments
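
The confidence filter at the end of the pipeline is simple enough to sketch directly. The `Issue` fields below are hypothetical (the real schema lives in agent_schemas.py and may differ); only the threshold of 7 out of 10 comes from the diagram.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 7  # threshold shown in the pipeline diagram


@dataclass
class Issue:
    # Hypothetical fields, chosen for illustration only.
    file: str
    line: int
    severity: str
    confidence: int  # 0 to 10, as reported by the model
    message: str


def filter_issues(issues, min_confidence=MIN_CONFIDENCE):
    """Keep only the issues the model is confident enough to post."""
    return [issue for issue in issues if issue.confidence >= min_confidence]
```

An issue with confidence exactly 7 passes the filter, which matches the "confidence 7/10" comment shown in the screenshots section.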

Phase 1: ReAct Exploration

The agent receives the PR diff and iteratively calls tools against Neo4j to build context for the code under review. It is capped at 6 tool rounds using a tool_rounds counter in ReviewExplorerState. Because the cap does not rely on LangGraph's recursion limit, the accumulated messages are always returned cleanly.
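
The counter-based cap can be sketched without any framework. The example below is plain Python, not LangGraph code: `ExplorerState`, `explore`, and the two callables are hypothetical stand-ins for the graph state, the graph itself, the LLM step, and the tool executor.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_TOOL_ROUNDS = 6  # cap stated in the README


@dataclass
class ExplorerState:
    # Simplified stand-in for ReviewExplorerState.
    messages: list = field(default_factory=list)
    tool_rounds: int = 0


def explore(
    state: ExplorerState,
    next_tool_call: Callable[[list], Optional[str]],
    run_tool: Callable[[str], str],
) -> ExplorerState:
    """Run tool rounds until the model stops asking or the cap is hit.

    next_tool_call(messages) returns a tool request, or None when the
    model has gathered enough context.
    """
    while state.tool_rounds < MAX_TOOL_ROUNDS:
        request = next_tool_call(state.messages)
        if request is None:
            break
        state.messages.append(("tool", run_tool(request)))
        state.tool_rounds += 1
    return state
```

Because the loop exits through a normal return in both cases, the caller always receives the full message history, which is the property the explicit counter is meant to guarantee.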

The agent has 19 tools:

| # | Tool | What it queries in Neo4j |
| --- | --- | --- |
| 1 | search_code | Lucene full-text across Function / Class / Variable / File |
| 2 | peek_code | Line window from a file stored in the graph |
| 3 | semantic_search | Vector similarity search (embeddings) |
| 4 | find_function | Function node by exact or fuzzy name |
| 5 | find_class | Class node by exact or fuzzy name |
| 6 | find_variable | Variable by substring |
| 7 | find_by_content | Symbol bodies containing a pattern |
| 8 | find_by_line | Raw file content line-by-line |
| 9 | find_module | Module / package and which files import it |
| 10 | find_imports | Import statements referencing a module or alias |
| 11 | find_method_usages | All callers of a function |
| 12 | find_callers | Call chain tracing upstream |
| 13 | get_class_hierarchy | Inheritance tree: parents and children |
| 14 | get_change_impact | Blast radius: how many callers would break |
| 15 | get_complexity | Cyclomatic complexity for a specific function |
| 16 | get_top_complex_functions | Highest-risk functions in the repo |
| 17 | get_file_source | Full file content from the graph |
| 18 | get_language_stats | Per-language file / function / class counts |
| 19 | get_repo_stats | Overall graph statistics |

Phase 2: Structured Synthesis

After exploration, a second LLM call receives all accumulated messages plus a JSON schema embedded directly in the system prompt. The response is parsed robustly (it handles code fences, prose wrapping, and raw JSON), so any model on OpenRouter works without needing structured-output API support.
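
One way to implement tolerant parsing of this kind is to try the three shapes in order: raw JSON, a fenced block, then the first decodable object embedded in prose. The sketch below uses only the standard library; `extract_json` is a hypothetical name and the real parser may use different rules.

```python
import json
import re

_TICKS = "`" * 3  # a Markdown code fence marker
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def extract_json(text: str):
    """Extract a JSON value from a model response."""
    # 1. The whole response is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. The JSON sits inside a Markdown code fence.
    match = _FENCE.search(text)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass
    # 3. The JSON object is wrapped in prose: try decoding from each "{".
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char == "{":
            try:
                value, _ = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON object found in model response")
```

`JSONDecoder.raw_decode` parses one value starting at the given index and ignores whatever follows it, which is what makes the prose case work without hand-written brace matching.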


Graph Schema

Node types: Repository · File · Function · Class · Variable · Module

Relationships:

(Repository)-[:CONTAINS]──►(File)
(File)-[:CONTAINS]─────────►(Function | Class | Variable)
(File)-[:IMPORTS]──────────►(Module)
(Class)-[:CONTAINS]────────►(Function)
(Class)-[:INHERITS]────────►(Class)
(Function)-[:CALLS]────────►(Function)
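
The CALLS relationship is what makes questions such as "who would break if this function changes?" answerable. As a rough in-memory illustration (not the Cypher that get_change_impact runs), the upstream callers of a function can be collected with a breadth-first walk over reversed CALLS edges. The name `transitive_callers` and the edge-list input are assumptions for this example.

```python
from collections import deque


def transitive_callers(calls, target):
    """Return every function that reaches `target` through CALLS edges.

    calls: iterable of (caller, callee) pairs, one per CALLS relationship.
    """
    # Index the edges by callee so the walk can go upstream.
    callers_of = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, set()).add(caller)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers_of.get(current, ()):
            # Skip the target itself (recursion) and anything already visited.
            if caller != target and caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

The size of the returned set is one plausible reading of "blast radius"; the `seen` set also keeps the walk from looping forever on cyclic call graphs.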

Tech Stack

Backend

| Component | Technology |
| --- | --- |
| API framework | FastAPI + Uvicorn |
| Package manager | uv |
| Database | Neo4j |
| Code parsing | Tree-sitter (17 languages) |
| AI / LLM | LangGraph + LangChain + OpenRouter |
| Embeddings | openai/text-embedding-3-small via OpenRouter |
| GitHub integration | PyGithub + GitHub App webhooks |
| Auth / user data | Firebase Admin SDK + Firestore |
| Observability | Logfire |

Frontend

| Component | Technology |
| --- | --- |
| Framework | Next.js 16 (App Router) + React 19 |
| Language | TypeScript (strict mode) |
| Styling | TailwindCSS 4 + shadcn/ui (Radix primitives) |
| Icons | Lucide React |

Project Structure

api/                         # FastAPI backend
├── app.py                   # Entry point, CORS, router registration
├── routers/
│   ├── ingestion.py         # POST /repository, /setup, /github
│   ├── query.py             # GET /search, /stats, /code-finder/*
│   ├── repository.py        # GET/DELETE repositories
│   └── webhook.py           # POST /onPush, /onComment, /github
└── services/
    ├── review_service.py    # PR review pipeline orchestration
    └── push_service.py      # Incremental push handling

code_review_agent/           # LangGraph PR review agent
├── agent/
│   ├── review_graph.py      # Phase 1: ReAct exploration graph
│   ├── runner.py            # Two-phase pipeline entry point
│   ├── tools.py             # 19 Neo4j query tools
│   └── prompts.py           # System prompts
└── models/
    └── agent_schemas.py     # AgentFindings, ReviewResults, Issue

db/                          # Neo4j database layer
├── client.py                # Connection management + retry
├── ingestion.py             # Graph ingestion service
├── queries.py               # CodeQueryService (search, stats, CRUD)
└── schema.py                # Constraints, Lucene indexes, CYPHER_QUERIES

ingestion/                   # Code parsing & ingestion engine
├── repo_ingestion_engine.py # Main orchestrator
├── graph_builder.py         # Graph construction from ASTs
├── code_search.py           # CodeFinder class
└── languages/               # 17 per-language Tree-sitter parsers

common/                      # Shared utilities
├── embedder.py              # Batch embedding via OpenRouter
├── diff_parser.py           # Unified diff parsing
└── bugviper_firebase_service.py

frontend/                    # Next.js 16 frontend
├── app/(protected)/
│   ├── query/               # Search + Analysis + CodeFinder + Review tabs
│   └── repositories/        # Repo management + ingestion
└── lib/
    ├── api.ts               # All fetch wrappers
    └── auth-context.tsx

Quick Start

Prerequisites

  • Python 3.13+, uv
  • Node.js 20+
  • Neo4j (local or AuraDB)
  • OpenRouter API key

Backend

uv sync
cp .env.example .env   # fill in variables
uvicorn api.app:app --host 0.0.0.0 --port 8000 --reload

Frontend

cd frontend && npm install && npm run dev   # http://localhost:3000

All-in-one

./start.sh    # API + Frontend + Ngrok

Environment Variables

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...

# LLM
OPENROUTER_API_KEY=...
REVIEW_MODEL=z-ai/glm-5        # any OpenRouter model

# GitHub App
GITHUB_APP_ID=...
GITHUB_PRIVATE_KEY_PATH=...
GITHUB_WEBHOOK_SECRET=...

# Firebase
SERVICE_FILE_LOC=path/to/service-account.json

# Optional
ENABLE_LOGFIRE=true
LOGFIRE_TOKEN=...
API_ALLOWED_ORIGINS=http://localhost:3000
INGESTION_SERVICE_URL=         # empty = local; set = Cloud Tasks
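
For orientation, here is a hedged sketch of how these variables could be read and validated at startup. The variable names come from the list above; everything else is an assumption made for this example: which variables are mandatory, the fallback values, the comma-separated format for API_ALLOWED_ORIGINS, and the `Settings` / `load_settings` names.

```python
import os
from dataclasses import dataclass

# Assumption: these four are treated as mandatory.
REQUIRED = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENROUTER_API_KEY")


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    openrouter_api_key: str
    review_model: str
    allowed_origins: tuple
    enable_logfire: bool
    ingestion_service_url: str  # empty string means "run ingestion locally"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    origins = env.get("API_ALLOWED_ORIGINS", "http://localhost:3000")
    return Settings(
        neo4j_uri=env["NEO4J_URI"],
        neo4j_username=env["NEO4J_USERNAME"],
        neo4j_password=env["NEO4J_PASSWORD"],
        openrouter_api_key=env["OPENROUTER_API_KEY"],
        review_model=env.get("REVIEW_MODEL", "z-ai/glm-5"),
        allowed_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        enable_logfire=env.get("ENABLE_LOGFIRE", "false").lower() == "true",
        ingestion_service_url=env.get("INGESTION_SERVICE_URL", ""),
    )
```

Failing fast on missing credentials gives a clearer error than a connection failure later in the request path.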

Cyclomatic complexity is stored on every Function node at ingestion time:

| Score | Risk |
| --- | --- |
| 1–5 | Simple |
| 6–10 | Moderate |
| 11–20 | Complex, refactor candidate |
| 20+ | High risk, bugs likely here |
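
To make the scores concrete, the sketch below computes a common approximation of cyclomatic complexity (1 plus the number of decision points) for Python source and maps it onto the bands above. It uses Python's `ast` module rather than Tree-sitter, and the counting rules are an assumption: BugViper's parsers may count different constructs. A score of exactly 20 is placed in the "Complex" band here.

```python
import ast

# Constructs that add one independent path each in this approximation.
_DECISION_NODES = (
    ast.If,
    ast.For,
    ast.AsyncFor,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of a snippet of Python source."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths.
            score += len(node.values) - 1
    return score


def risk_label(score: int) -> str:
    """Map a score onto the risk bands from the table."""
    if score <= 5:
        return "Simple"
    if score <= 10:
        return "Moderate"
    if score <= 20:
        return "Complex"
    return "High risk"
```

A function with one `if` that tests `x > 0 and x < 10`, followed by one `for` loop, scores 4 under these rules: the base path, the `if`, the `and`, and the loop.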

Key API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/ingest/repository | Ingest a repository |
| POST | /api/v1/ingest/setup | Init DB schema + indexes |
| GET | /api/v1/repos/ | List all repositories |
| GET | /api/v1/query/search | Full-text code search |
| GET | /api/v1/query/code-finder/function | Find function by name |
| GET | /api/v1/query/code-finder/peek | Peek lines around a file location |
| GET | /api/v1/query/code-finder/complexity/top | Most complex functions |
| POST | /api/v1/query/diff-context | Build RAG context for a diff |
| POST | /api/v1/webhook/github | GitHub App webhook dispatcher |

Full API docs: /docs (Swagger) and /redoc when the server is running.
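
The webhook dispatcher receives deliveries from GitHub, which signs each payload with the configured secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. A minimal verification sketch using only the standard library follows. Whether BugViper's webhook router performs the check this way is an assumption, and `verify_signature` is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(secret: str, payload: bytes, signature_header: str) -> bool:
    """Check a GitHub webhook payload against its X-Hub-Signature-256 header.

    secret would come from GITHUB_WEBHOOK_SECRET; payload must be the raw
    request body, byte for byte, before any JSON parsing.
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, which avoids timing attacks.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Requests that fail this check should be rejected before any event handling runs.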


Development

# Python
black .          # format
ruff check .     # lint
mypy .           # type check
pytest           # tests
pytest --cov     # coverage

# Frontend
cd frontend && npm run lint && npm run build

Roadmap

  • Abstract review model: pluggable per-repo agent configs
  • Improve incremental re-indexing on push
  • Per-project CLAUDE.md / guidelines injected into review prompts
  • Guardrails and output validation
  • GitHub push, PR, and branch webhook coverage
  • Auto-tag CLAUDE.md from ingested repo

About

Graph Based Code Indexing and Code Reviews 🐍
