GraphRAG Knowledge Graph Web App — Upload documents, automatically build knowledge graphs, visually explore, and chat with AI.
Document Parsing → Knowledge Graph Construction → Visual Exploration → AI Chat
- 📄 Document Parsing — Upload and automatically parse Markdown, PDF, and TXT documents
- 🧠 Knowledge Graph Construction — Extract entities and relationships from documents using LLM, automatically building a structured knowledge graph
- 🌐 Interactive Visualization — 2D force-directed graph powered by Reagraph with click, search, filter, and expand interactions
- 💬 GraphRAG Chat — Enhanced conversational AI using knowledge graph + vector retrieval, with accurate answers and source citations
- 🔌 Split Storage Architecture — PostgreSQL + pgvector as source of truth, Neo4j as graph projection
| Layer | Technology |
|---|---|
| Frontend | React 19 + TypeScript + Vite |
| UI | Radix UI + Vanilla CSS + Framer Motion |
| Visualization | Reagraph (2D Force Layout) |
| State Management | Zustand |
| Backend | Node.js + Express + TypeScript |
| LLM | Gemini / Qwen / OpenAI (OpenAI-compatible) |
| Primary Storage | PostgreSQL 15+ + pgvector |
| Graph Database | Neo4j 5.x (graph traversal and subgraph query) |
| Monorepo | pnpm workspace |
graphen/
├── packages/
│ ├── frontend/ # Frontend (React + Vite)
│ ├── backend/ # Backend (Express + TypeScript)
│ └── shared/ # Shared type definitions
├── cases/ # Sample test cases
├── data/ # Local runtime data (.gitignore)
├── .env.example # Environment variable template
├── pnpm-workspace.yaml # pnpm workspace config
└── tsconfig.base.json # Shared TypeScript config
| Tool | Version | Notes |
|---|---|---|
| Node.js | ≥ 18.x | LTS version recommended (20.x / 22.x) |
| pnpm | ≥ 10.0 | Package manager (npm install -g pnpm) |
| PostgreSQL | ≥ 15.x | Primary database (requires pgvector extension) |
| Neo4j | ≥ 5.x | Graph database (local install or Neo4j Aura Free Tier) |
| LLM API Key | — | API key for Gemini, Qwen, or OpenAI |
Choose one of the following methods:
Option A: Local Installation (Recommended)
Download Neo4j Community Edition from neo4j.com, then start:
# macOS (Homebrew)
brew install neo4j
neo4j startDefault address: bolt://localhost:7687, initial username/password: neo4j/neo4j (you'll be prompted to change the password on first login).
Option B: Neo4j Aura Free Tier (No Installation)
Visit Neo4j Aura to create a free instance and obtain the connection URI and password.
git clone <repo-url>
cd graphen
# Install all dependencies (frontend + backend + shared)
pnpm installcp .env.example .envEdit the .env file with the required settings (see Configuration below):
# ⚠️ Required
LLM_PROVIDER=gemini # or qwen, openai
GEMINI_API_KEY=sk-xxxxxxxxxxxxxxxx
NEO4J_PASSWORD=your-neo4j-password
PG_HOST=localhost
PG_DATABASE=graphen
PG_USER=graphen
# Other settings have default values, adjust as neededIf PG_AUTO_BOOTSTRAP=true (default), backend startup will automatically ensure the configured
PG_USER and PG_DATABASE exist, then initialize the memory schema. It uses
PG_BOOTSTRAP_DATABASE + PG_BOOTSTRAP_USER (or PGUSER/USER fallback) for admin connection.
# Start frontend and backend
pnpm devOnce started, visit:
| Service | URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:3001 |
| Health Check | http://localhost:3001/api/health |
💡 Both frontend and backend support Hot Module Replacement (HMR) — changes are reflected immediately.
All configuration is managed via the .env file in the project root (based on the .env.example template).
| Variable | Default | Description |
|---|---|---|
NODE_ENV |
development |
Runtime environment |
PORT |
3001 |
Backend server port |
CORS_ORIGIN |
http://localhost:5173 |
Allowed frontend CORS origin |
LOG_LEVEL |
info |
Log level (debug / info / warn / error) |
GRAPH_SYNC_ENABLED |
true |
Enable/disable GraphSyncWorker |
RUNTIME_PG_REQUIRED |
true |
When NODE_ENV=test, allow non-PG runtime mode if set to false |
| Variable | Default | Description |
|---|---|---|
LLM_PROVIDER |
gemini |
LLM provider (gemini, qwen, openai) |
GEMINI_API_KEY |
(required) | Gemini API key |
QWEN_API_KEY |
(required) | Qwen API key |
OPENAI_API_KEY |
(required) | OpenAI API key |
EMBEDDING_DIMENSIONS |
1024 |
Embedding vector dimensions (must match the selected model) |
💡 For additional settings like
BASE_URLandMODEL_NAME, refer to.env.example.
| Variable | Default | Description |
|---|---|---|
LLM_MAX_CONCURRENT |
5 |
Max concurrent LLM requests |
LLM_MAX_RETRIES |
3 |
Max retries on request failure |
LLM_RETRY_DELAY_MS |
1000 |
Initial retry delay in ms (exponential backoff) |
LLM_REQUESTS_PER_MINUTE |
30 |
Max requests per minute |
LLM_TIMEOUT_MS |
120000 |
Per-request timeout in ms |
| Variable | Default | Description |
|---|---|---|
NEO4J_URI |
bolt://localhost:7687 |
Neo4j connection URI |
NEO4J_USER |
neo4j |
Database username |
NEO4J_PASSWORD |
(required) | Database password |
NEO4J_DATABASE |
neo4j |
Database name |
| Variable | Default | Description |
|---|---|---|
PG_HOST |
localhost |
PostgreSQL host |
PG_PORT |
5432 |
PostgreSQL port |
PG_DATABASE |
graphen |
PostgreSQL database name |
PG_USER |
graphen |
PostgreSQL user |
PG_PASSWORD |
"" |
PostgreSQL password |
PG_MAX_CONNECTIONS |
20 |
Connection pool max size |
PG_AUTO_BOOTSTRAP |
true |
Auto-create runtime role/database and initialize memory schema at startup |
PG_BOOTSTRAP_DATABASE |
postgres |
Admin connection database used for bootstrap |
PG_BOOTSTRAP_USER |
"" |
Admin user for bootstrap (PGUSER/USER fallback when empty) |
PG_BOOTSTRAP_PASSWORD |
"" |
Admin password for bootstrap |
PG_VECTOR_HNSW_M |
16 |
pgvector HNSW m parameter |
PG_VECTOR_HNSW_EF_CONSTRUCTION |
200 |
pgvector HNSW build parameter |
PG_VECTOR_EF_SEARCH |
64 |
pgvector query-time ef_search |
| Variable | Default | Description |
|---|---|---|
MAX_UPLOAD_SIZE |
52428800 |
Max file upload size in bytes (default 50 MB) |
CHUNK_SIZE |
1500 |
Text chunk size (tokens) |
CHUNK_OVERLAP |
200 |
Chunk overlap length (tokens) |
MAX_CHUNKS_PER_DOCUMENT |
500 |
Max chunks per document |
MAX_DOCUMENT_ESTIMATED_TOKENS |
500000 |
Max estimated tokens per document |
| Variable | Default | Description |
|---|---|---|
RATE_LIMIT_WINDOW_MS |
60000 |
Rate limit time window in ms |
RATE_LIMIT_MAX |
100 |
Max requests per window |
| Variable | Default | Description |
|---|---|---|
CACHE_DIR |
data/cache |
Parsing intermediate result cache directory |
The project provides sample test cases in the /cases directory for quick experimentation:
- Simple Finance: A set of finance-related Markdown documents (relationship identification, market events, transaction networks), suitable for testing graph extraction accuracy and connectivity.
You can upload these files directly into the app to observe the knowledge graph being built.
Run from the project root:
# Start dev server (frontend + backend in parallel)
pnpm dev
# Build all packages
pnpm build
# Type checking
pnpm typecheckOperate on individual packages:
# Frontend only
pnpm --filter @graphen/frontend dev
# Backend only
pnpm --filter @graphen/backend dev
# Run backend tests
pnpm --filter @graphen/backend test
# Run backend integration tests
pnpm --filter @graphen/backend test:integrationMIT LICENSE