Skip to content

iamlinjunhong/graphen

Repository files navigation

Graphen

GraphRAG Knowledge Graph Web App — Upload documents, automatically build knowledge graphs, visually explore, and chat with AI.

Document ParsingKnowledge Graph ConstructionVisual ExplorationAI Chat


✨ Features

  • 📄 Document Parsing — Upload and automatically parse Markdown, PDF, and TXT documents
  • 🧠 Knowledge Graph Construction — Extract entities and relationships from documents using LLM, automatically building a structured knowledge graph
  • 🌐 Interactive Visualization — 2D force-directed graph powered by Reagraph with click, search, filter, and expand interactions
  • 💬 GraphRAG Chat — Enhanced conversational AI using knowledge graph + vector retrieval, with accurate answers and source citations
  • 🔌 Split Storage Architecture — PostgreSQL + pgvector as source of truth, Neo4j as graph projection

🏗️ Tech Stack

Layer Technology
Frontend React 19 + TypeScript + Vite
UI Radix UI + Vanilla CSS + Framer Motion
Visualization Reagraph (2D Force Layout)
State Management Zustand
Backend Node.js + Express + TypeScript
LLM Gemini / Qwen / OpenAI (OpenAI-compatible)
Primary Storage PostgreSQL 15+ + pgvector
Graph Database Neo4j 5.x (graph traversal and subgraph query)
Monorepo pnpm workspace

📁 Project Structure

graphen/
├── packages/
│   ├── frontend/          # Frontend (React + Vite)
│   ├── backend/           # Backend (Express + TypeScript)
│   └── shared/            # Shared type definitions
├── cases/                 # Sample test cases
├── data/                  # Local runtime data (.gitignore)
├── .env.example           # Environment variable template
├── pnpm-workspace.yaml    # pnpm workspace config
└── tsconfig.base.json     # Shared TypeScript config

🚀 Getting Started

1. Prerequisites

Tool Version Notes
Node.js ≥ 18.x LTS version recommended (20.x / 22.x)
pnpm ≥ 10.0 Package manager (npm install -g pnpm)
PostgreSQL ≥ 15.x Primary database (requires pgvector extension)
Neo4j ≥ 5.x Graph database (local install or Neo4j Aura Free Tier)
LLM API Key API key for Gemini, Qwen, or OpenAI

2. Install Neo4j

Choose one of the following methods:

Option A: Local Installation (Recommended)

Download Neo4j Community Edition from neo4j.com, then start:

# macOS (Homebrew)
brew install neo4j
neo4j start

Default address: bolt://localhost:7687, initial username/password: neo4j/neo4j (you'll be prompted to change the password on first login).

Option B: Neo4j Aura Free Tier (No Installation)

Visit Neo4j Aura to create a free instance and obtain the connection URI and password.

3. Clone & Install Dependencies

git clone <repo-url>
cd graphen

# Install all dependencies (frontend + backend + shared)
pnpm install

4. Configure Environment Variables

cp .env.example .env

Edit the .env file with the required settings (see Configuration below):

# ⚠️ Required
LLM_PROVIDER=gemini # or qwen, openai
GEMINI_API_KEY=sk-xxxxxxxxxxxxxxxx
NEO4J_PASSWORD=your-neo4j-password
PG_HOST=localhost
PG_DATABASE=graphen
PG_USER=graphen

# Other settings have default values, adjust as needed

If PG_AUTO_BOOTSTRAP=true (default), backend startup will automatically ensure the configured PG_USER and PG_DATABASE exist, then initialize the memory schema. It uses PG_BOOTSTRAP_DATABASE + PG_BOOTSTRAP_USER (or PGUSER/USER fallback) for admin connection.

5. Start the Development Server

# Start frontend and backend
pnpm dev

Once started, visit:

Service URL
Frontend http://localhost:5173
Backend API http://localhost:3001
Health Check http://localhost:3001/api/health

💡 Both frontend and backend support Hot Module Replacement (HMR) — changes are reflected immediately.


⚙️ Configuration

All configuration is managed via the .env file in the project root (based on the .env.example template).

General

Variable Default Description
NODE_ENV development Runtime environment
PORT 3001 Backend server port
CORS_ORIGIN http://localhost:5173 Allowed frontend CORS origin
LOG_LEVEL info Log level (debug / info / warn / error)
GRAPH_SYNC_ENABLED true Enable/disable GraphSyncWorker
RUNTIME_PG_REQUIRED true When NODE_ENV=test, allow non-PG runtime mode if set to false

LLM

Variable Default Description
LLM_PROVIDER gemini LLM provider (gemini, qwen, openai)
GEMINI_API_KEY (required) Gemini API key
QWEN_API_KEY (required) Qwen API key
OPENAI_API_KEY (required) OpenAI API key
EMBEDDING_DIMENSIONS 1024 Embedding vector dimensions (must match the selected model)

💡 For additional settings like BASE_URL and MODEL_NAME, refer to .env.example.

LLM Rate Limiting & Resilience

Variable Default Description
LLM_MAX_CONCURRENT 5 Max concurrent LLM requests
LLM_MAX_RETRIES 3 Max retries on request failure
LLM_RETRY_DELAY_MS 1000 Initial retry delay in ms (exponential backoff)
LLM_REQUESTS_PER_MINUTE 30 Max requests per minute
LLM_TIMEOUT_MS 120000 Per-request timeout in ms

Neo4j

Variable Default Description
NEO4J_URI bolt://localhost:7687 Neo4j connection URI
NEO4J_USER neo4j Database username
NEO4J_PASSWORD (required) Database password
NEO4J_DATABASE neo4j Database name

PostgreSQL

Variable Default Description
PG_HOST localhost PostgreSQL host
PG_PORT 5432 PostgreSQL port
PG_DATABASE graphen PostgreSQL database name
PG_USER graphen PostgreSQL user
PG_PASSWORD "" PostgreSQL password
PG_MAX_CONNECTIONS 20 Connection pool max size
PG_AUTO_BOOTSTRAP true Auto-create runtime role/database and initialize memory schema at startup
PG_BOOTSTRAP_DATABASE postgres Admin connection database used for bootstrap
PG_BOOTSTRAP_USER "" Admin user for bootstrap (PGUSER/USER fallback when empty)
PG_BOOTSTRAP_PASSWORD "" Admin password for bootstrap
PG_VECTOR_HNSW_M 16 pgvector HNSW m parameter
PG_VECTOR_HNSW_EF_CONSTRUCTION 200 pgvector HNSW build parameter
PG_VECTOR_EF_SEARCH 64 pgvector query-time ef_search

Document Processing

Variable Default Description
MAX_UPLOAD_SIZE 52428800 Max file upload size in bytes (default 50 MB)
CHUNK_SIZE 1500 Text chunk size (tokens)
CHUNK_OVERLAP 200 Chunk overlap length (tokens)
MAX_CHUNKS_PER_DOCUMENT 500 Max chunks per document
MAX_DOCUMENT_ESTIMATED_TOKENS 500000 Max estimated tokens per document

API Rate Limiting

Variable Default Description
RATE_LIMIT_WINDOW_MS 60000 Rate limit time window in ms
RATE_LIMIT_MAX 100 Max requests per window

Data Storage Paths

Variable Default Description
CACHE_DIR data/cache Parsing intermediate result cache directory

🧪 Test Cases

The project provides sample test cases in the /cases directory for quick experimentation:

  • Simple Finance: A set of finance-related Markdown documents (relationship identification, market events, transaction networks), suitable for testing graph extraction accuracy and connectivity.

You can upload these files directly into the app to observe the knowledge graph being built.


📜 Available Scripts

Run from the project root:

# Start dev server (frontend + backend in parallel)
pnpm dev

# Build all packages
pnpm build

# Type checking
pnpm typecheck

Operate on individual packages:

# Frontend only
pnpm --filter @graphen/frontend dev

# Backend only
pnpm --filter @graphen/backend dev

# Run backend tests
pnpm --filter @graphen/backend test

# Run backend integration tests
pnpm --filter @graphen/backend test:integration

📄 License

MIT LICENSE

About

Build, explore, and chat with Knowledge Graphs using GraphRAG and your favorite LLMs.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors