CodeSeeker Storage Configuration

NEW: CodeSeeker now works out-of-the-box with zero setup!

By default, CodeSeeker uses embedded storage (SQLite + Graphology + LRU-cache) that requires no Docker or external databases. Just npm install and go.

CodeSeeker supports two storage modes to fit different use cases:

Storage Modes

| Mode | Setup | Best For |
|------|-------|----------|
| Embedded (default) | Zero setup - just npm install | Personal use, small-medium projects, getting started |
| Server | Docker or manual setup | Large codebases, teams, production environments |

Embedded Mode (Default)

Zero configuration required. Works immediately after installation.

What It Uses

| Component | Technology | Persistence |
|-----------|------------|-------------|
| Vector Search | SQLite + better-sqlite3 | ~/.codeseeker/data/vectors.db |
| Graph Database | Graphology (in-memory) | ~/.codeseeker/data/graph.json |
| Cache | LRU-cache (in-memory) | ~/.codeseeker/data/cache.json |
| Projects | SQLite | ~/.codeseeker/data/projects.db |

Data Location

Data is stored in platform-specific locations:

| Platform | Location |
|----------|----------|
| Windows | %APPDATA%\codeseeker\data\ |
| macOS | ~/Library/Application Support/codeseeker/data/ |
| Linux | ~/.local/share/codeseeker/data/ |
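
The platform table above amounts to a small lookup. The following is a minimal TypeScript sketch of that lookup, not CodeSeeker's actual code: the name `defaultDataDir`, the `PathInputs` type, and the error on a missing `APPDATA` value are all hypothetical, and the paths are taken directly from the table.

```typescript
// Hypothetical helper, not part of CodeSeeker: resolves the default data
// directory listed in the table above. Platform strings follow Node's
// process.platform values.
type Platform = "win32" | "darwin" | "linux";

interface PathInputs {
  home: string; // the user's home directory
  appData: string | undefined; // value of %APPDATA% (Windows only)
}

function defaultDataDir(platform: Platform, inputs: PathInputs): string {
  switch (platform) {
    case "win32":
      if (inputs.appData === undefined) {
        throw new Error("APPDATA must be set on Windows");
      }
      return `${inputs.appData}\\codeseeker\\data`;
    case "darwin":
      return `${inputs.home}/Library/Application Support/codeseeker/data`;
    case "linux":
      return `${inputs.home}/.local/share/codeseeker/data`;
  }
}
```

In Node, the inputs would typically come from `process.platform`, `os.homedir()`, and `process.env.APPDATA`.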

Features

  • Automatic persistence: All data auto-saves every 30 seconds and on exit
  • Crash recovery: Uses SQLite WAL mode for durability
  • No external dependencies: Everything runs in-process
  • Fast startup: No network connections to establish
  • Offline capable: Works without internet

Customizing Data Location

Set the CODESEEKER_DATA_DIR environment variable:

# Windows (PowerShell)
$env:CODESEEKER_DATA_DIR = "D:\codeseeker-data"

# macOS/Linux
export CODESEEKER_DATA_DIR="/custom/path/to/data"

Or create a config file at ~/.codeseeker/storage.json (Windows: %APPDATA%\codeseeker\storage.json):

{
  "mode": "embedded",
  "dataDir": "/custom/path/to/data",
  "flushIntervalSeconds": 60
}
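
To make the shape of this file concrete, here is a hedged TypeScript sketch that parses and validates an embedded-mode config. The `EmbeddedConfig` type and `parseEmbeddedConfig` function are hypothetical, not CodeSeeker's API. The 30-second fallback is an assumption based on the auto-save interval this document describes, and the validation rules are illustrative only.

```typescript
// Hypothetical shape of the embedded-mode storage.json shown above.
interface EmbeddedConfig {
  mode: "embedded";
  dataDir: string | undefined; // undefined means "use the platform default"
  flushIntervalSeconds: number;
}

// Assumed fallback, based on the 30-second auto-save interval in this document.
const DEFAULT_FLUSH_SECONDS = 30;

function parseEmbeddedConfig(json: string): EmbeddedConfig {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("storage.json must contain a JSON object");
  }
  const obj = raw as Record<string, unknown>;

  const mode = obj["mode"];
  if (mode !== undefined && mode !== "embedded") {
    throw new Error(`expected mode "embedded", got ${String(mode)}`);
  }

  const dataDir = obj["dataDir"];
  if (dataDir !== undefined && typeof dataDir !== "string") {
    throw new Error("dataDir must be a string");
  }

  const flush = obj["flushIntervalSeconds"] ?? DEFAULT_FLUSH_SECONDS;
  if (typeof flush !== "number" || !Number.isFinite(flush) || flush <= 0) {
    throw new Error("flushIntervalSeconds must be a positive number");
  }

  return { mode: "embedded", dataDir, flushIntervalSeconds: flush };
}
```

Note that `JSON.parse` rejects comments, so a file read this way must contain only the JSON object itself.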

Server Mode (Advanced)

For large codebases (100K+ files), teams, or production environments.

Note: Most users don't need server mode. Start with embedded mode and upgrade only if you hit performance limits or need multi-user support.

What It Uses

| Component | Technology | Purpose |
|-----------|------------|---------|
| Vector Search | PostgreSQL + pgvector | Scalable vector similarity search |
| Graph Database | Neo4j | Powerful graph queries with Cypher |
| Cache | Redis | Distributed caching |
| Projects | PostgreSQL | Relational data with ACID |

Setup Options (Choose One)

| Option | Best For | Documentation |
|--------|----------|---------------|
| Manual Installation | Recommended for most users | Database Scripts |
| Kubernetes | Production deployments | Kubernetes Templates |
| Docker Compose | Quick testing only (experimental) | See below |

Manual Installation (Recommended)

Follow the Database Scripts Guide to install PostgreSQL, Neo4j, and Redis manually. This gives you the most control and is recommended for production use.

Docker Compose (Experimental)

⚠️ Docker Compose is experimental and provided for quick local testing only. For production, use manual installation or Kubernetes.

# Start database services only (experimental)
docker-compose up -d database redis neo4j

# Verify services are running
docker-compose ps

Configuration

Create ~/.codeseeker/storage.json:

{
  "mode": "server",
  "server": {
    "postgres": {
      "host": "localhost",
      "port": 5432,
      "database": "codeseeker",
      "user": "codeseeker",
      "password": "your-password"
    },
    "neo4j": {
      "uri": "bolt://localhost:7687",
      "user": "neo4j",
      "password": "your-password"
    },
    "redis": {
      "host": "localhost",
      "port": 6379,
      "password": "optional-password"
    }
  }
}
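
Many PostgreSQL clients accept a single connection URI instead of separate fields. As an illustration only, the sketch below turns the `postgres` block above into such a URI. The `PostgresConfig` type and `postgresUrl` function are hypothetical, and whether CodeSeeker builds a URI internally is not stated here. The user, password, and database name are percent-encoded so that characters such as `@` or `:` in a password do not break the URI.

```typescript
// Hypothetical type mirroring the "postgres" block of storage.json above.
interface PostgresConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

// Builds a standard PostgreSQL connection URI from the config fields.
function postgresUrl(pg: PostgresConfig): string {
  const user = encodeURIComponent(pg.user);
  const password = encodeURIComponent(pg.password);
  const database = encodeURIComponent(pg.database);
  return `postgresql://${user}:${password}@${pg.host}:${pg.port}/${database}`;
}

const exampleUrl = postgresUrl({
  host: "localhost",
  port: 5432,
  database: "codeseeker",
  user: "codeseeker",
  password: "your-password",
});
// exampleUrl is "postgresql://codeseeker:your-password@localhost:5432/codeseeker"
```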

Environment Variables

You can also configure via environment variables:

# Storage mode
export CODESEEKER_STORAGE_MODE=server

# PostgreSQL
export CODESEEKER_PG_HOST=localhost
export CODESEEKER_PG_PORT=5432
export CODESEEKER_PG_DATABASE=codeseeker
export CODESEEKER_PG_USER=codeseeker
export CODESEEKER_PG_PASSWORD=secret

# Neo4j
export CODESEEKER_NEO4J_URI=bolt://localhost:7687
export CODESEEKER_NEO4J_USER=neo4j
export CODESEEKER_NEO4J_PASSWORD=secret

# Redis
export CODESEEKER_REDIS_HOST=localhost
export CODESEEKER_REDIS_PORT=6379
export CODESEEKER_REDIS_PASSWORD=optional
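
The variables above map one-to-one onto the fields of the server section in storage.json. The sketch below shows one way that mapping could look in TypeScript. It is an assumption-laden illustration: `serverConfigFromEnv`, `parsePort`, and the types are hypothetical names, and the fallback values simply mirror the example values in this document rather than any defaults CodeSeeker is known to apply.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical type mirroring the server-mode storage.json shown earlier.
interface ServerConfig {
  mode: "server";
  server: {
    postgres: { host: string; port: number; database: string; user: string; password: string | undefined };
    neo4j: { uri: string; user: string; password: string | undefined };
    redis: { host: string; port: number; password: string | undefined };
  };
}

// Environment variables are strings, so ports need explicit validation.
function parsePort(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${value}`);
  }
  return port;
}

// Fallbacks below are assumptions copied from this document's example values.
function serverConfigFromEnv(env: Env): ServerConfig {
  return {
    mode: "server",
    server: {
      postgres: {
        host: env["CODESEEKER_PG_HOST"] ?? "localhost",
        port: parsePort(env["CODESEEKER_PG_PORT"], 5432),
        database: env["CODESEEKER_PG_DATABASE"] ?? "codeseeker",
        user: env["CODESEEKER_PG_USER"] ?? "codeseeker",
        password: env["CODESEEKER_PG_PASSWORD"],
      },
      neo4j: {
        uri: env["CODESEEKER_NEO4J_URI"] ?? "bolt://localhost:7687",
        user: env["CODESEEKER_NEO4J_USER"] ?? "neo4j",
        password: env["CODESEEKER_NEO4J_PASSWORD"],
      },
      redis: {
        host: env["CODESEEKER_REDIS_HOST"] ?? "localhost",
        port: parsePort(env["CODESEEKER_REDIS_PORT"], 6379),
        password: env["CODESEEKER_REDIS_PASSWORD"],
      },
    },
  };
}
```

In Node, passing `process.env` as the argument would read the exported values.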

PostgreSQL Setup

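If not using Docker, install PostgreSQL with pgvector, then run the following in psql as a superuser:

-- Create database
CREATE DATABASE codeseeker;

-- Connect to the new database so the extension is installed there (psql meta-command)
\c codeseeker

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create user
CREATE USER codeseeker WITH PASSWORD 'your-password';
GRANT ALL PRIVILEGES ON DATABASE codeseeker TO codeseeker;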

Neo4j Setup

If not using Docker, install Neo4j Community Edition:

  1. Download from https://neo4j.com/download/
  2. Start the service
  3. Set initial password via Neo4j Browser

Redis Setup

If not using Docker:

# macOS
brew install redis
brew services start redis

# Ubuntu/Debian
sudo apt install redis-server
sudo systemctl start redis-server

Checking Storage Status

# Check current storage mode and health
codeseeker storage status

# Test server connectivity (server mode)
codeseeker storage test

Migrating Between Modes

Embedded to Server

  1. Configure server mode in storage.json
  2. Run codeseeker init to re-index your project
  3. Existing embedded data remains in place as backup

Server to Embedded

  1. Change mode to embedded in storage.json
  2. Run codeseeker init to re-index your project
  3. Server data remains intact for future use

Persistence Details

Embedded Mode Persistence

| Store | Format | Flush Interval | Durability |
|-------|--------|----------------|------------|
| Vectors | SQLite WAL | Automatic | High (WAL) |
| Graph | JSON | 30 seconds | Good |
| Cache | JSON | 30 seconds | Good |
| Projects | SQLite WAL | Automatic | High (WAL) |

Flush Behavior

  • Automatic flush: Every 30 seconds (configurable)
  • Graceful shutdown: Flushes before exit
  • Crash recovery: SQLite WAL protects vector/project data
  • JSON stores: May lose up to 30 seconds of data on crash
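
The flush behavior described above (a periodic flush plus a final flush on graceful shutdown) can be sketched as a small timer wrapper. This is a generic illustration of the pattern, not CodeSeeker's implementation: the `PeriodicFlusher` class and its `write` callback are hypothetical. It also shows why an ungraceful crash can lose up to one interval of JSON-store changes: anything marked dirty after the last tick is only in memory.

```typescript
// Generic dirty-flag flusher; a sketch of the pattern, not CodeSeeker's code.
class PeriodicFlusher {
  private dirty = false;
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  private readonly write: () => void; // hypothetical callback that persists the store
  private readonly intervalSeconds: number;

  constructor(write: () => void, intervalSeconds: number = 30) {
    this.write = write;
    this.intervalSeconds = intervalSeconds;
  }

  // Call after every in-memory change that should eventually reach disk.
  markDirty(): void {
    this.dirty = true;
  }

  // Writes only if something changed since the last flush.
  // Returns true when a write happened. If write() throws, the store stays dirty.
  flush(): boolean {
    if (!this.dirty) return false;
    this.write();
    this.dirty = false;
    return true;
  }

  // Starts the periodic flush; calling start() twice has no extra effect.
  start(): void {
    if (this.timer !== undefined) return;
    this.timer = setInterval(() => {
      this.flush();
    }, this.intervalSeconds * 1000);
  }

  // Graceful shutdown: stop the timer, then flush one last time.
  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    this.flush();
  }
}
```

A shorter interval narrows the window of possible loss at the cost of more frequent writes.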

Customizing Flush Interval

{
  "mode": "embedded",
  "flushIntervalSeconds": 10
}

Troubleshooting

"Cannot find module 'better-sqlite3'"

Rebuild native modules:

npm rebuild better-sqlite3

"Database is locked"

Only one CodeSeeker process can access embedded storage at a time. Find any other CodeSeeker processes still running in the background and stop them:

# Find CodeSeeker processes
ps aux | grep codeseeker

# Or on Windows
tasklist | findstr codeseeker

Server mode connection errors

  1. Verify services are running
  2. Check firewall settings
  3. Verify credentials in config
  4. Test connectivity:
    # PostgreSQL
    psql -h localhost -U codeseeker -d codeseeker
    
    # Redis
    redis-cli ping
    
    # Neo4j
    cypher-shell -u neo4j -p password

Performance Comparison

| Metric | Embedded | Server |
|--------|----------|--------|
| Startup time | ~100ms | ~500ms+ |
| Vector search (1K docs) | ~50ms | ~20ms |
| Vector search (100K docs) | ~500ms | ~50ms |
| Graph traversal | Good | Excellent |
| Concurrent users | 1 | Many |
| Memory usage | Low | Variable |

Recommendation: Start with embedded mode. Switch to server mode when you have:

  • 100K+ files to index
  • Multiple team members
  • High query volume
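
The recommendation above can be written as a simple decision rule. The sketch below is illustrative only: `ProjectProfile` and `recommendMode` are hypothetical names, and because this document does not quantify "high query volume", that judgment is left to the caller as a boolean.

```typescript
// Hypothetical inputs for the rule of thumb above.
interface ProjectProfile {
  fileCount: number; // files to index
  teamSize: number; // people sharing the same index
  highQueryVolume: boolean; // caller's judgment; no threshold is given in this document
}

function recommendMode(profile: ProjectProfile): "embedded" | "server" {
  const largeCodebase = profile.fileCount >= 100000; // "100K+ files"
  const multiUser = profile.teamSize > 1; // embedded mode supports one concurrent user
  return largeCodebase || multiUser || profile.highQueryVolume ? "server" : "embedded";
}
```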

API Usage

import { getStorageProvider } from '@codeseeker/storage';

// Get the storage provider (auto-configured)
const storage = await getStorageProvider();

// Access individual stores
const vectors = storage.getVectorStore();
const graph = storage.getGraphStore();
const cache = storage.getCacheStore();
const projects = storage.getProjectStore();

// Check health
const health = await storage.healthCheck();
console.log('Storage healthy:', health.healthy);

// Manual flush (usually not needed)
await storage.flushAll();

// Cleanup on shutdown
await storage.closeAll();