Understanding Almanac's architecture helps you make informed decisions about deployment, scaling, and optimization.
┌─────────────────────────────────────────────────────────────┐
│ Client Layer │
│ (Web UI, CLI, SDKs, Custom Applications) │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ REST API Server │
│ (Express.js, TypeScript, Port 3000) │
└─────────┬───────────────────────────────┬───────────────────┘
│ │
▼ ▼
┌──────────────────────┐ ┌──────────────────────────────┐
│ MCP Client Manager │ │ Indexing Engine │
│ (Data Source Layer) │ │ (Vector + Graph Indexing) │
└──────────┬───────────┘ └────────┬─────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ Storage Layer │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ MongoDB │ │ Qdrant │ │Memgraph │ │ Redis │ │
│ │(Metadata)│ │ (Vectors)│ │ (Graph) │ │ (Cache) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
Technology: Express.js + TypeScript
Port: 3000 (configurable)
Responsibilities:
- Accept query requests
- Route API calls
- Manage authentication/authorization
- Handle rate limiting
- Coordinate between services
Key Files:
- packages/server/src/server.ts - Main server
- packages/server/src/api/ - API routes
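The responsibilities above form a request pipeline: each incoming call passes through auth and rate-limit checks before reaching a route handler. The following is a minimal dependency-free sketch of that flow; the middleware names and the `Req`/`Res` shapes are illustrative, not the real Express setup in packages/server/src/server.ts.

```typescript
// Hypothetical request/response shapes for the sketch
type Req = { apiKey?: string; ip: string; path: string };
type Res = { status: number; body: string };
// A middleware either short-circuits with a response or passes (null)
type Middleware = (req: Req) => Res | null;

function runPipeline(middlewares: Middleware[], req: Req): Res {
  for (const mw of middlewares) {
    const res = mw(req);
    if (res) return res; // short-circuit (e.g. 401 or 429)
  }
  return { status: 404, body: "Not found" };
}

// Auth check: reject requests without an API key
const requireApiKey: Middleware = (req) =>
  req.apiKey ? null : { status: 401, body: "Missing API key" };

// Route handler: answer query requests
const queryRoute: Middleware = (req) =>
  req.path === "/api/query" ? { status: 200, body: "ok" } : null;
```

In the real server the same ordering applies: authentication and rate limiting run before any handler, so rejected requests never touch the storage layer.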
Purpose: Manages connections to Model Context Protocol servers
Responsibilities:
- Connect/disconnect MCP servers
- Execute tools (fetch data)
- Access resources
- Handle OAuth flows
- Cache tool responses
Architecture:
mcpClientManager
├── clients: Map<serverName, MCPClient>
├── connect(config) → MCPClient
├── disconnect(serverName)
├── executeTool(server, tool, args)
└── getResources(server)
Key Files:
- packages/server/src/mcp/client.ts
- packages/server/src/mcp/initialization.ts
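The manager's shape (a map of named servers to live clients) can be sketched as below. `MCPClient` here is a simplified stand-in; the real class in packages/server/src/mcp/client.ts speaks the MCP wire protocol and handles OAuth and caching.

```typescript
// Simplified stand-in for the real MCP client
interface MCPClient {
  executeTool(tool: string, args: Record<string, unknown>): Promise<unknown>;
  close(): Promise<void>;
}

class MCPClientManager {
  // One live client per configured server, keyed by server name
  private clients = new Map<string, MCPClient>();

  connect(serverName: string, client: MCPClient): void {
    this.clients.set(serverName, client);
  }

  async disconnect(serverName: string): Promise<void> {
    await this.clients.get(serverName)?.close();
    this.clients.delete(serverName);
  }

  async executeTool(
    serverName: string,
    tool: string,
    args: Record<string, unknown>,
  ): Promise<unknown> {
    const client = this.clients.get(serverName);
    if (!client) throw new Error(`Not connected: ${serverName}`);
    return client.executeTool(tool, args);
  }
}
```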
Purpose: Transform raw data into searchable vectors and knowledge graphs
Phases:
1. Sync Phase
   - Fetch data from MCP servers
   - Store in MongoDB
   - Track sync state
2. Vector Indexing
   - Generate embeddings
   - Store in Qdrant
   - Enable semantic search
3. Graph Indexing
   - Extract entities
   - Extract relationships
   - Build knowledge graph in Memgraph
Key Files:
- packages/indexing-engine/src/ - Core indexing logic
- packages/server/src/services/indexing/ - Service layer
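The three phases above can be sketched as functions over in-memory stand-ins for MongoDB (records), Qdrant (vectors), and Memgraph (graph edges). `fetchFromMCP`, `embed`, and `extract` are hypothetical helpers standing in for the real MCP fetch, embedding model, and LLM entity extraction.

```typescript
type Rec = { id: string; content: string };

// In-memory stand-ins for the three stores
const mongo: Rec[] = [];
const qdrant: { id: string; vector: number[] }[] = [];
const memgraph: { from: string; rel: string; to: string }[] = [];

// Phase 1: Sync - fetch raw data and store it in MongoDB
function syncPhase(fetchFromMCP: () => Rec[]): void {
  mongo.push(...fetchFromMCP());
}

// Phase 2: Vector indexing - embed each record, store in Qdrant
function vectorPhase(embed: (text: string) => number[]): void {
  for (const rec of mongo) qdrant.push({ id: rec.id, vector: embed(rec.content) });
}

// Phase 3: Graph indexing - extract relationships, store in Memgraph
function graphPhase(
  extract: (text: string) => { from: string; rel: string; to: string }[],
): void {
  for (const rec of mongo) memgraph.push(...extract(rec.content));
}
```

Note the ordering dependency: both indexing phases read from the records synced in phase 1, which is why sync runs first.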
Purpose: Answer queries using hybrid vector + graph retrieval
Query Modes:
Query Request
↓
Mode Selection (naive/local/global/hybrid/mix)
↓
┌────┴────┐
│ Qdrant │ → Vector Search (semantic similarity)
└────┬────┘
↓
┌────┴────┐
│Memgraph │ → Graph Traversal (entities & relationships)
└────┬────┘
↓
Combine Results
↓
Rerank (optional)
↓
Return Top Results
Key Files:
- packages/server/src/services/search/lightrag-query.ts
- packages/server/src/services/llm/reranker.ts
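The mode-selection step above can be summarized as a routing decision: which backends a query touches. This is a sketch of that logic under the assumptions in the diagram (graph traversal for every mode except naive, LLM reranking only in mix mode), not the actual LightRAG implementation.

```typescript
type QueryMode = "naive" | "local" | "global" | "hybrid" | "mix";

interface QueryPlan {
  vectorSearch: boolean; // Qdrant
  graphSearch: boolean;  // Memgraph
  rerank: boolean;       // LLM reranker
}

function planQuery(mode: QueryMode): QueryPlan {
  return {
    // Every mode uses semantic similarity
    vectorSearch: true,
    // Graph traversal only for entity/relationship-aware modes
    graphSearch: mode !== "naive",
    // LLM reranking only in mix mode
    rerank: mode === "mix",
  };
}
```

This is also why naive mode is the fastest and mix the slowest: each additional backend (and especially the LLM rerank step) adds latency, as the benchmark figures later in this document show.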
Each database serves a specific purpose optimized for its access patterns:
Use Case: Primary data storage
What It Stores:
- Raw synced records
- MCP server configurations
- Indexing configurations
- User settings
- Metadata
Why MongoDB:
- Flexible schema (different data sources have different fields)
- Fast writes for bulk sync operations
- Rich querying for management operations
- Horizontal scalability
Collections:
{
records: { // Raw data from MCP servers
_id, source, sourceId, content, metadata, ...
},
dataSources: { // MCP server configs
name, transport, args, env, ...
},
indexingConfigs: { // How to index each source
serverName, entityTypes, grouping, ...
}
}
Use Case: Semantic search via embeddings
What It Stores:
- Document embeddings (vectors)
- Text chunks
- Metadata for filtering
Why Qdrant:
- Optimized for high-dimensional vectors (3072-d)
- Sub-50ms search on millions of vectors
- Advanced filtering capabilities
- Distributed architecture for scale
Structure:
{
id: "mongo_id",
vector: [0.123, -0.456, ...], // 3072 dimensions
payload: {
text: "Document content",
source: "slack",
recordType: "message",
metadata: {...}
}
}
Use Case: Knowledge graph for entity/relationship queries
What It Stores:
- Entities (people, concepts, projects)
- Relationships (works_on, depends_on, discussed_in)
- Properties (types, timestamps, scores)
Why Memgraph:
- Optimized for graph traversal (follow relationships)
- In-memory for speed
- Cypher query language
- Real-time analytics
Structure:
// Nodes
(person:Entity {name: "Alice", type: "person"})
(project:Entity {name: "API Refactor", type: "project"})
// Relationships
(person)-[:WORKS_ON {since: "2024-01-01"}]->(project)
Use Case: Performance optimization
What It Stores:
- MCP tool responses (30 min TTL)
- Query results (5 min TTL)
- Rate limiting counters
- Session data
Why Redis:
- Sub-millisecond access
- Automatic expiration (TTL)
- Atomic operations
- Pub/sub for real-time updates
Keys:
mcp:slack:list_channels → cached response
query:hash(query_params) → cached results
ratelimit:ip:192.0.2.1 → request count
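The cache behavior described above (namespaced keys, automatic expiry) can be mimicked with a small in-memory TTL map. This is a stand-in for illustration only; the real implementation uses Redis TTLs, and the key-builder helpers below are hypothetical names that follow the key patterns shown.

```typescript
class TTLCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  // ttlSeconds mirrors Redis EXPIRE; `now` is injectable for testing
  set(key: string, value: string, ttlSeconds: number, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }

  get(key: string, now = Date.now()): string | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key); // lazy expiry, like a Redis TTL firing
      return undefined;
    }
    return entry.value;
  }
}

// Key builders matching the patterns above (hypothetical helpers)
const mcpKey = (server: string, tool: string) => `mcp:${server}:${tool}`;
const rateKey = (ip: string) => `ratelimit:ip:${ip}`;
```

A query result cached with a 300-second TTL is served for five minutes and then transparently recomputed, which is exactly the trade-off the 5-minute query cache makes.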
1. Trigger Sync (Manual or Scheduled)
↓
2. MCP Client → Fetch Data
↓
3. MongoDB ← Store Raw Records
↓
4. Indexing Engine Processes Records
├─→ Generate Embeddings
│ └─→ Qdrant ← Store Vectors
└─→ Extract Entities & Relationships
└─→ Memgraph ← Build Graph
↓
5. Index Complete
1. User Query → REST API
↓
2. Parse & Validate Request
↓
3. Check Redis Cache
├─→ Cache Hit → Return Cached Results
└─→ Cache Miss → Continue
↓
4. Query Engine (LightRAG)
├─→ Qdrant: Vector Search
│ └─→ Get top_k candidates
├─→ Memgraph: Graph Search (if local/global/hybrid/mix)
│ └─→ Traverse entities/relationships
└─→ Combine Results
↓
5. Rerank (if mode=mix)
└─→ LLM scores each result
↓
6. Filter by score_threshold
↓
7. Return top chunk_top_k results
↓
8. Cache in Redis (5 min TTL)
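Steps 6 and 7 of the flow above are a pure filter-and-truncate operation: drop candidates below score_threshold, then keep the top chunk_top_k by score. A minimal sketch, with `Scored` as an assumed result shape:

```typescript
interface Scored {
  id: string;
  score: number; // relevance score from vector search or reranker
}

function selectResults(
  results: Scored[],
  scoreThreshold: number,
  chunkTopK: number,
): Scored[] {
  return results
    .filter((r) => r.score >= scoreThreshold) // step 6: threshold filter
    .sort((a, b) => b.score - a.score)        // highest score first
    .slice(0, chunkTopK);                     // step 7: keep top chunk_top_k
}
```

Raising score_threshold trades recall for precision; raising chunk_top_k returns more context at the cost of a larger response.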
API Server:
Load Balancer (Nginx)
├─→ API Server 1 (Docker container)
├─→ API Server 2 (Docker container)
└─→ API Server N (Docker container)
Database Layer:
- MongoDB: Replica Set + Sharding
- Qdrant: Distributed cluster
- Memgraph: HA cluster (Enterprise)
- Redis: Cluster mode
Small (< 100K docs):
- 4 CPU, 16GB RAM
- Single server
- Docker Compose
Medium (100K - 1M docs):
- 8 CPU, 32GB RAM
- Single server with more resources
- Or 2-3 servers (API + Databases)
Large (1M - 10M docs):
- 16 CPU, 64GB RAM per server
- Multiple API servers (load balanced)
- Distributed databases
- Dedicated cache layer
Enterprise (> 10M docs):
- Kubernetes cluster
- Auto-scaling based on load
- Multi-region deployment
- Dedicated infrastructure per component
Typical Query (mix mode):
Total: ~450ms
├─ Vector Search (Qdrant): 50ms
├─ Graph Traversal (Memgraph): 100ms
├─ Combining Results: 20ms
├─ Reranking (LLM): 250ms
└─ Response Formatting: 30ms
Fast Query (naive mode):
Total: ~80ms
├─ Vector Search (Qdrant): 50ms
└─ Response Formatting: 30ms
Single Server (8 CPU, 32GB RAM):
- Naive mode: ~200 queries/sec
- Hybrid mode: ~50 queries/sec
- Mix mode: ~20 queries/sec
Clustered (3 servers):
- Naive mode: ~600 queries/sec
- Hybrid mode: ~150 queries/sec
- Mix mode: ~60 queries/sec
Vector Indexing:
- 500-1000 docs/minute (single core)
- 16,000-32,000 docs/minute (32 cores with CONCURRENCY=32)
Graph Indexing:
- 200-400 docs/minute (LLM extraction bottleneck)
- Can run 32 concurrent extractions
Almanac uses parallel processing for performance:
// Configurable via CONCURRENCY env var (default: 32)
const CONCURRENCY = Number(process.env.CONCURRENCY ?? 32);
// Process batches in groups so that at most CONCURRENCY
// batches are in flight at any one time
for (let i = 0; i < batches.length; i += CONCURRENCY) {
  const group = batches.slice(i, i + CONCURRENCY);
  await Promise.all(group.map((batch) => processBatch(batch)));
}
Benefits:
- Up to 32x faster than sequential processing
- Efficient CPU utilization
- Configurable based on system resources
Client Request
↓
API Key Validation (optional)
↓
Rate Limiting Check
↓
Request Handler
↓
Data Access (filtered by permissions)
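The "Rate Limiting Check" step above can be implemented as a fixed-window counter, the same pattern the Redis `ratelimit:ip:*` keys suggest (increment a per-key counter that resets each window). A minimal in-memory sketch, assuming a fixed-window policy; the production counter would live in Redis so that all API servers share it:

```typescript
class FixedWindowLimiter {
  private counts = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private limit: number,    // max requests per window
    private windowMs: number, // window length in milliseconds
  ) {}

  // Returns true if the request is allowed; `now` injectable for testing
  allow(key: string, now = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // New window: reset the counter
      this.counts.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

A request rejected here gets a 429 before any database work happens, which is what keeps abusive traffic from reaching the storage layer.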
At Rest:
- MongoDB encryption-at-rest (optional)
- Qdrant encrypted volumes
- OAuth tokens encrypted in DB
In Transit:
- HTTPS/TLS for API
- TLS for database connections
- Encrypted MCP connections
Encrypted Fields:
- OAuth access tokens
- OAuth refresh tokens
- API keys
- Environment variables with secrets
Encryption Method:
- AES-256-GCM
- Unique encryption key per deployment
- Automatic via Mongoose hooks
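A roundtrip of the AES-256-GCM field encryption described above can be sketched with Node's built-in crypto module. This shows the mechanism only; in Almanac it runs automatically via Mongoose hooks with a per-deployment key, and the exact storage encoding (IV + auth tag + ciphertext, base64) is an assumption for illustration.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte integrity tag
  // Store iv + tag + ciphertext together, base64-encoded
  return Buffer.concat([iv, tag, ct]).toString("base64");
}

function decryptField(encoded: string, key: Buffer): string {
  const buf = Buffer.from(encoded, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ct = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // decryption throws if the tag doesn't verify
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, a tampered ciphertext fails the tag check and decryption throws rather than returning garbage, which is why it is preferred over plain AES-CBC for stored tokens.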
API Metrics:
- Request rate (requests/sec)
- Response time (p50, p95, p99)
- Error rate
- Cache hit rate
Database Metrics:
- Query latency
- Connection pool usage
- Storage size
- Index performance
Indexing Metrics:
- Documents indexed/minute
- Indexing errors
- Queue depth
- Processing time per document
Log Levels:
- DEBUG: Detailed execution logs
- INFO: Important events (sync started, query executed)
- WARN: Recoverable issues (rate limit hit, cache miss)
- ERROR: Critical errors (database down, indexing failed)
Log Format:
{
"timestamp": "2024-01-12T18:00:00Z",
"level": "INFO",
"message": "Query executed",
"duration": 456,
"mode": "mix",
"results": 12
}
Single Machine (localhost)
├─ Docker Compose
│ ├─ MongoDB container
│ ├─ Redis container
│ ├─ Qdrant container
│ └─ Memgraph container
├─ Node.js Server (host)
└─ Web UI (host)
Single VM (8 CPU, 32GB RAM)
└─ Docker Compose
├─ API Server container
├─ Web UI container (with Nginx)
├─ MongoDB container
├─ Redis container
├─ Qdrant container
└─ Memgraph container
Kubernetes Cluster
├─ API Server (Deployment, 3 replicas)
│ └─ Auto-scaling (2-10 pods)
├─ MongoDB (StatefulSet)
│ └─ Replica Set (3 nodes)
├─ Qdrant (StatefulSet)
│ └─ Cluster (3+ nodes)
├─ Memgraph (StatefulSet)
│ └─ HA Cluster (2+ nodes)
├─ Redis (Deployment)
│ └─ Cluster mode (6+ nodes)
└─ Ingress (Load Balancer)
Why Express.js:
- Fast, minimal framework
- Large ecosystem
- TypeScript support
- Battle-tested at scale
Why TypeScript:
- Type safety catches bugs early
- Better IDE support
- Maintainability at scale
- Gradual adoption path
Why MCP (Model Context Protocol):
- Standard interface for data sources
- Community-driven ecosystem
- Easy to add new sources
- Separation of concerns
Why Hybrid Vector + Graph Retrieval:
- Better than pure vector search
- Answers "who", "what", "how" questions
- 8x token reduction vs traditional RAG
- Multiple query modes for flexibility
- Data Flow Guide - Detailed data flow diagrams
- Performance Tuning - Optimization strategies
- Deployment Guide - Production setup