Architecture

jstuart0 edited this page Apr 28, 2026 · 2 revisions

This page describes how SourceBridge is built internally. It is aimed at engineers who self-host, contribute to, or extend it.

Component diagram

                    ┌──────────────────────────────────────┐
                    │             Clients                  │
                    │  Web UI · CLI · VS Code · MCP · API  │
                    └──────────────┬───────────────────────┘
                                   │ HTTP / GraphQL / MCP / gRPC
                    ┌──────────────▼───────────────────────┐
                    │          Go API Server               │
                    │  chi router · gqlgen GraphQL         │
                    │  JWT auth · OIDC SSO · REST          │
                    │  tree-sitter indexer (10 languages)  │
                    │  capability registry                 │
                    │  living-wiki dispatcher              │
                    └──────────┬────────────┬──────────────┘
                               │            │ gRPC
               ┌───────────────▼──┐  ┌──────▼───────────────┐
               │    SurrealDB     │  │    Python Worker     │
               │  (embedded or    │  │  gRPC service        │
               │   external)      │  │  AI reasoning        │
               │  42 migrations   │  │  linking             │
               └──────────────────┘  │  knowledge gen       │
                                     │  QA orchestration    │
               ┌──────────────────┐  └──────┬───────────────┘
               │  Redis           │         │
               │  (optional,      │  ┌──────▼───────────────┐
               │  in-memory       │  │   LLM Provider       │
               │  default)        │  │  Cloud or local      │
               └──────────────────┘  └──────────────────────┘

Go API server (internal/, cmd/, cli/)

The central hub. Handles all client-facing traffic and orchestrates the Python worker.

Key responsibilities:

  • HTTP routing via chi
  • GraphQL via gqlgen (internal/api/graphql/)
  • REST endpoints (internal/api/rest/)
  • MCP server (internal/api/rest/mcp.go, mcp_accessors.go, and mcp_progress.go)
  • JWT authentication and OIDC SSO (internal/auth/)
  • Code indexing with tree-sitter (internal/indexer/, internal/indexing/)
  • Code graph storage and retrieval (internal/graph/)
  • Subsystem clustering (internal/clustering/)
  • Living-wiki dispatcher, scheduler, sinks, and credentials (internal/livingwiki/)
  • Capability registry (internal/capabilities/)
  • Requirement management (internal/requirements/)
  • Knowledge-artifact lifecycle (internal/knowledge/)
  • QA orchestrator (internal/qa/)
  • Skill card generation (internal/skillcard/)
  • Search service (internal/search/)
  • Soft-delete trash (internal/trash/)

Notable design choices:

  • The indexer runs in the API process (no worker needed for indexing)
  • The capability registry is the single place where edition gating is declared; GraphQL, REST, and MCP all read from it
  • The living-wiki dispatcher starts at boot when enabled and shuts down gracefully with a 30-second drain window
  • Citations use a unified format (internal/citations/) across QA, MCP, and the VS Code plugin
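
The single-registry idea can be sketched as follows. This is a minimal illustration in Python, not the Go code in internal/capabilities/. The capability names, the edition ordering, and the `is_enabled` helper are all assumptions made for the example.

```python
from enum import IntEnum


class Edition(IntEnum):
    # Hypothetical edition ordering; the real edition names may differ.
    COMMUNITY = 0
    ENTERPRISE = 1


# Hypothetical registry: capability name -> minimum edition required.
# Every surface (GraphQL, REST, MCP) would consult this one table.
CAPABILITIES = {
    "qa": Edition.COMMUNITY,
    "cross_repo_impact": Edition.ENTERPRISE,
    "audit_log": Edition.ENTERPRISE,
}


def is_enabled(capability: str, edition: Edition) -> bool:
    """Return True if the capability is available in the given edition.

    Unknown capabilities are treated as disabled (fail closed).
    """
    required = CAPABILITIES.get(capability)
    return required is not None and edition >= required
```

Keeping the gating table in one place means a new surface only needs to call the lookup rather than restate the edition rules.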

Python gRPC worker (workers/)

The AI reasoning engine. Communicates with the API over gRPC (port 50051 by default).

Services exposed:

  • AnswerQuestion / AnswerQuestionStream — conversational QA
  • GenerateKnowledge — cliff notes, code tours, learning paths, workflow stories
  • ReviewCode — structured code review
  • LinkRequirements — requirement-to-symbol linking
  • ExplainCode — file or snippet explanation

Key behaviors:

  • The worker capability probe runs at API startup to decide whether to activate the agentic retrieval loop
  • Prompt caching (Anthropic cache_control: ephemeral) is applied to the agentic loop to reduce token cost
  • SOURCEBRIDGE_TEST_MODE=1 activates a fake LLM provider for CI without real API calls
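
A test-mode switch of this kind might look roughly like the sketch below. Only the SOURCEBRIDGE_TEST_MODE variable comes from this page. The class names, the `complete` method, and the `select_provider` helper are hypothetical and do not describe the worker's actual code.

```python
import os


class FakeProvider:
    """Deterministic stand-in for an LLM; makes no network calls."""

    def complete(self, prompt: str) -> str:
        return f"[fake] {len(prompt)} chars"


class RealProvider:
    """Placeholder for a provider that would call a real LLM API."""

    def complete(self, prompt: str) -> str:
        raise NotImplementedError("real provider is out of scope for this sketch")


def select_provider(env=None):
    """Pick the fake provider when SOURCEBRIDGE_TEST_MODE is exactly "1"."""
    env = os.environ if env is None else env
    if env.get("SOURCEBRIDGE_TEST_MODE") == "1":
        return FakeProvider()
    return RealProvider()
```

Passing the environment as an argument keeps the selection logic testable without mutating the process environment.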

Next.js web UI (web/)

Built with React 19, Next.js 15 (App Router), and Tailwind CSS.

Notable libraries:

  • @xyflow/react — dependency graph rendering
  • codemirror 6 — code display in file/symbol views
  • recharts — metrics charts
  • mermaid — architecture diagram rendering
  • graphql-request — GraphQL client

The UI connects to the API at NEXT_PUBLIC_API_URL, which is baked in at build time. For local development, a dev proxy is configured via SOURCEBRIDGE_WEB_DEV_PROXY.

SurrealDB

Primary data store. Runs embedded in the API process for single-node installs, or as a separate service for production.

  • Version: v2.2.1 in the Docker Compose files
  • Migrations: 42 migration files in internal/db/migrations/ (.surql format)
  • Migration runner: runs at startup, skips already-applied migrations (internal/db/)
  • Namespace/database: sourcebridge / sourcebridge by default
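
The skip-already-applied behavior of the migration runner can be illustrated with a short sketch. It assumes migrations are applied in filename order and that applied names are tracked in a set. The file names and the `run_migrations` helper are hypothetical, and the real runner in internal/db/ may track state differently.

```python
from typing import Callable, Iterable, List, Set


def run_migrations(available: Iterable[str], applied: Set[str],
                   apply: Callable[[str], None]) -> List[str]:
    """Apply each migration not yet recorded in `applied`, in filename order.

    Returns the names that were applied during this run.
    """
    ran = []
    for name in sorted(available):
        if name in applied:
            continue  # already applied on a previous startup
        apply(name)
        applied.add(name)  # record only after the migration succeeds
        ran.append(name)
    return ran
```

Because applied names are recorded as the run proceeds, calling the function a second time with the same inputs applies nothing.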

Major tables (from migrations): repository, symbol, file, requirement, link, job, job_result, cluster, cluster_member, lw_repo_settings, lw_job_results, lw_pages, lw_watermarks, lw_settings.

Internal package map

| Package | Purpose |
|---|---|
| internal/api/graphql/ | GraphQL schema, resolvers, gqlgen config |
| internal/api/rest/ | REST handlers, MCP server implementation |
| internal/auth/ | JWT, OIDC, session management |
| internal/capabilities/ | Capability registry (edition gating) |
| internal/citations/ | Citation format (path:start-end), shared by all surfaces |
| internal/clustering/ | Label-propagation subsystem clustering |
| internal/config/ | Config struct, Viper loading, validation |
| internal/db/ | SurrealDB client, migration runner |
| internal/entrypoints/ | HTTP route / CLI entry-point classification |
| internal/graph/ | Code graph store and retrieval |
| internal/indexer/ | tree-sitter parsing, language configs |
| internal/indexing/ | Indexing service (shared by MCP, CLI, GraphQL) |
| internal/jobs/ | Async job queue and orchestrator |
| internal/knowledge/ | Knowledge artifact lifecycle, delta invalidation |
| internal/livingwiki/ | Living-wiki subsystem (see below) |
| internal/qa/ | Server-side deep-QA orchestrator |
| internal/quality/ | Quality validators for generated pages |
| internal/requirements/ | Requirement CRUD and traceability |
| internal/search/ | Hybrid search (FTS + vector + structural, RRF fusion) |
| internal/settings/ | Global and per-repo settings persistence |
| internal/skillcard/ | .claude/CLAUDE.md generation |
| internal/telemetry/ | Anonymous install telemetry |
| internal/trash/ | Soft-delete and retention sweep |
| internal/worker/ | gRPC client to Python worker |
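
The RRF fusion listed for internal/search/ refers to reciprocal rank fusion, which merges several ranked lists by summing 1 / (k + rank) for each document. The sketch below shows the general technique only. The `rrf_fuse` name, the k = 60 default, and the alphabetical tie-break are assumptions, not details taken from the SourceBridge code.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    Results are sorted by descending score, then by ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the three retrievers.
fts = ["a", "b", "c"]
vector = ["b", "a", "d"]
structural = ["b", "c"]
fused = rrf_fuse([fts, vector, structural])
```

RRF uses only rank positions, so the three retrievers do not need comparable score scales.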

Living-wiki internal packages

The living-wiki subsystem is the most recently added large surface:

| Package | Purpose |
|---|---|
| internal/livingwiki/assembly/ | Wires all orchestrator ports at boot (AssembleDispatcher) |
| internal/livingwiki/ast/ | Canonical Page AST with typed blocks and stable IDs |
| internal/livingwiki/coldstart/ | Cold-start runner: generates initial page set via LLM |
| internal/livingwiki/credentials/ | Credential broker with per-job snapshots |
| internal/livingwiki/governance/ | Audit log for credential rotations and edits |
| internal/livingwiki/manifest/ | Dependency manifest for per-page stale tracking |
| internal/livingwiki/markdown/ | AST → Markdown writer |
| internal/livingwiki/metrics/ | Prometheus series for the living-wiki scheduler |
| internal/livingwiki/orchestrator/ | Per-repo job orchestration |
| internal/livingwiki/scheduler/ | Periodic scheduler with leader election and per-repo jitter |
| internal/livingwiki/sinks/ | Sink writers: Confluence, Notion, git-repo |
| internal/livingwiki/webhook/ | Webhook dispatcher with per-repo goroutine serialization |
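
Per-repo jitter spreads scheduled runs so that repositories do not all fire at the same instant. One way to do this, shown below as an assumption rather than the scheduler's actual method, is to derive a stable offset from a hash of the repository ID. The function names and parameters are hypothetical, and the real scheduler may use random jitter instead.

```python
import hashlib


def jitter_seconds(repo_id: str, max_jitter: int) -> int:
    """Deterministic offset in [0, max_jitter) derived from the repo ID."""
    if max_jitter <= 0:
        return 0
    digest = hashlib.sha256(repo_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % max_jitter


def next_run(base_time: float, interval: float, repo_id: str, max_jitter: int) -> float:
    """Next scheduled time: one interval after base_time plus the repo's offset."""
    return base_time + interval + jitter_seconds(repo_id, max_jitter)
```

A hash-based offset stays the same across restarts, which keeps each repository's schedule predictable while still spreading load.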

Data flow

Indexing:

User registers repo URL
→ API clones to repo-cache (internal/indexing.Service)
→ tree-sitter parses each file (internal/indexer/)
→ symbols, files, call edges, import edges written to SurrealDB
→ clustering job queued (internal/clustering/)
→ knowledge artifacts marked stale (if any exist)
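
The package map describes the clustering step as label propagation. The sketch below shows the general technique on an undirected graph: each node repeatedly adopts the most common label among its neighbors until nothing changes. The function name, the tie-break rule, and the round limit are assumptions, and the code in internal/clustering/ may differ.

```python
from collections import Counter


def label_propagation(edges, max_rounds=20):
    """Cluster nodes of an undirected graph by label propagation.

    Returns a dict mapping each node to its final label.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Every node starts in its own cluster.
    labels = {node: node for node in neighbors}

    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):  # fixed order keeps runs reproducible
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            # Tie-break on the smallest label so the result is deterministic.
            choice = min(label for label, c in counts.items() if c == best)
            if choice != labels[node]:
                labels[node] = choice
                changed = True
        if not changed:
            break
    return labels


# Two disconnected triangles, standing in for two subsystems.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z")]
clusters = label_propagation(edges)
```

On this input the two triangles end up with two distinct labels, one per connected group.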

Knowledge generation (e.g. cliff notes):

User requests cliff notes
→ API enqueues LLM job (internal/jobs/)
→ Python worker generates via LLM
→ Result stored as knowledge artifact in SurrealDB
→ Polling or webhook notifies web UI

QA (agentic path):

User asks question
→ QA orchestrator (internal/qa/) classifies question type
→ Agentic loop: plan → call tools (search, graph, requirements) → synthesize
→ Citations extracted and normalized to (path:start-end)
→ Answer + citations returned to client
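
The citation step can be illustrated with a small extractor for the (path:start-end) format. This is a sketch under stated assumptions: the regular expression, the de-duplication, the swapping of reversed ranges, and the example file paths are all hypothetical and are not taken from internal/citations/.

```python
import re

# Matches "(path:start-end)" where path has no spaces, colons, or parentheses.
CITATION_RE = re.compile(r"\(([^():\s]+):(\d+)-(\d+)\)")


def format_citation(path: str, start: int, end: int) -> str:
    """Render one citation, swapping the bounds if they are reversed (assumed rule)."""
    if start > end:
        start, end = end, start
    return f"({path}:{start}-{end})"


def extract_citations(text: str):
    """Return normalized citations in order of first appearance, without duplicates."""
    seen = []
    for path, start, end in CITATION_RE.findall(text):
        cite = format_citation(path, int(start), int(end))
        if cite not in seen:
            seen.append(cite)
    return seen


answer = (
    "See (internal/qa/loop.go:10-42) and (web/app.tsx:7-3); "
    "the loop in (internal/qa/loop.go:10-42) does the planning."
)
citations = extract_citations(answer)
```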

Living wiki (cold-start):

Operator enables living wiki for a repo
→ Scheduler wakes, elects leader, acquires lease
→ Cold-start runner fetches clusters + graph metrics
→ LLM generates page AST per architectural area
→ Quality validators gate or pass pages
→ Sink writers publish to Confluence/Notion/git
→ Job result stored for UI display
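
The quality-gate step in this flow can be pictured as a list of validator functions run against each generated page, with only clean pages going on to the sinks. The sketch below is illustrative only. The page shape (a dict with id, title, and body), the two validators, and `gate_pages` are hypothetical and do not reflect the checks in internal/quality/.

```python
def non_empty_body(page):
    """Validator: report an issue if the page body is missing or blank."""
    return [] if page.get("body", "").strip() else ["empty body"]


def has_title(page):
    """Validator: report an issue if the page has no title."""
    return [] if page.get("title") else ["missing title"]


def gate_pages(pages, validators):
    """Split pages into those that pass every validator and those held back.

    Returns (passed_pages, held), where held maps page ID to its issue list.
    """
    passed, held = [], {}
    for page in pages:
        issues = [msg for check in validators for msg in check(page)]
        if issues:
            held[page["id"]] = issues
        else:
            passed.append(page)
    return passed, held


pages = [
    {"id": "p1", "title": "Auth", "body": "How login works."},
    {"id": "p2", "title": "", "body": "   "},
]
passed, held = gate_pages(pages, [non_empty_body, has_title])
```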

Multi-tenancy

tenant_id is a first-class field on all major tables. The living-wiki scheduler enforces per-tenant concurrency caps. The capability registry supports per-edition gating independent of tenant. Enterprise features (cross-repo impact, SSO, audit log, notifications, team management, org settings, enterprise reports) are gated by the enterprise edition value in SOURCEBRIDGE_SECURITY_MODE.
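
A per-tenant concurrency cap can be sketched as a counter per tenant guarded by a lock. The `TenantLimiter` class, its method names, and the cap value are assumptions for illustration. The living-wiki scheduler's actual mechanism is not described on this page.

```python
import threading
from collections import defaultdict


class TenantLimiter:
    """Allow at most `cap` concurrent jobs per tenant."""

    def __init__(self, cap: int):
        self._cap = cap
        self._running = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, tenant_id: str) -> bool:
        """Reserve a slot for the tenant; return False if the cap is reached."""
        with self._lock:
            if self._running[tenant_id] >= self._cap:
                return False
            self._running[tenant_id] += 1
            return True

    def release(self, tenant_id: str) -> None:
        """Free a slot previously reserved with try_acquire."""
        with self._lock:
            if self._running[tenant_id] > 0:
                self._running[tenant_id] -= 1
```

Because the counters are keyed by tenant, one tenant reaching its cap does not block jobs for another.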

Proto definitions

gRPC service definitions live in proto/. Generated Go stubs are in gen/go/. Regenerate with make proto (requires protoc and the Go gRPC plugins).
