Skip to content

Configuration and Models

jstuart0 edited this page May 2, 2026 · 3 revisions

Configuration and Models

SourceBridge reads configuration from a TOML file and environment variables. Environment variables use the SOURCEBRIDGE_ prefix and override file values. The config file is searched in order: ./config.toml, $HOME/.config/sourcebridge/config.toml, /etc/sourcebridge/config.toml.

See config.toml.example for an annotated example.

Server

Variable Config key Default Description
SOURCEBRIDGE_SERVER_HTTP_PORT server.http_port 8080 API server HTTP port
SOURCEBRIDGE_SERVER_GRPC_PORT server.grpc_port 50051 gRPC port for API↔worker communication
SOURCEBRIDGE_SERVER_PUBLIC_BASE_URL server.public_base_url http://localhost:8080 Public-facing URL (used in OAuth callbacks and generated links)
SOURCEBRIDGE_SERVER_CORS_ORIGINS server.cors_origins http://localhost:3000 Comma-separated allowed CORS origins
SOURCEBRIDGE_SERVER_MAX_BODY_SIZE server.max_body_size 10485760 (10 MB) Max HTTP request body size in bytes

Storage

Variable Config key Default Description
SOURCEBRIDGE_STORAGE_SURREAL_MODE storage.surreal_mode embedded embedded or external
SOURCEBRIDGE_STORAGE_SURREAL_URL storage.surreal_url ws://localhost:8000/rpc SurrealDB WebSocket URL (external mode)
SOURCEBRIDGE_STORAGE_SURREAL_NAMESPACE storage.surreal_namespace sourcebridge SurrealDB namespace
SOURCEBRIDGE_STORAGE_SURREAL_DATABASE storage.surreal_database sourcebridge SurrealDB database name
SOURCEBRIDGE_STORAGE_SURREAL_USER storage.surreal_user root SurrealDB username
SOURCEBRIDGE_STORAGE_SURREAL_PASS storage.surreal_pass root SurrealDB password — change in production
SOURCEBRIDGE_STORAGE_SURREAL_DATA_PATH storage.surreal_data_path ./surrealdb-data Data directory for embedded mode
SOURCEBRIDGE_STORAGE_REDIS_MODE storage.redis_mode memory memory or external
SOURCEBRIDGE_STORAGE_REDIS_URL storage.redis_url Redis URL (external mode)
SOURCEBRIDGE_STORAGE_REPO_CACHE_PATH storage.repo_cache_path ./repo-cache Local clone cache for indexed repos

Security

Variable Config key Default Description
SOURCEBRIDGE_SECURITY_JWT_SECRET security.jwt_secret dev-secret-change-in-production JWT signing secret — required in production
SOURCEBRIDGE_SECURITY_JWT_TTL_MINUTES security.jwt_ttl_minutes 1440 JWT expiry (24 hours)
SOURCEBRIDGE_SECURITY_ENCRYPTION_KEY security.encryption_key AES-256 key for field-level encryption (living wiki secrets)
SOURCEBRIDGE_SECURITY_GRPC_AUTH_SECRET security.grpc_auth_secret Shared secret for API↔worker gRPC auth — required in production
SOURCEBRIDGE_SECURITY_MODE security.mode oss oss or enterprise
SOURCEBRIDGE_SECURITY_CSRF_ENABLED security.csrf_enabled true CSRF protection
SOURCEBRIDGE_SECURITY_GITHUB_WEBHOOK_SECRET security.github_webhook_secret HMAC secret for GitHub webhook validation
SOURCEBRIDGE_SECURITY_GITLAB_WEBHOOK_SECRET security.gitlab_webhook_secret HMAC secret for GitLab webhook validation

OIDC SSO

Variable Config key Default Description
SOURCEBRIDGE_SECURITY_OIDC_ISSUER_URL security.oidc.issuer_url OIDC provider issuer URL
SOURCEBRIDGE_SECURITY_OIDC_CLIENT_ID security.oidc.client_id OAuth client ID
SOURCEBRIDGE_SECURITY_OIDC_CLIENT_SECRET security.oidc.client_secret OAuth client secret
SOURCEBRIDGE_SECURITY_OIDC_REDIRECT_URL security.oidc.redirect_url OAuth redirect/callback URL
SOURCEBRIDGE_SECURITY_OIDC_SCOPES security.oidc.scopes Comma-separated OIDC scopes

LLM provider

Variable Config key Default Description
SOURCEBRIDGE_LLM_PROVIDER llm.provider anthropic LLM provider (see table below)
SOURCEBRIDGE_LLM_BASE_URL llm.base_url API endpoint (required for local providers)
SOURCEBRIDGE_LLM_API_KEY llm.api_key API key (cloud providers)
SOURCEBRIDGE_LLM_SUMMARY_MODEL llm.summary_model claude-sonnet-4-20250514 Default model for analysis
SOURCEBRIDGE_LLM_REVIEW_MODEL llm.review_model claude-sonnet-4-20250514 Review operations
SOURCEBRIDGE_LLM_ASK_MODEL llm.ask_model claude-sonnet-4-20250514 Discussion/QA operations
SOURCEBRIDGE_LLM_KNOWLEDGE_MODEL llm.knowledge_model Knowledge generation (cliff notes, etc.)
SOURCEBRIDGE_LLM_ARCHITECTURE_DIAGRAM_MODEL llm.architecture_diagram_model Architecture diagram generation
SOURCEBRIDGE_LLM_REPORT_MODEL llm.report_model Report generation (enterprise)
SOURCEBRIDGE_LLM_TIMEOUT_SECONDS llm.timeout_seconds 900 Per-call LLM timeout (15 minutes)
SOURCEBRIDGE_LLM_ADVANCED_MODE llm.advanced_mode false Enable per-operation model selection

Per-operation model overrides (when advanced_mode = true) are an enterprise capability (per_op_models).

Supported LLM providers

Provider Config value Notes
Anthropic anthropic Recommended for output quality. Claude Sonnet 4, Haiku, etc.
OpenAI openai GPT-4o, GPT-4o-mini, etc.
Google Gemini gemini Gemini 2.5 Pro, Flash, etc.
OpenRouter openrouter 100+ models behind one API key
Ollama ollama Local. Set base_url to http://localhost:11434/v1
vLLM vllm Local, high-throughput PagedAttention
llama.cpp llama-cpp Local, CPU/GPU, GGUF models
SGLang sglang Local, RadixAttention
LM Studio lmstudio Local, desktop GUI, OpenAI-compatible API

All local providers expose an OpenAI-compatible API. Set base_url to the local endpoint.

Model Registry and capability tiers

Admin → Comprehension → Model Registry (/admin/comprehension/models) stores per-model metadata used by the Living Wiki quality validators. This is separate from the active-model selection at Admin → LLM (/admin/llm): the LLM page controls which model runs; the Model Registry controls how strictly the quality gates evaluate its output.

Capability tiers

Each model carries a qualityGateTier that Living Wiki uses to pick appropriate gate thresholds:

Tier Typical models Pattern-match rule
frontier Claude (all), GPT-4o, GPT-4.1, o1, o3, Gemini Pro/Ultra Anthropic provider (all); OpenAI gpt-4*, o1, o3; Gemini pro/ultra
mid gpt-4o-mini, o1-mini, Gemini Flash, open-weights ≥70B OpenAI *-mini/*-nano; Gemini flash; size token ≥70B
local Ollama-served models, open-weights <70B (qwen3:32b, llama3:8b, phi4, etc.) Local inference providers (ollama, vllm, llama-cpp, sglang, lmstudio); size token <70B

The default OSS install (config.toml.example ships qwen3:32b) resolves to TierLocal. Frontier gates (strict citation density, vagueness) are relaxed or demoted to warnings for local-tier runs, so a fresh install does not produce "0 pages generated".

When a model is not in the registry, ClassifyByPattern (internal/llm/modeltier/classify.go) runs the provider fast-path first, then the size parser, then family-name heuristics. Unknown providers default to TierLocal.

Registering a model

To override the pattern-match result for a specific model:

  1. Go to Admin → Comprehension → Model Registry.
  2. Create or update an entry using the model string alone as the key (e.g. qwen3:32b, llama3.1:70b). Model IDs are stored and looked up in lowercase.
  3. Set qualityGateTier to frontier, mid, or local.

The registry key is the model string, not provider/model — if two providers serve the same model name, register them under distinct IDs (e.g. openrouter/anthropic/claude-3-5-sonnet).

Verifying the resolved tier

After a cold-start run, grep the API logs:

kubectl -n sourcebridge logs -l app=sourcebridge-api --tail=500 \
  | grep "resolved quality-gate tier"

Each line includes tier, source (registry or pattern), provider, and model.

Worker

Variable Config key Default Description
SOURCEBRIDGE_WORKER_ADDRESS worker.address localhost:50051 gRPC address of the Python worker

The worker has its own env vars prefixed with SOURCEBRIDGE_WORKER_:

Variable Description
SOURCEBRIDGE_WORKER_GRPC_PORT gRPC listen port (default 50051)
SOURCEBRIDGE_WORKER_LLM_PROVIDER LLM provider for the worker (can differ from API)
SOURCEBRIDGE_WORKER_LLM_BASE_URL Worker LLM API endpoint
SOURCEBRIDGE_WORKER_LLM_MODEL Worker LLM model name
SOURCEBRIDGE_WORKER_LLM_API_KEY Worker LLM API key
SOURCEBRIDGE_WORKER_EMBEDDING_PROVIDER Embedding provider
SOURCEBRIDGE_WORKER_EMBEDDING_BASE_URL Embedding API endpoint
SOURCEBRIDGE_WORKER_EMBEDDING_MODEL Embedding model (default nomic-embed-text)
SOURCEBRIDGE_WORKER_EMBEDDING_DIMENSION Embedding dimension (default 768)
SOURCEBRIDGE_WORKER_GRPC_AUTH_SECRET Must match SOURCEBRIDGE_SECURITY_GRPC_AUTH_SECRET

Indexing

Variable Config key Default Description
SOURCEBRIDGE_INDEXING_MAX_FILE_SIZE_BYTES indexing.max_file_size_bytes 1048576 (1 MB) Skip files larger than this
SOURCEBRIDGE_INDEXING_MAX_CONCURRENCY indexing.max_concurrency 8 Parallel file parsing goroutines
SOURCEBRIDGE_INDEXING_SCIP_ENABLED indexing.scip_enabled true SCIP-based precise indexing

Default ignore globs: node_modules/**, dist/**, .git/**, vendor/**, __pycache__/**.

MCP

Variable Config key Default Description
SOURCEBRIDGE_MCP_ENABLED mcp.enabled false Enable the MCP server
SOURCEBRIDGE_MCP_REPOS mcp.repos Comma-separated repo IDs to expose (empty = all)
SOURCEBRIDGE_MCP_SESSION_TTL mcp.session_ttl 3600 Idle session reap time in seconds
SOURCEBRIDGE_MCP_KEEPALIVE mcp.keepalive 30 SSE keepalive ping interval in seconds
SOURCEBRIDGE_MCP_MAX_SESSIONS mcp.max_sessions 100 Max concurrent sessions (0 = unlimited)

QA (agentic retrieval)

Variable Config key Default Description
SOURCEBRIDGE_QA_SERVER_SIDE_ENABLED qa.server_side_enabled false Enable server-side deep-QA orchestrator
SOURCEBRIDGE_QA_LOCAL_FAST_MODE_SUBPROCESS qa.local_fast_mode_subprocess true Keep subprocess QA path for local dev
SOURCEBRIDGE_QA_QUESTION_MAX_BYTES qa.question_max_bytes 4096 Max question length
SOURCEBRIDGE_QA_SESSION_TOKENS_PER_HOUR qa.session_tokens_per_hour 100000 Token budget per session per hour (0 = disabled)
SOURCEBRIDGE_QA_REPO_TOKENS_PER_DAY qa.repo_tokens_per_day 1000000 Token budget per repo per day
SOURCEBRIDGE_QA_DEPLOYMENT_TOKENS_PER_DAY qa.deployment_tokens_per_day 10000000 Deployment-level token circuit breaker
SOURCEBRIDGE_QA_SYNTHESIS_LANE qa.synthesis_lane 4 Concurrent synthesis calls against the worker
SOURCEBRIDGE_QA_AGENTIC_RETRIEVAL_ENABLED qa.agentic_retrieval_enabled false Enable agentic retrieval loop
SOURCEBRIDGE_QA_AGENTIC_RETRIEVAL_CANARY_PCT qa.agentic_retrieval_canary_pct 0 Staged rollout percentage (0–100)
SOURCEBRIDGE_QA_PROMPT_CACHING_ENABLED qa.prompt_caching_enabled true Anthropic prompt-cache markers
SOURCEBRIDGE_QA_SMART_CLASSIFIER_ENABLED qa.smart_classifier_enabled false LLM-backed question profiler
SOURCEBRIDGE_QA_QUERY_DECOMPOSITION_ENABLED qa.query_decomposition_enabled false Multi-hop query decomposition

Comprehension (knowledge generation)

Variable Config key Default Description
SOURCEBRIDGE_COMPREHENSION_MAX_CONCURRENCY comprehension.max_concurrency 3 Max parallel LLM jobs

Trash (soft-delete recycle bin)

Variable Config key Default Description
SOURCEBRIDGE_TRASH_ENABLED trash.enabled true Enable soft-delete
SOURCEBRIDGE_TRASH_RETENTION_DAYS trash.retention_days 30 Days before permanent deletion (1–365)
SOURCEBRIDGE_TRASH_SWEEP_INTERVAL trash.sweep_interval_sec 21600 Sweep interval in seconds (6 hours)
SOURCEBRIDGE_TRASH_SWEEP_MAX_BATCH trash.max_batch_size 500 Max items per sweep pass

Living wiki

Variable Config key Default Description
SOURCEBRIDGE_LIVING_WIKI_ENABLED living_wiki.enabled false Enable living-wiki feature
SOURCEBRIDGE_LIVING_WIKI_WORKER_COUNT living_wiki.worker_count 4 Goroutines draining the dispatcher queue
SOURCEBRIDGE_LIVING_WIKI_EVENT_TIMEOUT living_wiki.event_timeout 5m Max duration per event handler
SOURCEBRIDGE_LIVING_WIKI_SCHEDULER_INTERVAL living_wiki.scheduler_interval 15m Default regen frequency per repo
SOURCEBRIDGE_LIVING_WIKI_MAX_CONCURRENT_JOBS_PER_TENANT living_wiki.max_concurrent_jobs_per_tenant 5 Per-tenant concurrency cap
SOURCEBRIDGE_LIVING_WIKI_CONFLUENCE_WEBHOOK_SECRET living_wiki.confluence_webhook_secret HMAC secret for Confluence webhooks
SOURCEBRIDGE_LIVING_WIKI_NOTION_WEBHOOK_SECRET living_wiki.notion_webhook_secret Reserved for Notion webhook validation
SOURCEBRIDGE_LIVING_WIKI_KILL_SWITCH false Bypass living-wiki without redeploying

Git

Variable Config key Default Description
SOURCEBRIDGE_GIT_DEFAULT_TOKEN git.default_token PAT used when no per-repo token is provided
SOURCEBRIDGE_GIT_SSH_KEY_PATH git.ssh_key_path Path to SSH private key for SSH clone URLs

Telemetry

Variable Description
SOURCEBRIDGE_TELEMETRY Set to off to disable telemetry. DO_NOT_TRACK=1 also works
SOURCEBRIDGE_TELEMETRY_PLATFORM Override the auto-detected platform string (useful for CI: set to test)

Knowledge generation env vars (not in config struct)

These are read directly from the environment by the comprehension subsystem:

Variable Default Description
SOURCEBRIDGE_SELECTIVE_INVALIDATION true Delta-based artifact invalidation on reindex
SOURCEBRIDGE_SELECTIVE_INVALIDATION_MAX_CHANGES 200 Fall back to blanket-stale above this threshold
SOURCEBRIDGE_DELTA_REGEN_MODE off off, shadow, or live auto-regen
SOURCEBRIDGE_DELTA_REGEN_MAX_PER_INDEX 5 Max artifacts auto-regenerated per reindex
SOURCEBRIDGE_DELTA_REGEN_MAX_PER_REPO_PER_HOUR 20 Rolling-hour cap per repo

Clone this wiki locally