Complete reference for all environment variables, command-line flags, and configuration files across every ATLAS service. All settings have sensible defaults; most users only need to edit `.env`.
```bash
cp .env.example .env
# Edit .env only if you need to change model path or ports
docker compose up -d
```
The defaults work if your model is at `./models/Qwen3.5-9B-Q6_K.gguf`.
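To confirm the stack came up, one quick check is llama-server's built-in `/health` route (part of llama.cpp's HTTP server), on the default host port from `.env`:

```bash
# Returns a small JSON status once the model has finished loading
curl http://localhost:8080/health
```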
These variables are read by `docker-compose.yml` and control host-side port mappings and model paths. Copy `.env.example` to `.env` to configure:
| Variable | Default | Description |
|---|---|---|
| `ATLAS_MODELS_DIR` | `./models` | Host path to the directory containing GGUF model weights |
| `ATLAS_MODEL_FILE` | `Qwen3.5-9B-Q6_K.gguf` | Model filename (must exist in `ATLAS_MODELS_DIR`) |
| `ATLAS_MODEL_NAME` | `Qwen3.5-9B-Q6_K` | Model identifier used in API responses |
| `ATLAS_CTX_SIZE` | `32768` | Context window size in tokens |
| `ATLAS_LLAMA_PORT` | `8080` | llama-server host port |
| `ATLAS_LENS_PORT` | `8099` | Geometric Lens host port |
| `ATLAS_V3_PORT` | `8070` | V3 Pipeline service host port |
| `ATLAS_SANDBOX_PORT` | `30820` | Sandbox host port (the container listens on 8020) |
| `ATLAS_PROXY_PORT` | `8090` | atlas-proxy host port (Aider connects here) |
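As an illustration, a `.env` that relocates the model directory and avoids a host port conflict might look like the sketch below; the values are arbitrary examples, not recommendations:

```bash
# .env: override only what differs from the defaults above
ATLAS_MODELS_DIR=/srv/llm/models       # model weights live outside the repo
ATLAS_MODEL_FILE=Qwen3.5-9B-Q6_K.gguf
ATLAS_PROXY_PORT=18090                 # 8090 already taken on this host
```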
Docker Compose also sets inter-service URLs using Docker networking (e.g., `http://llama-server:8080`). These are hardcoded in `docker-compose.yml` and do not need to be configured by users.
The Go proxy that runs the agent loop, routes tool calls, and translates between Aider and the ATLAS stack.
| Variable | Default | Description |
|---|---|---|
| `ATLAS_PROXY_PORT` | `8090` | Port to listen on |
| `ATLAS_INFERENCE_URL` | `http://localhost:8080` | llama-server endpoint for generation |
| `ATLAS_LLAMA_URL` | (falls back to `ATLAS_INFERENCE_URL`) | llama-server endpoint for grammar-constrained calls |
| `ATLAS_LENS_URL` | `http://localhost:8099` | Geometric Lens scoring endpoint |
| `ATLAS_SANDBOX_URL` | `http://localhost:30820` | Sandbox code execution endpoint |
| `ATLAS_V3_URL` | `http://localhost:8070` | V3 Pipeline service endpoint |
| `ATLAS_MODEL_NAME` | `Qwen3.5-9B-Q6_K` | Model name for API responses |
| `ATLAS_AGENT_LOOP` | (unset) | Set to `1` to enable the tool-call agent loop; when unset or any other value, the proxy forwards to llama-server directly |
| `ATLAS_V3_CLI` | (unset) | Set to `1` to enable V3 CLI mode (routes all generation through the V3 service) |
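The proxy's toggles are plain environment variables, so if you run it outside Compose they can be exported before launch. A minimal sketch, assuming a locally built binary (the `./atlas-proxy` invocation is an assumption for illustration):

```bash
# Hypothetical direct invocation; Docker Compose sets these for you
export ATLAS_INFERENCE_URL=http://localhost:8080
export ATLAS_LENS_URL=http://localhost:8099
export ATLAS_AGENT_LOOP=1    # enable the tool-call agent loop
./atlas-proxy                # binary name assumed for this sketch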
| Setting | Value | Description |
|---|---|---|
| Max turns (T0 Conversational) | 5 | Text-only chat responses |
| Max turns (T1 Simple) | 30 | Config files, short files, data files |
| Max turns (T2 Feature) | 30 | Feature files routed through V3 pipeline |
| Max turns (T3 Hard) | 60 | Complex multi-file tasks |
| Exploration budget warning | 4 consecutive reads | Injects "write your changes now" |
| Exploration budget skip | 5+ consecutive reads | Skips the read, returns warning |
| Error loop breaker | 3 consecutive failures | Stops agent loop |
| T2 threshold | 50 lines + 3 logic indicators | Minimum for V3 activation |
| write_file rejection | Existing files > 100 lines | Forces use of edit_file |
| Conversation trim | 12 messages max | Keeps system + first user + last 8 |
| Command stdout limit | 8,000 chars | Prevents context flooding |
| Command stderr limit | 4,000 chars | Prevents context flooding |
| Search results limit | 200 matches | Prevents context flooding |
| File search skip | Files > 1 MB | Performance |
| max_tokens | 32,768 | Sent to llama-server |
| temperature | 0.3 | Sent to llama-server |
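Since Aider talks to the proxy over the OpenAI-compatible chat protocol (see the litellm metadata below), a quick smoke test is an ordinary chat-completions request. The `/v1/chat/completions` route is assumed here from that protocol, and the model name is the default `ATLAS_MODEL_NAME`:

```bash
curl -s http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3.5-9B-Q6_K",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```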
Python HTTP service that orchestrates the V3 code generation pipeline (PlanSearch, DivSampling, Budget Forcing, PR-CoT, etc.).
| Variable | Default | Description |
|---|---|---|
| `ATLAS_INFERENCE_URL` | `http://localhost:8080` | llama-server endpoint for generation and embeddings |
| `ATLAS_LENS_URL` | `http://localhost:8099` | Geometric Lens endpoint for C(x)/G(x) scoring |
| `ATLAS_SANDBOX_URL` | `http://localhost:30820` | Sandbox endpoint for code execution |
| `ATLAS_V3_PORT` | `8070` | Port to listen on |
| `ATLAS_MODEL_NAME` | `Qwen3.5-9B-Q6_K` | Model name for API calls |
| Setting | Value | Description |
|---|---|---|
| BASE_TEMPERATURE | 0.6 | Default generation temperature |
| DIVERSITY_TEMPERATURE | 0.8 | Temperature for diverse candidate sampling |
| MAX_TOKENS | 8,192 | Max output tokens per generation call |
| PlanSearch plans | 3 (max 7) | Number of structural plans generated |
| DivSampling perturbations | 12 | 4 roles + 4 instructions + 4 styles |
| Budget Forcing tiers | 5 | nothink (0), light (1024), standard (2048), hard (4096), extreme (8192) |
| PR-CoT perspectives | 4 | logical_consistency, information_completeness, biases, alternative_solutions |
| PR-CoT max rounds | 3 | Maximum repair attempts |
| Refinement max iterations | 2 | Maximum refinement cycles |
| Refinement time budget | 120s | Maximum time for refinement loop |
| Derivation max sub-problems | 5 | Maximum problem decomposition depth |
| Derivation max attempts/step | 3 | Retries per sub-problem |
| Constraint min cosine distance | 0.15 | Prevents hypothesis repetition |
Python FastAPI service for C(x)/G(x) scoring, RAG/project indexing, confidence routing, and pattern caching.
| Variable | Default | Description |
|---|---|---|
GEOMETRIC_LENS_ENABLED |
false |
Enable C(x)/G(x) scoring. Docker Compose sets this to true. |
LLAMA_URL |
http://llama-service:8000 |
llama-server endpoint. Docker Compose overrides to http://llama-server:8080. |
LLAMA_EMBED_URL |
(falls back to LLAMA_URL) | Embedding endpoint. Set separately if using a dedicated embedding server. |
PROJECT_DATA_DIR |
/data/projects |
Directory for project index storage |
REDIS_URL |
redis://redis:6379 |
Redis connection for confidence router and pattern cache. Features using Redis degrade gracefully if unavailable. |
CORS_ORIGINS |
http://localhost:3000,http://localhost:8080 |
Allowed CORS origins (comma-separated) |
CONFIG_PATH |
/app/config/config.yaml |
Path to YAML config file (optional, defaults used if missing) |
API_KEYS_PATH |
/app/secrets/api-keys.json |
Path to API keys JSON (optional) |
API_PORTAL_URL |
http://api-portal:3000 |
API portal URL (K3s deployment only) |
| Parameter | Value | Description |
|---|---|---|
| C(x) sigmoid midpoint | 19.0 | Energy value mapping to 0.5 normalized score |
| C(x) sigmoid steepness | 2.0 | Controls normalization curve sharpness |
| G(x) "likely_correct" threshold | >= 0.7 | Verdict threshold |
| G(x) "uncertain" threshold | >= 0.3 | Verdict threshold |
| G(x) "likely_incorrect" threshold | < 0.3 | Verdict threshold |
| Parameter | Value | Description |
|---|---|---|
| CACHE_HIT route cost | 1 | Cheapest route (k=0 retrieval) |
| FAST_PATH route cost | 50 | Quick route (k=1) |
| STANDARD route cost | 300 | Default route (k=5) |
| HARD_PATH route cost | 1,500 | Expensive route (k=20) |
| BM25 k1 | 1.5 | BM25 term frequency saturation |
| BM25 b | 0.75 | BM25 document length normalization |
| Tree search max depth | 6 | LLM-guided traversal depth |
| Tree search max calls | 40 | Maximum LLM scoring calls |
| Pattern cache STM capacity | 100 | Short-term memory max entries |
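For reference, `k1` and `b` are the two free parameters of the standard Okapi BM25 scoring function,

score(q, d) = Σ_{t ∈ q} IDF(t) · f(t, d) · (k1 + 1) / (f(t, d) + k1 · (1 − b + b · |d| / avgdl))

where f(t, d) is the frequency of term t in document d, |d| is the document length, and avgdl is the average document length: k1 = 1.5 caps the contribution of repeated terms, and b = 0.75 sets how strongly long documents are penalized.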
Python FastAPI service for isolated code execution with compilation, linting, and testing.
| Variable | Default | Description |
|---|---|---|
| `MAX_EXECUTION_TIME` | `60` | Maximum execution time in seconds |
| `MAX_MEMORY_MB` | `512` | Maximum memory per execution in MB |
| `WORKSPACE_BASE` | `/tmp/sandbox` | Base directory for execution workspaces |
| Setting | Value | Description |
|---|---|---|
| Default timeout per request | 30s | Can be overridden per request up to MAX_EXECUTION_TIME |
| stdout truncation | 4,000 chars | Last N chars kept |
| stderr truncation | 2,000 chars | Last N chars kept |
| error_message truncation | 500 chars | First N chars kept |
| Supported languages | 8 | python, javascript, typescript, go, rust, c, cpp, bash |
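A request against the sandbox might look like the sketch below; the `/execute` route and the request fields are assumptions for illustration, not a documented API:

```bash
# Hypothetical request shape; route and field names are illustrative only
curl -s http://localhost:30820/execute \
  -H "Content-Type: application/json" \
  -d '{"language": "python", "code": "print(2 + 2)", "timeout": 30}'
```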
C++ inference server (llama.cpp) with CUDA GPU acceleration and grammar-constrained JSON output.
Used when running via `docker compose up`:
| Flag | Value | Description |
|---|---|---|
| `--model` | `/models/${ATLAS_MODEL_FILE}` | Path to GGUF model (inside the container) |
| `--host` | `0.0.0.0` | Listen on all interfaces |
| `--port` | `8080` | Listen port |
| `--ctx-size` | `${ATLAS_CTX_SIZE:-32768}` | Context window in tokens |
| `--n-gpu-layers` | `99` | Offload all layers to GPU |
| `--no-mmap` | — | Disable mmap for stability |
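Assembled into a single invocation (as Compose effectively runs it, with `ATLAS_MODEL_FILE` and `ATLAS_CTX_SIZE` expanded to their defaults):

```bash
llama-server \
  --model /models/Qwen3.5-9B-Q6_K.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 32768 \
  --n-gpu-layers 99 \
  --no-mmap
```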
The K3s deployment (`inference/entrypoint-v3.1-9b.sh`) uses additional flags for production performance:
| Flag | Value | Description |
|---|---|---|
| `--parallel` | `4` | Parallel request slots |
| `--cont-batching` | — | Enable continuous batching |
| `--flash-attn` | `on` | Enable flash attention |
| `--mlock` | — | Lock model in RAM (prevents swapping) |
| `-b` | `4096` | Batch size |
| `-ub` | `4096` | Micro-batch size |
| `-ctk` | `q8_0` | KV cache key quantization |
| `-ctv` | `q4_0` | KV cache value quantization |
| `--embeddings` | — | Enable self-embedding endpoint |
| `--no-cache-prompt` | — | Disable prompt caching |
| `--ctx-checkpoints` | `0` | Disable context checkpoints |
| `--jinja` | — | Enable Jinja template support |
| `--port` | `8000` | K3s uses port 8000 (not 8080) |
Note: Docker Compose uses a simpler configuration optimized for single-user local use. K3s uses the full production configuration with flash attention, KV cache quantization, and multi-slot parallelism.
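For comparison, the K3s flag set assembles to roughly the command below. The model path and context sizing come from `atlas.conf` (per-slot context × parallel slots), so those values are illustrative rather than exact:

```bash
# Rough assembly of the K3s entrypoint's flags; see atlas.conf below
llama-server \
  --model "$ATLAS_MODELS_DIR/$ATLAS_MAIN_MODEL" \
  --host 0.0.0.0 --port 8000 \
  --n-gpu-layers 99 \
  --parallel 4 --cont-batching --flash-attn on --mlock \
  -b 4096 -ub 4096 -ctk q8_0 -ctv q4_0 \
  --embeddings --no-cache-prompt --ctx-checkpoints 0 --jinja
```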
The standalone Python REPL (`pip install -e . && atlas`) reads these variables:
| Variable | Default | Description |
|---|---|---|
| `ATLAS_INFERENCE_URL` | `http://localhost:8080` | llama-server endpoint |
| `ATLAS_RAG_URL` | `http://localhost:8099` | Geometric Lens endpoint |
| `ATLAS_SANDBOX_URL` | `http://localhost:30820` | Sandbox endpoint |
| `ATLAS_MODEL_NAME` | `Qwen3.5-9B-Q6_K` | Model name for API calls |
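The variables are read at startup, so overriding one for a single session is a one-liner:

```bash
# Point the REPL at a llama-server on a non-default port
ATLAS_INFERENCE_URL=http://localhost:18080 atlas
```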
| Parameter | Value | Description |
|---|---|---|
| max_tokens | 8,192 | Max output tokens |
| temperature | 0.6 | Generation temperature |
| top_k | 20 | Top-K sampling |
| top_p | 0.95 | Nucleus sampling |
| stop | `["<\|im_end\|>"]` | Stop sequence |
Two files in the project root control how Aider interacts with the ATLAS proxy:
| Field | Value | Purpose |
|---|---|---|
| `name` | `openai/atlas` | Model identifier Aider uses to find settings |
| `edit_format` | `whole` | Aider sends full file content (not diffs) |
| `weak_model_name` | `openai/atlas` | Same model for commit messages and summaries |
| `use_repo_map` | `true` | Include repository file tree in context |
| `send_undo_reply` | `true` | Notify the model when the user undoes a change |
| `examples_as_sys_msg` | `true` | Put few-shot examples in the system prompt |
| `max_tokens` | `32768` | Must match the llama-server context window |
| `temperature` | `0.3` | Low temperature for deterministic tool calls |
| `streaming` | `true` | Enable SSE streaming for real-time output |
| Field | Value | Purpose |
|---|---|---|
| `max_tokens` | `32768` | Max output tokens |
| `max_input_tokens` | `32768` | Max input context |
| `max_output_tokens` | `32768` | Max generation length |
| `input_cost_per_token` | `0` | Free (local inference) |
| `output_cost_per_token` | `0` | Free (local inference) |
| `litellm_provider` | `openai` | OpenAI-compatible API protocol |
| `mode` | `chat` | Chat completion mode |
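With both files in place, Aider treats the proxy as an OpenAI endpoint. A typical invocation, assuming the default proxy port (the flags and environment variables are standard Aider/OpenAI-client conventions, not ATLAS-specific):

```bash
export OPENAI_API_BASE=http://localhost:8090/v1
export OPENAI_API_KEY=local   # placeholder; local inference shouldn't need a real key
aider --model openai/atlas
```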
For K3s deployment only. Copy `atlas.conf.example` to `atlas.conf` and edit:
```bash
# Model
ATLAS_MODELS_DIR="$HOME/models"
ATLAS_MAIN_MODEL="Qwen3.5-9B-Q6_K.gguf"

# Inference
ATLAS_CONTEXT_LENGTH=40960   # Per-slot context (× PARALLEL_SLOTS)
ATLAS_GPU_LAYERS=99
ATLAS_PARALLEL_SLOTS=4
ATLAS_FLASH_ATTENTION=true

# Service NodePorts
ATLAS_LLAMA_NODEPORT=32735
ATLAS_LENS_NODEPORT=31144
ATLAS_SANDBOX_NODEPORT=30820

# Geometric Lens
ATLAS_ENABLE_LENS=true
ATLAS_LENS_CONTEXT_BUDGET=6000

# K3s
ATLAS_NAMESPACE=atlas
```
See `atlas.conf.example` for the full documented template with all available options. K3s manifests are generated from these values by `scripts/generate-manifests.sh`.
Note: `atlas.conf` is only used by the K3s deployment scripts. Docker Compose uses `.env` instead. The two files configure different deployment targets and should not be mixed.