This map lists every file in the repository. Click any directory in the tree to jump to its description table.
- `.env.example` — Docker Compose environment template
- `.gitignore` — Git ignore rules
- `atlas.conf.example` — K3s deployment configuration template
- `docker-compose.yml` — 5-service Docker Compose stack
- `pyproject.toml` — Python package definition (`atlas` CLI entry point)
- `LICENSE` — GNU Affero General Public License v3.0 (AGPL-3.0)
- `README.md` — Project overview, benchmarks, setup
- `CHANGELOG.md` — Release history
- `CODE_OF_CONDUCT.md` — Community guidelines
- `CONTRIBUTING.md` — Contributor guide
- `proxy/` — Go proxy: agent loop, grammar, tool calls
  - `main.go` — HTTP server, chat handler, verify-repair, tier classification
  - `agent.go` — Agent loop, LLM dispatch, exploration budget, error recovery
  - `tools.go` — 8 tool definitions + executors, tier classifier
  - `grammar.go` — JSON schema + GBNF grammar generation
  - `types.go` — Shared types: ToolCall, AgentContext, tiers
  - `v3_bridge.go` — Go-to-Python V3 service SSE bridge
  - `v3_adapter.go` — File requests to V3 pipeline format
  - `build_verify.go` — Per-language build verification commands
  - `project.go` — Language/framework detection
  - `permissions.go` — Permission rules and deny patterns
  - `parallel.go` — `plan_tasks` executor with dependency graph
  - `go.mod` — Go module definition
  - `Dockerfile` — Multi-stage Go build
  - `README.md` — Proxy documentation
  - `atlas-proxy-v2` — Compiled Go binary (gitignored in production)
- `tui/` — Bubbletea TUI client (Go) — PC-062
  - `main.go` — Entry point + Bubbletea program setup
  - `model.go` — Bubbletea model: events, chat, textarea, hotkeys
  - `panes.go` — Pure pane renderers (pipeline / chat / events / stats / input)
  - `state.go` — Pipeline state machine (Envelope → derived UI state)
  - `consumer.go` — `/events` SSE consumer (typed Envelope stream)
  - `chat.go` — `/v1/agent` POST + SSE chat client; `/cancel` POST
  - `commands.go` — Slash command dispatch (/add, /diff, /commit, /run, etc.)
  - `*_test.go` — 39 unit + integration tests
  - `go.mod` — Go module (github.com/itigges22/atlas-tui)
- `atlas/` — Python CLI package
- `benchmark/` — Benchmark runner and datasets
- `geometric-lens/` — Scoring, RAG, routing, pattern cache
- `v3-service/` — V3 pipeline HTTP wrapper
  - `main.py` — HTTP server, pipeline orchestrator, LLM/Lens/Sandbox adapters
  - `Dockerfile` — Container build (CPU PyTorch, port 8070)
- `sandbox/` — Isolated code execution
  - `executor_server.py` — FastAPI server, 8 language executors, linting, error classification
  - `Dockerfile` — Container build (Python, Node, Go, Rust, gcc)
- `inference/` — llama-server configuration
- `scripts/` — Build, deploy, and training automation
- `tests/` — Test suite
  - `validate_tests.py` — Test runner entry point
  - `conftest.py` — Pytest shared fixtures
  - `infrastructure/`
  - `integration/`
  - `v3/` — V3 module unit tests (23 files): test_plan_search.py, test_div_sampling.py, test_budget_forcing.py, test_blend_asc.py, test_reasc.py, test_s_star.py, test_candidate_selection.py, test_failure_analysis.py, test_constraint_refinement.py, test_pr_cot.py, test_derivation_chains.py, test_refinement_loop.py, test_metacognitive.py, test_ace_pipeline.py, test_self_test_gen.py, test_lens_feedback.py, test_embedding_store.py, test_ablation_analysis.py, test_ewc.py, test_replay_buffer.py, test_enhanced_retrain.py, test_phase4_validation.py, test_sandbox_adapter.py
- `docs/` — Documentation
  - `ARCHITECTURE.md` — Two-layer architecture, component diagrams, data flow
  - `API.md` — HTTP API reference for all 5 services
  - `CLI.md` — CLI usage, streaming output, troubleshooting
  - `CONFIGURATION.md` — All environment variables and settings
  - `MAP.md` — This file
  - `SETUP.md` — Installation guide (Docker, bare-metal, K3s)
  - `TROUBLESHOOTING.md` — Common issues and solutions
  - `reports/` — Ablation studies, status tracking, migration guides
  - `images/banner.png` — README banner image
  - `reports/ablation/` — Published ablation data
    - `README.md` — Data format documentation
    - `config.json` — Ablation run configuration
    - `preflight.json` — Pre-run system checks
    - `condition_a_baseline/` — Baseline (54.9%, 599 tasks)
    - `condition_b_phase1/` — +Phase 1 (67.3%, 599 tasks)
    - `condition_c_phase1_2/` — +Phase 1+2 (67.3%, 599 tasks)
    - `condition_d_phase1_3/` — +Phase 1+3 (74.6%, 599 tasks)

Each condition contains summary.json, v3_lcb/results.json, and v3_lcb/per_task/ (599 per-task JSON files).
| File | Description |
|---|---|
| `.env.example` | Docker Compose env template: model path, ports (8080/8099/8070/30820/8090), context size |
| `atlas.conf.example` | K3s deployment config: model, GPU layers, parallel slots, NodePorts, namespace |
| `docker-compose.yml` | 5-service stack: llama-server, geometric-lens, v3-service, sandbox, atlas-proxy |
| `pyproject.toml` | Python package: `atlas` CLI entry point (`atlas.cli.repl:run`), requires Python >= 3.9 |
| `.gitignore` | Ignores: model weights, `__pycache__`, logs, `.env`, build artifacts |
| File | Description |
|---|---|
| `README.md` | Project overview, 74.6% LCB benchmark, setup instructions, hardware requirements |
| `CHANGELOG.md` | Release history: V3.0.1 (2026-04-05), V3.0, V2.5, V2 |
| `LICENSE` | GNU Affero General Public License v3.0 (AGPL-3.0) |
| `CODE_OF_CONDUCT.md` | Contributor Covenant Code of Conduct |
| `CONTRIBUTING.md` | How to contribute: fork, branch, test, PR workflow |
## proxy/ — Go Proxy

The core of the V3.0.1 CLI. It hosts `/v1/agent` (the structured agent endpoint the TUI drives), runs a grammar-constrained agent loop with 8 tools, and routes complex files through the V3 pipeline. `/v1/chat/completions` is a transparent passthrough to llama-server for OpenAI-compatible clients.
| File | Lines | Description |
|---|---|---|
| `main.go` | ~1600 | HTTP server, route registration, verify-repair pipeline scaffolding, format normalization, helpers shared with the agent loop |
| `agent.go` | 740 | Agent loop iteration, JSON schema generation, system prompt building, LLM calls with grammar constraint, exploration budget, truncation recovery, `/v1/agent` + `/cancel` handlers |
| `tools.go` | 905 | 8 tool definitions (read/write/edit/delete file, run command, search, list dir, plan tasks), per-file tier classifier, V3 routing |
| `grammar.go` | 192 | JSON schema (oneOf: tool_call/text/done) and GBNF grammar for constrained output, tool documentation generation |
| `types.go` | 390 | AgentContext, ToolDef, ToolResult, tier definitions (T0-T3), max turns per tier, permission types |
| `v3_bridge.go` | 120 | HTTP bridge to the Python V3 service with SSE progress streaming, Lens scoring bridge |
| `v3_adapter.go` | 177 | Translates file write requests into V3GenerateRequest with project context, framework detection, constraint extraction |
| `build_verify.go` | 157 | Per-file-type verification: tsc, py_compile, go build, cargo check, gcc, bash -n. Framework-specific overrides |
| `project.go` | 226 | Detects language (Node/Python/Rust/Go/C/Shell), framework (Next.js/Flask/Express), build/dev/test commands |
| `permissions.go` | 150 | Allow/deny rules, dangerous pattern detection (rm -rf, .env, credentials), mode-based access |
| `parallel.go` | 213 | `plan_tasks` executor: topological sort, concurrent sub-task execution (15-turn budget each) |
| `go.mod` | — | Go module definition |
| `Dockerfile` | — | Multi-stage Go build for containerized deployment |
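The `plan_tasks` scheduling in parallel.go is Go code; the batching idea it implements — topologically sort the dependency graph, then run each batch of ready tasks concurrently — can be sketched in Python. Task names and input shapes here are illustrative, not the proxy's actual types:

```python
from collections import deque

def topo_batches(tasks: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into batches via Kahn's algorithm; every task in a
    batch has all dependencies satisfied, so a batch can run concurrently."""
    indegree = {t: len(deps) for t, deps in tasks.items()}
    dependents: dict[str, list[str]] = {t: [] for t in tasks}
    for t, deps in tasks.items():
        for d in deps:
            dependents[d].append(t)
    ready = deque(t for t, n in indegree.items() if n == 0)
    batches = []
    while ready:
        batch = sorted(ready)   # deterministic order for the example
        ready.clear()
        for t in batch:
            for dep in dependents[t]:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    ready.append(dep)
        batches.append(batch)
    if sum(len(b) for b in batches) != len(tasks):
        raise ValueError("dependency cycle detected")
    return batches

# b and c depend on a; d depends on both -> three sequential batches
print(topo_batches({"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}))
# [['a'], ['b', 'c'], ['d']]
```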
## tui/ — Bubbletea TUI Client (Go)

Native terminal UI that consumes both atlas-proxy SSE streams: `/events` for typed envelopes and `/v1/agent` for chat. The canonical chat front-end (PC-062).

| File | Description |
|---|---|
| `main.go` | Entry point. Parses `--proxy`, spawns the SSE consumer goroutine, runs the Bubbletea program in alt-screen mode. |
| `model.go` | Bubbletea model — owns the Envelope channel, chat history, textarea input. Hotkeys: Enter/Ctrl+L/Ctrl+T/Ctrl+R/Ctrl+C. Spinner tick. |
| `panes.go` | Pure pane renderers: pipeline (stage table), chat (markdown via glamour), events (log), stats (counter strip), input (textarea wrapper). |
| `state.go` | Pipeline state machine — pure function from an Envelope sequence to derived UI state (stages, counters, active turn, done). |
| `consumer.go` | `/events` SSE consumer mirroring atlas-proxy's Envelope struct. Reconnects with exponential backoff. |
| `chat.go` | `/v1/agent` POST + chat-protocol SSE parser. `/cancel` POST for explicit turn abort. Optional bearer auth from `secrets/api-keys.json`. |
| `commands.go` | Slash-command dispatch: /add /drop /context /diff /commit /undo /run /help /quit. Shell-out commands via `exec.CommandContext` with a 60s deadline. |
| `*_test.go` | 39 tests covering the state machine (12), slash commands (8), model integration (11), chat client + bearer loader (8). |
| `go.mod` | Go module definition (github.com/itigges22/atlas-tui). Deps: bubbletea, lipgloss, bubbles, glamour. |
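Both TUI clients parse standard Server-Sent Events framing: a blank line dispatches an event, and `event:`/`data:` fields accumulate until then. A minimal Python sketch of that framing, independent of the Go implementation (the sample payload is invented):

```python
def parse_sse(stream: str) -> list[dict]:
    """Split raw SSE text into events. Each event carries an optional
    `event:` type (default "message") and one or more `data:` lines."""
    events, current = [], {"event": "message", "data": []}
    for line in stream.splitlines():
        if line == "":                      # blank line dispatches the event
            if current["data"]:
                events.append({"event": current["event"],
                               "data": "\n".join(current["data"])})
            current = {"event": "message", "data": []}
        elif line.startswith("event:"):
            current["event"] = line[len("event:"):].strip()
        elif line.startswith("data:"):
            current["data"].append(line[len("data:"):].strip())
        # comment lines (":") and unknown fields are ignored per the SSE spec
    return events

raw = 'event: envelope\ndata: {"stage": "generate"}\n\ndata: done\n\n'
print(parse_sse(raw))
# [{'event': 'envelope', 'data': '{"stage": "generate"}'},
#  {'event': 'message', 'data': 'done'}]
```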
## atlas/ — Python CLI Package

Standalone REPL for direct interaction with ATLAS services. Used for pipe mode (`echo ... | atlas`) and as a fallback when the TUI can't run.
| File | Description |
|---|---|
| `cli/repl.py` | Main entry point (`atlas` command). Interactive REPL with /solve, /bench, /status, /help. Pipe-mode support. |
| `cli/client.py` | HTTP client for llama-server, Geometric Lens, and sandbox. Health checks, generation (batch + streaming), scoring, sandbox execution. |
| `cli/display.py` | Terminal formatting: banner, colors, status blocks, prompts, separators |
| `cli/commands/solve.py` | /solve: generate code from the LLM, extract from think blocks, score via Lens, test via sandbox |
| `cli/commands/bench.py` | /bench: delegates to `benchmark.v3_runner` with dataset/strategy/task-count args |
| `cli/commands/status.py` | /status: check health of llama-server, Lens, sandbox |
## benchmark/ — Benchmark Infrastructure

Runner infrastructure for evaluating LLM code generation across multiple datasets.

| File | Description |
|---|---|
| `runner.py` | Core execution: function mode + stdio mode, LLM API calls, ChatML formatting, code extraction |
| `v2_runner.py` | V2 benchmark runner: phases 0-6, telemetry, Mode A/B, crash recovery |
| `v3_runner.py` | V3 benchmark runner: full pipeline with ablation conditions A-F |
| `v2_report.py` | Markdown report generator from benchmark results |
| `cli.py` | CLI entry point: `atlas benchmark --humaneval --dry-run`, etc. |
| `config.py` | BenchmarkConfig loaded from `atlas.conf` |
| `models.py` | Data models: BenchmarkTask, AttemptResult, TaskResult, BenchmarkRun |
| `best_of_k.py` | Best-of-K candidate evaluation with scoring |
| `geo_learning.py` | Geometric learning integration for benchmarks |
## benchmark/datasets/ — Dataset Loaders

Each loader downloads from HuggingFace (JSON rows API, no pyarrow) and normalizes to the BenchmarkTask format.

| File | Tasks | Eval Mode | Description |
|---|---|---|---|
| `base.py` | — | — | Abstract BaseDataset class with download, parse, validate |
| `humaneval.py` | 164 | function | HumanEval function completion |
| `mbpp.py` | 500 | function | MBPP with 3-shot [BEGIN]/[DONE] format |
| `evalplus_humaneval.py` | 164 | function | HumanEval+ (EvalPlus augmented tests) |
| `evalplus_mbpp.py` | 500 | function | MBPP+ (EvalPlus augmented tests) |
| `livecodebench.py` | 599 | stdio | LiveCodeBench v5 from the bzantium mirror |
| `gpqa.py` | 198 | mcq | GPQA Diamond from OpenAI blob CSV |
| `ifbench.py` | 300 | ifbench | IFBench instruction-following with loose eval |
| `scicode.py` | ~80 | function | SciCode cross-domain scientific coding |
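To make the normalization step concrete, here is a minimal sketch of what a shared task record and one loader's mapping might look like. The field names and the row shape are hypothetical, not the actual `models.py` definition:

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkTask:
    """Normalized task record (illustrative fields only)."""
    task_id: str
    prompt: str
    eval_mode: str                              # "function" | "stdio" | "mcq" | "ifbench"
    tests: list[str] = field(default_factory=list)
    dataset: str = ""

def normalize_humaneval(row: dict) -> BenchmarkTask:
    # Map a raw HumanEval-style row onto the shared schema.
    return BenchmarkTask(task_id=row["task_id"], prompt=row["prompt"],
                         eval_mode="function", tests=[row["test"]],
                         dataset="humaneval")

task = normalize_humaneval({"task_id": "HumanEval/0",
                            "prompt": "def f():\n    ...",
                            "test": "assert f() is None"})
print(task.eval_mode)  # function
```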
## benchmark/analysis/ — Analysis Utilities

## benchmark/custom/ — Custom Tasks

## benchmark/v3/ — V3 Pipeline Modules

19 Python modules implementing the V3 code generation pipeline. Each module follows a Config + Event + Controller pattern.
| Module | Phase | Description |
|---|---|---|
| `plan_search.py` | 1A | 3-step pipeline: extract constraints -> construct plans -> generate code. 3 plans by default, max 7. |
| `div_sampling.py` | 1B | 12 perturbations: 4 roles + 4 instructions + 4 styles. Modular selection by candidate index. |
| `budget_forcing.py` | 1C | 5 tiers (nothink/light/standard/hard/extreme). Wait injection on premature thinking termination. Energy-to-tier sigmoid mapping. |
| `blend_asc.py` | 2A | Adaptive K from C(x) energy: 4 bands mapping energy to k=1-12 and a budget tier. |
| `reasc.py` | 2B | Early stopping: energy < 0.10 AND bottom-10% logprob confidence > -0.5. |
| `s_star.py` | 2C | Tiebreaking: generate edge-case inputs where candidates differ, sandbox both, majority wins. |
| `candidate_selection.py` | — | 4 strategies: lens (min energy), random, logprob (max mean), oracle (first pass). |
| `failure_analysis.py` | 3A | Categorize failures: wrong_algorithm, implementation_bug, edge_case_miss, time_limit, format_error, partial_correct. |
| `constraint_refinement.py` | 3B | Generate refined hypotheses from failure analysis. Cosine distance >= 0.15 prevents repetition. |
| `pr_cot.py` | 3C | 4 perspectives (logical_consistency, information_completeness, biases, alternative_solutions) x (analysis + repair) = 8 LLM calls. |
| `derivation_chains.py` | 3D | Decompose into <= 5 sub-problems, sandbox-verify each, compose the final solution. 7+ LLM calls. |
| `refinement_loop.py` | 3E | Orchestrator: FailureAnalysis -> ConstraintRefiner -> CodeGen -> Test -> Learn. 2 iterations, 120s budget. |
| `metacognitive.py` | 3F | Model failure-pattern library with frequency tracking, compensation injection, effectiveness monitoring. |
| `ace_pipeline.py` | 3G | Evolving playbooks: Generator-Reflector-Curator pipeline with confidence decay. |
| `self_test_gen.py` | util | Generates test cases from the problem description. Multiple parsing fallbacks. 50% majority threshold. |
| `lens_feedback.py` | util | Online Lens recalibration: collect pass/fail embeddings, trigger retrain at 50-sample intervals. |
| `embedding_store.py` | util | Binary append-only embedding storage: task_id + candidate_index + label + 4096-dim float32 vector. |
| `ablation_analysis.py` | util | Bootstrap significance tests, pass-rate computation across ablation conditions. |
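As a concrete illustration of how Phase 2 couples these modules: `blend_asc` maps Lens energy C(x) onto a candidate budget k, and `reasc` stops sampling early when a candidate already looks safe. The 0.10 energy and -0.5 logprob thresholds come from the table above; the band edges and k values in `adaptive_k` are invented for the sketch, not the shipped constants:

```python
def adaptive_k(energy: float) -> int:
    """Map C(x) energy to a candidate count k in 1..12 via 4 bands.
    Band edges here are illustrative only."""
    bands = [(0.10, 1), (0.30, 4), (0.60, 8)]   # (upper edge, k)
    for edge, k in bands:
        if energy < edge:
            return k
    return 12  # hardest band

def should_stop_early(energy: float, bottom10_logprob: float) -> bool:
    """ReASC early stop: low energy AND confident bottom-decile logprobs."""
    return energy < 0.10 and bottom10_logprob > -0.5

print(adaptive_k(0.05), should_stop_early(0.05, -0.2))   # 1 True
print(adaptive_k(0.75), should_stop_early(0.75, -0.2))   # 12 False
```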
## geometric-lens/ — Core Service

| File | Description |
|---|---|
| `main.py` | FastAPI server: 26 endpoints for scoring, indexing, routing, caching, pattern management |
| `pipeline.py` | RAG orchestrator: retrieve chunks + patterns -> collect signals -> estimate difficulty -> route -> generate -> verify |
| `config.py` | ServerConfig (port 8001), Redis URL, API keys, YAML config loading |
| `storage.py` | ProjectMetadata CRUD for indexed projects |
| `verify_loop.py` | Verify-repair loop with retry and escalation |
| `sandbox_client.py` | HTTP client for sandbox code execution |
| `sandbox_analysis.py` | Classifies sandbox execution results |
| `requirements.txt` | Dependencies: FastAPI, uvicorn, torch (CPU), pydantic, redis, tree-sitter |
| `Dockerfile` | Python 3.11-slim, CPU PyTorch, port 8099 |
## geometric-lens/geometric_lens/ — Scoring Models

| File | Description |
|---|---|
| `cost_field.py` | C(x): 4096->512->128->1 MLP (SiLU + Softplus). 2.16M params. Contrastive ranking loss. |
| `metric_tensor.py` | G(x): PCA (4096->128) + diagonal metric tensor + input-dependent modulation. Code exists, not deployed. |
| `service.py` | Service layer: lazy model loading, evaluate_combined() (single embedding for C(x)+G(x)), verdict thresholds, hot reload |
| `training.py` | train_cost_field() (200 epochs), retrain_cost_field_bce() (production retrain with class weights, early stopping) |
| `embedding_extractor.py` | Calls llama-server POST /v1/embeddings, handles pooled and per-token responses, mean pooling |
| `ewc.py` | Elastic Weight Consolidation: Fisher Information Matrix, penalty term; prevents catastrophic forgetting |
| `correction.py` | Natural-gradient correction: -alpha * G_inv * grad_C. PCA projection/unprojection. Correctability score. |
| `replay_buffer.py` | Domain-stratified reservoir sampling. 30% old / 70% new training mix. JSON persistence. |
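The 2.16M parameter figure for C(x) follows directly from the layer widths; a quick sanity computation (dense layers with weights and biases; SiLU and Softplus add no parameters):

```python
def mlp_params(sizes: list[int]) -> int:
    """Parameter count of a plain MLP: weights (in*out) + biases (out) per layer."""
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))

# C(x) head: 4096 -> 512 -> 128 -> 1
n = mlp_params([4096, 512, 128, 1])
print(n, f"{n / 1e6:.2f}M")  # 2163457 2.16M
```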
## geometric-lens/indexer/ — RAG Indexing

| File | Description |
|---|---|
| `ast_parser.py` | tree-sitter Python AST parsing: classes, functions, imports, top-level blocks. Fallback regex parser. |
| `tree_builder.py` | Builds a hierarchical TreeIndex from parsed files. Supports incremental updates. |
| `bm25_index.py` | Inverted index with BM25 scoring (k1=1.5, b=0.75). CamelCase/snake_case tokenization. |
| `summarizer.py` | LLM-generated summaries for tree nodes. |
| `persistence.py` | Save/load TreeIndex + BM25Index as JSON on disk. |
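bm25_index.py uses the standard Okapi BM25 weighting with k1=1.5 and b=0.75. A self-contained sketch of that scoring formula over a toy pre-tokenized corpus (not the repo's index structure):

```python
import math

def bm25_score(query: list[str], doc: list[str], corpus: list[list[str]],
               k1: float = 1.5, b: float = 0.75) -> float:
    """Okapi BM25 score of `doc` for `query` over a tokenized `corpus`."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)      # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # non-negative variant
        tf = doc.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["build", "tree", "index"], ["bm25", "index", "search"], ["parse", "ast"]]
# The doc matching both query terms outranks the doc matching one.
print(bm25_score(["index", "search"], docs[1], docs) >
      bm25_score(["index", "search"], docs[0], docs))  # True
```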
## geometric-lens/retriever/ — RAG Retrieval

| File | Description |
|---|---|
| `bm25_search.py` | BM25 keyword search: min_score=0.1, top_k=20. Strong-match detection (threshold=3.0). |
| `tree_search.py` | LLM-guided tree traversal: max_depth=6, max_reasoning_calls=40. Scores children 0-10. |
| `hybrid.py` | Routes between bm25_first, tree_only, and both strategies. Deduplication + score normalization. |
## geometric-lens/router/ — Confidence Router

| File | Description |
|---|---|
| `route_selector.py` | Thompson Sampling with Beta(alpha, beta) posteriors. 4 routes: CACHE_HIT(1) -> FAST_PATH(50) -> STANDARD(300) -> HARD_PATH(1500). |
| `difficulty_estimator.py` | Weighted fusion of 4 signals -> D(x). Adjusts weights when the Geometric Lens is available. |
| `signal_collector.py` | Collects: pattern_cache_score, retrieval_confidence, query_complexity, geometric_energy, gx_score. |
| `feedback_recorder.py` | Records route outcomes to Redis for Thompson Sampling posterior updates. |
| `fallback_chain.py` | Retry escalation: CACHE_HIT -> FAST_PATH -> STANDARD -> HARD_PATH -> terminal. |
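Thompson Sampling here means: draw one sample from each route's Beta posterior and pick the argmax; a success increments alpha, a failure increments beta. A minimal sketch with made-up counts (the real selector also weighs route cost):

```python
import random

ROUTES = ["CACHE_HIT", "FAST_PATH", "STANDARD", "HARD_PATH"]

def pick_route(posteriors: dict[str, tuple[int, int]],
               rng: random.Random) -> str:
    """Thompson Sampling: sample each Beta(alpha, beta), take the max draw."""
    draws = {r: rng.betavariate(a, b) for r, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

def record_outcome(posteriors: dict[str, tuple[int, int]],
                   route: str, success: bool) -> None:
    """Success bumps alpha, failure bumps beta (the posterior update)."""
    a, b = posteriors[route]
    posteriors[route] = (a + 1, b) if success else (a, b + 1)

rng = random.Random(0)
post = {r: (1, 1) for r in ROUTES}        # uninformative Beta(1,1) priors
record_outcome(post, "FAST_PATH", True)   # one observed FAST_PATH success
picks = [pick_route(post, rng) for _ in range(1000)]
# FAST_PATH's slightly shifted posterior should win most often
print(picks.count("FAST_PATH") > picks.count("HARD_PATH"))
```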
## geometric-lens/cache/ — Pattern Cache

| File | Description |
|---|---|
| `pattern_store.py` | Redis-backed storage: STM (100 max), LTM, PERSISTENT tiers. Sorted-set management. |
| `pattern_matcher.py` | BM25 index over pattern summaries. Normalized [0,1] similarity scores. |
| `pattern_extractor.py` | LLM-driven extraction of reusable patterns from successful task solutions. |
| `pattern_scorer.py` | Ebbinghaus decay: recency-weighted composite score for STM/LTM promotion. |
| `co_occurrence.py` | Tracks patterns used together. Graph traversal for linked-pattern retrieval. |
| `consolidator.py` | Category surprise tracking for pattern-novelty assessment. |
| `seed_patterns.py` | Bootstrap patterns for initial cache population. |
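Ebbinghaus-style scoring weights a pattern's usage history by an exponential forgetting curve, so recent uses count more toward STM-to-LTM promotion. A sketch; the 24h half-life and the promotion threshold below are invented, not the shipped values:

```python
import math

def decayed_score(base: float, hours_since_use: float,
                  half_life_h: float = 24.0) -> float:
    """Exponential forgetting curve: score halves every `half_life_h` hours."""
    return base * math.exp(-math.log(2) * hours_since_use / half_life_h)

def promote_to_ltm(use_ages_h: list[float], threshold: float = 1.5) -> bool:
    """Promote STM -> LTM when summed recency-weighted usage clears a threshold."""
    return sum(decayed_score(1.0, h) for h in use_ages_h) >= threshold

print(round(decayed_score(1.0, 24.0), 3))   # 0.5
print(promote_to_ltm([0.0, 1.0, 48.0]))     # True — recent uses dominate
```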
## v3-service/ — V3 Pipeline HTTP Wrapper

| File | Description |
|---|---|
| `main.py` | HTTP server (port 8070). Pipeline orchestrator: Phase 0 (probe) -> Phase 2 (allocate K) -> Phase 1 (generate) -> Selection -> Phase 3 (repair). LLMAdapter, EmbedAdapter, SandboxAdapter, BuildVerifier. Imports all 19 V3 modules. |
| `Dockerfile` | Python 3.11, CPU PyTorch; copies `benchmark/` for V3 module access. Port 8070. |
## sandbox/ — Isolated Code Execution

| File | Description |
|---|---|
| `executor_server.py` | FastAPI server (port 8020). 8 language executors with compilation, pytest/pylint for Python, syntax checking, error classification (15 types), output truncation. |
| `Dockerfile` | Python 3.11-slim + Node.js 20 + Go 1.22 + Rust stable + gcc/g++. tmpfs workspace, read-only root. |
## inference/ — llama-server Configuration

## tests/ — Test Suite

| File | Description |
|---|---|
| `validate_tests.py` | Test runner entry point |
| `conftest.py` | Pytest shared fixtures |
| `infrastructure/test_llm.py` | llama-server health and generation tests |
| `infrastructure/test_sandbox.py` | Sandbox execution tests |
| `integration/test_e2e_flow.py` | End-to-end pipeline flow test |
| `integration/test_e2e_training.py` | End-to-end Lens training test |
| `v3/` | 23 unit test files covering the V3 modules: test_plan_search.py, test_div_sampling.py, test_budget_forcing.py, test_blend_asc.py, test_reasc.py, test_s_star.py, test_candidate_selection.py, test_failure_analysis.py, test_constraint_refinement.py, test_pr_cot.py, test_derivation_chains.py, test_refinement_loop.py, test_metacognitive.py, test_ace_pipeline.py, test_self_test_gen.py, test_lens_feedback.py, test_embedding_store.py, test_ablation_analysis.py, test_ewc.py, test_replay_buffer.py, test_enhanced_retrain.py, test_phase4_validation.py, test_sandbox_adapter.py |
## docs/ — Documentation

| File | Description |
|---|---|
| `ARCHITECTURE.md` | Two-layer architecture with 13 Mermaid diagrams, component breakdowns, sequence diagrams |
| `API.md` | HTTP API reference: all endpoints for all 5 services, request/response formats |
| `CLI.md` | CLI usage, streaming output format, workflow examples, troubleshooting |
| `CONFIGURATION.md` | Every environment variable across all services, internal constants, K3s config |
| `MAP.md` | This file — repository file map |
| `SETUP.md` | Installation: Docker Compose, bare-metal, K3s |
| `TROUBLESHOOTING.md` | Common issues and solutions |
## docs/reports/ — Studies, Status, Migration

## docs/reports/ablation/ — Published Evidence

Per-task pass/fail data for all V3 ablation conditions: 2,396 task results across 4 conditions. See the README for the data format.
| Condition | Directory | Pass@1 | Tasks |
|---|---|---|---|
| A (baseline) | `condition_a_baseline/` | 54.9% | 599 |
| B (+Phase 1) | `condition_b_phase1/` | 67.3% | 599 |
| C (+Phase 1+2) | `condition_c_phase1_2/` | 67.3% | 599 |
| D (+Phase 1+3) | `condition_d_phase1_3/` | 74.6% | 599 |
Each condition contains summary.json, v3_lcb/results.json, and 599 per-task JSON files in v3_lcb/per_task/.
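Since the published rates are rounded to one decimal, per-condition pass counts can only be recovered approximately from this table; a quick back-of-envelope check against the 599-task runs:

```python
# Pass@1 rates from the ablation table; 599 LiveCodeBench tasks per condition.
conditions = {"A": 0.549, "B": 0.673, "C": 0.673, "D": 0.746}
TASKS = 599

for name, rate in conditions.items():
    # Approximate passed-task count (rates are rounded, so +/- 1 task)
    print(name, round(rate * TASKS))

# Headline improvement, baseline -> full pipeline, in percentage points
print(round((conditions["D"] - conditions["A"]) * 100, 1))  # 19.7
```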