
# ATLAS Repository Map

Every file in the repository. Click any directory in the tree to jump to its description table.


## File Tree


## Description Tables

### Root — Configuration

| File | Description |
| --- | --- |
| .env.example | Docker Compose env template: model path, ports (8080/8099/8070/30820/8090), context size |
| atlas.conf.example | K3s deployment config: model, GPU layers, parallel slots, NodePorts, namespace |
| docker-compose.yml | 5-service stack: llama-server, geometric-lens, v3-service, sandbox, atlas-proxy |
| pyproject.toml | Python package: atlas CLI entry point (atlas.cli.repl:run), requires Python >= 3.9 |
| .gitignore | Ignores: model weights, pycache, logs, .env, build artifacts |

### Root — Documentation

| File | Description |
| --- | --- |
| README.md | Project overview, 74.6% LCB benchmark, setup instructions, hardware requirements |
| CHANGELOG.md | Release history: V3.0.1 (2026-04-05), V3.0, V2.5, V2 |
| LICENSE | GNU Affero General Public License v3.0 (AGPL-3.0) |
| CODE_OF_CONDUCT.md | Contributor Covenant Code of Conduct |
| CONTRIBUTING.md | How to contribute: fork, branch, test, PR workflow |

### proxy/ — Agent Loop (Go)

The core of the V3.0.1 CLI. Hosts /v1/agent (the structured agent endpoint the TUI drives), runs a grammar-constrained agent loop with 8 tools, and routes complex files through the V3 pipeline. /v1/chat/completions is a transparent passthrough to llama-server for OpenAI-compat clients.

| File | Lines | Description |
| --- | --- | --- |
| main.go | ~1600 | HTTP server, route registration, verify-repair pipeline scaffolding, format normalization, helpers shared with the agent loop |
| agent.go | 740 | Agent loop iteration, JSON schema generation, system prompt building, LLM calls with grammar constraint, exploration budget, truncation recovery, /v1/agent + /cancel handlers |
| tools.go | 905 | 8 tool definitions (read/write/edit/delete file, run command, search, list dir, plan tasks), per-file tier classifier, V3 routing |
| grammar.go | 192 | JSON schema (oneOf: tool_call/text/done) and GBNF grammar for constrained output, tool documentation generation |
| types.go | 390 | AgentContext, ToolDef, ToolResult, tier definitions (T0-T3), max turns per tier, permission types |
| v3_bridge.go | 120 | HTTP bridge to Python V3 service with SSE progress streaming, Lens scoring bridge |
| v3_adapter.go | 177 | Translates file write requests into V3GenerateRequest with project context, framework detection, constraint extraction |
| build_verify.go | 157 | Per-file-type verification: tsc, py_compile, go build, cargo check, gcc, bash -n. Framework-specific overrides |
| project.go | 226 | Detects language (Node/Python/Rust/Go/C/Shell), framework (Next.js/Flask/Express), build/dev/test commands |
| permissions.go | 150 | Allow/deny rules, dangerous pattern detection (rm -rf, .env, credentials), mode-based access |
| parallel.go | 213 | plan_tasks executor: topological sort, concurrent sub-task execution (15-turn budget each) |
| go.mod | | Go module definition |
| Dockerfile | | Multi-stage Go build for containerized deployment |
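permissions.go implements the deny rules in Go; the idea is easy to illustrate. A minimal Python sketch, assuming hypothetical regex patterns for the three categories the table names (destructive commands, env files, credential material) — the real rule set and mode-based access logic live in the Go source:

```python
import re

# Illustrative deny patterns; the actual rules in permissions.go may differ.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-\w*(?:rf|fr)\w*\b"),            # destructive: rm -rf variants
    re.compile(r"(?:^|[\s/])\.env\b"),                   # env files: .env, .env.local, dir/.env
    re.compile(r"credentials|id_rsa|\.pem\b", re.I),     # credential-like names
]

def is_dangerous(target: str) -> bool:
    """Return True if a command or path matches any deny pattern."""
    return any(p.search(target) for p in DANGEROUS_PATTERNS)
```

A tool call whose argument trips `is_dangerous` would be rejected (or escalated for confirmation) before execution.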

### tui/ — Bubbletea TUI Client (Go)

Native terminal UI that consumes both atlas-proxy SSE streams (/events for typed envelopes, /v1/agent for chat). The canonical chat front-end. PC-062.

| File | Description |
| --- | --- |
| main.go | Entry point. Parses --proxy, spawns SSE consumer goroutine, runs Bubbletea program in alt-screen mode. |
| model.go | Bubbletea model — owns Envelope channel, chat history, textarea input. Hotkeys: Enter/Ctrl+L/Ctrl+T/Ctrl+R/Ctrl+C. Spinner tick. |
| panes.go | Pure pane renderers: pipeline (stage table), chat (markdown via glamour), events (log), stats (counter strip), input (textarea wrapper). |
| state.go | Pipeline state machine — pure function from Envelope sequence to derived UI state (stages, counters, active turn, done). |
| consumer.go | /events SSE consumer mirroring atlas-proxy's Envelope struct. Reconnect with exponential backoff. |
| chat.go | /v1/agent POST + chat-protocol SSE parser. /cancel POST for explicit turn abort. Optional bearer auth from secrets/api-keys.json. |
| commands.go | Slash-command dispatch: /add /drop /context /diff /commit /undo /run /help /quit. Shell-out commands via exec.CommandContext with 60s deadline. |
| `*_test.go` | 39 tests covering state machine (12), slash commands (8), model integration (11), chat client + bearer loader (8). |
| go.mod | Go module definition (github.com/itigges22/atlas-tui). Deps: bubbletea, lipgloss, bubbles, glamour. |
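consumer.go reconnects with exponential backoff. A minimal sketch of the schedule, assuming illustrative base/cap/jitter values (the Go client's actual constants are not documented here):

```python
import random

def backoff_delays(base=0.5, cap=30.0, factor=2.0, n=6):
    """Yield capped exponential reconnect delays with jitter, as an SSE
    consumer might sleep between failed connection attempts."""
    delay = base
    for _ in range(n):
        # Jitter (50-100% of the nominal delay) avoids synchronized retries.
        yield min(cap, delay) * random.uniform(0.5, 1.0)
        delay *= factor
```

Each failed connect doubles the nominal delay until the cap; a successful connect would reset `delay` to `base`.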

### atlas/ — Python CLI

Standalone REPL for direct interaction with ATLAS services. Used for pipe mode (`echo ... | atlas`) and as a fallback when the TUI can't run.

| File | Description |
| --- | --- |
| cli/repl.py | Main entry point (atlas command). Interactive REPL with /solve, /bench, /status, /help. Pipe mode support. |
| cli/client.py | HTTP client for llama-server, Geometric Lens, sandbox. Health checks, generation (batch + streaming), scoring, sandbox execution. |
| cli/display.py | Terminal formatting: banner, colors, status blocks, prompts, separators |
| cli/commands/solve.py | /solve: generate code from LLM, extract from think blocks, score via Lens, test via sandbox |
| cli/commands/bench.py | /bench: delegates to benchmark.v3_runner with dataset/strategy/task-count args |
| cli/commands/status.py | /status: check health of llama-server, Lens, sandbox |

### benchmark/ — Benchmark Infrastructure

Runner infrastructure for evaluating LLM code generation across multiple datasets.

| File | Description |
| --- | --- |
| runner.py | Core execution: function mode + stdio mode, LLM API calls, ChatML formatting, code extraction |
| v2_runner.py | V2 benchmark runner: phases 0-6, telemetry, Mode A/B, crash recovery |
| v3_runner.py | V3 benchmark runner: full pipeline with ablation conditions A-F |
| v2_report.py | Markdown report generator from benchmark results |
| cli.py | CLI entry point: `atlas benchmark --humaneval --dry-run` etc. |
| config.py | BenchmarkConfig loaded from atlas.conf |
| models.py | Data models: BenchmarkTask, AttemptResult, TaskResult, BenchmarkRun |
| best_of_k.py | Best-of-K candidate evaluation with scoring |
| geo_learning.py | Geometric learning integration for benchmarks |

### benchmark/datasets/ — Dataset Loaders

Each loader downloads from HuggingFace (JSON rows API, no pyarrow) and normalizes to BenchmarkTask format.

| File | Tasks | Eval Mode | Description |
| --- | --- | --- | --- |
| base.py | | | Abstract BaseDataset class with download, parse, validate |
| humaneval.py | 164 | function | HumanEval function completion |
| mbpp.py | 500 | function | MBPP with 3-shot [BEGIN]/[DONE] format |
| evalplus_humaneval.py | 164 | function | HumanEval+ (EvalPlus augmented tests) |
| evalplus_mbpp.py | 500 | function | MBPP+ (EvalPlus augmented tests) |
| livecodebench.py | 599 | stdio | LiveCodeBench v5 from bzantium mirror |
| gpqa.py | 198 | mcq | GPQA Diamond from OpenAI blob CSV |
| ifbench.py | 300 | ifbench | IFBench instruction-following with loose eval |
| scicode.py | ~80 | function | SciCode cross-domain scientific coding |
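The normalization step each loader performs can be sketched as follows. The BenchmarkTask field names below are illustrative assumptions (the real dataclass lives in models.py); the row keys `task_id`, `prompt`, and `test` are the actual HumanEval record fields:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkTask:
    # Field names are illustrative; models.py defines the real schema.
    task_id: str
    prompt: str
    tests: str
    eval_mode: str  # "function", "stdio", "mcq", ...

def normalize_humaneval_row(row: dict) -> BenchmarkTask:
    """Map one HuggingFace JSON-rows record to the shared task format."""
    return BenchmarkTask(
        task_id=row["task_id"],
        prompt=row["prompt"],
        tests=row["test"],
        eval_mode="function",
    )
```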

### benchmark/analysis/ — Analysis Utilities

| File | Description |
| --- | --- |
| cost_analysis.py | Token cost and electricity cost analysis |
| hardware_info.py | GPU/CPU detection and reporting |
| pass_at_k.py | pass@k metric calculation |
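For reference, the standard unbiased pass@k estimator (the metric pass_at_k.py computes) with n samples of which c pass is 1 − C(n−c, k)/C(n, k), computed in product form for numerical stability:

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total (c correct), passes."""
    if n - c < k:
        return 1.0  # too few failures to fill k slots: a pass is guaranteed
    result = 1.0
    for i in range(n - c + 1, n + 1):
        result *= 1.0 - k / i
    return 1.0 - result
```

For example, with n=4 samples and c=2 passing, pass@2 = 1 − C(2,2)/C(4,2) = 5/6.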

### benchmark/custom/ — Custom Tasks

| File | Description |
| --- | --- |
| tasks.json | 100 custom benchmark tasks |
| validate.py | Validates custom task format |

### benchmark/v3/ — V3 Pipeline Modules

19 Python modules implementing the V3 code generation pipeline. Each module follows a Config + Event + Controller pattern.

| Module | Phase | Description |
| --- | --- | --- |
| plan_search.py | 1A | 3-step pipeline: extract constraints -> construct plans -> generate code. 3 plans default, max 7. |
| div_sampling.py | 1B | 12 perturbations: 4 roles + 4 instructions + 4 styles. Modular selection by candidate index. |
| budget_forcing.py | 1C | 5 tiers (nothink/light/standard/hard/extreme). Wait injection on premature thinking termination. Energy-to-tier sigmoid mapping. |
| blend_asc.py | 2A | Adaptive K from C(x) energy: 4 bands mapping energy to k=1-12 and budget tier. |
| reasc.py | 2B | Early stopping: energy < 0.10 AND bottom-10% logprob confidence > -0.5. |
| s_star.py | 2C | Tiebreaking: generate edge-case inputs where candidates differ, sandbox both, majority wins. |
| candidate_selection.py | | 4 strategies: lens (min energy), random, logprob (max mean), oracle (first pass). |
| failure_analysis.py | 3A | Categorize failures: wrong_algorithm, implementation_bug, edge_case_miss, time_limit, format_error, partial_correct. |
| constraint_refinement.py | 3B | Generate refined hypotheses from failure analysis. Cosine distance >= 0.15 prevents repetition. |
| pr_cot.py | 3C | 4 perspectives (logical_consistency, information_completeness, biases, alternative_solutions) x (analysis + repair) = 8 LLM calls. |
| derivation_chains.py | 3D | Decompose into <= 5 sub-problems, sandbox-verify each, compose final. 7+ LLM calls. |
| refinement_loop.py | 3E | Orchestrator: FailureAnalysis -> ConstraintRefiner -> CodeGen -> Test -> Learn. 2 iters, 120s budget. |
| metacognitive.py | 3F | Model failure pattern library with frequency tracking, compensation injection, effectiveness monitoring. |
| ace_pipeline.py | 3G | Evolving playbooks: Generator-Reflector-Curator pipeline with confidence decay. |
| self_test_gen.py | util | Generate test cases from problem description. Multiple parsing fallbacks. 50% majority threshold. |
| lens_feedback.py | util | Online Lens recalibration: collect pass/fail embeddings, trigger retrain at 50-sample intervals. |
| embedding_store.py | util | Binary append-only embedding storage: task_id + candidate_index + label + 4096-dim float32 vector. |
| ablation_analysis.py | util | Bootstrap significance tests, pass rate computation across ablation conditions. |
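blend_asc.py's adaptive-K idea (4 energy bands mapping to k between 1 and 12) can be sketched directly. The band boundaries below are illustrative assumptions, not the module's actual thresholds:

```python
def adaptive_k(energy: float) -> int:
    """Map C(x) energy to a candidate count K across 4 bands (k=1..12).
    Threshold values here are assumptions for illustration."""
    bands = [(0.10, 1), (0.30, 4), (0.60, 8)]  # (upper bound, k)
    for threshold, k in bands:
        if energy < threshold:
            return k
    return 12  # highest-energy band: spend the full candidate budget
```

Low energy (the Lens is confident) gets one candidate; high energy buys diversity with more samples.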

### geometric-lens/ — Core Service

| File | Description |
| --- | --- |
| main.py | FastAPI server: 26 endpoints for scoring, indexing, routing, caching, pattern management |
| pipeline.py | RAG orchestrator: retrieve chunks + patterns -> collect signals -> estimate difficulty -> route -> generate -> verify |
| config.py | ServerConfig (port 8001), Redis URL, API keys, YAML config loading |
| storage.py | ProjectMetadata CRUD for indexed projects |
| verify_loop.py | Verify-repair loop with retry and escalation |
| sandbox_client.py | HTTP client for sandbox code execution |
| sandbox_analysis.py | Classify sandbox execution results |
| requirements.txt | Dependencies: FastAPI, uvicorn, torch (CPU), pydantic, redis, tree-sitter |
| Dockerfile | Python 3.11-slim, CPU PyTorch, port 8099 |

### geometric-lens/geometric_lens/ — Scoring Models

| File | Description |
| --- | --- |
| cost_field.py | C(x): 4096->512->128->1 MLP (SiLU + Softplus). 2.16M params. Contrastive ranking loss. |
| metric_tensor.py | G(x): PCA(4096->128) + diagonal metric tensor + input-dependent modulation. Code exists, not deployed. |
| service.py | Service layer: lazy model loading, evaluate_combined() (single embedding for C(x)+G(x)), verdict thresholds, hot-reload |
| training.py | train_cost_field() (200 epochs), retrain_cost_field_bce() (production retrain with class weights, early stopping) |
| embedding_extractor.py | Calls llama-server POST /v1/embeddings, handles pooled and per-token responses, mean pooling |
| ewc.py | Elastic Weight Consolidation: Fisher Information Matrix, penalty term, prevents catastrophic forgetting |
| correction.py | Natural gradient correction: -alpha * G_inv * grad_C. PCA projection/unprojection. Correctability score. |
| replay_buffer.py | Domain-stratified reservoir sampling. 30% old / 70% new training mix. JSON persistence. |
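The stated 2.16M parameter count for C(x) checks out from the layer sizes alone. Counting only the linear layers (SiLU and Softplus are parameter-free):

```python
def mlp_params(dims):
    """Weights + biases for a fully connected stack dims[0] -> ... -> dims[-1]."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

# 4096->512->128->1: 2,097,664 + 65,664 + 129 = 2,163,457 params (~2.16M)
total = mlp_params([4096, 512, 128, 1])
```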

### geometric-lens/indexer/ — RAG Indexing

| File | Description |
| --- | --- |
| ast_parser.py | tree-sitter Python AST parsing: classes, functions, imports, top-level blocks. Fallback regex parser. |
| tree_builder.py | Build hierarchical TreeIndex from parsed files. Supports incremental updates. |
| bm25_index.py | Inverted index with BM25 scoring (k1=1.5, b=0.75). CamelCase/snake_case tokenization. |
| summarizer.py | LLM-generated summaries for tree nodes. |
| persistence.py | Save/load TreeIndex + BM25Index as JSON to disk. |
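The BM25 scoring bm25_index.py implements (with the k1=1.5, b=0.75 defaults it documents) can be sketched as a standalone function; the real implementation maintains a persisted inverted index rather than scanning the corpus per query:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document against a query."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or term not in tf:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        # Term-frequency saturation (k1) and length normalization (b).
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl))
        score += idf * norm
    return score
```

Rarer terms carry higher IDF, so a match on an uncommon identifier outscores a match on a ubiquitous one.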

### geometric-lens/retriever/ — RAG Retrieval

| File | Description |
| --- | --- |
| bm25_search.py | BM25 keyword search: min_score=0.1, top_k=20. Strong match detection (threshold=3.0). |
| tree_search.py | LLM-guided tree traversal: max_depth=6, max_reasoning_calls=40. Scores children 0-10. |
| hybrid.py | Routes between bm25_first, tree_only, and both strategies. Deduplication + score normalization. |

### geometric-lens/router/ — Confidence Router

| File | Description |
| --- | --- |
| route_selector.py | Thompson Sampling with Beta(alpha,beta) posteriors. 4 routes: CACHE_HIT(1) -> FAST_PATH(50) -> STANDARD(300) -> HARD_PATH(1500). |
| difficulty_estimator.py | Weighted fusion of 4 signals -> D(x). Adjusts weights when Geometric Lens is available. |
| signal_collector.py | Collects: pattern_cache_score, retrieval_confidence, query_complexity, geometric_energy, gx_score. |
| feedback_recorder.py | Records route outcomes to Redis for Thompson Sampling posterior updates. |
| fallback_chain.py | Retry escalation: CACHE_HIT -> FAST_PATH -> STANDARD -> HARD_PATH -> terminal. |
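The Thompson Sampling loop route_selector.py and feedback_recorder.py describe is compact enough to sketch. This is a minimal illustration, assuming each route keeps an independent Beta(alpha, beta) success posterior; the production version persists these counts in Redis and would also weigh route cost:

```python
import random

ROUTES = ["CACHE_HIT", "FAST_PATH", "STANDARD", "HARD_PATH"]

def select_route(posteriors):
    """Thompson Sampling: draw one sample from each route's Beta
    posterior and pick the route with the highest draw."""
    samples = {r: random.betavariate(a, b) for r, (a, b) in posteriors.items()}
    return max(samples, key=samples.get)

def record_outcome(posteriors, route, success):
    """Posterior update: success increments alpha, failure increments beta."""
    a, b = posteriors[route]
    posteriors[route] = (a + 1, b) if success else (a, b + 1)
```

Exploration falls out naturally: a rarely tried route has a wide posterior, so it occasionally wins the draw and gets fresh data.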

### geometric-lens/cache/ — Pattern Cache

| File | Description |
| --- | --- |
| pattern_store.py | Redis-backed storage: STM (100 max), LTM, PERSISTENT tiers. Sorted set management. |
| pattern_matcher.py | BM25 index over pattern summaries. Normalized [0,1] similarity scores. |
| pattern_extractor.py | LLM-driven extraction of reusable patterns from successful task solutions. |
| pattern_scorer.py | Ebbinghaus decay: recency-weighted composite score for STM/LTM promotion. |
| co_occurrence.py | Tracks patterns used together. Graph traversal for linked pattern retrieval. |
| consolidator.py | Category surprise tracking for pattern novelty assessment. |
| seed_patterns.py | Bootstrap patterns for initial cache population. |
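pattern_scorer.py's Ebbinghaus decay follows the classic forgetting curve R = exp(−t/S). A minimal sketch, assuming a one-day stability constant and a simple multiplicative weighting (the module's actual composite formula is not documented here):

```python
import math
import time

def retention(elapsed_s: float, stability: float = 86400.0) -> float:
    """Ebbinghaus forgetting curve R = exp(-t/S)."""
    return math.exp(-elapsed_s / stability)

def composite_score(base_score, last_used_ts, now=None, stability=86400.0):
    """Recency-weighted score for STM/LTM promotion: a pattern's base
    score decays the longer it goes unused."""
    now = time.time() if now is None else now
    return base_score * retention(now - last_used_ts, stability)
```

A freshly used pattern keeps its full score; after a day unused it drops to about 37% of it, pushing stale STM entries out before promotion.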

### v3-service/ — V3 Pipeline HTTP Wrapper

| File | Description |
| --- | --- |
| main.py | HTTP server (port 8070). Pipeline orchestrator: Phase 0 (probe) -> Phase 2 (allocate K) -> Phase 1 (generate) -> Selection -> Phase 3 (repair). LLMAdapter, EmbedAdapter, SandboxAdapter, BuildVerifier. Imports all 19 V3 modules. |
| Dockerfile | Python 3.11, CPU PyTorch, copies benchmark/ for V3 module access. Port 8070. |

### sandbox/ — Isolated Code Execution

| File | Description |
| --- | --- |
| executor_server.py | FastAPI server (port 8020). 8 language executors with compilation, pytest/pylint for Python, syntax checking, error classification (15 types), output truncation. |
| Dockerfile | Python 3.11-slim + Node.js 20 + Go 1.22 + Rust stable + gcc/g++. tmpfs workspace, read-only root. |

### inference/ — llama-server Configuration

| File | Description |
| --- | --- |
| Dockerfile.v31 | V3.1 9B model Docker build. Used by docker-compose. Builds llama.cpp from source with CUDA. |
| Dockerfile | Base llama.cpp build with CUDA support. |
| Dockerfile.mtp | Multi-Token Prediction experimental build. |
| entrypoint-v3.1-9b.sh | K3s 9B production entrypoint: flash-attn, mlock, --parallel 4, KV quant (q8_0/q4_0), embeddings, 160K context. |
| entrypoint-v3-specdec.sh | K3s 14B + spec decode entrypoint: Qwen3-14B main + Qwen3-0.6B draft, embeddings patch. |
| entrypoint.sh | Default entrypoint: basic llama-server launch with configurable flags. |
| entrypoint-embed.sh | Dedicated embedding server entrypoint (nomic-embed-text-v1.5). |
| entrypoint-mtp.sh | MTP experimental entrypoint. |
| patches/fix-embeddings-spec-decode.patch | One-line patch: prevents embedding=true from poisoning draft model context in spec decode. |
| templates/Qwen3-custom.jinja | Custom Qwen3 Jinja2 chat template. |
| templates/Qwen3-no-think.jinja | Qwen3 template that suppresses \<think\> blocks. |

### scripts/ — Automation

| File | Description |
| --- | --- |
| install.sh | Full K3s installation: prerequisites, GPU Operator, namespace, image build, manifest deployment |
| uninstall.sh | K3s teardown and cleanup |
| build-containers.sh | Build all container images and import to K3s |
| deploy-9b.sh | Deploy Qwen3.5-9B to K3s cluster |
| generate-manifests.sh | Generate K3s manifests from atlas.conf via envsubst |
| download-models.sh | Download model weights from HuggingFace |
| verify-install.sh | Post-install health verification |
| smoke-test-9b.sh | Quick smoke test for 9B deployment |
| run_full_benchmarks.sh | Run all benchmark suites sequentially |
| run_v31_ablation.sh | V3.1 ablation study launcher with conditions A-F |
| validate_benchmarks.py | Validate benchmark results for completeness |
| derive_ablation.py | Derive ablation conditions from raw benchmark runs |
| retrain_cx.py | Retrain C(x) cost field from collected embeddings |
| retrain_cx_phase0.py | Phase 0 C(x) initial training (597 embeddings) |
| retrain_lens_from_results.py | Retrain Lens models from benchmark result embeddings |
| collect_lens_training_data.py | Collect pass/fail embeddings from benchmark runs |
| prepare_lens_training.py | Prepare and validate training data format |
| lib/config.sh | Shared bash config: loads atlas.conf, validates paths, sets defaults |

### tests/ — Test Suite

| File | Description |
| --- | --- |
| validate_tests.py | Test runner entry point |
| conftest.py | Pytest shared fixtures |
| infrastructure/test_llm.py | llama-server health and generation tests |
| infrastructure/test_sandbox.py | Sandbox execution tests |
| integration/test_e2e_flow.py | End-to-end pipeline flow test |
| integration/test_e2e_training.py | End-to-end Lens training test |
| v3/ | Unit tests, roughly one per V3 module plus supporting components (EWC, replay buffer, adapters) |

The v3/ directory contains: test_plan_search.py, test_div_sampling.py, test_budget_forcing.py, test_blend_asc.py, test_reasc.py, test_s_star.py, test_candidate_selection.py, test_failure_analysis.py, test_constraint_refinement.py, test_pr_cot.py, test_derivation_chains.py, test_refinement_loop.py, test_metacognitive.py, test_ace_pipeline.py, test_self_test_gen.py, test_lens_feedback.py, test_embedding_store.py, test_ablation_analysis.py, test_ewc.py, test_replay_buffer.py, test_enhanced_retrain.py, test_phase4_validation.py, test_sandbox_adapter.py.

### docs/ — Documentation

| File | Description |
| --- | --- |
| ARCHITECTURE.md | Two-layer architecture with 13 Mermaid diagrams, component breakdowns, sequence diagrams |
| API.md | HTTP API reference: all endpoints for all 5 services, request/response formats |
| CLI.md | CLI usage, streaming output format, workflow examples, troubleshooting |
| CONFIGURATION.md | Every environment variable across all services, internal constants, K3s config |
| MAP.md | This file — repository file map |
| SETUP.md | Installation: Docker Compose, bare-metal, K3s |
| TROUBLESHOOTING.md | Common issues and solutions |

### docs/reports/ — Studies, Status, Migration

| File | Description |
| --- | --- |
| V3_ABLATION_STUDY.md | V3 ablation methodology: conditions A-D, 599 tasks, statistical analysis |
| V2_5_ABLATION_STUDY.md | Historical: V2.5 Geometric Lens ablation study |
| V2_TO_V2_5_MIGRATION.md | Historical: V2 to V2.5 migration guide |
| V3_STATUS.md | Historical: V3 implementation tracking |
| V3_1_STATUS.md | V3.1 implementation status and roadmap |

### docs/reports/ablation/ — Published Evidence

Per-task pass/fail data for all V3 ablation conditions. 2,396 task results across 4 conditions. See README for data format.

| Condition | Directory | Pass@1 | Tasks |
| --- | --- | --- | --- |
| A (baseline) | condition_a_baseline/ | 54.9% | 599 |
| B (+Phase 1) | condition_b_phase1/ | 67.3% | 599 |
| C (+Phase 1+2) | condition_c_phase1_2/ | 67.3% | 599 |
| D (+Phase 1+3) | condition_d_phase1_3/ | 74.6% | 599 |

Each condition contains summary.json, v3_lcb/results.json, and 599 per-task JSON files in v3_lcb/per_task/.
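With per-task pass/fail vectors published for each condition, pass-rate differences (e.g. A's 54.9% vs D's 74.6%) can be checked for significance. A minimal paired percentile-bootstrap sketch, assuming tasks are matched by index across conditions; the exact resampling scheme in ablation_analysis.py may differ:

```python
import random

def bootstrap_diff_ci(a, b, iters=2000, seed=0, alpha=0.05):
    """Percentile bootstrap CI for the pass-rate difference between two
    per-task pass/fail vectors (1 = pass), resampling task indices so
    the pairing between conditions is preserved."""
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]  # resample tasks with replacement
        diffs.append(sum(b[i] for i in idx) / n - sum(a[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int(alpha / 2 * iters)]
    hi = diffs[int((1 - alpha / 2) * iters) - 1]
    return lo, hi
```

If the interval excludes zero, the improvement is unlikely to be resampling noise.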