A local-model coding assistant built on forge. Define tools, hand them to forge's WorkflowRunner, and get a working coding assistant — forge handles the agentic loop, guardrails, context management, and backend communication.
Sub-agent mode: Split work across two llama-server slots. A main agent coordinates, while research and builder specialists run on a dedicated slot with fresh context per task. The main agent's context stays clean — it only sees summaries, never raw file contents.
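The "summaries only" property above can be sketched as a dispatch wrapper. This is an illustrative sketch, not forge-code's actual API; the function names are hypothetical:

```python
# Hypothetical sketch of why the main agent's context stays clean: the
# dispatch tool runs the specialist on its own slot and returns only a
# summary string; the specialist's raw transcript never reaches slot 0.
def dispatch_research(task: str, run_specialist) -> str:
    # run_specialist executes on slot 1 with a fresh context and returns
    # (summary, raw_transcript); only the summary flows back to the main agent.
    summary, _raw_transcript = run_specialist(task)
    return summary
```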
Three interfaces:
- CLI — readline REPL for quick use
- TUI — Textual-based terminal UI with inline diffs, permission modals, live context tracking, and plan mode
- VS Code extension — webview chat panel with markdown rendering, side-by-side diffs, and Claude Code-style gutter
Supports Ollama, llama-server (llama.cpp), Llamafile, and Anthropic as backends.
- Python 3.12+
- forge installed
- A running LLM backend (see below)
```
git clone https://github.com/antoinezambelli/forge-code.git
cd forge-code
pip install -e ../forge  # install forge from local checkout
pip install -e .         # install forge-code
```

llama-server (recommended):
```
# Install from https://github.com/ggml-org/llama.cpp/releases
llama-server -m path/to/model.gguf --jinja -ngl 999 --port 8080
```

Ollama (easiest):

```
ollama pull ministral-3:8b-instruct-2512-q4_K_M
```

See forge's Backend Setup for full instructions.
Three interfaces — same tools, same engine:
```
python -m forge_code
```

Minimal interface. No mid-turn cancellation — use the TUI or VS Code extension for Esc-to-cancel support.
```
python -m forge_code --tui
```

Setup (one time):
```
cd vscode
npm install
npm run compile
```

Testing (Extension Development Host):
- Open the `vscode/` directory in VS Code: `code vscode/`
- Press F5 — this launches a new VS Code window (the "Extension Development Host")
- In the new window, open the project you want to work in (e.g. `forge-code/` itself)
- Open the command palette: Ctrl+Shift+P
- Run "forge-code: Open Chat"
- The chat panel opens in the sidebar — type a message to start
The original VS Code window shows debug output. Close the Extension Development Host window to stop.
To install permanently (no F5 needed):
```
cd vscode
npx @vscode/vsce package
code --install-extension forge-code-0.2.0.vsix
```

Configure in VS Code settings (or use `~/.forge-code.json` for defaults):
- `forge-code.pythonPath` — Python interpreter with forge-code installed (default: `python`)
- `forge-code.backend` — `llamaserver` | `ollama` | `llamafile` (default: `llamaserver`)
- `forge-code.model` — Model name (required for ollama)
- `forge-code.port` — Backend server port (default: `8080`)
- `forge-code.gguf` — Path to GGUF file for managed mode
- `forge-code.cacheType` — KV cache quantization (e.g. `q8_0`)
- `forge-code.subAgents` — Enable sub-agent mode (default: `false`)
Create `~/.forge-code.json` to set defaults across all frontends:
```json
{
  "gguf": "/path/to/model.gguf",
  "cacheType": "q8_0",
  "subAgents": true
}
```

Resolution order: CLI flags > VS Code settings > config file > defaults.
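The precedence described above (CLI flags over VS Code settings over config file over defaults) can be sketched as a simple merge. This is an illustrative sketch, not the project's actual loader; the function name and defaults dict are assumptions:

```python
# Hypothetical sketch of the resolution order: later layers win, with
# explicit None values treated as "not set" so they don't clobber defaults.
import json
from pathlib import Path

DEFAULTS = {"backend": "llamaserver", "port": 8080, "subAgents": False}

def resolve_config(cli_flags: dict, vscode_settings: dict, config_path: Path) -> dict:
    file_cfg = json.loads(config_path.read_text()) if config_path.exists() else {}
    merged = dict(DEFAULTS)
    # Lowest to highest priority: config file, VS Code settings, CLI flags.
    for layer in (file_cfg, vscode_settings, cli_flags):
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```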
```
--backend      llamaserver | ollama | llamafile (default: llamaserver)
--model        Model name (required for ollama)
--port         Backend server port (default: 8080)
--gguf         Path to GGUF file — managed mode (starts llama-server automatically)
--cache-type   KV cache quantization (e.g. q8_0)
--sub-agents   Enable sub-agent mode (research + build on slot 1)
--tui          Launch the Textual TUI
--server       Run as JSON-RPC server (used by VS Code extension)
```
Single-agent mode (default):
| Tool | Description | Permission |
|---|---|---|
| `bash` | Execute shell commands | write |
| `view` | Read files with line numbers | read |
| `edit` | String replacement in files | write |
| `write` | Create or overwrite files | write |
| `glob` | Find files by pattern | read |
| `grep` | Search file contents by regex | read |
| `respond` | Model's speech channel to the user | — |
Sub-agent mode (--sub-agents):
The main agent coordinates via dispatch tools. Specialists run on slot 1 with fresh context:
| Tool | Description | Runs on |
|---|---|---|
| `research` | Investigate codebase — returns summary with file paths and recommendations | slot 1 (read-only: view, glob, grep) |
| `build` | Execute coding task — make edits, run tests, report results | slot 1 (full: view, glob, grep, edit, write, bash) |
| `respond` | Model's speech channel to the user | main (slot 0) |
Write tools prompt for permission before executing. Grant always to skip future prompts for that tool within the session. Denying permission cancels the specialist and returns to user input.
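The allow/always/deny flow above can be sketched as a small gate that remembers "always" grants for the rest of the session. This is a hedged sketch with made-up names, not forge-code's actual `PermissionGate` implementation:

```python
# Illustrative allow/always/deny gate: "always" is remembered per tool
# for the session, so later calls to that tool skip the prompt entirely.
class SessionPermissionGate:
    def __init__(self) -> None:
        self.always_allowed: set[str] = set()

    def check(self, tool: str, ask_user) -> bool:
        """Return True if the tool may run. ask_user(tool) prompts the
        user and returns 'allow', 'always', or 'deny'."""
        if tool in self.always_allowed:
            return True  # user granted "always" earlier this session
        answer = ask_user(tool)
        if answer == "always":
            self.always_allowed.add(tool)
            return True
        return answer == "allow"
```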
- Tool call indicators — `[*] tool — summary` for each tool call
- Inline diffs — red/green tinted backgrounds for edit operations
- Permission modal — overlay with diff/command preview and Allow/Always/Deny buttons
- Status bar — live context usage (tokens + %), retry/rescue counters
- Cancel — press Esc to cancel the active turn. The runner finishes the current tool call and stops cleanly. Ctrl+C exits the app.
- Plan mode — `/plan` toggles read-only analysis mode (removes write tools, deepens analysis prompt)
Type `/plan` in the TUI input to toggle plan mode. In plan mode:
- Write tools (bash, edit, write) are removed from the workflow
- The system prompt instructs the model to analyze and recommend rather than modify
- The model explores the codebase with read tools and responds with findings and specific edit recommendations
Type `/plan` again to return to normal mode.
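The tool-removal half of plan mode amounts to filtering the write-capable tools out of the workflow. A minimal sketch, assuming a plain dict registry (the function name is hypothetical; tool names come from the table above):

```python
# Illustrative /plan toggle: plan mode drops bash/edit/write so the
# workflow is read-only; normal mode returns the full tool set.
WRITE_TOOLS = {"bash", "edit", "write"}

def build_workflow_tools(all_tools: dict, plan_mode: bool) -> dict:
    if not plan_mode:
        return dict(all_tools)
    # Read-only subset: view/glob/grep/respond stay available.
    return {name: t for name, t in all_tools.items() if name not in WRITE_TOOLS}
```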
```
src/forge_code/
    __init__.py
    __main__.py      # Entry point: python -m forge_code
    engine.py        # Shared backend — clients, runners, session, slot manager
    cli.py           # readline REPL (thin frontend)
    tui.py           # Textual TUI (thin frontend)
    server.py        # JSON-RPC server for VS Code (thin frontend)
    display.py       # Shared tool call summarizer (CLI + TUI)
    session.py       # Message list, workflow builder, plan mode
    permissions.py   # CLI permission gate (allow/deny/always)
    prompts/
        system.py    # System prompts: main agent, research, builder, plan mode
    tools/
        __init__.py  # Tool registry: build_tools(), build_main_agent_tools()
        specialists.py  # Research + build dispatch tools, CancelOnDeny
        context.py   # ToolContext, PermissionGate protocol
        bash.py      # Shell execution via subprocess
        view.py      # File reading with line numbers
        edit.py      # String replacement editing
        write.py     # File creation/overwrite
        glob.py      # File pattern matching
        grep.py      # Regex content search
tests/
    # 76 unit tests across all tools, permissions, and session
    eval/
        eval_runner.py  # Single-scenario runner with scoring
        batch_eval.py   # Multi-model batch runner (JSONL, resume)
        report.py       # ASCII report tables
        judge_batch.py  # Opus judge batch API script
        review.py       # LLM review prompt generation
        dummy_repo/     # Data pipeline codebase (13 modules, 100 tests)
        scenarios/      # 9 scenarios × 3 prompt levels = 27 task variants
```
See docs/ARCHITECTURE.md for the full design and docs/WORKFLOW.md for the per-turn loop.
The key idea: each user message is one `WorkflowRunner.run()` call with `respond` as the terminal tool. The model stays in tool-calling mode, where forge's full guardrail stack applies. Multi-turn memory is a growing `list[Message]` passed via `initial_messages`. Context compaction is handled transparently by forge using real token counts from the backend.
All three frontends are thin UI layers. Backend logic (clients, runners, session, sub-agent orchestration) lives in Engine.
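The per-turn shape described above can be sketched as follows. The runner interface here is made up for illustration and stands in for forge's actual `WorkflowRunner` API:

```python
# Hypothetical sketch of one turn: append the user message, run the
# workflow until the terminal `respond` tool fires, then fold the new
# messages back into the growing multi-turn history.
def run_turn(runner, messages: list, user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    # One runner.run() call per user message; the model stays in
    # tool-calling mode until it invokes `respond`.
    result = runner.run(initial_messages=messages, terminal_tool="respond")
    messages.extend(result.new_messages)  # multi-turn memory keeps growing
    return result.respond_text
```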
```
pip install -e ".[dev]"
python -m pytest tests/ -v
```

9 scenarios measuring how reliably a model completes real coding tasks — bug fixes, refactors, and feature additions against a dummy data-pipeline repo. See Eval Concept for design rationale.
```
# Single scenario
python -m tests.eval.eval_runner --scenario B1 --prompt natural --backend llamafile --model ministral-3-14b --verbose

# Batch eval (JSONL output, automatic resume)
python -m tests.eval.batch_eval --config all --runs 10

# Reports
python -m tests.eval.report eval_results.jsonl
python -m tests.eval.report eval_results.jsonl --by-model
```

- Multi-dimensional scoring — syntax validity, test pass rate, edit distance vs reference, pipeline regression, LLM review (deferred). No binary pass/fail.
- Prompt specificity axis — each scenario runs at three levels (specific, natural, vague) to measure how well the model handles ambiguity.
- Context telemetry — per-step token counts, peak context, tool breakdowns. Designed to empirically measure the effective attention threshold (~12-15K for 8B models).
- Partial credit — a model that writes correct logic with a dropped f-string quote scores differently from one that writes wrong logic. Existing benchmarks miss this.
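To make the partial-credit idea concrete, a multi-dimensional score can combine the listed dimensions into one number. The weights and function below are invented for illustration and are not the eval suite's actual scoring formula:

```python
# Hypothetical weighted combination of the scoring dimensions above;
# weights are made up for the example. Booleans contribute 0 or full
# weight, ratios contribute proportionally (partial credit).
def score(syntax_ok: bool, tests_passed: int, tests_total: int,
          edit_similarity: float, regression_free: bool) -> float:
    test_rate = tests_passed / tests_total if tests_total else 0.0
    return round(
        0.2 * syntax_ok
        + 0.4 * test_rate
        + 0.2 * edit_similarity
        + 0.2 * regression_free,
        3,
    )
```

A model with correct logic but a small syntax slip and a model with valid syntax but wrong logic land at visibly different scores instead of the same binary fail.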
- Architecture — Design decisions, component overview, what forge provides vs what forge-code builds
- Workflow — Per-turn loop, the `respond` tool, permission flow, plan mode
- Eval Concept — Eval suite design, scoring methodology, scenario descriptions
MIT — Copyright (c) 2025-2026 Antoine Zambelli
