forge-code

Python 3.12+ License: MIT

A local-model coding assistant built on forge. Define tools, hand them to forge's WorkflowRunner, and get a working coding assistant — forge handles the agentic loop, guardrails, context management, and backend communication.

Sub-agent mode: Split work across two llama-server slots. A main agent coordinates, while research and builder specialists run on a dedicated slot with fresh context per task. The main agent's context stays clean — it only sees summaries, never raw file contents.

forge-code demo

Three interfaces:

  • CLI — readline REPL for quick use
  • TUI — Textual-based terminal UI with inline diffs, permission modals, live context tracking, and plan mode
  • VS Code extension — webview chat panel with markdown rendering, side-by-side diffs, and Claude Code-style gutter

Supports Ollama, llama-server (llama.cpp), Llamafile, and Anthropic as backends.

Requirements

  • Python 3.12+
  • forge installed
  • A running LLM backend (see below)

Install

git clone https://github.com/antoinezambelli/forge-code.git
cd forge-code
pip install -e ../forge          # install forge from local checkout
pip install -e .                 # install forge-code

Backend setup (pick one)

llama-server (recommended):

# Install from https://github.com/ggml-org/llama.cpp/releases
llama-server -m path/to/model.gguf --jinja -ngl 999 --port 8080
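Before launching forge-code, you can verify the backend is answering. A minimal sketch, assuming llama-server's /health endpoint on the default host and port (adjust for your setup):

```python
# Quick reachability check for the llama-server backend. The /health
# endpoint and default host below are assumptions based on llama.cpp's
# server; adjust if your setup differs.
import json
import urllib.request


def health_url(port: int = 8080, host: str = "127.0.0.1") -> str:
    """Build the llama-server health-check URL."""
    return f"http://{host}:{port}/health"


def backend_is_up(port: int = 8080) -> bool:
    """Return True if llama-server answers its health endpoint with ok."""
    try:
        with urllib.request.urlopen(health_url(port), timeout=2) as resp:
            return json.load(resp).get("status") == "ok"
    except (OSError, ValueError):
        # Connection refused, timeout, or non-JSON body: treat as down.
        return False


if __name__ == "__main__":
    print("backend up" if backend_is_up() else "backend down")
```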

Ollama (easiest):

ollama pull ministral-3:8b-instruct-2512-q4_K_M

See forge's Backend Setup for full instructions.

Usage

Three interfaces — same tools, same engine:

CLI (readline REPL)

python -m forge_code

Minimal interface. No mid-turn cancellation — use the TUI or VS Code extension for Esc-to-cancel support.

TUI (Textual terminal UI)

python -m forge_code --tui

VS Code Extension

Setup (one time):

cd vscode
npm install
npm run compile

Testing (Extension Development Host):

  1. Open the vscode/ directory in VS Code: code vscode/
  2. Press F5 — this launches a new VS Code window (the "Extension Development Host")
  3. In the new window, open the project you want to work in (e.g. forge-code/ itself)
  4. Open the command palette: Ctrl+Shift+P
  5. Run "forge-code: Open Chat"
  6. The chat panel opens in the sidebar — type a message to start

The original VS Code window shows debug output. Close the Extension Development Host window to stop.

To install permanently (no F5 needed):

cd vscode
npx @vscode/vsce package
code --install-extension forge-code-0.2.0.vsix

Configure in VS Code settings (or use ~/.forge-code.json for defaults):

  • forge-code.pythonPath — Python interpreter with forge-code installed (default: python)
  • forge-code.backend — llamaserver | ollama | llamafile (default: llamaserver)
  • forge-code.model — Model name (required for ollama)
  • forge-code.port — Backend server port (default: 8080)
  • forge-code.gguf — Path to GGUF file for managed mode
  • forge-code.cacheType — KV cache quantization (e.g. q8_0)
  • forge-code.subAgents — Enable sub-agent mode (default: false)

Configuration

Create ~/.forge-code.json to set defaults across all frontends:

{
    "gguf": "/path/to/model.gguf",
    "cacheType": "q8_0",
    "subAgents": true
}

Resolution order: CLI flags > VS Code settings > config file > defaults.
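The resolution order can be sketched as a layered dict merge, with later layers overwriting earlier ones. The names below are illustrative, not forge-code's actual internals:

```python
# Illustrative sketch of "CLI flags > VS Code settings > config file >
# defaults": apply lowest precedence first so higher layers overwrite it.
DEFAULTS = {"backend": "llamaserver", "port": 8080, "subAgents": False}


def resolve_settings(cli: dict, vscode: dict, config_file: dict) -> dict:
    """Merge settings layers; unset (None) values never override."""
    settings = dict(DEFAULTS)
    for layer in (config_file, vscode, cli):
        settings.update({k: v for k, v in layer.items() if v is not None})
    return settings
```

For example, a `--port 9090` flag beats a config-file `"subAgents": true` only on the key they share; the rest merge: `resolve_settings({"port": 9090}, {}, {"subAgents": True})` yields `{"backend": "llamaserver", "port": 9090, "subAgents": True}`.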

CLI Options

--backend      llamaserver | ollama | llamafile  (default: llamaserver)
--model        Model name (required for ollama)
--port         Backend server port (default: 8080)
--gguf         Path to GGUF file — managed mode (starts llama-server automatically)
--cache-type   KV cache quantization (e.g. q8_0)
--sub-agents   Enable sub-agent mode (research + build on slot 1)
--tui          Launch the Textual TUI
--server       Run as JSON-RPC server (used by VS Code extension)

Tools

Single-agent mode (default):

Tool      Description                           Permission
bash      Execute shell commands                write
view      Read files with line numbers          read
edit      String replacement in files           write
write     Create or overwrite files             write
glob      Find files by pattern                 read
grep      Search file contents by regex         read
respond   Model's speech channel to the user    (none)
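To illustrate the semantics of the edit tool, here is a minimal sketch of string-replacement editing, including the uniqueness check such tools typically need. It is illustrative, not forge-code's actual implementation:

```python
# Sketch of a string-replacement edit tool: the old string must occur
# exactly once so the edit is unambiguous. Not forge-code's real edit.py.
from pathlib import Path


def edit_file(path: str, old: str, new: str) -> str:
    """Replace one unique occurrence of `old` with `new` in `path`."""
    text = Path(path).read_text()
    count = text.count(old)
    if count == 0:
        return f"error: old string not found in {path}"
    if count > 1:
        return f"error: old string occurs {count} times; make it unique"
    Path(path).write_text(text.replace(old, new, 1))
    return f"edited {path}"
```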

Sub-agent mode (--sub-agents):

The main agent coordinates via dispatch tools. Specialists run on slot 1 with fresh context:

Tool      Description                                                                  Runs on
research  Investigate codebase — returns summary with file paths and recommendations   slot 1 (read-only: view, glob, grep)
build     Execute coding task — make edits, run tests, report results                  slot 1 (full: view, glob, grep, edit, write, bash)
respond   Model's speech channel to the user                                           main (slot 0)

Write tools prompt for permission before executing. Grant always to skip future prompts for that tool within the session. Denying permission cancels the specialist and returns to user input.
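The allow/always/deny flow can be sketched as a small gate that remembers per-tool "always" grants for the session. This is illustrative; the real gate lives in src/forge_code/permissions.py:

```python
# Sketch of a session-scoped permission gate for write tools. `ask` is a
# placeholder prompt callable returning "allow", "always", or "deny".
class PermissionGate:
    def __init__(self, ask):
        self.ask = ask
        self.always_allowed: set[str] = set()

    def check(self, tool_name: str) -> bool:
        if tool_name in self.always_allowed:
            return True  # "always" was granted earlier this session
        answer = self.ask(tool_name)
        if answer == "always":
            self.always_allowed.add(tool_name)
        return answer in ("allow", "always")
```

Once "always" is granted for a tool, `check` short-circuits and the user is not prompted again for that tool within the session.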

TUI Features

  • Tool call indicators — [*] tool — summary for each tool call
  • Inline diffs — red/green tinted backgrounds for edit operations
  • Permission modal — overlay with diff/command preview and Allow/Always/Deny buttons
  • Status bar — live context usage (tokens + %), retry/rescue counters
  • Cancel — press Esc to cancel the active turn. The runner finishes the current tool call and stops cleanly. Ctrl+C exits the app.
  • Plan mode — /plan toggles read-only analysis mode (removes write tools, deepens analysis prompt)

Plan Mode

Type /plan in the TUI input to toggle plan mode. In plan mode:

  • Write tools (bash, edit, write) are removed from the workflow
  • The system prompt instructs the model to analyze and recommend rather than modify
  • The model explores the codebase with read tools and responds with findings and specific edit recommendations

Type /plan again to return to normal mode.
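The toggle amounts to filtering write tools out of the workflow's tool list. A minimal sketch using the tool names above (the filtering itself is illustrative):

```python
# Plan mode removes write tools from the workflow, leaving read tools
# plus respond. Tool names match the tables above; the helper is a sketch.
WRITE_TOOLS = {"bash", "edit", "write"}


def plan_mode_tools(tools: list[str]) -> list[str]:
    """Return the tool list with write tools removed."""
    return [t for t in tools if t not in WRITE_TOOLS]
```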

Project Structure

src/forge_code/
  __init__.py
  __main__.py          # Entry point: python -m forge_code
  engine.py            # Shared backend — clients, runners, session, slot manager
  cli.py               # readline REPL (thin frontend)
  tui.py               # Textual TUI (thin frontend)
  server.py            # JSON-RPC server for VS Code (thin frontend)
  display.py           # Shared tool call summarizer (CLI + TUI)
  session.py           # Message list, workflow builder, plan mode
  permissions.py       # CLI permission gate (allow/deny/always)
  prompts/
    system.py          # System prompts: main agent, research, builder, plan mode
  tools/
    __init__.py        # Tool registry: build_tools(), build_main_agent_tools()
    specialists.py     # Research + build dispatch tools, CancelOnDeny
    context.py         # ToolContext, PermissionGate protocol
    bash.py            # Shell execution via subprocess
    view.py            # File reading with line numbers
    edit.py            # String replacement editing
    write.py           # File creation/overwrite
    glob.py            # File pattern matching
    grep.py            # Regex content search
tests/
  76 unit tests across all tools, permissions, and session
  eval/
    eval_runner.py       # Single-scenario runner with scoring
    batch_eval.py        # Multi-model batch runner (JSONL, resume)
    report.py            # ASCII report tables
    judge_batch.py       # Opus judge batch API script
    review.py            # LLM review prompt generation
    dummy_repo/          # Data pipeline codebase (13 modules, 100 tests)
    scenarios/           # 9 scenarios × 3 prompt levels = 27 task variants

Architecture

See docs/ARCHITECTURE.md for the full design and docs/WORKFLOW.md for the per-turn loop.

The key idea: each user message is one WorkflowRunner.run() call with respond as the terminal tool. The model stays in tool-calling mode where forge's full guardrail stack applies. Multi-turn memory is a growing list[Message] passed via initial_messages. Context compaction is handled transparently by forge using real token counts from the backend.
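Schematically, the per-turn loop looks like the sketch below. runner.run() and the Message type stand in for forge's real API, whose exact signatures are not shown here:

```python
# Schematic of the per-turn loop: one user message == one runner call,
# and multi-turn memory is the growing message list. The runner and
# Message here are placeholders for forge's actual types.
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def chat_turn(runner, messages: list[Message], user_input: str) -> list[Message]:
    """Append the user message, run one workflow, keep the new history."""
    messages.append(Message("user", user_input))
    result = runner.run(initial_messages=messages)  # terminal tool: respond
    messages.extend(result.new_messages)            # assistant + tool messages
    return messages
```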

All three frontends are thin UI layers. Backend logic (clients, runners, session, sub-agent orchestration) lives in Engine.

Running Tests

pip install -e ".[dev]"
python -m pytest tests/ -v

Eval Harness

9 scenarios measuring how reliably a model completes real coding tasks — bug fixes, refactors, and feature additions against a dummy data-pipeline repo. See Eval Concept for design rationale.

# Single scenario
python -m tests.eval.eval_runner --scenario B1 --prompt natural --backend llamafile --model ministral-3-14b --verbose

# Batch eval (JSONL output, automatic resume)
python -m tests.eval.batch_eval --config all --runs 10

# Reports
python -m tests.eval.report eval_results.jsonl
python -m tests.eval.report eval_results.jsonl --by-model

Key design choices

  • Multi-dimensional scoring — syntax validity, test pass rate, edit distance vs reference, pipeline regression, LLM review (deferred). No binary pass/fail.
  • Prompt specificity axis — each scenario runs at three levels (specific, natural, vague) to measure how well the model handles ambiguity.
  • Context telemetry — per-step token counts, peak context, tool breakdowns. Designed to empirically measure the effective attention threshold (~12-15K for 8B models).
  • Partial credit — a model that writes correct logic with a dropped f-string quote scores differently from one that writes wrong logic. Existing benchmarks miss this.
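Partial credit falls out naturally when each dimension contributes a weighted fraction rather than passing through a binary gate. A sketch with illustrative weights and dimension names, not the harness's actual values:

```python
# Multi-dimensional partial-credit scoring: a weighted average of
# per-dimension scores in [0, 1]. Weights and names are illustrative.
def score_run(dims: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average; missing dimensions score 0."""
    total = sum(weights.values())
    return sum(weights[k] * dims.get(k, 0.0) for k in weights) / total
```

Under this scheme a run with perfect syntax but half-failing tests still earns a meaningful score, which is exactly the signal a binary pass/fail discards.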

Documentation

  • Architecture — Design decisions, component overview, what forge provides vs what forge-code builds
  • Workflow — Per-turn loop, the respond tool, permission flow, plan mode
  • Eval Concept — Eval suite design, scoring methodology, scenario descriptions

License

MIT — Copyright (c) 2025-2026 Antoine Zambelli
