Skip to content

Milestones

List view

  • First run, defaults, visual hierarchy, task finales, website / community front door, contributor guidance, and release smoke feel composed, legible, and exact. The project's visual identity locks in: an orange / blue / white work-state palette against a dark navy / near-black background, used to encode state rather than decorate. Drop the legacy `deepseek` and `deepseek-tui` binary shims now that the rename has baked through the v0.8.x line. ## In scope - **First-run wizard.** A short structured wizard for new users: provider setup (DeepSeek, Hugging Face Inference Providers, OpenRouter, or a local runtime via the Serving Workset path), model choice, execution-mode default (Plan / Agent / YOLO), `--model auto` opt-in, theme, optional memory-store opt-in, optional verifier opt-in. Replaces an unguided "just start typing" first turn. - **Control panel polish.** `/config` becomes a control panel: modes, permissions, provider, model, thinking, memory, MCP, theme — every setting inspectable, adjustable, and accompanied by a "what does this do?" line. - **Visual direction (project identity).** The orange / blue / white work-state palette (against dark navy / near-black background) becomes the project's visual identity. The palette encodes state, not decoration: - **Blue** owns frame, control, selection, borders, taskbar, stable system state. - **White** owns readable content, primary text, final answers, clean surfaces. - **Orange** owns active reasoning, live progress, Fin wakeup/verifier checks, warnings, and goal momentum — appears only when Brother Whale is actively doing something. - **Dark navy / near-black** is the background that keeps the palette crisp instead of loud. The feeling is crisp, cold, useful, kinetic — not a sports-brand parody, not gradient overload. Restraint is the brief. Affordance, not nostalgia. Other themes still ship; this one becomes the default. - **Shim-drop.** Remove `[[bin]] deepseek` from `crates/cli/Cargo.toml` and `[[bin]] deepseek-tui` from `crates/tui/Cargo.toml`. Drop `npm/deepseek-tui/` package directory. Drop `deepseek-artifacts-sha256.txt` alias from `release.yml`; stop building `deepseek-*` release assets. Drop `CODEWHALE_*=DEEPSEEK_TUI_*` legacy env-var aliases (TUI-brand vars only — DeepSeek API vars stay forever). - **Governance + community docs.** `GOVERNANCE.md` (decision rights, trust boundary), `SPONSORSHIP.md` (how sponsorship works, what it does not buy — no preferential merging, no "powered by" branding without explicit maintainer approval), `SUPPORT.md` (where to ask for help). `security@codewhale.net` security inbox listed in `SECURITY.md`. Issue templates refreshed. - **Public positioning.** README + landing page frame the project precisely: **CodeWhale is an agentic terminal workbench for open-source and open-weight coding models. DeepSeek is first-class; Hugging Face Inference Providers and OpenRouter serve as the open-model discovery and routing layer; vLLM, SGLang, Ollama, and TGI cover local serving.** Historical CHANGELOG entries untouched. ## Out of scope (kept forever) - `~/.deepseek/` user config directory name. - DeepSeek API env vars (`DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `DEEPSEEK_MODEL`, `DEEPSEEK_PROVIDER`). - `api.deepseek.com`, `api.deepseeki.com`. - Model IDs (`deepseek-v4-pro`, `deepseek-v4-flash`). - `ProviderKind::Deepseek` enum + `"deepseek"` provider string. - Historical CHANGELOG entries. - CNB mirror namespace. ## Definition of done - Fresh install via `npm install -g codewhale` or `cargo install codewhale-cli` ships `codewhale` and `codewhale-tui` binaries; no `deepseek` shim present. - First-run wizard runs cleanly for a new user and produces a working setup (provider key for DeepSeek / HF / OpenRouter / local, default execution mode, theme). - `/config` opens the control panel; every entry has a "what does this do?" line. - The orange / blue / white work-state palette is the default theme; alternate themes still ship. - `GOVERNANCE.md`, `SPONSORSHIP.md`, `SUPPORT.md` live and linked from README. - README provider matrix lists DeepSeek + HF + OpenRouter + the local-runtime path with clear "what each is for" framing. - A new user sees codewhale-named welcome surfaces with zero legacy strings. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.48]` thanks contributors to the v0.8.x DeepSeek-TUI line; frames v0.8.48 as the brand-consolidation + workbench-affordance milestone. - Release notes document the binary removal explicitly (running `deepseek` produces a shell "command not found" — no in-binary message because there is no binary).

    No due date
    8/41 issues closed
  • A bounded source-linked orientation memory plus a shared state / evidence / permission contract across terminal, server, editor, bridge, and agents. Handoffs become first-class artifacts: the workbench knows where it left off, what it owes, what it changed, and how to resume. Provider abstraction lands so model selection is invisible to the rest of the system — and Hugging Face Inference Providers becomes a first-class provider alongside DeepSeek and OpenRouter, anchoring the harness's open-model story. ## In scope - **Evidence ledger.** Every receipt from v0.8.43 + every decision card + every tool inspection + every memory entry compounds into a per-session evidence ledger. Inspectable, exportable, queryable from `/evidence`. - **Handoff artifacts.** Closing a workbench session writes a handoff artifact (goal, last state, blockers, decisions, evidence). Opening one resumes the workbench — "Resume previous workbench" surfaces matching artifacts. - **Orientation cache.** Bounded, source-linked, evidence-tagged. Decays as freshness drops. First-class fact source from `codewhale.net/api/state.json` (latest release, install commands per platform, published crates, known-bad version ranges). - **Provider abstractions.** Unified `Provider` trait in `codewhale-agent` consolidating env-var precedence, secret resolution, base-URL normalization, and auth-header construction (currently scattered across `crates/config`, `crates/secrets`, `crates/tui/src/client.rs`). `ProviderKind` registry becomes configurable; model selection is provider-agnostic. - **Hugging Face as a first-class provider.** New `[providers.huggingface]` config block with `api_key` (default `HF_TOKEN`, alias `HUGGINGFACE_API_KEY`), `base_url` (default `https://router.huggingface.co/v1`), and `provider = "auto"` (or a specific Inference Provider). OpenAI-compatible route. Model picker pulls model passport metadata from the HF Hub API (license, base model, context length, chat template, tool-call support, reasoning support, gated / private status). Distinct from the **Hugging Face Workset** (#1977) which adds Hub registry / datasets / adapters / Jobs — the two share auth but ship through different surfaces. - **Cross-surface alignment.** Consistent command names, output formats, error messages across CLI (`codewhale`), TUI, runtime API (`codewhale serve --http`), bridges, and web. - **VS Code extension beta.** Scaffold, local runtime detection, chat webview. Ship as VSIX attached to GitHub Release; not Marketplace-published until beta feedback. - **Protocol contract** in `crates/protocol` carries provider-auth shape explicitly so external clients don't have to special-case. - **Per-tool migration PRs.** Start ExternalTool migrations one tool at a time (git, gh, python, node, rust, cargo) with Windows CI green per step. ## Out of scope - New providers beyond the HF Inference Providers integration (the rest stay as they are). - Cloud-hosted runtime API. - Marketplace publish of VS Code extension. - Plugin tool runtime implementation (still gated on v0.8.46 RFC). - Model Lab workset implementation (post-v0.9.0; see #1977). The HF Workset specifically depends on this milestone's provider work landing first. ## Definition of done - Switching providers mid-session is one config change with no surrounding code change. - Hugging Face Inference Providers works end-to-end against `Qwen/*`, `deepseek-ai/*`, and `meta-llama/*` model IDs without per-model special-casing in the engine. - Model picker surfaces HF model passport metadata (license, context length, gated status) before selection. - `/evidence` returns the per-session ledger. - Closing and reopening a session restores the workbench state (active task, last decision, pending blockers). - Orientation cache surfaces the latest published release within freshness window after restart. - VS Code beta VSIX attached to v0.8.47 GitHub Release; smoke-tested against local runtime API. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.47]` entry calls out HF as a first-class provider and provider abstractions as the model-neutral lever. - README provider matrix + "Bring your own open-weight model" section updated; HF Inference Providers and OpenRouter framed as the open-model discovery+routing layer.

    No due date
    4/32 issues closed
  • Make documents, images, search, execution, previews, dependency probes, and safe approvals feel native to the workbench instead of like loose subprocesses. Tools become first-class workbench objects with inspectors, properties, previews, and copy/select. Preview verifier-native completion hooks behind an opt-in flag so the v0.9.0 surface lands with field experience accumulated. ## In scope - **Tool inspectors.** Every tool call has a properties pane: inputs, outputs, duration, exit, environment, sandbox decision. Tool results render in a preview panel (image, table, diff, doc) rather than as raw text. - **Copy / select mode.** First-class text selection inside the workbench (transcript, properties, previews) without breaking terminal-line semantics. The terminal-copy-includes-wrapped-lines bug (#1853) gets fixed structurally. - **Provider / model diagnostics.** Inspectable view of provider state: active provider + model, last-call latency, last error, prompt-cache hit rate. Surfaces from doctor + `/provider` slash command + properties pane. When the active provider is Hugging Face Inference Providers (via the v0.8.47 first-class integration when that lands first, or via the OpenAI-compatible path in the interim), the pane previews **model passport** metadata pulled from the HF Hub: license, base model, context length, chat template, tool-call support, reasoning support, gated / private status. - **Tool surface polish.** PDF / image / search / exec ergonomics; dependency probe + skip-on-miss (existing pattern); approval-flow clarity. Ship the small, independent RuntimeTool Windows `.cmd` fallback as a standalone PR. - **Plugin tool design RFC.** Open `docs/proposals/plugin-tools.md` defining how `// approval: auto` frontmatter interacts with the execution modes (Plan / Agent / YOLO), the approval policy, trust, and `--model auto` model routing. Implementation deferred until design lands. - **Verifier preview (general).** Opt-in via `[verifier] enabled = true`. On claim-of-done, auto-spawn a verifier sub-agent with fresh context and read-only tools, time-boxed. Verdict-as-gate: pass / partial / fail. `/force-complete` override is audit-logged. `/tasks` shows a "verified" badge. Extends the Goal-mode-specific Fin wakeup from v0.8.43 to the general claim-of-done case. ## Out of scope - Full ExternalTool / RuntimeTool 65-call-site migration (defer; ship per-tool migration PRs in v0.8.47). - Plugin tool runtime implementation (gated on the RFC). - Verifier as default (preview only — explicit opt-in). - New execution modes — Plan / Agent / YOLO remain the three modes. - Hugging Face Workset implementation (Hub registry / datasets / adapters / Jobs — see #1977). v0.8.46 only previews HF model-passport metadata in the provider pane; the workset surface ships post-v0.9.0. ## Definition of done - Every tool surfaces an inspectable properties pane and a result preview. - Copy / select mode ships with terminal-line semantics preserved. - `/provider` slash command and provider properties pane in place; HF model passport preview works against at least three model IDs (`Qwen/*`, `deepseek-ai/*`, `meta-llama/*`). - Verifier preview enables via config; verdict appears on completed tasks; `/force-complete` writes to audit log. - Plugin tool design RFC lands as draft; one round of maintainer review captured in PR. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.46]` entry marks verifier preview as experimental and tool inspectors as the headline. - `docs/verifier-preview.md` explains opt-in + override semantics.

    No due date
    12/114 issues closed
  • Make long-running agentic work interruptible and recoverable across every supported provider — DeepSeek as reference, plus the additive hosts (Hugging Face Inference Providers, OpenRouter, Novita, Fireworks, SGLang, vLLM, Ollama, SiliconFlow) that the harness aims to be best-in-class for. Performance SLOs and the Homebrew tap rename land in the same window. Plan / Agent / YOLO remain the three execution modes; `--model auto` remains model routing, not a mode. ## In scope - **Control plane.** Pause / redirect / cancel / resume on a running turn. Synchronous tool cancellation (#1791). Crash recovery for in-flight turns. Restore points / undo / checkpoints — the workbench can roll back to a prior state. - **Provider parity sweep.** Control-plane semantics tested against every supported provider so they behave identically. DeepSeek is the reference; the additive hosts must match. Hugging Face Inference Providers gets included in this sweep as preparatory work for its first-class promotion in v0.8.47 — cancel/resume must work against `Qwen/*`, `deepseek-ai/*`, and `meta-llama/*` model IDs via the HF router. Inbox-zero pass over hostability / long-session / Windows regressions re-targeted from v0.8.41 + the v0.8.43 redistribution: #1835, #1737, #1690, #1679, #1651, #1596, #1531, #1791, #1806, #1472, #1786, #1425, #1827, #1732, #765. - **Performance SLOs.** Cold startup p50 < 500 ms, no synchronous I/O on the UI thread, tool dispatch p50 < 100 ms. `tracing` spans instrumented; `/timing` slash command. Treat root causes, not symptoms. - **Homebrew tap rename.** `Hmbown/homebrew-deepseek-tui` → `Hmbown/homebrew-codewhale`, redirect/alias kept. ## Out of scope - **New execution modes.** Plan / Agent / YOLO stay as the three execution modes; `--model auto` stays as model routing, not a mode. "Auto mode" as a fourth execution mode is explicitly ruled out — the three-mode model is the project's contract. - New providers beyond the HF Inference Providers parity inclusion. (HF as a config-block first-class provider lands in v0.8.47; v0.8.45 only tests parity through the existing OpenAI-compatible path.) - New OAuth flows. - `~/.deepseek/` config path rename. - Tool surface expansion (defer to v0.8.46). - Verifier work (defer to v0.8.46 preview; Goal-mode-specific Fin wakeup lives in v0.8.43). - Model Lab workset implementation (post-v0.9.0; see #1977). The Serving Workset (vLLM / SGLang / TGI / llama.cpp / Ollama recipes) is downstream of this milestone's provider parity work. ## Definition of done - A 30+ minute session can be paused, redirected, and resumed without state loss against DeepSeek (reference) plus all additive providers, including HF Inference Providers. - `/timing` shows cold-startup p50 < 500 ms on macOS / Linux / Windows reference machines. - Every v0.8.41-origin or v0.8.43-redistributed control-plane / hostability issue is fixed, deferred-with-cause, or closed-as-unreproducible. - `brew install hmbown/codewhale/codewhale` works against the renamed tap. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.45]` entry calls out cancel/resume by provider and perf SLOs. - README provider matrix updated; the modes section continues to document Plan / Agent / YOLO (three modes), with `--model auto` covered separately as model routing.

    No due date
    24/25 issues closed
  • Turn goals, plans, files, tests, blockers, evidence, and decisions into navigable, inspectable objects in the workbench. Everything should be openable: a task, a tool, an agent, a file, an evidence item. Ship the first capability upgrade — a typed memory store — as opt-in so users feel real progress without a default-on surface change. ## In scope - **Inspectable objects.** "Open Properties" works on every meaningful object: task, tool call, agent, file, decision, evidence item. Properties show state, history, related objects. - **Spatial task surface.** Structured task map with file / blocker / evidence / decision rows. Clickable / keyboard-navigable workspace map. Navigate task → file → diff without leaving the workbench. Ctrl+O Activity Detail pager (#1547) ships here. `!` shell-prefix in the composer (#1546) lifts terminal-level interaction quality. - **Memory typed store** (#534–#536). SQLite + FTS5 backend with graph-structured memory. Opt-in via `[memory] enabled = true`. `/memory` slash command for inspection. Pairs with existing `crates/state` session persistence — first "Underway" capability to graduate. - **Docker image rename.** `ghcr.io/hmbown/deepseek-tui` → `ghcr.io/hmbown/codewhale`. Old tag still pulls (alias) for one minor. ## Out of scope - Distributed task scheduling, network-shared task state. - `~/.deepseek/` storage path rename. - Editor integrations (defer to v0.8.47 VS Code beta). - New tool surface (defer to v0.8.46). ## Definition of done - Every task / tool / agent / file / decision / evidence object responds to "Open Properties." - A task can be opened, its files navigated, its blockers shown, its tests run, and its evidence read without scrolling back through transcript. - Memory store round-trips entries on opt-in; `/memory list` shows entries; default off; doctor surfaces opt-in. - New Docker image at `ghcr.io/hmbown/codewhale:v0.8.44` pulls and runs; old tag still pulls. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.44]` entry. - README updated for memory-store opt-in + new Docker image path.

    No due date
    14/24 issues closed
  • CodeWhale is a terminal-native workbench, not a chat transcript. v0.8.43 makes Brother Whale's state inspectable instead of mysterious (visible active work, last event, stall reason, evidence, decision cards, receipts), opens Goal mode as a persistent objective surface with Fin as a cheap wakeup/verifier, and lands the remaining registry publishes and URL flips that complete the `deepseek-tui` → `codewhale` rename. ## In scope - **Truth-surface UI primitives.** Every "Working" state names the active operation, the last event timestamp, and a stall reason when stalled > 30s. Stuck-state explanations replace silent freezes. Resizable thinking blocks and thinking-preview controls (#1972). Active tool/job properties pane (#1884, #1862, #1830, #1269, #1190, #1147, #765, etc.). Active-state surfaces start using the project's orange / blue / white work-state palette (full theme pass lands in v0.8.48). - **Tasks sidebar inspectability (#1975).** Full / recoverable turn and task IDs (no mid-string ellipsis below the prefix needed for `task_read`/`task_cancel`). Replace ambiguous `active / running` wording with a state model the user can act on. Structured two-line recent-tool rows (status, name, duration on top; description / output below). Visible detail/expand path for truncated output. `y` / `Y` yank affordances (focused ID / full receipt). Copy-paste-friendly rows. Terminal-native throughout — no mouse dependency. - **Goal mode (replaces the narrow `/goal` command over time).** Goal mode is a persistent objective surface — orthogonal to Plan / Agent / YOLO execution modes and to `--model auto` model routing. The agent can ask clarifying questions when an objective is underspecified, plan its approach as a visible artifact, and keep working until complete, blocked, or explicitly stopped. **Fin** (the fast DeepSeek V4 Flash thinking-off seam) acts as a cheap wakeup/verifier — periodic binary-perspective check on whether the goal is actually done from tests, build, changed files, docs, install state, release gates, and user acceptance criteria. `/goal` continues to work as the current-goal command; Goal mode is the recommended surface for objective-driven sessions. Fin dispatches through the existing provider plumbing today and will inherit the v0.8.47 provider abstraction (including Hugging Face Inference Providers as a first-class backend) with no surface change. - **Decision cards.** When Brother Whale needs input (including clarifying questions from Goal mode), it surfaces a structured choice card (numbered options, default highlighted, explicit accept/decline) instead of a vague prompt. Decisions are logged. - **Receipts.** Every meaningful action ends with a receipt: what changed, what was tried, what evidence supports the claim. Model-asserted "done" always cites at least one tool result or file diff. Fin wakeup checks emit receipts too. - **Doctor surfaces.** `codewhale doctor` names the invoked command path correctly (codewhale vs deepseek shim) and the brand consistently. - **Publish-out.** Publish the 14 `codewhale-*` crates to crates.io (first-time), `codewhale@0.8.43` to npm, `deepseek-tui@0.8.43` as a deprecation shim. Merge the web rebrand PR so `codewhale.net` serves canonical content and `deepseek-tui.com` 301-redirects. Docs domain flip. Follow-up PR for residual `Hmbown/DeepSeek-TUI` URLs. ## Out of scope - `~/.deepseek/` config directory rename (would break existing users; permanent anti-scope). - DeepSeek API env vars (`DEEPSEEK_API_KEY` etc.) and model IDs (refer to the DeepSeek API, not the TUI brand). - CNB mirror namespace, Homebrew tap rename, Docker image rename (later milestones). - New tool surface or capability work beyond truth-surface UI primitives, Goal mode, and publish-out. - Verifier-native completion hooks as default behavior (preview ships in v0.8.46; Fin wakeup in v0.8.43 is the Goal-mode-specific case, not the general verifier). - Renaming `/model auto` or treating it as an execution mode — it remains model routing, distinct from Plan / Agent / YOLO and from Goal mode. - Hugging Face Workset implementation (Hub API / model cards / datasets / adapters — see #1977). The HF *provider* integration lands in v0.8.47; the HF *workset* lands post-v0.9.0. ## Definition of done - Stall reason and last-event surfaced in TUI for every long-running operation. - Tasks sidebar exposes full IDs, structured tool rows, an expand/inspect path, and at least one keyboard path to copy the focused ID or details (per #1975 acceptance criteria). - Goal mode ships as a persistent objective surface; agent asks clarifying questions when objectives are underspecified; Fin wakeup/verifier flags binary-incomplete states. - Decision cards used at every point where Brother Whale needs structured input. - Every completed task ends with a receipt that cites evidence. - `npm view codewhale@0.8.43` and `cargo search codewhale-cli` confirm publishes. - `codewhale.net/roadmap` renders from the renamed feed. ## Release gate - Parity gates green: `cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings`, `cargo test --workspace --all-features --locked`. - `CHANGELOG.md` `[0.8.43]` entry written; contributor credits captured. - GitHub Release notes + npm shim publish (`deepseek-tui@0.8.43`) verified.

    No due date
    17/42 issues closed
  • Clear the field: turn backlog and community reports into product signal, closing stale items, deduping recurring pain, improving intake, and promoting only evidence-backed release work. Release gate: before closing, update README, CHANGELOG, GitHub release notes, and contributor credits, or record why unchanged.

    No due date
    11/11 issues closed
  • The architectural release. Three sub-projects ship together — a micro-LM service layer for cheap typed micro-operations, an operations + hints + eval pipeline, and verifier-native completion hooks graduated from preview. The release earns the .0 by changing how the agent thinks, not just what it can do. The provider-neutral substrate from v0.8.47 means every piece works against any open-weight model — DeepSeek, Hugging Face Inference Providers, OpenRouter, or a local runtime (vLLM / SGLang / Ollama / TGI) are interchangeable as the cheap-small-model backend. ## In scope - **A — Micro-LM Service Layer.** Typed micro-operations (routing, synthesis, planning, dispatch, turn-state evaluation) backed by cheap small-model calls. Dispatches through the v0.8.47 provider abstraction — Hugging Face Inference Providers, DeepSeek, OpenRouter, and the local-runtime path are all interchangeable. Any open-weight or hosted small model can power it. - **B — Operations + Hints + Eval Pipeline.** Decomposition-hints injection, sub-agent dispatch optimization, and an eval pipeline that measures lift across providers and model sizes. - **D — Verifier-Native Completion Hooks** (graduated from v0.8.46 preview). Default-on for paid / large operations, audit-logged `/force-complete` override, prompts optimized via the eval pipeline. ## Public claims (measurable, reproducible) 1. ≥ 15 pp lift on long-context recursive tasks vs comparable harnesses running on the same model. 2. ≤ 5 % false-completion rate on the deliberate-trap suite. Both claims publish with reproduction instructions, public eval suites, and the model + provider matrix the runs were taken against — at minimum DeepSeek, one HF Inference Providers backend, and one local-runtime path. ## Out of scope - Distilled model release. - Major UI overhaul. - Model-facing tool surface change. - New providers beyond what v0.8.47 lands. - Public pre-announcement before claims are reproducible against the published eval suites. - **Model Lab** (fine-tuning / eval bridge to open-weight services and worksets — Hugging Face, Unsloth, NeMo, Arcee, Serving, Eval, Observability, Training Infra) — see #1977. v0.9.0 ships the eval-pipeline substrate that makes Model Lab possible; the user-facing surface (`/model-lab capture | export | redact | dataset | eval | finetune --workset … --provider …`) is scoped for the next .0 (likely v0.10.0). ## Definition of done - Eval suites `evals/recursive-hard/` and `evals/false-completion/` dogfooded; results reproducible on a clean install with publicly listed models. - All three sub-projects pass full workspace parity gates on the integration branch. - The provider abstraction from v0.8.47 carries the runtime end-to-end with no provider-specific hot paths. ## Release gate - Parity gates green on `main` after integration branch merges. - `CHANGELOG.md` `[0.9.0]` entry frames the release as a capability shift, not a rewrite. - Published results posted with the exact reproduction commands. ## Post-shipping direction - **Model Lab** (#1977): turn the eval-pipeline substrate into a user-facing surface for exporting redacted session traces, running eval replays against alternate models, and (optionally) handing curated data to open-weight fine-tuning backends via curated worksets (Hugging Face, Unsloth, NeMo, Arcee, Serving, Eval, Observability, Training Infra). Scoped as v0.10.0 once v0.9.0 has shipped and stabilized.

    No due date
    6/74 issues closed