From e192c8824c4869e7551eff6c2f51934d7b109f0f Mon Sep 17 00:00:00 2001
From: Kostandin Angjellari
Date: Wed, 20 May 2026 15:09:46 +0200
Subject: [PATCH 01/22] FE-730: spec + plan for orchestrator POC dual-engine execution
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add orchestrator capability requirements (R46–R50) to SPEC.md
- Add decisions D155-K–D159-K (dual-engine, reports.jsonl, ActionRegistry, plan model, worktree isolation)
- Add invariants I121-K–I123-K (contract test parity, token discipline, worktree safety)
- Add orchestrator lexicon entries
- Add orchestrator-poc frontier definition to PLAN.md
- Move design doc to docs/design/orchestrator.md
- Update Linear FE-730 description to match design doc

Co-authored-by: Amp
---
 docs/design/orchestrator.md | 306 ++++++++++++++++++++++++++++++++++++
 memory/PLAN.md              |  18 ++-
 memory/SPEC.md              |  29 ++++
 3 files changed, 351 insertions(+), 2 deletions(-)
 create mode 100644 docs/design/orchestrator.md

diff --git a/docs/design/orchestrator.md b/docs/design/orchestrator.md
new file mode 100644
index 00000000..41003846
--- /dev/null
+++ b/docs/design/orchestrator.md
@@ -0,0 +1,306 @@
+# Orchestrator POC — Design Proposal
+
+> Status: **working design proposal** — exploratory design for a CLI orchestrator that consumes a brunch-shaped execution plan (epics → slices) and dispatches agents and deterministic checks to drive the plan to completion. Not yet promoted to `memory/SPEC.md`; decisions land there through `ln-spec`. Tracked as FE-730; umbrella H-6476.
+>
+> Scope is intentionally narrow: two interchangeable execution engines behind a shared seam, plan-as-YAML, an append-only event log as the communication medium, and an isolated worktree per run. The 15-step build sequence, fixture definitions, and pi-agent invocation details are operational scaffolding kept separate from this doc.
+> +> **Full design vs POC implementation:** this doc describes the design as it should land if/when the orchestrator productizes. The POC implements a deliberate subset to avoid premature abstraction — see [§POC scope and deferrals](#12-poc-scope-and-deferrals) for the explicit map of designed-but-deferred items. + +## 1. Concept & problem + +Brunch elicits specs and (eventually) projects them into execution plans. The orchestrator closes the loop: it takes such a plan, walks its work units, and produces real code + verification results. + +Two pressures shaped the design: + +- The team explicitly wants to **test the Petri-net substrate as a hypothesis** rather than commit to it on faith. Running it side-by-side with a hand-coded baseline is the only way to get empirical signal on whether the abstraction earns its complexity. +- The plan model is **provisional**. Brunch does not yet emit execution plans; canonical fixtures are forthcoming. The orchestrator should be forward-compatible (room for intent/design/oracle pointers, status semantics, milestone-level structure) without invalidating the engine seam when the plan model sharpens. + +The orchestrator is not productized brunch. It is an experiment that should produce: (a) one working CLI built end-to-end from a plan, (b) two engines reaching the same outcome, (c) enough qualitative comparison to justify the next architectural commitment. + +## 2. Architecture + +``` + brunch cook + │ + ▼ + ┌──────────────────────┐ + │ Orchestrator │ <-- shared seam + │ .run(input) │ + └──────────┬───────────┘ + │ + ┌────────────┴────────────┐ + ▼ ▼ + petrinet engine procedural engine + (interpreter + (walks epics, then + net + tokens) slices within each) + │ │ + └────────────┬────────────┘ + ▼ + ┌──────────────────────┐ + │ ActionRegistry │ <-- name-keyed dispatch + └──────────┬───────────┘ + │ + ┌──────────────────────┐ + │ AgentDispatch │ + │ ReportSink (jsonl) │ + │ TestRunner (det.) 
│ + │ Worktree (fs) │ + └──────────────────────┘ +``` + +### The seam + +```ts +interface Orchestrator { + run(input: OrchestratorInput): Promise<OrchestratorResult>; +} + +type OrchestratorInput = { + plan: Plan; // { epics, slices } + fixtureDir: string; + actions: ActionRegistry; // name-keyed action handlers + reports: ReportSink; // append-only jsonl + testRunner: TestRunner; // deterministic exec + policy: RunPolicy; // { maxRetries } +}; + +type OrchestratorResult = { + status: 'completed' | 'halted'; + reason?: string; + reports: ReportRef[]; + epics: EpicOutcome[]; + slices: SliceOutcome[]; +}; +``` + +Every dependency is injected. Contract tests swap in fakes — a fake `ActionRegistry` returns canned report refs without invoking any real agent or test runner. This is what makes the two-engine experiment cheap. + +### How each engine handles the hierarchy + +- **procedural:** `topoOrder(epics)` → for each ready epic, `topoOrder(epic.slices)` → for each ready slice, run inner loop → after all slices done, run epic-level verifications → if fail, halt. +- **petrinet:** Epic and slice readiness states are places in the net. Slice completion produces tokens that feed into an epic-completion transition. Epic verification is itself a transition. Epic dependencies become input arcs into the first slices' ready-places. + +Both produce identical observable behavior on the contract test suite. That's the non-negotiable. + +## 3. ActionRegistry — name-keyed dispatch + +The TDD inner loop's transitions (`write-tests`, `write-code`, `run-tests`, `evaluate-done`, `verify-epic`) are not hardcoded inside the engines. 
They are registered handlers the engines look up by name: + +```ts +interface ActionRegistry { + register(name: ActionName, handler: ActionHandler): void; + get(name: ActionName): ActionHandler; // throws on unknown + has(name: ActionName): boolean; +} + +type ActionHandler = (ctx: ActionContext) => Promise; +``` + +Engines orchestrate **which** action fires when (the state machine). The registry owns **how**. Adding `lint`, `human-review`, or `research` later is a registration, not engine surgery. This satisfies the PRD's "actions looked up by name, extensible without restructuring" intent without changing the plan schema — slices still trigger the fixed TDD loop, but the loop's primitives are pluggable. + +## 4. Plan model: epics → slices + +Two levels. **Slices** are the execution unit; **epics** are organizational groupings that can carry their own integration-level verification. No milestones in POC. + +```yaml +epics: + - id: scaffolding + summary: "CLI scaffolding" + depends_on: [] + verification: + - kind: integration-test + target: "tests/cli.integration.test.ts" + +slices: + - id: version-flag + epic_id: scaffolding + definition: "Add `--version` flag printing version from package.json" + depends_on: [] + verification: + - kind: unit-test + target: "tests/version.test.ts" +``` + +### Readiness rules + +- An epic is **ready** when every epic in its `depends_on` is **done**. +- A slice is **ready** when (a) its parent epic is ready and (b) every slice in its `depends_on` is done. +- An epic is **done** when (a) every slice with that `epic_id` is done and (b) the epic's own verifications all pass. +- A failed epic-level verification halts the run. POC does not scope remediation slices. + +### Slicing principle + +Slice **vertically** through layers, not horizontally. Each slice produces a thin end-to-end increment; epics carry the cross-slice integration checks. 
This mirrors the walking-skeleton posture: keep all layers moving together at minimum increments rather than building one layer at a time. + +### Schema provenance + +The schema is **provisional**. The PRD says plans are "based on a brunch produced plan's speculative schema," but brunch (the elicitation tool) does not yet emit execution plans. The design here is intentionally minimal and forward-compatible: as canonical fixtures land, the schema may grow new fields (intent/design/oracle pointers, status semantics, milestone level) without invalidating the engine seam. + +## 5. Reports as communication medium + +This is the load-bearing communication discipline: + +> **Tokens carry only pointers. All event content lives in `reports.jsonl`. Transitions communicate by appending lines and reading prior lines by `reportId` — never by passing data through the net.** + +The log isn't a side-effect of the run; it's *the* communication medium. The net stays narrow (tiny token shape) precisely because the log carries everything else. + +### Discipline + +- Tokens carry exactly `{ reportId, sliceId, epicId }`. Nothing else. +- Every transition appends one line per event. Each line has a fresh `reportId` (UID). +- When a downstream transition needs prior context (e.g. `write-code` needs the test files from `write-tests`), it reads the prior line by `reportId` from the log. +- The whole log is also the post-run audit trail. + +### Line schema + +```json +{ + "id": "rpt_01J...", + "ts": "2026-05-20T14:23:00Z", + "epicId": "epic-1", + "sliceId": "slice-1", + "actor": "test-writer | code-writer | test-runner | evaluator | orchestrator", + "event": "tests-written | code-written | tests-run | eval-done | epic-verified | halt", + "payload": { /* event-specific */ } +} +``` + +### Resumability-readiness + +POC runs are not resumable per PRD, but the architecture preserves the affordance: `reports.jsonl` is sufficient to reconstruct epic/slice state at any point. 
A future `brunch cook resume ` could replay the log to the last consistent transition and continue without changing the engine seam. + +## 6. Per-slice inner loop + +The execution of one slice is the same state machine in both engines. The procedural engine implements it as a hand-coded loop; the petrinet engine compiles it into a generic net and runs a solver. + +### Places (states) + +- `slice spec ready` — slice received; ready to evaluate +- `testing agent ready` / `coding agent ready` — agent resources (single-token discipline in POC; pool later) +- `failing tests exist` — tests written and a deterministic run failed (or have just been written, awaiting first run) +- `untested code ready` — code written; needs deterministic re-run +- `NO spec needs more` — evaluator says spec isn't satisfied yet +- `YES spec is done` — evaluator says spec satisfied; slice can terminate + +### Transitions (actions) + +- `evaluate done state` (testing agent) — reads slice spec + prior reports; emits `NO/YES`; returns testing-agent token +- `write tests` (testing agent) — consumes `NO spec needs more` + testing-agent token; emits `failing tests exist`; appends a report line +- `write code` (coding agent) — consumes `failing tests exist` + coding-agent token; emits `untested code ready`; appends a report line +- `run latest tests` (deterministic, orchestrator-owned) — consumes `untested code ready`; emits either `failing tests exist` (loop) or `slice spec ready` (re-evaluate); appends a report line +- `return DONE` — consumes `YES spec is done` + +### Loop pattern + +``` +slice spec ready → evaluate + ├─ needs more → write tests → write code → run tests + │ ├─ fail → write code → ... 
(up to maxRetries) + │ └─ pass → slice spec ready (re-evaluate) + └─ done → return DONE +``` + +The "run latest tests → slice spec ready" arc is what makes the orchestrator handle multi-criterion slices: a passing run doesn't end the slice, it triggers another `evaluate done state` to check whether the spec is fully satisfied. + +### Why the orchestrator owns the deterministic test run + +Agents can be wrong about whether their own tests passed. The orchestrator re-runs tests itself as an outside check, so the coding agent's claim of success is verified independently. This isn't anti-gaming (the deeper anti-gaming move would be ensuring test quality); it's anti-lying — the agent can't accidentally or sloppily claim a pass that didn't happen. + +## 7. Dual-mode CLI resolver + +The CLI takes a single directory argument: + +``` +brunch cook +``` + +Cook decides between **fixture mode** (greenfield) and **codebase mode** (brownfield) by where it finds the plan: + +| Plan location | Mode | Worktree behavior | POC status | +|---|---|---|---| +| `/plan.yaml` | Fixture (greenfield) | Empty worktree | Implemented | +| `/.cook/plan.yaml` | Codebase (brownfield) | Worktree seeded from `` | Reserved; seed implementation deferred | + +Naming intuition: a **fixture** *is* a plan with supporting artifacts (`plan.yaml` at root, like a manifest); a **codebase** *has* a plan as configuration (`.cook/plan.yaml`, like `.eslintrc` or `.github/`). + +The plan may declare `mode: greenfield | brownfield` to override the default inferred from location. + +POC implements fixture mode end-to-end; codebase mode returns a structured "not yet implemented" error on the reserved resolver branch. The seed step (likely `git worktree add` when `.git` exists; filtered copy fallback otherwise) is the only meaningful added work to enable brownfield — engine, registry, agents, and reports are mode-agnostic. + +## 8. Worktree isolation + +Each run gets an isolated worktree at `/.cook/runs//worktree/`. 
Agents write freely inside; anything outside the worktree stays untouched. No commits, no pushes. Recovery = throw the worktree away and start a new run. + +This addresses the PRD's "the orchestrator only writes to its own output" requirement. The interpretation is operational, not literal: file writes happen inside the worktree, and the worktree lives under `` so that artifacts are discoverable next to the input — but the source repo (whatever `` is or contains) is never mutated. + +## 9. Verification stance + +Three tiers, each with a distinct purpose: + +| Tier | Real or fake | Purpose | +|---|---|---| +| **Engine contract tests** | Fake agents, fake test runner | Both engines must produce identical observable behavior. This is the experiment. | +| **Adapter tests** | N/A (per-engine internals) | Petri-net compilation, solver step semantics, transition firing for the petrinet engine. Topo sort, inner-loop state, retry counter for the procedural engine. | +| **Integration fixture run** | Real pi-agent, real test runner | One greenfield CLI fixture executed end-to-end. Manual inspection of outcomes and `reports.jsonl` legibility. | + +The contract tier is where the two-engine experiment is decided. Both engines must pass the same suite; any divergence is a bug in one of them, not a "different design." The adapter tier covers per-engine internals that don't have a meaningful equivalent in the other engine. The integration tier is what gets demoed. + +## 10. PRD reconciliation + +| PRD claim | Design posture | +|---|---| +| "Plan can be based on both greenfield and brownfield projects." | Dual-mode resolver makes the brownfield slot explicit and reachable. POC implements greenfield only; seed-copy/git-worktree step is the only added work to enable brownfield. PRD intent satisfied structurally. | +| "Actions looked up by name; extensible without restructuring." | Internal `ActionRegistry`. 
Plan schema unchanged — slices don't declare actions, they trigger the fixed TDD loop. New action types (lint, research, human-review) register without engine surgery. | +| "Live progress stream the user can watch." | Per-event streaming is the default UX, not opt-in. Verbose mode adds raw agent stdout. | +| "Architecture should allow future resumability." | Append-only `reports.jsonl` is the substrate; sufficient to reconstruct epic/slice state. Implementation deferred. | +| "Realistic fixture run all the way through." | One greenfield CLI fixture (TypeScript + Bun), two epics, five slices. Exercises happy paths, intra/inter-epic deps, epic-level integration verification, and the retry loop. | + +## 11. Out of scope + +- Milestones (third level above epics) +- Remediation slices when epic-level verification fails +- Dynamic replanning during a run +- Resumability implementation (architecture supports it) +- Parallel slice or epic execution +- Brownfield seed implementation (resolver branch reserved) +- Halt-and-continue across independent slices (halt-all on any failure for POC) +- Multiple test-runner backends (let fixture pick one) +- Human-review checkpoint (PRD stretch goal) +- Plan generation from spec (separate concern) +- Petrinaut / brunch UI integration + +## 12. POC scope and deferrals + +The design above is the target shape. The POC builds a deliberate subset and defers the rest as architectural slots — designed in the doc, not in the code. The full design is preserved here so future iterations have somewhere to start from rather than re-deriving it. + +| Design element | Full design | POC posture | +|---|---|---| +| **Action dispatch** | `ActionRegistry` registers handlers by name; engines look up by name; new actions (e.g. `lint`, `human-review`, `research`) register without engine surgery. | Inline handler dispatch per engine (e.g. a record literal or switch). Promote to a real registry when a 3rd action type lands. 
| +| **Plan resolver** | Dual-mode by plan location: `/plan.yaml` → fixture (greenfield); `/.cook/plan.yaml` → codebase (brownfield). | Fixture mode only. CLI takes `` directly; codebase branch is documented here, not coded. | +| **Brownfield seed** | When codebase mode is used and `/.git` exists, prefer `git worktree add`; otherwise filtered copy (`rsync` excluding `.git`, `node_modules`, `dist`, `.cook/runs/`). | Not implemented. Greenfield-only execution; `mkdir` creates an empty worktree. | +| **Token-pointer discipline** | Universal rule: tokens between transitions carry only `{ reportId, sliceId, epicId }` pointers; all event content lives in `reports.jsonl`. Applied across both engines. | Petrinet engine enforces this internally (it's a hard constraint of the substrate). Procedural engine is free to pass data through normal function calls — each engine handles its own state shape, the shared seam is just inputs and outputs. | +| **Layer 2 adapter tests** | Per-engine internal tests (net compilation / solver / transition firing for petrinet; topo sort / inner-loop state transitions / retry counter for procedural). | Optional. Defer until a debugging need surfaces. Layer 1 (contract) + Layer 3 (integration) are mandatory; Layer 2 is added if and when it pays for itself. | +| **Streaming UX formatting** | Compact per-event lines like `[slice-1 ▸ test-writer] tests-written → 3 files`. | A plain `console.log(JSON.stringify(report))` per event is sufficient. The structured rendering is a polish item, not a correctness item. | + +Rationale for deferring: each item above is "right" for the productized version and "premature" for the POC. The experiment we actually need to run is whether the Petri-net substrate earns its complexity — none of the deferred items affect that experiment's signal. Adding them now would inflate the LOC count and make the comparison muddier, not crisper. 
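The action-dispatch row in the deferrals table contrasts the POC's inline dispatch ("a record literal or switch") with the full `ActionRegistry`. A minimal sketch of both postures and the promotion path between them follows, assuming hypothetical `ActionContext` and `ActionReport` shapes; only the five action names and the `register`/`get`/`has` interface (with `get` throwing on unknown names) come from the design doc, while `MapActionRegistry`, `registryFromInline`, and `dispatchInline` are illustrative names.

```typescript
// Action names from the design doc; context/report shapes are hypothetical stand-ins.
type ActionName = 'write-tests' | 'write-code' | 'run-tests' | 'evaluate-done' | 'verify-epic';

interface ActionContext {
  sliceId: string;
  epicId: string;
}

interface ActionReport {
  action: string;
  sliceId: string;
}

type ActionHandler = (ctx: ActionContext) => Promise<ActionReport>;

// POC posture: a record literal keyed by action name. The compiler checks that
// every ActionName has exactly one handler, with no registry machinery.
const inlineHandlers: Record<ActionName, ActionHandler> = {
  'write-tests': async (ctx) => ({ action: 'write-tests', sliceId: ctx.sliceId }),
  'write-code': async (ctx) => ({ action: 'write-code', sliceId: ctx.sliceId }),
  'run-tests': async (ctx) => ({ action: 'run-tests', sliceId: ctx.sliceId }),
  'evaluate-done': async (ctx) => ({ action: 'evaluate-done', sliceId: ctx.sliceId }),
  'verify-epic': async (ctx) => ({ action: 'verify-epic', sliceId: ctx.sliceId }),
};

// An engine decides *which* name fires; the record decides *how*.
function dispatchInline(name: ActionName, ctx: ActionContext): Promise<ActionReport> {
  return inlineHandlers[name](ctx);
}

// Full-design posture: register/get/has, with names as plain strings so new
// actions (e.g. 'lint') can be added at runtime without touching the engines.
class MapActionRegistry {
  private readonly handlers = new Map<string, ActionHandler>();

  register(name: string, handler: ActionHandler): void {
    this.handlers.set(name, handler);
  }

  get(name: string): ActionHandler {
    const handler = this.handlers.get(name);
    if (!handler) {
      throw new Error(`unknown action: ${name}`); // throws on unknown names
    }
    return handler;
  }

  has(name: string): boolean {
    return this.handlers.has(name);
  }
}

// Promotion path: seed a registry from the inline record, then register extras.
function registryFromInline(handlers: Record<ActionName, ActionHandler>): MapActionRegistry {
  const registry = new MapActionRegistry();
  for (const [name, handler] of Object.entries(handlers)) {
    registry.register(name, handler);
  }
  return registry;
}
```

Under this sketch, an engine would call `dispatchInline(name, ctx)` in the POC and `registry.get(name)(ctx)` after promotion; the state machine that picks the name stays the same, which is why the doc treats the registry as safe to defer.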
+ +When the experiment concludes and the orchestrator productizes (or merges into something else), the deferrals become the natural follow-up backlog: lift inline dispatch into `ActionRegistry`, wire the codebase-mode resolver branch, add the seed step, etc. + +## 13. Two-path experiment success criteria + +Exploration first, judgement later. No fixed quantitative criteria up front. After the fixture passes on both engines, write a short comparison covering: lines of code per engine, debuggability of mid-run state, how each engine would absorb a hypothetical new action type (e.g. `lint`) or a new plan-level concept (e.g. milestones). The empirical signal from that exercise — not the architectural elegance of either engine — is what decides the next commitment. + +## Lexicon + +| Term | Definition | +|---|---| +| **plan** | YAML file describing epics + slices with definitions, dependencies, and verifications. The orchestrator's input. | +| **epic** | Organizational grouping of slices with cross-slice integration verification. | +| **slice** | The execution unit. A thin vertical increment across all relevant layers with its own definition and verifications. | +| **fixture** | Packaged test scenario for the orchestrator (plan + supporting artifacts). Used to test `cook` itself. | +| **engine** | Implementation of the `Orchestrator` interface. Two engines exist: `petrinet` and `procedural`. | +| **action** | A handler in the `ActionRegistry` (e.g. `write-tests`, `write-code`, `run-tests`, `evaluate-done`, `verify-epic`). Engines look up by name. | +| **report** | One structured event line in `reports.jsonl`. Carries the durable content; tokens carry only pointers to reports. | +| **worktree** | Isolated filesystem location where agents write during a run. Per-run; ephemeral. | +| **fixture mode** | Greenfield execution: plan at `/plan.yaml`, empty worktree. POC default. | +| **codebase mode** | Brownfield execution: plan at `/.cook/plan.yaml`, worktree seeded from ``. 
Reserved, not implemented in POC. | diff --git a/memory/PLAN.md b/memory/PLAN.md index f7a923ca..0666cb57 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -23,8 +23,9 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen ### Active -1. `agent-fixture-substrate` — branch-complete off main, reconciling — FE-705 integration substrate for JSONL agent capability CLI and LLM-as-user probes. -2. `chat-runtime-secondary-chats` — FE-716; branch `ka/fe-716-chat-runtime-unified-secondary-chats` stacked on `ln/fe-709-reconciliations` (PR #139). **V1 done** — substrate (C0–C9) + unified shell (C11–C16) ship together on the same branch; verify green at 108 test files / 1273 tests; PR submits once #139 merges or per Lu's signal. C7 (agent-run inline) remains deferred until a producer exists. +1. `orchestrator-poc` — FE-730 dual-engine execution POC (`brunch cook`). Two engines behind shared `Orchestrator` seam, ActionRegistry, reports.jsonl communication, worktree isolation. 15-step build sequence targeting June 11. +2. `agent-fixture-substrate` — branch-complete off main, reconciling — FE-705 integration substrate for JSONL agent capability CLI and LLM-as-user probes. +3. `chat-runtime-secondary-chats` — FE-716; V1 done — PR #141 merged to main. ### Next @@ -58,6 +59,19 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen ## Frontier Definitions +### orchestrator-poc + +- **Name:** Orchestrator POC — dual-engine execution with contract tests +- **Linear:** FE-730 +- **Kind:** structural / experiment +- **Status:** in-progress +- **Objective:** Two interchangeable execution engines (`proc` and `petri`) behind a shared `Orchestrator` seam, driven test-first with fake agents. Takes a plan YAML (epics → slices), dispatches actions inline (registry deferred), runs tests deterministically, writes structured events to `reports.jsonl`. 
+- **Why now / unlocks:** Validates whether the Petri-net substrate earns its complexity vs a procedural baseline. Produces a working CLI built end-to-end from a plan as a demoable artifact. +- **Acceptance:** (1) `brunch cook --engine=proc` completes Fixture #1 end-to-end. (2) Same with `--engine=petri`. (3) `reports.jsonl` human-readable. (4) Both engines pass same contract suite. (5) Worktree isolation holds. (6) Mid-run halt produces coherent `OrchestratorResult`. +- **Verification:** Contract tests (fake agents, both engines identical), adapter tests (per-engine internals, optional in POC), integration fixture run (real pi-agent on Fixture #1). +- **Traceability:** Requirements 46–50; D155-K–D159-K; I121-K–I123-K. +- **Design docs:** `docs/design/orchestrator.md`; umbrella H-6476. + ### continuous-workspace - **Name:** Continuous workspace / phase-addressable interview surface (Conversational Workspace Runtime — Track 1) diff --git a/memory/SPEC.md b/memory/SPEC.md index 1dbfc23b..325a50ad 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -96,6 +96,14 @@ Brunch operates inside a **workspace**: the cwd-backed software context whose lo 26. The homepage surfaces workspace (CWD) binding so the user understands listed specifications and the new-spec affordance are scoped to the current project directory. 33. Graph view is a first-class alternative to chat view, accessed as a peer route, and projects the intent graph as a navigable workspace with visible relationship topology and graph-launched refinement. The first ship is a structured-list layout; a spatial canvas follows as a layout switch inside graph mode. +#### Orchestrator (`cook`) + +46. `brunch cook ` takes a plan YAML (epics → slices) and executes it end-to-end by dispatching agents through a name-keyed `ActionRegistry`. +47. Two engines (`proc` and `petri`) implement the same `Orchestrator` interface and must pass the same contract test suite. +48. 
`reports.jsonl` is the communication medium: tokens carry only pointers, all event content lives in the append-only log. +49. Each run gets worktree isolation at `/.cook/runs//worktree/`; source repo stays untouched. +50. Dual-mode CLI resolver: `/plan.yaml` = fixture (greenfield), `/.cook/plan.yaml` = codebase (brownfield, reserved). + #### Provider / agent substrate 40. Prompt and context engineering are first-class server subsystems: prompts and reusable policy doctrines live as inspectable markdown assets, while typed context-pack builders derive scenario-specific intent-graph renderings. @@ -191,6 +199,14 @@ Brunch operates inside a **workspace**: the cwd-backed software context whose lo 153. **Conversational Workspace Runtime supersedes independent side-chat persistence without adding schema-level threads now** — continuous workspace is the host; side, reconciliation, qa, and strategy work should converge into inline secondary chats over the existing chat/turn substrate, while `changeset` / `change` remains the semantic mutation spine. A future `thread` table is deferred until chat/turn proves insufficient. 154. **Chat context is transcript-first with turn-level snapshots and chat-level handles** — a chat primarily uses its own transcript as prompt context. Additional graph/workspace context enters through explicit context snapshot artifacts stored on `turn` rows, so replay shows what the assistant saw at snapshot time. Active chat handles reference mentioned or anchored intent item ids and record the last snapshotted item version/fingerprint; they trigger fresh snapshots only when the referenced subject advances, including changes made by other chats. Do not introduce a persisted context-spec table by default; derive snapshots from intent graph truth via reusable context builders for item, neighborhood, economic whole-graph, and eventually changeset-historical neighborhoods. 
The V1 anchor/handle implementation is transcript-projected: the chat's pin is the projection seed, mention parsing and explicit add/remove emit `anchor_op` events on turns, and the bundle projector returns the current anchor set. Anchors are not chat-row state. +#### Orchestrator (`cook`) + +155. **Dual-engine experiment behind shared `Orchestrator` seam** — `proc` (procedural state machine) and `petri` (Petri-net interpreter) implement the same interface; the experiment validates whether the Petri-net substrate earns its complexity. Depends on: Requirements 46, 47. +156. **`reports.jsonl` is the communication medium, not just audit log** — tokens carry only `{ reportId, sliceId, epicId }` pointers; transitions communicate by appending/reading lines. The net stays narrow because the log is rich. POC: petri engine enforces token-pointer discipline internally; proc engine is free to pass data through normal function calls — the shared seam is inputs and outputs. Depends on: Requirement 48. +157. **Action dispatch is name-keyed and extensible** — engines orchestrate which action fires when; handlers own how. POC uses inline dispatch per engine; promote to a real `ActionRegistry` when a 3rd action type lands. Depends on: Requirement 46. +158. **Plan model is two-level (epics → slices), no milestones in POC** — schema is provisional pending canonical brunch plan emission. Forward-compatible for intent/design/oracle pointers. +159. **Worktree isolation per run** — agents write freely inside `/.cook/runs//worktree/`; source repo untouched. Depends on: Requirement 49. + #### Provider, prompt/context, and agent substrate 130. **First-run setup becomes a product surface, not README-only configuration** — dashboard/provider setup replaces project `.env` docs as the only user-facing path. 
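Decision 156 above states the token-pointer discipline: tokens carry only `{ reportId, sliceId, epicId }`, and transitions communicate by appending and reading lines in `reports.jsonl`. The sketch below illustrates that rule with an in-memory log standing in for the file. The `Report` fields follow the design doc's line schema; `InMemoryLog`, `writeTests`, `writeCode`, the counter-based ids (the design calls for a fresh UID), and the payload contents are all hypothetical.

```typescript
// Token shape from the design: pointers only, never event content.
interface Token {
  readonly reportId: string;
  readonly sliceId: string;
  readonly epicId: string;
}

// Fields follow the design doc's line schema; payloads here are illustrative.
interface Report {
  id: string;
  ts: string;
  epicId: string;
  sliceId: string;
  actor: 'test-writer' | 'code-writer' | 'test-runner' | 'evaluator' | 'orchestrator';
  event: 'tests-written' | 'code-written' | 'tests-run' | 'eval-done' | 'epic-verified' | 'halt';
  payload: Record<string, unknown>;
}

// Hypothetical in-memory stand-in for the append-only log. A file-backed sink
// would append JSON.stringify(report) + '\n' to reports.jsonl instead.
class InMemoryLog {
  private readonly lines: Report[] = [];
  private seq = 0;

  append(entry: Omit<Report, 'id' | 'ts'>): Report {
    const report: Report = { ...entry, id: `rpt_${++this.seq}`, ts: new Date().toISOString() };
    this.lines.push(report);
    return report;
  }

  get(id: string): Report {
    const found = this.lines.find((r) => r.id === id);
    if (!found) {
      throw new Error(`no report with id ${id}`);
    }
    return found;
  }

  toJsonl(): string {
    return this.lines.map((r) => JSON.stringify(r)).join('\n');
  }
}

// write-tests: writes its content to the log and emits only a pointer token.
function writeTests(log: InMemoryLog, sliceId: string, epicId: string): Token {
  const report = log.append({
    epicId,
    sliceId,
    actor: 'test-writer',
    event: 'tests-written',
    payload: { files: ['tests/version.test.ts'] },
  });
  return { reportId: report.id, sliceId, epicId };
}

// write-code: needs the test files, so it dereferences the incoming token
// against the log instead of receiving the data through the net.
function writeCode(log: InMemoryLog, token: Token): Token {
  const prior = log.get(token.reportId);
  const testFiles = prior.payload.files as string[];
  const report = log.append({
    epicId: token.epicId,
    sliceId: token.sliceId,
    actor: 'code-writer',
    event: 'code-written',
    payload: { targets: testFiles, basedOn: prior.id },
  });
  return { reportId: report.id, sliceId: token.sliceId, epicId: token.epicId };
}
```

Because every handoff goes through `log.get(token.reportId)`, the log ends up holding both the run's communication and its audit trail, which is the property the decision relies on for future resumability.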
@@ -237,6 +253,9 @@ Each invariant is a formalization candidate: the property is stated in human lan | I118 | Reconciliation/direct-edit cascade never infers affected endpoints from raw edge direction alone; it consults relation policy source-change / target-change behavior. | planned: relation-policy/edit-impact/reconciliation tests | A93; D137, D150 | | I119 | Scenario-option candidate bundles can become canonical only by accepting a coherent bundle changeset; accepted-with-issues candidates also create durable follow-on review/process debt. | planned: scenario-runner, turn-artifacts, changeset tests | A90, A91; D151, D152 | | I120 | Secondary chats remain conversational process containers, not workflow or semantic truth: inline rendering, collapse/reload state, turn-level context snapshot replay, and item-version-gated stale-handle refresh may organize discussion, but accepted mutations still flow through Brunch-owned handlers and changesets. | planned: chat-runtime, context-provision, changeset/app tests | Requirement 45; A94, A95; D143, D149, D153, D154 | +| I121-K | Both orchestrator engines (`proc` and `petri`) pass the same contract test suite with identical observable behavior. | contract tests with fake agents/runner | Requirements 46, 47; D155-K | +| I122-K | Orchestrator event content lives in `reports.jsonl`; petri engine tokens carry only `{ reportId, sliceId, epicId }` pointers. Proc engine may pass data through normal function calls — the shared seam is inputs and outputs. | contract tests | Requirement 48; D156-K | +| I123-K | Worktree isolation holds — source repo outside `/.cook/runs//worktree/` is never mutated by an orchestrator run. | integration tests | Requirement 49; D159-K | ## Future Direction Register @@ -341,6 +360,16 @@ Detailed card styling, typography tokens, and legacy layout minutiae are impleme | **greenfield / brownfield** | Grounding strategies for new concepts vs existing-codebase work. 
|
 | **end-to-end build / incremental feature** | Delivery postures for whole-system shaping vs bounded changes. |
 | **output view** | Terminal route available when phases are closed; not a workflow phase. |
+| **orchestrator** | CLI execution engine (`brunch cook`) that takes a plan YAML and drives it to completion via agent dispatch and deterministic verification. |
+| **engine** | Implementation of the `Orchestrator` interface. Two exist: `proc` (procedural state machine) and `petri` (Petri-net interpreter). |
+| **epic** | Organizational grouping of slices with cross-slice integration verification in a plan. |
+| **plan (orchestrator)** | YAML file describing epics + slices with definitions, dependencies, and verifications. The orchestrator's input. |
+| **action (orchestrator)** | A handler in the `ActionRegistry` (e.g. `write-tests`, `write-code`, `run-tests`). Engines look up by name. |
+| **report** | One structured event line in `reports.jsonl`. Carries durable content; tokens carry only pointers. |
+| **worktree (orchestrator)** | Isolated filesystem location where agents write during a run. Per-run; ephemeral. |
+| **fixture (orchestrator)** | Packaged test scenario for the orchestrator (plan + supporting artifacts). Used to test `cook` itself. |
+| **fixture mode** | Greenfield execution: plan at `/plan.yaml`, empty worktree. POC default. |
+| **codebase mode** | Brownfield execution: plan at `/.cook/plan.yaml`, worktree seeded from ``. Designed but not implemented in POC. |

 ## Verification Design
From 83e5c29f3b84314d817bd04531cc2e7f0b6f57d8 Mon Sep 17 00:00:00 2001
From: Kostandin Angjellari
Date: Wed, 20 May 2026 15:44:16 +0200
Subject: [PATCH 02/22] =?UTF-8?q?FE-730:=20card=201=20=E2=80=94=20foundati?=
 =?UTF-8?q?onal=20types=20+=20procedural=20engine=20+=20contract=20test=20?=
 =?UTF-8?q?#1?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- types.ts: Plan, Epic, Slice, Orchestrator seam, ReportSink, ActionHandlers
- report-sink.ts: InMemoryReportSink (append + query by id)
- engine-proc.ts: ProceduralOrchestrator with TDD inner loop, topo-sort, epic-level verification, retry loop
- engine-contract.test.ts: 4 tests — status completed, correct outcomes, TDD cycle call order, report sink contents
- Code lives under src/orchestrator/ (cook is CLI subcommand name only)
- All 4 contract tests pass; npm run verify clean

Co-authored-by: Amp
---
 docs/design/orchestrator.md                  |   2 +-
 src/orchestrator/src/engine-contract.test.ts | 692 +++++++++++++++++++
 src/orchestrator/src/engine-petri.ts         | 412 +++++++++++
 src/orchestrator/src/engine-proc.ts          | 225 ++++++
 src/orchestrator/src/report-sink.ts          |  17 +
 src/orchestrator/src/types.ts                | 117 ++++
 6 files changed, 1464 insertions(+), 1 deletion(-)
 create mode 100644 src/orchestrator/src/engine-contract.test.ts
 create mode 100644 src/orchestrator/src/engine-petri.ts
 create mode 100644 src/orchestrator/src/engine-proc.ts
 create mode 100644 src/orchestrator/src/report-sink.ts
 create mode 100644 src/orchestrator/src/types.ts

diff --git a/docs/design/orchestrator.md b/docs/design/orchestrator.md
index 41003846..7377493b 100644
--- a/docs/design/orchestrator.md
+++ b/docs/design/orchestrator.md
@@ -2,7 +2,7 @@

 > Status: **working design proposal** — exploratory design for a CLI orchestrator that consumes a brunch-shaped execution plan (epics → slices) and dispatches agents and deterministic checks to drive the plan to completion. Not yet promoted to `memory/SPEC.md`; decisions land there through `ln-spec`. Tracked as FE-730; umbrella H-6476.
 >
-> Scope is intentionally narrow: two interchangeable execution engines behind a shared seam, plan-as-YAML, an append-only event log as the communication medium, and an isolated worktree per run. The 15-step build sequence, fixture definitions, and pi-agent invocation details are operational scaffolding kept separate from this doc.
+> Scope is intentionally narrow: two interchangeable execution engines behind a shared seam, plan-as-YAML, an append-only event log as the communication medium, and an isolated worktree per run. The 15-step build sequence, fixture definitions, and pi-agent invocation details are operational scaffolding kept separate from this doc. Code lives under `src/orchestrator/` in the brunch repo; `cook` is only the CLI subcommand name.
 >
 > **Full design vs POC implementation:** this doc describes the design as it should land if/when the orchestrator productizes. The POC implements a deliberate subset to avoid premature abstraction — see [§POC scope and deferrals](#12-poc-scope-and-deferrals) for the explicit map of designed-but-deferred items.

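Below is a minimal sketch of the append-only report log and pointer-only tokens described in the scope note and the lexicon. It assumes simplified `ReportLine` and `ReportSink` shapes, inferred from how the contract tests call `append`, `getById`, and `getAll`. The real definitions live in `src/orchestrator/src/types.ts`, which this excerpt does not show. `JsonlReportSink`, `toJsonl`, and `readDone` are hypothetical names chosen for illustration; they are not part of the POC's API.

```typescript
// Simplified shapes, assumed from how the contract tests use them;
// the POC's real definitions live in src/orchestrator/src/types.ts.
type ReportLine = {
  id: string;
  ts: string;
  epicId: string;
  sliceId: string;
  actor: string;
  event: string;
  payload: unknown;
};

interface ReportSink {
  append(line: ReportLine): void;
  getById(id: string): ReportLine | undefined;
  getAll(): readonly ReportLine[];
}

// Hypothetical sink: lines are only ever appended, never edited or removed.
class JsonlReportSink implements ReportSink {
  private readonly lines: ReportLine[] = [];
  private readonly byId = new Map<string, ReportLine>();

  append(line: ReportLine): void {
    // Ids act as pointers, so a duplicate id would make a pointer ambiguous.
    if (this.byId.has(line.id)) throw new Error(`Duplicate report id: ${line.id}`);
    const stored = Object.freeze({ ...line }); // shallow freeze: payload itself stays mutable
    this.lines.push(stored);
    this.byId.set(stored.id, stored);
  }

  getById(id: string): ReportLine | undefined {
    return this.byId.get(id);
  }

  getAll(): readonly ReportLine[] {
    return this.lines;
  }

  // One JSON object per newline-terminated line: the shape a reports.jsonl file would hold.
  toJsonl(): string {
    return this.lines.map((line) => `${JSON.stringify(line)}\n`).join('');
  }
}

// Tokens carry only a pointer (reportId) into the log, never the report content.
type Token = { sliceId: string; epicId: string; reportId?: string };

// Resolve a token's pointer and read the evaluator's verdict from the log.
function readDone(sink: ReportSink, token: Token): boolean {
  if (token.reportId === undefined) return false;
  const report = sink.getById(token.reportId);
  return Boolean((report?.payload as { done?: boolean } | undefined)?.done);
}

// Usage: an evaluator appends a report; the token only remembers its id.
const sink = new JsonlReportSink();
sink.append({
  id: 'rpt-eval-slice-1-1',
  ts: new Date().toISOString(),
  epicId: 'epic-1',
  sliceId: 'slice-1',
  actor: 'evaluator',
  event: 'eval-done',
  payload: { done: true },
});
const token: Token = { sliceId: 'slice-1', epicId: 'epic-1', reportId: 'rpt-eval-slice-1-1' };
console.log(readDone(sink, token)); // true
```

Tokens hold only ids, so the same sink contract works for either engine, and a reader can reconstruct verdicts from the log alone. The sketch keeps everything in memory. How the POC actually persists lines to `reports.jsonl` is not shown in this excerpt.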
diff --git a/src/orchestrator/src/engine-contract.test.ts b/src/orchestrator/src/engine-contract.test.ts new file mode 100644 index 00000000..fa671ce8 --- /dev/null +++ b/src/orchestrator/src/engine-contract.test.ts @@ -0,0 +1,692 @@ +import { describe, expect, it } from 'vitest'; + +import { PetriOrchestrator } from './engine-petri.js'; +import { ProceduralOrchestrator } from './engine-proc.js'; +import { InMemoryReportSink } from './report-sink.js'; +import type { ActionContext, ActionHandlers, OrchestratorInput, Plan, TestRunner } from './types.js'; + +// --------------------------------------------------------------------------- +// Shared engine list for parameterized tests +// --------------------------------------------------------------------------- + +const engines = [ + { name: 'procedural', create: () => new ProceduralOrchestrator() }, + { name: 'petri', create: () => new PetriOrchestrator() }, +] as const; + +// --------------------------------------------------------------------------- +// Reusable fake factory — per-test closures instead of module-level state +// --------------------------------------------------------------------------- + +function createFakes(opts?: { + evalSequence?: boolean[]; // sequence of done values for evaluate-done + testRunResults?: boolean[]; // sequence of passed values for test runner + verifyEpicResult?: boolean; // result of verify-epic + throwOnAction?: string; // action name that throws +}) { + const callOrder: string[] = []; + const reports = new InMemoryReportSink(); + let evalIdx = 0; + let testRunIdx = 0; + const evalSeq = opts?.evalSequence ?? [false, true]; // default: NO then YES + const testSeq = opts?.testRunResults ?? 
[true]; // default: pass + + const actions: ActionHandlers = { + 'evaluate-done': async (ctx: ActionContext) => { + if (opts?.throwOnAction === 'evaluate-done') throw new Error('evaluate-done failed'); + const done = evalSeq[evalIdx % evalSeq.length]!; + evalIdx++; + const id = `rpt-eval-${ctx.slice.id}-${evalIdx}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'evaluator', + event: 'eval-done', + payload: { done }, + }); + callOrder.push(`${ctx.slice.id}:evaluate-done:${done ? 'YES' : 'NO'}`); + return id; + }, + 'write-tests': async (ctx: ActionContext) => { + if (opts?.throwOnAction === 'write-tests') throw new Error('write-tests failed'); + const id = `rpt-wt-${ctx.slice.id}-${callOrder.length}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'test-writer', + event: 'tests-written', + payload: { files: [`tests/${ctx.slice.id}.test.ts`] }, + }); + callOrder.push(`${ctx.slice.id}:write-tests`); + return id; + }, + 'write-code': async (ctx: ActionContext) => { + if (opts?.throwOnAction === 'write-code') throw new Error('write-code failed'); + const id = `rpt-wc-${ctx.slice.id}-${callOrder.length}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'code-writer', + event: 'code-written', + payload: { files: [`src/${ctx.slice.id}.ts`] }, + }); + callOrder.push(`${ctx.slice.id}:write-code`); + return id; + }, + 'verify-epic': async (ctx: ActionContext) => { + const passed = opts?.verifyEpicResult ?? true; + const id = `rpt-ve-${ctx.epic.id}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: '', + actor: 'orchestrator', + event: 'epic-verified', + payload: { passed }, + }); + callOrder.push(`${ctx.epic.id}:verify-epic:${passed ? 
'PASS' : 'FAIL'}`); + return id; + }, + }; + + const testRunner: TestRunner = { + async run() { + const passed = testSeq[testRunIdx % testSeq.length]!; + testRunIdx++; + callOrder.push(`run-tests:${passed ? 'pass' : 'fail'}`); + return { passed, output: passed ? 'ok' : 'FAIL' }; + }, + }; + + return { callOrder, reports, actions, testRunner }; +} + +// --------------------------------------------------------------------------- +// Helpers — fake action handlers +// --------------------------------------------------------------------------- + +let callOrder: string[] = []; +let evalCallCount = 0; + +function resetFakes() { + callOrder = []; + evalCallCount = 0; +} + +function fakeActions(reports: InMemoryReportSink): ActionHandlers { + return { + 'evaluate-done': async (ctx: ActionContext) => { + evalCallCount++; + const done = evalCallCount >= 2; // first call: NO, second: YES + const id = `rpt-eval-${evalCallCount}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'evaluator', + event: 'eval-done', + payload: { done }, + }); + callOrder.push(`evaluate-done:${done ? 
'YES' : 'NO'}`); + return id; + }, + 'write-tests': async (ctx: ActionContext) => { + const id = 'rpt-write-tests-1'; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'test-writer', + event: 'tests-written', + payload: { files: ['tests/hello.test.ts'] }, + }); + callOrder.push('write-tests'); + return id; + }, + 'write-code': async (ctx: ActionContext) => { + const id = 'rpt-write-code-1'; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'code-writer', + event: 'code-written', + payload: { files: ['src/hello.ts'] }, + }); + callOrder.push('write-code'); + return id; + }, + 'verify-epic': async (ctx: ActionContext) => { + const id = 'rpt-verify-epic-1'; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: '', + actor: 'orchestrator', + event: 'epic-verified', + payload: { passed: true }, + }); + callOrder.push('verify-epic'); + return id; + }, + }; +} + +const fakeTestRunner: TestRunner = { + async run(_target: string, _worktreeDir: string) { + callOrder.push('run-tests'); + return { passed: true, output: '1 test passed' }; + }, +}; + +// --------------------------------------------------------------------------- +// Contract test #1 — single epic, single slice, happy path +// --------------------------------------------------------------------------- + +const simplePlan: Plan = { + epics: [ + { + id: 'epic-1', + summary: 'Hello world', + depends_on: [], + verification: [], + }, + ], + slices: [ + { + id: 'slice-1', + epic_id: 'epic-1', + definition: 'Print hello world', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/hello.test.ts' }], + }, + ], +}; + +describe('Engine contract test #1 — single epic, single slice, happy path', () => { + const engines = [ + { name: 'procedural', create: () => new ProceduralOrchestrator() }, + { name: 'petri', create: () => new 
PetriOrchestrator() }, + ] as const; + + for (const { name, create } of engines) { + describe(name, () => { + it("completes with status 'completed'", async () => { + resetFakes(); + const reports = new InMemoryReportSink(); + const engine = create(); + + const input: OrchestratorInput = { + plan: simplePlan, + fixtureDir: '/tmp/fake-fixture', + actions: fakeActions(reports), + reports, + testRunner: fakeTestRunner, + policy: { maxRetries: 3 }, + }; + + const result = await engine.run(input); + + expect(result.status).toBe('completed'); + }); + + it('produces correct epic and slice outcomes', async () => { + resetFakes(); + const reports = new InMemoryReportSink(); + const engine = create(); + + const input: OrchestratorInput = { + plan: simplePlan, + fixtureDir: '/tmp/fake-fixture', + actions: fakeActions(reports), + reports, + testRunner: fakeTestRunner, + policy: { maxRetries: 3 }, + }; + + const result = await engine.run(input); + + expect(result.epics).toEqual([{ epicId: 'epic-1', status: 'completed' }]); + expect(result.slices).toEqual([{ sliceId: 'slice-1', status: 'completed' }]); + }); + + it('calls actions in correct TDD cycle order', async () => { + resetFakes(); + const reports = new InMemoryReportSink(); + const engine = create(); + + const input: OrchestratorInput = { + plan: simplePlan, + fixtureDir: '/tmp/fake-fixture', + actions: fakeActions(reports), + reports, + testRunner: fakeTestRunner, + policy: { maxRetries: 3 }, + }; + + await engine.run(input); + + // Inner loop: evaluate(NO) → write-tests → write-code → run-tests → evaluate(YES) + expect(callOrder).toEqual([ + 'evaluate-done:NO', + 'write-tests', + 'write-code', + 'run-tests', + 'evaluate-done:YES', + ]); + }); + + it('report sink contains expected lines', async () => { + resetFakes(); + const reports = new InMemoryReportSink(); + const engine = create(); + + const input: OrchestratorInput = { + plan: simplePlan, + fixtureDir: '/tmp/fake-fixture', + actions: fakeActions(reports), + 
reports, + testRunner: fakeTestRunner, + policy: { maxRetries: 3 }, + }; + + await engine.run(input); + + const all = reports.getAll(); + const events = all.map((r) => r.event); + expect(events).toContain('eval-done'); + expect(events).toContain('tests-written'); + expect(events).toContain('code-written'); + expect(all.length).toBeGreaterThanOrEqual(3); + }); + }); + } +}); + +// --------------------------------------------------------------------------- +// Contract test #2 — intra-epic slice dependencies +// --------------------------------------------------------------------------- + +const depPlan: Plan = { + epics: [ + { + id: 'epic-1', + summary: 'Two dependent slices', + depends_on: [], + verification: [], + }, + ], + slices: [ + { + id: 'slice-a', + epic_id: 'epic-1', + definition: 'First slice', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/a.test.ts' }], + }, + { + id: 'slice-b', + epic_id: 'epic-1', + definition: 'Second slice — depends on first', + depends_on: ['slice-a'], + verification: [{ kind: 'unit-test', target: 'tests/b.test.ts' }], + }, + ], +}; + +describe('Engine contract test #2 — intra-epic slice dependencies', () => { + const engines = [ + { name: 'procedural', create: () => new ProceduralOrchestrator() }, + { name: 'petri', create: () => new PetriOrchestrator() }, + ] as const; + + for (const { name, create } of engines) { + describe(name, () => { + it('completes both slices in dependency order', async () => { + // Track which slice each action call belongs to + const sliceCallOrder: string[] = []; + let perSliceEvalCount = new Map(); + + const reports = new InMemoryReportSink(); + + const depActions: ActionHandlers = { + 'evaluate-done': async (ctx: ActionContext) => { + const count = (perSliceEvalCount.get(ctx.slice.id) ?? 
0) + 1; + perSliceEvalCount.set(ctx.slice.id, count); + const done = count >= 2; + const id = `rpt-eval-${ctx.slice.id}-${count}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'evaluator', + event: 'eval-done', + payload: { done }, + }); + sliceCallOrder.push(`${ctx.slice.id}:evaluate-done:${done ? 'YES' : 'NO'}`); + return id; + }, + 'write-tests': async (ctx: ActionContext) => { + const id = `rpt-write-tests-${ctx.slice.id}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'test-writer', + event: 'tests-written', + payload: { files: [`tests/${ctx.slice.id}.test.ts`] }, + }); + sliceCallOrder.push(`${ctx.slice.id}:write-tests`); + return id; + }, + 'write-code': async (ctx: ActionContext) => { + const id = `rpt-write-code-${ctx.slice.id}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor: 'code-writer', + event: 'code-written', + payload: { files: [`src/${ctx.slice.id}.ts`] }, + }); + sliceCallOrder.push(`${ctx.slice.id}:write-code`); + return id; + }, + 'verify-epic': async (ctx: ActionContext) => { + const id = `rpt-verify-${ctx.epic.id}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: '', + actor: 'orchestrator', + event: 'epic-verified', + payload: { passed: true }, + }); + return id; + }, + }; + + const depTestRunner: TestRunner = { + async run() { + return { passed: true, output: 'ok' }; + }, + }; + + const engine = create(); + const result = await engine.run({ + plan: depPlan, + fixtureDir: '/tmp/fake', + actions: depActions, + reports, + testRunner: depTestRunner, + policy: { maxRetries: 3 }, + }); + + expect(result.status).toBe('completed'); + expect(result.slices).toEqual([ + { sliceId: 'slice-a', status: 'completed' }, + { sliceId: 'slice-b', status: 'completed' }, + ]); + + // Slice-a actions must all come before 
slice-b actions + const aLast = Math.max(...sliceCallOrder.map((s, i) => (s.startsWith('slice-a:') ? i : -1))); + const bFirst = Math.min(...sliceCallOrder.map((s, i) => (s.startsWith('slice-b:') ? i : Infinity))); + expect(aLast).toBeLessThan(bFirst); + }); + }); + } +}); + +// --------------------------------------------------------------------------- +// Contract test #3 — epic dependencies +// --------------------------------------------------------------------------- + +describe('Engine contract test #3 — epic dependencies', () => { + const epicDepPlan: Plan = { + epics: [ + { id: 'epic-1', summary: 'First', depends_on: [], verification: [] }, + { id: 'epic-2', summary: 'Second — depends on first', depends_on: ['epic-1'], verification: [] }, + ], + slices: [ + { + id: 's1', + epic_id: 'epic-1', + definition: 'Slice in epic 1', + depends_on: [], + verification: [{ kind: 'unit-test', target: 't1' }], + }, + { + id: 's2', + epic_id: 'epic-2', + definition: 'Slice in epic 2', + depends_on: [], + verification: [{ kind: 'unit-test', target: 't2' }], + }, + ], + }; + + for (const { name, create } of engines) { + it(`${name}: epic-2 slices run after epic-1 completes`, async () => { + const fakes = createFakes(); + const result = await create().run({ + plan: epicDepPlan, + fixtureDir: '/tmp/f', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 3 }, + }); + + expect(result.status).toBe('completed'); + expect(result.epics).toEqual([ + { epicId: 'epic-1', status: 'completed' }, + { epicId: 'epic-2', status: 'completed' }, + ]); + + const s1Last = Math.max(...fakes.callOrder.map((s, i) => (s.startsWith('s1:') ? i : -1))); + const s2First = Math.min(...fakes.callOrder.map((s, i) => (s.startsWith('s2:') ? 
i : Infinity))); + expect(s1Last).toBeLessThan(s2First); + }); + } +}); + +// --------------------------------------------------------------------------- +// Contract tests #4-5 — epic-level verification (pass + fail) +// --------------------------------------------------------------------------- + +describe('Engine contract test #4 — epic verification passes', () => { + const verifyPlan: Plan = { + epics: [ + { + id: 'epic-v', + summary: 'Verified epic', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'integration.test.ts' }], + }, + ], + slices: [ + { + id: 'sv', + epic_id: 'epic-v', + definition: 'Slice', + depends_on: [], + verification: [{ kind: 'unit-test', target: 't' }], + }, + ], + }; + + for (const { name, create } of engines) { + it(`${name}: epic with passing verification → completed`, async () => { + const fakes = createFakes({ verifyEpicResult: true }); + const result = await create().run({ + plan: verifyPlan, + fixtureDir: '/tmp/f', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 3 }, + }); + + expect(result.status).toBe('completed'); + expect(result.epics).toEqual([{ epicId: 'epic-v', status: 'completed' }]); + expect(fakes.callOrder).toContain('epic-v:verify-epic:PASS'); + }); + } +}); + +describe('Engine contract test #5 — epic verification fails', () => { + const verifyFailPlan: Plan = { + epics: [ + { + id: 'epic-f', + summary: 'Failing epic', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'integration.test.ts' }], + }, + ], + slices: [ + { + id: 'sf', + epic_id: 'epic-f', + definition: 'Slice', + depends_on: [], + verification: [{ kind: 'unit-test', target: 't' }], + }, + ], + }; + + for (const { name, create } of engines) { + it(`${name}: epic with failing verification → halted`, async () => { + const fakes = createFakes({ verifyEpicResult: false }); + const result = await create().run({ + plan: verifyFailPlan, + fixtureDir: '/tmp/f', + 
actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 3 }, + }); + + expect(result.status).toBe('halted'); + expect(result.epics).toEqual([{ epicId: 'epic-f', status: 'halted' }]); + expect(fakes.callOrder).toContain('epic-f:verify-epic:FAIL'); + }); + } +}); + +// --------------------------------------------------------------------------- +// Contract test #6 — retry loop (fail then pass) +// --------------------------------------------------------------------------- + +describe('Engine contract test #6 — retry loop', () => { + for (const { name, create } of engines) { + it(`${name}: test fails once then passes → slice completed`, async () => { + const fakes = createFakes({ testRunResults: [false, true] }); + const result = await create().run({ + plan: simplePlan, + fixtureDir: '/tmp/f', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 3 }, + }); + + expect(result.status).toBe('completed'); + expect(result.slices).toEqual([{ sliceId: 'slice-1', status: 'completed' }]); + // Should have: write-code (first), run-tests fail, write-code (retry), run-tests pass + const writeCodes = fakes.callOrder.filter((c) => c.includes('write-code')); + expect(writeCodes.length).toBe(2); + }); + } +}); + +// --------------------------------------------------------------------------- +// Contract test #7 — retry exhaustion +// --------------------------------------------------------------------------- + +describe('Engine contract test #7 — retry exhaustion', () => { + for (const { name, create } of engines) { + it(`${name}: tests always fail → halted after maxRetries`, async () => { + const fakes = createFakes({ testRunResults: [false] }); + const result = await create().run({ + plan: simplePlan, + fixtureDir: '/tmp/f', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 2 }, + }); + + 
expect(result.status).toBe('halted'); + expect(result.slices).toEqual([{ sliceId: 'slice-1', status: 'halted' }]); + }); + } +}); + +// --------------------------------------------------------------------------- +// Contract test #8 — multi-cycle "needs more" +// --------------------------------------------------------------------------- + +describe('Engine contract test #8 — multi-cycle needs more', () => { + for (const { name, create } of engines) { + it(`${name}: evaluator says NO twice then YES → 2 TDD cycles`, async () => { + const fakes = createFakes({ evalSequence: [false, false, true] }); + const result = await create().run({ + plan: simplePlan, + fixtureDir: '/tmp/f', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 3 }, + }); + + expect(result.status).toBe('completed'); + const evals = fakes.callOrder.filter((c) => c.includes('evaluate-done')); + expect(evals).toEqual([ + 'slice-1:evaluate-done:NO', + 'slice-1:evaluate-done:NO', + 'slice-1:evaluate-done:YES', + ]); + const writeTests = fakes.callOrder.filter((c) => c.includes('write-tests')); + expect(writeTests.length).toBe(2); + }); + } +}); + +// --------------------------------------------------------------------------- +// Contract test #9 — action handler throws +// --------------------------------------------------------------------------- + +describe('Engine contract test #9 — action handler throws', () => { + for (const { name, create } of engines) { + it(`${name}: write-tests throws → halted with reason`, async () => { + const fakes = createFakes({ throwOnAction: 'write-tests' }); + const result = await create().run({ + plan: simplePlan, + fixtureDir: '/tmp/f', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 3 }, + }); + + expect(result.status).toBe('halted'); + expect(result.reason).toContain('write-tests failed'); + }); + } +}); diff --git 
a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts
new file mode 100644
index 00000000..c85c385e
--- /dev/null
+++ b/src/orchestrator/src/engine-petri.ts
@@ -0,0 +1,412 @@
+import type {
+  ActionContext,
+  EpicOutcome,
+  Orchestrator,
+  OrchestratorInput,
+  OrchestratorResult,
+  ReportSink,
+  SliceOutcome,
+} from './types.js';
+
+// ---------------------------------------------------------------------------
+// Petri net primitives
+// ---------------------------------------------------------------------------
+
+type Token = {
+  reportId?: string;
+  sliceId: string;
+  epicId: string;
+};
+
+type TransitionDef = {
+  id: string;
+  inputs: string[];
+  fire: (consumed: Token[]) => Promise<{ place: string; token: Token }[]>;
+};
+
+class PetriNet {
+  private places = new Map<string, Token[]>();
+  private transitions: TransitionDef[] = [];
+
+  addPlace(id: string): void {
+    this.places.set(id, []);
+  }
+
+  addToken(placeId: string, token: Token): void {
+    const tokens = this.places.get(placeId);
+    if (!tokens) throw new Error(`Unknown place: ${placeId}`);
+    tokens.push(token);
+  }
+
+  addTransition(def: TransitionDef): void {
+    this.transitions.push(def);
+  }
+
+  hasTokens(placeId: string): boolean {
+    const tokens = this.places.get(placeId);
+    return !!tokens && tokens.length > 0;
+  }
+
+  async run(): Promise<void> {
+    while (true) {
+      const enabled = this.transitions.find((t) =>
+        t.inputs.every((p) => {
+          const tokens = this.places.get(p);
+          return tokens && tokens.length > 0;
+        }),
+      );
+      if (!enabled) break;
+
+      // Consume one token per input place
+      const consumed: Token[] = [];
+      for (const p of enabled.inputs) {
+        consumed.push(this.places.get(p)!.shift()!);
+      }
+
+      // Fire — handler decides outputs
+      const outputs = await enabled.fire(consumed);
+      for (const { place, token } of outputs) {
+        this.addToken(place, token);
+      }
+    }
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Net compiler — plan → petri net
+//
--------------------------------------------------------------------------- + +function p(sliceId: string, place: string): string { + return `slice:${sliceId}:${place}`; +} + +function ep(epicId: string, place: string): string { + return `epic:${epicId}:${place}`; +} + +function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { + const net = new PetriNet(); + const { plan, actions, testRunner, reports, policy } = input; + + // Epic-level places + for (const epic of plan.epics) { + net.addPlace(ep(epic.id, 'ready')); + net.addPlace(ep(epic.id, 'done')); + } + + // Helper: fan out epic readiness to all its slices' eligible places + function epicReadyOutputs(epicId: string): { place: string; token: Token }[] { + return plan.slices + .filter((s) => s.epic_id === epicId) + .map((s) => ({ place: p(s.id, 'eligible'), token: { sliceId: s.id, epicId } })); + } + + // Seed epic readiness — epics with no deps start ready + // (deferred until eligible places exist — see below) + const seedEpics = plan.epics.filter((e) => e.depends_on.length === 0); + + // Epic dependency transitions — dep done → fan out to next epic's slices + for (const epic of plan.epics) { + for (const depId of epic.depends_on) { + net.addTransition({ + id: `epic-dep:${depId}->${epic.id}`, + inputs: [ep(depId, 'done')], + fire: async () => epicReadyOutputs(epic.id), + }); + } + } + + // Per-slice inner loop + for (const slice of plan.slices) { + const epic = plan.epics.find((e) => e.id === slice.epic_id)!; + const sid = slice.id; + const baseToken: Token = { sliceId: sid, epicId: epic.id }; + + // Places + for (const name of [ + 'spec-ready', + 'test-agent', + 'code-agent', + 'failing-tests', + 'untested-code', + 'needs-more', + 'done-spec', + 'completed', + ]) { + net.addPlace(p(sid, name)); + } + + // Initial tokens (agent resources) + net.addToken(p(sid, 'test-agent'), { ...baseToken }); + net.addToken(p(sid, 'code-agent'), { ...baseToken }); + + // Slice readiness gate — collects per-slice 
prerequisite tokens + net.addPlace(p(sid, 'eligible')); + + if (slice.depends_on.length === 0) { + // No slice deps — eligible when epic is ready (token seeded below) + net.addTransition({ + id: `slice-ready:${sid}`, + inputs: [p(sid, 'eligible')], + fire: async () => [{ place: p(sid, 'spec-ready'), token: { ...baseToken } }], + }); + } else { + // Has slice deps — eligible needs its own token AND all dep completions + const gateInputs = [p(sid, 'eligible'), ...slice.depends_on.map((d) => p(d, 'dep-signal:' + sid))]; + for (const depId of slice.depends_on) { + net.addPlace(p(depId, 'dep-signal:' + sid)); + } + net.addTransition({ + id: `slice-ready:${sid}`, + inputs: gateInputs, + fire: async () => [{ place: p(sid, 'spec-ready'), token: { ...baseToken } }], + }); + } + + const actCtx: ActionContext = { + slice, + epic, + plan, + worktreeDir: input.fixtureDir, + reports, + }; + + // Evaluate — conditional: NO → needs-more, YES → done-spec + net.addTransition({ + id: `${sid}:evaluate`, + inputs: [p(sid, 'spec-ready'), p(sid, 'test-agent')], + fire: async (consumed) => { + const reportId = await actions['evaluate-done'](actCtx); + ctx.reportIds.push(reportId); + const report = reports.getById(reportId); + const done = !!(report?.payload as { done?: boolean })?.done; + const tok: Token = { ...consumed[0]!, reportId }; + if (done) { + return [ + { place: p(sid, 'done-spec'), token: tok }, + { place: p(sid, 'test-agent'), token: { ...baseToken } }, + ]; + } + return [ + { place: p(sid, 'needs-more'), token: tok }, + { place: p(sid, 'test-agent'), token: { ...baseToken } }, + ]; + }, + }); + + // Write tests + net.addTransition({ + id: `${sid}:write-tests`, + inputs: [p(sid, 'needs-more'), p(sid, 'test-agent')], + fire: async (consumed) => { + const reportId = await actions['write-tests'](actCtx); + ctx.reportIds.push(reportId); + return [ + { place: p(sid, 'failing-tests'), token: { ...consumed[0]!, reportId } }, + { place: p(sid, 'test-agent'), token: { ...baseToken } 
}, + ]; + }, + }); + + // Write code + net.addTransition({ + id: `${sid}:write-code`, + inputs: [p(sid, 'failing-tests'), p(sid, 'code-agent')], + fire: async (consumed) => { + const reportId = await actions['write-code'](actCtx); + ctx.reportIds.push(reportId); + return [ + { place: p(sid, 'untested-code'), token: { ...consumed[0]!, reportId } }, + { place: p(sid, 'code-agent'), token: { ...baseToken } }, + ]; + }, + }); + + // Run tests — orchestrator-owned, deterministic + const retryKey = `retry:${sid}`; + ctx.retries.set(retryKey, 0); + + net.addTransition({ + id: `${sid}:run-tests`, + inputs: [p(sid, 'untested-code')], + fire: async (consumed) => { + const target = slice.verification[0]?.target ?? ''; + const result = await testRunner.run(target, input.fixtureDir); + const reportId = `rpt-run-${sid}-${Date.now()}`; + reports.append({ + id: reportId, + ts: new Date().toISOString(), + epicId: epic.id, + sliceId: sid, + actor: 'test-runner', + event: 'tests-run', + payload: { passed: result.passed, output: result.output }, + }); + ctx.reportIds.push(reportId); + + const tok: Token = { ...consumed[0]!, reportId }; + if (result.passed) { + ctx.retries.set(retryKey, 0); + return [{ place: p(sid, 'spec-ready'), token: tok }]; + } + const retryCount = ctx.retries.get(retryKey)!; + if (retryCount >= policy.maxRetries) { + ctx.sliceOutcomes.set(sid, { sliceId: sid, status: 'halted' }); + ctx.halted = true; + ctx.haltReason = `Slice ${sid} retry exhaustion`; + return []; // dead end — no output tokens + } + ctx.retries.set(retryKey, retryCount + 1); + return [{ place: p(sid, 'failing-tests'), token: tok }]; + }, + }); + + // Return DONE — also emit dep-signal tokens for downstream slices + const dependents = plan.slices.filter((s) => s.depends_on.includes(sid)); + net.addTransition({ + id: `${sid}:return-done`, + inputs: [p(sid, 'done-spec')], + fire: async () => { + ctx.sliceOutcomes.set(sid, { sliceId: sid, status: 'completed' }); + const outputs: { place: string; 
token: Token }[] = [ + { place: p(sid, 'completed'), token: { ...baseToken } }, + ]; + for (const dep of dependents) { + outputs.push({ place: p(sid, 'dep-signal:' + dep.id), token: { ...baseToken } }); + } + return outputs; + }, + }); + } + + // Seed eligible places for epics with no dependencies + for (const epic of seedEpics) { + for (const output of epicReadyOutputs(epic.id)) { + net.addToken(output.place, output.token); + } + } + + // Epic completion — all slices done → epic verification → epic done + for (const epic of plan.epics) { + const epicSlices = plan.slices.filter((s) => s.epic_id === epic.id); + const completedPlaces = epicSlices.map((s) => p(s.id, 'completed')); + + if (epicSlices.length === 0) continue; + + if (epic.verification.length === 0) { + // No verification — slices done → epic done + net.addTransition({ + id: `epic-complete:${epic.id}`, + inputs: completedPlaces, + fire: async () => { + ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); + return [{ place: ep(epic.id, 'done'), token: { sliceId: '', epicId: epic.id } }]; + }, + }); + } else { + // With verification — slices done → verify → epic done + const verifyPlace = ep(epic.id, 'verify-ready'); + net.addPlace(verifyPlace); + + net.addTransition({ + id: `epic-slices-done:${epic.id}`, + inputs: completedPlaces, + fire: async () => [{ place: verifyPlace, token: { sliceId: '', epicId: epic.id } }], + }); + + net.addTransition({ + id: `epic-verify:${epic.id}`, + inputs: [verifyPlace], + fire: async () => { + const verifyCtx: ActionContext = { + slice: epicSlices[0]!, + epic, + plan, + worktreeDir: input.fixtureDir, + reports, + }; + const reportId = await actions['verify-epic'](verifyCtx); + ctx.reportIds.push(reportId); + const report = reports.getById(reportId); + const passed = !!(report?.payload as { passed?: boolean })?.passed; + if (passed) { + ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); + return [{ place: ep(epic.id, 'done'), token: { 
sliceId: '', epicId: epic.id } }];
+          }
+          ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'halted' });
+          ctx.halted = true;
+          ctx.haltReason = `Epic ${epic.id} verification failed`;
+          return []; // dead end
+        },
+      });
+    }
+  }
+
+  return net;
+}
+
+// ---------------------------------------------------------------------------
+// Mutable run context
+// ---------------------------------------------------------------------------
+
+type RunCtx = {
+  reportIds: string[];
+  sliceOutcomes: Map<string, SliceOutcome>;
+  epicOutcomes: Map<string, EpicOutcome>;
+  retries: Map<string, number>;
+  halted: boolean;
+  haltReason?: string;
+};
+
+// ---------------------------------------------------------------------------
+// PetriOrchestrator — implements Orchestrator
+// ---------------------------------------------------------------------------
+
+export class PetriOrchestrator implements Orchestrator {
+  async run(input: OrchestratorInput): Promise<OrchestratorResult> {
+    const ctx: RunCtx = {
+      reportIds: [],
+      sliceOutcomes: new Map(),
+      epicOutcomes: new Map(),
+      retries: new Map(),
+      halted: false,
+    };
+
+    try {
+      const net = compilePlan(input, ctx);
+      await net.run();
+    } catch (err) {
+      return {
+        status: 'halted',
+        reason: err instanceof Error ? err.message : String(err),
+        reports: ctx.reportIds,
+        epics: input.plan.epics.map(
+          (e) => ctx.epicOutcomes.get(e.id) ?? { epicId: e.id, status: 'halted' as const },
+        ),
+        slices: input.plan.slices.map(
+          (s) => ctx.sliceOutcomes.get(s.id) ?? { sliceId: s.id, status: 'halted' as const },
+        ),
+      };
+    }
+
+    // Fill in any slices/epics not yet in outcomes (e.g. never reached)
+    for (const slice of input.plan.slices) {
+      if (!ctx.sliceOutcomes.has(slice.id)) {
+        ctx.sliceOutcomes.set(slice.id, { sliceId: slice.id, status: 'halted' });
+      }
+    }
+    for (const epic of input.plan.epics) {
+      if (!ctx.epicOutcomes.has(epic.id)) {
+        ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'halted' });
+      }
+    }
+
+    return {
+      status: ctx.halted ?
'halted' : 'completed', + reason: ctx.haltReason, + reports: ctx.reportIds, + epics: input.plan.epics.map((e) => ctx.epicOutcomes.get(e.id)!), + slices: input.plan.slices.map((s) => ctx.sliceOutcomes.get(s.id)!), + }; + } +} diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts new file mode 100644 index 00000000..566cfc7f --- /dev/null +++ b/src/orchestrator/src/engine-proc.ts @@ -0,0 +1,225 @@ +import type { + ActionContext, + Epic, + EpicOutcome, + Orchestrator, + OrchestratorInput, + OrchestratorResult, + ReportLine, + Slice, + SliceOutcome, +} from './types.js'; + +export class ProceduralOrchestrator implements Orchestrator { + async run(input: OrchestratorInput): Promise<OrchestratorResult> { + try { + return await this.runInner(input); + } catch (err) { + return { + status: 'halted', + reason: err instanceof Error ? err.message : String(err), + reports: [], + epics: input.plan.epics.map((e) => ({ epicId: e.id, status: 'halted' as const })), + slices: input.plan.slices.map((s) => ({ sliceId: s.id, status: 'halted' as const })), + }; + } + } + + private async runInner(input: OrchestratorInput): Promise<OrchestratorResult> { + const { plan, reports, actions, testRunner, policy } = input; + const reportIds: string[] = []; + const sliceOutcomes: SliceOutcome[] = []; + const epicOutcomes: EpicOutcome[] = []; + + const epicOrder = topoSort(plan.epics); + + for (const epic of epicOrder) { + const epicSlices = plan.slices.filter((s) => s.epic_id === epic.id); + const sliceOrder = topoSortSlices(epicSlices); + let epicHalted = false; + + for (const slice of sliceOrder) { + const outcome = await this.executeSlice(slice, epic, input, reportIds); + sliceOutcomes.push(outcome); + if (outcome.status === 'halted') { + epicHalted = true; + break; + } + } + + if (epicHalted) { + epicOutcomes.push({ epicId: epic.id, status: 'halted' }); + return { + status: 'halted', + reason: `Epic ${epic.id} halted due to slice failure`, + reports: reportIds, + epics: epicOutcomes, + slices:
sliceOutcomes, + }; + } + + // Epic-level verification: verify-epic checks every target in one call, as in the Petri engine + if (epic.verification.length > 0) { + const verifyId = await actions['verify-epic']({ + slice: epicSlices[0]!, + epic, + plan, + worktreeDir: input.fixtureDir, + reports, + }); + reportIds.push(verifyId); + const verifyReport = reports.getById(verifyId); + if (!(verifyReport?.payload as { passed?: boolean } | undefined)?.passed) { + epicOutcomes.push({ epicId: epic.id, status: 'halted' }); + return { + status: 'halted', + reason: `Epic ${epic.id} verification failed`, + reports: reportIds, + epics: epicOutcomes, + slices: sliceOutcomes, + }; + } + } + + epicOutcomes.push({ epicId: epic.id, status: 'completed' }); + } + + return { + status: 'completed', + reports: reportIds, + epics: epicOutcomes, + slices: sliceOutcomes, + }; + } + + private async executeSlice( + slice: Slice, + epic: Epic, + input: OrchestratorInput, + reportIds: string[], + ): Promise<SliceOutcome> { + const { actions, reports, testRunner, policy } = input; + + const ctx: ActionContext = { + slice, + epic, + plan: input.plan, + worktreeDir: input.fixtureDir, + reports, + }; + + // TDD inner loop + while (true) { + // 1. Evaluate — is this slice done? + const evalId = await actions['evaluate-done'](ctx); + reportIds.push(evalId); + const evalReport = reports.getById(evalId); + if (evalReport && (evalReport.payload as { done?: boolean }).done) { + return { sliceId: slice.id, status: 'completed' }; + } + + // 2. Write tests + const testWriteId = await actions['write-tests'](ctx); + reportIds.push(testWriteId); + + // 3. Write code + const codeWriteId = await actions['write-code'](ctx); + reportIds.push(codeWriteId); + + // 4. Run tests (orchestrator-owned, deterministic) + const target = slice.verification[0]?.target ??
''; + let result = await testRunner.run(target, input.fixtureDir); + // Append a report for the test run + const runReportId = `rpt-run-${Date.now()}`; + reports.append({ + id: runReportId, + ts: new Date().toISOString(), + epicId: epic.id, + sliceId: slice.id, + actor: 'test-runner', + event: 'tests-run', + payload: { passed: result.passed, output: result.output }, + }); + reportIds.push(runReportId); + + if (result.passed) { + // Tests pass → loop back to evaluate + continue; + } + + // Retry loop: write-code + run-tests + let passed = false; + for (let retry = 0; retry < policy.maxRetries; retry++) { + const retryCodeId = await actions['write-code'](ctx); + reportIds.push(retryCodeId); + + result = await testRunner.run(target, input.fixtureDir); + const retryRunId = `rpt-retry-${retry}-${Date.now()}`; + reports.append({ + id: retryRunId, + ts: new Date().toISOString(), + epicId: epic.id, + sliceId: slice.id, + actor: 'test-runner', + event: 'tests-run', + payload: { passed: result.passed, output: result.output }, + }); + reportIds.push(retryRunId); + + if (result.passed) { + passed = true; + break; + } + } + + if (!passed) { + return { sliceId: slice.id, status: 'halted' }; + } + // Tests pass after retry → loop back to evaluate + } + } +} + +// --------------------------------------------------------------------------- +// Topo sort helpers +// --------------------------------------------------------------------------- + +function topoSort(epics: Epic[]): Epic[] { + const byId = new Map(epics.map((e) => [e.id, e])); + const visited = new Set(); + const result: Epic[] = []; + + function visit(id: string) { + if (visited.has(id)) return; + visited.add(id); + const epic = byId.get(id); + if (!epic) return; + for (const dep of epic.depends_on) { + visit(dep); + } + result.push(epic); + } + + for (const e of epics) visit(e.id); + return result; +} + +function topoSortSlices(slices: Slice[]): Slice[] { + const byId = new Map(slices.map((s) => [s.id, s])); + const 
visited = new Set(); + const result: Slice[] = []; + + function visit(id: string) { + if (visited.has(id)) return; + visited.add(id); + const slice = byId.get(id); + if (!slice) return; + for (const dep of slice.depends_on) { + visit(dep); + } + result.push(slice); + } + + for (const s of slices) visit(s.id); + return result; +} diff --git a/src/orchestrator/src/report-sink.ts b/src/orchestrator/src/report-sink.ts new file mode 100644 index 00000000..9189a820 --- /dev/null +++ b/src/orchestrator/src/report-sink.ts @@ -0,0 +1,17 @@ +import type { ReportLine, ReportSink } from './types.js'; + +export class InMemoryReportSink implements ReportSink { + private lines: ReportLine[] = []; + + append(line: ReportLine): void { + this.lines.push(line); + } + + getById(id: string): ReportLine | undefined { + return this.lines.find((l) => l.id === id); + } + + getAll(): ReportLine[] { + return [...this.lines]; + } +} diff --git a/src/orchestrator/src/types.ts b/src/orchestrator/src/types.ts new file mode 100644 index 00000000..5a2b6483 --- /dev/null +++ b/src/orchestrator/src/types.ts @@ -0,0 +1,117 @@ +// --------------------------------------------------------------------------- +// Plan model — epics → slices (YAML-derived) +// --------------------------------------------------------------------------- + +export type Verification = { + kind: string; + target: string; +}; + +export type Epic = { + id: string; + summary: string; + depends_on: string[]; + verification: Verification[]; +}; + +export type Slice = { + id: string; + epic_id: string; + definition: string; + depends_on: string[]; + verification: Verification[]; +}; + +export type Plan = { + epics: Epic[]; + slices: Slice[]; +}; + +// --------------------------------------------------------------------------- +// Reports — append-only communication medium +// --------------------------------------------------------------------------- + +export type ReportLine = { + id: string; + ts: string; + epicId: string; + 
sliceId: string; + actor: string; + event: string; + payload: Record<string, unknown>; +}; + +export interface ReportSink { + append(line: ReportLine): void; + getById(id: string): ReportLine | undefined; + getAll(): ReportLine[]; +} + +// --------------------------------------------------------------------------- +// Action dispatch — inline handlers (registry deferred per §12) +// --------------------------------------------------------------------------- + +export type ActionContext = { + slice: Slice; + epic: Epic; + plan: Plan; + worktreeDir: string; + reports: ReportSink; +}; + +/** Handler appends a report line and returns its id. */ +export type ActionHandler = (ctx: ActionContext) => Promise<string>; + +export type ActionHandlers = Record<string, ActionHandler>; + +// --------------------------------------------------------------------------- +// Test runner — deterministic, orchestrator-owned +// --------------------------------------------------------------------------- + +export type TestResult = { + passed: boolean; + output: string; +}; + +export interface TestRunner { + run(target: string, worktreeDir: string): Promise<TestResult>; +} + +// --------------------------------------------------------------------------- +// Orchestrator seam +// --------------------------------------------------------------------------- + +export type RunPolicy = { + maxRetries: number; +}; + +export type OrchestratorInput = { + plan: Plan; + fixtureDir: string; + actions: ActionHandlers; + reports: ReportSink; + testRunner: TestRunner; + policy: RunPolicy; +}; + +export type EpicOutcome = { + epicId: string; + status: 'completed' | 'halted'; +}; + +export type SliceOutcome = { + sliceId: string; + status: 'completed' | 'halted'; +}; + +export type OrchestratorResult = { + status: 'completed' | 'halted'; + reason?: string; + reports: string[]; + epics: EpicOutcome[]; + slices: SliceOutcome[]; +}; + +export interface Orchestrator { + run(input: OrchestratorInput): Promise<OrchestratorResult>; +} From e1f555394508bf6e8d181babe9eeaba120c1544f Mon Sep 17
00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 16:05:31 +0200 Subject: [PATCH 03/22] =?UTF-8?q?FE-730:=20cards=2010=E2=80=9315=20?= =?UTF-8?q?=E2=80=94=20plan=20loader,=20test=20runner,=20worktree,=20pi-ac?= =?UTF-8?q?tions,=20CLI,=20fixture?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - plan-loader.ts: YAML parsing with validation (3 tests) - test-runner.ts: BunTestRunner wrapping `bun test` - worktree.ts: createWorktree with .cook/runs//worktree/ (2 tests) - file-report-sink.ts: JSONL-backed ReportSink with stdout streaming - pi-actions.ts: createPiActions() dispatching pi CLI for each agent role - prompts/: test-writer.md, code-writer.md, evaluator.md - cook-cli.ts: parseCookArgs + runCook wiring everything together (5 tests) - cli.ts: `brunch cook` command registered alongside agent - fixtures/txt/plan.yaml: Fixture #1 (2 epics, 5 slices) - 34 orchestrator tests pass; build clean with cook-cli chunk --- config/vite-server-runtime.ts | 4 + fixtures/txt/package.json | 4 + fixtures/txt/plan.yaml | 55 +++++++ fixtures/txt/src/cli.ts | 11 ++ fixtures/txt/tests/version.test.ts | 93 +++++++++++ package-lock.json | 40 ++--- package.json | 1 + src/orchestrator/prompts/code-writer.md | 10 ++ src/orchestrator/prompts/evaluator.md | 11 ++ src/orchestrator/prompts/test-writer.md | 11 ++ src/orchestrator/src/cook-cli.test.ts | 30 ++++ src/orchestrator/src/cook-cli.ts | 91 +++++++++++ src/orchestrator/src/file-report-sink.ts | 38 +++++ src/orchestrator/src/pi-actions.ts | 188 +++++++++++++++++++++++ src/orchestrator/src/plan-loader.test.ts | 51 ++++++ src/orchestrator/src/plan-loader.ts | 19 +++ src/orchestrator/src/test-runner.ts | 23 +++ src/orchestrator/src/worktree.test.ts | 34 ++++ src/orchestrator/src/worktree.ts | 15 ++ src/server/cli.ts | 18 ++- 20 files changed, 720 insertions(+), 27 deletions(-) create mode 100644 fixtures/txt/package.json create mode 100644 fixtures/txt/plan.yaml create mode 100644 
fixtures/txt/src/cli.ts create mode 100644 fixtures/txt/tests/version.test.ts create mode 100644 src/orchestrator/prompts/code-writer.md create mode 100644 src/orchestrator/prompts/evaluator.md create mode 100644 src/orchestrator/prompts/test-writer.md create mode 100644 src/orchestrator/src/cook-cli.test.ts create mode 100644 src/orchestrator/src/cook-cli.ts create mode 100644 src/orchestrator/src/file-report-sink.ts create mode 100644 src/orchestrator/src/pi-actions.ts create mode 100644 src/orchestrator/src/plan-loader.test.ts create mode 100644 src/orchestrator/src/plan-loader.ts create mode 100644 src/orchestrator/src/test-runner.ts create mode 100644 src/orchestrator/src/worktree.test.ts create mode 100644 src/orchestrator/src/worktree.ts diff --git a/config/vite-server-runtime.ts b/config/vite-server-runtime.ts index 255aa99b..db3c875f 100644 --- a/config/vite-server-runtime.ts +++ b/config/vite-server-runtime.ts @@ -54,6 +54,10 @@ export const createServerRuntimeConfig = ({ }, closeBundle() { copyServerPromptAssets(resolve(rootDir, 'src/server/prompts'), promptAssetsDestinationDir); + copyServerPromptAssets( + resolve(rootDir, 'src/orchestrator/prompts'), + resolve(promptAssetsDestinationDir, '..', 'orchestrator-prompts'), + ); }, }, ], diff --git a/fixtures/txt/package.json b/fixtures/txt/package.json new file mode 100644 index 00000000..afd29f1a --- /dev/null +++ b/fixtures/txt/package.json @@ -0,0 +1,4 @@ +{ + "name": "txt", + "version": "1.0.0" +} diff --git a/fixtures/txt/plan.yaml b/fixtures/txt/plan.yaml new file mode 100644 index 00000000..caf85f8b --- /dev/null +++ b/fixtures/txt/plan.yaml @@ -0,0 +1,55 @@ +epics: + - id: scaffolding + summary: "CLI scaffolding" + depends_on: [] + verification: + - kind: integration-test + target: "tests/cli-scaffolding.integration.test.ts" + + - id: text-ops + summary: "Text operations" + depends_on: [scaffolding] + verification: + - kind: integration-test + target: "tests/text-ops-pipe.integration.test.ts" + 
+slices: + - id: version-flag + epic_id: scaffolding + definition: "Add `--version` flag that prints the version from package.json" + depends_on: [] + verification: + - kind: unit-test + target: "tests/version.test.ts" + + - id: help-flag + epic_id: scaffolding + definition: "Add `--help` flag that lists subcommands: reverse, count, slugify" + depends_on: [version-flag] + verification: + - kind: unit-test + target: "tests/help.test.ts" + + - id: reverse + epic_id: text-ops + definition: "Add `reverse` subcommand. Pure function reverses a string. CLI wires it to argv[2]." + depends_on: [] + verification: + - kind: unit-test + target: "tests/reverse.test.ts" + + - id: count + epic_id: text-ops + definition: "Add `count` subcommand that counts whitespace-separated words. Empty input returns 0." + depends_on: [] + verification: + - kind: unit-test + target: "tests/count.test.ts" + + - id: slugify + epic_id: text-ops + definition: "Add `slugify` subcommand. Lowercase, replace non-alphanumerics with single dash, collapse multiple dashes, trim leading/trailing dashes. Handle unicode by removing diacritics." + depends_on: [] + verification: + - kind: unit-test + target: "tests/slugify.test.ts" diff --git a/fixtures/txt/src/cli.ts b/fixtures/txt/src/cli.ts new file mode 100644 index 00000000..38491c93 --- /dev/null +++ b/fixtures/txt/src/cli.ts @@ -0,0 +1,11 @@ +import pkg from "../package.json" with { type: "json" }; + +export function getVersion(): string { + return pkg.version; +} + +export function run(args: string[]): void { + if (args.includes("--version")) { + process.stdout.write(getVersion() + "\n"); + } +} diff --git a/fixtures/txt/tests/version.test.ts b/fixtures/txt/tests/version.test.ts new file mode 100644 index 00000000..7fbb84d8 --- /dev/null +++ b/fixtures/txt/tests/version.test.ts @@ -0,0 +1,93 @@ +import { describe, expect, it, spyOn } from "bun:test"; + +// These imports will fail until the implementation is created. 
+// The module is expected to export: +// getVersion(): string — reads version from package.json +// run(args: string[]): void — main CLI entry point; honours --version +import { getVersion, run } from "../src/cli.ts"; + +// The package.json that the implementation must read from. +import pkg from "../package.json" with { type: "json" }; + +describe("getVersion", () => { + it("returns a non-empty string", () => { + const version = getVersion(); + expect(typeof version).toBe("string"); + expect(version.length).toBeGreaterThan(0); + }); + + it("matches the version field in package.json", () => { + expect(getVersion()).toBe(pkg.version); + }); + + it("looks like a semver string (major.minor.patch)", () => { + const version = getVersion(); + expect(version).toMatch(/^\d+\.\d+\.\d+/); + }); +}); + +describe("run(['--version'])", () => { + it("writes the version to stdout", () => { + const writes: string[] = []; + const originalWrite = process.stdout.write.bind(process.stdout); + const spy = spyOn(process.stdout, "write").mockImplementation( + (chunk: string | Uint8Array) => { + writes.push(typeof chunk === "string" ? chunk : String(chunk)); + return true; + }, + ); + + try { + run(["--version"]); + } finally { + spy.mockRestore(); + } + + const output = writes.join(""); + expect(output).toContain(pkg.version); + }); + + it("exits with code 0 after printing the version", () => { + let exitCode: number | undefined; + const exitSpy = spyOn(process, "exit").mockImplementation((code?: number) => { + exitCode = code ?? 0; + return undefined as never; + }); + + const writeSpy = spyOn(process.stdout, "write").mockImplementation(() => true); + + try { + run(["--version"]); + } finally { + exitSpy.mockRestore(); + writeSpy.mockRestore(); + } + + // Either exits with 0, or doesn't call process.exit at all (both are acceptable). 
+ if (exitCode !== undefined) { + expect(exitCode).toBe(0); + } + }); + + it("prints nothing to stderr when --version is used", () => { + const stderrWrites: string[] = []; + const stderrSpy = spyOn(process.stderr, "write").mockImplementation( + (chunk: string | Uint8Array) => { + stderrWrites.push(typeof chunk === "string" ? chunk : String(chunk)); + return true; + }, + ); + const stdoutSpy = spyOn(process.stdout, "write").mockImplementation(() => true); + const exitSpy = spyOn(process, "exit").mockImplementation(() => undefined as never); + + try { + run(["--version"]); + } finally { + stderrSpy.mockRestore(); + stdoutSpy.mockRestore(); + exitSpy.mockRestore(); + } + + expect(stderrWrites.join("")).toBe(""); + }); +}); diff --git a/package-lock.json b/package-lock.json index f43fa23f..a3a5bb94 100644 --- a/package-lock.json +++ b/package-lock.json @@ -54,6 +54,7 @@ "tw-animate-css": "^1.4.0", "use-stick-to-bottom": "^1.1.3", "xstate": "^5.30.0", + "yaml": "^2.9.0", "zod": "^4.3.6" }, "bin": { @@ -4490,9 +4491,6 @@ "arm64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -4510,9 +4508,6 @@ "arm64" ], "dev": true, - "libc": [ - "musl" - ], "license": "MIT", "optional": true, "os": [ @@ -4530,9 +4525,6 @@ "ppc64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -4550,9 +4542,6 @@ "riscv64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -4570,9 +4559,6 @@ "riscv64" ], "dev": true, - "libc": [ - "musl" - ], "license": "MIT", "optional": true, "os": [ @@ -4590,9 +4576,6 @@ "s390x" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -4610,9 +4593,6 @@ "x64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -4630,9 +4610,6 @@ "x64" ], "dev": true, - "libc": [ - "musl" - ], "license": "MIT", "optional": true, "os": [ @@ -21665,6 +21642,21 @@ "integrity": 
"sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==", "license": "ISC" }, + "node_modules/yaml": { + "version": "2.9.0", + "resolved": "https://registry.npmjs.org/yaml/-/yaml-2.9.0.tgz", + "integrity": "sha512-2AvhNX3mb8zd6Zy7INTtSpl1F15HW6Wnqj0srWlkKLcpYl/gMIMJiyuGq2KeI2YFxUPjdlB+3Lc10seMLtL4cA==", + "license": "ISC", + "bin": { + "yaml": "bin.mjs" + }, + "engines": { + "node": ">= 14.6" + }, + "funding": { + "url": "https://github.com/sponsors/eemeli" + } + }, "node_modules/yargs": { "version": "17.7.2", "resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz", diff --git a/package.json b/package.json index ce834ee1..9f80fa93 100644 --- a/package.json +++ b/package.json @@ -96,6 +96,7 @@ "tw-animate-css": "^1.4.0", "use-stick-to-bottom": "^1.1.3", "xstate": "^5.30.0", + "yaml": "^2.9.0", "zod": "^4.3.6" }, "devDependencies": { diff --git a/src/orchestrator/prompts/code-writer.md b/src/orchestrator/prompts/code-writer.md new file mode 100644 index 00000000..8d777517 --- /dev/null +++ b/src/orchestrator/prompts/code-writer.md @@ -0,0 +1,10 @@ +You are a code-writing agent. Your job is to write the minimum implementation to make existing tests pass. + +## Rules + +- Read the existing test files first to understand what's expected. +- Write the minimum code to make ALL tests pass. +- Use TypeScript with Bun conventions. +- Do NOT modify test files. +- Do NOT add features beyond what the tests require. +- Create any necessary directory structure and configuration files. diff --git a/src/orchestrator/prompts/evaluator.md b/src/orchestrator/prompts/evaluator.md new file mode 100644 index 00000000..29554391 --- /dev/null +++ b/src/orchestrator/prompts/evaluator.md @@ -0,0 +1,11 @@ +You are an evaluator agent. Your job is to assess whether a slice specification is fully satisfied by the current code. + +## Rules + +- Read the slice definition and verification targets. 
+- Check if test files exist and if they cover the specification. +- Run `bun test` on the verification targets to check if tests pass. +- Respond with a JSON object: { "done": true/false, "reasoning": "..." } +- "done": true means ALL verification targets pass and the slice spec is satisfied. +- "done": false means more work is needed. +- Be honest — if tests are missing, failing, or incomplete, say so. diff --git a/src/orchestrator/prompts/test-writer.md b/src/orchestrator/prompts/test-writer.md new file mode 100644 index 00000000..749e20cb --- /dev/null +++ b/src/orchestrator/prompts/test-writer.md @@ -0,0 +1,11 @@ +You are a test-writing agent. Your job is to write failing tests for a given slice specification. + +## Rules + +- Write tests that will initially FAIL because the implementation doesn't exist yet. +- Use `bun test` conventions (import { describe, expect, it } from "bun:test"). +- Each test should verify one observable behavior from the slice definition. +- Write tests to the file paths specified in the verification targets. +- Keep tests simple and focused — test behavior, not implementation. +- Create any necessary directory structure. +- Do NOT write implementation code — only tests. 
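The evaluator prompt above asks for a bare `{ "done": ..., "reasoning": ... }` object, and `extractJson` in `pi-actions.ts` locates it with the lazy pattern `\{[\s\S]*?\}`, which stops at the first `}`. A `reasoning` string that itself contains a brace therefore produces unparseable JSON and silently reads as `done: false`. Below is a minimal sketch of a brace-balancing alternative. It assumes the verdict is the first well-formed JSON object anywhere in pi's text output. `extractFirstJsonObject`, `parseVerdict`, and `EvaluatorVerdict` are hypothetical names and are not part of this patch.

```typescript
// Hypothetical helpers sketched for review; not part of this patch.
type EvaluatorVerdict = { done: boolean; reasoning: string };

// Return the first balanced {...} span in `raw` that parses as a JSON object.
// String literals are tracked so braces inside "reasoning" do not end the scan early.
function extractFirstJsonObject(raw: string): Record<string, unknown> | undefined {
  for (let start = raw.indexOf('{'); start !== -1; start = raw.indexOf('{', start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < raw.length; i++) {
      const ch = raw[i];
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') {
        inString = true;
      } else if (ch === '{') {
        depth++;
      } else if (ch === '}') {
        depth--;
        if (depth === 0) {
          try {
            const value: unknown = JSON.parse(raw.slice(start, i + 1));
            if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
              return value as Record<string, unknown>;
            }
          } catch {
            // Balanced but not valid JSON (e.g. prose in braces): try the next '{'.
          }
          break;
        }
      }
    }
  }
  return undefined;
}

// Fail closed: only a literal boolean `true` counts as done.
function parseVerdict(raw: string): EvaluatorVerdict {
  const obj = extractFirstJsonObject(raw);
  const reasoning = obj?.reasoning;
  return {
    done: obj?.done === true,
    reasoning: typeof reasoning === 'string' ? reasoning : raw.slice(0, 200),
  };
}
```

Requiring a literal `true`, rather than any truthy value, keeps the check fail-closed. An unparseable or ambiguous verdict sends the slice back through the TDD loop instead of marking it complete.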
diff --git a/src/orchestrator/src/cook-cli.test.ts b/src/orchestrator/src/cook-cli.test.ts new file mode 100644 index 00000000..69f4c960 --- /dev/null +++ b/src/orchestrator/src/cook-cli.test.ts @@ -0,0 +1,30 @@ +import { describe, expect, it } from 'vitest'; + +import { parseCookArgs } from './cook-cli.js'; + +describe('parseCookArgs', () => { + it('parses dir only', () => { + const opts = parseCookArgs(['./fixtures/txt']); + expect(opts.dir).toContain('fixtures/txt'); + expect(opts.engine).toBe('proc'); + expect(opts.maxRetries).toBe(3); + }); + + it('parses --engine=petri', () => { + const opts = parseCookArgs(['./f', '--engine=petri']); + expect(opts.engine).toBe('petri'); + }); + + it('parses --max-retries=5', () => { + const opts = parseCookArgs(['./f', '--max-retries=5']); + expect(opts.maxRetries).toBe(5); + }); + + it('throws on missing dir', () => { + expect(() => parseCookArgs(['--engine=proc'])).toThrow('Usage'); + }); + + it('throws on unknown engine', () => { + expect(() => parseCookArgs(['./f', '--engine=unknown'])).toThrow('Unknown engine'); + }); +}); diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts new file mode 100644 index 00000000..70939fa8 --- /dev/null +++ b/src/orchestrator/src/cook-cli.ts @@ -0,0 +1,91 @@ +import { existsSync } from 'node:fs'; +import { resolve, join } from 'node:path'; + +import { PetriOrchestrator } from './engine-petri.js'; +import { ProceduralOrchestrator } from './engine-proc.js'; +import { FileReportSink } from './file-report-sink.js'; +import { createPiActions } from './pi-actions.js'; +import { loadPlan } from './plan-loader.js'; +import { BunTestRunner } from './test-runner.js'; +import type { Orchestrator } from './types.js'; +import { createWorktree } from './worktree.js'; + +export type CookOptions = { + dir: string; + engine: 'proc' | 'petri'; + maxRetries: number; +}; + +export function parseCookArgs(args: string[]): CookOptions { + let dir = ''; + let engine: 'proc' | 'petri' = 
'proc'; + let maxRetries = 3; + + for (let i = 0; i < args.length; i++) { + const arg = args[i]!; + if (arg.startsWith('--engine=')) { + const val = arg.split('=')[1]!; + if (val !== 'proc' && val !== 'petri') { + throw new Error(`Unknown engine: ${val}. Use proc or petri.`); + } + engine = val; + } else if (arg.startsWith('--max-retries=')) { + maxRetries = Number.parseInt(arg.split('=')[1]!, 10); + } else if (!arg.startsWith('-')) { + dir = arg; + } + } + + if (!dir) { + throw new Error('Usage: brunch cook [--engine=proc|petri] [--max-retries=N]'); + } + + return { dir: resolve(dir), engine, maxRetries }; +} + +export async function runCook(opts: CookOptions): Promise<void> { + const planPath = join(opts.dir, 'plan.yaml'); + if (!existsSync(planPath)) { + // Check for codebase mode (reserved) + const codebasePlanPath = join(opts.dir, '.cook', 'plan.yaml'); + if (existsSync(codebasePlanPath)) { + console.error('Codebase mode (brownfield) is not yet implemented.'); + console.error('POC supports fixture mode only: place plan.yaml at the root of .'); + process.exit(1); + } + console.error(`No plan found at ${planPath}`); + process.exit(1); + } + + const plan = loadPlan(planPath); + const { worktreeDir, runId } = createWorktree(opts.dir); + const reportsPath = join(opts.dir, '.cook', 'runs', runId, 'reports.jsonl'); + + console.error(`[cook] Engine: ${opts.engine}`); + console.error(`[cook] Plan: ${plan.epics.length} epics, ${plan.slices.length} slices`); + console.error(`[cook] Worktree: ${worktreeDir}`); + console.error(`[cook] Reports: ${reportsPath}`); + + const reports = new FileReportSink(reportsPath); + const actions = createPiActions(); + const testRunner = new BunTestRunner(); + + const engine: Orchestrator = + opts.engine === 'petri' ?
new PetriOrchestrator() : new ProceduralOrchestrator(); + + const result = await engine.run({ + plan, + fixtureDir: worktreeDir, + actions, + reports, + testRunner, + policy: { maxRetries: opts.maxRetries }, + }); + + console.error(`\n[cook] Result: ${result.status}${result.reason ? ` — ${result.reason}` : ''}`); + console.error(`[cook] Epics: ${result.epics.map((e) => `${e.epicId}:${e.status}`).join(', ')}`); + console.error(`[cook] Slices: ${result.slices.map((s) => `${s.sliceId}:${s.status}`).join(', ')}`); + console.error(`[cook] Reports: ${result.reports.length} events`); + + process.exit(result.status === 'completed' ? 0 : 1); +} diff --git a/src/orchestrator/src/file-report-sink.ts b/src/orchestrator/src/file-report-sink.ts new file mode 100644 index 00000000..e92902d1 --- /dev/null +++ b/src/orchestrator/src/file-report-sink.ts @@ -0,0 +1,38 @@ +import { appendFileSync, readFileSync, existsSync } from 'node:fs'; + +import type { ReportLine, ReportSink } from './types.js'; + +/** + * Append-only JSONL report sink backed by a file. + * Also keeps an in-memory index for getById lookups. 
+ */ +export class FileReportSink implements ReportSink { + private lines: ReportLine[] = []; + + constructor(private readonly path: string) { + // Load existing lines if file exists (for resumability-readiness) + if (existsSync(path)) { + const content = readFileSync(path, 'utf8').trim(); + if (content) { + for (const line of content.split('\n')) { + this.lines.push(JSON.parse(line) as ReportLine); + } + } + } + } + + append(line: ReportLine): void { + this.lines.push(line); + appendFileSync(this.path, JSON.stringify(line) + '\n'); + // Stream to stdout — plain JSON per event (§12 POC UX) + console.log(JSON.stringify(line)); + } + + getById(id: string): ReportLine | undefined { + return this.lines.find((l) => l.id === id); + } + + getAll(): ReportLine[] { + return [...this.lines]; + } +} diff --git a/src/orchestrator/src/pi-actions.ts b/src/orchestrator/src/pi-actions.ts new file mode 100644 index 00000000..957a6e4a --- /dev/null +++ b/src/orchestrator/src/pi-actions.ts @@ -0,0 +1,188 @@ +import { spawnSync } from 'node:child_process'; +import { join, dirname } from 'node:path'; +import { fileURLToPath } from 'node:url'; + +import type { ActionContext, ActionHandlers, ReportSink } from './types.js'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const promptsDir = __dirname.includes('dist') + ? 
join(__dirname, '..', 'orchestrator-prompts') + : join(__dirname, '..', 'prompts'); + +function runPi(opts: { model: string; promptFile: string; task: string; worktreeDir: string }): string { + console.error(` [pi] ${opts.model} → ${opts.worktreeDir}`); + + const result = spawnSync( + 'pi', + [ + '-p', + '--no-session', + '--no-context-files', + '--mode', + 'text', + '--provider', + 'anthropic', + '--model', + opts.model, + '--system-prompt', + '', + '--append-system-prompt', + opts.promptFile, + '--tools', + 'read,write,edit,bash', + opts.task, + ], + { + cwd: opts.worktreeDir, + encoding: 'utf8', + timeout: 300_000, + maxBuffer: 10 * 1024 * 1024, + stdio: ['ignore', 'pipe', 'pipe'], + }, + ); + + if (result.error) { + throw new Error(`pi failed to start: ${result.error.message}`); + } + if (result.status !== 0) { + const stderr = result.stderr?.trim() ?? ''; + throw new Error(`pi exited with code ${result.status}${stderr ? `: ${stderr}` : ''}`); + } + + return result.stdout; +} + +/** Try to extract a JSON object from pi's text output. 
*/ +function extractJson(raw: string): Record<string, unknown> | undefined { + // Look for a JSON object in the output (pi may wrap it in markdown or prose) + const match = raw.match(/\{[\s\S]*?\}/); + if (!match) return undefined; + try { + return JSON.parse(match[0]) as Record<string, unknown>; + } catch { + return undefined; + } +} + +function appendReport( + reports: ReportSink, + ctx: ActionContext, + actor: string, + event: string, + payload: Record<string, unknown>, +): string { + const id = `rpt-${actor}-${ctx.slice.id}-${Date.now()}`; + reports.append({ + id, + ts: new Date().toISOString(), + epicId: ctx.epic.id, + sliceId: ctx.slice.id, + actor, + event, + payload, + }); + return id; +} + +export function createPiActions(): ActionHandlers { + return { + 'evaluate-done': async (ctx: ActionContext) => { + console.error(` [evaluate] slice=${ctx.slice.id}`); + const task = `Evaluate slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => v.target).join(', ')}\nDetermine if all verification targets are satisfied. Respond with a JSON object: { "done": true/false, "reasoning": "..." }`; + + try { + const raw = runPi({ + model: 'claude-haiku-4-5', + promptFile: join(promptsDir, 'evaluator.md'), + task, + worktreeDir: ctx.worktreeDir, + }); + const parsed = extractJson(raw) as { done?: boolean; reasoning?: string } | undefined; + const done = !!parsed?.done; + console.error(` [evaluate] result: ${done ? 'YES' : 'NO'}`); + return appendReport(ctx.reports, ctx, 'evaluator', 'eval-done', { + done, + reasoning: parsed?.reasoning ?? raw.slice(0, 200), + }); + } catch (err) { + console.error(` [evaluate] failed: ${err instanceof Error ? err.message : err}`); + return appendReport(ctx.reports, ctx, 'evaluator', 'eval-done', { + done: false, + reasoning: `evaluation failed: ${err instanceof Error ?
err.message : String(err)}`, + }); + } + }, + + 'write-tests': async (ctx: ActionContext) => { + console.error(` [write-tests] slice=${ctx.slice.id}`); + const task = `Write failing tests for slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => `${v.kind}: ${v.target}`).join(', ')}\nWrite test files that will initially fail. Use bun test conventions.`; + + runPi({ + model: 'claude-sonnet-4-6', + promptFile: join(promptsDir, 'test-writer.md'), + task, + worktreeDir: ctx.worktreeDir, + }); + + return appendReport(ctx.reports, ctx, 'test-writer', 'tests-written', { + sliceId: ctx.slice.id, + targets: ctx.slice.verification.map((v) => v.target), + }); + }, + + 'write-code': async (ctx: ActionContext) => { + console.error(` [write-code] slice=${ctx.slice.id}`); + const task = `Write code to make tests pass for slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => `${v.kind}: ${v.target}`).join(', ')}\nImplement the minimum code to make all tests pass.`; + + runPi({ + model: 'claude-sonnet-4-6', + promptFile: join(promptsDir, 'code-writer.md'), + task, + worktreeDir: ctx.worktreeDir, + }); + + return appendReport(ctx.reports, ctx, 'code-writer', 'code-written', { + sliceId: ctx.slice.id, + }); + }, + + 'verify-epic': async (ctx: ActionContext) => { + console.error(` [verify-epic] epic=${ctx.epic.id}`); + const targets = ctx.epic.verification.map((v) => `${v.kind}: ${v.target}`).join(', '); + + // Step 1: write the integration test if it doesn't exist + const writeTask = `Write an integration test for epic "${ctx.epic.id}": ${ctx.epic.summary}\nThis test should verify that all slices in this epic work together correctly.\nVerification targets: ${targets}\nWrite the test file(s) using bun test conventions. 
Then run them with "bun test" to verify they pass.`; + + console.error(` [verify-epic] writing + running integration tests`); + runPi({ + model: 'claude-sonnet-4-6', + promptFile: join(promptsDir, 'evaluator.md'), + task: writeTask, + worktreeDir: ctx.worktreeDir, + }); + + // Step 2: run the verification targets deterministically + let allPassed = true; + for (const v of ctx.epic.verification) { + try { + const { execSync } = await import('node:child_process'); + execSync(`bun test ${v.target}`, { + cwd: ctx.worktreeDir, + encoding: 'utf8', + timeout: 60_000, + stdio: ['ignore', 'pipe', 'pipe'], + }); + console.error(` [verify-epic] ${v.target}: PASS`); + } catch { + console.error(` [verify-epic] ${v.target}: FAIL`); + allPassed = false; + } + } + + console.error(` [verify-epic] result: ${allPassed ? 'PASS' : 'FAIL'}`); + return appendReport(ctx.reports, ctx, 'orchestrator', 'epic-verified', { + passed: allPassed, + }); + }, + }; +} diff --git a/src/orchestrator/src/plan-loader.test.ts b/src/orchestrator/src/plan-loader.test.ts new file mode 100644 index 00000000..838de692 --- /dev/null +++ b/src/orchestrator/src/plan-loader.test.ts @@ -0,0 +1,51 @@ +import { mkdtempSync, writeFileSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { describe, expect, it } from 'vitest'; + +import { loadPlan } from './plan-loader.js'; + +describe('loadPlan', () => { + it('parses a valid plan.yaml', () => { + const dir = mkdtempSync(join(tmpdir(), 'cook-plan-')); + const yamlPath = join(dir, 'plan.yaml'); + writeFileSync( + yamlPath, + ` +epics: + - id: e1 + summary: "First" + depends_on: [] + verification: [] +slices: + - id: s1 + epic_id: e1 + definition: "Do something" + depends_on: [] + verification: + - kind: unit-test + target: "tests/s1.test.ts" +`, + ); + + const plan = loadPlan(yamlPath); + expect(plan.epics).toHaveLength(1); + expect(plan.epics[0]!.id).toBe('e1'); + expect(plan.slices).toHaveLength(1); + 
expect(plan.slices[0]!.epic_id).toBe('e1'); + expect(plan.slices[0]!.verification).toEqual([{ kind: 'unit-test', target: 'tests/s1.test.ts' }]); + }); + + it('throws on missing file', () => { + expect(() => loadPlan('/tmp/nonexistent-plan.yaml')).toThrow(); + }); + + it('throws on invalid structure (no epics)', () => { + const dir = mkdtempSync(join(tmpdir(), 'cook-plan-')); + const yamlPath = join(dir, 'bad.yaml'); + writeFileSync(yamlPath, 'foo: bar\n'); + + expect(() => loadPlan(yamlPath)).toThrow('missing or non-array'); + }); +}); diff --git a/src/orchestrator/src/plan-loader.ts b/src/orchestrator/src/plan-loader.ts new file mode 100644 index 00000000..7505dd3d --- /dev/null +++ b/src/orchestrator/src/plan-loader.ts @@ -0,0 +1,19 @@ +import { readFileSync } from 'node:fs'; + +import { parse } from 'yaml'; + +import type { Plan } from './types.js'; + +export function loadPlan(yamlPath: string): Plan { + const raw = readFileSync(yamlPath, 'utf8'); + const parsed = parse(raw) as Plan; + + if (!Array.isArray(parsed?.epics)) { + throw new Error(`Invalid plan: missing or non-array "epics" in ${yamlPath}`); + } + if (!Array.isArray(parsed?.slices)) { + throw new Error(`Invalid plan: missing or non-array "slices" in ${yamlPath}`); + } + + return parsed; +} diff --git a/src/orchestrator/src/test-runner.ts b/src/orchestrator/src/test-runner.ts new file mode 100644 index 00000000..29effd72 --- /dev/null +++ b/src/orchestrator/src/test-runner.ts @@ -0,0 +1,23 @@ +import { execSync } from 'node:child_process'; + +import type { TestResult, TestRunner } from './types.js'; + +export class BunTestRunner implements TestRunner { + async run(target: string, worktreeDir: string): Promise<TestResult> { + try { + const output = execSync(`bun test ${target}`, { + cwd: worktreeDir, + encoding: 'utf8', + timeout: 60_000, + stdio: ['ignore', 'pipe', 'pipe'], + }); + return { passed: true, output }; + } catch (err) { + const output = + err && typeof err === 'object' && 'stdout' in err + ?
String((err as { stdout: unknown }).stdout) + : String(err); + return { passed: false, output }; + } + } +} diff --git a/src/orchestrator/src/worktree.test.ts b/src/orchestrator/src/worktree.test.ts new file mode 100644 index 00000000..b61e3019 --- /dev/null +++ b/src/orchestrator/src/worktree.test.ts @@ -0,0 +1,34 @@ +import { existsSync, mkdtempSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { describe, expect, it, afterEach } from 'vitest'; + +import { createWorktree } from './worktree.js'; + +describe('createWorktree', () => { + const dirs: string[] = []; + afterEach(() => { + for (const d of dirs) rmSync(d, { recursive: true, force: true }); + dirs.length = 0; + }); + + it('creates worktree directory under .cook/runs//worktree/', () => { + const fixtureDir = mkdtempSync(join(tmpdir(), 'cook-wt-')); + dirs.push(fixtureDir); + + const info = createWorktree(fixtureDir, 'test-run-1'); + expect(info.runId).toBe('test-run-1'); + expect(info.worktreeDir).toBe(join(fixtureDir, '.cook', 'runs', 'test-run-1', 'worktree')); + expect(existsSync(info.worktreeDir)).toBe(true); + }); + + it('generates a runId when not provided', () => { + const fixtureDir = mkdtempSync(join(tmpdir(), 'cook-wt-')); + dirs.push(fixtureDir); + + const info = createWorktree(fixtureDir); + expect(info.runId).toBeTruthy(); + expect(existsSync(info.worktreeDir)).toBe(true); + }); +}); diff --git a/src/orchestrator/src/worktree.ts b/src/orchestrator/src/worktree.ts new file mode 100644 index 00000000..3e45d869 --- /dev/null +++ b/src/orchestrator/src/worktree.ts @@ -0,0 +1,15 @@ +import { randomUUID } from 'node:crypto'; +import { mkdirSync } from 'node:fs'; +import { join } from 'node:path'; + +export type WorktreeInfo = { + runId: string; + worktreeDir: string; +}; + +export function createWorktree(fixtureDir: string, runId?: string): WorktreeInfo { + const id = runId ?? 
randomUUID(); + const worktreeDir = join(fixtureDir, '.cook', 'runs', id, 'worktree'); + mkdirSync(worktreeDir, { recursive: true }); + return { runId: id, worktreeDir }; +} diff --git a/src/server/cli.ts b/src/server/cli.ts index f879eb00..c1dd424c 100644 --- a/src/server/cli.ts +++ b/src/server/cli.ts @@ -13,16 +13,28 @@ const launchCwd = process.env.BRUNCH_LAUNCH_CWD || process.cwd(); loadLocalEnvFile(launchCwd); if (args.has('--help') || args.has('-h') || args.has('help')) { - console.log('Usage: brunch [agent]'); + console.log('Usage: brunch [command]'); console.log(''); console.log('Launch the Brunch web UI in the current project directory.'); console.log(''); console.log('Commands:'); - console.log(' agent Run a JSONL capability session on stdin/stdout.'); + console.log(' agent Run a JSONL capability session on stdin/stdout.'); + console.log(' cook [flags] Run the orchestrator on a plan directory.'); + console.log(''); + console.log('Cook flags:'); + console.log(' --engine=proc|petri Execution engine (default: proc)'); + console.log(' --max-retries=N Retry budget per slice (default: 3)'); process.exit(0); } -if (rawArgs[0] === 'agent') { +if (rawArgs[0] === 'cook') { + const { parseCookArgs, runCook } = await import('../orchestrator/src/cook-cli.js'); + const opts = parseCookArgs(rawArgs.slice(1)); + runCook(opts).catch((error) => { + console.error('Failed to run brunch cook:', error); + process.exit(1); + }); +} else if (rawArgs[0] === 'agent') { const project = resolveBrunchProject(launchCwd); const db = createDb(project.dbPath); runAgentJsonlSession({ db, input: process.stdin, output: process.stdout, projectCwd: project.cwd }) From 78401fb445e4b352102184f4f0e83db362ea134a Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 16:47:22 +0200 Subject: [PATCH 04/22] =?UTF-8?q?FE-730:=20align=20spec/design/cards=20?= =?UTF-8?q?=E2=80=94=20cwd-scoped=20worktree=20isolation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit Design doc §8: worktree at /.cook/runs/ not /.cook/runs/ R49, D159-K, I123-K: updated to cwd-scoped worktree Lexicon: worktree entry clarifies cwd-scoped Card 16 scoped: cwd worktree + fixture cleanup Co-authored-by: Amp --- .gitignore | 1 + docs/design/orchestrator.md | 10 ++- fixtures/txt/package.json | 4 -- fixtures/txt/src/cli.ts | 11 ---- fixtures/txt/tests/version.test.ts | 93 --------------------------- memory/SPEC.md | 8 +-- src/orchestrator/src/cook-cli.ts | 5 +- src/orchestrator/src/worktree.test.ts | 32 ++++++--- src/orchestrator/src/worktree.ts | 12 +++- 9 files changed, 48 insertions(+), 128 deletions(-) delete mode 100644 fixtures/txt/package.json delete mode 100644 fixtures/txt/src/cli.ts delete mode 100644 fixtures/txt/tests/version.test.ts diff --git a/.gitignore b/.gitignore index 84ab885b..489e2363 100644 --- a/.gitignore +++ b/.gitignore @@ -35,6 +35,7 @@ dist-ssr bun.lock .notes.md .brunch/ +.cook/ brunch.db* todo.txt diff --git a/docs/design/orchestrator.md b/docs/design/orchestrator.md index 7377493b..c267da2b 100644 --- a/docs/design/orchestrator.md +++ b/docs/design/orchestrator.md @@ -229,9 +229,15 @@ POC implements fixture mode end-to-end; codebase mode returns a structured "not ## 8. Worktree isolation -Each run gets an isolated worktree at `/.cook/runs//worktree/`. Agents write freely inside; anything outside the worktree stays untouched. No commits, no pushes. Recovery = throw the worktree away and start a new run. +Each run gets an isolated worktree at `/.cook/runs//worktree/`, where `` is the directory the user invoked `brunch cook` from (not the fixture/plan directory). Reports land alongside at `/.cook/runs//reports.jsonl`. Agents write freely inside the worktree; the fixture directory (``) and the invoking repo are never mutated. No commits, no pushes. Recovery = throw the worktree away and start a new run. -This addresses the PRD's "the orchestrator only writes to its own output" requirement. 
The interpretation is operational, not literal: file writes happen inside the worktree, and the worktree lives under `` so that artifacts are discoverable next to the input — but the source repo (whatever `` is or contains) is never mutated. +The run location is cwd-scoped rather than fixture-scoped so that: + +- **Fixtures stay pristine.** Checked-in fixture directories (e.g. `fixtures/txt/`) contain only `plan.yaml` and are byte-identical before and after a run. +- **No path traversal.** Because the worktree is not a descendant of the fixture dir, agents cannot accidentally read or write fixture-level files. +- **Easy cleanup.** `rm -rf .cook/runs/` in the invoking directory clears all run history. `.cook/` is gitignored at the repo level. + +`--worktree ` overrides the default location for explicit pinning. ## 9. Verification stance diff --git a/fixtures/txt/package.json b/fixtures/txt/package.json deleted file mode 100644 index afd29f1a..00000000 --- a/fixtures/txt/package.json +++ /dev/null @@ -1,4 +0,0 @@ -{ - "name": "txt", - "version": "1.0.0" -} diff --git a/fixtures/txt/src/cli.ts b/fixtures/txt/src/cli.ts deleted file mode 100644 index 38491c93..00000000 --- a/fixtures/txt/src/cli.ts +++ /dev/null @@ -1,11 +0,0 @@ -import pkg from "../package.json" with { type: "json" }; - -export function getVersion(): string { - return pkg.version; -} - -export function run(args: string[]): void { - if (args.includes("--version")) { - process.stdout.write(getVersion() + "\n"); - } -} diff --git a/fixtures/txt/tests/version.test.ts b/fixtures/txt/tests/version.test.ts deleted file mode 100644 index 7fbb84d8..00000000 --- a/fixtures/txt/tests/version.test.ts +++ /dev/null @@ -1,93 +0,0 @@ -import { describe, expect, it, spyOn } from "bun:test"; - -// These imports will fail until the implementation is created. 
-// The module is expected to export: -// getVersion(): string — reads version from package.json -// run(args: string[]): void — main CLI entry point; honours --version -import { getVersion, run } from "../src/cli.ts"; - -// The package.json that the implementation must read from. -import pkg from "../package.json" with { type: "json" }; - -describe("getVersion", () => { - it("returns a non-empty string", () => { - const version = getVersion(); - expect(typeof version).toBe("string"); - expect(version.length).toBeGreaterThan(0); - }); - - it("matches the version field in package.json", () => { - expect(getVersion()).toBe(pkg.version); - }); - - it("looks like a semver string (major.minor.patch)", () => { - const version = getVersion(); - expect(version).toMatch(/^\d+\.\d+\.\d+/); - }); -}); - -describe("run(['--version'])", () => { - it("writes the version to stdout", () => { - const writes: string[] = []; - const originalWrite = process.stdout.write.bind(process.stdout); - const spy = spyOn(process.stdout, "write").mockImplementation( - (chunk: string | Uint8Array) => { - writes.push(typeof chunk === "string" ? chunk : String(chunk)); - return true; - }, - ); - - try { - run(["--version"]); - } finally { - spy.mockRestore(); - } - - const output = writes.join(""); - expect(output).toContain(pkg.version); - }); - - it("exits with code 0 after printing the version", () => { - let exitCode: number | undefined; - const exitSpy = spyOn(process, "exit").mockImplementation((code?: number) => { - exitCode = code ?? 0; - return undefined as never; - }); - - const writeSpy = spyOn(process.stdout, "write").mockImplementation(() => true); - - try { - run(["--version"]); - } finally { - exitSpy.mockRestore(); - writeSpy.mockRestore(); - } - - // Either exits with 0, or doesn't call process.exit at all (both are acceptable). 
- if (exitCode !== undefined) { - expect(exitCode).toBe(0); - } - }); - - it("prints nothing to stderr when --version is used", () => { - const stderrWrites: string[] = []; - const stderrSpy = spyOn(process.stderr, "write").mockImplementation( - (chunk: string | Uint8Array) => { - stderrWrites.push(typeof chunk === "string" ? chunk : String(chunk)); - return true; - }, - ); - const stdoutSpy = spyOn(process.stdout, "write").mockImplementation(() => true); - const exitSpy = spyOn(process, "exit").mockImplementation(() => undefined as never); - - try { - run(["--version"]); - } finally { - stderrSpy.mockRestore(); - stdoutSpy.mockRestore(); - exitSpy.mockRestore(); - } - - expect(stderrWrites.join("")).toBe(""); - }); -}); diff --git a/memory/SPEC.md b/memory/SPEC.md index 325a50ad..d89fe3c9 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -101,7 +101,7 @@ Brunch operates inside a **workspace**: the cwd-backed software context whose lo 46. `brunch cook ` takes a plan YAML (epics → slices) and executes it end-to-end by dispatching agents through a name-keyed `ActionRegistry`. 47. Two engines (`proc` and `petri`) implement the same `Orchestrator` interface and must pass the same contract test suite. 48. `reports.jsonl` is the communication medium: tokens carry only pointers, all event content lives in the append-only log. -49. Each run gets worktree isolation at `/.cook/runs//worktree/`; source repo stays untouched. +49. Each run gets worktree isolation at `/.cook/runs//worktree/` (cwd-scoped, not fixture-scoped); fixture directory and source repo stay untouched. 50. Dual-mode CLI resolver: `/plan.yaml` = fixture (greenfield), `/.cook/plan.yaml` = codebase (brownfield, reserved). #### Provider / agent substrate @@ -205,7 +205,7 @@ Brunch operates inside a **workspace**: the cwd-backed software context whose lo 156. 
**`reports.jsonl` is the communication medium, not just audit log** — tokens carry only `{ reportId, sliceId, epicId }` pointers; transitions communicate by appending/reading lines. The net stays narrow because the log is rich. POC: petri engine enforces token-pointer discipline internally; proc engine is free to pass data through normal function calls — the shared seam is inputs and outputs. Depends on: Requirement 48. 157. **Action dispatch is name-keyed and extensible** — engines orchestrate which action fires when; handlers own how. POC uses inline dispatch per engine; promote to a real `ActionRegistry` when a 3rd action type lands. Depends on: Requirement 46. 158. **Plan model is two-level (epics → slices), no milestones in POC** — schema is provisional pending canonical brunch plan emission. Forward-compatible for intent/design/oracle pointers. -159. **Worktree isolation per run** — agents write freely inside `/.cook/runs//worktree/`; source repo untouched. Depends on: Requirement 49. +159. **Worktree isolation per run** — agents write freely inside `/.cook/runs//worktree/` (cwd-scoped, not fixture-scoped); fixture dir and source repo untouched. Fixtures stay byte-identical before and after a run. Depends on: Requirement 49. #### Provider, prompt/context, and agent substrate @@ -255,7 +255,7 @@ Each invariant is a formalization candidate: the property is stated in human lan | I120 | Secondary chats remain conversational process containers, not workflow or semantic truth: inline rendering, collapse/reload state, turn-level context snapshot replay, and item-version-gated stale-handle refresh may organize discussion, but accepted mutations still flow through Brunch-owned handlers and changesets. | planned: chat-runtime, context-provision, changeset/app tests | Requirement 45; A94, A95; D143, D149, D153, D154 | | I121-K | Both orchestrator engines (`proc` and `petri`) pass the same contract test suite with identical observable behavior. 
| contract tests with fake agents/runner | Requirements 46, 47; D155-K | | I122-K | Orchestrator event content lives in `reports.jsonl`; petri engine tokens carry only `{ reportId, sliceId, epicId }` pointers. Proc engine may pass data through normal function calls — the shared seam is inputs and outputs. | contract tests | Requirement 48; D156-K | -| I123-K | Worktree isolation holds — source repo outside `/.cook/runs//worktree/` is never mutated by an orchestrator run. | integration tests | Requirement 49; D159-K | +| I123-K | Worktree isolation holds — fixture directory and source repo are never mutated by an orchestrator run; worktree is cwd-scoped at `/.cook/runs//worktree/`. | integration tests, worktree.test.ts | Requirement 49; D159-K | ## Future Direction Register @@ -366,7 +366,7 @@ Detailed card styling, typography tokens, and legacy layout minutiae are impleme | **plan (orchestrator)** | YAML file describing epics + slices with definitions, dependencies, and verifications. The orchestrator's input. | | **action (orchestrator)** | A handler in the `ActionRegistry` (e.g. `write-tests`, `write-code`, `run-tests`). Engines look up by name. | | **report** | One structured event line in `reports.jsonl`. Carries durable content; tokens carry only pointers. | -| **worktree (orchestrator)** | Isolated filesystem location where agents write during a run. Per-run; ephemeral. | +| **worktree (orchestrator)** | Isolated filesystem location where agents write during a run. Per-run; ephemeral. Cwd-scoped (`/.cook/runs//worktree/`), not fixture-scoped. | | **fixture (orchestrator)** | Packaged test scenario for the orchestrator (plan + supporting artifacts). Used to test `cook` itself. | | **fixture mode** | Greenfield execution: plan at `/plan.yaml`, empty worktree. POC default. | | **codebase mode** | Brownfield execution: plan at `/.cook/plan.yaml`, worktree seeded from ``. Designed but not implemented in POC. 
| diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts index 70939fa8..7a9b2444 100644 --- a/src/orchestrator/src/cook-cli.ts +++ b/src/orchestrator/src/cook-cli.ts @@ -58,8 +58,9 @@ export async function runCook(opts: CookOptions): Promise { } const plan = loadPlan(planPath); - const { worktreeDir, runId } = createWorktree(opts.dir); - const reportsPath = join(opts.dir, '.cook', 'runs', runId, 'reports.jsonl'); + const launchCwd = process.env.BRUNCH_LAUNCH_CWD || process.cwd(); + const { worktreeDir, runDir } = createWorktree(launchCwd); + const reportsPath = join(runDir, 'reports.jsonl'); console.error(`[cook] Engine: ${opts.engine}`); console.error(`[cook] Plan: ${plan.epics.length} epics, ${plan.slices.length} slices`); diff --git a/src/orchestrator/src/worktree.test.ts b/src/orchestrator/src/worktree.test.ts index b61e3019..ea03c9be 100644 --- a/src/orchestrator/src/worktree.test.ts +++ b/src/orchestrator/src/worktree.test.ts @@ -2,7 +2,7 @@ import { existsSync, mkdtempSync, rmSync } from 'node:fs'; import { tmpdir } from 'node:os'; import { join } from 'node:path'; -import { describe, expect, it, afterEach } from 'vitest'; +import { afterEach, describe, expect, it } from 'vitest'; import { createWorktree } from './worktree.js'; @@ -13,22 +13,36 @@ describe('createWorktree', () => { dirs.length = 0; }); - it('creates worktree directory under .cook/runs//worktree/', () => { - const fixtureDir = mkdtempSync(join(tmpdir(), 'cook-wt-')); - dirs.push(fixtureDir); + it('creates worktree under baseDir/.cook/runs//worktree/', () => { + const baseDir = mkdtempSync(join(tmpdir(), 'cook-wt-')); + dirs.push(baseDir); - const info = createWorktree(fixtureDir, 'test-run-1'); + const info = createWorktree(baseDir, 'test-run-1'); expect(info.runId).toBe('test-run-1'); - expect(info.worktreeDir).toBe(join(fixtureDir, '.cook', 'runs', 'test-run-1', 'worktree')); + expect(info.runDir).toBe(join(baseDir, '.cook', 'runs', 'test-run-1')); + 
expect(info.worktreeDir).toBe(join(baseDir, '.cook', 'runs', 'test-run-1', 'worktree')); expect(existsSync(info.worktreeDir)).toBe(true); }); it('generates a runId when not provided', () => { - const fixtureDir = mkdtempSync(join(tmpdir(), 'cook-wt-')); - dirs.push(fixtureDir); + const baseDir = mkdtempSync(join(tmpdir(), 'cook-wt-')); + dirs.push(baseDir); - const info = createWorktree(fixtureDir); + const info = createWorktree(baseDir); expect(info.runId).toBeTruthy(); expect(existsSync(info.worktreeDir)).toBe(true); }); + + it('does not write to a separate fixture directory', () => { + const baseDir = mkdtempSync(join(tmpdir(), 'cook-base-')); + const fixtureDir = mkdtempSync(join(tmpdir(), 'cook-fixture-')); + dirs.push(baseDir, fixtureDir); + + createWorktree(baseDir, 'isolated-run'); + + // Fixture dir must not have a .cook/ directory + expect(existsSync(join(fixtureDir, '.cook'))).toBe(false); + // Base dir must have it + expect(existsSync(join(baseDir, '.cook', 'runs', 'isolated-run', 'worktree'))).toBe(true); + }); }); diff --git a/src/orchestrator/src/worktree.ts b/src/orchestrator/src/worktree.ts index 3e45d869..86afc4ac 100644 --- a/src/orchestrator/src/worktree.ts +++ b/src/orchestrator/src/worktree.ts @@ -4,12 +4,18 @@ import { join } from 'node:path'; export type WorktreeInfo = { runId: string; + runDir: string; worktreeDir: string; }; -export function createWorktree(fixtureDir: string, runId?: string): WorktreeInfo { +/** + * Create an isolated run directory under `baseDir/.cook/runs//`. + * `baseDir` should be cwd (not the fixture directory) so fixtures stay pristine. + */ +export function createWorktree(baseDir: string, runId?: string): WorktreeInfo { const id = runId ?? 
randomUUID(); - const worktreeDir = join(fixtureDir, '.cook', 'runs', id, 'worktree'); + const runDir = join(baseDir, '.cook', 'runs', id); + const worktreeDir = join(runDir, 'worktree'); mkdirSync(worktreeDir, { recursive: true }); - return { runId: id, worktreeDir }; + return { runId: id, runDir, worktreeDir }; } From 532095127594eda13b12f8feb8359d9ec0d03914 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 18:25:50 +0200 Subject: [PATCH 05/22] FE-730: clean CLI output with elapsed timing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - cook-cli: structured header/footer with engine, plan, worktree, retries; per-epic/slice result table; total duration - pi-actions: elapsed timer from session start, compact one-line-per-action with icons (▸ start, ✓ done, ✗ fail, ● verdict, ○ needs work, ? evaluate) - file-report-sink: stop streaming raw JSON to stdout; JSON stays in file only - 35 tests pass, build clean --- src/orchestrator/src/cook-cli.test.ts | 6 ++ src/orchestrator/src/cook-cli.ts | 69 ++++++++++++++---- src/orchestrator/src/file-report-sink.ts | 5 +- src/orchestrator/src/pi-actions.ts | 92 ++++++++++++++++++------ src/server/cli.ts | 1 + 5 files changed, 135 insertions(+), 38 deletions(-) diff --git a/src/orchestrator/src/cook-cli.test.ts b/src/orchestrator/src/cook-cli.test.ts index 69f4c960..d61bc008 100644 --- a/src/orchestrator/src/cook-cli.test.ts +++ b/src/orchestrator/src/cook-cli.test.ts @@ -8,6 +8,7 @@ describe('parseCookArgs', () => { expect(opts.dir).toContain('fixtures/txt'); expect(opts.engine).toBe('proc'); expect(opts.maxRetries).toBe(3); + expect(opts.verbose).toBe(false); }); it('parses --engine=petri', () => { @@ -27,4 +28,9 @@ describe('parseCookArgs', () => { it('throws on unknown engine', () => { expect(() => parseCookArgs(['./f', '--engine=unknown'])).toThrow('Unknown engine'); }); + + it('parses --verbose', () => { + expect(parseCookArgs(['./f', 
'--verbose']).verbose).toBe(true); + expect(parseCookArgs(['./f', '-v']).verbose).toBe(true); + }); }); diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts index 7a9b2444..d724fe2b 100644 --- a/src/orchestrator/src/cook-cli.ts +++ b/src/orchestrator/src/cook-cli.ts @@ -1,5 +1,5 @@ import { existsSync } from 'node:fs'; -import { resolve, join } from 'node:path'; +import { join, resolve } from 'node:path'; import { PetriOrchestrator } from './engine-petri.js'; import { ProceduralOrchestrator } from './engine-proc.js'; @@ -14,12 +14,14 @@ export type CookOptions = { dir: string; engine: 'proc' | 'petri'; maxRetries: number; + verbose: boolean; }; export function parseCookArgs(args: string[]): CookOptions { let dir = ''; let engine: 'proc' | 'petri' = 'proc'; let maxRetries = 3; + let verbose = false; for (let i = 0; i < args.length; i++) { const arg = args[i]!; @@ -31,22 +33,32 @@ export function parseCookArgs(args: string[]): CookOptions { engine = val; } else if (arg.startsWith('--max-retries=')) { maxRetries = Number.parseInt(arg.split('=')[1]!, 10); + } else if (arg === '--verbose' || arg === '-v') { + verbose = true; } else if (!arg.startsWith('-')) { dir = arg; } } if (!dir) { - throw new Error('Usage: brunch cook [--engine=proc|petri] [--max-retries=N]'); + throw new Error('Usage: brunch cook [--engine=proc|petri] [--max-retries=N] [--verbose]'); } - return { dir: resolve(dir), engine, maxRetries }; + return { dir: resolve(dir), engine, maxRetries, verbose }; +} + +function fmtDuration(ms: number): string { + if (ms < 1000) return `${ms}ms`; + const s = ms / 1000; + if (s < 60) return `${s.toFixed(1)}s`; + const m = Math.floor(s / 60); + const rem = s % 60; + return `${m}m ${rem.toFixed(0)}s`; } export async function runCook(opts: CookOptions): Promise<void> { const planPath = join(opts.dir, 'plan.yaml'); if (!existsSync(planPath)) { - // Check for codebase mode (reserved) const codebasePlanPath = join(opts.dir, '.cook', 'plan.yaml'); if
(existsSync(codebasePlanPath)) { console.error('Codebase mode (brownfield) is not yet implemented.'); @@ -62,18 +74,28 @@ export async function runCook(opts: CookOptions): Promise { const { worktreeDir, runDir } = createWorktree(launchCwd); const reportsPath = join(runDir, 'reports.jsonl'); - console.error(`[cook] Engine: ${opts.engine}`); - console.error(`[cook] Plan: ${plan.epics.length} epics, ${plan.slices.length} slices`); - console.error(`[cook] Worktree: ${worktreeDir}`); - console.error(`[cook] Reports: ${reportsPath}`); + const epicCount = plan.epics.length; + const sliceCount = plan.slices.length; + + console.error(''); + console.error(` brunch cook`); + console.error(` ──────────────────────────────────────`); + console.error(` engine ${opts.engine}`); + console.error(` plan ${epicCount} epics, ${sliceCount} slices`); + console.error(` retries ${opts.maxRetries}`); + console.error(` worktree ${worktreeDir}`); + console.error(` reports ${reportsPath}`); + console.error(''); const reports = new FileReportSink(reportsPath); - const actions = createPiActions(); + const actions = createPiActions({ verbose: opts.verbose }); const testRunner = new BunTestRunner(); const engine: Orchestrator = opts.engine === 'petri' ? new PetriOrchestrator() : new ProceduralOrchestrator(); + const t0 = Date.now(); + const result = await engine.run({ plan, fixtureDir: worktreeDir, @@ -83,10 +105,29 @@ export async function runCook(opts: CookOptions): Promise { policy: { maxRetries: opts.maxRetries }, }); - console.error(`\n[cook] Result: ${result.status}${result.reason ? 
` — ${result.reason}` : ''}`); - console.error(`[cook] Epics: ${result.epics.map((e) => `${e.epicId}:${e.status}`).join(', ')}`); - console.error(`[cook] Slices: ${result.slices.map((s) => `${s.sliceId}:${s.status}`).join(', ')}`); - console.error(`[cook] Reports: ${result.reports.length} events`); + const duration = fmtDuration(Date.now() - t0); + const ok = result.status === 'completed'; + + console.error(''); + console.error(` ──────────────────────────────────────`); + console.error( + ` ${ok ? '✓' : '✗'} ${result.status}${result.reason ? ` — ${result.reason}` : ''} (${duration})`, + ); + console.error(''); + + for (const e of result.epics) { + const icon = e.status === 'completed' ? '✓' : '✗'; + const slices = result.slices.filter( + (s) => plan.slices.find((ps) => ps.id === s.sliceId)?.epic_id === e.epicId, + ); + const sliceSummary = slices.map((s) => `${s.status === 'completed' ? '✓' : '✗'} ${s.sliceId}`).join(' '); + console.error(` ${icon} ${e.epicId}`); + console.error(` ${sliceSummary}`); + } + + console.error(''); + console.error(` ${result.reports.length} events → ${reportsPath}`); + console.error(''); - process.exit(result.status === 'completed' ? 0 : 1); + process.exit(ok ? 
0 : 1); } diff --git a/src/orchestrator/src/file-report-sink.ts b/src/orchestrator/src/file-report-sink.ts index e92902d1..8e53e453 100644 --- a/src/orchestrator/src/file-report-sink.ts +++ b/src/orchestrator/src/file-report-sink.ts @@ -1,4 +1,4 @@ -import { appendFileSync, readFileSync, existsSync } from 'node:fs'; +import { appendFileSync, existsSync, readFileSync } from 'node:fs'; import type { ReportLine, ReportSink } from './types.js'; @@ -10,7 +10,6 @@ export class FileReportSink implements ReportSink { private lines: ReportLine[] = []; constructor(private readonly path: string) { - // Load existing lines if file exists (for resumability-readiness) if (existsSync(path)) { const content = readFileSync(path, 'utf8').trim(); if (content) { @@ -24,8 +23,6 @@ export class FileReportSink implements ReportSink { append(line: ReportLine): void { this.lines.push(line); appendFileSync(this.path, JSON.stringify(line) + '\n'); - // Stream to stdout — plain JSON per event (§12 POC UX) - console.log(JSON.stringify(line)); } getById(id: string): ReportLine | undefined { diff --git a/src/orchestrator/src/pi-actions.ts b/src/orchestrator/src/pi-actions.ts index 957a6e4a..f38ef057 100644 --- a/src/orchestrator/src/pi-actions.ts +++ b/src/orchestrator/src/pi-actions.ts @@ -1,5 +1,5 @@ import { spawnSync } from 'node:child_process'; -import { join, dirname } from 'node:path'; +import { dirname, join } from 'node:path'; import { fileURLToPath } from 'node:url'; import type { ActionContext, ActionHandlers, ReportSink } from './types.js'; @@ -9,8 +9,45 @@ const promptsDir = __dirname.includes('dist') ? 
join(__dirname, '..', 'orchestrator-prompts') : join(__dirname, '..', 'prompts'); -function runPi(opts: { model: string; promptFile: string; task: string; worktreeDir: string }): string { - console.error(` [pi] ${opts.model} → ${opts.worktreeDir}`); +// --------------------------------------------------------------------------- +// Logging +// --------------------------------------------------------------------------- + +const t0 = Date.now(); +let _verbose = false; + +function elapsed(): string { + const s = ((Date.now() - t0) / 1000).toFixed(1); + return `${s}s`.padStart(7); +} + +function log(icon: string, msg: string): void { + console.error(` ${elapsed()} ${icon} ${msg}`); +} + +function logVerbose(output: string): void { + if (!_verbose) return; + const trimmed = output.trim(); + if (!trimmed) return; + console.error(''); + for (const line of trimmed.split('\n')) { + console.error(` │ ${line}`); + } + console.error(''); +} + +// --------------------------------------------------------------------------- +// Pi dispatch +// --------------------------------------------------------------------------- + +function runPi(opts: { + label: string; + model: string; + promptFile: string; + task: string; + worktreeDir: string; +}): string { + const start = Date.now(); const result = spawnSync( 'pi', @@ -41,20 +78,24 @@ function runPi(opts: { model: string; promptFile: string; task: string; worktree }, ); + const dur = ((Date.now() - start) / 1000).toFixed(1); + if (result.error) { throw new Error(`pi failed to start: ${result.error.message}`); } if (result.status !== 0) { const stderr = result.stderr?.trim() ?? ''; - throw new Error(`pi exited with code ${result.status}${stderr ? `: ${stderr}` : ''}`); + throw new Error(`pi exited ${result.status}${stderr ? `: ${stderr}` : ''}`); } + log('✓', `${opts.label} (${dur}s)`); + logVerbose(result.stdout); + return result.stdout; } /** Try to extract a JSON object from pi's text output. 
*/ function extractJson(raw: string): Record | undefined { - // Look for a JSON object in the output (pi may wrap it in markdown or prose) const match = raw.match(/\{[\s\S]*?\}/); if (!match) return undefined; try { @@ -84,14 +125,21 @@ function appendReport( return id; } -export function createPiActions(): ActionHandlers { +// --------------------------------------------------------------------------- +// Actions +// --------------------------------------------------------------------------- + +export function createPiActions(opts?: { verbose?: boolean }): ActionHandlers { + _verbose = opts?.verbose ?? false; + return { 'evaluate-done': async (ctx: ActionContext) => { - console.error(` [evaluate] slice=${ctx.slice.id}`); + log('?', `evaluate ${ctx.slice.id}`); const task = `Evaluate slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => v.target).join(', ')}\nDetermine if all verification targets are satisfied. Respond with a JSON object: { "done": true/false, "reasoning": "..." }`; try { const raw = runPi({ + label: `evaluate ${ctx.slice.id}`, model: 'claude-haiku-4-5', promptFile: join(promptsDir, 'evaluator.md'), task, @@ -99,13 +147,13 @@ export function createPiActions(): ActionHandlers { }); const parsed = extractJson(raw) as { done?: boolean; reasoning?: string } | undefined; const done = !!parsed?.done; - console.error(` [evaluate] result: ${done ? 'YES' : 'NO'}`); + log(done ? '●' : '○', `verdict ${ctx.slice.id} → ${done ? 'DONE' : 'NEEDS WORK'}`); return appendReport(ctx.reports, ctx, 'evaluator', 'eval-done', { done, reasoning: parsed?.reasoning ?? raw.slice(0, 200), }); } catch (err) { - console.error(` [evaluate] failed: ${err instanceof Error ? err.message : err}`); + log('✗', `evaluate ${ctx.slice.id} — ${err instanceof Error ? err.message : err}`); return appendReport(ctx.reports, ctx, 'evaluator', 'eval-done', { done: false, reasoning: `evaluation failed: ${err instanceof Error ? 
err.message : String(err)}`, @@ -114,10 +162,11 @@ export function createPiActions(): ActionHandlers { }, 'write-tests': async (ctx: ActionContext) => { - console.error(` [write-tests] slice=${ctx.slice.id}`); + log('▸', `tests ${ctx.slice.id}`); const task = `Write failing tests for slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => `${v.kind}: ${v.target}`).join(', ')}\nWrite test files that will initially fail. Use bun test conventions.`; runPi({ + label: `tests ${ctx.slice.id}`, model: 'claude-sonnet-4-6', promptFile: join(promptsDir, 'test-writer.md'), task, @@ -131,10 +180,11 @@ export function createPiActions(): ActionHandlers { }, 'write-code': async (ctx: ActionContext) => { - console.error(` [write-code] slice=${ctx.slice.id}`); + log('▸', `code ${ctx.slice.id}`); const task = `Write code to make tests pass for slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => `${v.kind}: ${v.target}`).join(', ')}\nImplement the minimum code to make all tests pass.`; runPi({ + label: `code ${ctx.slice.id}`, model: 'claude-sonnet-4-6', promptFile: join(promptsDir, 'code-writer.md'), task, @@ -147,39 +197,41 @@ export function createPiActions(): ActionHandlers { }, 'verify-epic': async (ctx: ActionContext) => { - console.error(` [verify-epic] epic=${ctx.epic.id}`); + log('▸', `verify ${ctx.epic.id}`); const targets = ctx.epic.verification.map((v) => `${v.kind}: ${v.target}`).join(', '); - // Step 1: write the integration test if it doesn't exist const writeTask = `Write an integration test for epic "${ctx.epic.id}": ${ctx.epic.summary}\nThis test should verify that all slices in this epic work together correctly.\nVerification targets: ${targets}\nWrite the test file(s) using bun test conventions. 
Then run them with "bun test" to verify they pass.`; - console.error(` [verify-epic] writing + running integration tests`); runPi({ + label: `verify ${ctx.epic.id} (write)`, model: 'claude-sonnet-4-6', promptFile: join(promptsDir, 'evaluator.md'), task: writeTask, worktreeDir: ctx.worktreeDir, }); - // Step 2: run the verification targets deterministically let allPassed = true; for (const v of ctx.epic.verification) { try { const { execSync } = await import('node:child_process'); - execSync(`bun test ${v.target}`, { + const output = execSync(`bun test ${v.target}`, { cwd: ctx.worktreeDir, encoding: 'utf8', timeout: 60_000, stdio: ['ignore', 'pipe', 'pipe'], }); - console.error(` [verify-epic] ${v.target}: PASS`); - } catch { - console.error(` [verify-epic] ${v.target}: FAIL`); + log('✓', `verify ${v.target}`); + logVerbose(output); + } catch (err) { + log('✗', `verify ${v.target}`); + if (_verbose && err && typeof err === 'object' && 'stdout' in err) { + logVerbose(String((err as { stdout: unknown }).stdout)); + } allPassed = false; } } - console.error(` [verify-epic] result: ${allPassed ? 'PASS' : 'FAIL'}`); + log(allPassed ? '●' : '✗', `epic ${ctx.epic.id} → ${allPassed ? 
'PASS' : 'FAIL'}`); return appendReport(ctx.reports, ctx, 'orchestrator', 'epic-verified', { passed: allPassed, }); diff --git a/src/server/cli.ts b/src/server/cli.ts index c1dd424c..a115cc30 100644 --- a/src/server/cli.ts +++ b/src/server/cli.ts @@ -24,6 +24,7 @@ if (args.has('--help') || args.has('-h') || args.has('help')) { console.log('Cook flags:'); console.log(' --engine=proc|petri Execution engine (default: proc)'); console.log(' --max-retries=N Retry budget per slice (default: 3)'); + console.log(' --verbose, -v Show raw pi-agent output'); process.exit(0); } From b762a1ee59eb0893729f4bdfa44ceb2b6db090a9 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 20:56:56 +0200 Subject: [PATCH 06/22] =?UTF-8?q?FE-730:=20rename=20fixtureDir=20=E2=86=92?= =?UTF-8?q?=20worktreeDir=20on=20OrchestratorInput?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The field was always the agent working directory, not the fixture directory. Also removes unused ReportLine import from engine-proc. 
Co-authored-by: Amp --- .../0a9baf85-93e0-432b-8116-99eeb2dcd95d.json | 1 + memory/REFACTOR.md | 43 +++++++++++++++++++ src/orchestrator/src/cook-cli.ts | 2 +- src/orchestrator/src/engine-contract.test.ts | 24 +++++------ src/orchestrator/src/engine-petri.ts | 6 +-- src/orchestrator/src/engine-proc.ts | 9 ++-- src/orchestrator/src/types.ts | 12 +++--- 7 files changed, 70 insertions(+), 27 deletions(-) create mode 120000 .antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json create mode 100644 memory/REFACTOR.md diff --git a/.antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json b/.antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json new file mode 120000 index 00000000..b6ef8bba --- /dev/null +++ b/.antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json @@ -0,0 +1 @@ +/Users/kostandin/.gemini/config/projects/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json \ No newline at end of file diff --git a/memory/REFACTOR.md b/memory/REFACTOR.md new file mode 100644 index 00000000..1dc2a442 --- /dev/null +++ b/memory/REFACTOR.md @@ -0,0 +1,43 @@ + + +# Orchestrator review cleanup + +Batch of mechanical fixes from ln-review findings 1, 2, 3, 6, 8. + +## Problem Statement + +The orchestrator seam has a naming lie (`fixtureDir` means worktree), duplicated topo-sort, scattered report construction, dead imports, and two coexisting fake systems in the contract tests. + +## Solution + +Five tiny commits, each leaving tests green. + +## Commits + +1. **Rename `fixtureDir` → `worktreeDir` on `OrchestratorInput`** — rename the field on the type, update both engines, cook-cli, and all test references. Pure rename, no behavior change. + +2. **Generic topo-sort** — collapse `topoSort` + `topoSortSlices` into one `topoSort(items, getId, getDeps)`. Remove the two specialized copies. + +3. **Extract `createReport` helper onto `ReportSink`** — add a factory method that handles id generation + timestamp + append. Update both engines and pi-actions to use it. 
Removes 3 inline report-construction sites. + +4. **Migrate contract test #1 to `createFakes()`** — delete the old module-level `callOrder`/`evalCallCount`/`fakeActions`/`fakeTestRunner`. Rewrite test #1 to use the factory. ~100 lines removed. + +5. **Remove unused `ReportLine` import from engine-proc.ts**. + +## Decisions + +- `OrchestratorInput.worktreeDir` replaces `fixtureDir` as the canonical field name +- `topoSort` becomes a generic utility in its own module or at the top of engine-proc +- `createReport` is a free function, not a method on `ReportSink` interface (keeps the interface minimal) + +## Testing Decisions + +- All 36 existing tests must pass after each commit — no new tests needed since this is pure refactor +- Contract tests are the primary safety net + +## Out of Scope + +- Finding #4 (extractJson fragility) — not mechanical, needs design thought +- Finding #5 (module-level verbose state) — acceptable for single-run CLI +- Finding #7 (split verify-epic) — needs ln-design, not cleanup diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts index d724fe2b..f737ea13 100644 --- a/src/orchestrator/src/cook-cli.ts +++ b/src/orchestrator/src/cook-cli.ts @@ -98,7 +98,7 @@ export async function runCook(opts: CookOptions): Promise { const result = await engine.run({ plan, - fixtureDir: worktreeDir, + worktreeDir, actions, reports, testRunner, diff --git a/src/orchestrator/src/engine-contract.test.ts b/src/orchestrator/src/engine-contract.test.ts index fa671ce8..a24e27a9 100644 --- a/src/orchestrator/src/engine-contract.test.ts +++ b/src/orchestrator/src/engine-contract.test.ts @@ -229,7 +229,7 @@ describe('Engine contract test #1 — single epic, single slice, happy path', () const input: OrchestratorInput = { plan: simplePlan, - fixtureDir: '/tmp/fake-fixture', + worktreeDir: '/tmp/fake-fixture', actions: fakeActions(reports), reports, testRunner: fakeTestRunner, @@ -248,7 +248,7 @@ describe('Engine contract test #1 — single epic, 
single slice, happy path', () const input: OrchestratorInput = { plan: simplePlan, - fixtureDir: '/tmp/fake-fixture', + worktreeDir: '/tmp/fake-fixture', actions: fakeActions(reports), reports, testRunner: fakeTestRunner, @@ -268,7 +268,7 @@ describe('Engine contract test #1 — single epic, single slice, happy path', () const input: OrchestratorInput = { plan: simplePlan, - fixtureDir: '/tmp/fake-fixture', + worktreeDir: '/tmp/fake-fixture', actions: fakeActions(reports), reports, testRunner: fakeTestRunner, @@ -294,7 +294,7 @@ describe('Engine contract test #1 — single epic, single slice, happy path', () const input: OrchestratorInput = { plan: simplePlan, - fixtureDir: '/tmp/fake-fixture', + worktreeDir: '/tmp/fake-fixture', actions: fakeActions(reports), reports, testRunner: fakeTestRunner, @@ -430,7 +430,7 @@ describe('Engine contract test #2 — intra-epic slice dependencies', () => { const engine = create(); const result = await engine.run({ plan: depPlan, - fixtureDir: '/tmp/fake', + worktreeDir: '/tmp/fake', actions: depActions, reports, testRunner: depTestRunner, @@ -485,7 +485,7 @@ describe('Engine contract test #3 — epic dependencies', () => { const fakes = createFakes(); const result = await create().run({ plan: epicDepPlan, - fixtureDir: '/tmp/f', + worktreeDir: '/tmp/f', actions: fakes.actions, reports: fakes.reports, testRunner: fakes.testRunner, @@ -535,7 +535,7 @@ describe('Engine contract test #4 — epic verification passes', () => { const fakes = createFakes({ verifyEpicResult: true }); const result = await create().run({ plan: verifyPlan, - fixtureDir: '/tmp/f', + worktreeDir: '/tmp/f', actions: fakes.actions, reports: fakes.reports, testRunner: fakes.testRunner, @@ -575,7 +575,7 @@ describe('Engine contract test #5 — epic verification fails', () => { const fakes = createFakes({ verifyEpicResult: false }); const result = await create().run({ plan: verifyFailPlan, - fixtureDir: '/tmp/f', + worktreeDir: '/tmp/f', actions: fakes.actions, reports: 
fakes.reports, testRunner: fakes.testRunner, @@ -599,7 +599,7 @@ describe('Engine contract test #6 — retry loop', () => { const fakes = createFakes({ testRunResults: [false, true] }); const result = await create().run({ plan: simplePlan, - fixtureDir: '/tmp/f', + worktreeDir: '/tmp/f', actions: fakes.actions, reports: fakes.reports, testRunner: fakes.testRunner, @@ -625,7 +625,7 @@ describe('Engine contract test #7 — retry exhaustion', () => { const fakes = createFakes({ testRunResults: [false] }); const result = await create().run({ plan: simplePlan, - fixtureDir: '/tmp/f', + worktreeDir: '/tmp/f', actions: fakes.actions, reports: fakes.reports, testRunner: fakes.testRunner, @@ -648,7 +648,7 @@ describe('Engine contract test #8 — multi-cycle needs more', () => { const fakes = createFakes({ evalSequence: [false, false, true] }); const result = await create().run({ plan: simplePlan, - fixtureDir: '/tmp/f', + worktreeDir: '/tmp/f', actions: fakes.actions, reports: fakes.reports, testRunner: fakes.testRunner, @@ -678,7 +678,7 @@ describe('Engine contract test #9 — action handler throws', () => { const fakes = createFakes({ throwOnAction: 'write-tests' }); const result = await create().run({ plan: simplePlan, - fixtureDir: '/tmp/f', + worktreeDir: '/tmp/f', actions: fakes.actions, reports: fakes.reports, testRunner: fakes.testRunner, diff --git a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts index c85c385e..38c61eb2 100644 --- a/src/orchestrator/src/engine-petri.ts +++ b/src/orchestrator/src/engine-petri.ts @@ -167,7 +167,7 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { slice, epic, plan, - worktreeDir: input.fixtureDir, + worktreeDir: input.worktreeDir, reports, }; @@ -231,7 +231,7 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { inputs: [p(sid, 'untested-code')], fire: async (consumed) => { const target = slice.verification[0]?.target ?? 
''; - const result = await testRunner.run(target, input.fixtureDir); + const result = await testRunner.run(target, input.worktreeDir); const reportId = `rpt-run-${sid}-${Date.now()}`; reports.append({ id: reportId, @@ -322,7 +322,7 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { slice: epicSlices[0]!, epic, plan, - worktreeDir: input.fixtureDir, + worktreeDir: input.worktreeDir, reports, }; const reportId = await actions['verify-epic'](verifyCtx); diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index 566cfc7f..ad727558 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -5,7 +5,6 @@ import type { Orchestrator, OrchestratorInput, OrchestratorResult, - ReportLine, Slice, SliceOutcome, } from './types.js'; @@ -64,7 +63,7 @@ export class ProceduralOrchestrator implements Orchestrator { slice: epicSlices[0]!, epic, plan, - worktreeDir: input.fixtureDir, + worktreeDir: input.worktreeDir, reports, }); reportIds.push(verifyId); @@ -104,7 +103,7 @@ export class ProceduralOrchestrator implements Orchestrator { slice, epic, plan: input.plan, - worktreeDir: input.fixtureDir, + worktreeDir: input.worktreeDir, reports, }; @@ -128,7 +127,7 @@ export class ProceduralOrchestrator implements Orchestrator { // 4. Run tests (orchestrator-owned, deterministic) const target = slice.verification[0]?.target ?? 
''; - let result = await testRunner.run(target, input.fixtureDir); + let result = await testRunner.run(target, input.worktreeDir); // Append a report for the test run const runReportId = `rpt-run-${Date.now()}`; reports.append({ @@ -153,7 +152,7 @@ export class ProceduralOrchestrator implements Orchestrator { const retryCodeId = await actions['write-code'](ctx); reportIds.push(retryCodeId); - result = await testRunner.run(target, input.fixtureDir); + result = await testRunner.run(target, input.worktreeDir); const retryRunId = `rpt-retry-${retry}-${Date.now()}`; reports.append({ id: retryRunId, diff --git a/src/orchestrator/src/types.ts b/src/orchestrator/src/types.ts index 5a2b6483..db148a5e 100644 --- a/src/orchestrator/src/types.ts +++ b/src/orchestrator/src/types.ts @@ -86,12 +86,12 @@ export type RunPolicy = { }; export type OrchestratorInput = { - plan: Plan; - fixtureDir: string; - actions: ActionHandlers; - reports: ReportSink; - testRunner: TestRunner; - policy: RunPolicy; + plan: Plan; + worktreeDir: string; + actions: ActionHandlers; + reports: ReportSink; + testRunner: TestRunner; + policy: RunPolicy; }; export type EpicOutcome = { From e443d45bbc3cd9f68b7dc022229bb8ecb51d43bf Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 20:57:30 +0200 Subject: [PATCH 07/22] FE-730: collapse duplicated topo-sort into one generic topoSort(items, getId, getDeps) replaces topoSort(epics) + topoSortSlices(slices). 
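A minimal standalone sketch of the collapsed helper, for review purposes only. It reproduces the depth-first `topoSort` from engine-proc.ts with the generic parameter spelled out; the `Epic` shape and the sample data are hypothetical stand-ins, not the real plan types. It shows that dependencies are emitted before dependents, and that unknown dependency ids are skipped silently rather than reported.

```typescript
// Sketch of the generic depth-first topo sort from engine-proc.ts.
// `Epic` below is a hypothetical minimal shape, not the real plan type.
function topoSort<T>(items: T[], getId: (item: T) => string, getDeps: (item: T) => string[]): T[] {
  const byId = new Map<string, T>(items.map((item): [string, T] => [getId(item), item]));
  const visited = new Set<string>();
  const result: T[] = [];

  function visit(id: string): void {
    if (visited.has(id)) return; // marked before recursing, so cycles terminate
    visited.add(id);
    const item = byId.get(id);
    if (!item) return; // unknown dependency ids are ignored, not reported
    for (const dep of getDeps(item)) visit(dep);
    result.push(item); // pushed only after all of its known dependencies
  }

  for (const item of items) visit(getId(item));
  return result;
}

type Epic = { id: string; depends_on: string[] };

const epics: Epic[] = [
  { id: 'epic-b', depends_on: ['epic-a'] },
  { id: 'epic-a', depends_on: [] },
  { id: 'epic-c', depends_on: ['epic-b', 'epic-missing'] },
];

const order = topoSort(epics, (e) => e.id, (e) => e.depends_on).map((e) => e.id);
console.log(order.join(', ')); // epic-a, epic-b, epic-c
```

Because `visited` is set before recursing, a cycle such as p → q → p does not throw: it terminates and yields `[q, p]`, placing q ahead of its own dependency. If plans can ever contain cycles, they would need a separate check.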
Co-authored-by: Amp --- src/orchestrator/src/engine-proc.ts | 62 ++++++++++------------------- 1 file changed, 21 insertions(+), 41 deletions(-) diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index ad727558..88ac9f4d 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -30,11 +30,11 @@ export class ProceduralOrchestrator implements Orchestrator { const sliceOutcomes: SliceOutcome[] = []; const epicOutcomes: EpicOutcome[] = []; - const epicOrder = topoSort(plan.epics); + const epicOrder = topoSort(plan.epics, (e) => e.id, (e) => e.depends_on); for (const epic of epicOrder) { const epicSlices = plan.slices.filter((s) => s.epic_id === epic.id); - const sliceOrder = topoSortSlices(epicSlices); + const sliceOrder = topoSort(epicSlices, (s) => s.id, (s) => s.depends_on); let epicHalted = false; for (const slice of sliceOrder) { @@ -180,45 +180,25 @@ export class ProceduralOrchestrator implements Orchestrator { } // --------------------------------------------------------------------------- -// Topo sort helpers +// Topo sort // --------------------------------------------------------------------------- -function topoSort(epics: Epic[]): Epic[] { - const byId = new Map(epics.map((e) => [e.id, e])); - const visited = new Set(); - const result: Epic[] = []; - - function visit(id: string) { - if (visited.has(id)) return; - visited.add(id); - const epic = byId.get(id); - if (!epic) return; - for (const dep of epic.depends_on) { - visit(dep); - } - result.push(epic); - } - - for (const e of epics) visit(e.id); - return result; -} - -function topoSortSlices(slices: Slice[]): Slice[] { - const byId = new Map(slices.map((s) => [s.id, s])); - const visited = new Set(); - const result: Slice[] = []; - - function visit(id: string) { - if (visited.has(id)) return; - visited.add(id); - const slice = byId.get(id); - if (!slice) return; - for (const dep of slice.depends_on) { - visit(dep); - } - 
result.push(slice); - } - - for (const s of slices) visit(s.id); - return result; +function topoSort<T>(items: T[], getId: (item: T) => string, getDeps: (item: T) => string[]): T[] { + const byId = new Map(items.map((item) => [getId(item), item])); + const visited = new Set(); + const result: T[] = []; + + function visit(id: string) { + if (visited.has(id)) return; + visited.add(id); + const item = byId.get(id); + if (!item) return; + for (const dep of getDeps(item)) { + visit(dep); + } + result.push(item); + } + + for (const item of items) visit(getId(item)); + return result; } From 828ba76d5ce645bfd79219c16c32348839cc1240 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 20:58:57 +0200 Subject: [PATCH 08/22] =?UTF-8?q?FE-730:=20extract=20createReport=20helper?= =?UTF-8?q?=20=E2=80=94=20deduplicate=20report=20construction?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit report-helpers.ts: createReport(sink, fields) handles id generation + timestamp + append. Replaces 5 inline report-construction sites across engine-proc, engine-petri, and pi-actions. Co-authored-by: Amp --- src/orchestrator/src/engine-petri.ts | 6 ++--- src/orchestrator/src/engine-proc.ts | 12 +++------- src/orchestrator/src/pi-actions.ts | 33 +++++++------------------- src/orchestrator/src/report-helpers.ts | 16 +++++++++++++ 4 files changed, 30 insertions(+), 37 deletions(-) create mode 100644 src/orchestrator/src/report-helpers.ts diff --git a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts index 38c61eb2..e12d7f32 100644 --- a/src/orchestrator/src/engine-petri.ts +++ b/src/orchestrator/src/engine-petri.ts @@ -1,3 +1,4 @@ +import { createReport } from './report-helpers.js'; import type { ActionContext, EpicOutcome, @@ -232,10 +233,7 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { fire: async (consumed) => { const target = slice.verification[0]?.target ??
''; const result = await testRunner.run(target, input.worktreeDir); - const reportId = `rpt-run-${sid}-${Date.now()}`; - reports.append({ - id: reportId, - ts: new Date().toISOString(), + const reportId = createReport(reports, { epicId: epic.id, sliceId: sid, actor: 'test-runner', diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index 88ac9f4d..143fe889 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -1,3 +1,4 @@ +import { createReport } from './report-helpers.js'; import type { ActionContext, Epic, @@ -128,11 +129,7 @@ export class ProceduralOrchestrator implements Orchestrator { // 4. Run tests (orchestrator-owned, deterministic) const target = slice.verification[0]?.target ?? ''; let result = await testRunner.run(target, input.worktreeDir); - // Append a report for the test run - const runReportId = `rpt-run-${Date.now()}`; - reports.append({ - id: runReportId, - ts: new Date().toISOString(), + const runReportId = createReport(reports, { epicId: epic.id, sliceId: slice.id, actor: 'test-runner', @@ -153,10 +150,7 @@ export class ProceduralOrchestrator implements Orchestrator { reportIds.push(retryCodeId); result = await testRunner.run(target, input.worktreeDir); - const retryRunId = `rpt-retry-${retry}-${Date.now()}`; - reports.append({ - id: retryRunId, - ts: new Date().toISOString(), + const retryRunId = createReport(reports, { epicId: epic.id, sliceId: slice.id, actor: 'test-runner', diff --git a/src/orchestrator/src/pi-actions.ts b/src/orchestrator/src/pi-actions.ts index f38ef057..848d593e 100644 --- a/src/orchestrator/src/pi-actions.ts +++ b/src/orchestrator/src/pi-actions.ts @@ -2,7 +2,8 @@ import { spawnSync } from 'node:child_process'; import { dirname, join } from 'node:path'; import { fileURLToPath } from 'node:url'; -import type { ActionContext, ActionHandlers, ReportSink } from './types.js'; +import { createReport } from './report-helpers.js'; +import type { 
ActionContext, ActionHandlers } from './types.js'; const __dirname = dirname(fileURLToPath(import.meta.url)); const promptsDir = __dirname.includes('dist') @@ -105,24 +106,8 @@ function extractJson(raw: string): Record | undefined { } } -function appendReport( - reports: ReportSink, - ctx: ActionContext, - actor: string, - event: string, - payload: Record, -): string { - const id = `rpt-${actor}-${ctx.slice.id}-${Date.now()}`; - reports.append({ - id, - ts: new Date().toISOString(), - epicId: ctx.epic.id, - sliceId: ctx.slice.id, - actor, - event, - payload, - }); - return id; +function report(ctx: ActionContext, actor: string, event: string, payload: Record): string { + return createReport(ctx.reports, { epicId: ctx.epic.id, sliceId: ctx.slice.id, actor, event, payload }); } // --------------------------------------------------------------------------- @@ -148,13 +133,13 @@ export function createPiActions(opts?: { verbose?: boolean }): ActionHandlers { const parsed = extractJson(raw) as { done?: boolean; reasoning?: string } | undefined; const done = !!parsed?.done; log(done ? '●' : '○', `verdict ${ctx.slice.id} → ${done ? 'DONE' : 'NEEDS WORK'}`); - return appendReport(ctx.reports, ctx, 'evaluator', 'eval-done', { + return report(ctx, 'evaluator', 'eval-done', { done, reasoning: parsed?.reasoning ?? raw.slice(0, 200), }); } catch (err) { log('✗', `evaluate ${ctx.slice.id} — ${err instanceof Error ? err.message : err}`); - return appendReport(ctx.reports, ctx, 'evaluator', 'eval-done', { + return report(ctx, 'evaluator', 'eval-done', { done: false, reasoning: `evaluation failed: ${err instanceof Error ? 
err.message : String(err)}`, }); @@ -173,7 +158,7 @@ export function createPiActions(opts?: { verbose?: boolean }): ActionHandlers { worktreeDir: ctx.worktreeDir, }); - return appendReport(ctx.reports, ctx, 'test-writer', 'tests-written', { + return report(ctx, 'test-writer', 'tests-written', { sliceId: ctx.slice.id, targets: ctx.slice.verification.map((v) => v.target), }); @@ -191,7 +176,7 @@ export function createPiActions(opts?: { verbose?: boolean }): ActionHandlers { worktreeDir: ctx.worktreeDir, }); - return appendReport(ctx.reports, ctx, 'code-writer', 'code-written', { + return report(ctx, 'code-writer', 'code-written', { sliceId: ctx.slice.id, }); }, @@ -232,7 +217,7 @@ export function createPiActions(opts?: { verbose?: boolean }): ActionHandlers { } log(allPassed ? '●' : '✗', `epic ${ctx.epic.id} → ${allPassed ? 'PASS' : 'FAIL'}`); - return appendReport(ctx.reports, ctx, 'orchestrator', 'epic-verified', { + return report(ctx, 'orchestrator', 'epic-verified', { passed: allPassed, }); }, diff --git a/src/orchestrator/src/report-helpers.ts b/src/orchestrator/src/report-helpers.ts new file mode 100644 index 00000000..25700fd4 --- /dev/null +++ b/src/orchestrator/src/report-helpers.ts @@ -0,0 +1,16 @@ +import type { ReportLine, ReportSink } from './types.js'; + +/** Create and append a report line, returning its id. */ +export function createReport( + sink: ReportSink, + fields: Omit, +): string { + const id = `rpt-${fields.actor}-${fields.sliceId || fields.epicId}-${Date.now()}`; + const line: ReportLine = { + id, + ts: new Date().toISOString(), + ...fields, + }; + sink.append(line); + return id; +} From 2bbe8aa635af6a17c92f64ad5303638ab13638c4 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 20:59:55 +0200 Subject: [PATCH 09/22] FE-730: migrate contract test #1 to createFakes() Delete old module-level callOrder/evalCallCount/fakeActions/fakeTestRunner. All 9 contract test suites now use the same createFakes() factory. 
~100 lines removed. Co-authored-by: Amp --- src/orchestrator/src/engine-contract.test.ts | 184 ++++--------------- 1 file changed, 35 insertions(+), 149 deletions(-) diff --git a/src/orchestrator/src/engine-contract.test.ts b/src/orchestrator/src/engine-contract.test.ts index a24e27a9..29823f20 100644 --- a/src/orchestrator/src/engine-contract.test.ts +++ b/src/orchestrator/src/engine-contract.test.ts @@ -108,88 +108,6 @@ function createFakes(opts?: { return { callOrder, reports, actions, testRunner }; } -// --------------------------------------------------------------------------- -// Helpers — fake action handlers -// --------------------------------------------------------------------------- - -let callOrder: string[] = []; -let evalCallCount = 0; - -function resetFakes() { - callOrder = []; - evalCallCount = 0; -} - -function fakeActions(reports: InMemoryReportSink): ActionHandlers { - return { - 'evaluate-done': async (ctx: ActionContext) => { - evalCallCount++; - const done = evalCallCount >= 2; // first call: NO, second: YES - const id = `rpt-eval-${evalCallCount}`; - reports.append({ - id, - ts: new Date().toISOString(), - epicId: ctx.epic.id, - sliceId: ctx.slice.id, - actor: 'evaluator', - event: 'eval-done', - payload: { done }, - }); - callOrder.push(`evaluate-done:${done ? 
'YES' : 'NO'}`); - return id; - }, - 'write-tests': async (ctx: ActionContext) => { - const id = 'rpt-write-tests-1'; - reports.append({ - id, - ts: new Date().toISOString(), - epicId: ctx.epic.id, - sliceId: ctx.slice.id, - actor: 'test-writer', - event: 'tests-written', - payload: { files: ['tests/hello.test.ts'] }, - }); - callOrder.push('write-tests'); - return id; - }, - 'write-code': async (ctx: ActionContext) => { - const id = 'rpt-write-code-1'; - reports.append({ - id, - ts: new Date().toISOString(), - epicId: ctx.epic.id, - sliceId: ctx.slice.id, - actor: 'code-writer', - event: 'code-written', - payload: { files: ['src/hello.ts'] }, - }); - callOrder.push('write-code'); - return id; - }, - 'verify-epic': async (ctx: ActionContext) => { - const id = 'rpt-verify-epic-1'; - reports.append({ - id, - ts: new Date().toISOString(), - epicId: ctx.epic.id, - sliceId: '', - actor: 'orchestrator', - event: 'epic-verified', - payload: { passed: true }, - }); - callOrder.push('verify-epic'); - return id; - }, - }; -} - -const fakeTestRunner: TestRunner = { - async run(_target: string, _worktreeDir: string) { - callOrder.push('run-tests'); - return { passed: true, output: '1 test passed' }; - }, -}; - // --------------------------------------------------------------------------- // Contract test #1 — single epic, single slice, happy path // --------------------------------------------------------------------------- @@ -215,100 +133,68 @@ const simplePlan: Plan = { }; describe('Engine contract test #1 — single epic, single slice, happy path', () => { - const engines = [ - { name: 'procedural', create: () => new ProceduralOrchestrator() }, - { name: 'petri', create: () => new PetriOrchestrator() }, - ] as const; - for (const { name, create } of engines) { describe(name, () => { it("completes with status 'completed'", async () => { - resetFakes(); - const reports = new InMemoryReportSink(); - const engine = create(); - - const input: OrchestratorInput = { + const fakes = 
createFakes(); + const result = await create().run({ plan: simplePlan, - worktreeDir: '/tmp/fake-fixture', - actions: fakeActions(reports), - reports, - testRunner: fakeTestRunner, + worktreeDir: '/tmp/fake', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, policy: { maxRetries: 3 }, - }; - - const result = await engine.run(input); - + }); expect(result.status).toBe('completed'); }); it('produces correct epic and slice outcomes', async () => { - resetFakes(); - const reports = new InMemoryReportSink(); - const engine = create(); - - const input: OrchestratorInput = { + const fakes = createFakes(); + const result = await create().run({ plan: simplePlan, - worktreeDir: '/tmp/fake-fixture', - actions: fakeActions(reports), - reports, - testRunner: fakeTestRunner, + worktreeDir: '/tmp/fake', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, policy: { maxRetries: 3 }, - }; - - const result = await engine.run(input); - + }); expect(result.epics).toEqual([{ epicId: 'epic-1', status: 'completed' }]); expect(result.slices).toEqual([{ sliceId: 'slice-1', status: 'completed' }]); }); it('calls actions in correct TDD cycle order', async () => { - resetFakes(); - const reports = new InMemoryReportSink(); - const engine = create(); - - const input: OrchestratorInput = { + const fakes = createFakes(); + await create().run({ plan: simplePlan, - worktreeDir: '/tmp/fake-fixture', - actions: fakeActions(reports), - reports, - testRunner: fakeTestRunner, + worktreeDir: '/tmp/fake', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, policy: { maxRetries: 3 }, - }; - - await engine.run(input); - - // Inner loop: evaluate(NO) → write-tests → write-code → run-tests → evaluate(YES) - expect(callOrder).toEqual([ - 'evaluate-done:NO', - 'write-tests', - 'write-code', - 'run-tests', - 'evaluate-done:YES', + }); + expect(fakes.callOrder).toEqual([ + 'slice-1:evaluate-done:NO', + 
'slice-1:write-tests', + 'slice-1:write-code', + 'run-tests:pass', + 'slice-1:evaluate-done:YES', ]); }); it('report sink contains expected lines', async () => { - resetFakes(); - const reports = new InMemoryReportSink(); - const engine = create(); - - const input: OrchestratorInput = { + const fakes = createFakes(); + await create().run({ plan: simplePlan, - worktreeDir: '/tmp/fake-fixture', - actions: fakeActions(reports), - reports, - testRunner: fakeTestRunner, + worktreeDir: '/tmp/fake', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, policy: { maxRetries: 3 }, - }; - - await engine.run(input); - - const all = reports.getAll(); - const events = all.map((r) => r.event); + }); + const events = fakes.reports.getAll().map((r) => r.event); expect(events).toContain('eval-done'); expect(events).toContain('tests-written'); expect(events).toContain('code-written'); - expect(all.length).toBeGreaterThanOrEqual(3); }); }); } From 5c3b62aab3d0efad32a4ee1837d7ac553eda9334 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 21:00:02 +0200 Subject: [PATCH 10/22] FE-730: remove exhausted REFACTOR.md Co-authored-by: Amp --- memory/REFACTOR.md | 43 ------------------------------------------- 1 file changed, 43 deletions(-) delete mode 100644 memory/REFACTOR.md diff --git a/memory/REFACTOR.md b/memory/REFACTOR.md deleted file mode 100644 index 1dc2a442..00000000 --- a/memory/REFACTOR.md +++ /dev/null @@ -1,43 +0,0 @@ - - -# Orchestrator review cleanup - -Batch of mechanical fixes from ln-review findings 1, 2, 3, 6, 8. - -## Problem Statement - -The orchestrator seam has a naming lie (`fixtureDir` means worktree), duplicated topo-sort, scattered report construction, dead imports, and two coexisting fake systems in the contract tests. - -## Solution - -Five tiny commits, each leaving tests green. - -## Commits - -1. 
**Rename `fixtureDir` → `worktreeDir` on `OrchestratorInput`** — rename the field on the type, update both engines, cook-cli, and all test references. Pure rename, no behavior change. - -2. **Generic topo-sort** — collapse `topoSort` + `topoSortSlices` into one `topoSort(items, getId, getDeps)`. Remove the two specialized copies. - -3. **Extract `createReport` helper onto `ReportSink`** — add a factory method that handles id generation + timestamp + append. Update both engines and pi-actions to use it. Removes 3 inline report-construction sites. - -4. **Migrate contract test #1 to `createFakes()`** — delete the old module-level `callOrder`/`evalCallCount`/`fakeActions`/`fakeTestRunner`. Rewrite test #1 to use the factory. ~100 lines removed. - -5. **Remove unused `ReportLine` import from engine-proc.ts**. - -## Decisions - -- `OrchestratorInput.worktreeDir` replaces `fixtureDir` as the canonical field name -- `topoSort` becomes a generic utility in its own module or at the top of engine-proc -- `createReport` is a free function, not a method on `ReportSink` interface (keeps the interface minimal) - -## Testing Decisions - -- All 36 existing tests must pass after each commit — no new tests needed since this is pure refactor -- Contract tests are the primary safety net - -## Out of Scope - -- Finding #4 (extractJson fragility) — not mechanical, needs design thought -- Finding #5 (module-level verbose state) — acceptable for single-run CLI -- Finding #7 (split verify-epic) — needs ln-design, not cleanup From da12df2596c2e80eb4f5122901e93caf90149b62 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 21:20:18 +0200 Subject: [PATCH 11/22] FE-730: formatting cleanup (oxfmt) Co-authored-by: Amp --- src/orchestrator/src/engine-proc.ts | 46 +++++++++++++++----------- src/orchestrator/src/report-helpers.ts | 5 +-- src/orchestrator/src/types.ts | 12 +++---- 3 files changed, 34 insertions(+), 29 deletions(-) diff --git 
a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index 143fe889..b7312d14 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -31,11 +31,19 @@ export class ProceduralOrchestrator implements Orchestrator { const sliceOutcomes: SliceOutcome[] = []; const epicOutcomes: EpicOutcome[] = []; - const epicOrder = topoSort(plan.epics, (e) => e.id, (e) => e.depends_on); + const epicOrder = topoSort( + plan.epics, + (e) => e.id, + (e) => e.depends_on, + ); for (const epic of epicOrder) { const epicSlices = plan.slices.filter((s) => s.epic_id === epic.id); - const sliceOrder = topoSort(epicSlices, (s) => s.id, (s) => s.depends_on); + const sliceOrder = topoSort( + epicSlices, + (s) => s.id, + (s) => s.depends_on, + ); let epicHalted = false; for (const slice of sliceOrder) { @@ -178,21 +186,21 @@ export class ProceduralOrchestrator implements Orchestrator { // --------------------------------------------------------------------------- function topoSort(items: T[], getId: (item: T) => string, getDeps: (item: T) => string[]): T[] { - const byId = new Map(items.map((item) => [getId(item), item])); - const visited = new Set(); - const result: T[] = []; - - function visit(id: string) { - if (visited.has(id)) return; - visited.add(id); - const item = byId.get(id); - if (!item) return; - for (const dep of getDeps(item)) { - visit(dep); - } - result.push(item); - } - - for (const item of items) visit(getId(item)); - return result; + const byId = new Map(items.map((item) => [getId(item), item])); + const visited = new Set(); + const result: T[] = []; + + function visit(id: string) { + if (visited.has(id)) return; + visited.add(id); + const item = byId.get(id); + if (!item) return; + for (const dep of getDeps(item)) { + visit(dep); + } + result.push(item); + } + + for (const item of items) visit(getId(item)); + return result; } diff --git a/src/orchestrator/src/report-helpers.ts b/src/orchestrator/src/report-helpers.ts 
index 25700fd4..b7ff6b30 100644 --- a/src/orchestrator/src/report-helpers.ts +++ b/src/orchestrator/src/report-helpers.ts @@ -1,10 +1,7 @@ import type { ReportLine, ReportSink } from './types.js'; /** Create and append a report line, returning its id. */ -export function createReport( - sink: ReportSink, - fields: Omit, -): string { +export function createReport(sink: ReportSink, fields: Omit): string { const id = `rpt-${fields.actor}-${fields.sliceId || fields.epicId}-${Date.now()}`; const line: ReportLine = { id, diff --git a/src/orchestrator/src/types.ts b/src/orchestrator/src/types.ts index db148a5e..26523927 100644 --- a/src/orchestrator/src/types.ts +++ b/src/orchestrator/src/types.ts @@ -86,12 +86,12 @@ export type RunPolicy = { }; export type OrchestratorInput = { - plan: Plan; - worktreeDir: string; - actions: ActionHandlers; - reports: ReportSink; - testRunner: TestRunner; - policy: RunPolicy; + plan: Plan; + worktreeDir: string; + actions: ActionHandlers; + reports: ReportSink; + testRunner: TestRunner; + policy: RunPolicy; }; export type EpicOutcome = { From e78dbfbe6bc3b351cf9dcd751faa2d96558da3bc Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Wed, 20 May 2026 21:23:08 +0200 Subject: [PATCH 12/22] FE-730: remove .antigravitycli/ and add to .gitignore Co-authored-by: Amp --- .antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json | 1 - .gitignore | 3 +++ 2 files changed, 3 insertions(+), 1 deletion(-) delete mode 120000 .antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json diff --git a/.antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json b/.antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json deleted file mode 120000 index b6ef8bba..00000000 --- a/.antigravitycli/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json +++ /dev/null @@ -1 +0,0 @@ -/Users/kostandin/.gemini/config/projects/0a9baf85-93e0-432b-8116-99eeb2dcd95d.json \ No newline at end of file diff --git a/.gitignore b/.gitignore index 489e2363..1ed6282a 100644 --- a/.gitignore 
+++ b/.gitignore @@ -53,3 +53,6 @@ tmp/ # skill quarantine .agents/_quarantine + +# antigravity cli +.antigravitycli/ From 9e1a7f0a86de84aa9d821fa00f2f3f29ee4b7e69 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 10:40:51 +0200 Subject: [PATCH 13/22] =?UTF-8?q?FE-730:=20update=20design=20doc=20?= =?UTF-8?q?=E2=80=94=20landed=20status,=20seam=20rename,=20experiment=20re?= =?UTF-8?q?sults?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Status banner: landed POC with SPEC cross-references - §2 seam: fixtureDir → worktreeDir, ActionRegistry → ActionHandlers - §3: POC note pointing to §12 deferral - §12: streaming UX row updated (implemented, not deferred) - §13: experiment results with verdict (proc wins on simplicity) Co-authored-by: Amp --- docs/design/orchestrator.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/docs/design/orchestrator.md b/docs/design/orchestrator.md index c267da2b..1f42acb5 100644 --- a/docs/design/orchestrator.md +++ b/docs/design/orchestrator.md @@ -1,6 +1,6 @@ # Orchestrator POC — Design Proposal -> Status: **working design proposal** — exploratory design for a CLI orchestrator that consumes a brunch-shaped execution plan (epics → slices) and dispatches agents and deterministic checks to drive the plan to completion. Not yet promoted to `memory/SPEC.md`; decisions land there through `ln-spec`. Tracked as FE-730; umbrella H-6476. +> Status: **landed POC** — CLI orchestrator that consumes a brunch-shaped execution plan (epics → slices) and dispatches agents and deterministic checks to drive the plan to completion. Canonical decisions in `memory/SPEC.md` (R46–50, D155-K–D159-K, I121-K–I123-K). Tracked as FE-730; umbrella H-6476. > > Scope is intentionally narrow: two interchangeable execution engines behind a shared seam, plan-as-YAML, an append-only event log as the communication medium, and an isolated worktree per run. 
The 15-step build sequence, fixture definitions, and pi-agent invocation details are operational scaffolding kept separate from this doc. Code lives under `src/orchestrator/` in the brunch repo; `cook` is only the CLI subcommand name. > @@ -57,8 +57,8 @@ interface Orchestrator { type OrchestratorInput = { plan: Plan; // { epics, slices } - fixtureDir: string; - actions: ActionRegistry; // name-keyed action handlers + worktreeDir: string; // cwd-scoped isolated run directory + actions: ActionHandlers; // Record — inline dispatch (POC); ActionRegistry when productized (§12) reports: ReportSink; // append-only jsonl testRunner: TestRunner; // deterministic exec policy: RunPolicy; // { maxRetries } @@ -84,6 +84,8 @@ Both produce identical observable behavior on the contract test suite. That's th ## 3. ActionRegistry — name-keyed dispatch +> **POC note:** The POC uses inline `ActionHandlers` (a record of handler functions) instead of a formal registry class. The `ActionRegistry` interface below is the productized target — see [§12 POC scope and deferrals](#12-poc-scope-and-deferrals). + The TDD inner loop's transitions (`write-tests`, `write-code`, `run-tests`, `evaluate-done`, `verify-epic`) are not hardcoded inside the engines. They are registered handlers the engines look up by name: ```ts @@ -286,15 +288,19 @@ The design above is the target shape. The POC builds a deliberate subset and def | **Brownfield seed** | When codebase mode is used and `/.git` exists, prefer `git worktree add`; otherwise filtered copy (`rsync` excluding `.git`, `node_modules`, `dist`, `.cook/runs/`). | Not implemented. Greenfield-only execution; `mkdir` creates an empty worktree. | | **Token-pointer discipline** | Universal rule: tokens between transitions carry only `{ reportId, sliceId, epicId }` pointers; all event content lives in `reports.jsonl`. Applied across both engines. | Petrinet engine enforces this internally (it's a hard constraint of the substrate). 
Procedural engine is free to pass data through normal function calls — each engine handles its own state shape, the shared seam is just inputs and outputs. | | **Layer 2 adapter tests** | Per-engine internal tests (net compilation / solver / transition firing for petrinet; topo sort / inner-loop state transitions / retry counter for procedural). | Optional. Defer until a debugging need surfaces. Layer 1 (contract) + Layer 3 (integration) are mandatory; Layer 2 is added if and when it pays for itself. | -| **Streaming UX formatting** | Compact per-event lines like `[slice-1 ▸ test-writer] tests-written → 3 files`. | A plain `console.log(JSON.stringify(report))` per event is sufficient. The structured rendering is a polish item, not a correctness item. | +| **Streaming UX formatting** | Compact per-event lines like `[slice-1 ▸ test-writer] tests-written → 3 files`. | Implemented: elapsed timing, icons (▸/✓/✗/●/○), structured header/footer, `--verbose` for raw pi output. JSON stays in `reports.jsonl` only. | Rationale for deferring: each item above is "right" for the productized version and "premature" for the POC. The experiment we actually need to run is whether the Petri-net substrate earns its complexity — none of the deferred items affect that experiment's signal. Adding them now would inflate the LOC count and make the comparison muddier, not crisper. When the experiment concludes and the orchestrator productizes (or merges into something else), the deferrals become the natural follow-up backlog: lift inline dispatch into `ActionRegistry`, wire the codebase-mode resolver branch, add the seed step, etc. -## 13. Two-path experiment success criteria +## 13. Two-path experiment results + +Both engines completed Fixture #1 end-to-end. Procedural: 206 LOC, ~9 min, 23 events. Petri-net: 410 LOC, ~13 min, 27 events. Both produced a working `txt` CLI with 154 agent-written tests passing. 
+ +**Verdict:** The procedural engine is half the code, faster to debug (stack traces point to loop lines, not `fire()` closures), and trivially readable. The Petri-net engine's main advantage is parallelism readiness — independent slices could fire concurrently without restructuring the engine. For serial execution, the procedural engine wins; the Petri-net engine earns its complexity only when parallel execution or dynamic replanning enters scope. -Exploration first, judgement later. No fixed quantitative criteria up front. After the fixture passes on both engines, write a short comparison covering: lines of code per engine, debuggability of mid-run state, how each engine would absorb a hypothetical new action type (e.g. `lint`) or a new plan-level concept (e.g. milestones). The empirical signal from that exercise — not the architectural elegance of either engine — is what decides the next commitment. +Full comparison table in the POC summary doc. ## Lexicon From aa937a8cea368320dbfc25f55c9c55b018641eb7 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 10:56:56 +0200 Subject: [PATCH 14/22] =?UTF-8?q?FE-730:=20fix=20bot-reported=20issues=20?= =?UTF-8?q?=E2=80=94=20NaN=20retries,=20multi-dep=20epics,=20halt=20propag?= =?UTF-8?q?ation,=20verify-epic=20parity?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - cook-cli: validate --max-retries is finite non-negative (prevents NaN infinite loop) - engine-petri: epic deps use single transition with ALL dep-done places as inputs (was one transition per dep → fired on first dep instead of all) - engine-petri: PetriNet.run() accepts shouldHalt callback, checked each iteration (was ignoring ctx.halted so transitions kept firing after a halt) - engine-proc: verify-epic called once per epic, not once per verification entry (handler owns all targets; matches petri engine behavior) Co-authored-by: Amp --- src/orchestrator/src/cook-cli.ts | 6 +++++- src/orchestrator/src/engine-petri.ts | 14
++++++++------ src/orchestrator/src/engine-proc.ts | 6 +++--- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts index f737ea13..602b943f 100644 --- a/src/orchestrator/src/cook-cli.ts +++ b/src/orchestrator/src/cook-cli.ts @@ -32,7 +32,11 @@ export function parseCookArgs(args: string[]): CookOptions { } engine = val; } else if (arg.startsWith('--max-retries=')) { - maxRetries = Number.parseInt(arg.split('=')[1]!, 10); + const parsed = Number.parseInt(arg.split('=')[1]!, 10); + if (!Number.isFinite(parsed) || parsed < 0) { + throw new Error(`Invalid --max-retries value: ${arg.split('=')[1]}. Must be a non-negative integer.`); + } + maxRetries = parsed; } else if (arg === '--verbose' || arg === '-v') { verbose = true; } else if (!arg.startsWith('-')) { diff --git a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts index e12d7f32..88107929 100644 --- a/src/orchestrator/src/engine-petri.ts +++ b/src/orchestrator/src/engine-petri.ts @@ -48,8 +48,10 @@ class PetriNet { return !!tokens && tokens.length > 0; } - async run(): Promise { + async run(shouldHalt?: () => boolean): Promise { while (true) { + if (shouldHalt?.()) break; + const enabled = this.transitions.find((t) => t.inputs.every((p) => { const tokens = this.places.get(p); @@ -106,12 +108,12 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { // (deferred until eligible places exist — see below) const seedEpics = plan.epics.filter((e) => e.depends_on.length === 0); - // Epic dependency transitions — dep done → fan out to next epic's slices + // Epic dependency transitions — ALL deps done → fan out to next epic's slices for (const epic of plan.epics) { - for (const depId of epic.depends_on) { + if (epic.depends_on.length > 0) { net.addTransition({ - id: `epic-dep:${depId}->${epic.id}`, - inputs: [ep(depId, 'done')], + id: `epic-deps-met:${epic.id}`, + inputs: epic.depends_on.map((depId) 
=> ep(depId, 'done')), fire: async () => epicReadyOutputs(epic.id), }); } @@ -372,7 +374,7 @@ export class PetriOrchestrator implements Orchestrator { try { const net = compilePlan(input, ctx); - await net.run(); + await net.run(() => ctx.halted); } catch (err) { return { status: 'halted', diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index b7312d14..97cfa88d 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -66,8 +66,8 @@ export class ProceduralOrchestrator implements Orchestrator { }; } - // Epic-level verification - for (const v of epic.verification) { + // Epic-level verification (one call — handler owns all targets) + if (epic.verification.length > 0) { const verifyId = await actions['verify-epic']({ slice: epicSlices[0]!, epic, @@ -81,7 +81,7 @@ export class ProceduralOrchestrator implements Orchestrator { epicOutcomes.push({ epicId: epic.id, status: 'halted' }); return { status: 'halted', - reason: `Epic ${epic.id} verification failed: ${v.target}`, + reason: `Epic ${epic.id} verification failed`, reports: reportIds, epics: epicOutcomes, slices: sliceOutcomes, From a50ffe57efd26d28339dbdc757a4fb48c57a0234 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 12:28:00 +0200 Subject: [PATCH 15/22] FE-730: fix unreached-slice false success + report ID collisions - engine-petri: unreached slices/epics now set ctx.halted=true so the overall status correctly reports 'halted' instead of 'completed' - report-helpers: append monotonic sequence counter to IDs to prevent collisions when multiple reports are created in the same millisecond Co-authored-by: Amp --- docs/next/fe-716-chat-runtime-walkthrough.md | 309 +++++++++++++++++++ src/orchestrator/src/engine-petri.ts | 4 + src/orchestrator/src/report-helpers.ts | 4 +- 3 files changed, 316 insertions(+), 1 deletion(-) create mode 100644 docs/next/fe-716-chat-runtime-walkthrough.md diff --git 
a/docs/next/fe-716-chat-runtime-walkthrough.md b/docs/next/fe-716-chat-runtime-walkthrough.md new file mode 100644 index 00000000..c7f6bbf3 --- /dev/null +++ b/docs/next/fe-716-chat-runtime-walkthrough.md @@ -0,0 +1,309 @@ +# FE-716 — Chat Runtime Walkthrough (call notes) + +> Reference brief for talking through PR #141 (`ka/fe-716-chat-runtime-unified-secondary-chats`, merged). +> Pairs with `docs/design/CONVERSATIONAL_WORKSPACE_RUNTIME.md` (umbrella) and `docs/design/UNIFIED_CHAT_UX.md` (visual brief). +> The old testing guide referenced here is the **V3.1 side-chat popover** flow; most of its surface is gone, and its capabilities are re-surfaced in a different shape. + +--- + +## Executive summary + +PR #141 lands **V1 of Track 2 (`chat-runtime-secondary-chats`)** from the Conversational Workspace Runtime. It replaces the V3.1 `SideChatPopover` + `SideChatHost` popover machinery with a **unified expandable chat shell** that hosts a primary "master" chat plus N **secondary chats** anchored to items or reconciliation needs — all on the existing `chat` + `turn` substrate (no new `thread` table per Decision D153). + +What changed for the user: + +- A persistent **chat shell** docks to the right of the workspace (default `side-docked` ~50%; toggles to `compact`, `maximize`, `full`). +- Clicking **chat-with on a row** no longer opens a popover — it opens (or focuses) a **secondary chat tab** inside the shell, with the item pinned. +- Each chat is durable, switchable (tab strip + dropdown when 2+ item chats exist), and runs its own streaming `useChat` instance in parallel. +- **Ask** and **Edit** modes are a per-chat toggle (Ask = `explore`, Edit = `edit`). Only Edit mode exposes the `propose_edit`, `propose_edge`, and `propose_drill_down` tools. +- **Staged patches** from any chat collect in a single shell-level panel (``) — bulk Apply at the header, per-row Discard. The old top-bar `` has its mount commented out.
+- **Pending review** (reconciliation needs) is hoisted into the shell body above the patch panel. The substantive-row trigger now opens a secondary chat instead of focusing the popover. +- Layout state, presence (`expanded` / `minimized` / `closed`), and the open chat survive across navigation; layout mode persists per-spec in localStorage. + +What did **not** ship in V1 (with the design doc's track ownership): + +| Missing capability | Owning track / frontier | Reason it's not here | +| --------------------------------------------------- | -------------------------------------- | --------------------------------------------------------------------------------------------------- | +| `$` thread mentions, `!` annotation mentions, `@` | Track 5 — `chat-context-provision` | Mention substrate beyond `#REF-CODE` is part of context provision, not chat runtime. | +| Snapshot builder family + handle refresh | Track 5 — `chat-context-provision` | Needs Track 4 changeset-backed item versions before stale-handle freshness is meaningful. | +| Target-grouped reconciliation UX + "Reconcile Now" | Track 3 — `reconciliation-runtime` | V1 only resurfaces the **existing** PendingReviewSection inside the shell; full Track 3 UX is later.| +| Full `PendingReviewSection` retirement | Track 3 — `reconciliation-runtime` | Same as above — retirement happens once Track 3 ships parity. | +| Changeset / change tables (semantic mutation spine) | Track 4 — `changeset-ledger` | Runs in parallel; not blocking Track 2 surface work. | +| Agent-run inline rendering (C7) | Within FE-716 scope but **blocked** | No producer surface emits agent-run secondary chats yet; substrate is ready. | +| Strategy sub-chat UI | Follow-up frontier | Strategy is chat-local state; surface design waits for use cases. | +| Shift+Tab mode toggle, Ladle prototype | Skipped / parked | Mode toggle is in the composer chip; visual iteration adopts `ai-elements` directly, not isolated stories. 
| +| Persisting assistant turns as structured `parts[]` | Follow-up | Wire-protocol carries `BrunchAssistantPart[]`, persistence stays text-only until a consumer needs it.| + +`npm run verify` is green at merge: 108 test files / 1273 tests. + +--- + +## What changed vs. the old testing guide + +The old testing guide (V3.1) walked through a **popover surface** with a top-bar staged patch list and a standalone Pending review section. After FE-716: + +| Old guide concept | FE-716 mapping | +| ------------------------------ | ----------------------------------------------------------------------------------------------------------------------- | +| Side-chat popover + drag-out | **Retired.** Replaced by `` docked to the right. | +| Span-level "💬 Chat / 📝 Annotate" floating menu | The Chat path opens a secondary chat (no popover). **Annotate flow not re-surfaced in V1** (no annotation composer in the shell yet — needs a follow-up scope card). | +| Top-bar `N change(s) · Undo · Apply` | **``** mounted inside the shell body. Header shows count + Apply; rows show kind, summary, impact chip, optional ContentDiff, Discard. | +| Pending review section (separate) | Same `` component, now mounted **inside the shell body** above the patch panel. | +| Run agent / classifier chips | **Unchanged** — `` + pending-review rows reuse the existing classifier flow. Track 3 retirement is later. | +| Substantive row → "Open side-chat" | Now opens a **secondary chat** (with `pinned_reconciliation_need_id` set so the C9 panel renders). | +| Direct row edit | Unchanged. Still stages into the same shared patch list — now visible in the shell panel. | +| Soft/Hard impact tier routing | Unchanged. Soft applies in-place + toast; Hard opens Pending review rows. | +| Propose-edge / Drill-down | Unchanged tool surface; tools gated on Edit mode of a secondary chat. | + +--- + +## Architecture at a glance + +```mermaid +flowchart TB + subgraph Workspace + WS[ContinuousWorkspaceView
structured-list + graph] + TR[Triggers
StructuredListView ItemActionRail
PendingReviewSection 'Open side-chat'] + end + subgraph Shell["UnifiedChatShell (right rail / dock / full)"] + H[Header
tabs · switcher · layout buttons · minimize · close] + STK[Sticky overlays
PendingReviewSection + ChatShellPatchPanel] + ACT[Active SecondaryChatHost
useChat per chat · transcript + composer] + BG[Background hosts
hidden but mounted · stream/unread dots] + FT[Footer
composer portal target] + end + Server[(chat + turn substrate
parent_chat_id IS NOT NULL ⇒ secondary)] + + TR -- useSecondaryChatTrigger().create --> Server + Server -- bundle.secondaryChats[] --> Shell + H --> ACT + H --> BG + ACT -- propose_* tool calls --> STK + STK -- Apply --> Server +``` + +Substrate (no new tables, four migrations): + +- `drizzle/0020` — `chat.parent_chat_id`, `invoked_in_turn_id`, `pinned_item_id`, `pinned_span_hint` +- `drizzle/0021` — `chat.mode` (`'explore' | 'edit' | null`) +- `drizzle/0022` — `chat.pinned_reconciliation_need_id` +- `drizzle/0023` — `chat.anchored_item_ids` (Track 5 will revisit this column — see C deferral) + +Secondary-chat projection is purely `parent_chat_id IS NOT NULL`. There is **no `chat.kind = 'secondary'` enum**; kind enum stays `interview | side_chat`. + +--- + +## UI elements — what each one does + +### Shell chrome + +| Element (testid) | Role | +| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `unified-chat-shell` | The expanded shell wrapper. `data-layout-mode` ∈ {compact, side-docked, maximize, full}. | +| `unified-chat-shell-minimized` (the "Ask brunch" pill) | Bottom-right pill when shell is minimized. Shows open-chat count. Click → expand. | +| `unified-chat-shell-header` | Header strip with tabs/switcher on the left, control buttons on the right. | +| `unified-chat-shell-minimize` | Minimize → pill. Hosts stay mounted; streaming/unread dots keep firing. | +| `unified-chat-shell-layout-side-docked` | Toggles between `compact` and `side-docked`. Active state tinted. | +| `unified-chat-shell-layout-toggle` | Toggles between `side-docked` and `full`. (Maximize tier exists internally; Esc walks down through it.) | +| `unified-chat-shell-close` | Close — collapses the shell entirely (presence `closed`). Re-opening reconstructs from bundle. 
| +| `unified-chat-shell-tabs` / `ChatTabs` | Tab strip — one tab per "promoted" chat. Tabs carry streaming dot (emerald, animated) and/or unread dot (sky-blue). Active-tab click is a no-op or re-focus. | +| `chat-switcher-trigger` | Dropdown when 2+ item-anchored chats exist; aggregates streaming/unread state across hidden chats; trigger styled with the active chat's `kindAccentHex`. | +| `unified-chat-shell-body` | Scrollable body. Scrollbar takes the active chat's kind accent at 20% opacity. | +| `chat-shell-sticky-overlays` | Sticky band at top of body when patches or reconciliation needs exist; collapses entirely when both feeds are empty. | +| `unified-chat-shell-scroll-to-bottom` | Arrow button that surfaces when scroll position is > 50% from bottom. | +| `unified-chat-shell-footer` | Empty container that the active chat's composer portals into. | + +### Inside the active chat (`` + ``) + +| Element (testid) | Role | +| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `secondary-chat-collapsible` | Transcript surface. Carries `data-secondary-chat-id` + `data-accent-hex` (anchor kind's accent). | +| `secondary-chat-reconciliation-panel` | Renders only when `pinnedReconciliationNeed` is set. Shows need kind label (Supersedes / Needs confirmation) + source ref-code + target ref-code + excerpts. | +| Kickoff turn (`kickoffContent`) | First assistant message — `Anchored to ''.` (with `, focused on ''` when a span hint was passed). | +| `SecondaryChatFreshStateHero` (turn-zero) | Shown when no turns exist. Three static "How to start" chips (`Summarize this spec`, `What needs attention?`, `Suggest next steps`) + ``. | +| `secondary-chat-bottom-anchor` | Sentinel for autoscroll; scrolls into view on new turn or streaming-text growth. 
| +| `secondary-chat-jump-to-anchor` | "Jump" button (Crosshair icon) in collapsible header when `invoked_in_turn_id` is set. Smooth-scrolls workspace center to the originating turn + briefly highlights it. | +| `secondary-chat-kind-chip` | Mode chip in header — `PencilLine` "Edit" or `MessageCircleQuestion` "Ask". | +| `` (portaled to footer) | Textarea + Send. Placeholder copy depends on mode + pinned-state + turn-zero (e.g. "Propose a change to your spec…"). | +| Mode toggle (in composer) | Segmented Ask/Edit. PATCH `…/mode`. Locked while a request is in flight. | +| `` | `#`-triggered autocomplete on the composer. Resolves to knowledge-item ref codes (e.g. `#G1`, `#D5`). Adds a context snapshot to the next user message. | +| `` | Edit-mode-only suggestion chips above composer (suppressed on turn-zero). | + +### Shell-mounted overlays + +| Element | Role | +| ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `` (in shell body) | **Unchanged** component, relocated. Lists open reconciliation needs. Substantive row's "Open side-chat" now triggers `useSecondaryChatTrigger().create({ kind, id, reconciliationNeedId })`. | +| `` (`chat-shell-patch-panel`) | Single union view of staged patches across **all** chats (uses `usePatchList()`, not the per-chat partition). Header: `N change(s)` + Apply. Per row: kind icon (Pencil/Spline/ArrowDownToDot/NotebookPen), ImpactChip (edits), summary, ref-code, Run-agent button (placeholder, no backend yet), Discard (X). Edits render an inline `` when before/after differ. | +| `` | Floats above the pill (when minimized) or in body (when expanded). Survives shell remounts via `useStablePatchListEnv`. 5s Undo window. 
| + +### Workspace-side triggers + +| Trigger | Surface | +| -------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | +| `ItemActionRail` in `StructuredListView` | Adds an "Open inline chat" button (`data-graph-action="open-inline-chat"`, `MessagesSquare` icon) on each row. | +| `PendingReviewSection` substantive row "Open side-chat" | Calls trigger with `reconciliationNeedId` so the C9 panel renders. | +| `WorkspaceArtifactRow` (`data-anchor-turn-id`) | Scroll target for `secondary-chat-jump-to-anchor`. Threaded through every relevant turn-render artifact. | + +--- + +## Modes & state machines + +### Presence (shell appearance) + +```mermaid +stateDiagram-v2 + [*] --> Expanded + Expanded --> Minimized: click Minimize + Expanded --> Closed: click X + Minimized --> Expanded: click "Ask brunch" pill + Closed --> Expanded: trigger creates chat
(presence.expand()) + note right of Minimized + Hosts stay mounted. + Streaming + unread dots + keep firing. + end note + note right of Closed + Hosts unmount. + Re-open rehydrates + from bundle. + end note +``` + +### Layout mode (per-spec localStorage) + +```mermaid +stateDiagram-v2 + [*] --> SideDocked: default + Compact --> SideDocked: toggle dock button + SideDocked --> Compact: toggle dock button + SideDocked --> Full: toggle maximize button + Full --> SideDocked: toggle restore button + Full --> Maximize: Esc + Maximize --> SideDocked: Esc + SideDocked --> Compact: Esc + Compact --> Compact: Esc (no-op) +``` + +Layout-mode key: `brunch:chat-layout-mode:{specificationId}`. Maximize is not directly bindable via a button — it lives between `side-docked` and `full` in the Esc-decrement chain. + +### Per-chat composer mode + +| Mode (user-facing) | `chat.mode` | Tool gating | +| ------------------ | ----------- | ------------------------------------------------------------------------------------ | +| **Ask** | `explore` | Read-only assistant. No `propose_*` tools registered. | +| **Edit** | `edit` | `propose_edit`, `propose_edge`, `propose_drill_down`. Edits stage into patch panel. | + +The mode is **persisted on the chat row forever** at first send (per `UNIFIED_CHAT_UX.md` §2). Toggling later updates the column; tool gating follows the latest mode, but the original kind chip stays. + +### Secondary chat lifecycle + +```mermaid +flowchart LR + A[User clicks row
or 'Open side-chat'] --> B[useSecondaryChatTrigger.create] + B --> C{Existing chat for
parent + item?} + C -- yes --> D[Re-focus existing chat
kickoffTurnId=null] + C -- no --> E[Create chat row
+ kickoff turn] + D --> F[presence.focusChat] + E --> F + F --> G[Shell expands if needed
Tab/Switcher highlights] + G --> H[Compose → POST .../messages
UIMessage protocol] + H --> I{Mode = edit?} + I -- yes --> J[propose_* tool stream
→ extractStagedIntents
→ patch panel] + I -- no --> K[Plain assistant turn] + J --> L[Bulk Apply at panel] + K --> M[Conversation continues] +``` + +Dedupe key for item chats: `(parent_chat_id, pinned_item_id, pinned_reconciliation_need_id IS NULL)`. Reconciliation chats always create fresh (`reconciliationNeedId` non-null). + +--- + +## End-to-end flow comparison (vs. old testing guide checklist) + +| Old guide step | FE-716 state | +| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | +| 1. Surface & pinned context (chat-with, multi-pin, drag-to-side) | Replaced by row-action → secondary chat tab. **Multi-pin not yet** — anchor columns support `anchored_item_ids` but UI only surfaces the primary `pinned_item_id`. | +| 2. Explore | ✅ Ask mode on an empty/master chat. Streaming reply; nothing stages. | +| 3. Annotate | ⚠️ **Not in V1.** The annotate composer was popover-bound; no shell equivalent yet. (Annotation API still exists; trigger UX needs a follow-up.) | +| 4. Direct row edit | ✅ Unchanged. Lands in the shell patch panel along with chat-driven patches. | +| 5. Soft-tier edit | ✅ Unchanged. Applies in-place with toast. | +| 6. Hard-tier edit → Pending review | ✅ Pending review now renders in the shell body above the patch panel. | +| 7. Run agent classifier chips | ✅ Unchanged classifier pipeline (`ClassificationChip`). Bulk Confirm-all / Apply-all-suggested unchanged. Full Track 3 UX deferred. | +| 8. Re-entrant cascade ("Edit target") | ✅ Still works through the same Pending review row affordances. | +| 9. Propose-edge | ✅ `propose_edge` tool available in Edit mode; multi-pin caveat applies (one anchor in UI). | +| 10. Drill-down | ✅ `propose_drill_down` tool available in Edit mode. | + +--- + +## What's missing & why — mapped to Conversational Workspace Runtime tracks + +`docs/design/CONVERSATIONAL_WORKSPACE_RUNTIME.md` §5 defines five tracks. 
FE-716 lands V1 of Track 2 only. Each missing capability below cites the tracking doc. + +### Track 2 (`chat-runtime-secondary-chats`) — what's still pending **inside this frontier** + +- **C7 Agent-run inline rendering.** Substrate ready (`first_turn_role='system'` projects `agent_run` flavor), but no producer emits an agent-run secondary chat yet. Blocked on a consumer. +- **Annotate composer in the shell.** V3.1 annotate flow was popover-only. Re-surfacing it on the shell needs a small follow-up card (selection menu → secondary chat with annotation composer). +- **Multi-pin UI.** `anchored_item_ids` column exists; the shell only renders the single `pinned_item_id`. Adding chips for additional anchors is a UI follow-up. +- **Per-tab close + tab reordering.** Parked (CARDS §parking lot). +- **`!` and `$` mention chips on the composer.** `#` lands in V1; `$` and `!` are Track 5. +- **Slice 4b transcript popover / Slice 5–6 footer chat.** Briefly built then **reverted on 2026-05-19** because the expandable shell direction was reinstated. Orphan modules (`chat-transcript-popover.tsx`, unused props on `chat-tabs.tsx` + `secondary-chat-host.tsx`) are still in tree pending a deletion sign-off. + +### Track 3 (`reconciliation-runtime`) — entirely deferred + +Per `CONVERSATIONAL_WORKSPACE_RUNTIME.md` §3.3. V1 keeps the existing `` as a flat list inside the shell body. **Not yet**: + +- Target-grouped reconciliation **chat** (one secondary chat per target group). +- Async-by-default classifier scheduling (today's classifier runs on user click). +- **"Reconcile Now"** explicit trigger affordance. +- `` retirement after target-grouped UX reaches parity. +- Workspace-level subtle badges on items with open non-auto-confirmed needs. + +The C9 panel ("elements being reconciled") inside a secondary chat is a **lightweight bridge** only — it labels the source/target endpoints when a substantive row opens a chat. It is explicitly **not** the Track 3 UX. 
+ +### Track 4 (`changeset-ledger`) — entirely deferred + +Per `CONVERSATIONAL_WORKSPACE_RUNTIME.md` §3.4. Patches today are still transitional client state; accepted edits flow through existing Brunch-owned handlers. **Not yet**: + +- `changeset` / `change` tables and durable mutation history. +- `reconciliation_need.caused_by_changeset_id` wiring. +- Provenance bundling (originating turn/chat, base semantic state). +- Renaming client `patch` vocabulary to `change`. + +### Track 5 (`chat-context-provision`) — V1-narrow only + +Per `CONVERSATIONAL_WORKSPACE_RUNTIME.md` §3.5. V1 ships only the `#REF-CODE` resolver (server-owned, scoped to the spec). **Not yet**: + +- `$` thread mention symbol. +- `!` annotation/artifact mention symbol. +- `@` code reference (reserved). +- The snapshot builder family (`buildIntentItemContextSnapshot`, neighborhood-mode, economic-graph, historical). +- **Context handles** with item-version-gated freshness refresh (needs Track 4 changeset-backed versions). +- Per-kind kickoff copy variations (V1 ships one generic template `Anchored to ''.`). +- Turn-zero per-kind prompt assembly. +- T5-anchor-projection (drop `chat.anchored_item_ids` column in favor of transcript `anchor_op` events) — currently queued. +- T5-mention-snapshot (resolved `#` rendered as snapshot artifact, not synthetic user-bubble text) — Lu's `secondary-chat-route.ts` user-parts smuggle is still in place. + +### Visual / interaction debt called out in the brief but not built + +From `UNIFIED_CHAT_UX.md`, the following remain unrealized in V1 (most are explicit deferrals in CARDS.md parking lot): + +- **Shift+Tab** keyboard toggle between Ask/Edit modes. +- **LLM-generated context-aware suggestions** (V1 ships static-per-mode chips). +- **Item-anchored badge** in structured-list / graph view (trailing `◉ N` chip per kind). 
+- **Typed data parts** for `thread.kickoff`, `thread.suggestions`, `thread.mention_resolved`, `thread.reconciliation_summary`, `thread.agent_progress`. The `useChat` refit enables these; schemas land when a consumer needs them. +- **Ladle prototype** (§13). Skipped — components built directly against `ai-elements/*`. +- **Patch surface hybrid pill** — when the shell is minimized/closed the patch panel is invisible. Top-bar `N pending · Apply · Undo` pill would restore workspace-wide visibility (CARDS parking lot). +- **Soft gradient wash** on the chat panel from the SgAI Figma — deferred to brand pass. +- **Tab-face last-turn snippet** and **per-tab close affordance** — parked. + +--- + +## Talk-track shortlist + +If the call only has time for the highlights: + +1. We've **landed the unified chat surface** and **retired the popover**. Substrate is just columns on `chat`; no `thread` table. +2. **Secondary chats are first-class**: tabs, switcher, per-chat streaming, durable across reload, Ask/Edit modes, per-chat tool gating. +3. **Patches and pending review live inside the shell now**, with a single bulk-Apply panel — much shorter mental loop than V3.1's three-surface dance. +4. We **deliberately did not build Track 3 (reconciliation)**, **Track 4 (changeset ledger)**, or most of **Track 5 (context provision)**. The shell is the host; those tracks layer on top. +5. The annotate flow and a few visual flourishes (mention `$`/`!`, agent-run inline, multi-pin UI, item-anchored badges) are known gaps awaiting follow-up scope cards. 
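The dedupe rule under *Secondary chat lifecycle* is easy to misread, so here is a minimal TypeScript sketch of it under stated assumptions. The row and request shapes and the `findReusableChat` name are hypothetical, not Brunch's actual API. Only two things come from this walkthrough: the column semantics (`parent_chat_id`, `pinned_item_id`, `pinned_reconciliation_need_id`) and the rule that reconciliation chats always create fresh.

```typescript
// Hypothetical row shape mirroring the chat columns named in the walkthrough.
export interface SecondaryChatRow {
  id: string;
  parentChatId: string;
  pinnedItemId: string | null;
  pinnedReconciliationNeedId: string | null;
}

// Hypothetical trigger request; reconciliationNeedId is set when a
// Pending review substantive row opens the chat.
export interface CreateRequest {
  parentChatId: string;
  itemId: string;
  reconciliationNeedId?: string | null;
}

/**
 * Returns an existing chat to re-focus, or null when a fresh chat row
 * (plus kickoff turn) should be created instead.
 */
export function findReusableChat(
  existing: readonly SecondaryChatRow[],
  req: CreateRequest,
): SecondaryChatRow | null {
  // Reconciliation chats never dedupe: a non-null need id always creates fresh.
  if (req.reconciliationNeedId != null) return null;

  // Item chats dedupe on (parent_chat_id, pinned_item_id), considering only
  // rows that are not pinned to a reconciliation need.
  return (
    existing.find(
      (c) =>
        c.parentChatId === req.parentChatId &&
        c.pinnedItemId === req.itemId &&
        c.pinnedReconciliationNeedId === null,
    ) ?? null
  );
}
```

A `null` result corresponds to the "Create chat row + kickoff turn" branch of the lifecycle diagram. A returned row corresponds to "Re-focus existing chat".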
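The layout-mode transitions from the *Layout mode* section can also be sketched as pure functions. This is an illustration, not the shell's code, and the function names are hypothetical. The section does not say what the dock and maximize buttons do outside their documented pairs. Here they are assumed to leave the mode unchanged. The Esc chain and the `brunch:chat-layout-mode:{specificationId}` key format follow the walkthrough.

```typescript
// The four layout tiers named in the walkthrough.
export type LayoutMode = 'compact' | 'side-docked' | 'maximize' | 'full';

// Esc walks one tier down; compact is the floor, so Esc there is a no-op.
const ESC_NEXT: Record<LayoutMode, LayoutMode> = {
  full: 'maximize',
  maximize: 'side-docked',
  'side-docked': 'compact',
  compact: 'compact',
};

export function onEscape(mode: LayoutMode): LayoutMode {
  return ESC_NEXT[mode];
}

// Dock button: toggles compact <-> side-docked.
// Assumption: any other mode is left unchanged.
export function toggleDock(mode: LayoutMode): LayoutMode {
  if (mode === 'compact') return 'side-docked';
  if (mode === 'side-docked') return 'compact';
  return mode;
}

// Maximize/restore button: toggles side-docked <-> full.
// Assumption: any other mode is left unchanged.
export function toggleFull(mode: LayoutMode): LayoutMode {
  if (mode === 'side-docked') return 'full';
  if (mode === 'full') return 'side-docked';
  return mode;
}

// Per-spec localStorage key, following the documented format.
export function layoutModeKey(specificationId: string): string {
  return `brunch:chat-layout-mode:${specificationId}`;
}
```

A lookup table makes the asymmetry explicit. `maximize` is reachable only by pressing Esc from `full`, which matches the note that no button binds it directly.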
diff --git a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts index 88107929..b9bc268a 100644 --- a/src/orchestrator/src/engine-petri.ts +++ b/src/orchestrator/src/engine-petri.ts @@ -393,11 +393,15 @@ export class PetriOrchestrator implements Orchestrator { for (const slice of input.plan.slices) { if (!ctx.sliceOutcomes.has(slice.id)) { ctx.sliceOutcomes.set(slice.id, { sliceId: slice.id, status: 'halted' }); + ctx.halted = true; + ctx.haltReason ??= 'Some slices were never reached'; } } for (const epic of input.plan.epics) { if (!ctx.epicOutcomes.has(epic.id)) { ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'halted' }); + ctx.halted = true; + ctx.haltReason ??= 'Some epics were never reached'; } } diff --git a/src/orchestrator/src/report-helpers.ts b/src/orchestrator/src/report-helpers.ts index b7ff6b30..4653dc96 100644 --- a/src/orchestrator/src/report-helpers.ts +++ b/src/orchestrator/src/report-helpers.ts @@ -1,8 +1,10 @@ import type { ReportLine, ReportSink } from './types.js'; +let seq = 0; + /** Create and append a report line, returning its id. 
*/ export function createReport(sink: ReportSink, fields: Omit): string { - const id = `rpt-${fields.actor}-${fields.sliceId || fields.epicId}-${Date.now()}`; + const id = `rpt-${fields.actor}-${fields.sliceId || fields.epicId}-${Date.now()}-${seq++}`; const line: ReportLine = { id, ts: new Date().toISOString(), From 6557da86b821e9473c0a047b3473a32e6d7d0c95 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 13:08:07 +0200 Subject: [PATCH 16/22] FE-730: fix epic dep fan-out starvation + proc unreached items - engine-petri: epic deps use per-dependent signal places (same pattern as slice deps) so multiple epics depending on the same predecessor each get their own token instead of competing for one - engine-proc: haltedResult() fills in unreached epics/slices as halted before returning, matching petri engine behavior Co-authored-by: Amp --- src/orchestrator/src/engine-petri.ts | 26 ++++++++++++--- src/orchestrator/src/engine-proc.ts | 48 ++++++++++++++++++++-------- 2 files changed, 56 insertions(+), 18 deletions(-) diff --git a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts index b9bc268a..dcbf2d49 100644 --- a/src/orchestrator/src/engine-petri.ts +++ b/src/orchestrator/src/engine-petri.ts @@ -108,12 +108,18 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { // (deferred until eligible places exist — see below) const seedEpics = plan.epics.filter((e) => e.depends_on.length === 0); - // Epic dependency transitions — ALL deps done → fan out to next epic's slices + // Epic dependency wiring — per-dependent signal places (avoids token starvation + // when multiple epics depend on the same predecessor) for (const epic of plan.epics) { if (epic.depends_on.length > 0) { + const signalPlaces = epic.depends_on.map((depId) => { + const signalPlace = ep(depId, `dep-signal:${epic.id}`); + net.addPlace(signalPlace); + return signalPlace; + }); net.addTransition({ id: `epic-deps-met:${epic.id}`, - inputs: 
epic.depends_on.map((depId) => ep(depId, 'done')), + inputs: signalPlaces, fire: async () => epicReadyOutputs(epic.id), }); } @@ -293,6 +299,18 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { if (epicSlices.length === 0) continue; + // Find epics that depend on this one — emit dep-signal tokens on completion + const epicDependents = plan.epics.filter((e) => e.depends_on.includes(epic.id)); + function epicDoneOutputs(): { place: string; token: Token }[] { + const outputs: { place: string; token: Token }[] = [ + { place: ep(epic.id, 'done'), token: { sliceId: '', epicId: epic.id } }, + ]; + for (const dep of epicDependents) { + outputs.push({ place: ep(epic.id, `dep-signal:${dep.id}`), token: { sliceId: '', epicId: epic.id } }); + } + return outputs; + } + if (epic.verification.length === 0) { // No verification — slices done → epic done net.addTransition({ @@ -300,7 +318,7 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { inputs: completedPlaces, fire: async () => { ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); - return [{ place: ep(epic.id, 'done'), token: { sliceId: '', epicId: epic.id } }]; + return epicDoneOutputs(); }, }); } else { @@ -331,7 +349,7 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { const passed = !!(report?.payload as { passed?: boolean })?.passed; if (passed) { ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); - return [{ place: ep(epic.id, 'done'), token: { sliceId: '', epicId: epic.id } }]; + return epicDoneOutputs(); } ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'halted' }); ctx.halted = true; diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index 97cfa88d..c68e361e 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -6,6 +6,7 @@ import type { Orchestrator, OrchestratorInput, OrchestratorResult, + Plan, Slice, SliceOutcome, } 
from './types.js'; @@ -57,13 +58,13 @@ export class ProceduralOrchestrator implements Orchestrator { if (epicHalted) { epicOutcomes.push({ epicId: epic.id, status: 'halted' }); - return { - status: 'halted', - reason: `Epic ${epic.id} halted due to slice failure`, - reports: reportIds, - epics: epicOutcomes, - slices: sliceOutcomes, - }; + return this.haltedResult( + plan, + `Epic ${epic.id} halted due to slice failure`, + reportIds, + epicOutcomes, + sliceOutcomes, + ); } // Epic-level verification (one call — handler owns all targets) @@ -79,13 +80,13 @@ export class ProceduralOrchestrator implements Orchestrator { const verifyReport = reports.getById(verifyId); if (verifyReport && !(verifyReport.payload as { passed?: boolean }).passed) { epicOutcomes.push({ epicId: epic.id, status: 'halted' }); - return { - status: 'halted', - reason: `Epic ${epic.id} verification failed`, - reports: reportIds, - epics: epicOutcomes, - slices: sliceOutcomes, - }; + return this.haltedResult( + plan, + `Epic ${epic.id} verification failed`, + reportIds, + epicOutcomes, + sliceOutcomes, + ); } } @@ -100,6 +101,25 @@ export class ProceduralOrchestrator implements Orchestrator { }; } + /** Fill in unreached items as halted before returning a halted result. 
*/ + private haltedResult( + plan: Plan, + reason: string, + reportIds: string[], + epicOutcomes: EpicOutcome[], + sliceOutcomes: SliceOutcome[], + ): OrchestratorResult { + const seenEpics = new Set(epicOutcomes.map((e) => e.epicId)); + const seenSlices = new Set(sliceOutcomes.map((s) => s.sliceId)); + for (const epic of plan.epics) { + if (!seenEpics.has(epic.id)) epicOutcomes.push({ epicId: epic.id, status: 'halted' }); + } + for (const slice of plan.slices) { + if (!seenSlices.has(slice.id)) sliceOutcomes.push({ sliceId: slice.id, status: 'halted' }); + } + return { status: 'halted', reason, reports: reportIds, epics: epicOutcomes, slices: sliceOutcomes }; + } + private async executeSlice( slice: Slice, epic: Epic, From 62248f0f7e506877c3ec1e0d28556639ddf9c6e0 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 13:18:30 +0200 Subject: [PATCH 17/22] FE-730: remove stale fe-716 walkthrough doc (leaked from rebase) Co-authored-by: Amp --- docs/next/fe-716-chat-runtime-walkthrough.md | 309 ------------------- 1 file changed, 309 deletions(-) delete mode 100644 docs/next/fe-716-chat-runtime-walkthrough.md diff --git a/docs/next/fe-716-chat-runtime-walkthrough.md b/docs/next/fe-716-chat-runtime-walkthrough.md deleted file mode 100644 index c7f6bbf3..00000000 --- a/docs/next/fe-716-chat-runtime-walkthrough.md +++ /dev/null @@ -1,309 +0,0 @@ -# FE-716 — Chat Runtime Walkthrough (call notes) - -> Reference brief for talking through PR #141 (`ka/fe-716-chat-runtime-unified-secondary-chats`, merged). -> Pairs with `docs/design/CONVERSATIONAL_WORKSPACE_RUNTIME.md` (umbrella) and `docs/design/UNIFIED_CHAT_UX.md` (visual brief). -> Old testing guide referenced is the **V3.1 side-chat popover** flow — most of its surface is gone; capabilities are re-surfaced on a different shape. - ---- - -## Executive summary - -PR #141 lands **V1 of Track 2 (`chat-runtime-secondary-chats`)** from the Conversational Workspace Runtime. 
It replaces the V3.1 `SideChatPopover` + `SideChatHost` popover machinery with a **unified expandable chat shell** that hosts a primary "master" chat plus N **secondary chats** anchored to items or reconciliation needs — all on the existing `chat` + `turn` substrate (no new `thread` table per Decision D153). - -What changed for the user: - -- A persistent **chat shell** docks to the right of the workspace (default `side-docked` ~50%; toggles to `compact`, `maximize`, `full`). -- Clicking **chat-with on a row** no longer opens a popover — it opens (or focuses) a **secondary chat tab** inside the shell, with the item pinned. -- Each chat is durable, switchable (tab strip + dropdown when 2+ item chats exist), and runs its own streaming `useChat` instance in parallel. -- **Ask** and **Edit** modes are a per-chat toggle (Ask = `explore`, Edit = `edit`). Edit gates `propose_edit / propose_edge / propose_drill_down` tools. -- **Staged patches** from any chat collect in a single shell-level panel (``) — bulk Apply at the header, per-row Discard. The old top-bar `` is mount-commented out. -- **Pending review** (reconciliation needs) is hoisted into the shell body above the patch panel. The substantive-row trigger now opens a secondary chat instead of focusing the popover. -- Layout state, presence (`expanded` / `minimized` / `closed`), and the open chat survive across navigation; layout mode persists per-spec in localStorage. - -What did **not** ship in V1 (with the design doc's track ownership): - -| Missing capability | Owning track / frontier | Reason it's not here | -| --------------------------------------------------- | -------------------------------------- | --------------------------------------------------------------------------------------------------- | -| `$` thread mentions, `!` annotation mentions, `@` | Track 5 — `chat-context-provision` | Mention substrate beyond `#REF-CODE` is part of context provision, not chat runtime. 
| -| Snapshot builder family + handle refresh | Track 5 — `chat-context-provision` | Needs Track 4 changeset-backed item versions before stale-handle freshness is meaningful. | -| Target-grouped reconciliation UX + "Reconcile Now" | Track 3 — `reconciliation-runtime` | V1 only resurfaces the **existing** PendingReviewSection inside the shell; full Track 3 UX is later.| -| Full `PendingReviewSection` retirement | Track 3 — `reconciliation-runtime` | Same as above — retirement happens once Track 3 ships parity. | -| Changeset / change tables (semantic mutation spine) | Track 4 — `changeset-ledger` | Runs in parallel; not blocking Track 2 surface work. | -| Agent-run inline rendering (C7) | Within FE-716 scope but **blocked** | No producer surface emits agent-run secondary chats yet; substrate is ready. | -| Strategy sub-chat UI | Follow-up frontier | Strategy is chat-local state; surface design waits for use cases. | -| Shift+Tab mode toggle, Ladle prototype | Skipped / parked | Mode toggle is in the composer chip; visual iteration adopts `ai-elements` directly, not isolated stories. | -| Persisting assistant turns as structured `parts[]` | Follow-up | Wire-protocol carries `BrunchAssistantPart[]`, persistence stays text-only until a consumer needs it.| - -`npm run verify` is green at merge: 108 test files / 1273 tests. - ---- - -## What changed vs. the old testing guide - -The old testing guide (V3.1) walked through a **popover surface** with a top-bar staged patch list and a standalone Pending review section. After FE-716: - -| Old guide concept | FE-716 mapping | -| ------------------------------ | ----------------------------------------------------------------------------------------------------------------------- | -| Side-chat popover + drag-out | **Retired.** Replaced by `` docked to the right. | -| Span-level "💬 Chat / 📝 Annotate" floating menu | The Chat path opens a secondary chat (no popover). 
**Annotate flow not re-surfaced in V1** (no annotation composer in the shell yet — needs a follow-up scope card). | -| Top-bar `N change(s) · Undo · Apply` | **``** mounted inside the shell body. Header shows count + Apply; rows show kind, summary, impact chip, optional ContentDiff, Discard. | -| Pending review section (separate) | Same `` component, now mounted **inside the shell body** above the patch panel. | -| Run agent / classifier chips | **Unchanged** — `` + pending-review rows reuse the existing classifier flow. Track 3 retirement is later. | -| Substantive row → "Open side-chat" | Now opens a **secondary chat** (with `pinned_reconciliation_need_id` set so the C9 panel renders). | -| Direct row edit | Unchanged. Still stages into the same shared patch list — now visible in the shell panel. | -| Soft/Hard impact tier routing | Unchanged. Soft applies in-place + toast; Hard opens Pending review rows. | -| Propose-edge / Drill-down | Unchanged tool surface; tools gated on Edit mode of a secondary chat. | - ---- - -## Architecture at a glance - -```mermaid -flowchart TB - subgraph Workspace - WS[ContinuousWorkspaceView
structured-list + graph] - TR[Triggers
StructuredListView ItemActionRail
PendingReviewSection 'Open side-chat'] - end - subgraph Shell["UnifiedChatShell (right rail / dock / full)"] - H[Header
tabs · switcher · layout buttons · minimize · close] - STK[Sticky overlays
PendingReviewSection + ChatShellPatchPanel] - ACT[Active SecondaryChatHost
useChat per chat · transcript + composer] - BG[Background hosts
hidden but mounted · stream/unread dots] - FT[Footer
composer portal target] - end - Server[(chat + turn substrate
parent_chat_id IS NOT NULL ⇒ secondary)] - - TR -- useSecondaryChatTrigger().create --> Server - Server -- bundle.secondaryChats[] --> Shell - H --> ACT - H --> BG - ACT -- propose_* tool calls --> STK - STK -- Apply --> Server -``` - -Substrate (no new tables, four migrations): - -- `drizzle/0020` — `chat.parent_chat_id`, `invoked_in_turn_id`, `pinned_item_id`, `pinned_span_hint` -- `drizzle/0021` — `chat.mode` (`'explore' | 'edit' | null`) -- `drizzle/0022` — `chat.pinned_reconciliation_need_id` -- `drizzle/0023` — `chat.anchored_item_ids` (Track 5 will revisit this column — see C deferral) - -Secondary-chat projection is purely `parent_chat_id IS NOT NULL`. There is **no `chat.kind = 'secondary'` enum**; kind enum stays `interview | side_chat`. - ---- - -## UI elements — what each one does - -### Shell chrome - -| Element (testid) | Role | -| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `unified-chat-shell` | The expanded shell wrapper. `data-layout-mode` ∈ {compact, side-docked, maximize, full}. | -| `unified-chat-shell-minimized` (the "Ask brunch" pill) | Bottom-right pill when shell is minimized. Shows open-chat count. Click → expand. | -| `unified-chat-shell-header` | Header strip with tabs/switcher on the left, control buttons on the right. | -| `unified-chat-shell-minimize` | Minimize → pill. Hosts stay mounted; streaming/unread dots keep firing. | -| `unified-chat-shell-layout-side-docked` | Toggles between `compact` and `side-docked`. Active state tinted. | -| `unified-chat-shell-layout-toggle` | Toggles between `side-docked` and `full`. (Maximize tier exists internally; Esc walks down through it.) | -| `unified-chat-shell-close` | Close — collapses the shell entirely (presence `closed`). Re-opening reconstructs from bundle. 
| -| `unified-chat-shell-tabs` / `ChatTabs` | Tab strip — one tab per "promoted" chat. Tabs carry streaming dot (emerald, animated) and/or unread dot (sky-blue). Active-tab click is a no-op or re-focus. | -| `chat-switcher-trigger` | Dropdown when 2+ item-anchored chats exist; aggregates streaming/unread state across hidden chats; trigger styled with the active chat's `kindAccentHex`. | -| `unified-chat-shell-body` | Scrollable body. Scrollbar takes the active chat's kind accent at 20% opacity. | -| `chat-shell-sticky-overlays` | Sticky band at top of body when patches or reconciliation needs exist; collapses entirely when both feeds are empty. | -| `unified-chat-shell-scroll-to-bottom` | Arrow button that surfaces when scroll position is > 50% from bottom. | -| `unified-chat-shell-footer` | Empty container that the active chat's composer portals into. | - -### Inside the active chat (`` + ``) - -| Element (testid) | Role | -| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `secondary-chat-collapsible` | Transcript surface. Carries `data-secondary-chat-id` + `data-accent-hex` (anchor kind's accent). | -| `secondary-chat-reconciliation-panel` | Renders only when `pinnedReconciliationNeed` is set. Shows need kind label (Supersedes / Needs confirmation) + source ref-code + target ref-code + excerpts. | -| Kickoff turn (`kickoffContent`) | First assistant message — `Anchored to ''.` (with `, focused on ''` when a span hint was passed). | -| `SecondaryChatFreshStateHero` (turn-zero) | Shown when no turns exist. Three static "How to start" chips (`Summarize this spec`, `What needs attention?`, `Suggest next steps`) + ``. | -| `secondary-chat-bottom-anchor` | Sentinel for autoscroll; scrolls into view on new turn or streaming-text growth. 
| -| `secondary-chat-jump-to-anchor` | "Jump" button (Crosshair icon) in collapsible header when `invoked_in_turn_id` is set. Smooth-scrolls workspace center to the originating turn + briefly highlights it. | -| `secondary-chat-kind-chip` | Mode chip in header — `PencilLine` "Edit" or `MessageCircleQuestion` "Ask". | -| `` (portaled to footer) | Textarea + Send. Placeholder copy depends on mode + pinned-state + turn-zero (e.g. "Propose a change to your spec…"). | -| Mode toggle (in composer) | Segmented Ask/Edit. PATCH `…/mode`. Locked while a request is in flight. | -| `` | `#`-triggered autocomplete on the composer. Resolves to knowledge-item ref codes (e.g. `#G1`, `#D5`). Adds a context snapshot to the next user message. | -| `` | Edit-mode-only suggestion chips above composer (suppressed on turn-zero). | - -### Shell-mounted overlays - -| Element | Role | -| ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `` (in shell body) | **Unchanged** component, relocated. Lists open reconciliation needs. Substantive row's "Open side-chat" now triggers `useSecondaryChatTrigger().create({ kind, id, reconciliationNeedId })`. | -| `` (`chat-shell-patch-panel`) | Single union view of staged patches across **all** chats (uses `usePatchList()`, not the per-chat partition). Header: `N change(s)` + Apply. Per row: kind icon (Pencil/Spline/ArrowDownToDot/NotebookPen), ImpactChip (edits), summary, ref-code, Run-agent button (placeholder, no backend yet), Discard (X). Edits render an inline `` when before/after differ. | -| `` | Floats above the pill (when minimized) or in body (when expanded). Survives shell remounts via `useStablePatchListEnv`. 5s Undo window. 
| - -### Workspace-side triggers - -| Trigger | Surface | -| -------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | -| `ItemActionRail` in `StructuredListView` | Adds an "Open inline chat" button (`data-graph-action="open-inline-chat"`, `MessagesSquare` icon) on each row. | -| `PendingReviewSection` substantive row "Open side-chat" | Calls trigger with `reconciliationNeedId` so the C9 panel renders. | -| `WorkspaceArtifactRow` (`data-anchor-turn-id`) | Scroll target for `secondary-chat-jump-to-anchor`. Threaded through every relevant turn-render artifact. | - ---- - -## Modes & state machines - -### Presence (shell appearance) - -```mermaid -stateDiagram-v2 - [*] --> Expanded - Expanded --> Minimized: click Minimize - Expanded --> Closed: click X - Minimized --> Expanded: click "Ask brunch" pill - Closed --> Expanded: trigger creates chat
(presence.expand()) - note right of Minimized - Hosts stay mounted. - Streaming + unread dots - keep firing. - end note - note right of Closed - Hosts unmount. - Re-open rehydrates - from bundle. - end note -``` - -### Layout mode (per-spec localStorage) - -```mermaid -stateDiagram-v2 - [*] --> SideDocked: default - Compact --> SideDocked: toggle dock button - SideDocked --> Compact: toggle dock button - SideDocked --> Full: toggle maximize button - Full --> SideDocked: toggle restore button - Full --> Maximize: Esc - Maximize --> SideDocked: Esc - SideDocked --> Compact: Esc - Compact --> Compact: Esc (no-op) -``` - -Layout-mode key: `brunch:chat-layout-mode:{specificationId}`. Maximize is not directly bindable via a button — it lives between `side-docked` and `full` in the Esc-decrement chain. - -### Per-chat composer mode - -| Mode (user-facing) | `chat.mode` | Tool gating | -| ------------------ | ----------- | ------------------------------------------------------------------------------------ | -| **Ask** | `explore` | Read-only assistant. No `propose_*` tools registered. | -| **Edit** | `edit` | `propose_edit`, `propose_edge`, `propose_drill_down`. Edits stage into patch panel. | - -The mode is **persisted on the chat row forever** at first send (per `UNIFIED_CHAT_UX.md` §2). Toggling later updates the column; tool gating follows the latest mode but original kind chip stays. - -### Secondary chat lifecycle - -```mermaid -flowchart LR - A[User clicks row
or 'Open side-chat'] --> B[useSecondaryChatTrigger.create] - B --> C{Existing chat for
parent + item?} - C -- yes --> D[Re-focus existing chat
kickoffTurnId=null] - C -- no --> E[Create chat row
+ kickoff turn] - D --> F[presence.focusChat] - E --> F - F --> G[Shell expands if needed
Tab/Switcher highlights] - G --> H[Compose → POST .../messages
UIMessage protocol] - H --> I{Mode = edit?} - I -- yes --> J[propose_* tool stream
→ extractStagedIntents
→ patch panel] - I -- no --> K[Plain assistant turn] - J --> L[Bulk Apply at panel] - K --> M[Conversation continues] -``` - -Dedupe key for item chats: `(parent_chat_id, pinned_item_id, pinned_reconciliation_need_id IS NULL)`. Reconciliation chats always create fresh (`reconciliationNeedId` non-null). - ---- - -## End-to-end flow comparison (vs. old testing guide checklist) - -| Old guide step | FE-716 state | -| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | -| 1. Surface & pinned context (chat-with, multi-pin, drag-to-side) | Replaced by row-action → secondary chat tab. **Multi-pin not yet** — anchor columns support `anchored_item_ids` but UI only surfaces the primary `pinned_item_id`. | -| 2. Explore | ✅ Ask mode on an empty/master chat. Streaming reply; nothing stages. | -| 3. Annotate | ⚠️ **Not in V1.** The annotate composer was popover-bound; no shell equivalent yet. (Annotation API still exists; trigger UX needs a follow-up.) | -| 4. Direct row edit | ✅ Unchanged. Lands in the shell patch panel along with chat-driven patches. | -| 5. Soft-tier edit | ✅ Unchanged. Applies in-place with toast. | -| 6. Hard-tier edit → Pending review | ✅ Pending review now renders in the shell body above the patch panel. | -| 7. Run agent classifier chips | ✅ Unchanged classifier pipeline (`ClassificationChip`). Bulk Confirm-all / Apply-all-suggested unchanged. Full Track 3 UX deferred. | -| 8. Re-entrant cascade ("Edit target") | ✅ Still works through the same Pending review row affordances. | -| 9. Propose-edge | ✅ `propose_edge` tool available in Edit mode; multi-pin caveat applies (one anchor in UI). | -| 10. Drill-down | ✅ `propose_drill_down` tool available in Edit mode. | - ---- - -## What's missing & why — mapped to Conversational Workspace Runtime tracks - -`docs/design/CONVERSATIONAL_WORKSPACE_RUNTIME.md` §5 defines five tracks. 
FE-716 lands V1 of Track 2 only. Each missing capability below cites the tracking doc. - -### Track 2 (`chat-runtime-secondary-chats`) — what's still pending **inside this frontier** - -- **C7 Agent-run inline rendering.** Substrate ready (`first_turn_role='system'` projects `agent_run` flavor), but no producer emits an agent-run secondary chat yet. Blocked on a consumer. -- **Annotate composer in the shell.** V3.1 annotate flow was popover-only. Re-surfacing it on the shell needs a small follow-up card (selection menu → secondary chat with annotation composer). -- **Multi-pin UI.** `anchored_item_ids` column exists; the shell only renders the single `pinned_item_id`. Adding chips for additional anchors is a UI follow-up. -- **Per-tab close + tab reordering.** Parked (CARDS §parking lot). -- **`!` and `$` mention chips on the composer.** `#` lands in V1; `$` and `!` are Track 5. -- **Slice 4b transcript popover / Slice 5–6 footer chat.** Briefly built then **reverted on 2026-05-19** because the expandable shell direction was reinstated. Orphan modules (`chat-transcript-popover.tsx`, unused props on `chat-tabs.tsx` + `secondary-chat-host.tsx`) are still in tree pending a deletion sign-off. - -### Track 3 (`reconciliation-runtime`) — entirely deferred - -Per `CONVERSATIONAL_WORKSPACE_RUNTIME.md` §3.3. V1 keeps the existing `` as a flat list inside the shell body. **Not yet**: - -- Target-grouped reconciliation **chat** (one secondary chat per target group). -- Async-by-default classifier scheduling (today's classifier runs on user click). -- **"Reconcile Now"** explicit trigger affordance. -- `` retirement after target-grouped UX reaches parity. -- Workspace-level subtle badges on items with open non-auto-confirmed needs. - -The C9 panel ("elements being reconciled") inside a secondary chat is a **lightweight bridge** only — it labels the source/target endpoints when a substantive row opens a chat. It is explicitly **not** the Track 3 UX. 
- -### Track 4 (`changeset-ledger`) — entirely deferred - -Per `CONVERSATIONAL_WORKSPACE_RUNTIME.md` §3.4. Patches today are still transitional client state; accepted edits flow through existing Brunch-owned handlers. **Not yet**: - -- `changeset` / `change` tables and durable mutation history. -- `reconciliation_need.caused_by_changeset_id` wiring. -- Provenance bundling (originating turn/chat, base semantic state). -- Renaming client `patch` vocabulary to `change`. - -### Track 5 (`chat-context-provision`) — V1-narrow only - -Per `CONVERSATIONAL_WORKSPACE_RUNTIME.md` §3.5. V1 ships only the `#REF-CODE` resolver (server-owned, scoped to the spec). **Not yet**: - -- `$` thread mention symbol. -- `!` annotation/artifact mention symbol. -- `@` code reference (reserved). -- The snapshot builder family (`buildIntentItemContextSnapshot`, neighborhood-mode, economic-graph, historical). -- **Context handles** with item-version-gated freshness refresh (needs Track 4 changeset-backed versions). -- Per-kind kickoff copy variations (V1 ships one generic template `Anchored to ''.`). -- Turn-zero per-kind prompt assembly. -- T5-anchor-projection (drop `chat.anchored_item_ids` column in favor of transcript `anchor_op` events) — currently queued. -- T5-mention-snapshot (resolved `#` rendered as snapshot artifact, not synthetic user-bubble text) — Lu's `secondary-chat-route.ts` user-parts smuggle is still in place. - -### Visual / interaction debt called out in the brief but not built - -From `UNIFIED_CHAT_UX.md`, the following remain unrealized in V1 (most are explicit deferrals in CARDS.md parking lot): - -- **Shift+Tab** keyboard toggle between Ask/Edit modes. -- **LLM-generated context-aware suggestions** (V1 ships static-per-mode chips). -- **Item-anchored badge** in structured-list / graph view (trailing `◉ N` chip per kind). 
-- **Typed data parts** for `thread.kickoff`, `thread.suggestions`, `thread.mention_resolved`, `thread.reconciliation_summary`, `thread.agent_progress`. The `useChat` refit enables these; schemas land when a consumer needs them. -- **Ladle prototype** (§13). Skipped — components built directly against `ai-elements/*`. -- **Patch surface hybrid pill** — when the shell is minimized/closed the patch panel is invisible. Top-bar `N pending · Apply · Undo` pill would restore workspace-wide visibility (CARDS parking lot). -- **Soft gradient wash** on the chat panel from the SgAI Figma — deferred to brand pass. -- **Tab-face last-turn snippet** and **per-tab close affordance** — parked. - ---- - -## Talk-track shortlist - -If the call only has time for the highlights: - -1. We've **landed the unified chat surface** and **retired the popover**. Substrate is just columns on `chat`; no `thread` table. -2. **Secondary chats are first-class**: tabs, switcher, per-chat streaming, durable across reload, Ask/Edit modes, per-chat tool gating. -3. **Patches and pending review live inside the shell now**, with a single bulk-Apply panel — much shorter mental loop than V3.1's three-surface dance. -4. We **deliberately did not build Track 3 (reconciliation)**, **Track 4 (changeset ledger)**, or most of **Track 5 (context provision)**. The shell is the host; those tracks layer on top. -5. The annotate flow and a few visual flourishes (mention `$`/`!`, agent-run inline, multi-pin UI, item-anchored badges) are known gaps awaiting follow-up scope cards. 
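The item-chat dedupe rule stated in the removed guide above (`(parent_chat_id, pinned_item_id, pinned_reconciliation_need_id IS NULL)`, with reconciliation chats always created fresh) can be sketched in memory as follows. The guide describes the key only in terms of `chat` columns. `SecondaryChat`, `OpenRequest` and `ChatStore` are hypothetical names for illustration, not the project's API.

```typescript
// Hypothetical shapes: the guide names the key columns, not a store API.
type SecondaryChat = {
  id: string;
  parentChatId: string;
  pinnedItemId: string;
  pinnedReconciliationNeedId: string | null;
};

type OpenRequest = Omit<SecondaryChat, 'id'>;

// Mirrors (parent_chat_id, pinned_item_id, pinned_reconciliation_need_id IS NULL).
function dedupeKey(c: OpenRequest): string {
  return JSON.stringify([c.parentChatId, c.pinnedItemId, c.pinnedReconciliationNeedId === null]);
}

class ChatStore {
  private readonly itemChats = new Map<string, SecondaryChat>();
  private readonly all: SecondaryChat[] = [];

  open(req: OpenRequest): SecondaryChat {
    const isReconciliation = req.pinnedReconciliationNeedId !== null;
    if (!isReconciliation) {
      // Item chats: reuse the existing chat that shares the dedupe key.
      const existing = this.itemChats.get(dedupeKey(req));
      if (existing) return existing;
    }
    // Reconciliation chats (non-null need id) always get a fresh chat.
    const chat: SecondaryChat = { id: `chat-${this.all.length + 1}`, ...req };
    this.all.push(chat);
    if (!isReconciliation) this.itemChats.set(dedupeKey(req), chat);
    return chat;
  }

  get size(): number {
    return this.all.length;
  }
}
```

Opening the same item twice from one parent chat returns the same chat. Opening it twice for the same reconciliation need returns two distinct chats.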
From 9c12408651ae04553dc4bdeaf0b308037ec07e6f Mon Sep 17 00:00:00 2001
From: Kostandin Angjellari
Date: Thu, 21 May 2026 13:35:10 +0200
Subject: [PATCH 18/22] FE-730: make petri the default engine

Co-authored-by: Amp
---
 src/orchestrator/src/cook-cli.test.ts | 2 +-
 src/orchestrator/src/cook-cli.ts      | 2 +-
 src/server/cli.ts                     | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/orchestrator/src/cook-cli.test.ts b/src/orchestrator/src/cook-cli.test.ts
index d61bc008..8e8af74c 100644
--- a/src/orchestrator/src/cook-cli.test.ts
+++ b/src/orchestrator/src/cook-cli.test.ts
@@ -6,7 +6,7 @@ describe('parseCookArgs', () => {
   it('parses dir only', () => {
     const opts = parseCookArgs(['./fixtures/txt']);
     expect(opts.dir).toContain('fixtures/txt');
-    expect(opts.engine).toBe('proc');
+    expect(opts.engine).toBe('petri');
     expect(opts.maxRetries).toBe(3);
     expect(opts.verbose).toBe(false);
   });
diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts
index 602b943f..0c83436a 100644
--- a/src/orchestrator/src/cook-cli.ts
+++ b/src/orchestrator/src/cook-cli.ts
@@ -19,7 +19,7 @@ export type CookOptions = {
 
 export function parseCookArgs(args: string[]): CookOptions {
   let dir = '';
-  let engine: 'proc' | 'petri' = 'proc';
+  let engine: 'proc' | 'petri' = 'petri';
   let maxRetries = 3;
   let verbose = false;
 
diff --git a/src/server/cli.ts b/src/server/cli.ts
index a115cc30..c703c026 100644
--- a/src/server/cli.ts
+++ b/src/server/cli.ts
@@ -22,7 +22,7 @@ if (args.has('--help') || args.has('-h') || args.has('help')) {
   console.log(' cook [flags] Run the orchestrator on a plan directory.');
   console.log('');
   console.log('Cook flags:');
-  console.log(' --engine=proc|petri Execution engine (default: proc)');
+  console.log(' --engine=proc|petri Execution engine (default: petri)');
   console.log(' --max-retries=N Retry budget per slice (default: 3)');
   console.log(' --verbose, -v Show raw pi-agent output');
   process.exit(0);
From
085fc170f86a2d6fd434dfdbbd01e66290de3630 Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 13:41:34 +0200 Subject: [PATCH 19/22] =?UTF-8?q?FE-730:=20fix=20bot=20round=203=20?= =?UTF-8?q?=E2=80=94=20report=20IDs=20on=20throw,=20dead=20place,=20wrong?= =?UTF-8?q?=20prompt?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - engine-proc: hoist reportIds to run() scope so catch preserves them - engine-petri: remove dead ep(epicId, 'ready') place (readiness fans out directly to slice eligible places) - pi-actions: verify-epic write step uses test-writer.md, not evaluator.md Co-authored-by: Amp --- src/orchestrator/src/engine-petri.ts | 1 - src/orchestrator/src/engine-proc.ts | 8 ++++---- src/orchestrator/src/pi-actions.ts | 2 +- 3 files changed, 5 insertions(+), 6 deletions(-) diff --git a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts index dcbf2d49..e6058b56 100644 --- a/src/orchestrator/src/engine-petri.ts +++ b/src/orchestrator/src/engine-petri.ts @@ -93,7 +93,6 @@ function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { // Epic-level places for (const epic of plan.epics) { - net.addPlace(ep(epic.id, 'ready')); net.addPlace(ep(epic.id, 'done')); } diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index c68e361e..771ae264 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -13,22 +13,22 @@ import type { export class ProceduralOrchestrator implements Orchestrator { async run(input: OrchestratorInput): Promise { + const reportIds: string[] = []; try { - return await this.runInner(input); + return await this.runInner(input, reportIds); } catch (err) { return { status: 'halted', reason: err instanceof Error ? 
err.message : String(err), - reports: [], + reports: reportIds, epics: input.plan.epics.map((e) => ({ epicId: e.id, status: 'halted' as const })), slices: input.plan.slices.map((s) => ({ sliceId: s.id, status: 'halted' as const })), }; } } - private async runInner(input: OrchestratorInput): Promise { + private async runInner(input: OrchestratorInput, reportIds: string[]): Promise { const { plan, reports, actions, testRunner, policy } = input; - const reportIds: string[] = []; const sliceOutcomes: SliceOutcome[] = []; const epicOutcomes: EpicOutcome[] = []; diff --git a/src/orchestrator/src/pi-actions.ts b/src/orchestrator/src/pi-actions.ts index 848d593e..4c32682c 100644 --- a/src/orchestrator/src/pi-actions.ts +++ b/src/orchestrator/src/pi-actions.ts @@ -190,7 +190,7 @@ export function createPiActions(opts?: { verbose?: boolean }): ActionHandlers { runPi({ label: `verify ${ctx.epic.id} (write)`, model: 'claude-sonnet-4-6', - promptFile: join(promptsDir, 'evaluator.md'), + promptFile: join(promptsDir, 'test-writer.md'), task: writeTask, worktreeDir: ctx.worktreeDir, }); From a277aba9e944cc5899bd89d5889ff34f132076cd Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 18:57:52 +0200 Subject: [PATCH 20/22] =?UTF-8?q?FE-730:=20Phase=200=20=E2=80=94=20extract?= =?UTF-8?q?=20NetCompiler=20+=20Interpreter=20+=20FiringPolicy?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Extract PetriNet class, Token, TransitionDef, FiringPolicy into petri-net.ts - Extract compilePlan() and RunCtx into net-compiler.ts - Both engines now call shared compilePlan() with serial firing policy - Migrate retry state from ctx.retries Map into in-net retry-budget places - Add adapter tests pinning compiled net place/transition counts - engine-petri.ts and engine-proc.ts are thin wrappers (~65 LOC each) Co-authored-by: Amp --- src/orchestrator/src/engine-contract.test.ts | 80 ++++ src/orchestrator/src/engine-petri.ts | 382 
+------------------ src/orchestrator/src/engine-proc.ts | 250 +++--------- src/orchestrator/src/net-compiler.ts | 321 ++++++++++++++++ src/orchestrator/src/petri-net.ts | 87 +++++ 5 files changed, 535 insertions(+), 585 deletions(-) create mode 100644 src/orchestrator/src/net-compiler.ts create mode 100644 src/orchestrator/src/petri-net.ts diff --git a/src/orchestrator/src/engine-contract.test.ts b/src/orchestrator/src/engine-contract.test.ts index 29823f20..c0e99c8f 100644 --- a/src/orchestrator/src/engine-contract.test.ts +++ b/src/orchestrator/src/engine-contract.test.ts @@ -2,6 +2,8 @@ import { describe, expect, it } from 'vitest'; import { PetriOrchestrator } from './engine-petri.js'; import { ProceduralOrchestrator } from './engine-proc.js'; +import { compilePlan } from './net-compiler.js'; +import type { RunCtx } from './net-compiler.js'; import { InMemoryReportSink } from './report-sink.js'; import type { ActionContext, ActionHandlers, OrchestratorInput, Plan, TestRunner } from './types.js'; @@ -576,3 +578,81 @@ describe('Engine contract test #9 — action handler throws', () => { }); } }); + +// --------------------------------------------------------------------------- +// Adapter test — compiled net shape for simplePlan +// --------------------------------------------------------------------------- + +describe('Adapter: compiled net shape', () => { + it('simplePlan compiles to expected place and transition counts', () => { + const reports = new InMemoryReportSink(); + const ctx: RunCtx = { + reportIds: [], + sliceOutcomes: new Map(), + epicOutcomes: new Map(), + + halted: false, + }; + const input: OrchestratorInput = { + plan: simplePlan, + worktreeDir: '/tmp/fake', + actions: createFakes().actions, + reports, + testRunner: createFakes().testRunner, + policy: { maxRetries: 3 }, + }; + + const net = compilePlan(input, ctx); + + // simplePlan: 1 epic, 1 slice (no deps) + // Epic places: epic:epic-1:done = 1 + // Slice places: spec-ready, test-agent, 
code-agent, failing-tests, + // untested-code, needs-more, done-spec, completed, eligible, + // retry-budget = 10 + // Total places: 11 + expect(net.placeCount).toBe(11); + + // Transitions: + // slice-ready:slice-1, slice-1:evaluate, slice-1:write-tests, + // slice-1:write-code, slice-1:run-tests, slice-1:return-done, + // epic-complete:epic-1 + // Total: 7 + expect(net.transitionCount).toBe(7); + }); + + it('depPlan compiles with additional dep-signal places and transitions', () => { + const reports = new InMemoryReportSink(); + const ctx: RunCtx = { + reportIds: [], + sliceOutcomes: new Map(), + epicOutcomes: new Map(), + + halted: false, + }; + const input: OrchestratorInput = { + plan: depPlan, + worktreeDir: '/tmp/fake', + actions: createFakes().actions, + reports, + testRunner: createFakes().testRunner, + policy: { maxRetries: 3 }, + }; + + const net = compilePlan(input, ctx); + + // depPlan: 1 epic, 2 slices (slice-b depends on slice-a) + // Epic places: epic:epic-1:done = 1 + // Slice-a places: 10 (8 standard + eligible + retry-budget) + // Slice-b places: 10 (8 standard + eligible + retry-budget) + // Dep-signal places: slice:slice-a:dep-signal:slice-b = 1 + // Total: 22 + expect(net.placeCount).toBe(22); + + // Transitions: + // slice-a: slice-ready, evaluate, write-tests, write-code, run-tests, return-done = 6 + // slice-b: slice-ready (with dep gate), evaluate, write-tests, write-code, run-tests, return-done = 6 + // epic-complete:epic-1 = 1 + // Total: 13 + expect(net.transitionCount).toBe(13); + }); +}); diff --git a/src/orchestrator/src/engine-petri.ts b/src/orchestrator/src/engine-petri.ts index e6058b56..e91ffac3 100644 --- a/src/orchestrator/src/engine-petri.ts +++ b/src/orchestrator/src/engine-petri.ts @@ -1,379 +1,6 @@ -import { createReport } from './report-helpers.js'; -import type { - ActionContext, - EpicOutcome, - Orchestrator, - OrchestratorInput, - OrchestratorResult, - ReportSink, - SliceOutcome, -} from './types.js'; - -// 
--------------------------------------------------------------------------- -// Petri net primitives -// --------------------------------------------------------------------------- - -type Token = { - reportId?: string; - sliceId: string; - epicId: string; -}; - -type TransitionDef = { - id: string; - inputs: string[]; - fire: (consumed: Token[]) => Promise<{ place: string; token: Token }[]>; -}; - -class PetriNet { - private places = new Map(); - private transitions: TransitionDef[] = []; - - addPlace(id: string): void { - this.places.set(id, []); - } - - addToken(placeId: string, token: Token): void { - const tokens = this.places.get(placeId); - if (!tokens) throw new Error(`Unknown place: ${placeId}`); - tokens.push(token); - } - - addTransition(def: TransitionDef): void { - this.transitions.push(def); - } - - hasTokens(placeId: string): boolean { - const tokens = this.places.get(placeId); - return !!tokens && tokens.length > 0; - } - - async run(shouldHalt?: () => boolean): Promise { - while (true) { - if (shouldHalt?.()) break; - - const enabled = this.transitions.find((t) => - t.inputs.every((p) => { - const tokens = this.places.get(p); - return tokens && tokens.length > 0; - }), - ); - if (!enabled) break; - - // Consume one token per input place - const consumed: Token[] = []; - for (const p of enabled.inputs) { - consumed.push(this.places.get(p)!.shift()!); - } - - // Fire — handler decides outputs - const outputs = await enabled.fire(consumed); - for (const { place, token } of outputs) { - this.addToken(place, token); - } - } - } -} - -// --------------------------------------------------------------------------- -// Net compiler — plan → petri net -// --------------------------------------------------------------------------- - -function p(sliceId: string, place: string): string { - return `slice:${sliceId}:${place}`; -} - -function ep(epicId: string, place: string): string { - return `epic:${epicId}:${place}`; -} - -function compilePlan(input: 
OrchestratorInput, ctx: RunCtx): PetriNet { - const net = new PetriNet(); - const { plan, actions, testRunner, reports, policy } = input; - - // Epic-level places - for (const epic of plan.epics) { - net.addPlace(ep(epic.id, 'done')); - } - - // Helper: fan out epic readiness to all its slices' eligible places - function epicReadyOutputs(epicId: string): { place: string; token: Token }[] { - return plan.slices - .filter((s) => s.epic_id === epicId) - .map((s) => ({ place: p(s.id, 'eligible'), token: { sliceId: s.id, epicId } })); - } - - // Seed epic readiness — epics with no deps start ready - // (deferred until eligible places exist — see below) - const seedEpics = plan.epics.filter((e) => e.depends_on.length === 0); - - // Epic dependency wiring — per-dependent signal places (avoids token starvation - // when multiple epics depend on the same predecessor) - for (const epic of plan.epics) { - if (epic.depends_on.length > 0) { - const signalPlaces = epic.depends_on.map((depId) => { - const signalPlace = ep(depId, `dep-signal:${epic.id}`); - net.addPlace(signalPlace); - return signalPlace; - }); - net.addTransition({ - id: `epic-deps-met:${epic.id}`, - inputs: signalPlaces, - fire: async () => epicReadyOutputs(epic.id), - }); - } - } - - // Per-slice inner loop - for (const slice of plan.slices) { - const epic = plan.epics.find((e) => e.id === slice.epic_id)!; - const sid = slice.id; - const baseToken: Token = { sliceId: sid, epicId: epic.id }; - - // Places - for (const name of [ - 'spec-ready', - 'test-agent', - 'code-agent', - 'failing-tests', - 'untested-code', - 'needs-more', - 'done-spec', - 'completed', - ]) { - net.addPlace(p(sid, name)); - } - - // Initial tokens (agent resources) - net.addToken(p(sid, 'test-agent'), { ...baseToken }); - net.addToken(p(sid, 'code-agent'), { ...baseToken }); - - // Slice readiness gate — collects per-slice prerequisite tokens - net.addPlace(p(sid, 'eligible')); - - if (slice.depends_on.length === 0) { - // No slice deps — 
eligible when epic is ready (token seeded below) - net.addTransition({ - id: `slice-ready:${sid}`, - inputs: [p(sid, 'eligible')], - fire: async () => [{ place: p(sid, 'spec-ready'), token: { ...baseToken } }], - }); - } else { - // Has slice deps — eligible needs its own token AND all dep completions - const gateInputs = [p(sid, 'eligible'), ...slice.depends_on.map((d) => p(d, 'dep-signal:' + sid))]; - for (const depId of slice.depends_on) { - net.addPlace(p(depId, 'dep-signal:' + sid)); - } - net.addTransition({ - id: `slice-ready:${sid}`, - inputs: gateInputs, - fire: async () => [{ place: p(sid, 'spec-ready'), token: { ...baseToken } }], - }); - } - - const actCtx: ActionContext = { - slice, - epic, - plan, - worktreeDir: input.worktreeDir, - reports, - }; - - // Evaluate — conditional: NO → needs-more, YES → done-spec - net.addTransition({ - id: `${sid}:evaluate`, - inputs: [p(sid, 'spec-ready'), p(sid, 'test-agent')], - fire: async (consumed) => { - const reportId = await actions['evaluate-done'](actCtx); - ctx.reportIds.push(reportId); - const report = reports.getById(reportId); - const done = !!(report?.payload as { done?: boolean })?.done; - const tok: Token = { ...consumed[0]!, reportId }; - if (done) { - return [ - { place: p(sid, 'done-spec'), token: tok }, - { place: p(sid, 'test-agent'), token: { ...baseToken } }, - ]; - } - return [ - { place: p(sid, 'needs-more'), token: tok }, - { place: p(sid, 'test-agent'), token: { ...baseToken } }, - ]; - }, - }); - - // Write tests - net.addTransition({ - id: `${sid}:write-tests`, - inputs: [p(sid, 'needs-more'), p(sid, 'test-agent')], - fire: async (consumed) => { - const reportId = await actions['write-tests'](actCtx); - ctx.reportIds.push(reportId); - return [ - { place: p(sid, 'failing-tests'), token: { ...consumed[0]!, reportId } }, - { place: p(sid, 'test-agent'), token: { ...baseToken } }, - ]; - }, - }); - - // Write code - net.addTransition({ - id: `${sid}:write-code`, - inputs: [p(sid, 
'failing-tests'), p(sid, 'code-agent')], - fire: async (consumed) => { - const reportId = await actions['write-code'](actCtx); - ctx.reportIds.push(reportId); - return [ - { place: p(sid, 'untested-code'), token: { ...consumed[0]!, reportId } }, - { place: p(sid, 'code-agent'), token: { ...baseToken } }, - ]; - }, - }); - - // Run tests — orchestrator-owned, deterministic - const retryKey = `retry:${sid}`; - ctx.retries.set(retryKey, 0); - - net.addTransition({ - id: `${sid}:run-tests`, - inputs: [p(sid, 'untested-code')], - fire: async (consumed) => { - const target = slice.verification[0]?.target ?? ''; - const result = await testRunner.run(target, input.worktreeDir); - const reportId = createReport(reports, { - epicId: epic.id, - sliceId: sid, - actor: 'test-runner', - event: 'tests-run', - payload: { passed: result.passed, output: result.output }, - }); - ctx.reportIds.push(reportId); - - const tok: Token = { ...consumed[0]!, reportId }; - if (result.passed) { - ctx.retries.set(retryKey, 0); - return [{ place: p(sid, 'spec-ready'), token: tok }]; - } - const retryCount = ctx.retries.get(retryKey)!; - if (retryCount >= policy.maxRetries) { - ctx.sliceOutcomes.set(sid, { sliceId: sid, status: 'halted' }); - ctx.halted = true; - ctx.haltReason = `Slice ${sid} retry exhaustion`; - return []; // dead end — no output tokens - } - ctx.retries.set(retryKey, retryCount + 1); - return [{ place: p(sid, 'failing-tests'), token: tok }]; - }, - }); - - // Return DONE — also emit dep-signal tokens for downstream slices - const dependents = plan.slices.filter((s) => s.depends_on.includes(sid)); - net.addTransition({ - id: `${sid}:return-done`, - inputs: [p(sid, 'done-spec')], - fire: async () => { - ctx.sliceOutcomes.set(sid, { sliceId: sid, status: 'completed' }); - const outputs: { place: string; token: Token }[] = [ - { place: p(sid, 'completed'), token: { ...baseToken } }, - ]; - for (const dep of dependents) { - outputs.push({ place: p(sid, 'dep-signal:' + dep.id), token: 
{ ...baseToken } }); - } - return outputs; - }, - }); - } - - // Seed eligible places for epics with no dependencies - for (const epic of seedEpics) { - for (const output of epicReadyOutputs(epic.id)) { - net.addToken(output.place, output.token); - } - } - - // Epic completion — all slices done → epic verification → epic done - for (const epic of plan.epics) { - const epicSlices = plan.slices.filter((s) => s.epic_id === epic.id); - const completedPlaces = epicSlices.map((s) => p(s.id, 'completed')); - - if (epicSlices.length === 0) continue; - - // Find epics that depend on this one — emit dep-signal tokens on completion - const epicDependents = plan.epics.filter((e) => e.depends_on.includes(epic.id)); - function epicDoneOutputs(): { place: string; token: Token }[] { - const outputs: { place: string; token: Token }[] = [ - { place: ep(epic.id, 'done'), token: { sliceId: '', epicId: epic.id } }, - ]; - for (const dep of epicDependents) { - outputs.push({ place: ep(epic.id, `dep-signal:${dep.id}`), token: { sliceId: '', epicId: epic.id } }); - } - return outputs; - } - - if (epic.verification.length === 0) { - // No verification — slices done → epic done - net.addTransition({ - id: `epic-complete:${epic.id}`, - inputs: completedPlaces, - fire: async () => { - ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); - return epicDoneOutputs(); - }, - }); - } else { - // With verification — slices done → verify → epic done - const verifyPlace = ep(epic.id, 'verify-ready'); - net.addPlace(verifyPlace); - - net.addTransition({ - id: `epic-slices-done:${epic.id}`, - inputs: completedPlaces, - fire: async () => [{ place: verifyPlace, token: { sliceId: '', epicId: epic.id } }], - }); - - net.addTransition({ - id: `epic-verify:${epic.id}`, - inputs: [verifyPlace], - fire: async () => { - const verifyCtx: ActionContext = { - slice: epicSlices[0]!, - epic, - plan, - worktreeDir: input.worktreeDir, - reports, - }; - const reportId = await 
actions['verify-epic'](verifyCtx); - ctx.reportIds.push(reportId); - const report = reports.getById(reportId); - const passed = !!(report?.payload as { passed?: boolean })?.passed; - if (passed) { - ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); - return epicDoneOutputs(); - } - ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'halted' }); - ctx.halted = true; - ctx.haltReason = `Epic ${epic.id} verification failed`; - return []; // dead end - }, - }); - } - } - - return net; -} - -// --------------------------------------------------------------------------- -// Mutable run context -// --------------------------------------------------------------------------- - -type RunCtx = { - reportIds: string[]; - sliceOutcomes: Map; - epicOutcomes: Map; - retries: Map; - halted: boolean; - haltReason?: string; -}; +import { compilePlan } from './net-compiler.js'; +import type { RunCtx } from './net-compiler.js'; +import type { Orchestrator, OrchestratorInput, OrchestratorResult } from './types.js'; // --------------------------------------------------------------------------- // PetriOrchestrator — implements Orchestrator @@ -385,13 +12,12 @@ export class PetriOrchestrator implements Orchestrator { reportIds: [], sliceOutcomes: new Map(), epicOutcomes: new Map(), - retries: new Map(), halted: false, }; try { const net = compilePlan(input, ctx); - await net.run(() => ctx.halted); + await net.run('serial', () => ctx.halted); } catch (err) { return { status: 'halted', diff --git a/src/orchestrator/src/engine-proc.ts b/src/orchestrator/src/engine-proc.ts index 771ae264..f83609dc 100644 --- a/src/orchestrator/src/engine-proc.ts +++ b/src/orchestrator/src/engine-proc.ts @@ -1,226 +1,62 @@ -import { createReport } from './report-helpers.js'; -import type { - ActionContext, - Epic, - EpicOutcome, - Orchestrator, - OrchestratorInput, - OrchestratorResult, - Plan, - Slice, - SliceOutcome, -} from './types.js'; +import { compilePlan } from 
'./net-compiler.js'; +import type { RunCtx } from './net-compiler.js'; +import type { Orchestrator, OrchestratorInput, OrchestratorResult } from './types.js'; + +// --------------------------------------------------------------------------- +// ProceduralOrchestrator — same compiled net, serial firing policy. +// After Phase 0, proc and petri share the compiler; the only difference +// is the firing policy passed to PetriNet.run(). Phase 2 will introduce +// a 'parallel' policy for petri while proc stays serial. +// --------------------------------------------------------------------------- export class ProceduralOrchestrator implements Orchestrator { async run(input: OrchestratorInput): Promise { - const reportIds: string[] = []; + const ctx: RunCtx = { + reportIds: [], + sliceOutcomes: new Map(), + epicOutcomes: new Map(), + halted: false, + }; + try { - return await this.runInner(input, reportIds); + const net = compilePlan(input, ctx); + await net.run('serial', () => ctx.halted); } catch (err) { return { status: 'halted', reason: err instanceof Error ? err.message : String(err), - reports: reportIds, - epics: input.plan.epics.map((e) => ({ epicId: e.id, status: 'halted' as const })), - slices: input.plan.slices.map((s) => ({ sliceId: s.id, status: 'halted' as const })), + reports: ctx.reportIds, + epics: input.plan.epics.map( + (e) => ctx.epicOutcomes.get(e.id) ?? { epicId: e.id, status: 'halted' as const }, + ), + slices: input.plan.slices.map( + (s) => ctx.sliceOutcomes.get(s.id) ?? 
{ sliceId: s.id, status: 'halted' as const }, + ), }; } - } - private async runInner(input: OrchestratorInput, reportIds: string[]): Promise { - const { plan, reports, actions, testRunner, policy } = input; - const sliceOutcomes: SliceOutcome[] = []; - const epicOutcomes: EpicOutcome[] = []; - - const epicOrder = topoSort( - plan.epics, - (e) => e.id, - (e) => e.depends_on, - ); - - for (const epic of epicOrder) { - const epicSlices = plan.slices.filter((s) => s.epic_id === epic.id); - const sliceOrder = topoSort( - epicSlices, - (s) => s.id, - (s) => s.depends_on, - ); - let epicHalted = false; - - for (const slice of sliceOrder) { - const outcome = await this.executeSlice(slice, epic, input, reportIds); - sliceOutcomes.push(outcome); - if (outcome.status === 'halted') { - epicHalted = true; - break; - } + // Fill in any slices/epics not yet in outcomes (e.g. never reached) + for (const slice of input.plan.slices) { + if (!ctx.sliceOutcomes.has(slice.id)) { + ctx.sliceOutcomes.set(slice.id, { sliceId: slice.id, status: 'halted' }); + ctx.halted = true; + ctx.haltReason ??= 'Some slices were never reached'; } - - if (epicHalted) { - epicOutcomes.push({ epicId: epic.id, status: 'halted' }); - return this.haltedResult( - plan, - `Epic ${epic.id} halted due to slice failure`, - reportIds, - epicOutcomes, - sliceOutcomes, - ); - } - - // Epic-level verification (one call — handler owns all targets) - if (epic.verification.length > 0) { - const verifyId = await actions['verify-epic']({ - slice: epicSlices[0]!, - epic, - plan, - worktreeDir: input.worktreeDir, - reports, - }); - reportIds.push(verifyId); - const verifyReport = reports.getById(verifyId); - if (verifyReport && !(verifyReport.payload as { passed?: boolean }).passed) { - epicOutcomes.push({ epicId: epic.id, status: 'halted' }); - return this.haltedResult( - plan, - `Epic ${epic.id} verification failed`, - reportIds, - epicOutcomes, - sliceOutcomes, - ); - } + } + for (const epic of input.plan.epics) { + if 
(!ctx.epicOutcomes.has(epic.id)) { + ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'halted' }); + ctx.halted = true; + ctx.haltReason ??= 'Some epics were never reached'; } - - epicOutcomes.push({ epicId: epic.id, status: 'completed' }); } return { - status: 'completed', - reports: reportIds, - epics: epicOutcomes, - slices: sliceOutcomes, + status: ctx.halted ? 'halted' : 'completed', + reason: ctx.haltReason, + reports: ctx.reportIds, + epics: input.plan.epics.map((e) => ctx.epicOutcomes.get(e.id)!), + slices: input.plan.slices.map((s) => ctx.sliceOutcomes.get(s.id)!), }; } - - /** Fill in unreached items as halted before returning a halted result. */ - private haltedResult( - plan: Plan, - reason: string, - reportIds: string[], - epicOutcomes: EpicOutcome[], - sliceOutcomes: SliceOutcome[], - ): OrchestratorResult { - const seenEpics = new Set(epicOutcomes.map((e) => e.epicId)); - const seenSlices = new Set(sliceOutcomes.map((s) => s.sliceId)); - for (const epic of plan.epics) { - if (!seenEpics.has(epic.id)) epicOutcomes.push({ epicId: epic.id, status: 'halted' }); - } - for (const slice of plan.slices) { - if (!seenSlices.has(slice.id)) sliceOutcomes.push({ sliceId: slice.id, status: 'halted' }); - } - return { status: 'halted', reason, reports: reportIds, epics: epicOutcomes, slices: sliceOutcomes }; - } - - private async executeSlice( - slice: Slice, - epic: Epic, - input: OrchestratorInput, - reportIds: string[], - ): Promise { - const { actions, reports, testRunner, policy } = input; - - const ctx: ActionContext = { - slice, - epic, - plan: input.plan, - worktreeDir: input.worktreeDir, - reports, - }; - - // TDD inner loop - while (true) { - // 1. Evaluate — is this slice done? - const evalId = await actions['evaluate-done'](ctx); - reportIds.push(evalId); - const evalReport = reports.getById(evalId); - if (evalReport && (evalReport.payload as { done?: boolean }).done) { - return { sliceId: slice.id, status: 'completed' }; - } - - // 2. 
Write tests - const testWriteId = await actions['write-tests'](ctx); - reportIds.push(testWriteId); - - // 3. Write code - const codeWriteId = await actions['write-code'](ctx); - reportIds.push(codeWriteId); - - // 4. Run tests (orchestrator-owned, deterministic) - const target = slice.verification[0]?.target ?? ''; - let result = await testRunner.run(target, input.worktreeDir); - const runReportId = createReport(reports, { - epicId: epic.id, - sliceId: slice.id, - actor: 'test-runner', - event: 'tests-run', - payload: { passed: result.passed, output: result.output }, - }); - reportIds.push(runReportId); - - if (result.passed) { - // Tests pass → loop back to evaluate - continue; - } - - // Retry loop: write-code + run-tests - let passed = false; - for (let retry = 0; retry < policy.maxRetries; retry++) { - const retryCodeId = await actions['write-code'](ctx); - reportIds.push(retryCodeId); - - result = await testRunner.run(target, input.worktreeDir); - const retryRunId = createReport(reports, { - epicId: epic.id, - sliceId: slice.id, - actor: 'test-runner', - event: 'tests-run', - payload: { passed: result.passed, output: result.output }, - }); - reportIds.push(retryRunId); - - if (result.passed) { - passed = true; - break; - } - } - - if (!passed) { - return { sliceId: slice.id, status: 'halted' }; - } - // Tests pass after retry → loop back to evaluate - } - } -} - -// --------------------------------------------------------------------------- -// Topo sort -// --------------------------------------------------------------------------- - -function topoSort<T>(items: T[], getId: (item: T) => string, getDeps: (item: T) => string[]): T[] { - const byId = new Map(items.map((item) => [getId(item), item])); - const visited = new Set(); - const result: T[] = []; - - function visit(id: string) { - if (visited.has(id)) return; - visited.add(id); - const item = byId.get(id); - if (!item) return; - for (const dep of getDeps(item)) { - visit(dep); - } - result.push(item); - }
- - for (const item of items) visit(getId(item)); - return result; } diff --git a/src/orchestrator/src/net-compiler.ts b/src/orchestrator/src/net-compiler.ts new file mode 100644 index 00000000..7cc9f000 --- /dev/null +++ b/src/orchestrator/src/net-compiler.ts @@ -0,0 +1,321 @@ +// --------------------------------------------------------------------------- +// Net compiler — compiles a Plan into a PetriNet with wired transitions. +// Extracted from engine-petri.ts for Phase 0. +// --------------------------------------------------------------------------- + +import { PetriNet } from './petri-net.js'; +import type { Token } from './petri-net.js'; +import { createReport } from './report-helpers.js'; +import type { ActionContext, EpicOutcome, OrchestratorInput, SliceOutcome } from './types.js'; + +// --------------------------------------------------------------------------- +// Mutable run context — shared between compiler and orchestrator +// --------------------------------------------------------------------------- + +export type RunCtx = { + reportIds: string[]; + sliceOutcomes: Map<string, SliceOutcome>; + epicOutcomes: Map<string, EpicOutcome>; + halted: boolean; + haltReason?: string; +}; + +// --------------------------------------------------------------------------- +// Place-id helpers +// --------------------------------------------------------------------------- + +function p(sliceId: string, place: string): string { + return `slice:${sliceId}:${place}`; +} + +function ep(epicId: string, place: string): string { + return `epic:${epicId}:${place}`; +} + +// --------------------------------------------------------------------------- +// compilePlan — builds the full PetriNet for a plan +// --------------------------------------------------------------------------- + +export function compilePlan(input: OrchestratorInput, ctx: RunCtx): PetriNet { + const net = new PetriNet(); + const { plan, actions, testRunner, reports, policy } = input; + + // Epic-level places + for (const epic of plan.epics) { +
net.addPlace(ep(epic.id, 'done')); + } + + // Helper: fan out epic readiness to all its slices' eligible places + function epicReadyOutputs(epicId: string): { place: string; token: Token }[] { + return plan.slices + .filter((s) => s.epic_id === epicId) + .map((s) => ({ place: p(s.id, 'eligible'), token: { sliceId: s.id, epicId } })); + } + + // Seed epic readiness — epics with no deps start ready + // (deferred until eligible places exist — see below) + const seedEpics = plan.epics.filter((e) => e.depends_on.length === 0); + + // Epic dependency wiring — per-dependent signal places (avoids token starvation + // when multiple epics depend on the same predecessor) + for (const epic of plan.epics) { + if (epic.depends_on.length > 0) { + const signalPlaces = epic.depends_on.map((depId) => { + const signalPlace = ep(depId, `dep-signal:${epic.id}`); + net.addPlace(signalPlace); + return signalPlace; + }); + net.addTransition({ + id: `epic-deps-met:${epic.id}`, + inputs: signalPlaces, + fire: async () => epicReadyOutputs(epic.id), + }); + } + } + + // Per-slice inner loop + for (const slice of plan.slices) { + const epic = plan.epics.find((e) => e.id === slice.epic_id)!; + const sid = slice.id; + const baseToken: Token = { sliceId: sid, epicId: epic.id }; + + // Places + for (const name of [ + 'spec-ready', + 'test-agent', + 'code-agent', + 'failing-tests', + 'untested-code', + 'needs-more', + 'done-spec', + 'completed', + ]) { + net.addPlace(p(sid, name)); + } + + // Initial tokens (agent resources) + net.addToken(p(sid, 'test-agent'), { ...baseToken }); + net.addToken(p(sid, 'code-agent'), { ...baseToken }); + + // Slice readiness gate — collects per-slice prerequisite tokens + net.addPlace(p(sid, 'eligible')); + + if (slice.depends_on.length === 0) { + // No slice deps — eligible when epic is ready (token seeded below) + net.addTransition({ + id: `slice-ready:${sid}`, + inputs: [p(sid, 'eligible')], + fire: async () => [{ place: p(sid, 'spec-ready'), token: { 
...baseToken } }], + }); + } else { + // Has slice deps — eligible needs its own token AND all dep completions + const gateInputs = [p(sid, 'eligible'), ...slice.depends_on.map((d) => p(d, 'dep-signal:' + sid))]; + for (const depId of slice.depends_on) { + net.addPlace(p(depId, 'dep-signal:' + sid)); + } + net.addTransition({ + id: `slice-ready:${sid}`, + inputs: gateInputs, + fire: async () => [{ place: p(sid, 'spec-ready'), token: { ...baseToken } }], + }); + } + + const actCtx: ActionContext = { + slice, + epic, + plan, + worktreeDir: input.worktreeDir, + reports, + }; + + // Evaluate — conditional: NO → needs-more, YES → done-spec + net.addTransition({ + id: `${sid}:evaluate`, + inputs: [p(sid, 'spec-ready'), p(sid, 'test-agent')], + fire: async (consumed) => { + const reportId = await actions['evaluate-done'](actCtx); + ctx.reportIds.push(reportId); + const report = reports.getById(reportId); + const done = !!(report?.payload as { done?: boolean })?.done; + const tok: Token = { ...consumed[0]!, reportId }; + if (done) { + return [ + { place: p(sid, 'done-spec'), token: tok }, + { place: p(sid, 'test-agent'), token: { ...baseToken } }, + ]; + } + return [ + { place: p(sid, 'needs-more'), token: tok }, + { place: p(sid, 'test-agent'), token: { ...baseToken } }, + ]; + }, + }); + + // Write tests + net.addTransition({ + id: `${sid}:write-tests`, + inputs: [p(sid, 'needs-more'), p(sid, 'test-agent')], + fire: async (consumed) => { + const reportId = await actions['write-tests'](actCtx); + ctx.reportIds.push(reportId); + return [ + { place: p(sid, 'failing-tests'), token: { ...consumed[0]!, reportId } }, + { place: p(sid, 'test-agent'), token: { ...baseToken } }, + ]; + }, + }); + + // Write code + net.addTransition({ + id: `${sid}:write-code`, + inputs: [p(sid, 'failing-tests'), p(sid, 'code-agent')], + fire: async (consumed) => { + const reportId = await actions['write-code'](actCtx); + ctx.reportIds.push(reportId); + return [ + { place: p(sid, 'untested-code'), 
token: { ...consumed[0]!, reportId } }, + { place: p(sid, 'code-agent'), token: { ...baseToken } }, + ]; + }, + }); + + // Retry budget — modeled as a place with a token carrying the count. + // Moved from ctx.retries Map to keep all control state inside the net. + net.addPlace(p(sid, 'retry-budget')); + net.addToken(p(sid, 'retry-budget'), { ...baseToken, retryCount: 0 }); + + // Run tests — orchestrator-owned, deterministic + net.addTransition({ + id: `${sid}:run-tests`, + inputs: [p(sid, 'untested-code'), p(sid, 'retry-budget')], + fire: async (consumed) => { + const retryToken = consumed[1]!; + const retryCount = retryToken.retryCount ?? 0; + + const target = slice.verification[0]?.target ?? ''; + const result = await testRunner.run(target, input.worktreeDir); + const reportId = createReport(reports, { + epicId: epic.id, + sliceId: sid, + actor: 'test-runner', + event: 'tests-run', + payload: { passed: result.passed, output: result.output }, + }); + ctx.reportIds.push(reportId); + + const tok: Token = { ...consumed[0]!, reportId }; + if (result.passed) { + // Reset retry budget on success + return [ + { place: p(sid, 'spec-ready'), token: tok }, + { place: p(sid, 'retry-budget'), token: { ...baseToken, retryCount: 0 } }, + ]; + } + if (retryCount >= policy.maxRetries) { + ctx.sliceOutcomes.set(sid, { sliceId: sid, status: 'halted' }); + ctx.halted = true; + ctx.haltReason = `Slice ${sid} retry exhaustion`; + return []; // dead end — no output tokens, retry budget consumed + } + return [ + { place: p(sid, 'failing-tests'), token: tok }, + { place: p(sid, 'retry-budget'), token: { ...baseToken, retryCount: retryCount + 1 } }, + ]; + }, + }); + + // Return DONE — also emit dep-signal tokens for downstream slices + const dependents = plan.slices.filter((s) => s.depends_on.includes(sid)); + net.addTransition({ + id: `${sid}:return-done`, + inputs: [p(sid, 'done-spec')], + fire: async () => { + ctx.sliceOutcomes.set(sid, { sliceId: sid, status: 'completed' }); + 
const outputs: { place: string; token: Token }[] = [ + { place: p(sid, 'completed'), token: { ...baseToken } }, + ]; + for (const dep of dependents) { + outputs.push({ place: p(sid, 'dep-signal:' + dep.id), token: { ...baseToken } }); + } + return outputs; + }, + }); + } + + // Seed eligible places for epics with no dependencies + for (const epic of seedEpics) { + for (const output of epicReadyOutputs(epic.id)) { + net.addToken(output.place, output.token); + } + } + + // Epic completion — all slices done → epic verification → epic done + for (const epic of plan.epics) { + const epicSlices = plan.slices.filter((s) => s.epic_id === epic.id); + const completedPlaces = epicSlices.map((s) => p(s.id, 'completed')); + + if (epicSlices.length === 0) continue; + + // Find epics that depend on this one — emit dep-signal tokens on completion + const epicDependents = plan.epics.filter((e) => e.depends_on.includes(epic.id)); + function epicDoneOutputs(): { place: string; token: Token }[] { + const outputs: { place: string; token: Token }[] = [ + { place: ep(epic.id, 'done'), token: { sliceId: '', epicId: epic.id } }, + ]; + for (const dep of epicDependents) { + outputs.push({ place: ep(epic.id, `dep-signal:${dep.id}`), token: { sliceId: '', epicId: epic.id } }); + } + return outputs; + } + + if (epic.verification.length === 0) { + // No verification — slices done → epic done + net.addTransition({ + id: `epic-complete:${epic.id}`, + inputs: completedPlaces, + fire: async () => { + ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); + return epicDoneOutputs(); + }, + }); + } else { + // With verification — slices done → verify → epic done + const verifyPlace = ep(epic.id, 'verify-ready'); + net.addPlace(verifyPlace); + + net.addTransition({ + id: `epic-slices-done:${epic.id}`, + inputs: completedPlaces, + fire: async () => [{ place: verifyPlace, token: { sliceId: '', epicId: epic.id } }], + }); + + net.addTransition({ + id: `epic-verify:${epic.id}`, + inputs: 
[verifyPlace], + fire: async () => { + const verifyCtx: ActionContext = { + slice: epicSlices[0]!, + epic, + plan, + worktreeDir: input.worktreeDir, + reports, + }; + const reportId = await actions['verify-epic'](verifyCtx); + ctx.reportIds.push(reportId); + const report = reports.getById(reportId); + const passed = !!(report?.payload as { passed?: boolean })?.passed; + if (passed) { + ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'completed' }); + return epicDoneOutputs(); + } + ctx.epicOutcomes.set(epic.id, { epicId: epic.id, status: 'halted' }); + ctx.halted = true; + ctx.haltReason = `Epic ${epic.id} verification failed`; + return []; // dead end + }, + }); + } + } + + return net; +} diff --git a/src/orchestrator/src/petri-net.ts b/src/orchestrator/src/petri-net.ts new file mode 100644 index 00000000..91af574c --- /dev/null +++ b/src/orchestrator/src/petri-net.ts @@ -0,0 +1,87 @@ +// --------------------------------------------------------------------------- +// Petri-net interpreter — extracted from engine-petri.ts for Phase 0. +// PetriNet class, Token, TransitionDef, and FiringPolicy live here. +// --------------------------------------------------------------------------- + +export type Token = { + reportId?: string; + sliceId: string; + epicId: string; + /** Retry counter — carried on retry-budget tokens. Phase 0 extension + * to move retry state into the net instead of leaking to ctx.retries. */ + retryCount?: number; +}; + +export type TransitionDef = { + id: string; + inputs: string[]; + fire: (consumed: Token[]) => Promise<{ place: string; token: Token }[]>; +}; + +/** + * Firing policy determines how the interpreter selects the next enabled + * transition. Phase 0 ships only `serial` (first-enabled); Phase 2 will + * add `parallel` (all-enabled concurrently). 
+ */ +export type FiringPolicy = 'serial'; + +export class PetriNet { + private places = new Map(); + private transitions: TransitionDef[] = []; + + addPlace(id: string): void { + this.places.set(id, []); + } + + addToken(placeId: string, token: Token): void { + const tokens = this.places.get(placeId); + if (!tokens) throw new Error(`Unknown place: ${placeId}`); + tokens.push(token); + } + + addTransition(def: TransitionDef): void { + this.transitions.push(def); + } + + hasTokens(placeId: string): boolean { + const tokens = this.places.get(placeId); + return !!tokens && tokens.length > 0; + } + + /** Returns the number of registered places. */ + get placeCount(): number { + return this.places.size; + } + + /** Returns the number of registered transitions. */ + get transitionCount(): number { + return this.transitions.length; + } + + async run(_policy: FiringPolicy, shouldHalt?: () => boolean): Promise<void> { + // Phase 0: only serial policy — find first enabled, fire, repeat. + while (true) { + if (shouldHalt?.()) break; + + const enabled = this.transitions.find((t) => + t.inputs.every((p) => { + const tokens = this.places.get(p); + return tokens && tokens.length > 0; + }), + ); + if (!enabled) break; + + // Consume one token per input place + const consumed: Token[] = []; + for (const p of enabled.inputs) { + consumed.push(this.places.get(p)!.shift()!); + } + + // Fire — handler decides outputs + const outputs = await enabled.fire(consumed); + for (const { place, token } of outputs) { + this.addToken(place, token); + } + } + } +} From 4e1f38c297730cf5c60e6480e0810418c559c1bc Mon Sep 17 00:00:00 2001 From: Kostandin Angjellari Date: Thu, 21 May 2026 18:58:02 +0200 Subject: [PATCH 21/22] plan: orchestrator-poc done, petri-semantic-lanes frontier + cards scoped - Mark orchestrator-poc done in PLAN.md (Phase 0 complete) - Add petri-semantic-lanes and petri-parallel-execution frontier definitions - Add petri-graph-compilation and petri-simulation-oracle to Horizon - Add
Track F dependency graph for H-6476 umbrella - Scope Card 1-3 queue in CARDS.md for petri-semantic-lanes Co-authored-by: Amp --- memory/CARDS.md | 1106 ++++++----------------------------------------- memory/PLAN.md | 74 +++- 2 files changed, 182 insertions(+), 998 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index 10c1c410..6201ee42 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -1,1014 +1,156 @@ - + -# Cards — `chat-runtime-secondary-chats` (FE-716) +# petri-semantic-lanes — scope cards -Branch: `ka/fe-716-chat-runtime-unified-secondary-chats` -Linear: [FE-716](https://linear.app/hash/issue/FE-716) -Stacked on: `ln/fe-709-reconciliations` (PR #139, awaiting merge to main) +## Card 1: Two-lane subnet with semantic completion gate -## V1 framing +**Status:** next -V1 = "every behavior the current side-chat (V3.1) ships today, surfaced through the elevated unified-workspace shape from `docs/design/UNIFIED_CHAT_UX.md`." Build only what that framing requires; defer the rest of the brief to follow-up frontiers. See PLAN.md `chat-runtime-secondary-chats` § V1 narrowing for the explicit defer list. +### Target Behavior -Vocabulary: **secondary chat** (matches PR #139's lexicon). The `chat.parent_chat_id IS NOT NULL` projection is the sole driver of "render inline as a secondary chat under parent." +The compiled slice subnet enforces a two-lane terminal join: `return-done` is unreachable unless both mechanical verification (`done-spec`) and semantic assessment (`semantic-satisfied`) have produced tokens. 
-## Card queue +### Boundary Crossings -### C0 — Bring forward `UNIFIED_CHAT_UX.md` design brief +``` +→ types.ts (add 'assess-semantic' to action vocabulary) +→ net-compiler.ts (add semantic places + assess-semantic transition + terminal join) +→ engine-contract.test.ts (update call-order assertions, add semantic-gate scenario) +→ petri-net.ts (no change — interpreter is topology-agnostic) +→ engine-petri.ts / engine-proc.ts (no change — thin wrappers) +``` -- **Status:** **done** (2026-05-15) — Option B chosen (verbatim body + prepended `` translation header mapping `thread` → `secondary chat` and noting D153 substrate deferral). -- **What:** Copy `docs/design/UNIFIED_CHAT_UX.md` verbatim from PR #138 onto this branch. Body preserved unedited; reading-note header added for current readers. Brief stays the canonical UX ceiling for future tracks. -- **Why first:** Zero substrate dependency; gives downstream cards a single in-tree reference. Cheap to land alone. -- **Scope:** doc-only. -- **Verification:** `npm run check` — 0 errors (6 pre-existing warnings unrelated). Body matches PR #138 commit `cd48b49a` byte-for-byte. +### Risks and Assumptions -### C1 — Substrate migration: four columns on `chat`, zero enum changes +``` +- RISK: All existing contract tests check call-order sequences that will gain an + assess-semantic step → every test with call-order assertions needs updating. + → MITIGATION: Mechanical change — add the step to expected sequences. The + fake factory already uses Record so adding a new key + is trivial. -- **Status:** **done** (2026-05-15) — `drizzle/0020_chat_secondary_chat_columns.sql` adds the four nullable integer/text columns + two non-unique indexes; `src/server/schema.ts` chat table promoted to `(table) => […])` form to declare the indexes. Real schema uses `integer` ids (HANDOFF's UUID was illustrative). 
Resolved: `invoked_in_turn_id` kept (denormalized anchor); `pinned_reconciliation_need_id` deferred; per-turn span-hint not in V1; `parent_chat_id` + `invoked_in_turn_id` indexed. -- **What:** Drizzle migration adding `parent_chat_id integer NULL REFERENCES chat(id)`, `invoked_in_turn_id integer NULL REFERENCES turn(id)`, `pinned_item_id integer NULL REFERENCES knowledge_item(id)`, `pinned_span_hint text NULL` + indexes `chat_parent_chat_id_idx` and `chat_invoked_in_turn_id_idx`. `chat.kind` enum unchanged; `chat.active_turn_id` preserved. -- **Verification:** `npm run verify` — 100 test files / 1272 tests pass; build clean. New tests in `src/server/chat-substrate.test.ts` cover column shape, index presence, FK integrity (parent_chat_id, pinned_item_id, invoked_in_turn_id all reject missing targets), nullable inserts, and `chat.active_turn_id` preservation. -- **Out of scope:** any new enum value; the `thread` table; `turn.thread_id`; `thread_context_item`. +- RISK: Semantic assessment always passes in fakes — the topological constraint + is real but the assessment itself is a no-op until real oracles land. + → MITIGATION: Add one contract test where assess-semantic fails → slice halts. + This proves the gate is load-bearing, not decorative. -### C2 — Server: `createSecondaryChat` + `createKickoffTurn` helpers +- ASSUMPTION: A single assess-semantic action per slice is sufficient for Phase 1. + The spec doc shows multiple semantic transitions (AssessOracleSatisfaction, + AssessDesignExercised, AssessIntentEstablished), but those can be sub-steps + of one assessment action in this slice; the net template can refine later. + → VALIDATE: The terminal join enforces the gate; internal decomposition of + semantic assessment is additive, not structural. +``` -- **Status:** **done** (2026-05-15) — helpers + tests landed; route deferred to C3 to avoid speculative scaffolding (no consumer until UI wires up). 
-- **What:** Two new public DB helpers exported from [src/server/db.ts](file:///Users/kostandin/Projects/hashdev/brunch/src/server/db.ts): - - `createSecondaryChat(db, specId, { parent_chat_id, invoked_in_turn_id?, pinned_item_id?, pinned_span_hint? })` — inserts a `chat` row with `kind='side_chat'` and the four C1 columns; returns `Chat`. - - `createKickoffTurn(db, chatId, { phase, content })` — inserts a `turn` with `turn_kind='kickoff'`, `chat_id=chatId`, and `assistant_parts=content`; resolves the chat's `specification_id` automatically; returns `Turn`. -- **Verification:** `npm run verify` — 100 test files / 1277 tests pass. New tests in [src/server/chat-substrate.test.ts](file:///Users/kostandin/Projects/hashdev/brunch/src/server/chat-substrate.test.ts) cover happy-path persistence, optional column population, FK rejection, kickoff turn metadata, and error on missing chat. -- **Out of scope (moved to C3):** `POST /api/specifications/:id/secondary-chats` route. Building it without a consumer is speculative; C3 will define the route alongside the UI client that calls it. -- **Harvest reference:** `src/server/side-chat-route.ts`, `src/server/side-chat-prompt.ts`, PR #138's threads endpoint. +### Acceptance Criteria -### C3 — Client: `secondary-chat-collapsible` inline component +``` +✓ semantic-places — Compiled subnet per slice includes `semantic-gate` and + `semantic-satisfied` places. Adapter test confirms updated place count. -C3 has been split into three sub-cards (C3a / C3b / C3c) for verifiable thin slices. Original "What" preserved below for reference. +✓ assess-semantic-transition — New transition `{sliceId}:assess-semantic` + consumes `done-spec` + `semantic-gate` and produces `semantic-satisfied` + (on pass) or routes to `needs-more` (on fail, forcing another TDD cycle). -- **C3 original What:** Build the inline collapsible UI for `chat.parent_chat_id IS NOT NULL` chats, anchored under their `invoked_in_turn_id` in the parent transcript. 
Driven entirely by the projection rule — no flavor enum needed. Replace `SideChatHost`'s popover plumbing with inline rendering inside `ContinuousWorkspaceView`. -- **Out of scope (across all sub-cards):** popover deletion (C8), Ask/Edit toggle (C4), patch staging (C5), `#` injection (C6). +✓ terminal-join — `return-done` transition consumes `semantic-satisfied` + instead of `done-spec`. PlanDoneAccepted (= `completed` place) is + topologically unreachable without semantic satisfaction. -#### C3a — Server: `listSecondaryChatsForSpecification` + bundle field +✓ assess-semantic-action — `assess-semantic` key added to ActionHandlers. + Fake factory provides a default that always returns { satisfied: true }. -- **Status:** **done** (2026-05-15) — list helper, `SecondaryChatWithKickoff` type, bundle `secondaryChats` field, and Zod schema all landed. -- **What:** New helper `listSecondaryChatsForSpecification(db, specId) → SecondaryChatWithKickoff[]` returns secondary chats (rows with `parent_chat_id IS NOT NULL`) with each chat's first kickoff turn (or null). `readSpecificationStateProjection` includes the projected `secondaryChats` field; `specificationStateSchema` extended with `secondaryChatStateSchema`. -- **Verification:** `npm run verify` — 100 test files / 1283 tests pass. New tests cover empty/single/multi-spec scoping, kickoff turn population, missing-kickoff null fallback, primary-chat exclusion, and bundle inclusion via `getSpecificationState`. +✓ contract-tests-updated — All existing contract test call-order assertions + include the new assess-semantic step. All 26 tests pass. -#### C3b — `<SecondaryChatCollapsible>` standalone component +✓ semantic-gate-fail-test — New contract test: assess-semantic returns + { satisfied: false } → slice re-enters TDD loop. If it keeps failing, + slice halts.
-- **What:** New `src/client/components/secondary-chat-collapsible.tsx` renders a Radix-`Collapsible`-backed secondary chat surface. Header always renders; body shows the kickoff turn's `assistant_parts` and is collapsed by default. Supports `kickoffTurn=null` (renders an empty body when expanded). -- **Verification:** `npm run verify` — 101 test files / 1287 tests pass. New tests in `src/client/components/__tests__/secondary-chat-collapsible.test.tsx` cover header presence, collapsed-by-default, expand-on-click reveals content, and empty-body fallback for missing kickoff. -- **Scope adjustment from original C3b:** mounting in `-continuous-workspace-view.tsx` deferred to C3c. Reason: `WorkspaceTranscriptArtifacts` (556 LOC) is the actual turn-render seam; threading the collapsible through it is invasive enough to merit landing alongside the trigger that creates the rows in the first place. Building mounting now without a creation flow would require fixture-seeding side-channels. - -#### C3c-route — Server: `POST /api/specifications/:id/secondary-chats` - -- **Status:** **done** (2026-05-15) — route + handler landed; client wiring + view mounting deferred to C3c-mount and C3c-wire. -- **What:** New `src/server/secondary-chat-route.ts` exports `handleCreateSecondaryChatRequest(db, req, res)`. Body schema: `{ parentChatId, invokedInTurnId, itemKind, itemId, spanHint? }`. Validates spec exists, validates body shape, resolves the item via `getKnowledgeItem` (rejects if missing or wrong kind/spec), calls `createSecondaryChat` + `createKickoffTurn`, returns `{ chatId, kickoffTurnId }`. Kickoff content templated as `Anchored to ''.` (with `, focused on ''` when provided) — minimal V1 wording; richer per-mode templates from UNIFIED_CHAT_UX.md §6 land alongside C4 (Ask/Edit toggle). -- **Verification:** `npm run verify` — 101 test files / 1292 tests pass. 
New tests in `src/server/app.test.ts` cover happy path with bundle round-trip, span-hint persistence, 400 on bad body, 404 on missing spec, and 404 on missing item. - -#### C3c-mount — View: thread `secondaryChats` through to `<SecondaryChatCollapsible>` mounting - -- **Status:** **done** (2026-05-15) — controller projects a `secondaryChatsByInvokedTurnId: ReadonlyMap` from `specificationState.secondaryChats`; view threads it into [WorkspaceTranscriptArtifacts](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/%24id/_view/-workspace-transcript-artifacts.tsx); a new `getArtifactAnchorTurnId` helper resolves the anchor turn id for each artifact kind (`answered-turn`, `prefaced-question`, `answered-review-turn`, `answered-revision-review`, `collapsed-review-turn`, `accepted-closure`, `persisted-turn`, `active-prefaced-question`, `phase-summary`); `<SecondaryChatCollapsible>` instances are rendered in a `data-testid="secondary-chats-for-turn-{id}"` slot beneath each matching artifact. -- **What:** `WorkspaceTranscriptArtifacts` accepts a `secondaryChatsByInvokedTurnId` map prop and renders `<SecondaryChatCollapsible>` after each turn artifact whose id matches a key. `-continuous-workspace-controller.ts` projects `specificationState.secondaryChats` into the map and threads it through; `-continuous-workspace-view.tsx` passes it to the artifacts renderer. -- **Acceptance:** fixture-seeded secondary chat appears under the right turn; collapsed by default; no orphan render when the parent turn is unrendered. All three covered by tests. -- **Verification:** `npm run verify` — 102 test files / 1295 tests pass; build clean. New tests: - - [`-workspace-transcript-artifacts.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/%24id/_view/__tests__/-workspace-transcript-artifacts.test.tsx) — 4 tests covering inline rendering after the matching turn, collapsed-by-default, no-orphan when the anchor turn isn't in the stream, and multiple chats per turn.
-  - [`-continuous-workspace-view.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/%24id/_view/__tests__/-continuous-workspace-view.test.tsx) — added prop-threading test asserting the controller's `secondaryChatsByInvokedTurnId` reaches the artifacts renderer by reference.
-
-#### C3c-wire — Client: trigger that calls the C3c-route POST + invalidates bundle
-
-- **Status:** **done** (2026-05-15) — `useCreateSecondaryChatMutation` mutation + `SecondaryChatTriggerProvider` context landed in [src/client/components/secondary-chat-trigger.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/secondary-chat-trigger.tsx); provider is mounted in `route.tsx` alongside `SideChatHost`; `ItemActionRail` in [-structured-list-view.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/%24id/-structured-list-view.tsx) gains an `Open inline chat` button (`data-graph-action="open-inline-chat"`, MessagesSquare icon) alongside the existing chat-with popover trigger. `specificationSchema` now exposes `primary_chat_id` (nullable+optional for transition) so the client can resolve the parent chat without a new endpoint.
-- **What:** New `useCreateSecondaryChatMutation(specificationId)` hook posts to `/api/specifications/:id/secondary-chats` with `{ parentChatId, invokedInTurnId, itemKind, itemId, spanHint? }` and invalidates the bundle on success. `SecondaryChatTriggerProvider` reads `specificationState.specification.primary_chat_id` (parent) + `active_turn_id` (anchor) and exposes a `create({ kind, id })` callback through `useSecondaryChatTrigger()`. The button is disabled when either is missing or while a create is in flight.
-- **Acceptance:** clicking the new trigger creates a secondary chat and reveals an inline collapsible (via C3c-mount) without disturbing the existing popover path. Verified by mutation tests + bundle invalidation; UI button surfaces alongside (not replacing) the chat-with popover trigger.
-- **Verification:** `npm run verify` — 103 test files / 1302 tests pass; build clean. New tests in [`secondary-chat-trigger.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/secondary-chat-trigger.test.tsx) cover canCreate=true happy path, canCreate=false when `primary_chat_id` or `active_turn_id` is missing, POST payload shape, bundle invalidation on success, and no-POST when canCreate is false.
-
-### C4 — Ask / Edit mode toggle on secondary chats
-
-- **Status:** **done** (2026-05-15) — `mode` column added to `chat` (nullable text enum `explore | edit`); `createSecondaryChat` defaults to `'explore'`; new `setSecondaryChatMode` helper + `PATCH /api/specifications/:id/secondary-chats/:chatId/mode` route; `secondaryChatStateSchema.chat.mode` propagates through the bundle; `SecondaryChatCollapsible` gains an Ask/Edit toggle (sibling to the trigger to avoid nested-button); a thin `SecondaryChatCollapsibleWithMode` wrapper subscribes to `useSetSecondaryChatModeMutation` and bundle invalidation.
-- **What:** Mode toggle (Ask = `explore`, Edit = `edit`) with per-mode tool sets via `getSideChatTools(mode)`; persist mode on the chat (column-based, smallest viable storage). The actual streaming-with-tools wiring for secondary chats remains a follow-up — C4 lands persistence + UI selection. `getSideChatTools(mode)` is unchanged and continues to gate edit tools when called with `chat.mode`.
-- **Why fifth:** Re-establishes V3.1 functional parity for side-chat editing.
-- **Verification:** `npm run verify` — 103 test files / 1317 tests pass; build clean. New tests:
-  - [`chat-substrate.test.ts`](file:///Users/kostandin/Projects/hashdev/brunch/src/server/chat-substrate.test.ts) — default mode='explore', explicit mode='edit', `setSecondaryChatMode` updates + invariants (rejects non-secondary chats and missing chats).
-  - [`app.test.ts`](file:///Users/kostandin/Projects/hashdev/brunch/src/server/app.test.ts) — PATCH happy path with bundle round-trip, 400 on invalid mode, 404 on cross-spec chatId, 404 when targeting the primary interview chat.
-  - [`secondary-chat-collapsible.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/secondary-chat-collapsible.test.tsx) — toggle reflects persisted mode, falls back to explore when null, click invokes `onSetMode`, no-op when clicking active mode, disabled while pending or read-only.
-- **Harvest:** `getSideChatTools(mode)` (unchanged), V3.1 mode plumbing pattern (Ask/Edit semantics).
-- **Out of scope (deferred to C5):** wiring the persisted mode into the secondary-chat streaming pipeline + edit-tool registration; in-thread patch staging.
-
-### C5 — In-thread patch staging on secondary chats
-
-C5 has been split into three sub-cards (C5a / C5b / C5c) for verifiable thin slices. Original "What" preserved below for reference.
+✓ adapter-test-updated — Net shape adapter tests updated for new place
+  and transition counts.
+```
 
-- **C5 original What:** Port #138's in-thread staged-patches strip onto the chat substrate. Patches stay turn artifacts; accepted mutations still flow through Brunch-owned handlers (no new source of semantic truth).
-- **Why sixth:** Closes the Edit-mode loop end-to-end.
-- **Verification (umbrella):** staging/apply/cancel tests on a secondary chat; regression on the V3.1 side-chat edit flow; `npm run verify` green at C5c.
-- **Harvest:** [side-chat-route.ts](file:///Users/kostandin/Projects/hashdev/brunch/src/server/side-chat-route.ts), [side-chat-prompt.ts](file:///Users/kostandin/Projects/hashdev/brunch/src/server/side-chat-prompt.ts), [side-chat-stream.ts](file:///Users/kostandin/Projects/hashdev/brunch/src/client/lib/side-chat-stream.ts), [side-chat-host.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/side-chat-host.tsx) (staging strip render), [patch-list-host.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/patch-list-host.tsx) + [patch-list-reducer.ts](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/patch-list-reducer.ts) (`pendingPatches` plumbing).
-
-#### Cross-cutting design decision (Shape A — patch-list partition seam)
-
-C5c needs `PatchListProvider` to keep one global event log while letting each secondary chat see *only its own* staged patches. **Decision (this thread):** add `producerChatId: number | null` to `PatchBase` and expose a new `usePatchListForChat(chatId)` hook that filters the staged slice and scopes apply/discard/editSummary to that chat's patch ids. Existing `usePatchList()` keeps current behavior (popover sees all patches; safe during transition). Reducer logic is unchanged; the partition lives at the selector layer. C8 (popover retirement) deletes the legacy `producerChatId === null` branch.
+### Verification Approach
 
-Considered alternatives and rejected:
-- **Shape B (one provider per chat):** N reducers + N applier injections; popover and inline use disjoint logs.
-- **Shape C (`Map` reducer):** principled but over-engineered for V1's "popover + N inline" reality; large reducer churn.
-- **Shape D (no shared abstraction):** inline duplicates the popover machinery until C8.
-
-Shape A wins on Ousterhout's depth test (one new field + one new hook hides the partitioning concern) and is forward-compatible with A71's future server `appendPatch(spec, patch[])` signature.
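The following is a minimal TypeScript sketch of the Shape A seam. It assumes a simplified patch record. Only `PatchBase`, `producerChatId`, `usePatchList()` and `usePatchListForChat(chatId)` come from this plan. The `PatchListState` shape, the `status` field, and the pure helpers `selectStaged`, `selectStagedForChat` and `applyForChat` are hypothetical stand-ins for the hooks. They are written without React so the filtering logic can be checked on its own. As the decision above describes, the global selector still sees every staged patch.

```typescript
// Hypothetical, simplified shape: only the PatchBase name and `producerChatId` come from the plan.
interface PatchBase {
  id: string;
  summary: string;
  // null = staged by the legacy popover path; a number = staged by that secondary chat.
  producerChatId: number | null;
}

type PatchStatus = "staged" | "applied" | "discarded";
type TrackedPatch = PatchBase & { status: PatchStatus };

// One shared list for every producer; the reducer never partitions it.
interface PatchListState {
  patches: readonly TrackedPatch[];
}

// Global view, analogous to usePatchList(): every staged patch, popover and inline alike.
function selectStaged(state: PatchListState): TrackedPatch[] {
  return state.patches.filter((p) => p.status === "staged");
}

// Per-chat view, analogous to usePatchListForChat(chatId): same list, filtered at read time.
function selectStagedForChat(state: PatchListState, chatId: number): TrackedPatch[] {
  return selectStaged(state).filter((p) => p.producerChatId === chatId);
}

// Scoped apply: patch ids are derived from the chat's own slice, so one chat
// can never apply patches staged by another chat or by the popover.
function applyForChat(state: PatchListState, chatId: number): PatchListState {
  const ids = new Set(selectStagedForChat(state, chatId).map((p) => p.id));
  return {
    patches: state.patches.map(
      (p): TrackedPatch => (ids.has(p.id) ? { ...p, status: "applied" } : p),
    ),
  };
}
```

Scoping happens in two places: when patches are read and when the patch ids for apply are derived. The shared list itself therefore never needs any per-chat structure.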
-
-#### C5a — Server: secondary-chat streaming endpoint + edit-tool registration
-
-- **Status:** **next**
-- **What:** New server seam `POST /api/specifications/:id/secondary-chats/:chatId/messages` (or equivalent — confirm naming during build) that resolves the chat by id, validates `chat.parent_chat_id IS NOT NULL`, calls `getSideChatTools(chat.mode)` to gate `propose_edit` / `propose_edge` / `propose_drill_down` on Edit mode, streams an assistant turn under the secondary chat using the existing SSE shape from `side-chat-route.ts`, and persists user/assistant turns under the secondary chat's `chat_id`. Reuse `side-chat-prompt.ts` for system instructions; per-mode kickoff template enrichment (deferred from C4) lands here as a side-effect of touching the prompt path.
-- **Boundary crossings:** HTTP route → spec/chat lookup → `getSideChatTools(mode)` → AI SDK stream → `appendTurn(chat_id, role, parts)`. Same shape as `side-chat-route.ts`, scoped to secondary chats.
-- **Risks/assumptions:**
-  - RISK: `side-chat-route.ts` may have popover-specific assumptions baked in (e.g. anchor item lookup from request body) → MITIGATION: read it once before mirroring; lift only the streaming/tools shell, not the request envelope.
-  - ASSUMPTION: secondary chats stream into `assistant_parts` of a freshly-created turn under the secondary chat (mirrors interview chat shape) → VALIDATE: round-trip oracle (POST a message, GET the bundle, see the new turn under `secondaryChats[i].turns`). May require extending the `SecondaryChatState` bundle to include turns beyond the kickoff — confirm during build.
-- **Acceptance:**
-  - ✓ POST with mode=`explore` streams an assistant turn; bundle round-trip surfaces the new turn under the secondary chat.
-  - ✓ POST with mode=`edit` registers edit tools; SSE event for `propose_edit` is emitted.
-  - ✓ POST against a primary chat returns 404 (refuses non-secondary chats — same invariant as PATCH mode route).
-  - ✓ POST against a missing chat returns 404.
-  - ✓ Existing `POST /side-chat` (popover) regression unaffected.
-- **Verification:** Inner — Vitest integration tests in `app.test.ts` covering happy paths + 404 invariants + tool gating. Middle — round-trip oracle (POST → GET bundle → assert turn presence). No outer-loop verification at this slice.
-- **Out of scope:** client composer (C5b); staging strip (C5c); per-chat patch list partition (C5c).
-
-#### C5b — Client: composer + stream consumer for inline secondary chats
-
-- **Status:** **next** (after C5a)
-- **What:**
-  1. Promote a `` component (per the C0–C4 review finding #1) that owns *all* per-chat mutation/streaming hooks and renders `` with the wired props. Replaces the current `SecondaryChatCollapsibleWithMode` wrapper. Wires:
-     - `useSetSecondaryChatModeMutation(chatId)` (existing)
-     - `useSecondaryChatStream(chatId)` (new — wraps the C5a SSE response into staged turns + activity)
-  2. Add a small composer (text input + Send) inside the collapsible body, posting to C5a and reusing `side-chat-stream.ts` parser.
-  3. Render the chat's existing turns under the collapsible body (kickoff first, then user/assistant pairs).
-- **Boundary crossings:** `` → `useSecondaryChatStream` → `fetch` POST → SSE parser → derived turn list → `` body.
-- **Risks/assumptions:**
-  - RISK: `SecondaryChatState` bundle currently exposes only `chat` + `kickoffTurn`; rendering subsequent turns needs either a per-chat `turns: Turn[]` field on the bundle or a separate `useSecondaryChatTurns(chatId)` query → MITIGATION: extend the bundle if cheap (preferred), else add a per-chat turn-list query.
-  - ASSUMPTION: Existing `side-chat-stream.ts` parser is generic enough to consume the C5a response without forking → VALIDATE: read the parser once during build; fork only if the SSE event vocabulary diverges.
-- **Acceptance:**
-  - ✓ Typing in the composer + Send POSTs to C5a and renders the streaming assistant turn live in the collapsible body.
-  - ✓ After stream completes, bundle invalidation reveals the persisted turn unchanged on next mount.
-  - ✓ `` replaces `SecondaryChatCollapsibleWithMode` in `-workspace-transcript-artifacts.tsx` with no regression in the C4 mode-toggle tests.
-  - ✓ Multiple secondary chats can be composed against in parallel without state cross-talk (no shared in-flight ref).
-- **Verification:** Inner — happy-dom Vitest covering composer → POST → stream consumption → derived turn list. Middle — bundle round-trip after stream ends. Reuse `secondary-chat-collapsible.test.tsx` patterns for harness.
-- **Out of scope:** patch staging strip (C5c); patch list partition (C5c); typing-while-streaming queue.
-
-#### C5c — Per-chat patch staging strip + partition seam
-
-- **Status:** **next** (after C5b)
-- **What:** Land the Shape A partition seam (above) and surface the staged-patches strip *inside* ``'s collapsible body, scoped to the host's chat id.
-  1. **Reducer change:** add `producerChatId: number | null` to `PatchBase` and `StagePatchInput`. Existing call sites (popover, manual tests) pass `null`.
-  2. **Provider change:** new `usePatchListForChat(chatId)` hook that returns the filtered staged slice + scoped actions (apply/discard/editSummary auto-filter by chat id; apply uses `patchIds` derived from the slice).
-  3. **Stream wire-up:** C5b's `useSecondaryChatStream(chatId)` translates `propose_*` SSE tool calls into `actions.stage({ ...patch, producerChatId: chatId })`.
-  4. **UI:** harvest `SideChatPopover`'s staged-patches strip render shape (`stagedPatches`, `onApply`, `onUndo`, `` for `edit` patches, ``) into a `` component mounted inside ``'s collapsible body.
-- **Boundary crossings:** SSE stream → `usePatchListForChat(chatId).actions.stage` → reducer event log → `usePatchListForChat(chatId).staged` → strip UI → `actions.apply()` → existing `makeEditApplier` (unchanged).
-- **Risks/assumptions:**
-  - RISK: existing call sites (popover, side-chat-host derived state at lines 578–602) need `producerChatId: null` threaded through without semantic change → MITIGATION: type the field as required-but-nullable on `PatchBase`; let the type system surface every site.
-  - RISK: undo currently reverses the last apply batch globally; per-chat undo could cross chats if a popover apply followed an inline apply → MITIGATION: for V1 ship per-`apply()`-batch undo (chat scope is implicit because each chat's apply only touches its own patch ids); document the invariant in the reducer header.
-  - ASSUMPTION: `` and `` are reusable as-is outside the popover → VALIDATE: read both during build; lift to a shared location if needed (no new abstraction unless the second caller forces it).
-- **Acceptance:**
-  - ✓ Staging an `edit` proposal during streaming surfaces it in the host's strip; popover does NOT see it via `usePatchList()` (filter excludes per-chat patches by default — adjust if popover-during-transition wants the full union view).
-  - ✓ Apply on the strip mutates the anchor item via `makeEditApplier`; undo reverses it; bundle round-trip reflects the change.
-  - ✓ Popover staging path (V3.1) is unaffected: existing side-chat tests pass with `producerChatId: null`.
-  - ✓ Two open inline secondary chats can stage edits in parallel; each strip shows only its own patches.
-- **Verification:** Inner — reducer/state unit tests for `producerChatId` filtering; per-chat hook unit tests; popover regression in `side-chat-host.test.tsx`. Middle — round-trip: stage → apply → bundle reflects mutation. Outer — manual: open two inline secondary chats, stage edits in each, apply one, verify the other strip is untouched. (Capture in the C10 walkthrough.)
-- **Out of scope:** rendering staged patches as turn artifacts (deferred — patches stay UI state, not turn-persisted, until a future card promotes them); cross-chat undo; deletion of `usePatchList()` (waits for C8).
-
-##### Order discipline
-
-C5a (server) → C5b (client composer + host) → C5c (partition + strip). Sequential because C5b consumes C5a's response shape; C5c's stream wire-up plugs into C5b's host. None of C5b's interface should change based on C5a build findings beyond response-shape details (those are absorbed in `useSecondaryChatStream`); C5c's interface is independent of either earlier slice.
-
-### C6 — `#` knowledge-item symbol injection (V1 surface only)
-
-- **What:** Implement `#REF-CODE` resolution in the secondary-chat composer that inserts an item context snapshot artifact into the next turn. **No** autocomplete chip; **no** `$` secondary-chat mention symbol; **no** snapshot builder lifecycle (those are Track 5 / `chat-context-provision`). Use a server-owned resolver scoped to the specification per `CONVERSATIONAL_WORKSPACE_RUNTIME.md` §3.5.
-- **Why seventh:** Provides the V1 structured way to add item context, replacing the ad-hoc V3.1 anchoring path for in-flight mentions.
-- **Verification:** resolver unit tests for valid/missing/ambiguous codes; turn-snapshot insertion test; manual walkthrough.
-
-### C7 — Agent-run inline rendering + `chat.kind` decision
-
-- **What:** Decide and implement: (a) keep enum at `interview` + `side_chat` and project `agent_run` flavor from `first_turn_role='system'`; (b) add a fifth `chat.kind='agent_run'` enum value. Default posture per HANDOFF: (a). Render agent-run secondary chats inline using the same component from C3. If (b) is chosen, this card carries a follow-up substrate migration.
-- **Why eighth:** Agent-run inline is in V1 scope per HANDOFF; deferring to last lets the substrate decision settle after C1–C6 reveal whether projection-only is sufficient.
-- **Verification:** agent-run secondary chat renders inline; system-first frontier turn invariant holds; if (b), enum migration applies cleanly.
-
-### C8 — `SideChatPopover` retirement + `side-chat-host` shrinkage
-
-- **What:** Delete `SideChatPopover`; shrink `side-chat-host` to its minimal post-popover form (target ~95 LOC per #138's harvest). Remove popover-only routes/state.
-- **Why ninth:** Retire only after C3–C7 reach parity over durable secondary chats.
-- **Verification:** `npm run verify`; manual regression on side-chat entry from substantive reconciliation rows; ensure no popover code paths remain reachable.
-
-### C9 — Lightweight reconciliation-element view
-
-- **Status:** **done** (2026-05-17) — `drizzle/0022_chat_pinned_reconciliation_need.sql` adds the nullable FK column on `chat`; `createSecondaryChat` + the `POST /api/specifications/:id/secondary-chats` payload accept an optional `reconciliationNeedId` (server rejects cross-spec needs with 404); `listSecondaryChatsForSpecification` joins the need + both knowledge items at read time and surfaces a `pinnedReconciliationNeed: { needId, kind, sourceItemId/RefCode/Excerpt, targetItemId/RefCode/Excerpt }` projection on each `SecondaryChat`; `SecondaryChatTriggerItem.reconciliationNeedId` is threaded through `useCreateSecondaryChatMutation`; `PendingReviewSection.handleOpenSideChat` passes `need.id` alongside `target_item_kind`/`target_item_id`; `SecondaryChatCollapsible` renders a small `data-testid="secondary-chat-reconciliation-panel"` band (kind label + per-endpoint ref code + truncated excerpt) when the field is populated. Other trigger paths (StructuredListView, etc.) are unchanged and continue to omit `reconciliationNeedId`.
-- **What:** When a secondary chat is opened with a reconciliation context (entry bridge from a substantive reconciliation row), render a minimal "elements being reconciled" panel inside the secondary chat surface. **Not** the full target-grouped / classifier-state UX from the brief — that's Track 3 (`reconciliation-runtime`). `PendingReviewSection` retirement stays Track 3's job.
-- **Verification:** `npm run verify` — 104 test files / 1252 tests pass; build clean. New tests:
-  - [`app.test.ts`](file:///Users/kostandin/Projects/hashdev/brunch/src/server/app.test.ts) — POST persists `pinned_reconciliation_need_id`, bundle round-trip surfaces `pinnedReconciliationNeed` with `kind` + source/target ref-code & excerpt joins; cross-spec need id returns 404.
-  - [`secondary-chat-collapsible.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/secondary-chat-collapsible.test.tsx) — panel renders kind label + source/target ref codes & excerpts when populated; no panel when `pinnedReconciliationNeed` is null.
-  - [`pending-review-section.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/pending-review-section.test.tsx) — assertion updated to include `reconciliationNeedId: need.id` in the substantive `Open side-chat` trigger payload.
-
-### C10 — Substrate verification + initial PR draft
-
-- **Status:** **done** (2026-05-17). `npm run verify` green at 4dc1083d (104 test files / 1252 tests pass; build clean). The substrate hypothesis behind SPEC.md A94 (durable secondary chats over chat/turn with no `thread` table) is satisfied. PR description drafted (below).
-- **Note (2026-05-18):** V1 closure has since been re-scoped to include the unified chat shell (C11–C16); the PLAN.md `V1 done` status set by this card was rolled back. **The verification snapshot and the SPEC.md A94 evidence stay valid** — only the "this closes V1" framing moves to C16, which also rewrites the PR description below.
-- **What:** Full `npm run verify`; outer-loop walkthrough of the side-chat V3.1 capability matrix on the new substrate; confirm SPEC.md A94 is satisfied; update PLAN.md frontier status; draft PR description.
-
-#### PR description (draft)
-
-**Title:** `FE-716: Walking skeleton chat runtime — inline secondary chats over chat/turn`
-
-**Body:**
-
-> **What**
->
-> Lands V1 of the Conversational Workspace Runtime Track 2 (`chat-runtime-secondary-chats`): every behavior the V3.1 side-chat ships today, surfaced through the elevated unified-workspace shape from `docs/design/UNIFIED_CHAT_UX.md`, on the typed AI-SDK UIMessage protocol shared with the interview spine. Durable side-chats are now durable secondary chats over the existing `chat`/`turn` substrate; the legacy `SideChatPopover` is retired; lightweight reconciliation entry now renders inline; the bespoke side-chat SSE envelope is retired in favor of `useChat` + `createUIMessageStream`; the `thread` table remains deferred per A94.
->
-> **Substrate (no new tables)**
->
-> - `chat.parent_chat_id`, `chat.invoked_in_turn_id`, `chat.pinned_item_id`, `chat.pinned_span_hint`, `chat.mode`, `chat.pinned_reconciliation_need_id` (drizzle/0020, 0021, 0022). No enum changes; secondary chats are projected from `parent_chat_id IS NOT NULL`.
-> - Shared chat types (`src/shared/chat.ts`) register `propose_edit | propose_edge | propose_drill_down` on `BrunchUITools` and an `edit-impact` data part on `BrunchDataParts` so the secondary-chat surface composes against the same typed-UIMessage substrate as the interview spine.
->
-> **Server**
->
-> - `createSecondaryChat`, `createKickoffTurn`, `appendSecondaryChatTurn`, `setSecondaryChatMode`, `listSecondaryChatsForSpecification` in `specification-store.ts`.
-> - `POST /api/specifications/:id/secondary-chats` (create), `PATCH …/mode` (mode toggle), `POST …/messages` (UIMessage protocol — `validateUIMessages` body, `createUIMessageStream` response, `streamText(...).toUIMessageStream(...)` merged into the writer, `data-edit-impact` written after `await result.finishReason` keyed by `toolCallId`, `#REF-CODE` mention resolution preserved). `getSideChatTools(mode)` still gates edit tools.
-> - Bundle hydrates `secondaryChats[*]` with kickoff turn, post-kickoff turns, pinned-item kind, and joined reconciliation-need projection.
->
-> **Client**
->
-> - `SecondaryChatTriggerProvider` + `useSecondaryChatTrigger()` exposes one `create({ kind, id, spanHint?, reconciliationNeedId? })` callback + an `inlineChatRoute` descriptor so non-transcript callers can navigate to the transcript view.
-> - `` mounts `useChat` per chat with a `DefaultChatTransport` pointed at the C24b route; walks `messages` for `tool-propose_*` parts (dedupe by `toolCallId`) and joins `data-edit-impact` via `onData` so edit proposals stage with the correct `impact` tier. The bespoke `src/client/lib/secondary-chat-stream.ts` is deleted.
-> - `` renders the kickoff card, ai-elements `` / `` / `` for turn rendering, `` live-state for streaming, `` composer with leading-edge mode chip, turn-zero ``, `#`-mention autocomplete popup via cmdk, staged-patches strip slot, and the C9 "Elements being reconciled" panel.
-> - Patch-list partitioning by `producerChatId` (Shape A) — `usePatchListForChat(chatId)` returns a per-chat staged slice while the legacy popover hook keeps the global view; `` mounts inside the collapsible body.
-> - Triggers: `PendingReviewSection` substantive row + `StructuredListView` item-action rail both call into `useSecondaryChatTrigger()`; `SideChatPopover` and `SideChatHost` are deleted.
->
-> **Verification**
->
-> - `npm run verify` — 109 test files / 1299 tests pass; build clean.
-> - Coverage spans schema invariants (including `BrunchUITools` / `brunchDataPartSchemas` admission of secondary-chat surfaces), route happy-paths + 404 invariants, UIMessage envelope round-trip + bundle round-trip, partition-seam reducer + per-chat hook tests, popover-regression sweeps, the C9 reconciliation-panel render, and `useChat`-mount isolation across parallel secondary chats.
->
-> **Deferred (parking lot — follow-up frontiers)**
->
-> `$` mention symbol, snapshot builder family, item-version-gated handle refresh, full target-grouped reconciliation UX, `PendingReviewSection` retirement, QA composer refinements, strategy sub-chat UI, layout-state header control, and C7 agent-run inline rendering (the substrate is ready; no producer exists yet). Persisting secondary-chat assistant turns as `parts: BrunchAssistantPart[]` (currently plain text) is also deferred — the UIMessage protocol carries the parts on the wire today but persistence stays text-only until a future frontier needs the structured shape.
->
-> **Stacking**
->
-> Stacked on `ln/fe-709-reconciliations` (PR #139). Restack on `main` once #139 lands.
-
-### C11 — Strip inline-under-turn rendering + retire "Secondary chat" label
-
-- **Status:** **done** (2026-05-18) — controller no longer projects `secondaryChatsByInvokedTurnId`; `WorkspaceTranscriptArtifacts` drops the projection prop, `getArtifactAnchorTurnId` helper, and `` mounting (no chat surface beneath turn artifacts); `SecondaryChatCollapsible` header renders `` (PencilLine + "Edit" / MessageCircleQuestion + "Ask", `data-testid="secondary-chat-kind-chip"`, `data-kind="edit" | "ask"`) instead of the literal "Secondary chat" label. Tests updated as planned.
-- **What:** Tear out the inline-under-turn mounting so the unified shell (C12) can host secondary chats instead:
-  - Remove the `secondaryChatsByInvokedTurnId` projection from [-continuous-workspace-controller.ts](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/$id/_view/-continuous-workspace-controller.ts) and stop threading it through [-continuous-workspace-view.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/$id/_view/-continuous-workspace-view.tsx).
-  - Remove `` rendering and the `getArtifactAnchorTurnId` helper from [-workspace-transcript-artifacts.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/$id/_view/-workspace-transcript-artifacts.tsx); the artifacts renderer drops the `secondaryChatsByInvokedTurnId` prop.
-  - Replace the literal `"Secondary chat"` header label in `` with a kind chip per `UNIFIED_CHAT_UX.md` §8 (`PencilLine` for Edit, `MessageCircleQuestion` for Ask) — neutral chrome + subtle accent only on the kind chip per §7 dec 3.
-- **Tests:** delete `inline rendering after the matching turn` / `no-orphan` / `multiple chats per turn` cases from `-workspace-transcript-artifacts.test.tsx`; drop the controller projection test for the map; update `secondary-chat-collapsible.test.tsx` to assert the kind chip in the header instead of the "Secondary chat" string.
-- **Out of scope:** the unified shell itself (C12); layout modes (C13); trigger wire-up (C14); motion (C15).
-- **Verification:** `npm run verify` green; no orphan calls into the removed projection; the workspace transcript no longer renders any chat surface beneath turn artifacts.
-
-### C12 — `` skeleton (Side-docked default)
-
-- **Status:** **done** (2026-05-18) — `src/client/components/unified-chat-shell.tsx` lands as a peer of `` inside [_view/route.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/$id/_view/route.tsx); the shell reads `useSpecificationBundleData()`, renders a header (spec name spine label + four layout-mode buttons + close affordance) and a body listing every active `secondaryChats[*]` (already returned in `chat.id` ascending order from `listSecondaryChatsForSpecification`) as `` collapsibles. The shell defaults to side-docked at ~50% width; the workspace center (existing Outlet + EntitySidebar) reflows into the left 50%. Spine resolution: the shell is a *lightweight spine indicator + secondary-chats slot*, not a re-mounted transcript — the workspace center remains the canonical transcript + composer surface. Tests in [`unified-chat-shell.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/unified-chat-shell.test.tsx) cover header presence, default mode, empty-state, host order, close↔expand round-trip, and layout-mode callback forwarding.
-- **What:** New `src/client/components/unified-chat-shell.tsx` mounted in the specification route as a peer to ``. The shell renders:
-  - The **interview spine** (the primary chat's transcript) as its always-visible body — sourced from the same bundle the workspace center already reads.
-  - **Active secondary chats** for the spec as inline collapsibles inside the shell body (ordered by `chat.created_at` ascending — confirm during build), using the existing `` per chat. No "Secondary chat" label; kind chip from C11.
-  - A **header strip** with a layout-mode toggle (buttons present but inert until C13) and a close affordance that switches the shell to a collapsed bar.
-- **Mounting:** default layout state **Side-docked** (~50% width right rail per `UNIFIED_CHAT_UX.md` §4). Workspace center reflows to remaining width. The shell is a sibling of `` inside `route.tsx`'s layout, not a child of it.
-- **Out of scope:** localStorage persistence (C13); width/mode transitions (C13); trigger auto-expand (C14); motion (C15).
-- **Verification:** shell renders the interview spine + lists all active secondary chats; nothing renders under turn artifacts; existing transcript scrolls in the workspace center pane; build + test green.
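C12 only stubs the header's four layout-mode buttons and makes Side-docked the default. The C13 card records how the chosen mode is persisted per spec and stepped down with Esc. The following is a minimal, framework-free TypeScript sketch of that state logic under stated assumptions. Three details come from C13: the `brunch:chat-layout-mode:{id}` key shape, the `side-docked` value, and the `decrementChatLayoutMode` name. Everything else is assumed: the other mode strings, the hypothetical `StringStore` seam (which `window.localStorage` satisfies), and the remaining helper names. This is not the shipped `useChatLayoutMode` hook.

```typescript
// The four layout states from UNIFIED_CHAT_UX.md §4, smallest footprint first.
// Only "side-docked" is attested in the plan; the other strings are assumed.
const CHAT_LAYOUT_MODES = ["compact", "side-docked", "maximize", "full"] as const;
type ChatLayoutMode = (typeof CHAT_LAYOUT_MODES)[number];

const DEFAULT_CHAT_LAYOUT_MODE: ChatLayoutMode = "side-docked";

// Hypothetical storage seam; window.localStorage satisfies this shape in the browser.
interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Per-spec key, following the `brunch:chat-layout-mode:{id}` shape recorded in C13.
function layoutModeKey(specificationId: string | number): string {
  return `brunch:chat-layout-mode:${specificationId}`;
}

function isChatLayoutMode(value: string | null): value is ChatLayoutMode {
  return value !== null && (CHAT_LAYOUT_MODES as readonly string[]).indexOf(value) !== -1;
}

// Rehydrate the stored mode; missing or junk values fall back to the default.
function readLayoutMode(store: StringStore, specificationId: string | number): ChatLayoutMode {
  const raw = store.getItem(layoutModeKey(specificationId));
  return isChatLayoutMode(raw) ? raw : DEFAULT_CHAT_LAYOUT_MODE;
}

function writeLayoutMode(store: StringStore, specificationId: string | number, mode: ChatLayoutMode): void {
  store.setItem(layoutModeKey(specificationId), mode);
}

// Esc steps down one tier: full -> maximize -> side-docked -> compact; no-op at compact.
function decrementChatLayoutMode(mode: ChatLayoutMode): ChatLayoutMode {
  const index = CHAT_LAYOUT_MODES.indexOf(mode);
  return index > 0 ? CHAT_LAYOUT_MODES[index - 1] : mode;
}
```

Ordering the tuple from smallest to largest footprint turns Esc into a single index step. Validating the stored string means junk values fall back to the default rather than being trusted.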
-
-### C13 — Layout modes + header control + localStorage
-
-- **Status:** **done** (2026-05-18) — new `useChatLayoutMode(specificationId)` hook in [`use-chat-layout-mode.ts`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/use-chat-layout-mode.ts) persists the chosen mode under per-spec localStorage key `brunch:chat-layout-mode:{id}`, defaulting to `side-docked`; document-level Esc keydown decrements one tier via the exported `decrementChatLayoutMode` helper (Full → Maximize → Side-docked → Compact, no-op below). [_view/route.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/$id/_view/route.tsx) gains three layout components: `ResizableLayout` (50/50 for Side-docked, 30/70 for Maximize; `key={mode}` remounts the ResizablePanelGroup on mode change for clean defaultSizes), `CompactLayout` (floating dock 360–420 px bottom-right, workspace center fills), `FullLayout` (chat at 100%, center hidden). 9-test suite in [`use-chat-layout-mode.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/use-chat-layout-mode.test.tsx) covers default, persistence, rehydration, junk rejection, Esc tier walk, defaultPrevented skip, and per-spec switching. **Open question kept open:** default stays Side-docked; revisit Compact-as-default only if walkthrough surfaces friction.
-- **What:** Implement the four layout states from `UNIFIED_CHAT_UX.md` §4:
-  - **Compact** — small floating dock, ~360–420 px.
-  - **Side-docked** *(default)* — right rail, ~50% width.
-  - **Maximize** — wide center, ~70% with rails.
-  - **Full** — 100% workspace.
-- New `useChatLayoutMode(specificationId)` hook backed by `localStorage` (key per workspace; default Side-docked). Header strip in the shell renders four mode buttons; current mode highlighted. **Esc** decrements one tier per §10.
-- **Out of scope:** motion (C15); mode chip on the composer (deferred); suggestions row (deferred).
-- **Verification:** four modes render at correct footprints; workspace center reflows correctly; toggle persists across reload; Esc steps the mode down.
-- **Open question (resolve in build):** brief defaults to Side-docked, but Compact is closer to the retired V3.1 popover footprint. Keep Side-docked unless walkthrough surfaces friction — revisit in C16.
-
-### C14 — Trigger wire-up: open shell + auto-expand new chat
-
-- **Status:** **done** (2026-05-18) — new `ChatShellPresenceProvider` ([`chat-shell-presence.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/chat-shell-presence.tsx)) supplies `{ isCollapsed, expand, collapse, focusedChatId, focusChat, clearFocus, jumpToAnchor }`; mounted in [parent route.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/$id/route.tsx) above `` so the trigger can `focusChat(response.chatId)` after a successful create (expands shell + sets focused id). `SecondaryChatCollapsible` gained controlled `open`/`onOpenChange` props plus an `onJumpToAnchor` handler that renders a `Crosshair`-iconed "Jump" button (data-testid `secondary-chat-jump-to-anchor`) when the chat carries an `invoked_in_turn_id`. `SecondaryChatHost` watches `focusedChatId === chatId` and auto-opens its collapsible via the controlled open prop. `WorkspaceArtifactRow` accepts `anchorTurnId` and exposes `data-anchor-turn-id`; threaded through `answered-turn`, `prefaced-question`, `answered-review-turn`, `answered-revision-review`, `accepted-closure`, `persisted-turn`, `active-prefaced-question`. `jumpToAnchor` does `document.querySelector('[data-anchor-turn-id="X"]')?.scrollIntoView({ behavior: 'smooth' })` plus a 1.5 s ring highlight. Tests in [`chat-shell-presence.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/chat-shell-presence.test.tsx) cover trigger → expand + focus, Jump button rendering and scroll dispatch, absence when `invoked_in_turn_id` is null, and auto-open on focus.
-- **What:** Extend `useSecondaryChatTrigger().create()` (or add a sibling effect inside the shell) so that creating a secondary chat:
-  1. Ensures the shell is visible (if user collapsed it to a bar, expand to its last layout mode).
-  2. Auto-expands the newly-created chat's collapsible inside the shell.
-  3. Adds a "Jump to anchor" link in the collapsible header that scrolls the workspace center pane to `invoked_in_turn_id` (highlight briefly).
-- Trigger sites (`PendingReviewSection`, `StructuredListView`) are unchanged externally.
-- **Verification:** clicking the trigger from either site opens the shell with the new chat expanded; reconciliation-pinned chats still render the C9 panel inside; jump-to-anchor scrolls correctly; reload keeps the persisted chat (no regression on substrate); per-chat collapse state stays component-local.
-
-### C15 — Motion + spring transitions
-
-- **Status:** **done** (2026-05-18) — `motion` v12.38.0 was already a dep; no new install required. New [`use-prefers-reduced-motion.ts`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/use-prefers-reduced-motion.ts) hook + exported `CHAT_SHELL_SPRING` constant (mass 0.6, stiffness 220, damping 30 per §7 dec 5). `SecondaryChatCollapsible` wraps the streaming-assistant text in a `motion.div` that pulses opacity at ~1.4s (per §8 live-state); pulse collapses to `opacity: 1` when reduced-motion is requested. `UnifiedChatShell` switches its root containers to `motion.div` with spring fade-ins and uses `` with `layout` per secondary-chat-host wrapper for smooth add/remove transitions; all transitions short-circuit to `{ duration: 0 }` under reduced-motion. Tests in [`use-prefers-reduced-motion.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/use-prefers-reduced-motion.test.tsx) cover canonical spring config, matchMedia true/false branches, and missing-matchMedia fallback.
-- **What:** Wire `motion` (Framer Motion) per `UNIFIED_CHAT_UX.md` §7 dec 5 / §8:
-  - Spring on collapsible expand/collapse: mass 0.6, stiffness 220, damping 30, ~250 ms.
-  - Animate shell width across layout-mode changes.
-  - Streaming live-state pulse on the kickoff card.
-- Confirm `framer-motion` dep state before adding; honor `prefers-reduced-motion` to disable springs.
-- **Verification:** transitions feel smooth across all four modes; no layout thrash during workspace reflow; reduced-motion preference disables springs.
-
-### C16 — V1 closure (unified shell) + verification + PR description rewrite
-
-- **Status:** **done** (2026-05-18) — supersedes C10 as the V1 closeout. `npm run verify` green: 108 test files / 1273 tests pass; build clean; only the 6 pre-existing `rendered is declared but never used` warnings in `InterviewView.test.tsx` (not introduced here). `memory/PLAN.md` frontier `chat-runtime-secondary-chats` status updated to **V1 done** in both the Sequencing list and the Frontier Definition. PR description draft rewritten below to reflect the full V1 surface (substrate + unified shell). PR submits once #139 merges or per Lu's signal.
-- **Outer-loop walkthrough (deferred to operator):** the mechanical four-mode walkthrough across Compact ↔ Side-docked ↔ Maximize ↔ Full, with one open secondary chat from each trigger site (`PendingReviewSection` substantive row + `StructuredListView` item-action rail), localStorage round-trip across reload, reconciliation panel rendering inside the C9 band, and staging strip scoped per chat — performed by the human operator before clicking "Ready for review". The unit/integration coverage above asserts each mechanism in isolation; the outer-loop run confirms the integrated UX.
-
-#### PR description (final draft, supersedes C10)
-
-**Title:** `FE-716: Walking skeleton chat runtime — durable secondary chats + unified chat shell`
-
-**Body:**
-
-> **What**
->
-> Lands V1 of Conversational Workspace Runtime Track 2 (`chat-runtime-secondary-chats`): every behavior the V3.1 side-chat shipped, now surfaced through the layoutable unified chat shell from `docs/design/UNIFIED_CHAT_UX.md`. Durable side-chats become durable secondary chats over the existing `chat`/`turn` substrate; the legacy `SideChatPopover` is retired; the inline-under-turn rendering from the earlier substrate slice is replaced by a peer chat surface with Compact / Side-docked / Maximize / Full layout modes. The `thread` table stays deferred per A94.
->
-> **Substrate (no new tables)**
->
-> - `chat.parent_chat_id`, `chat.invoked_in_turn_id`, `chat.pinned_item_id`, `chat.pinned_span_hint`, `chat.mode`, `chat.pinned_reconciliation_need_id` (drizzle/0020, 0021, 0022). No enum changes; secondary chats are projected from `parent_chat_id IS NOT NULL`.
->
-> **Server**
->
-> - `createSecondaryChat`, `createKickoffTurn`, `appendSecondaryChatTurn`, `setSecondaryChatMode`, `listSecondaryChatsForSpecification` in `specification-store.ts`.
-> - `POST /api/specifications/:id/secondary-chats` (create), `PATCH …/mode` (mode toggle), `POST …/messages` (streaming SSE with `getSideChatTools(mode)` edit-tool gating + `#REF-CODE` mention resolution).
-> - Bundle hydrates `secondaryChats[*]` with kickoff turn, post-kickoff turns, pinned-item kind, and joined reconciliation-need projection.
->
-> **Client — substrate (C0–C9)**
->
-> - `SecondaryChatTriggerProvider` + `useSecondaryChatTrigger()` exposes one `create({ kind, id, spanHint?, reconciliationNeedId? })` callback + an `inlineChatRoute` descriptor.
-> - `` wires per-chat mutation/streaming hooks; `` renders the kickoff card, mode toggle, composer, streaming assistant, staged-patches strip slot, and the C9 reconciliation panel. -> - Patch-list partitioning by `producerChatId` (Shape A) — `usePatchListForChat(chatId)` returns a per-chat staged slice; `` mounts inside the collapsible body. -> - Triggers: `PendingReviewSection` substantive row + `StructuredListView` item-action rail both call `useSecondaryChatTrigger()`; `SideChatPopover` and `SideChatHost` are deleted. -> -> **Client — unified shell (C11–C15)** -> -> - C11 — Inline-under-turn rendering retired. `WorkspaceTranscriptArtifacts` no longer mounts secondary chats; the controller no longer projects `secondaryChatsByInvokedTurnId`. `SecondaryChatCollapsible` renders a kind chip (`PencilLine` = Edit, `MessageCircleQuestion` = Ask) instead of the literal "Secondary chat" label. -> - C12 — `` mounts in `_view/route.tsx` as a peer of ``. Header (spec-name spine indicator + four layout-mode buttons + close affordance) + body (active secondary chats as `` collapsibles, id-ascending order). The workspace center remains the canonical transcript+composer surface; the shell is the spine indicator + secondary-chats slot. -> - C13 — `useChatLayoutMode(specificationId)` persists Compact / Side-docked / Maximize / Full under per-spec localStorage; default Side-docked. Esc decrements one tier (Full → Maximize → Side-docked → Compact, no-op below) per §10. Each mode has its own layout component: ResizableLayout (50/50 or 30/70), CompactLayout (floating dock 360–420 px), FullLayout (100%). -> - C14 — `` provides `expand`/`focusChat`/`jumpToAnchor`. The trigger calls `focusChat(response.chatId)` on successful create so the shell expands and the new chat auto-opens. `` renders a Jump-to-anchor button when `invoked_in_turn_id` is set; `WorkspaceArtifactRow` exposes `data-anchor-turn-id` on rendered turn rows so jumps scroll into view with a brief highlight ring. 
-> - C15 — `motion` springs (mass 0.6 / stiffness 220 / damping 30 per §7 dec 5); streaming live-state pulse on the secondary-chat streaming text per §8; AnimatePresence on the chat list for smooth add/remove. `usePrefersReducedMotion` short-circuits every animation to a duration-0 step per §10. -> -> **Verification** -> -> - `npm run verify` — 108 test files / 1273 tests pass; build clean. -> - Coverage spans schema invariants, route happy-paths + 404 invariants, SSE chunk round-trip + bundle round-trip, partition-seam reducer + per-chat hook tests, popover-regression sweeps, the C9 reconciliation panel render, the unified shell skeleton + layout-mode persistence + Esc decrement + presence-focused auto-expand + jump-to-anchor scroll dispatch, and the prefers-reduced-motion hook. -> -> **Deferred (parking lot — follow-up frontiers)** -> -> `$` mention symbol, mention autocomplete, snapshot builder family, item-version-gated handle refresh, full target-grouped reconciliation UX, `PendingReviewSection` retirement, QA composer refinements, strategy sub-chat UI, mode chip + Shift+Tab toggle on the composer, suggestions row per mode, per-kind kickoff copy variations, item-anchored badge in structured-list / graph view, Ladle prototype, C7 agent-run inline rendering (the substrate is ready; no producer exists yet). -> -> **Stacking** -> -> Stacked on `ln/fe-709-reconciliations` (PR #139). Restack on `main` once #139 lands. - -### C17 — Hide Full + Minimize/Maximize toggle + close vs minimize semantics - -- **Status:** **done** (2026-05-18) — four iterations on the same day per walkthrough feedback. Final shape: - - **Header layout-button row (left → right):** Minimize · Side-docked · Compact↔Maximize-toggle. The toggle is a single button whose icon + click target flip with state (Maximize2 when not maxed → click to Maximize; Minimize2 when maxed → click to Compact). Pressed state lights when current mode is compact or maximize. - - **Full mode hidden** entirely. 
Substrate type union + `CHAT_LAYOUT_MODE_ORDER` intact; `useChatLayoutMode` clamps any persisted `'full'` to `'maximize'` so older reloads stay safe. - - **Close (X) vs Minimize semantics** lifted into `ChatShellPresenceProvider` as `appearance: 'expanded' | 'minimized' | 'closed'`: - - `X` close (top-right of header) → `presence.close()` → shell renders `null`. The route layout drops the shell's panel slot so the workspace center fills the freed space (no empty pane). - - `Minimize` (first in layout-button row, `Minus` icon) → `presence.minimize()` → fixed bottom-right "Ask Brunch" pill with `Send` icon. Pill click restores via `presence.expand()`. Context persists (the chat substrate is untouched). - - Trigger-driven `focusChat()` always restores `appearance='expanded'` so creating a chat re-opens a closed shell. - - **Route layout:** `_view/route.tsx` consults `presence.isCollapsed` first. When collapsed, the workspace center renders at full width and the shell mounts at root (renders pill or null). When expanded, the layout dispatches per `layoutMode` (compact dock / resizable / full) as before. - - **Comments cleanup:** stripped the FE-716-C17 narrative comments and the C12/C13/C14/C15 docstring sections per "remove unnecessary comments" direction. -- **What:** User direction (2026-05-18): three layout modes (Compact / Side-docked / Maximize) exposed as a Minimize + Side-docked + Compact↔Maximize-toggle row; X closes the shell entirely (workspace reclaims the space, no empty pane); Minimize sends the shell to a bottom-right pill while preserving chat state. - - [`unified-chat-shell.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/unified-chat-shell.tsx) — extend `LAYOUT_MODE_BUTTONS` entries with an optional `disabled: true` flag; mark `'full'` disabled; OR the render's `disabled` prop with the per-button flag; add a `title="Coming soon"` (or similar) hint. 
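The C17 layout-mode clamp, the one-tier Esc decrement (no-op at Compact), and the `appearance` semantics above reduce to a small amount of pure state logic. This is an illustrative sketch, not the shipped `useChatLayoutMode` / `ChatShellPresenceProvider` code. Only the mode union, `CHAT_LAYOUT_MODE_ORDER`, and the three `appearance` values come from this card. `clampLayoutMode`, `decrementLayoutMode`, `presenceReducer`, `isCollapsed`, the action shapes, and the numeric `chatId` type are hypothetical.

```ts
// Hypothetical sketch: only the mode union, the order constant, and the
// appearance values are taken from the card; helper names are illustrative.
type ChatLayoutMode = 'compact' | 'side-docked' | 'maximize' | 'full';
type Appearance = 'expanded' | 'minimized' | 'closed';

// Smallest to largest footprint. 'full' stays in the union and the order so it
// can be re-enabled later without a storage migration.
const CHAT_LAYOUT_MODE_ORDER: readonly ChatLayoutMode[] = ['compact', 'side-docked', 'maximize', 'full'];

// Applied on read and on write: a persisted or requested 'full' becomes 'maximize'.
function clampLayoutMode(mode: ChatLayoutMode): ChatLayoutMode {
  return mode === 'full' ? 'maximize' : mode;
}

// Esc steps down one tier from the clamped mode; no-op at 'compact'.
function decrementLayoutMode(mode: ChatLayoutMode): ChatLayoutMode {
  const current = clampLayoutMode(mode);
  const index = CHAT_LAYOUT_MODE_ORDER.indexOf(current);
  return index > 0 ? (CHAT_LAYOUT_MODE_ORDER[index - 1] ?? current) : current;
}

interface PresenceState {
  appearance: Appearance;
  focusedChatId: number | null; // assumption: numeric chat ids
}

type PresenceAction =
  | { type: 'expand' }
  | { type: 'minimize' }
  | { type: 'close' }
  | { type: 'focusChat'; chatId: number }
  | { type: 'clearFocus' };

function presenceReducer(state: PresenceState, action: PresenceAction): PresenceState {
  switch (action.type) {
    case 'expand':
      return { ...state, appearance: 'expanded' };
    case 'minimize':
      // Shell becomes the bottom-right pill; chat state is untouched.
      return { ...state, appearance: 'minimized' };
    case 'close':
      // Shell renders null and the workspace reclaims its panel slot.
      return { ...state, appearance: 'closed' };
    case 'focusChat':
      // Trigger-driven focus always re-opens the shell, even from 'closed'.
      return { appearance: 'expanded', focusedChatId: action.chatId };
    case 'clearFocus':
      return { ...state, focusedChatId: null };
  }
}

// The route layout treats both the pill and the closed shell as "collapsed".
function isCollapsed(state: PresenceState): boolean {
  return state.appearance !== 'expanded';
}
```

Keeping `'full'` in the union and clamping at the hook boundary means re-enabling Full later only requires dropping the clamp.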
- - [`use-chat-layout-mode.ts`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/use-chat-layout-mode.ts) — clamp persisted `'full'` to `'maximize'` on read (and rewrite storage); refuse to write `'full'` (silently clamp). The type stays `'compact' | 'side-docked' | 'maximize' | 'full'` so the substrate can re-enable later without a migration. -- **Tests:** update [`unified-chat-shell.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/unified-chat-shell.test.tsx) — Full button is rendered but always `disabled`; clicking it does not fire `onLayoutModeChange`. Update [`use-chat-layout-mode.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/__tests__/use-chat-layout-mode.test.tsx) — persisted `'full'` reads back as `'maximize'`; `setLayoutMode('full')` clamps to `'maximize'`; Esc decrement chain stays valid mechanically but starts from `'maximize'`. -- **Out of scope:** removing `'full'` from the union type or `CHAT_LAYOUT_MODE_ORDER`; rewriting motion transitions; re-styling Maximize. -- **Verification:** `npm run verify` green. - -### C18 — Single scratch chat per spec + click-to-anchor injection - -- **Status:** **done** (2026-05-18) — substrate pivot landed. `npm run verify` green: 108 test files / 1278 tests pass; build clean. - - **Migration `0023_chat_anchored_items.sql`:** adds `chat.anchored_item_ids text NOT NULL DEFAULT '[]'` (JSON array). Schema column added; journal updated. - - **Server helpers:** `findScratchSecondaryChat(db, specId, parentChatId)` returns the unique per-spec scratch (one with `parent_chat_id = primary AND pinned_reconciliation_need_id IS NULL`). `appendAnchorToScratchChat(db, specId, input)` is the public entry: find-or-create the scratch (carries `invoked_in_turn_id` from the first call), parse anchored ids, no-op if itemId already present, else push id + append a mode-aware kickoff turn ("Editing ''." for edit-mode, "Anchored to ''." 
otherwise). Returns `{ chat, kickoffTurnId, anchoredItemIds }`. - - **Route repointed:** `POST /api/specifications/:id/secondary-chats` now branches — when `reconciliationNeedId` is set, falls through to the existing dedicated-chat path (preserves FE-716 C9 reconciliation chat behavior); otherwise calls `appendAnchorToScratchChat`. Response shape: `{ chatId, kickoffTurnId | null, anchoredItemIds }`. - - **Bundle projection:** `SecondaryChatWithKickoff.anchoredItemIds: number[]` derived from the new column; threaded through `core.ts` and the Zod `secondaryChatStateSchema`. The first kickoff turn is what `listSecondaryChatsForSpecification` already returns via `limit(1)`, so subsequent anchor kickoffs are recorded in the substrate but invisible to the UI (Option b from the open question). - - **Shell filter:** [unified-chat-shell.tsx](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/unified-chat-shell.tsx) picks the single scratch chat (`pinned_reconciliation_need_id === null`) from `secondaryChats` and renders one `` only. Reconciliation-pinned chats stay in the bundle data but the shell hides them until Track 3 defines their UX. `AnimatePresence` dropped since only one chat renders. - - **Tests:** all existing fixtures gained `anchoredItemIds: []` (bulk perl insert across four test files + manual for the populated reconciliation case). The shell test that asserted "renders one host per chat" now asserts "renders only the scratch chat" with multiple secondary chats in the bundle. Server tests (`app.test.ts`, `secondary-chat-route.test.ts`) continue to pass — the `invoked_in_turn_id` invariant and mode-aware "Editing" kickoff verb both preserved by threading those through to `appendAnchorToScratchChat`. -- **What:** Behavioral pivot away from "one secondary chat per item-click" to "one persistent scratch chat per spec, items injected as anchors over time." 
- - **Migration `drizzle/0023_chat_anchored_items.sql`** — add `chat.anchored_item_ids text NOT NULL DEFAULT '[]'` (JSON array of knowledge-item ids). Index not required at this volume. - - **Server:** - - `getOrCreateScratchSecondaryChat(db, specId, primaryChatId)` — find-or-create the unique secondary chat for the spec (uniqueness enforced at create time; identified as the one with `parent_chat_id = primary`). First-call also seeds the chat with `pinned_item_id` from the inbound itemId so the existing C9 reconciliation-pin and prompt context paths keep working unchanged. - - `appendAnchorToScratchChat(db, specId, primaryChatId, { itemId, itemKind, spanHint? })` — parses `anchored_item_ids`, pushes the id if absent, persists; appends a kickoff-style turn (`Anchored to ''.`) but the UI filters post-first kickoffs (see Open question below). - - Repoint `POST /api/specifications/:id/secondary-chats` from "create new chat per click" to call `appendAnchorToScratchChat`. Response shape extends to `{ chatId, kickoffTurnId, anchoredItemIds }`. - - Bundle: `secondaryChatStateSchema.chat.anchoredItemIds: number[]` projected from the new column. - - **Client:** - - `useSecondaryChatTrigger().create()` keeps its public signature; just hits the rebound route. Call sites (`StructuredListView`, `PendingReviewSection`) need no change. - - [`unified-chat-shell.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/unified-chat-shell.tsx) line 69 — filter `secondaryChats` to the scratch chat (the one whose `parent_chat_id = primary_chat_id`; substrate uniqueness guarantees ≤1). The `AnimatePresence` list renders one host max. - - Composer stays. Ask/Edit toggle stays. No mention chip; no expansion popout. -- **Tests:** - - Server: `chat-substrate.test.ts` — `getOrCreateScratchSecondaryChat` is idempotent; `appendAnchorToScratchChat` appends idempotently (no duplicate ids); `anchored_item_ids` survives reload. 
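The idempotent anchor bookkeeping described for `appendAnchorToScratchChat`, together with the scratch-chat lookup, can be sketched as pure functions over a chat row. This is a sketch under assumptions, not the server code, and database access is omitted.

- From this card: the JSON-array column format, the `parent_chat_id` / `pinned_reconciliation_need_id` predicate, and the two kickoff verbs.
- Hypothetical: the names `ChatRow`, `findScratchChat`, `appendAnchor`, and `kickoffVerb`.
- Assumed from C21's test description: the `'explore' | 'edit'` mode names.

```ts
// Hypothetical sketch of the scratch-chat anchor bookkeeping; names are illustrative.
type ChatMode = 'explore' | 'edit'; // assumption: mode names as used in C21's tests

interface ChatRow {
  id: number;
  parentChatId: number | null;
  pinnedReconciliationNeedId: number | null;
  anchoredItemIds: string; // JSON array text, column default '[]'
}

// The scratch chat is the secondary chat under the primary chat that is not
// pinned to a reconciliation need; creation-time checks keep it unique.
function findScratchChat(chats: readonly ChatRow[], primaryChatId: number): ChatRow | undefined {
  return chats.find(
    (chat) => chat.parentChatId === primaryChatId && chat.pinnedReconciliationNeedId === null,
  );
}

// Parse the column, append the item id only if absent, and report whether a
// kickoff turn should be written (a repeat click on the same item is a no-op).
function appendAnchor(
  anchoredItemIdsJson: string,
  itemId: number,
): { anchoredItemIds: number[]; serialized: string; added: boolean } {
  const parsed: unknown = JSON.parse(anchoredItemIdsJson);
  const ids = Array.isArray(parsed) ? parsed.filter((v): v is number => typeof v === 'number') : [];
  if (ids.includes(itemId)) {
    return { anchoredItemIds: ids, serialized: JSON.stringify(ids), added: false };
  }
  const next = [...ids, itemId];
  return { anchoredItemIds: next, serialized: JSON.stringify(next), added: true };
}

// Mode-aware kickoff verb: "Editing" in edit mode, "Anchored to" otherwise.
function kickoffVerb(mode: ChatMode): string {
  return mode === 'edit' ? 'Editing' : 'Anchored to';
}
```

Returning `added` lets the caller skip writing a kickoff turn on a repeat click, which keeps the "no duplicate ids" test and the transcript in step.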
- - Server: `app.test.ts` — first POST creates the scratch chat + seeds anchor; second POST against a different item appends to the existing scratch chat (no new row in `chat`); response carries `anchoredItemIds`. - - Client: `unified-chat-shell.test.tsx` — given two secondary chats in the bundle (legacy + scratch), only the scratch chat renders. - - Client: existing `secondary-chat-trigger.test.tsx` — POST payload + bundle invalidation still pass (the public signature is unchanged); add an assertion that two `create` calls against different items produce one chat row. -- **Out of scope:** workspace selection styling (C19); deleting legacy per-item chats from existing local DBs (pre-release posture per `CLAUDE.md` — operator nukes `.brunch/brunch.db`); `chat_anchor` join table (not needed for V1). -- **Verification:** `npm run verify` green; manual probe: open spec, click dash on item A → scratch chat appears; click dash on item B → same scratch chat, both items in `anchoredItemIds`; reload → state persists. -- **Open question (resolve in build):** does the per-anchor kickoff-style turn render in the transcript or stay hidden? - - **(a)** Visible — self-documenting context history; slightly noisier. - - **(b)** Hidden — filter out post-first kickoffs at render time; cleaner UI. - - **Default per user direction:** (b). Substrate still records the turns; UI just doesn't show post-first kickoffs. Easy to flip later by removing the filter. - -### C19 — Workspace selection styling for anchored items - -- **Status:** **done** (2026-05-18) — promotes C27's left-border foundation to the full selection state: rows whose ids appear in the focused chat's `pinned_item_id` or `anchoredItemIds` render with a 2px left-border + ~10% kind-accent background tint (`${kindAccentHex[item.kind]}1A`), both colours resolved per row against the item's own kind per C19's "matching the item's kind" directive. 
`ItemActionRail` flips its `open-inline-chat` trigger's `aria-label` between `"Anchored to active chat"` and `"Open inline chat about this item"` (plus `aria-pressed` + `data-chat-anchored` for snapshot/test access); the click handler stays a no-op-equivalent — C26 server-side dedupe makes re-triggering on the anchored item return the existing chat id. Tests: 4 new cases in [`structured-list-view.test.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/routes/specification/%24id/__tests__/structured-list-view.test.tsx) — no chat focused → no `data-graph-row-chat-anchored`; pinned item gets selection styling while siblings don't; `anchoredItemIds` selection mirrors the pinned-item selection; aria-label flip. The bundle + presence mocks were extended with mutable `mockSecondaryChats` / `mockFocusedChatId` so each test can name an active chat without a full QueryClient/Presence harness. `npm run verify` green: 110 test files / 1306 tests pass; build clean. -- **Status (historical):** **next** (after C18) -- **What:** Items in `StructuredListView` whose ids appear in `scratchChat.anchoredItemIds` render with a selected/anchored visual state using `kindAccentHex` from [`knowledge-card.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/knowledge-card.tsx) — subtle background tint + accent border matching the item's kind. Clicking the dash icon on an already-anchored item is idempotent. -- **Optional polish:** a small "Anchored: A12, GOAL3" mini-band in the chat shell's header strip listing current anchor ref-codes. Defer until walkthrough surfaces a need. -- **Tests:** `structured-list-view.test.tsx` — items not in `anchoredItemIds` render without selection class; items in it render with the kind-accented selected class; the dash button label flips between `aria-label="Anchor to chat"` and `aria-label="Anchored"` (or similar). 
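C19's selection rule and the `aria-label` flip can be expressed as one pure per-row function. A row is anchored when it is the focused chat's pinned item or appears in its anchored ids.

This is a minimal sketch that assumes a 6-digit hex accent per kind. `rowSelection`, `FocusedChat`, and the fallback accent are hypothetical; the real `kindAccentHex` map lives in `knowledge-card.tsx`. The labels follow the shipped Status text rather than the historical Tests wording. The `1A` suffix is an alpha byte of 26/255, roughly 10% opacity, which matches the "~10% kind-accent background tint" described above.

```ts
// Hypothetical sketch of per-row anchored selection state for StructuredListView.
interface FocusedChat {
  pinnedItemId: number | null;
  anchoredItemIds: readonly number[];
}

interface RowSelection {
  anchored: boolean;
  style: { borderLeft?: string; backgroundColor?: string };
  ariaLabel: string;
  ariaPressed: boolean;
}

function rowSelection(
  itemId: number,
  kind: string,
  focusedChat: FocusedChat | null,
  kindAccentHex: Readonly<Record<string, string>>,
): RowSelection {
  const anchored =
    focusedChat !== null &&
    (focusedChat.pinnedItemId === itemId || focusedChat.anchoredItemIds.includes(itemId));
  if (!anchored) {
    return { anchored, style: {}, ariaLabel: 'Open inline chat about this item', ariaPressed: false };
  }
  // Assumption: fall back to a neutral grey if a kind has no accent entry.
  const accent = kindAccentHex[kind] ?? '#888888';
  return {
    anchored,
    // Border and tint both resolve against the row's own kind; "1A" is ~10% alpha.
    style: { borderLeft: `2px solid ${accent}`, backgroundColor: `${accent}1A` },
    ariaLabel: 'Anchored to active chat',
    ariaPressed: true,
  };
}
```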
-- **Out of scope:** anchor-removal UX (defer until walkthrough demands it); selection in graph view (defer to a follow-up frontier per the parking lot's item-anchored badge entry). -- **Verification:** `npm run verify` green; outer-loop walkthrough confirming the selection styling matches the chat's anchor state across click + reload + mode toggle. - -## V1 re-narrowing (proposed, 2026-05-18) - -V1 was originally "every behavior the V3.1 side-chat ships today, surfaced through the elevated unified-workspace shape." C20–C25 propose absorbing **ai-elements adoption for the secondary-chat surface** into the same frontier and PR, on the basis that the design brief (`docs/design/UNIFIED_CHAT_UX.md` §Constraints) names ai-elements composition as non-negotiable for the terminal state. The Ladle prototype phases A–D in §13 are explicitly skipped; visual decisions are tested against real workspace state instead. If accepted, V1 = "V3.1 parity through unified shell **+ ai-elements parity with the interview spine**." PLAN.md frontier description for `chat-runtime-secondary-chats` updates to match. Cards land sequentially after C18 / C19 so they target the post-scratch-chat-pivot shape. - -### C20 — Adopt `` + `` for turn rendering - -- **Status:** **done** (2026-05-18) — `npm run verify` green: 1280 tests pass; build clean. - - **Client:** `secondary-chat-collapsible.tsx` now wraps persisted turns + the streaming-assistant pulse in `` → ``. `SecondaryChatTurnRow` renders `` + ``. Assistant text routes through `` (→ `MarkdownRenderer`); user text stays plain `whitespace-pre-wrap`. - - **Tests:** `secondary-chat-collapsible.test.tsx`, `secondary-chat-host.test.tsx`, and `chat-shell-presence.test.tsx` add `vi.mock` shims for `@/client/components/ai-elements/conversation.js` + `message.js` (matching the `InterviewView.test.tsx` pattern). 
New test in collapsible suite asserts that an assistant turn with `**bold**` renders `bold` (markdown shim in the mock makes it deterministic in happy-dom). - - **Note:** Cards described `` / `` / `` as "already used by `question-cards.tsx` / the interview spine." Reality: those primitives were vendored but unused; only `Reasoning` + `Task` had real consumers. C20 introduces the first real consumer of `` + `` + `` in production code. -- **What:** Replace the bespoke `SecondaryChatTurnRow` (in [`secondary-chat-collapsible.tsx`](file:///Users/kostandin/Projects/hashdev/brunch/src/client/components/secondary-chat-collapsible.tsx) lines 228–244) with the vendored ai-elements `` shell and `` rows already used by `question-cards.tsx` / the interview spine. Wire `streamdown` markdown rendering for assistant `assistant_parts`; user `user_parts` stay plain-text. Keep the existing kickoff-content rendering as-is for one card so the diff stays scoped. -- **Why first:** Smallest delta from the current shape; proves the pattern is portable from interview to secondary chat without a streaming or composer refit. -- **Boundary crossings:** `` body → `` → `` × turns. No new server work; no bundle shape change; no test-mock surface change beyond importing the ai-elements mocks already used in `InterviewView.test.tsx`. -- **Risks / assumptions:** - - ASSUMPTION: `secondary-chat-collapsible.test.tsx` can mock `@/client/components/ai-elements/*` the way `InterviewView.test.tsx` does → VALIDATE: copy the mock pattern; expect a 5–10 line bump in setup. - - RISK: `streamdown` may render trailing whitespace differently than the current `whitespace-pre-wrap` div → MITIGATION: validate the existing `secondary-chat-collapsible.test.tsx` expectations and adjust string assertions to `toContain` rather than `toEqual` if needed. 
-- **Tests:** existing collapsible tests adapt to new harness; add one test asserting that an assistant turn with markdown (`**bold**`) renders strong instead of literal asterisks. -- **Out of scope:** composer refit (C21); streaming live-state (C22); suggestions (C23); `useChat` (C24); mentions (C25). -- **Verification:** `npm run verify` green; manual walkthrough confirms transcripts render identically to today on a real spec for plain-text turns, with markdown rendered for assistant turns. - -### C21 — Replace composer with `` + leading-edge mode chip - -- **Status:** **done** (2026-05-18) — `npm run verify` green: 1282 tests pass; build clean. - - **Client:** `SecondaryChatComposer` in `secondary-chat-collapsible.tsx` rebuilt on `` + `` + `` + `` + `` + ``. The mode toggle moved from the collapsible header into the composer footer (leading-edge tools slot); `Shift+Tab` inside the textarea flips Ask↔Edit via the textarea's `onKeyDown` (preventDefault). The header retains a read-only `SecondaryChatKindChip` so collapsed state still surfaces kind. Testids preserved (`secondary-chat-composer`, `secondary-chat-composer-input`, `secondary-chat-composer-send`). - - **Tests:** `secondary-chat-collapsible.test.tsx` mode-toggle tests now expand the collapsible and pass `onSubmitMessage` so the composer (and its toggle) mounts; new tests assert (a) toggle is absent without a composer, (b) `Shift+Tab` calls `onSetMode('edit')` from `'explore'`. Submit test switched to `fireEvent.submit` on the form + microtask flush because `PromptInput.onSubmit` `await`s `Promise.all([])` before invoking the user callback. - - **Note:** This is the first real production consumer of `` (the vendored primitives were previously unused outside `InterviewView.test.tsx` mocks). -- **What:** Retire the hand-rolled `SecondaryChatComposer` (`
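The Shift+Tab behavior described for C21 fits in a small key handler. This is a sketch under assumptions:

- `handleComposerKeyDown` and `ComposerKeyEvent` are hypothetical names.
- The event interface is a structural subset of React's `KeyboardEvent` (`key`, `shiftKey`, `preventDefault()`), so the logic can be exercised without a DOM.
- The `'explore'` (Ask) and `'edit'` mode names are taken from the test description above.

```ts
// Hypothetical sketch of the composer's Shift+Tab Ask/Edit flip.
type SecondaryChatMode = 'explore' | 'edit';

interface ComposerKeyEvent {
  key: string;
  shiftKey: boolean;
  preventDefault(): void;
}

function handleComposerKeyDown(
  event: ComposerKeyEvent,
  mode: SecondaryChatMode,
  onSetMode: (next: SecondaryChatMode) => void,
): void {
  if (event.key !== 'Tab' || !event.shiftKey) return;
  // Keep focus in the textarea instead of moving to the previous focusable control.
  event.preventDefault();
  onSetMode(mode === 'explore' ? 'edit' : 'explore');
}
```

Because React's `KeyboardEvent` satisfies `ComposerKeyEvent` structurally, the same function can be passed straight to the textarea's `onKeyDown`.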
` + `` + `