From b68326b0233e85409ffd36ed33fd6854ab9a20a3 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 10:51:53 +0200 Subject: [PATCH 01/18] FE-735: Add print snapshot transport shell --- memory/CARDS.md | 153 +++++++++++++++++++++++++++++++++++++ memory/PLAN.md | 8 +- memory/SPEC.md | 6 +- src/brunch.test.ts | 52 +++++++++++++ src/brunch.ts | 70 +++++++++++++++-- src/print-snapshot.test.ts | 70 +++++++++++++++++ src/print-snapshot.ts | 72 +++++++++++++++++ 7 files changed, 421 insertions(+), 10 deletions(-) create mode 100644 memory/CARDS.md create mode 100644 src/brunch.test.ts create mode 100644 src/print-snapshot.test.ts create mode 100644 src/print-snapshot.ts diff --git a/memory/CARDS.md b/memory/CARDS.md new file mode 100644 index 00000000..58710719 --- /dev/null +++ b/memory/CARDS.md @@ -0,0 +1,153 @@ + + +# Scope Cards — mode-shell-and-fixture-driver + +## Orientation + +- **Containing seam:** Brunch transport-mode shell over the existing Pi-backed host/coordinator: CLI dispatch, print/RPC transport adapters, named product handlers, and transcript projections. +- **Frontier item:** `mode-shell-and-fixture-driver` (FE-735) on `ln/fe-735-mode-shell-fixture-driver`; this is structural M1 work stacked after `walking-skeleton`. +- **Volatile handoff state:** no `HANDOFF.md`; current uncommitted canonical updates already distinguish transport modes from agent modes/lenses and make M1 print a snapshot renderer. +- **Main open risk:** accidentally letting transport adapters own product semantics, recreate session boot, or introduce a generic read/chat model instead of reusing `WorkspaceSessionCoordinator`, named handlers, and Pi JSONL truth. 
+- **Frontier obligations:** preserve transport-mode vs agent-mode separation (D23-L); keep `workspace.*` / `session.*` named method families thin over projection handlers (D19-L); keep transcript truth in Pi JSONL with no canonical chat/turn store (D6-L, D12-L, D13-L); establish the replay-regression fixture path without overbuilding property/adversarial layers before graph/coherence substrates exist. + +## Card 1 — Print snapshot transport shell + +**Status:** done + +### Weight + +Full scope card — establishes the transport-mode dispatch seam and the first product-shaped projection path outside TUI. + +### Target Behavior + +`brunch --mode print` exits after rendering the coordinator-derived workspace snapshot. + +### Boundary Crossings + +```text +→ CLI argv +→ Brunch transport-mode dispatcher +→ shared host/workspace bootstrap seam +→ WorkspaceSessionCoordinator +→ product-shaped workspace/session snapshot projection +→ stdout renderer +``` + +### Risks and Assumptions + +- RISK: Print mode reimplements spec/session boot instead of using the coordinator → MITIGATION: make tests inject a coordinator and assert print consumes coordinator states rather than touching stores directly. +- RISK: Snapshot shape becomes a throwaway string instead of reusable product state → MITIGATION: introduce a small typed snapshot/projection function used by the renderer and later RPC handler tests. +- ASSUMPTION: A snapshot can cover `ready`, `select_spec`, and `needs_human` states without running Pi interactive mode → VALIDATE: unit tests for all three states and one CLI smoke test. + +### Acceptance Criteria + +✓ `brunch --mode print` with a ready workspace prints cwd, current spec, session id/file, phase, and chat mode, then exits with code 0. +✓ `brunch --mode print` with no selected spec prints a `select_spec` snapshot without prompting or creating a session. 
+✓ `brunch --mode print` routes through injected/shared coordinator APIs in tests and does not launch `InteractiveMode`. + +### Verification Approach + +- Inner: unit tests + CLI smoke tests — prove dispatch and snapshot rendering over coordinator states. +- Middle: store-backed smoke in a temp cwd — prove printed ready state corresponds to `.brunch/state.json` and Pi JSONL session binding created by the coordinator. +- Outer: none for this slice. + +### Cross-cutting obligations + +- Keep print as a transport-mode proof-of-life; do not run an agent turn or introduce agent-mode defaults. +- Keep the snapshot projection product-shaped enough for RPC reuse without becoming a generic read model. +- Preserve `WorkspaceSessionCoordinator` as the boot/session-binding owner. + +## Card 2 — Named RPC stdio skeleton + +**Status:** next + +### Weight + +Full scope card — establishes the JSON-RPC transport adapter and first named product method family surface. + +### Target Behavior + +`brunch --mode rpc` serves named workspace/session methods over stdio. + +### Boundary Crossings + +```text +→ CLI argv +→ Brunch transport-mode dispatcher +→ shared host/workspace bootstrap seam +→ named handler registry (`workspace.*`, `session.*`) +→ JSON-RPC stdio adapter +→ client request/response +``` + +### Risks and Assumptions + +- RISK: The RPC shape drifts into a generic data API → MITIGATION: expose only concrete named methods needed by M1 snapshots, e.g. `workspace.snapshot` and `session.snapshot` or one explicitly named equivalent pair. +- RISK: Stdio framing details consume the slice → MITIGATION: implement the smallest JSON-RPC 2.0 request/response loop needed for deterministic tests; subscriptions and streaming remain out of scope. +- ASSUMPTION: M1 can start with request/response methods before first-class subscriptions exist → VALIDATE: contract tests exercise initial state reads; subscription tests are deferred to later frontier acceptance. 
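The request/response loop described in the Card 2 risks can be pictured from the client side. The sketch below is a hypothetical illustration only, assuming newline-delimited JSON-RPC 2.0 framing; the helper names `encodeRequest`, `decodeReplies`, and `isMethodNotFound` are not part of this patch, and the error codes are the standard ones from the JSON-RPC 2.0 specification.

```typescript
// Hypothetical client-side helpers for newline-delimited JSON-RPC 2.0.
// None of these names exist in the patch; they only illustrate the framing.
type JsonRpcId = string | number | null

interface JsonRpcReply {
  jsonrpc: "2.0"
  id: JsonRpcId
  result?: unknown
  error?: { code: number; message: string }
}

// Standard JSON-RPC 2.0 error codes.
const JSON_RPC_ERRORS = {
  parseError: -32700,
  invalidRequest: -32600,
  methodNotFound: -32601,
  invalidParams: -32602,
} as const

// Encode one request as a single newline-terminated line.
function encodeRequest(id: JsonRpcId, method: string, params?: unknown): string {
  const request: Record<string, unknown> = { jsonrpc: "2.0", id, method }
  if (params !== undefined) {
    request.params = params
  }
  return `${JSON.stringify(request)}\n`
}

// Split buffered server output into parsed replies, skipping blank lines.
function decodeReplies(buffer: string): JsonRpcReply[] {
  return buffer
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as JsonRpcReply)
}

// True when the server rejected the method name, e.g. a generic `records.*` call.
function isMethodNotFound(reply: JsonRpcReply): boolean {
  return reply.error?.code === JSON_RPC_ERRORS.methodNotFound
}
```

A caller would write `encodeRequest(1, "workspace.snapshot")` to the server's stdin and pass accumulated stdout text to `decodeReplies`. Subscriptions and streaming are deliberately absent, matching the scope stated above.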
+ +### Acceptance Criteria + +✓ A JSON-RPC stdio client can request the workspace/session snapshot and receive product-shaped state matching print mode's projection. +✓ Unknown methods and invalid params return structured JSON-RPC errors without crashing the process. +✓ RPC mode boots through the same host/coordinator path as print/TUI and does not create a generic `records.*` surface. + +### Verification Approach + +- Inner: handler unit tests — prove named method dispatch and error shapes. +- Middle: stdio contract test — spawn `brunch --mode rpc`, send JSON-RPC requests, assert ordered responses and snapshot parity with print projection. +- Outer: none for this slice. + +### Cross-cutting obligations + +- Preserve JSON-RPC as the primary machine protocol while keeping HTTP/read-model concerns absent. +- Keep handler semantics separate from stdio transport framing so later WebSocket/TUI in-process callers can reuse them. +- Do not bypass the coordinator for session/spec state. + +## Card 3 — Elicitation exchange projection + +**Status:** queued + +### Weight + +Full scope card — establishes the transcript projection unit that fixture capture and observer extraction will rely on. + +### Target Behavior + +A Pi JSONL transcript projects into ordered elicitation exchanges with stable entry ranges. + +### Boundary Crossings + +```text +→ Pi JSONL session file +→ transcript entry loader/parser +→ elicitation exchange projector +→ `session.*` projection handler result +→ tests / fixture-prep caller +``` + +### Risks and Assumptions + +- RISK: Projection overfits current Pi entry shapes or loses raw payload fidelity → MITIGATION: derive types from Pi exports where available and keep raw entry ids/ranges in the projection result. 
+- RISK: Prompt/response span rules are underspecified for tool/custom entries → MITIGATION: implement the M1 default from SPEC D13-L: prompt side is all system/assistant/tool-side entries since prior user response; response side is user text and/or structured response entries. +- ASSUMPTION: Current Pi JSONL entries expose enough stable identity/order to name ranges for replay fixtures → VALIDATE: synthetic JSONL tests plus at least one coordinator-created session file fixture; if false, route to `jsonl-session-viability` or `ln-spike`. + +### Acceptance Criteria + +✓ Synthetic transcripts with alternating assistant/user spans project into expected prompt-side and response-side entry ranges. +✓ Custom structured elicitation entries are included in the correct prompt or response side without creating chat/turn records. +✓ Empty or incomplete transcripts return an explicit no-open-exchange/empty projection shape rather than inventing ambient chat state. + +### Verification Approach + +- Inner: projection unit tests over synthetic Pi entry arrays — prove span/range behavior. +- Middle: JSONL file round-trip test — load a temp Pi session JSONL file and assert the same projection result from `session.*` handler shape. +- Outer: none for this slice; replay fixture capture consumes this projection in a later card. + +### Cross-cutting obligations + +- Keep Pi JSONL as transcript truth; do not introduce canonical chat or turn tables. +- Preserve elicitation-first semantics: user entries are responses to prompt-side spans, not ambient chat. +- Keep projection handlers as read views over canonical entries, not stores. 
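The span rule in Card 3 can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the implementation this series adds in `src/elicitation-exchange.ts`: the `TranscriptEntry` shape is hypothetical, and each entry is assumed to be already classified as prompt-side (system, assistant, tool) or response-side (user text or structured response) before projection.

```typescript
// Hypothetical, simplified entry shape; real Pi JSONL entries carry more fields.
interface TranscriptEntry {
  id: string
  side: "prompt" | "response"
}

interface EntryRange {
  firstId: string
  lastId: string
}

interface Exchange {
  prompt: EntryRange
  response: EntryRange
}

interface ExchangeProjection {
  exchanges: Exchange[]
  // Prompt-side span still waiting for a user response, or null if none is open.
  openPrompt: EntryRange | null
}

function extend(range: EntryRange | null, id: string): EntryRange {
  return range ? { firstId: range.firstId, lastId: id } : { firstId: id, lastId: id }
}

function projectExchanges(entries: TranscriptEntry[]): ExchangeProjection {
  const exchanges: Exchange[] = []
  let prompt: EntryRange | null = null
  let response: EntryRange | null = null

  for (const entry of entries) {
    if (entry.side === "prompt") {
      // A prompt-side entry after a response closes the previous exchange.
      if (prompt && response) {
        exchanges.push({ prompt, response })
        prompt = null
        response = null
      }
      prompt = extend(prompt, entry.id)
    } else if (prompt) {
      response = extend(response, entry.id)
    }
    // Response-side entries with no preceding prompt are skipped in this sketch.
  }

  if (prompt && response) {
    exchanges.push({ prompt, response })
    prompt = null
  }
  return { exchanges, openPrompt: prompt }
}
```

An empty transcript yields `{ exchanges: [], openPrompt: null }`, which corresponds to the explicit empty projection shape named in the acceptance criteria. Only entry ids are retained, so the result stays a read view over the transcript rather than a second store.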
diff --git a/memory/PLAN.md b/memory/PLAN.md index c275d0cf..78d7c5a7 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -66,16 +66,18 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ### mode-shell-and-fixture-driver - **Name:** Mode shell (print + rpc) and first fixture driver -- **Linear:** unassigned +- **Linear:** [FE-735](https://linear.app/hash/issue/FE-735/mode-shell-and-fixture-driver-m1) (sub-issue of FE-702) +- **Branch:** `ln/fe-735-mode-shell-fixture-driver` (stacked on `ln/fe-729-walking-skeleton`) - **Kind:** structural - **Status:** not-started -- **Objective:** Add `--mode print` and `--mode rpc` dispatchers over the same Brunch host and named RPC method-family handlers; land the agent-as-user JSON-RPC stdio driver; prove transcript projection of elicitation exchanges; and capture the first replay-regression fixtures for at least briefs #1–#3. +- **Objective:** Add `--mode print` and `--mode rpc` transport dispatchers over the same Brunch host and named RPC method-family handlers; land the agent-as-user JSON-RPC stdio driver; prove transcript projection of elicitation exchanges; and capture the first replay-regression fixtures for at least briefs #1–#3. For M1, print mode is a snapshot renderer/proof-of-life, not a single-turn agent run. - **Why now / unlocks:** Proves D5-L (JSON-RPC primary) and unlocks the fixture-driven feedback loop. Without this milestone, every downstream milestone has only manual TUI evidence. 
- **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three briefs from BEHAVIORAL_KERNELS.md are captured. - **Verification:** Inner — verify gate plus projection-handler unit tests for elicitation exchange ranges. Middle — deterministic first captured run, stdio RPC handler contract tests, and replay-regression fixture(s) asserting transcript reproduction/projection parity (SPEC §Oracle Strategy by Loop Tier). Outer — the three-layer fixture model is established in skeleton form here; property and adversarial layers come online as later milestones supply graph/coherence substrates. -- **Cross-cutting obligations:** Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. +- **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. 
Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. - **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) +- **Current execution pointer:** scope queue in `memory/CARDS.md`; next card is “Print snapshot transport shell.” ### jsonl-session-viability diff --git a/memory/SPEC.md b/memory/SPEC.md index f6561cae..69438195 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -122,6 +122,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D10-L — Web client is a native Brunch React app over one WebSocket RPC client.** TanStack Router + TanStack Query + Brunch-owned elicitation/transcript primitives (Vercel AI SDK UI or TanStack AI style). `pi-web-ui` is not reused. The browser is a thin remote head over Brunch RPC method families, not a second product runtime or REST-backed data client. Depends on: D5-L. Supersedes: —. - **D17-L — Brunch semantics ride one event substrate, not parallel channels.** Custom-message transcript entries plus `deliverAs: "nextTurn" | "followUp"` and `prepareNextTurn` are the load-bearing mechanism for structured elicitation prompts/responses, `worldUpdate`, mention-staleness hints, and side-task-result delivery. New product semantics should compose onto this substrate before inventing a second event plane. Depends on: D5-L, D6-L, D12-L, D15-L. Supersedes: —. 
- **D19-L — Keep transport/read architecture thin: named RPC method families over projection handlers.** Brunch exposes named method families such as `session.*`, `workspace.*`, `graph.*`, and `coherence.*`; each handler projects from the canonical store that owns the fact (Pi JSONL, `.brunch/state.json`, or SQLite graph/change log). Subscriptions are first-class and may provide initial state plus updates, but Brunch must not create a generic read-gateway platform, REST read model, DB-backed chat/turn projection, or canonical cross-store event spine merely to keep clients in sync. Depends on: D5-L, D6-L, D10-L, D16-L. Supersedes: the heavier “unified read gateway” mental model. +- **D23-L — Transport modes are distinct from agent modes and lenses.** TUI, RPC, print, and web are transport modes: ways of driving or observing the same Brunch host through Pi/Brunch harness seams. Agent modes are coarse operational strategies such as `elicitor`, `observer`, `reviewer`, `reconciler`, or future `generalist`; lenses are narrower perspectives such as technical-design, verification-design, or disambiguation that may later be skill-driven. M1 print mode is therefore only a transport proof-of-life: it boots through the same host/coordinator, renders a snapshot of product-shaped state, and exits without running an agent turn. A future single-turn headless print run is deferred until agent-mode selection/defaults are explicit. Depends on: D1-L, D5-L, D19-L, D21-L. Supersedes: overloading “mode” to mean both transport and agent strategy. #### Persistence @@ -199,7 +200,10 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | Term | Definition | | --- | --- | | **Brunch host** | The local process-level authority. Owns `.brunch/` resolution, agent session lifecycle, mode dispatch, and event fanout. | -| **Mode** | One of TUI, web, RPC, print. All four drive the same host; they are presentation surfaces, not separate products. 
| +| **Transport mode** | One of TUI, web, RPC, print. All four drive the same host; they are presentation/protocol surfaces, not separate products or agent strategies. | +| **Agent mode** | A coarse operational strategy/persona for an agent run, such as `elicitor`, `observer`, `reviewer`, `reconciler`, or a future `generalist`. Agent modes are selected independently from transport modes. | +| **Lens** | A narrower interpretive or task perspective applied within or alongside an agent mode, such as technical-design, verification-design, or disambiguation. Lenses may eventually be driven by skills, but are not part of M1 transport-mode proof. | +| **Print snapshot** | The M1 meaning of the print transport mode: boot the Brunch host, resolve workspace/spec/session state through the coordinator, render product-shaped state, and exit without running an agent turn. | | **Spec** | A specification workspace, identified by its intent-graph root. Lives under `.brunch/`. Multiple specs may coexist per project. | | **Session** | An elicitation transcript belonging to one spec. Backed by a pi JSONL session under `.brunch/sessions/`. A spec may have many sessions over time; a session never changes specs. | | **Session binding** | The first Brunch custom entry in a session that binds the Pi session id to exactly one spec id and schema version. Makes JSONL self-describing; registry/index state is an acceleration, not the canonical binding. 
| diff --git a/src/brunch.test.ts b/src/brunch.test.ts new file mode 100644 index 00000000..f14fd553 --- /dev/null +++ b/src/brunch.test.ts @@ -0,0 +1,52 @@ +import { describe, expect, it } from "vitest" + +import { runBrunchCli } from "./brunch.js" +import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" + +function coordinator(): WorkspaceSessionCoordinator { + return { + async openExisting() { + return { + status: "select_spec", + cwd: "/tmp/brunch-project", + chrome: { + cwd: "/tmp/brunch-project", + spec: null, + phase: "select_spec", + chatMode: "select-spec", + }, + } + }, + async startOrCreate() { + throw new Error("print must not create a session") + }, + async createNewSessionForCurrentSpec() { + throw new Error("not used") + }, + async bindCurrentSpecToSession() { + throw new Error("not used") + }, + async deriveChromeState() { + throw new Error("not used") + }, + } +} + +describe("Brunch CLI dispatch", () => { + it("routes --mode print through the coordinator snapshot and exits", async () => { + let output = "" + + const code = await runBrunchCli({ + argv: ["--mode", "print"], + cwd: "/tmp/brunch-project", + coordinator: coordinator(), + stdout: (chunk) => { + output += chunk + }, + }) + + expect(code).toBe(0) + expect(output).toContain("status: select_spec") + expect(output).toContain("spec: ") + }) +}) diff --git a/src/brunch.ts b/src/brunch.ts index 97bfeca3..bc381aa8 100644 --- a/src/brunch.ts +++ b/src/brunch.ts @@ -1,13 +1,71 @@ import process from "node:process" +import { fileURLToPath } from "node:url" import { runBrunchTui } from "./brunch-tui.js" +import { + renderWorkspaceSnapshot, + workspaceSnapshotFromState, +} from "./print-snapshot.js" +import { + createWorkspaceSessionCoordinator, + type WorkspaceSessionCoordinator, +} from "./workspace-session-coordinator.js" + +export interface BrunchCliOptions { + argv?: string[] + cwd?: string + coordinator?: WorkspaceSessionCoordinator + stdout?: (chunk: string) => 
void
+}
+
+export async function runBrunchCli(
+  options: BrunchCliOptions = {},
+): Promise<number> {
+  const argv = options.argv ?? process.argv.slice(2)
+  const cwd = options.cwd ?? process.cwd()
+  const mode = parseMode(argv)
+  const coordinator =
+    options.coordinator ?? createWorkspaceSessionCoordinator({ cwd })
+
+  if (mode === "print") {
+    const state = await coordinator.openExisting()
+    const snapshot = workspaceSnapshotFromState(state)
+    ;(options.stdout ?? process.stdout.write.bind(process.stdout))(
+      renderWorkspaceSnapshot(snapshot),
+    )
+    return 0
+  }
+
+  if (mode === "tui") {
+    await runBrunchTui({ cwd, coordinator })
+    return 0
+  }
+
+  throw new Error(`Unsupported Brunch mode: ${mode}`)
+}
+
+function parseMode(argv: string[]): string {
+  const modeFlagIndex = argv.indexOf("--mode")
+  if (modeFlagIndex >= 0) {
+    return argv[modeFlagIndex + 1] ?? "tui"
+  }
+
+  const modeEquals = argv.find((arg) => arg.startsWith("--mode="))
+  if (modeEquals) {
+    return modeEquals.slice("--mode=".length)
+  }
+
+  return "tui"
+}
 
 async function main(): Promise<void> {
-  await runBrunchTui({ cwd: process.cwd() })
+  process.exitCode = await runBrunchCli()
 }
 
-main().catch((error: unknown) => {
-  const message = error instanceof Error ? error.message : String(error)
-  process.stderr.write(`${message}\n`)
-  process.exitCode = 1
-})
+if (process.argv[1] === fileURLToPath(import.meta.url)) {
+  main().catch((error: unknown) => {
+    const message = error instanceof Error ?
error.message : String(error) + process.stderr.write(`${message}\n`) + process.exitCode = 1 + }) +} diff --git a/src/print-snapshot.test.ts b/src/print-snapshot.test.ts new file mode 100644 index 00000000..c7fea3d2 --- /dev/null +++ b/src/print-snapshot.test.ts @@ -0,0 +1,70 @@ +import { describe, expect, it } from "vitest" + +import { + renderWorkspaceSnapshot, + workspaceSnapshotFromState, +} from "./print-snapshot.js" +import type { WorkspaceSessionState } from "./workspace-session-coordinator.js" + +const cwd = "/tmp/brunch-project" + +function readyState(): WorkspaceSessionState { + return { + status: "ready", + cwd, + spec: { id: "spec-1", title: "Alpha spec" }, + session: { + id: "session-1", + file: "/tmp/brunch-project/.brunch/sessions/session-1.jsonl", + manager: {} as WorkspaceSessionState & never, + }, + chrome: { + cwd, + spec: { id: "spec-1", title: "Alpha spec" }, + phase: "elicitation", + chatMode: "responding-to-elicitation", + }, + } +} + +describe("print snapshot", () => { + it("projects and renders a ready workspace without exposing pi internals", () => { + const snapshot = workspaceSnapshotFromState(readyState()) + + expect(snapshot).toEqual({ + status: "ready", + cwd, + spec: { id: "spec-1", title: "Alpha spec" }, + session: { + id: "session-1", + file: "/tmp/brunch-project/.brunch/sessions/session-1.jsonl", + }, + chrome: { + phase: "elicitation", + chatMode: "responding-to-elicitation", + }, + }) + expect(renderWorkspaceSnapshot(snapshot)).toContain("status: ready") + expect(renderWorkspaceSnapshot(snapshot)).toContain( + "spec: Alpha spec (spec-1)", + ) + expect(renderWorkspaceSnapshot(snapshot)).toContain("session: session-1") + }) + + it("renders select-spec as a snapshot instead of prompting", () => { + const snapshot = workspaceSnapshotFromState({ + status: "select_spec", + cwd, + chrome: { + cwd, + spec: null, + phase: "select_spec", + chatMode: "select-spec", + }, + }) + + expect(renderWorkspaceSnapshot(snapshot)).toContain("status: 
select_spec") + expect(renderWorkspaceSnapshot(snapshot)).toContain("spec: ") + expect(renderWorkspaceSnapshot(snapshot)).not.toContain("session:") + }) +}) diff --git a/src/print-snapshot.ts b/src/print-snapshot.ts new file mode 100644 index 00000000..5943ed6b --- /dev/null +++ b/src/print-snapshot.ts @@ -0,0 +1,72 @@ +import type { WorkspaceSessionState } from "./workspace-session-coordinator.js" + +export interface WorkspaceSnapshot { + status: WorkspaceSessionState["status"] + cwd: string + spec: { + id: string + title: string + } | null + session?: { + id: string + file: string + } + chrome: { + phase: "select_spec" | "elicitation" + chatMode: "select-spec" | "responding-to-elicitation" + } + reason?: string +} + +export function workspaceSnapshotFromState( + state: WorkspaceSessionState, +): WorkspaceSnapshot { + const base = { + status: state.status, + cwd: state.cwd, + spec: state.chrome.spec, + chrome: { + phase: state.chrome.phase, + chatMode: state.chrome.chatMode, + }, + } + + if (state.status === "ready") { + return { + ...base, + spec: state.spec, + session: { id: state.session.id, file: state.session.file }, + } + } + + if (state.status === "needs_human") { + return { ...base, reason: state.reason } + } + + return base +} + +export function renderWorkspaceSnapshot(snapshot: WorkspaceSnapshot): string { + const lines = [ + "Brunch workspace snapshot", + `status: ${snapshot.status}`, + `cwd: ${snapshot.cwd}`, + `spec: ${ + snapshot.spec ? 
`${snapshot.spec.title} (${snapshot.spec.id})` : "" + }`, + `phase: ${snapshot.chrome.phase}`, + `chatMode: ${snapshot.chrome.chatMode}`, + ] + + if (snapshot.session) { + lines.push( + `session: ${snapshot.session.id}`, + `sessionFile: ${snapshot.session.file}`, + ) + } + if (snapshot.reason) { + lines.push(`reason: ${snapshot.reason}`) + } + + return `${lines.join("\n")}\n` +} From 3fb6ad2eb194775b468b8b4f9e22eec00c85d241 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 10:54:14 +0200 Subject: [PATCH 02/18] FE-735: Add named JSON-RPC stdio skeleton --- memory/CARDS.md | 4 +- src/brunch.test.ts | 28 +++++++++++++ src/brunch.ts | 48 +++++++++++++++++++-- src/rpc.test.ts | 98 +++++++++++++++++++++++++++++++++++++++++++ src/rpc.ts | 102 +++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 274 insertions(+), 6 deletions(-) create mode 100644 src/rpc.test.ts create mode 100644 src/rpc.ts diff --git a/memory/CARDS.md b/memory/CARDS.md index 58710719..74c39c2e 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -61,7 +61,7 @@ Full scope card — establishes the transport-mode dispatch seam and the first p ## Card 2 — Named RPC stdio skeleton -**Status:** next +**Status:** done ### Weight @@ -108,7 +108,7 @@ Full scope card — establishes the JSON-RPC transport adapter and first named p ## Card 3 — Elicitation exchange projection -**Status:** queued +**Status:** next ### Weight diff --git a/src/brunch.test.ts b/src/brunch.test.ts index f14fd553..67903f1b 100644 --- a/src/brunch.test.ts +++ b/src/brunch.test.ts @@ -1,3 +1,5 @@ +import { PassThrough } from "node:stream" + import { describe, expect, it } from "vitest" import { runBrunchCli } from "./brunch.js" @@ -49,4 +51,30 @@ describe("Brunch CLI dispatch", () => { expect(output).toContain("status: select_spec") expect(output).toContain("spec: ") }) + + it("routes --mode rpc through the named JSON-RPC stdio adapter", async () => { + const stdin = new PassThrough() + const stdout = new PassThrough() 
+ const chunks: string[] = [] + stdout.on("data", (chunk) => chunks.push(String(chunk))) + + stdin.end( + `${JSON.stringify({ jsonrpc: "2.0", id: 1, method: "workspace.snapshot" })}\n`, + ) + + const code = await runBrunchCli({ + argv: ["--mode=rpc"], + cwd: "/tmp/brunch-project", + coordinator: coordinator(), + stdin, + stdout, + }) + + expect(code).toBe(0) + expect(JSON.parse(chunks.join(""))).toMatchObject({ + jsonrpc: "2.0", + id: 1, + result: { status: "select_spec" }, + }) + }) }) diff --git a/src/brunch.ts b/src/brunch.ts index bc381aa8..9171edfa 100644 --- a/src/brunch.ts +++ b/src/brunch.ts @@ -1,4 +1,5 @@ import process from "node:process" +import type { Readable, Writable } from "node:stream" import { fileURLToPath } from "node:url" import { runBrunchTui } from "./brunch-tui.js" @@ -6,6 +7,7 @@ import { renderWorkspaceSnapshot, workspaceSnapshotFromState, } from "./print-snapshot.js" +import { createRpcHandlers, runJsonRpcLineServer } from "./rpc.js" import { createWorkspaceSessionCoordinator, type WorkspaceSessionCoordinator, @@ -15,7 +17,8 @@ export interface BrunchCliOptions { argv?: string[] cwd?: string coordinator?: WorkspaceSessionCoordinator - stdout?: (chunk: string) => void + stdin?: Readable + stdout?: Writable | ((chunk: string) => void) } export async function runBrunchCli( @@ -30,9 +33,16 @@ export async function runBrunchCli( if (mode === "print") { const state = await coordinator.openExisting() const snapshot = workspaceSnapshotFromState(state) - ;(options.stdout ?? process.stdout.write.bind(process.stdout))( - renderWorkspaceSnapshot(snapshot), - ) + writeStdout(options.stdout, renderWorkspaceSnapshot(snapshot)) + return 0 + } + + if (mode === "rpc") { + await runJsonRpcLineServer({ + input: options.stdin ?? 
process.stdin, + output: stdoutStream(options.stdout), + handlers: createRpcHandlers({ coordinator }), + }) return 0 } @@ -44,6 +54,36 @@ export async function runBrunchCli( throw new Error(`Unsupported Brunch mode: ${mode}`) } +function writeStdout( + stdout: Writable | ((chunk: string) => void) | undefined, + chunk: string, +): void { + if (!stdout) { + process.stdout.write(chunk) + } else if (typeof stdout === "function") { + stdout(chunk) + } else { + stdout.write(chunk) + } +} + +function stdoutStream( + stdout: Writable | ((chunk: string) => void) | undefined, +): Writable { + if (!stdout) { + return process.stdout + } + if (typeof stdout !== "function") { + return stdout + } + return { + write(chunk: string | Uint8Array) { + stdout(String(chunk)) + return true + }, + } as Writable +} + function parseMode(argv: string[]): string { const modeFlagIndex = argv.indexOf("--mode") if (modeFlagIndex >= 0) { diff --git a/src/rpc.test.ts b/src/rpc.test.ts new file mode 100644 index 00000000..d107bd6f --- /dev/null +++ b/src/rpc.test.ts @@ -0,0 +1,98 @@ +import { PassThrough } from "node:stream" +import { describe, expect, it } from "vitest" + +import { createRpcHandlers, runJsonRpcLineServer } from "./rpc.js" +import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" + +function coordinator(): WorkspaceSessionCoordinator { + return { + async openExisting() { + return { + status: "ready", + cwd: "/tmp/brunch-project", + spec: { id: "spec-1", title: "Alpha spec" }, + session: { + id: "session-1", + file: "/tmp/brunch-project/.brunch/sessions/session-1.jsonl", + manager: {} as never, + }, + chrome: { + cwd: "/tmp/brunch-project", + spec: { id: "spec-1", title: "Alpha spec" }, + phase: "elicitation", + chatMode: "responding-to-elicitation", + }, + } + }, + async startOrCreate() { + throw new Error("not used") + }, + async createNewSessionForCurrentSpec() { + throw new Error("not used") + }, + async bindCurrentSpecToSession() { + throw new 
Error("not used") + }, + async deriveChromeState() { + throw new Error("not used") + }, + } +} + +describe("JSON-RPC handlers", () => { + it("serves a named workspace snapshot method", async () => { + const handlers = createRpcHandlers({ coordinator: coordinator() }) + + const result = await handlers.handle({ + jsonrpc: "2.0", + id: 1, + method: "workspace.snapshot", + }) + + expect(result).toMatchObject({ + jsonrpc: "2.0", + id: 1, + result: { + status: "ready", + spec: { id: "spec-1", title: "Alpha spec" }, + session: { id: "session-1" }, + }, + }) + }) + + it("returns structured errors for unknown methods", async () => { + const handlers = createRpcHandlers({ coordinator: coordinator() }) + + await expect( + handlers.handle({ jsonrpc: "2.0", id: 2, method: "records.list" }), + ).resolves.toMatchObject({ + jsonrpc: "2.0", + id: 2, + error: { code: -32601, message: "Method not found" }, + }) + }) + + it("speaks newline-delimited JSON-RPC over streams", async () => { + const input = new PassThrough() + const output = new PassThrough() + const chunks: string[] = [] + output.on("data", (chunk) => chunks.push(String(chunk))) + + const done = runJsonRpcLineServer({ + input, + output, + handlers: createRpcHandlers({ coordinator: coordinator() }), + }) + + input.end( + `${JSON.stringify({ jsonrpc: "2.0", id: 1, method: "workspace.snapshot" })}\n`, + ) + await done + + expect(JSON.parse(chunks.join(""))).toMatchObject({ + jsonrpc: "2.0", + id: 1, + result: { status: "ready" }, + }) + }) +}) diff --git a/src/rpc.ts b/src/rpc.ts new file mode 100644 index 00000000..4395b60b --- /dev/null +++ b/src/rpc.ts @@ -0,0 +1,102 @@ +import { createInterface } from "node:readline/promises" +import type { Readable, Writable } from "node:stream" + +import { workspaceSnapshotFromState } from "./print-snapshot.js" +import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" + +interface JsonRpcRequest { + jsonrpc: "2.0" + id?: string | number | null + method: 
string + params?: unknown +} + +interface JsonRpcSuccess { + jsonrpc: "2.0" + id: string | number | null + result: unknown +} + +interface JsonRpcFailure { + jsonrpc: "2.0" + id: string | number | null + error: { + code: number + message: string + } +} + +type JsonRpcResponse = JsonRpcSuccess | JsonRpcFailure + +export interface RpcHandlers { + handle(request: unknown): Promise<JsonRpcResponse> +} + +export function createRpcHandlers(options: { + coordinator: WorkspaceSessionCoordinator +}): RpcHandlers { + return { + async handle(request) { + if (!isJsonRpcRequest(request)) { + return failure(null, -32600, "Invalid Request") + } + + if (request.method === "workspace.snapshot") { + if (request.params !== undefined) { + return failure(request.id ?? null, -32602, "Invalid params") + } + const state = await options.coordinator.openExisting() + return success(request.id ?? null, workspaceSnapshotFromState(state)) + } + + return failure(request.id ?? null, -32601, "Method not found") + }, + } +} + +export async function runJsonRpcLineServer(options: { + input: Readable + output: Writable + handlers: RpcHandlers +}): Promise<void> { + const lines = createInterface({ input: options.input }) + for await (const line of lines) { + if (line.trim().length === 0) { + continue + } + + let parsed: unknown + try { + parsed = (JSON.parse(line) as unknown) + } catch { + options.output.write( + `${JSON.stringify(failure(null, -32700, "Parse error"))}\n`, + ) + continue + } + + const response = await options.handlers.handle(parsed) + options.output.write(`${JSON.stringify(response)}\n`) + } +} + +function success(id: string | number | null, result: unknown): JsonRpcSuccess { + return { jsonrpc: "2.0", id, result } +} + +function failure( + id: string | number | null, + code: number, + message: string, +): JsonRpcFailure { + return { jsonrpc: "2.0", id, error: { code, message } } +} + +function isJsonRpcRequest(value: unknown): value is JsonRpcRequest { + return ( + typeof value === "object" && + value !== 
null && + (value as { jsonrpc?: unknown }).jsonrpc === "2.0" && + typeof (value as { method?: unknown }).method === "string" + ) +} From 593e0d7a754b708827f6c934afee37b574b618f1 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 10:56:37 +0200 Subject: [PATCH 03/18] FE-735: Project elicitation exchanges from JSONL --- memory/CARDS.md | 153 ------------------------------- memory/PLAN.md | 4 +- src/elicitation-exchange.test.ts | 105 +++++++++++++++++++++ src/elicitation-exchange.ts | 138 ++++++++++++++++++++++++++++ src/rpc.test.ts | 26 ++++++ src/rpc.ts | 20 ++++ 6 files changed, 291 insertions(+), 155 deletions(-) delete mode 100644 memory/CARDS.md create mode 100644 src/elicitation-exchange.test.ts create mode 100644 src/elicitation-exchange.ts diff --git a/memory/CARDS.md b/memory/CARDS.md deleted file mode 100644 index 74c39c2e..00000000 --- a/memory/CARDS.md +++ /dev/null @@ -1,153 +0,0 @@ - - -# Scope Cards — mode-shell-and-fixture-driver - -## Orientation - -- **Containing seam:** Brunch transport-mode shell over the existing Pi-backed host/coordinator: CLI dispatch, print/RPC transport adapters, named product handlers, and transcript projections. -- **Frontier item:** `mode-shell-and-fixture-driver` (FE-735) on `ln/fe-735-mode-shell-fixture-driver`; this is structural M1 work stacked after `walking-skeleton`. -- **Volatile handoff state:** no `HANDOFF.md`; current uncommitted canonical updates already distinguish transport modes from agent modes/lenses and make M1 print a snapshot renderer. -- **Main open risk:** accidentally letting transport adapters own product semantics, recreate session boot, or introduce a generic read/chat model instead of reusing `WorkspaceSessionCoordinator`, named handlers, and Pi JSONL truth. 
-- **Frontier obligations:** preserve transport-mode vs agent-mode separation (D23-L); keep `workspace.*` / `session.*` named method families thin over projection handlers (D19-L); keep transcript truth in Pi JSONL with no canonical chat/turn store (D6-L, D12-L, D13-L); establish the replay-regression fixture path without overbuilding property/adversarial layers before graph/coherence substrates exist. - -## Card 1 — Print snapshot transport shell - -**Status:** done - -### Weight - -Full scope card — establishes the transport-mode dispatch seam and the first product-shaped projection path outside TUI. - -### Target Behavior - -`brunch --mode print` exits after rendering the coordinator-derived workspace snapshot. - -### Boundary Crossings - -```text -→ CLI argv -→ Brunch transport-mode dispatcher -→ shared host/workspace bootstrap seam -→ WorkspaceSessionCoordinator -→ product-shaped workspace/session snapshot projection -→ stdout renderer -``` - -### Risks and Assumptions - -- RISK: Print mode reimplements spec/session boot instead of using the coordinator → MITIGATION: make tests inject a coordinator and assert print consumes coordinator states rather than touching stores directly. -- RISK: Snapshot shape becomes a throwaway string instead of reusable product state → MITIGATION: introduce a small typed snapshot/projection function used by the renderer and later RPC handler tests. -- ASSUMPTION: A snapshot can cover `ready`, `select_spec`, and `needs_human` states without running Pi interactive mode → VALIDATE: unit tests for all three states and one CLI smoke test. - -### Acceptance Criteria - -✓ `brunch --mode print` with a ready workspace prints cwd, current spec, session id/file, phase, and chat mode, then exits with code 0. -✓ `brunch --mode print` with no selected spec prints a `select_spec` snapshot without prompting or creating a session. 
-✓ `brunch --mode print` routes through injected/shared coordinator APIs in tests and does not launch `InteractiveMode`. - -### Verification Approach - -- Inner: unit tests + CLI smoke tests — prove dispatch and snapshot rendering over coordinator states. -- Middle: store-backed smoke in a temp cwd — prove printed ready state corresponds to `.brunch/state.json` and Pi JSONL session binding created by the coordinator. -- Outer: none for this slice. - -### Cross-cutting obligations - -- Keep print as a transport-mode proof-of-life; do not run an agent turn or introduce agent-mode defaults. -- Keep the snapshot projection product-shaped enough for RPC reuse without becoming a generic read model. -- Preserve `WorkspaceSessionCoordinator` as the boot/session-binding owner. - -## Card 2 — Named RPC stdio skeleton - -**Status:** done - -### Weight - -Full scope card — establishes the JSON-RPC transport adapter and first named product method family surface. - -### Target Behavior - -`brunch --mode rpc` serves named workspace/session methods over stdio. - -### Boundary Crossings - -```text -→ CLI argv -→ Brunch transport-mode dispatcher -→ shared host/workspace bootstrap seam -→ named handler registry (`workspace.*`, `session.*`) -→ JSON-RPC stdio adapter -→ client request/response -``` - -### Risks and Assumptions - -- RISK: The RPC shape drifts into a generic data API → MITIGATION: expose only concrete named methods needed by M1 snapshots, e.g. `workspace.snapshot` and `session.snapshot` or one explicitly named equivalent pair. -- RISK: Stdio framing details consume the slice → MITIGATION: implement the smallest JSON-RPC 2.0 request/response loop needed for deterministic tests; subscriptions and streaming remain out of scope. -- ASSUMPTION: M1 can start with request/response methods before first-class subscriptions exist → VALIDATE: contract tests exercise initial state reads; subscription tests are deferred to later frontier acceptance. 
- -### Acceptance Criteria - -✓ A JSON-RPC stdio client can request the workspace/session snapshot and receive product-shaped state matching print mode's projection. -✓ Unknown methods and invalid params return structured JSON-RPC errors without crashing the process. -✓ RPC mode boots through the same host/coordinator path as print/TUI and does not create a generic `records.*` surface. - -### Verification Approach - -- Inner: handler unit tests — prove named method dispatch and error shapes. -- Middle: stdio contract test — spawn `brunch --mode rpc`, send JSON-RPC requests, assert ordered responses and snapshot parity with print projection. -- Outer: none for this slice. - -### Cross-cutting obligations - -- Preserve JSON-RPC as the primary machine protocol while keeping HTTP/read-model concerns absent. -- Keep handler semantics separate from stdio transport framing so later WebSocket/TUI in-process callers can reuse them. -- Do not bypass the coordinator for session/spec state. - -## Card 3 — Elicitation exchange projection - -**Status:** next - -### Weight - -Full scope card — establishes the transcript projection unit that fixture capture and observer extraction will rely on. - -### Target Behavior - -A Pi JSONL transcript projects into ordered elicitation exchanges with stable entry ranges. - -### Boundary Crossings - -```text -→ Pi JSONL session file -→ transcript entry loader/parser -→ elicitation exchange projector -→ `session.*` projection handler result -→ tests / fixture-prep caller -``` - -### Risks and Assumptions - -- RISK: Projection overfits current Pi entry shapes or loses raw payload fidelity → MITIGATION: derive types from Pi exports where available and keep raw entry ids/ranges in the projection result. 
-- RISK: Prompt/response span rules are underspecified for tool/custom entries → MITIGATION: implement the M1 default from SPEC D13-L: prompt side is all system/assistant/tool-side entries since prior user response; response side is user text and/or structured response entries. -- ASSUMPTION: Current Pi JSONL entries expose enough stable identity/order to name ranges for replay fixtures → VALIDATE: synthetic JSONL tests plus at least one coordinator-created session file fixture; if false, route to `jsonl-session-viability` or `ln-spike`. - -### Acceptance Criteria - -✓ Synthetic transcripts with alternating assistant/user spans project into expected prompt-side and response-side entry ranges. -✓ Custom structured elicitation entries are included in the correct prompt or response side without creating chat/turn records. -✓ Empty or incomplete transcripts return an explicit no-open-exchange/empty projection shape rather than inventing ambient chat state. - -### Verification Approach - -- Inner: projection unit tests over synthetic Pi entry arrays — prove span/range behavior. -- Middle: JSONL file round-trip test — load a temp Pi session JSONL file and assert the same projection result from `session.*` handler shape. -- Outer: none for this slice; replay fixture capture consumes this projection in a later card. - -### Cross-cutting obligations - -- Keep Pi JSONL as transcript truth; do not introduce canonical chat or turn tables. -- Preserve elicitation-first semantics: user entries are responses to prompt-side spans, not ambient chat. -- Keep projection handlers as read views over canonical entries, not stores. 
diff --git a/memory/PLAN.md b/memory/PLAN.md index 78d7c5a7..5f2aefd5 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -69,7 +69,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** [FE-735](https://linear.app/hash/issue/FE-735/mode-shell-and-fixture-driver-m1) (sub-issue of FE-702) - **Branch:** `ln/fe-735-mode-shell-fixture-driver` (stacked on `ln/fe-729-walking-skeleton`) - **Kind:** structural -- **Status:** not-started +- **Status:** in-progress - **Objective:** Add `--mode print` and `--mode rpc` transport dispatchers over the same Brunch host and named RPC method-family handlers; land the agent-as-user JSON-RPC stdio driver; prove transcript projection of elicitation exchanges; and capture the first replay-regression fixtures for at least briefs #1–#3. For M1, print mode is a snapshot renderer/proof-of-life, not a single-turn agent run. - **Why now / unlocks:** Proves D5-L (JSON-RPC primary) and unlocks the fixture-driven feedback loop. Without this milestone, every downstream milestone has only manual TUI evidence. - **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three briefs from BEHAVIORAL_KERNELS.md are captured. @@ -77,7 +77,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. 
Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. - **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) -- **Current execution pointer:** scope queue in `memory/CARDS.md`; next card is “Print snapshot transport shell.” +- **Current execution pointer:** initial transport/projection scope queue complete: `--mode print`, `--mode rpc`, `workspace.snapshot`, and `session.elicitationExchanges` are implemented and verified. Next scope should cover fixture-driver capture and brief seeding. 
### jsonl-session-viability diff --git a/src/elicitation-exchange.test.ts b/src/elicitation-exchange.test.ts new file mode 100644 index 00000000..ebef4732 --- /dev/null +++ b/src/elicitation-exchange.test.ts @@ -0,0 +1,105 @@ +import { mkdtemp, writeFile } from "node:fs/promises" +import { tmpdir } from "node:os" +import { join } from "node:path" +import { describe, expect, it } from "vitest" + +import { + loadJsonlTranscriptEntries, + projectElicitationExchanges, +} from "./elicitation-exchange.js" + +const assistant = { + id: "a1", + type: "message", + role: "assistant", + content: "Pick one", +} +const structuredPrompt = { + id: "p1", + type: "custom", + customType: "brunch.elicitation_prompt", + data: { choices: ["A", "B"] }, +} +const user = { id: "u1", type: "message", role: "user", content: "A" } +const structuredResponse = { + id: "r1", + type: "custom", + customType: "brunch.elicitation_response", + data: { choice: "A" }, +} + +describe("elicitation exchange projection", () => { + it("projects assistant prompt spans and user response spans with stable ranges", () => { + const exchanges = projectElicitationExchanges([ + { id: "s1", type: "session" }, + assistant, + structuredPrompt, + user, + { id: "a2", type: "message", role: "assistant", content: "Why?" 
}, + { id: "u2", type: "message", role: "user", content: "Because" }, + ]) + + expect(exchanges).toEqual({ + status: "ready", + exchanges: [ + { + promptRange: { start: "a1", end: "p1" }, + responseRange: { start: "u1", end: "u1" }, + promptEntryIds: ["a1", "p1"], + responseEntryIds: ["u1"], + }, + { + promptRange: { start: "a2", end: "a2" }, + responseRange: { start: "u2", end: "u2" }, + promptEntryIds: ["a2"], + responseEntryIds: ["u2"], + }, + ], + openPrompt: null, + }) + }) + + it("includes structured response entries on the response side", () => { + const projection = projectElicitationExchanges([ + assistant, + user, + structuredResponse, + ]) + + expect(projection.exchanges[0]?.responseEntryIds).toEqual(["u1", "r1"]) + expect(projection.exchanges[0]?.responseRange).toEqual({ + start: "u1", + end: "r1", + }) + }) + + it("returns an explicit empty/open shape for incomplete transcripts", () => { + expect(projectElicitationExchanges([])).toEqual({ + status: "empty", + exchanges: [], + openPrompt: null, + }) + + expect(projectElicitationExchanges([assistant])).toEqual({ + status: "open_prompt", + exchanges: [], + openPrompt: { + promptRange: { start: "a1", end: "a1" }, + promptEntryIds: ["a1"], + }, + }) + }) + + it("loads newline-delimited Pi transcript entries from disk", async () => { + const dir = await mkdtemp(join(tmpdir(), "brunch-jsonl-")) + const file = join(dir, "session.jsonl") + await writeFile( + file, + `${JSON.stringify(assistant)}\n${JSON.stringify(user)}\n`, + ) + + const entries = await loadJsonlTranscriptEntries(file) + + expect(projectElicitationExchanges(entries).exchanges).toHaveLength(1) + }) +}) diff --git a/src/elicitation-exchange.ts b/src/elicitation-exchange.ts new file mode 100644 index 00000000..6f073f6f --- /dev/null +++ b/src/elicitation-exchange.ts @@ -0,0 +1,138 @@ +import { readFile } from "node:fs/promises" + +const STRUCTURED_RESPONSE_TYPES = new Set([ + "brunch.elicitation_response", + "brunch.action_response", + 
"brunch.choice_response", + ]) + +export interface EntryRange { + start: string + end: string +} + +export interface ElicitationExchange { + promptRange: EntryRange + responseRange: EntryRange + promptEntryIds: string[] + responseEntryIds: string[] +} + +export interface OpenPromptProjection { + promptRange: EntryRange + promptEntryIds: string[] +} + +export interface ElicitationExchangeProjection { + status: "empty" | "open_prompt" | "ready" + exchanges: ElicitationExchange[] + openPrompt: OpenPromptProjection | null +} + +interface TranscriptEntry { + id: string + type?: string + role?: string + customType?: string +} + +export async function loadJsonlTranscriptEntries( + file: string, +): Promise<unknown[]> { + const content = await readFile(file, "utf8") + return content + .split("\n") + .filter((line) => line.trim().length > 0) + .map((line) => JSON.parse(line) as unknown) +} + +export function projectElicitationExchanges( + entries: unknown[], +): ElicitationExchangeProjection { + const exchanges: ElicitationExchange[] = [] + let promptIds: string[] = [] + let responseIds: string[] = [] + + for (const entry of entries) { + if (!isTranscriptEntry(entry)) { + continue + } + + if (isPromptSideEntry(entry)) { + flushResponse() + promptIds.push(entry.id) + continue + } + + if (isResponseSideEntry(entry)) { + responseIds.push(entry.id) + } + } + + flushResponse() + + if (promptIds.length > 0) { + return { + status: "open_prompt", + exchanges, + openPrompt: { + promptRange: rangeFor(promptIds), + promptEntryIds: promptIds, + }, + } + } + + return { + status: exchanges.length === 0 ? 
"empty" : "ready", + exchanges, + openPrompt: null, + } + + function flushResponse(): void { + if (promptIds.length === 0 || responseIds.length === 0) { + return + } + + exchanges.push({ + promptRange: rangeFor(promptIds), + responseRange: rangeFor(responseIds), + promptEntryIds: promptIds, + responseEntryIds: responseIds, + }) + promptIds = [] + responseIds = [] + } +} + +function rangeFor(ids: string[]): EntryRange { + return { start: ids[0]!, end: ids[ids.length - 1]! } +} + +function isTranscriptEntry(value: unknown): value is TranscriptEntry { + return ( + typeof value === "object" && + value !== null && + typeof (value as { id?: unknown }).id === "string" + ) +} + +function isPromptSideEntry(entry: TranscriptEntry): boolean { + if (entry.type === "custom" && entry.customType?.includes("prompt")) { + return true + } + return ( + entry.role === "assistant" || + entry.role === "system" || + entry.role === "tool" + ) +} + +function isResponseSideEntry(entry: TranscriptEntry): boolean { + if (entry.role === "user") { + return true + } + return ( + entry.type === "custom" && + STRUCTURED_RESPONSE_TYPES.has(entry.customType ?? 
"") + ) +} diff --git a/src/rpc.test.ts b/src/rpc.test.ts index d107bd6f..612bc230 100644 --- a/src/rpc.test.ts +++ b/src/rpc.test.ts @@ -1,3 +1,6 @@ +import { writeFile, mkdtemp } from "node:fs/promises" +import { tmpdir } from "node:os" +import { join } from "node:path" import { PassThrough } from "node:stream" import { describe, expect, it } from "vitest" @@ -60,6 +63,29 @@ describe("JSON-RPC handlers", () => { }) }) + it("serves session elicitation exchanges from a Pi JSONL file", async () => { + const dir = await mkdtemp(join(tmpdir(), "brunch-rpc-")) + const sessionFile = join(dir, "session.jsonl") + await writeFile( + sessionFile, + `${JSON.stringify({ id: "a1", type: "message", role: "assistant", content: "Question" })}\n${JSON.stringify({ id: "u1", type: "message", role: "user", content: "Answer" })}\n`, + ) + const handlers = createRpcHandlers({ coordinator: coordinator() }) + + await expect( + handlers.handle({ + jsonrpc: "2.0", + id: 3, + method: "session.elicitationExchanges", + params: { file: sessionFile }, + }), + ).resolves.toMatchObject({ + jsonrpc: "2.0", + id: 3, + result: { status: "ready", exchanges: [{ promptEntryIds: ["a1"] }] }, + }) + }) + it("returns structured errors for unknown methods", async () => { const handlers = createRpcHandlers({ coordinator: coordinator() }) diff --git a/src/rpc.ts b/src/rpc.ts index 4395b60b..4c7a0196 100644 --- a/src/rpc.ts +++ b/src/rpc.ts @@ -1,6 +1,10 @@ import { createInterface } from "node:readline/promises" import type { Readable, Writable } from "node:stream" +import { + loadJsonlTranscriptEntries, + projectElicitationExchanges, +} from "./elicitation-exchange.js" import { workspaceSnapshotFromState } from "./print-snapshot.js" import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" @@ -49,6 +53,14 @@ export function createRpcHandlers(options: { return success(request.id ?? 
null, workspaceSnapshotFromState(state)) } + if (request.method === "session.elicitationExchanges") { + if (!isSessionProjectionParams(request.params)) { + return failure(request.id ?? null, -32602, "Invalid params") + } + const entries = await loadJsonlTranscriptEntries(request.params.file) + return success(request.id ?? null, projectElicitationExchanges(entries)) + } + return failure(request.id ?? null, -32601, "Method not found") }, } @@ -92,6 +104,14 @@ function failure( return { jsonrpc: "2.0", id, error: { code, message } } } +function isSessionProjectionParams(value: unknown): value is { file: string } { + return ( + typeof value === "object" && + value !== null && + typeof (value as { file?: unknown }).file === "string" + ) +} + function isJsonRpcRequest(value: unknown): value is JsonRpcRequest { return ( typeof value === "object" && From 3f3f12b79c0300a461863bdc3d601f158f3abc29 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 11:08:16 +0200 Subject: [PATCH 04/18] FE-735: Project real Pi JSONL exchanges --- memory/CARDS.md | 239 +++++++++++++++++++++++++++++++ src/elicitation-exchange.test.ts | 62 +++++++- src/elicitation-exchange.ts | 22 ++- src/rpc.test.ts | 2 +- 4 files changed, 312 insertions(+), 13 deletions(-) create mode 100644 memory/CARDS.md diff --git a/memory/CARDS.md b/memory/CARDS.md new file mode 100644 index 00000000..3809ec5e --- /dev/null +++ b/memory/CARDS.md @@ -0,0 +1,239 @@ + + +# Scope Cards — FE-735 Review Fixes and Next M1 Slices + +## Orientation + +- **Containing seam:** M1 transport/projection seam: CLI transport-mode dispatch, named JSON-RPC handlers, coordinator-owned workspace/session state, Pi JSONL transcript projection, and fixture capture. +- **Frontier item:** `mode-shell-and-fixture-driver` (FE-735) remains the tracker/branch boundary. The initial print/RPC/projection cards landed, but review found two blocker defects in the projection/RPC seam. 
+- **Volatile state:** No `HANDOFF.md`; current review found that synthetic tests pass while real `SessionManager` JSONL projects as empty, and that public `session.elicitationExchanges` currently accepts an arbitrary filesystem path. +- **Main open risk:** fixture-driver work would encode the wrong transcript model if it proceeds before real Pi JSONL projection and coordinator-owned session access are fixed. +- **Frontier obligations:** keep transport modes distinct from agent modes/lenses (D23-L); keep named RPC methods product-shaped, not generic filesystem/data reads (D5-L, D19-L); keep Pi JSONL as transcript truth without chat/turn tables (D6-L, D12-L, D13-L, I10-L); derive/import/project TypeScript shapes from owning seams rather than duplicating Pi state spaces. + +## Blocking instruction + +Complete Cards 1 and 2 before any fixture-driver or brief-capture work. Card 3 may follow immediately after those blockers. Cards 4–5 are the next M1 delivery slices after the seam is trustworthy. + +## Card 1 — Real Pi JSONL elicitation projection + +**Status:** done + +### Weight + +Full scope card — corrects the transcript projection seam that M1 fixture capture depends on. + +### Target Behavior + +A `SessionManager`-created assistant→user JSONL transcript projects to one ready elicitation exchange. + +### Boundary Crossings + +```text +→ Pi SessionManager JSONL file +→ JSONL loader +→ Pi transcript-entry projection boundary +→ elicitation exchange projector +→ session projection result +``` + +### Risks and Assumptions + +- RISK: Tests keep using synthetic top-level `role` entries and miss Pi's real nested `message.role` shape → MITIGATION: add a test that writes the transcript through `SessionManager.appendMessage`, reloads the JSONL file, and asserts the exchange projection. 
+- RISK: Fixing real Pi message entries accidentally loses custom structured prompt/response support → MITIGATION: keep custom-entry tests, but make their shape match Pi custom entries and classify custom entries separately from message entries. +- RISK: Orphan user/response entries before any prompt are later paired with an unrelated prompt → MITIGATION: ignore unmatched response-side entries or return an explicit unmatched/invalid diagnostic shape; do not silently attach them to later prompts. +- ASSUMPTION: Pi exports enough session entry/message types to avoid restating the message state space → VALIDATE: import/project from Pi exported types where available; if not exported, keep a narrow local runtime projection from `unknown` and document it as a trust-boundary parser rather than a duplicate Pi DTO. + +### Acceptance Criteria + +✓ `elicitation-exchange.test.ts` creates a real persisted session with `SessionManager.create(...).appendMessage(...)`, loads that JSONL file, and observes one ready exchange with assistant prompt id and user response id. +✓ Existing synthetic tests are updated to use Pi-shaped entries (`entry.message.role`) or intentionally named boundary fixtures; no production classifier relies on top-level `entry.role` for Pi message entries. +✓ A transcript with a user response before any prompt does not produce an exchange pairing that response with a later assistant prompt. +✓ Structured Brunch prompt/response custom entries still project to the correct side when they use the Pi custom entry shape. + +### Verification Approach + +- Inner: unit tests over projector helpers — prove role classification, custom-entry classification, and orphan-response behavior. +- Middle: Pi JSONL round-trip test using `SessionManager` — proves projection against the actual canonical transcript store. +- Outer: none for this fix. + +### Cross-cutting obligations + +- Preserve Pi JSONL as transcript truth and avoid chat/turn tables. 
+- Use source-of-truth typing: import/infer/project Pi-owned shapes when possible; only declare local types for the new semantic projection (`ElicitationExchange*`). +- Keep exchange ranges stable enough for later observer jobs and replay fixtures. + +## Card 2 — Product-scoped session exchange RPC + +**Status:** next + +### Weight + +Full scope card — corrects the public RPC method boundary so it remains product-shaped and coordinator-owned. + +### Target Behavior + +`session.elicitationExchanges` projects the coordinator-selected Brunch session instead of reading an arbitrary client-supplied file path. + +### Boundary Crossings + +```text +→ JSON-RPC stdio request +→ named `session.*` handler +→ WorkspaceSessionCoordinator +→ selected session JSONL file under `.brunch/sessions/` +→ elicitation exchange projector +→ JSON-RPC response +``` + +### Risks and Assumptions + +- RISK: Keeping `{ file }` as a public param turns a named product method into a filesystem read primitive → MITIGATION: remove public file-path params; resolve the current session through the coordinator, or accept only a product identifier that is resolved under the workspace session directory. +- RISK: The handler creates a session in a `select_spec` workspace when the caller only asked for projection → MITIGATION: define and test the no-selected-session result/error explicitly; do not prompt and do not run an agent turn. +- RISK: JSON-RPC request typing treats invalid ids as valid because the type guard under-validates → MITIGATION: validate/project `id` at the runtime boundary while touching the handler parser. +- ASSUMPTION: For M1, projecting the current coordinator-selected session is enough; historical session lookup can wait → VALIDATE: contract tests cover current-session projection and reject raw file params. + +### Acceptance Criteria + +✓ `session.elicitationExchanges` with no params returns exchanges for the coordinator's current ready session. 
+✓ `session.elicitationExchanges` with `{ file: ... }` returns `Invalid params` and never reads that path. +✓ A no-selected-spec/session state returns a product-shaped JSON-RPC error or empty/no-session result that does not create a session or prompt the user. +✓ JSON-RPC requests with invalid `id` shapes are rejected as `Invalid Request` rather than being accepted by TypeScript-only narrowing. + +### Verification Approach + +- Inner: handler unit tests — prove params rejection, id validation, and no-selected-session behavior. +- Middle: stdio contract test — request `session.elicitationExchanges` through `brunch --mode rpc` and assert the response is derived from the coordinator-selected session. +- Outer: none for this fix. + +### Cross-cutting obligations + +- Public RPC methods remain named product methods, not generic data/filesystem APIs. +- Coordinator remains the owner of workspace/session selection and session binding. +- Keep raw-file projection as a private helper/test utility only if it remains useful for projector tests. + +## Card 3 — RPC/print projection parity smoke + +**Status:** queued + +### Weight + +Light scope card — hardens an already-established seam after the two blocker fixes. + +### Objective + +Prove the print snapshot and RPC workspace snapshot expose the same product-shaped coordinator state. + +### Acceptance Criteria + +✓ A temp workspace with a selected spec produces matching key fields from `brunch --mode print` and `workspace.snapshot` over RPC. +✓ The parity test uses the real coordinator/store path rather than only injected fake states. +✓ The test does not require an agent turn or `InteractiveMode`. + +### Verification Approach + +- Inner: integration-style vitest with temp cwd. +- Middle: optional CLI spawn if direct `runBrunchCli` coverage is insufficient. + +### Cross-cutting obligations + +- Keep print as a snapshot transport mode only. +- Keep snapshot projection reusable without becoming a generic read-model platform. 
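Review note on Card 2's `id` criterion: the `isJsonRpcRequest` guard in `src/rpc.ts` checks only `jsonrpc` and `method`, so a request whose `id` is an object or boolean passes the guard and is then echoed back in the response. The following is a minimal sketch of how the guard could validate `id` at the runtime boundary. It assumes the JSON-RPC 2.0 rule that an `id`, when present, is a string, a number, or null; the helper name `isJsonRpcId` and the finite-number check are illustrative assumptions, not code from this patch.

```typescript
// Hypothetical sketch: type names mirror rpc.ts, but the id check is an
// assumption about how Card 2 could be satisfied, not the committed fix.
type JsonRpcId = string | number | null

interface JsonRpcRequest {
  jsonrpc: "2.0"
  id?: JsonRpcId
  method: string
  params?: unknown
}

// An absent id marks a notification; a present id must be string, number, or null.
function isJsonRpcId(value: unknown): value is JsonRpcId | undefined {
  return (
    value === undefined ||
    value === null ||
    typeof value === "string" ||
    (typeof value === "number" && Number.isFinite(value))
  )
}

function isJsonRpcRequest(value: unknown): value is JsonRpcRequest {
  if (typeof value !== "object" || value === null) {
    return false
  }
  const candidate = value as {
    jsonrpc?: unknown
    id?: unknown
    method?: unknown
  }
  return (
    candidate.jsonrpc === "2.0" &&
    typeof candidate.method === "string" &&
    isJsonRpcId(candidate.id)
  )
}
```

With a guard of this shape, the existing `failure(null, -32600, "Invalid Request")` branch would cover malformed ids without any handler-level changes.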
+ +### Promotion checklist + +- [ ] Does this change a requirement? No. +- [ ] Does this create, retire, or invalidate an assumption? No. +- [ ] Does this make or reverse a non-trivial design decision? No. +- [ ] Does this establish a new seam-level invariant? No; it tests D19-L/D23-L. +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. +- [ ] Does it cross more than two major seams? No. +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. +- [ ] Can you not name the containing seam or current rationale from the live docs? No. + +## Card 4 — Fixture capture bundle skeleton + +**Status:** queued + +### Weight + +Full scope card — starts the fixture-driver half of M1 over the now-trusted RPC/projection seam. + +### Target Behavior + +A deterministic fixture capture command writes a `.jsonl` plus `.meta.json` bundle for one scripted run. + +### Boundary Crossings + +```text +→ fixture driver command/module +→ `brunch --mode rpc` stdio client +→ `workspace.snapshot` / `session.elicitationExchanges` +→ selected Pi JSONL transcript +→ `.brunch-fixtures///` bundle writer +``` + +### Risks and Assumptions + +- RISK: The driver becomes a one-off harness disconnected from product RPC → MITIGATION: run it through the JSON-RPC stdio surface, not direct function calls, except for unit-level bundle writer tests. +- RISK: Bundle metadata overpromises graph/coherence artifacts before those substrates exist → MITIGATION: write `.jsonl` and `.meta.json` only, with explicit placeholders/omissions for future `.graph.json` and `.coherence.json`. +- RISK: LLM variability obscures whether capture plumbing works → MITIGATION: keep this first run deterministic/scripted; do not require a model-generated interview yet. 
+- ASSUMPTION: A replay-regression skeleton is valuable before full agent-as-user behavior exists → VALIDATE: bundle writer and RPC driver tests assert stable paths, metadata, and transcript/projection parity. + +### Acceptance Criteria + +✓ A fixture driver can start/connect to RPC mode, request workspace/session projections, and write a run directory under `.brunch-fixtures///`. +✓ The bundle includes the source session `.jsonl` and `.meta.json` with brief id, run id, timestamp, brunch version/commit if available, session id, and projection summary. +✓ The driver is deterministic in tests and does not require live LLM output. + +### Verification Approach + +- Inner: bundle writer unit tests — prove metadata shape and path layout. +- Middle: stdio driver integration test — prove capture through RPC and JSONL copy/projection parity. +- Outer: none until real brief walkthroughs land. + +### Cross-cutting obligations + +- Establish replay-regression fixture architecture without pretending property/adversarial layers are complete. +- Keep captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artifacts. +- Do not bypass RPC for the product behavior the fixture driver is meant to prove. + +## Card 5 — Seed first deterministic briefs + +**Status:** queued + +### Weight + +Light scope card — text/fixture seed work inside the established fixture strategy. + +### Objective + +Create the first three deterministic brief files aligned with `BEHAVIORAL_KERNELS.md` and the fixture capture metadata shape. + +### Acceptance Criteria + +✓ `.brunch-fixtures/briefs/` contains briefs #1–#3 with stable ids, titles, kernel tags, expected structural observations, and deterministic scripted-user notes. +✓ Brief files validate against any schema/helper introduced by Card 4, or a minimal shape checker is added if no schema exists yet. +✓ Brief wording stays product-brief-like rather than implementation-test-like. 
+ +### Verification Approach + +- Inner: brief shape/schema tests or fixture file checker. +- Middle: run the deterministic fixture capture against at least one seeded brief if Card 4 is complete. + +### Cross-cutting obligations + +- Keep the brief corpus aligned with replay/property/adversarial fixture architecture. +- Do not encode graph/coherence expectations before those substrates exist; note future expectations as deferred metadata if needed. + +### Promotion checklist + +- [ ] Does this change a requirement? No. +- [ ] Does this create, retire, or invalidate an assumption? No. +- [ ] Does this make or reverse a non-trivial design decision? No. +- [ ] Does this establish a new seam-level invariant? No. +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. +- [ ] Does it cross more than two major seams? No. +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. +- [ ] Can you not name the containing seam or current rationale from the live docs? No. 
diff --git a/src/elicitation-exchange.test.ts b/src/elicitation-exchange.test.ts index ebef4732..30838534 100644 --- a/src/elicitation-exchange.test.ts +++ b/src/elicitation-exchange.test.ts @@ -3,6 +3,8 @@ import { tmpdir } from "node:os" import { join } from "node:path" import { describe, expect, it } from "vitest" +import { SessionManager } from "@earendil-works/pi-coding-agent" + import { loadJsonlTranscriptEntries, projectElicitationExchanges, @@ -11,8 +13,7 @@ import { const assistant = { id: "a1", type: "message", - role: "assistant", - content: "Pick one", + message: { role: "assistant", content: "Pick one" }, } const structuredPrompt = { id: "p1", @@ -20,7 +21,11 @@ const structuredPrompt = { customType: "brunch.elicitation_prompt", data: { choices: ["A", "B"] }, } -const user = { id: "u1", type: "message", role: "user", content: "A" } +const user = { + id: "u1", + type: "message", + message: { role: "user", content: "A" }, +} const structuredResponse = { id: "r1", type: "custom", @@ -35,8 +40,16 @@ describe("elicitation exchange projection", () => { assistant, structuredPrompt, user, - { id: "a2", type: "message", role: "assistant", content: "Why?" }, - { id: "u2", type: "message", role: "user", content: "Because" }, + { + id: "a2", + type: "message", + message: { role: "assistant", content: "Why?" 
}, + }, + { + id: "u2", + type: "message", + message: { role: "user", content: "Because" }, + }, ]) expect(exchanges).toEqual({ @@ -90,6 +103,45 @@ describe("elicitation exchange projection", () => { }) }) + it("ignores orphan user responses before a prompt", () => { + const projection = projectElicitationExchanges([ + user, + { + id: "a2", + type: "message", + message: { role: "assistant", content: "Later prompt" }, + }, + ]) + + expect(projection).toEqual({ + status: "open_prompt", + exchanges: [], + openPrompt: { + promptRange: { start: "a2", end: "a2" }, + promptEntryIds: ["a2"], + }, + }) + }) + + it("projects a real SessionManager JSONL assistant/user transcript", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-pi-jsonl-")) + const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) + manager.appendMessage({ role: "assistant", content: "Question" }) + manager.appendMessage({ role: "user", content: "Answer" }) + + const entries = await loadJsonlTranscriptEntries(manager.getSessionFile()!) 
+ const projection = projectElicitationExchanges(entries) + + expect(projection.status).toBe("ready") + expect(projection.exchanges).toHaveLength(1) + expect(projection.exchanges[0]?.promptEntryIds[0]).toEqual( + expect.any(String), + ) + expect(projection.exchanges[0]?.responseEntryIds[0]).toEqual( + expect.any(String), + ) + }) + it("loads newline-delimited Pi transcript entries from disk", async () => { const dir = await mkdtemp(join(tmpdir(), "brunch-jsonl-")) const file = join(dir, "session.jsonl") diff --git a/src/elicitation-exchange.ts b/src/elicitation-exchange.ts index 6f073f6f..15dbc5d1 100644 --- a/src/elicitation-exchange.ts +++ b/src/elicitation-exchange.ts @@ -34,6 +34,9 @@ interface TranscriptEntry { type?: string role?: string customType?: string + message?: { + role?: string + } } export async function loadJsonlTranscriptEntries( @@ -64,7 +67,7 @@ export function projectElicitationExchanges( continue } - if (isResponseSideEntry(entry)) { + if (isResponseSideEntry(entry) && promptIds.length > 0) { responseIds.push(entry.id) } } @@ -120,15 +123,13 @@ function isPromptSideEntry(entry: TranscriptEntry): boolean { if (entry.type === "custom" && entry.customType?.includes("prompt")) { return true } - return ( - entry.role === "assistant" || - entry.role === "system" || - entry.role === "tool" - ) + + const role = roleOf(entry) + return role === "assistant" || role === "system" || role === "tool" } function isResponseSideEntry(entry: TranscriptEntry): boolean { - if (entry.role === "user") { + if (roleOf(entry) === "user") { return true } return ( @@ -136,3 +137,10 @@ function isResponseSideEntry(entry: TranscriptEntry): boolean { STRUCTURED_RESPONSE_TYPES.has(entry.customType ?? 
"") ) } + +function roleOf(entry: TranscriptEntry): string | undefined { + if (entry.type === "message") { + return entry.message?.role + } + return entry.role +} diff --git a/src/rpc.test.ts b/src/rpc.test.ts index 612bc230..bc040006 100644 --- a/src/rpc.test.ts +++ b/src/rpc.test.ts @@ -68,7 +68,7 @@ describe("JSON-RPC handlers", () => { const sessionFile = join(dir, "session.jsonl") await writeFile( sessionFile, - `${JSON.stringify({ id: "a1", type: "message", role: "assistant", content: "Question" })}\n${JSON.stringify({ id: "u1", type: "message", role: "user", content: "Answer" })}\n`, + `${JSON.stringify({ id: "a1", type: "message", message: { role: "assistant", content: "Question" } })}\n${JSON.stringify({ id: "u1", type: "message", message: { role: "user", content: "Answer" } })}\n`, ) const handlers = createRpcHandlers({ coordinator: coordinator() }) From 678f988f2e43f9c009e9a8b5a6cda8a0ec309278 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 11:10:31 +0200 Subject: [PATCH 05/18] FE-735: Scope session exchange RPC to current session --- memory/CARDS.md | 4 +- src/brunch.test.ts | 72 ++++++++++++++++++++--- src/rpc.test.ts | 142 ++++++++++++++++++++++++++++++++++++--------- src/rpc.ts | 40 ++++++++----- 4 files changed, 205 insertions(+), 53 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index 3809ec5e..c6459f51 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -66,7 +66,7 @@ A `SessionManager`-created assistant→user JSONL transcript projects to one rea ## Card 2 — Product-scoped session exchange RPC -**Status:** next +**Status:** done ### Weight @@ -115,7 +115,7 @@ Full scope card — corrects the public RPC method boundary so it remains produc ## Card 3 — RPC/print projection parity smoke -**Status:** queued +**Status:** next ### Weight diff --git a/src/brunch.test.ts b/src/brunch.test.ts index 67903f1b..9544d6f9 100644 --- a/src/brunch.test.ts +++ b/src/brunch.test.ts @@ -1,22 +1,45 @@ +import { mkdtemp } from 
"node:fs/promises" +import { tmpdir } from "node:os" +import { join } from "node:path" import { PassThrough } from "node:stream" import { describe, expect, it } from "vitest" +import { SessionManager } from "@earendil-works/pi-coding-agent" + import { runBrunchCli } from "./brunch.js" import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" -function coordinator(): WorkspaceSessionCoordinator { +function coordinator(sessionFile?: string): WorkspaceSessionCoordinator { return { async openExisting() { return { - status: "select_spec", + ...(sessionFile + ? { + status: "ready" as const, + spec: { id: "spec-1", title: "Alpha spec" }, + session: { + id: "session-1", + file: sessionFile, + manager: {} as never, + }, + chrome: { + cwd: "/tmp/brunch-project", + spec: { id: "spec-1", title: "Alpha spec" }, + phase: "elicitation" as const, + chatMode: "responding-to-elicitation" as const, + }, + } + : { + status: "select_spec" as const, + chrome: { + cwd: "/tmp/brunch-project", + spec: null, + phase: "select_spec" as const, + chatMode: "select-spec" as const, + }, + }), cwd: "/tmp/brunch-project", - chrome: { - cwd: "/tmp/brunch-project", - spec: null, - phase: "select_spec", - chatMode: "select-spec", - }, } }, async startOrCreate() { @@ -52,6 +75,39 @@ describe("Brunch CLI dispatch", () => { expect(output).toContain("spec: ") }) + it("routes --mode rpc session projection through the coordinator-selected session", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-cli-rpc-")) + const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) + manager.appendMessage({ role: "assistant", content: "Question" }) + manager.appendMessage({ role: "user", content: "Answer" }) + const stdin = new PassThrough() + const stdout = new PassThrough() + const chunks: string[] = [] + stdout.on("data", (chunk) => chunks.push(String(chunk))) + + stdin.end( + `${JSON.stringify({ jsonrpc: "2.0", id: 2, method: "session.elicitationExchanges" 
})}\n`, + ) + + const code = await runBrunchCli({ + argv: ["--mode=rpc"], + cwd: "/tmp/brunch-project", + coordinator: coordinator(manager.getSessionFile()!), + stdin, + stdout, + }) + + expect(code).toBe(0) + expect(JSON.parse(chunks.join(""))).toMatchObject({ + jsonrpc: "2.0", + id: 2, + result: { + status: "ready", + exchanges: [{ promptEntryIds: [expect.any(String)] }], + }, + }) + }) + it("routes --mode rpc through the named JSON-RPC stdio adapter", async () => { const stdin = new PassThrough() const stdout = new PassThrough() diff --git a/src/rpc.test.ts b/src/rpc.test.ts index bc040006..8a46228f 100644 --- a/src/rpc.test.ts +++ b/src/rpc.test.ts @@ -1,31 +1,25 @@ -import { writeFile, mkdtemp } from "node:fs/promises" +import { mkdtemp } from "node:fs/promises" import { tmpdir } from "node:os" import { join } from "node:path" import { PassThrough } from "node:stream" import { describe, expect, it } from "vitest" +import { SessionManager } from "@earendil-works/pi-coding-agent" + import { createRpcHandlers, runJsonRpcLineServer } from "./rpc.js" -import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" +import type { + WorkspaceSessionCoordinator, + WorkspaceSessionState, +} from "./workspace-session-coordinator.js" -function coordinator(): WorkspaceSessionCoordinator { +function coordinator( + state: WorkspaceSessionState = readyState( + "/tmp/brunch-project/.brunch/sessions/session-1.jsonl", + ), +): WorkspaceSessionCoordinator { return { async openExisting() { - return { - status: "ready", - cwd: "/tmp/brunch-project", - spec: { id: "spec-1", title: "Alpha spec" }, - session: { - id: "session-1", - file: "/tmp/brunch-project/.brunch/sessions/session-1.jsonl", - manager: {} as never, - }, - chrome: { - cwd: "/tmp/brunch-project", - spec: { id: "spec-1", title: "Alpha spec" }, - phase: "elicitation", - chatMode: "responding-to-elicitation", - }, - } + return state }, async startOrCreate() { throw new Error("not used") @@ -42,6 
+36,46 @@ function coordinator(): WorkspaceSessionCoordinator { } } +function readyState(sessionFile: string): WorkspaceSessionState { + return { + status: "ready", + cwd: "/tmp/brunch-project", + spec: { id: "spec-1", title: "Alpha spec" }, + session: { + id: "session-1", + file: sessionFile, + manager: {} as never, + }, + chrome: { + cwd: "/tmp/brunch-project", + spec: { id: "spec-1", title: "Alpha spec" }, + phase: "elicitation", + chatMode: "responding-to-elicitation", + }, + } +} + +function selectSpecState(): WorkspaceSessionState { + return { + status: "select_spec", + cwd: "/tmp/brunch-project", + chrome: { + cwd: "/tmp/brunch-project", + spec: null, + phase: "select_spec", + chatMode: "select-spec", + }, + } +} + +async function createSessionFile(): Promise<string> { + const cwd = await mkdtemp(join(tmpdir(), "brunch-rpc-session-")) + const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) + manager.appendMessage({ role: "assistant", content: "Question" }) + manager.appendMessage({ role: "user", content: "Answer" }) + return manager.getSessionFile()!
+} + describe("JSON-RPC handlers", () => { it("serves a named workspace snapshot method", async () => { const handlers = createRpcHandlers({ coordinator: coordinator() }) @@ -63,26 +97,76 @@ describe("JSON-RPC handlers", () => { }) }) - it("serves session elicitation exchanges from a Pi JSONL file", async () => { - const dir = await mkdtemp(join(tmpdir(), "brunch-rpc-")) - const sessionFile = join(dir, "session.jsonl") - await writeFile( - sessionFile, - `${JSON.stringify({ id: "a1", type: "message", message: { role: "assistant", content: "Question" } })}\n${JSON.stringify({ id: "u1", type: "message", message: { role: "user", content: "Answer" } })}\n`, - ) - const handlers = createRpcHandlers({ coordinator: coordinator() }) + it("serves session elicitation exchanges from the coordinator-selected session", async () => { + const sessionFile = await createSessionFile() + const handlers = createRpcHandlers({ + coordinator: coordinator(readyState(sessionFile)), + }) await expect( handlers.handle({ jsonrpc: "2.0", id: 3, method: "session.elicitationExchanges", - params: { file: sessionFile }, }), ).resolves.toMatchObject({ jsonrpc: "2.0", id: 3, - result: { status: "ready", exchanges: [{ promptEntryIds: ["a1"] }] }, + result: { + status: "ready", + exchanges: [{ promptEntryIds: [expect.any(String)] }], + }, + }) + }) + + it("rejects raw file params on session elicitation exchange RPC", async () => { + const handlers = createRpcHandlers({ coordinator: coordinator() }) + + await expect( + handlers.handle({ + jsonrpc: "2.0", + id: 4, + method: "session.elicitationExchanges", + params: { file: "/tmp/not-a-product-param.jsonl" }, + }), + ).resolves.toMatchObject({ + jsonrpc: "2.0", + id: 4, + error: { code: -32602, message: "Invalid params" }, + }) + }) + + it("returns a product-shaped no-session error without creating a session", async () => { + const handlers = createRpcHandlers({ + coordinator: coordinator(selectSpecState()), + }) + + await expect( + handlers.handle({ + 
jsonrpc: "2.0", + id: 5, + method: "session.elicitationExchanges", + }), + ).resolves.toMatchObject({ + jsonrpc: "2.0", + id: 5, + error: { code: -32001, message: "No selected Brunch session" }, + }) + }) + + it("rejects invalid request id shapes", async () => { + const handlers = createRpcHandlers({ coordinator: coordinator() }) + + await expect( + handlers.handle({ + jsonrpc: "2.0", + id: { bad: true }, + method: "workspace.snapshot", + }), + ).resolves.toMatchObject({ + jsonrpc: "2.0", + id: null, + error: { code: -32600, message: "Invalid Request" }, }) }) diff --git a/src/rpc.ts b/src/rpc.ts index 4c7a0196..e717df85 100644 --- a/src/rpc.ts +++ b/src/rpc.ts @@ -54,10 +54,20 @@ export function createRpcHandlers(options: { } if (request.method === "session.elicitationExchanges") { - if (!isSessionProjectionParams(request.params)) { + if (request.params !== undefined) { return failure(request.id ?? null, -32602, "Invalid params") } - const entries = await loadJsonlTranscriptEntries(request.params.file) + + const state = await options.coordinator.openExisting() + if (state.status !== "ready") { + return failure( + request.id ?? null, + -32001, + "No selected Brunch session", + ) + } + + const entries = await loadJsonlTranscriptEntries(state.session.file) return success(request.id ?? 
null, projectElicitationExchanges(entries)) } @@ -104,19 +114,21 @@ function failure( return { jsonrpc: "2.0", id, error: { code, message } } } -function isSessionProjectionParams(value: unknown): value is { file: string } { - return ( - typeof value === "object" && - value !== null && - typeof (value as { file?: unknown }).file === "string" - ) -} - function isJsonRpcRequest(value: unknown): value is JsonRpcRequest { + if ( + typeof value !== "object" || + value === null || + (value as { jsonrpc?: unknown }).jsonrpc !== "2.0" || + typeof (value as { method?: unknown }).method !== "string" + ) { + return false + } + + const id = (value as { id?: unknown }).id return ( - typeof value === "object" && - value !== null && - (value as { jsonrpc?: unknown }).jsonrpc === "2.0" && - typeof (value as { method?: unknown }).method === "string" + id === undefined || + id === null || + typeof id === "string" || + typeof id === "number" ) } From 130c165c932a02297b22248fd338a1ce213b71fd Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 11:12:10 +0200 Subject: [PATCH 06/18] FE-735: Add RPC print snapshot parity smoke --- memory/CARDS.md | 4 +-- src/brunch.test.ts | 77 ++++++++++++++++++++++++++++++++++++---------- 2 files changed, 62 insertions(+), 19 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index c6459f51..658cf69b 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -115,7 +115,7 @@ Full scope card — corrects the public RPC method boundary so it remains produc ## Card 3 — RPC/print projection parity smoke -**Status:** next +**Status:** done ### Weight @@ -154,7 +154,7 @@ Prove the print snapshot and RPC workspace snapshot expose the same product-shap ## Card 4 — Fixture capture bundle skeleton -**Status:** queued +**Status:** next ### Weight diff --git a/src/brunch.test.ts b/src/brunch.test.ts index 9544d6f9..79e4efdf 100644 --- a/src/brunch.test.ts +++ b/src/brunch.test.ts @@ -8,7 +8,10 @@ import { describe, expect, it } from "vitest" import { 
SessionManager } from "@earendil-works/pi-coding-agent" import { runBrunchCli } from "./brunch.js" -import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" +import { + createWorkspaceSessionCoordinator, + type WorkspaceSessionCoordinator, +} from "./workspace-session-coordinator.js" function coordinator(sessionFile?: string): WorkspaceSessionCoordinator { return { @@ -57,6 +60,18 @@ function coordinator(sessionFile?: string): WorkspaceSessionCoordinator { } } +function rpcRequest(method: string, id = 1): PassThrough { + const stdin = new PassThrough() + stdin.end(`${JSON.stringify({ jsonrpc: "2.0", id, method })}\n`) + return stdin +} + +function collectStream(stream: PassThrough): string[] { + const chunks: string[] = [] + stream.on("data", (chunk) => chunks.push(String(chunk))) + return chunks +} + describe("Brunch CLI dispatch", () => { it("routes --mode print through the coordinator snapshot and exits", async () => { let output = "" @@ -80,20 +95,14 @@ describe("Brunch CLI dispatch", () => { const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) manager.appendMessage({ role: "assistant", content: "Question" }) manager.appendMessage({ role: "user", content: "Answer" }) - const stdin = new PassThrough() const stdout = new PassThrough() - const chunks: string[] = [] - stdout.on("data", (chunk) => chunks.push(String(chunk))) - - stdin.end( - `${JSON.stringify({ jsonrpc: "2.0", id: 2, method: "session.elicitationExchanges" })}\n`, - ) + const chunks = collectStream(stdout) const code = await runBrunchCli({ argv: ["--mode=rpc"], cwd: "/tmp/brunch-project", coordinator: coordinator(manager.getSessionFile()!), - stdin, + stdin: rpcRequest("session.elicitationExchanges", 2), stdout, }) @@ -109,20 +118,14 @@ describe("Brunch CLI dispatch", () => { }) it("routes --mode rpc through the named JSON-RPC stdio adapter", async () => { - const stdin = new PassThrough() const stdout = new PassThrough() - const chunks: string[] = [] 
- stdout.on("data", (chunk) => chunks.push(String(chunk))) - - stdin.end( - `${JSON.stringify({ jsonrpc: "2.0", id: 1, method: "workspace.snapshot" })}\n`, - ) + const chunks = collectStream(stdout) const code = await runBrunchCli({ argv: ["--mode=rpc"], cwd: "/tmp/brunch-project", coordinator: coordinator(), - stdin, + stdin: rpcRequest("workspace.snapshot"), stdout, }) @@ -133,4 +136,44 @@ describe("Brunch CLI dispatch", () => { result: { status: "select_spec" }, }) }) + + it("exposes matching print and RPC workspace snapshots from a real coordinator store", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-parity-")) + await createWorkspaceSessionCoordinator({ cwd }).startOrCreate({ + specTitle: "Parity spec", + }) + let printOutput = "" + const rpcOutput = new PassThrough() + const rpcChunks = collectStream(rpcOutput) + + await runBrunchCli({ + argv: ["--mode=print"], + cwd, + stdout: (chunk) => { + printOutput += chunk + }, + }) + await runBrunchCli({ + argv: ["--mode=rpc"], + cwd, + stdin: rpcRequest("workspace.snapshot"), + stdout: rpcOutput, + }) + + const rpcSnapshot = JSON.parse(rpcChunks.join("")).result + expect(printOutput).toContain("status: ready") + expect(printOutput).toContain(`cwd: ${rpcSnapshot.cwd}`) + expect(printOutput).toContain("spec: Parity spec") + expect(printOutput).toContain(`phase: ${rpcSnapshot.chrome.phase}`) + expect(printOutput).toContain(`chatMode: ${rpcSnapshot.chrome.chatMode}`) + expect(rpcSnapshot).toMatchObject({ + status: "ready", + cwd, + spec: { title: "Parity spec" }, + chrome: { + phase: "elicitation", + chatMode: "responding-to-elicitation", + }, + }) + }) }) From 10fd2508437fe917146c9201fb443eae7e367023 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 11:13:55 +0200 Subject: [PATCH 07/18] FE-735: Add fixture capture bundle skeleton --- memory/CARDS.md | 4 +- src/fixture-capture.test.ts | 76 ++++++++++++++++++++++ src/fixture-capture.ts | 122 ++++++++++++++++++++++++++++++++++++ 3 files 
changed, 200 insertions(+), 2 deletions(-) create mode 100644 src/fixture-capture.test.ts create mode 100644 src/fixture-capture.ts diff --git a/memory/CARDS.md b/memory/CARDS.md index 658cf69b..512c7959 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -154,7 +154,7 @@ Prove the print snapshot and RPC workspace snapshot expose the same product-shap ## Card 4 — Fixture capture bundle skeleton -**Status:** next +**Status:** done ### Weight @@ -201,7 +201,7 @@ A deterministic fixture capture command writes a `.jsonl` plus `.meta.json` bund ## Card 5 — Seed first deterministic briefs -**Status:** queued +**Status:** next ### Weight diff --git a/src/fixture-capture.test.ts b/src/fixture-capture.test.ts new file mode 100644 index 00000000..bd79e14a --- /dev/null +++ b/src/fixture-capture.test.ts @@ -0,0 +1,76 @@ +import { mkdtemp, readFile } from "node:fs/promises" +import { tmpdir } from "node:os" +import { join } from "node:path" +import { describe, expect, it } from "vitest" + +import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" +import { createWorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" +import { captureFixtureRun } from "./fixture-capture.js" + +describe("fixture capture", () => { + it("captures a deterministic JSONL and metadata bundle through RPC", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-fixture-")) + const workspace = await createWorkspaceSessionCoordinator({ + cwd, + }).startOrCreate({ + specTitle: "Fixture spec", + }) + workspace.session.manager.appendMessage({ + role: "assistant", + content: "Question", + }) + workspace.session.manager.appendMessage({ role: "user", content: "Answer" }) + + const coordinator: WorkspaceSessionCoordinator = { + async openExisting() { + return workspace + }, + async startOrCreate() { + return workspace + }, + async createNewSessionForCurrentSpec() { + return workspace + }, + async bindCurrentSpecToSession() { + return workspace + }, + async 
deriveChromeState() { + return workspace.chrome + }, + } + + const result = await captureFixtureRun({ + cwd, + briefId: "brief-001", + runId: "run-001", + timestamp: "2026-05-21T00:00:00.000Z", + coordinator, + }) + + expect(result.runDir).toBe( + join(cwd, ".brunch-fixtures", "brief-001", "run-001"), + ) + expect(JSON.parse(await readFile(result.metaFile, "utf8"))).toMatchObject({ + schemaVersion: 1, + briefId: "brief-001", + runId: "run-001", + timestamp: "2026-05-21T00:00:00.000Z", + brunchVersion: "0.0.0", + session: { + id: expect.any(String), + sourceFile: expect.stringContaining(".brunch/sessions"), + }, + projectionSummary: { + status: "ready", + exchangeCount: 1, + openPrompt: false, + }, + artifacts: { + jsonl: "run-001.jsonl", + }, + }) + expect(await readFile(result.jsonlFile, "utf8")).toContain( + '"role":"assistant"', + ) + }) +}) diff --git a/src/fixture-capture.ts b/src/fixture-capture.ts new file mode 100644 index 00000000..246e6286 --- /dev/null +++ b/src/fixture-capture.ts @@ -0,0 +1,122 @@ +import { copyFile, mkdir, readFile, writeFile } from "node:fs/promises" +import { join } from "node:path" +import { PassThrough } from "node:stream" + +import { runBrunchCli } from "./brunch.js" +import type { ElicitationExchangeProjection } from "./elicitation-exchange.js" +import type { WorkspaceSnapshot } from "./print-snapshot.js" +import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" + +export interface FixtureCaptureOptions { + cwd: string + briefId: string + runId: string + timestamp?: string + coordinator?: WorkspaceSessionCoordinator +} + +export interface FixtureCaptureResult { + runDir: string + jsonlFile: string + metaFile: string +} + +interface JsonRpcResponse<T> { + result?: T + error?: { + code: number + message: string + } +} + +export async function captureFixtureRun( + options: FixtureCaptureOptions, +): Promise<FixtureCaptureResult> { + const workspace = await callRpc<WorkspaceSnapshot>( + options, + "workspace.snapshot", + ) + if (!workspace.session)
{ + throw new Error("Cannot capture fixture without a selected Brunch session") + } + + const projection = await callRpc<ElicitationExchangeProjection>( + options, + "session.elicitationExchanges", + ) + const runDir = join( + options.cwd, + ".brunch-fixtures", + options.briefId, + options.runId, + ) + const jsonlFile = join(runDir, `${options.runId}.jsonl`) + const metaFile = join(runDir, `${options.runId}.meta.json`) + + await mkdir(runDir, { recursive: true }) + await copyFile(workspace.session.file, jsonlFile) + await writeFile( + metaFile, + `${JSON.stringify( + { + schemaVersion: 1, + briefId: options.briefId, + runId: options.runId, + timestamp: options.timestamp ?? new Date().toISOString(), + brunchVersion: await readPackageVersion(), + session: { + id: workspace.session.id, + sourceFile: workspace.session.file, + }, + projectionSummary: { + status: projection.status, + exchangeCount: projection.exchanges.length, + openPrompt: projection.openPrompt !== null, + }, + artifacts: { + jsonl: `${options.runId}.jsonl`, + }, + }, + null, + 2, + )}\n`, + "utf8", + ) + + return { runDir, jsonlFile, metaFile } +} + +async function callRpc<T>( + options: FixtureCaptureOptions, + method: string, +): Promise<T> { + const stdin = new PassThrough() + const stdout = new PassThrough() + const chunks: string[] = [] + stdout.on("data", (chunk) => chunks.push(String(chunk))) + stdin.end(`${JSON.stringify({ jsonrpc: "2.0", id: 1, method })}\n`) + + await runBrunchCli({ + argv: ["--mode=rpc"], + cwd: options.cwd, + ...(options.coordinator ?
{ coordinator: options.coordinator } : {}), + stdin, + stdout, + }) + + const response = JSON.parse(chunks.join("")) as JsonRpcResponse<T> + if (response.error) { + throw new Error(response.error.message) + } + if (response.result === undefined) { + throw new Error(`RPC ${method} returned no result`) + } + return response.result +} + +async function readPackageVersion(): Promise<string> { + const packageJson = JSON.parse(await readFile("package.json", "utf8")) as { + version?: unknown + } + return typeof packageJson.version === "string" ? packageJson.version : "0.0.0" +} From 49f8fc56e0f3bd539eff8673de5ff676b625ec73 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 11:16:11 +0200 Subject: [PATCH 08/18] FE-735: Seed deterministic fixture briefs --- .../briefs/brief-001-identity-reference.json | 19 ++ .../briefs/brief-002-state-lifecycle.json | 19 ++ .../briefs/brief-003-derived-views.json | 19 ++ memory/CARDS.md | 239 ------------------ memory/PLAN.md | 2 +- src/brief-library.test.ts | 30 +++ src/brief-library.ts | 64 +++++ 7 files changed, 152 insertions(+), 240 deletions(-) create mode 100644 .brunch-fixtures/briefs/brief-001-identity-reference.json create mode 100644 .brunch-fixtures/briefs/brief-002-state-lifecycle.json create mode 100644 .brunch-fixtures/briefs/brief-003-derived-views.json delete mode 100644 memory/CARDS.md create mode 100644 src/brief-library.test.ts create mode 100644 src/brief-library.ts diff --git a/.brunch-fixtures/briefs/brief-001-identity-reference.json b/.brunch-fixtures/briefs/brief-001-identity-reference.json new file mode 100644 index 00000000..a7383271 --- /dev/null +++ b/.brunch-fixtures/briefs/brief-001-identity-reference.json @@ -0,0 +1,19 @@ +{ + "schemaVersion": 1, + "id": "brief-001", + "title": "Team knowledge cards", + "kernelTags": ["identity-reference", "containment-topology"], + "productBrief": "A small team wants a shared workspace for knowledge cards.
Each card has a stable identity, a human-readable title, and may link to other cards even when titles change.", + "expectedStructuralObservations": [ + "Cards need stable IDs separate from mutable titles.", + "Links should target card identity rather than display text." + ], + "scriptedUserNotes": [ + "I care about renaming cards without breaking links.", + "Two cards can have similar titles, so titles cannot be the only reference." + ], + "deferredExpectations": { + "graph": "Later graph fixtures should produce identity/reference nodes and link invariants.", + "coherence": "Later coherence checks should flag title-anchored references as weak evidence." + } +} diff --git a/.brunch-fixtures/briefs/brief-002-state-lifecycle.json b/.brunch-fixtures/briefs/brief-002-state-lifecycle.json new file mode 100644 index 00000000..9370e358 --- /dev/null +++ b/.brunch-fixtures/briefs/brief-002-state-lifecycle.json @@ -0,0 +1,19 @@ +{ + "schemaVersion": 1, + "id": "brief-002", + "title": "Approval workflow for vendor invoices", + "kernelTags": ["state-lifecycle", "authority-capability"], + "productBrief": "A finance team needs invoices to move from draft to submitted to approved or rejected. Only budget owners can approve, and rejected invoices can be revised and resubmitted.", + "expectedStructuralObservations": [ + "Invoice states and legal transitions must be explicit.", + "Approval authority depends on the budget owner role." + ], + "scriptedUserNotes": [ + "Rejected invoices are not terminal; they can go back to draft.", + "Approved invoices should not be edited without reopening the workflow." + ], + "deferredExpectations": { + "graph": "Later graph fixtures should capture lifecycle states, transitions, and authority predicates.", + "coherence": "Later coherence checks should flag contradictory terminality claims." 
+ } +} diff --git a/.brunch-fixtures/briefs/brief-003-derived-views.json b/.brunch-fixtures/briefs/brief-003-derived-views.json new file mode 100644 index 00000000..1c6abe26 --- /dev/null +++ b/.brunch-fixtures/briefs/brief-003-derived-views.json @@ -0,0 +1,19 @@ +{ + "schemaVersion": 1, + "id": "brief-003", + "title": "Project dashboard rollups", + "kernelTags": ["derived-data-views", "temporal-history"], + "productBrief": "A product lead wants a dashboard that rolls task status, blockers, and recent decisions up from individual project notes into one current view.", + "expectedStructuralObservations": [ + "Dashboard values are derived from underlying notes and decisions.", + "The system needs evidence for when a rollup was last refreshed." + ], + "scriptedUserNotes": [ + "If the source note changes, the dashboard should not silently stay stale.", + "Recent decisions should show where they came from." + ], + "deferredExpectations": { + "graph": "Later graph fixtures should capture source-to-view derivation edges and evidence anchors.", + "coherence": "Later coherence checks should flag stale projections when source facts change." + } +} diff --git a/memory/CARDS.md b/memory/CARDS.md deleted file mode 100644 index 512c7959..00000000 --- a/memory/CARDS.md +++ /dev/null @@ -1,239 +0,0 @@ - - -# Scope Cards — FE-735 Review Fixes and Next M1 Slices - -## Orientation - -- **Containing seam:** M1 transport/projection seam: CLI transport-mode dispatch, named JSON-RPC handlers, coordinator-owned workspace/session state, Pi JSONL transcript projection, and fixture capture. -- **Frontier item:** `mode-shell-and-fixture-driver` (FE-735) remains the tracker/branch boundary. The initial print/RPC/projection cards landed, but review found two blocker defects in the projection/RPC seam. 
-- **Volatile state:** No `HANDOFF.md`; current review found that synthetic tests pass while real `SessionManager` JSONL projects as empty, and that public `session.elicitationExchanges` currently accepts an arbitrary filesystem path. -- **Main open risk:** fixture-driver work would encode the wrong transcript model if it proceeds before real Pi JSONL projection and coordinator-owned session access are fixed. -- **Frontier obligations:** keep transport modes distinct from agent modes/lenses (D23-L); keep named RPC methods product-shaped, not generic filesystem/data reads (D5-L, D19-L); keep Pi JSONL as transcript truth without chat/turn tables (D6-L, D12-L, D13-L, I10-L); derive/import/project TypeScript shapes from owning seams rather than duplicating Pi state spaces. - -## Blocking instruction - -Complete Cards 1 and 2 before any fixture-driver or brief-capture work. Card 3 may follow immediately after those blockers. Cards 4–5 are the next M1 delivery slices after the seam is trustworthy. - -## Card 1 — Real Pi JSONL elicitation projection - -**Status:** done - -### Weight - -Full scope card — corrects the transcript projection seam that M1 fixture capture depends on. - -### Target Behavior - -A `SessionManager`-created assistant→user JSONL transcript projects to one ready elicitation exchange. - -### Boundary Crossings - -```text -→ Pi SessionManager JSONL file -→ JSONL loader -→ Pi transcript-entry projection boundary -→ elicitation exchange projector -→ session projection result -``` - -### Risks and Assumptions - -- RISK: Tests keep using synthetic top-level `role` entries and miss Pi's real nested `message.role` shape → MITIGATION: add a test that writes the transcript through `SessionManager.appendMessage`, reloads the JSONL file, and asserts the exchange projection. 
-- RISK: Fixing real Pi message entries accidentally loses custom structured prompt/response support → MITIGATION: keep custom-entry tests, but make their shape match Pi custom entries and classify custom entries separately from message entries. -- RISK: Orphan user/response entries before any prompt are later paired with an unrelated prompt → MITIGATION: ignore unmatched response-side entries or return an explicit unmatched/invalid diagnostic shape; do not silently attach them to later prompts. -- ASSUMPTION: Pi exports enough session entry/message types to avoid restating the message state space → VALIDATE: import/project from Pi exported types where available; if not exported, keep a narrow local runtime projection from `unknown` and document it as a trust-boundary parser rather than a duplicate Pi DTO. - -### Acceptance Criteria - -✓ `elicitation-exchange.test.ts` creates a real persisted session with `SessionManager.create(...).appendMessage(...)`, loads that JSONL file, and observes one ready exchange with assistant prompt id and user response id. -✓ Existing synthetic tests are updated to use Pi-shaped entries (`entry.message.role`) or intentionally named boundary fixtures; no production classifier relies on top-level `entry.role` for Pi message entries. -✓ A transcript with a user response before any prompt does not produce an exchange pairing that response with a later assistant prompt. -✓ Structured Brunch prompt/response custom entries still project to the correct side when they use the Pi custom entry shape. - -### Verification Approach - -- Inner: unit tests over projector helpers — prove role classification, custom-entry classification, and orphan-response behavior. -- Middle: Pi JSONL round-trip test using `SessionManager` — proves projection against the actual canonical transcript store. -- Outer: none for this fix. - -### Cross-cutting obligations - -- Preserve Pi JSONL as transcript truth and avoid chat/turn tables. 
-- Use source-of-truth typing: import/infer/project Pi-owned shapes when possible; only declare local types for the new semantic projection (`ElicitationExchange*`). -- Keep exchange ranges stable enough for later observer jobs and replay fixtures. - -## Card 2 — Product-scoped session exchange RPC - -**Status:** done - -### Weight - -Full scope card — corrects the public RPC method boundary so it remains product-shaped and coordinator-owned. - -### Target Behavior - -`session.elicitationExchanges` projects the coordinator-selected Brunch session instead of reading an arbitrary client-supplied file path. - -### Boundary Crossings - -```text -→ JSON-RPC stdio request -→ named `session.*` handler -→ WorkspaceSessionCoordinator -→ selected session JSONL file under `.brunch/sessions/` -→ elicitation exchange projector -→ JSON-RPC response -``` - -### Risks and Assumptions - -- RISK: Keeping `{ file }` as a public param turns a named product method into a filesystem read primitive → MITIGATION: remove public file-path params; resolve the current session through the coordinator, or accept only a product identifier that is resolved under the workspace session directory. -- RISK: The handler creates a session in a `select_spec` workspace when the caller only asked for projection → MITIGATION: define and test the no-selected-session result/error explicitly; do not prompt and do not run an agent turn. -- RISK: JSON-RPC request typing treats invalid ids as valid because the type guard under-validates → MITIGATION: validate/project `id` at the runtime boundary while touching the handler parser. -- ASSUMPTION: For M1, projecting the current coordinator-selected session is enough; historical session lookup can wait → VALIDATE: contract tests cover current-session projection and reject raw file params. - -### Acceptance Criteria - -✓ `session.elicitationExchanges` with no params returns exchanges for the coordinator's current ready session. 
-✓ `session.elicitationExchanges` with `{ file: ... }` returns `Invalid params` and never reads that path. -✓ A no-selected-spec/session state returns a product-shaped JSON-RPC error or empty/no-session result that does not create a session or prompt the user. -✓ JSON-RPC requests with invalid `id` shapes are rejected as `Invalid Request` rather than being accepted by TypeScript-only narrowing. - -### Verification Approach - -- Inner: handler unit tests — prove params rejection, id validation, and no-selected-session behavior. -- Middle: stdio contract test — request `session.elicitationExchanges` through `brunch --mode rpc` and assert the response is derived from the coordinator-selected session. -- Outer: none for this fix. - -### Cross-cutting obligations - -- Public RPC methods remain named product methods, not generic data/filesystem APIs. -- Coordinator remains the owner of workspace/session selection and session binding. -- Keep raw-file projection as a private helper/test utility only if it remains useful for projector tests. - -## Card 3 — RPC/print projection parity smoke - -**Status:** done - -### Weight - -Light scope card — hardens an already-established seam after the two blocker fixes. - -### Objective - -Prove the print snapshot and RPC workspace snapshot expose the same product-shaped coordinator state. - -### Acceptance Criteria - -✓ A temp workspace with a selected spec produces matching key fields from `brunch --mode print` and `workspace.snapshot` over RPC. -✓ The parity test uses the real coordinator/store path rather than only injected fake states. -✓ The test does not require an agent turn or `InteractiveMode`. - -### Verification Approach - -- Inner: integration-style vitest with temp cwd. -- Middle: optional CLI spawn if direct `runBrunchCli` coverage is insufficient. - -### Cross-cutting obligations - -- Keep print as a snapshot transport mode only. -- Keep snapshot projection reusable without becoming a generic read-model platform. 
- -### Promotion checklist - -- [ ] Does this change a requirement? No. -- [ ] Does this create, retire, or invalidate an assumption? No. -- [ ] Does this make or reverse a non-trivial design decision? No. -- [ ] Does this establish a new seam-level invariant? No; it tests D19-L/D23-L. -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. -- [ ] Does it cross more than two major seams? No. -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. -- [ ] Can you not name the containing seam or current rationale from the live docs? No. - -## Card 4 — Fixture capture bundle skeleton - -**Status:** done - -### Weight - -Full scope card — starts the fixture-driver half of M1 over the now-trusted RPC/projection seam. - -### Target Behavior - -A deterministic fixture capture command writes a `.jsonl` plus `.meta.json` bundle for one scripted run. - -### Boundary Crossings - -```text -→ fixture driver command/module -→ `brunch --mode rpc` stdio client -→ `workspace.snapshot` / `session.elicitationExchanges` -→ selected Pi JSONL transcript -→ `.brunch-fixtures///` bundle writer -``` - -### Risks and Assumptions - -- RISK: The driver becomes a one-off harness disconnected from product RPC → MITIGATION: run it through the JSON-RPC stdio surface, not direct function calls, except for unit-level bundle writer tests. -- RISK: Bundle metadata overpromises graph/coherence artifacts before those substrates exist → MITIGATION: write `.jsonl` and `.meta.json` only, with explicit placeholders/omissions for future `.graph.json` and `.coherence.json`. -- RISK: LLM variability obscures whether capture plumbing works → MITIGATION: keep this first run deterministic/scripted; do not require a model-generated interview yet. 
-- ASSUMPTION: A replay-regression skeleton is valuable before full agent-as-user behavior exists → VALIDATE: bundle writer and RPC driver tests assert stable paths, metadata, and transcript/projection parity. - -### Acceptance Criteria - -✓ A fixture driver can start/connect to RPC mode, request workspace/session projections, and write a run directory under `.brunch-fixtures///`. -✓ The bundle includes the source session `.jsonl` and `.meta.json` with brief id, run id, timestamp, brunch version/commit if available, session id, and projection summary. -✓ The driver is deterministic in tests and does not require live LLM output. - -### Verification Approach - -- Inner: bundle writer unit tests — prove metadata shape and path layout. -- Middle: stdio driver integration test — prove capture through RPC and JSONL copy/projection parity. -- Outer: none until real brief walkthroughs land. - -### Cross-cutting obligations - -- Establish replay-regression fixture architecture without pretending property/adversarial layers are complete. -- Keep captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artifacts. -- Do not bypass RPC for the product behavior the fixture driver is meant to prove. - -## Card 5 — Seed first deterministic briefs - -**Status:** next - -### Weight - -Light scope card — text/fixture seed work inside the established fixture strategy. - -### Objective - -Create the first three deterministic brief files aligned with `BEHAVIORAL_KERNELS.md` and the fixture capture metadata shape. - -### Acceptance Criteria - -✓ `.brunch-fixtures/briefs/` contains briefs #1–#3 with stable ids, titles, kernel tags, expected structural observations, and deterministic scripted-user notes. -✓ Brief files validate against any schema/helper introduced by Card 4, or a minimal shape checker is added if no schema exists yet. -✓ Brief wording stays product-brief-like rather than implementation-test-like. 
- -### Verification Approach - -- Inner: brief shape/schema tests or fixture file checker. -- Middle: run the deterministic fixture capture against at least one seeded brief if Card 4 is complete. - -### Cross-cutting obligations - -- Keep the brief corpus aligned with replay/property/adversarial fixture architecture. -- Do not encode graph/coherence expectations before those substrates exist; note future expectations as deferred metadata if needed. - -### Promotion checklist - -- [ ] Does this change a requirement? No. -- [ ] Does this create, retire, or invalidate an assumption? No. -- [ ] Does this make or reverse a non-trivial design decision? No. -- [ ] Does this establish a new seam-level invariant? No. -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. -- [ ] Does it cross more than two major seams? No. -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. -- [ ] Can you not name the containing seam or current rationale from the live docs? No. diff --git a/memory/PLAN.md b/memory/PLAN.md index 5f2aefd5..cde9a305 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -77,7 +77,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. 
- **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) -- **Current execution pointer:** initial transport/projection scope queue complete: `--mode print`, `--mode rpc`, `workspace.snapshot`, and `session.elicitationExchanges` are implemented and verified. Next scope should cover fixture-driver capture and brief seeding. +- **Current execution pointer:** review blockers and next M1 slices complete: real Pi JSONL exchange projection, product-scoped `session.elicitationExchanges`, RPC/print snapshot parity, fixture bundle capture skeleton, and briefs #1–#3 are implemented and verified. Remaining frontier work should scope actual captured runs for briefs #1–#3 and any agent-as-user driver behavior beyond deterministic bundle capture. ### jsonl-session-viability diff --git a/src/brief-library.test.ts b/src/brief-library.test.ts new file mode 100644 index 00000000..eaa7e7f9 --- /dev/null +++ b/src/brief-library.test.ts @@ -0,0 +1,30 @@ +import { describe, expect, it } from "vitest" + +import { loadBriefLibrary } from "./brief-library.js" + +describe("fixture brief library", () => { + it("loads the first three deterministic product briefs", async () => { + const briefs = await loadBriefLibrary(".brunch-fixtures/briefs") + + expect(briefs.map((brief) => brief.id)).toEqual([ + "brief-001", + "brief-002", + "brief-003", + ]) + expect(briefs).toEqual( + Array.from({ length: 3 }, () => + expect.objectContaining({ + schemaVersion: 1, + title: expect.any(String), + kernelTags: expect.arrayContaining([expect.any(String)]), + productBrief: expect.stringMatching(/\w/u), + expectedStructuralObservations: expect.arrayContaining([ + expect.any(String), + ]), + scriptedUserNotes: expect.arrayContaining([expect.any(String)]), + }), + ), + ) + 
expect(briefs[0]?.productBrief).not.toContain("assert") + }) +}) diff --git a/src/brief-library.ts b/src/brief-library.ts new file mode 100644 index 00000000..ae6fba14 --- /dev/null +++ b/src/brief-library.ts @@ -0,0 +1,64 @@ +import { readdir, readFile } from "node:fs/promises" +import { join } from "node:path" + +export interface FixtureBrief { + schemaVersion: 1 + id: string + title: string + kernelTags: string[] + productBrief: string + expectedStructuralObservations: string[] + scriptedUserNotes: string[] + deferredExpectations?: { + graph?: string + coherence?: string + } +} + +export async function loadBriefLibrary(dir: string): Promise<FixtureBrief[]> { + const files = (await readdir(dir)) + .filter((file) => file.endsWith(".json")) + .sort() + const briefs = await Promise.all( + files.map(async (file) => + parseBrief(await readFile(join(dir, file), "utf8"), file), + ), + ) + return briefs.sort((left, right) => left.id.localeCompare(right.id)) +} + +function parseBrief(content: string, source: string): FixtureBrief { + const parsed = JSON.parse(content) as unknown + if (!isFixtureBrief(parsed)) { + throw new Error(`${source} is not a valid fixture brief`) + } + return parsed +} + +function isFixtureBrief(value: unknown): value is FixtureBrief { + if (typeof value !== "object" || value === null) { + return false + } + + const brief = value as Partial<FixtureBrief> + return ( + brief.schemaVersion === 1 && + typeof brief.id === "string" && + /^brief-\d{3}$/u.test(brief.id) && + typeof brief.title === "string" && + brief.title.length > 0 && + isNonEmptyStringArray(brief.kernelTags) && + typeof brief.productBrief === "string" && + brief.productBrief.length > 0 && + isNonEmptyStringArray(brief.expectedStructuralObservations) && + isNonEmptyStringArray(brief.scriptedUserNotes) + ) +} + +function isNonEmptyStringArray(value: unknown): value is string[] { + return ( + Array.isArray(value) && + value.length > 0 && + value.every((item) => typeof item === "string" && item.length > 0) + ) +} From 
a302a20d0a8a938207e9018bb12d9e92dc25e3e2 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 11:57:13 +0200 Subject: [PATCH 09/18] FE-735: Scope capture seam review fixes --- memory/CARDS.md | 230 ++++++++++++++++++++++++++++++++++++++++++++++++ memory/PLAN.md | 2 +- memory/SPEC.md | 2 +- 3 files changed, 232 insertions(+), 2 deletions(-) create mode 100644 memory/CARDS.md diff --git a/memory/CARDS.md b/memory/CARDS.md new file mode 100644 index 00000000..c627991b --- /dev/null +++ b/memory/CARDS.md @@ -0,0 +1,230 @@ + + +# Scope Cards — FE-735 Capture-Seam Fixes and Remaining M1 Slices + +## Orientation + +- **Containing seam:** M1 fixture-capture seam over the transport/projection layer: `brunch --mode rpc`, named `workspace.*` / `session.*` handlers, `WorkspaceSessionCoordinator`, Pi JSONL transcript truth, and `.brunch-fixtures/` bundle output. +- **Frontier item:** `mode-shell-and-fixture-driver` (FE-735) remains the Linear/branch boundary. Prior cards established print/RPC/projection and seeded briefs, but review found the real capture path still loses session identity without a fake coordinator. +- **Volatile state:** `memory/CARDS.md` had been exhausted/deleted; `memory/SPEC.md` has an uncommitted lexicon cleanup (`Lens switch`). `npm run verify` is green, but an additional no-injected-coordinator probe showed `captureFixtureRun()` copies a freshly opened empty session rather than the session that was seeded. +- **Main open risk:** M1 could appear fixture-ready while the fixture driver only proves an injected test double path, not the real Brunch host/session path. +- **Frontier obligations:** keep `WorkspaceSessionCoordinator` as the owner of session/spec selection (D21-L); keep public RPC methods product-shaped rather than filesystem/generic data APIs (D5-L, D19-L); keep Pi JSONL as transcript truth with no chat/turn store (D6-L, D12-L, D13-L, I10-L); keep captured-run bundles forward-compatible without overclaiming graph/coherence artifacts. 
+ +## Blocking instruction + +Complete Card 1 before any actual captured-run work for briefs #1–#3. Cards 2–4 are hardening/cleanup and can be committed before or after Card 1, but should land before tying off FE-735. Card 5 depends on Card 1. + +## Card 1 — Stable current-session capture path + +**Status:** next + +### Weight + +Full scope card — fixes the real product seam between coordinator-owned session state, RPC projection, and fixture capture. + +### Target Behavior + +`captureFixtureRun()` copies the same selected session that `session.elicitationExchanges` projects. + +### Boundary Crossings + +```text +→ fixture capture caller +→ RPC stdio client (`workspace.snapshot`, `session.elicitationExchanges`) +→ Brunch host/coordinator session selection +→ selected Pi JSONL session file +→ `.brunch-fixtures///` bundle writer +``` + +### Risks and Assumptions + +- RISK: `FileWorkspaceSessionCoordinator.openExisting()` creates a fresh session on each call, so separate RPC requests see different session files → MITIGATION: add a no-injected-coordinator regression test first; fix by making the capture/RPC path use one stable host/session context or by teaching the coordinator to reopen/select the actual current session instead of creating a new one for each read. +- RISK: The fix bypasses `WorkspaceSessionCoordinator` and hardcodes `.brunch/sessions` lookup in fixture capture → MITIGATION: keep session identity resolution behind the coordinator/host seam; fixture capture should remain an RPC client over product handlers. +- RISK: Stabilizing current-session identity mutates the M0 session-binding invariant → MITIGATION: run existing coordinator and store-oracle tests; ensure exactly one `brunch.session_binding` per selected session remains true. 
+- ASSUMPTION: M1 only needs stable current-session capture, not arbitrary historical session selection → VALIDATE: a temp workspace with one selected spec/session and assistant→user entries captures that same JSONL and reports `exchangeCount: 1` without injecting a fake coordinator. + +### Acceptance Criteria + +✓ `fixture-capture.test.ts` (or equivalent) creates a real coordinator-backed temp workspace, appends assistant→user messages to the selected session, calls `captureFixtureRun()` without passing `coordinator`, and observes a copied JSONL containing those messages. +✓ The resulting `.meta.json` has `projectionSummary.status: "ready"` and `exchangeCount: 1` for that real no-injection capture. +✓ `workspace.snapshot` and `session.elicitationExchanges` in one capture operation refer to the same session id/file. +✓ Existing `verifyWorkspaceSessionStores` / coordinator tests still prove one binding per session and no incompatible session rebinding. + +### Verification Approach + +- Inner: regression unit/integration tests — prove no-injected-coordinator capture uses the real selected session. +- Middle: store/projection oracle — inspect temp `.brunch/state.json`, source JSONL, copied JSONL, and metadata for matching session identity and exchange count. +- Outer: none for this fix. + +### Cross-cutting obligations + +- Do not make fixture capture a privileged direct store reader for product semantics; use RPC/product handlers for projection. +- Preserve `cwd → spec → session` hierarchy and one-spec-per-session binding. +- Keep stable current-session identity narrow; defer historical session selection unless a later scope needs it. + +## Card 2 — Reconcile fixture brief format docs + +**Status:** queued + +### Weight + +Light scope card — naming/documentation cleanup inside the established fixture area. + +### Objective + +Make fixture brief documentation and plan text agree with the implemented JSON brief format. 
+ +### Acceptance Criteria + +✓ `.brunch-fixtures/README.md` describes JSON brief files and no longer says the directory is empty until M1. +✓ `memory/PLAN.md` no longer says `brief-library-curation` outputs YAML or validates brief YAML unless the implementation is changed back to YAML. +✓ README layout examples match the current `.brunch-fixtures/briefs/brief-001-*.json` naming style. + +### Verification Approach + +- Inner: `npm run fix` / `npm run verify`. +- Middle: doc grep/check — no stale “briefs are YAML” or “empty by design until M1” claims remain in touched fixture docs. + +### Cross-cutting obligations + +- Keep one canonical brief format for M1. +- Do not move brief curation into a loose examples folder; it remains under `.brunch-fixtures/briefs/`. + +### Promotion checklist + +- [ ] Does this change a requirement? No. +- [ ] Does this create, retire, or invalidate an assumption? No. +- [ ] Does this make or reverse a non-trivial design decision? No; it reconciles docs to existing implementation. +- [ ] Does this establish a new seam-level invariant? No. +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. +- [ ] Does it cross more than two major seams? No. +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. +- [ ] Can you not name the containing seam or current rationale from the live docs? No. + +## Card 3 — Pi session-entry source-of-truth typing cleanup + +**Status:** queued + +### Weight + +Light scope card — type-source hardening inside the already-established projection seam. + +### Objective + +Make `elicitation-exchange.ts` project from Pi-owned session entry types instead of restating Pi message entry shape locally. + +### Acceptance Criteria + +✓ `elicitation-exchange.ts` imports/projects from Pi exported session entry types (`FileEntry`, `SessionEntry`, `SessionMessageEntry`, `CustomEntry`, `CustomMessageEntry`, or whichever exported types fit) where available. 
+✓ Local interfaces are retained only for Brunch semantic outputs (`EntryRange`, `ElicitationExchange`, `ElicitationExchangeProjection`) or narrow trust-boundary parse results that add new meaning. +✓ Tests still cover real `SessionManager` JSONL, structured Brunch custom entries, and orphan response handling. + +### Verification Approach + +- Inner: typecheck/build plus projector unit tests. +- Middle: real `SessionManager` JSONL round-trip test remains the seam oracle. + +### Cross-cutting obligations + +- Apply source-of-truth typing: import/infer/project; do not duplicate Pi's state space unless establishing a trust-boundary parser. +- Keep Brunch projection types as the new semantic boundary. + +### Promotion checklist + +- [ ] Does this change a requirement? No. +- [ ] Does this create, retire, or invalidate an assumption? No. +- [ ] Does this make or reverse a non-trivial design decision? No. +- [ ] Does this establish a new seam-level invariant? No. +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. +- [ ] Does it cross more than two major seams? No. +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. +- [ ] Can you not name the containing seam or current rationale from the live docs? No. + +## Card 4 — Brunch-owned fixture metadata version + +**Status:** queued + +### Weight + +Light scope card — hardens metadata correctness at the fixture bundle boundary. + +### Objective + +Fixture metadata reports Brunch package version without reading the caller project's `package.json` by accident. + +### Acceptance Criteria + +✓ `readPackageVersion()` or its replacement resolves Brunch-owned package metadata from a stable module/package-root source, or writes an explicit `unknown`/omitted value when unavailable. +✓ A test proves capture from a temp cwd containing a conflicting `package.json` does not record that caller package version as `brunchVersion`. 
+✓ Metadata remains deterministic when a timestamp is supplied. + +### Verification Approach + +- Inner: fixture-capture unit/integration test. +- Middle: temp workspace metadata inspection. + +### Cross-cutting obligations + +- Captured-run metadata should describe the Brunch driver/runtime, not the user's project unless a distinct user-project metadata field is later added. +- Keep metadata minimal and forward-compatible with later `.graph.json` / `.coherence.json` artifacts. + +### Promotion checklist + +- [ ] Does this change a requirement? No. +- [ ] Does this create, retire, or invalidate an assumption? No. +- [ ] Does this make or reverse a non-trivial design decision? No. +- [ ] Does this establish a new seam-level invariant? No. +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. +- [ ] Does it cross more than two major seams? No. +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. +- [ ] Can you not name the containing seam or current rationale from the live docs? No. + +## Card 5 — Actual captured runs for briefs #1–#3 + +**Status:** queued + +### Weight + +Full scope card — completes the remaining fixture-capture claim for M1 after the capture seam is trustworthy. + +### Target Behavior + +Briefs #1–#3 each have a deterministic captured run bundle. + +### Boundary Crossings + +```text +→ seeded brief file +→ deterministic scripted-user/driver path +→ `brunch --mode rpc` stdio surface +→ selected Pi JSONL transcript and exchange projection +→ `.brunch-fixtures///` captured bundle +``` + +### Risks and Assumptions + +- RISK: Captured runs are hand-authored files rather than produced by the fixture driver → MITIGATION: provide a command/test path that invokes the driver and writes bundles reproducibly. 
+- RISK: The driver pretends to be a full LLM agent-as-user before the loop is ready → MITIGATION: keep runs deterministic/scripted for M1, with metadata explicitly saying scripted/deterministic; defer generative/adversarial runs. +- RISK: Captures overclaim graph/coherence outputs before M4/M8 → MITIGATION: produce `.jsonl` + `.meta.json` only and record graph/coherence artifacts as absent/deferred, not empty truth. +- ASSUMPTION: A deterministic scripted response path is sufficient to establish the first replay-regression layer → VALIDATE: three bundles replay through projection checks and have non-empty exchange summaries. + +### Acceptance Criteria + +✓ Running the fixture driver creates one run bundle for each of `brief-001`, `brief-002`, and `brief-003` under `.brunch-fixtures///`. +✓ Each bundle contains copied `.jsonl` and `.meta.json` artifacts; metadata names the brief id, run id, scripted/deterministic mode, session id, projection summary, and absent/deferred graph/coherence artifacts. +✓ A replay/projection test loads each captured `.jsonl` and asserts projection parity with the metadata summary. +✓ The capture path uses JSON-RPC stdio product methods rather than direct projection calls for the behavior it is proving. + +### Verification Approach + +- Inner: fixture driver and metadata tests — prove bundle creation and metadata shape. +- Middle: replay-regression fixture test — load captured JSONL bundles and assert projection parity for briefs #1–#3. +- Outer: none; qualitative LLM elicitation remains deferred. + +### Cross-cutting obligations + +- Establish the replay-regression layer without claiming property/adversarial layers are complete. +- Keep fixture outputs forward-compatible with `.graph.json` and `.coherence.json` while not creating fake placeholders as canonical truth. +- Keep generated run IDs stable or deterministic enough for review; if timestamped, tests should select runs through metadata, not brittle names. 
diff --git a/memory/PLAN.md b/memory/PLAN.md index cde9a305..8a51f54e 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -77,7 +77,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. - **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) -- **Current execution pointer:** review blockers and next M1 slices complete: real Pi JSONL exchange projection, product-scoped `session.elicitationExchanges`, RPC/print snapshot parity, fixture bundle capture skeleton, and briefs #1–#3 are implemented and verified. Remaining frontier work should scope actual captured runs for briefs #1–#3 and any agent-as-user driver behavior beyond deterministic bundle capture. +- **Current execution pointer:** review found the fixture-capture skeleton still loses real current-session identity without a fake coordinator. Follow `memory/CARDS.md`: fix stable current-session capture first, then reconcile fixture docs/typing/metadata and produce actual deterministic captured runs for briefs #1–#3. 
### jsonl-session-viability diff --git a/memory/SPEC.md b/memory/SPEC.md index 69438195..a6c694e0 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -229,7 +229,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | **Elicitation exchange** | A derived projection over Pi JSONL: prompt-side span (system/assistant/tool-side entries since the prior user response) plus response-side span (the user's text and/or structured action entries). This is the observer's default extraction unit. | | **Structured elicitation entry** | Optional Brunch custom transcript entry used when an elicitation prompt or response carries actions, choices, or other deterministic UI structure. Plain generative prompts can remain ordinary Pi messages. | | **Observer job** | Durable async work item keyed by session id and elicitation-exchange entry-range ids. It analyzes an exchange for graph mutations or low-confidence suggestions, and survives process restart. | -| **Lens** | A switchable framing of the active agent (e.g. interview, clarify, oracle-active). Switches are durable transcript entries. | +| **Lens switch** | A durable `brunch.lens_switch` transcript entry recording that the active agent/session changed lenses. The switch event is distinct from the lens concept itself. | | **Side task** | A scoped sub-agent invocation whose result returns through the shared command layer. | | **World update** | `worldUpdate` custom message synthesised in `prepareNextTurn` summarising relevant graph changes since the session's `lastSeenLsn`. | | **Mention ledger** | Per-session `(entity_id, snapshotted_lsn)` record driving discretionary staleness hints when an entity has changed since the agent last saw it. 
| From 0d756f21610410da1b6a6c26541b66dc11f61e8f Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 11:59:56 +0200 Subject: [PATCH 10/18] FE-735: Stabilize current session capture --- memory/CARDS.md | 2 +- src/fixture-capture.test.ts | 45 ++++++++++++++++++++++++++++ src/workspace-session-coordinator.ts | 19 +++++++++++- 3 files changed, 64 insertions(+), 2 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index c627991b..b1f983ab 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -18,7 +18,7 @@ Complete Card 1 before any actual captured-run work for briefs #1–#3. Cards 2 ## Card 1 — Stable current-session capture path -**Status:** next +**Status:** done ### Weight diff --git a/src/fixture-capture.test.ts b/src/fixture-capture.test.ts index bd79e14a..03061ab4 100644 --- a/src/fixture-capture.test.ts +++ b/src/fixture-capture.test.ts @@ -8,6 +8,51 @@ import { createWorkspaceSessionCoordinator } from "./workspace-session-coordinat import { captureFixtureRun } from "./fixture-capture.js" describe("fixture capture", () => { + it("captures the coordinator-selected session without injecting a test coordinator", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-fixture-real-")) + const workspace = await createWorkspaceSessionCoordinator({ + cwd, + }).startOrCreate({ + specTitle: "Fixture spec", + }) + workspace.session.manager.appendMessage({ + role: "assistant", + content: "Real selected question", + }) + workspace.session.manager.appendMessage({ + role: "user", + content: "Real selected answer", + }) + + const result = await captureFixtureRun({ + cwd, + briefId: "brief-001", + runId: "run-001", + timestamp: "2026-05-21T00:00:00.000Z", + }) + + const copiedJsonl = await readFile(result.jsonlFile, "utf8") + const metadata = JSON.parse(await readFile(result.metaFile, "utf8")) as { + session: { + id: string + sourceFile: string + } + projectionSummary: { + status: string + exchangeCount: number + } + } + + 
expect(copiedJsonl).toContain("Real selected question") + expect(copiedJsonl).toContain("Real selected answer") + expect(metadata.session.id).toBe(workspace.session.id) + expect(metadata.session.sourceFile).toBe(workspace.session.file) + expect(metadata.projectionSummary).toMatchObject({ + status: "ready", + exchangeCount: 1, + }) + }) + it("captures a deterministic JSONL and metadata bundle through RPC", async () => { const cwd = await mkdtemp(join(tmpdir(), "brunch-fixture-")) const workspace = await createWorkspaceSessionCoordinator({ diff --git a/src/workspace-session-coordinator.ts b/src/workspace-session-coordinator.ts index 5e7bcc84..4afe47ce 100644 --- a/src/workspace-session-coordinator.ts +++ b/src/workspace-session-coordinator.ts @@ -107,7 +107,7 @@ class FileWorkspaceSessionCoordinator implements WorkspaceSessionCoordinator { } } - const session = await createBoundSession(this.#cwd, state.currentSpec) + const session = await openCurrentSession(this.#cwd, state.currentSpec) return readyState(this.#cwd, state.currentSpec, session) } @@ -178,6 +178,23 @@ async function createBoundSession( return bindSessionToSpec(manager, spec) } +async function openCurrentSession( + cwd: string, + spec: WorkspaceSpecState, +): Promise { + await ensureWorkspaceDirs(cwd) + const files = await listSessionFiles(cwd) + const manager = + files.length === 0 + ? 
SessionManager.create(cwd, sessionDir(cwd)) + : SessionManager.continueRecent(cwd, sessionDir(cwd)) + const sessionFile = manager.getSessionFile() + if (!sessionFile) { + throw new Error("Pi SessionManager did not open a persisted session file") + } + return bindSessionToSpec(manager, spec) +} + function bindSessionToSpec( manager: SessionManager, spec: WorkspaceSpecState, From 77b2899aa89460ac44cb8c0620733541b4a58bf8 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:01:27 +0200 Subject: [PATCH 11/18] FE-735: Reconcile fixture brief format docs --- .brunch-fixtures/README.md | 28 +++++++------- docs/architecture/fixture-strategy.md | 53 +++++++++++++-------------- memory/CARDS.md | 2 +- memory/PLAN.md | 4 +- 4 files changed, 44 insertions(+), 43 deletions(-) diff --git a/.brunch-fixtures/README.md b/.brunch-fixtures/README.md index 13d8d803..bd8dfe73 100644 --- a/.brunch-fixtures/README.md +++ b/.brunch-fixtures/README.md @@ -9,27 +9,29 @@ This directory is the on-disk home of the fixture strategy described in ``` .brunch-fixtures/ -├── briefs/ # Curated product briefs (YAML) -│ ├── offline-kanban.yaml -│ ├── role-based-doc-sharing.yaml +├── briefs/ # Curated product briefs (JSON) +│ ├── brief-001-identity-reference.json +│ ├── brief-002-state-lifecycle.json +│ ├── brief-003-derived-views.json │ └── ... └── / └── / ├── .jsonl # Captured transcript - ├── .graph.json # Captured graph state - ├── .coherence.json # Captured coherence verdict + needs - └── .meta.json # Brief id, persona dials, model, timestamps + ├── .meta.json # Brief id, driver mode, session, projection summary + ├── .graph.json # Deferred until the graph plane exists + └── .coherence.json # Deferred until coherence is first-class ``` ## Status -Empty by design until the `mode-shell-and-fixture-driver` frontier (M1) lands -the JSON-RPC stdio agent-as-user driver. Briefs may be authored ahead of that -under the `brief-library-curation` parallel frontier — see `memory/PLAN.md`. 
+The first M1 briefs live under `briefs/` as JSON files. Captured runs are added +under each brief id by the JSON-RPC stdio fixture driver. ## Conventions -- Briefs are short, human-readable YAML; the captured runs are the heavy data. -- Brief ids are kebab-case and stable; runs are timestamped or content-hashed. -- Property invariants from the fixture-strategy doc are checked on every - capture (replay regression, property regression, adversarial / generative). +- Briefs are short, human-readable JSON; the captured runs are the heavy data. +- Brief ids are kebab-case and stable; runs are timestamped, content-hashed, or + deterministic for reviewable scripted captures. +- Replay regression runs check transcript reproduction first. Property and + adversarial / generative checks come online as later milestones provide graph + and coherence artifacts. diff --git a/docs/architecture/fixture-strategy.md b/docs/architecture/fixture-strategy.md index 86d8421f..021b6e21 100644 --- a/docs/architecture/fixture-strategy.md +++ b/docs/architecture/fixture-strategy.md @@ -92,32 +92,31 @@ Briefs are short, human-readable, and curated. The run artefacts are the heavy d ### Brief fixture format -```yaml -# .brunch-fixtures/briefs/offline-kanban.yaml -id: offline-kanban -title: Offline Kanban Editing -brief: | - We want to build a Kanban tool that engineering teams can use offline. - Multiple people edit the same board. Cards move through workflow states. - Some columns have WIP limits. 
-persona: - style: collaborative # terse | verbose | collaborative | indecisive - domain_literacy: high # low | medium | high - patience: medium # affects how many follow-ups before frustration - change_mind_probability: 0.1 # per-turn probability of revising an earlier answer -expected_kernels: - - state_lifecycle - - containment_topology - - concurrency_collaboration - - resource_accounting - - derived_data_views - - temporal_history -expected_entity_coverage: - intent: [requirement, assumption, invariant, decision, example] - oracle: [check, validation_method] -known_branch_points: - - "What should happen on offline-edit conflict?" -known_invalidations: [] +`.brunch-fixtures/briefs/brief-002-state-lifecycle.json`: + +```json +{ + "schemaVersion": 1, + "id": "brief-002", + "title": "Offline Kanban Editing", + "kernelTags": [ + "state-lifecycle", + "containment-topology", + "concurrency-collaboration", + "resource-accounting", + "derived-data-views", + "temporal-history" + ], + "productBrief": "We want to build a Kanban tool that engineering teams can use offline. Multiple people edit the same board. Cards move through workflow states. Some columns have WIP limits.", + "scriptedUserNotes": [ + "I care about what happens when two people move the same card offline.", + "Some columns should enforce WIP limits." + ], + "deferredExpectations": { + "graph": "Later graph fixtures should cover lifecycle states, containment, and conflict decisions.", + "coherence": "Later coherence checks should flag unresolved offline-edit conflict policy." + } +} ``` ### Starter set (seven briefs) @@ -217,7 +216,7 @@ The fixture harness threads through the existing milestone ladder; it does not n | Milestone | Fixture work | | --- | --- | -| **M0** (walking skeleton + TUI) | Begin capturing briefs as YAML. Manually-driven runs at the TUI produce first JSONL captures. Briefs cost nothing to write; the longer the library, the more leverage later. 
| +| **M0** (walking skeleton + TUI) | Begin curating briefs as JSON. Manually-driven runs at the TUI produce first JSONL captures. Briefs cost nothing to write; the longer the library, the more leverage later. | | **M1** (mode shell: print + rpc) | Stand up the agent-as-user harness against `brunch --mode rpc`. First **replay regression** fixtures land here, asserting transcript reproduction only. Graph plane does not yet exist; assertions are transcript-shaped. | | **M2** (JSONL session viability) | The captured transcripts *are* the JSONL session files. The fixture library's reproducibility is part of M2's evidence. | | **M3** (web shell) | The same offer-response fixtures drive the web client through its WebSocket; free coverage of the web shell against known-good runs. | diff --git a/memory/CARDS.md b/memory/CARDS.md index b1f983ab..5669359a 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -66,7 +66,7 @@ Full scope card — fixes the real product seam between coordinator-owned sessio ## Card 2 — Reconcile fixture brief format docs -**Status:** queued +**Status:** done ### Weight diff --git a/memory/PLAN.md b/memory/PLAN.md index 8a51f54e..2c9d7371 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -191,9 +191,9 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** unassigned - **Kind:** bounded feature - **Status:** not-started -- **Objective:** Author and review briefs #4–#7 plus the adversarial second tier per fixture-strategy. Outputs are YAML briefs and one or two reviewer notes. +- **Objective:** Author and review briefs #4–#7 plus the adversarial second tier per fixture-strategy. Outputs are JSON briefs and one or two reviewer notes. - **Acceptance:** Briefs #1–#7 present in `.brunch-fixtures/briefs/`; adversarial briefs present with documented targets; expectations for brief #7 satisfied per fixture-strategy. 
-- **Verification:** Doc review against fixture-strategy expectations; schema/checker validation for brief YAML once available; spot-replay if the relevant harness milestone has landed. +- **Verification:** Doc review against fixture-strategy expectations; schema/checker validation for brief JSON once available; spot-replay if the relevant harness milestone has landed. - **Cross-cutting obligations:** Keep the brief corpus aligned with the canonical replay/property/adversarial fixture model rather than letting it drift into a loose examples folder. - **Traceability:** R20 / A5-L - **Design docs:** [fixture-strategy.md §Brief library](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) From 100cb70baedee012721765c2c56a9ff2dc79070c Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:02:20 +0200 Subject: [PATCH 12/18] FE-735: Use Pi session entry types for exchange projection --- memory/CARDS.md | 2 +- src/elicitation-exchange.ts | 58 ++++++++++++++++++++++--------------- 2 files changed, 35 insertions(+), 25 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index 5669359a..6c83300c 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -105,7 +105,7 @@ Make fixture brief documentation and plan text agree with the implemented JSON b ## Card 3 — Pi session-entry source-of-truth typing cleanup -**Status:** queued +**Status:** done ### Weight diff --git a/src/elicitation-exchange.ts b/src/elicitation-exchange.ts index 15dbc5d1..2e83dbc7 100644 --- a/src/elicitation-exchange.ts +++ b/src/elicitation-exchange.ts @@ -1,5 +1,13 @@ import { readFile } from "node:fs/promises" +import type { + CustomEntry, + CustomMessageEntry, + FileEntry, + SessionEntry, + SessionMessageEntry, +} from "@earendil-works/pi-coding-agent" + const STRUCTURED_RESPONSE_TYPES = new Set([ "brunch.elicitation_response", "brunch.action_response", @@ -29,28 +37,18 @@ export interface ElicitationExchangeProjection { openPrompt: 
OpenPromptProjection | null } -interface TranscriptEntry { - id: string - type?: string - role?: string - customType?: string - message?: { - role?: string - } -} - export async function loadJsonlTranscriptEntries( file: string, -): Promise { +): Promise { const content = await readFile(file, "utf8") return content .split("\n") .filter((line) => line.trim().length > 0) - .map((line) => JSON.parse(line) as unknown) + .map((line) => JSON.parse(line) as FileEntry) } export function projectElicitationExchanges( - entries: unknown[], + entries: readonly unknown[], ): ElicitationExchangeProjection { const exchanges: ElicitationExchange[] = [] let promptIds: string[] = [] @@ -111,16 +109,18 @@ function rangeFor(ids: string[]): EntryRange { return { start: ids[0]!, end: ids[ids.length - 1]! } } -function isTranscriptEntry(value: unknown): value is TranscriptEntry { +function isTranscriptEntry(value: unknown): value is SessionEntry { return ( typeof value === "object" && value !== null && - typeof (value as { id?: unknown }).id === "string" + (value as { type?: unknown }).type !== "session" && + typeof (value as { id?: unknown }).id === "string" && + typeof (value as { type?: unknown }).type === "string" ) } -function isPromptSideEntry(entry: TranscriptEntry): boolean { - if (entry.type === "custom" && entry.customType?.includes("prompt")) { +function isPromptSideEntry(entry: SessionEntry): boolean { + if (isCustomTranscriptEntry(entry) && entry.customType.includes("prompt")) { return true } @@ -128,19 +128,29 @@ function isPromptSideEntry(entry: TranscriptEntry): boolean { return role === "assistant" || role === "system" || role === "tool" } -function isResponseSideEntry(entry: TranscriptEntry): boolean { +function isResponseSideEntry(entry: SessionEntry): boolean { if (roleOf(entry) === "user") { return true } return ( - entry.type === "custom" && - STRUCTURED_RESPONSE_TYPES.has(entry.customType ?? 
"") + isCustomTranscriptEntry(entry) && + STRUCTURED_RESPONSE_TYPES.has(entry.customType) ) } -function roleOf(entry: TranscriptEntry): string | undefined { - if (entry.type === "message") { - return entry.message?.role +function isCustomTranscriptEntry( + entry: SessionEntry, +): entry is CustomEntry | CustomMessageEntry { + return entry.type === "custom" || entry.type === "custom_message" +} + +function roleOf(entry: SessionEntry): string | undefined { + if (isMessageEntry(entry)) { + return entry.message.role } - return entry.role + return undefined +} + +function isMessageEntry(entry: SessionEntry): entry is SessionMessageEntry { + return entry.type === "message" } From 7c4215286e1e0b657651fe85beb7c6d675ab2cad Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:03:04 +0200 Subject: [PATCH 13/18] FE-735: Report Brunch-owned fixture metadata version --- memory/CARDS.md | 2 +- src/fixture-capture.test.ts | 36 +++++++++++++++++++++++++++++++++++- src/fixture-capture.ts | 23 +++++++++++++++++++---- 3 files changed, 55 insertions(+), 6 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index 6c83300c..ac44b971 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -144,7 +144,7 @@ Make `elicitation-exchange.ts` project from Pi-owned session entry types instead ## Card 4 — Brunch-owned fixture metadata version -**Status:** queued +**Status:** done ### Weight diff --git a/src/fixture-capture.test.ts b/src/fixture-capture.test.ts index 03061ab4..d6d25fd7 100644 --- a/src/fixture-capture.test.ts +++ b/src/fixture-capture.test.ts @@ -1,4 +1,4 @@ -import { mkdtemp, readFile } from "node:fs/promises" +import { mkdtemp, readFile, writeFile } from "node:fs/promises" import { tmpdir } from "node:os" import { join } from "node:path" import { describe, expect, it } from "vitest" @@ -53,6 +53,40 @@ describe("fixture capture", () => { }) }) + it("reports Brunch's package version, not the caller project's version", async () => { + const cwd = await 
mkdtemp(join(tmpdir(), "brunch-fixture-package-")) + await writeFile( + join(cwd, "package.json"), + `${JSON.stringify({ name: "caller-project", version: "9.9.9" })}\n`, + ) + const workspace = await createWorkspaceSessionCoordinator({ + cwd, + }).startOrCreate({ + specTitle: "Fixture spec", + }) + workspace.session.manager.appendMessage({ + role: "assistant", + content: "Question", + }) + workspace.session.manager.appendMessage({ role: "user", content: "Answer" }) + + const result = await captureFixtureRun({ + cwd, + briefId: "brief-001", + runId: "run-001", + timestamp: "2026-05-21T00:00:00.000Z", + }) + + const metadata = JSON.parse(await readFile(result.metaFile, "utf8")) as { + brunchVersion: string + timestamp: string + } + + expect(metadata.brunchVersion).toBe("0.0.0") + expect(metadata.brunchVersion).not.toBe("9.9.9") + expect(metadata.timestamp).toBe("2026-05-21T00:00:00.000Z") + }) + it("captures a deterministic JSONL and metadata bundle through RPC", async () => { const cwd = await mkdtemp(join(tmpdir(), "brunch-fixture-")) const workspace = await createWorkspaceSessionCoordinator({ diff --git a/src/fixture-capture.ts b/src/fixture-capture.ts index 246e6286..ebcdc503 100644 --- a/src/fixture-capture.ts +++ b/src/fixture-capture.ts @@ -1,6 +1,7 @@ import { copyFile, mkdir, readFile, writeFile } from "node:fs/promises" -import { join } from "node:path" +import { dirname, join } from "node:path" import { PassThrough } from "node:stream" +import { fileURLToPath } from "node:url" import { runBrunchCli } from "./brunch.js" import type { ElicitationExchangeProjection } from "./elicitation-exchange.js" @@ -115,8 +116,22 @@ async function callRpc( } async function readPackageVersion(): Promise { - const packageJson = JSON.parse(await readFile("package.json", "utf8")) as { - version?: unknown + try { + const packageJson = JSON.parse( + await readFile( + join(dirname(fileURLToPath(import.meta.url)), "..", "package.json"), + "utf8", + ), + ) as { + version?: unknown 
+ } + return typeof packageJson.version === "string" + ? packageJson.version + : "unknown" + } catch (error) { + if ((error as NodeJS.ErrnoException).code === "ENOENT") { + return "unknown" + } + throw error } - return typeof packageJson.version === "string" ? packageJson.version : "0.0.0" } From d6054ecba8410c9f5bbc5a46e1c5358a788558fd Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:07:27 +0200 Subject: [PATCH 14/18] FE-735: Capture scripted brief runs --- .../brief-001/scripted-001/scripted-001.jsonl | 4 + .../scripted-001/scripted-001.meta.json | 28 +++ .../brief-002/scripted-001/scripted-001.jsonl | 4 + .../scripted-001/scripted-001.meta.json | 28 +++ .../brief-003/scripted-001/scripted-001.jsonl | 4 + .../scripted-001/scripted-001.meta.json | 28 +++ memory/CARDS.md | 230 ------------------ memory/PLAN.md | 10 +- memory/SPEC.md | 6 +- src/fixture-capture.test.ts | 90 ++++++- src/fixture-capture.ts | 69 +++++- src/workspace-session-coordinator.ts | 36 ++- 12 files changed, 287 insertions(+), 250 deletions(-) create mode 100644 .brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl create mode 100644 .brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json create mode 100644 .brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl create mode 100644 .brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json create mode 100644 .brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl create mode 100644 .brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json delete mode 100644 memory/CARDS.md diff --git a/.brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl b/.brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl new file mode 100644 index 00000000..d134f20a --- /dev/null +++ b/.brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl @@ -0,0 +1,4 @@ 
+{"type":"session","version":3,"id":"019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44","timestamp":"2026-05-21T10:05:57.935Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} +{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44","specId":"spec-f99bc5cb-f9e1-4fbb-a34e-60452b4eecc6","specTitle":"Team knowledge cards"},"id":"d9384697","parentId":null,"timestamp":"2026-05-21T10:05:57.936Z"} +{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-001 — Team knowledge cards: A small team wants a shared workspace for knowledge cards. Each card has a stable identity, a human-readable title, and may link to other cards even when titles change.","display":true,"id":"8a3ef713","parentId":"d9384697","timestamp":"2026-05-21T10:05:57.938Z"} +{"type":"message","id":"ce24ef6c","parentId":"8a3ef713","timestamp":"2026-05-21T10:05:57.938Z","message":{"role":"user","content":"I care about renaming cards without breaking links.\nTwo cards can have similar titles, so titles cannot be the only reference.","timestamp":1779321600000}} diff --git a/.brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json b/.brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json new file mode 100644 index 00000000..d62172ac --- /dev/null +++ b/.brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json @@ -0,0 +1,28 @@ +{ + "schemaVersion": 1, + "briefId": "brief-001", + "runId": "scripted-001", + "timestamp": "2026-05-21T00:00:00.000Z", + "brunchVersion": "0.0.0", + "session": { + "id": "019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44", + "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-05-57-935Z_019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44.jsonl" + }, + "driver": { + "mode": "scripted-deterministic" + }, + "projectionSummary": { + "status": "ready", + "exchangeCount": 1, + "openPrompt": false + }, + "artifacts": { + "jsonl": "scripted-001.jsonl", + 
"graph": { + "status": "deferred" + }, + "coherence": { + "status": "deferred" + } + } +} diff --git a/.brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl b/.brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl new file mode 100644 index 00000000..67c20ece --- /dev/null +++ b/.brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl @@ -0,0 +1,4 @@ +{"type":"session","version":3,"id":"019e49ff-bf36-7d6e-9989-d3a13ae5a516","timestamp":"2026-05-21T10:05:57.942Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} +{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e49ff-bf36-7d6e-9989-d3a13ae5a516","specId":"spec-f99bc5cb-f9e1-4fbb-a34e-60452b4eecc6","specTitle":"Team knowledge cards"},"id":"f88af345","parentId":null,"timestamp":"2026-05-21T10:05:57.942Z"} +{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-002 — Approval workflow for vendor invoices: A finance team needs invoices to move from draft to submitted to approved or rejected. 
Only budget owners can approve, and rejected invoices can be revised and resubmitted.","display":true,"id":"0b084148","parentId":"f88af345","timestamp":"2026-05-21T10:05:57.942Z"} +{"type":"message","id":"fd820c1e","parentId":"0b084148","timestamp":"2026-05-21T10:05:57.942Z","message":{"role":"user","content":"Rejected invoices are not terminal; they can go back to draft.\nApproved invoices should not be edited without reopening the workflow.","timestamp":1779321600000}} diff --git a/.brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json b/.brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json new file mode 100644 index 00000000..661e8627 --- /dev/null +++ b/.brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json @@ -0,0 +1,28 @@ +{ + "schemaVersion": 1, + "briefId": "brief-002", + "runId": "scripted-001", + "timestamp": "2026-05-21T00:00:00.000Z", + "brunchVersion": "0.0.0", + "session": { + "id": "019e49ff-bf36-7d6e-9989-d3a13ae5a516", + "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-05-57-942Z_019e49ff-bf36-7d6e-9989-d3a13ae5a516.jsonl" + }, + "driver": { + "mode": "scripted-deterministic" + }, + "projectionSummary": { + "status": "ready", + "exchangeCount": 1, + "openPrompt": false + }, + "artifacts": { + "jsonl": "scripted-001.jsonl", + "graph": { + "status": "deferred" + }, + "coherence": { + "status": "deferred" + } + } +} diff --git a/.brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl b/.brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl new file mode 100644 index 00000000..0a35a069 --- /dev/null +++ b/.brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl @@ -0,0 +1,4 @@ +{"type":"session","version":3,"id":"019e49ff-bf39-7d1e-9221-6f84107b55be","timestamp":"2026-05-21T10:05:57.945Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} 
+{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e49ff-bf39-7d1e-9221-6f84107b55be","specId":"spec-f99bc5cb-f9e1-4fbb-a34e-60452b4eecc6","specTitle":"Team knowledge cards"},"id":"ccf205bc","parentId":null,"timestamp":"2026-05-21T10:05:57.945Z"} +{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-003 — Project dashboard rollups: A product lead wants a dashboard that rolls task status, blockers, and recent decisions up from individual project notes into one current view.","display":true,"id":"95dcaadc","parentId":"ccf205bc","timestamp":"2026-05-21T10:05:57.945Z"} +{"type":"message","id":"bbdf5866","parentId":"95dcaadc","timestamp":"2026-05-21T10:05:57.945Z","message":{"role":"user","content":"If the source note changes, the dashboard should not silently stay stale.\nRecent decisions should show where they came from.","timestamp":1779321600000}} diff --git a/.brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json b/.brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json new file mode 100644 index 00000000..49fdd360 --- /dev/null +++ b/.brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json @@ -0,0 +1,28 @@ +{ + "schemaVersion": 1, + "briefId": "brief-003", + "runId": "scripted-001", + "timestamp": "2026-05-21T00:00:00.000Z", + "brunchVersion": "0.0.0", + "session": { + "id": "019e49ff-bf39-7d1e-9221-6f84107b55be", + "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-05-57-945Z_019e49ff-bf39-7d1e-9221-6f84107b55be.jsonl" + }, + "driver": { + "mode": "scripted-deterministic" + }, + "projectionSummary": { + "status": "ready", + "exchangeCount": 1, + "openPrompt": false + }, + "artifacts": { + "jsonl": "scripted-001.jsonl", + "graph": { + "status": "deferred" + }, + "coherence": { + "status": "deferred" + } + } +} diff --git a/memory/CARDS.md b/memory/CARDS.md deleted file mode 100644 index ac44b971..00000000 
--- a/memory/CARDS.md +++ /dev/null @@ -1,230 +0,0 @@ - - -# Scope Cards — FE-735 Capture-Seam Fixes and Remaining M1 Slices - -## Orientation - -- **Containing seam:** M1 fixture-capture seam over the transport/projection layer: `brunch --mode rpc`, named `workspace.*` / `session.*` handlers, `WorkspaceSessionCoordinator`, Pi JSONL transcript truth, and `.brunch-fixtures/` bundle output. -- **Frontier item:** `mode-shell-and-fixture-driver` (FE-735) remains the Linear/branch boundary. Prior cards established print/RPC/projection and seeded briefs, but review found the real capture path still loses session identity without a fake coordinator. -- **Volatile state:** `memory/CARDS.md` had been exhausted/deleted; `memory/SPEC.md` has an uncommitted lexicon cleanup (`Lens switch`). `npm run verify` is green, but an additional no-injected-coordinator probe showed `captureFixtureRun()` copies a freshly opened empty session rather than the session that was seeded. -- **Main open risk:** M1 could appear fixture-ready while the fixture driver only proves an injected test double path, not the real Brunch host/session path. -- **Frontier obligations:** keep `WorkspaceSessionCoordinator` as the owner of session/spec selection (D21-L); keep public RPC methods product-shaped rather than filesystem/generic data APIs (D5-L, D19-L); keep Pi JSONL as transcript truth with no chat/turn store (D6-L, D12-L, D13-L, I10-L); keep captured-run bundles forward-compatible without overclaiming graph/coherence artifacts. - -## Blocking instruction - -Complete Card 1 before any actual captured-run work for briefs #1–#3. Cards 2–4 are hardening/cleanup and can be committed before or after Card 1, but should land before tying off FE-735. Card 5 depends on Card 1. - -## Card 1 — Stable current-session capture path - -**Status:** done - -### Weight - -Full scope card — fixes the real product seam between coordinator-owned session state, RPC projection, and fixture capture. 
- -### Target Behavior - -`captureFixtureRun()` copies the same selected session that `session.elicitationExchanges` projects. - -### Boundary Crossings - -```text -→ fixture capture caller -→ RPC stdio client (`workspace.snapshot`, `session.elicitationExchanges`) -→ Brunch host/coordinator session selection -→ selected Pi JSONL session file -→ `.brunch-fixtures///` bundle writer -``` - -### Risks and Assumptions - -- RISK: `FileWorkspaceSessionCoordinator.openExisting()` creates a fresh session on each call, so separate RPC requests see different session files → MITIGATION: add a no-injected-coordinator regression test first; fix by making the capture/RPC path use one stable host/session context or by teaching the coordinator to reopen/select the actual current session instead of creating a new one for each read. -- RISK: The fix bypasses `WorkspaceSessionCoordinator` and hardcodes `.brunch/sessions` lookup in fixture capture → MITIGATION: keep session identity resolution behind the coordinator/host seam; fixture capture should remain an RPC client over product handlers. -- RISK: Stabilizing current-session identity mutates the M0 session-binding invariant → MITIGATION: run existing coordinator and store-oracle tests; ensure exactly one `brunch.session_binding` per selected session remains true. -- ASSUMPTION: M1 only needs stable current-session capture, not arbitrary historical session selection → VALIDATE: a temp workspace with one selected spec/session and assistant→user entries captures that same JSONL and reports `exchangeCount: 1` without injecting a fake coordinator. - -### Acceptance Criteria - -✓ `fixture-capture.test.ts` (or equivalent) creates a real coordinator-backed temp workspace, appends assistant→user messages to the selected session, calls `captureFixtureRun()` without passing `coordinator`, and observes a copied JSONL containing those messages. 
-✓ The resulting `.meta.json` has `projectionSummary.status: "ready"` and `exchangeCount: 1` for that real no-injection capture. -✓ `workspace.snapshot` and `session.elicitationExchanges` in one capture operation refer to the same session id/file. -✓ Existing `verifyWorkspaceSessionStores` / coordinator tests still prove one binding per session and no incompatible session rebinding. - -### Verification Approach - -- Inner: regression unit/integration tests — prove no-injected-coordinator capture uses the real selected session. -- Middle: store/projection oracle — inspect temp `.brunch/state.json`, source JSONL, copied JSONL, and metadata for matching session identity and exchange count. -- Outer: none for this fix. - -### Cross-cutting obligations - -- Do not make fixture capture a privileged direct store reader for product semantics; use RPC/product handlers for projection. -- Preserve `cwd → spec → session` hierarchy and one-spec-per-session binding. -- Keep stable current-session identity narrow; defer historical session selection unless a later scope needs it. - -## Card 2 — Reconcile fixture brief format docs - -**Status:** done - -### Weight - -Light scope card — naming/documentation cleanup inside the established fixture area. - -### Objective - -Make fixture brief documentation and plan text agree with the implemented JSON brief format. - -### Acceptance Criteria - -✓ `.brunch-fixtures/README.md` describes JSON brief files and no longer says the directory is empty until M1. -✓ `memory/PLAN.md` no longer says `brief-library-curation` outputs YAML or validates brief YAML unless the implementation is changed back to YAML. -✓ README layout examples match the current `.brunch-fixtures/briefs/brief-001-*.json` naming style. - -### Verification Approach - -- Inner: `npm run fix` / `npm run verify`. -- Middle: doc grep/check — no stale “briefs are YAML” or “empty by design until M1” claims remain in touched fixture docs. 
- -### Cross-cutting obligations - -- Keep one canonical brief format for M1. -- Do not move brief curation into a loose examples folder; it remains under `.brunch-fixtures/briefs/`. - -### Promotion checklist - -- [ ] Does this change a requirement? No. -- [ ] Does this create, retire, or invalidate an assumption? No. -- [ ] Does this make or reverse a non-trivial design decision? No; it reconciles docs to existing implementation. -- [ ] Does this establish a new seam-level invariant? No. -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. -- [ ] Does it cross more than two major seams? No. -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. -- [ ] Can you not name the containing seam or current rationale from the live docs? No. - -## Card 3 — Pi session-entry source-of-truth typing cleanup - -**Status:** done - -### Weight - -Light scope card — type-source hardening inside the already-established projection seam. - -### Objective - -Make `elicitation-exchange.ts` project from Pi-owned session entry types instead of restating Pi message entry shape locally. - -### Acceptance Criteria - -✓ `elicitation-exchange.ts` imports/projects from Pi exported session entry types (`FileEntry`, `SessionEntry`, `SessionMessageEntry`, `CustomEntry`, `CustomMessageEntry`, or whichever exported types fit) where available. -✓ Local interfaces are retained only for Brunch semantic outputs (`EntryRange`, `ElicitationExchange`, `ElicitationExchangeProjection`) or narrow trust-boundary parse results that add new meaning. -✓ Tests still cover real `SessionManager` JSONL, structured Brunch custom entries, and orphan response handling. - -### Verification Approach - -- Inner: typecheck/build plus projector unit tests. -- Middle: real `SessionManager` JSONL round-trip test remains the seam oracle. 
- -### Cross-cutting obligations - -- Apply source-of-truth typing: import/infer/project; do not duplicate Pi's state space unless establishing a trust-boundary parser. -- Keep Brunch projection types as the new semantic boundary. - -### Promotion checklist - -- [ ] Does this change a requirement? No. -- [ ] Does this create, retire, or invalidate an assumption? No. -- [ ] Does this make or reverse a non-trivial design decision? No. -- [ ] Does this establish a new seam-level invariant? No. -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. -- [ ] Does it cross more than two major seams? No. -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. -- [ ] Can you not name the containing seam or current rationale from the live docs? No. - -## Card 4 — Brunch-owned fixture metadata version - -**Status:** done - -### Weight - -Light scope card — hardens metadata correctness at the fixture bundle boundary. - -### Objective - -Fixture metadata reports Brunch package version without reading the caller project's `package.json` by accident. - -### Acceptance Criteria - -✓ `readPackageVersion()` or its replacement resolves Brunch-owned package metadata from a stable module/package-root source, or writes an explicit `unknown`/omitted value when unavailable. -✓ A test proves capture from a temp cwd containing a conflicting `package.json` does not record that caller package version as `brunchVersion`. -✓ Metadata remains deterministic when a timestamp is supplied. - -### Verification Approach - -- Inner: fixture-capture unit/integration test. -- Middle: temp workspace metadata inspection. - -### Cross-cutting obligations - -- Captured-run metadata should describe the Brunch driver/runtime, not the user's project unless a distinct user-project metadata field is later added. -- Keep metadata minimal and forward-compatible with later `.graph.json` / `.coherence.json` artifacts. 
- -### Promotion checklist - -- [ ] Does this change a requirement? No. -- [ ] Does this create, retire, or invalidate an assumption? No. -- [ ] Does this make or reverse a non-trivial design decision? No. -- [ ] Does this establish a new seam-level invariant? No. -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? No. -- [ ] Does it cross more than two major seams? No. -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? No. -- [ ] Can you not name the containing seam or current rationale from the live docs? No. - -## Card 5 — Actual captured runs for briefs #1–#3 - -**Status:** queued - -### Weight - -Full scope card — completes the remaining fixture-capture claim for M1 after the capture seam is trustworthy. - -### Target Behavior - -Briefs #1–#3 each have a deterministic captured run bundle. - -### Boundary Crossings - -```text -→ seeded brief file -→ deterministic scripted-user/driver path -→ `brunch --mode rpc` stdio surface -→ selected Pi JSONL transcript and exchange projection -→ `.brunch-fixtures///` captured bundle -``` - -### Risks and Assumptions - -- RISK: Captured runs are hand-authored files rather than produced by the fixture driver → MITIGATION: provide a command/test path that invokes the driver and writes bundles reproducibly. -- RISK: The driver pretends to be a full LLM agent-as-user before the loop is ready → MITIGATION: keep runs deterministic/scripted for M1, with metadata explicitly saying scripted/deterministic; defer generative/adversarial runs. -- RISK: Captures overclaim graph/coherence outputs before M4/M8 → MITIGATION: produce `.jsonl` + `.meta.json` only and record graph/coherence artifacts as absent/deferred, not empty truth. -- ASSUMPTION: A deterministic scripted response path is sufficient to establish the first replay-regression layer → VALIDATE: three bundles replay through projection checks and have non-empty exchange summaries. 
- -### Acceptance Criteria - -✓ Running the fixture driver creates one run bundle for each of `brief-001`, `brief-002`, and `brief-003` under `.brunch-fixtures///`. -✓ Each bundle contains copied `.jsonl` and `.meta.json` artifacts; metadata names the brief id, run id, scripted/deterministic mode, session id, projection summary, and absent/deferred graph/coherence artifacts. -✓ A replay/projection test loads each captured `.jsonl` and asserts projection parity with the metadata summary. -✓ The capture path uses JSON-RPC stdio product methods rather than direct projection calls for the behavior it is proving. - -### Verification Approach - -- Inner: fixture driver and metadata tests — prove bundle creation and metadata shape. -- Middle: replay-regression fixture test — load captured JSONL bundles and assert projection parity for briefs #1–#3. -- Outer: none; qualitative LLM elicitation remains deferred. - -### Cross-cutting obligations - -- Establish the replay-regression layer without claiming property/adversarial layers are complete. -- Keep fixture outputs forward-compatible with `.graph.json` and `.coherence.json` while not creating fake placeholders as canonical truth. -- Keep generated run IDs stable or deterministic enough for review; if timestamped, tests should select runs through metadata, not brittle names. diff --git a/memory/PLAN.md b/memory/PLAN.md index 2c9d7371..4b001e13 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -20,11 +20,11 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ### Active -1. `mode-shell-and-fixture-driver` — Adds `--mode print` and `--mode rpc` over thin named RPC method families, lands the first agent-as-user fixture-capture run end-to-end, seeds the first three briefs from BEHAVIORAL_KERNELS.md. +1. `jsonl-session-viability` — Proves whether pi JSONL sessions can hold raw payloads, Brunch session binding, structured elicitation entries, and continuity metadata faithfully across reload. 
### Next -1. `jsonl-session-viability` — Proves whether pi JSONL sessions can hold raw payloads, Brunch session binding, structured elicitation entries, and continuity metadata faithfully across reload. +1. `web-shell` — M3. Browser as thin remote head over the same host, TanStack Router + Query, one WebSocket RPC client, no REST read model. ### Parallel / Low-conflict @@ -33,7 +33,6 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ### Horizon -- `web-shell` — M3. Browser as thin remote head over the same host, TanStack Router + Query, one WebSocket RPC client, no REST read model. - `graph-data-plane` — M4. SQLite-backed graph persistence; intent-plane nodes/edges; graph clock; change log; coherence-state homes. - `agent-graph-integration` — M5. Graph tools and observer extraction through pi extension seams; all writes via the shared command layer. - `authority-model` — M6. Three-tier policy (autonomous / requires-confirmation / human-only) end-to-end across modes. @@ -69,7 +68,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** [FE-735](https://linear.app/hash/issue/FE-735/mode-shell-and-fixture-driver-m1) (sub-issue of FE-702) - **Branch:** `ln/fe-735-mode-shell-fixture-driver` (stacked on `ln/fe-729-walking-skeleton`) - **Kind:** structural -- **Status:** in-progress +- **Status:** done - **Objective:** Add `--mode print` and `--mode rpc` transport dispatchers over the same Brunch host and named RPC method-family handlers; land the agent-as-user JSON-RPC stdio driver; prove transcript projection of elicitation exchanges; and capture the first replay-regression fixtures for at least briefs #1–#3. For M1, print mode is a snapshot renderer/proof-of-life, not a single-turn agent run. - **Why now / unlocks:** Proves D5-L (JSON-RPC primary) and unlocks the fixture-driven feedback loop. Without this milestone, every downstream milestone has only manual TUI evidence. 
- **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three briefs from BEHAVIORAL_KERNELS.md are captured. @@ -77,7 +76,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. - **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) -- **Current execution pointer:** review found the fixture-capture skeleton still loses real current-session identity without a fake coordinator. Follow `memory/CARDS.md`: fix stable current-session capture first, then reconcile fixture docs/typing/metadata and produce actual deterministic captured runs for briefs #1–#3. +- **Current execution pointer:** complete; proceed to `jsonl-session-viability`. 
### jsonl-session-viability @@ -260,6 +259,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ## Recently Completed +- 2026-05-21 `mode-shell-and-fixture-driver` — Done: print and RPC transport modes boot through the Brunch host; named `workspace.snapshot` and `session.elicitationExchanges` handlers project coordinator-selected session state; fixture capture copies the same selected Pi JSONL session projected by RPC; brief metadata is Brunch-owned and marks graph/coherence artifacts deferred; briefs #1–#3 have scripted deterministic replay bundles under `.brunch-fixtures//scripted-001/`. Verified: `npm run verify`, RPC/print parity smoke, exchange projection tests, fixture replay/projection parity tests. Watch: M2 should use these captured transcripts as JSONL reload evidence without turning them into a parallel chat/turn store. - 2026-05-20 `walking-skeleton` — Done: Brunch now launches through a real pi-backed TUI boot path with coordinator-first spec gating, project-local `.brunch/` state, self-describing Pi JSONL sessions via exactly one `brunch.session_binding`, same-spec `/new` coverage, persistent cwd / spec / phase / chat-mode chrome through pi's extension widget seam, a bin shim, store-only runbook checker, and type-ownership hardening against Pi exported types. Verified: `npm run verify`, manual TUI smoke in a scratch project, automated TUI/coordinator tests, store-only runbook oracle, and manual file inspection. Watch: M1 should reuse the coordinator/session truth rather than recreating boot/session mechanics. - 2026-05-20 `pre-poc-archive-and-reseed` — Done: razed pre-POC implementation, archived legacy docs and planning memory under `archive/`, tagged `next-baseline`, reseeded `memory/SPEC.md` and `memory/PLAN.md` from the three canonical POC architecture docs. Verified: `git log --oneline` shows three clean buckets; `archive/` contains all prior material. 
Watch: Phase 3 infra bootstrap is folded into `walking-skeleton`, not a separate frontier. diff --git a/memory/SPEC.md b/memory/SPEC.md index a6c694e0..dfb9a59c 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -134,7 +134,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c #### Interaction & UI shape - **D11-L — Workspace state hierarchy `cwd → spec → session`, with spec selection gated before any agent loop.** Spec selection is durable across `/new` and persisted in `.brunch/state.json`. Each Pi session is bound to exactly one spec by a `brunch.session_binding` custom entry at session start; switching specs selects or creates another session rather than mutating the spec of the current session. Depends on: A10-L. Supersedes: —. -- **D21-L — Workspace session coordination is the spec/session boot seam.** Brunch owns a narrow `WorkspaceSessionCoordinator` for boot, spec selection, and `/new` session creation. It is the only product module allowed to create or open Pi sessions for Brunch user flows and the only module allowed to write `brunch.session_binding`; callers receive `ready | select_spec | needs_human` workspace-session state and never mutate a session's bound spec. The coordinator hides `SessionManager.create(cwd, ".brunch/sessions/")`, internal session-start binding for pi-created replacement sessions, `.brunch/state.json` current-spec acceleration, binding validation, and chrome-state derivation. Because pi defers appending session JSONL until an assistant message exists, the coordinator flushes Brunch's binding when it is created, refreshes it at `before_agent_start`, and performs the final pre-assistant flush from Brunch's internal assistant `message_start` hook after pi has persisted the user message but before assistant persistence; each flush reloads the session file so pi's next assistant append does not duplicate the already-written prefix. Depends on: D6-L, D11-L. 
Supersedes: the loose `SpecRegistry` + caller-orchestrated session-binding mental model. +- **D21-L — Workspace session coordination is the spec/session boot seam.** Brunch owns a narrow `WorkspaceSessionCoordinator` for boot, spec selection, selected-session reopening, and `/new` session creation. It is the only product module allowed to create or open Pi sessions for Brunch user flows and the only module allowed to write `brunch.session_binding`; callers receive `ready | select_spec | needs_human` workspace-session state and never mutate a session's bound spec. The coordinator hides `SessionManager.create/open/continueRecent(cwd, ".brunch/sessions/")`, internal session-start binding for pi-created replacement sessions, `.brunch/state.json` current-spec and current-session-file acceleration, binding validation, and chrome-state derivation. Because pi defers appending session JSONL until an assistant message exists, the coordinator flushes Brunch's binding when it is created, refreshes it at `before_agent_start`, and performs the final pre-assistant flush from Brunch's internal assistant `message_start` hook after pi has persisted the user message but before assistant persistence; each flush reloads the session file so pi's next assistant append does not duplicate the already-written prefix. Depends on: D6-L, D11-L. Supersedes: the loose `SpecRegistry` + caller-orchestrated session-binding mental model. - **D22-L — M0 TUI chrome rides pi's extension UI widget seam.** Brunch's initial persistent chrome is mounted by an internal Brunch extension using pi's public `ExtensionUIContext.setWidget(..., { placement: "aboveEditor" })`, while spec selection remains a Brunch-owned boot gate before `InteractiveMode.run()`. Brunch does not fork pi, monkeypatch `InteractiveMode`, or expose generic pi extension configuration to users for M0 chrome. Depends on: A10-L, D2-L, D21-L. Supersedes: private-header/monkeypatch approaches for M0 chrome. 
- **D12-L — Elicitation-first interaction, transcript-native structured prompts.** Brunch treats system/assistant prompts and user responses as Pi transcript truth. Structured action/choice/freeform surfaces may be represented by Brunch custom entries when needed, but there is no DB-owned prompt/response entity; at idle, the session waits on a system/assistant-originated elicitation prompt. Depends on: D6-L, D11-L. Supersedes: —. - **D13-L — Capture-aware elicitation exchange projection.** Observer extraction consumes derived elicitation exchanges: a prompt-side span (all system/assistant/tool-side entries since the previous user response, including any structured/internal prompt content) plus a response-side span (user text and/or structured action entries). Role/span alternation is the default projection; typed markers are added only where structure/actions need deterministic replay. Depends on: A12-L, D12-L. Supersedes: —. @@ -151,7 +151,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | I5-L | For every `brunch.lens_switch` entry and every session/spec binding transition, the session interest set is recomputed before the next agent turn. | planned (M7 property test) | D11-L | | I6-L | Every reconciliation need has `created_at_lsn ≤` current global LSN; `kind='impasse'` needs reference at least two graph nodes; resolved needs carry a strictly later `resolved_at_lsn`. | planned (M8 property test) | D8-L, I1-L | | I7-L | Every `framing_as` value belongs to the allowed matrix for that node's base kind. | planned (fixture property check) | D7-L | -| I8-L | Spec selection persists across pi `switchSession` (i.e. `/new`); each session has exactly one `brunch.session_binding`, and a session's bound spec never changes. 
| partially covered (M0 coordinator/TUI boot integration tests + store-only runbook checker; manual TUI smoke and JSONL reload viability still planned) | D11-L | +| I8-L | Spec selection persists across pi `switchSession` (i.e. `/new`); the selected session file is reopened consistently by headless projection/capture paths; each session has exactly one `brunch.session_binding`, and a session's bound spec never changes. | partially covered (M0 coordinator/TUI boot integration tests + store-only runbook checker; M1 no-injected-coordinator capture regression; manual TUI smoke and JSONL reload viability still planned) | D11-L, D21-L | | I9-L | Every `brunch.mention` payload is anchored to a stable `id`; the ledger never stores title-anchored references. | planned (M7 invariant) | D14-L | | I10-L | Structured elicitation prompts/responses live in the Pi transcript when structure is needed; elicitation exchanges are projected from the active branch, and no parallel canonical chat/turn table carries elicitation state. | planned (M1+ projection invariant) | D12-L, D13-L, D18-L | | I11-L | No durable graph mutation path — including migrations, maintenance scripts, observer-job writes, or side-task-attributed writes — may bypass the `CommandExecutor` path that performs authority/result classification, version checks, structural validation, transaction execution, LSN allocation, and change-log append. | planned (M4 architectural + migration invariants; M5 caller-boundary tests) | D4-L, D15-L, D16-L, D20-L | @@ -207,7 +207,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | **Spec** | A specification workspace, identified by its intent-graph root. Lives under `.brunch/`. Multiple specs may coexist per project. | | **Session** | An elicitation transcript belonging to one spec. Backed by a pi JSONL session under `.brunch/sessions/`. A spec may have many sessions over time; a session never changes specs. 
| | **Session binding** | The first Brunch custom entry in a session that binds the Pi session id to exactly one spec id and schema version. Makes JSONL self-describing; registry/index state is an acceleration, not the canonical binding. | -| **Workspace session coordinator** | The Brunch boot seam that returns `ready | select_spec | needs_human` workspace-session state for a cwd/mode, owns spec selection and `/new`, creates/opens Pi sessions through `SessionManager`, writes `brunch.session_binding`, and derives chrome state for callers. | +| **Workspace session coordinator** | The Brunch boot seam that returns `ready | select_spec | needs_human` workspace-session state for a cwd/mode, owns spec selection, selected-session reopening, and `/new`, creates/opens Pi sessions through `SessionManager`, writes `brunch.session_binding`, persists current spec/session acceleration in `.brunch/state.json`, and derives chrome state for callers. | | **Workspace state hierarchy** | `cwd → spec → session`. Each level scopes the one below it; spec is selected before any agent loop runs and persists across `/new`. | | **Intent graph** | The canonical specification-meaning plane. Authority over what the system is for. | | **Oracle graph** | Verification-strategy plane accountable to intent. Houses Checks, Validation Methods, Evidence, Obligations. 
| diff --git a/src/fixture-capture.test.ts b/src/fixture-capture.test.ts index d6d25fd7..cebf4adf 100644 --- a/src/fixture-capture.test.ts +++ b/src/fixture-capture.test.ts @@ -5,7 +5,14 @@ import { describe, expect, it } from "vitest" import type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" import { createWorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" -import { captureFixtureRun } from "./fixture-capture.js" +import { + loadJsonlTranscriptEntries, + projectElicitationExchanges, +} from "./elicitation-exchange.js" +import { + captureDeterministicBriefRuns, + captureFixtureRun, +} from "./fixture-capture.js" describe("fixture capture", () => { it("captures the coordinator-selected session without injecting a test coordinator", async () => { @@ -139,6 +146,9 @@ describe("fixture capture", () => { id: expect.any(String), sourceFile: expect.stringContaining(".brunch/sessions"), }, + driver: { + mode: "scripted-deterministic", + }, projectionSummary: { status: "ready", exchangeCount: 1, @@ -146,10 +156,88 @@ describe("fixture capture", () => { }, artifacts: { jsonl: "run-001.jsonl", + graph: { status: "deferred" }, + coherence: { status: "deferred" }, }, }) expect(await readFile(result.jsonlFile, "utf8")).toContain( '"role":"assistant"', ) }) + + it("replays captured brief bundles through exchange projection", async () => { + for (const briefId of ["brief-001", "brief-002", "brief-003"]) { + const runId = "scripted-001" + const runDir = join(".brunch-fixtures", briefId, runId) + const metadata = JSON.parse( + await readFile(join(runDir, `${runId}.meta.json`), "utf8"), + ) as { + briefId: string + runId: string + projectionSummary: { + status: string + exchangeCount: number + openPrompt: boolean + } + } + const projection = projectElicitationExchanges( + await loadJsonlTranscriptEntries(join(runDir, `${runId}.jsonl`)), + ) + + expect(metadata.briefId).toBe(briefId) + expect(metadata.runId).toBe(runId) + expect({ + 
status: projection.status, + exchangeCount: projection.exchanges.length, + openPrompt: projection.openPrompt !== null, + }).toEqual(metadata.projectionSummary) + } + }) + + it("captures deterministic runs for the first three briefs", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-fixture-driver-")) + + const results = await captureDeterministicBriefRuns({ + cwd, + briefsDir: ".brunch-fixtures/briefs", + runId: "scripted-001", + timestamp: "2026-05-21T00:00:00.000Z", + }) + + expect(results).toHaveLength(3) + for (const result of results) { + const metadata = JSON.parse(await readFile(result.metaFile, "utf8")) as { + briefId: string + runId: string + driver: { mode: string } + session: { id: string } + projectionSummary: { + status: string + exchangeCount: number + openPrompt: boolean + } + artifacts: { + jsonl: string + graph: { status: string } + coherence: { status: string } + } + } + expect(metadata.runId).toBe("scripted-001") + expect(metadata.driver.mode).toBe("scripted-deterministic") + expect(metadata.session.id).toEqual(expect.any(String)) + expect(metadata.projectionSummary).toEqual({ + status: "ready", + exchangeCount: 1, + openPrompt: false, + }) + expect(metadata.artifacts).toEqual({ + jsonl: "scripted-001.jsonl", + graph: { status: "deferred" }, + coherence: { status: "deferred" }, + }) + expect(await readFile(result.jsonlFile, "utf8")).toContain( + metadata.briefId, + ) + } + }) }) diff --git a/src/fixture-capture.ts b/src/fixture-capture.ts index ebcdc503..1dda74a1 100644 --- a/src/fixture-capture.ts +++ b/src/fixture-capture.ts @@ -3,10 +3,14 @@ import { dirname, join } from "node:path" import { PassThrough } from "node:stream" import { fileURLToPath } from "node:url" +import { loadBriefLibrary, type FixtureBrief } from "./brief-library.js" import { runBrunchCli } from "./brunch.js" import type { ElicitationExchangeProjection } from "./elicitation-exchange.js" import type { WorkspaceSnapshot } from "./print-snapshot.js" -import 
type { WorkspaceSessionCoordinator } from "./workspace-session-coordinator.js" +import { + createWorkspaceSessionCoordinator, + type WorkspaceSessionCoordinator, +} from "./workspace-session-coordinator.js" export interface FixtureCaptureOptions { cwd: string @@ -22,6 +26,13 @@ export interface FixtureCaptureResult { metaFile: string } +export interface DeterministicBriefRunOptions { + cwd: string + briefsDir?: string + runId?: string + timestamp?: string +} + interface JsonRpcResponse { result?: T error?: { @@ -69,6 +80,9 @@ export async function captureFixtureRun( id: workspace.session.id, sourceFile: workspace.session.file, }, + driver: { + mode: "scripted-deterministic", + }, projectionSummary: { status: projection.status, exchangeCount: projection.exchanges.length, @@ -76,6 +90,8 @@ export async function captureFixtureRun( }, artifacts: { jsonl: `${options.runId}.jsonl`, + graph: { status: "deferred" }, + coherence: { status: "deferred" }, }, }, null, @@ -87,6 +103,57 @@ export async function captureFixtureRun( return { runDir, jsonlFile, metaFile } } +export async function captureDeterministicBriefRuns( + options: DeterministicBriefRunOptions, +): Promise { + const briefs = await loadBriefLibrary( + options.briefsDir ?? join(options.cwd, ".brunch-fixtures", "briefs"), + ) + const coordinator = createWorkspaceSessionCoordinator({ cwd: options.cwd }) + const results: FixtureCaptureResult[] = [] + + for (const brief of briefs) { + const workspace = await openScriptedBriefSession(coordinator, brief) + workspace.session.manager.appendCustomMessageEntry( + "brunch.elicitation_prompt", + `Elicitation prompt for ${brief.id} — ${brief.title}: ${brief.productBrief}`, + true, + ) + workspace.session.manager.appendMessage({ + role: "user", + content: brief.scriptedUserNotes.join("\n"), + timestamp: Date.parse(options.timestamp ?? 
new Date().toISOString()), + }) + await coordinator.bindCurrentSpecToSession(workspace.session.manager) + + results.push( + await captureFixtureRun({ + cwd: options.cwd, + briefId: brief.id, + runId: options.runId ?? "scripted-001", + ...(options.timestamp ? { timestamp: options.timestamp } : {}), + }), + ) + } + + return results +} + +async function openScriptedBriefSession( + coordinator: WorkspaceSessionCoordinator, + brief: FixtureBrief, +) { + const existing = await coordinator.openExisting() + if (existing.status === "ready") { + const next = await coordinator.createNewSessionForCurrentSpec() + if (next.status === "ready") { + return next + } + } + + return coordinator.startOrCreate({ specTitle: brief.title }) +} + async function callRpc( options: FixtureCaptureOptions, method: string, diff --git a/src/workspace-session-coordinator.ts b/src/workspace-session-coordinator.ts index 4afe47ce..d50042c7 100644 --- a/src/workspace-session-coordinator.ts +++ b/src/workspace-session-coordinator.ts @@ -23,6 +23,7 @@ export interface WorkspaceSpecState { interface WorkspaceStateFile { schemaVersion: 1 currentSpec: WorkspaceSpecState + currentSessionFile?: string } interface SessionBindingData { @@ -107,7 +108,12 @@ class FileWorkspaceSessionCoordinator implements WorkspaceSessionCoordinator { } } - const session = await openCurrentSession(this.#cwd, state.currentSpec) + const session = await openCurrentSession( + this.#cwd, + state.currentSpec, + state.currentSessionFile, + ) + await writeCurrentWorkspaceState(this.#cwd, state.currentSpec, session.file) return readyState(this.#cwd, state.currentSpec, session) } @@ -117,14 +123,8 @@ class FileWorkspaceSessionCoordinator implements WorkspaceSessionCoordinator { await ensureWorkspaceDirs(this.#cwd) const existing = await readWorkspaceState(this.#cwd) const spec = existing?.currentSpec ?? 
createSpec(options?.specTitle) - if (!existing) { - await writeWorkspaceState(this.#cwd, { - schemaVersion: STATE_SCHEMA_VERSION, - currentSpec: spec, - }) - } - const session = await createBoundSession(this.#cwd, spec) + await writeCurrentWorkspaceState(this.#cwd, spec, session.file) return readyState(this.#cwd, spec, session) } @@ -140,6 +140,7 @@ class FileWorkspaceSessionCoordinator implements WorkspaceSessionCoordinator { } const session = await createBoundSession(this.#cwd, state.currentSpec) + await writeCurrentWorkspaceState(this.#cwd, state.currentSpec, session.file) return readyState(this.#cwd, state.currentSpec, session) } @@ -152,6 +153,7 @@ class FileWorkspaceSessionCoordinator implements WorkspaceSessionCoordinator { } const session = bindSessionToSpec(manager, state.currentSpec) + await writeCurrentWorkspaceState(this.#cwd, state.currentSpec, session.file) return readyState(this.#cwd, state.currentSpec, session) } @@ -181,11 +183,13 @@ async function createBoundSession( async function openCurrentSession( cwd: string, spec: WorkspaceSpecState, + currentSessionFile: string | undefined, ): Promise { await ensureWorkspaceDirs(cwd) const files = await listSessionFiles(cwd) - const manager = - files.length === 0 + const manager = currentSessionFile + ? SessionManager.open(currentSessionFile, sessionDir(cwd), cwd) + : files.length === 0 ? 
SessionManager.create(cwd, sessionDir(cwd)) : SessionManager.continueRecent(cwd, sessionDir(cwd)) const sessionFile = manager.getSessionFile() @@ -285,6 +289,18 @@ async function writeWorkspaceState( await writeFile(statePath(cwd), `${JSON.stringify(state, null, 2)}\n`, "utf8") } +async function writeCurrentWorkspaceState( + cwd: string, + spec: WorkspaceSpecState, + currentSessionFile: string, +): Promise { + await writeWorkspaceState(cwd, { + schemaVersion: STATE_SCHEMA_VERSION, + currentSpec: spec, + currentSessionFile, + }) +} + function readyState( cwd: string, spec: WorkspaceSpecState, From a1f63ca555e4318379bbabbfc08f63fe6002899c Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:27:37 +0200 Subject: [PATCH 15/18] FE-735: Bind scripted captures per brief --- .../brief-001/scripted-001/scripted-001.jsonl | 8 +- .../scripted-001/scripted-001.meta.json | 4 +- .../brief-002/scripted-001/scripted-001.jsonl | 8 +- .../scripted-001/scripted-001.meta.json | 4 +- .../brief-003/scripted-001/scripted-001.jsonl | 8 +- .../scripted-001/scripted-001.meta.json | 4 +- memory/CARDS.md | 114 ++++++++++++++++++ memory/PLAN.md | 4 +- src/fixture-capture.test.ts | 47 +++++++- src/fixture-capture.ts | 13 +- src/workspace-session-coordinator.ts | 7 +- 11 files changed, 190 insertions(+), 31 deletions(-) create mode 100644 memory/CARDS.md diff --git a/.brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl b/.brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl index d134f20a..f12ac2fb 100644 --- a/.brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl +++ b/.brunch-fixtures/brief-001/scripted-001/scripted-001.jsonl @@ -1,4 +1,4 @@ -{"type":"session","version":3,"id":"019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44","timestamp":"2026-05-21T10:05:57.935Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} 
-{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44","specId":"spec-f99bc5cb-f9e1-4fbb-a34e-60452b4eecc6","specTitle":"Team knowledge cards"},"id":"d9384697","parentId":null,"timestamp":"2026-05-21T10:05:57.936Z"} -{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-001 — Team knowledge cards: A small team wants a shared workspace for knowledge cards. Each card has a stable identity, a human-readable title, and may link to other cards even when titles change.","display":true,"id":"8a3ef713","parentId":"d9384697","timestamp":"2026-05-21T10:05:57.938Z"} -{"type":"message","id":"ce24ef6c","parentId":"8a3ef713","timestamp":"2026-05-21T10:05:57.938Z","message":{"role":"user","content":"I care about renaming cards without breaking links.\nTwo cards can have similar titles, so titles cannot be the only reference.","timestamp":1779321600000}} +{"type":"session","version":3,"id":"019e4a13-1a50-7eb3-a4f7-644d4bff42bd","timestamp":"2026-05-21T10:27:06.448Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} +{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e4a13-1a50-7eb3-a4f7-644d4bff42bd","specId":"spec-a3e8371c-af78-45ee-8b1a-7752317caa35","specTitle":"Team knowledge cards"},"id":"fc95cd0b","parentId":null,"timestamp":"2026-05-21T10:27:06.449Z"} +{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-001 — Team knowledge cards: A small team wants a shared workspace for knowledge cards. 
Each card has a stable identity, a human-readable title, and may link to other cards even when titles change.","display":true,"id":"101e1d31","parentId":"fc95cd0b","timestamp":"2026-05-21T10:27:06.451Z"} +{"type":"message","id":"71549d11","parentId":"101e1d31","timestamp":"2026-05-21T10:27:06.451Z","message":{"role":"user","content":"I care about renaming cards without breaking links.\nTwo cards can have similar titles, so titles cannot be the only reference.","timestamp":1779321600000}} diff --git a/.brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json b/.brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json index d62172ac..f32de479 100644 --- a/.brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json +++ b/.brunch-fixtures/brief-001/scripted-001/scripted-001.meta.json @@ -5,8 +5,8 @@ "timestamp": "2026-05-21T00:00:00.000Z", "brunchVersion": "0.0.0", "session": { - "id": "019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44", - "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-05-57-935Z_019e49ff-bf2f-7b7b-bcb1-3c5a89e76f44.jsonl" + "id": "019e4a13-1a50-7eb3-a4f7-644d4bff42bd", + "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-27-06-448Z_019e4a13-1a50-7eb3-a4f7-644d4bff42bd.jsonl" }, "driver": { "mode": "scripted-deterministic" diff --git a/.brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl b/.brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl index 67c20ece..a8831a6e 100644 --- a/.brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl +++ b/.brunch-fixtures/brief-002/scripted-001/scripted-001.jsonl @@ -1,4 +1,4 @@ -{"type":"session","version":3,"id":"019e49ff-bf36-7d6e-9989-d3a13ae5a516","timestamp":"2026-05-21T10:05:57.942Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} 
-{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e49ff-bf36-7d6e-9989-d3a13ae5a516","specId":"spec-f99bc5cb-f9e1-4fbb-a34e-60452b4eecc6","specTitle":"Team knowledge cards"},"id":"f88af345","parentId":null,"timestamp":"2026-05-21T10:05:57.942Z"} -{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-002 — Approval workflow for vendor invoices: A finance team needs invoices to move from draft to submitted to approved or rejected. Only budget owners can approve, and rejected invoices can be revised and resubmitted.","display":true,"id":"0b084148","parentId":"f88af345","timestamp":"2026-05-21T10:05:57.942Z"} -{"type":"message","id":"fd820c1e","parentId":"0b084148","timestamp":"2026-05-21T10:05:57.942Z","message":{"role":"user","content":"Rejected invoices are not terminal; they can go back to draft.\nApproved invoices should not be edited without reopening the workflow.","timestamp":1779321600000}} +{"type":"session","version":3,"id":"019e4a13-1a57-77cd-8c77-43922404adca","timestamp":"2026-05-21T10:27:06.455Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} +{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e4a13-1a57-77cd-8c77-43922404adca","specId":"spec-b37cabd4-12f5-4cea-bcbf-90a170e6f058","specTitle":"Approval workflow for vendor invoices"},"id":"99670cde","parentId":null,"timestamp":"2026-05-21T10:27:06.455Z"} +{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-002 — Approval workflow for vendor invoices: A finance team needs invoices to move from draft to submitted to approved or rejected. 
Only budget owners can approve, and rejected invoices can be revised and resubmitted.","display":true,"id":"99dc94b3","parentId":"99670cde","timestamp":"2026-05-21T10:27:06.455Z"} +{"type":"message","id":"e0baa29d","parentId":"99dc94b3","timestamp":"2026-05-21T10:27:06.455Z","message":{"role":"user","content":"Rejected invoices are not terminal; they can go back to draft.\nApproved invoices should not be edited without reopening the workflow.","timestamp":1779321600000}} diff --git a/.brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json b/.brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json index 661e8627..dc67b14a 100644 --- a/.brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json +++ b/.brunch-fixtures/brief-002/scripted-001/scripted-001.meta.json @@ -5,8 +5,8 @@ "timestamp": "2026-05-21T00:00:00.000Z", "brunchVersion": "0.0.0", "session": { - "id": "019e49ff-bf36-7d6e-9989-d3a13ae5a516", - "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-05-57-942Z_019e49ff-bf36-7d6e-9989-d3a13ae5a516.jsonl" + "id": "019e4a13-1a57-77cd-8c77-43922404adca", + "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-27-06-455Z_019e4a13-1a57-77cd-8c77-43922404adca.jsonl" }, "driver": { "mode": "scripted-deterministic" diff --git a/.brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl b/.brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl index 0a35a069..76e4a2d5 100644 --- a/.brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl +++ b/.brunch-fixtures/brief-003/scripted-001/scripted-001.jsonl @@ -1,4 +1,4 @@ -{"type":"session","version":3,"id":"019e49ff-bf39-7d1e-9221-6f84107b55be","timestamp":"2026-05-21T10:05:57.945Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} -{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e49ff-bf39-7d1e-9221-6f84107b55be","specId":"spec-f99bc5cb-f9e1-4fbb-a34e-60452b4eecc6","specTitle":"Team 
knowledge cards"},"id":"ccf205bc","parentId":null,"timestamp":"2026-05-21T10:05:57.945Z"} -{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-003 — Project dashboard rollups: A product lead wants a dashboard that rolls task status, blockers, and recent decisions up from individual project notes into one current view.","display":true,"id":"95dcaadc","parentId":"ccf205bc","timestamp":"2026-05-21T10:05:57.945Z"} -{"type":"message","id":"bbdf5866","parentId":"95dcaadc","timestamp":"2026-05-21T10:05:57.945Z","message":{"role":"user","content":"If the source note changes, the dashboard should not silently stay stale.\nRecent decisions should show where they came from.","timestamp":1779321600000}} +{"type":"session","version":3,"id":"019e4a13-1a59-712e-82ff-78900f655901","timestamp":"2026-05-21T10:27:06.457Z","cwd":"/Users/lunelson/Code/hashintel/brunch-next"} +{"type":"custom","customType":"brunch.session_binding","data":{"schemaVersion":1,"sessionId":"019e4a13-1a59-712e-82ff-78900f655901","specId":"spec-b804de5f-762e-4d27-a538-8b8d38139bec","specTitle":"Project dashboard rollups"},"id":"74df4b55","parentId":null,"timestamp":"2026-05-21T10:27:06.457Z"} +{"type":"custom_message","customType":"brunch.elicitation_prompt","content":"Elicitation prompt for brief-003 — Project dashboard rollups: A product lead wants a dashboard that rolls task status, blockers, and recent decisions up from individual project notes into one current view.","display":true,"id":"72fa4dfe","parentId":"74df4b55","timestamp":"2026-05-21T10:27:06.457Z"} +{"type":"message","id":"d47729ba","parentId":"72fa4dfe","timestamp":"2026-05-21T10:27:06.457Z","message":{"role":"user","content":"If the source note changes, the dashboard should not silently stay stale.\nRecent decisions should show where they came from.","timestamp":1779321600000}} diff --git a/.brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json 
b/.brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json index 49fdd360..df02c92b 100644 --- a/.brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json +++ b/.brunch-fixtures/brief-003/scripted-001/scripted-001.meta.json @@ -5,8 +5,8 @@ "timestamp": "2026-05-21T00:00:00.000Z", "brunchVersion": "0.0.0", "session": { - "id": "019e49ff-bf39-7d1e-9221-6f84107b55be", - "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-05-57-945Z_019e49ff-bf39-7d1e-9221-6f84107b55be.jsonl" + "id": "019e4a13-1a59-712e-82ff-78900f655901", + "sourceFile": "/Users/lunelson/Code/hashintel/brunch-next/.brunch/sessions/2026-05-21T10-27-06-457Z_019e4a13-1a59-712e-82ff-78900f655901.jsonl" }, "driver": { "mode": "scripted-deterministic" diff --git a/memory/CARDS.md b/memory/CARDS.md new file mode 100644 index 00000000..11feada2 --- /dev/null +++ b/memory/CARDS.md @@ -0,0 +1,114 @@ +# Scope cards — FE-735 M1 review fixes + +## Orientation + +- Containing seam: M1 mode shell / fixture driver, especially the JSON-RPC fixture capture path and elicitation-exchange projection over Pi JSONL. +- Frontier item: `mode-shell-and-fixture-driver` / FE-735; these are review-fix slices inside the same frontier, not new Linear or branch units. +- Volatile state: `HANDOFF.md` remains transfer-only and untracked; the completed M1 queue was deleted, but review found blocking correctness issues before M1 tie-off. +- Main open risk: the first golden fixture captures can look structurally valid while encoding the wrong product state, so fixes must assert semantic binding/projection parity rather than only file existence. +- Cross-cutting obligations: preserve thin named RPC methods over projection handlers, Pi JSONL as transcript truth, no canonical chat/turn store, source-of-truth typing from Pi session entries, and the replay/runbook oracle layer for M1. 
+ +## Card 1 — status: done + +### Objective + +Scripted captures for briefs #1–#3 bind each captured session to the brief being captured. + +### Acceptance Criteria + +✓ Capturing deterministic runs for briefs #1–#3 produces JSONL whose single `brunch.session_binding.data.specTitle` matches the corresponding brief title. +✓ Capturing deterministic runs for briefs #1–#3 does not reuse brief #1's spec id/title for later brief sessions. +✓ The committed `scripted-001` bundles for briefs #1–#3 are regenerated so their bindings, prompts, metadata, and projection summaries agree. + +### Verification Approach + +- Inner: fixture-capture tests — assert per-brief binding/title/id semantics and existing projection metadata parity. +- Middle: fixture replay/parity test over committed bundles — catches stale golden files after regeneration. +- Outer: human brief-quality review remains manual/qualitative after the scripted oracle confirms structural correctness. + +### Cross-cutting obligations + +- Captured runs are replay-regression seeds, not generic examples; they must not smuggle wrong workspace/spec/session state into M2. +- Keep fixture capture routed through the coordinator/RPC-selected session path; do not reintroduce an injected-coordinator-only capture path. +- Preserve JSONL transcript truth and avoid any parallel chat/turn representation. + +### Promotion checklist + +- [ ] Does this change a requirement? +- [ ] Does this create, retire, or invalidate an assumption? +- [ ] Does this make or reverse a non-trivial design decision? +- [ ] Does this establish a new seam-level invariant? +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? +- [ ] Does it cross more than two major seams? +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? +- [ ] Can you not name the containing seam or current rationale from the live docs? 
+ +## Card 2 — status: next + +### Objective + +Elicitation exchange projection treats Pi `toolResult` messages as prompt-side transcript entries using Pi-owned message roles. + +### Acceptance Criteria + +✓ A Pi-shaped `toolResult` message between assistant prompt and user response is included in the prompt-side entry range. +✓ The projector no longer checks for a non-canonical `tool` message role. +✓ The role helper preserves the Pi-owned message role union closely enough that future role-literal drift is type-visible instead of widened away unnecessarily. + +### Verification Approach + +- Inner: elicitation-exchange unit test with a real Pi-shaped tool-result entry. +- Inner: type-aware lint/build — verifies source-of-truth typing remains imported/projected from Pi rather than locally restated. +- Middle: existing RPC/session projection tests — ensure the handler still returns product-shaped exchanges from the selected session. + +### Cross-cutting obligations + +- Pi session entry and agent-message types own transcript shape; Brunch owns only the semantic `ElicitationExchange` projection. +- Preserve D13 prompt-side span semantics: system/assistant/tool-side entries since the previous user response belong to the prompt span. + +### Promotion checklist + +- [ ] Does this change a requirement? +- [ ] Does this create, retire, or invalidate an assumption? +- [ ] Does this make or reverse a non-trivial design decision? +- [ ] Does this establish a new seam-level invariant? +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? +- [ ] Does it cross more than two major seams? +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? +- [ ] Can you not name the containing seam or current rationale from the live docs? + +## Card 3 — status: next + +### Objective + +M1 manual verification is available as a single runbook command that prints expected outcomes and actual observed outputs. 
+ +### Acceptance Criteria + +✓ `./runbooks/verify-m1.sh` runs from the repository root and prints clearly separated `Expected outputs` and `Actual outputs` sections. +✓ The runbook checks per-brief binding/title alignment, committed bundle metadata/projection parity, print-mode smoke output, and RPC `workspace.snapshot` / `session.elicitationExchanges` smoke output. +✓ The runbook exits nonzero on structural failures while still printing enough actual output for quick human diagnosis. +✓ The runbook includes explicit human-review prompts for qualitative judgments that cannot be fully automated yet, such as brief quality and golden-capture representativeness. + +### Verification Approach + +- Inner: runbook script smoke from tests or a direct command in the build slice — proves the command executes in a clean repo checkout. +- Middle: runbook oracle — checks durable artifacts and projection/RPC surfaces, matching SPEC §Runbook Oracle Design. +- Outer: human uses the runbook output to approve fixture representativeness and product shape. + +### Cross-cutting obligations + +- Runbooks are executable oracles over canonical stores/projection handlers, not ad hoc manual notes. +- Keep the output product-shaped; do not turn the runbook into a generic file dump. +- The runbook should aid manual judgment without pretending to automate LLM/brief quality review completely. + +### Promotion checklist + +- [ ] Does this change a requirement? +- [ ] Does this create, retire, or invalidate an assumption? +- [ ] Does this make or reverse a non-trivial design decision? +- [ ] Does this establish a new seam-level invariant? +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? +- [ ] Does it cross more than two major seams? +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? +- [ ] Can you not name the containing seam or current rationale from the live docs? 
diff --git a/memory/PLAN.md b/memory/PLAN.md index 4b001e13..888ffed2 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -68,7 +68,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** [FE-735](https://linear.app/hash/issue/FE-735/mode-shell-and-fixture-driver-m1) (sub-issue of FE-702) - **Branch:** `ln/fe-735-mode-shell-fixture-driver` (stacked on `ln/fe-729-walking-skeleton`) - **Kind:** structural -- **Status:** done +- **Status:** review-fix - **Objective:** Add `--mode print` and `--mode rpc` transport dispatchers over the same Brunch host and named RPC method-family handlers; land the agent-as-user JSON-RPC stdio driver; prove transcript projection of elicitation exchanges; and capture the first replay-regression fixtures for at least briefs #1–#3. For M1, print mode is a snapshot renderer/proof-of-life, not a single-turn agent run. - **Why now / unlocks:** Proves D5-L (JSON-RPC primary) and unlocks the fixture-driven feedback loop. Without this milestone, every downstream milestone has only manual TUI evidence. - **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three briefs from BEHAVIORAL_KERNELS.md are captured. @@ -76,7 +76,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. 
Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. - **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) -- **Current execution pointer:** complete; proceed to `jsonl-session-viability`. +- **Current execution pointer:** review-fix queue in `memory/CARDS.md`; complete those cards before proceeding to `jsonl-session-viability`. ### jsonl-session-viability diff --git a/src/fixture-capture.test.ts b/src/fixture-capture.test.ts index cebf4adf..9f9ddf18 100644 --- a/src/fixture-capture.test.ts +++ b/src/fixture-capture.test.ts @@ -205,6 +205,13 @@ describe("fixture capture", () => { }) expect(results).toHaveLength(3) + const seenSpecIds = new Set() + const expectedTitlesByBriefId = new Map([ + ["brief-001", "Team knowledge cards"], + ["brief-002", "Approval workflow for vendor invoices"], + ["brief-003", "Project dashboard rollups"], + ]) + for (const result of results) { const metadata = JSON.parse(await readFile(result.metaFile, "utf8")) as { briefId: string @@ -222,6 +229,10 @@ describe("fixture capture", () => { coherence: { status: string } } } + const jsonl = await readJsonl(result.jsonlFile) + const binding = singleSessionBinding(jsonl) + const expectedTitle = expectedTitlesByBriefId.get(metadata.briefId) + expect(metadata.runId).toBe("scripted-001") expect(metadata.driver.mode).toBe("scripted-deterministic") 
expect(metadata.session.id).toEqual(expect.any(String)) @@ -235,9 +246,43 @@ describe("fixture capture", () => { graph: { status: "deferred" }, coherence: { status: "deferred" }, }) - expect(await readFile(result.jsonlFile, "utf8")).toContain( + expect(expectedTitle).toBeDefined() + expect(binding.data.specTitle).toBe(expectedTitle) + expect(jsonl.map((entry) => JSON.stringify(entry)).join("\n")).toContain( metadata.briefId, ) + expect(seenSpecIds.has(binding.data.specId)).toBe(false) + seenSpecIds.add(binding.data.specId) } }) }) + +async function readJsonl(file: string): Promise { + return (await readFile(file, "utf8")) + .split("\n") + .filter((line) => line.trim().length > 0) + .map((line) => JSON.parse(line) as unknown) +} + +interface SessionBindingProjection { + data: { + specId: string + specTitle: string + } +} + +function singleSessionBinding(entries: unknown[]): SessionBindingProjection { + const bindings = entries.filter( + (entry): entry is SessionBindingProjection => + typeof entry === "object" && + entry !== null && + (entry as { customType?: unknown }).customType === + "brunch.session_binding" && + typeof (entry as { data?: { specId?: unknown } }).data?.specId === + "string" && + typeof (entry as { data?: { specTitle?: unknown } }).data?.specTitle === + "string", + ) + expect(bindings).toHaveLength(1) + return bindings[0]! 
+} diff --git a/src/fixture-capture.ts b/src/fixture-capture.ts index 1dda74a1..f84ea101 100644 --- a/src/fixture-capture.ts +++ b/src/fixture-capture.ts @@ -143,15 +143,10 @@ async function openScriptedBriefSession( coordinator: WorkspaceSessionCoordinator, brief: FixtureBrief, ) { - const existing = await coordinator.openExisting() - if (existing.status === "ready") { - const next = await coordinator.createNewSessionForCurrentSpec() - if (next.status === "ready") { - return next - } - } - - return coordinator.startOrCreate({ specTitle: brief.title }) + return coordinator.startOrCreate({ + specTitle: brief.title, + createNewSpec: true, + }) } async function callRpc( diff --git a/src/workspace-session-coordinator.ts b/src/workspace-session-coordinator.ts index d50042c7..91fd20df 100644 --- a/src/workspace-session-coordinator.ts +++ b/src/workspace-session-coordinator.ts @@ -76,6 +76,7 @@ export interface WorkspaceSessionCoordinator { openExisting(): Promise startOrCreate(options?: { specTitle?: string + createNewSpec?: boolean }): Promise createNewSessionForCurrentSpec(): Promise bindCurrentSpecToSession( @@ -119,10 +120,14 @@ class FileWorkspaceSessionCoordinator implements WorkspaceSessionCoordinator { async startOrCreate(options?: { specTitle?: string + createNewSpec?: boolean }): Promise { await ensureWorkspaceDirs(this.#cwd) const existing = await readWorkspaceState(this.#cwd) - const spec = existing?.currentSpec ?? createSpec(options?.specTitle) + const spec = + existing && !options?.createNewSpec + ? 
existing.currentSpec + : createSpec(options?.specTitle) const session = await createBoundSession(this.#cwd, spec) await writeCurrentWorkspaceState(this.#cwd, spec, session.file) return readyState(this.#cwd, spec, session) From 3ac5ccd30474f1d146b7188fe3119858ae7d865f Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:28:28 +0200 Subject: [PATCH 16/18] FE-735: Project tool results as prompt entries --- memory/CARDS.md | 2 +- src/elicitation-exchange.test.ts | 25 +++++++++++++++++++++++++ src/elicitation-exchange.ts | 6 ++++-- 3 files changed, 30 insertions(+), 3 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index 11feada2..e73a1965 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -43,7 +43,7 @@ Scripted captures for briefs #1–#3 bind each captured session to the brief bei - [ ] Is this the first touch in an unfamiliar seam from a fresh thread? - [ ] Can you not name the containing seam or current rationale from the live docs? -## Card 2 — status: next +## Card 2 — status: done ### Objective diff --git a/src/elicitation-exchange.test.ts b/src/elicitation-exchange.test.ts index 30838534..bbd60978 100644 --- a/src/elicitation-exchange.test.ts +++ b/src/elicitation-exchange.test.ts @@ -21,6 +21,17 @@ const structuredPrompt = { customType: "brunch.elicitation_prompt", data: { choices: ["A", "B"] }, } +const toolResult = { + id: "t1", + type: "message", + message: { + role: "toolResult", + toolCallId: "call-1", + toolName: "read", + content: [{ type: "text", text: "tool output" }], + isError: false, + }, +} const user = { id: "u1", type: "message", @@ -86,6 +97,20 @@ describe("elicitation exchange projection", () => { }) }) + it("includes Pi toolResult messages on the prompt side", () => { + const projection = projectElicitationExchanges([ + assistant, + toolResult, + user, + ]) + + expect(projection.exchanges[0]?.promptEntryIds).toEqual(["a1", "t1"]) + expect(projection.exchanges[0]?.promptRange).toEqual({ + start: "a1", + end: "t1", + 
}) + }) + it("returns an explicit empty/open shape for incomplete transcripts", () => { expect(projectElicitationExchanges([])).toEqual({ status: "empty", diff --git a/src/elicitation-exchange.ts b/src/elicitation-exchange.ts index 2e83dbc7..99d83300 100644 --- a/src/elicitation-exchange.ts +++ b/src/elicitation-exchange.ts @@ -125,7 +125,7 @@ function isPromptSideEntry(entry: SessionEntry): boolean { } const role = roleOf(entry) - return role === "assistant" || role === "system" || role === "tool" + return role === "assistant" || role === "toolResult" } function isResponseSideEntry(entry: SessionEntry): boolean { @@ -144,7 +144,9 @@ function isCustomTranscriptEntry( return entry.type === "custom" || entry.type === "custom_message" } -function roleOf(entry: SessionEntry): string | undefined { +function roleOf( + entry: SessionEntry, +): SessionMessageEntry["message"]["role"] | undefined { if (isMessageEntry(entry)) { return entry.message.role } From 436db26d95ce937b2e28a6c713736679a469516c Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:30:59 +0200 Subject: [PATCH 17/18] FE-735: Add M1 verification runbook --- memory/CARDS.md | 114 ----------------------------------- memory/PLAN.md | 6 +- runbooks/verify-m1.sh | 135 ++++++++++++++++++++++++++++++++++++++++++ src/runbook.test.ts | 25 ++++++++ 4 files changed, 163 insertions(+), 117 deletions(-) delete mode 100644 memory/CARDS.md create mode 100755 runbooks/verify-m1.sh create mode 100644 src/runbook.test.ts diff --git a/memory/CARDS.md b/memory/CARDS.md deleted file mode 100644 index e73a1965..00000000 --- a/memory/CARDS.md +++ /dev/null @@ -1,114 +0,0 @@ -# Scope cards — FE-735 M1 review fixes - -## Orientation - -- Containing seam: M1 mode shell / fixture driver, especially the JSON-RPC fixture capture path and elicitation-exchange projection over Pi JSONL. 
-- Frontier item: `mode-shell-and-fixture-driver` / FE-735; these are review-fix slices inside the same frontier, not new Linear or branch units. -- Volatile state: `HANDOFF.md` remains transfer-only and untracked; the completed M1 queue was deleted, but review found blocking correctness issues before M1 tie-off. -- Main open risk: the first golden fixture captures can look structurally valid while encoding the wrong product state, so fixes must assert semantic binding/projection parity rather than only file existence. -- Cross-cutting obligations: preserve thin named RPC methods over projection handlers, Pi JSONL as transcript truth, no canonical chat/turn store, source-of-truth typing from Pi session entries, and the replay/runbook oracle layer for M1. - -## Card 1 — status: done - -### Objective - -Scripted captures for briefs #1–#3 bind each captured session to the brief being captured. - -### Acceptance Criteria - -✓ Capturing deterministic runs for briefs #1–#3 produces JSONL whose single `brunch.session_binding.data.specTitle` matches the corresponding brief title. -✓ Capturing deterministic runs for briefs #1–#3 does not reuse brief #1's spec id/title for later brief sessions. -✓ The committed `scripted-001` bundles for briefs #1–#3 are regenerated so their bindings, prompts, metadata, and projection summaries agree. - -### Verification Approach - -- Inner: fixture-capture tests — assert per-brief binding/title/id semantics and existing projection metadata parity. -- Middle: fixture replay/parity test over committed bundles — catches stale golden files after regeneration. -- Outer: human brief-quality review remains manual/qualitative after the scripted oracle confirms structural correctness. - -### Cross-cutting obligations - -- Captured runs are replay-regression seeds, not generic examples; they must not smuggle wrong workspace/spec/session state into M2. 
-- Keep fixture capture routed through the coordinator/RPC-selected session path; do not reintroduce an injected-coordinator-only capture path. -- Preserve JSONL transcript truth and avoid any parallel chat/turn representation. - -### Promotion checklist - -- [ ] Does this change a requirement? -- [ ] Does this create, retire, or invalidate an assumption? -- [ ] Does this make or reverse a non-trivial design decision? -- [ ] Does this establish a new seam-level invariant? -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? -- [ ] Does it cross more than two major seams? -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? -- [ ] Can you not name the containing seam or current rationale from the live docs? - -## Card 2 — status: done - -### Objective - -Elicitation exchange projection treats Pi `toolResult` messages as prompt-side transcript entries using Pi-owned message roles. - -### Acceptance Criteria - -✓ A Pi-shaped `toolResult` message between assistant prompt and user response is included in the prompt-side entry range. -✓ The projector no longer checks for a non-canonical `tool` message role. -✓ The role helper preserves the Pi-owned message role union closely enough that future role-literal drift is type-visible instead of widened away unnecessarily. - -### Verification Approach - -- Inner: elicitation-exchange unit test with a real Pi-shaped tool-result entry. -- Inner: type-aware lint/build — verifies source-of-truth typing remains imported/projected from Pi rather than locally restated. -- Middle: existing RPC/session projection tests — ensure the handler still returns product-shaped exchanges from the selected session. - -### Cross-cutting obligations - -- Pi session entry and agent-message types own transcript shape; Brunch owns only the semantic `ElicitationExchange` projection. 
-- Preserve D13 prompt-side span semantics: system/assistant/tool-side entries since the previous user response belong to the prompt span. - -### Promotion checklist - -- [ ] Does this change a requirement? -- [ ] Does this create, retire, or invalidate an assumption? -- [ ] Does this make or reverse a non-trivial design decision? -- [ ] Does this establish a new seam-level invariant? -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? -- [ ] Does it cross more than two major seams? -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? -- [ ] Can you not name the containing seam or current rationale from the live docs? - -## Card 3 — status: next - -### Objective - -M1 manual verification is available as a single runbook command that prints expected outcomes and actual observed outputs. - -### Acceptance Criteria - -✓ `./runbooks/verify-m1.sh` runs from the repository root and prints clearly separated `Expected outputs` and `Actual outputs` sections. -✓ The runbook checks per-brief binding/title alignment, committed bundle metadata/projection parity, print-mode smoke output, and RPC `workspace.snapshot` / `session.elicitationExchanges` smoke output. -✓ The runbook exits nonzero on structural failures while still printing enough actual output for quick human diagnosis. -✓ The runbook includes explicit human-review prompts for qualitative judgments that cannot be fully automated yet, such as brief quality and golden-capture representativeness. - -### Verification Approach - -- Inner: runbook script smoke from tests or a direct command in the build slice — proves the command executes in a clean repo checkout. -- Middle: runbook oracle — checks durable artifacts and projection/RPC surfaces, matching SPEC §Runbook Oracle Design. -- Outer: human uses the runbook output to approve fixture representativeness and product shape. 
- -### Cross-cutting obligations - -- Runbooks are executable oracles over canonical stores/projection handlers, not ad hoc manual notes. -- Keep the output product-shaped; do not turn the runbook into a generic file dump. -- The runbook should aid manual judgment without pretending to automate LLM/brief quality review completely. - -### Promotion checklist - -- [ ] Does this change a requirement? -- [ ] Does this create, retire, or invalidate an assumption? -- [ ] Does this make or reverse a non-trivial design decision? -- [ ] Does this establish a new seam-level invariant? -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? -- [ ] Does it cross more than two major seams? -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? -- [ ] Can you not name the containing seam or current rationale from the live docs? diff --git a/memory/PLAN.md b/memory/PLAN.md index 888ffed2..bfa44635 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -68,15 +68,15 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** [FE-735](https://linear.app/hash/issue/FE-735/mode-shell-and-fixture-driver-m1) (sub-issue of FE-702) - **Branch:** `ln/fe-735-mode-shell-fixture-driver` (stacked on `ln/fe-729-walking-skeleton`) - **Kind:** structural -- **Status:** review-fix +- **Status:** done - **Objective:** Add `--mode print` and `--mode rpc` transport dispatchers over the same Brunch host and named RPC method-family handlers; land the agent-as-user JSON-RPC stdio driver; prove transcript projection of elicitation exchanges; and capture the first replay-regression fixtures for at least briefs #1–#3. For M1, print mode is a snapshot renderer/proof-of-life, not a single-turn agent run. - **Why now / unlocks:** Proves D5-L (JSON-RPC primary) and unlocks the fixture-driven feedback loop. Without this milestone, every downstream milestone has only manual TUI evidence. 
- **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three briefs from BEHAVIORAL_KERNELS.md are captured. -- **Verification:** Inner — verify gate plus projection-handler unit tests for elicitation exchange ranges. Middle — deterministic first captured run, stdio RPC handler contract tests, and replay-regression fixture(s) asserting transcript reproduction/projection parity (SPEC §Oracle Strategy by Loop Tier). Outer — the three-layer fixture model is established in skeleton form here; property and adversarial layers come online as later milestones supply graph/coherence substrates. +- **Verification:** Inner — verify gate plus projection-handler unit tests for elicitation exchange ranges. Middle — deterministic first captured run, stdio RPC handler contract tests, replay-regression fixture(s) asserting transcript reproduction/projection parity, and `./runbooks/verify-m1.sh` for store/projection/manual-smoke evidence (SPEC §Oracle Strategy by Loop Tier). Outer — the three-layer fixture model is established in skeleton form here; property and adversarial layers come online as later milestones supply graph/coherence substrates; brief quality and golden-capture representativeness remain explicit human review prompts in the runbook. - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. 
Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. - **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) -- **Current execution pointer:** review-fix queue in `memory/CARDS.md`; complete those cards before proceeding to `jsonl-session-viability`. +- **Current execution pointer:** complete after M1 review fixes; proceed to `jsonl-session-viability`. ### jsonl-session-viability diff --git a/runbooks/verify-m1.sh b/runbooks/verify-m1.sh new file mode 100755 index 00000000..1eff15c1 --- /dev/null +++ b/runbooks/verify-m1.sh @@ -0,0 +1,135 @@ +#!/usr/bin/env bash +set -u -o pipefail + +ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +TSX_LOADER="$ROOT/node_modules/tsx/dist/loader.mjs" +export ROOT TSX_LOADER +cd "$ROOT" || exit 1 + +failures=0 +TMP_WORKSPACE="" + +record_failure() { + echo "FAIL: $*" + failures=$((failures + 1)) +} + +run_check() { + local label="$1" + shift + echo "\n## $label" + if "$@"; then + echo "PASS: $label" + else + local status=$? + record_failure "$label exited $status" + fi +} + +cleanup() { + if [[ -n "$TMP_WORKSPACE" && -d "$TMP_WORKSPACE" ]]; then + rm -rf "$TMP_WORKSPACE" + fi +} +trap cleanup EXIT + +echo "# M1 mode shell and fixture driver runbook" +echo +echo "## Expected outputs" +echo "- Each committed scripted bundle has one brunch.session_binding whose specTitle matches its brief title." 
+echo "- Each committed bundle metadata projection summary matches projection from its JSONL transcript." +echo "- Print mode emits a product-shaped workspace snapshot for a selected runbook spec." +echo "- RPC workspace.snapshot and session.elicitationExchanges return product-shaped JSON-RPC results." +echo "- Human review remains responsible for brief quality and golden-capture representativeness." +echo +echo "## Actual outputs" + +run_check "Per-brief binding/title alignment and metadata/projection parity" \ + node --import "$TSX_LOADER" --input-type=module <<'NODE' +import { readFile } from "node:fs/promises" +import { join } from "node:path" +import { loadBriefLibrary } from "./src/brief-library.ts" +import { loadJsonlTranscriptEntries, projectElicitationExchanges } from "./src/elicitation-exchange.ts" + +const briefs = await loadBriefLibrary(".brunch-fixtures/briefs") +const expected = new Map(briefs.map((brief) => [brief.id, brief.title])) +const briefIds = ["brief-001", "brief-002", "brief-003"] +const seenSpecIds = new Set() + +for (const briefId of briefIds) { + const runId = "scripted-001" + const runDir = join(".brunch-fixtures", briefId, runId) + const jsonlFile = join(runDir, `${runId}.jsonl`) + const metaFile = join(runDir, `${runId}.meta.json`) + const entries = await loadJsonlTranscriptEntries(jsonlFile) + const bindings = entries.filter((entry) => entry?.customType === "brunch.session_binding") + if (bindings.length !== 1) { + throw new Error(`${briefId}: expected one session binding, found ${bindings.length}`) + } + const binding = bindings[0] + const expectedTitle = expected.get(briefId) + if (binding.data.specTitle !== expectedTitle) { + throw new Error(`${briefId}: binding title ${binding.data.specTitle} did not match ${expectedTitle}`) + } + if (seenSpecIds.has(binding.data.specId)) { + throw new Error(`${briefId}: reused spec id ${binding.data.specId}`) + } + seenSpecIds.add(binding.data.specId) + + const metadata = JSON.parse(await 
readFile(metaFile, "utf8")) + const projection = projectElicitationExchanges(entries) + const actualSummary = { + status: projection.status, + exchangeCount: projection.exchanges.length, + openPrompt: projection.openPrompt !== null, + } + if (JSON.stringify(actualSummary) !== JSON.stringify(metadata.projectionSummary)) { + throw new Error(`${briefId}: projection summary mismatch`) + } + if (metadata.artifacts.graph.status !== "deferred" || metadata.artifacts.coherence.status !== "deferred") { + throw new Error(`${briefId}: graph/coherence artifacts should be deferred in M1`) + } + console.log(`${briefId}: ${binding.data.specTitle}; exchanges=${actualSummary.exchangeCount}; graph=${metadata.artifacts.graph.status}; coherence=${metadata.artifacts.coherence.status}`) +} +NODE + +TMP_WORKSPACE="$(mktemp -d "${TMPDIR:-/tmp}/brunch-m1-runbook.XXXXXX")" +export TMP_WORKSPACE +node --import "$TSX_LOADER" --input-type=module <<'NODE' +import { createWorkspaceSessionCoordinator } from "./src/workspace-session-coordinator.ts" + +const cwd = process.env.TMP_WORKSPACE +const coordinator = createWorkspaceSessionCoordinator({ cwd }) +const workspace = await coordinator.startOrCreate({ specTitle: "M1 runbook smoke" }) +workspace.session.manager.appendCustomMessageEntry( + "brunch.elicitation_prompt", + "Runbook prompt: confirm the M1 mode shell is product-shaped.", + true, +) +workspace.session.manager.appendMessage({ role: "user", content: "Runbook response" }) +await coordinator.bindCurrentSpecToSession(workspace.session.manager) +NODE + +run_check "Print-mode smoke output" \ + bash -c 'cd "$TMP_WORKSPACE" && node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" --mode print | tee "$TMP_WORKSPACE/print.out" && grep -q "M1 runbook smoke" "$TMP_WORKSPACE/print.out"' + +run_check "RPC workspace.snapshot smoke output" \ + bash -c 'cd "$TMP_WORKSPACE" && printf "%s\n" "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"workspace.snapshot\"}" | node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" 
--mode rpc | tee "$TMP_WORKSPACE/workspace-rpc.out" && grep -q "M1 runbook smoke" "$TMP_WORKSPACE/workspace-rpc.out" && grep -q "\"session\"" "$TMP_WORKSPACE/workspace-rpc.out"' + +run_check "RPC session.elicitationExchanges smoke output" \ + bash -c 'cd "$TMP_WORKSPACE" && printf "%s\n" "{\"jsonrpc\":\"2.0\",\"id\":2,\"method\":\"session.elicitationExchanges\"}" | node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" --mode rpc | tee "$TMP_WORKSPACE/exchanges-rpc.out" && grep -q "\"status\":\"ready\"" "$TMP_WORKSPACE/exchanges-rpc.out" && grep -q "promptEntryIds" "$TMP_WORKSPACE/exchanges-rpc.out"' + +echo +echo "## Human review prompts" +echo "- Brief quality: Do briefs #1-#3 read like useful product briefs rather than implementation-shaped test fixtures?" +echo "- Golden-capture representativeness: Does at least one scripted-001 JSONL/meta bundle look plausible as a replay seed?" +echo "- Product shape: Do print/RPC outputs expose workspace/session/exchange concepts rather than generic file dumps?" + +if [[ "$failures" -gt 0 ]]; then + echo + echo "Runbook failed with $failures structural failure(s)." + exit 1 +fi + +echo +echo "Runbook structural checks passed; complete the human review prompts above before final M1 acceptance." 
diff --git a/src/runbook.test.ts b/src/runbook.test.ts new file mode 100644 index 00000000..18c2ab56 --- /dev/null +++ b/src/runbook.test.ts @@ -0,0 +1,25 @@ +import { access } from "node:fs/promises" +import { constants } from "node:fs" +import { execFile } from "node:child_process" +import { promisify } from "node:util" +import { describe, expect, it } from "vitest" + +const execFileAsync = promisify(execFile) + +describe("M1 runbook", () => { + it("runs and prints expected plus actual outputs", async () => { + await access("runbooks/verify-m1.sh", constants.X_OK) + + const { stdout } = await execFileAsync("./runbooks/verify-m1.sh", { + timeout: 120_000, + maxBuffer: 1024 * 1024 * 4, + }) + + expect(stdout).toContain("Expected outputs") + expect(stdout).toContain("Actual outputs") + expect(stdout).toContain("Human review prompts") + expect(stdout).toContain("brief-001") + expect(stdout).toContain("workspace.snapshot") + expect(stdout).toContain("session.elicitationExchanges") + }) +}) From e32eb02b1a1992602c162681f9e850838bd883dd Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 12:56:27 +0200 Subject: [PATCH 18/18] FE-735: Reconcile M1 fixture verification --- docs/architecture/fixture-strategy.md | 33 ++++++++++++--------------- memory/PLAN.md | 4 ++-- memory/SPEC.md | 4 ++-- runbooks/verify-m1.sh | 8 +++---- 4 files changed, 23 insertions(+), 26 deletions(-) diff --git a/docs/architecture/fixture-strategy.md b/docs/architecture/fixture-strategy.md index 021b6e21..94c24cbe 100644 --- a/docs/architecture/fixture-strategy.md +++ b/docs/architecture/fixture-strategy.md @@ -98,23 +98,20 @@ Briefs are short, human-readable, and curated. 
The run artefacts are the heavy d { "schemaVersion": 1, "id": "brief-002", - "title": "Offline Kanban Editing", - "kernelTags": [ - "state-lifecycle", - "containment-topology", - "concurrency-collaboration", - "resource-accounting", - "derived-data-views", - "temporal-history" + "title": "Approval workflow for vendor invoices", + "kernelTags": ["state-lifecycle", "authority-capability"], + "productBrief": "A finance team needs invoices to move from draft to submitted to approved or rejected. Only budget owners can approve, and rejected invoices can be revised and resubmitted.", + "expectedStructuralObservations": [ + "Invoice states and legal transitions must be explicit.", + "Approval authority depends on the budget owner role." ], - "productBrief": "We want to build a Kanban tool that engineering teams can use offline. Multiple people edit the same board. Cards move through workflow states. Some columns have WIP limits.", "scriptedUserNotes": [ - "I care about what happens when two people move the same card offline.", - "Some columns should enforce WIP limits." + "Rejected invoices are not terminal; they can go back to draft.", + "Approved invoices should not be edited without reopening the workflow." ], "deferredExpectations": { - "graph": "Later graph fixtures should cover lifecycle states, containment, and conflict decisions.", - "coherence": "Later coherence checks should flag unresolved offline-edit conflict policy." + "graph": "Later graph fixtures should capture lifecycle states, transitions, and authority predicates.", + "coherence": "Later coherence checks should flag contradictory terminality claims." } } ``` @@ -123,15 +120,15 @@ Briefs are short, human-readable, and curated. 
The run artefacts are the heavy d | # | Brief | Active kernels (expected) | Stretches | | --- | --- | --- | --- | -| 1 | **Offline Kanban** | State/lifecycle, containment, concurrency, resource accounting, derived data, temporal | Kernel doc's flagship; broad behavioral coverage | -| 2 | **Role-based document sharing** | Identity, authority, containment, temporal (revocation), observability, change/migration | Authority cascades; nested inheritance | -| 3 | **Subscription billing** | Resource accounting, state/lifecycle, transactions, external effects, error/recovery, temporal | Transaction + external-effects boundary; assurance-level pressure | +| 1 | **Team knowledge cards** | Identity/reference, containment topology | Stable identity versus mutable titles; links that must survive renames | +| 2 | **Approval workflow for vendor invoices** | State/lifecycle, authority/capability | Non-terminal rejection, reopening approved work, role-gated transitions | +| 3 | **Project dashboard rollups** | Derived-data views, temporal history | Stale projections, source evidence for rollups and decisions | | 4 | **Calendar scheduling with notifications** | Concurrency (overlap), authority, external effects, error/recovery | External-effects + recovery semantics | | 5 | **Knowledge-graph editor (meta)** | Identity, containment, change/migration, observability, validation | Brunch describing itself; sanity check on the modelling | | 6 | **Verified sort algorithm** | Validation/normalization, formal properties only | Narrow but stretches `formal_property` requirements, `Obligation` nodes, `proof` / `model_check` validation methods, and assurance-level computation | | 7 | **"Notion meets Linear meets Slack"** | Forces scope-boundary clarification before any kernel can engage | Adversarial; stresses offer-first interaction and scope-card affordance | -Briefs 1–3 are already worked out in 
[`BEHAVIORAL_KERNELS.md`](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/BEHAVIORAL_KERNELS.md); they should be the first three captured. +Briefs #1–#3 are the first curated M1 seeds under `.brunch-fixtures/briefs/`. They are intentionally thin, human-reviewed product briefs for transcript/projection replay, not final evidence that Brunch's elicitation interaction logic or knowledge-flow model is correct. ### Brief #7 expectations — "Notion meets Linear meets Slack" @@ -217,7 +214,7 @@ The fixture harness threads through the existing milestone ladder; it does not n | Milestone | Fixture work | | --- | --- | | **M0** (walking skeleton + TUI) | Begin curating briefs as JSON. Manually-driven runs at the TUI produce first JSONL captures. Briefs cost nothing to write; the longer the library, the more leverage later. | -| **M1** (mode shell: print + rpc) | Stand up the agent-as-user harness against `brunch --mode rpc`. First **replay regression** fixtures land here, asserting transcript reproduction only. Graph plane does not yet exist; assertions are transcript-shaped. | +| **M1** (mode shell: print + rpc) | Stand up the first fixture-capture path against `brunch --mode rpc`. First **replay regression** fixtures land here, asserting transcript reproduction/projection only. Graph plane does not yet exist; assertions are transcript-shaped, and scripted exchange shape should not be treated as final elicitation behavior. | | **M2** (JSONL session viability) | The captured transcripts *are* the JSONL session files. The fixture library's reproducibility is part of M2's evidence. | | **M3** (web shell) | The same offer-response fixtures drive the web client through its WebSocket; free coverage of the web shell against known-good runs. | | **M4** (graph data plane) | Graph snapshots become part of the run-fixture bundle. The first **property regression** assertions land here. 
| diff --git a/memory/PLAN.md b/memory/PLAN.md index bfa44635..ba6f764f 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -71,7 +71,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Status:** done - **Objective:** Add `--mode print` and `--mode rpc` transport dispatchers over the same Brunch host and named RPC method-family handlers; land the agent-as-user JSON-RPC stdio driver; prove transcript projection of elicitation exchanges; and capture the first replay-regression fixtures for at least briefs #1–#3. For M1, print mode is a snapshot renderer/proof-of-life, not a single-turn agent run. - **Why now / unlocks:** Proves D5-L (JSON-RPC primary) and unlocks the fixture-driven feedback loop. Without this milestone, every downstream milestone has only manual TUI evidence. -- **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three briefs from BEHAVIORAL_KERNELS.md are captured. +- **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three curated briefs are captured. - **Verification:** Inner — verify gate plus projection-handler unit tests for elicitation exchange ranges. 
Middle — deterministic first captured run, stdio RPC handler contract tests, replay-regression fixture(s) asserting transcript reproduction/projection parity, and `./runbooks/verify-m1.sh` for store/projection/manual-smoke evidence (SPEC §Oracle Strategy by Loop Tier). Outer — the three-layer fixture model is established in skeleton form here; property and adversarial layers come online as later milestones supply graph/coherence substrates; brief quality and golden-capture representativeness remain explicit human review prompts in the runbook. - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. - **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L @@ -259,7 +259,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ## Recently Completed -- 2026-05-21 `mode-shell-and-fixture-driver` — Done: print and RPC transport modes boot through the Brunch host; named `workspace.snapshot` and `session.elicitationExchanges` handlers project coordinator-selected session state; fixture capture copies the same selected Pi JSONL session projected by RPC; brief metadata is Brunch-owned and marks graph/coherence artifacts deferred; briefs #1–#3 have scripted deterministic replay bundles under `.brunch-fixtures//scripted-001/`. 
Verified: `npm run verify`, RPC/print parity smoke, exchange projection tests, fixture replay/projection parity tests. Watch: M2 should use these captured transcripts as JSONL reload evidence without turning them into a parallel chat/turn store. +- 2026-05-21 `mode-shell-and-fixture-driver` — Done: print and RPC transport modes boot through the Brunch host; named `workspace.snapshot` and `session.elicitationExchanges` handlers project coordinator-selected session state; fixture capture copies the same selected Pi JSONL session projected by RPC; brief metadata is Brunch-owned and marks graph/coherence artifacts deferred; briefs #1–#3 have scripted deterministic replay bundles under `.brunch-fixtures//scripted-001/`. Verified: `npm run verify`, RPC/print parity smoke, exchange projection tests, fixture replay/projection parity tests, `./runbooks/verify-m1.sh`, and human inspection that briefs/captures/product-shaped outputs are good on their current terms. Watch: M2 should use these captured transcripts as JSONL reload evidence without turning them into a parallel chat/turn store; later elicitation work must revisit the encoded interaction logic, expectations, and knowledge-flow assumptions rather than treating the scripted M1 exchange shape as final product behavior. - 2026-05-20 `walking-skeleton` — Done: Brunch now launches through a real pi-backed TUI boot path with coordinator-first spec gating, project-local `.brunch/` state, self-describing Pi JSONL sessions via exactly one `brunch.session_binding`, same-spec `/new` coverage, persistent cwd / spec / phase / chat-mode chrome through pi's extension widget seam, a bin shim, store-only runbook checker, and type-ownership hardening against Pi exported types. Verified: `npm run verify`, manual TUI smoke in a scratch project, automated TUI/coordinator tests, store-only runbook oracle, and manual file inspection. Watch: M1 should reuse the coordinator/session truth rather than recreating boot/session mechanics. 
 - 2026-05-20 `pre-poc-archive-and-reseed` — Done: razed pre-POC implementation, archived legacy docs and planning memory under `archive/`, tagged `next-baseline`, reseeded `memory/SPEC.md` and `memory/PLAN.md` from the three canonical POC architecture docs. Verified: `git log --oneline` shows three clean buckets; `archive/` contains all prior material. Watch: Phase 3 infra bootstrap is folded into `walking-skeleton`, not a separate frontier.
diff --git a/memory/SPEC.md b/memory/SPEC.md
index dfb9a59c..0ab87889 100644
--- a/memory/SPEC.md
+++ b/memory/SPEC.md
@@ -327,7 +327,7 @@ The first required runbook is M0: after manual TUI interaction, a checker proves
 
 ### Design Notes
 
-- **Deterministic before generative.** M1 should prefer a deterministic or tightly scripted user-agent path for the first captured run before relying on LLM persona variance. Generative/adversarial probes come after the transcript and fixture substrate is trusted.
+- **Deterministic before generative.** M1 should prefer a deterministic or tightly scripted user-agent path for the first captured run before relying on LLM persona variance. Generative/adversarial probes come after the transcript and fixture substrate is trusted. M1 scripted captures prove the transport/projection/fixture substrate on its current terms; they do not settle the final elicitation interaction logic, knowledge flow, or prompt/response expectation model.
 - **Projection handlers are oracles, not stores.** Read/subscription tests should prove handlers reconstruct truth from Pi JSONL, `.brunch/state.json`, or SQLite graph/change log; they should not introduce a canonical view-store just for testing.
 - **Behavioral quality boundary.** Inner/middle loops prove structural validity, durable state, invariants, and expected graph/property coverage. “Good interview”, “good question”, and “coherent UX feel” remain outer-loop checklist/generative-fixture judgments until enough examples justify sharper metrics.
 - **Subscriptions are scoped for the POC.** Initial subscription oracles should prove initial snapshot plus ordered live updates. Reconnect/resume semantics are acknowledged but deferred unless a frontier explicitly depends on them.
@@ -337,7 +337,7 @@ The first required runbook is M0: after manual TUI interaction, a checker proves
 | Blind spot | Reason | Mitigation | Revisit trigger |
 | --- | --- | --- | --- |
 | Full TUI automation | Cost exceeds value before the product state seams are proven. | Manual checklist plus artifact/query runbook oracle. | Manual TUI steps become frequent/flaky or block CI confidence. |
-| LLM elicitation quality | No stable deterministic ground truth for “good interview” early in the POC. | Brief library, human-reviewed golden captures, adversarial probes, expected structural coverage. | Repeated fixture failures where structure passes but elicitation is judged poor. |
+| LLM elicitation quality and interaction flow | No stable deterministic ground truth for “good interview” early in the POC, and M1 scripted exchanges intentionally encode only a thin current exchange model. | Brief library, human-reviewed golden captures, adversarial probes, expected structural coverage, and later review of knowledge flow through real elicitation loops. | Repeated fixture failures where structure passes but elicitation is judged poor, or M2/M3 reveals that prompt/response markers, offer envelopes, or knowledge-flow assumptions need sharper transcript semantics. |
 | Subscription reconnect/resume | POC can prove snapshot + live update without hardening network recovery yet. | Contract tests for initial snapshot and ordered update sequence. | Web/RPC clients need robust reconnect semantics or long-running fixture runs expose drift. |
 | Performance and scale | Local POC graph/session sizes are small; premature budgets may distort design. | Keep exports/checkers text-native and simple; add budgets when slow tests appear. | `npm run verify` or fixture runs exceed acceptable local iteration time. |
 | Cross-platform terminal rendering | TUI chrome visuals may differ by terminal. | Test state derivation and keep manual smoke on primary dev environment. | Distribution target broadens or terminal rendering bugs recur. |
diff --git a/runbooks/verify-m1.sh b/runbooks/verify-m1.sh
index 1eff15c1..71e2f279 100755
--- a/runbooks/verify-m1.sh
+++ b/runbooks/verify-m1.sh
@@ -17,9 +17,9 @@ record_failure() {
 run_check() {
   local label="$1"
   shift
-  echo "\n## $label"
+  printf "\n## %s\n" "$label"
   if "$@"; then
-    echo "PASS: $label"
+    printf "\nPASS: %s\n" "$label"
   else
     local status=$?
     record_failure "$label exited $status"
@@ -114,10 +114,10 @@ run_check "Print-mode smoke output" \
   bash -c 'cd "$TMP_WORKSPACE" && node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" --mode print | tee "$TMP_WORKSPACE/print.out" && grep -q "M1 runbook smoke" "$TMP_WORKSPACE/print.out"'
 
 run_check "RPC workspace.snapshot smoke output" \
-  bash -c 'cd "$TMP_WORKSPACE" && printf "%s\n" "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"workspace.snapshot\"}" | node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" --mode rpc | tee "$TMP_WORKSPACE/workspace-rpc.out" && grep -q "M1 runbook smoke" "$TMP_WORKSPACE/workspace-rpc.out" && grep -q "\"session\"" "$TMP_WORKSPACE/workspace-rpc.out"'
+  bash -c 'cd "$TMP_WORKSPACE" && printf "%s\n" "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"workspace.snapshot\"}" | node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" --mode rpc > "$TMP_WORKSPACE/workspace-rpc.out" && node -e "const fs=require(\"node:fs\"); const path=process.env.TMP_WORKSPACE + \"/workspace-rpc.out\"; console.log(JSON.stringify(JSON.parse(fs.readFileSync(path, \"utf8\")), null, 2))" && grep -q "M1 runbook smoke" "$TMP_WORKSPACE/workspace-rpc.out" && grep -q "\"session\"" "$TMP_WORKSPACE/workspace-rpc.out"'
 
 run_check "RPC session.elicitationExchanges smoke output" \
-  bash -c 'cd "$TMP_WORKSPACE" && printf "%s\n" "{\"jsonrpc\":\"2.0\",\"id\":2,\"method\":\"session.elicitationExchanges\"}" | node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" --mode rpc | tee "$TMP_WORKSPACE/exchanges-rpc.out" && grep -q "\"status\":\"ready\"" "$TMP_WORKSPACE/exchanges-rpc.out" && grep -q "promptEntryIds" "$TMP_WORKSPACE/exchanges-rpc.out"'
+  bash -c 'cd "$TMP_WORKSPACE" && printf "%s\n" "{\"jsonrpc\":\"2.0\",\"id\":2,\"method\":\"session.elicitationExchanges\"}" | node --import "$TSX_LOADER" "$ROOT/src/brunch.ts" --mode rpc > "$TMP_WORKSPACE/exchanges-rpc.out" && node -e "const fs=require(\"node:fs\"); const path=process.env.TMP_WORKSPACE + \"/exchanges-rpc.out\"; console.log(JSON.stringify(JSON.parse(fs.readFileSync(path, \"utf8\")), null, 2))" && grep -q "\"status\":\"ready\"" "$TMP_WORKSPACE/exchanges-rpc.out" && grep -q "promptEntryIds" "$TMP_WORKSPACE/exchanges-rpc.out"'
 
 echo
 echo "## Human review prompts"