From 04f638da16eb97e8be87fa56214b4d075fb862c3 Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Thu, 21 May 2026 13:15:30 +0200
Subject: [PATCH 1/9] FE-736: Start JSONL session viability frontier

---
 memory/PLAN.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/memory/PLAN.md b/memory/PLAN.md
index ba6f764f..27bea827 100644
--- a/memory/PLAN.md
+++ b/memory/PLAN.md
@@ -81,9 +81,10 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta
 ### jsonl-session-viability
 
 - **Name:** JSONL session viability proof
-- **Linear:** unassigned
+- **Linear:** [FE-736](https://linear.app/hash/issue/FE-736/jsonl-session-viability-proof)
+- **Branch:** `ln/fe-736-jsonl-session-viability` (stacked on `ln/fe-735-mode-shell-fixture-driver`)
 - **Kind:** structural
-- **Status:** not-started
+- **Status:** active
 - **Objective:** Prove whether pi `SessionManager` JSONL in `.brunch/sessions/` is rich enough to carry raw assistant/user payloads, Brunch session binding (`brunch.session_binding`), structured elicitation prompt/response entries when needed, other custom entries (`brunch.lens_switch`, `brunch.side_task_result`, `worldUpdate`, `brunch.mention`, `brunch.mention_staleness_hint`), and session-scoped continuity metadata (`lastSeenLsn`, interest sets, compaction anchors) through reload.
 - **Why now / unlocks:** Validates A2-L and pins D6-L. If JSONL is insufficient, M2 produces a sharply scoped fallback proposal that all later milestones can plan against.
 - **Acceptance:** Round-trip reload of a captured session preserves raw payloads byte-equivalent (modulo timestamps); session binding and structured elicitation entries survive; elicitation exchanges can be re-projected from the active branch after reload; all named Brunch custom entries survive, including side-task-result delivery entries when present; continuity metadata survives. If any of these fail, the failure is sharply documented and a fallback path is proposed (project richer substrate / mirror JSONL into richer records / propose pi upstream change).

From ee97cfcee02000fb8a663ff1f7b97c9d69bec185 Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Thu, 21 May 2026 14:53:19 +0200
Subject: [PATCH 2/9] test: prove coordinator JSONL reload parity

---
 memory/CARDS.md                           | 176 ++++++++++++++++++++++
 memory/PLAN.md                            |   1 +
 src/workspace-session-coordinator.test.ts | 130 ++++++++++++++--
 3 files changed, 296 insertions(+), 11 deletions(-)
 create mode 100644 memory/CARDS.md

diff --git a/memory/CARDS.md b/memory/CARDS.md
new file mode 100644
index 00000000..3d2ccfd4
--- /dev/null
+++ b/memory/CARDS.md
@@ -0,0 +1,176 @@
+# Scope cards — FE-736 JSONL session viability
+
+## Orientation
+
+- Containing seam: transcript persistence over Pi `SessionManager` JSONL under `.brunch/sessions/`, with Brunch custom transcript entries and coordinator-created session binding layered on top.
+- Frontier item: `jsonl-session-viability` / FE-736 on `ln/fe-736-jsonl-session-viability`; these cards are slices inside the same frontier, not new Linear issues or branches.
+- Volatile state: no `HANDOFF.md` is present; M1 captures were human-reviewed as good structural replay seeds on their current terms, but they are not final evidence for elicitation interaction logic or knowledge flow.
+- Main open risk: Pi JSONL may preserve entries syntactically while Brunch accidentally consumes the wrong semantic path — file-linear entries instead of active branch, raw custom entries instead of LLM-context custom messages, or binding state that only works because the coordinator flushed through a private seam.
+- Cross-cutting obligations: preserve Pi JSONL as transcript truth unless proven insufficient; avoid a parallel canonical chat/turn store; validate `WorkspaceSessionCoordinator` sessions including `/new`; keep projection handlers as oracles over canonical stores; carry the replay/property/adversarial fixture strategy forward without treating scripted M1 exchange shape as final product behavior.
+
+## Card 1 — status: done
+
+### Target Behavior
+
+Coordinator-created sessions remain self-describing after Pi JSONL reload.
+
+### Boundary Crossings
+
+```text
+→ WorkspaceSessionCoordinator-created Brunch session
+→ Pi SessionManager append/flush JSONL persistence
+→ Pi SessionManager.open reload
+→ Brunch transcript/projection assertions
+```
+
+### Risks and Assumptions
+
+- RISK: Pi normalizes timestamps, ids, or message content during open/rewrite → MITIGATION: compare payload fields that should be stable and explicitly document allowed timestamp/id variance.
+- RISK: The coordinator's pre-assistant flush path masks reload behavior that real sessions do not share → MITIGATION: test through the public coordinator path and `SessionManager.open`, not direct JSON parsing alone.
+- ASSUMPTION: Pi JSONL preserves Brunch `brunch.session_binding` custom entries across binding-only, first-message, and `/new` coordinator lifecycles → VALIDATE: open the persisted files and compare binding cardinality plus binding data → memory/SPEC.md §Open Assumptions A2-L.
+
+### Acceptance Criteria
+
+✓ `jsonl binding-only coordinator session reloads` — a newly coordinator-created session with no assistant message can be reopened and has exactly one `brunch.session_binding`.
+✓ `jsonl coordinator pre-assistant flush does not duplicate prefix` — after a binding-only reload and first assistant/user append, the JSONL file has one session header and exactly one binding.
+✓ `jsonl session reload preserves coordinator binding` — a coordinator-created transcript has exactly one `brunch.session_binding` after `SessionManager.open`, with the same session id, spec id, and spec title.
+✓ `jsonl coordinator new session reloads same spec` — `createNewSessionForCurrentSpec()` creates a distinct session id/file whose reloaded binding carries the unchanged spec id and title.
+✓ `jsonl session reload projects the same simple exchange` — `projectElicitationExchanges` returns the same prompt/response entry ids before and after reload for the simple coordinator-created transcript.
+
+### Verification Approach
+
+- Inner: round-trip unit tests — prove local reload and projection behavior against Pi `SessionManager`.
+- Middle: artifact oracle — inspect the actual persisted JSONL path from the coordinator rather than an in-memory fixture.
+
+### Cross-cutting obligations
+
+- Use Pi-owned session entry/message types where possible; Brunch owns only semantic projection types.
+- Do not introduce a canonical chat/turn table or a Brunch-side mirror store to make the test pass.
+- Treat failure as viability evidence, not as an invitation to silently widen Brunch's local parser.
+
+## Card 2 — status: next
+
+### Target Behavior
+
+Representative Pi message and Brunch custom transcript payloads survive Pi JSONL reload byte-equivalently.
+
+### Boundary Crossings
+
+```text
+→ Pi raw user/assistant message fixtures and Brunch custom event fixture payloads
+→ Pi SessionManager message/custom/custom_message entry persistence
+→ Pi SessionManager.open reload
+→ Brunch survival-matrix and context-participation assertions
+```
+
+### Risks and Assumptions
+
+- RISK: Some future Brunch custom entries do not yet have production constructors → MITIGATION: use minimal test fixtures that exercise Pi JSONL persistence while keeping schemas local to the test or a narrowly named viability helper.
+- RISK: The test over-specifies final payload schemas before their frontiers land → MITIGATION: assert preservation of representative payload envelopes, `customType` names, and context participation where required, not final product semantics.
+- ASSUMPTION: Pi JSONL preserves raw Pi message payloads and unknown Brunch custom-entry payloads without requiring Pi schema changes → VALIDATE: reload a matrix of named entries and compare stable payload fields → memory/SPEC.md §Open Assumptions A2-L.
+
+### Acceptance Criteria
+
+✓ `jsonl raw user assistant payload survival` — representative user and assistant messages, including non-trivial content shapes beyond one plain string, survive reload without being projected into Brunch-local DTOs.
+✓ `jsonl custom entry survival matrix` — `brunch.lens_switch`, `brunch.mention`, `brunch.mention_staleness_hint`, and other non-context Brunch custom entries survive reload with `customType` and `data` intact.
+✓ `jsonl custom message survival matrix` — context-carrying entries such as `worldUpdate`, `brunch.side_task_result`, and structured elicitation prompts survive reload with `customType`, `content`, `display`, and `details` intact.
+✓ `jsonl custom messages re-enter pi context` — after reload, `SessionManager.buildSessionContext()` includes the representative `custom_message` entries on the active branch with the same custom type and content.
+✓ `jsonl continuity metadata survival` — representative `lastSeenLsn`, interest-set, and compaction-anchor metadata survives reload in the chosen transcript-native shape, including any Pi-native `compaction.details` shape chosen for anchors.
+✓ `jsonl structured elicitation survival` — structured prompt/response custom entries survive reload distinctly from ordinary user/assistant messages.
+
+### Verification Approach
+
+- Inner: schema/shape validation at the boundary — compare raw message fields plus custom `data` / `content` / `details` round trips for representative Brunch entry families.
+- Middle: round-trip oracle — persist with Pi APIs, reload with Pi APIs, then assert Brunch-visible semantics and Pi context reconstruction from the reloaded entries.
+
+### Cross-cutting obligations
+
+- Keep this as a JSONL viability proof, not a commitment to final side-task, mention, or continuity subsystem schemas.
+- New helper names should use lexicon terms: session binding, structured elicitation entry, lens switch, side-task result, world update, mention ledger.
+- Use Pi-exported entry/message types for envelopes; Brunch-owned fixture types should cover only Brunch payload semantics.
+- If a payload cannot be represented without a new Brunch schema owner, stop and surface that as a design/scoping issue rather than inventing a broad store.
+
+## Card 3 — status: next
+
+### Target Behavior
+
+Elicitation exchange projection after reload uses Pi's active branch.
+
+### Boundary Crossings
+
+```text
+→ Branched Pi session fixture
+→ Pi JSONL tree/leaf persistence
+→ Pi SessionManager.open reload
+→ Brunch elicitation exchange projection
+```
+
+### Risks and Assumptions
+
+- RISK: `loadJsonlTranscriptEntries` currently reads file entries linearly and may not reflect Pi's active branch semantics → MITIGATION: compare projection from Pi's active branch after reload against any file-linear projection, then make the product projection use the active-branch source if needed.
+- RISK: Branching APIs behave differently from the initial M1 linear captures → MITIGATION: use a minimal fork/branch fixture with one abandoned branch and one active branch.
+- ASSUMPTION: Pi JSONL stores enough tree/leaf information to re-project elicitation exchanges from the active branch after reload → VALIDATE: reload the branched session and assert only active-branch prompt/response ids appear → memory/SPEC.md §Open Assumptions A12-L.
+
+### Acceptance Criteria
+
+✓ `jsonl active branch projection excludes abandoned exchange` — after reload, an exchange on an abandoned branch is absent from Brunch's projected exchanges.
+✓ `jsonl active branch projection preserves selected exchange` — after reload, the active branch's prompt/response exchange remains projectable with stable ranges.
+✓ `session.elicitationExchanges uses active branch semantics` — the RPC handler projects the selected session's active branch rather than blindly projecting every JSONL line when branch state exists.
+✓ `jsonl active branch custom messages enter context only once` — reloaded custom-message entries on abandoned branches do not appear in the active branch projection or context, while active-branch custom messages do.
+
+### Verification Approach
+
+- Inner: round-trip projection test — builds a branched Pi session, reloads it, and compares projected exchanges from `SessionManager.getBranch()` rather than raw file order.
+- Middle: RPC contract test — proves the named `session.elicitationExchanges` method follows the same active-branch semantics as the projection helper.
+
+### Cross-cutting obligations
+
+- Preserve D13 capture-aware projection: exchanges are derived from Pi transcript truth, not stored as canonical chat/turn rows.
+- Keep RPC thin: fix projection source/semantics in the projection handler path, not by adding file params or a generic read model.
+- If Pi JSONL cannot expose a stable active branch after reload, record a sharply bounded insufficiency for the M2 fallback decision.
+
+## Card 4 — status: next
+
+### Target Behavior
+
+Committed M1 scripted captures are reloadable JSONL evidence for M2.
+
+### Boundary Crossings
+
+```text
+→ `.brunch-fixtures//scripted-001/` committed run bundle
+→ Pi SessionManager.open-backed Brunch projection path
+→ Brunch elicitation exchange projection
+→ Fixture replay parity assertions
+```
+
+### Risks and Assumptions
+
+- RISK: committed fixture metadata contains local absolute source paths that should not be part of portable parity → MITIGATION: assert parity against bundle-local JSONL and metadata fields that are intentionally stable.
+- RISK: M1 scripted captures encode thin interaction logic that later changes → MITIGATION: use them only for transcript reload/projection parity, not as final elicitation-quality goldens.
+- ASSUMPTION: The M1 run bundles are sufficient replay seeds for transcript-first M2 evidence → VALIDATE: reload/project each committed bundle and compare stable metadata summaries → memory/SPEC.md §Open Assumptions A2-L, A5-L.
+
+### Acceptance Criteria
+
+✓ `m1 fixture bundles reload for transcript parity` — briefs #1–#3 can be loaded from bundle-local JSONL without relying on `meta.session.sourceFile` absolute paths.
+✓ `m1 fixture bundle metadata matches reprojected exchanges` — each bundle's projection summary equals the projection from its JSONL transcript after reload through the same projection path used by `session.elicitationExchanges`.
+✓ `m1 fixture bundle bindings match briefs` — each bundle still has exactly one session binding whose spec title matches the brief title.
+✓ `m1 fixture metadata treats source file as provenance only` — absolute `meta.session.sourceFile` may be present as provenance, but replay parity depends on `artifacts.jsonl` and bundle-local paths.
+
+### Verification Approach
+
+- Inner: fixture replay regression tests — assert stable metadata and projection summaries for committed M1 bundles.
+- Middle: replay oracle — proves M1 captures are usable M2 transcript evidence without introducing a parallel fixture store or a file-linear projection special case.
+- Outer: no new human review required unless the builder changes brief content or scripted user notes.
+
+### Cross-cutting obligations
+
+- Do not make absolute local paths part of golden fixture truth.
+- Keep graph/coherence artifacts deferred in M2 unless the graph/coherence substrates land separately.
+- Preserve the human-reviewed caveat: M1 captures are good structural seeds on current terms, not final product-behavior evidence.
+
+## Queue discipline
+
+- Build cards in order and commit after each passing slice.
+- If any card demonstrates JSONL insufficiency, stop the queue, preserve the failing oracle, and route back for `ln-spike` or `ln-spec`/`ln-plan` fallback reconciliation before continuing.
+- Delete `memory/CARDS.md` when all queued cards are complete or superseded.
diff --git a/memory/PLAN.md b/memory/PLAN.md
index 27bea827..33142d17 100644
--- a/memory/PLAN.md
+++ b/memory/PLAN.md
@@ -92,6 +92,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta
 - **Cross-cutting obligations:** This frontier is the transcript-side proof for the shared event substrate that later carries structured elicitation entries, session binding, lens switches, side-task results, mentions, and `worldUpdate` without inventing a parallel channel or canonical chat/turn store. JSONL viability must validate sessions created through the `WorkspaceSessionCoordinator`, including the first-entry binding and `/new` same-spec behavior.
 - **Traceability:** R7, R8, R16, R17, R19 / D6-L, D11-L, D12-L, D13-L, D18-L / I3-L, I8-L, I10-L / A2-L, A12-L
 - **Design docs:** archived [jsonl-session-viability-note](file:///Users/lunelson/Code/hashintel/brunch-next/archive/archive/docs/architecture/jsonl-session-viability-note.md)
+- **Current execution pointer:** `memory/CARDS.md` queue — build JSONL reload parity, custom-entry survival, active-branch projection, and M1 fixture replay evidence slices.
 
 ### web-shell
 
diff --git a/src/workspace-session-coordinator.test.ts b/src/workspace-session-coordinator.test.ts
index b4977bf6..46d529ff 100644
--- a/src/workspace-session-coordinator.test.ts
+++ b/src/workspace-session-coordinator.test.ts
@@ -4,11 +4,21 @@ import { join } from "node:path"
 
 import { describe, expect, it } from "vitest"
 
+import { SessionManager } from "@earendil-works/pi-coding-agent"
+
+import { projectElicitationExchanges } from "./elicitation-exchange.js"
 import {
   createWorkspaceSessionCoordinator,
   verifyWorkspaceSessionStores,
 } from "./workspace-session-coordinator.js"
 
+const SESSION_BINDING_TYPE = "brunch.session_binding"
+
+type JsonlLine = {
+  type?: string
+  customType?: string
+}
+
 describe("WorkspaceSessionCoordinator", () => {
   it("creates scoped state, a bound pi session, and derivable chrome state", async () => {
     const cwd = await mkdtemp(join(tmpdir(), "brunch-ws-"))
@@ -39,7 +49,7 @@ describe("WorkspaceSessionCoordinator", () => {
     expect(oracle.sessions[0]?.binding.specId).toBe(result.spec.id)
   })
 
-  it("creates a same-spec new session without mutating the first session binding", async () => {
+  it("jsonl coordinator new session reloads same spec", async () => {
     const cwd = await mkdtemp(join(tmpdir(), "brunch-ws-"))
     const coordinator = createWorkspaceSessionCoordinator({ cwd })
 
@@ -53,6 +63,30 @@ describe("WorkspaceSessionCoordinator", () => {
     expect(second.spec.id).toBe(first.spec.id)
     expect(second.session.id).not.toBe(first.session.id)
 
+    const reloadedFirst = SessionManager.open(
+      first.session.file,
+      undefined,
+      cwd,
+    )
+    const reloadedSecond = SessionManager.open(
+      second.session.file,
+      undefined,
+      cwd,
+    )
+    const firstBinding = reloadedFirst
+      .getEntries()
+      .find((entry) => entry.customType === SESSION_BINDING_TYPE)
+    const secondBinding = reloadedSecond
+      .getEntries()
+      .find((entry) => entry.customType === SESSION_BINDING_TYPE)
+
+    expect(firstBinding).toMatchObject({
+      data: { specId: first.spec.id, specTitle: "Scratch spec" },
+    })
+    expect(secondBinding).toMatchObject({
+      data: { specId: first.spec.id, specTitle: "Scratch spec" },
+    })
+
     const oracle = await verifyWorkspaceSessionStores({
       cwd,
       expectedSessionCount: 2,
@@ -71,7 +105,53 @@ describe("WorkspaceSessionCoordinator", () => {
     )
   })
 
-  it("does not duplicate the binding when pi later flushes the first assistant message", async () => {
+  it("jsonl binding-only coordinator session reloads", async () => {
+    const cwd = await mkdtemp(join(tmpdir(), "brunch-ws-"))
+    const coordinator = createWorkspaceSessionCoordinator({ cwd })
+
+    const result = await coordinator.startOrCreate({
+      specTitle: "Scratch spec",
+    })
+    const reloaded = SessionManager.open(result.session.file, undefined, cwd)
+    const bindings = reloaded
+      .getEntries()
+      .filter((entry) => entry.customType === SESSION_BINDING_TYPE)
+
+    expect(bindings).toHaveLength(1)
+    expect(bindings[0]).toMatchObject({
+      customType: SESSION_BINDING_TYPE,
+      data: {
+        sessionId: result.session.id,
+        specId: result.spec.id,
+        specTitle: result.spec.title,
+      },
+    })
+  })
+
+  it("jsonl coordinator pre-assistant flush does not duplicate prefix", async () => {
+    const cwd = await mkdtemp(join(tmpdir(), "brunch-ws-"))
+    const coordinator = createWorkspaceSessionCoordinator({ cwd })
+
+    const result = await coordinator.startOrCreate({
+      specTitle: "Scratch spec",
+    })
+    const reloaded = SessionManager.open(result.session.file, undefined, cwd)
+    reloaded.appendMessage({ role: "assistant", content: "hello" })
+    reloaded.appendMessage({ role: "user", content: "hi" })
+
+    const content = await readFile(result.session.file, "utf8")
+    const lines = content
+      .split("\n")
+      .filter((line) => line.trim().length > 0)
+      .map((line) => JSON.parse(line) as JsonlLine)
+
+    expect(lines.filter((entry) => entry.type === "session")).toHaveLength(1)
+    expect(
+      lines.filter((entry) => entry.customType === SESSION_BINDING_TYPE),
+    ).toHaveLength(1)
+  })
+
+  it("jsonl session reload preserves coordinator binding", async () => {
     const cwd = await mkdtemp(join(tmpdir(), "brunch-ws-"))
     const coordinator = createWorkspaceSessionCoordinator({ cwd })
 
@@ -82,17 +162,21 @@ describe("WorkspaceSessionCoordinator", () => {
       role: "assistant",
       content: "hello",
     })
+    result.session.manager.appendMessage({ role: "user", content: "answer" })
 
-    const oracle = await verifyWorkspaceSessionStores({
-      cwd,
-      expectedSessionCount: 1,
+    const reloaded = SessionManager.open(result.session.file, undefined, cwd)
+    const bindings = reloaded
+      .getEntries()
+      .filter((entry) => entry.customType === SESSION_BINDING_TYPE)
+
+    expect(bindings).toHaveLength(1)
+    expect(bindings[0]).toMatchObject({
+      data: {
+        sessionId: result.session.id,
+        specId: result.spec.id,
+        specTitle: "Scratch spec",
+      },
     })
-    expect(oracle.ok).toBe(true)
-    if (!oracle.ok) {
-      expect(oracle.errors).toEqual([])
-      return
-    }
-    expect(oracle.sessions[0]?.bindingCount).toBe(1)
   })
 
   it("does not duplicate pre-assistant entries when flushed after the user message and before assistant persistence", async () => {
@@ -125,6 +209,30 @@ describe("WorkspaceSessionCoordinator", () => {
     }
   })
 
+  it("jsonl session reload projects the same simple exchange", async () => {
+    const cwd = await mkdtemp(join(tmpdir(), "brunch-ws-"))
+    const coordinator = createWorkspaceSessionCoordinator({ cwd })
+
+    const result = await coordinator.startOrCreate({
+      specTitle: "Scratch spec",
+    })
+    result.session.manager.appendMessage({
+      role: "assistant",
+      content: "Question",
+    })
+    result.session.manager.appendMessage({ role: "user", content: "Answer" })
+
+    const beforeReload = projectElicitationExchanges(
+      result.session.manager.getBranch(),
+    )
+    const afterReload = projectElicitationExchanges(
+      SessionManager.open(result.session.file, undefined, cwd).getBranch(),
+    )
+
+    expect(afterReload).toEqual(beforeReload)
+    expect(afterReload.exchanges).toHaveLength(1)
+  })
+
   it("binds a pi-created replacement session to the current spec", async () => {
     const cwd = await mkdtemp(join(tmpdir(), "brunch-ws-"))
     const coordinator = createWorkspaceSessionCoordinator({ cwd })

From 46a8e288bdc076759f1bde91bc13e2552fca1e8e Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Thu, 21 May 2026 14:55:57 +0200
Subject: [PATCH 3/9] test: prove JSONL payload survival

---
 memory/CARDS.md                     |   2 +-
 src/jsonl-session-viability.test.ts | 298 ++++++++++++++++++++++++++++
 2 files changed, 299 insertions(+), 1 deletion(-)
 create mode 100644 src/jsonl-session-viability.test.ts

diff --git a/memory/CARDS.md b/memory/CARDS.md
index 3d2ccfd4..c2dda210 100644
--- a/memory/CARDS.md
+++ b/memory/CARDS.md
@@ -48,7 +48,7 @@ Coordinator-created sessions remain self-describing after Pi JSONL reload.
 - Do not introduce a canonical chat/turn table or a Brunch-side mirror store to make the test pass.
 - Treat failure as viability evidence, not as an invitation to silently widen Brunch's local parser.
 
-## Card 2 — status: next
+## Card 2 — status: done
 
 ### Target Behavior
 
diff --git a/src/jsonl-session-viability.test.ts b/src/jsonl-session-viability.test.ts
new file mode 100644
index 00000000..1f0a9a63
--- /dev/null
+++ b/src/jsonl-session-viability.test.ts
@@ -0,0 +1,298 @@
+import { mkdtempSync } from "node:fs"
+import { tmpdir } from "node:os"
+import { join } from "node:path"
+
+import { describe, expect, it } from "vitest"
+
+import {
+  SessionManager,
+  type CustomEntry,
+  type CustomMessageEntry,
+  type SessionEntry,
+  type SessionMessageEntry,
+} from "@earendil-works/pi-coding-agent"
+
+interface PersistedSessionFixture {
+  file: string
+  manager: SessionManager
+}
+
+describe("Pi JSONL transcript viability", () => {
+  it("jsonl raw user assistant payload survival", async () => {
+    const { file, manager } = createPersistedSession()
+    const userContent = [
+      { type: "text" as const, text: "Describe this image" },
+      {
+        type: "image" as const,
+        image: "data:image/png;base64,ZmFrZQ==",
+        mimeType: "image/png",
+      },
+    ]
+    const assistantContent = [
+      { type: "text" as const, text: "Here is a structured answer." },
+    ]
+
+    manager.appendMessage({ role: "user", content: userContent })
+    manager.appendMessage({ role: "assistant", content: assistantContent })
+
+    const reloaded = SessionManager.open(file)
+    const messages = reloaded.getEntries().filter(isMessageEntry)
+
+    expect(messages.map((entry) => entry.message)).toEqual([
+      { role: "user", content: userContent },
+      { role: "assistant", content: assistantContent },
+    ])
+  })
+
+  it("jsonl custom entry survival matrix", async () => {
+    const { file, manager } = createPersistedSession()
+    const customEntries = [
+      ["brunch.lens_switch", { lens: "verification-design", reason: "test" }],
+      [
+        "brunch.mention",
+        { entityId: "node-1", snapshottedLsn: 7, title: "Known node" },
+      ],
+      [
+        "brunch.mention_staleness_hint",
+        { entityId: "node-1", seenLsn: 7, currentLsn: 9 },
+      ],
+      [
+        "brunch.continuity",
+        {
+          lastSeenLsn: 9,
+          interestSet: ["node-1", "node-2"],
+          compactionAnchorIds: ["anchor-1"],
+        },
+      ],
+    ] as const
+
+    for (const [customType, data] of customEntries) {
+      manager.appendCustomEntry(customType, data)
+    }
+    flushPreAssistantEntries(manager)
+
+    const reloaded = SessionManager.open(file)
+    const customByType = new Map(
+      reloaded
+        .getEntries()
+        .filter(isCustomEntry)
+        .map((entry) => [entry.customType, entry.data]),
+    )
+
+    for (const [customType, data] of customEntries) {
+      expect(customByType.get(customType)).toEqual(data)
+    }
+  })
+
+  it("jsonl custom message survival matrix", async () => {
+    const { file, manager } = createPersistedSession()
+    const worldUpdate = {
+      changedSinceLsn: 11,
+      items: [{ id: "node-1", lsn: 12, title: "Updated node" }],
+    }
+    const sideTaskResult = {
+      taskId: "side-task-1",
+      status: "succeeded",
+      summary: "Found related risk.",
+    }
+    const structuredPrompt = {
+      promptId: "prompt-1",
+      kind: "radio",
+      choices: ["A", "B"],
+    }
+
+    manager.appendCustomMessageEntry(
+      "worldUpdate",
+      "Node node-1 changed since your last turn.",
+      true,
+      worldUpdate,
+    )
+    manager.appendCustomMessageEntry(
+      "brunch.side_task_result",
+      [{ type: "text", text: "Side task result: Found related risk." }],
+      false,
+      sideTaskResult,
+    )
+    manager.appendCustomMessageEntry(
+      "brunch.elicitation_prompt",
+      "Choose the better framing.",
+      true,
+      structuredPrompt,
+    )
+    flushPreAssistantEntries(manager)
+
+    const reloaded = SessionManager.open(file)
+    const customMessages = reloaded.getEntries().filter(isCustomMessageEntry)
+
+    expect(customMessages).toEqual([
+      expect.objectContaining({
+        customType: "worldUpdate",
+        content: "Node node-1 changed since your last turn.",
+        display: true,
+        details: worldUpdate,
+      }),
+      expect.objectContaining({
+        customType: "brunch.side_task_result",
+        content: [
+          { type: "text", text: "Side task result: Found related risk." },
+        ],
+        display: false,
+        details: sideTaskResult,
+      }),
+      expect.objectContaining({
+        customType: "brunch.elicitation_prompt",
+        content: "Choose the better framing.",
+        display: true,
+        details: structuredPrompt,
+      }),
+    ])
+  })
+
+  it("jsonl custom messages re-enter pi context", async () => {
+    const { file, manager } = createPersistedSession()
+    manager.appendCustomMessageEntry(
+      "worldUpdate",
+      "World update: node-1 changed.",
+      true,
+      { changedSinceLsn: 3 },
+    )
+    manager.appendCustomEntry("brunch.lens_switch", { lens: "observer" })
+    manager.appendCustomMessageEntry(
+      "brunch.side_task_result",
+      "Side task completed.",
+      false,
+      { taskId: "task-1" },
+    )
+    flushPreAssistantEntries(manager)
+
+    const contextMessages = SessionManager.open(file)
+      .buildSessionContext()
+      .messages.filter((message) => message.role === "custom")
+
+    expect(contextMessages).toEqual([
+      expect.objectContaining({
+        role: "custom",
+        customType: "worldUpdate",
+        content: "World update: node-1 changed.",
+      }),
+      expect.objectContaining({
+        role: "custom",
+        customType: "brunch.side_task_result",
+        content: "Side task completed.",
+      }),
+    ])
+  })
+
+  it("jsonl continuity metadata survival", async () => {
+    const { file, manager } = createPersistedSession()
+    const anchorEntryId = manager.appendMessage({
+      role: "assistant",
+      content: "Anchor before compaction",
+    })
+    const continuity = {
+      lastSeenLsn: 42,
+      interestSet: ["node-a", "node-b"],
+      compactionAnchors: [{ entryId: anchorEntryId, graphNodeId: "node-a" }],
+    }
+
+    manager.appendCustomEntry("brunch.continuity", continuity)
+    manager.appendCompaction("Compacted summary", anchorEntryId, 1_234, {
+      brunch: { continuity },
+    })
+    flushPreAssistantEntries(manager)
+
+    const reloaded = SessionManager.open(file)
+    const customContinuity = reloaded
+      .getEntries()
+      .filter(isCustomEntry)
+      .find((entry) => entry.customType === "brunch.continuity")
+    const compaction = reloaded
+      .getEntries()
+      .find((entry) => entry.type === "compaction")
+
+    expect(customContinuity?.data).toEqual(continuity)
+    expect(compaction).toMatchObject({
+      details: { brunch: { continuity } },
+    })
+  })
+
+  it("jsonl structured elicitation survival", async () => {
+    const { file, manager } = createPersistedSession()
+    const promptDetails = {
+      promptId: "prompt-1",
+      surface: "checkbox",
+      choices: ["fast", "safe"],
+    }
+    const responseData = {
+      promptId: "prompt-1",
+      selected: ["safe"],
+      freeform: "Prefer safety.",
+    }
+
+    manager.appendCustomMessageEntry(
+      "brunch.elicitation_prompt",
+      "Select priorities.",
+      true,
+      promptDetails,
+    )
+    manager.appendMessage({ role: "user", content: "I choose safety." })
+    manager.appendCustomEntry("brunch.elicitation_response", responseData)
+    flushPreAssistantEntries(manager)
+
+    const reloadedEntries = SessionManager.open(file).getEntries()
+    const structuredPrompt = reloadedEntries.find(
+      (entry) =>
+        isCustomMessageEntry(entry) &&
+        entry.customType === "brunch.elicitation_prompt",
+    )
+    const ordinaryUser = reloadedEntries.find(
+      (entry) => isMessageEntry(entry) && entry.message.role === "user",
+    )
+    const structuredResponse = reloadedEntries.find(
+      (entry) =>
+        isCustomEntry(entry) &&
+        entry.customType === "brunch.elicitation_response",
+    )
+
+    expect(structuredPrompt).toMatchObject({
+      type: "custom_message",
+      details: promptDetails,
+    })
+    expect(ordinaryUser).toMatchObject({
+      type: "message",
+      message: { role: "user", content: "I choose safety." },
+    })
+    expect(structuredResponse).toMatchObject({
+      type: "custom",
+      data: responseData,
+    })
+  })
+})
+
+function createPersistedSession(): PersistedSessionFixture {
+  const cwd = mkdtempSync(join(tmpdir(), "brunch-jsonl-"))
+  const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions"))
+  const file = manager.getSessionFile()
+  if (!file) {
+    throw new Error("Expected persisted session file")
+  }
+  return { file, manager }
+}
+
+function flushPreAssistantEntries(manager: SessionManager): void {
+  manager.appendMessage({ role: "assistant", content: "Persistence sentinel" })
+}
+
+function isMessageEntry(entry: SessionEntry): entry is SessionMessageEntry {
+  return entry.type === "message"
+}
+
+function isCustomEntry(entry: SessionEntry): entry is CustomEntry {
+  return entry.type === "custom"
+}
+
+function isCustomMessageEntry(
+  entry: SessionEntry,
+): entry is CustomMessageEntry {
+  return entry.type === "custom_message"
+}

From ff83adc2a8ab3948ffeca0864ade0de9824d7a12 Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Thu, 21 May 2026 14:58:36 +0200
Subject: [PATCH 4/9] fix: project elicitation exchanges from active branch

---
 memory/CARDS.md                  |   2 +-
src/elicitation-exchange.test.ts | 131 +++++++++++++++++++++++++++++++ src/elicitation-exchange.ts | 29 +++++-- src/rpc.test.ts | 45 +++++++++++ src/rpc.ts | 6 +- 5 files changed, 204 insertions(+), 9 deletions(-) diff --git a/memory/CARDS.md b/memory/CARDS.md index c2dda210..0b0705f0 100644 --- a/memory/CARDS.md +++ b/memory/CARDS.md @@ -90,7 +90,7 @@ Representative Pi message and Brunch custom transcript payloads survive Pi JSONL - Use Pi-exported entry/message types for envelopes; Brunch-owned fixture types should cover only Brunch payload semantics. - If a payload cannot be represented without a new Brunch schema owner, stop and surface that as a design/scoping issue rather than inventing a broad store. -## Card 3 — status: next +## Card 3 — status: done ### Target Behavior diff --git a/src/elicitation-exchange.test.ts b/src/elicitation-exchange.test.ts index bbd60978..523f255f 100644 --- a/src/elicitation-exchange.test.ts +++ b/src/elicitation-exchange.test.ts @@ -6,6 +6,7 @@ import { describe, expect, it } from "vitest" import { SessionManager } from "@earendil-works/pi-coding-agent" import { + loadActiveBranchTranscriptEntries, loadJsonlTranscriptEntries, projectElicitationExchanges, } from "./elicitation-exchange.js" @@ -167,6 +168,136 @@ describe("elicitation exchange projection", () => { ) }) + it("jsonl active branch projection excludes abandoned exchange", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-pi-branch-")) + const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) + const abandonedPromptId = manager.appendMessage({ + role: "assistant", + content: "Abandoned prompt", + }) + const abandonedResponseId = manager.appendMessage({ + role: "user", + content: "Abandoned answer", + }) + manager.resetLeaf() + const activePromptId = manager.appendMessage({ + role: "assistant", + content: "Active prompt", + }) + const activeResponseId = manager.appendMessage({ + role: "user", + content: "Active answer", + }) + + const 
fileLinearProjection = projectElicitationExchanges( + await loadJsonlTranscriptEntries(manager.getSessionFile()!), + ) + const activeBranchProjection = projectElicitationExchanges( + loadActiveBranchTranscriptEntries(manager.getSessionFile()!, { cwd }), + ) + + expect( + fileLinearProjection.exchanges.map( + (exchange) => exchange.responseEntryIds, + ), + ).toEqual([[abandonedResponseId], [activeResponseId]]) + expect(activeBranchProjection.exchanges).toHaveLength(1) + expect(activeBranchProjection.exchanges[0]?.promptEntryIds).toEqual([ + activePromptId, + ]) + expect(activeBranchProjection.exchanges[0]?.responseEntryIds).toEqual([ + activeResponseId, + ]) + expect( + activeBranchProjection.exchanges.some((exchange) => + exchange.promptEntryIds.includes(abandonedPromptId), + ), + ).toBe(false) + }) + + it("jsonl active branch projection preserves selected exchange", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-pi-branch-")) + const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) + const sharedPromptId = manager.appendMessage({ + role: "assistant", + content: "Choose a path", + }) + manager.appendMessage({ role: "user", content: "Old path" }) + manager.branch(sharedPromptId) + const selectedResponseId = manager.appendMessage({ + role: "user", + content: "Selected path", + }) + + const activeBranchProjection = projectElicitationExchanges( + loadActiveBranchTranscriptEntries(manager.getSessionFile()!, { cwd }), + ) + + expect(activeBranchProjection).toMatchObject({ + status: "ready", + exchanges: [ + { + promptRange: { start: sharedPromptId, end: sharedPromptId }, + responseRange: { start: selectedResponseId, end: selectedResponseId }, + promptEntryIds: [sharedPromptId], + responseEntryIds: [selectedResponseId], + }, + ], + openPrompt: null, + }) + }) + + it("jsonl active branch custom messages enter context only once", async () => { + const cwd = await mkdtemp(join(tmpdir(), "brunch-pi-branch-")) + const manager = 
SessionManager.create(cwd, join(cwd, ".brunch/sessions")) + const abandonedPromptId = manager.appendCustomMessageEntry( + "brunch.elicitation_prompt", + "Abandoned custom prompt", + true, + { promptId: "abandoned" }, + ) + manager.appendMessage({ role: "user", content: "Abandoned answer" }) + manager.resetLeaf() + const activePromptId = manager.appendCustomMessageEntry( + "brunch.elicitation_prompt", + "Active custom prompt", + true, + { promptId: "active" }, + ) + manager.appendMessage({ role: "user", content: "Active answer" }) + manager.appendMessage({ + role: "assistant", + content: "Persistence sentinel", + }) + + const reloaded = SessionManager.open( + manager.getSessionFile()!, + undefined, + cwd, + ) + const activeBranch = loadActiveBranchTranscriptEntries( + manager.getSessionFile()!, + { + cwd, + }, + ) + const projection = projectElicitationExchanges(activeBranch) + const contextMessages = reloaded + .buildSessionContext() + .messages.filter((message) => message.role === "custom") + + expect(projection.exchanges[0]?.promptEntryIds).toEqual([activePromptId]) + expect(projection.exchanges[0]?.promptEntryIds).not.toContain( + abandonedPromptId, + ) + expect(contextMessages).toHaveLength(1) + expect(contextMessages[0]).toMatchObject({ + role: "custom", + customType: "brunch.elicitation_prompt", + content: "Active custom prompt", + }) + }) + it("loads newline-delimited Pi transcript entries from disk", async () => { const dir = await mkdtemp(join(tmpdir(), "brunch-jsonl-")) const file = join(dir, "session.jsonl") diff --git a/src/elicitation-exchange.ts b/src/elicitation-exchange.ts index 99d83300..93428d92 100644 --- a/src/elicitation-exchange.ts +++ b/src/elicitation-exchange.ts @@ -1,11 +1,12 @@ import { readFile } from "node:fs/promises" -import type { - CustomEntry, - CustomMessageEntry, - FileEntry, - SessionEntry, - SessionMessageEntry, +import { + SessionManager, + type CustomEntry, + type CustomMessageEntry, + type FileEntry, + type SessionEntry, + 
type SessionMessageEntry, } from "@earendil-works/pi-coding-agent" const STRUCTURED_RESPONSE_TYPES = new Set([ @@ -37,6 +38,11 @@ export interface ElicitationExchangeProjection { openPrompt: OpenPromptProjection | null } +export interface ActiveBranchTranscriptOptions { + cwd?: string + sessionDir?: string +} + export async function loadJsonlTranscriptEntries( file: string, ): Promise<FileEntry[]> { @@ -47,6 +53,17 @@ export async function loadJsonlTranscriptEntries( .map((line) => JSON.parse(line) as FileEntry) } +export function loadActiveBranchTranscriptEntries( + file: string, + options?: ActiveBranchTranscriptOptions, +): SessionEntry[] { + return SessionManager.open( + file, + options?.sessionDir, + options?.cwd, + ).getBranch() +} + export function projectElicitationExchanges( entries: readonly unknown[], ): ElicitationExchangeProjection { diff --git a/src/rpc.test.ts b/src/rpc.test.ts index 8a46228f..565220e9 100644 --- a/src/rpc.test.ts +++ b/src/rpc.test.ts @@ -76,6 +76,27 @@ async function createSessionFile(): Promise<string> { return manager.getSessionFile()!
} +async function createBranchedSessionFile(): Promise<{ + file: string + activePromptId: string + abandonedPromptId: string +}> { + const cwd = await mkdtemp(join(tmpdir(), "brunch-rpc-branch-")) + const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) + const abandonedPromptId = manager.appendMessage({ + role: "assistant", + content: "Abandoned prompt", + }) + manager.appendMessage({ role: "user", content: "Abandoned answer" }) + manager.resetLeaf() + const activePromptId = manager.appendMessage({ + role: "assistant", + content: "Active prompt", + }) + manager.appendMessage({ role: "user", content: "Active answer" }) + return { file: manager.getSessionFile()!, activePromptId, abandonedPromptId } +} + describe("JSON-RPC handlers", () => { it("serves a named workspace snapshot method", async () => { const handlers = createRpcHandlers({ coordinator: coordinator() }) @@ -119,6 +140,30 @@ describe("JSON-RPC handlers", () => { }) }) + it("session.elicitationExchanges uses active branch semantics", async () => { + const { file, activePromptId, abandonedPromptId } = + await createBranchedSessionFile() + const handlers = createRpcHandlers({ + coordinator: coordinator(readyState(file)), + }) + + const response = await handlers.handle({ + jsonrpc: "2.0", + id: 8, + method: "session.elicitationExchanges", + }) + + expect(response).toMatchObject({ + jsonrpc: "2.0", + id: 8, + result: { + status: "ready", + exchanges: [{ promptEntryIds: [activePromptId] }], + }, + }) + expect(JSON.stringify(response)).not.toContain(abandonedPromptId) + }) + it("rejects raw file params on session elicitation exchange RPC", async () => { const handlers = createRpcHandlers({ coordinator: coordinator() }) diff --git a/src/rpc.ts b/src/rpc.ts index e717df85..9b2ebe46 100644 --- a/src/rpc.ts +++ b/src/rpc.ts @@ -2,7 +2,7 @@ import { createInterface } from "node:readline/promises" import type { Readable, Writable } from "node:stream" import { - loadJsonlTranscriptEntries, + 
loadActiveBranchTranscriptEntries, projectElicitationExchanges, } from "./elicitation-exchange.js" import { workspaceSnapshotFromState } from "./print-snapshot.js" @@ -67,7 +67,9 @@ export function createRpcHandlers(options: { ) } - const entries = await loadJsonlTranscriptEntries(state.session.file) + const entries = loadActiveBranchTranscriptEntries(state.session.file, { + cwd: state.cwd, + }) return success(request.id ?? null, projectElicitationExchanges(entries)) } From 55c6de79693fb8fb63ab8e9cdd4904a66ea03297 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 15:01:17 +0200 Subject: [PATCH 5/9] test: replay M1 fixtures through JSONL reload --- memory/CARDS.md | 176 ---------------------------- memory/PLAN.md | 14 +-- memory/SPEC.md | 10 +- src/jsonl-session-viability.test.ts | 151 +++++++++++++++++++++++- 4 files changed, 162 insertions(+), 189 deletions(-) delete mode 100644 memory/CARDS.md diff --git a/memory/CARDS.md b/memory/CARDS.md deleted file mode 100644 index 0b0705f0..00000000 --- a/memory/CARDS.md +++ /dev/null @@ -1,176 +0,0 @@ -# Scope cards — FE-736 JSONL session viability - -## Orientation - -- Containing seam: transcript persistence over Pi `SessionManager` JSONL under `.brunch/sessions/`, with Brunch custom transcript entries and coordinator-created session binding layered on top. -- Frontier item: `jsonl-session-viability` / FE-736 on `ln/fe-736-jsonl-session-viability`; these cards are slices inside the same frontier, not new Linear issues or branches. -- Volatile state: no `HANDOFF.md` is present; M1 captures were human-reviewed as good structural replay seeds on their current terms, but they are not final evidence for elicitation interaction logic or knowledge flow. 
-- Main open risk: Pi JSONL may preserve entries syntactically while Brunch accidentally consumes the wrong semantic path — file-linear entries instead of active branch, raw custom entries instead of LLM-context custom messages, or binding state that only works because the coordinator flushed through a private seam. -- Cross-cutting obligations: preserve Pi JSONL as transcript truth unless proven insufficient; avoid a parallel canonical chat/turn store; validate `WorkspaceSessionCoordinator` sessions including `/new`; keep projection handlers as oracles over canonical stores; carry the replay/property/adversarial fixture strategy forward without treating scripted M1 exchange shape as final product behavior. - -## Card 1 — status: done - -### Target Behavior - -Coordinator-created sessions remain self-describing after Pi JSONL reload. - -### Boundary Crossings - -```text -→ WorkspaceSessionCoordinator-created Brunch session -→ Pi SessionManager append/flush JSONL persistence -→ Pi SessionManager.open reload -→ Brunch transcript/projection assertions -``` - -### Risks and Assumptions - -- RISK: Pi normalizes timestamps, ids, or message content during open/rewrite → MITIGATION: compare payload fields that should be stable and explicitly document allowed timestamp/id variance. -- RISK: The coordinator's pre-assistant flush path masks reload behavior that real sessions do not share → MITIGATION: test through the public coordinator path and `SessionManager.open`, not direct JSON parsing alone. -- ASSUMPTION: Pi JSONL preserves Brunch `brunch.session_binding` custom entries across binding-only, first-message, and `/new` coordinator lifecycles → VALIDATE: open the persisted files and compare binding cardinality plus binding data → memory/SPEC.md §Open Assumptions A2-L. - -### Acceptance Criteria - -✓ `jsonl binding-only coordinator session reloads` — a newly coordinator-created session with no assistant message can be reopened and has exactly one `brunch.session_binding`. 
-✓ `jsonl coordinator pre-assistant flush does not duplicate prefix` — after a binding-only reload and first assistant/user append, the JSONL file has one session header and exactly one binding. -✓ `jsonl session reload preserves coordinator binding` — a coordinator-created transcript has exactly one `brunch.session_binding` after `SessionManager.open`, with the same session id, spec id, and spec title. -✓ `jsonl coordinator new session reloads same spec` — `createNewSessionForCurrentSpec()` creates a distinct session id/file whose reloaded binding carries the unchanged spec id and title. -✓ `jsonl session reload projects the same simple exchange` — `projectElicitationExchanges` returns the same prompt/response entry ids before and after reload for the simple coordinator-created transcript. - -### Verification Approach - -- Inner: round-trip unit tests — prove local reload and projection behavior against Pi `SessionManager`. -- Middle: artifact oracle — inspect the actual persisted JSONL path from the coordinator rather than an in-memory fixture. - -### Cross-cutting obligations - -- Use Pi-owned session entry/message types where possible; Brunch owns only semantic projection types. -- Do not introduce a canonical chat/turn table or a Brunch-side mirror store to make the test pass. -- Treat failure as viability evidence, not as an invitation to silently widen Brunch's local parser. - -## Card 2 — status: done - -### Target Behavior - -Representative Pi message and Brunch custom transcript payloads survive Pi JSONL reload byte-equivalently. 
- -### Boundary Crossings - -```text -→ Pi raw user/assistant message fixtures and Brunch custom event fixture payloads -→ Pi SessionManager message/custom/custom_message entry persistence -→ Pi SessionManager.open reload -→ Brunch survival-matrix and context-participation assertions -``` - -### Risks and Assumptions - -- RISK: Some future Brunch custom entries do not yet have production constructors → MITIGATION: use minimal test fixtures that exercise Pi JSONL persistence while keeping schemas local to the test or a narrowly named viability helper. -- RISK: The test over-specifies final payload schemas before their frontiers land → MITIGATION: assert preservation of representative payload envelopes, `customType` names, and context participation where required, not final product semantics. -- ASSUMPTION: Pi JSONL preserves raw Pi message payloads and unknown Brunch custom-entry payloads without requiring Pi schema changes → VALIDATE: reload a matrix of named entries and compare stable payload fields → memory/SPEC.md §Open Assumptions A2-L. - -### Acceptance Criteria - -✓ `jsonl raw user assistant payload survival` — representative user and assistant messages, including non-trivial content shapes beyond one plain string, survive reload without being projected into Brunch-local DTOs. -✓ `jsonl custom entry survival matrix` — `brunch.lens_switch`, `brunch.mention`, `brunch.mention_staleness_hint`, and other non-context Brunch custom entries survive reload with `customType` and `data` intact. -✓ `jsonl custom message survival matrix` — context-carrying entries such as `worldUpdate`, `brunch.side_task_result`, and structured elicitation prompts survive reload with `customType`, `content`, `display`, and `details` intact. -✓ `jsonl custom messages re-enter pi context` — after reload, `SessionManager.buildSessionContext()` includes the representative `custom_message` entries on the active branch with the same custom type and content. 
-✓ `jsonl continuity metadata survival` — representative `lastSeenLsn`, interest-set, and compaction-anchor metadata survives reload in the chosen transcript-native shape, including any Pi-native `compaction.details` shape chosen for anchors. -✓ `jsonl structured elicitation survival` — structured prompt/response custom entries survive reload distinctly from ordinary user/assistant messages. - -### Verification Approach - -- Inner: schema/shape validation at the boundary — compare raw message fields plus custom `data` / `content` / `details` round trips for representative Brunch entry families. -- Middle: round-trip oracle — persist with Pi APIs, reload with Pi APIs, then assert Brunch-visible semantics and Pi context reconstruction from the reloaded entries. - -### Cross-cutting obligations - -- Keep this as a JSONL viability proof, not a commitment to final side-task, mention, or continuity subsystem schemas. -- New helper names should use lexicon terms: session binding, structured elicitation entry, lens switch, side-task result, world update, mention ledger. -- Use Pi-exported entry/message types for envelopes; Brunch-owned fixture types should cover only Brunch payload semantics. -- If a payload cannot be represented without a new Brunch schema owner, stop and surface that as a design/scoping issue rather than inventing a broad store. - -## Card 3 — status: done - -### Target Behavior - -Elicitation exchange projection after reload uses Pi's active branch. 
- -### Boundary Crossings - -```text -→ Branched Pi session fixture -→ Pi JSONL tree/leaf persistence -→ Pi SessionManager.open reload -→ Brunch elicitation exchange projection -``` - -### Risks and Assumptions - -- RISK: `loadJsonlTranscriptEntries` currently reads file entries linearly and may not reflect Pi's active branch semantics → MITIGATION: compare projection from Pi's active branch after reload against any file-linear projection, then make the product projection use the active-branch source if needed. -- RISK: Branching APIs behave differently from the initial M1 linear captures → MITIGATION: use a minimal fork/branch fixture with one abandoned branch and one active branch. -- ASSUMPTION: Pi JSONL stores enough tree/leaf information to re-project elicitation exchanges from the active branch after reload → VALIDATE: reload the branched session and assert only active-branch prompt/response ids appear → memory/SPEC.md §Open Assumptions A12-L. - -### Acceptance Criteria - -✓ `jsonl active branch projection excludes abandoned exchange` — after reload, an exchange on an abandoned branch is absent from Brunch's projected exchanges. -✓ `jsonl active branch projection preserves selected exchange` — after reload, the active branch's prompt/response exchange remains projectable with stable ranges. -✓ `session.elicitationExchanges uses active branch semantics` — the RPC handler projects the selected session's active branch rather than blindly projecting every JSONL line when branch state exists. -✓ `jsonl active branch custom messages enter context only once` — reloaded custom-message entries on abandoned branches do not appear in the active branch projection or context, while active-branch custom messages do. - -### Verification Approach - -- Inner: round-trip projection test — builds a branched Pi session, reloads it, and compares projected exchanges from `SessionManager.getBranch()` rather than raw file order. 
-- Middle: RPC contract test — proves the named `session.elicitationExchanges` method follows the same active-branch semantics as the projection helper. - -### Cross-cutting obligations - -- Preserve D13 capture-aware projection: exchanges are derived from Pi transcript truth, not stored as canonical chat/turn rows. -- Keep RPC thin: fix projection source/semantics in the projection handler path, not by adding file params or a generic read model. -- If Pi JSONL cannot expose a stable active branch after reload, record a sharply bounded insufficiency for the M2 fallback decision. - -## Card 4 — status: next - -### Target Behavior - -Committed M1 scripted captures are reloadable JSONL evidence for M2. - -### Boundary Crossings - -```text -→ `.brunch-fixtures//scripted-001/` committed run bundle -→ Pi SessionManager.open-backed Brunch projection path -→ Brunch elicitation exchange projection -→ Fixture replay parity assertions -``` - -### Risks and Assumptions - -- RISK: committed fixture metadata contains local absolute source paths that should not be part of portable parity → MITIGATION: assert parity against bundle-local JSONL and metadata fields that are intentionally stable. -- RISK: M1 scripted captures encode thin interaction logic that later changes → MITIGATION: use them only for transcript reload/projection parity, not as final elicitation-quality goldens. -- ASSUMPTION: The M1 run bundles are sufficient replay seeds for transcript-first M2 evidence → VALIDATE: reload/project each committed bundle and compare stable metadata summaries → memory/SPEC.md §Open Assumptions A2-L, A5-L. - -### Acceptance Criteria - -✓ `m1 fixture bundles reload for transcript parity` — briefs #1–#3 can be loaded from bundle-local JSONL without relying on `meta.session.sourceFile` absolute paths. 
-✓ `m1 fixture bundle metadata matches reprojected exchanges` — each bundle's projection summary equals the projection from its JSONL transcript after reload through the same projection path used by `session.elicitationExchanges`. -✓ `m1 fixture bundle bindings match briefs` — each bundle still has exactly one session binding whose spec title matches the brief title. -✓ `m1 fixture metadata treats source file as provenance only` — absolute `meta.session.sourceFile` may be present as provenance, but replay parity depends on `artifacts.jsonl` and bundle-local paths. - -### Verification Approach - -- Inner: fixture replay regression tests — assert stable metadata and projection summaries for committed M1 bundles. -- Middle: replay oracle — proves M1 captures are usable M2 transcript evidence without introducing a parallel fixture store or a file-linear projection special case. -- Outer: no new human review required unless the builder changes brief content or scripted user notes. - -### Cross-cutting obligations - -- Do not make absolute local paths part of golden fixture truth. -- Keep graph/coherence artifacts deferred in M2 unless the graph/coherence substrates land separately. -- Preserve the human-reviewed caveat: M1 captures are good structural seeds on current terms, not final product-behavior evidence. - -## Queue discipline - -- Build cards in order and commit after each passing slice. -- If any card demonstrates JSONL insufficiency, stop the queue, preserve the failing oracle, and route back for `ln-spike` or `ln-spec`/`ln-plan` fallback reconciliation before continuing. -- Delete `memory/CARDS.md` when all queued cards are complete or superseded. diff --git a/memory/PLAN.md b/memory/PLAN.md index 33142d17..c580370a 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -20,11 +20,11 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ### Active -1. 
`jsonl-session-viability` — Proves whether pi JSONL sessions can hold raw payloads, Brunch session binding, structured elicitation entries, and continuity metadata faithfully across reload. +1. `web-shell` — M3. Browser as thin remote head over the same host, TanStack Router + Query, one WebSocket RPC client, no REST read model. ### Next -1. `web-shell` — M3. Browser as thin remote head over the same host, TanStack Router + Query, one WebSocket RPC client, no REST read model. +1. `graph-data-plane` — M4. SQLite-backed graph persistence; intent-plane nodes/edges; graph clock; change log; coherence-state homes. ### Parallel / Low-conflict @@ -33,7 +33,6 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ### Horizon -- `graph-data-plane` — M4. SQLite-backed graph persistence; intent-plane nodes/edges; graph clock; change log; coherence-state homes. - `agent-graph-integration` — M5. Graph tools and observer extraction through pi extension seams; all writes via the shared command layer. - `authority-model` — M6. Three-tier policy (autonomous / requires-confirmation / human-only) end-to-end across modes. - `turn-boundary-reconciliation` — M7. Graph-revision tracking, session interest sets, `worldUpdate` injection, and the mention-staleness hint synthesiser. 
@@ -84,15 +83,15 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** [FE-736](https://linear.app/hash/issue/FE-736/jsonl-session-viability-proof) - **Branch:** `ln/fe-736-jsonl-session-viability` (stacked on `ln/fe-735-mode-shell-fixture-driver`) - **Kind:** structural -- **Status:** active +- **Status:** done - **Objective:** Prove whether pi `SessionManager` JSONL in `.brunch/sessions/` is rich enough to carry raw assistant/user payloads, Brunch session binding (`brunch.session_binding`), structured elicitation prompt/response entries when needed, other custom entries (`brunch.lens_switch`, `brunch.side_task_result`, `worldUpdate`, `brunch.mention`, `brunch.mention_staleness_hint`), and session-scoped continuity metadata (`lastSeenLsn`, interest sets, compaction anchors) through reload. - **Why now / unlocks:** Validates A2-L and pins D6-L. If JSONL is insufficient, M2 produces a sharply scoped fallback proposal that all later milestones can plan against. - **Acceptance:** Round-trip reload of a captured session preserves raw payloads byte-equivalent (modulo timestamps); session binding and structured elicitation entries survive; elicitation exchanges can be re-projected from the active branch after reload; all named Brunch custom entries survive, including side-task-result delivery entries when present; continuity metadata survives. If any of these fail, the failure is sharply documented and a fallback path is proposed (project richer substrate / mirror JSONL into richer records / propose pi upstream change). -- **Verification:** Inner — verify gate plus synthetic JSONL projection tests. Middle — JSONL round-trip/property tests for raw payloads, `brunch.session_binding`, structured elicitation entries, active-branch exchange projection, and coordinator-created `/new` sessions. Outer — fixture replay parity across the transcript-first run bundle. 
+- **Verification:** Inner — verify gate plus synthetic JSONL projection tests. Middle — JSONL round-trip/property tests for raw payloads, `brunch.session_binding`, structured elicitation entries, active-branch exchange projection, coordinator-created `/new` sessions, and M1 fixture replay parity. Outer — fixture replay parity across the transcript-first run bundle; no new human review was required because brief content and scripted user notes did not change. - **Cross-cutting obligations:** This frontier is the transcript-side proof for the shared event substrate that later carries structured elicitation entries, session binding, lens switches, side-task results, mentions, and `worldUpdate` without inventing a parallel channel or canonical chat/turn store. JSONL viability must validate sessions created through the `WorkspaceSessionCoordinator`, including the first-entry binding and `/new` same-spec behavior. - **Traceability:** R7, R8, R16, R17, R19 / D6-L, D11-L, D12-L, D13-L, D18-L / I3-L, I8-L, I10-L / A2-L, A12-L - **Design docs:** archived [jsonl-session-viability-note](file:///Users/lunelson/Code/hashintel/brunch-next/archive/archive/docs/architecture/jsonl-session-viability-note.md) -- **Current execution pointer:** `memory/CARDS.md` queue — build JSONL reload parity, custom-entry survival, active-branch projection, and M1 fixture replay evidence slices. +- **Current execution pointer:** complete; proceed to `web-shell`. 
### web-shell @@ -261,7 +260,8 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ## Recently Completed -- 2026-05-21 `mode-shell-and-fixture-driver` — Done: print and RPC transport modes boot through the Brunch host; named `workspace.snapshot` and `session.elicitationExchanges` handlers project coordinator-selected session state; fixture capture copies the same selected Pi JSONL session projected by RPC; brief metadata is Brunch-owned and marks graph/coherence artifacts deferred; briefs #1–#3 have scripted deterministic replay bundles under `.brunch-fixtures//scripted-001/`. Verified: `npm run verify`, RPC/print parity smoke, exchange projection tests, fixture replay/projection parity tests, `./runbooks/verify-m1.sh`, and human inspection that briefs/captures/product-shaped outputs are good on their current terms. Watch: M2 should use these captured transcripts as JSONL reload evidence without turning them into a parallel chat/turn store; later elicitation work must revisit the encoded interaction logic, expectations, and knowledge-flow assumptions rather than treating the scripted M1 exchange shape as final product behavior. +- 2026-05-21 `jsonl-session-viability` — Done: Pi JSONL reload preserves coordinator-created binding-only sessions, first assistant/user flushes without duplicate prefixes, `/new` same-spec bindings, raw user/assistant payloads, representative Brunch custom entries, context-participating custom messages, continuity/compaction metadata, structured elicitation entries, active-branch exchange projection, and M1 bundle-local replay parity for briefs #1–#3. `session.elicitationExchanges` now projects from Pi's active branch instead of file-linear JSONL. Verified: `npm run verify` after each slice. Watch: M2 validates JSONL as sufficient on current POC terms, but later side-task, mention, and continuity frontiers still own their final payload semantics. 
+- 2026-05-21 `mode-shell-and-fixture-driver` — Done: print and RPC transport modes boot through the Brunch host; named `workspace.snapshot` and `session.elicitationExchanges` handlers project coordinator-selected session state; fixture capture copies the same selected Pi JSONL session projected by RPC; brief metadata is Brunch-owned and marks graph/coherence artifacts deferred; briefs #1–#3 have scripted deterministic replay bundles under `.brunch-fixtures//scripted-001/`. Verified: `npm run verify`, RPC/print parity smoke, exchange projection tests, fixture replay/projection parity tests, `./runbooks/verify-m1.sh`, and human inspection that briefs/captures/product-shaped outputs are good on their current terms. Watch: M2 used these captured transcripts as JSONL reload evidence without turning them into a parallel chat/turn store; later elicitation work must revisit the encoded interaction logic, expectations, and knowledge-flow assumptions rather than treating the scripted M1 exchange shape as final product behavior. - 2026-05-20 `walking-skeleton` — Done: Brunch now launches through a real pi-backed TUI boot path with coordinator-first spec gating, project-local `.brunch/` state, self-describing Pi JSONL sessions via exactly one `brunch.session_binding`, same-spec `/new` coverage, persistent cwd / spec / phase / chat-mode chrome through pi's extension widget seam, a bin shim, store-only runbook checker, and type-ownership hardening against Pi exported types. Verified: `npm run verify`, manual TUI smoke in a scratch project, automated TUI/coordinator tests, store-only runbook oracle, and manual file inspection. Watch: M1 should reuse the coordinator/session truth rather than recreating boot/session mechanics. - 2026-05-20 `pre-poc-archive-and-reseed` — Done: razed pre-POC implementation, archived legacy docs and planning memory under `archive/`, tagged `next-baseline`, reseeded `memory/SPEC.md` and `memory/PLAN.md` from the three canonical POC architecture docs. 
Verified: `git log --oneline` shows three clean buckets; `archive/` contains all prior material. Watch: Phase 3 infra bootstrap is folded into `walking-skeleton`, not a separate frontier. diff --git a/memory/SPEC.md b/memory/SPEC.md index 0ab87889..21ff61ca 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -84,7 +84,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | # | Assumption | Confidence | Status | Depends on | Validation approach | | --- | --- | --- | --- | --- | --- | | A1-L | `pi-coding-agent` exposes enough seams (services, custom message roles, `prepareNextTurn`, `transformContext`, RPC mode, JSONL sessions, extension UI surface) to host all M0–M9 capabilities without forking pi. | high | open | D1-L | M0–M2: walking skeleton + mode shell + JSONL viability prove the substrate. | -| A2-L | pi JSONL sessions can faithfully hold raw assistant/user payloads, Brunch session binding, structured elicitation entries, and continuity metadata (`lastSeenLsn`, interest sets, compaction anchors) through reload. | medium | open | D6-L, I3-L | M2 — JSONL session viability slice. If false, fall back per the three options in PRD §6. | +| A2-L | pi JSONL sessions can faithfully hold raw assistant/user payloads, Brunch session binding, structured elicitation entries, and continuity metadata (`lastSeenLsn`, interest sets, compaction anchors) through reload. | high | validated | D6-L, I3-L | M2 JSONL viability tests preserve coordinator-created bindings, raw user/assistant payloads, representative custom entries/messages, continuity metadata, structured elicitation entries, and M1 fixture replay parity. Later frontiers still own final payload semantics for side tasks, mentions, and continuity. | | A3-L | A single Brunch-owned command layer (with optimistic concurrency, validation, audit, and coherence triggers) is sufficient for both agent and human writers across all four modes for the POC's graph scale. 
| medium | open | D4-L | M4 + M5 + M6: graph plane, agent-↔-graph wiring, and authority tiers all routed through the same surface. | | A4-L | A monotonic global LSN per commit (one-LSN-per-transaction) is adequate for change-log replay, reconciliation-need ordering, and mention staleness without per-row vector clocks. | high | open | I1-L, I4-L | M4 + M7: replay fidelity and `worldUpdate` ordering tests. | | A5-L | An agent-as-user driver running over JSON-RPC stdio can produce regression-quality fixtures across a curated brief library. | medium | open | D5-L | M1 — first replay-regression fixtures land. | @@ -94,7 +94,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | A9-L | A session-scoped mention ledger of (`entity_id`, `snapshotted_lsn`) is the right granularity for staleness hints; transcript-scoped or graph-scoped ledgers are not needed for the POC. | low | open | I7-L | M7 — turn-boundary reconciliation slice; observed via fixture runs that stress re-read decisions. | | A10-L | A persistent TUI chrome region showing cwd / spec / phase / chat-mode can be added on top of `pi-tui`'s root layout without modifying pi. | medium | open | D2-L | M0 — walking skeleton attempts to mount the chrome; escalates to a pi upstream issue only if blocked. | | A11-L | Pi's `prepareNextTurn` plus custom-message delivery are sufficient to express side-task result delivery without inventing a second event plane or forking pi. | medium | open | D15-L | M5 + M7: side-task registry wiring and next-turn delivery proof. | -| A12-L | Pi JSONL branch entries have enough stable ids, roles, and custom-entry support to project elicitation exchanges from system/assistant spans plus user-response spans without mandatory markers around every prompt. | medium | open | D12-L, D13-L, I10-L | M1–M2: fixture captures and JSONL projection tests reveal whether role/span alternation needs explicit prompt markers. 
| +| A12-L | Pi JSONL branch entries have enough stable ids, roles, and custom-entry support to project elicitation exchanges from system/assistant spans plus user-response spans without mandatory markers around every prompt. | high | validated | D12-L, D13-L, I10-L | M1–M2 captures and active-branch projection tests prove prompt/response exchange projection across reload, including branch exclusion and structured custom prompt/response entries. | | A13-L | A durable observer-job queue keyed by session id and elicitation-exchange entry range can recover async extraction after process interruption without reintroducing canonical chat/turn tables; whether this shares storage with a generalized work-item/reconciliation table can be deferred. | medium | open | D18-L, I14-L | M5: observer extraction tests exercise restart/idempotence once graph writes exist. | ### Active Decisions @@ -146,14 +146,14 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | --- | --- | --- | --- | | I1-L | One global LSN per commit; every change-log entry, graph-node version, and reconciliation-need carries an LSN strictly monotonic with the global clock. | planned (M4 invariant tests) | D4-L, D6-L, D8-L | | I2-L | All durable graph mutations originate from the Brunch command layer; no caller bypasses validation, audit, or coherence triggering. | planned (M5 architectural test + lint rule) | D4-L | -| I3-L | Transcript reload reproduces raw assistant/user payloads plus Brunch session binding, structured elicitation entries, and other custom transcript entries byte-equivalently (modulo timestamps). | planned (M2 viability tests) | A2-L, D6-L | +| I3-L | Transcript reload reproduces raw assistant/user payloads plus Brunch session binding, structured elicitation entries, and other custom transcript entries byte-equivalently (modulo timestamps). 
| covered (M2 JSONL viability round-trip tests) | A2-L, D6-L | | I4-L | For every `worldUpdate` entry, all named graph items have LSNs strictly greater than the session's pre-update `lastSeenLsn`. | planned (M7 property test) | D6-L, I1-L | | I5-L | For every `brunch.lens_switch` entry and every session/spec binding transition, the session interest set is recomputed before the next agent turn. | planned (M7 property test) | D11-L | | I6-L | Every reconciliation need has `created_at_lsn ≤` current global LSN; `kind='impasse'` needs reference at least two graph nodes; resolved needs carry a strictly later `resolved_at_lsn`. | planned (M8 property test) | D8-L, I1-L | | I7-L | Every `framing_as` value belongs to the allowed matrix for that node's base kind. | planned (fixture property check) | D7-L | | I8-L | Spec selection persists across pi `switchSession` (i.e. `/new`); the selected session file is reopened consistently by headless projection/capture paths; each session has exactly one `brunch.session_binding`, and a session's bound spec never changes. | partially covered (M0 coordinator/TUI boot integration tests + store-only runbook checker; M1 no-injected-coordinator capture regression; manual TUI smoke and JSONL reload viability still planned) | D11-L, D21-L | | I9-L | Every `brunch.mention` payload is anchored to a stable `id`; the ledger never stores title-anchored references. | planned (M7 invariant) | D14-L | -| I10-L | Structured elicitation prompts/responses live in the Pi transcript when structure is needed; elicitation exchanges are projected from the active branch, and no parallel canonical chat/turn table carries elicitation state. | planned (M1+ projection invariant) | D12-L, D13-L, D18-L | +| I10-L | Structured elicitation prompts/responses live in the Pi transcript when structure is needed; elicitation exchanges are projected from the active branch, and no parallel canonical chat/turn table carries elicitation state. 
| covered (M1 exchange projection tests + M2 active-branch JSONL/RPC projection tests) | D12-L, D13-L, D18-L | | I11-L | No durable graph mutation path — including migrations, maintenance scripts, observer-job writes, or side-task-attributed writes — may bypass the `CommandExecutor` path that performs authority/result classification, version checks, structural validation, transaction execution, LSN allocation, and change-log append. | planned (M4 architectural + migration invariants; M5 caller-boundary tests) | D4-L, D15-L, D16-L, D20-L | | I12-L | Side-task results are delivered only at turn boundaries; no side-task result may steer or mutate the active turn outside the next-turn delivery path. | planned (M7 side-task delivery invariant) | D15-L | | I13-L | At any idle session branch leaf, the latest unresolved interaction state is system/assistant-originated: user input is a response to an elicitation prompt, not ambient chat. | planned (M1 fixture + transcript projection tests) | D12-L | @@ -319,7 +319,7 @@ The first required runbook is M0: after manual TUI interaction, a checker proves | I7-L | M4+ schema/property tests over framing matrix plus brief fixture assertions. | | I8-L | M0 runbook oracle plus M2 coordinator-created JSONL reload tests. | | I9-L | M7 mention parser/ledger unit tests and staleness property tests. | -| I10-L | M1/M2 exchange projection tests, JSONL fixture replay, and no chat/turn architectural test. | +| I10-L | M1/M2 exchange projection tests, active-branch JSONL fixture replay, and no chat/turn architectural test. | | I11-L | M4/M5 no-bypass architectural test plus command transaction integration tests. | | I12-L | M7 side-task delivery invariant tests and adversarial fixture when side tasks are active. | | I13-L | M1 fixture/projection checks for idle branch leaf state. 
| diff --git a/src/jsonl-session-viability.test.ts b/src/jsonl-session-viability.test.ts index 1f0a9a63..4804cd35 100644 --- a/src/jsonl-session-viability.test.ts +++ b/src/jsonl-session-viability.test.ts @@ -1,6 +1,7 @@ import { mkdtempSync } from "node:fs" +import { readFile } from "node:fs/promises" import { tmpdir } from "node:os" -import { join } from "node:path" +import { dirname, join } from "node:path" import { describe, expect, it } from "vitest" @@ -12,11 +13,58 @@ import { type SessionMessageEntry, } from "@earendil-works/pi-coding-agent" +import { + loadActiveBranchTranscriptEntries, + projectElicitationExchanges, + type ElicitationExchangeProjection, +} from "./elicitation-exchange.js" + +const M1_FIXTURE_IDS = ["brief-001", "brief-002", "brief-003"] as const +const M1_RUN_ID = "scripted-001" + interface PersistedSessionFixture { file: string manager: SessionManager } +interface SessionBindingData { + schemaVersion: 1 + sessionId: string + specId: string + specTitle: string +} + +interface M1FixtureMeta { + briefId: string + runId: string + session: { + id: string + sourceFile: string + } + projectionSummary: { + status: ElicitationExchangeProjection["status"] + exchangeCount: number + openPrompt: boolean + } + artifacts: { + jsonl: string + graph: { status: "deferred" } + coherence: { status: "deferred" } + } +} + +interface M1Brief { + id: string + title: string +} + +interface M1FixtureBundle { + bundleDir: string + jsonlPath: string + meta: M1FixtureMeta + brief: M1Brief +} + describe("Pi JSONL transcript viability", () => { it("jsonl raw user assistant payload survival", async () => { const { file, manager } = createPersistedSession() @@ -269,6 +317,107 @@ describe("Pi JSONL transcript viability", () => { }) }) +describe("M1 fixture JSONL replay parity", () => { + it("m1 fixture bundles reload for transcript parity", async () => { + for (const briefId of M1_FIXTURE_IDS) { + const bundle = await loadM1FixtureBundle(briefId) + const reloaded = 
SessionManager.open( + bundle.jsonlPath, + undefined, + process.cwd(), + ) + + expect(reloaded.getHeader()).toMatchObject({ id: bundle.meta.session.id }) + expect(reloaded.getEntries()).not.toHaveLength(0) + expect(bundle.meta.artifacts.jsonl).toBe(`${M1_RUN_ID}.jsonl`) + } + }) + + it("m1 fixture bundle metadata matches reprojected exchanges", async () => { + for (const briefId of M1_FIXTURE_IDS) { + const bundle = await loadM1FixtureBundle(briefId) + const projection = projectElicitationExchanges( + loadActiveBranchTranscriptEntries(bundle.jsonlPath, { + cwd: process.cwd(), + }), + ) + + expect(summaryForProjection(projection)).toEqual( + bundle.meta.projectionSummary, + ) + } + }) + + it("m1 fixture bundle bindings match briefs", async () => { + for (const briefId of M1_FIXTURE_IDS) { + const bundle = await loadM1FixtureBundle(briefId) + const bindings = SessionManager.open( + bundle.jsonlPath, + undefined, + process.cwd(), + ) + .getEntries() + .filter( + (entry): entry is CustomEntry => + entry.type === "custom" && + entry.customType === "brunch.session_binding", + ) + + expect(bindings).toHaveLength(1) + expect(bindings[0]?.data).toMatchObject({ + sessionId: bundle.meta.session.id, + specTitle: bundle.brief.title, + }) + } + }) + + it("m1 fixture metadata treats source file as provenance only", async () => { + for (const briefId of M1_FIXTURE_IDS) { + const bundle = await loadM1FixtureBundle(briefId) + + expect(bundle.meta.session.sourceFile).toMatch(/^\//u) + expect(bundle.jsonlPath).toBe( + join(bundle.bundleDir, bundle.meta.artifacts.jsonl), + ) + expect(bundle.jsonlPath).not.toBe(bundle.meta.session.sourceFile) + } + }) +}) + +async function loadM1FixtureBundle( + briefId: typeof M1_FIXTURE_IDS[number], +): Promise<M1FixtureBundle> { + const bundleDir = join(".brunch-fixtures", briefId, M1_RUN_ID) + const metaPath = join(bundleDir, `${M1_RUN_ID}.meta.json`) + const meta = JSON.parse(await readFile(metaPath, "utf8")) as M1FixtureMeta + const jsonlPath =
join(dirname(metaPath), meta.artifacts.jsonl) + const briefPath = join( + ".brunch-fixtures", + "briefs", + `${briefId}-${briefSlug(briefId)}.json`, + ) + const brief = JSON.parse(await readFile(briefPath, "utf8")) as M1Brief + return { bundleDir, jsonlPath, meta, brief } +} + +function briefSlug(briefId: typeof M1_FIXTURE_IDS[number]): string { + return { + "brief-001": "identity-reference", + "brief-002": "state-lifecycle", + "brief-003": "derived-views", + }[briefId] +} + +function summaryForProjection( + projection: ElicitationExchangeProjection, +): M1FixtureMeta["projectionSummary"] { + return { + status: projection.status, + exchangeCount: projection.exchanges.length, + openPrompt: projection.openPrompt !== null, + } +} + function createPersistedSession(): PersistedSessionFixture { const cwd = mkdtempSync(join(tmpdir(), "brunch-jsonl-")) const manager = SessionManager.create(cwd, join(cwd, ".brunch/sessions")) From 38a9940d499130f4791fc05372dbae7ddeba0dab Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 16:35:36 +0200 Subject: [PATCH 6/9] sync op, closing out frontier M2 --- archive/docs/archive/PLAN_HISTORY.md | 1 + memory/PLAN.md | 20 +++++++------- memory/SPEC.md | 40 +++++++++++++++------------- 3 files changed, 33 insertions(+), 28 deletions(-) diff --git a/archive/docs/archive/PLAN_HISTORY.md b/archive/docs/archive/PLAN_HISTORY.md index e561cb51..6fa693c6 100644 --- a/archive/docs/archive/PLAN_HISTORY.md +++ b/archive/docs/archive/PLAN_HISTORY.md @@ -191,3 +191,4 @@ Archived out of `memory/PLAN.md` during `ln-sync` so the live plan keeps only th - [2026-05-08] FE-698 Anthropic scenario adapter — Added a probe-only Anthropic AI SDK adapter behind the existing `PromptScenarioModelAdapter` seam. Web-research prompt scenarios now map rendered prompts to AI SDK system content and rendered context packs to user prompt content under mocked tests, with unsupported providers rejected before model construction. Verified: `npm run verify`. 
Watch: this is not the shared AI runtime provider seam; OpenRouter/provider-neutral routing, credential UX, Pi, web tools, CLI/UI, persistence, and Brunch mutations remain out of scope. - [2026-05-08] FE-698 prompt scenario execution probe — Web-research prompt scenarios can now execute through an injected fakeable model adapter and serialize `succeeded` / `failed` execution results with raw output or deterministic error text, while no-provider artifacts remain deterministic `not-run` snapshots. Structured parsing is explicitly `not-applicable` for this prose-only web-research path. Verified: `npm run verify`. Watch: real provider adapters, Pi, web tools, CLI/UI, persistence, and mutating Brunch handlers remain out of scope for this foundation slice. - [2026-05-06] Multi-chat substrate + reconciliation needs (FE-697) — `chat` table with one interview chat per spec, nullable `turn.chat_id`, `specification.primary_chat_id`, mirrored `chat.active_turn_id`, plus the `reconciliation_need` queue with directed source/target items, narrow `kind`/`status`, partial unique index on open rows, cascade FK. Spec creation inserts spec + interview chat in one transaction; `advanceHead` is transactional. No user-visible change. Verified: `npm run verify` (673 tests) plus manual fixture playback (39 specs / 81 turns / dual-pointer equivalence). A82 / A83 validated for Phase 1. +- 2026-05-20 — **Pre-POC archive and reseed** — razed pre-POC implementation, archived legacy docs and planning memory under `archive/`, tagged `next-baseline`, and reseeded `memory/SPEC.md` and `memory/PLAN.md` from the three canonical POC architecture docs. Phase 3 infra bootstrap was folded into `walking-skeleton` rather than remaining an independent frontier. 
diff --git a/memory/PLAN.md b/memory/PLAN.md index c580370a..30da4f30 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -73,7 +73,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Acceptance:** `brunch --mode print` and `brunch --mode rpc` boot from the same host setup; the first `session.*` / `workspace.*` RPC handlers are named product methods rather than a generic read gateway; an agent-as-user driver completes at least one brief end-to-end over stdio by responding to elicitation prompts; captured JSONL can be projected into prompt/response elicitation exchanges; a `.jsonl` + `.meta.json` bundle is written under `.brunch-fixtures/`; the first three curated briefs are captured. - **Verification:** Inner — verify gate plus projection-handler unit tests for elicitation exchange ranges. Middle — deterministic first captured run, stdio RPC handler contract tests, replay-regression fixture(s) asserting transcript reproduction/projection parity, and `./runbooks/verify-m1.sh` for store/projection/manual-smoke evidence (SPEC §Oracle Strategy by Loop Tier). Outer — the three-layer fixture model is established in skeleton form here; property and adversarial layers come online as later milestones supply graph/coherence substrates; brief quality and golden-capture representativeness remain explicit human review prompts in the runbook. - **Cross-cutting obligations:** Keep transport mode distinct from agent modes/lenses; do not make print mode select or imply an agent strategy in M1. 
Keep the captured-run format forward-compatible with later `.graph.json` and `.coherence.json` artefacts; establish exchange projection over Pi JSONL without creating canonical chat/turn tables; keep read/subscription architecture thin — named RPC method families and projection handlers over canonical stores, not a generic read-model platform; this frontier establishes the first layer of the canonical replay/property/adversarial fixture architecture rather than a one-off harness. -- **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L, A12-L +- **Traceability:** R4, R5, R11, R16, R17, R20 / D5-L, D12-L, D13-L, D18-L, D19-L / I3-L, I10-L, I13-L / A1-L, A5-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) - **Current execution pointer:** complete after M1 review fixes; proceed to `jsonl-session-viability`. @@ -85,11 +85,11 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Kind:** structural - **Status:** done - **Objective:** Prove whether pi `SessionManager` JSONL in `.brunch/sessions/` is rich enough to carry raw assistant/user payloads, Brunch session binding (`brunch.session_binding`), structured elicitation prompt/response entries when needed, other custom entries (`brunch.lens_switch`, `brunch.side_task_result`, `worldUpdate`, `brunch.mention`, `brunch.mention_staleness_hint`), and session-scoped continuity metadata (`lastSeenLsn`, interest sets, compaction anchors) through reload. -- **Why now / unlocks:** Validates A2-L and pins D6-L. If JSONL is insufficient, M2 produces a sharply scoped fallback proposal that all later milestones can plan against. 
-- **Acceptance:** Round-trip reload of a captured session preserves raw payloads byte-equivalent (modulo timestamps); session binding and structured elicitation entries survive; elicitation exchanges can be re-projected from the active branch after reload; all named Brunch custom entries survive, including side-task-result delivery entries when present; continuity metadata survives. If any of these fail, the failure is sharply documented and a fallback path is proposed (project richer substrate / mirror JSONL into richer records / propose pi upstream change). -- **Verification:** Inner — verify gate plus synthetic JSONL projection tests. Middle — JSONL round-trip/property tests for raw payloads, `brunch.session_binding`, structured elicitation entries, active-branch exchange projection, coordinator-created `/new` sessions, and M1 fixture replay parity. Outer — fixture replay parity across the transcript-first run bundle; no new human review was required because brief content and scripted user notes did not change. +- **Why now / unlocks:** Validated the JSONL-first transcript strategy and pinned D6-L for Brunch-supported linear sessions. If JSONL had been insufficient, M2 would have produced a sharply scoped fallback proposal that all later milestones could plan against. +- **Acceptance:** Round-trip reload of a captured linear session preserves raw payloads byte-equivalent (modulo timestamps); session binding and structured elicitation entries survive; elicitation exchanges can be re-projected after reload; all named Brunch custom entries survive, including side-task-result delivery entries when present; continuity metadata survives. Defensive branch-shape tests document Pi substrate behavior, but branch-aware Brunch sessions are not product-supported per D24-L. If core linear-session viability fails, the failure is sharply documented and a fallback path is proposed (project richer substrate / mirror JSONL into richer records / propose pi upstream change). 
+- **Verification:** Inner — verify gate plus synthetic JSONL projection tests. Middle — JSONL round-trip/property tests for raw payloads, `brunch.session_binding`, structured elicitation entries, defensive branch-shape projection behavior, coordinator-created `/new` sessions, and M1 fixture replay parity. Outer — fixture replay parity across the transcript-first run bundle; no new human review was required because brief content and scripted user notes did not change. - **Cross-cutting obligations:** This frontier is the transcript-side proof for the shared event substrate that later carries structured elicitation entries, session binding, lens switches, side-task results, mentions, and `worldUpdate` without inventing a parallel channel or canonical chat/turn store. JSONL viability must validate sessions created through the `WorkspaceSessionCoordinator`, including the first-entry binding and `/new` same-spec behavior. -- **Traceability:** R7, R8, R16, R17, R19 / D6-L, D11-L, D12-L, D13-L, D18-L / I3-L, I8-L, I10-L / A2-L, A12-L +- **Traceability:** R7, R8, R16, R17, R19 / D6-L, D11-L, D12-L, D13-L, D18-L, D24-L / I3-L, I8-L, I10-L, I15-L - **Design docs:** archived [jsonl-session-viability-note](file:///Users/lunelson/Code/hashintel/brunch-next/archive/archive/docs/architecture/jsonl-session-viability-note.md) - **Current execution pointer:** complete; proceed to `web-shell`. @@ -102,10 +102,11 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Objective:** `brunch --mode web` serves a native Brunch React app (TanStack Router + Query) over one WebSocket-backed JSON-RPC client; no second backend API, REST read model, or browser-owned product runtime is invented; `pi-web-ui` is not used. - **Why now / unlocks:** Proves D10-L. Unlocks parallel UI work and visualises graph + coherence state. Sequenced after M2 so the transcript substrate is pinned before clients depend on it. 
- **Acceptance:** Web client connects via WebSocket RPC, lists specs and workspace state through `session.*` / `workspace.*` projection handlers, renders a transcript and the persistent chrome region, and round-trips structured elicitation prompts/responses plus freeform user input through the same transcript conventions as TUI. -- **Verification:** Inner gate plus WebSocket/handler contract tests. Middle — manual browser smoke paired with projection/query postconditions for `session.*` / `workspace.*`, transcript rendering state, and structured elicitation round-trip. Outer — at least one fixture replays into the web renderer; qualitative UX remains manual checklist. -- **Cross-cutting obligations:** Preserve the single command/event substrate: the browser is a thin remote head over the same elicitation/transcript/session machinery, not a second data plane, REST-backed read client, generic read gateway, or custom interaction contract. -- **Traceability:** R4, R11, R12, R16, R17 / D5-L, D10-L, D12-L, D13-L, D19-L +- **Verification:** Inner gate plus WebSocket/handler contract tests. Middle — manual browser smoke paired with projection/query postconditions for `session.*` / `workspace.*`, linear transcript-policy guards, transcript rendering state, and structured elicitation round-trip. Outer — at least one fixture replays into the web renderer; qualitative UX remains manual checklist. +- **Cross-cutting obligations:** Preserve the single command/event substrate: the browser is a thin remote head over the same elicitation/transcript/session machinery, not a second data plane, REST-backed read client, generic read gateway, or custom interaction contract. Carry D24-L linear transcript policy forward before adding another session-consuming surface: block Brunch-controlled `/tree`/`/fork`/`/clone` branch flows where Pi hooks permit, and make transcript readers fail fast on non-linear JSONL rather than adapting it. 
+- **Traceability:** R4, R8, R11, R12, R16, R17 / D5-L, D10-L, D12-L, D13-L, D19-L, D24-L / I15-L - **Design docs:** [prd.md §M3, §Frontend Architecture](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/prd.md) +- **Current execution pointer:** first slice should harden the linear transcript policy (block Pi branch flows where hooks permit and make transcript readers reject non-linear JSONL) before adding the browser surface. ### graph-data-plane @@ -260,10 +261,9 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta ## Recently Completed -- 2026-05-21 `jsonl-session-viability` — Done: Pi JSONL reload preserves coordinator-created binding-only sessions, first assistant/user flushes without duplicate prefixes, `/new` same-spec bindings, raw user/assistant payloads, representative Brunch custom entries, context-participating custom messages, continuity/compaction metadata, structured elicitation entries, active-branch exchange projection, and M1 bundle-local replay parity for briefs #1–#3. `session.elicitationExchanges` now projects from Pi's active branch instead of file-linear JSONL. Verified: `npm run verify` after each slice. Watch: M2 validates JSONL as sufficient on current POC terms, but later side-task, mention, and continuity frontiers still own their final payload semantics. +- 2026-05-21 `jsonl-session-viability` — Done: Pi JSONL reload preserves coordinator-created binding-only sessions, first assistant/user flushes without duplicate prefixes, `/new` same-spec bindings, raw user/assistant payloads, representative Brunch custom entries, context-participating custom messages, continuity/compaction metadata, structured elicitation entries, defensive active-branch projection behavior, and M1 bundle-local replay parity for briefs #1–#3. Verified: `npm run verify` after each slice. 
Watch: M2 validates JSONL as sufficient for Brunch-supported linear sessions on current POC terms; branch-aware Brunch sessions are intentionally unsupported per D24-L, and later side-task, mention, and continuity frontiers still own their final payload semantics. - 2026-05-21 `mode-shell-and-fixture-driver` — Done: print and RPC transport modes boot through the Brunch host; named `workspace.snapshot` and `session.elicitationExchanges` handlers project coordinator-selected session state; fixture capture copies the same selected Pi JSONL session projected by RPC; brief metadata is Brunch-owned and marks graph/coherence artifacts deferred; briefs #1–#3 have scripted deterministic replay bundles under `.brunch-fixtures//scripted-001/`. Verified: `npm run verify`, RPC/print parity smoke, exchange projection tests, fixture replay/projection parity tests, `./runbooks/verify-m1.sh`, and human inspection that briefs/captures/product-shaped outputs are good on their current terms. Watch: M2 used these captured transcripts as JSONL reload evidence without turning them into a parallel chat/turn store; later elicitation work must revisit the encoded interaction logic, expectations, and knowledge-flow assumptions rather than treating the scripted M1 exchange shape as final product behavior. - 2026-05-20 `walking-skeleton` — Done: Brunch now launches through a real pi-backed TUI boot path with coordinator-first spec gating, project-local `.brunch/` state, self-describing Pi JSONL sessions via exactly one `brunch.session_binding`, same-spec `/new` coverage, persistent cwd / spec / phase / chat-mode chrome through pi's extension widget seam, a bin shim, store-only runbook checker, and type-ownership hardening against Pi exported types. Verified: `npm run verify`, manual TUI smoke in a scratch project, automated TUI/coordinator tests, store-only runbook oracle, and manual file inspection. 
Watch: M1 should reuse the coordinator/session truth rather than recreating boot/session mechanics. -- 2026-05-20 `pre-poc-archive-and-reseed` — Done: razed pre-POC implementation, archived legacy docs and planning memory under `archive/`, tagged `next-baseline`, reseeded `memory/SPEC.md` and `memory/PLAN.md` from the three canonical POC architecture docs. Verified: `git log --oneline` shows three clean buckets; `archive/` contains all prior material. Watch: Phase 3 infra bootstrap is folded into `walking-skeleton`, not a separate frontier. Older history: `archive/docs/archive/PLAN_HISTORY.md` diff --git a/memory/SPEC.md b/memory/SPEC.md index 21ff61ca..753ff354 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -30,6 +30,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - Do not solve mid-turn distributed consistency; the contract is turn-boundary clean only. - Do not reuse `pi-web-ui` for the browser surface; the web client is a native Brunch React app. - Do not expose a generic `records.*` data model. The vocabulary is graph-native (`graph.*`, `intent.*`, `oracle.*`, `design.*`, `plan.*`) or session-native (`session.*`). +- Do not support Pi's in-place session branching (`/tree`) or branch-derived replacement flows (`/fork`, `/clone`) as Brunch product behavior in the POC. Branch-aware continuity, staleness, and coherence are deferred; Brunch-controlled flows should block branch creation/navigation, and Brunch transcript readers should reject branched JSONL rather than flattening or adapting it. - Do not build a generic read-model platform, REST read API, DB-backed chat/turn projection, or canonical cross-store event spine just to keep clients synchronized. Prefer thin named RPC method families and projection handlers over canonical stores. - Do not require TUI or agent internals to serialize through JSON-RPC when they can call the same handlers in-process; sameness of handlers matters more than sameness of transport. 
- Do not adopt Flue as the harness substrate. Stay on `pi-coding-agent`; adopt Flue *patterns* (sandbox abstraction, remote-deploy shape, MCP adapter) selectively, post-POC. @@ -51,7 +52,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c #### Persistence & data model 7. Brunch must store spec-workspace graph truth in SQLite-backed graph-native persistence. -8. Brunch must prove that transcript persistence is rich enough for raw assistant and user payloads, session binding, structured elicitation entries, and continuity metadata — using pi JSONL sessions if sufficient, or a justified fallback otherwise. +8. Brunch must prove that transcript persistence is rich enough for raw assistant and user payloads, session binding, structured elicitation entries, and continuity metadata — using pi JSONL sessions if sufficient, or a justified fallback otherwise. For the POC, Brunch-supported Pi JSONL sessions are linear and coordinator-bound; branch-aware transcript semantics are unsupported until explicitly designed. 9. Brunch must treat the intent graph as canonical specification meaning, with oracle, design, and plan graphs as accountable downstream planes. #### Mutation, transport & subscriptions @@ -81,10 +82,11 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c ### Open Assumptions + + | # | Assumption | Confidence | Status | Depends on | Validation approach | | --- | --- | --- | --- | --- | --- | | A1-L | `pi-coding-agent` exposes enough seams (services, custom message roles, `prepareNextTurn`, `transformContext`, RPC mode, JSONL sessions, extension UI surface) to host all M0–M9 capabilities without forking pi. | high | open | D1-L | M0–M2: walking skeleton + mode shell + JSONL viability prove the substrate. 
| -| A2-L | pi JSONL sessions can faithfully hold raw assistant/user payloads, Brunch session binding, structured elicitation entries, and continuity metadata (`lastSeenLsn`, interest sets, compaction anchors) through reload. | high | validated | D6-L, I3-L | M2 JSONL viability tests preserve coordinator-created bindings, raw user/assistant payloads, representative custom entries/messages, continuity metadata, structured elicitation entries, and M1 fixture replay parity. Later frontiers still own final payload semantics for side tasks, mentions, and continuity. | | A3-L | A single Brunch-owned command layer (with optimistic concurrency, validation, audit, and coherence triggers) is sufficient for both agent and human writers across all four modes for the POC's graph scale. | medium | open | D4-L | M4 + M5 + M6: graph plane, agent-↔-graph wiring, and authority tiers all routed through the same surface. | | A4-L | A monotonic global LSN per commit (one-LSN-per-transaction) is adequate for change-log replay, reconciliation-need ordering, and mention staleness without per-row vector clocks. | high | open | I1-L, I4-L | M4 + M7: replay fidelity and `worldUpdate` ordering tests. | | A5-L | An agent-as-user driver running over JSON-RPC stdio can produce regression-quality fixtures across a curated brief library. | medium | open | D5-L | M1 — first replay-regression fixtures land. | @@ -94,7 +96,6 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | A9-L | A session-scoped mention ledger of (`entity_id`, `snapshotted_lsn`) is the right granularity for staleness hints; transcript-scoped or graph-scoped ledgers are not needed for the POC. | low | open | I7-L | M7 — turn-boundary reconciliation slice; observed via fixture runs that stress re-read decisions. | | A10-L | A persistent TUI chrome region showing cwd / spec / phase / chat-mode can be added on top of `pi-tui`'s root layout without modifying pi. 
| medium | open | D2-L | M0 — walking skeleton attempts to mount the chrome; escalates to a pi upstream issue only if blocked. | | A11-L | Pi's `prepareNextTurn` plus custom-message delivery are sufficient to express side-task result delivery without inventing a second event plane or forking pi. | medium | open | D15-L | M5 + M7: side-task registry wiring and next-turn delivery proof. | -| A12-L | Pi JSONL branch entries have enough stable ids, roles, and custom-entry support to project elicitation exchanges from system/assistant spans plus user-response spans without mandatory markers around every prompt. | high | validated | D12-L, D13-L, I10-L | M1–M2 captures and active-branch projection tests prove prompt/response exchange projection across reload, including branch exclusion and structured custom prompt/response entries. | | A13-L | A durable observer-job queue keyed by session id and elicitation-exchange entry range can recover async extraction after process interruption without reintroducing canonical chat/turn tables; whether this shares storage with a generalized work-item/reconciliation table can be deferred. | medium | open | D18-L, I14-L | M5: observer extraction tests exercise restart/idempotence once graph writes exist. | ### Active Decisions @@ -126,10 +127,11 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c #### Persistence -- **D6-L — JSONL-first transcript persistence in `.brunch/sessions/`; SQLite-backed graph persistence in `.brunch/`.** Two durability surfaces with distinct responsibilities. Transcript starts on pi `SessionManager` redirected to the project-local directory; graph plane is SQLite from M4. Brunch does not recreate canonical `chat` or `turn` tables while Pi JSONL remains viable. Depends on: A2-L. Supersedes: —. +- **D6-L — JSONL-first transcript persistence in `.brunch/sessions/`; SQLite-backed graph persistence in `.brunch/`.** Two durability surfaces with distinct responsibilities. 
Transcript starts on pi `SessionManager` redirected to the project-local directory; graph plane is SQLite from M4. Brunch does not recreate canonical `chat` or `turn` tables while Pi JSONL remains viable for Brunch-supported linear sessions. Validated by M2. Supersedes: —. - **D15-L — Side tasks are a first-class Brunch subsystem delivered through the same transcript/event substrate.** Background sub-agents are tracked by a Brunch-owned `SideTaskRegistry`; results are never injected mid-turn and instead arrive at the next-turn boundary through the existing custom-message plus `prepareNextTurn` path. Side-task writes remain subject to the same command-layer authority as primary-agent writes. Depends on: A11-L, D4-L. Supersedes: —. - **D16-L — Graph persistence uses Drizzle over `better-sqlite3`, with one-LSN-per-commit and no bypass paths.** The command layer owns precondition checks, structural validation, entity writes, LSN allocation, change-log append, and any coherence updates inside one transaction. This rule applies equally to migrations and maintenance code; there is no privileged write path outside the command-executor protocol. Depends on: A3-L, A4-L. Supersedes: —. - **D18-L — Observer extraction is exchange-keyed durable work, not a chat/turn store.** After a user response closes an elicitation exchange, Brunch may enqueue an observer job keyed by session id plus exchange entry ids; jobs survive process restart and graph writes still route through the command layer. Routine observer jobs are operational queue state, not reconciliation needs by default; low-confidence or conflicting findings may create reconciliation needs. Depends on: A13-L, D4-L, D13-L, D16-L. Supersedes: the old DB-backed `chat` / `turn` mental model. +- **D24-L — Brunch POC enforces a linear transcript policy over Pi JSONL.** Pi's session tree is a substrate capability, not a Brunch product surface. 
Until branch-aware continuity/coherence is explicitly designed, Brunch-controlled interactive/runtime flows block `/tree`, `/fork`, and `/clone` through the thinnest available Pi hooks; transcript readers reject non-linear session files instead of flattening, adapting, migrating, or selecting a branch. This is intentional fail-fast pre-release posture: avoid compatibility debt with Pi internals or earlier Brunch revisions, and keep wrapper/adapter layers minimal. Depends on: D6-L, D11-L, D13-L. Supersedes: treating active-branch projection as Brunch product semantics. #### Interaction & UI shape @@ -137,7 +139,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D21-L — Workspace session coordination is the spec/session boot seam.** Brunch owns a narrow `WorkspaceSessionCoordinator` for boot, spec selection, selected-session reopening, and `/new` session creation. It is the only product module allowed to create or open Pi sessions for Brunch user flows and the only module allowed to write `brunch.session_binding`; callers receive `ready | select_spec | needs_human` workspace-session state and never mutate a session's bound spec. The coordinator hides `SessionManager.create/open/continueRecent(cwd, ".brunch/sessions/")`, internal session-start binding for pi-created replacement sessions, `.brunch/state.json` current-spec and current-session-file acceleration, binding validation, and chrome-state derivation. Because pi defers appending session JSONL until an assistant message exists, the coordinator flushes Brunch's binding when it is created, refreshes it at `before_agent_start`, and performs the final pre-assistant flush from Brunch's internal assistant `message_start` hook after pi has persisted the user message but before assistant persistence; each flush reloads the session file so pi's next assistant append does not duplicate the already-written prefix. Depends on: D6-L, D11-L. 
Supersedes: the loose `SpecRegistry` + caller-orchestrated session-binding mental model. - **D22-L — M0 TUI chrome rides pi's extension UI widget seam.** Brunch's initial persistent chrome is mounted by an internal Brunch extension using pi's public `ExtensionUIContext.setWidget(..., { placement: "aboveEditor" })`, while spec selection remains a Brunch-owned boot gate before `InteractiveMode.run()`. Brunch does not fork pi, monkeypatch `InteractiveMode`, or expose generic pi extension configuration to users for M0 chrome. Depends on: A10-L, D2-L, D21-L. Supersedes: private-header/monkeypatch approaches for M0 chrome. - **D12-L — Elicitation-first interaction, transcript-native structured prompts.** Brunch treats system/assistant prompts and user responses as Pi transcript truth. Structured action/choice/freeform surfaces may be represented by Brunch custom entries when needed, but there is no DB-owned prompt/response entity; at idle, the session waits on a system/assistant-originated elicitation prompt. Depends on: D6-L, D11-L. Supersedes: —. -- **D13-L — Capture-aware elicitation exchange projection.** Observer extraction consumes derived elicitation exchanges: a prompt-side span (all system/assistant/tool-side entries since the previous user response, including any structured/internal prompt content) plus a response-side span (user text and/or structured action entries). Role/span alternation is the default projection; typed markers are added only where structure/actions need deterministic replay. Depends on: A12-L, D12-L. Supersedes: —. +- **D13-L — Capture-aware elicitation exchange projection.** Observer extraction consumes derived elicitation exchanges: a prompt-side span (all system/assistant/tool-side entries since the previous user response, including any structured/internal prompt content) plus a response-side span (user text and/or structured action entries). 
Role/span alternation is the default projection in Brunch-supported linear sessions; typed markers are added only where structure/actions need deterministic replay. Depends on: D12-L, D24-L. Supersedes: —. - **D14-L — `#`-mentions are ID-anchored, with a session-scoped mention ledger.** Autocomplete may resolve by title but insertion always rewrites to ID-anchored. Per-session `(entity_id, snapshotted_lsn)` ledger drives discretionary `brunch.mention_staleness_hint` entries in `prepareNextTurn`. Depends on: A9-L, I4-L. Supersedes: —. ### Critical Invariants @@ -146,18 +148,19 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | --- | --- | --- | --- | | I1-L | One global LSN per commit; every change-log entry, graph-node version, and reconciliation-need carries an LSN strictly monotonic with the global clock. | planned (M4 invariant tests) | D4-L, D6-L, D8-L | | I2-L | All durable graph mutations originate from the Brunch command layer; no caller bypasses validation, audit, or coherence triggering. | planned (M5 architectural test + lint rule) | D4-L | -| I3-L | Transcript reload reproduces raw assistant/user payloads plus Brunch session binding, structured elicitation entries, and other custom transcript entries byte-equivalently (modulo timestamps). | covered (M2 JSONL viability round-trip tests) | A2-L, D6-L | +| I3-L | Transcript reload reproduces raw assistant/user payloads plus Brunch session binding, structured elicitation entries, and other custom transcript entries byte-equivalently (modulo timestamps). | covered (M2 JSONL viability round-trip tests) | D6-L | | I4-L | For every `worldUpdate` entry, all named graph items have LSNs strictly greater than the session's pre-update `lastSeenLsn`. | planned (M7 property test) | D6-L, I1-L | | I5-L | For every `brunch.lens_switch` entry and every session/spec binding transition, the session interest set is recomputed before the next agent turn. 
| planned (M7 property test) | D11-L | | I6-L | Every reconciliation need has `created_at_lsn ≤` current global LSN; `kind='impasse'` needs reference at least two graph nodes; resolved needs carry a strictly later `resolved_at_lsn`. | planned (M8 property test) | D8-L, I1-L | | I7-L | Every `framing_as` value belongs to the allowed matrix for that node's base kind. | planned (fixture property check) | D7-L | -| I8-L | Spec selection persists across pi `switchSession` (i.e. `/new`); the selected session file is reopened consistently by headless projection/capture paths; each session has exactly one `brunch.session_binding`, and a session's bound spec never changes. | partially covered (M0 coordinator/TUI boot integration tests + store-only runbook checker; M1 no-injected-coordinator capture regression; manual TUI smoke and JSONL reload viability still planned) | D11-L, D21-L | +| I8-L | Spec selection persists across pi `switchSession` (i.e. `/new`); the selected session file is reopened consistently by headless projection/capture paths; each session has exactly one `brunch.session_binding`, and a session's bound spec never changes. | partially covered (M0 coordinator/TUI boot integration tests + store-only runbook checker; M1 no-injected-coordinator capture regression; M2 coordinator-created JSONL reload tests; manual TUI smoke still planned) | D11-L, D21-L | | I9-L | Every `brunch.mention` payload is anchored to a stable `id`; the ledger never stores title-anchored references. | planned (M7 invariant) | D14-L | -| I10-L | Structured elicitation prompts/responses live in the Pi transcript when structure is needed; elicitation exchanges are projected from the active branch, and no parallel canonical chat/turn table carries elicitation state. 
| covered (M1 exchange projection tests + M2 active-branch JSONL/RPC projection tests) | D12-L, D13-L, D18-L | +| I10-L | Structured elicitation prompts/responses live in the Pi transcript when structure is needed; Brunch-supported elicitation exchanges are projected only from linear coordinator-bound sessions, and no parallel canonical chat/turn table carries elicitation state. | covered for projection shape (M1 exchange projection tests + M2 JSONL/RPC projection tests); linearity enforcement planned with D24-L hardening | D12-L, D13-L, D18-L, D24-L | | I11-L | No durable graph mutation path — including migrations, maintenance scripts, observer-job writes, or side-task-attributed writes — may bypass the `CommandExecutor` path that performs authority/result classification, version checks, structural validation, transaction execution, LSN allocation, and change-log append. | planned (M4 architectural + migration invariants; M5 caller-boundary tests) | D4-L, D15-L, D16-L, D20-L | | I12-L | Side-task results are delivered only at turn boundaries; no side-task result may steer or mutate the active turn outside the next-turn delivery path. | planned (M7 side-task delivery invariant) | D15-L | -| I13-L | At any idle session branch leaf, the latest unresolved interaction state is system/assistant-originated: user input is a response to an elicitation prompt, not ambient chat. | planned (M1 fixture + transcript projection tests) | D12-L | +| I13-L | At any idle linear session leaf, the latest unresolved interaction state is system/assistant-originated: user input is a response to an elicitation prompt, not ambient chat. | planned (M1 fixture + transcript projection tests) | D12-L, D24-L | | I14-L | Observer jobs are keyed by session id plus elicitation-exchange entry-range ids and have durable status; replay/restart cannot enqueue duplicate observer jobs for the same exchange. 
| planned (M5 observer queue tests) | D18-L, D4-L | +| I15-L | Brunch-controlled flows do not create or navigate Pi session branches, and Brunch transcript readers fail fast on non-linear JSONL rather than flattening, migrating, or branch-selecting. | planned (linear transcript policy guard tests before/within M3 web-shell) | D24-L, D6-L, D11-L, D13-L | ## Future Direction Register @@ -185,7 +188,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - Browser, RPC driver, TUI, and agent tools should share named Brunch handlers. Transports adapt those handlers; they do not define product semantics. - Live client views should use subscriptions over the same RPC method families rather than pair REST GETs with a separate event channel. - Query/subscription helpers may exist as implementation conveniences, but they must remain subordinate to concrete product methods (`session.*`, `workspace.*`, `graph.*`, `coherence.*`) and must not become a generic platform Brunch now owns. -- Initial POC read methods should stay close to current needs: transcript, elicitation-exchange projection, chrome/workspace state, and later graph/coherence projections. +- Initial POC read methods should stay close to current needs: linear transcript validation, elicitation-exchange projection, chrome/workspace state, and later graph/coherence projections. ### Elicitation UI primitive choice @@ -205,7 +208,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | **Lens** | A narrower interpretive or task perspective applied within or alongside an agent mode, such as technical-design, verification-design, or disambiguation. Lenses may eventually be driven by skills, but are not part of M1 transport-mode proof. | | **Print snapshot** | The M1 meaning of the print transport mode: boot the Brunch host, resolve workspace/spec/session state through the coordinator, render product-shaped state, and exit without running an agent turn. 
| | **Spec** | A specification workspace, identified by its intent-graph root. Lives under `.brunch/`. Multiple specs may coexist per project. | -| **Session** | An elicitation transcript belonging to one spec. Backed by a pi JSONL session under `.brunch/sessions/`. A spec may have many sessions over time; a session never changes specs. | +| **Session** | An elicitation transcript belonging to one spec. Backed by a linear pi JSONL session under `.brunch/sessions/`. A spec may have many sessions over time; a session never changes specs. Pi branch/tree mechanics are unsupported Brunch product behavior in the POC. | | **Session binding** | The first Brunch custom entry in a session that binds the Pi session id to exactly one spec id and schema version. Makes JSONL self-describing; registry/index state is an acceleration, not the canonical binding. | | **Workspace session coordinator** | The Brunch boot seam that returns `ready | select_spec | needs_human` workspace-session state for a cwd/mode, owns spec selection, selected-session reopening, and `/new`, creates/opens Pi sessions through `SessionManager`, writes `brunch.session_binding`, persists current spec/session acceleration in `.brunch/state.json`, and derives chrome state for callers. | | **Workspace state hierarchy** | `cwd → spec → session`. Each level scopes the one below it; spec is selected before any agent loop runs and persists across `/new`. | @@ -224,9 +227,9 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | **Subscription** | A long-lived RPC operation that delivers live updates, often with an initial snapshot, for views that must stay current with session, workspace, graph, or coherence state. | | **Transport adapter** | The stdio, WebSocket, HTTP-shim, or in-process wrapper around the same Brunch handlers. Transport adapters do not own product semantics. 
| | **Canonical store** | The persistence surface that owns a fact: Pi JSONL for session transcript truth, `.brunch/state.json` for lightweight workspace binding state, SQLite graph/change log for graph truth and coherence substrates. | -| **Elicitation prompt** | System- or assistant-originated transcript span that prompts/directs the user's next response. At idle, the active session branch ends with an unresolved elicitation prompt. | +| **Elicitation prompt** | System- or assistant-originated transcript span that prompts/directs the user's next response. At idle, a Brunch-supported linear session ends with an unresolved elicitation prompt. | | **User response** | User-originated text and/or structured action selection responding to the current elicitation prompt. There is no ambient chat input in the POC model. | -| **Elicitation exchange** | A derived projection over Pi JSONL: prompt-side span (system/assistant/tool-side entries since the prior user response) plus response-side span (the user's text and/or structured action entries). This is the observer's default extraction unit. | +| **Elicitation exchange** | A derived projection over Brunch-supported linear Pi JSONL: prompt-side span (system/assistant/tool-side entries since the prior user response) plus response-side span (the user's text and/or structured action entries). This is the observer's default extraction unit. | | **Structured elicitation entry** | Optional Brunch custom transcript entry used when an elicitation prompt or response carries actions, choices, or other deterministic UI structure. Plain generative prompts can remain ordinary Pi messages. | | **Observer job** | Durable async work item keyed by session id and elicitation-exchange entry-range ids. It analyzes an exchange for graph mutations or low-confidence suggestions, and survives process restart. | | **Lens switch** | A durable `brunch.lens_switch` transcript entry recording that the active agent/session changed lenses. 
The switch event is distinct from the lens concept itself. | @@ -287,10 +290,10 @@ Infrastructure is not yet fully laid (Phase 3 of POC bootstrapping). Commands fo | Inner | Type-aware lint, type checks, fast unit tests | Local module correctness, typed command/result shapes, projection helper behavior. | D12-L, D13-L, D20-L, D21-L. | | Inner | Schema/shape validation at boundaries | JSON-RPC payloads, command results, structured elicitation entries, fixture metadata, graph exports. | R8, R10, R11, R17; I3-L, I10-L, I11-L. | | Middle | **Runbook oracles**: prose manual actions plus executable postcondition checkers | Interactive seams leave correct durable state. Early M0 checkers may inspect stores only; once handlers exist, prefer projection-including checks. | D11-L, D21-L; I8-L, I13-L; A10-L. | -| Middle | Round-trip tests | JSONL reload, elicitation exchange projection, compaction, graph export/import, command result serialization. | A2-L, A12-L; I3-L, I8-L, I10-L. | +| Middle | Round-trip tests | JSONL reload, linear transcript validation, elicitation exchange projection, compaction, graph export/import, command result serialization. | D6-L, D13-L, D24-L; I3-L, I8-L, I10-L, I15-L. | | Middle | Property-based / model-based tests | LSN monotonicity, change-log replay, reconciliation-need invariants, mention staleness, interest-set recomputation, side-task delivery ordering. | A4-L, A8-L, A9-L, A11-L; I1-L, I4-L, I5-L, I6-L, I9-L, I12-L. | | Middle | Contract tests | Named RPC method families and transport adapters share handler semantics; subscriptions deliver initial snapshot plus ordered updates; `CommandExecutor` hides policy/transaction details. | D5-L, D19-L, D20-L; R11, R12. | -| Middle | Architectural boundary tests | No direct ORM/SQLite mutation outside `CommandExecutor`; no canonical chat/turn store; TUI/RPC/fixture code does not write `brunch.session_binding`. | D4-L, D6-L, D18-L, D21-L; I2-L, I10-L, I11-L. 
| +| Middle | Architectural boundary tests | No direct ORM/SQLite mutation outside `CommandExecutor`; no canonical chat/turn store; TUI/RPC/fixture code does not write `brunch.session_binding`; Brunch wrappers do not expose Pi branch creation/navigation as product behavior. | D4-L, D6-L, D18-L, D21-L, D24-L; I2-L, I10-L, I11-L, I15-L. | | Middle | Fixture replay and property assertions | Brief-driven sessions still produce structurally valid transcript/graph/coherence artifacts despite model drift. | A5-L, A6-L, A7-L; I7-L; R20. | | Outer | Manual walkthrough with checklist | UX/presentation life: TUI chrome, spec selector, web shell feel, coherence visibility, elicitation usefulness. | A10-L; R4, R14, R16. | | Outer | Adversarial / generative fixture probes | Elicitation quality, human-gated `needs_human`, contradictory requirements, cross-session updates, long-horizon compaction. | A5-L, A8-L, A9-L, A11-L; I4-L, I6-L, I12-L, I13-L. | @@ -319,16 +322,17 @@ The first required runbook is M0: after manual TUI interaction, a checker proves | I7-L | M4+ schema/property tests over framing matrix plus brief fixture assertions. | | I8-L | M0 runbook oracle plus M2 coordinator-created JSONL reload tests. | | I9-L | M7 mention parser/ledger unit tests and staleness property tests. | -| I10-L | M1/M2 exchange projection tests, active-branch JSONL fixture replay, and no chat/turn architectural test. | +| I10-L | M1/M2 exchange projection tests, linear transcript validation, and no chat/turn architectural test. | | I11-L | M4/M5 no-bypass architectural test plus command transaction integration tests. | | I12-L | M7 side-task delivery invariant tests and adversarial fixture when side tasks are active. | -| I13-L | M1 fixture/projection checks for idle branch leaf state. | +| I13-L | M1 fixture/projection checks for idle linear-session leaf state. | | I14-L | M5 observer-job restart/idempotence tests. 
| +| I15-L | Brunch extension/runtime guard tests for `/tree`/`/fork`/`/clone` blocking plus transcript-reader non-linearity rejection tests. | ### Design Notes - **Deterministic before generative.** M1 should prefer a deterministic or tightly scripted user-agent path for the first captured run before relying on LLM persona variance. Generative/adversarial probes come after the transcript and fixture substrate is trusted. M1 scripted captures prove the transport/projection/fixture substrate on its current terms; they do not settle the final elicitation interaction logic, knowledge flow, or prompt/response expectation model. -- **Projection handlers are oracles, not stores.** Read/subscription tests should prove handlers reconstruct truth from Pi JSONL, `.brunch/state.json`, or SQLite graph/change log; they should not introduce a canonical view-store just for testing. +- **Projection handlers are oracles, not stores.** Read/subscription tests should prove handlers reconstruct truth from Brunch-supported linear Pi JSONL, `.brunch/state.json`, or SQLite graph/change log; they should not introduce a canonical view-store just for testing. - **Behavioral quality boundary.** Inner/middle loops prove structural validity, durable state, invariants, and expected graph/property coverage. “Good interview”, “good question”, and “coherent UX feel” remain outer-loop checklist/generative-fixture judgments until enough examples justify sharper metrics. - **Subscriptions are scoped for the POC.** Initial subscription oracles should prove initial snapshot plus ordered live updates. Reconnect/resume semantics are acknowledged but deferred unless a frontier explicitly depends on them. 
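As an illustration of the fail-fast reader behavior that D24-L and I15-L describe, here is a minimal TypeScript sketch. It assumes, hypothetically, that every transcript entry carries an `id` and a `parentId`, and that a linear session is one where each entry's parent is the entry written immediately before it. Those field names, along with `parseJsonl`, `assertLinear`, and `NonLinearTranscriptError`, are illustrative only and are not taken from Pi or from the Brunch source.

```typescript
// Hypothetical entry shape: `id` / `parentId` are assumed field names,
// not a confirmed Pi JSONL schema.
interface TranscriptEntry {
  id: string;
  parentId: string | null;
}

// Hypothetical error type used to signal a rejected (non-linear) transcript.
class NonLinearTranscriptError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonLinearTranscriptError";
  }
}

// Parse one JSON object per non-empty line.
function parseJsonl(text: string): TranscriptEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TranscriptEntry);
}

// Reject the file as soon as an entry does not chain onto the entry before it.
// No flattening, migration, or branch selection is attempted.
function assertLinear(entries: TranscriptEntry[]): void {
  let previousId: string | null = null;
  for (const entry of entries) {
    if (entry.parentId !== previousId) {
      throw new NonLinearTranscriptError(
        `entry ${entry.id} has parent ${String(entry.parentId)}, expected ${String(previousId)}`,
      );
    }
    previousId = entry.id;
  }
}
```

A reader built along these lines refuses the whole file rather than picking a branch, which is the posture the policy text asks for. Whether the real check keys on parent links, on a branch marker entry, or on something else is not stated in the source.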
From 8ff44489858061d095fcdcb0c4d8aa781d671411 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 17:03:48 +0200 Subject: [PATCH 7/9] major agent-mode/-lens design session spec captures Amp-Thread-ID: https://ampcode.com/threads/T-019e4a8e-0210-76fe-9f11-8aece6190fc9 Co-authored-by: Amp --- docs/design/ELICITATION_LENSES.md | 220 ++++++++++++++++++++++++++++++ docs/design/REVIEW_SETS.md | 197 ++++++++++++++++++++++++++ memory/PLAN.md | 49 ++++--- memory/SPEC.md | 89 +++++++++--- 4 files changed, 521 insertions(+), 34 deletions(-) create mode 100644 docs/design/ELICITATION_LENSES.md create mode 100644 docs/design/REVIEW_SETS.md diff --git a/docs/design/ELICITATION_LENSES.md b/docs/design/ELICITATION_LENSES.md new file mode 100644 index 00000000..ec279442 --- /dev/null +++ b/docs/design/ELICITATION_LENSES.md @@ -0,0 +1,220 @@ +# Elicitation Lenses + +Long-form companion to `memory/SPEC.md` decisions D25-L (lens within elicitor), D26-L (extractive/generative split), D30-L (grounding gate + density + epistemic honesty), and D31-L (meta-rubric heuristic). For the review-set mechanism that generative lenses depend on, see [REVIEW_SETS.md](REVIEW_SETS.md). + +SPEC is the authoritative register; this document is the rationale and texture. 
+
+## Anchoring concepts
+
+### Fan-out / fan-in as unifying pattern
+
+Three product-level flows share a structure:
+
+| Flow | Object of variation | Fan-in move |
+|---|---|---|
+| **candidate-spec** | *territory* — alternative problem framings | mostly pick one (framings have internal coherence; cherry-picking produces incoherent specs) |
+| **technical-design** | *map* — module shapes / seams interior to a chosen territory | synthesis is legitimate (combine insights across alternatives) |
+| **verification-design** | *gauges* — oracle ensembles judging a chosen territory + map | compose (oracles are additive; redundancy across families is a feature) |
+
+All three are "design-it-twice" moments where the agent's job is not to optimize a single answer but to **make variation legible** so the user can recognize what they value. That is structurally different from a quiz flow: the user is not supplying answers they already hold; they are recognizing preferences against rendered alternatives.
+
+### Lens vs agent-mode
+
+D23-L distinguishes:
+
+- **Agent-mode** — coarse operational strategy: `elicitor`, `observer`, `reviewer`, `reconciler` (and future `generalist`).
+- **Lens** — a narrower interpretive perspective applied within an agent-mode.
+
+The strategies described here (`step-by-step`, `disambiguate-via-examples`, `propose-scenarios-with-tradeoffs`, `propose-design-shapes`, `propose-oracle-ensembles`, `project-requirements-from-upstream`) are all **lenses within the `elicitor` agent-mode**. `observer` and `reviewer` are agent-modes in their own right (async background roles), not lenses.
+
+## Lens catalogue (starter set)
+
+Lenses split into two families by capture mechanism. The **family distinction** is the durable architectural commitment (D26-L); the specific lens list is expected to evolve.
+
+### Extractive lenses
+
+Produce single-exchange interactions; the `observer` agent-mode extracts implicit info post-exchange.
+
+- **`step-by-step`** — agent asks one focused question at a time
+- **`disambiguate-via-examples`** — agent surfaces contrastive examples to force a discriminating user response (see [Behavioral Kernels](./BEHAVIORAL_KERNELS.md))
+
+### Generative lenses
+
+Produce batch proposals carrying structured entity-draft payloads; the elicitor captures the proposal at proposal time; the `reviewer` agent-mode analyzes post-acceptance.
+
+- **`propose-scenarios-with-tradeoffs`** — candidate-spec flow at the territory level
+- **`propose-design-shapes`** — technical-design flow at the map level
+- **`propose-oracle-ensembles`** — verification-design flow at the gauges level
+- **`project-requirements-from-upstream`** — derive requirements / acceptance criteria as a batch from upstream graph material
+
+## Grounding and density
+
+### The grounding bundle
+
+Generative lenses require a minimum bundle of session-level anchors before they can produce non-speculative output:
+
+| Anchor | Question it answers |
+|---|---|
+| **Domain** | What kind of thing is being built? |
+| **Protagonist** | Who is this for? |
+| **Pain / pull** | What's the friction or aspiration motivating it? |
+| **Constraint** | What's binding (time, regulatory, integration, organizational, technical)? |
+
+Each anchor is fillable in a sentence. The constraint anchor is where volunteered technical constraints land — caught and held as boundary conditions, not refused. With the bundle in place, the agent has **legitimate axes to vary on** when fanning out (different protagonists as primary, different pains framed as central, different constraints as binding).
+
+### Lens is always available — output scales with density
+
+The lens itself is never gated. A user can request a generative lens at any density.
What scales is the **rendering resolution** of the output and the **epistemic-status** signaling on it: + +| Spec density | Mode of generative output | Per-alternative artifact resolution | +|---|---|---| +| Empty / thin (no grounding bundle yet) | **framing proposals** (Shape Up pitches) | low — name, 200–400 word pitch, breadboard sketch, fat-marker anchor scenario; `inferred` epistemic status; explicit "let's ground more before committing" suggestion | +| Moderate (some intent-graph nodes exist) | **scenario sketches** | medium — concrete situations the framing centers, plus which existing nodes get foregrounded/recontextualized | +| Rich (substantial intent-graph) | **completion proposals** | high — specific node/edge fills with rationale, gap analysis | +| Mature (full spec exists) | **refactor proposals** | high — alternative re-framings of existing material, presented as diffs | + +The same lens (`propose-scenarios-with-tradeoffs`) produces fundamentally different artifacts at different densities. The agent diagnoses which mode is appropriate; the user can override ("propose at lower resolution; I want framings again"). + +### Why a gate isn't a refusal + +This design sidesteps two failure modes: + +- **(A) User demands the impossible** — without grounding, there's nothing to ground a candidate-spec on. The agent could refuse, but that introduces friction and reads as gating. +- **(B) System gates and refuses** — refusing creates the impression that the agent "decided" the user can't have what they want. + +The resolution: the lens is always available. The agent produces *some form* of what was asked for, with epistemic-status honestly reflecting how much weight to put on it. The user gets traction immediately; the system stays honest. + +## Epistemic-status signaling + +Generative-lens outputs carry an `epistemic_status` field (`inferred | assumed | asserted | observed`) per the existing lexicon entry. 
Status is set based on grounding density at proposal time:
+
+| Grounding density | Default epistemic status of generative output |
+|---|---|
+| empty | `inferred` |
+| thin (1–2 anchors) | `inferred` or `assumed` |
+| moderate (3 anchors) | `assumed` |
+| rich (all 4 anchors plus some intent-graph) | `asserted` |
+| mature | `observed` where backed by graph entities; `asserted` for novel projections |
+
+UI renderings of low-status proposals should *feel* speculative: visible hedging marks, lower visual weight, explicit "speculative — based on N anchors so far" footers. This is a **presentation contract** (I17-L), not just a metadata field.
+
+## Scenario uses
+
+Scenarios are a recurring rendering primitive across lenses with three distinguishable uses:
+
+| Use | Role | Where it appears | Persistence |
+|---|---|---|---|
+| **Anchor scenario** | illustrates a single framing / option from inside it | embedded in a pitch or option preview | transcript-rendered, not persisted as graph entity |
+| **Contrastive scenario** | distinguishes two options from each other | comparison UI | transcript-rendered |
+| **Probing scenario** | forces the user to react to disambiguate intent | interactive elicitation prompt | transcript-rendered; user response persists per existing elicitation mechanics |
+
+All three share a shape: a particular vignette, deliberately under-specified at the boundaries (fat-marker), illustrative not prescriptive, carrying an implicit "vs not-this". A scenario-entry primitive may eventually be worth extracting as a typed custom entry; for now scenarios live as transcript content with role distinguished by context.
+
+**Terminology guard.** Scenarios (user-facing, runtime) are distinct from **briefs** (`.brunch-fixtures/briefs/`, dev-only inputs for the agent-as-user fixture driver). Briefs are testing infrastructure; scenarios are product surface. Do not conflate.
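Returning to the grounding-density table under "Epistemic-status signaling": the default mapping could be sketched roughly as below. This is an illustration under stated assumptions, not the project's implementation. `GroundingSnapshot`, `defaultEpistemicStatus`, and the `specIsMature` flag are hypothetical names; the sketch picks the conservative `inferred` for thin grounding (the table allows `inferred` or `assumed`), and it treats "all 4 anchors but no intent-graph yet" as `assumed`, a case the table does not cover.

```typescript
// Sketch only: type and field names are hypothetical, not Brunch's schema.
type EpistemicStatus = "inferred" | "assumed" | "asserted" | "observed";
type GroundingAnchor = "domain" | "protagonist" | "pain_pull" | "constraint";

interface GroundingSnapshot {
  anchors: ReadonlySet<GroundingAnchor>; // anchors filled so far (0 to 4)
  intentGraphNodeCount: number;          // assumed proxy for "some intent-graph"
  specIsMature: boolean;                 // assumed flag; maturity detection is not specified
}

function defaultEpistemicStatus(
  grounding: GroundingSnapshot,
  backedByGraphEntity = false,
): EpistemicStatus {
  // mature: `observed` where backed by graph entities, `asserted` for novel projections
  if (grounding.specIsMature) {
    return backedByGraphEntity ? "observed" : "asserted";
  }
  const anchorCount = grounding.anchors.size;
  // rich: all 4 anchors plus some intent-graph
  if (anchorCount === 4 && grounding.intentGraphNodeCount > 0) return "asserted";
  // moderate: 3 anchors (assumption: 4 anchors without graph material also lands here)
  if (anchorCount >= 3) return "assumed";
  // empty or thin: the conservative choice
  return "inferred";
}

const thinGrounding: GroundingSnapshot = {
  anchors: new Set<GroundingAnchor>(["domain", "protagonist"]),
  intentGraphNodeCount: 0,
  specIsMature: false,
};
console.log(defaultEpistemicStatus(thinGrounding)); // prints: inferred
```

Keeping the mapping in one pure function would make the I17-L presentation contract easy to pin in fixtures, since the footer text and the status derive from the same snapshot.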
+
+## Establishment offers and intent hints
+
+Two transcript-native custom entries make lens-driven elicitation inspectable and routable.
+
+### `brunch.establishment_offer`
+
+Emitted by the elicitor at points where a next move is offered to the user. Approximate shape:
+
+```text
+{
+  customType: "brunch.establishment_offer",
+  payload: {
+    perceived_gaps: ["protagonist not anchored", "no binding constraint volunteered"],
+    offered_strategies: [
+      { lens: "step-by-step", preview: "..." },
+      { lens: "propose-scenarios-with-tradeoffs", preview: "...", caveat: "grounding is thin" }
+    ],
+    recommended_lens: "step-by-step",
+    confidence: "asserted"
+  }
+}
+```
+
+Architecturally analogous to `worldUpdate` (D17-L): a durable custom transcript entry summarizing agent-side reasoning for downstream consumption. Properties:
+
+- **Inspectable post-hoc** without changing the agent's runtime model (the agent still derives next moves from transcript + tools + lens + sys prompt)
+- **Fixture-able** — golden captures can assert that certain offers appeared under certain grounding conditions
+- **Chrome-feedable** — ambient affordances render from the most recent establishment-offer entry
+- **Forward-compatible** — if establishment-needs ever graduates to a graph-side substrate later (alongside reconciliation needs per D8-L), the transcript history is the migration source (A15-L)
+
+### `brunch.elicitor_intent_hint`
+
+Emitted alongside a prompt or proposal, declaring lens and semantic targets (e.g. expected ontological sub-type) for downstream observer/reviewer routing.
Approximate shape: + +```text +{ + customType: "brunch.elicitor_intent_hint", + payload: { + lens: "step-by-step", + semantic_target: { kind: "intent_node", framing_as: "problem" } + } +} +``` + +Observer/reviewer routing filters on `lens` (I18-L) to decide which agent-mode consumes the exchange: + +- Extractive lens → observer job enqueued on exchange completion +- Generative lens → no observer; reviewer job enqueued on batch acceptance + +Both consume `semantic_target` as extraction guidance when present. + +## Meta-rubric heuristic (D31-L) + +Comparison rubrics for fan-out alternatives across all three flows attempt to express each axis in terms of four meta-axes: + +| Meta-axis | What it asks | +|---|---| +| **Legibility / cost-of-knowing** | How much must you carry in your head to use this? | +| **Failure modes** | How does this go wrong? | +| **Coverage / range** | What's covered vs left out? | +| **Commitment** | What does picking this lock in downstream? | + +Per-flow instantiation: + +| Meta-axis | candidate-spec | technical-design | verification-design | +|---|---|---|---| +| Legibility | how much must the team carry to act under this framing? | depth, locality, leverage | oracle weight to read / run / maintain | +| Failure modes | which contradictions or coherence breaks does this framing invite? | ease of misuse | what the oracle misses; false-positive rate | +| Coverage | appetite, what's foregrounded, what's refused | general vs specialized | coverage across invariants / claims | +| Commitment | what does this framing commit tech and verification to? | implementation efficiency | infra cost, fixture commitment, run time | + +**Soft commitment, not architectural enforcement.** The elicitor attempts the meta-frame when generating rubrics; project-specific axes are allowed alongside; the meta-frame is dropped when it doesn't fit. 
The hypothesis (uniform comparison UI across all three flows is more useful than per-flow improvisation) is testable via fixture comparison. Promote to schema/UI uniformity only if it holds up. + +## Offer-tree as orientation, not menu + +The agent maintains an implicit decision tree at each prompt — *"we need to establish X, Y, Z; shall we explore step-by-step, or via examples, or shall I propose alternatives?"* — that drives its offers. This tree is **not surfaced as a user-facing menu** by default. + +Risks of surfacing the tree as a menu: + +- **False predictability** — implies determinism the agent doesn't have; next turn's tree may differ +- **Loss of trust** — user overrides recommended path because "X is also available" +- **Wrong abstraction** — users care about what's *happening*, not what *could* happen +- **Defeats agentic purpose** — pushes next-move judgment back to the user + +What rescues a narrow version is **orientation under stuck-ness**: a scoped "where are we?" view showing what's established and what's open, available on user request, focused on review and orientation rather than next-action selection. The agent still owns the next move. + +In any case, the underlying tree structure — encoded as `brunch.establishment_offer` entries — is durable transcript content. Fixture-able and diagnosable even when not surfaced as a UI. + +## Temperamental fit (A17-L) + +Extractive and generative lenses are temperamentally different modes of working through a spec. Some users prefer to be quizzed (low cognitive load per move, slow drip, high control); others prefer to be proposed-to (high cognitive load per move, big bang, low cognitive load between moves). This may eventually warrant a user-level setting (preference for extractive vs generative); deferred until fixture evidence shows it matters for adoption. 
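The extractive/generative family split also decides which background agent-mode consumes an exchange (I18-L): extractive lenses enqueue an observer job on exchange completion, while generative lenses skip the observer and enqueue a reviewer job on batch acceptance. A minimal routing sketch follows, assuming the starter lens catalogue from this document; `LENS_FAMILY`, `routeBackgroundJob`, and the two trigger names are hypothetical and exist only for illustration.

```typescript
// Sketch only: names below are illustrative, not Brunch's actual routing code.
type LensFamily = "extractive" | "generative";

type LensName =
  | "step-by-step"
  | "disambiguate-via-examples"
  | "propose-scenarios-with-tradeoffs"
  | "propose-design-shapes"
  | "propose-oracle-ensembles"
  | "project-requirements-from-upstream";

// Starter catalogue from this document; the specific list is expected to evolve.
const LENS_FAMILY: Record<LensName, LensFamily> = {
  "step-by-step": "extractive",
  "disambiguate-via-examples": "extractive",
  "propose-scenarios-with-tradeoffs": "generative",
  "propose-design-shapes": "generative",
  "propose-oracle-ensembles": "generative",
  "project-requirements-from-upstream": "generative",
};

// Hypothetical trigger names for the two moments the document describes.
type RoutingTrigger = "exchange_completed" | "batch_accepted";
type BackgroundJob = "observer" | "reviewer" | null;

function routeBackgroundJob(lens: LensName, trigger: RoutingTrigger): BackgroundJob {
  const family = LENS_FAMILY[lens];
  // Extractive lens: observer job enqueued on exchange completion.
  if (family === "extractive" && trigger === "exchange_completed") return "observer";
  // Generative lens: no observer; reviewer job enqueued on batch acceptance.
  if (family === "generative" && trigger === "batch_accepted") return "reviewer";
  return null;
}

console.log(routeBackgroundJob("step-by-step", "exchange_completed")); // prints: observer
console.log(routeBackgroundJob("propose-design-shapes", "exchange_completed")); // prints: null
```

Keying the decision on the lens family rather than on individual lens names means a new lens only needs a catalogue entry, which matches the claim that the family distinction is the durable commitment (D26-L).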
+
+## Open questions tracked in SPEC
+
+- **A14-L** — Can LLM elicitors reliably produce graph-structurally-legal intent-graph proposals for generative lenses?
+- **A15-L** — Are transcript-native establishment offers sufficient, or will a graph-side establishment-needs substrate eventually be needed?
+- **A16-L** — Reviewer triggering policy (always-on vs lens-keyed) and reviewer scope; deferred to per-lens decisions.
+- **A17-L** — User-level temperamental preference for extractive vs generative; deferred until adoption evidence justifies a setting.
+
+## Cross-references
+
+- Review-set mechanism: [REVIEW_SETS.md](REVIEW_SETS.md)
+- Behavioral kernels (disambiguation patterns): [BEHAVIORAL_KERNELS.md](BEHAVIORAL_KERNELS.md)
+- Custom-message event substrate: SPEC.md D17-L
+- Reconciliation-need substrate: SPEC.md D8-L
+- CommandExecutor contract: SPEC.md D20-L
diff --git a/docs/design/REVIEW_SETS.md b/docs/design/REVIEW_SETS.md
new file mode 100644
index 00000000..90cc52f6
--- /dev/null
+++ b/docs/design/REVIEW_SETS.md
@@ -0,0 +1,197 @@
+# Review Sets
+
+Long-form companion to `memory/SPEC.md` decisions D27-L (proposal payload + batch acceptance), D28-L (regeneration as successor entries + projection filtering), and D29-L (reviewer agent-mode). For lens taxonomy and grounding, see [ELICITATION_LENSES.md](ELICITATION_LENSES.md).
+
+SPEC is the authoritative register; this document is rationale and texture.
+
+## Concept
+
+A **review set** is a batch proposal generated by a generative lens, presented to the user for review-cycle acceptance.
The interaction is modeled on the GitHub PR-review-cycle: + +- Agent proposes a batch (the "PR") +- User reviews and either: + - **Approves** — batch becomes one atomic `CommandExecutor` call (one LSN, one change-log entry) + - **Requests changes** with comments — agent regenerates per comments and re-proposes (the "force-push") + - **Rejects** — batch dropped; agent may offer a different lens or re-ground + +**There is no "accept with edits"** as a primitive. The cycle handles granularity *before* acceptance; post-hoc sub-batch granularity is not representable through any product API (I15-L). + +This pattern is **reusable across generative lenses**: the same mechanism that handles `propose-scenarios-with-tradeoffs` acceptance also handles `project-requirements-from-upstream` acceptance and (presumably) any future generative lens — `propose-design-shapes`, `propose-oracle-ensembles`, and so on. It is a shared UX primitive worth building once. + +## Proposal payload shape + +Generative-lens proposals carry **structured entity-draft payloads** in the proposal custom entry. The proposal contains the graph entities and edges that *would* be created on acceptance, in a form `CommandExecutor` can validate without re-parsing. + +Approximate shape (refined during M5 implementation): + +```text +{ + customType: "brunch.review_set_proposal", + payload: { + lens: "propose-scenarios-with-tradeoffs", + epistemic_status: "asserted", + proposal_version: 2, + supersedes: "", // null on first proposal + pitch: { + name: "...", + narrative: "...", + anchor_scenarios: [ { title, vignette }, ... ] + }, + entity_drafts: [ + { draft_id, kind: "intent_node", framing_as: "problem", title, body }, + { draft_id, kind: "intent_node", framing_as: "persona", title, body }, + ... + ], + edge_drafts: [ + { from_draft_id, to_draft_id, relation }, + ... + ], + rubric: { + axes: [ + { meta_axis: "legibility", value, rating }, + ... 
+ ] + } + } +} +``` + +### Why structured payloads at proposal time, not natural language parsed at acceptance + +- **No parse failures at the policy gate.** Validation can dry-run against `CommandExecutor` at proposal time, surfacing `structural_illegal` and `policy_blocked` discriminants *before* the user reviews. Catching structural errors at proposal generation is far cheaper than at commit. +- **Deterministic replay.** Fixtures see the exact entities; no NL-parse variance between runs. +- **The user reviews the actual thing.** What they approve is what gets committed; no translation layer between review and commit. +- **Reviewer reads a well-formed batch.** Post-acceptance analysis doesn't have to re-parse what just happened. + +### The risk this design carries + +A14-L tracks the open assumption: **LLM elicitors must reliably produce graph-structurally-legal payloads** (entity drafts and edges that pass `CommandExecutor` structural validation). This is validated via: + +- Fixture replay across briefs exercising generative lenses +- Dry-run validation reports at proposal time + +If LLM reliability proves insufficient, fallbacks are possible without changing the user-facing review-cycle: + +- Constrained generation (structured-output mode against a schema) +- Retry-with-feedback loops on validation failure +- Natural-language proposals with parse-at-accept as a fallback path + +The decision to take the structured route now is deliberate: the costs of the alternative (parse failures at commit, ambiguous review semantics, non-deterministic replay) are higher than the reliability risk, which has known mitigations. + +## Regeneration semantics + +When the user requests changes, the agent regenerates and appends a new proposal entry. The previous proposal is *not* deleted; it remains as transcript history. 
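The append-only behaviour can be sketched as follows, assuming the `supersedes` field from the payload shape above holds the predecessor's transcript entry id. `ProposalEntry`, `regenerateProposal`, and `authoritativeProposal` are hypothetical names, and the in-memory array stands in for the Pi JSONL session; nothing here is the real session API.

```typescript
// Sketch only: an in-memory stand-in for proposal entries in a linear transcript.
interface ProposalEntry {
  id: string;                // hypothetical transcript entry id
  proposalVersion: number;
  supersedes: string | null; // null on the first proposal
}

// Regeneration appends a successor; earlier proposals are never removed.
function regenerateProposal(log: readonly ProposalEntry[], newId: string): ProposalEntry[] {
  const previous = log[log.length - 1];
  const successor: ProposalEntry = {
    id: newId,
    proposalVersion: previous ? previous.proposalVersion + 1 : 1,
    supersedes: previous ? previous.id : null,
  };
  return [...log, successor];
}

// The authoritative proposal is the one no other entry supersedes.
function authoritativeProposal(log: readonly ProposalEntry[]): ProposalEntry | undefined {
  const supersededIds = new Set(
    log.map((entry) => entry.supersedes).filter((id): id is string => id !== null),
  );
  const heads = log.filter((entry) => !supersededIds.has(entry.id));
  return heads[heads.length - 1];
}

const negotiation = regenerateProposal(regenerateProposal([], "entry-1"), "entry-2");
console.log(negotiation.length);                      // prints: 2
console.log(authoritativeProposal(negotiation)?.id);  // prints: entry-2
```

Because the log is only ever extended, replay and fixtures can still walk the full negotiation while a consumer that wants the live proposal asks for the head of the chain.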
+ +### Successor entries in the linear session + +Regenerated proposals are **appended as successor entries within the linear Pi JSONL session**, with `supersedes` pointing to the predecessor proposal entry. The latest proposal is authoritative; prior proposals remain visible as raw transcript history. + +This stays within Brunch's linear transcript policy (D24-L) — **no Pi branching is created**. The model is "append-only history with a `supersedes` chain", not "fork-on-regenerate". + +Properties: + +- Keeps replay/fixture able to see the negotiation history +- Aligns with elicitation-as-transcript-truth (D12-L, I10-L) +- Stays within the linear transcript policy (D24-L) +- Preserves the audit trail of *why* the batch ended up the way it did + +### Projection asymmetry + +Raw JSONL captures everything (every proposal version), but **projections used to drive the agent** (context injection, summarization) filter to the **accepted set only**. The agent doesn't re-process every superseded proposal as live context. + +This asymmetry is intentional: + +- **Raw JSONL** = "capture everything" store, for replay, audit, and human review of the negotiation +- **Projections** = filtered views, for context economy and signal-to-noise in agent prompts + +The filter is by `supersedes` chain: when proposal A is superseded by B, only B (or whatever ultimately wins) reaches the agent's context window. The reviewer, by contrast, sees only the accepted set (no regeneration history needed for coherence analysis). + +## Batch acceptance + +Acceptance is one `CommandExecutor` call carrying the entire accepted batch: + +```text +commandExecutor.acceptReviewSet({ + session_id, + proposal_entry_id, + acceptance_attribution: { user_id, ... } +}) +``` + +The executor: + +1. Resolves the proposal entry's `entity_drafts` and `edge_drafts` +2. Allocates one LSN +3. Validates the batch structurally and against policy (largely pre-validated at proposal time) +4. 
Writes all entities and edges in one transaction +5. Appends one change-log entry attributed to the user via `acceptance_attribution` +6. Triggers coherence updates +7. Enqueues the reviewer job + +Properties: + +- **One LSN per acceptance** (preserves I1-L) +- **Atomic** — partial acceptance is not representable (I15-L) +- **One change-log entry** — audit trail clean +- **Reviewer enqueued, not invoked synchronously** — latency posture matches observer + +### Sub-batch granularity is sacrificed + +Future undo or diff at sub-batch granularity is not directly addressable: one acceptance produces one change-log entry. The alternative (exploding to N LSNs per batch) would violate I1-L's intent. + +The review cycle handles the granularity concern *before* acceptance — the user reviews and refines the batch via the request-changes cycle, then commits a settled batch. Post-hoc sub-batch granularity matters less than it would in a "commit-then-edit" model. + +## Reviewer agent-mode (D29-L) + +The reviewer is an agent-mode mirror of observer, instantiated to handle batch-acceptance analysis. + +### Comparison with observer + +| Property | Observer | Reviewer | +|---|---|---| +| Trigger | completion of single elicitation exchange | acceptance of batch proposal | +| Scope | one elicitation exchange | the accepted batch + its graph neighborhood | +| Extracts | implicit info → small graph mutations or reconciliation needs | coherence / completeness / gap analysis | +| Latency posture | async, per-exchange job | async, per-batch job | +| Job key | `(session_id, exchange_entry_range)` | `(session_id, batch_acceptance_entry_id)` | +| Write authority | high-confidence graph mutations *or* `reconciliation_need` records via `CommandExecutor` | **`reconciliation_need` records only** via `CommandExecutor` | +| Result delivery | next-turn boundary via custom message | next-turn boundary via custom message | + +### Write authority is narrow + +Reviewer is **advisory**. 
It writes only to the `reconciliation_need` substrate; it never directly writes graph entities, edges, change-log entries, or any other record class (I16-L). Routine findings — *"this batch is missing a non-goal"*, *"personas X and Y have overlapping but not identical responsibilities"* — surface as reconciliation needs the user can address in subsequent elicitation. The batch acceptance is the user's atomic commitment; the reviewer doesn't amend it.
+
+Rationale:
+
+- Preserves batch-acceptance as the user's atomic commitment
+- Keeps I2-L / I11-L intact (no bypass of `CommandExecutor`)
+- Aligns with the substrate philosophy where reconciliation needs are first-class (D8-L)
+- Avoids the "silent reviewer mutation" failure mode where the user's accepted batch is altered by background work
+
+### Latency rationale (mirrors observer)
+
+The observer was introduced specifically to avoid overloading the elicitor LLM with dual-purpose work (driving *and* capturing implicit info) — both for procedural-conflict reasons and for latency reasons.
+
+The reviewer faces the same constraint at batch-acceptance time: synchronous analysis would block the next-turn surface. Async fits the same shape.
+
+### In-flight signaling
+
+After approval, the reviewer may take seconds-to-minutes. The user is **not blocked**; they can proceed with further elicitation. Reviewer findings arrive at the next turn boundary as `reconciliation_need` items routed to the chrome's reconciliation-need surface (per D17-L next-turn delivery). The chrome region surfaces an *"N reviewer findings open"* indicator when findings exist; no badge is required during the in-flight period itself.
+
+## Open questions tracked in SPEC
+
+- **A14-L** — LLM elicitors producing graph-structurally-legal proposals reliably; fallback strategies known if assumption fails.
+- **A16-L** — Reviewer triggering policy (always-on vs lens-keyed) and scope (how-far-neighborhood); deferred to per-lens decisions.
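Pulling together the batch-acceptance and reviewer write-authority sections above, the two guarantees (one LSN and one change-log entry per acceptance with no partial writes, and reviewer writes limited to `reconciliation_need`) could look roughly like this. It is an in-memory sketch, not the real `CommandExecutor`: `SketchExecutor`, `canWrite`, the draft field names, and the rule that edges must reference drafts inside the same batch are all assumptions made for illustration.

```typescript
// Sketch only: an in-memory stand-in, not the real CommandExecutor.
type RecordClass = "entity" | "edge" | "change_log" | "reconciliation_need";
type WriteActor = "user" | "observer" | "reviewer";

interface EntityDraft { draftId: string; kind: string; title: string }
interface EdgeDraft { fromDraftId: string; toDraftId: string; relation: string }
interface ReviewSet { proposalEntryId: string; entityDrafts: EntityDraft[]; edgeDrafts: EdgeDraft[] }

type AcceptResult = { ok: true; lsn: number } | { ok: false; reason: "structural_illegal" };

class SketchExecutor {
  private lastLsn = 0;
  readonly entities: EntityDraft[] = [];
  readonly edges: EdgeDraft[] = [];
  readonly changeLog: { lsn: number; proposalEntryId: string; userId: string }[] = [];

  acceptReviewSet(set: ReviewSet, userId: string): AcceptResult {
    const lsn = this.lastLsn + 1; // one LSN for the whole batch, committed only on success
    // Assumed structural rule: every edge must reference drafts in this batch.
    const draftIds = new Set(set.entityDrafts.map((draft) => draft.draftId));
    const legal = set.edgeDrafts.every(
      (edge) => draftIds.has(edge.fromDraftId) && draftIds.has(edge.toDraftId),
    );
    if (!legal) return { ok: false, reason: "structural_illegal" }; // nothing written
    // All-or-nothing write, followed by a single change-log entry.
    this.entities.push(...set.entityDrafts);
    this.edges.push(...set.edgeDrafts);
    this.changeLog.push({ lsn, proposalEntryId: set.proposalEntryId, userId });
    this.lastLsn = lsn;
    return { ok: true, lsn };
  }
}

// Reviewer is advisory: it may target `reconciliation_need` and nothing else.
// Other actors are left unrestricted here purely to keep the sketch small.
function canWrite(actor: WriteActor, target: RecordClass): boolean {
  return actor !== "reviewer" || target === "reconciliation_need";
}

const sketchExecutor = new SketchExecutor();
const acceptedBatch = sketchExecutor.acceptReviewSet(
  {
    proposalEntryId: "entry-2",
    entityDrafts: [
      { draftId: "d1", kind: "intent_node", title: "Problem" },
      { draftId: "d2", kind: "intent_node", title: "Persona" },
    ],
    edgeDrafts: [{ fromDraftId: "d1", toDraftId: "d2", relation: "affects" }],
  },
  "user-1",
);
console.log(acceptedBatch.ok, sketchExecutor.changeLog.length); // prints: true 1
console.log(canWrite("reviewer", "entity"));                    // prints: false
```

Validating the whole batch before the first write is what makes partial acceptance unrepresentable in this sketch; a real executor would presumably get the same effect from a database transaction.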
+ +## Future direction + +**Candidate artefacts.** The Future Direction Register names *candidate artefacts* (pre-graph, agent-proposed, awaiting user adjudication). When that substrate exists, reviewer's *suggestion-shaped* findings (e.g. "consider adding a `non_goal` for X") may route there instead of to `reconciliation_need`. *Contradiction-shaped* findings (genuine conflicts in the accepted material) will continue to route to `reconciliation_need`. The POC routes everything to `reconciliation_need` for simplicity. + +## Cross-references + +- Elicitation lens taxonomy: [ELICITATION_LENSES.md](ELICITATION_LENSES.md) +- Reconciliation-need substrate: SPEC.md D8-L +- Observer extraction: SPEC.md D13-L, D18-L +- `CommandExecutor` contract: SPEC.md D20-L +- Custom-message event substrate: SPEC.md D17-L +- Linear transcript policy: SPEC.md D24-L diff --git a/memory/PLAN.md b/memory/PLAN.md index 30da4f30..c814350d 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -30,6 +30,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - `brief-library-curation` — Author and review briefs #4–#7 plus the adversarial second tier; can proceed independently once `walking-skeleton` exists. Briefs are text, no code dependency. - `fixture-strategy-evolution` — Iterate `fixture-strategy.md` (property invariants, brief expectations) as fixtures are captured. Doc-only. +- `pi-ui-extension-patterns` — Prove the Pi extension seams Brunch needs for lens/review-set UX: custom slash commands, styled persistent chrome (color/glyphs), modal/popover overlays, radio/checkbox/select prompts, clickable/navigable action buttons, picker/list-selection modals. Spike-shaped probe whose output is a feasibility matrix + minimum-viable wrappers that downstream frontiers (M5 lenses/review-sets, M6 authority gates, M7 turn-boundary delivery) can build on. Can run in parallel with `web-shell` and ahead of `graph-data-plane`. 
### Horizon @@ -89,7 +90,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Acceptance:** Round-trip reload of a captured linear session preserves raw payloads byte-equivalent (modulo timestamps); session binding and structured elicitation entries survive; elicitation exchanges can be re-projected after reload; all named Brunch custom entries survive, including side-task-result delivery entries when present; continuity metadata survives. Defensive branch-shape tests document Pi substrate behavior, but branch-aware Brunch sessions are not product-supported per D24-L. If core linear-session viability fails, the failure is sharply documented and a fallback path is proposed (project richer substrate / mirror JSONL into richer records / propose pi upstream change). - **Verification:** Inner — verify gate plus synthetic JSONL projection tests. Middle — JSONL round-trip/property tests for raw payloads, `brunch.session_binding`, structured elicitation entries, defensive branch-shape projection behavior, coordinator-created `/new` sessions, and M1 fixture replay parity. Outer — fixture replay parity across the transcript-first run bundle; no new human review was required because brief content and scripted user notes did not change. - **Cross-cutting obligations:** This frontier is the transcript-side proof for the shared event substrate that later carries structured elicitation entries, session binding, lens switches, side-task results, mentions, and `worldUpdate` without inventing a parallel channel or canonical chat/turn store. JSONL viability must validate sessions created through the `WorkspaceSessionCoordinator`, including the first-entry binding and `/new` same-spec behavior. 
-- **Traceability:** R7, R8, R16, R17, R19 / D6-L, D11-L, D12-L, D13-L, D18-L, D24-L / I3-L, I8-L, I10-L, I15-L +- **Traceability:** R7, R8, R16, R17, R19 / D6-L, D11-L, D12-L, D13-L, D18-L, D24-L / I3-L, I8-L, I10-L, I19-L - **Design docs:** archived [jsonl-session-viability-note](file:///Users/lunelson/Code/hashintel/brunch-next/archive/archive/docs/architecture/jsonl-session-viability-note.md) - **Current execution pointer:** complete; proceed to `web-shell`. @@ -104,7 +105,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Acceptance:** Web client connects via WebSocket RPC, lists specs and workspace state through `session.*` / `workspace.*` projection handlers, renders a transcript and the persistent chrome region, and round-trips structured elicitation prompts/responses plus freeform user input through the same transcript conventions as TUI. - **Verification:** Inner gate plus WebSocket/handler contract tests. Middle — manual browser smoke paired with projection/query postconditions for `session.*` / `workspace.*`, linear transcript-policy guards, transcript rendering state, and structured elicitation round-trip. Outer — at least one fixture replays into the web renderer; qualitative UX remains manual checklist. - **Cross-cutting obligations:** Preserve the single command/event substrate: the browser is a thin remote head over the same elicitation/transcript/session machinery, not a second data plane, REST-backed read client, generic read gateway, or custom interaction contract. Carry D24-L linear transcript policy forward before adding another session-consuming surface: block Brunch-controlled `/tree`/`/fork`/`/clone` branch flows where Pi hooks permit, and make transcript readers fail fast on non-linear JSONL rather than adapting it. 
-- **Traceability:** R4, R8, R11, R12, R16, R17 / D5-L, D10-L, D12-L, D13-L, D19-L, D24-L / I15-L +- **Traceability:** R4, R8, R11, R12, R16, R17 / D5-L, D10-L, D12-L, D13-L, D19-L, D24-L / I19-L - **Design docs:** [prd.md §M3, §Frontend Architecture](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/prd.md) - **Current execution pointer:** first slice should harden the linear transcript policy (block Pi branch flows where hooks permit and make transcript readers reject non-linear JSONL) before adding the browser surface. @@ -128,12 +129,12 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** unassigned - **Kind:** structural - **Status:** not-started -- **Objective:** Brunch installs graph tools through pi's extension seams; agent graph operations and observer-extraction writes route exclusively through the Brunch-owned command layer; web, TUI, and agent all observe the same changes. -- **Acceptance:** Agent can create / update / link intent-plane nodes via Brunch tools that call the `CommandExecutor`; an observer job can process a projected elicitation exchange and either write high-confidence graph changes or surface low-confidence suggestions/reconciliation work through the same executor; an architectural test or lint rule prevents direct DB access or caller-side authority bypass outside the command layer; the same change observed across TUI and (if M3 lands) web client; if the registry lands here, side-task-attributed writes follow the same command-executor path. -- **Verification:** Inner gate plus graph-tool/observer command shape tests. Middle — `CommandExecutor` contract tests, direct-DB no-bypass checks, observer-job idempotence/restart tests keyed by exchange range, and cross-surface projection checks. 
Outer — kernel-card-output coverage assertions begin landing per brief; side-task/observer-attributed writes, if present, remain indistinguishable from other writes at the command-layer boundary except for attribution. -- **Cross-cutting obligations:** Preserve the single-authority mutation rule for primary-agent, observer, and side-task flows by making the `CommandExecutor` the only mutation entry; observer jobs are durable operational queue entries keyed to elicitation exchanges, not a revived chat/turn store or privileged write path for background work. -- **Traceability:** R10, R13, R17 / D4-L, D13-L, D15-L, D18-L, D20-L / I2-L, I11-L, I14-L / A3-L, A11-L, A13-L -- **Design docs:** [prd.md §M5, §Authority Model](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/prd.md), [pi-seam-extensions.md §1 Async side-chain sub-agents](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/pi-seam-extensions.md#1-async-side-chain-sub-agents) +- **Objective:** Brunch installs graph tools through pi's extension seams; agent graph operations, observer-extraction writes, reviewer-attributed advisory writes, and generative-lens batch acceptances all route exclusively through the Brunch-owned command layer; web, TUI, and agent all observe the same changes. 
+- **Acceptance:** Agent can create / update / link intent-plane nodes via Brunch tools that call the `CommandExecutor`; an observer job can process a projected elicitation exchange and either write high-confidence graph changes or surface low-confidence suggestions/reconciliation work through the same executor; a reviewer job can process an accepted review set and surface advisory `reconciliation_need` findings (only) via the same executor; the `acceptReviewSet` command commits a generative-lens batch atomically as one LSN and one change-log entry; an architectural test or lint rule prevents direct DB access, caller-side authority bypass outside the command layer, and reviewer-attributed writes to anything other than `reconciliation_need`; the same change observed across TUI and (if M3 lands) web client; if the registry lands here, side-task-attributed writes follow the same command-executor path. +- **Verification:** Inner — verify gate plus graph-tool/observer/reviewer command shape tests, proposal-entry schema validation (`brunch.review_set_proposal` must declare `epistemic_status`), elicitor-emitted-entry schema validation (must declare `lens`). Middle — `CommandExecutor` contract tests including `acceptReviewSet` discriminants, direct-DB no-bypass checks, observer-job idempotence/restart tests keyed by exchange range, reviewer-job restart/idempotence tests keyed by batch-acceptance entry id, reviewer-write-target architectural boundary test (rejects non-`reconciliation_need` targets), `acceptReviewSet` batch-atomicity property tests (one LSN / one change-log entry; partial-batch impossible under mid-batch validation failure), `supersedes`-chain acyclicity property tests, lens-routing correctness property tests, differential test comparing dry-run validation at proposal time vs real-run validation at acceptance, and cross-surface projection checks. Outer — kernel-card-output coverage assertions begin landing per brief; first generative-lens fixture (e.g. 
`propose-scenarios-with-tradeoffs`) replays through review cycle + acceptance; A14-L proposal structural-legality rate captured in fixture metadata as POC-phase fitness (not merge gate); 1–2 known-bad coherence-problem briefs exercise reviewer precision; side-task / observer / reviewer-attributed writes remain indistinguishable from other writes at the command-layer boundary except for attribution and reviewer's narrow target. +- **Cross-cutting obligations:** Preserve the single-authority mutation rule for primary-agent, observer, reviewer, side-task, and batch-acceptance flows by making the `CommandExecutor` the only mutation entry; observer and reviewer jobs are durable operational queue entries keyed to transcript anchors, not a revived chat/turn store or privileged write path for background work; reviewer is advisory and writes only to `reconciliation_need`; lens metadata on elicitor-emitted entries routes observer vs reviewer consumption. +- **Traceability:** R10, R13, R17, R21, R22, R23 / D4-L, D13-L, D15-L, D18-L, D20-L, D25-L, D26-L, D27-L, D28-L, D29-L / I2-L, I11-L, I14-L, I15-L, I16-L, I17-L, I18-L / A3-L, A11-L, A13-L, A14-L, A16-L +- **Design docs:** [prd.md §M5, §Authority Model](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/prd.md), [pi-seam-extensions.md §1 Async side-chain sub-agents](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/pi-seam-extensions.md#1-async-side-chain-sub-agents), [ELICITATION_LENSES.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/ELICITATION_LENSES.md), [REVIEW_SETS.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/REVIEW_SETS.md) ### authority-model @@ -153,11 +154,11 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** unassigned - **Kind:** structural - **Status:** not-started -- **Objective:** Graph-revision tracking; session interest sets; `worldUpdate` synthesised by `prepareNextTurn`; mention-ledger 
staleness hints; side-task-result drain at the same boundary; session/spec binding transitions — and any lens switches present by then — recompute interest set before next agent turn. -- **Acceptance:** Cross-session paired-brief fixture exercises `worldUpdate` filtering; mention-staleness hints synthesise when an entity changed since last snapshot; succeeded side-task results are delivered only at the next turn boundary; session/spec binding transitions and any emitted `brunch.lens_switch` entries recompute interest sets. -- **Verification:** Inner gate plus mention-ledger/session-interest unit tests. Middle — generated LSN/change traces and property tests for I4-L, I5-L, I9-L, and I12-L; subscription/update ordering checks for turn-boundary messages. Outer — paired-brief adversarial capture passes, including side-task delivery when the side-task subsystem is active. -- **Cross-cutting obligations:** This frontier is the rendezvous point for Brunch's shared next-turn event semantics: `worldUpdate`, side-task results, lens changes, session/spec binding state, and mention staleness must coexist without inventing a second event plane. -- **Traceability:** R11, R13, R14, R18 / D6-L, D11-L, D14-L, D15-L / I1-L, I4-L, I5-L, I9-L, I12-L / A4-L, A9-L, A11-L +- **Objective:** Graph-revision tracking; session interest sets; `worldUpdate` synthesised by `prepareNextTurn`; mention-ledger staleness hints; side-task-result and reviewer-finding drain at the same boundary; session/spec binding transitions — and any lens switches present by then — recompute interest set before next agent turn. 
+- **Acceptance:** Cross-session paired-brief fixture exercises `worldUpdate` filtering; mention-staleness hints synthesise when an entity changed since last snapshot; succeeded side-task results are delivered only at the next turn boundary; reviewer findings from earlier batch acceptances arrive as advisory `reconciliation_need` items at the same boundary, never mid-turn; session/spec binding transitions and any emitted `brunch.lens_switch` entries recompute interest sets. +- **Verification:** Inner gate plus mention-ledger/session-interest unit tests. Middle — generated LSN/change traces and property tests for I4-L, I5-L, I9-L, I12-L, I16-L; subscription/update ordering checks for turn-boundary messages including reviewer findings. Outer — paired-brief adversarial capture passes, including side-task delivery and reviewer-finding delivery when those subsystems are active. +- **Cross-cutting obligations:** This frontier is the rendezvous point for Brunch's shared next-turn event semantics: `worldUpdate`, side-task results, reviewer findings, lens changes, session/spec binding state, and mention staleness must coexist without inventing a second event plane. 
+- **Traceability:** R11, R13, R14, R18, R21 / D6-L, D11-L, D14-L, D15-L, D17-L, D29-L / I1-L, I4-L, I5-L, I9-L, I12-L, I16-L / A4-L, A9-L, A11-L, A16-L - **Design docs:** [pi-seam-extensions.md §1 Async side-chain sub-agents](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/pi-seam-extensions.md#1-async-side-chain-sub-agents), [pi-seam-extensions.md §5 Graph-entity mentions](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/pi-seam-extensions.md) ### coherence-first-class @@ -180,8 +181,8 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Kind:** structural - **Status:** not-started - **Objective:** Compaction preserves graph and coherence anchors; interest sets can widen beyond direct reads when needed; conflict signaling remains intelligible at long horizons. -- **Acceptance:** Long-horizon adversarial brief (50+ turns) replays through compaction with `lastSeenLsn`, interest set, and session binding preserved; spec/session changes across compaction boundaries do not desync; active spec and any in-flight side-task, observer-job, or lens bookkeeping remain intelligible after compaction. -- **Verification:** Inner gate plus continuity-metadata unit tests. Middle — compaction round-trip/property tests for `lastSeenLsn`, interest set, session binding, graph/coherence anchors, and active side-task/observer bookkeeping. Outer — long-horizon fixture passes, including continuity checks for side-task and interest-set state when present. 
+- **Acceptance:** Long-horizon adversarial brief (50+ turns) replays through compaction with `lastSeenLsn`, interest set, and session binding preserved; spec/session changes across compaction boundaries do not desync; active spec and any in-flight side-task, observer-job, reviewer-job, or lens bookkeeping remain intelligible after compaction; the latest `brunch.establishment_offer` entry remains reconstructable across compaction so ambient-affordance chrome continues to render the current offer. +- **Verification:** Inner gate plus continuity-metadata unit tests. Middle — compaction round-trip/property tests for `lastSeenLsn`, interest set, session binding, graph/coherence anchors, active side-task/observer/reviewer bookkeeping, and latest-establishment-offer reconstruction. Outer — long-horizon fixture passes, including continuity checks for side-task, interest-set, and establishment-offer state when present. - **Cross-cutting obligations:** Preserve the coherence anchors, session binding, session continuity metadata, and side-task/observer/spec state that earlier milestones attached to the shared transcript/event substrate; preserve lens state only if a lens subsystem has landed by then. 
- **Traceability:** R15 / D6-L, D15-L / I12-L - **Design docs:** [prd.md §Continuity, Divergence, and Coherence](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/prd.md) @@ -212,6 +213,20 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Traceability:** A5-L - **Design docs:** [fixture-strategy.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/fixture-strategy.md) +### pi-ui-extension-patterns + +- **Name:** Prove Pi extension patterns for Brunch UI affordances +- **Linear:** unassigned +- **Kind:** structural (spike-flavored) +- **Status:** not-started +- **Objective:** Demonstrate that Pi's extension seams can host the UI affordances Brunch needs for elicitation-lens and review-set flows without forking Pi or building a parallel rendering substrate. Catalog and prototype: custom slash commands routed through Brunch handlers; persistent chrome with TUI styling/color/glyphs beyond the current minimal status line; modal/popover overlays for proposal review; radio/checkbox/select prompts for multi-choice answers and lens-selection offers; clickable/navigable action buttons for accept/request-changes/reject affordances; picker/list-selection modals for spec switching and entity selection. The output is a feasibility matrix mapping each affordance to (a) the Pi seam(s) used, (b) Brunch-owned wrapper code required, (c) controllability cost for the agent-as-user driver, and (d) residual risks — plus minimum-viable wrappers that later frontiers can call directly. 
+- **Acceptance:** A short design memo (`docs/architecture/pi-ui-extension-patterns.md` or section in `pi-seam-extensions.md`) catalogs the affordance matrix with verdicts (`proven` / `feasible-with-cost` / `requires-pi-change` / `not-feasible`); a runnable demo wires at least one representative of each viable category through Brunch's TUI host (custom slash command, styled chrome element, modal/popover, multi-choice prompt, action button, picker modal); the agent-as-user driver can controllably exercise the multi-choice and action-button affordances (informs the controllability/cost answer in `D27-L` and reviewer-flow oracle design); the matrix explicitly records which affordances are unviable so downstream UX design does not assume them; SPEC.md and PLAN.md links to the memo are added where M5/M6/M7 verification depends on a charted affordance. +- **Verification:** Inner — verify gate plus unit tests for any extension wrappers added. Middle — runbook oracles per affordance category (manual checklist + executable postcondition checker on chrome state, JSONL custom entries emitted, or command-result discriminants); contract tests for any new Brunch handler shape introduced (slash command router, modal request/response, picker selection). Outer — manual TUI walkthrough validating visual quality and interaction feel; comparative walkthrough between scripted-driver and manual paths to record controllability cost. +- **Cross-cutting obligations:** Preserve the linear-transcript invariant (`I19-L`) — affordance prototypes must not introduce branch creation, mid-turn state mutations outside the command layer, or a parallel chat/turn store. Multi-choice affordances must integrate with the existing capture-aware offer envelope (`pi-seam-extensions.md §4`) and the structured elicitation-entry shape. Slash commands and action buttons must route writes through the `CommandExecutor`. Any new custom-entry kinds must declare `lens` per `I18-L` if elicitor-emitted. 
+- **Why now / unlocks:** Lens/review-set/reviewer UX in M5 and authority gating in M6 both assume Brunch can render rich interactive affordances over Pi without forking it. Proving the affordance set early de-risks those frontiers and lets the agent-as-user-driver extension question (controllability vs cost trade flagged in `ln-oracles` pass) be answered with evidence rather than estimation. Can run in parallel with `web-shell` (M3) because TUI seams are independent of the web transport. +- **Traceability:** R4, R14, R16, R20, R21 / D2-L, D11-L, D24-L, D25-L, D26-L, D27-L, D29-L / I18-L, I19-L / A10-L, A14-L, A17-L +- **Design docs:** [pi-seam-extensions.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/pi-seam-extensions.md), [ELICITATION_LENSES.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/ELICITATION_LENSES.md), [REVIEW_SETS.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/REVIEW_SETS.md); new memo to be created during the spike. + ### flue-pattern-adoption - **Name:** Adopt selected Flue patterns post-POC @@ -290,7 +305,9 @@ walking-skeleton │ │ │ │ │ │ │ └── (oracle-design-plan-graphs — horizon) │ │ │ - │ │ └── web-shell (M3, can run parallel after M2) + │ │ ├── web-shell (M3, can run parallel after M2) + │ │ │ + │ │ └── pi-ui-extension-patterns (parallel after M2; informs M5/M6/M7) │ │ │ └── brief-library-curation (parallel after M0) │ diff --git a/memory/SPEC.md b/memory/SPEC.md index 753ff354..3fef4ca2 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -73,16 +73,20 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c 17. Brunch must support action, radio (single-select), checkbox (multi-select), and freeform-plus-choice response surfaces as optional typed transcript entries, and must be able to project elicitation exchanges from Pi JSONL for observer extraction. 18. 
Brunch must support `#`-mentions of graph entities anchored to stable IDs, with session-scoped staleness tracking that produces discretionary re-read hints during `prepareNextTurn`. 19. Brunch must enforce a workspace state hierarchy `cwd → spec → session`, where the active spec is selected before any agent loop runs, persists across `/new`, and binds each session to exactly one spec. +20. Brunch must support multiple elicitation lenses within the `elicitor` agent-mode, with the agent owning lens selection and offer through transcript-native establishment offers; lens metadata is carried on elicitor-emitted custom entries for downstream routing. +21. Brunch must distinguish *extractive* lenses (single-exchange, observer-extracted) from *generative* lenses (batch-proposal, captured at proposal time as structured entity-draft payloads, reviewer-analyzed post-acceptance). +22. Brunch must establish a minimum grounding bundle (domain, protagonist, pain/pull, and constraint anchors) before generative lenses produce non-speculative output; lenses remain always-available with epistemic-status signaling honestly reflecting grounding density. +23. Brunch must support a review-cycle acceptance pattern for generative-lens proposals — approve / request changes (triggering regeneration) / reject — with batch acceptance committed atomically as one CommandExecutor call; partial acceptance is not representable. #### Verification & fixtures -20. Brunch must ship a brief library and an agent-as-user driver over the JSON-RPC stdio surface to capture replayable golden runs and property-checkable fixtures. +24. Brunch must ship a brief library and an agent-as-user driver over the JSON-RPC stdio surface to capture replayable golden runs and property-checkable fixtures. 
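Requirement 18's session-scoped staleness tracking can be sketched as a small ledger, assuming a per-session map from entity ID to the LSN at which the session last snapshotted that entity. Only the `(entity_id, snapshotted_lsn)` pair and the `brunch.mention_staleness_hint` entry name come from the spec; `MentionLedger`, `recordMention`, `staleHints`, and the hint's field names are hypothetical, and a real implementation would run inside `prepareNextTurn` rather than as a standalone class.

```typescript
// Hypothetical shapes for illustration; not the Brunch implementation.
interface MentionStalenessHint {
  type: "brunch.mention_staleness_hint";
  entityId: string;
  snapshottedLsn: number; // LSN the session last saw for this entity
  currentLsn: number; // LSN of the entity's most recent graph change
}

class MentionLedger {
  // entity id -> LSN at which this session last snapshotted the entity
  private readonly snapshots: Record<string, number> = {};

  // Called when a `#`-mention is inserted or the entity is re-read.
  recordMention(entityId: string, lsn: number): void {
    this.snapshots[entityId] = lsn;
  }

  // `lastChangedLsn` maps entity id -> LSN of its latest change (assumed input).
  // Returns one hint per mentioned entity that changed after its snapshot.
  staleHints(lastChangedLsn: Record<string, number | undefined>): MentionStalenessHint[] {
    const hints: MentionStalenessHint[] = [];
    for (const entityId of Object.keys(this.snapshots)) {
      const snapshottedLsn = this.snapshots[entityId];
      const currentLsn = lastChangedLsn[entityId];
      if (
        typeof snapshottedLsn === "number" &&
        typeof currentLsn === "number" &&
        currentLsn > snapshottedLsn
      ) {
        hints.push({
          type: "brunch.mention_staleness_hint",
          entityId: entityId,
          snapshottedLsn: snapshottedLsn,
          currentLsn: currentLsn,
        });
      }
    }
    return hints;
  }
}
```

Because the hints are discretionary, the sketch only reports staleness; whether the agent re-reads the entity is left to the caller.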
## Live Architecture Register ### Open Assumptions - + | # | Assumption | Confidence | Status | Depends on | Validation approach | | --- | --- | --- | --- | --- | --- | @@ -97,6 +101,10 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | A10-L | A persistent TUI chrome region showing cwd / spec / phase / chat-mode can be added on top of `pi-tui`'s root layout without modifying pi. | medium | open | D2-L | M0 — walking skeleton attempts to mount the chrome; escalates to a pi upstream issue only if blocked. | | A11-L | Pi's `prepareNextTurn` plus custom-message delivery are sufficient to express side-task result delivery without inventing a second event plane or forking pi. | medium | open | D15-L | M5 + M7: side-task registry wiring and next-turn delivery proof. | | A13-L | A durable observer-job queue keyed by session id and elicitation-exchange entry range can recover async extraction after process interruption without reintroducing canonical chat/turn tables; whether this shares storage with a generalized work-item/reconciliation table can be deferred. | medium | open | D18-L, I14-L | M5: observer extraction tests exercise restart/idempotence once graph writes exist. | +| A14-L | LLM elicitor agents can reliably produce graph-structurally-legal intent-graph proposals (well-formed entity drafts and semantic edges that pass `CommandExecutor` structural validation) for generative lenses. | medium | open | D27-L | Fixture replay across briefs that exercise `propose-scenarios-with-tradeoffs`-shaped lenses; dry-run `CommandExecutor` validation at proposal time before user review. Fallback (constrained generation, retry-with-feedback, or NL-parse-at-accept) preserves the user-facing review-cycle if reliability is insufficient. 
| +| A15-L | Establishment hints as transcript-native custom entries (`brunch.establishment_offer`) provide sufficient inspectability, fixture-ability, and ambient-affordance source without a separate establishment-needs graph substrate; whether such a substrate ever shares storage with reconciliation needs can be deferred. | medium | open | D25-L, D30-L | M5+: fixture inspection confirms lens offers are reconstructable from transcript; chrome region renders ambient affordances from the latest such entry. | +| A16-L | Reviewer triggering policy (always-on vs lens-keyed) and reviewer scope (batch + how-far-neighborhood) can be deferred to per-lens decisions without architectural commitment now. | low | open | D29-L | M5+: empirical — observer/reviewer integration reveals which policy avoids unacceptable next-turn latency without losing relevant findings. | +| A17-L | A user-level temperamental preference for extractive vs generative lenses meaningfully affects adoption and eventually warrants expression as a user-level setting. | low | open | D25-L, D26-L | Deferred; surfaces from outer-loop walkthroughs and adversarial fixtures once both lens families exist in product. | ### Active Decisions @@ -116,6 +124,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D4-L — One shared mutation surface owns graph truth.** Every semantic graph mutation routes through Brunch-owned typed command handlers responsible for validation, structural legality, optimistic concurrency, event emission, audit attribution, and coherence triggering. Agents and adapters must not touch the ORM or SQLite directly. Depends on: A3-L. Supersedes: —. - **D20-L — Command execution owns the pre-M6 authority seam.** Callers submit product commands to a Brunch `CommandExecutor` and receive a structured result; they do not call a standalone authority service or graph persistence directly. 
The executor is the public mutation boundary that hides attribution, optimistic concurrency, structural validation, the minimal pre-M6 policy classifier, transaction execution, LSN allocation, change-log append, and coherence-trigger hooks. Before M6, the policy logic may be deliberately small, but the result shape must already include `needs_human`, `policy_blocked`, `version_conflict`, and `structural_illegal` so early RPC, print, agent-tool, observer-job, and side-task code cannot bake in permissive mode-specific shortcuts. Depends on: D4-L, D16-L. Supersedes: the separate optional `AuthorityGate` / generic policy-service mental model. +- **D27-L — Generative-lens proposals are structured entity-draft payloads; batch acceptance is one atomic `CommandExecutor` call.** The elicitor's proposal custom entry (`brunch.review_set_proposal`) contains the graph entities and edges that *would* be created on acceptance, in a form `CommandExecutor` can dry-run-validate at proposal time so `structural_illegal` / `policy_blocked` discriminants surface before the user reviews. Acceptance is one `acceptReviewSet` command that consumes one LSN, writes the entire batch in one transaction, appends one change-log entry attributed to the user, triggers coherence updates, and enqueues the reviewer job. "Accept with edits" does not exist as a primitive: the cycle is approve / request changes (triggers regeneration of a successor proposal) / reject. Depends on: A14-L, D4-L, D20-L, D26-L. Supersedes: any caller-side multi-step "patch then commit" mental model. 
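The result shape in D20-L and the atomic acceptance in D27-L can be illustrated with an in-memory sketch. The discriminant names (`needs_human`, `policy_blocked`, `version_conflict`, `structural_illegal`) and the `acceptReviewSet` command name come from the decisions above; everything else (`InMemoryCommandExecutor`, `dryRun`, the draft and change-log field names, and the two structural checks) is a hypothetical stand-in. The real executor runs over Drizzle and `better-sqlite3` inside one transaction, which this sketch approximates by validating everything before writing anything.

```typescript
// Hypothetical shapes; only the discriminant names and `acceptReviewSet` come from the spec.
type CommandResult =
  | { kind: "ok"; lsn: number }
  | { kind: "needs_human"; reason: string }
  | { kind: "policy_blocked"; reason: string }
  | { kind: "version_conflict"; expectedLsn: number; actualLsn: number }
  | { kind: "structural_illegal"; problems: string[] };

interface EntityDraft { id: string; kind: string }
interface EdgeDraft { from: string; to: string; relation: string }
interface ReviewSetProposal { proposalId: string; entities: EntityDraft[]; edges: EdgeDraft[] }
interface ChangeLogEntry { lsn: number; command: "acceptReviewSet"; attributedTo: string; proposalId: string }

const hasKey = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

class InMemoryCommandExecutor {
  private lsn = 0;
  readonly entities: Record<string, EntityDraft> = {};
  readonly edges: EdgeDraft[] = [];
  readonly changeLog: ChangeLogEntry[] = [];
  readonly reviewerJobs: string[] = []; // stand-in for the durable reviewer queue

  // Illustrative structural checks: no duplicate ids, no dangling edge endpoints.
  private structuralProblems(p: ReviewSetProposal): string[] {
    const problems: string[] = [];
    const batchIds: Record<string, true> = {};
    for (const e of p.entities) {
      if (hasKey(batchIds, e.id) || hasKey(this.entities, e.id)) {
        problems.push("duplicate entity id: " + e.id);
      }
      batchIds[e.id] = true;
    }
    for (const edge of p.edges) {
      for (const end of [edge.from, edge.to]) {
        if (!hasKey(batchIds, end) && !hasKey(this.entities, end)) {
          problems.push("dangling edge endpoint: " + end);
        }
      }
    }
    return problems;
  }

  // Proposal-time validation: reports problems without writing or allocating an LSN.
  dryRun(p: ReviewSetProposal): CommandResult {
    const problems = this.structuralProblems(p);
    return problems.length > 0
      ? { kind: "structural_illegal", problems: problems }
      : { kind: "ok", lsn: this.lsn };
  }

  // Whole batch or nothing: one LSN, one change-log entry, one reviewer job.
  acceptReviewSet(p: ReviewSetProposal, expectedLsn: number, userId: string): CommandResult {
    if (expectedLsn !== this.lsn) {
      return { kind: "version_conflict", expectedLsn: expectedLsn, actualLsn: this.lsn };
    }
    const problems = this.structuralProblems(p);
    if (problems.length > 0) return { kind: "structural_illegal", problems: problems };
    for (const e of p.entities) this.entities[e.id] = e;
    for (const edge of p.edges) this.edges.push(edge);
    this.lsn += 1;
    this.changeLog.push({ lsn: this.lsn, command: "acceptReviewSet", attributedTo: userId, proposalId: p.proposalId });
    this.reviewerJobs.push(p.proposalId);
    return { kind: "ok", lsn: this.lsn };
  }
}
```

Callers switch on `kind` rather than catching exceptions, so a caller cannot ignore `needs_human` or `policy_blocked` by accident even while the pre-M6 policy classifier stays small. The sketch never produces those two results itself because it models no policy logic.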
#### Transport & client @@ -131,6 +140,8 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D15-L — Side tasks are a first-class Brunch subsystem delivered through the same transcript/event substrate.** Background sub-agents are tracked by a Brunch-owned `SideTaskRegistry`; results are never injected mid-turn and instead arrive at the next-turn boundary through the existing custom-message plus `prepareNextTurn` path. Side-task writes remain subject to the same command-layer authority as primary-agent writes. Depends on: A11-L, D4-L. Supersedes: —. - **D16-L — Graph persistence uses Drizzle over `better-sqlite3`, with one-LSN-per-commit and no bypass paths.** The command layer owns precondition checks, structural validation, entity writes, LSN allocation, change-log append, and any coherence updates inside one transaction. This rule applies equally to migrations and maintenance code; there is no privileged write path outside the command-executor protocol. Depends on: A3-L, A4-L. Supersedes: —. - **D18-L — Observer extraction is exchange-keyed durable work, not a chat/turn store.** After a user response closes an elicitation exchange, Brunch may enqueue an observer job keyed by session id plus exchange entry ids; jobs survive process restart and graph writes still route through the command layer. Routine observer jobs are operational queue state, not reconciliation needs by default; low-confidence or conflicting findings may create reconciliation needs. Depends on: A13-L, D4-L, D13-L, D16-L. Supersedes: the old DB-backed `chat` / `turn` mental model. 
+- **D28-L — Regenerated review-set proposals are appended as successor entries in the linear Pi JSONL session; projection helpers filter to the accepted set for context economy.** When the user requests changes, the agent appends a successor proposal entry that references its predecessor via `supersedes`; prior proposals are *not* deleted from JSONL but remain visible as raw transcript history. This stays within Brunch's linear transcript policy — no Pi branching is created. Pi JSONL is treated as a "capture everything" store for replay and audit. Projection helpers used to drive the agent (context injection, summarization) walk the `supersedes` chain and surface only the latest (or ultimately accepted) proposal — the agent does not re-process every superseded proposal as live context. The reviewer likewise sees only the accepted set, not the regeneration history. Depends on: D6-L, D12-L, D17-L, D24-L, D27-L. Supersedes: any "in-place edit" or "fork-on-regenerate" mental model. +- **D29-L — Reviewer is an `observer`-shaped agent-mode with narrow write authority.** After a batch acceptance closes, Brunch may enqueue a reviewer job keyed by session id plus the batch-acceptance entry id; the job survives process restart and analyzes the accepted batch plus its graph neighborhood for coherence, completeness, and gaps. **Reviewer writes only `reconciliation_need` records via the `CommandExecutor`**; it never writes graph entities, edges, change-log entries directly, or any other record class. Findings reach the user through next-turn delivery as advisory items on the reconciliation-need surface — the batch acceptance remains the user's atomic commitment and the reviewer cannot amend it. (Suggestion-shaped findings may later route to candidate-artefacts when that substrate exists; the POC routes everything to reconciliation needs.) Depends on: A16-L, D4-L, D8-L, D15-L, D17-L, D18-L, D20-L, D27-L. Supersedes: any "reviewer may quietly amend the graph" mental model. 
- **D24-L — Brunch POC enforces a linear transcript policy over Pi JSONL.** Pi's session tree is a substrate capability, not a Brunch product surface. Until branch-aware continuity/coherence is explicitly designed, Brunch-controlled interactive/runtime flows block `/tree`, `/fork`, and `/clone` through the thinnest available Pi hooks; transcript readers reject non-linear session files instead of flattening, adapting, migrating, or selecting a branch. This is intentional fail-fast pre-release posture: avoid compatibility debt with Pi internals or earlier Brunch revisions, and keep wrapper/adapter layers minimal. Depends on: D6-L, D11-L, D13-L. Supersedes: treating active-branch projection as Brunch product semantics. #### Interaction & UI shape @@ -141,6 +152,10 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D12-L — Elicitation-first interaction, transcript-native structured prompts.** Brunch treats system/assistant prompts and user responses as Pi transcript truth. Structured action/choice/freeform surfaces may be represented by Brunch custom entries when needed, but there is no DB-owned prompt/response entity; at idle, the session waits on a system/assistant-originated elicitation prompt. Depends on: D6-L, D11-L. Supersedes: —. - **D13-L — Capture-aware elicitation exchange projection.** Observer extraction consumes derived elicitation exchanges: a prompt-side span (all system/assistant/tool-side entries since the previous user response, including any structured/internal prompt content) plus a response-side span (user text and/or structured action entries). Role/span alternation is the default projection in Brunch-supported linear sessions; typed markers are added only where structure/actions need deterministic replay. Depends on: D12-L, D24-L. Supersedes: —. - **D14-L — `#`-mentions are ID-anchored, with a session-scoped mention ledger.** Autocomplete may resolve by title but insertion always rewrites to ID-anchored. 
Per-session `(entity_id, snapshotted_lsn)` ledger drives discretionary `brunch.mention_staleness_hint` entries in `prepareNextTurn`. Depends on: A9-L, I4-L. Supersedes: —. +- **D25-L — Elicitation strategies are *lenses* within the `elicitor` agent-mode, not separate agent-modes.** Lens is metadata on elicitor-emitted custom transcript entries (`brunch.elicitor_intent_hint`, `brunch.establishment_offer`, `brunch.review_set_proposal`, etc.); agent-modes (`elicitor`, `observer`, `reviewer`, `reconciler`) remain orthogonal. The known starter lens set is `step-by-step`, `disambiguate-via-examples`, `propose-scenarios-with-tradeoffs`, `propose-design-shapes`, `propose-oracle-ensembles`, and `project-requirements-from-upstream`; the catalogue is expected to grow. Observer-job and reviewer-job routing filters on lens. Depends on: D12-L, D17-L, D23-L. Supersedes: collapsing strategy and agent-mode into one vocabulary axis. +- **D26-L — Lenses split into *extractive* and *generative* families by capture mechanism.** Extractive lenses produce single-exchange interactions whose implicit content is captured by the `observer` agent-mode post-exchange (e.g. `step-by-step`, `disambiguate-via-examples`). Generative lenses produce batch proposals whose entity-draft payloads are captured by the elicitor *at proposal time*, with the `reviewer` agent-mode running advisory analysis post-acceptance (e.g. `propose-scenarios-with-tradeoffs`, `propose-design-shapes`, `propose-oracle-ensembles`, `project-requirements-from-upstream`). The family distinction is durable; the specific lens list is expected to evolve. Depends on: D18-L, D25-L. Supersedes: a single uniform "agent asks questions" mental model. 
+- **D30-L — Grounding is a precondition gate for generative-lens output, with epistemic-status signaling honestly tracking grounding density; lenses themselves are always available.** A minimum grounding bundle — *domain anchor*, *protagonist anchor*, *pain/pull anchor*, *constraint anchor* — must be established before generative lenses produce non-speculative output. Generative-lens proposals declare `epistemic_status` (`inferred | assumed | asserted | observed`) consistent with grounding density at proposal time; UI renderings reflect this status so low-status proposals *feel* speculative (visible hedging, lower visual weight, explicit "speculative — based on N anchors so far" footers). The lens is never refused: the agent always produces *some form* of what was asked for, but its output resolution and epistemic load honestly reflect what grounding supports. Rendering mode scales with density: empty/thin → framing proposals (Shape Up pitches); moderate → scenario sketches; rich → completion proposals; mature → refactor proposals. Depends on: D26-L. Supersedes: gating-by-refusal as a UX move. +- **D31-L — A four-axis meta-rubric is a soft heuristic for fan-out comparison rubrics across all three flows; not architecturally enforced.** When generating comparison rubrics for fan-out alternatives across candidate-spec, technical-design, and verification-design flows, the elicitor attempts to express each axis in terms of (*legibility / cost-of-knowing*, *failure modes*, *coverage / range*, *commitment*). Project-specific axes are allowed alongside; the meta-frame is dropped when it doesn't fit. The hypothesis (uniform comparison UI across all three flows) is testable via fixture comparison; promote to schema/UI only if it holds up. Depends on: D25-L, D26-L. Supersedes: a hardcoded per-flow rubric. 
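The lens vocabulary in D25-L and the family split in D26-L can be sketched as a lookup table plus a routing function. The lens names, the two families, and the custom-entry type names come from the decisions above; `LENS_FAMILY`, `ElicitorEntry`, and `jobKindFor` are hypothetical names, and the table is a snapshot of the starter set that would need to grow with the catalogue.

```typescript
// Starter lens set from D25-L, split by family per D26-L.
type ExtractiveLens = "step-by-step" | "disambiguate-via-examples";
type GenerativeLens =
  | "propose-scenarios-with-tradeoffs"
  | "propose-design-shapes"
  | "propose-oracle-ensembles"
  | "project-requirements-from-upstream";
type Lens = ExtractiveLens | GenerativeLens;
type LensFamily = "extractive" | "generative";

// Hypothetical lookup table; the compiler rejects a lens left unclassified.
const LENS_FAMILY: Record<Lens, LensFamily> = {
  "step-by-step": "extractive",
  "disambiguate-via-examples": "extractive",
  "propose-scenarios-with-tradeoffs": "generative",
  "propose-design-shapes": "generative",
  "propose-oracle-ensembles": "generative",
  "project-requirements-from-upstream": "generative",
};

// Minimal view of an elicitor-emitted custom entry: every one carries `lens`.
interface ElicitorEntry {
  type:
    | "brunch.elicitor_intent_hint"
    | "brunch.establishment_offer"
    | "brunch.review_set_proposal";
  lens: Lens;
}

type JobKind = "observer" | "reviewer";

// Extractive lenses are captured by the observer after the exchange closes;
// generative lenses get advisory reviewer analysis after batch acceptance.
function jobKindFor(entry: ElicitorEntry): JobKind {
  return LENS_FAMILY[entry.lens] === "extractive" ? "observer" : "reviewer";
}
```

Keeping the family in a table rather than in the lens name keeps lens and agent-mode as separate vocabulary axes: adding a lens means adding one table row, not a new agent-mode.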
### Critical Invariants @@ -160,7 +175,11 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | I12-L | Side-task results are delivered only at turn boundaries; no side-task result may steer or mutate the active turn outside the next-turn delivery path. | planned (M7 side-task delivery invariant) | D15-L | | I13-L | At any idle linear session leaf, the latest unresolved interaction state is system/assistant-originated: user input is a response to an elicitation prompt, not ambient chat. | planned (M1 fixture + transcript projection tests) | D12-L, D24-L | | I14-L | Observer jobs are keyed by session id plus elicitation-exchange entry-range ids and have durable status; replay/restart cannot enqueue duplicate observer jobs for the same exchange. | planned (M5 observer queue tests) | D18-L, D4-L | -| I15-L | Brunch-controlled flows do not create or navigate Pi session branches, and Brunch transcript readers fail fast on non-linear JSONL rather than flattening, migrating, or branch-selecting. | planned (linear transcript policy guard tests before/within M3 web-shell) | D24-L, D6-L, D11-L, D13-L | +| I15-L | Every review-set acceptance routes through `CommandExecutor` as one atomic `acceptReviewSet` command producing one LSN, one change-log entry, and one transaction over the entire batch. Partial acceptance is not representable through any product API. | planned (M5+ batch-acceptance command tests; review-set fixture parity) | D20-L, D27-L; I1-L, I11-L | +| I16-L | Reviewer-attributed writes target only the `reconciliation_need` substrate; no reviewer-attributed `CommandExecutor` call writes graph entities, edges, change-log entries directly, or any other record class. 
| planned (M5+ architectural test on reviewer command writers; reviewer-attributed command-result audit) | D29-L; I2-L, I11-L | +| I17-L | Every generative-lens proposal entry (`brunch.review_set_proposal`) declares an `epistemic_status` (`inferred | assumed | asserted | observed`) consistent with grounding-bundle coverage at proposal time; UI renderings honor this status as a presentation contract. | planned (M5+ proposal-entry schema test; fixture asserts status under thin and rich grounding) | D30-L; A14-L | +| I18-L | Every elicitor-emitted prompt or proposal custom entry (`brunch.elicitor_intent_hint`, `brunch.establishment_offer`, `brunch.review_set_proposal`) carries a `lens` field; observer-job and reviewer-job routing filters on this field. | planned (M5+ observer/reviewer routing tests; transcript-shape contract test) | D25-L, D26-L, D29-L | +| I19-L | Brunch-controlled flows do not create or navigate Pi session branches, and Brunch transcript readers fail fast on non-linear JSONL rather than flattening, migrating, or branch-selecting. | planned (linear transcript policy guard tests before/within M3 web-shell) | D24-L, D6-L, D11-L, D13-L | ## Future Direction Register @@ -240,8 +259,22 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | **Epistemic status** | Confidence basis: `observed | asserted | assumed | inferred`. Like `authority`, this is a context-shaping label for attention, grouping, and compression rather than a complete theory of truth. | | **Framing-as** | Orthogonal modality classifying a node's product role (e.g. `problem`, `persona`, `non_goal`) within an allowed matrix. | | **Kernel** | A behavioural elicitation pattern from `docs/design/BEHAVIORAL_KERNELS.md` (state/lifecycle, containment, concurrency, etc.). | -| **Brief** | A short curated product brief in `.brunch-fixtures/briefs/`, run by the agent-as-user driver to produce golden captures. 
| +| **Brief** | A short curated product brief in `.brunch-fixtures/briefs/`, run by the agent-as-user driver to produce golden captures. Dev-only fixture input; distinct from runtime user-facing **scenarios**. | | **Capture / Run / Fixture** | A captured agent-as-user run produces a `.jsonl` transcript, `.graph.json`, `.coherence.json`, and `.meta.json` bundle under `.brunch-fixtures///`. | +| **Elicitation lens** | A narrower interpretive strategy applied within the `elicitor` agent-mode — e.g. `step-by-step`, `disambiguate-via-examples`, `propose-scenarios-with-tradeoffs`, `propose-design-shapes`, `propose-oracle-ensembles`, `project-requirements-from-upstream`. Lens is metadata on elicitor-emitted custom transcript entries. Agent-modes (`elicitor` / `observer` / `reviewer` / `reconciler`) remain orthogonal. | +| **Extractive lens** | A lens producing single-question / single-answer exchanges; implicit content is captured post-exchange by the `observer` agent-mode. Low cognitive load per move; small graph mutations. | +| **Generative lens** | A lens producing batch proposals (structured entity-draft payloads in `brunch.review_set_proposal` entries); proposals are captured by the elicitor at proposal time, with the `reviewer` agent-mode running advisory analysis post-acceptance. Higher cognitive load per move; large graph mutations on acceptance. | +| **Grounding bundle** | The minimum set of session-level anchors required before generative lenses produce non-speculative output: a *domain anchor*, a *protagonist anchor*, a *pain/pull anchor*, and a *constraint anchor*. Captured technical constraints land in the constraint anchor and bound subsequent technical-design fan-outs. | +| **Grounding anchor** | One sentence-scale fact captured during early elicitation that contributes to the grounding bundle. 
| +| **Establishment offer** | A `brunch.establishment_offer` custom transcript entry summarising the elicitor's perceived gaps, the available lens strategies for the next move, the recommended lens, and the agent's confidence. Source of ambient affordances rendered in the chrome region; inspectable post-hoc and fixture-able. | +| **Elicitor intent hint** | A `brunch.elicitor_intent_hint` custom transcript entry emitted alongside a prompt or proposal, declaring `lens` and semantic targets (e.g. expected ontological sub-type) for downstream observer/reviewer routing and extraction guidance. | +| **Review set** | A batch proposal generated by a generative lens, presented to the user for review-cycle acceptance (approve / request changes / reject), modeled on the GitHub PR-review-cycle. | +| **Batch acceptance** | The single `CommandExecutor` call (`acceptReviewSet`) that commits an entire review set atomically as one LSN and one change-log entry, attributed to the user. The only mutation a generative-lens acceptance produces. | +| **Reviewer** | An agent-mode that runs async after batch acceptance, scoped to the accepted batch plus graph neighborhood, analyzing for coherence / completeness / gaps. Authority is narrow: writes only `reconciliation_need` records via `CommandExecutor`. Architecturally a mirror of `observer`. | +| **Anchor scenario** | A particular vignette embedded inside one alternative pitch to ground its framing. Transcript-rendered; not persisted as a graph entity. | +| **Contrastive scenario** | A particular vignette distinguishing two alternatives, surfaced in comparison UI. Transcript-rendered. | +| **Probing scenario** | A particular vignette posed by the elicitor to force a user response that disambiguates intent. Transcript-rendered; user response persists per existing elicitation mechanics. 
| +| **Meta-rubric** | The soft heuristic axis set (*legibility / cost-of-knowing*, *failure modes*, *coverage / range*, *commitment*) the elicitor attempts when generating fan-out comparison rubrics across candidate-spec, technical-design, and verification-design flows. Not architecturally enforced. | ## Verification Design @@ -255,13 +288,23 @@ Brunch uses a three-layer stance: 2. **Middle loop:** runbook oracles, round-trip/property tests, contract tests, and fixture replay prove frontier seams against durable artifacts. 3. **Outer loop:** adversarial/generative fixtures and manual walkthroughs assess LLM elicitation quality, UX feel, and long-horizon coherence that cannot be reduced to schema checks. +**POC-phase posture (M0–M9): viable-and-reasonable, not hardened.** Across the POC milestone ladder, the goal is "the system is viable and works at least reasonably well" — proof-of-life for each architectural claim, not statistical robustness. The implications for oracle design: + +- **Structural invariants stay hard gates** (atomicity, no-bypass, write-target restrictions, schema conformance, supersedes acyclicity). These don't get cheaper to defer; getting them wrong corrupts the substrate. +- **LLM-behavioral metrics — proposal structural-legality rate, lens-recommendation appropriateness, reviewer-finding precision — are *tracked as fitness*, not gated.** Captured per-run in fixture metadata; surfaced for human review; thresholds noted as targets (e.g. ≥95% legality on first attempt) but failure to hit them does not block merges during POC. +- **Multi-run variance probes use conservative replication** (3 runs middle-loop, 5 outer-loop) — enough to detect catastrophic instability, not enough to characterize tail distributions. Higher replication is post-POC. +- **Adversarial/generative fixture campaigns stay small and targeted** during POC: one or two known-bad scenarios per relevant invariant, not exhaustive coverage. Coverage breadth is post-POC. 
+- **Deferred to post-POC hardening:** mutation testing, large-seed campaigns, performance budgets, accessibility audits, formal pass-rate thresholds as merge gates, exhaustive adversarial coverage. + +The structural/behavioral split is the key discipline: never let a behavioral fitness metric weaken a structural gate, and never demand statistical confidence on a behavioral metric during POC that the LLM-budget cost cannot bear. + ### Diagnostic Assessment | Dimension | Score | Notes | Raised by | | --- | --- | --- | --- | -| Observability | partial, improving to high by M4/M5 | Text-native artifacts are planned (`.brunch/state.json`, Pi JSONL, command results, graph exports, coherence exports, fixture bundles). M0 TUI chrome and M3 browser UX remain partly visual unless paired with artifact/query checks. | Runbook oracles; projection handlers; graph/coherence exports. | -| Reproducibility | partial | Fixture briefs and captured runs create a repeatable path, but M1/M2 must first prove the agent-as-user harness and JSONL projection/reload discipline. LLM runs remain variable, so deterministic postcondition checks and property assertions are required. | Deterministic runbook checks; captured-run metadata; replay/property fixtures. | -| Controllability | partial | `npm run fix` / `npm run verify` are agent-controllable. TUI/browser/manual flows are not fully controllable yet, so early frontiers use human action plus executable postcondition checks rather than full UI automation. | Store/projection postcondition checkers; later stdio/WebSocket drivers. | +| Observability | partial, improving to high by M4/M5 | Text-native artifacts are planned (`.brunch/state.json`, Pi JSONL, command results, graph exports, coherence exports, fixture bundles). Generative-lens material adds further text-native surfaces: `brunch.review_set_proposal`, `brunch.establishment_offer`, `brunch.elicitor_intent_hint` entries plus reviewer-finding `reconciliation_need` records. 
*Structural* observability is high; *behavioral* observability (proposal quality, lens-recommendation appropriateness, reviewer precision) remains low and outer-loop only. M0 TUI chrome and M3 browser UX remain partly visual unless paired with artifact/query checks. | Runbook oracles; projection handlers; graph/coherence exports; transcript projection of lens/establishment/proposal entries. | +| Reproducibility | partial | Fixture briefs and captured runs create a repeatable path. M1/M2 proved the agent-as-user harness and JSONL projection/reload discipline. LLM runs remain variable, so deterministic postcondition checks and property assertions are required; generative-lens flows additionally need seeded multi-run probes to characterize structural-legality rate at all. Driver extension for review-cycle flows (approve / request-changes / reject) is conditional on cost being worth the controllability gain. | Deterministic runbook checks; captured-run metadata; replay/property fixtures; (planned) review-cycle driver extension. | +| Controllability | partial → high (conditional) | `npm run fix` / `npm run verify` are agent-controllable. The agent-as-user stdio RPC driver covers extractive-lens flows end-to-end; extending it to drive review-cycle acceptance/regeneration would lift generative-lens controllability to "high" but carries implementation cost. TUI/browser/manual flows for ambient affordances, in-flight reviewer signals, and chrome rendering remain runbook-oracle territory. | Store/projection postcondition checkers; stdio/WebSocket drivers; (planned) review-cycle driver extension; runbook oracles for chrome surfaces. | ### Verification Commands @@ -287,16 +330,17 @@ Infrastructure is not yet fully laid (Phase 3 of POC bootstrapping). Commands fo | Loop | Oracle family | Proves | Primary claims | | --- | --- | --- | --- | -| Inner | Type-aware lint, type checks, fast unit tests | Local module correctness, typed command/result shapes, projection helper behavior. 
| D12-L, D13-L, D20-L, D21-L. | -| Inner | Schema/shape validation at boundaries | JSON-RPC payloads, command results, structured elicitation entries, fixture metadata, graph exports. | R8, R10, R11, R17; I3-L, I10-L, I11-L. | -| Middle | **Runbook oracles**: prose manual actions plus executable postcondition checkers | Interactive seams leave correct durable state. Early M0 checkers may inspect stores only; once handlers exist, prefer projection-including checks. | D11-L, D21-L; I8-L, I13-L; A10-L. | -| Middle | Round-trip tests | JSONL reload, linear transcript validation, elicitation exchange projection, compaction, graph export/import, command result serialization. | D6-L, D13-L, D24-L; I3-L, I8-L, I10-L, I15-L. | -| Middle | Property-based / model-based tests | LSN monotonicity, change-log replay, reconciliation-need invariants, mention staleness, interest-set recomputation, side-task delivery ordering. | A4-L, A8-L, A9-L, A11-L; I1-L, I4-L, I5-L, I6-L, I9-L, I12-L. | -| Middle | Contract tests | Named RPC method families and transport adapters share handler semantics; subscriptions deliver initial snapshot plus ordered updates; `CommandExecutor` hides policy/transaction details. | D5-L, D19-L, D20-L; R11, R12. | -| Middle | Architectural boundary tests | No direct ORM/SQLite mutation outside `CommandExecutor`; no canonical chat/turn store; TUI/RPC/fixture code does not write `brunch.session_binding`; Brunch wrappers do not expose Pi branch creation/navigation as product behavior. | D4-L, D6-L, D18-L, D21-L, D24-L; I2-L, I10-L, I11-L, I15-L. | -| Middle | Fixture replay and property assertions | Brief-driven sessions still produce structurally valid transcript/graph/coherence artifacts despite model drift. | A5-L, A6-L, A7-L; I7-L; R20. | -| Outer | Manual walkthrough with checklist | UX/presentation life: TUI chrome, spec selector, web shell feel, coherence visibility, elicitation usefulness. | A10-L; R4, R14, R16. 
| -| Outer | Adversarial / generative fixture probes | Elicitation quality, human-gated `needs_human`, contradictory requirements, cross-session updates, long-horizon compaction. | A5-L, A8-L, A9-L, A11-L; I4-L, I6-L, I12-L, I13-L. | +| Inner | Type-aware lint, type checks, fast unit tests | Local module correctness, typed command/result shapes (including `acceptReviewSet` and reviewer-writable record-class types), projection helper behavior (including `supersedes`-chain filtering). | D12-L, D13-L, D20-L, D21-L, D27-L, D28-L, D29-L. | +| Inner | Schema/shape validation at boundaries | JSON-RPC payloads, command results, structured elicitation entries, fixture metadata, graph exports, `brunch.review_set_proposal` / `brunch.establishment_offer` / `brunch.elicitor_intent_hint` custom-entry payloads (lens presence, `epistemic_status` presence, entity-draft shape). | R8, R10, R11, R17, R20, R21, R23; I3-L, I10-L, I11-L, I17-L, I18-L. | +| Middle | **Runbook oracles**: prose manual actions plus executable postcondition checkers | Interactive seams leave correct durable state. Early M0 checkers may inspect stores only; once handlers exist, prefer projection-including checks. Extends to in-flight reviewer-signal chrome behavior and ambient-affordance rendering from latest establishment-offer entry. | D11-L, D21-L, D25-L, D29-L; I8-L, I13-L; A10-L. | +| Middle | Round-trip tests | JSONL reload, linear transcript validation, elicitation exchange projection, compaction, graph export/import, command result serialization, `supersedes`-chain reconstruction across regeneration. | D6-L, D13-L, D24-L, D28-L; I3-L, I8-L, I10-L, I19-L. 
| +| Middle | Property-based / model-based tests | LSN monotonicity, change-log replay, reconciliation-need invariants, mention staleness, interest-set recomputation, side-task delivery ordering, **batch-acceptance atomicity (one LSN / one change-log entry, partial-batch impossible even under mid-batch validation failure)**, **`supersedes`-chain acyclicity and unique-leaf-per-thread**, **lens-routing correctness (generated elicitor entries route to the right consumer)**, **reviewer-finding turn-boundary delivery ordering**. | A4-L, A8-L, A9-L, A11-L; I1-L, I4-L, I5-L, I6-L, I9-L, I12-L, I15-L, I16-L, I18-L. | +| Middle | Contract tests | Named RPC method families and transport adapters share handler semantics; subscriptions deliver initial snapshot plus ordered updates; `CommandExecutor` hides policy/transaction details; `acceptReviewSet` returns expected structured discriminants. | D5-L, D19-L, D20-L, D27-L; R11, R12. | +| Middle | Architectural boundary tests | No direct ORM/SQLite mutation outside `CommandExecutor`; no canonical chat/turn store; TUI/RPC/fixture code does not write `brunch.session_binding`; Brunch wrappers do not expose Pi branch creation/navigation as product behavior; reviewer-attributed writes target only `reconciliation_need`. | D4-L, D6-L, D18-L, D21-L, D24-L, D29-L; I2-L, I10-L, I11-L, I16-L, I19-L. | +| Middle | **Differential testing** | Dry-run validation at proposal time matches real-run validation at acceptance time (no drift between modes); free-form-generation vs constrained-generation legality rates (informs whether fallback path is needed per A14-L). | D27-L; A14-L. | +| Middle | Fixture replay and property assertions | Brief-driven sessions still produce structurally valid transcript/graph/coherence artifacts despite model drift. 
For generative lenses: **structural-legality rate of LLM proposals tracked per-run in fixture metadata as POC-phase fitness, not a merge gate**; first-attempt vs retry-with-feedback rates surfaced for human review. | A5-L, A6-L, A7-L, A14-L; I7-L; R20, R21, R22, R23. | +| Outer | Manual walkthrough with checklist | UX/presentation life: TUI chrome, spec selector, web shell feel, coherence visibility, elicitation usefulness. Adds: ambient-affordance rendering from establishment-offer entries; proposal/framing quality review; lens-recommendation appropriateness; review-cycle UX (approve / request-changes / reject); meta-rubric comparative-usefulness review (D31-L hypothesis test). | A10-L, A17-L; R4, R14, R16, R20, R21. | +| Outer | Adversarial / generative fixture probes | Elicitation quality, human-gated `needs_human`, contradictory requirements, cross-session updates, long-horizon compaction, **reviewer-finding precision via small targeted set of briefs designed to produce *known* coherence problems** (POC-scope: 1–2 known-bad scenarios per relevant invariant, not exhaustive coverage). | A5-L, A8-L, A9-L, A11-L, A14-L; I4-L, I6-L, I12-L, I13-L, I16-L. | ### Runbook Oracle Design @@ -327,7 +371,11 @@ The first required runbook is M0: after manual TUI interaction, a checker proves | I12-L | M7 side-task delivery invariant tests and adversarial fixture when side tasks are active. | | I13-L | M1 fixture/projection checks for idle linear-session leaf state. | | I14-L | M5 observer-job restart/idempotence tests. | -| I15-L | Brunch extension/runtime guard tests for `/tree`/`/fork`/`/clone` blocking plus transcript-reader non-linearity rejection tests. | +| I15-L | M5+ middle-loop property tests for batch-acceptance atomicity (one LSN / one change-log entry, partial-batch impossible under mid-batch validation failure) paired with `acceptReviewSet` contract tests; review-set fixture parity in replay. 
| +| I16-L | M5+ middle-loop architectural boundary test on reviewer-attributed `CommandExecutor` writers (rejects any non-`reconciliation_need` target); paired with reviewer-attributed command-result audit fixture. | +| I17-L | M5+ inner-loop schema validation on `brunch.review_set_proposal` entries (must declare `epistemic_status`); paired with outer-loop fixture assertion that status varies appropriately with grounding density (POC-phase fitness, not gate). | +| I18-L | M5+ inner-loop schema validation on elicitor-emitted custom entries (must declare `lens`); paired with middle-loop property test that generated entries route to the correct observer/reviewer consumer. | +| I19-L | Brunch extension/runtime guard tests for `/tree`/`/fork`/`/clone` blocking plus transcript-reader non-linearity rejection tests. | ### Design Notes @@ -345,6 +393,11 @@ The first required runbook is M0: after manual TUI interaction, a checker proves | Subscription reconnect/resume | POC can prove snapshot + live update without hardening network recovery yet. | Contract tests for initial snapshot and ordered update sequence. | Web/RPC clients need robust reconnect semantics or long-running fixture runs expose drift. | | Performance and scale | Local POC graph/session sizes are small; premature budgets may distort design. | Keep exports/checkers text-native and simple; add budgets when slow tests appear. | `npm run verify` or fixture runs exceed acceptable local iteration time. | | Cross-platform terminal rendering | TUI chrome visuals may differ by terminal. | Test state derivation and keep manual smoke on primary dev environment. | Distribution target broadens or terminal rendering bugs recur. | +| Lens-recommendation appropriateness | No deterministic ground truth for "did the agent offer the right strategy at the right time" given temperament + grounding density inputs. 
| Brief-driven outer-loop walkthrough; small targeted scenarios where recommended lens is judged by reviewer; tracked as fitness, not gated. | Repeated user complaints that the offered strategies feel wrong, or fixture review reveals systematic mis-offers. | +| Framing/proposal quality at thin grounding | Generative-lens proposals may be syntactically legal but semantically weak when grounding is thin; `epistemic_status` honesty may not be enforceable without human judgment. | A14-L proposal-legality rate tracked as fitness; outer-loop walkthrough of proposals under thin vs rich grounding; `epistemic_status` distribution surfaced per run. | Acceptance-without-rework rates drop, or reviewers consistently mark proposals as `inferred`/`asserted` despite asserted grounding. | +| Reviewer finding precision (false positives/negatives) | Advisory-only reviewer can spam reconciliation needs (false positives) or miss real coherence gaps (false negatives); both erode trust. | Targeted adversarial briefs with known-bad coherence problems; precision/recall surfaced per run as fitness; user can dismiss reviewer findings without consequence. | Users systematically ignore reviewer findings, or coherence gaps slip past reviewer in known-bad fixtures. | +| In-flight reviewer-signal UX | Chrome rendering of "reviewer running / has findings" before next-turn delivery is not yet designed; cost may exceed value in POC. | Runbook oracle on chrome state after batch-accept; defer in-flight progress affordances unless a frontier explicitly demands them. | Users report confusion about whether reviewer ran or completed; or async job latency makes silence feel like failure. | +| Meta-rubric usefulness (D31-L) | Universal evaluative dimensions (complexity, lock-in, etc.) may or may not be productive across lens types; this is an unproven hypothesis. | Comparative outer-loop walkthrough: same proposal scenario with and without meta-rubric framing; user judgment captured in fixture metadata. 
| Meta-rubric framings are consistently ignored by users, or consistently produce better decisions — either signal warrants spec revision. | ### Acceptance Criteria From 734e85217c5973223dc55de898f2ab1060e681de Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Thu, 21 May 2026 17:16:43 +0200 Subject: [PATCH 8/9] tighten lens review-set planning contracts Amp-Thread-ID: https://ampcode.com/threads/T-019e4b15-7745-7201-9383-57779c7738b8 Co-authored-by: Amp --- memory/PLAN.md | 22 +++++++++++----------- memory/SPEC.md | 15 +++++++++------ 2 files changed, 20 insertions(+), 17 deletions(-) diff --git a/memory/PLAN.md b/memory/PLAN.md index c814350d..3ac885a2 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -30,7 +30,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - `brief-library-curation` — Author and review briefs #4–#7 plus the adversarial second tier; can proceed independently once `walking-skeleton` exists. Briefs are text, no code dependency. - `fixture-strategy-evolution` — Iterate `fixture-strategy.md` (property invariants, brief expectations) as fixtures are captured. Doc-only. -- `pi-ui-extension-patterns` — Prove the Pi extension seams Brunch needs for lens/review-set UX: custom slash commands, styled persistent chrome (color/glyphs), modal/popover overlays, radio/checkbox/select prompts, clickable/navigable action buttons, picker/list-selection modals. Spike-shaped probe whose output is a feasibility matrix + minimum-viable wrappers that downstream frontiers (M5 lenses/review-sets, M6 authority gates, M7 turn-boundary delivery) can build on. Can run in parallel with `web-shell` and ahead of `graph-data-plane`. 
+- `pi-ui-extension-patterns` — Prove the Pi extension seams Brunch needs for lens/review-set UX: custom slash commands, styled persistent chrome (color/glyphs), modal/popover overlays, radio/checkbox/select prompts, clickable/navigable action buttons, picker/list-selection modals, and ambient establishment-offer rendering that stays orientation-first rather than becoming a default lens menu. Spike-shaped probe whose output is a feasibility matrix + minimum-viable wrappers that downstream frontiers (M5 lenses/review-sets, M6 authority gates, M7 turn-boundary delivery) can build on. Can run in parallel with `web-shell` and ahead of `graph-data-plane`. ### Horizon @@ -104,7 +104,7 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Why now / unlocks:** Proves D10-L. Unlocks parallel UI work and visualises graph + coherence state. Sequenced after M2 so the transcript substrate is pinned before clients depend on it. - **Acceptance:** Web client connects via WebSocket RPC, lists specs and workspace state through `session.*` / `workspace.*` projection handlers, renders a transcript and the persistent chrome region, and round-trips structured elicitation prompts/responses plus freeform user input through the same transcript conventions as TUI. - **Verification:** Inner gate plus WebSocket/handler contract tests. Middle — manual browser smoke paired with projection/query postconditions for `session.*` / `workspace.*`, linear transcript-policy guards, transcript rendering state, and structured elicitation round-trip. Outer — at least one fixture replays into the web renderer; qualitative UX remains manual checklist. -- **Cross-cutting obligations:** Preserve the single command/event substrate: the browser is a thin remote head over the same elicitation/transcript/session machinery, not a second data plane, REST-backed read client, generic read gateway, or custom interaction contract. 
Carry D24-L linear transcript policy forward before adding another session-consuming surface: block Brunch-controlled `/tree`/`/fork`/`/clone` branch flows where Pi hooks permit, and make transcript readers fail fast on non-linear JSONL rather than adapting it. +- **Cross-cutting obligations:** Preserve the single command/event substrate: the browser is a thin remote head over the same elicitation/transcript/session machinery, not a second data plane, REST-backed read client, generic read gateway, or custom interaction contract. Carry D24-L linear transcript policy forward before adding another session-consuming surface: block Brunch-controlled `/tree`/`/fork`/`/clone` branch flows where Pi hooks permit, and make transcript readers fail fast on non-linear JSONL rather than adapting it. If/when `brunch.establishment_offer` entries are present, browser chrome should project the latest offer as ambient orientation rather than inventing a browser-only strategy menu. - **Traceability:** R4, R8, R11, R12, R16, R17 / D5-L, D10-L, D12-L, D13-L, D19-L, D24-L / I19-L - **Design docs:** [prd.md §M3, §Frontend Architecture](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/prd.md) - **Current execution pointer:** first slice should harden the linear transcript policy (block Pi branch flows where hooks permit and make transcript readers reject non-linear JSONL) before adding the browser surface. @@ -129,11 +129,11 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** unassigned - **Kind:** structural - **Status:** not-started -- **Objective:** Brunch installs graph tools through pi's extension seams; agent graph operations, observer-extraction writes, reviewer-attributed advisory writes, and generative-lens batch acceptances all route exclusively through the Brunch-owned command layer; web, TUI, and agent all observe the same changes. 
-- **Acceptance:** Agent can create / update / link intent-plane nodes via Brunch tools that call the `CommandExecutor`; an observer job can process a projected elicitation exchange and either write high-confidence graph changes or surface low-confidence suggestions/reconciliation work through the same executor; a reviewer job can process an accepted review set and surface advisory `reconciliation_need` findings (only) via the same executor; the `acceptReviewSet` command commits a generative-lens batch atomically as one LSN and one change-log entry; an architectural test or lint rule prevents direct DB access, caller-side authority bypass outside the command layer, and reviewer-attributed writes to anything other than `reconciliation_need`; the same change observed across TUI and (if M3 lands) web client; if the registry lands here, side-task-attributed writes follow the same command-executor path. -- **Verification:** Inner — verify gate plus graph-tool/observer/reviewer command shape tests, proposal-entry schema validation (`brunch.review_set_proposal` must declare `epistemic_status`), elicitor-emitted-entry schema validation (must declare `lens`). Middle — `CommandExecutor` contract tests including `acceptReviewSet` discriminants, direct-DB no-bypass checks, observer-job idempotence/restart tests keyed by exchange range, reviewer-job restart/idempotence tests keyed by batch-acceptance entry id, reviewer-write-target architectural boundary test (rejects non-`reconciliation_need` targets), `acceptReviewSet` batch-atomicity property tests (one LSN / one change-log entry; partial-batch impossible under mid-batch validation failure), `supersedes`-chain acyclicity property tests, lens-routing correctness property tests, differential test comparing dry-run validation at proposal time vs real-run validation at acceptance, and cross-surface projection checks. Outer — kernel-card-output coverage assertions begin landing per brief; first generative-lens fixture (e.g. 
`propose-scenarios-with-tradeoffs`) replays through review cycle + acceptance; A14-L proposal structural-legality rate captured in fixture metadata as POC-phase fitness (not merge gate); 1–2 known-bad coherence-problem briefs exercise reviewer precision; side-task / observer / reviewer-attributed writes remain indistinguishable from other writes at the command-layer boundary except for attribution and reviewer's narrow target. -- **Cross-cutting obligations:** Preserve the single-authority mutation rule for primary-agent, observer, reviewer, side-task, and batch-acceptance flows by making the `CommandExecutor` the only mutation entry; observer and reviewer jobs are durable operational queue entries keyed to transcript anchors, not a revived chat/turn store or privileged write path for background work; reviewer is advisory and writes only to `reconciliation_need`; lens metadata on elicitor-emitted entries routes observer vs reviewer consumption. -- **Traceability:** R10, R13, R17, R21, R22, R23 / D4-L, D13-L, D15-L, D18-L, D20-L, D25-L, D26-L, D27-L, D28-L, D29-L / I2-L, I11-L, I14-L, I15-L, I16-L, I17-L, I18-L / A3-L, A11-L, A13-L, A14-L, A16-L +- **Objective:** Brunch installs graph tools through pi's extension seams; agent graph operations, observer-extraction writes, reviewer-attributed advisory writes, generative-lens batch acceptances, and the transcript-native establishment/intent-hint surfaces all route exclusively through the Brunch-owned command layer and shared event substrate; web, TUI, and agent all observe the same changes. 
+- **Acceptance:** Agent can create / update / link intent-plane nodes via Brunch tools that call the `CommandExecutor`; elicitor turns emit `brunch.establishment_offer` and `brunch.elicitor_intent_hint` entries with the lens/routing metadata needed by downstream consumers; generative-lens proposals carry explicit grounding-bundle coverage plus `epistemic_status`, and only dry-run-valid proposals surface as reviewable review sets; an observer job can process a projected elicitation exchange and either write high-confidence graph changes or surface low-confidence suggestions/reconciliation work through the same executor; a reviewer job can process an accepted review set and surface advisory `reconciliation_need` findings (only) via the same executor; the `acceptReviewSet` command commits a generative-lens batch atomically as one LSN and one change-log entry; the initial POC reviewer trigger/scope policy is recorded in implementation docs/tests rather than left implicit; an architectural test or lint rule prevents direct DB access, caller-side authority bypass outside the command layer, and reviewer-attributed writes to anything other than `reconciliation_need`; the same change observed across TUI and (if M3 lands) web client; if the registry lands here, side-task-attributed writes follow the same command-executor path. +- **Verification:** Inner — verify gate plus graph-tool/observer/reviewer command shape tests, proposal-entry schema validation (`brunch.review_set_proposal` must declare `epistemic_status` and grounding coverage), establishment-offer / elicitor-intent-hint schema validation (must declare `lens`), and projection-helper tests for latest-offer lookup. 
Middle — `CommandExecutor` contract tests including `acceptReviewSet` discriminants and the rule that only dry-run-valid proposals become reviewable review sets, direct-DB no-bypass checks, observer-job idempotence/restart tests keyed by exchange range, reviewer-job restart/idempotence tests keyed by batch-acceptance entry id, reviewer-write-target architectural boundary test (rejects non-`reconciliation_need` targets), `acceptReviewSet` batch-atomicity property tests (one LSN / one change-log entry; partial-batch impossible under mid-batch validation failure), `supersedes`-chain acyclicity property tests, lens-routing correctness property tests, differential test comparing dry-run validation at proposal time vs real-run validation at acceptance, and cross-surface projection checks. Outer — kernel-card-output coverage assertions begin landing per brief; first generative-lens fixture (e.g. `propose-scenarios-with-tradeoffs`) replays through review cycle + acceptance; A14-L proposal structural-legality rate captured in fixture metadata as POC-phase fitness (not merge gate); 1–2 known-bad coherence-problem briefs exercise reviewer precision; side-task / observer / reviewer-attributed writes remain indistinguishable from other writes at the command-layer boundary except for attribution and reviewer's narrow target. +- **Cross-cutting obligations:** Preserve the single-authority mutation rule for primary-agent, observer, reviewer, side-task, and batch-acceptance flows by making the `CommandExecutor` the only mutation entry; observer and reviewer jobs are durable operational queue entries keyed to transcript anchors, not a revived chat/turn store or privileged write path for background work; reviewer is advisory and writes only to `reconciliation_need`; lens metadata on elicitor-emitted entries routes observer vs reviewer consumption; establishment offers remain orientation artifacts for chrome/web surfaces rather than a default exhaustive lens picker. 
+- **Traceability:** R10, R13, R17, R21, R22, R23 / D4-L, D13-L, D15-L, D18-L, D20-L, D25-L, D26-L, D27-L, D28-L, D29-L, D30-L, D32-L / I2-L, I11-L, I14-L, I15-L, I16-L, I17-L, I18-L, I20-L / A3-L, A11-L, A13-L, A14-L, A16-L - **Design docs:** [prd.md §M5, §Authority Model](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/prd.md), [pi-seam-extensions.md §1 Async side-chain sub-agents](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/pi-seam-extensions.md#1-async-side-chain-sub-agents), [ELICITATION_LENSES.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/ELICITATION_LENSES.md), [REVIEW_SETS.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/REVIEW_SETS.md) ### authority-model @@ -219,12 +219,12 @@ Brunch-next is starting from a deliberately razed slate on the `next` branch (ta - **Linear:** unassigned - **Kind:** structural (spike-flavored) - **Status:** not-started -- **Objective:** Demonstrate that Pi's extension seams can host the UI affordances Brunch needs for elicitation-lens and review-set flows without forking Pi or building a parallel rendering substrate. Catalog and prototype: custom slash commands routed through Brunch handlers; persistent chrome with TUI styling/color/glyphs beyond the current minimal status line; modal/popover overlays for proposal review; radio/checkbox/select prompts for multi-choice answers and lens-selection offers; clickable/navigable action buttons for accept/request-changes/reject affordances; picker/list-selection modals for spec switching and entity selection. The output is a feasibility matrix mapping each affordance to (a) the Pi seam(s) used, (b) Brunch-owned wrapper code required, (c) controllability cost for the agent-as-user driver, and (d) residual risks — plus minimum-viable wrappers that later frontiers can call directly. 
-- **Acceptance:** A short design memo (`docs/architecture/pi-ui-extension-patterns.md` or section in `pi-seam-extensions.md`) catalogs the affordance matrix with verdicts (`proven` / `feasible-with-cost` / `requires-pi-change` / `not-feasible`); a runnable demo wires at least one representative of each viable category through Brunch's TUI host (custom slash command, styled chrome element, modal/popover, multi-choice prompt, action button, picker modal); the agent-as-user driver can controllably exercise the multi-choice and action-button affordances (informs the controllability/cost answer in `D27-L` and reviewer-flow oracle design); the matrix explicitly records which affordances are unviable so downstream UX design does not assume them; SPEC.md and PLAN.md links to the memo are added where M5/M6/M7 verification depends on a charted affordance. +- **Objective:** Demonstrate that Pi's extension seams can host the UI affordances Brunch needs for elicitation-lens and review-set flows without forking Pi or building a parallel rendering substrate. Catalog and prototype: custom slash commands routed through Brunch handlers; persistent chrome with TUI styling/color/glyphs beyond the current minimal status line; modal/popover overlays for proposal review; radio/checkbox/select prompts for multi-choice answers and user-invoked orientation/selection affordances; clickable/navigable action buttons for accept/request-changes/reject affordances; picker/list-selection modals for spec switching and entity selection; ambient rendering of the latest `brunch.establishment_offer`. The output is a feasibility matrix mapping each affordance to (a) the Pi seam(s) used, (b) Brunch-owned wrapper code required, (c) controllability cost for the agent-as-user driver, and (d) residual risks — plus minimum-viable wrappers that later frontiers can call directly. 
+- **Acceptance:** A short design memo (`docs/architecture/pi-ui-extension-patterns.md` or section in `pi-seam-extensions.md`) catalogs the affordance matrix with verdicts (`proven` / `feasible-with-cost` / `requires-pi-change` / `not-feasible`); the matrix distinguishes ambient establishment-offer rendering from any user-invoked orientation view and records that Brunch is not building a default exhaustive lens menu; a runnable demo wires at least one representative of each viable category through Brunch's TUI host (custom slash command, styled chrome element, modal/popover, multi-choice prompt, action button, picker modal, establishment-offer chrome rendering); the agent-as-user driver can controllably exercise the multi-choice and action-button affordances (informs the controllability/cost answer in `D27-L` and reviewer-flow oracle design); the matrix explicitly records which affordances are unviable so downstream UX design does not assume them; SPEC.md and PLAN.md links to the memo are added where M5/M6/M7 verification depends on a charted affordance. - **Verification:** Inner — verify gate plus unit tests for any extension wrappers added. Middle — runbook oracles per affordance category (manual checklist + executable postcondition checker on chrome state, JSONL custom entries emitted, or command-result discriminants); contract tests for any new Brunch handler shape introduced (slash command router, modal request/response, picker selection). Outer — manual TUI walkthrough validating visual quality and interaction feel; comparative walkthrough between scripted-driver and manual paths to record controllability cost. -- **Cross-cutting obligations:** Preserve the linear-transcript invariant (`I19-L`) — affordance prototypes must not introduce branch creation, mid-turn state mutations outside the command layer, or a parallel chat/turn store. 
Multi-choice affordances must integrate with the existing capture-aware offer envelope (`pi-seam-extensions.md §4`) and the structured elicitation-entry shape. Slash commands and action buttons must route writes through the `CommandExecutor`. Any new custom-entry kinds must declare `lens` per `I18-L` if elicitor-emitted. +- **Cross-cutting obligations:** Preserve the linear-transcript invariant (`I19-L`) — affordance prototypes must not introduce branch creation, mid-turn state mutations outside the command layer, or a parallel chat/turn store. Multi-choice affordances must integrate with the existing capture-aware offer envelope (`pi-seam-extensions.md §4`) and the structured elicitation-entry shape. Slash commands and action buttons must route writes through the `CommandExecutor`. Any new custom-entry kinds must declare `lens` per `I18-L` if elicitor-emitted. Establishment-offer affordances must stay orientation-first and user-invoked when expanded, rather than turning the full offer tree into a default next-action menu. - **Why now / unlocks:** Lens/review-set/reviewer UX in M5 and authority gating in M6 both assume Brunch can render rich interactive affordances over Pi without forking it. Proving the affordance set early de-risks those frontiers and lets the agent-as-user-driver extension question (controllability vs cost trade flagged in `ln-oracles` pass) be answered with evidence rather than estimation. Can run in parallel with `web-shell` (M3) because TUI seams are independent of the web transport. 
-- **Traceability:** R4, R14, R16, R20, R21 / D2-L, D11-L, D24-L, D25-L, D26-L, D27-L, D29-L / I18-L, I19-L / A10-L, A14-L, A17-L +- **Traceability:** R4, R14, R16, R20, R21 / D2-L, D11-L, D24-L, D25-L, D26-L, D27-L, D29-L, D32-L / I18-L, I19-L / A10-L, A14-L, A17-L - **Design docs:** [pi-seam-extensions.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/architecture/pi-seam-extensions.md), [ELICITATION_LENSES.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/ELICITATION_LENSES.md), [REVIEW_SETS.md](file:///Users/lunelson/Code/hashintel/brunch-next/docs/design/REVIEW_SETS.md); new memo to be created during the spike. ### flue-pattern-adoption diff --git a/memory/SPEC.md b/memory/SPEC.md index 3fef4ca2..efb08f90 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -124,7 +124,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D4-L — One shared mutation surface owns graph truth.** Every semantic graph mutation routes through Brunch-owned typed command handlers responsible for validation, structural legality, optimistic concurrency, event emission, audit attribution, and coherence triggering. Agents and adapters must not touch the ORM or SQLite directly. Depends on: A3-L. Supersedes: —. - **D20-L — Command execution owns the pre-M6 authority seam.** Callers submit product commands to a Brunch `CommandExecutor` and receive a structured result; they do not call a standalone authority service or graph persistence directly. The executor is the public mutation boundary that hides attribution, optimistic concurrency, structural validation, the minimal pre-M6 policy classifier, transaction execution, LSN allocation, change-log append, and coherence-trigger hooks. 
Before M6, the policy logic may be deliberately small, but the result shape must already include `needs_human`, `policy_blocked`, `version_conflict`, and `structural_illegal` so early RPC, print, agent-tool, observer-job, and side-task code cannot bake in permissive mode-specific shortcuts. Depends on: D4-L, D16-L. Supersedes: the separate optional `AuthorityGate` / generic policy-service mental model. -- **D27-L — Generative-lens proposals are structured entity-draft payloads; batch acceptance is one atomic `CommandExecutor` call.** The elicitor's proposal custom entry (`brunch.review_set_proposal`) contains the graph entities and edges that *would* be created on acceptance, in a form `CommandExecutor` can dry-run-validate at proposal time so `structural_illegal` / `policy_blocked` discriminants surface before the user reviews. Acceptance is one `acceptReviewSet` command that consumes one LSN, writes the entire batch in one transaction, appends one change-log entry attributed to the user, triggers coherence updates, and enqueues the reviewer job. "Accept with edits" does not exist as a primitive: the cycle is approve / request changes (triggers regeneration of a successor proposal) / reject. Depends on: A14-L, D4-L, D20-L, D26-L. Supersedes: any caller-side multi-step "patch then commit" mental model. +- **D27-L — Generative-lens proposals are structured entity-draft payloads; batch acceptance is one atomic `CommandExecutor` call.** The elicitor's proposal custom entry (`brunch.review_set_proposal`) contains the graph entities and edges that *would* be created on acceptance, in a form `CommandExecutor` can dry-run-validate at proposal time so `structural_illegal` / `policy_blocked` discriminants surface before the user reviews. Only proposals that pass this dry-run validation are surfaced as user-reviewable review sets; invalid generations stay internal to retry/regeneration paths rather than becoming review UI state. 
Acceptance is one `acceptReviewSet` command that consumes one LSN, writes the entire batch in one transaction, appends one change-log entry attributed to the user, triggers coherence updates, and enqueues the reviewer job. "Accept with edits" does not exist as a primitive: the cycle is approve / request changes (triggers regeneration of a successor proposal) / reject. Depends on: A14-L, D4-L, D20-L, D26-L. Supersedes: any caller-side multi-step "patch then commit" mental model. #### Transport & client @@ -154,7 +154,8 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D14-L — `#`-mentions are ID-anchored, with a session-scoped mention ledger.** Autocomplete may resolve by title but insertion always rewrites to ID-anchored. Per-session `(entity_id, snapshotted_lsn)` ledger drives discretionary `brunch.mention_staleness_hint` entries in `prepareNextTurn`. Depends on: A9-L, I4-L. Supersedes: —. - **D25-L — Elicitation strategies are *lenses* within the `elicitor` agent-mode, not separate agent-modes.** Lens is metadata on elicitor-emitted custom transcript entries (`brunch.elicitor_intent_hint`, `brunch.establishment_offer`, `brunch.review_set_proposal`, etc.); agent-modes (`elicitor`, `observer`, `reviewer`, `reconciler`) remain orthogonal. The known starter lens set is `step-by-step`, `disambiguate-via-examples`, `propose-scenarios-with-tradeoffs`, `propose-design-shapes`, `propose-oracle-ensembles`, and `project-requirements-from-upstream`; the catalogue is expected to grow. Observer-job and reviewer-job routing filters on lens. Depends on: D12-L, D17-L, D23-L. Supersedes: collapsing strategy and agent-mode into one vocabulary axis. - **D26-L — Lenses split into *extractive* and *generative* families by capture mechanism.** Extractive lenses produce single-exchange interactions whose implicit content is captured by the `observer` agent-mode post-exchange (e.g. `step-by-step`, `disambiguate-via-examples`). 
Generative lenses produce batch proposals whose entity-draft payloads are captured by the elicitor *at proposal time*, with the `reviewer` agent-mode running advisory analysis post-acceptance (e.g. `propose-scenarios-with-tradeoffs`, `propose-design-shapes`, `propose-oracle-ensembles`, `project-requirements-from-upstream`). The family distinction is durable; the specific lens list is expected to evolve. Depends on: D18-L, D25-L. Supersedes: a single uniform "agent asks questions" mental model. -- **D30-L — Grounding is a precondition gate for generative-lens output, with epistemic-status signaling honestly tracking grounding density; lenses themselves are always available.** A minimum grounding bundle — *domain anchor*, *protagonist anchor*, *pain/pull anchor*, *constraint anchor* — must be established before generative lenses produce non-speculative output. Generative-lens proposals declare `epistemic_status` (`inferred | assumed | asserted | observed`) consistent with grounding density at proposal time; UI renderings reflect this status so low-status proposals *feel* speculative (visible hedging, lower visual weight, explicit "speculative — based on N anchors so far" footers). The lens is never refused: the agent always produces *some form* of what was asked for, but its output resolution and epistemic load honestly reflect what grounding supports. Rendering mode scales with density: empty/thin → framing proposals (Shape Up pitches); moderate → scenario sketches; rich → completion proposals; mature → refactor proposals. Depends on: D26-L. Supersedes: gating-by-refusal as a UX move. +- **D30-L — Grounding is a precondition gate for generative-lens output, with epistemic-status signaling honestly tracking grounding density; lenses themselves are always available.** A minimum grounding bundle — *domain anchor*, *protagonist anchor*, *pain/pull anchor*, *constraint anchor* — must be established before generative lenses produce non-speculative output. 
Generative-lens proposals declare `epistemic_status` (`inferred | assumed | asserted | observed`) consistent with grounding density at proposal time, and proposal/offer payloads carry explicit grounding-bundle coverage for those four anchors so UI copy, fixture assertions, and reviewer/debug tooling can justify that status rather than infer it from free text. UI renderings reflect this status so low-status proposals *feel* speculative (visible hedging, lower visual weight, explicit "speculative — based on N anchors so far" footers). The lens is never refused: the agent always produces *some form* of what was asked for, but its output resolution and epistemic load honestly reflect what grounding supports. Rendering mode scales with density: empty/thin → framing proposals (Shape Up pitches); moderate → scenario sketches; rich → completion proposals; mature → refactor proposals. Depends on: D26-L. Supersedes: gating-by-refusal as a UX move. +- **D32-L — Establishment offers are orientation artifacts, not a default next-action menu.** `brunch.establishment_offer` records the agent's current offer tree and recommended next move as durable transcript state. Ambient chrome or web affordances may render the latest offer, and Brunch may expose a user-invoked orientation view summarizing what is established vs open, but Brunch does not surface an exhaustive lens/offer chooser by default; the agent still owns next-move selection unless the user explicitly asks to inspect alternatives. Depends on: D25-L, D30-L, A15-L. Supersedes: UI interpretations that turn establishment offers into a persistent strategy menu. 
- **D31-L — A four-axis meta-rubric is a soft heuristic for fan-out comparison rubrics across all three flows; not architecturally enforced.** When generating comparison rubrics for fan-out alternatives across candidate-spec, technical-design, and verification-design flows, the elicitor attempts to express each axis in terms of (*legibility / cost-of-knowing*, *failure modes*, *coverage / range*, *commitment*). Project-specific axes are allowed alongside; the meta-frame is dropped when it doesn't fit. The hypothesis (uniform comparison UI across all three flows) is testable via fixture comparison; promote to schema/UI only if it holds up. Depends on: D25-L, D26-L. Supersedes: a hardcoded per-flow rubric. ### Critical Invariants @@ -177,9 +178,10 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | I14-L | Observer jobs are keyed by session id plus elicitation-exchange entry-range ids and have durable status; replay/restart cannot enqueue duplicate observer jobs for the same exchange. | planned (M5 observer queue tests) | D18-L, D4-L | | I15-L | Every review-set acceptance routes through `CommandExecutor` as one atomic `acceptReviewSet` command producing one LSN, one change-log entry, and one transaction over the entire batch. Partial acceptance is not representable through any product API. | planned (M5+ batch-acceptance command tests; review-set fixture parity) | D20-L, D27-L; I1-L, I11-L | | I16-L | Reviewer-attributed writes target only the `reconciliation_need` substrate; no reviewer-attributed `CommandExecutor` call writes graph entities, edges, change-log entries directly, or any other record class. 
| planned (M5+ architectural test on reviewer command writers; reviewer-attributed command-result audit) | D29-L; I2-L, I11-L | -| I17-L | Every generative-lens proposal entry (`brunch.review_set_proposal`) declares an `epistemic_status` (`inferred | assumed | asserted | observed`) consistent with grounding-bundle coverage at proposal time; UI renderings honor this status as a presentation contract. | planned (M5+ proposal-entry schema test; fixture asserts status under thin and rich grounding) | D30-L; A14-L | +| I17-L | Every generative-lens proposal entry (`brunch.review_set_proposal`) declares an `epistemic_status` (`inferred | assumed | asserted | observed`) and explicit grounding-bundle coverage for the four grounding anchors, with the status consistent with that coverage at proposal time; UI renderings honor this status as a presentation contract. | planned (M5+ proposal-entry schema test; fixture asserts status under thin and rich grounding) | D30-L; A14-L | | I18-L | Every elicitor-emitted prompt or proposal custom entry (`brunch.elicitor_intent_hint`, `brunch.establishment_offer`, `brunch.review_set_proposal`) carries a `lens` field; observer-job and reviewer-job routing filters on this field. | planned (M5+ observer/reviewer routing tests; transcript-shape contract test) | D25-L, D26-L, D29-L | | I19-L | Brunch-controlled flows do not create or navigate Pi session branches, and Brunch transcript readers fail fast on non-linear JSONL rather than flattening, migrating, or branch-selecting. | planned (linear transcript policy guard tests before/within M3 web-shell) | D24-L, D6-L, D11-L, D13-L | +| I20-L | Every user-reviewable generative-lens proposal has already passed proposal-time dry-run structural/policy validation against `CommandExecutor`; proposals that fail dry-run validation do not surface as reviewable review sets. 
| planned (M5+ proposal-validation contract + differential tests) | D27-L; A14-L | ## Future Direction Register @@ -266,7 +268,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | **Generative lens** | A lens producing batch proposals (structured entity-draft payloads in `brunch.review_set_proposal` entries); proposals are captured by the elicitor at proposal time, with the `reviewer` agent-mode running advisory analysis post-acceptance. Higher cognitive load per move; large graph mutations on acceptance. | | **Grounding bundle** | The minimum set of session-level anchors required before generative lenses produce non-speculative output: a *domain anchor*, a *protagonist anchor*, a *pain/pull anchor*, and a *constraint anchor*. Captured technical constraints land in the constraint anchor and bound subsequent technical-design fan-outs. | | **Grounding anchor** | One sentence-scale fact captured during early elicitation that contributes to the grounding bundle. | -| **Establishment offer** | A `brunch.establishment_offer` custom transcript entry summarising the elicitor's perceived gaps, the available lens strategies for the next move, the recommended lens, and the agent's confidence. Source of ambient affordances rendered in the chrome region; inspectable post-hoc and fixture-able. | +| **Establishment offer** | A `brunch.establishment_offer` custom transcript entry summarising the elicitor's perceived gaps, the available lens strategies for the next move, the recommended lens, and the agent's confidence. Source of ambient affordances rendered in the chrome region; inspectable post-hoc and fixture-able. Orientation artifact, not a default exhaustive strategy menu. | | **Elicitor intent hint** | A `brunch.elicitor_intent_hint` custom transcript entry emitted alongside a prompt or proposal, declaring `lens` and semantic targets (e.g. expected ontological sub-type) for downstream observer/reviewer routing and extraction guidance. 
| | **Review set** | A batch proposal generated by a generative lens, presented to the user for review-cycle acceptance (approve / request changes / reject), modeled on the GitHub PR-review-cycle. | | **Batch acceptance** | The single `CommandExecutor` call (`acceptReviewSet`) that commits an entire review set atomically as one LSN and one change-log entry, attributed to the user. The only mutation a generative-lens acceptance produces. | @@ -331,11 +333,11 @@ Infrastructure is not yet fully laid (Phase 3 of POC bootstrapping). Commands fo | Loop | Oracle family | Proves | Primary claims | | --- | --- | --- | --- | | Inner | Type-aware lint, type checks, fast unit tests | Local module correctness, typed command/result shapes (including `acceptReviewSet` and reviewer-writable record-class types), projection helper behavior (including `supersedes`-chain filtering). | D12-L, D13-L, D20-L, D21-L, D27-L, D28-L, D29-L. | -| Inner | Schema/shape validation at boundaries | JSON-RPC payloads, command results, structured elicitation entries, fixture metadata, graph exports, `brunch.review_set_proposal` / `brunch.establishment_offer` / `brunch.elicitor_intent_hint` custom-entry payloads (lens presence, `epistemic_status` presence, entity-draft shape). | R8, R10, R11, R17, R20, R21, R23; I3-L, I10-L, I11-L, I17-L, I18-L. | +| Inner | Schema/shape validation at boundaries | JSON-RPC payloads, command results, structured elicitation entries, fixture metadata, graph exports, `brunch.review_set_proposal` / `brunch.establishment_offer` / `brunch.elicitor_intent_hint` custom-entry payloads (lens presence, `epistemic_status`, grounding coverage, entity-draft shape). | R8, R10, R11, R17, R20, R21, R23; I3-L, I10-L, I11-L, I17-L, I18-L. | | Middle | **Runbook oracles**: prose manual actions plus executable postcondition checkers | Interactive seams leave correct durable state. Early M0 checkers may inspect stores only; once handlers exist, prefer projection-including checks. 
Extends to in-flight reviewer-signal chrome behavior and ambient-affordance rendering from latest establishment-offer entry. | D11-L, D21-L, D25-L, D29-L; I8-L, I13-L; A10-L. | | Middle | Round-trip tests | JSONL reload, linear transcript validation, elicitation exchange projection, compaction, graph export/import, command result serialization, `supersedes`-chain reconstruction across regeneration. | D6-L, D13-L, D24-L, D28-L; I3-L, I8-L, I10-L, I19-L. | | Middle | Property-based / model-based tests | LSN monotonicity, change-log replay, reconciliation-need invariants, mention staleness, interest-set recomputation, side-task delivery ordering, **batch-acceptance atomicity (one LSN / one change-log entry, partial-batch impossible even under mid-batch validation failure)**, **`supersedes`-chain acyclicity and unique-leaf-per-thread**, **lens-routing correctness (generated elicitor entries route to the right consumer)**, **reviewer-finding turn-boundary delivery ordering**. | A4-L, A8-L, A9-L, A11-L; I1-L, I4-L, I5-L, I6-L, I9-L, I12-L, I15-L, I16-L, I18-L. | -| Middle | Contract tests | Named RPC method families and transport adapters share handler semantics; subscriptions deliver initial snapshot plus ordered updates; `CommandExecutor` hides policy/transaction details; `acceptReviewSet` returns expected structured discriminants. | D5-L, D19-L, D20-L, D27-L; R11, R12. | +| Middle | Contract tests | Named RPC method families and transport adapters share handler semantics; subscriptions deliver initial snapshot plus ordered updates; `CommandExecutor` hides policy/transaction details; `acceptReviewSet` returns expected structured discriminants; only prevalidated proposals become reviewable review sets. | D5-L, D19-L, D20-L, D27-L; R11, R12. 
| | Middle | Architectural boundary tests | No direct ORM/SQLite mutation outside `CommandExecutor`; no canonical chat/turn store; TUI/RPC/fixture code does not write `brunch.session_binding`; Brunch wrappers do not expose Pi branch creation/navigation as product behavior; reviewer-attributed writes target only `reconciliation_need`. | D4-L, D6-L, D18-L, D21-L, D24-L, D29-L; I2-L, I10-L, I11-L, I16-L, I19-L. | | Middle | **Differential testing** | Dry-run validation at proposal time matches real-run validation at acceptance time (no drift between modes); free-form-generation vs constrained-generation legality rates (informs whether fallback path is needed per A14-L). | D27-L; A14-L. | | Middle | Fixture replay and property assertions | Brief-driven sessions still produce structurally valid transcript/graph/coherence artifacts despite model drift. For generative lenses: **structural-legality rate of LLM proposals tracked per-run in fixture metadata as POC-phase fitness, not a merge gate**; first-attempt vs retry-with-feedback rates surfaced for human review. | A5-L, A6-L, A7-L, A14-L; I7-L; R20, R21, R22, R23. | @@ -376,6 +378,7 @@ The first required runbook is M0: after manual TUI interaction, a checker proves | I17-L | M5+ inner-loop schema validation on `brunch.review_set_proposal` entries (must declare `epistemic_status`); paired with outer-loop fixture assertion that status varies appropriately with grounding density (POC-phase fitness, not gate). | | I18-L | M5+ inner-loop schema validation on elicitor-emitted custom entries (must declare `lens`); paired with middle-loop property test that generated entries route to the correct observer/reviewer consumer. | | I19-L | Brunch extension/runtime guard tests for `/tree`/`/fork`/`/clone` blocking plus transcript-reader non-linearity rejection tests. | +| I20-L | M5+ proposal-validation contract and differential tests proving only dry-run-valid proposals become reviewable review sets. 
|

 ### Design Notes

From c6c33eec726258fc4daa92c209ca6313c7429766 Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Thu, 21 May 2026 17:20:06 +0200
Subject: [PATCH 9/9] draft a new ln-witness skill

---
 .agents/skills/ln-witness/SKILL.md            | 139 ++++++++++++++++++
 .../ln-witness/assets/witness-rubric.md       | 121 +++++++++++++++
 docs/praxis/ln-skills.md                      |   5 +
 3 files changed, 265 insertions(+)
 create mode 100644 .agents/skills/ln-witness/SKILL.md
 create mode 100644 .agents/skills/ln-witness/assets/witness-rubric.md

diff --git a/.agents/skills/ln-witness/SKILL.md b/.agents/skills/ln-witness/SKILL.md
new file mode 100644
index 00000000..d38126bb
--- /dev/null
+++ b/.agents/skills/ln-witness/SKILL.md
@@ -0,0 +1,139 @@
+---
+name: ln-witness
+description: "Audit a test suite for what it actually proves — attribute tests to behavioral kernels, place each on the progressive-checkability ladder, surface unwitnessed proof obligations, and generate contrastive rivals the tests fail to rule out. Use when a slice has tests but verification confidence is unclear, when tests pass but the spec feels under-witnessed, or when the user asks what these tests prove."
+argument-hint: "[test files, directory, or frontier item to audit; include relevant invariants or kernels if known]"
+---
+
+# Ln Witness
+
+Audit a test suite for evidentiary strength, not line coverage.
+
+A passing test is not yet a witness. A witness is a test that exercises a named claim under the mutations its kernel cares about, sits on a known rung of the progressive-checkability ladder, and rules out at least one plausible rival interpretation. `ln-witness` makes those three properties explicit and surfaces where they are missing.
+
+Sibling skills:
+- [`ln-oracles`](../ln-oracles/SKILL.md) chooses verification strategy *before* tests exist; `ln-witness` audits what the resulting tests actually prove.
+- [`ln-design`](../ln-design/SKILL.md) forces rival module shapes into view before commitment; `ln-witness` forces rival *interpretations* into view to expose witness gaps. Same epistemic stance, different artifact.
+- [`ln-review`](../ln-review/SKILL.md) audits code shape; `ln-witness` audits test evidence.
+
+Do not create standalone audit documents without explicit permission. Findings reconcile back into `memory/SPEC.md` (invariants, blind spots), `memory/PLAN.md` (frontier verification notes), or the active scope card.
+
+Read the [witness rubric](assets/witness-rubric.md) before starting. It defines the progressive-checkability ladder and the proof obligations per behavioral kernel.
+
+## Input
+
+What to audit: $ARGUMENTS
+
+Read in this order:
+1. `memory/SPEC.md` — invariants, acceptance criteria, verification design, and any §Active Kernels notes.
+2. `memory/PLAN.md` — frontier definitions whose verification notes reference the tests under audit.
+3. `docs/design/BEHAVIORAL_KERNELS.md` — the kernel taxonomy, proof obligations, and example test shapes the audit will measure against.
+4. The test files themselves and the code under test.
+
+If no intent graph or kernel annotation exists yet, the audit drops to heuristic mode: infer active kernels from test names, assertions, and code under test, and flag this softness in the report.
+
+## Procedure
+
+This is an **interactive process**. Present each step's findings and grill the user before moving on. Do not produce a finished audit in one pass.
+
+### 1. Identify active kernels
+
+For the tests in scope, name which of the fifteen behavioral kernels appear to be exercised. Use the signal-phrase routing table from `BEHAVIORAL_KERNELS.md` against test names, assertion targets, and the code under test. Keep the active set small (typically two to four kernels per scope).
+
+**Grill**: Are these the kernels the spec considers active for this scope, or has the test suite drifted toward a different set than the spec intends? If the spec is silent on active kernels, is this an opportunity to promote them into `memory/SPEC.md`?
+
+### 2. Audit mode — attribute and rate
+
+For each test in scope, fill three columns:
+
+| Column | Question |
+| --- | --- |
+| **Kernel** | Which kernel(s) does this test probe? `none` is a valid answer (incidental regression catcher). |
+| **Witnessed claim** | Which `memory/SPEC.md` invariant, criterion, or example item does this test witness? `none` is valid but should be flagged. |
+| **Ladder rung** | Where on the [progressive-checkability ladder](assets/witness-rubric.md) does this test sit? (positive example → counterexample → regression → property → runtime contract → state-machine rule → invariant → proof obligation) |
+
+Present the table to the user. Tests with `kernel: none` or `witnessed claim: none` are either incidental or symptoms of spec gaps; both deserve a sentence of justification.
+
+**Grill**: For tests stuck at the "positive example" rung, ask: is this the appropriate strength, or should it be promoted to a property/contract? For tests with no witnessed claim, ask: should the claim be added to `memory/SPEC.md`, or is this test load-bearing only as a regression net?
+
+### 3. Audit mode — unwitnessed obligations
+
+For each active kernel, list its canonical proof obligations from the [rubric](assets/witness-rubric.md) and mark which are covered by at least one test at the appropriate rung. Examples:
+
+- Containment & topology: `add / move / delete / reorder preserves topology` — four obligations.
+- State & lifecycle: `every state reachable / every transition exercised / terminal states are sinks / forbidden transitions rejected` — four obligations.
+- Authority & capability: `permitted action succeeds / forbidden action rejected / delegated capability flows / revocation propagates` — four obligations. + +A kernel with three of four obligations covered is an honest report; a kernel with one of four covered is a finding. + +**Grill**: For each gap, ask: is this an acceptable deferral (cost exceeds value, deferred to outer loop, not currently in scope) or a real blind spot? What would trigger writing the missing test? + +### 4. Rivalry mode — contrastive alternatives + +This is where `ln-witness` inherits the design-it-twice DNA. For each *witnessed* invariant, generate two to four plausible rival interpretations the test suite would also satisfy. Borrow the contrastive-question shapes from `BEHAVIORAL_KERNELS.md` §Contrastive questions. + +Worked shape: + +``` +Invariant: Deleting a project archives its tasks. + +Tests witness this by asserting: after delete, tasks.status === 'archived'. + +Rivals the tests fail to rule out: + A. Tasks are archived only at the moment of delete; later mutations to + the deleted project's tasks are silently accepted. + B. Archived tasks remain editable through the API even though the UI + hides them. + C. Archive is a soft delete; a second delete on the project hard-deletes + the tasks without warning. + +Discriminating tests: + → For A: mutate an archived task after parent delete; assert rejection. + → For B: attempt PATCH on archived task; assert 403 or equivalent. + → For C: double-delete the project; assert second call is idempotent. +``` + +Present the rivals to the user. Each rival is one of: **close** (write the discriminating test), **accept** (mark the invariant as under-witnessed with explicit scope), or **escalate** (the rival reveals real spec ambiguity — route back to `ln-disambiguate` or `ln-spec`). + +**Grill**: For each rival, ask: is this a plausible interpretation in this domain, or a strawman? 
Plausibility matters — rivalry mode loses value if the rivals are not interpretations a reasonable reader would actually entertain.
+
+### 5. Reconcile findings
+
+Aggregate the audit into three buckets:
+
+- **Strong witnesses** — tests at property/contract/invariant rung tied to named claims with no plausible uneliminated rivals.
+- **Weak witnesses** — tests at example/regression rung, or tied to claims with uneliminated rivals, where promotion is feasible and worth it.
+- **Honest gaps** — unwitnessed obligations and uneliminated rivals the user explicitly accepts as deferrals, with revisit triggers.
+
+A test suite with zero weak witnesses or zero honest gaps is either trivial or dishonest.
+
+## Output
+
+Present the audit as a structured report in chat. Do not write a standalone audit document unless the user explicitly asks for one.
+
+Update durable docs only where findings warrant:
+
+- **`memory/SPEC.md`** — add or strengthen invariants the audit surfaced; record acknowledged blind spots in §Verification Design with revisit triggers.
+- **`memory/PLAN.md`** — refresh `Verification` annotations on affected frontier items; queue follow-up frontier items for closing significant gaps.
+- **Active scope card** — note discriminating tests to write in the current slice if they fit; do not let audit findings silently widen the slice.
+
+### Cross-reference integrity
+
+After writing, verify:
+- Every promoted invariant has at least one named witnessing test
+- Every acknowledged blind spot has a revisit trigger
+- No rival marked **escalate** is left without a routing recommendation
+
+## Routing
+
+After presenting the audit, present these options to the user (use `tool-ask-question`):
+
+| # | Label | Target | Why |
+| --- | ---------------------- | ----------------- | ------------------------------------------------------------ |
+| 1 | Close gaps now | `ln-scope` | Audit surfaced discriminating tests worth a focused slice |
+| 2 | Disambiguate spec | `ln-disambiguate` | A rival revealed real ambiguity in intent |
+| 3 | Revise spec | `ln-spec` | Audit promoted invariants or new blind spots into the spec |
+| 4 | Revisit oracle strategy| `ln-oracles` | Gaps suggest the verification design itself is under-powered |
+| 5 | Refactor tests | `ln-refactor` | Tests are correctly aimed but structurally weak |
+| 6 | Back to triage | `ln-consult` | Findings reshape direction; reassess |
+
+Recommended: **1** if gaps are local and cheap; **2** if a rival surfaced spec ambiguity; **4** if multiple kernels show systematic under-witnessing.
diff --git a/.agents/skills/ln-witness/assets/witness-rubric.md b/.agents/skills/ln-witness/assets/witness-rubric.md
new file mode 100644
index 00000000..baf6e934
--- /dev/null
+++ b/.agents/skills/ln-witness/assets/witness-rubric.md
@@ -0,0 +1,121 @@
+# Witness Rubric
+
+The rubric `ln-witness` measures tests against. Two parts: the **progressive-checkability ladder** (how strong is this test as evidence?) and the **per-kernel proof obligations** (what must be witnessed for each active kernel?).
+
+Source synthesis: `docs/design/BEHAVIORAL_KERNELS.md` (kernel taxonomy, example tests), `archive/docs/archive/design/INTENT_SPEC_EVOLUTION.md` §2 (Progressive Checkability).
+
+## Progressive-checkability ladder
+
+A test's evidentiary strength is named by its rung, not scored 0–100. Higher rungs subsume lower ones; do not double-count.
+
+| # | Rung | What it proves | Typical shape |
+| --- | --- | --- | --- |
+| 1 | **Positive example** | One concrete input produces an expected output. | `expect(f(x)).toBe(y)` |
+| 2 | **Counterexample** | One concrete input is correctly rejected. | `expect(() => f(bad)).toThrow()` |
+| 3 | **Regression test** | A previously-broken case stays fixed. Same shape as 1 or 2 plus provenance. | Named after the bug/issue it captures |
+| 4 | **Property test** | A relation holds across many generated inputs (round-trip, idempotence, metamorphic, invariant, model-based). | `fc.assert(fc.property(...))` |
+| 5 | **Runtime contract** | A predicate is enforced at the boundary every time the boundary is crossed in production, not only in tests. | `assert`, schema validation at I/O, type-narrowing guard |
+| 6 | **State-machine rule** | A transition is permitted iff its guard holds; forbidden transitions are rejected by construction. | State-machine library, exhaustive transition test |
+| 7 | **Invariant** | A property holds across *all* reachable states and mutations, enforced structurally (types, schema, encapsulation). | "Cannot construct an invalid X" — illegal states unrepresentable |
+| 8 | **Proof obligation** | A formal property a verifier (Dafny / Lean / TLA+ / property-based with adversarial generators) discharges. | Discharged spec, model-checked transition system |
+
+### Reading the ladder
+
+- A test at rung 1 is not a defect — most tests live at rungs 1–3. The question is whether the *claim* deserves a higher rung. A claim labeled "invariant" in `memory/SPEC.md` witnessed only at rung 1 is the gap to surface.
+- Rungs 5–7 are about *structural enforcement*: the system makes the bad state hard or impossible to reach, not just detected after the fact.
Promotion from 4 to 5 is often the highest-leverage move.
+- Rung 8 is rare in product code. It belongs in the audit only when the spec explicitly names a `proof_candidate` claim.
+
+## Per-kernel proof obligations
+
+For each behavioral kernel, the canonical mutations a sufficient test suite must witness. Drawn from `BEHAVIORAL_KERNELS.md` Proof obligations / Example tests sections; phrased as binary checks the audit can apply.
+
+### Structural
+
+**Identity & reference**
+- Every entity has a stable identifier preserved across mutations.
+- Dangling references are rejected at the boundary that creates them.
+- Reference equality is distinguishable from value equality where it matters.
+
+**Containment & topology**
+- `add` preserves the topological invariant (acyclicity, single-parent, ordering, uniqueness — whichever apply).
+- `move` preserves the invariant; an item appears in exactly one place after move.
+- `delete` preserves the invariant; the declared cascade policy fires.
+- `reorder` preserves the invariant; order is well-defined.
+
+**Validation & normalization**
+- Valid inputs accepted; canonical form is what reaches downstream code.
+- Invalid inputs rejected at the boundary with a diagnosable error.
+- Equivalent-but-non-canonical inputs normalize to the same canonical form (round-trip).
+
+### Behavioral
+
+**State & lifecycle**
+- Every declared state is reachable from the initial state.
+- Every declared transition is exercised under its guard.
+- Terminal states are sinks (no transition leaves them, or only declared escape transitions do).
+- Forbidden transitions are rejected, not silently no-op'd.
+
+**Temporal history**
+- Operations declared monotonic do not regress.
+- Undo restores the prior state exactly; redo restores the post-state exactly.
+- Audit / expiration policies fire on the declared schedule.
+
+**Optimization & preference**
+- The chosen outcome is valid (satisfies constraints) before it is optimal.
+- Tie-breaking is deterministic and matches the declared rule.
+
+### Multi-actor
+
+**Authority & capability**
+- Permitted action by an authorized actor succeeds.
+- Same action by an unauthorized actor is rejected (not silently dropped).
+- Delegated capability flows to the delegatee; bounded scope is enforced.
+- Revocation propagates; previously-permitted actions are now rejected.
+
+**Concurrency & collaboration**
+- Concurrent compatible operations both succeed and converge to the same state.
+- Concurrent conflicting operations resolve per the declared policy (LWW / FWW / conflict surface / merge), not by accident.
+- Stale operations (based on outdated state) are detected and handled per policy.
+
+### System
+
+**Transactions & atomicity**
+- A multi-object update either lands entirely or not at all.
+- Partial failure leaves no observable intermediate state.
+- Concurrent transactions do not interleave observably.
+
+**Resource accounting**
+- Conservation holds: sum of accounts before equals sum after, for every operation that claims to conserve.
+- Limits are enforced at the boundary that creates demand, not only at observation time.
+- Capacity exhaustion is rejected with a diagnosable error, not silently dropped.
+
+**Derived data & views**
+- After any source mutation, the derived view reflects it within the declared freshness window.
+- Cache / index / projection cannot diverge from source without detection.
+
+**Error & recovery**
+- Declared retry policy fires under the declared trigger.
+- Compensation runs when rollback is impossible (external effect already issued).
+- Degraded mode is reachable and exits when health returns.
+
+**External effects**
+- Outbound call shape matches the contract the boundary declares.
+- Inbound payload is validated before reaching domain code.
+- Side effects are at-least-once / at-most-once / exactly-once per the declared guarantee.
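As a rough illustration of how one of the obligations above turns into a binary check, the sketch below takes the Resource accounting conservation obligation and witnesses it at roughly the property rung. Everything in it is hypothetical and not taken from the repository: the `Ledger` type, `transfer`, `total`, and `checkConservation` are invented for the example, and a small deterministic generator stands in for a property-testing library so the block has no dependencies.

```typescript
// Hypothetical in-memory ledger, used only to illustrate the obligation;
// none of these names come from the repository.
type Ledger = { readonly [account: string]: number };

function total(ledger: Ledger): number {
  let sum = 0;
  for (const name of Object.keys(ledger)) sum += ledger[name];
  return sum;
}

// Returns a new ledger on success and throws on any rejected transfer,
// so a rejection can never leave a partially updated ledger behind.
function transfer(ledger: Ledger, from: string, to: string, amount: number): Ledger {
  const fromBalance = ledger[from];
  const toBalance = ledger[to];
  if (fromBalance === undefined || toBalance === undefined) throw new Error("unknown account");
  if (from === to) throw new Error("self-transfer");
  if (amount <= 0 || Math.floor(amount) !== amount) throw new Error("amount must be a positive integer");
  if (amount > fromBalance) throw new Error("insufficient funds");
  return { ...ledger, [from]: fromBalance - amount, [to]: toBalance + amount };
}

// Obligation: the sum of accounts before equals the sum after, for every
// operation that claims to conserve. Checked over many generated transfers,
// including ones that are expected to be rejected.
function checkConservation(rounds: number): boolean {
  const names = ["a", "b", "c"];
  let ledger: Ledger = { a: 100, b: 50, c: 0 };
  const expected = total(ledger);
  let seed = 42;
  const nextInt = (bound: number): number => {
    seed = (seed * 48271) % 2147483647; // Park-Miller generator, deterministic
    return seed % bound;
  };
  for (let i = 0; i < rounds; i++) {
    const from = names[nextInt(names.length)];
    const to = names[nextInt(names.length)];
    try {
      ledger = transfer(ledger, from, to, nextInt(80) + 1);
    } catch (_rejected) {
      // Rejected transfers leave `ledger` as it was before the call.
    }
    if (total(ledger) !== expected) return false;
  }
  return true;
}
```

A single `expect(total(after)).toBe(total(before))` on one hand-picked transfer would witness the same claim only at the positive-example rung. Looping over generated inputs is what lets the check stumble on cases such as a self-transfer, which a naive implementation could mishandle.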
+
+### Evolution
+
+**Change & migration**
+- Old-format data round-trips through the migration without loss.
+- Forward and backward compatibility hold for the declared compatibility window.
+
+**Observability & evidence**
+- Every operation the spec declares auditable produces a log/event with sufficient provenance.
+- Logs do not contain prohibited content (secrets, PII per declared policy).
+
+## How `ln-witness` uses this rubric
+
+1. **Audit mode** maps each test to a ladder rung (column 3 of the audit table) and to the kernel obligations it satisfies (step 3 of the procedure).
+2. **Rivalry mode** uses each kernel's obligations as the source of contrastive scenarios: a missing obligation often *is* the rival the tests fail to rule out, expressed as a discriminating scenario rather than as a checklist line.
+
+A kernel obligation may be acceptably unwitnessed — but only with an explicit note in `memory/SPEC.md` §Verification Design saying *why* and *when to revisit*. The rubric does not produce a score; it produces a structured set of named gaps the user has to either close or knowingly defer.
diff --git a/docs/praxis/ln-skills.md b/docs/praxis/ln-skills.md
index 3f3d229b..29193af5 100644
--- a/docs/praxis/ln-skills.md
+++ b/docs/praxis/ln-skills.md
@@ -28,6 +28,7 @@ ln-consult
   → ln-spike (optional)
   → ln-build
   → ln-review
+  → ln-witness (optional)
   → ln-refactor (optional)
   → ln-sync
   → ln-handoff (when stopping or transferring)
@@ -60,6 +61,7 @@ The flow is not a checklist. Skip steps whose uncertainty is already retired.
 | --- | --- | --- |
 | `ln-design` | API shape, module boundary, ownership, or information hiding is uncertain. Use especially before committing to a public seam. | Competing module shapes, chosen direction, rejected tradeoffs. |
 | `ln-oracles` | Verification strategy is uncertain or materially shapes implementation order, especially for LLM, visual, compositional, or multi-surface work.
| Oracle strategy by loop tier, observability diagnosis, blind spots. |
+| `ln-witness` | A slice has tests but evidentiary strength is unclear, or tests pass while the spec feels under-witnessed. Post-hoc complement to `ln-oracles`. | Per-test kernel attribution and ladder rung; unwitnessed proof obligations; contrastive rivals tests fail to rule out. |
 | `ln-prototype` | A throwaway playable/model/UI probe would answer design questions faster than production work. | Disposable prototype evidence; no production commitment. |
 | `ln-spike` | One hard technical question blocks a scoped slice or frontier item. | Spike verdict and recommendation; throwaway code unless explicitly promoted. |
@@ -83,6 +85,7 @@ These are not always visible in the shortest default path, but they are importan
 | `ln-disambiguate` | Prevents vague requirements by asking contrastive example/counterexample questions where interpretations diverge. |
 | `ln-design` | Prevents shallow modules and accidental public APIs by exploring multiple shapes before implementation. |
 | `ln-oracles` | Prevents fake confidence by designing the right evidence before build work. |
+| `ln-witness` | Prevents fake confidence after the fact: distinguishes tests that witness named claims from tests that merely pass, and surfaces rival interpretations the suite fails to rule out. |
 | `ln-prototype` | Retires UX/state/model uncertainty cheaply before the production seam hardens. |
 | `ln-diagnose` | Keeps debugging scientific and routes durable lessons back into SPEC/PLAN. |
 | `ln-review` | Catches domain-model erosion and agent-navigability problems after code lands. |
@@ -101,6 +104,7 @@ There is currently no project-local `ln-map` skill in `.agents/skills/`.
If you
 | “What is the smallest buildable slice?” | `ln-scope` |
 | “Which module/API shape should we choose?” | `ln-design` |
 | “How will we know this works?” | `ln-oracles` |
+| “What do these tests actually prove?” | `ln-witness` |
 | “Can this technical approach work?” | `ln-spike` |
 | “Can we make the idea tangible before committing?” | `ln-prototype` |
 | “Why is this failing?” | `ln-diagnose` |
@@ -123,6 +127,7 @@ When starting a new frontier item, follow `AGENTS.md` and `docs/praxis/graphite-
 | Per-slice application of oracle strategy | `ln-scope` |
 | TDD and inner-loop execution | `ln-build` |
 | Coverage audit after implementation | `ln-review` |
+| Evidentiary audit of an existing test suite | `ln-witness` |
 Default commands: