From 3999ac371c2f6d9f7073ce7dcdd2269f27635c75 Mon Sep 17 00:00:00 2001 From: Fotis Stamatelopoulos Date: Tue, 12 May 2026 16:43:06 -0700 Subject: [PATCH] =?UTF-8?q?release(v0.24.2):=20PA=20prompt=20refinements?= =?UTF-8?q?=20=E2=80=94=20lock-step=20digest=20trigger=20+=20mtime=20self-?= =?UTF-8?q?check?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round-2 dogfood feedback from gmbot PA. Two of the four refinements matched what v0.24.1 already shipped (digest-first ordering, supersession pattern); the other two had subtle but real refinements applied here. **Refinement A — digest trigger becomes lock-step with the session log per-turn rule.** v0.24.1 had a four-clause testable trigger (Problem Pack edit / preference / supersession / rejection). Worked but still required the agent to judge "did any of these happen this turn?" Replaced with one lock-step rule that mirrors the existing session-log discipline: > After every substantive edit to a Problem Pack file, append a > Decisions bullet to the digest in the same turn — same hard > discipline as the per-turn session-log rule. Preferences / supersession / rejection remain, but as additional triggers under the lock-step rule, not the load-bearing rule. **Refinement B — self-check ritual names mtime comparison explicitly.** v0.24.1 said "scan session log for entries since last digest update; flush if 2+." That implicitly required parsing the session log — but the header format is agent-controlled, so there's no guaranteed pattern to count. Rewrote as a concrete file-mtime comparison the agent can execute via shell: > 1. stat workspace-summary.md > 2. stat session-.md > 3. If session log is newer AND gap >= 2 turns, catch up. > > This is a file-mtime comparison, not a vibe check. **Harness-side F.32 dropped.** The same mtime heuristic was originally scoped as a cfcf-side diagnostic (cfcf doctor + dashboard chip). Killed: harness check is a passive observer (would warn on workspaces stopped weeks ago, can't distinguish mid-decision drift); the agent self-check is an active write barrier with full context. The agent check obviates the harness check. Test coverage: 2 new tests in prompt-assembler.test.ts pin the lock-step phrasing + the mtime/stat literals. All 28 tests pass. Full suite (1016 tests across packages) green; typecheck clean. No code-surface changes outside the prompt template + tests. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 109 ++++++++++++++++++ package.json | 2 +- .../prompt-assembler.test.ts | 46 ++++++++ .../src/product-architect/prompt-assembler.ts | 71 ++++++++---- 4 files changed, 202 insertions(+), 26 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 1ddfa0d..2db3f7d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,115 @@ Changes are tracked via git tags. Each release tag corresponds to an entry here. _No changes yet._ +## [0.24.2] -- 2026-05-12 + +PA prompt refinements (round 2 of dogfood feedback) — strictly an +extension of the PA memory-discipline work shipped in v0.24.1. Two +surgical edits to the PA system prompt template, no code-surface +changes. + +### Changed — PA prompt: lock-step digest trigger + mtime-based self-check + +After v0.24.1 shipped, the gmbot PA delivered a refined version of +its earlier feedback. Two of the four suggestions matched what we'd +already shipped (digest-first ordering, supersession pattern); the +other two had subtle but real refinements that this patch applies. + +**Refinement A — digest write trigger becomes lock-step with the +session-log per-turn rule.** + +v0.24.1 shipped a four-clause testable trigger (Problem Pack edit / +preference / supersession / rejection, BEFORE responding). That +worked but still required the agent to judge "did any of these +happen this turn?" — four separate yes/no checks every turn. + +PA's refined framing collapses this to **one lock-step rule** with +the other three as additional triggers, not the load-bearing one: + +> **After every substantive edit to a Problem Pack file, append a +> Decisions bullet to the digest in the same turn — same hard +> discipline as the per-turn session-log rule.** A Problem Pack +> edit without a matching digest bullet is incomplete work. + +The "same discipline as the session-log per-turn rule" framing is +the key — the agent already has a strong "every turn → session log +entry" reflex; bolting the digest write onto the same reflex is a +much sharper rule than "remember to check these four conditions." + +Preferences / supersession / rejection remain as triggers — they're +listed as "additional triggers (same discipline, even without a +Problem Pack edit this turn)" below the lock-step rule. + +**Refinement B — turn-start self-check names mtime comparison +explicitly.** + +v0.24.1 shipped "scan the session log for entries appended since +your last digest update; flush if 2+ accumulated." That implicitly +required parsing the session log for entry boundaries — but the +session-log header format is agent-controlled, so there's no +guaranteed `## Turn N` pattern to count against. + +PA's refined framing names the **file-mtime comparison** as the +concrete check method: + +> 1. `stat` (or `ls -lT` on macOS) `/.cfcf-pa/workspace-summary.md` +> 2. `stat` `/.cfcf-pa/session-.md` +> 3. If the session log mtime is newer than the digest mtime AND +> the gap represents 2+ turns of content, do a catch-up pass. +> +> This is a file-mtime comparison, not a vibe check. + +Same write-barrier discipline, but with an unambiguous check method +the agent can execute via shell. + +**Why this is in the prompt, not the harness.** + +The same mtime-comparison heuristic was originally considered as a +**cfcf-side diagnostic** (F.32, "pa-digest-staleness") via +`cfcf doctor` + a dashboard chip. That scope was killed before +implementation. Reasoning: + +- The harness diagnostic would be a *passive observer* — would warn + on workspaces stopped 3 weeks ago (false positive), can't + distinguish "mid-decision drift" from "you forgot." +- The agent self-check is an *active write barrier* — has full + context (knows the current turn, the active decision) and the + authority to *act* (flush the digest), not just observe. + +The agent self-check obviates the need for a harness diagnostic — +the check method (mtime comparison) belongs inside the agent loop +where it has the context to act on it. + +**Implementation** (`packages/core/src/product-architect/ +prompt-assembler.ts`): + +- Rewrote the `### Digest` write-trigger block: lead with the + lock-step Problem Pack→digest rule; preferences / supersession / + rejection demoted to "additional triggers." Preserves the v0.24.1 + `substantive edit` + `BEFORE responding` + `contradicts or + supersedes` literals so the existing v0.24.1 tests still pass. +- Rewrote the `### Turn-start self-check` block: replaced the + "scan session log for entries" framing with concrete `stat` / + `ls -lT` mtime comparison on the two co-located files + (`workspace-summary.md` + `session-.md`). Preserves the + v0.24.1 `2+` threshold + `BEFORE generating your reply` literals. + +**Test coverage** (2 new tests in `prompt-assembler.test.ts`, +all 28 tests pass): + +- `digest-write trigger is lock-step with the per-turn session-log + rule` — pins `same hard discipline` + `in the same turn`. +- `self-check ritual names mtime comparison explicitly` — pins + `mtime` + `workspace-summary.md` + a `session-<…>.md` reference + + a concrete check tool (`stat` or `ls -lT`) + the `not a vibe + check` framing. + +**No code-surface changes** outside the prompt template + its +tests. Same shape as the v0.24.1 PA-discipline commit. The +companion harness-side `pa-digest-staleness` diagnostic (originally +F.32) is dropped as a scope item — the agent self-check renders it +redundant. + ## [0.24.1] -- 2026-05-12 Patch release bundling six items since v0.24.0: diff --git a/package.json b/package.json index 6d6afc5..85f9684 100644 --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ { "name": "cfcf", "private": true, - "version": "0.24.1", + "version": "0.24.2", "description": "Cerefox Code Factory (cf²) -- deterministic orchestration harness for AI coding agents", "homepage": "https://github.com/fstamatelopoulos/cfcf", "repository": { diff --git a/packages/core/src/product-architect/prompt-assembler.test.ts b/packages/core/src/product-architect/prompt-assembler.test.ts index df17edb..09d1e3f 100644 --- a/packages/core/src/product-architect/prompt-assembler.test.ts +++ b/packages/core/src/product-architect/prompt-assembler.test.ts @@ -363,4 +363,50 @@ describe("assembleProductArchitectPrompt", () => { expect(out).toContain("2+"); // explicit threshold appears literally expect(out).toContain("BEFORE generating your reply"); }); + + // --- v0.24.2 PA-prompt refinements (round 2 of dogfood feedback) --- + + it("digest-write trigger is lock-step with the per-turn session-log rule (v0.24.2 refinement)", () => { + // PA's refined #2 from the round-2 dogfood feedback: drop the + // multi-clause "ANY of these" framing in favour of one lock-step + // rule — every substantive Problem Pack edit gets a digest bullet + // in the same turn, same hard discipline as the per-turn + // session-log rule. Preferences / supersession / rejection remain + // as *additional* triggers, not the load-bearing rule. + const out = assembleProductArchitectPrompt({ + state: baseState, + memory: emptyMemory, + clioActor: "product-architect|claude-code|opus", + }); + // "same hard discipline as the per-turn session-log rule" — the + // sharp, pinnable phrasing from the PA refinement. + expect(out).toContain("same hard discipline"); + // "in the same turn" — the lock-step semantics. Pin both halves + // of the lock-step framing so future edits don't drift back to + // a weaker "eventually" wording. + expect(out).toContain("in the same turn"); + }); + + it("self-check ritual names mtime comparison explicitly (v0.24.2 refinement)", () => { + // PA's refined #4 from round-2: the self-check is a file-mtime + // comparison on disk, not a vibe count of "entries since last + // update." Names the two files + a concrete check tool the + // agent has shell access to (stat / ls -lT). This is the same + // heuristic we considered building harness-side (F.32) but + // correctly placed inside the agent loop where it has the + // context to act on it (write barrier vs passive observer). + const out = assembleProductArchitectPrompt({ + state: baseState, + memory: emptyMemory, + clioActor: "product-architect|claude-code|opus", + }); + expect(out).toContain("mtime"); + expect(out).toContain("workspace-summary.md"); + expect(out).toMatch(/session-[^\s]+\.md/); + // Names a concrete check tool the agent has shell access to — + // not a vibe check. + expect(out).toMatch(/\bstat\b|\bls -lT\b/); + // "file-mtime comparison, not a vibe check" — pin the framing. + expect(out).toContain("not a vibe check"); + }); }); diff --git a/packages/core/src/product-architect/prompt-assembler.ts b/packages/core/src/product-architect/prompt-assembler.ts index 9e70376..cbedba9 100644 --- a/packages/core/src/product-architect/prompt-assembler.ts +++ b/packages/core/src/product-architect/prompt-assembler.ts @@ -432,23 +432,32 @@ This is the persistent memory injected into every future PA session's prompt. Treat updates to it as a first-class deliverable, not an afterthought. -**Write trigger** (testable, no judgment calls): +**Write trigger** (lock-step with the per-turn session-log rule): -> When ANY of these happens this turn, append a bullet to the digest's Decisions ledger **BEFORE responding to the user**: -> -> 1. You made a **substantive edit** to a Problem Pack file -> (\`problem.md\`, \`success.md\`, \`constraints.md\`, -> \`hints.md\`, \`style-guide.md\`) — not a comma fix; an edit -> that changes the spec's meaning. -> 2. The user expressed a **preference** (e.g. "always use TDD", -> "stick to vanilla TypeScript", "no external dependencies"). -> 3. A decision was made that **contradicts or supersedes** an -> earlier digest entry. See "Supersession pattern" below. -> 4. The user **rejected** an approach you proposed (capture -> what was rejected and why). -> -> If none of those happened this turn, the digest doesn't need an -> update — the session log alone suffices. +> **After every substantive edit to a Problem Pack file, append a +> Decisions bullet to the digest in the same turn — same hard discipline +> as the per-turn session-log rule.** A Problem Pack edit without a +> matching digest bullet is incomplete work — write the bullet +> **BEFORE responding to the user**. + +The Problem Pack files this rule covers: \`problem.md\`, +\`success.md\`, \`constraints.md\`, \`hints.md\`, \`style-guide.md\`. +"Substantive" = the change alters the spec's meaning. Comma fixes +and reflow don't count; a new constraint, a flipped success +criterion, or a hint that reframes the problem do. + +Additional triggers (same discipline, even without a Problem Pack +edit this turn): + + - The user expressed a **preference** (e.g. "always use TDD", + "stick to vanilla TypeScript", "no external dependencies"). + - A decision was made that **contradicts or supersedes** an + earlier digest entry. See "Supersession pattern" below. + - The user **rejected** an approach you proposed — capture what + was rejected and why. + +If none of those happened this turn, the digest doesn't need an +update — the session log alone suffices. **The write itself** (every digest update is a two-step, same turn): @@ -491,16 +500,28 @@ audit trail. ### Turn-start self-check (catch missed digest updates) -**Before responding to the user, scan the session log for substantive -entries appended since your last digest update.** If 2+ have -accumulated, append to the digest NOW (per the write trigger above) -BEFORE generating your reply. The session log is the single source -of truth for "what happened this session"; the digest is the rollup -future sessions read. They must stay in lockstep. +**Before responding to the user, run a write-barrier check on disk:** + + 1. \`stat\` (or \`ls -lT\` on macOS) \`/.cfcf-pa/workspace-summary.md\` + 2. \`stat\` \`/.cfcf-pa/session-${sessionId}.md\` + 3. If the session log mtime is **newer** than the digest mtime + AND the gap represents **2+ turns** of content (use file + size or your own count of how many user turns passed since + the last digest update), do a **catch-up pass**: flush + accumulated decisions into the digest (per the write + trigger above) BEFORE generating your reply. + +This is a file-mtime comparison, not a vibe check. The two files +are co-located under \`/.cfcf-pa/\` and you have shell +access — do the \`stat\` rather than guess. + +The session log is the single source of truth for "what happened +this session"; the digest is the rollup future sessions read. They +must stay in lockstep. -This is a write-barrier ritual — runs every turn, costs zero when -the digest is current, catches the failure mode where a chain of -turns accumulates decisions without ever flushing the digest. +Write-barrier ritual: runs every turn, costs zero when the digest +is current, catches the failure mode where a chain of turns +accumulates decisions without ever flushing the digest. ### Session log (\`session-.md\`) — durability scratchpad