From 3999ac371c2f6d9f7073ce7dcdd2269f27635c75 Mon Sep 17 00:00:00 2001
From: Fotis Stamatelopoulos <f.stamatelopoulos@ebs.gr>
Date: Tue, 12 May 2026 16:43:06 -0700
Subject: [PATCH] =?UTF-8?q?release(v0.24.2):=20PA=20prompt=20refinements?=
 =?UTF-8?q?=20=E2=80=94=20lock-step=20digest=20trigger=20+=20mtime=20self-?=
 =?UTF-8?q?check?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Round-2 dogfood feedback from gmbot PA. Two of the four refinements
matched what v0.24.1 already shipped (digest-first ordering,
supersession pattern); the other two had subtle but real
refinements applied here.

**Refinement A — digest trigger becomes lock-step with the session
log per-turn rule.**

v0.24.1 had a four-clause testable trigger (Problem Pack edit /
preference / supersession / rejection). Worked but still required
the agent to judge "did any of these happen this turn?" Replaced
with one lock-step rule that mirrors the existing session-log
discipline:

> After every substantive edit to a Problem Pack file, append a
> Decisions bullet to the digest in the same turn — same hard
> discipline as the per-turn session-log rule.

Preferences / supersession / rejection remain, but as additional
triggers under the lock-step rule, not the load-bearing rule.

**Refinement B — self-check ritual names mtime comparison
explicitly.**

v0.24.1 said "scan session log for entries since last digest
update; flush if 2+." That implicitly required parsing the session
log — but the header format is agent-controlled, so there's no
guaranteed pattern to count. Rewrote as a concrete file-mtime
comparison the agent can execute via shell:

>  1. stat workspace-summary.md
>  2. stat session-<id>.md
>  3. If session log is newer AND gap >= 2 turns, catch up.
>
> This is a file-mtime comparison, not a vibe check.

**Harness-side F.32 dropped.**

The same mtime heuristic was originally scoped as a cfcf-side
diagnostic (cfcf doctor + dashboard chip). Killed: harness check is
a passive observer (would warn on workspaces stopped weeks ago,
can't distinguish mid-decision drift); the agent self-check is an
active write barrier with full context. The agent check obviates
the harness check.

Test coverage: 2 new tests in prompt-assembler.test.ts pin the
lock-step phrasing + the mtime/stat literals. All 28 tests pass.
Full suite (1016 tests across packages) green; typecheck clean.

No code-surface changes outside the prompt template + tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                  | 109 ++++++++++++++++++
 package.json                                  |   2 +-
 .../prompt-assembler.test.ts                  |  46 ++++++++
 .../src/product-architect/prompt-assembler.ts |  71 ++++++++----
 4 files changed, 202 insertions(+), 26 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1ddfa0d..2db3f7d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,115 @@ Changes are tracked via git tags. Each release tag corresponds to an entry here.
 
 _No changes yet._
 
+## [0.24.2] -- 2026-05-12
+
+PA prompt refinements (round 2 of dogfood feedback) — strictly an
+extension of the PA memory-discipline work shipped in v0.24.1. Two
+surgical edits to the PA system prompt template, no code-surface
+changes.
+
+### Changed — PA prompt: lock-step digest trigger + mtime-based self-check
+
+After v0.24.1 shipped, the gmbot PA delivered a refined version of
+its earlier feedback. Two of the four suggestions matched what we'd
+already shipped (digest-first ordering, supersession pattern); the
+other two had subtle but real refinements that this patch applies.
+
+**Refinement A — digest write trigger becomes lock-step with the
+session-log per-turn rule.**
+
+v0.24.1 shipped a four-clause testable trigger (Problem Pack edit /
+preference / supersession / rejection, BEFORE responding). That
+worked but still required the agent to judge "did any of these
+happen this turn?" — four separate yes/no checks every turn.
+
+PA's refined framing collapses this to **one lock-step rule** with
+the other three as additional triggers, not the load-bearing one:
+
+> **After every substantive edit to a Problem Pack file, append a
+> Decisions bullet to the digest in the same turn — same hard
+> discipline as the per-turn session-log rule.** A Problem Pack
+> edit without a matching digest bullet is incomplete work.
+
+The "same discipline as the session-log per-turn rule" framing is
+the key — the agent already has a strong "every turn → session log
+entry" reflex; bolting the digest write onto the same reflex is a
+much sharper rule than "remember to check these four conditions."
+
+Preferences / supersession / rejection remain as triggers — they're
+listed as "additional triggers (same discipline, even without a
+Problem Pack edit this turn)" below the lock-step rule.
+
+**Refinement B — turn-start self-check names mtime comparison
+explicitly.**
+
+v0.24.1 shipped "scan the session log for entries appended since
+your last digest update; flush if 2+ accumulated." That implicitly
+required parsing the session log for entry boundaries — but the
+session-log header format is agent-controlled, so there's no
+guaranteed `## Turn N` pattern to count against.
+
+PA's refined framing names the **file-mtime comparison** as the
+concrete check method:
+
+>  1. `stat` (or `ls -lT` on macOS) `<repo>/.cfcf-pa/workspace-summary.md`
+>  2. `stat` `<repo>/.cfcf-pa/session-<sessionId>.md`
+>  3. If the session log mtime is newer than the digest mtime AND
+>     the gap represents 2+ turns of content, do a catch-up pass.
+>
+> This is a file-mtime comparison, not a vibe check.
+
+Same write-barrier discipline, but with an unambiguous check method
+the agent can execute via shell.
+
+**Why this is in the prompt, not the harness.**
+
+The same mtime-comparison heuristic was originally considered as a
+**cfcf-side diagnostic** (F.32, "pa-digest-staleness") via
+`cfcf doctor` + a dashboard chip. That scope was killed before
+implementation. Reasoning:
+
+- The harness diagnostic would be a *passive observer* — would warn
+  on workspaces stopped 3 weeks ago (false positive), can't
+  distinguish "mid-decision drift" from "you forgot."
+- The agent self-check is an *active write barrier* — has full
+  context (knows the current turn, the active decision) and the
+  authority to *act* (flush the digest), not just observe.
+
+The agent self-check obviates the need for a harness diagnostic —
+the check method (mtime comparison) belongs inside the agent loop
+where it has the context to act on it.
+
+**Implementation** (`packages/core/src/product-architect/
+prompt-assembler.ts`):
+
+- Rewrote the `### Digest` write-trigger block: lead with the
+  lock-step Problem Pack→digest rule; preferences / supersession /
+  rejection demoted to "additional triggers." Preserves the v0.24.1
+  `substantive edit` + `BEFORE responding` + `contradicts or
+  supersedes` literals so the existing v0.24.1 tests still pass.
+- Rewrote the `### Turn-start self-check` block: replaced the
+  "scan session log for entries" framing with concrete `stat` /
+  `ls -lT` mtime comparison on the two co-located files
+  (`workspace-summary.md` + `session-<id>.md`). Preserves the
+  v0.24.1 `2+` threshold + `BEFORE generating your reply` literals.
+
+**Test coverage** (2 new tests in `prompt-assembler.test.ts`,
+all 28 tests pass):
+
+- `digest-write trigger is lock-step with the per-turn session-log
+  rule` — pins `same hard discipline` + `in the same turn`.
+- `self-check ritual names mtime comparison explicitly` — pins
+  `mtime` + `workspace-summary.md` + a `session-<…>.md` reference
+  + a concrete check tool (`stat` or `ls -lT`) + the `not a vibe
+  check` framing.
+
+**No code-surface changes** outside the prompt template + its
+tests. Same shape as the v0.24.1 PA-discipline commit. The
+companion harness-side `pa-digest-staleness` diagnostic (originally
+F.32) is dropped as a scope item — the agent self-check renders it
+redundant.
+
 ## [0.24.1] -- 2026-05-12
 
 Patch release bundling six items since v0.24.0:
diff --git a/package.json b/package.json
index 6d6afc5..85f9684 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "cfcf",
   "private": true,
-  "version": "0.24.1",
+  "version": "0.24.2",
   "description": "Cerefox Code Factory (cf²) -- deterministic orchestration harness for AI coding agents",
   "homepage": "https://github.com/fstamatelopoulos/cfcf",
   "repository": {
diff --git a/packages/core/src/product-architect/prompt-assembler.test.ts b/packages/core/src/product-architect/prompt-assembler.test.ts
index df17edb..09d1e3f 100644
--- a/packages/core/src/product-architect/prompt-assembler.test.ts
+++ b/packages/core/src/product-architect/prompt-assembler.test.ts
@@ -363,4 +363,50 @@ describe("assembleProductArchitectPrompt", () => {
     expect(out).toContain("2+"); // explicit threshold appears literally
     expect(out).toContain("BEFORE generating your reply");
   });
+
+  // --- v0.24.2 PA-prompt refinements (round 2 of dogfood feedback) ---
+
+  it("digest-write trigger is lock-step with the per-turn session-log rule (v0.24.2 refinement)", () => {
+    // PA's refined #2 from the round-2 dogfood feedback: drop the
+    // multi-clause "ANY of these" framing in favour of one lock-step
+    // rule — every substantive Problem Pack edit gets a digest bullet
+    // in the same turn, same hard discipline as the per-turn
+    // session-log rule. Preferences / supersession / rejection remain
+    // as *additional* triggers, not the load-bearing rule.
+    const out = assembleProductArchitectPrompt({
+      state: baseState,
+      memory: emptyMemory,
+      clioActor: "product-architect|claude-code|opus",
+    });
+    // "same hard discipline as the per-turn session-log rule" — the
+    // sharp, pinnable phrasing from the PA refinement.
+    expect(out).toContain("same hard discipline");
+    // "in the same turn" — the lock-step semantics. Pin both halves
+    // of the lock-step framing so future edits don't drift back to
+    // a weaker "eventually" wording.
+    expect(out).toContain("in the same turn");
+  });
+
+  it("self-check ritual names mtime comparison explicitly (v0.24.2 refinement)", () => {
+    // PA's refined #4 from round-2: the self-check is a file-mtime
+    // comparison on disk, not a vibe count of "entries since last
+    // update." Names the two files + a concrete check tool the
+    // agent has shell access to (stat / ls -lT). This is the same
+    // heuristic we considered building harness-side (F.32) but
+    // correctly placed inside the agent loop where it has the
+    // context to act on it (write barrier vs passive observer).
+    const out = assembleProductArchitectPrompt({
+      state: baseState,
+      memory: emptyMemory,
+      clioActor: "product-architect|claude-code|opus",
+    });
+    expect(out).toContain("mtime");
+    expect(out).toContain("workspace-summary.md");
+    expect(out).toMatch(/session-[^\s]+\.md/);
+    // Names a concrete check tool the agent has shell access to —
+    // not a vibe check.
+    expect(out).toMatch(/\bstat\b|\bls -lT\b/);
+    // "file-mtime comparison, not a vibe check" — pin the framing.
+    expect(out).toContain("not a vibe check");
+  });
 });
diff --git a/packages/core/src/product-architect/prompt-assembler.ts b/packages/core/src/product-architect/prompt-assembler.ts
index 9e70376..cbedba9 100644
--- a/packages/core/src/product-architect/prompt-assembler.ts
+++ b/packages/core/src/product-architect/prompt-assembler.ts
@@ -432,23 +432,32 @@ This is the persistent memory injected into every future PA session's
 prompt. Treat updates to it as a first-class deliverable, not an
 afterthought.
 
-**Write trigger** (testable, no judgment calls):
+**Write trigger** (lock-step with the per-turn session-log rule):
 
-> When ANY of these happens this turn, append a bullet to the digest's Decisions ledger **BEFORE responding to the user**:
->
-> 1. You made a **substantive edit** to a Problem Pack file
->    (\`problem.md\`, \`success.md\`, \`constraints.md\`,
->    \`hints.md\`, \`style-guide.md\`) — not a comma fix; an edit
->    that changes the spec's meaning.
-> 2. The user expressed a **preference** (e.g. "always use TDD",
->    "stick to vanilla TypeScript", "no external dependencies").
-> 3. A decision was made that **contradicts or supersedes** an
->    earlier digest entry. See "Supersession pattern" below.
-> 4. The user **rejected** an approach you proposed (capture
->    what was rejected and why).
->
-> If none of those happened this turn, the digest doesn't need an
-> update — the session log alone suffices.
+> **After every substantive edit to a Problem Pack file, append a
+> Decisions bullet to the digest in the same turn — same hard discipline
+> as the per-turn session-log rule.** A Problem Pack edit without a
+> matching digest bullet is incomplete work — write the bullet
+> **BEFORE responding to the user**.
+
+The Problem Pack files this rule covers: \`problem.md\`,
+\`success.md\`, \`constraints.md\`, \`hints.md\`, \`style-guide.md\`.
+"Substantive" = the change alters the spec's meaning. Comma fixes
+and reflow don't count; a new constraint, a flipped success
+criterion, or a hint that reframes the problem do.
+
+Additional triggers (same discipline, even without a Problem Pack
+edit this turn):
+
+  - The user expressed a **preference** (e.g. "always use TDD",
+    "stick to vanilla TypeScript", "no external dependencies").
+  - A decision was made that **contradicts or supersedes** an
+    earlier digest entry. See "Supersession pattern" below.
+  - The user **rejected** an approach you proposed — capture what
+    was rejected and why.
+
+If none of those happened this turn, the digest doesn't need an
+update — the session log alone suffices.
 
 **The write itself** (every digest update is a two-step, same turn):
 
@@ -491,16 +500,28 @@ audit trail.
 
 ### Turn-start self-check (catch missed digest updates)
 
-**Before responding to the user, scan the session log for substantive
-entries appended since your last digest update.** If 2+ have
-accumulated, append to the digest NOW (per the write trigger above)
-BEFORE generating your reply. The session log is the single source
-of truth for "what happened this session"; the digest is the rollup
-future sessions read. They must stay in lockstep.
+**Before responding to the user, run a write-barrier check on disk:**
+
+  1. \`stat\` (or \`ls -lT\` on macOS) \`<repo>/.cfcf-pa/workspace-summary.md\`
+  2. \`stat\` \`<repo>/.cfcf-pa/session-${sessionId}.md\`
+  3. If the session log mtime is **newer** than the digest mtime
+     AND the gap represents **2+ turns** of content (use file
+     size or your own count of how many user turns passed since
+     the last digest update), do a **catch-up pass**: flush
+     accumulated decisions into the digest (per the write
+     trigger above) BEFORE generating your reply.
+
+This is a file-mtime comparison, not a vibe check. The two files
+are co-located under \`<repo>/.cfcf-pa/\` and you have shell
+access — do the \`stat\` rather than guess.
+
+The session log is the single source of truth for "what happened
+this session"; the digest is the rollup future sessions read. They
+must stay in lockstep.
 
-This is a write-barrier ritual — runs every turn, costs zero when
-the digest is current, catches the failure mode where a chain of
-turns accumulates decisions without ever flushing the digest.
+Write-barrier ritual: runs every turn, costs zero when the digest
+is current, catches the failure mode where a chain of turns
+accumulates decisions without ever flushing the digest.
 
 ### Session log (\`session-<sessionId>.md\`) — durability scratchpad