Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
109 changes: 109 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,115 @@ Changes are tracked via git tags. Each release tag corresponds to an entry here.

_No changes yet._

## [0.24.2] -- 2026-05-12

PA prompt refinements (round 2 of dogfood feedback) — strictly an
extension of the PA memory-discipline work shipped in v0.24.1. Two
surgical edits to the PA system prompt template, no code-surface
changes.

### Changed — PA prompt: lock-step digest trigger + mtime-based self-check

After v0.24.1 shipped, the gmbot PA delivered a refined version of
its earlier feedback. Two of the four suggestions matched what we'd
already shipped (digest-first ordering, supersession pattern); the
other two had subtle but real refinements that this patch applies.

**Refinement A — digest write trigger becomes lock-step with the
session-log per-turn rule.**

v0.24.1 shipped a four-clause testable trigger (Problem Pack edit /
preference / supersession / rejection, BEFORE responding). That
worked but still required the agent to judge "did any of these
happen this turn?" — four separate yes/no checks every turn.

PA's refined framing collapses this to **one lock-step rule** with
the other three as additional triggers, not the load-bearing one:

> **After every substantive edit to a Problem Pack file, append a
> Decisions bullet to the digest in the same turn — same hard
> discipline as the per-turn session-log rule.** A Problem Pack
> edit without a matching digest bullet is incomplete work.

The "same discipline as the session-log per-turn rule" framing is
the key — the agent already has a strong "every turn → session log
entry" reflex; bolting the digest write onto the same reflex is a
much sharper rule than "remember to check these four conditions."

Preferences / supersession / rejection remain as triggers — they're
listed as "additional triggers (same discipline, even without a
Problem Pack edit this turn)" below the lock-step rule.

**Refinement B — turn-start self-check names mtime comparison
explicitly.**

v0.24.1 shipped "scan the session log for entries appended since
your last digest update; flush if 2+ accumulated." That implicitly
required parsing the session log for entry boundaries — but the
session-log header format is agent-controlled, so there's no
guaranteed `## Turn N` pattern to count against.

PA's refined framing names the **file-mtime comparison** as the
concrete check method:

> 1. `stat` (or `ls -lT` on macOS) `<repo>/.cfcf-pa/workspace-summary.md`
> 2. `stat` `<repo>/.cfcf-pa/session-<sessionId>.md`
> 3. If the session log mtime is newer than the digest mtime AND
> the gap represents 2+ turns of content, do a catch-up pass.
>
> This is a file-mtime comparison, not a vibe check.

Same write-barrier discipline, but with an unambiguous check method
the agent can execute via shell.

**Why this is in the prompt, not the harness.**

The same mtime-comparison heuristic was originally considered as a
**cfcf-side diagnostic** (F.32, "pa-digest-staleness") via
`cfcf doctor` + a dashboard chip. That scope was killed before
implementation. Reasoning:

- The harness diagnostic would be a *passive observer* — would warn
on workspaces stopped 3 weeks ago (false positive), can't
distinguish "mid-decision drift" from "you forgot."
- The agent self-check is an *active write barrier* — has full
context (knows the current turn, the active decision) and the
authority to *act* (flush the digest), not just observe.

The agent self-check obviates the need for a harness diagnostic —
the check method (mtime comparison) belongs inside the agent loop
where it has the context to act on it.

**Implementation** (`packages/core/src/product-architect/
prompt-assembler.ts`):

- Rewrote the `### Digest` write-trigger block: lead with the
lock-step Problem Pack→digest rule; preferences / supersession /
rejection demoted to "additional triggers." Preserves the v0.24.1
`substantive edit` + `BEFORE responding` + `contradicts or
supersedes` literals so the existing v0.24.1 tests still pass.
- Rewrote the `### Turn-start self-check` block: replaced the
"scan session log for entries" framing with concrete `stat` /
`ls -lT` mtime comparison on the two co-located files
(`workspace-summary.md` + `session-<id>.md`). Preserves the
v0.24.1 `2+` threshold + `BEFORE generating your reply` literals.

**Test coverage** (2 new tests in `prompt-assembler.test.ts`,
all 28 tests pass):

- `digest-write trigger is lock-step with the per-turn session-log
rule` — pins `same hard discipline` + `in the same turn`.
- `self-check ritual names mtime comparison explicitly` — pins
`mtime` + `workspace-summary.md` + a `session-<…>.md` reference
+ a concrete check tool (`stat` or `ls -lT`) + the `not a vibe
check` framing.

**No code-surface changes** outside the prompt template + its
tests. Same shape as the v0.24.1 PA-discipline commit. The
companion harness-side `pa-digest-staleness` diagnostic (originally
F.32) is dropped as a scope item — the agent self-check renders it
redundant.

## [0.24.1] -- 2026-05-12

Patch release bundling six items since v0.24.0:
Expand Down
2 changes: 1 addition & 1 deletion package.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "cfcf",
"private": true,
"version": "0.24.1",
"version": "0.24.2",
"description": "Cerefox Code Factory (cf²) -- deterministic orchestration harness for AI coding agents",
"homepage": "https://github.com/fstamatelopoulos/cfcf",
"repository": {
Expand Down
46 changes: 46 additions & 0 deletions packages/core/src/product-architect/prompt-assembler.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -363,4 +363,50 @@ describe("assembleProductArchitectPrompt", () => {
expect(out).toContain("2+"); // explicit threshold appears literally
expect(out).toContain("BEFORE generating your reply");
});

// --- v0.24.2 PA-prompt refinements (round 2 of dogfood feedback) ---

it("digest-write trigger is lock-step with the per-turn session-log rule (v0.24.2 refinement)", () => {
// PA's refined #2 from the round-2 dogfood feedback: drop the
// multi-clause "ANY of these" framing in favour of one lock-step
// rule — every substantive Problem Pack edit gets a digest bullet
// in the same turn, same hard discipline as the per-turn
// session-log rule. Preferences / supersession / rejection remain
// as *additional* triggers, not the load-bearing rule.
const out = assembleProductArchitectPrompt({
state: baseState,
memory: emptyMemory,
clioActor: "product-architect|claude-code|opus",
});
// "same hard discipline as the per-turn session-log rule" — the
// sharp, pinnable phrasing from the PA refinement.
expect(out).toContain("same hard discipline");
// "in the same turn" — the lock-step semantics. Pin both halves
// of the lock-step framing so future edits don't drift back to
// a weaker "eventually" wording.
expect(out).toContain("in the same turn");
});

it("self-check ritual names mtime comparison explicitly (v0.24.2 refinement)", () => {
// PA's refined #4 from round-2: the self-check is a file-mtime
// comparison on disk, not a vibe count of "entries since last
// update." Names the two files + a concrete check tool the
// agent has shell access to (stat / ls -lT). This is the same
// heuristic we considered building harness-side (F.32) but
// correctly placed inside the agent loop where it has the
// context to act on it (write barrier vs passive observer).
const out = assembleProductArchitectPrompt({
state: baseState,
memory: emptyMemory,
clioActor: "product-architect|claude-code|opus",
});
expect(out).toContain("mtime");
expect(out).toContain("workspace-summary.md");
expect(out).toMatch(/session-[^\s]+\.md/);
// Names a concrete check tool the agent has shell access to —
// not a vibe check.
expect(out).toMatch(/\bstat\b|\bls -lT\b/);
// "file-mtime comparison, not a vibe check" — pin the framing.
expect(out).toContain("not a vibe check");
});
});
71 changes: 46 additions & 25 deletions packages/core/src/product-architect/prompt-assembler.ts
Original file line number Diff line number Diff line change
Expand Up @@ -432,23 +432,32 @@ This is the persistent memory injected into every future PA session's
prompt. Treat updates to it as a first-class deliverable, not an
afterthought.

**Write trigger** (testable, no judgment calls):
**Write trigger** (lock-step with the per-turn session-log rule):

> When ANY of these happens this turn, append a bullet to the digest's Decisions ledger **BEFORE responding to the user**:
>
> 1. You made a **substantive edit** to a Problem Pack file
> (\`problem.md\`, \`success.md\`, \`constraints.md\`,
> \`hints.md\`, \`style-guide.md\`) — not a comma fix; an edit
> that changes the spec's meaning.
> 2. The user expressed a **preference** (e.g. "always use TDD",
> "stick to vanilla TypeScript", "no external dependencies").
> 3. A decision was made that **contradicts or supersedes** an
> earlier digest entry. See "Supersession pattern" below.
> 4. The user **rejected** an approach you proposed (capture
> what was rejected and why).
>
> If none of those happened this turn, the digest doesn't need an
> update — the session log alone suffices.
> **After every substantive edit to a Problem Pack file, append a
> Decisions bullet to the digest in the same turn — same hard discipline
> as the per-turn session-log rule.** A Problem Pack edit without a
> matching digest bullet is incomplete work — write the bullet
> **BEFORE responding to the user**.

The Problem Pack files this rule covers: \`problem.md\`,
\`success.md\`, \`constraints.md\`, \`hints.md\`, \`style-guide.md\`.
"Substantive" = the change alters the spec's meaning. Comma fixes
and reflow don't count; a new constraint, a flipped success
criterion, or a hint that reframes the problem do.

Additional triggers (same discipline, even without a Problem Pack
edit this turn):

- The user expressed a **preference** (e.g. "always use TDD",
"stick to vanilla TypeScript", "no external dependencies").
- A decision was made that **contradicts or supersedes** an
earlier digest entry. See "Supersession pattern" below.
- The user **rejected** an approach you proposed — capture what
was rejected and why.

If none of those happened this turn, the digest doesn't need an
update — the session log alone suffices.

**The write itself** (every digest update is a two-step, same turn):

Expand Down Expand Up @@ -491,16 +500,28 @@ audit trail.

### Turn-start self-check (catch missed digest updates)

**Before responding to the user, scan the session log for substantive
entries appended since your last digest update.** If 2+ have
accumulated, append to the digest NOW (per the write trigger above)
BEFORE generating your reply. The session log is the single source
of truth for "what happened this session"; the digest is the rollup
future sessions read. They must stay in lockstep.
**Before responding to the user, run a write-barrier check on disk:**

1. \`stat\` (or \`ls -lT\` on macOS) \`<repo>/.cfcf-pa/workspace-summary.md\`
2. \`stat\` \`<repo>/.cfcf-pa/session-${sessionId}.md\`
3. If the session log mtime is **newer** than the digest mtime
AND the gap represents **2+ turns** of content (use file
size or your own count of how many user turns passed since
the last digest update), do a **catch-up pass**: flush
accumulated decisions into the digest (per the write
trigger above) BEFORE generating your reply.

This is a file-mtime comparison, not a vibe check. The two files
are co-located under \`<repo>/.cfcf-pa/\` and you have shell
access — do the \`stat\` rather than guess.

The session log is the single source of truth for "what happened
this session"; the digest is the rollup future sessions read. They
must stay in lockstep.

This is a write-barrier ritualruns every turn, costs zero when
the digest is current, catches the failure mode where a chain of
turns accumulates decisions without ever flushing the digest.
Write-barrier ritual: runs every turn, costs zero when the digest
is current, catches the failure mode where a chain of turns
accumulates decisions without ever flushing the digest.

### Session log (\`session-<sessionId>.md\`) — durability scratchpad

Expand Down
Loading