From 34dde923ef6d13aaf3248fb5eb68da9b5203ec33 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 18:00:12 +0300 Subject: [PATCH 01/21] feat: memory-consolidation Phase B.5 lint + frontmatter persistence MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds cross-file contradiction detection (Phase B.5) to the nightly memory-consolidation skill, gated by LINT_PHASE_B5_ENABLED for a 30-day trial. Phase C now persists confidence and revisit_if in frontmatter; Phase D writes a Pending Review section to MEMORY.md, appends a structured JSON line to memory/lint-stats.jsonl, and uses a parseable diary header prefix for recent-activity grepping. Auto-resolve uses evidence > confidence > recency hierarchy with anti-loop frontmatter fields (resolved_at / resolution_basis / do_not_reopen_before) to prevent re-triggering the same contradiction nightly. Time-scoped evolution is explicitly excluded from contradiction treatment. Mutation limit shared with Phase C; never silent-delete — losing claims get a (superseded: ...) annotation. Co-Authored-By: Claude Opus 4.7 --- .claude/skills/memory-consolidation/SKILL.md | 91 ++++++++++- docs/plans/2026-05-17-memory-lint-trial.md | 153 +++++++++++++++++++ 2 files changed, 239 insertions(+), 5 deletions(-) create mode 100644 docs/plans/2026-05-17-memory-lint-trial.md diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index aa3fd1a..f7cfea1 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -2,6 +2,12 @@ Nightly skill that crystallizes recent human conversation sessions into organized persistent memory. Reads session transcripts from the last 48 hours, extracts noteworthy facts, updates MEMORY.md and memory/auto/ files, and writes a narrative diary digest to memory/diary/. +## Feature Flags + +- `LINT_PHASE_B5_ENABLED=true` — Trial: 2026-05-17 → 2026-06-17. 
When false, skip Phase B.5 entirely. Rollback = flip to false (no data migration). See ADR-069. + +When the flag is false, the skill executes Phases 0/A/B/C/D as before — no cross-file lint, no Pending Review writes to MEMORY.md, no appends to `memory/lint-stats.jsonl`. Phase C still applies the new frontmatter fields (`confidence`, `revisit_if`) when creating or updating files, since those are forward-compatible regardless of the lint pass. + ## Context This skill runs as a nightly cron. It is the agent's equivalent of sleep — a time for absorption and crystallization of information, not mechanical fact transfer. The goal is to understand what new information means in the context of existing memory, update stale entries, resolve contradictions, and produce diary entries as narrative digests. @@ -105,6 +111,53 @@ bash "${CLAUDE_SKILL_DIR}/scripts/lock.sh" refresh "${CLAUDE_PROJECT_DIR}/.conso ``` If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pipeline immediately and output NO_REPLY. +### Phase B.5: Cross-file Lint (contradiction detection) + +**Gated by `LINT_PHASE_B5_ENABLED`.** If the feature flag is false (env var unset or `false`), skip this entire phase and proceed directly to Phase C. Skipped runs MUST NOT write to `memory/lint-stats.jsonl` or touch the workspace `MEMORY.md` "Pending Review" section. + +Cross-file scan of existing `memory/auto/*.md` files for contradictions that the per-fact Phase B check cannot catch (Phase B only compares new vs existing, not existing vs existing). + +Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. + +1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, claim_phrases}`. Source the fields from frontmatter (`type`, `name`, optional `tags`) and the body. Tokenize the `name` slug and the first heading into `title_tokens`. 
Split the body on bullet boundaries and paragraph breaks to populate `claim_phrases`. Files whose frontmatter contains `do_not_reopen_before` later than today are excluded from the scan entirely. + +2. **Cheap candidate generation FIRST** — do not blindly LLM-judge all `O(n^2)` pairs. For each unordered pair `(A, B)`, count matches across these signals: + - same `type` field + - overlapping `title_tokens` (≥ 1 shared token, ignoring stop words) + - overlapping `tags` (≥ 1 shared tag, if present) + - matching normalized predicate phrase in both files: one of `prefers`, `uses`, `hates`, `requires`, `do not`, `never`, `avoid` + - negation/opposition markers: `not`, `never`, `avoid`, `instead`, or numerically changed value targeting the same entity + + A pair is a **candidate** only if at least two of the above signals match. Increment `candidates_found` for each candidate pair. With ~40 files, false positives are the primary concern; this filter keeps LLM calls bounded. + +3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" Allowed answers: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. Increment `contradictions_detected` for each `contradiction` verdict. + +4. **Auto-resolve hierarchy** (apply in order, stop at first match): the rule is `evidence > confidence > recency`. + a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. + b. **Higher confidence wins** if `|confidence_A − confidence_B| >= 0.2`. + c. 
**Newer evidence-date wins** if both sides have an `evidence_date` (or frontmatter `updated_at` / `resolved_at`) and the delta is `>= 30 days`. + d. **Otherwise flag for review** — do NOT edit either file. Append `{files:[A,B], reason, detected_at:YYYY-MM-DD}` to `pending_review` and increment `pending_added`. + +5. **Apply auto-resolved edits.** For each auto-resolved pair: + - **Never silent-delete.** Edit the losing file to replace the contradicting claim with a `(superseded: )` annotation. The losing claim text remains visible as a strikethrough or parenthetical so audit history is preserved. + - Add anti-loop fields to BOTH files' frontmatter: + ```yaml + resolved_at: YYYY-MM-DD + resolution_basis: "" + do_not_reopen_before: YYYY-MM-DD # or semantic condition like "Ninja revisits topic X" + ``` + - Increment `auto_resolved`. + +6. **Mutation limit is shared with Phase C.** Phase B.5 edits count against the same per-run budget of 5 mutations. If `mutations_applied >= 5` mid-way through Phase B.5, stop applying further auto-resolves; remaining detections go to `pending_review` as `(deferred: mutation limit reached)`. + +7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, and `pending_review` to Phase D for stats and Pending Review writes. + +**Lock refresh:** Before continuing, refresh the lock: +```bash +bash "${CLAUDE_SKILL_DIR}/scripts/lock.sh" refresh "${CLAUDE_PROJECT_DIR}/.consolidation.lock" "" +``` +If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pipeline immediately and output NO_REPLY. + ### Phase C: Apply Changes **Mutation limit: 5 per run.** Each file creation or modification counts as one mutation. @@ -121,17 +174,25 @@ For each approved change (confidence >= 0.9), in priority order (updates before 2. **Apply the edit** — update existing `memory/auto/` file, create new one, or update `MEMORY.md` index. 
- For `memory/auto/` files, use this frontmatter format: - ```markdown + For `memory/auto/` files, use this frontmatter format. `confidence` and `revisit_if` are persisted on every create or update; the `resolved_at` / `resolution_basis` / `do_not_reopen_before` trio is optional and only added when Phase B.5 resolves a contradiction touching this file. + ```yaml --- name: topic-slug description: One-line description used for relevance matching in future sessions type: user|project|reference|feedback + confidence: 0.9 # 0.0-1.0, matches Phase B scoring + revisit_if: "Ninja decides to move" # semantic trigger, like ADR Revisit-if; "Never" valid + # Optional, added when resolved by Phase B.5: + # resolved_at: 2026-05-18 + # resolution_basis: "diary 2026-05-15 §3 explicit user statement" + # do_not_reopen_before: 2026-08-18 --- Body content here. For feedback/project types, include **Why:** and **How to apply:** sections. ``` + `revisit_if` is free-text. Useful phrasings: a concrete user-action trigger ("Ninja switches editors"), a date ("after 2026-09-01"), or `"Never"` for facts that are stable by nature (e.g. timezone). `confidence` mirrors the Phase B scoring rubric (1.0 / 0.9 / 0.7 / 0.5 / discarded below 0.5). + 3. **After editing MEMORY.md:** ```bash bash "${CLAUDE_SKILL_DIR}/scripts/safe-edit.sh" verify "${CLAUDE_PROJECT_DIR}/MEMORY.md" @@ -167,15 +228,18 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi - What was learned or confirmed - What memory changes were made (and why) - Items noted for manual curation (confidence 0.5–0.9) + - Lint findings from Phase B.5: candidates considered, contradictions detected, auto-resolves applied, items deferred to Pending Review - Any errors or partial failures encountered If a diary file for today already exists, append a new section with a timestamp header. 
+ **Parseable prefix.** New diary sections written from now on use this header line so a recent-activity log can be grepped: `## [YYYY-MM-DD HH:MM] consolidation | `. A consumer can run `grep "^## \[" memory/diary/*.md | tail -10` to see recent consolidations at a glance. **This change is forward-only** — do not rewrite existing diary headers; only newly written sections use the parseable prefix. + Format: ```markdown # Diary — YYYY-MM-DD - ## Consolidation at HH:MM + ## [YYYY-MM-DD HH:MM] consolidation | ### Sessions Reviewed - [topic]: brief description of what was discussed @@ -183,6 +247,9 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ### Memory Changes - Created/Updated memory/auto/filename.md — reason + ### Lint (Phase B.5) + - Candidates: N, contradictions: N, auto-resolved: N, pending added: N + ### Noted for Review - [confidence 0.7] Possible insight — context @@ -190,12 +257,26 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi - Any errors encountered during processing ``` -2. **Release consolidation lock:** +2. **Update workspace `MEMORY.md` "Pending Review" section.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. + - If the `pending_review` accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet: `- — file-A vs file-B — `. + - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. + - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. + - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow. + +3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. 
The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line: + ```json + {"date":"YYYY-MM-DD","candidates_found":N,"contradictions_detected":N,"auto_resolved":N,"pending_added":N,"pending_total":N,"avg_age_days":N} + ``` + - `pending_total` is the total bullet count remaining in `MEMORY.md`'s "Pending Review" section after this run's writes. + - `avg_age_days` is the mean age in days of all current pending bullets (use `detected_at` for the age basis); if `pending_total == 0`, write `0`. + - Append-only — never rewrite earlier lines. + +4. **Release consolidation lock:** ```bash bash "${CLAUDE_SKILL_DIR}/scripts/lock.sh" release "${CLAUDE_PROJECT_DIR}/.consolidation.lock" "" ``` -3. **Output NO_REPLY** — this skill runs silently, never sends messages to chat. +5. **Output NO_REPLY** — this skill runs silently, never sends messages to chat. ## Error Handling diff --git a/docs/plans/2026-05-17-memory-lint-trial.md b/docs/plans/2026-05-17-memory-lint-trial.md new file mode 100644 index 0000000..48faac5 --- /dev/null +++ b/docs/plans/2026-05-17-memory-lint-trial.md @@ -0,0 +1,153 @@ +# Memory-Lint Phase B.5 Trial — Provenance Frontmatter + Contradiction Detection + Surfacing Rule + +## Goal + +30-day trial (2026-05-17 → 2026-06-17) of automated cross-file contradiction detection in memory, with proactive in-conversation surfacing. Adds to existing `memory-consolidation` skill: new Phase B.5 (lint pass), expanded Phase C (frontmatter persistence), expanded Phase D (Pending Review section in workspace MEMORY.md + stats file). Adds platform rule requiring the agent to surface pending items in conversation. + +Feature-flagged for instant rollback. Anti-loop fields prevent re-triggering same contradiction nightly. Auto-resolve uses `evidence > confidence > recency` hierarchy (codex-recommended). 
+ +References upstream: ADR-069, beads workspace-txyu, [Karpathy LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) (abstract — no algorithm). Codex provided concrete algorithm. + +## Validation Commands + +```bash +grep -q 'LINT_PHASE_B5_ENABLED' .claude/skills/memory-consolidation/SKILL.md && \ +grep -q '### Phase B.5' .claude/skills/memory-consolidation/SKILL.md && \ +grep -q 'evidence > confidence > recency' .claude/skills/memory-consolidation/SKILL.md && \ +grep -q 'resolved_at\|do_not_reopen_before' .claude/skills/memory-consolidation/SKILL.md && \ +grep -q '## Surfacing pending lint items' .claude/rules/platform/memory-protocol.md && \ +echo "All checks passed" +``` + +## Reference: Current `memory-consolidation` skill + +File: `.claude/skills/memory-consolidation/SKILL.md`. + +Existing phases: +- Phase 0: Validate (locks, dirs) +- Phase A: Gather sessions (last 48h) +- Phase B: Diff & Score — confidence scoring (1.0/0.9/0.7/0.5), supersession check on new vs existing +- Phase C: Apply changes (mutation limit 5/run, safe-edit with rollback) +- Phase D: Diary digest + release lock + +Existing contradiction handling: Phase B step 1 — when ingesting new info, LLM checks if it contradicts existing memory, newer-info supersedes. NO cross-file scan between existing files. + +Existing memory file frontmatter: +```yaml +--- +name: +description: +type: user|project|reference|feedback +--- +``` + +## Reference: Codex algorithm for contradiction detection + +**Cheap candidate generation FIRST** (don't blindly LLM-judge all pairs): +- Same `type` filter +- Overlapping entities (filename tokens, title words, frontmatter tags) +- Normalized predicate phrases: `prefers`, `uses`, `hates`, `requires`, `do not` +- Negation/opposition markers: `not`, `never`, `avoid`, `instead`, changed values + +Only candidate bundles → LLM judgment. With 40 files, false positives are the cost concern, not compute. + +**Auto-resolve hierarchy:** +1. 
Direct diary/session evidence beats inferred +2. Else higher confidence wins if `Δ confidence >= 0.2` +3. Else newer evidence-date wins if `Δ >= 30 days` +4. Else flag — do not edit + +**Anti-loop fields** (added per memory file when resolved): +- `resolved_at: ` +- `resolution_basis: ""` +- `do_not_reopen_before: ` + +**Time-scoped changes are NOT contradictions** ("used X then, uses Y now" is evolution). + +## Tasks + +### Task 1: Add feature flag, Phase B.5 lint, frontmatter persistence, and stats file to `memory-consolidation/SKILL.md` + +The skill must gain a feature flag at top, a new Phase B.5 (after Phase B, before Phase C) that scans existing memory files for cross-file contradictions, an updated Phase C that persists `confidence` and `revisit_if` in frontmatter, and an updated Phase D that writes a "Pending Review" section to workspace `MEMORY.md` and appends a structured line to `memory/lint-stats.jsonl`. Diary entries gain a parseable prefix. + +What we want: + +- **Feature flag** at the top of SKILL.md (before "Context"): + ```markdown + ## Feature Flags + + - `LINT_PHASE_B5_ENABLED=true` — Trial: 2026-05-17 → 2026-06-17. When false, skip Phase B.5 entirely. Rollback = flip to false (no data migration). See ADR-069. + ``` + When false, the skill executes Phases 0/A/B/C/D as before, no lint, no Pending Review writes, no stats file appends. + +- **Phase B.5 inserted between Phase B and Phase C.** Steps: + 1. Iterate `memory/auto/*.md` and build a lightweight in-memory representation: `{file, type, name, tags, title_tokens, claim_phrases}` extracted from frontmatter and body. Claim extraction uses bullet/paragraph splits. + 2. Candidate generation: for each pair of files, only proceed if at least two of these match — same `type` field, overlapping `title_tokens`, overlapping `tags`, or matching normalized predicate ("prefers", "uses", "hates", "requires", "do not"). Files with `do_not_reopen_before` later than today are skipped entirely. + 3. 
For each candidate pair, ask the LLM (in-skill prompt) one question: "Do these two claims contradict, or is one time-scoped evolution of the other?" Return: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds. + 4. For each detected contradiction, attempt auto-resolve using hierarchy: (a) direct diary/session evidence in last 48h wins over inferred; (b) higher confidence wins if delta >= 0.2; (c) newer evidence-date wins if delta >= 30 days; (d) otherwise flag for review. + 5. Auto-resolved: edit the losing file to either remove the contradicting claim or mark it superseded. **Never silent-delete** — always replace with a `(superseded: ...)` annotation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before` to BOTH files' frontmatter (anti-loop). + 6. Flagged unresolved: add an entry to a `pending_review` accumulator (used in Phase D). + 7. Respect mutation limit from Phase C (5 per run total across B.5 and C combined). + +- **Phase C** must now persist `confidence` (existing 0.0-1.0 float from Phase B scoring) and `revisit_if` (semantic trigger string, free-text — see ADR-style examples) in the frontmatter when creating or updating files. Update the frontmatter format documentation block in SKILL.md accordingly: + ```yaml + --- + name: topic-slug + description: One-line description + type: user|project|reference|feedback + confidence: 0.9 # 0.0-1.0, matches Phase B scoring + revisit_if: "Ninja decides to move" # semantic trigger, like ADR Revisit-if; "Never" valid + # Optional, added when resolved by Phase B.5: + # resolved_at: 2026-05-18 + # resolution_basis: "diary 2026-05-15 §3 explicit user statement" + # do_not_reopen_before: 2026-08-18 + --- + ``` + +- **Phase D** must: + 1. Update workspace `MEMORY.md` with a `## Pending Review (Lint findings)` section listing unresolved items, one bullet per item with file references and reason. If accumulator is empty, the section must be ABSENT from MEMORY.md (do not leave an empty heading). 
When resolving an item, the corresponding bullet is removed; when the last bullet is removed, the section itself is removed. + 2. Append one structured JSON line to `memory/lint-stats.jsonl`: + ```json + {"date":"YYYY-MM-DD","candidates_found":N,"contradictions_detected":N,"auto_resolved":N,"pending_added":N,"pending_total":N,"avg_age_days":N} + ``` + 3. Diary entry format gains a parseable prefix: `## [YYYY-MM-DD HH:MM] consolidation | `. This allows `grep "^## \[" memory/diary/*.md | tail -10` to retrieve a recent-activity log. Apply forward-only — do not rewrite existing diary entries. + 4. Continue to follow "Silent operation: never send messages to any chat" (unchanged — no push notifications). + +- **`memory/lint-stats.jsonl` creation**: file is created on first run if absent; subsequent runs append. Format is strict JSON-per-line, parseable by Python's `json.loads` per line. + +- All existing safety mechanisms preserved: lock checks, mutation limit, safe-edit with rollback, never modify CLAUDE.md/USER.md/IDENTITY.md. 
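
For illustration only, and not part of the patch: the auto-resolve hierarchy in Phase B.5 step 4 (`evidence > confidence > recency`) can be sketched in Python roughly as follows. The `Claim` dataclass, its field names, and the `auto_resolve` function are hypothetical; the plan does not prescribe a data model. The sketch assumes each side of a contradiction has already been reduced to a confidence value, a direct-evidence flag, and an optional evidence date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Claim:
    """Hypothetical view of one side of a detected contradiction."""
    file: str
    confidence: float                     # frontmatter `confidence`
    has_direct_evidence: bool             # cited in the last 48h of diary/session entries
    evidence_date: Optional[date] = None  # may be missing


def auto_resolve(a: Claim, b: Claim) -> Tuple[Optional[Claim], str]:
    """Return (winner, basis). winner is None when the pair must be flagged."""
    # (a) Direct evidence beats inferred, but only if exactly one side has it.
    if a.has_direct_evidence != b.has_direct_evidence:
        return (a if a.has_direct_evidence else b), "evidence"

    # (b) Higher confidence wins when the gap is at least 0.2.
    # Rounding guards against float noise: 0.7 - 0.5 is slightly below 0.2.
    if round(abs(a.confidence - b.confidence), 6) >= 0.2:
        return (a if a.confidence > b.confidence else b), "confidence"

    # (c) Newer evidence date wins when both dates exist and differ by >= 30 days.
    if a.evidence_date and b.evidence_date:
        if abs((a.evidence_date - b.evidence_date).days) >= 30:
            return (a if a.evidence_date > b.evidence_date else b), "recency"

    # (d) Otherwise flag for review; neither file is edited.
    return None, "flag"
```

The rounding in rule (b) is a design choice of this sketch, not of the plan: without it, a 0.7 versus 0.5 pair would fall through to the recency rule because of binary floating-point representation.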
+ +- [x] `.claude/skills/memory-consolidation/SKILL.md` contains `LINT_PHASE_B5_ENABLED` feature flag at the top +- [x] SKILL.md contains a `### Phase B.5` section between Phase B and Phase C +- [x] Phase B.5 documents candidate generation, LLM judgment, auto-resolve hierarchy (`evidence > confidence > recency`), and anti-loop fields +- [x] Phase B.5 explicitly excludes time-scoped changes from being treated as contradictions +- [x] Phase B.5 documents "never silent-delete" — losing claim is replaced with `(superseded: ...)` annotation +- [x] Phase C frontmatter format documented in SKILL.md now includes `confidence` and `revisit_if` fields (with `resolved_at`, `resolution_basis`, `do_not_reopen_before` as optional) +- [x] Phase D documents the workspace `MEMORY.md` "Pending Review" section format and its add/remove rules +- [x] Phase D documents `memory/lint-stats.jsonl` format with one JSON line per run +- [x] Phase D diary format documented to use `## [YYYY-MM-DD HH:MM] consolidation | ...` parseable prefix +- [x] Phase D notes the parseable-prefix change is forward-only (existing diary entries unchanged) +- [x] "Silent operation" line in SKILL.md is unchanged (no push notifications added) +- [x] Existing safety lines preserved: lock-check, mutation-limit, safe-edit, "Never modify CLAUDE.md, USER.md, or IDENTITY.md" + +### Task 2: Add "Surfacing pending lint items" section to `.claude/rules/platform/memory-protocol.md` + +The platform memory-protocol rule must gain a new section requiring the agent to proactively surface pending lint items during conversation. Without this rule, the agent has no behavioral reason to mention them — and the user has stated explicitly they will not proactively ask. + +What we want: + +- New section `## Surfacing pending lint items` added to `.claude/rules/platform/memory-protocol.md`, placed coherently within existing structure (after "Auto-load mechanism" if present, otherwise after the initial storage-locations section). 
+- Section explains: when workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` section, the agent MUST proactively surface items in the current conversation. +- Section gives the surfacing strategy in concrete rules: + - **Preferred trigger**: when the conversation's topic relates to a pending item, bring it up inline as part of the relevant answer ("Кстати, есть unresolved contradiction про X — ..."). Topic-related = the agent's natural assessment, not a strict keyword match. + - **Aged escalation**: if a pending item is older than 14 days AND no topic-relevant opportunity has arisen, surface it at a natural pause or task end. + - **One per session max**: never dump multiple items in one message. Pick the most relevant or oldest. + - **Never interrupt urgency**: if the user is mid-urgent-task, do not derail — wait for natural break. + - **After resolution**: update the contradicting memory file(s) with the resolved value, then remove the bullet from the MEMORY.md "Pending Review" section in the same operation. Add `resolved_at` / `resolution_basis` / `do_not_reopen_before` per the consolidation skill's pattern. +- Section references ADR-069 and beads `workspace-txyu` for trial context. 
+ +- [ ] `.claude/rules/platform/memory-protocol.md` contains a section titled `## Surfacing pending lint items` +- [ ] Section explicitly states "MUST" requirement to proactively surface +- [ ] Section lists at least: preferred trigger (topic-related), aged escalation (>14 days), one-per-session max, no-interrupt-urgency, after-resolution actions +- [ ] Section references ADR-069 for context +- [ ] No existing content in `memory-protocol.md` is removed or contradicted (additive change) From 32e5bd11360c24880d2dea7598095ba099a54047 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 18:01:26 +0300 Subject: [PATCH 02/21] feat: memory-protocol "Surfacing pending lint items" rule (Task 2) Adds platform rule requiring the agent to proactively surface pending Phase B.5 lint contradictions in conversation: topic-related inline trigger, 14-day aged escalation, one-per-session cap, no-interrupt-urgency, and after-resolution cleanup that updates both the affected memory file and the workspace MEMORY.md "Pending Review" section. References ADR-069. Co-Authored-By: Claude Opus 4.7 --- .claude/rules/platform/memory-protocol.md | 12 ++++++++++++ docs/plans/2026-05-17-memory-lint-trial.md | 10 +++++----- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index 2fe2dc8..af9a750 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -14,6 +14,18 @@ The workaround comes from the issue thread (see also [#36636](https://github.com **Do not remove the `@MEMORY.md` line from `CLAUDE.md`.** Without it, workspace `MEMORY.md` exists on disk but never enters the agent's initial context — your memory index becomes invisible to the agent. +## Surfacing pending lint items + +When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` section, the agent **MUST** proactively surface those items in the current conversation. 
Ninja has stated explicitly that he will not proactively ask — without this rule, the items rot indefinitely. Context: ADR-069 (memory-lint Phase B.5 trial 2026-05-17 → 2026-06-17, beads `workspace-txyu`). + +### Surfacing strategy + +- **Preferred trigger (topic-related):** when the current conversation's topic naturally relates to a pending item, bring it up inline as part of the relevant answer (e.g., "Кстати, есть unresolved contradiction про X — какой из вариантов актуален?"). Topic-relatedness is the agent's own judgment, not a strict keyword match — err on the side of mentioning when there is a plausible connection. +- **Aged escalation (>14 days):** if a pending item is older than 14 days AND no topic-relevant opportunity has arisen during the session, surface it at a natural pause or at task end. Do not let aged items sit silent indefinitely. +- **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. +- **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. +- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, and `do_not_reopen_before` to the frontmatter of the affected file(s), matching the `memory-consolidation` skill's anti-loop pattern. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). 
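
For illustration only, and not part of the patch: one possible reading of the surfacing rules above, sketched in Python. Everything here is hypothetical (`PendingItem`, `pick_item_to_surface`, the `topic_related` flag). Topic-relatedness is the agent's own judgment, so the sketch takes it as an input rather than computing it, and it assumes that when no item is topic-related, only an item older than 14 days is surfaced.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

AGED_DAYS = 14  # aged-escalation threshold from the rule


@dataclass
class PendingItem:
    summary: str
    detected_at: date
    topic_related: bool  # the agent's own judgment, supplied by the caller


def pick_item_to_surface(
    items: List[PendingItem],
    today: date,
    urgent: bool,
    already_surfaced: bool,
) -> Optional[PendingItem]:
    """Pick at most one pending item to mention in this session."""
    # Never interrupt urgent work; never surface more than one item per session.
    if urgent or already_surfaced or not items:
        return None

    # Preferred trigger: a topic-related item (oldest first as a tie-break).
    related = [i for i in items if i.topic_related]
    if related:
        return min(related, key=lambda i: i.detected_at)

    # Aged escalation: otherwise the oldest item, if it is older than 14 days.
    aged = [i for i in items if (today - i.detected_at).days > AGED_DAYS]
    return min(aged, key=lambda i: i.detected_at) if aged else None
```

Returning `None` covers both "wait for a natural break" and "nothing worth raising yet"; the caller decides when the natural pause occurs.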
+ ## What goes WHERE: rules vs memory ### Rule (`.claude/rules/custom/`) diff --git a/docs/plans/2026-05-17-memory-lint-trial.md b/docs/plans/2026-05-17-memory-lint-trial.md index 48faac5..57383ff 100644 --- a/docs/plans/2026-05-17-memory-lint-trial.md +++ b/docs/plans/2026-05-17-memory-lint-trial.md @@ -146,8 +146,8 @@ What we want: - **After resolution**: update the contradicting memory file(s) with the resolved value, then remove the bullet from the MEMORY.md "Pending Review" section in the same operation. Add `resolved_at` / `resolution_basis` / `do_not_reopen_before` per the consolidation skill's pattern. - Section references ADR-069 and beads `workspace-txyu` for trial context. -- [ ] `.claude/rules/platform/memory-protocol.md` contains a section titled `## Surfacing pending lint items` -- [ ] Section explicitly states "MUST" requirement to proactively surface -- [ ] Section lists at least: preferred trigger (topic-related), aged escalation (>14 days), one-per-session max, no-interrupt-urgency, after-resolution actions -- [ ] Section references ADR-069 for context -- [ ] No existing content in `memory-protocol.md` is removed or contradicted (additive change) +- [x] `.claude/rules/platform/memory-protocol.md` contains a section titled `## Surfacing pending lint items` +- [x] Section explicitly states "MUST" requirement to proactively surface +- [x] Section lists at least: preferred trigger (topic-related), aged escalation (>14 days), one-per-session max, no-interrupt-urgency, after-resolution actions +- [x] Section references ADR-069 for context +- [x] No existing content in `memory-protocol.md` is removed or contradicted (additive change) From e4c07093a856bf5f14dd17372e139b1abd2cd95b Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 18:09:59 +0300 Subject: [PATCH 03/21] fix: address Phase B.5 spec defects from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Substantive corrections to the 
memory-consolidation SKILL.md spec and the companion memory-protocol rule, surfaced by the post-implementation review: - Mutation budget: Phase C re-zeroed mutations_applied, contradicting Phase B.5's "shared budget of 5". Initialize the counter in Phase B.5 and have Phase C continue it; explicitly skip Phase C if Phase B.5 already spent it. - Step 4c referenced evidence_date / updated_at — fields the frontmatter schema never persists. Replaced with resolved_at (the field that actually exists) and acknowledged the rule mostly defers to flag-for-review. - Step 4b: handle legacy files lacking the confidence field (default 0.7). - Per-pair anti-loop: do_not_reopen_before now scopes to a specific partner via the new do_not_reopen_partner field, rather than excluding the file wholesale from all future scans (which would silently hide unrelated third-party contradictions). - Step 5: edits now MUST use safe-edit.sh, same backup/verify/rollback flow as Phase C. Previously unprotected. - Picked one annotation style (trailing parenthetical) instead of offering "strikethrough or parenthetical" as alternatives. - Phase C frontmatter template no longer inlines commented optional fields — those got copied verbatim into new files when the LLM transcribed the template literally. Moved to prose with explicit "do not include". - Dropped the unused claim_phrases accumulator from step 1's representation. - Pending Review bullets now use a strict, machine-parseable format (detected_at=YYYY-MM-DD prefix) so pending_total and avg_age_days math has a defined parser. Added deduplication and section-removal scoping. - Sanitization rules added for free-text fields (resolution_basis, revisit_if, reason bullets): single-line, ≤200 chars, no leading #, escape ". - Auto-resolve mutation count: 1 per resolved pair (not 2 per file edit). - LLM judgment errors now treated as `unrelated` and logged. Plan validation greps still pass. 
Co-Authored-By: Claude Opus 4.7 --- .claude/rules/platform/memory-protocol.md | 2 +- .claude/skills/memory-consolidation/SKILL.md | 52 +++++++++++--------- 2 files changed, 29 insertions(+), 25 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index af9a750..d697c6b 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -24,7 +24,7 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **Aged escalation (>14 days):** if a pending item is older than 14 days AND no topic-relevant opportunity has arisen during the session, surface it at a natural pause or at task end. Do not let aged items sit silent indefinitely. - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. - **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. -- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, and `do_not_reopen_before` to the frontmatter of the affected file(s), matching the `memory-consolidation` skill's anti-loop pattern. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). 
+- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partner` (filename of the other file in the resolved pair) to the frontmatter of the affected file(s), matching the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). ## What goes WHERE: rules vs memory diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index f7cfea1..78758dd 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -117,9 +117,9 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi Cross-file scan of existing `memory/auto/*.md` files for contradictions that the per-fact Phase B check cannot catch (Phase B only compares new vs existing, not existing vs existing). -Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. +Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). -1. 
**Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, claim_phrases}`. Source the fields from frontmatter (`type`, `name`, optional `tags`) and the body. Tokenize the `name` slug and the first heading into `title_tokens`. Split the body on bullet boundaries and paragraph breaks to populate `claim_phrases`. Files whose frontmatter contains `do_not_reopen_before` later than today are excluded from the scan entirely. +1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, do_not_reopen_before, do_not_reopen_partner}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop pair). Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. 2. **Cheap candidate generation FIRST** — do not blindly LLM-judge all `O(n^2)` pairs. For each unordered pair `(A, B)`, count matches across these signals: - same `type` field @@ -128,27 +128,32 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, - matching normalized predicate phrase in both files: one of `prefers`, `uses`, `hates`, `requires`, `do not`, `never`, `avoid` - negation/opposition markers: `not`, `never`, `avoid`, `instead`, or numerically changed value targeting the same entity - A pair is a **candidate** only if at least two of the above signals match. Increment `candidates_found` for each candidate pair. With ~40 files, false positives are the primary concern; this filter keeps LLM calls bounded. + A pair is a **candidate** only if at least two of the above signals match. Each signal contributes at most 1 to the match count regardless of how many tokens/tags overlap. Increment `candidates_found` for each candidate pair. 
With ~40 files, false positives are the primary concern; this filter keeps LLM calls bounded. -3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" Allowed answers: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. Increment `contradictions_detected` for each `contradiction` verdict. + **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file has frontmatter `do_not_reopen_partner` naming the other AND `do_not_reopen_before` later than today. Exclusion is per-pair, not per-file: a file may still be paired against any other unrelated file. Non-date `do_not_reopen_before` values (semantic conditions) are treated as "always future" — skip the pair until Ninja manually clears the field. + +3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" Allowed answers: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. 4. **Auto-resolve hierarchy** (apply in order, stop at first match): the rule is `evidence > confidence > recency`. a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. - b. **Higher confidence wins** if `|confidence_A − confidence_B| >= 0.2`. - c. 
**Newer evidence-date wins** if both sides have an `evidence_date` (or frontmatter `updated_at` / `resolved_at`) and the delta is `>= 30 days`. + b. **Higher confidence wins** if both sides have a `confidence` field and `|confidence_A − confidence_B| >= 0.2`. If either side lacks `confidence` (legacy files predating the schema), treat it as `0.7` for this comparison only. + c. **Newer `resolved_at` wins** if both sides carry `resolved_at` from a prior Phase B.5 cycle and the delta is `>= 30 days`. Files freshly judged this run usually lack `resolved_at` — in that case this rule does not fire and we fall through to (d). d. **Otherwise flag for review** — do NOT edit either file. Append `{files:[A,B], reason, detected_at:YYYY-MM-DD}` to `pending_review` and increment `pending_added`. 5. **Apply auto-resolved edits.** For each auto-resolved pair: - - **Never silent-delete.** Edit the losing file to replace the contradicting claim with a `(superseded: )` annotation. The losing claim text remains visible as a strikethrough or parenthetical so audit history is preserved. + - **Edits MUST use `safe-edit.sh`.** Same `backup` / `verify` / `rollback` / `clean` flow as Phase C, applied to each `memory/auto/*.md` file edited in this step. If `verify` fails after either edit, `rollback` and route the pair to `pending_review` with the reason `(deferred: edit verify failed)`. + - **Never silent-delete.** Append ` (superseded YYYY-MM-DD: )` to the losing claim's line — do NOT delete the original text. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - Add anti-loop fields to BOTH files' frontmatter: ```yaml resolved_at: YYYY-MM-DD - resolution_basis: "" + resolution_basis: "" do_not_reopen_before: YYYY-MM-DD # or semantic condition like "Ninja revisits topic X" + do_not_reopen_partner: # filename only, no path — names the file this resolution paired against ``` - - Increment `auto_resolved`. 
+ Free-text values (`resolution_basis`) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"`. + - Increment `auto_resolved` and `mutations_applied` by 1 per resolved **pair** (the two file edits count as one logical resolution for budget purposes). -6. **Mutation limit is shared with Phase C.** Phase B.5 edits count against the same per-run budget of 5 mutations. If `mutations_applied >= 5` mid-way through Phase B.5, stop applying further auto-resolves; remaining detections go to `pending_review` as `(deferred: mutation limit reached)`. +6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. If `mutations_applied >= 5` mid-way through Phase B.5, stop applying further auto-resolves; remaining detections go to `pending_review` with the reason `(deferred: mutation limit reached)`. 7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, and `pending_review` to Phase D for stats and Pending Review writes. @@ -160,10 +165,10 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ### Phase C: Apply Changes -**Mutation limit: 5 per run.** Each file creation or modification counts as one mutation. +**Mutation limit: 5 per run, shared with Phase B.5.** Each file creation or modification counts as one mutation. Phase B.5 may have already consumed part of this budget — do NOT re-initialize `mutations_applied` here. If `mutations_applied >= 5` on entry to Phase C, skip Phase C mutations entirely and proceed to Phase D. If any mutation fails, stop further mutations immediately (stop-on-failure). -Track: `mutations_applied = 0`, `mutations_failed = 0`. +Track: `mutations_failed = 0` (continue using `mutations_applied` from Phase B.5; if Phase B.5 was skipped via the feature flag, initialize `mutations_applied = 0` here). 
For each approved change (confidence >= 0.9), in priority order (updates before creates): @@ -174,24 +179,22 @@ For each approved change (confidence >= 0.9), in priority order (updates before 2. **Apply the edit** — update existing `memory/auto/` file, create new one, or update `MEMORY.md` index. - For `memory/auto/` files, use this frontmatter format. `confidence` and `revisit_if` are persisted on every create or update; the `resolved_at` / `resolution_basis` / `do_not_reopen_before` trio is optional and only added when Phase B.5 resolves a contradiction touching this file. + For `memory/auto/` files, use this base frontmatter format on every create or update. Persist `confidence` and `revisit_if` on every write. Do NOT include the optional Phase B.5 fields unless they actually apply (do not copy commented-out lines from the template into new files). ```yaml --- name: topic-slug description: One-line description used for relevance matching in future sessions type: user|project|reference|feedback - confidence: 0.9 # 0.0-1.0, matches Phase B scoring - revisit_if: "Ninja decides to move" # semantic trigger, like ADR Revisit-if; "Never" valid - # Optional, added when resolved by Phase B.5: - # resolved_at: 2026-05-18 - # resolution_basis: "diary 2026-05-15 §3 explicit user statement" - # do_not_reopen_before: 2026-08-18 + confidence: 0.9 + revisit_if: "Ninja decides to move" --- Body content here. For feedback/project types, include **Why:** and **How to apply:** sections. ``` - `revisit_if` is free-text. Useful phrasings: a concrete user-action trigger ("Ninja switches editors"), a date ("after 2026-09-01"), or `"Never"` for facts that are stable by nature (e.g. timezone). `confidence` mirrors the Phase B scoring rubric (1.0 / 0.9 / 0.7 / 0.5 / discarded below 0.5). 
+ When Phase B.5 resolves a contradiction touching this file, additionally write the anti-loop trio (`resolved_at`, `resolution_basis`, `do_not_reopen_before`) plus `do_not_reopen_partner` — see Phase B.5 step 5 for the exact YAML and sanitization rules. These fields are absent from files that have never participated in a resolved contradiction. + + `revisit_if` is free-text and must be a single line. Useful phrasings: a concrete user-action trigger ("Ninja switches editors"), a date ("after 2026-09-01"), or `"Never"` for facts that are stable by nature (e.g. timezone). Apply the same sanitization as `resolution_basis` (max 200 chars, no embedded newlines, no leading `#`, escape `"` as `\"`). If Phase B's scoring did not yield a semantic trigger, default to `"Never"`. `confidence` mirrors the Phase B scoring rubric (1.0 / 0.9 / 0.7 / 0.5 / discarded below 0.5). The `revisit_if` field is written for human/agent inspection during interactive sessions; this skill does not read it back. 3. **After editing MEMORY.md:** ```bash @@ -258,17 +261,18 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ``` 2. **Update workspace `MEMORY.md` "Pending Review" section.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. - - If the `pending_review` accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet: `- — file-A vs file-B — `. + - If the `pending_review` accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet in this strict, machine-parseable format (parser regex `^- detected_at=\d{4}-\d{2}-\d{2} `): `- detected_at=YYYY-MM-DD — file-A vs file-B — `. Sanitize `` the same way as `resolution_basis` (strip leading `#`, collapse newlines to `; `, truncate to 200 chars). 
+ - Before appending a new bullet, deduplicate: if a bullet for the same unordered `(file-A, file-B)` pair already exists in the section, do NOT append again. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. + - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow. 3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line: ```json {"date":"YYYY-MM-DD","candidates_found":N,"contradictions_detected":N,"auto_resolved":N,"pending_added":N,"pending_total":N,"avg_age_days":N} ``` - - `pending_total` is the total bullet count remaining in `MEMORY.md`'s "Pending Review" section after this run's writes. - - `avg_age_days` is the mean age in days of all current pending bullets (use `detected_at` for the age basis); if `pending_total == 0`, write `0`. + - `pending_total` counts bullets in `MEMORY.md`'s "Pending Review (Lint findings)" section matching the regex `^- detected_at=\d{4}-\d{2}-\d{2} ` between the section heading and the next `## ` heading (or EOF), measured after this run's writes. 
+ - `avg_age_days` is the mean age in days of all current pending bullets — parse the `detected_at=YYYY-MM-DD` field of each bullet via the same regex. If `pending_total == 0`, write `0`. If any bullet's `detected_at` fails to parse, count it with age `0` and log the malformed bullet in the diary Issues section. - Append-only — never rewrite earlier lines. 4. **Release consolidation lock:** From 3fdae7de05a91ea5b00a3a877107344dc45521da Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 18:17:15 +0300 Subject: [PATCH 04/21] fix: address Phase B.5 second review pass findings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Convert do_not_reopen_partner (scalar) to do_not_reopen_partners (list) with APPEND-not-overwrite semantics. The scalar form silently broke the anti-loop guarantee when a file participated in more than one resolved contradiction over time — the later resolution overwrote the earlier partner record, allowing the prior pair to re-trigger. - Replace literal "N" placeholders in lint-stats.jsonl JSON template with explicit / angle-bracket form; add a jq validation step before append so malformed lines never corrupt the trial dataset. - Define avg_age_days formula explicitly (mean(today − detected_at); today is the same YYYY-MM-DD used in the same stats line's "date" field; round to 1 decimal). - Document SUSPICIOUS_SHRINK bypass for the Pending Review section removal case — without it, the cleanup edit silently rolls back once the section has grown to more than ~20% of MEMORY.md, leaving resolved bullets in place forever. - Update memory-protocol.md after-resolution rule to reference the new field name and document the append-not-overwrite requirement so human resolutions preserve the anti-loop chain. 
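
Reviewer note: the `pending_total` / `avg_age_days` definition in this commit (bullets matched by the `detected_at=` regex inside the Pending Review section, mean of `today - detected_at`, rounded to one decimal, `0` when nothing is pending, age `0` for unparseable dates) can be sketched as below. This is a hedged illustration, not the skill's implementation: the function name `pending_stats` is hypothetical, and logging of malformed bullets to the diary Issues section is left out.

```python
import re
from datetime import date

# Regex and section title are taken from the SKILL.md text in this patch.
BULLET_RE = re.compile(r"^- detected_at=(\d{4}-\d{2}-\d{2}) ")
SECTION_TITLE = "## Pending Review (Lint findings)"


def pending_stats(memory_md: str, today: date) -> tuple[int, float]:
    """Return (pending_total, avg_age_days) for the given MEMORY.md text."""
    ages = []
    in_section = False
    for line in memory_md.splitlines():
        # Any "## " heading either opens the target section or closes it.
        if line.startswith("## "):
            in_section = line.strip() == SECTION_TITLE
            continue
        if not in_section:
            continue
        match = BULLET_RE.match(line)
        if match is None:
            continue
        try:
            ages.append((today - date.fromisoformat(match.group(1))).days)
        except ValueError:
            # Date-shaped but invalid (e.g. month 13): count with age 0.
            ages.append(0)
    if not ages:
        return 0, 0
    return len(ages), round(sum(ages) / len(ages), 1)
```

Passing `today` in explicitly keeps it identical to the value written to the `date` field of the same stats line, which is what the formula requires.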
Co-Authored-By: Claude Opus 4.7 --- .claude/rules/platform/memory-protocol.md | 2 +- .claude/skills/memory-consolidation/SKILL.md | 20 +++++++++++--------- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index d697c6b..e325aca 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -24,7 +24,7 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **Aged escalation (>14 days):** if a pending item is older than 14 days AND no topic-relevant opportunity has arisen during the session, surface it at a natural pause or at task end. Do not let aged items sit silent indefinitely. - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. - **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. -- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partner` (filename of the other file in the resolved pair) to the frontmatter of the affected file(s), matching the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. 
If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). +- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partners` to the frontmatter of the affected file(s). `do_not_reopen_partners` is a YAML list — APPEND the other file's name (filename only, no path) to any existing list; do NOT overwrite prior entries, or the earlier pair's anti-loop guarantee is lost. The other three fields are scalars and reflect the most recent resolution. Match the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). ## What goes WHERE: rules vs memory diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 78758dd..995d96a 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -119,7 +119,7 @@ Cross-file scan of existing `memory/auto/*.md` files for contradictions that the Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. 
Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). -1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, do_not_reopen_before, do_not_reopen_partner}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop pair). Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. +1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, do_not_reopen_before, do_not_reopen_partners}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop fields). `do_not_reopen_partners` is read as a YAML list — if the field is absent treat as empty list; if it is a scalar (legacy single-value form) treat as a one-element list. Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. 2. **Cheap candidate generation FIRST** — do not blindly LLM-judge all `O(n^2)` pairs. For each unordered pair `(A, B)`, count matches across these signals: - same `type` field @@ -130,7 +130,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, A pair is a **candidate** only if at least two of the above signals match. Each signal contributes at most 1 to the match count regardless of how many tokens/tags overlap. Increment `candidates_found` for each candidate pair. With ~40 files, false positives are the primary concern; this filter keeps LLM calls bounded. 
- **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file has frontmatter `do_not_reopen_partner` naming the other AND `do_not_reopen_before` later than today. Exclusion is per-pair, not per-file: a file may still be paired against any other unrelated file. Non-date `do_not_reopen_before` values (semantic conditions) are treated as "always future" — skip the pair until Ninja manually clears the field. + **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file's `do_not_reopen_partners` list contains the other file's name AND that file's `do_not_reopen_before` is later than today. Exclusion is per-pair, not per-file: a file may still be paired against any other unrelated file. Non-date `do_not_reopen_before` values (semantic conditions) are treated as "always future" — skip the pair until Ninja manually clears the field. 3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" Allowed answers: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. @@ -143,12 +143,13 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, 5. **Apply auto-resolved edits.** For each auto-resolved pair: - **Edits MUST use `safe-edit.sh`.** Same `backup` / `verify` / `rollback` / `clean` flow as Phase C, applied to each `memory/auto/*.md` file edited in this step. If `verify` fails after either edit, `rollback` and route the pair to `pending_review` with the reason `(deferred: edit verify failed)`. 
- **Never silent-delete.** Append ` (superseded YYYY-MM-DD: )` to the losing claim's line — do NOT delete the original text. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - - Add anti-loop fields to BOTH files' frontmatter: + - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen_partners` is a **list** — APPEND the new partner's filename to it (dedupe if already present); never overwrite existing entries. The other three fields are scalars and reflect the most recent resolution. ```yaml resolved_at: YYYY-MM-DD resolution_basis: "" do_not_reopen_before: YYYY-MM-DD # or semantic condition like "Ninja revisits topic X" - do_not_reopen_partner: # filename only, no path — names the file this resolution paired against + do_not_reopen_partners: # accumulating list — append new partner; do not overwrite + - # filename only, no path ``` Free-text values (`resolution_basis`) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"`. - Increment `auto_resolved` and `mutations_applied` by 1 per resolved **pair** (the two file edits count as one logical resolution for budget purposes). @@ -192,7 +193,7 @@ For each approved change (confidence >= 0.9), in priority order (updates before Body content here. For feedback/project types, include **Why:** and **How to apply:** sections. ``` - When Phase B.5 resolves a contradiction touching this file, additionally write the anti-loop trio (`resolved_at`, `resolution_basis`, `do_not_reopen_before`) plus `do_not_reopen_partner` — see Phase B.5 step 5 for the exact YAML and sanitization rules. These fields are absent from files that have never participated in a resolved contradiction. 
+ When Phase B.5 resolves a contradiction touching this file, additionally write the anti-loop trio (`resolved_at`, `resolution_basis`, `do_not_reopen_before`) plus `do_not_reopen_partners` (list) — see Phase B.5 step 5 for the exact YAML and sanitization rules. These fields are absent from files that have never participated in a resolved contradiction. `revisit_if` is free-text and must be a single line. Useful phrasings: a concrete user-action trigger ("Ninja switches editors"), a date ("after 2026-09-01"), or `"Never"` for facts that are stable by nature (e.g. timezone). Apply the same sanitization as `resolution_basis` (max 200 chars, no embedded newlines, no leading `#`, escape `"` as `\"`). If Phase B's scoring did not yield a semantic trigger, default to `"Never"`. `confidence` mirrors the Phase B scoring rubric (1.0 / 0.9 / 0.7 / 0.5 / discarded below 0.5). The `revisit_if` field is written for human/agent inspection during interactive sessions; this skill does not read it back. @@ -265,14 +266,15 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi - Before appending a new bullet, deduplicate: if a bullet for the same unordered `(file-A, file-B)` pair already exists in the section, do NOT append again. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. - - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow. 
+ - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading and (b) the difference between backup and current size equals the size of the removed Pending Review section ± 5 bytes. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. -3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line: +3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line. Each angle-bracketed placeholder below is substituted with the actual value (`` becomes a literal integer like `7`, `` becomes a float like `4.5`): ```json - {"date":"YYYY-MM-DD","candidates_found":N,"contradictions_detected":N,"auto_resolved":N,"pending_added":N,"pending_total":N,"avg_age_days":N} + {"date":"","candidates_found":,"contradictions_detected":,"auto_resolved":,"pending_added":,"pending_total":,"avg_age_days":} ``` + - Before appending, validate the candidate line with `printf '%s' "$LINE" | jq -e . > /dev/null` — if validation fails, do NOT append; log the malformed line in the diary Issues section instead. 
- `pending_total` counts bullets in `MEMORY.md`'s "Pending Review (Lint findings)" section matching the regex `^- detected_at=\d{4}-\d{2}-\d{2} ` between the section heading and the next `## ` heading (or EOF), measured after this run's writes. - - `avg_age_days` is the mean age in days of all current pending bullets — parse the `detected_at=YYYY-MM-DD` field of each bullet via the same regex. If `pending_total == 0`, write `0`. If any bullet's `detected_at` fails to parse, count it with age `0` and log the malformed bullet in the diary Issues section. + - `avg_age_days` is the mean age in days of all current pending bullets, computed as `mean(today − detected_at)` where `today` is the same YYYY-MM-DD used in this run's `date` field, and `detected_at` is parsed from each bullet via the regex above. Round to one decimal place. If `pending_total == 0`, write `0`. If any bullet's `detected_at` fails to parse, count it with age `0` and log the malformed bullet in the diary Issues section. - Append-only — never rewrite earlier lines. 4. 
**Release consolidation lock:** From 0f4d30792844eba0e610caa9d170586e41337ce5 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 18:22:11 +0300 Subject: [PATCH 05/21] fix: address Phase B.5 third review pass findings - LLM judgment now returns {verdict, claim_a, claim_b} so step 5 has an exact-match anchor for the (superseded:) annotation; downgrade to unrelated if anchor doesn't appear in body - Sanitization rules extended to do_not_reopen_before when it carries a semantic condition (date form remains unquoted YAML) - Missing do_not_reopen_before with partner listed is now explicitly "exclusion inactive" rather than undefined - Mutation budget consistent: Phase B.5 counts each file edit (pair=2), matching Phase C's per-file rule; pre-check verifies remaining budget fits a full pair before starting - SUSPICIOUS_SHRINK bypass clarifies the section size MUST be measured from the backup before applying the edit, not live during the write - Phase D preamble pins pending_added to the post-dedup value so diary line and JSONL row agree Co-Authored-By: Claude Opus 4.7 --- .claude/skills/memory-consolidation/SKILL.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 995d96a..387480f 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -130,9 +130,9 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, A pair is a **candidate** only if at least two of the above signals match. Each signal contributes at most 1 to the match count regardless of how many tokens/tags overlap. Increment `candidates_found` for each candidate pair. With ~40 files, false positives are the primary concern; this filter keeps LLM calls bounded. 
- **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file's `do_not_reopen_partners` list contains the other file's name AND that file's `do_not_reopen_before` is later than today. Exclusion is per-pair, not per-file: a file may still be paired against any other unrelated file. Non-date `do_not_reopen_before` values (semantic conditions) are treated as "always future" — skip the pair until Ninja manually clears the field. + **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file's `do_not_reopen_partners` list contains the other file's name AND that file's `do_not_reopen_before` is later than today. Exclusion is per-pair, not per-file: a file may still be paired against any other unrelated file. Non-date `do_not_reopen_before` values (semantic conditions) are treated as "always future" — skip the pair until Ninja manually clears the field. If `do_not_reopen_partners` lists the partner but `do_not_reopen_before` is missing entirely (incomplete prior write), treat the exclusion as inactive and let the pair proceed to judgment — anti-loop requires both fields. -3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" Allowed answers: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. +3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" 
The LLM must return a structured response: `{verdict, claim_a, claim_b}` where `verdict` is `contradiction` | `evolution` | `unrelated`, and `claim_a` / `claim_b` are the single full body lines (verbatim, including leading bullet/heading markers if any) from each file that carry the contradicting claim. The `claim_a` / `claim_b` strings are used as exact match anchors in step 5; if either is empty or does not appear verbatim in the corresponding file body, downgrade the verdict to `unrelated` and log the mismatch in the Phase D diary Issues section. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. 4. **Auto-resolve hierarchy** (apply in order, stop at first match): the rule is `evidence > confidence > recency`. a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. @@ -142,7 +142,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, 5. **Apply auto-resolved edits.** For each auto-resolved pair: - **Edits MUST use `safe-edit.sh`.** Same `backup` / `verify` / `rollback` / `clean` flow as Phase C, applied to each `memory/auto/*.md` file edited in this step. If `verify` fails after either edit, `rollback` and route the pair to `pending_review` with the reason `(deferred: edit verify failed)`. - - **Never silent-delete.** Append ` (superseded YYYY-MM-DD: )` to the losing claim's line — do NOT delete the original text. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. 
+ - **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen_partners` is a **list** — APPEND the new partner's filename to it (dedupe if already present); never overwrite existing entries. The other three fields are scalars and reflect the most recent resolution. ```yaml resolved_at: YYYY-MM-DD @@ -151,10 +151,10 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, do_not_reopen_partners: # accumulating list — append new partner; do not overwrite - # filename only, no path ``` - Free-text values (`resolution_basis`) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"`. - - Increment `auto_resolved` and `mutations_applied` by 1 per resolved **pair** (the two file edits count as one logical resolution for budget purposes). + Free-text values (`resolution_basis`, and `do_not_reopen_before` when it carries a semantic condition rather than a `YYYY-MM-DD` date) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`. Date-form `do_not_reopen_before` values (matching `^\d{4}-\d{2}-\d{2}$`) are written unquoted as YAML dates. + - Increment `auto_resolved` by 1 per resolved pair. 
Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. -6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. If `mutations_applied >= 5` mid-way through Phase B.5, stop applying further auto-resolves; remaining detections go to `pending_review` with the reason `(deferred: mutation limit reached)`. +6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` with the reason `(deferred: mutation limit reached)`. 7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, and `pending_review` to Phase D for stats and Pending Review writes. @@ -224,6 +224,8 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ### Phase D: Report & Cleanup +Phase D substeps run in numerical order, with one exception: the MEMORY.md dedup pass in step 2 finalizes the value of `pending_added` (it may be lower than the count carried out of Phase B.5 if some bullets were already present). The diary "Lint" line in step 1 and the JSONL `pending_added` field in step 3 MUST both reference the same final post-dedup value, not the pre-dedup count. Compute step 2's dedup (or at minimum the dedup count) before emitting the diary line so the two outputs agree. + 1. **Write diary entry** to `memory/diary/YYYY-MM-DD.md` (using today's date). The diary is a narrative digest — write it as if reflecting on the day's conversations. 
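One way to keep the diary line and the JSONL row in agreement is to compute the dedup result once, before either output is emitted. The sketch below assumes `pending_review` entries carry a two-element `files` list as described in Phase B.5, and that the pairs already present in `MEMORY.md` have been parsed into tuples by the caller; the function names are hypothetical.

```python
def pair_key(file_a: str, file_b: str) -> frozenset:
    """Order-insensitive key so (A, B) and (B, A) collide."""
    return frozenset((file_a, file_b))


def dedupe_pending(existing_pairs, pending_review):
    """Return (entries_to_write, pending_added) after dedup.

    existing_pairs: iterable of (file_a, file_b) already listed in MEMORY.md.
    pending_review: list of dicts, each with a two-element "files" list.
    """
    seen = {pair_key(a, b) for a, b in existing_pairs}
    to_write = []
    for entry in pending_review:
        file_a, file_b = entry["files"]
        key = pair_key(file_a, file_b)
        if key in seen:
            continue  # already pending, or repeated within this run
        seen.add(key)
        to_write.append(entry)
    # pending_added is the post-dedup count used by both the diary and JSONL.
    return to_write, len(to_write)
```

Both the diary "Lint" line and the JSONL `pending_added` field would then read the second return value, so the two outputs cannot drift apart.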
@@ -266,7 +268,7 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi - Before appending a new bullet, deduplicate: if a bullet for the same unordered `(file-A, file-B)` pair already exists in the section, do NOT append again. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. - - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading and (b) the difference between backup and current size equals the size of the removed Pending Review section ± 5 bytes. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. + - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). 
In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured **from the backup file before applying the edit** (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `backup_size - current_size` ± 5 bytes. The runner MUST capture this measurement against the backup prior to writing the edit; do not measure live during/after the write. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. 3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line. Each angle-bracketed placeholder below is substituted with the actual value (`` becomes a literal integer like `7`, `` becomes a float like `4.5`): ```json From a8cc272e997358c7057cc38d9fb035fd6c517835 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 18:30:58 +0300 Subject: [PATCH 06/21] fix: address Phase B.5 fourth review pass findings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Feature flag default flips to enabled during 2026-05-17 → 2026-06-17 trial window (unset env var was previously treated as skip, leaving the trial dead-on-arrival without runtime env wiring). - Phase B.5 step 1 now extracts `body_predicates` and `negation_markers` from file bodies; step 2 candidate signals reference those extracted sets instead of demanding body content the rep didn't carry. - `do_not_reopen_partners` entries normalized to bare basename at read time and deduped, closing an anti-loop bypass when prior cycles wrote heterogeneous path formats. 
- Step 5 paired edits now use explicit two-phase commit semantics (backup BOTH, edit BOTH, verify BOTH, then clean BOTH) so a partial failure can't leave file A annotated while file B is untouched. - Step 5 anchor re-verify added: before applying each pair's edit, re-confirm `claim_a`/`claim_b` still appears verbatim in the current file body — a prior pair in the same run may have already annotated the same line. - Mutation budget accounting timing pinned: increments fire only after both files verify and clean successfully; rolled-back pairs do not consume budget. Co-Authored-By: Claude Opus 4.7 --- .claude/skills/memory-consolidation/SKILL.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 387480f..9cc9faa 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -113,20 +113,20 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ### Phase B.5: Cross-file Lint (contradiction detection) -**Gated by `LINT_PHASE_B5_ENABLED`.** If the feature flag is false (env var unset or `false`), skip this entire phase and proceed directly to Phase C. Skipped runs MUST NOT write to `memory/lint-stats.jsonl` or touch the workspace `MEMORY.md` "Pending Review" section. +**Gated by `LINT_PHASE_B5_ENABLED`.** During the 2026-05-17 → 2026-06-17 trial window the flag defaults to **enabled**: an unset env var is treated as `true`, and only an explicit `LINT_PHASE_B5_ENABLED=false` skips this entire phase. After 2026-06-17 the default flips back to disabled (unset = skip). This is the rollback path — flip to false (no data migration) to abort the trial early. Skipped runs MUST NOT write to `memory/lint-stats.jsonl` or touch the workspace `MEMORY.md` "Pending Review" section. 
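The date-dependent default can be sketched as a small pure function. This is an illustration under stated assumptions: the function name is hypothetical, an empty or unrecognized value is treated like an unset variable, the end date 2026-06-17 is taken as inclusive, and dates before the trial start fall back to disabled (the text does not specify that case).

```python
from datetime import date
from typing import Optional

TRIAL_START = date(2026, 5, 17)
TRIAL_END = date(2026, 6, 17)  # assumed inclusive: the default flips "after" this date


def lint_phase_enabled(env_value: Optional[str], today: date) -> bool:
    """Resolve LINT_PHASE_B5_ENABLED from the raw env value and today's date."""
    normalized = (env_value or "").strip().lower()
    if normalized == "false":
        return False  # explicit opt-out always skips Phase B.5
    if normalized == "true":
        return True   # explicit opt-in works inside or outside the trial
    # Unset (or unrecognized, by assumption): default depends on the trial window.
    return TRIAL_START <= today <= TRIAL_END
```

A runner would call it as `lint_phase_enabled(os.environ.get("LINT_PHASE_B5_ENABLED"), date.today())` after importing `os`; keeping the date as a parameter makes the window boundaries easy to test.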
Cross-file scan of existing `memory/auto/*.md` files for contradictions that the per-fact Phase B check cannot catch (Phase B only compares new vs existing, not existing vs existing). Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). -1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, do_not_reopen_before, do_not_reopen_partners}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop fields). `do_not_reopen_partners` is read as a YAML list — if the field is absent treat as empty list; if it is a scalar (legacy single-value form) treat as a one-element list. Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. +1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen_before, do_not_reopen_partners}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop fields). `do_not_reopen_partners` is read as a YAML list — if the field is absent treat as empty list; if it is a scalar (legacy single-value form) treat as a one-element list. Normalize each partner entry to its bare filename (apply `basename`, strip any directory prefix) and dedupe at read time so the anti-loop check in step 2 isn't bypassed by heterogeneous path formats across cycles. Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists) into `title_tokens`. 
Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires, do not, never, avoid}` that appear as case-insensitive whole-word matches; `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. Both sets are empty if no match is found. 2. **Cheap candidate generation FIRST** — do not blindly LLM-judge all `O(n^2)` pairs. For each unordered pair `(A, B)`, count matches across these signals: - same `type` field - overlapping `title_tokens` (≥ 1 shared token, ignoring stop words) - overlapping `tags` (≥ 1 shared tag, if present) - - matching normalized predicate phrase in both files: one of `prefers`, `uses`, `hates`, `requires`, `do not`, `never`, `avoid` - - negation/opposition markers: `not`, `never`, `avoid`, `instead`, or numerically changed value targeting the same entity + - non-empty intersection of `body_predicates` between the two files (from the set extracted in step 1) + - non-empty intersection of `negation_markers` between the two files (from the set extracted in step 1) A pair is a **candidate** only if at least two of the above signals match. Each signal contributes at most 1 to the match count regardless of how many tokens/tags overlap. Increment `candidates_found` for each candidate pair. With ~40 files, false positives are the primary concern; this filter keeps LLM calls bounded. @@ -141,7 +141,8 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, d. **Otherwise flag for review** — do NOT edit either file. Append `{files:[A,B], reason, detected_at:YYYY-MM-DD}` to `pending_review` and increment `pending_added`. 5. 
**Apply auto-resolved edits.** For each auto-resolved pair: - - **Edits MUST use `safe-edit.sh`.** Same `backup` / `verify` / `rollback` / `clean` flow as Phase C, applied to each `memory/auto/*.md` file edited in this step. If `verify` fails after either edit, `rollback` and route the pair to `pending_review` with the reason `(deferred: edit verify failed)`. + - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` with the reason `(deferred: edit verify failed)`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. + - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` with the reason `(deferred: anchor invalidated by prior edit)`. - **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. 
If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen_partners` is a **list** — APPEND the new partner's filename to it (dedupe if already present); never overwrite existing entries. The other three fields are scalars and reflect the most recent resolution. ```yaml @@ -152,7 +153,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, - # filename only, no path ``` Free-text values (`resolution_basis`, and `do_not_reopen_before` when it carries a semantic condition rather than a `YYYY-MM-DD` date) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`. Date-form `do_not_reopen_before` values (matching `^\d{4}-\d{2}-\d{2}$`) are written unquoted as YAML dates. - - Increment `auto_resolved` by 1 per resolved pair. Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. + - Increment `auto_resolved` by 1 per resolved pair. Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. **Timing:** both increments fire ONLY after both files' `verify` calls succeed and both backups have been cleaned. A pair that fails verify and is rolled back does NOT consume the budget (the increments are not applied) — the remaining budget is preserved for subsequent pairs in the same run. 6. 
**Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` with the reason `(deferred: mutation limit reached)`. From 9521eb2ed6d0fc39050c2f0144951520bfd7acb3 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 18:37:28 +0300 Subject: [PATCH 07/21] fix: address Phase B.5 fifth review pass findings Seven surgical spec fixes to memory-consolidation/SKILL.md and memory-protocol.md based on the 5th review pass: - title_tokens: explicit fallback when both heading and description absent - body_predicates: drop multi-token "do not" (undefined under "whole-word match"); already covered by negation_markers - pending_review canonical entry shape unified across steps 4d, 5, 6 so Phase D's parser regex matches every deferral uniformly - pending_added counter: removed in-phase increment, Phase D step 2 is the single source via post-dedup recompute - do_not_reopen_before: take MAX(existing, new) when persisting; never shorten an earlier pair's exclusion window (anti-loop preservation) - SUSPICIOUS_SHRINK bypass: reframed measurement from pre-edit MEMORY.md instead of requiring runner to know .consolidation-backup suffix - JSONL append: mandate trailing newline (printf '%s\\n') to keep file parseable as one JSON object per line memory-protocol.md updated in lockstep with the do_not_reopen_before MAX rule for the interactive resolution path.
Co-Authored-By: Claude Opus 4.7 --- .claude/rules/platform/memory-protocol.md | 2 +- .claude/skills/memory-consolidation/SKILL.md | 15 ++++++++------- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index e325aca..5b47ec5 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -24,7 +24,7 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **Aged escalation (>14 days):** if a pending item is older than 14 days AND no topic-relevant opportunity has arisen during the session, surface it at a natural pause or at task end. Do not let aged items sit silent indefinitely. - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. - **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. -- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partners` to the frontmatter of the affected file(s). `do_not_reopen_partners` is a YAML list — APPEND the other file's name (filename only, no path) to any existing list; do NOT overwrite prior entries, or the earlier pair's anti-loop guarantee is lost. The other three fields are scalars and reflect the most recent resolution. 
Match the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). +- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partners` to the frontmatter of the affected file(s). `do_not_reopen_partners` is a YAML list — APPEND the other file's name (filename only, no path) to any existing list; do NOT overwrite prior entries, or the earlier pair's anti-loop guarantee is lost. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution. For `do_not_reopen_before`: if a date-form value already exists, take `MAX(existing, new_value)` — NEVER shorten an earlier resolution's exclusion window. If the existing value is a semantic (non-date) condition, leave it untouched and skip the date write for that file. Match the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. 
If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). ## What goes WHERE: rules vs memory diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 9cc9faa..b23a6f8 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -119,7 +119,7 @@ Cross-file scan of existing `memory/auto/*.md` files for contradictions that the Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). -1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen_before, do_not_reopen_partners}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop fields). `do_not_reopen_partners` is read as a YAML list — if the field is absent treat as empty list; if it is a scalar (legacy single-value form) treat as a one-element list. Normalize each partner entry to its bare filename (apply `basename`, strip any directory prefix) and dedupe at read time so the anti-loop check in step 2 isn't bypassed by heterogeneous path formats across cycles. Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. 
Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires, do not, never, avoid}` that appear as case-insensitive whole-word matches; `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. Both sets are empty if no match is found. +1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen_before, do_not_reopen_partners}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop fields). `do_not_reopen_partners` is read as a YAML list — if the field is absent treat as empty list; if it is a scalar (legacy single-value form) treat as a one-element list. Normalize each partner entry to its bare filename (apply `basename`, strip any directory prefix) and dedupe at read time so the anti-loop check in step 2 isn't bypassed by heterogeneous path formats across cycles. Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists, else just the `name` slug tokens) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires, never, avoid}` that appear as case-insensitive whole-word matches (single-token only — negation-style phrases like "do not"/"don't" are covered by `negation_markers`); `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. Both sets are empty if no match is found. 2. **Cheap candidate generation FIRST** — do not blindly LLM-judge all `O(n^2)` pairs. 
For each unordered pair `(A, B)`, count matches across these signals: - same `type` field @@ -138,13 +138,13 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. b. **Higher confidence wins** if both sides have a `confidence` field and `|confidence_A − confidence_B| >= 0.2`. If either side lacks `confidence` (legacy files predating the schema), treat it as `0.7` for this comparison only. c. **Newer `resolved_at` wins** if both sides carry `resolved_at` from a prior Phase B.5 cycle and the delta is `>= 30 days`. Files freshly judged this run usually lack `resolved_at` — in that case this rule does not fire and we fall through to (d). - d. **Otherwise flag for review** — do NOT edit either file. Append `{files:[A,B], reason, detected_at:YYYY-MM-DD}` to `pending_review` and increment `pending_added`. + d. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason, detected_at:}`. This is the ONLY in-phase append; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly. 5. **Apply auto-resolved edits.** For each auto-resolved pair: - - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. 
If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` with the reason `(deferred: edit verify failed)`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. - - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` with the reason `(deferred: anchor invalidated by prior edit)`. + - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4d with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. + - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. 
If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using the canonical shape from step 4d with `reason: "(deferred: anchor invalidated by prior edit)"`. - **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen_partners` is a **list** — APPEND the new partner's filename to it (dedupe if already present); never overwrite existing entries. The other three fields are scalars and reflect the most recent resolution. + - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen_partners` is a **list** — APPEND the new partner's filename to it (dedupe if already present); never overwrite existing entries. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution. For `do_not_reopen_before`: if a date-form value already exists in the file's frontmatter, take `MAX(existing, new_value)` — NEVER shorten an earlier pair's exclusion window. If the existing value is a semantic (non-date) condition, leave it untouched and skip the date write for that file. If absent, write the new date directly. 
```yaml resolved_at: YYYY-MM-DD resolution_basis: "" @@ -155,7 +155,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, Free-text values (`resolution_basis`, and `do_not_reopen_before` when it carries a semantic condition rather than a `YYYY-MM-DD` date) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`. Date-form `do_not_reopen_before` values (matching `^\d{4}-\d{2}-\d{2}$`) are written unquoted as YAML dates. - Increment `auto_resolved` by 1 per resolved pair. Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. **Timing:** both increments fire ONLY after both files' `verify` calls succeed and both backups have been cleaned. A pair that fails verify and is rolled back does NOT consume the budget (the increments are not applied) — the remaining budget is preserved for subsequent pairs in the same run. -6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` with the reason `(deferred: mutation limit reached)`. +6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` using the canonical shape from step 4d with `reason: "(deferred: mutation limit reached)"`. 7. 
**Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, and `pending_review` to Phase D for stats and Pending Review writes. @@ -269,13 +269,14 @@ Phase D substeps run in numerical order, with one exception: the MEMORY.md dedup - Before appending a new bullet, deduplicate: if a bullet for the same unordered `(file-A, file-B)` pair already exists in the section, do NOT append again. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. - - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured **from the backup file before applying the edit** (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `backup_size - current_size` ± 5 bytes. The runner MUST capture this measurement against the backup prior to writing the edit; do not measure live during/after the write. Otherwise rollback as usual. 
Document the bypass in the diary Issues section so the audit trail is preserved. + - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured against `MEMORY.md` **before** the edit is applied (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. The runner MUST capture both the section bytes and the pre-edit total byte size BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies to `${FILEPATH}.consolidation-backup`); do not measure live during/after the write. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. 3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line. Each angle-bracketed placeholder below is substituted with the actual value (`` becomes a literal integer like `7`, `` becomes a float like `4.5`): ```json {"date":"","candidates_found":,"contradictions_detected":,"auto_resolved":,"pending_added":,"pending_total":,"avg_age_days":} ``` - Before appending, validate the candidate line with `printf '%s' "$LINE" | jq -e . > /dev/null` — if validation fails, do NOT append; log the malformed line in the diary Issues section instead. 
+ - Append the validated line with a trailing newline so the file remains parseable as one JSON object per line — e.g., `printf '%s\n' "$LINE" >> memory/lint-stats.jsonl`. Never use `printf '%s'` (no newline) when writing; that produces a single concatenated line and breaks `json.loads`-per-line. - `pending_total` counts bullets in `MEMORY.md`'s "Pending Review (Lint findings)" section matching the regex `^- detected_at=\d{4}-\d{2}-\d{2} ` between the section heading and the next `## ` heading (or EOF), measured after this run's writes. - `avg_age_days` is the mean age in days of all current pending bullets, computed as `mean(today − detected_at)` where `today` is the same YYYY-MM-DD used in this run's `date` field, and `detected_at` is parsed from each bullet via the regex above. Round to one decimal place. If `pending_total == 0`, write `0`. If any bullet's `detected_at` fails to parse, count it with age `0` and log the malformed bullet in the diary Issues section. - Append-only — never rewrite earlier lines. 
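The `pending_total` and `avg_age_days` rules in Phase D step 3 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not part of the skill: the helper names `pending_stats` and `build_stats_line` and the `counters` dict are hypothetical, and the `json.loads` round trip only stands in for the `jq -e .` check that the skill text prescribes.

```python
import json
import re
from datetime import date

# Exact section title and bullet regex, as quoted in Phase D steps 2 and 3.
SECTION_HEADING = "## Pending Review (Lint findings)"
BULLET_RE = re.compile(r"^- detected_at=(\d{4}-\d{2}-\d{2}) ")


def pending_stats(memory_md: str, today: date):
    """Return (pending_total, avg_age_days, malformed_bullets) for MEMORY.md text."""
    ages, malformed = [], []
    in_section = False
    for line in memory_md.splitlines():
        if line.rstrip() == SECTION_HEADING:
            in_section = True
            continue
        if in_section and line.startswith("## "):
            break  # the next level-2 heading closes the section
        if not in_section:
            continue
        match = BULLET_RE.match(line)
        if match is None:
            continue
        try:
            ages.append((today - date.fromisoformat(match.group(1))).days)
        except ValueError:
            # Matches the regex but is not a real date: count it with age 0
            # and hand it back so the caller can log it in the diary Issues section.
            ages.append(0)
            malformed.append(line)
    if not ages:
        return 0, 0, malformed
    return len(ages), round(sum(ages) / len(ages), 1), malformed


def build_stats_line(counters: dict, memory_md: str, today: date) -> str:
    """Build one newline-terminated JSON line for memory/lint-stats.jsonl."""
    total, avg_age, _ = pending_stats(memory_md, today)
    record = {
        "date": today.isoformat(),
        "candidates_found": counters["candidates_found"],
        "contradictions_detected": counters["contradictions_detected"],
        "auto_resolved": counters["auto_resolved"],
        "pending_added": counters["pending_added"],
        "pending_total": total,
        "avg_age_days": avg_age,
    }
    line = json.dumps(record, separators=(",", ":"))
    json.loads(line)  # round-trip check (assumption: substitutes for `jq -e .`)
    return line + "\n"  # trailing newline keeps the file one object per line
```

A caller would open `memory/lint-stats.jsonl` in append mode and write the returned string unchanged, which keeps the file append-only and parseable line by line.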
From e9554f15b2cdd6913d190ce662184113fe847ad9 Mon Sep 17 00:00:00 2001
From: fitz123
Date: Sun, 17 May 2026 18:43:11 +0300
Subject: [PATCH 08/21] fix: address Phase B.5 sixth review pass findings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- memory-protocol.md: add "if absent" branch and today+90 default for
  do_not_reopen_before; agent-driven interactive resolution previously
  had no guidance on what value to write when the field was missing
- Phase D step 2: dedup key now (file-A, file-B, reason) so two
  genuinely distinct contradictions between the same pair aren't
  silently dropped
- Phase D step 2: reason sanitization strips em-dash so it can't
  collide with the ` — ` field separator in the bullet format
- Phase D step 1 / template: gate `### Lint (Phase B.5)` block on
  LINT_PHASE_B5_ENABLED so post-trial diary entries don't emit zeroed
  stats

Co-Authored-By: Claude Opus 4.7
---
 .claude/rules/platform/memory-protocol.md    | 2 +-
 .claude/skills/memory-consolidation/SKILL.md | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md
index 5b47ec5..d70b964 100644
--- a/.claude/rules/platform/memory-protocol.md
+++ b/.claude/rules/platform/memory-protocol.md
@@ -24,7 +24,7 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio
 - **Aged escalation (>14 days):** if a pending item is older than 14 days AND no topic-relevant opportunity has arisen during the session, surface it at a natural pause or at task end. Do not let aged items sit silent indefinitely.
 - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one.
- **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. -- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partners` to the frontmatter of the affected file(s). `do_not_reopen_partners` is a YAML list — APPEND the other file's name (filename only, no path) to any existing list; do NOT overwrite prior entries, or the earlier pair's anti-loop guarantee is lost. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution. For `do_not_reopen_before`: if a date-form value already exists, take `MAX(existing, new_value)` — NEVER shorten an earlier resolution's exclusion window. If the existing value is a semantic (non-date) condition, leave it untouched and skip the date write for that file. Match the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). +- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. 
Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partners` to the frontmatter of the affected file(s). `do_not_reopen_partners` is a YAML list — APPEND the other file's name (filename only, no path) to any existing list; do NOT overwrite prior entries, or the earlier pair's anti-loop guarantee is lost. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution. For `do_not_reopen_before`: pick the new value as `today + 90 days` by default (or a semantic condition like `"Ninja revisits topic X"` when the resolution is contingent on a future event). Then persist by these rules: if a date-form value already exists in the file, take `MAX(existing, new_value)` — NEVER shorten an earlier resolution's exclusion window. If the existing value is a semantic (non-date) condition, leave it untouched and skip the date write for that file. If the field is absent, write the new value directly — anti-loop requires both `do_not_reopen_partners` and `do_not_reopen_before` to be present together (the consolidation skill treats a partner-without-before pair as inactive exclusion and re-judges next cycle). Match the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). 
## What goes WHERE: rules vs memory diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index b23a6f8..890eb48 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -235,7 +235,7 @@ Phase D substeps run in numerical order, with one exception: the MEMORY.md dedup - What was learned or confirmed - What memory changes were made (and why) - Items noted for manual curation (confidence 0.5–0.9) - - Lint findings from Phase B.5: candidates considered, contradictions detected, auto-resolves applied, items deferred to Pending Review + - Lint findings from Phase B.5: candidates considered, contradictions detected, auto-resolves applied, items deferred to Pending Review (omit this bullet when `LINT_PHASE_B5_ENABLED=false`, since the accumulators were never populated) - Any errors or partial failures encountered If a diary file for today already exists, append a new section with a timestamp header. @@ -254,7 +254,7 @@ Phase D substeps run in numerical order, with one exception: the MEMORY.md dedup ### Memory Changes - Created/Updated memory/auto/filename.md — reason - ### Lint (Phase B.5) + ### Lint (Phase B.5) # omit this entire block when LINT_PHASE_B5_ENABLED=false - Candidates: N, contradictions: N, auto-resolved: N, pending added: N ### Noted for Review @@ -265,8 +265,8 @@ Phase D substeps run in numerical order, with one exception: the MEMORY.md dedup ``` 2. **Update workspace `MEMORY.md` "Pending Review" section.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. - - If the `pending_review` accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet in this strict, machine-parseable format (parser regex `^- detected_at=\d{4}-\d{2}-\d{2} `): `- detected_at=YYYY-MM-DD — file-A vs file-B — `. 
Sanitize `` the same way as `resolution_basis` (strip leading `#`, collapse newlines to `; `, truncate to 200 chars). - - Before appending a new bullet, deduplicate: if a bullet for the same unordered `(file-A, file-B)` pair already exists in the section, do NOT append again. Update `pending_added` to count only newly written bullets. + - If the `pending_review` accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet in this strict, machine-parseable format (parser regex `^- detected_at=\d{4}-\d{2}-\d{2} `): `- detected_at=YYYY-MM-DD — file-A vs file-B — `. Sanitize `` the same way as `resolution_basis` (strip leading `#`, collapse newlines to `; `, truncate to 200 chars) AND additionally replace any em-dash characters (`—`, U+2014) in the reason with a hyphen-space (`- `) so the bullet's three-field structure can be split unambiguously on the literal ` — ` separator. + - Before appending a new bullet, deduplicate on the triple `(file-A, file-B, reason)` (unordered file pair, exact reason after sanitization): if an existing bullet matches all three fields, do NOT append again. Two genuinely distinct contradictions between the same pair (different reasons) produce two separate bullets — do not collapse them. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. 
- This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured against `MEMORY.md` **before** the edit is applied (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. The runner MUST capture both the section bytes and the pre-edit total byte size BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies to `${FILEPATH}.consolidation-backup`); do not measure live during/after the write. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. From 7b6489c8313506ccc0e792f2730ac43c5e788fc2 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 19:07:11 +0300 Subject: [PATCH 09/21] session end: uncommitted changes --- .claude/skills/memory-consolidation/SKILL.md | 13 ++++++------- docs/plans/2026-05-17-memory-lint-trial.md | 13 +++++++------ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 890eb48..87c169d 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -134,15 +134,14 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, 3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" 
The LLM must return a structured response: `{verdict, claim_a, claim_b}` where `verdict` is `contradiction` | `evolution` | `unrelated`, and `claim_a` / `claim_b` are the single full body lines (verbatim, including leading bullet/heading markers if any) from each file that carry the contradicting claim. The `claim_a` / `claim_b` strings are used as exact match anchors in step 5; if either is empty or does not appear verbatim in the corresponding file body, downgrade the verdict to `unrelated` and log the mismatch in the Phase D diary Issues section. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. -4. **Auto-resolve hierarchy** (apply in order, stop at first match): the rule is `evidence > confidence > recency`. +4. **Auto-resolve hierarchy** (apply in order, stop at first match): the rule is `evidence > confidence`. A "recency" tie-breaker was considered but dropped: `resolved_at` reflects a file's unrelated prior resolution history, not the freshness of the currently contradicting claim, so it is not a valid freshness proxy. Direct freshness evidence is already handled by (a). a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. b. **Higher confidence wins** if both sides have a `confidence` field and `|confidence_A − confidence_B| >= 0.2`. If either side lacks `confidence` (legacy files predating the schema), treat it as `0.7` for this comparison only. - c. **Newer `resolved_at` wins** if both sides carry `resolved_at` from a prior Phase B.5 cycle and the delta is `>= 30 days`. 
Files freshly judged this run usually lack `resolved_at` — in that case this rule does not fire and we fall through to (d). - d. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason, detected_at:}`. This is the ONLY in-phase append; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly. + c. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason, detected_at:}`. This is the ONLY in-phase append; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly. 5. **Apply auto-resolved edits.** For each auto-resolved pair: - - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4d with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. 
- - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using the canonical shape from step 4d with `reason: "(deferred: anchor invalidated by prior edit)"`. + - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. + - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: anchor invalidated by prior edit)"`. 
- **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen_partners` is a **list** — APPEND the new partner's filename to it (dedupe if already present); never overwrite existing entries. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution. For `do_not_reopen_before`: if a date-form value already exists in the file's frontmatter, take `MAX(existing, new_value)` — NEVER shorten an earlier pair's exclusion window. If the existing value is a semantic (non-date) condition, leave it untouched and skip the date write for that file. If absent, write the new date directly. ```yaml @@ -155,7 +154,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, Free-text values (`resolution_basis`, and `do_not_reopen_before` when it carries a semantic condition rather than a `YYYY-MM-DD` date) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`. Date-form `do_not_reopen_before` values (matching `^\d{4}-\d{2}-\d{2}$`) are written unquoted as YAML dates. - Increment `auto_resolved` by 1 per resolved pair. Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. 
**Timing:** both increments fire ONLY after both files' `verify` calls succeed and both backups have been cleaned. A pair that fails verify and is rolled back does NOT consume the budget (the increments are not applied) — the remaining budget is preserved for subsequent pairs in the same run. -6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` using the canonical shape from step 4d with `reason: "(deferred: mutation limit reached)"`. +6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: mutation limit reached)"`. 7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, and `pending_review` to Phase D for stats and Pending Review writes. @@ -196,7 +195,7 @@ For each approved change (confidence >= 0.9), in priority order (updates before When Phase B.5 resolves a contradiction touching this file, additionally write the anti-loop trio (`resolved_at`, `resolution_basis`, `do_not_reopen_before`) plus `do_not_reopen_partners` (list) — see Phase B.5 step 5 for the exact YAML and sanitization rules. These fields are absent from files that have never participated in a resolved contradiction. - `revisit_if` is free-text and must be a single line. 
Useful phrasings: a concrete user-action trigger ("Ninja switches editors"), a date ("after 2026-09-01"), or `"Never"` for facts that are stable by nature (e.g. timezone). Apply the same sanitization as `resolution_basis` (max 200 chars, no embedded newlines, no leading `#`, escape `"` as `\"`). If Phase B's scoring did not yield a semantic trigger, default to `"Never"`. `confidence` mirrors the Phase B scoring rubric (1.0 / 0.9 / 0.7 / 0.5 / discarded below 0.5). The `revisit_if` field is written for human/agent inspection during interactive sessions; this skill does not read it back. + `revisit_if` is free-text and must be a single line. Useful phrasings: a concrete user-action trigger ("Ninja switches editors"), a date ("after 2026-09-01"), or `"Never"` for facts that are stable by nature (e.g. timezone). Apply the same sanitization as `resolution_basis` (max 200 chars, no embedded newlines, no leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`). If Phase B's scoring did not yield a semantic trigger, default to `"Never"`. `confidence` mirrors the Phase B scoring rubric (1.0 / 0.9 / 0.7 / 0.5 / discarded below 0.5). The `revisit_if` field is written for human/agent inspection during interactive sessions; this skill does not read it back. 3. **After editing MEMORY.md:** ```bash diff --git a/docs/plans/2026-05-17-memory-lint-trial.md b/docs/plans/2026-05-17-memory-lint-trial.md index 57383ff..77b26f9 100644 --- a/docs/plans/2026-05-17-memory-lint-trial.md +++ b/docs/plans/2026-05-17-memory-lint-trial.md @@ -4,7 +4,7 @@ 30-day trial (2026-05-17 → 2026-06-17) of automated cross-file contradiction detection in memory, with proactive in-conversation surfacing. Adds to existing `memory-consolidation` skill: new Phase B.5 (lint pass), expanded Phase C (frontmatter persistence), expanded Phase D (Pending Review section in workspace MEMORY.md + stats file). Adds platform rule requiring the agent to surface pending items in conversation. 
-Feature-flagged for instant rollback. Anti-loop fields prevent re-triggering same contradiction nightly. Auto-resolve uses `evidence > confidence > recency` hierarchy (codex-recommended). +Feature-flagged for instant rollback. Anti-loop fields prevent re-triggering same contradiction nightly. Auto-resolve uses `evidence > confidence` hierarchy (codex-recommended; a "recency" leg was dropped during review because `resolved_at` reflects unrelated prior resolutions and is not a valid freshness proxy for the current claim). References upstream: ADR-069, beads workspace-txyu, [Karpathy LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) (abstract — no algorithm). Codex provided concrete algorithm. @@ -13,7 +13,7 @@ References upstream: ADR-069, beads workspace-txyu, [Karpathy LLM Wiki gist](htt ```bash grep -q 'LINT_PHASE_B5_ENABLED' .claude/skills/memory-consolidation/SKILL.md && \ grep -q '### Phase B.5' .claude/skills/memory-consolidation/SKILL.md && \ -grep -q 'evidence > confidence > recency' .claude/skills/memory-consolidation/SKILL.md && \ +grep -q 'evidence > confidence' .claude/skills/memory-consolidation/SKILL.md && \ grep -q 'resolved_at\|do_not_reopen_before' .claude/skills/memory-consolidation/SKILL.md && \ grep -q '## Surfacing pending lint items' .claude/rules/platform/memory-protocol.md && \ echo "All checks passed" @@ -54,8 +54,9 @@ Only candidate bundles → LLM judgment. With 40 files, false positives are the **Auto-resolve hierarchy:** 1. Direct diary/session evidence beats inferred 2. Else higher confidence wins if `Δ confidence >= 0.2` -3. Else newer evidence-date wins if `Δ >= 30 days` -4. Else flag — do not edit +3. Else flag — do not edit + +(An earlier draft included a "newer evidence-date wins" leg; it was dropped during review — see `evidence > confidence` note above.) **Anti-loop fields** (added per memory file when resolved): - `resolved_at: ` @@ -84,7 +85,7 @@ What we want: 1. 
Iterate `memory/auto/*.md` and build a lightweight in-memory representation: `{file, type, name, tags, title_tokens, claim_phrases}` extracted from frontmatter and body. Claim extraction uses bullet/paragraph splits. 2. Candidate generation: for each pair of files, only proceed if at least two of these match — same `type` field, overlapping `title_tokens`, overlapping `tags`, or matching normalized predicate ("prefers", "uses", "hates", "requires", "do not"). Files with `do_not_reopen_before` later than today are skipped entirely. 3. For each candidate pair, ask the LLM (in-skill prompt) one question: "Do these two claims contradict, or is one time-scoped evolution of the other?" Return: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds. - 4. For each detected contradiction, attempt auto-resolve using hierarchy: (a) direct diary/session evidence in last 48h wins over inferred; (b) higher confidence wins if delta >= 0.2; (c) newer evidence-date wins if delta >= 30 days; (d) otherwise flag for review. + 4. For each detected contradiction, attempt auto-resolve using hierarchy: (a) direct diary/session evidence in last 48h wins over inferred; (b) higher confidence wins if delta >= 0.2; (c) otherwise flag for review. 5. Auto-resolved: edit the losing file to either remove the contradicting claim or mark it superseded. **Never silent-delete** — always replace with a `(superseded: ...)` annotation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before` to BOTH files' frontmatter (anti-loop). 6. Flagged unresolved: add an entry to a `pending_review` accumulator (used in Phase D). 7. Respect mutation limit from Phase C (5 per run total across B.5 and C combined). 
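The auto-resolve hierarchy in step 4 can be sketched as a small shell function. This is a minimal sketch, not the skill's actual implementation (the skill applies the rule through in-skill judgment). The function name `auto_resolve`, the `direct` / `inferred` labels, and the fall-through to the confidence check when both sides have the same evidence kind are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill's scripts).
# Args: evidence_a evidence_b  -> each "direct" or "inferred" (assumed labels)
#       conf_a conf_b          -> confidence values such as 1.0 / 0.9 / 0.7 / 0.5
# Prints "A" or "B" for the winning side, or "flag" when the pair needs review.
auto_resolve() {
  local ev_a="$1" ev_b="$2" conf_a="$3" conf_b="$4"

  # (a) Direct diary/session evidence beats inferred.
  if [[ "$ev_a" == "direct" && "$ev_b" != "direct" ]]; then echo "A"; return; fi
  if [[ "$ev_b" == "direct" && "$ev_a" != "direct" ]]; then echo "B"; return; fi

  # (b) Higher confidence wins if the delta is at least 0.2.
  # A tiny tolerance absorbs floating-point noise (0.7 - 0.5 is not exactly 0.2).
  # (c) Otherwise flag for review and do not edit.
  awk -v a="$conf_a" -v b="$conf_b" 'BEGIN {
    d = a - b
    if (d >= 0.2 - 1e-9)       print "A"
    else if (-d >= 0.2 - 1e-9) print "B"
    else                       print "flag"
  }'
}
```

For example, `auto_resolve inferred inferred 0.9 0.8` prints `flag`, because neither the evidence leg nor the 0.2 confidence gap applies; that pair would go to the `pending_review` accumulator instead of being edited.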
@@ -119,7 +120,7 @@ What we want: - [x] `.claude/skills/memory-consolidation/SKILL.md` contains `LINT_PHASE_B5_ENABLED` feature flag at the top - [x] SKILL.md contains a `### Phase B.5` section between Phase B and Phase C -- [x] Phase B.5 documents candidate generation, LLM judgment, auto-resolve hierarchy (`evidence > confidence > recency`), and anti-loop fields +- [x] Phase B.5 documents candidate generation, LLM judgment, auto-resolve hierarchy (`evidence > confidence`), and anti-loop fields - [x] Phase B.5 explicitly excludes time-scoped changes from being treated as contradictions - [x] Phase B.5 documents "never silent-delete" — losing claim is replaced with `(superseded: ...)` annotation - [x] Phase C frontmatter format documented in SKILL.md now includes `confidence` and `revisit_if` fields (with `resolved_at`, `resolution_basis`, `do_not_reopen_before` as optional) From 0dee927767763bcbeee048a1a9af3c8af5788801 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 19:17:30 +0300 Subject: [PATCH 10/21] session end: uncommitted changes --- .claude/rules/platform/memory-protocol.md | 6 +++++- .claude/skills/memory-consolidation/SKILL.md | 16 ++++++++-------- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index d70b964..405af39 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -24,7 +24,11 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **Aged escalation (>14 days):** if a pending item is older than 14 days AND no topic-relevant opportunity has arisen during the session, surface it at a natural pause or at task end. Do not let aged items sit silent indefinitely. - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. 
- **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. -- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before`, and `do_not_reopen_partners` to the frontmatter of the affected file(s). `do_not_reopen_partners` is a YAML list — APPEND the other file's name (filename only, no path) to any existing list; do NOT overwrite prior entries, or the earlier pair's anti-loop guarantee is lost. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution. For `do_not_reopen_before`: pick the new value as `today + 90 days` by default (or a semantic condition like `"Ninja revisits topic X"` when the resolution is contingent on a future event). Then persist by these rules: if a date-form value already exists in the file, take `MAX(existing, new_value)` — NEVER shorten an earlier resolution's exclusion window. If the existing value is a semantic (non-date) condition, leave it untouched and skip the date write for that file. If the field is absent, write the new value directly — anti-loop requires both `do_not_reopen_partners` and `do_not_reopen_before` to be present together (the consolidation skill treats a partner-without-before pair as inactive exclusion and re-judges next cycle). Match the `memory-consolidation` skill's anti-loop pattern (see `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5 for the canonical YAML and sanitization rules). 
Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). +- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. + - **Concurrency: defer if consolidation is running.** Before any edit, check whether `.consolidation.lock` exists at the workspace root. If it does, do NOT proceed — the nightly consolidator is writing to the same files and a concurrent edit can clobber its mutation or be rolled back by its `safe-edit.sh verify` step. Tell Ninja the resolution is deferred until the lock disappears and re-attempt at the next opportunity (the lock is short-lived: cron runs are minutes, not hours). + - **Use the same safe-edit flow as the nightly path.** Wrap every affected file edit with `.claude/skills/memory-consolidation/scripts/safe-edit.sh backup → write → verify → clean` (and `rollback` on verify failure), exactly as Phase C does. When the resolution touches two memory files plus `MEMORY.md`, apply the paired two-phase commit pattern from Phase B.5 step 5: `backup` ALL affected files first, apply all edits, run `verify` on ALL of them, and only `clean` the backups once every `verify` passes. If any `verify` fails, `rollback` every file from its backup and report the failure to Ninja — never leave a partial resolution where one file is annotated but another is untouched. + - **Anti-loop frontmatter.** Add `resolved_at`, `resolution_basis`, and update the `do_not_reopen` list on the affected file(s). 
`do_not_reopen` is a YAML list of records, each with `partner` (the other file's bare name) and `before` (a YYYY-MM-DD date). For this pair's partner: find any existing entry and apply `MAX(existing.before, new_value)` to its `before` field — NEVER shorten this pair's window. If no entry exists for this partner, APPEND a new `{partner, before}` record. Do NOT touch entries for unrelated partners — that's what makes the cooldown genuinely per-pair. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. Default `before` value: `today + 90 days`. The `before` field is always a date — semantic-condition values like `"Ninja revisits topic X"` are not supported (the consolidation skill has no mechanism to auto-detect such events; use a far-future date if indefinite suppression is genuinely required). Match the canonical YAML in `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5. + - **Pending Review bullet format.** Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). ## What goes WHERE: rules vs memory diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 87c169d..334ce33 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -119,7 +119,7 @@ Cross-file scan of existing `memory/auto/*.md` files for contradictions that the Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). -1. 
**Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen_before, do_not_reopen_partners}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop fields). `do_not_reopen_partners` is read as a YAML list — if the field is absent treat as empty list; if it is a scalar (legacy single-value form) treat as a one-element list. Normalize each partner entry to its bare filename (apply `basename`, strip any directory prefix) and dedupe at read time so the anti-loop check in step 2 isn't bypassed by heterogeneous path formats across cycles. Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists, else just the `name` slug tokens) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires, never, avoid}` that appear as case-insensitive whole-word matches (single-token only — negation-style phrases like "do not"/"don't" are covered by `negation_markers`); `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. Both sets are empty if no match is found. +1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop list). `do_not_reopen` is read as a YAML list of records, each with `partner` (filename) and `before` (YYYY-MM-DD date) — if the field is absent treat as empty list. 
Normalize each `partner` to its bare filename (apply `basename`, strip any directory prefix) and dedupe by `partner` (if duplicate entries exist for the same partner, keep the one with the latest `before` — never shorten an existing pair's cooldown). Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists, else just the `name` slug tokens) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires}` that appear as case-insensitive whole-word matches (single-token only — negation-style phrases like "do not"/"don't" are covered by `negation_markers`); `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. The two sets are disjoint by construction so a single shared word cannot satisfy both signals in step 2. Both sets are empty if no match is found. 2. **Cheap candidate generation FIRST** — do not blindly LLM-judge all `O(n^2)` pairs. For each unordered pair `(A, B)`, count matches across these signals: - same `type` field @@ -130,7 +130,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, A pair is a **candidate** only if at least two of the above signals match. Each signal contributes at most 1 to the match count regardless of how many tokens/tags overlap. Increment `candidates_found` for each candidate pair. With ~40 files, false positives are the primary concern; this filter keeps LLM calls bounded. - **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file's `do_not_reopen_partners` list contains the other file's name AND that file's `do_not_reopen_before` is later than today. 
Exclusion is per-pair, not per-file: a file may still be paired against any other unrelated file. Non-date `do_not_reopen_before` values (semantic conditions) are treated as "always future" — skip the pair until Ninja manually clears the field. If `do_not_reopen_partners` lists the partner but `do_not_reopen_before` is missing entirely (incomplete prior write), treat the exclusion as inactive and let the pair proceed to judgment — anti-loop requires both fields. + **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file's `do_not_reopen` list contains an entry whose `partner` matches the other file's bare name AND that entry's `before` date is later than today. Exclusion is genuinely per-pair: each partner has its own `before` date stored in its own record, so resolving A↔C cannot extend A↔B's cooldown. If the matching entry's `before` is absent or fails the `^\d{4}-\d{2}-\d{2}$` regex (malformed or legacy prior write), treat the exclusion as inactive for that specific pair and let the pair proceed to judgment — anti-loop requires a well-formed date. Only YYYY-MM-DD dates are recognized; the spec does not support semantic-condition values like `"Ninja revisits topic X"`, since the skill has no mechanism to auto-detect such events (use a far-future date if indefinite suppression is genuinely needed). 3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" The LLM must return a structured response: `{verdict, claim_a, claim_b}` where `verdict` is `contradiction` | `evolution` | `unrelated`, and `claim_a` / `claim_b` are the single full body lines (verbatim, including leading bullet/heading markers if any) from each file that carry the contradicting claim. 
The `claim_a` / `claim_b` strings are used as exact match anchors in step 5; if either is empty or does not appear verbatim in the corresponding file body, downgrade the verdict to `unrelated` and log the mismatch in the Phase D diary Issues section. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. @@ -143,15 +143,15 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. 
If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: anchor invalidated by prior edit)"`. - **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen_partners` is a **list** — APPEND the new partner's filename to it (dedupe if already present); never overwrite existing entries. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution. For `do_not_reopen_before`: if a date-form value already exists in the file's frontmatter, take `MAX(existing, new_value)` — NEVER shorten an earlier pair's exclusion window. If the existing value is a semantic (non-date) condition, leave it untouched and skip the date write for that file. If absent, write the new date directly. + - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen` is a **list of records** keyed by `partner` — locate the existing entry for this pair's partner (if any) and apply `MAX(existing.before, new_value)` to its `before` field; NEVER shorten this pair's cooldown. If no entry exists for this partner, APPEND a new `{partner, before}` record. Other files' entries (for unrelated partners) are untouched — resolving A↔C must not extend A↔B's cooldown. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. 
```yaml resolved_at: YYYY-MM-DD resolution_basis: "" - do_not_reopen_before: YYYY-MM-DD # or semantic condition like "Ninja revisits topic X" - do_not_reopen_partners: # accumulating list — append new partner; do not overwrite - - # filename only, no path + do_not_reopen: # accumulating list — one record per resolved pair + - partner: # filename only, no path + before: YYYY-MM-DD # always a date; other partners' dates are untouched ``` - Free-text values (`resolution_basis`, and `do_not_reopen_before` when it carries a semantic condition rather than a `YYYY-MM-DD` date) MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`. Date-form `do_not_reopen_before` values (matching `^\d{4}-\d{2}-\d{2}$`) are written unquoted as YAML dates. + `resolution_basis` MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`. `before` values are always date-form (matching `^\d{4}-\d{2}-\d{2}$`) and written unquoted as YAML dates — semantic-condition values are not supported (the skill has no auto-trigger for them; use a far-future date if indefinite suppression is required). - Increment `auto_resolved` by 1 per resolved pair. Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. **Timing:** both increments fire ONLY after both files' `verify` calls succeed and both backups have been cleaned. A pair that fails verify and is rolled back does NOT consume the budget (the increments are not applied) — the remaining budget is preserved for subsequent pairs in the same run. 6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. 
Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: mutation limit reached)"`. @@ -193,7 +193,7 @@ For each approved change (confidence >= 0.9), in priority order (updates before Body content here. For feedback/project types, include **Why:** and **How to apply:** sections. ``` - When Phase B.5 resolves a contradiction touching this file, additionally write the anti-loop trio (`resolved_at`, `resolution_basis`, `do_not_reopen_before`) plus `do_not_reopen_partners` (list) — see Phase B.5 step 5 for the exact YAML and sanitization rules. These fields are absent from files that have never participated in a resolved contradiction. + When Phase B.5 resolves a contradiction touching this file, additionally write `resolved_at`, `resolution_basis`, and the `do_not_reopen` list (one `{partner, before}` record per resolved pair) — see Phase B.5 step 5 for the exact YAML and sanitization rules. These fields are absent from files that have never participated in a resolved contradiction. `revisit_if` is free-text and must be a single line. Useful phrasings: a concrete user-action trigger ("Ninja switches editors"), a date ("after 2026-09-01"), or `"Never"` for facts that are stable by nature (e.g. timezone). Apply the same sanitization as `resolution_basis` (max 200 chars, no embedded newlines, no leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`). If Phase B's scoring did not yield a semantic trigger, default to `"Never"`. `confidence` mirrors the Phase B scoring rubric (1.0 / 0.9 / 0.7 / 0.5 / discarded below 0.5). The `revisit_if` field is written for human/agent inspection during interactive sessions; this skill does not read it back. 
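The free-text sanitization shared by `resolution_basis` and `revisit_if` can be sketched in bash as follows. This is an illustrative sketch under stated assumptions, not a script shipped with the skill: the function name `sanitize_free_text` is hypothetical, and two details are assumed because the text leaves them open, namely that the 200-character cap applies before escaping and that whitespace left behind by a stripped `#` is trimmed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn free text into a single-line, double-quoted YAML scalar.
sanitize_free_text() {
  local s="$1" nl=$'\n' cr=$'\r'

  s=${s//"$cr"/}                    # drop carriage returns
  s=${s//"$nl"/; }                  # embedded newlines become "; "
  while [[ $s == '#'* ]]; do        # strip leading '#' characters
    s=${s#'#'}
  done
  s=${s#"${s%%[![:space:]]*}"}      # trim leading whitespace (assumption)
  s=${s:0:200}                      # cap at 200 chars, before escaping (assumption)
  s=${s//\\/\\\\}                   # escape backslashes first ...
  s=${s//\"/\\\"}                   # ... then double quotes
  printf '"%s"\n' "$s"
}
```

A caller could apply the documented default with `sanitize_free_text "${trigger:-Never}"`, where `trigger` is a hypothetical variable holding whatever semantic trigger Phase B produced. Escaping backslashes before quotes matters: in the reverse order, the backslashes added for `\"` would be doubled again.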
From 9c2c0d27e1259964e2da8f43be956dd592c53d7c Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 19:26:20 +0300 Subject: [PATCH 11/21] session end: uncommitted changes --- .claude/rules/platform/memory-protocol.md | 3 ++- .claude/skills/memory-consolidation/SKILL.md | 2 +- docs/plans/2026-05-17-memory-lint-trial.md | 21 +++++++++++--------- 3 files changed, 15 insertions(+), 11 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index 405af39..9c9f195 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -25,8 +25,9 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. - **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. - **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. - - **Concurrency: defer if consolidation is running.** Before any edit, check whether `.consolidation.lock` exists at the workspace root. If it does, do NOT proceed — the nightly consolidator is writing to the same files and a concurrent edit can clobber its mutation or be rolled back by its `safe-edit.sh verify` step. Tell Ninja the resolution is deferred until the lock disappears and re-attempt at the next opportunity (the lock is short-lived: cron runs are minutes, not hours). 
+ - **Concurrency: take the consolidation lock, do not just check it.** Before any edit, acquire the same lock the nightly consolidator uses — a file-existence check is TOCTOU-racy (cron can grab the lock between the check and the first write) and never reclaims a stale lock. Run `bash .claude/skills/memory-consolidation/scripts/lock.sh acquire "$PWD/.consolidation.lock" 60` and capture the `ACQUIRED ` value. If the script prints `LOCKED` (exit 1), tell Ninja the resolution is deferred and re-attempt at the next opportunity (cron runs are minutes, not hours). When acquired, pass the token to every later `refresh`/`release` call to prove ownership and release the lock at the end of the resolution — including on failure paths (after any rollback). The `60` arg is the stale-TTL in minutes; the script reclaims abandoned locks automatically. Also check `.maintenance.lock` first via `bash .claude/skills/memory-consolidation/scripts/lock.sh check-maintenance "$PWD"`; defer if it returns `MAINTENANCE`. - **Use the same safe-edit flow as the nightly path.** Wrap every affected file edit with `.claude/skills/memory-consolidation/scripts/safe-edit.sh backup → write → verify → clean` (and `rollback` on verify failure), exactly as Phase C does. When the resolution touches two memory files plus `MEMORY.md`, apply the paired two-phase commit pattern from Phase B.5 step 5: `backup` ALL affected files first, apply all edits, run `verify` on ALL of them, and only `clean` the backups once every `verify` passes. If any `verify` fails, `rollback` every file from its backup and report the failure to Ninja — never leave a partial resolution where one file is annotated but another is untouched. + - **MEMORY.md `SUSPICIOUS_SHRINK` allowance.** When the only `MEMORY.md` change is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). 
Apply the same bypass as the nightly path (SKILL.md Phase D step 2): accept the verify failure if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured in the pre-edit file (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. Capture both numbers BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies). Otherwise rollback as usual. Note the bypass briefly when reporting the resolution to Ninja so the audit trail is preserved. - **Anti-loop frontmatter.** Add `resolved_at`, `resolution_basis`, and update the `do_not_reopen` list on the affected file(s). `do_not_reopen` is a YAML list of records, each with `partner` (the other file's bare name) and `before` (a YYYY-MM-DD date). For this pair's partner: find any existing entry and apply `MAX(existing.before, new_value)` to its `before` field — NEVER shorten this pair's window. If no entry exists for this partner, APPEND a new `{partner, before}` record. Do NOT touch entries for unrelated partners — that's what makes the cooldown genuinely per-pair. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. Default `before` value: `today + 90 days`. The `before` field is always a date — semantic-condition values like `"Ninja revisits topic X"` are not supported (the consolidation skill has no mechanism to auto-detect such events; use a far-future date if indefinite suppression is genuinely required). Match the canonical YAML in `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5. - **Pending Review bullet format.** Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. 
If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 334ce33..553cfee 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -143,7 +143,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: anchor invalidated by prior edit)"`. 
- **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen` is a **list of records** keyed by `partner` — locate the existing entry for this pair's partner (if any) and apply `MAX(existing.before, new_value)` to its `before` field; NEVER shorten this pair's cooldown. If no entry exists for this partner, APPEND a new `{partner, before}` record. Other files' entries (for unrelated partners) are untouched — resolving A↔C must not extend A↔B's cooldown. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. + - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen` is a **list of records** keyed by `partner` — locate the existing entry for this pair's partner (if any) and apply `MAX(existing.before, new_value)` to its `before` field; NEVER shorten this pair's cooldown. If no entry exists for this partner, APPEND a new `{partner, before}` record. Other files' entries (for unrelated partners) are untouched — resolving A↔C must not extend A↔B's cooldown. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. **Default `new_value` for `before`: `today + 90 days`** (same default as the interactive resolution path in `.claude/rules/platform/memory-protocol.md`, so nightly and manual cooldowns match). 
Use a far-future date (e.g., year 2099) if indefinite suppression is genuinely required; the field is always a YYYY-MM-DD date. ```yaml resolved_at: YYYY-MM-DD resolution_basis: "" diff --git a/docs/plans/2026-05-17-memory-lint-trial.md b/docs/plans/2026-05-17-memory-lint-trial.md index 77b26f9..a84f6d6 100644 --- a/docs/plans/2026-05-17-memory-lint-trial.md +++ b/docs/plans/2026-05-17-memory-lint-trial.md @@ -14,7 +14,8 @@ References upstream: ADR-069, beads workspace-txyu, [Karpathy LLM Wiki gist](htt grep -q 'LINT_PHASE_B5_ENABLED' .claude/skills/memory-consolidation/SKILL.md && \ grep -q '### Phase B.5' .claude/skills/memory-consolidation/SKILL.md && \ grep -q 'evidence > confidence' .claude/skills/memory-consolidation/SKILL.md && \ -grep -q 'resolved_at\|do_not_reopen_before' .claude/skills/memory-consolidation/SKILL.md && \ +grep -q 'resolved_at' .claude/skills/memory-consolidation/SKILL.md && \ +grep -qE 'do_not_reopen.{1,5}is a \*\*list of records\*\*' .claude/skills/memory-consolidation/SKILL.md && \ grep -q '## Surfacing pending lint items' .claude/rules/platform/memory-protocol.md && \ echo "All checks passed" ``` @@ -59,9 +60,9 @@ Only candidate bundles → LLM judgment. With 40 files, false positives are the (An earlier draft included a "newer evidence-date wins" leg; it was dropped during review — see `evidence > confidence` note above.) **Anti-loop fields** (added per memory file when resolved): -- `resolved_at: ` -- `resolution_basis: ""` -- `do_not_reopen_before: ` +- `resolved_at: ` — scalar, last resolution touching this file +- `resolution_basis: ""` — scalar +- `do_not_reopen:` — accumulating list of `{partner, before}` records, one per resolved pair (per-partner cooldown, so resolving A↔C never extends A↔B's window). `before` is a YYYY-MM-DD date; default `today + 90 days`. **Time-scoped changes are NOT contradictions** ("used X then, uses Y now" is evolution). 
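
A minimal sketch of the per-partner `do_not_reopen` merge rule described above, assuming the frontmatter has already been parsed into a list of dicts. The function name `merge_do_not_reopen` and the in-memory shape are illustrative, not part of the skill. Treating a malformed existing `before` as absent during the merge is also an assumption, chosen to mirror how the spec handles malformed dates elsewhere.

```python
import re
from datetime import date, timedelta

COOLDOWN_DAYS = 90  # default from the spec: today + 90 days
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def merge_do_not_reopen(records, partner, today, new_before=None):
    """Return a new do_not_reopen list with this partner's cooldown merged in.

    records: list of {"partner": str, "before": "YYYY-MM-DD"} dicts.
    Only the matching partner's record changes, and it is never shortened.
    """
    if new_before is None:
        new_before = today + timedelta(days=COOLDOWN_DAYS)
    new_value = new_before.isoformat()
    merged, found = [], False
    for rec in records:
        if rec.get("partner") != partner:
            merged.append(dict(rec))  # unrelated partners are left untouched
            continue
        found = True
        existing = str(rec.get("before", ""))
        if DATE_RE.fullmatch(existing):
            # ISO dates sort correctly as strings: MAX(existing.before, new_value).
            new_value = max(existing, new_value)
        # Assumption: a malformed existing value is simply overwritten.
        merged.append({"partner": partner, "before": new_value})
    if not found:
        merged.append({"partner": partner, "before": new_value})
    return merged
```

Calling it for partner `b` never touches the record for partner `c`, which is the property that keeps an A↔C resolution from extending the A↔B window.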
@@ -83,10 +84,10 @@ What we want: - **Phase B.5 inserted between Phase B and Phase C.** Steps: 1. Iterate `memory/auto/*.md` and build a lightweight in-memory representation: `{file, type, name, tags, title_tokens, claim_phrases}` extracted from frontmatter and body. Claim extraction uses bullet/paragraph splits. - 2. Candidate generation: for each pair of files, only proceed if at least two of these match — same `type` field, overlapping `title_tokens`, overlapping `tags`, or matching normalized predicate ("prefers", "uses", "hates", "requires", "do not"). Files with `do_not_reopen_before` later than today are skipped entirely. + 2. Candidate generation: for each pair of files, only proceed if at least two of these match — same `type` field, overlapping `title_tokens`, overlapping `tags`, or matching normalized predicate ("prefers", "uses", "hates", "requires", "do not"). Per-pair exclusion: skip a pair if either file's `do_not_reopen` list has a record whose `partner` matches the other file's bare name AND whose `before` date is later than today (per-pair, not global). 3. For each candidate pair, ask the LLM (in-skill prompt) one question: "Do these two claims contradict, or is one time-scoped evolution of the other?" Return: `contradiction` | `evolution` | `unrelated`. Only `contradiction` proceeds. 4. For each detected contradiction, attempt auto-resolve using hierarchy: (a) direct diary/session evidence in last 48h wins over inferred; (b) higher confidence wins if delta >= 0.2; (c) otherwise flag for review. - 5. Auto-resolved: edit the losing file to either remove the contradicting claim or mark it superseded. **Never silent-delete** — always replace with a `(superseded: ...)` annotation. Add `resolved_at`, `resolution_basis`, `do_not_reopen_before` to BOTH files' frontmatter (anti-loop). + 5. Auto-resolved: edit the losing file to either remove the contradicting claim or mark it superseded. 
**Never silent-delete** — always replace with a `(superseded: ...)` annotation. Add `resolved_at`, `resolution_basis` (scalars) and append/merge a `{partner, before}` record in the `do_not_reopen` list of BOTH files' frontmatter (anti-loop, per-pair cooldown). 6. Flagged unresolved: add an entry to a `pending_review` accumulator (used in Phase D). 7. Respect mutation limit from Phase C (5 per run total across B.5 and C combined). @@ -101,7 +102,9 @@ What we want: # Optional, added when resolved by Phase B.5: # resolved_at: 2026-05-18 # resolution_basis: "diary 2026-05-15 §3 explicit user statement" - # do_not_reopen_before: 2026-08-18 + # do_not_reopen: # per-pair cooldown list + # - partner: + # before: 2026-08-18 # YYYY-MM-DD; default = today + 90 days --- ``` @@ -123,7 +126,7 @@ What we want: - [x] Phase B.5 documents candidate generation, LLM judgment, auto-resolve hierarchy (`evidence > confidence`), and anti-loop fields - [x] Phase B.5 explicitly excludes time-scoped changes from being treated as contradictions - [x] Phase B.5 documents "never silent-delete" — losing claim is replaced with `(superseded: ...)` annotation -- [x] Phase C frontmatter format documented in SKILL.md now includes `confidence` and `revisit_if` fields (with `resolved_at`, `resolution_basis`, `do_not_reopen_before` as optional) +- [x] Phase C frontmatter format documented in SKILL.md now includes `confidence` and `revisit_if` fields (with `resolved_at`, `resolution_basis`, and the `do_not_reopen` per-pair list as optional) - [x] Phase D documents the workspace `MEMORY.md` "Pending Review" section format and its add/remove rules - [x] Phase D documents `memory/lint-stats.jsonl` format with one JSON line per run - [x] Phase D diary format documented to use `## [YYYY-MM-DD HH:MM] consolidation | ...` parseable prefix @@ -144,7 +147,7 @@ What we want: - **Aged escalation**: if a pending item is older than 14 days AND no topic-relevant opportunity has arisen, surface it at a natural pause or 
task end. - **One per session max**: never dump multiple items in one message. Pick the most relevant or oldest. - **Never interrupt urgency**: if the user is mid-urgent-task, do not derail — wait for natural break. - - **After resolution**: update the contradicting memory file(s) with the resolved value, then remove the bullet from the MEMORY.md "Pending Review" section in the same operation. Add `resolved_at` / `resolution_basis` / `do_not_reopen_before` per the consolidation skill's pattern. + - **After resolution**: update the contradicting memory file(s) with the resolved value, then remove the bullet from the MEMORY.md "Pending Review" section in the same operation. Add `resolved_at` / `resolution_basis` and append/merge a `{partner, before}` record into the file's `do_not_reopen` list per the consolidation skill's pattern. - Section references ADR-069 and beads `workspace-txyu` for trial context. - [x] `.claude/rules/platform/memory-protocol.md` contains a section titled `## Surfacing pending lint items` From 3d0839c92384378d1302ce8404a0a927a0217cdf Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 19:34:14 +0300 Subject: [PATCH 12/21] session end: uncommitted changes --- .claude/rules/platform/memory-protocol.md | 1 + .claude/skills/memory-consolidation/SKILL.md | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index 9c9f195..97a2301 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -30,6 +30,7 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **MEMORY.md `SUSPICIOUS_SHRINK` allowance.** When the only `MEMORY.md` change is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). 
Apply the same bypass as the nightly path (SKILL.md Phase D step 2): accept the verify failure if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured in the pre-edit file (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. Capture both numbers BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies). Otherwise rollback as usual. Note the bypass briefly when reporting the resolution to Ninja so the audit trail is preserved. - **Anti-loop frontmatter.** Add `resolved_at`, `resolution_basis`, and update the `do_not_reopen` list on the affected file(s). `do_not_reopen` is a YAML list of records, each with `partner` (the other file's bare name) and `before` (a YYYY-MM-DD date). For this pair's partner: find any existing entry and apply `MAX(existing.before, new_value)` to its `before` field — NEVER shorten this pair's window. If no entry exists for this partner, APPEND a new `{partner, before}` record. Do NOT touch entries for unrelated partners — that's what makes the cooldown genuinely per-pair. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. Default `before` value: `today + 90 days`. The `before` field is always a date — semantic-condition values like `"Ninja revisits topic X"` are not supported (the consolidation skill has no mechanism to auto-detect such events; use a far-future date if indefinite suppression is genuinely required). Match the canonical YAML in `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5. - **Pending Review bullet format.** Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. 
If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). + - **Bullet match key when removing.** Remove ONLY the bullet whose triple `(file-A, file-B, reason)` matches the specific contradiction you are resolving — the same triple used as the dedup key when the bullet was written (SKILL.md Phase D step 2). The match key is the bullet's literal `reason` field (the third ` — `-separated segment after sanitization), NOT the frontmatter `resolution_basis` field which is a separate human-readable summary. The unordered file pair must match (order-insensitive) and the sanitized `reason` text must match exactly. If the same pair has multiple unresolved bullets with different reasons, leave the non-matching bullets in place — they represent distinct unresolved contradictions that still need resolution. ## What goes WHERE: rules vs memory diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 553cfee..073c153 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -224,7 +224,7 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ### Phase D: Report & Cleanup -Phase D substeps run in numerical order, with one exception: the MEMORY.md dedup pass in step 2 finalizes the value of `pending_added` (it may be lower than the count carried out of Phase B.5 if some bullets were already present). The diary "Lint" line in step 1 and the JSONL `pending_added` field in step 3 MUST both reference the same final post-dedup value, not the pre-dedup count. Compute step 2's dedup (or at minimum the dedup count) before emitting the diary line so the two outputs agree. 
+Phase D substeps run in numerical order, with one exception: step 2's MEMORY.md edit attempt finalizes the value of `pending_added`, and step 1's diary "Lint" line plus step 3's JSONL `pending_added` field MUST both reference that final value. Two adjustments can lower `pending_added` relative to the count carried out of Phase B.5: (i) the dedup pass MAY remove items already present in MEMORY.md, and (ii) if step 2's edit fails `verify` and rolls back (excluding the documented `SUSPICIOUS_SHRINK` bypass), NO bullets were persisted and `pending_added` MUST be reset to 0. Practically: perform step 2's dedup computation AND its safe-edit attempt BEFORE emitting the diary line in step 1, so the diary uses the post-write (or post-rollback) value. If a rollback occurs, the diary's "Lint" line MUST report `pending added: 0` and the Issues section MUST note the rollback; step 3's stats line MUST also use the rolled-back value. Never write a diary line claiming new bullets when MEMORY.md was not actually modified. 1. **Write diary entry** to `memory/diary/YYYY-MM-DD.md` (using today's date). @@ -267,7 +267,7 @@ Phase D substeps run in numerical order, with one exception: the MEMORY.md dedup - If the `pending_review` accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet in this strict, machine-parseable format (parser regex `^- detected_at=\d{4}-\d{2}-\d{2} `): `- detected_at=YYYY-MM-DD — file-A vs file-B — `. Sanitize `` the same way as `resolution_basis` (strip leading `#`, collapse newlines to `; `, truncate to 200 chars) AND additionally replace any em-dash characters (`—`, U+2014) in the reason with a hyphen-space (`- `) so the bullet's three-field structure can be split unambiguously on the literal ` — ` separator. 
- Before appending a new bullet, deduplicate on the triple `(file-A, file-B, reason)` (unordered file pair, exact reason after sanitization): if an existing bullet matches all three fields, do NOT append again. Two genuinely distinct contradictions between the same pair (different reasons) produce two separate bullets — do not collapse them. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - - When the agent or a future run resolves a pending item, the corresponding bullet is removed; when the last bullet is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. + - When the agent or a future run resolves a pending item, remove ONLY the bullet whose triple `(file-A, file-B, reason)` matches the specific contradiction being resolved — the same triple used as the dedup key when the bullet was written. The match key is the bullet's literal `reason` field (the third dash-separated segment, sanitized as on write), NOT the frontmatter `resolution_basis` field which is a separate human-readable summary. The unordered file pair `(file-A, file-B)` must match (order-insensitive) and the sanitized `reason` text must match exactly. If the same pair has multiple unresolved bullets with different reasons, leave the non-matching bullets in place — they represent distinct unresolved contradictions. When the last bullet in the section is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. 
- This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured against `MEMORY.md` **before** the edit is applied (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. The runner MUST capture both the section bytes and the pre-edit total byte size BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies to `${FILEPATH}.consolidation-backup`); do not measure live during/after the write. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. 3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line. 
Each angle-bracketed placeholder below is substituted with the actual value (`` becomes a literal integer like `7`, `` becomes a float like `4.5`): From f98c617b30f8ebf464a94c396db245c9b5992ae7 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 19:41:50 +0300 Subject: [PATCH 13/21] session end: uncommitted changes --- .claude/skills/memory-consolidation/SKILL.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 073c153..e441bc4 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -139,6 +139,8 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, b. **Higher confidence wins** if both sides have a `confidence` field and `|confidence_A − confidence_B| >= 0.2`. If either side lacks `confidence` (legacy files predating the schema), treat it as `0.7` for this comparison only. c. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason, detected_at:}`. This is the ONLY in-phase append; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly. + **Deterministic `reason` derivation for normal contradictions.** For an entry routed through this step (4c), `reason` MUST be derived from the LLM's `claim_a` and `claim_b` strings returned in step 3 — NOT from any free-form LLM summary, which would be reworded between runs and break dedup. 
Composition: for each claim, strip any leading bullet/heading markers (`-`, `*`, `#`, plus a single following space) and surrounding whitespace, collapse internal runs of whitespace to a single space, then truncate to 80 chars (cut at byte boundary, no trailing whitespace). Compose as the literal string ` | ` (single ASCII pipe with spaces). Then apply the standard `reason` sanitization documented in Phase D step 2 (single line, strip leading `#`, collapse newlines to `; `, replace em-dashes `—` with hyphen-space `- `, truncate to 200 chars). The resulting string is the dedup key: as long as the underlying body lines and the file pair are unchanged, the same contradiction yields the same `reason` byte-for-byte across runs. The deferral routes in steps 5 and 6 supply their own explicit `reason` strings (e.g. `"(deferred: edit verify failed)"`) instead of this derivation. + 5. **Apply auto-resolved edits.** For each auto-resolved pair: - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. 
Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: anchor invalidated by prior edit)"`. @@ -224,7 +226,16 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ### Phase D: Report & Cleanup -Phase D substeps run in numerical order, with one exception: step 2's MEMORY.md edit attempt finalizes the value of `pending_added`, and step 1's diary "Lint" line plus step 3's JSONL `pending_added` field MUST both reference that final value. Two adjustments can lower `pending_added` relative to the count carried out of Phase B.5: (i) the dedup pass MAY remove items already present in MEMORY.md, and (ii) if step 2's edit fails `verify` and rolls back (excluding the documented `SUSPICIOUS_SHRINK` bypass), NO bullets were persisted and `pending_added` MUST be reset to 0. Practically: perform step 2's dedup computation AND its safe-edit attempt BEFORE emitting the diary line in step 1, so the diary uses the post-write (or post-rollback) value. If a rollback occurs, the diary's "Lint" line MUST report `pending added: 0` and the Issues section MUST note the rollback; step 3's stats line MUST also use the rolled-back value. Never write a diary line claiming new bullets when MEMORY.md was not actually modified. +Phase D substeps do NOT execute in literal numerical order. The diary (step 1) is written LAST among steps 1–3 so it can summarize the actual outcomes of steps 2 and 3 — including their issues — in a single write (the spec does NOT support amending an already-written diary section). 
Step 3 is also split: its compute-and-validate work runs early (to surface any issues), and the actual JSONL append runs after the diary. Execution order: + + 1. **Step 2** (MEMORY.md "Pending Review" edit) — finalizes `pending_added`; may surface a `SUSPICIOUS_SHRINK` rollback issue. + 2. **Step 3 compute-and-validate** — compute `pending_total` and `avg_age_days` from the post-step-2 MEMORY.md, compose the JSONL line, validate it via `jq -e`. This sub-step MAY surface issues to the Issues collector: malformed bullets whose `detected_at` fails to parse (counted with age 0; logged), or a candidate JSON line that fails `jq` validation (NOT appended; logged). Do NOT perform the actual `>>` append yet — defer it to after the diary. + 3. **Step 1** (diary) — write diary using the finalized `pending_added` AND the full Issues set collected in (1) and (2). + 4. **Step 3 append** — perform the actual `printf '%s\n' "$LINE" >> memory/lint-stats.jsonl`, only if (2)'s `jq` validation passed. + 5. **Step 4** (release lock). + 6. **Step 5** (NO_REPLY). + +Two adjustments can lower `pending_added` relative to the count carried out of Phase B.5: (i) the dedup pass MAY remove items already present in MEMORY.md, and (ii) if step 2's edit fails `verify` and rolls back (excluding the documented `SUSPICIOUS_SHRINK` bypass), NO bullets were persisted and `pending_added` MUST be reset to 0. If a rollback occurs, the diary's "Lint" line MUST report `pending added: 0` and the Issues section MUST note the rollback; step 3's stats line MUST also use the rolled-back value. Never write a diary line claiming new bullets when MEMORY.md was not actually modified. 1. **Write diary entry** to `memory/diary/YYYY-MM-DD.md` (using today's date). 
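
The "Step 3 compute-and-validate" sub-step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: `json.loads` stands in for the `jq -e` validation, the helper names are hypothetical, clamping negative ages to 0 is an assumption, and the stats line carries only fields named in this spec (the four Phase B.5 counters plus `pending_total` and `avg_age_days`).

```python
import json
import re
from datetime import date

SEP = " \u2014 "  # the literal em-dash separator between the three bullet fields
HEADING = "## Pending Review (Lint findings)"
BULLET_RE = re.compile(r"^- detected_at=(\S*)")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def pending_section_lines(memory_md):
    """Lines between the exact heading and the next '## ' heading or EOF."""
    lines = memory_md.splitlines()
    if HEADING not in lines:
        return []
    section = []
    for line in lines[lines.index(HEADING) + 1:]:
        if line.startswith("## "):
            break
        section.append(line)
    return section


def dedup_key(bullet):
    """(unordered file pair, reason), or None if the bullet lacks three fields."""
    parts = bullet.split(SEP, 2)
    if len(parts) != 3 or " vs " not in parts[1]:
        return None
    file_a, file_b = parts[1].split(" vs ", 1)
    return (frozenset((file_a.strip(), file_b.strip())), parts[2])


def lint_stats(memory_md, today, counters):
    """Compose one validated JSON line from the post-step-2 MEMORY.md text."""
    ages, issues = [], []
    for line in pending_section_lines(memory_md):
        match = BULLET_RE.match(line)
        if not match:
            continue
        try:
            if not DATE_RE.fullmatch(match.group(1)):
                raise ValueError("not YYYY-MM-DD")
            detected = date.fromisoformat(match.group(1))
            ages.append(max((today - detected).days, 0))  # clamp: assumption
        except ValueError:
            ages.append(0)  # spec: malformed detected_at counts with age 0
            issues.append("malformed detected_at: " + line)
    record = dict(counters)
    record["pending_total"] = len(ages)
    record["avg_age_days"] = round(sum(ages) / len(ages), 1) if ages else 0.0
    stats_line = json.dumps(record, ensure_ascii=False)
    json.loads(stats_line)  # validate before any append; raises on a bad line
    return stats_line, issues
```

The returned `issues` list is what would feed the diary Issues section, and the append to `memory/lint-stats.jsonl` happens only after the diary is written, per the execution order above.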
From 1e604f3e6f1a87f82cf1cccf07238cd075fd8be6 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 19:51:30 +0300 Subject: [PATCH 14/21] session end: uncommitted changes --- .claude/skills/memory-consolidation/SKILL.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index e441bc4..604fadb 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -132,14 +132,14 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, **Per-pair exclusion (anti-loop).** Skip the pair entirely if EITHER file's `do_not_reopen` list contains an entry whose `partner` matches the other file's bare name AND that entry's `before` date is later than today. Exclusion is genuinely per-pair: each partner has its own `before` date stored in its own record, so resolving A↔C cannot extend A↔B's cooldown. If the matching entry's `before` is absent or fails the `^\d{4}-\d{2}-\d{2}$` regex (malformed or legacy prior write), treat the exclusion as inactive for that specific pair and let the pair proceed to judgment — anti-loop requires a well-formed date. Only YYYY-MM-DD dates are recognized; the spec does not support semantic-condition values like `"Ninja revisits topic X"`, since the skill has no mechanism to auto-detect such events (use a far-future date if indefinite suppression is genuinely needed). -3. **LLM judgment per candidate.** For each candidate pair, ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" The LLM must return a structured response: `{verdict, claim_a, claim_b}` where `verdict` is `contradiction` | `evolution` | `unrelated`, and `claim_a` / `claim_b` are the single full body lines (verbatim, including leading bullet/heading markers if any) from each file that carry the contradicting claim. 
The `claim_a` / `claim_b` strings are used as exact match anchors in step 5; if either is empty or does not appear verbatim in the corresponding file body, downgrade the verdict to `unrelated` and log the mismatch in the Phase D diary Issues section. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. +3. **LLM judgment per candidate.** For each candidate pair, first establish canonical ordering: sort the two files by basename so that A's basename is lexicographically less than B's basename, and present them to the LLM in that order. This ensures `claim_a` / `claim_b` map deterministically to the lex-smaller / lex-larger file regardless of pair iteration order — without this, dedup keys produced by step 4c can differ between runs when filesystem enumeration flips the order. Then ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" The LLM must return a structured response: `{verdict, claim_a, claim_b}` where `verdict` is `contradiction` | `evolution` | `unrelated`, and `claim_a` / `claim_b` are the single full body lines (verbatim, including leading bullet/heading markers if any) from each file that carry the contradicting claim — with `claim_a` belonging to canonical-A (lex-smaller basename) and `claim_b` belonging to canonical-B. The `claim_a` / `claim_b` strings are used as exact match anchors in step 5; if either is empty or does not appear verbatim in the corresponding file body, downgrade the verdict to `unrelated` and log the mismatch in the Phase D diary Issues section. Only `contradiction` proceeds to step 4. 
**Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. 4. **Auto-resolve hierarchy** (apply in order, stop at first match): the rule is `evidence > confidence`. A "recency" tie-breaker was considered but dropped: `resolved_at` reflects a file's unrelated prior resolution history, not the freshness of the currently contradicting claim, so it is not a valid freshness proxy. Direct freshness evidence is already handled by (a). a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. b. **Higher confidence wins** if both sides have a `confidence` field and `|confidence_A − confidence_B| >= 0.2`. If either side lacks `confidence` (legacy files predating the schema), treat it as `0.7` for this comparison only. c. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason, detected_at:}`. This is the ONLY in-phase append; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly. - **Deterministic `reason` derivation for normal contradictions.** For an entry routed through this step (4c), `reason` MUST be derived from the LLM's `claim_a` and `claim_b` strings returned in step 3 — NOT from any free-form LLM summary, which would be reworded between runs and break dedup. 
Composition: for each claim, strip any leading bullet/heading markers (`-`, `*`, `#`, plus a single following space) and surrounding whitespace, collapse internal runs of whitespace to a single space, then truncate to 80 chars (cut at byte boundary, no trailing whitespace). Compose as the literal string ` | ` (single ASCII pipe with spaces). Then apply the standard `reason` sanitization documented in Phase D step 2 (single line, strip leading `#`, collapse newlines to `; `, replace em-dashes `—` with hyphen-space `- `, truncate to 200 chars). The resulting string is the dedup key: as long as the underlying body lines and the file pair are unchanged, the same contradiction yields the same `reason` byte-for-byte across runs. The deferral routes in steps 5 and 6 supply their own explicit `reason` strings (e.g. `"(deferred: edit verify failed)"`) instead of this derivation. + **Deterministic `reason` derivation for normal contradictions.** For an entry routed through this step (4c), `reason` MUST be derived from the LLM's `claim_a` and `claim_b` strings returned in step 3 — NOT from any free-form LLM summary, which would be reworded between runs and break dedup. Step 3's canonical-ordering rule (lex-smaller basename = canonical-A) guarantees `claim_a` / `claim_b` map to the same files across runs, so the composed string is order-invariant. Composition: for each claim, strip any leading bullet/heading markers (`-`, `*`, `#`, plus a single following space) and surrounding whitespace, collapse internal runs of whitespace to a single space, then truncate to 80 characters at a UTF-8 codepoint boundary (count codepoints, never split a multi-byte sequence mid-character; writing invalid UTF-8 to MEMORY.md would break downstream `json.loads`-per-line stats parsing), with no trailing whitespace. Compose as the literal string ` | ` (single ASCII pipe with spaces).
Then apply the standard `reason` sanitization documented in Phase D step 2 (single line, strip leading `#`, collapse newlines to `; `, replace em-dashes `—` with hyphen-space `- `, truncate to 200 chars — again at a UTF-8 codepoint boundary). The resulting string is the dedup key: as long as the underlying body lines and the file pair are unchanged, the same contradiction yields the same `reason` byte-for-byte across runs. The deferral routes in steps 5 and 6 supply their own explicit `reason` strings (e.g. `"(deferred: edit verify failed)"`) instead of this derivation. 5. **Apply auto-resolved edits.** For each auto-resolved pair: - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. 
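
The canonical ordering and deterministic `reason` derivation above might look like the sketch below. Several points are assumptions rather than spec: the helper names are hypothetical, the composed string is read as the cleaned `claim_a`, then ` | `, then the cleaned `claim_b`, and a "marker" is taken to be a single `-` or `*`, or a run of one to six `#`, followed by exactly one space.

```python
import os
import re

# Assumption: one bullet marker or one heading run, plus a single following space.
MARKER_RE = re.compile(r"^(?:[-*]|#{1,6}) ")


def canonical_pair(path_1, path_2):
    """Order two files so canonical-A has the lexicographically smaller basename."""
    return tuple(sorted((path_1, path_2), key=os.path.basename))


def clean_claim(claim, limit=80):
    """Strip the leading marker, collapse whitespace, truncate by codepoints."""
    text = MARKER_RE.sub("", claim.strip(), count=1)
    text = " ".join(text.split())
    # Python str slicing counts codepoints, so it cannot split a UTF-8 sequence.
    return text[:limit].rstrip()


def sanitize_reason(reason, limit=200):
    """Phase D step 2 sanitization: single line, no leading '#', no em-dashes."""
    text = reason.lstrip("#").lstrip()
    text = re.sub(r"[\r\n]+", "; ", text)
    text = text.replace("\u2014", "- ")  # em-dash becomes hyphen-space
    return text[:limit]


def derive_reason(claim_a, claim_b):
    """Dedup key for a step-4c entry; claim_a must belong to canonical-A."""
    return sanitize_reason(clean_claim(claim_a) + " | " + clean_claim(claim_b))
```

Callers would first run `canonical_pair` and hand the files to the LLM in that order, so that `claim_a` and `claim_b` reach `derive_reason` in the same positions on every run regardless of filesystem enumeration order.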
From 44fbcb85049733110008c5fba526716e3c9bb254 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 20:02:33 +0300 Subject: [PATCH 15/21] session end: uncommitted changes --- .claude/rules/platform/memory-protocol.md | 4 +-- .claude/skills/memory-consolidation/SKILL.md | 32 +++++++++++--------- 2 files changed, 19 insertions(+), 17 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index 97a2301..37beb59 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -25,8 +25,8 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. - **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. - **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. - - **Concurrency: take the consolidation lock, do not just check it.** Before any edit, acquire the same lock the nightly consolidator uses — a file-existence check is TOCTOU-racy (cron can grab the lock between the check and the first write) and never reclaims a stale lock. Run `bash .claude/skills/memory-consolidation/scripts/lock.sh acquire "$PWD/.consolidation.lock" 60` and capture the `ACQUIRED ` value. If the script prints `LOCKED` (exit 1), tell Ninja the resolution is deferred and re-attempt at the next opportunity (cron runs are minutes, not hours). 
When acquired, pass the token to every later `refresh`/`release` call to prove ownership and release the lock at the end of the resolution — including on failure paths (after any rollback). The `60` arg is the stale-TTL in minutes; the script reclaims abandoned locks automatically. Also check `.maintenance.lock` first via `bash .claude/skills/memory-consolidation/scripts/lock.sh check-maintenance "$PWD"`; defer if it returns `MAINTENANCE`. - - **Use the same safe-edit flow as the nightly path.** Wrap every affected file edit with `.claude/skills/memory-consolidation/scripts/safe-edit.sh backup → write → verify → clean` (and `rollback` on verify failure), exactly as Phase C does. When the resolution touches two memory files plus `MEMORY.md`, apply the paired two-phase commit pattern from Phase B.5 step 5: `backup` ALL affected files first, apply all edits, run `verify` on ALL of them, and only `clean` the backups once every `verify` passes. If any `verify` fails, `rollback` every file from its backup and report the failure to Ninja — never leave a partial resolution where one file is annotated but another is untouched. + - **Concurrency: take the consolidation lock, do not just check it.** Before any edit, acquire the same lock the nightly consolidator uses — a file-existence check is TOCTOU-racy (cron can grab the lock between the check and the first write) and never reclaims a stale lock. **All paths in this protocol are anchored at the workspace root via `${CLAUDE_PROJECT_DIR}`, never `$PWD` or relative paths.** The agent's shell may be in any subdirectory when invoked, so a relative `.claude/...` path or a `$PWD`-relative lock can target the wrong file (or fail to find the script entirely) and silently bypass coordination with the nightly consolidator. Run `bash "${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/lock.sh" acquire "${CLAUDE_PROJECT_DIR}/.consolidation.lock" 60` and capture the `ACQUIRED ` value. 
If the script prints `LOCKED` (exit 1), tell Ninja the resolution is deferred and re-attempt at the next opportunity (cron runs are minutes, not hours). When acquired, pass the token to every later `refresh`/`release` call to prove ownership and release the lock at the end of the resolution — including on failure paths (after any rollback). The `60` arg is the stale-TTL in minutes; the script reclaims abandoned locks automatically. Also check `.maintenance.lock` first via `bash "${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/lock.sh" check-maintenance "${CLAUDE_PROJECT_DIR}"`; defer if it returns `MAINTENANCE`. + - **Use the same safe-edit flow as the nightly path.** Wrap every affected file edit with `"${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/safe-edit.sh" backup → write → verify → clean` (and `rollback` on verify failure), exactly as Phase C does. Always reference the script and target files via `${CLAUDE_PROJECT_DIR}` absolute paths so the call is correct regardless of the agent's current working directory. When the resolution touches two memory files plus `MEMORY.md`, apply the paired two-phase commit pattern from Phase B.5 step 5: `backup` ALL affected files first, apply all edits, run `verify` on ALL of them, and only `clean` the backups once every `verify` passes. If any `verify` fails, `rollback` every file from its backup and report the failure to Ninja — never leave a partial resolution where one file is annotated but another is untouched. - **MEMORY.md `SUSPICIOUS_SHRINK` allowance.** When the only `MEMORY.md` change is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). 
Apply the same bypass as the nightly path (SKILL.md Phase D step 2): accept the verify failure if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured in the pre-edit file (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. Capture both numbers BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies). Otherwise rollback as usual. Note the bypass briefly when reporting the resolution to Ninja so the audit trail is preserved. - **Anti-loop frontmatter.** Add `resolved_at`, `resolution_basis`, and update the `do_not_reopen` list on the affected file(s). `do_not_reopen` is a YAML list of records, each with `partner` (the other file's bare name) and `before` (a YYYY-MM-DD date). For this pair's partner: find any existing entry and apply `MAX(existing.before, new_value)` to its `before` field — NEVER shorten this pair's window. If no entry exists for this partner, APPEND a new `{partner, before}` record. Do NOT touch entries for unrelated partners — that's what makes the cooldown genuinely per-pair. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. Default `before` value: `today + 90 days`. The `before` field is always a date — semantic-condition values like `"Ninja revisits topic X"` are not supported (the consolidation skill has no mechanism to auto-detect such events; use a far-future date if indefinite suppression is genuinely required). Match the canonical YAML in `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5. - **Pending Review bullet format.** Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. 
If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 604fadb..145aa9b 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -117,7 +117,7 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi Cross-file scan of existing `memory/auto/*.md` files for contradictions that the per-fact Phase B check cannot catch (Phase B only compares new vs existing, not existing vs existing). -Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). +Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`, `pending_resolved = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). `pending_resolved` accumulates `{files:[A,B], reason: canonical_reason}` records for pairs that this run auto-resolved (used by Phase D step 2 to clear any pre-existing MEMORY.md "Pending Review" bullet for the same contradiction so stale flags do not accumulate across runs). 1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop list). `do_not_reopen` is read as a YAML list of records, each with `partner` (filename) and `before` (YYYY-MM-DD date) — if the field is absent treat as empty list. 
Normalize each `partner` to its bare filename (apply `basename`, strip any directory prefix) and dedupe by `partner` (if duplicate entries exist for the same partner, keep the one with the latest `before` — never shorten an existing pair's cooldown). Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists, else just the `name` slug tokens) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires}` that appear as case-insensitive whole-word matches (single-token only — negation-style phrases like "do not"/"don't" are covered by `negation_markers`); `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. The two sets are disjoint by construction so a single shared word cannot satisfy both signals in step 2. Both sets are empty if no match is found. @@ -135,15 +135,16 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, 3. **LLM judgment per candidate.** For each candidate pair, first establish canonical ordering: sort the two files by basename so that A's basename is lexicographically less than B's basename, and present them to the LLM in that order. This ensures `claim_a` / `claim_b` map deterministically to the lex-smaller / lex-larger file regardless of pair iteration order — without this, dedup keys produced by step 4c can differ between runs when filesystem enumeration flips the order. Then ask one in-skill question: "Do these two claims contradict each other, or is one a time-scoped evolution of the other?" 
The LLM must return a structured response: `{verdict, claim_a, claim_b}` where `verdict` is `contradiction` | `evolution` | `unrelated`, and `claim_a` / `claim_b` are the single full body lines (verbatim, including leading bullet/heading markers if any) from each file that carry the contradicting claim — with `claim_a` belonging to canonical-A (lex-smaller basename) and `claim_b` belonging to canonical-B. The `claim_a` / `claim_b` strings are used as exact match anchors in step 5; if either is empty or does not appear verbatim in the corresponding file body, downgrade the verdict to `unrelated` and log the mismatch in the Phase D diary Issues section. Only `contradiction` proceeds to step 4. **Time-scoped changes are NOT contradictions** — a fact like "used X then, uses Y now" is evolution, not contradiction. On malformed LLM output or transient error, treat as `unrelated` and log the failure in the Phase D diary Issues section. Increment `contradictions_detected` for each `contradiction` verdict. 4. **Auto-resolve hierarchy** (apply in order, stop at first match): the rule is `evidence > confidence`. A "recency" tie-breaker was considered but dropped: `resolved_at` reflects a file's unrelated prior resolution history, not the freshness of the currently contradicting claim, so it is not a valid freshness proxy. Direct freshness evidence is already handled by (a). + + **Compute `canonical_reason` FIRST (applies to every contradiction, regardless of resolution path).** Before evaluating a–c below, derive the deterministic `reason` string from the LLM's `claim_a` and `claim_b` strings returned in step 3 — NOT from any free-form LLM summary, which would be reworded between runs and break dedup. Step 3's canonical-ordering rule (lex-smaller basename = canonical-A) guarantees `claim_a` / `claim_b` map to the same files across runs, so the composed string is order-invariant. 
Composition: for each claim, strip any leading bullet/heading markers (`-`, `*`, `#`, plus a single following space) and surrounding whitespace, collapse internal runs of whitespace to a single space, then truncate to 80 characters at a UTF-8 codepoint boundary (count codepoints, never split a multi-byte sequence mid-character — invalid UTF-8 written to MEMORY.md will break downstream `json.loads`-per-line stats parsing), with no trailing whitespace. Compose as the literal string ` | ` (single ASCII pipe with spaces). Then apply the standard `reason` sanitization documented in Phase D step 2 (single line, strip leading `#`, collapse newlines to `; `, replace em-dashes `—` with hyphen-space `- `, truncate to 200 chars — again at a UTF-8 codepoint boundary). The resulting string is `canonical_reason`. It is the dedup key shared by ALL routes that flag this contradiction (4c plus the deferral routes in steps 5 and 6) AND by the resolution-removal path (step 5 successful auto-resolves append `{files, reason: canonical_reason}` to `pending_resolved` so Phase D can clear any pre-existing MEMORY.md bullet for this same contradiction). Because identity is the file pair plus the underlying claim lines, all routes use the SAME `canonical_reason` — deferral causes (verify failure, anchor invalidation, mutation-limit) are forensic metadata logged in the Phase D diary Issues section, NOT embedded in the bullet's `reason` field. Embedding cause text would split one contradiction's bullets across multiple variants and defeat dedup; it would also prevent a later run from removing the same contradiction's stale bullet when it succeeds in auto-resolving. + a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. b. **Higher confidence wins** if both sides have a `confidence` field and `|confidence_A − confidence_B| >= 0.2`. 
If either side lacks `confidence` (legacy files predating the schema), treat it as `0.7` for this comparison only. - c. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason, detected_at:}`. This is the ONLY in-phase append; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly. - - **Deterministic `reason` derivation for normal contradictions.** For an entry routed through this step (4c), `reason` MUST be derived from the LLM's `claim_a` and `claim_b` strings returned in step 3 — NOT from any free-form LLM summary, which would be reworded between runs and break dedup. Step 3's canonical-ordering rule (lex-smaller basename = canonical-A) guarantees `claim_a` / `claim_b` map to the same files across runs, so the composed string is order-invariant. Composition: for each claim, strip any leading bullet/heading markers (`-`, `*`, `#`, plus a single following space) and surrounding whitespace, collapse internal runs of whitespace to a single space, then truncate to 80 characters at a UTF-8 codepoint boundary (count codepoints, never split a multi-byte sequence mid-character — write invalid UTF-8 to MEMORY.md and downstream `json.loads`-per-line stats parsing will break), with no trailing whitespace. Compose as the literal string ` | ` (single ASCII pipe with spaces). Then apply the standard `reason` sanitization documented in Phase D step 2 (single line, strip leading `#`, collapse newlines to `; `, replace em-dashes `—` with hyphen-space `- `, truncate to 200 chars — again at a UTF-8 codepoint boundary). 
The resulting string is the dedup key: as long as the underlying body lines and the file pair are unchanged, the same contradiction yields the same `reason` byte-for-byte across runs. The deferral routes in steps 5 and 6 supply their own explicit `reason` strings (e.g. `"(deferred: edit verify failed)"`) instead of this derivation. + c. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason: canonical_reason, detected_at:}`. This is the ONLY in-phase append for the unresolved path; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason: canonical_reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly and dedup correctly merges a future 4c finding with an earlier deferral for the same contradiction. 5. **Apply auto-resolved edits.** For each auto-resolved pair: - - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: edit verify failed)"`. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. - - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. 
Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: anchor invalidated by prior edit)"`. + - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using `{files:[A,B], reason: canonical_reason, detected_at:}` — the SAME `canonical_reason` computed in step 4, so a future run's 4c finding for this contradiction dedup-merges into the existing bullet. Log the deferral cause `"edit verify failed (file=, error=)"` in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. + - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. 
If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using `{files:[A,B], reason: canonical_reason, detected_at:}`. Log the deferral cause `"anchor invalidated by prior edit (file=)"` in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. - **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen` is a **list of records** keyed by `partner` — locate the existing entry for this pair's partner (if any) and apply `MAX(existing.before, new_value)` to its `before` field; NEVER shorten this pair's cooldown. If no entry exists for this partner, APPEND a new `{partner, before}` record. Other files' entries (for unrelated partners) are untouched — resolving A↔C must not extend A↔B's cooldown. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. **Default `new_value` for `before`: `today + 90 days`** (same default as the interactive resolution path in `.claude/rules/platform/memory-protocol.md`, so nightly and manual cooldowns match). Use a far-future date (e.g., year 2099) if indefinite suppression is genuinely required; the field is always a YYYY-MM-DD date. 
```yaml @@ -154,11 +155,11 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, before: YYYY-MM-DD # always a date; other partners' dates are untouched ``` `resolution_basis` MUST be sanitized: single line, max 200 chars, replace embedded newlines with `; `, strip leading `#`, double-quote and escape `"` as `\"` and `\` as `\\`. `before` values are always date-form (matching `^\d{4}-\d{2}-\d{2}$`) and written unquoted as YAML dates — semantic-condition values are not supported (the skill has no auto-trigger for them; use a far-future date if indefinite suppression is required). - - Increment `auto_resolved` by 1 per resolved pair. Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. **Timing:** both increments fire ONLY after both files' `verify` calls succeed and both backups have been cleaned. A pair that fails verify and is rolled back does NOT consume the budget (the increments are not applied) — the remaining budget is preserved for subsequent pairs in the same run. + - Increment `auto_resolved` by 1 per resolved pair. Increment `mutations_applied` by 2 per resolved pair — one per file edit, matching Phase C's "each file modification counts as one mutation" rule so the shared 5-per-run budget is counted consistently across phases. Append `{files:[A,B], reason: canonical_reason}` to `pending_resolved` so Phase D step 2 can remove any pre-existing MEMORY.md "Pending Review" bullet for this same contradiction (idempotent — a no-op if no matching bullet exists). **Timing:** all three (increments + `pending_resolved` append) fire ONLY after both files' `verify` calls succeed and both backups have been cleaned. 
A pair that fails verify and is rolled back does NOT consume the budget and is NOT appended to `pending_resolved` (the increments are not applied) — the remaining budget is preserved for subsequent pairs in the same run, and the failed pair instead lands in `pending_review` per the verify-failure bullet above. -6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` using the canonical shape from step 4c with `reason: "(deferred: mutation limit reached)"`. +6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` using `{files:[A,B], reason: canonical_reason, detected_at:}` — the SAME `canonical_reason` so dedup correctly merges with any earlier or later finding for this contradiction. Log the deferral cause `"mutation limit reached"` in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. -7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, and `pending_review` to Phase D for stats and Pending Review writes. +7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, `pending_review`, and `pending_resolved` to Phase D for stats, Pending Review adds, and stale-bullet removals. 
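Two rules from steps 5 and 6 above can be sketched as follows, as an illustration rather than the skill's actual implementation. The helper names `merge_do_not_reopen` and `can_start_pair` are hypothetical, `before` values are assumed to be already parsed into `datetime.date` objects, and the list is assumed to be deduplicated by `partner` as step 1 requires.

```python
from datetime import date, timedelta

MUTATION_BUDGET = 5  # shared per-run budget (Phase B.5 + Phase C)
PAIR_COST = 2        # a resolved pair edits two files, one mutation each


def merge_do_not_reopen(entries, partner, today, cooldown_days=90):
    """Return a new list where only this partner's cooldown may be extended."""
    new_before = today + timedelta(days=cooldown_days)
    merged = []
    found = False
    for entry in entries:
        if entry["partner"] == partner:
            found = True
            # MAX(existing.before, new_value): never shorten the window.
            merged.append({"partner": partner,
                           "before": max(entry["before"], new_before)})
        else:
            # Unrelated partners are copied through untouched.
            merged.append(dict(entry))
    if not found:
        merged.append({"partner": partner, "before": new_before})
    return merged


def can_start_pair(mutations_applied):
    """True only if a full pair still fits in the remaining budget."""
    return mutations_applied + PAIR_COST <= MUTATION_BUDGET
```

With a budget of 5 and a pair cost of 2, at most two pairs can be auto-resolved in a run that has spent no mutations yet. A third pair would need `4 + 2 <= 5`, which fails, so it is deferred to `pending_review`.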
**Lock refresh:** Before continuing, refresh the lock: ```bash @@ -228,7 +229,7 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi Phase D substeps do NOT execute in literal numerical order. The diary (step 1) is written LAST among steps 1–3 so it can summarize the actual outcomes of steps 2 and 3 — including their issues — in a single write (the spec does NOT support amending an already-written diary section). Step 3 is also split: its compute-and-validate work runs early (to surface any issues), and the actual JSONL append runs after the diary. Execution order: - 1. **Step 2** (MEMORY.md "Pending Review" edit) — finalizes `pending_added`; may surface a `SUSPICIOUS_SHRINK` rollback issue. + 1. **Step 2** (MEMORY.md "Pending Review" edit) — applies `pending_resolved` removals first, then dedup-appends `pending_review` adds in a single MEMORY.md mutation, finalizes `pending_added`; may surface a `SUSPICIOUS_SHRINK` rollback issue. 2. **Step 3 compute-and-validate** — compute `pending_total` and `avg_age_days` from the post-step-2 MEMORY.md, compose the JSONL line, validate it via `jq -e`. This sub-step MAY surface issues to the Issues collector: malformed bullets whose `detected_at` fails to parse (counted with age 0; logged), or a candidate JSON line that fails `jq` validation (NOT appended; logged). Do NOT perform the actual `>>` append yet — defer it to after the diary. 3. **Step 1** (diary) — write diary using the finalized `pending_added` AND the full Issues set collected in (1) and (2). 4. **Step 3 append** — perform the actual `printf '%s\n' "$LINE" >> memory/lint-stats.jsonl`, only if (2)'s `jq` validation passed. 
@@ -265,7 +266,7 @@ Two adjustments can lower `pending_added` relative to the count carried out of P - Created/Updated memory/auto/filename.md — reason ### Lint (Phase B.5) # omit this entire block when LINT_PHASE_B5_ENABLED=false - - Candidates: N, contradictions: N, auto-resolved: N, pending added: N + - Candidates: N, contradictions: N, auto-resolved: N, pending added: N, pending removed: N # `pending removed` = count of stale MEMORY.md bullets cleared this run by `pending_resolved` (Phase D step 2). Forensic deferral causes (verify failure, anchor invalidation, mutation-limit) are logged in the Issues section below. ### Noted for Review - [confidence 0.7] Possible insight — context @@ -274,11 +275,12 @@ Two adjustments can lower `pending_added` relative to the count carried out of P - Any errors encountered during processing ``` -2. **Update workspace `MEMORY.md` "Pending Review" section.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. - - If the `pending_review` accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet in this strict, machine-parseable format (parser regex `^- detected_at=\d{4}-\d{2}-\d{2} `): `- detected_at=YYYY-MM-DD — file-A vs file-B — `. Sanitize `` the same way as `resolution_basis` (strip leading `#`, collapse newlines to `; `, truncate to 200 chars) AND additionally replace any em-dash characters (`—`, U+2014) in the reason with a hyphen-space (`- `) so the bullet's three-field structure can be split unambiguously on the literal ` — ` separator. +2. **Update workspace `MEMORY.md` "Pending Review" section.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. This step applies BOTH the nightly-auto-resolve removals (`pending_resolved` from Phase B.5 step 5) AND the new-flag adds (`pending_review`) in a single MEMORY.md mutation — removals first, then adds, so dedup decisions in the add pass see the post-removal state. 
+ - **Process `pending_resolved` removals FIRST (nightly auto-resolve cleanup).** For each `{files, reason}` record in `pending_resolved`, find and remove the bullet whose triple `(file-A, file-B, reason)` matches per the removal rule below. This prevents stale flags from accumulating: a contradiction previously surfaced in `MEMORY.md` and auto-resolved in a later nightly run no longer inflates `pending_total` / `avg_age_days`. Removal is idempotent — a no-op if no matching bullet exists (e.g., the auto-resolve happened on the same run the contradiction was first detected). + - **Then process `pending_review` adds.** If the accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet in this strict, machine-parseable format (parser regex `^- detected_at=\d{4}-\d{2}-\d{2} `): `- detected_at=YYYY-MM-DD — file-A vs file-B — `. Sanitize `` the same way as `resolution_basis` (strip leading `#`, collapse newlines to `; `, truncate to 200 chars) AND additionally replace any em-dash characters (`—`, U+2014) in the reason with a hyphen-space (`- `) so the bullet's three-field structure can be split unambiguously on the literal ` — ` separator. New bullets MUST use `canonical_reason` from Phase B.5 step 4 as the `reason` field — deferral causes (verify failure, anchor invalidation, mutation-limit) are forensic and belong in the diary Issues section, NOT in the bullet. - Before appending a new bullet, deduplicate on the triple `(file-A, file-B, reason)` (unordered file pair, exact reason after sanitization): if an existing bullet matches all three fields, do NOT append again. Two genuinely distinct contradictions between the same pair (different reasons) produce two separate bullets — do not collapse them. Update `pending_added` to count only newly written bullets. 
- - If `pending_review` is empty AND no prior unresolved bullets remain in the section, the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - - When the agent or a future run resolves a pending item, remove ONLY the bullet whose triple `(file-A, file-B, reason)` matches the specific contradiction being resolved — the same triple used as the dedup key when the bullet was written. The match key is the bullet's literal `reason` field (the third dash-separated segment, sanitized as on write), NOT the frontmatter `resolution_basis` field which is a separate human-readable summary. The unordered file pair `(file-A, file-B)` must match (order-insensitive) and the sanitized `reason` text must match exactly. If the same pair has multiple unresolved bullets with different reasons, leave the non-matching bullets in place — they represent distinct unresolved contradictions. When the last bullet in the section is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. + - If `pending_review` is empty AND no prior unresolved bullets remain in the section (after removals), the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. + - **Removal rule (used by both the nightly `pending_resolved` path above AND interactive resolutions).** Remove ONLY the bullet whose triple `(file-A, file-B, reason)` matches the specific contradiction being resolved — the same triple used as the dedup key when the bullet was written. The match key is the bullet's literal `reason` field (the third dash-separated segment, sanitized as on write), NOT the frontmatter `resolution_basis` field which is a separate human-readable summary. The unordered file pair `(file-A, file-B)` must match (order-insensitive) and the sanitized `reason` text must match exactly. 
If the same pair has multiple unresolved bullets with different reasons, leave the non-matching bullets in place — they represent distinct unresolved contradictions. When the last bullet in the section is removed, the section heading itself is removed in the same edit. Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured against `MEMORY.md` **before** the edit is applied (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. The runner MUST capture both the section bytes and the pre-edit total byte size BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies to `${FILEPATH}.consolidation-backup`); do not measure live during/after the write. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. 3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line. 
Each angle-bracketed placeholder below is substituted with the actual value (`` becomes a literal integer like `7`, `` becomes a float like `4.5`): From 5fe82fcde453ca6aab97913ed9d653314bd8c9d8 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 20:10:11 +0300 Subject: [PATCH 16/21] session end: uncommitted changes --- .claude/rules/platform/memory-protocol.md | 2 +- .claude/skills/memory-consolidation/SKILL.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index 37beb59..856e614 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -27,7 +27,7 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. - **Concurrency: take the consolidation lock, do not just check it.** Before any edit, acquire the same lock the nightly consolidator uses — a file-existence check is TOCTOU-racy (cron can grab the lock between the check and the first write) and never reclaims a stale lock. **All paths in this protocol are anchored at the workspace root via `${CLAUDE_PROJECT_DIR}`, never `$PWD` or relative paths.** The agent's shell may be in any subdirectory when invoked, so a relative `.claude/...` path or a `$PWD`-relative lock can target the wrong file (or fail to find the script entirely) and silently bypass coordination with the nightly consolidator. Run `bash "${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/lock.sh" acquire "${CLAUDE_PROJECT_DIR}/.consolidation.lock" 60` and capture the `ACQUIRED ` value. 
If the script prints `LOCKED` (exit 1), tell Ninja the resolution is deferred and re-attempt at the next opportunity (cron runs are minutes, not hours). When acquired, pass the token to every later `refresh`/`release` call to prove ownership and release the lock at the end of the resolution — including on failure paths (after any rollback). The `60` arg is the stale-TTL in minutes; the script reclaims abandoned locks automatically. Also check `.maintenance.lock` first via `bash "${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/lock.sh" check-maintenance "${CLAUDE_PROJECT_DIR}"`; defer if it returns `MAINTENANCE`. - **Use the same safe-edit flow as the nightly path.** Wrap every affected file edit with `"${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/safe-edit.sh" backup → write → verify → clean` (and `rollback` on verify failure), exactly as Phase C does. Always reference the script and target files via `${CLAUDE_PROJECT_DIR}` absolute paths so the call is correct regardless of the agent's current working directory. When the resolution touches two memory files plus `MEMORY.md`, apply the paired two-phase commit pattern from Phase B.5 step 5: `backup` ALL affected files first, apply all edits, run `verify` on ALL of them, and only `clean` the backups once every `verify` passes. If any `verify` fails, `rollback` every file from its backup and report the failure to Ninja — never leave a partial resolution where one file is annotated but another is untouched. - - **MEMORY.md `SUSPICIOUS_SHRINK` allowance.** When the only `MEMORY.md` change is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). 
Apply the same bypass as the nightly path (SKILL.md Phase D step 2): accept the verify failure if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured in the pre-edit file (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. Capture both numbers BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies). Otherwise rollback as usual. Note the bypass briefly when reporting the resolution to Ninja so the audit trail is preserved. + - **MEMORY.md `SUSPICIOUS_SHRINK` allowance.** When the only `MEMORY.md` change is to the Pending Review section (bullets and/or the section heading itself — interactive resolution typically just removes one bullet, but the bypass is defined to cover any in-section change), `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). Apply the same bypass as the nightly path (SKILL.md Phase D step 2): accept the verify failure if (a) the post-edit file still contains the `# Memory Index` heading AND (b) everything OUTSIDE the Pending Review section is byte-identical between the backup (`${FILEPATH}.consolidation-backup`) and the post-edit file. Concretely, `outside_bytes(file)` is the file content with the section excised — from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF, inclusive of any trailing blank line separating the section from what follows; if the section is absent in a given state, `outside_bytes` equals that full file. The bypass holds iff `outside_bytes(backup) == outside_bytes(post-edit)` (exact byte-string equality — content outside the section is not supposed to move). Otherwise rollback as usual. 
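The `outside_bytes` comparison defined above can be sketched in Python as follows. This is an illustrative reading of the rule, not the verifier that `safe-edit.sh` runs: `shrink_bypass_ok` is a hypothetical name, and the sketch assumes the section heading and `# Memory Index` each occupy a whole line.

```python
SECTION_HEADING = b"## Pending Review (Lint findings)"
INDEX_HEADING = b"# Memory Index"


def outside_bytes(content: bytes) -> bytes:
    """Return the file content with the Pending Review section excised."""
    lines = content.splitlines(keepends=True)
    start = None
    for i, line in enumerate(lines):
        if line.rstrip(b"\r\n") == SECTION_HEADING:
            start = i
            break
    if start is None:
        return content  # section absent: the whole file counts as "outside"
    end = len(lines)    # default: section runs to EOF
    for j in range(start + 1, len(lines)):
        if lines[j].startswith(b"## "):
            end = j     # stop just before the next level-2 heading
            break
    # Blank lines between the last bullet and the next heading fall inside the cut.
    return b"".join(lines[:start] + lines[end:])


def shrink_bypass_ok(backup: bytes, post_edit: bytes) -> bool:
    """Accept SUSPICIOUS_SHRINK only if nothing outside the section moved."""
    has_index = any(
        line.rstrip(b"\r\n") == INDEX_HEADING
        for line in post_edit.splitlines(keepends=True)
    )
    return has_index and outside_bytes(backup) == outside_bytes(post_edit)
```

Because the comparison is exact, an edit that removes the whole section must leave the bytes before the heading untouched, including any blank line that preceded it. Otherwise the check fails and the edit is rolled back.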
Note the bypass briefly when reporting the resolution to Ninja so the audit trail is preserved. - **Anti-loop frontmatter.** Add `resolved_at`, `resolution_basis`, and update the `do_not_reopen` list on the affected file(s). `do_not_reopen` is a YAML list of records, each with `partner` (the other file's bare name) and `before` (a YYYY-MM-DD date). For this pair's partner: find any existing entry and apply `MAX(existing.before, new_value)` to its `before` field — NEVER shorten this pair's window. If no entry exists for this partner, APPEND a new `{partner, before}` record. Do NOT touch entries for unrelated partners — that's what makes the cooldown genuinely per-pair. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. Default `before` value: `today + 90 days`. The `before` field is always a date — semantic-condition values like `"Ninja revisits topic X"` are not supported (the consolidation skill has no mechanism to auto-detect such events; use a far-future date if indefinite suppression is genuinely required). Match the canonical YAML in `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5. - **Pending Review bullet format.** Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). - **Bullet match key when removing.** Remove ONLY the bullet whose triple `(file-A, file-B, reason)` matches the specific contradiction you are resolving — the same triple used as the dedup key when the bullet was written (SKILL.md Phase D step 2). 
The match key is the bullet's literal `reason` field (the third ` — `-separated segment after sanitization), NOT the frontmatter `resolution_basis` field which is a separate human-readable summary. The unordered file pair must match (order-insensitive) and the sanitized `reason` text must match exactly. If the same pair has multiple unresolved bullets with different reasons, leave the non-matching bullets in place — they represent distinct unresolved contradictions that still need resolution. diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 145aa9b..602ef6d 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -144,8 +144,8 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, 5. **Apply auto-resolved edits.** For each auto-resolved pair: - **Edits MUST use `safe-edit.sh` with paired two-phase commit semantics.** Each resolved pair touches two `memory/auto/*.md` files (A and B); they must succeed or fail together. The flow is: (i) `backup` BOTH files first; (ii) apply the annotation + frontmatter edits to BOTH files; (iii) run `verify` on BOTH files; (iv) ONLY if both verifies succeed, `clean` both backups. If `verify` fails on either file, `rollback` BOTH files from their backups and route the pair to `pending_review` using `{files:[A,B], reason: canonical_reason, detected_at:}` — the SAME `canonical_reason` computed in step 4, so a future run's 4c finding for this contradiction dedup-merges into the existing bullet. Log the deferral cause `"edit verify failed (file=, error=)"` in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. Never `clean` one backup before the other has verified — otherwise a partial mutation can leave file A annotated while file B is untouched, breaking the symmetric anti-loop guarantee. 
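The paired two-phase commit flow above (back up both, edit both, verify both, then clean or roll back both) can be sketched in Python as follows. This is a stand-in for illustration only: it does not call `safe-edit.sh`, the `paired_edit` helper and its `verify` callback are hypothetical, and removing the backups after a rollback is an assumption about cleanup rather than documented script behaviour. Only the `.consolidation-backup` suffix comes from the text.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict

BACKUP_SUFFIX = ".consolidation-backup"


def paired_edit(edits: Dict[str, Callable[[str], str]],
                verify: Callable[[str], bool]) -> bool:
    """Apply every edit or none of them. Returns True when all files verified."""
    paths = list(edits)
    # (i) back up ALL files before mutating any of them
    for path in paths:
        shutil.copy2(path, path + BACKUP_SUFFIX)
    try:
        # (ii) apply all edits
        for path in paths:
            with open(path, encoding="utf-8", newline="") as fh:
                old = fh.read()
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(edits[path](old))
        # (iii) verify ALL files; a single failure fails the pair
        ok = all(verify(path) for path in paths)
    except Exception:
        ok = False
    if not ok:
        # roll back EVERY file, including ones that verified individually
        for path in paths:
            shutil.copy2(path + BACKUP_SUFFIX, path)
    # (iv) clean backups only after the outcome is decided for all files
    for path in paths:
        os.remove(path + BACKUP_SUFFIX)
    return ok


def demo():
    """File B's edit empties the file, so verify fails and A is restored too."""
    with tempfile.TemporaryDirectory() as workdir:
        a = os.path.join(workdir, "a.md")
        b = os.path.join(workdir, "b.md")
        for path, text in ((a, "claim A\n"), (b, "claim B\n")):
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)
        ok = paired_edit(
            {a: lambda s: s + "note\n", b: lambda s: ""},
            verify=lambda p: os.path.getsize(p) > 0,
        )
        with open(a, encoding="utf-8") as fh:
            a_after = fh.read()
        leftovers = [n for n in os.listdir(workdir) if n.endswith(BACKUP_SUFFIX)]
        return ok, a_after, leftovers
```

In the demo, file A's edit is valid on its own, yet it is still undone because its partner failed verification. That is the symmetric guarantee the text asks for.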
- - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim. If the exact match no longer holds, abort this pair, rollback both files from their backups, and route the pair to `pending_review` using `{files:[A,B], reason: canonical_reason, detected_at:}`. Log the deferral cause `"anchor invalidated by prior edit (file=)"` in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. - - **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. If the line appears more than once verbatim in the body, annotate only the first occurrence and log the duplicate in the Phase D diary Issues section. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. + - **Re-verify anchor before each file's edit.** A prior pair within the same run may have already annotated the same line. Immediately before applying the annotation (after `backup` but before mutating the file), re-read the target file's current body and confirm the relevant `claim_a` / `claim_b` string still appears verbatim AND appears exactly ONCE. Count verbatim occurrences with whole-line equality (each body line, post-strip of trailing whitespace, compared against the claim string post-strip). If the exact match no longer holds (zero occurrences), abort this pair, rollback both files from their backups, and route the pair to `pending_review` using `{files:[A,B], reason: canonical_reason, detected_at:}`. 
Log the deferral cause `"anchor invalidated by prior edit (file=)"` in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. If the claim string appears MORE than once (ambiguous anchor: the LLM only returned a verbatim line, not which occurrence it judged — auto-annotating any single occurrence risks marking the wrong line while still writing the 90-day `do_not_reopen` cooldown), likewise abort this pair, rollback both files from their backups, and route to `pending_review` using the same shape. Log the deferral cause `"ambiguous anchor: claim line appears N times in file="` (substitute N) in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. Never auto-annotate when the anchor count is not exactly 1 — Ninja can disambiguate during interactive resolution. + - **Never silent-delete.** Locate the losing claim line by exact-match against the `claim_a` / `claim_b` string returned in step 3 (whichever side lost the auto-resolve in step 4). The anchor re-verify above guarantees exactly one verbatim occurrence at this point. Append ` (superseded YYYY-MM-DD: )` to that line — do NOT delete the original text. The annotation lives as a trailing parenthetical on the same line so audit history is preserved. - Add anti-loop fields to BOTH files' frontmatter. `do_not_reopen` is a **list of records** keyed by `partner` — locate the existing entry for this pair's partner (if any) and apply `MAX(existing.before, new_value)` to its `before` field; NEVER shorten this pair's cooldown. If no entry exists for this partner, APPEND a new `{partner, before}` record. Other files' entries (for unrelated partners) are untouched — resolving A↔C must not extend A↔B's cooldown. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. 
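The anchor re-verify and never-silent-delete rules above can be sketched in Python as follows. This is a minimal illustration that assumes LF line endings: `AnchorError` and `annotate_superseded` are hypothetical names, and the `note` parameter stands in for whatever text follows the colon in the `(superseded YYYY-MM-DD: ...)` annotation, which the text above does not spell out.

```python
class AnchorError(Exception):
    """Raised when the claim line does not occur exactly once."""

    def __init__(self, count: int):
        super().__init__(f"claim line appears {count} times")
        self.count = count  # 0 = invalidated anchor, >1 = ambiguous anchor


def count_anchor(body: str, claim: str) -> int:
    """Whole-line equality after stripping trailing whitespace on both sides."""
    target = claim.rstrip()
    if not target:
        return 0  # an empty claim must never match blank lines
    return sum(1 for line in body.split("\n") if line.rstrip() == target)


def annotate_superseded(body: str, claim: str, date: str, note: str) -> str:
    """Append a trailing parenthetical to the single matching line."""
    hits = count_anchor(body, claim)
    if hits != 1:
        raise AnchorError(hits)  # caller rolls back and routes to pending_review
    target = claim.rstrip()
    lines = body.split("\n")
    for i, line in enumerate(lines):
        if line.rstrip() == target:
            # The original text is kept; only the annotation is appended.
            lines[i] = f"{line.rstrip()} (superseded {date}: {note})"
            break
    return "\n".join(lines)
```

Once a line has been annotated it no longer equals the claim string, so a second pair targeting the same line in the same run sees a count of zero and is deferred rather than annotated twice.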
**Default `new_value` for `before`: `today + 90 days`** (same default as the interactive resolution path in `.claude/rules/platform/memory-protocol.md`, so nightly and manual cooldowns match). Use a far-future date (e.g., year 2099) if indefinite suppression is genuinely required; the field is always a YYYY-MM-DD date. ```yaml resolved_at: YYYY-MM-DD @@ -281,7 +281,7 @@ Two adjustments can lower `pending_added` relative to the count carried out of P - Before appending a new bullet, deduplicate on the triple `(file-A, file-B, reason)` (unordered file pair, exact reason after sanitization): if an existing bullet matches all three fields, do NOT append again. Two genuinely distinct contradictions between the same pair (different reasons) produce two separate bullets — do not collapse them. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section (after removals), the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. - **Removal rule (used by both the nightly `pending_resolved` path above AND interactive resolutions).** Remove ONLY the bullet whose triple `(file-A, file-B, reason)` matches the specific contradiction being resolved — the same triple used as the dedup key when the bullet was written. The match key is the bullet's literal `reason` field (the third dash-separated segment, sanitized as on write), NOT the frontmatter `resolution_basis` field which is a separate human-readable summary. The unordered file pair `(file-A, file-B)` must match (order-insensitive) and the sanitized `reason` text must match exactly. If the same pair has multiple unresolved bullets with different reasons, leave the non-matching bullets in place — they represent distinct unresolved contradictions. When the last bullet in the section is removed, the section heading itself is removed in the same edit. 
Match the section by its exact title `## Pending Review (Lint findings)` and remove only between that heading and the next `## ` heading or EOF — do not touch unrelated occurrences of the string. - - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is removing Pending Review bullets and/or the section heading, `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). In that specific case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) the byte-size of the Pending Review section measured against `MEMORY.md` **before** the edit is applied (from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF) equals `pre_edit_size - post_edit_size` ± 5 bytes. The runner MUST capture both the section bytes and the pre-edit total byte size BEFORE issuing the edit (the pre-edit state is identical to what `safe-edit.sh backup` copies to `${FILEPATH}.consolidation-backup`); do not measure live during/after the write. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. + - This edit uses the standard `safe-edit.sh backup / verify / rollback / clean` flow, with one allowance: when the edit's only effect is to add, remove, or rewrite Pending Review bullets and/or the section heading (i.e. nothing OUTSIDE the section changed), `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`, and a mixed run that removes a backlog of stale bullets while adding only one or two new ones can still net-shrink below the 20% threshold). The bypass covers BOTH removal-only and mixed (remove + add) edits, since Phase D step 2 applies removals and adds in a single mutation. 
In that case, accept the result if (a) the post-edit file still contains the `# Memory Index` heading AND (b) everything OUTSIDE the Pending Review section is byte-identical between the pre-edit (backup) and post-edit states. Concretely, define `outside_bytes(file)` as the byte-content of `file` with the Pending Review section excised — from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF, inclusive of any trailing blank line that visually separates the section from what follows. If the section is absent in either state, `outside_bytes` equals the full file content for that state. The bypass holds iff `outside_bytes(backup) == outside_bytes(post-edit)` (exact byte-string equality, not ± tolerance — anything outside the section is not supposed to move). The runner MUST capture the backup state from `${FILEPATH}.consolidation-backup` (which is identical to the pre-edit file) and the post-edit state by re-reading `${FILEPATH}` after the write; both reads happen AFTER the write completes. Otherwise rollback as usual. Document the bypass in the diary Issues section so the audit trail is preserved. 3. **Append a line to `memory/lint-stats.jsonl`.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. The file is created on first run if absent. Format is one strict JSON object per line, parseable by Python `json.loads` per line. 
Each angle-bracketed placeholder below is substituted with the actual value (`` becomes a literal integer like `7`, `` becomes a float like `4.5`): ```json From bf55cb32ba7475d14f75ebac37af34d389554cfe Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 20:18:46 +0300 Subject: [PATCH 17/21] session end: uncommitted changes --- .claude/skills/memory-consolidation/SKILL.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 602ef6d..82344e1 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -117,9 +117,11 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi Cross-file scan of existing `memory/auto/*.md` files for contradictions that the per-fact Phase B check cannot catch (Phase B only compares new vs existing, not existing vs existing). -Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_review = []`, `pending_resolved = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). `pending_resolved` accumulates `{files:[A,B], reason: canonical_reason}` records for pairs that this run auto-resolved (used by Phase D step 2 to clear any pre-existing MEMORY.md "Pending Review" bullet for the same contradiction so stale flags do not accumulate across runs). +Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, `auto_resolved = 0`, `pending_added = 0`, `pending_removed = 0`, `pending_review = []`, `pending_resolved = []`. Also initialize `mutations_applied = 0` here — this counter is **shared with Phase C** (do not re-zero on entry to Phase C). 
`pending_resolved` accumulates `{files:[A,B], reason: canonical_reason}` records for pairs that this run auto-resolved (used by Phase D step 2 to clear any pre-existing MEMORY.md "Pending Review" bullet for the same contradiction so stale flags do not accumulate across runs). `pending_removed` is the count of actual MEMORY.md bullets cleared by Phase D step 2 — incremented per bullet match, NOT per `pending_resolved` entry (since most entries are no-ops on the first run where the contradiction is both detected and resolved before any bullet was ever written). -1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop list). `do_not_reopen` is read as a YAML list of records, each with `partner` (filename) and `before` (YYYY-MM-DD date) — if the field is absent treat as empty list. Normalize each `partner` to its bare filename (apply `basename`, strip any directory prefix) and dedupe by `partner` (if duplicate entries exist for the same partner, keep the one with the latest `before` — never shorten an existing pair's cooldown). Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists, else just the `name` slug tokens) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires}` that appear as case-insensitive whole-word matches (single-token only — negation-style phrases like "do not"/"don't" are covered by `negation_markers`); `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. 
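The token and marker extraction described above can be sketched in Python as follows. This is an illustration under stated assumptions: the text does not specify a tokenizer, so splitting on non-alphanumeric characters is a guess, and the function names `title_tokens` and `whole_word_subset` are hypothetical. The stop-word list and the two vocabularies are copied from the text.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "for", "with", "and", "or", "to", "in", "on"}
PREDICATES = {"prefers", "uses", "hates", "requires"}
NEGATIONS = {"not", "never", "avoid", "instead"}


def title_tokens(name_slug: str, heading: str) -> set:
    """Lowercased tokens from the slug and heading, minus stop words and short tokens."""
    raw = re.split(r"[^0-9A-Za-z]+", f"{name_slug} {heading}".lower())
    return {tok for tok in raw if len(tok) >= 3 and tok not in STOP_WORDS}


def whole_word_subset(body: str, vocab: set) -> set:
    """Subset of vocab that appears in body as a case-insensitive whole word."""
    return {
        word for word in vocab
        if re.search(rf"\b{re.escape(word)}\b", body, flags=re.IGNORECASE)
    }
```

For a body such as `"Ninja Prefers vim, never emacs."`, `whole_word_subset(body, PREDICATES)` yields `{"prefers"}` and `whole_word_subset(body, NEGATIONS)` yields `{"never"}`. The word-boundary match keeps `reuses` from counting as `uses`.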
The two sets are disjoint by construction so a single shared word cannot satisfy both signals in step 2. Both sets are empty if no match is found. +1. **Build lightweight representation.** Iterate `memory/auto/*.md` and for each file extract: `{file, type, name, tags, title_tokens, body_predicates, negation_markers, do_not_reopen, recent_diary_mention}`. Source the fields from frontmatter (`type`, `name`, optional `tags`, the anti-loop list). `do_not_reopen` is read as a YAML list of records, each with `partner` (filename) and `before` (YYYY-MM-DD date) — if the field is absent treat as empty list. Normalize each `partner` to its bare filename (apply `basename`, strip any directory prefix) and dedupe by `partner` (if duplicate entries exist for the same partner, keep the one with the latest `before` — never shorten an existing pair's cooldown). Tokenize the `name` slug and the first body heading (or the `description` frontmatter field if no heading exists, else just the `name` slug tokens) into `title_tokens`. Skip stop words (`a, an, the, of, for, with, and, or, to, in, on`) and tokens of length < 3. Also scan the file body (everything after the closing `---` of the frontmatter) for two sets used by step 2's signals: `body_predicates` = the subset of `{prefers, uses, hates, requires}` that appear as case-insensitive whole-word matches (single-token only — negation-style phrases like "do not"/"don't" are covered by `negation_markers`); `negation_markers` = the subset of `{not, never, avoid, instead}` that appear as case-insensitive whole-word matches. The two sets are disjoint by construction so a single shared word cannot satisfy both signals in step 2. Both sets are empty if no match is found. + + **Provenance extraction for step 4a.** Populate `recent_diary_mention: bool` by scanning diary files in `memory/diary/` whose filename date (`YYYY-MM-DD.md`) is within the last 48 hours of today's date. 
In each diary file, look only at lines that fall under a `### Memory Changes` subsection (between that heading and the next `### ` or `## ` heading or EOF — Phase D step 1 writes `### Memory Changes` as the canonical heading for the per-file create/update log). Set `recent_diary_mention = true` if any such line contains either (a) the file's bare basename (e.g., `topic-slug.md`) as a substring, OR (b) the path form `memory/auto/` as a substring. This is the operational definition of "direct diary/session evidence" used by step 4a — a file mentioned in a recent diary's Memory Changes block was grounded by a real conversation event, whereas a file absent from recent diaries is older/inferred. No frontmatter field is required; provenance is derived entirely from the diary log Phase D already writes. If no diary files exist or none fall within 48 hours, `recent_diary_mention = false` for all files (step 4a then yields no winner and the resolver falls through to step 4b). 2. **Cheap candidate generation FIRST** — do not blindly LLM-judge all `O(n^2)` pairs. For each unordered pair `(A, B)`, count matches across these signals: - same `type` field @@ -138,7 +140,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, **Compute `canonical_reason` FIRST (applies to every contradiction, regardless of resolution path).** Before evaluating a–c below, derive the deterministic `reason` string from the LLM's `claim_a` and `claim_b` strings returned in step 3 — NOT from any free-form LLM summary, which would be reworded between runs and break dedup. Step 3's canonical-ordering rule (lex-smaller basename = canonical-A) guarantees `claim_a` / `claim_b` map to the same files across runs, so the composed string is order-invariant. 
Composition: for each claim, strip any leading bullet/heading markers (`-`, `*`, `#`, plus a single following space) and surrounding whitespace, collapse internal runs of whitespace to a single space, then truncate to 80 characters at a UTF-8 codepoint boundary (count codepoints, never split a multi-byte sequence mid-character — invalid UTF-8 written to MEMORY.md will break downstream `json.loads`-per-line stats parsing), with no trailing whitespace. Compose as the literal string ` | ` (single ASCII pipe with spaces). Then apply the standard `reason` sanitization documented in Phase D step 2 (single line, strip leading `#`, collapse newlines to `; `, replace em-dashes `—` with hyphen-space `- `, truncate to 200 chars — again at a UTF-8 codepoint boundary). The resulting string is `canonical_reason`. It is the dedup key shared by ALL routes that flag this contradiction (4c plus the deferral routes in steps 5 and 6) AND by the resolution-removal path (step 5 successful auto-resolves append `{files, reason: canonical_reason}` to `pending_resolved` so Phase D can clear any pre-existing MEMORY.md bullet for this same contradiction). Because identity is the file pair plus the underlying claim lines, all routes use the SAME `canonical_reason` — deferral causes (verify failure, anchor invalidation, mutation-limit) are forensic metadata logged in the Phase D diary Issues section, NOT embedded in the bullet's `reason` field. Embedding cause text would split one contradiction's bullets across multiple variants and defeat dedup; it would also prevent a later run from removing the same contradiction's stale bullet when it succeeds in auto-resolving. - a. **Direct evidence wins over inferred.** If exactly one side of the pair has a direct diary or session reference (file:line or session timestamp citation in the last 48 hours' diary entries), that side wins. + a. 
**Direct evidence wins over inferred.** Use the `recent_diary_mention` boolean populated in step 1 (true iff the file's basename appears under a `### Memory Changes` heading in any diary file from the last 48 hours — see step 1's "Provenance extraction" paragraph for the exact definition). If exactly one side of the pair has `recent_diary_mention == true`, that side wins; the other is treated as the losing claim. If both sides have it (both recently grounded) or neither does (both stale/inferred), this leg yields no winner — fall through to step 4b. b. **Higher confidence wins** if both sides have a `confidence` field and `|confidence_A − confidence_B| >= 0.2`. If either side lacks `confidence` (legacy files predating the schema), treat it as `0.7` for this comparison only. c. **Otherwise flag for review** — do NOT edit either file. Append an entry to `pending_review` using the canonical shape `{files:[A,B], reason: canonical_reason, detected_at:}`. This is the ONLY in-phase append for the unresolved path; do NOT increment a separate `pending_added` counter here — Phase D step 2 computes the final post-dedup value from the actual bullets written. The same `{files, reason: canonical_reason, detected_at:}` shape is reused by the deferral routes in steps 5 and 6 below, so Phase D's parser regex matches every entry uniformly and dedup correctly merges a future 4c finding with an earlier deferral for the same contradiction. @@ -159,7 +161,7 @@ Initialize accumulators: `candidates_found = 0`, `contradictions_detected = 0`, 6. **Mutation limit is shared with Phase C.** The shared per-run budget is 5 mutations counted on `mutations_applied`. Before starting an auto-resolve pair, verify `mutations_applied + 2 <= 5` (a pair consumes 2). 
If the remaining budget cannot fit a full pair, stop applying further auto-resolves; remaining detections go to `pending_review` using `{files:[A,B], reason: canonical_reason, detected_at:}` — the SAME `canonical_reason` so dedup correctly merges with any earlier or later finding for this contradiction. Log the deferral cause `"mutation limit reached"` in the Phase D diary Issues section; do NOT embed the cause in the bullet's `reason` field. -7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, `pending_review`, and `pending_resolved` to Phase D for stats, Pending Review adds, and stale-bullet removals. +7. **Carry accumulators into Phase D.** Pass `candidates_found`, `contradictions_detected`, `auto_resolved`, `pending_added`, `pending_removed`, `pending_review`, and `pending_resolved` to Phase D for stats, Pending Review adds, and stale-bullet removals. `pending_removed` enters Phase D at 0 and is incremented by Phase D step 2 (per actual bullet matched and removed, not per `pending_resolved` entry). **Lock refresh:** Before continuing, refresh the lock: ```bash @@ -236,7 +238,7 @@ Phase D substeps do NOT execute in literal numerical order. The diary (step 1) i 5. **Step 4** (release lock). 6. **Step 5** (NO_REPLY). -Two adjustments can lower `pending_added` relative to the count carried out of Phase B.5: (i) the dedup pass MAY remove items already present in MEMORY.md, and (ii) if step 2's edit fails `verify` and rolls back (excluding the documented `SUSPICIOUS_SHRINK` bypass), NO bullets were persisted and `pending_added` MUST be reset to 0. If a rollback occurs, the diary's "Lint" line MUST report `pending added: 0` and the Issues section MUST note the rollback; step 3's stats line MUST also use the rolled-back value. Never write a diary line claiming new bullets when MEMORY.md was not actually modified. 
+Two adjustments can lower `pending_added` relative to the count carried out of Phase B.5: (i) the dedup pass MAY remove items already present in MEMORY.md, and (ii) if step 2's edit fails `verify` and rolls back (excluding the documented `SUSPICIOUS_SHRINK` bypass), NO bullets were persisted and `pending_added` MUST be reset to 0. The same rollback condition also resets `pending_removed` to 0 (rollback restores the pre-edit MEMORY.md, so the removals never landed either). If a rollback occurs, the diary's "Lint" line MUST report `pending added: 0` and `pending removed: 0`, and the Issues section MUST note the rollback; step 3's stats line MUST also use the rolled-back value. Never write a diary line claiming new bullets or removed bullets when MEMORY.md was not actually modified. 1. **Write diary entry** to `memory/diary/YYYY-MM-DD.md` (using today's date). @@ -276,7 +278,7 @@ Two adjustments can lower `pending_added` relative to the count carried out of P ``` 2. **Update workspace `MEMORY.md` "Pending Review" section.** Gated by `LINT_PHASE_B5_ENABLED`; skip if false. This step applies BOTH the nightly-auto-resolve removals (`pending_resolved` from Phase B.5 step 5) AND the new-flag adds (`pending_review`) in a single MEMORY.md mutation — removals first, then adds, so dedup decisions in the add pass see the post-removal state. - - **Process `pending_resolved` removals FIRST (nightly auto-resolve cleanup).** For each `{files, reason}` record in `pending_resolved`, find and remove the bullet whose triple `(file-A, file-B, reason)` matches per the removal rule below. This prevents stale flags from accumulating: a contradiction previously surfaced in `MEMORY.md` and auto-resolved in a later nightly run no longer inflates `pending_total` / `avg_age_days`. Removal is idempotent — a no-op if no matching bullet exists (e.g., the auto-resolve happened on the same run the contradiction was first detected). 
+ - **Process `pending_resolved` removals FIRST (nightly auto-resolve cleanup).** For each `{files, reason}` record in `pending_resolved`, find and remove the bullet whose triple `(file-A, file-B, reason)` matches per the removal rule below. Increment `pending_removed` by 1 ONLY when a matching bullet was actually found and removed; do NOT increment for no-op records where no matching bullet existed (e.g., the auto-resolve happened on the same run the contradiction was first detected, so no prior bullet exists to clear). Using `len(pending_resolved)` here would overcount — most first-run resolutions are no-ops. This prevents stale flags from accumulating: a contradiction previously surfaced in `MEMORY.md` and auto-resolved in a later nightly run no longer inflates `pending_total` / `avg_age_days`. Removal is idempotent. - **Then process `pending_review` adds.** If the accumulator is non-empty, ensure `MEMORY.md` contains a section titled exactly `## Pending Review (Lint findings)`. Each unresolved item is one bullet in this strict, machine-parseable format (parser regex `^- detected_at=\d{4}-\d{2}-\d{2} `): `- detected_at=YYYY-MM-DD — file-A vs file-B — `. Sanitize `` the same way as `resolution_basis` (strip leading `#`, collapse newlines to `; `, truncate to 200 chars) AND additionally replace any em-dash characters (`—`, U+2014) in the reason with a hyphen-space (`- `) so the bullet's three-field structure can be split unambiguously on the literal ` — ` separator. New bullets MUST use `canonical_reason` from Phase B.5 step 4 as the `reason` field — deferral causes (verify failure, anchor invalidation, mutation-limit) are forensic and belong in the diary Issues section, NOT in the bullet. - Before appending a new bullet, deduplicate on the triple `(file-A, file-B, reason)` (unordered file pair, exact reason after sanitization): if an existing bullet matches all three fields, do NOT append again. 
Two genuinely distinct contradictions between the same pair (different reasons) produce two separate bullets — do not collapse them. Update `pending_added` to count only newly written bullets. - If `pending_review` is empty AND no prior unresolved bullets remain in the section (after removals), the section MUST be absent from `MEMORY.md` — do NOT leave an empty heading. From 662d245c3eba3b2b5a970440cbe42042a68c0e99 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 20:27:06 +0300 Subject: [PATCH 18/21] fix: address code review findings Co-Authored-By: Claude Opus 4.7 --- .claude/rules/platform/memory-protocol.md | 4 ++-- .claude/skills/memory-consolidation/SKILL.md | 13 ++++++++----- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/.claude/rules/platform/memory-protocol.md b/.claude/rules/platform/memory-protocol.md index 856e614..bda6d09 100644 --- a/.claude/rules/platform/memory-protocol.md +++ b/.claude/rules/platform/memory-protocol.md @@ -24,11 +24,11 @@ When workspace `MEMORY.md` contains a `## Pending Review (Lint findings)` sectio - **Aged escalation (>14 days):** if a pending item is older than 14 days AND no topic-relevant opportunity has arisen during the session, surface it at a natural pause or at task end. Do not let aged items sit silent indefinitely. - **One per session max:** never dump multiple pending items in a single message or session. Pick the most topic-relevant item, or if none is relevant, the oldest one. - **Never interrupt urgency:** if Ninja is mid-urgent-task (incident, time-pressured debugging, mid-deploy), do not derail the flow with a pending item — wait for a natural break or for the urgent work to finish. -- **After resolution:** once Ninja resolves a contradiction, update the affected memory file(s) under `memory/auto/` with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. 
+- **After resolution:** once Ninja resolves a contradiction, update BOTH memory files of the resolved pair (under `memory/auto/`) with the resolved value AND remove the corresponding bullet from the `## Pending Review (Lint findings)` section in workspace `MEMORY.md` — in the same operation. Write anti-loop fields to BOTH files symmetrically, not just the "losing" one — the nightly per-pair exclusion lookup checks EITHER file's `do_not_reopen` list, so a one-sided write still suppresses the pair, but symmetric writes match the canonical SKILL.md Phase B.5 step 5 invariant and keep nightly and interactive flows aligned. - **Concurrency: take the consolidation lock, do not just check it.** Before any edit, acquire the same lock the nightly consolidator uses — a file-existence check is TOCTOU-racy (cron can grab the lock between the check and the first write) and never reclaims a stale lock. **All paths in this protocol are anchored at the workspace root via `${CLAUDE_PROJECT_DIR}`, never `$PWD` or relative paths.** The agent's shell may be in any subdirectory when invoked, so a relative `.claude/...` path or a `$PWD`-relative lock can target the wrong file (or fail to find the script entirely) and silently bypass coordination with the nightly consolidator. Run `bash "${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/lock.sh" acquire "${CLAUDE_PROJECT_DIR}/.consolidation.lock" 60` and capture the `ACQUIRED ` value. If the script prints `LOCKED` (exit 1), tell Ninja the resolution is deferred and re-attempt at the next opportunity (cron runs are minutes, not hours). When acquired, pass the token to every later `refresh`/`release` call to prove ownership and release the lock at the end of the resolution — including on failure paths (after any rollback). The `60` arg is the stale-TTL in minutes; the script reclaims abandoned locks automatically. 
Also check `.maintenance.lock` first via `bash "${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/lock.sh" check-maintenance "${CLAUDE_PROJECT_DIR}"`; defer if it returns `MAINTENANCE`. - **Use the same safe-edit flow as the nightly path.** Wrap every affected file edit with `"${CLAUDE_PROJECT_DIR}/.claude/skills/memory-consolidation/scripts/safe-edit.sh" backup → write → verify → clean` (and `rollback` on verify failure), exactly as Phase C does. Always reference the script and target files via `${CLAUDE_PROJECT_DIR}` absolute paths so the call is correct regardless of the agent's current working directory. When the resolution touches two memory files plus `MEMORY.md`, apply the paired two-phase commit pattern from Phase B.5 step 5: `backup` ALL affected files first, apply all edits, run `verify` on ALL of them, and only `clean` the backups once every `verify` passes. If any `verify` fails, `rollback` every file from its backup and report the failure to Ninja — never leave a partial resolution where one file is annotated but another is untouched. - **MEMORY.md `SUSPICIOUS_SHRINK` allowance.** When the only `MEMORY.md` change is to the Pending Review section (bullets and/or the section heading itself — interactive resolution typically just removes one bullet, but the bypass is defined to cover any in-section change), `safe-edit.sh verify` may legitimately return `SUSPICIOUS_SHRINK` (the section can be a large fraction of a small `MEMORY.md`). Apply the same bypass as the nightly path (SKILL.md Phase D step 2): accept the verify failure if (a) the post-edit file still contains the `# Memory Index` heading AND (b) everything OUTSIDE the Pending Review section is byte-identical between the backup (`${FILEPATH}.consolidation-backup`) and the post-edit file. 
Concretely, `outside_bytes(file)` is the file content with the section excised — from the `## Pending Review (Lint findings)` heading line through the byte immediately preceding the next `## ` heading or EOF, inclusive of any trailing blank line separating the section from what follows; if the section is absent in a given state, `outside_bytes` equals that full file. The bypass holds iff `outside_bytes(backup) == outside_bytes(post-edit)` (exact byte-string equality — content outside the section is not supposed to move). Otherwise rollback as usual. Note the bypass briefly when reporting the resolution to Ninja so the audit trail is preserved. - - **Anti-loop frontmatter.** Add `resolved_at`, `resolution_basis`, and update the `do_not_reopen` list on the affected file(s). `do_not_reopen` is a YAML list of records, each with `partner` (the other file's bare name) and `before` (a YYYY-MM-DD date). For this pair's partner: find any existing entry and apply `MAX(existing.before, new_value)` to its `before` field — NEVER shorten this pair's window. If no entry exists for this partner, APPEND a new `{partner, before}` record. Do NOT touch entries for unrelated partners — that's what makes the cooldown genuinely per-pair. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. Default `before` value: `today + 90 days`. The `before` field is always a date — semantic-condition values like `"Ninja revisits topic X"` are not supported (the consolidation skill has no mechanism to auto-detect such events; use a far-future date if indefinite suppression is genuinely required). Match the canonical YAML in `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5. + - **Anti-loop frontmatter.** Add `resolved_at`, `resolution_basis`, and update the `do_not_reopen` list on BOTH files of the resolved pair. 
`do_not_reopen` is a YAML list of records, each with `partner` (the other file's bare name) and `before` (a YYYY-MM-DD date). For this pair's partner: find any existing entry and apply `MAX(existing.before, new_value)` to its `before` field — NEVER shorten this pair's window. If no entry exists for this partner, APPEND a new `{partner, before}` record. Do NOT touch entries for unrelated partners — that's what makes the cooldown genuinely per-pair. `resolved_at` and `resolution_basis` are scalars and reflect the most recent resolution across all of this file's pairs. Default `before` value: `today + 90 days`. The `before` field is always a date — semantic-condition values like `"Ninja revisits topic X"` are not supported (the consolidation skill has no mechanism to auto-detect such events; use a far-future date if indefinite suppression is genuinely required). Match the canonical YAML in `.claude/skills/memory-consolidation/SKILL.md` Phase B.5 step 5. - **Pending Review bullet format.** Preserve the bullet format `- detected_at=YYYY-MM-DD — file-A vs file-B — ` when editing — do not restyle existing bullets or rename the heading, since the consolidation skill parses both. If removing the last bullet empties the section, remove the section heading itself (do not leave an empty `## Pending Review (Lint findings)` heading behind). - **Bullet match key when removing.** Remove ONLY the bullet whose triple `(file-A, file-B, reason)` matches the specific contradiction you are resolving — the same triple used as the dedup key when the bullet was written (SKILL.md Phase D step 2). The match key is the bullet's literal `reason` field (the third ` — `-separated segment after sanitization), NOT the frontmatter `resolution_basis` field which is a separate human-readable summary. The unordered file pair must match (order-insensitive) and the sanitized `reason` text must match exactly. 
If the same pair has multiple unresolved bullets with different reasons, leave the non-matching bullets in place — they represent distinct unresolved contradictions that still need resolution. diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md index 82344e1..7fca32c 100644 --- a/.claude/skills/memory-consolidation/SKILL.md +++ b/.claude/skills/memory-consolidation/SKILL.md @@ -113,7 +113,7 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi ### Phase B.5: Cross-file Lint (contradiction detection) -**Gated by `LINT_PHASE_B5_ENABLED`.** During the 2026-05-17 → 2026-06-17 trial window the flag defaults to **enabled**: an unset env var is treated as `true`, and only an explicit `LINT_PHASE_B5_ENABLED=false` skips this entire phase. After 2026-06-17 the default flips back to disabled (unset = skip). This is the rollback path — flip to false (no data migration) to abort the trial early. Skipped runs MUST NOT write to `memory/lint-stats.jsonl` or touch the workspace `MEMORY.md` "Pending Review" section. +**Gated by `LINT_PHASE_B5_ENABLED`.** During the 2026-05-17 → 2026-06-17 trial window the flag defaults to **enabled**: an unset env var is treated as `true`, and only an explicit `LINT_PHASE_B5_ENABLED=false` skips this entire phase. This is the rollback path — flip to false (no data migration) to abort the trial early. The default does NOT auto-flip after the trial window ends; the trial post-mortem decides whether to land the feature, in which case the default is changed by editing this file. Skipped runs MUST NOT write to `memory/lint-stats.jsonl` or touch the workspace `MEMORY.md` "Pending Review" section. Cross-file scan of existing `memory/auto/*.md` files for contradictions that the per-fact Phase B check cannot catch (Phase B only compares new vs existing, not existing vs existing). 
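Editorial sketch, not part of the patch: the Pending Review bullet format and the `(file-A, file-B, reason)` match key described in the memory-protocol.md hunk above could be parsed roughly as follows. The names `PendingKey`, `sanitize_reason`, `parse_bullet`, and `remove_resolved` are hypothetical, and applying the em-dash replacement before the 200-character truncation is an assumption (the spec does not fix the order).

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Parser regex quoted in SKILL.md Phase D step 2.
BULLET_RE = re.compile(r"^- detected_at=\d{4}-\d{2}-\d{2} ")
# Field separator: space, em-dash (U+2014), space.
SEP = " \u2014 "


class PendingKey(NamedTuple):
    files: frozenset  # unordered file pair
    reason: str       # sanitized reason text, compared exactly


def sanitize_reason(reason: str) -> str:
    """Strip leading '#', collapse newlines to '; ', replace em-dashes, truncate."""
    reason = reason.lstrip("#").strip()
    reason = re.sub(r"\s*\n\s*", "; ", reason)
    reason = reason.replace("\u2014", "- ")  # keeps the three-field split unambiguous
    return reason[:200]                      # assumption: truncate last


def parse_bullet(line: str) -> Optional[PendingKey]:
    """Return the match key for a Pending Review bullet, or None for any other line."""
    if not BULLET_RE.match(line):
        return None
    parts = line.split(SEP, 2)
    if len(parts) != 3:
        return None
    pair = parts[1].split(" vs ", 1)
    if len(pair) != 2:
        return None
    return PendingKey(frozenset(name.strip() for name in pair), parts[2].strip())


def remove_resolved(lines: List[str], file_a: str, file_b: str,
                    reason: str) -> Tuple[List[str], int]:
    """Drop only the bullet whose triple matches; report how many were removed."""
    target = PendingKey(frozenset((file_a, file_b)), sanitize_reason(reason))
    kept = [ln for ln in lines if parse_bullet(ln) != target]
    return kept, len(lines) - len(kept)
```

The removed count returned here is what a caller would add to `pending_removed`: it is zero when no matching bullet exists, so a no-op resolution does not inflate the counter. Bullets for the same pair with a different reason are left in place.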
@@ -174,7 +174,7 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi **Mutation limit: 5 per run, shared with Phase B.5.** Each file creation or modification counts as one mutation. Phase B.5 may have already consumed part of this budget — do NOT re-initialize `mutations_applied` here. If `mutations_applied >= 5` on entry to Phase C, skip Phase C mutations entirely and proceed to Phase D. If any mutation fails, stop further mutations immediately (stop-on-failure). -Track: `mutations_failed = 0` (continue using `mutations_applied` from Phase B.5; if Phase B.5 was skipped via the feature flag, initialize `mutations_applied = 0` here). +Carry `mutations_applied` from Phase B.5 (if Phase B.5 was skipped via the feature flag, initialize `mutations_applied = 0` here). For each approved change (confidence >= 0.9), in priority order (updates before creates): @@ -193,6 +193,9 @@ For each approved change (confidence >= 0.9), in priority order (updates before type: user|project|reference|feedback confidence: 0.9 revisit_if: "Ninja decides to move" + # Optional, user-set only — this skill does not write `tags`, but Phase B.5 + # reads them as a candidate-generation signal when present: + # tags: [editor, tooling] --- Body content here. For feedback/project types, include **Why:** and **How to apply:** sections. @@ -216,8 +219,8 @@ For each approved change (confidence >= 0.9), in priority order (updates before bash "${CLAUDE_SKILL_DIR}/scripts/safe-edit.sh" clean "${CLAUDE_PROJECT_DIR}/MEMORY.md" ``` -5. Increment `mutations_applied`. If `mutations_applied >= 5`, stop applying changes. - If any mutation fails, increment `mutations_failed` and stop further mutations. +5. Increment `mutations_applied` ONLY after the edit's `safe-edit.sh verify` succeeded and `clean` completed — matching Phase B.5 step 5's "fire after verify succeeds" rule so a rolled-back edit does NOT consume the shared budget. 
If `mutations_applied >= 5` after the increment, stop applying changes. + If `verify` fails for an edit, `rollback` and stop further mutations (stop-on-failure); the rolled-back edit's slot remains available for future runs. **Critical: Never modify CLAUDE.md, USER.md, or IDENTITY.md.** @@ -264,7 +267,7 @@ Two adjustments can lower `pending_added` relative to the count carried out of P ### Sessions Reviewed - [topic]: brief description of what was discussed - ### Memory Changes + ### Memory Changes # REQUIRED heading — do not rename; Phase B.5 step 1 parses this heading text to populate `recent_diary_mention` (renaming silently breaks the evidence leg of the auto-resolve hierarchy). - Created/Updated memory/auto/filename.md — reason ### Lint (Phase B.5) # omit this entire block when LINT_PHASE_B5_ENABLED=false From 70021c584061d8c250bf8d8d626b0c6bda55694d Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 20:33:20 +0300 Subject: [PATCH 19/21] move completed plan: 2026-05-17-memory-lint-trial.md --- docs/plans/{ => completed}/2026-05-17-memory-lint-trial.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/plans/{ => completed}/2026-05-17-memory-lint-trial.md (100%) diff --git a/docs/plans/2026-05-17-memory-lint-trial.md b/docs/plans/completed/2026-05-17-memory-lint-trial.md similarity index 100% rename from docs/plans/2026-05-17-memory-lint-trial.md rename to docs/plans/completed/2026-05-17-memory-lint-trial.md From 4f5ff9b01a7e2597797603520ce093e1c51241c8 Mon Sep 17 00:00:00 2001 From: fitz123 Date: Sun, 17 May 2026 20:35:50 +0300 Subject: [PATCH 20/21] revert: restore pii-scan permissions block (out-of-scope regression from ralphex run) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The ralphex run for memory-lint-trial inadvertently reverted the `permissions: pull-requests: read` block in .github/workflows/pii-scan.yml. 
That block was added in commit b909b20 to fix private-repo gitleaks permissions (PR #123). Restore to origin/main state — out of scope for this PR.
---
 .github/workflows/pii-scan.yml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/.github/workflows/pii-scan.yml b/.github/workflows/pii-scan.yml
index 2a28f45..b7b3c33 100644
--- a/.github/workflows/pii-scan.yml
+++ b/.github/workflows/pii-scan.yml
@@ -6,6 +6,15 @@ on:
   push:
     branches: [main]
 
+# gitleaks-action@v2 lists PR commits to determine scan scope. In private
+# repos, the default GITHUB_TOKEN does not grant pull-requests:read, so the
+# action returns 403 "Resource not accessible by integration" and fails
+# before scanning. Public repos worked under the more permissive default —
+# private consumers of this reusable workflow need the explicit grant.
+permissions:
+  contents: read
+  pull-requests: read
+
 jobs:
   gitleaks:
     uses: ./.github/workflows/gitleaks-reusable.yml

From 1acfb6289cff0510d45cf47557baa0205f3b95f1 Mon Sep 17 00:00:00 2001
From: fitz123
Date: Sun, 17 May 2026 21:59:52 +0300
Subject: [PATCH 21/21] fix(memory-consolidation): clarify feature flag source-of-truth
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Resolves Copilot review on PR #124: the previous wording was ambiguous — top section presented LINT_PHASE_B5_ENABLED as if it were an editable in-file value, but the Phase B.5 paragraph described it as an env var where only an explicit 'false' disables it. A rollback per the documented 'edit this file' path would have left the env-var check unaffected.

Clarification: SKILL.md is the SOLE source of truth for the flag. Not an env var, not external config. To roll back: edit the line in the Feature Flags section to LINT_PHASE_B5_ENABLED=false, commit, next nightly cron run picks up the new value.
---
 .claude/skills/memory-consolidation/SKILL.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.claude/skills/memory-consolidation/SKILL.md b/.claude/skills/memory-consolidation/SKILL.md
index 7fca32c..633289d 100644
--- a/.claude/skills/memory-consolidation/SKILL.md
+++ b/.claude/skills/memory-consolidation/SKILL.md
@@ -4,9 +4,9 @@
 
 ## Feature Flags
 
-- `LINT_PHASE_B5_ENABLED=true` — Trial: 2026-05-17 → 2026-06-17. When false, skip Phase B.5 entirely. Rollback = flip to false (no data migration). See ADR-069.
+- `LINT_PHASE_B5_ENABLED=true` — Trial: 2026-05-17 → 2026-06-17. **The line above is the sole source of truth** — not an environment variable, not external config, not a settings file lookup. To roll back: edit that line to `LINT_PHASE_B5_ENABLED=false`, commit; the next nightly run reads this skill file and respects the new value. No data migration. See ADR-069.
 
-When the flag is false, the skill executes Phases 0/A/B/C/D as before — no cross-file lint, no Pending Review writes to MEMORY.md, no appends to `memory/lint-stats.jsonl`. Phase C still applies the new frontmatter fields (`confidence`, `revisit_if`) when creating or updating files, since those are forward-compatible regardless of the lint pass.
+When the flag value (as read from the line above) is `false`, the skill executes Phases 0/A/B/C/D as before — no cross-file lint, no Pending Review writes to MEMORY.md, no appends to `memory/lint-stats.jsonl`. Phase C still applies the new frontmatter fields (`confidence`, `revisit_if`) when creating or updating files, since those are forward-compatible regardless of the lint pass.
 
 ## Context
 
@@ -113,7 +113,7 @@ If refresh returns `STOLEN`, another run has reclaimed the lock — abort the pi
 
 ### Phase B.5: Cross-file Lint (contradiction detection)
 
-**Gated by `LINT_PHASE_B5_ENABLED`.** During the 2026-05-17 → 2026-06-17 trial window the flag defaults to **enabled**: an unset env var is treated as `true`, and only an explicit `LINT_PHASE_B5_ENABLED=false` skips this entire phase. This is the rollback path — flip to false (no data migration) to abort the trial early. The default does NOT auto-flip after the trial window ends; the trial post-mortem decides whether to land the feature, in which case the default is changed by editing this file. Skipped runs MUST NOT write to `memory/lint-stats.jsonl` or touch the workspace `MEMORY.md` "Pending Review" section.
+**Gated by `LINT_PHASE_B5_ENABLED`** declared in the Feature Flags section at the top of this skill file (line ~7). The flag is read by the skill agent from this file — it is NOT an environment variable, NOT an external config lookup. During the 2026-05-17 → 2026-06-17 trial window the documented value is `true`. This entire phase is skipped only when the documented value reads exactly `LINT_PHASE_B5_ENABLED=false` — the rollback path (no data migration) for aborting the trial early. The documented value does NOT auto-flip after the trial window ends; the trial post-mortem decides whether to land the feature, in which case the value is changed by editing this file. Skipped runs MUST NOT write to `memory/lint-stats.jsonl` or touch the workspace `MEMORY.md` "Pending Review" section.
 
 Cross-file scan of existing `memory/auto/*.md` files for contradictions that the per-fact Phase B check cannot catch (Phase B only compares new vs existing, not existing vs existing).
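Editorial sketch, not part of the patch: one way to picture the gating rule in the hunk above is a check against the Feature Flags bullet only. This matters because the same skill file also quotes `LINT_PHASE_B5_ENABLED=false` inside its rollback instructions, so a plain substring search over the whole file would always report "disabled". The function name `lint_phase_enabled` and the regex are hypothetical; per the patch, the skill agent reads the file directly rather than running code, and treating a missing declaration as "run the phase" is an assumption drawn from the "skipped only when the documented value reads exactly ... false" wording.

```python
import re

# Matches only the bullet that declares the flag, e.g.
#   - `LINT_PHASE_B5_ENABLED=true` ...
# and not later mentions of the flag inside running prose.
FLAG_LINE = re.compile(r"^- `LINT_PHASE_B5_ENABLED=(\w+)`", re.MULTILINE)


def lint_phase_enabled(skill_text: str) -> bool:
    """Return False only when the declared value is exactly 'false'."""
    match = FLAG_LINE.search(skill_text)
    if match is None:
        # Assumption: no declaration means the value is not "exactly false".
        return True
    return match.group(1) != "false"
```

Anchoring on the start of the bullet keeps the rollback path a one-line edit: changing `true` to `false` on the declaring line flips the result, while the prose mentions elsewhere in the file are ignored.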