Merge upstream GBrain v0.36.5 while preserving Eva OpenClaw defaults #104

Open
100yenadmin wants to merge 19 commits into master from eva/merge-upstream-v0.36.5.0

Conversation

@100yenadmin
Member

TL;DR

This PR catches Eva Brain up to upstream GBrain v0.36.5.0 (garrytan/gbrain@e2279650) while keeping Eva's downstream product contract intact: OpenClaw-native install, /plugins/gbrain/extract, Codex/OAuth extraction posture, support-KB setup, safe updater behavior, and Voyage 4 Large 2048d defaults.

Closes #103.

Why This Is The Right Shape

Eva is healthiest as a thin fork. Upstream should own GBrain's database/search/sync/doctor/skillpack core as soon as merged upstream code is better or broadly useful. Eva should keep only the pieces upstream does not yet provide for our users: OpenClaw plugin packaging, no-key host runtime extraction, public install/update helpers, and defaults that match our fleet.

flowchart LR
  U["Upstream core v0.36.5"] --> M["Eva catch-up merge"]
  E["Eva OpenClaw + KB + updater contract"] --> M
  M --> G["GitHub CI"]
  M --> F["Fleet install path remains stable"]

Upstream v0.36.x Accepted

  • Skillpack scaffold/reference/harvest model and retired managed-block-only assumptions.
  • Hindsight calibration stack: calibration profiles, take proposals, grading cache, nudge log, voice-gate, undo-wave, and related tests.
  • Dynamic embedding-column search and query-cache contamination fixes.
  • Brain-health-100 doctor remediation and autonomous remediation hooks.
  • Secure Minions shell-job credential inheritance via inherit: ["database_url"].
  • Admin UI/generated assets, LLM docs, migrations, schema updates, and tests through schema version 76.

Eva Product Surface Preserved

  • plugins/openclaw-gbrain and /plugins/gbrain/extract remain the extraction boundary.
  • CodexExtractionClient still sends prompt/media payloads without API keys, OAuth tokens, or refresh tokens.
  • import-media and ingest-media --extract openclaw remain as transitional media bridge commands.
  • plugins/gbrain-codex and install docs remain repo-owned.
  • Support-KB setup docs and updater path remain intact.
  • .gbrain/gbrain.env, provider-auth seam, and advisory-only postinstall remain intact.
  • ZeroEntropy is accepted as upstream opt-in via gbrain ze-switch, but Eva defaults remain voyage:voyage-4-large at 2048d so fresh installs do not silently size themselves for the wrong fleet provider.

Conflict Decisions Worth Reviewing

  • src/core/ai/gateway.ts and src/core/search/embedding-column.ts: accepted upstream dynamic column work, but restored Eva's Voyage 2048 fallback defaults.
  • src/commands/upgrade.ts: accepted upstream ensureGitignore() hygiene, fixed merge artifact that redeclared upgradeFrom, and preserved Eva source-install updater behavior.
  • openclaw.plugin.json: unioned upstream skillpack-harvest wiring with Eva plugin defaults and bumped manifest version to 0.36.5.0.
  • src/commands/sync.ts: kept upstream walker/no-origin fixes and preserved source-scoped auto-embed behavior.
  • CHANGELOG.md: redacted deployment-specific path literals so the privacy gate stays strict.
  • Admin and LLM generated outputs were regenerated after conflict resolution.

Local Validation

Ran from /Volumes/LEXAR/repos/eva-brain-worktrees/eva-merge-upstream-v0.36.5.0:

bun install --frozen-lockfile
bun test test/ai/gateway.test.ts test/search/embedding-column.test.ts test/sync.test.ts
bun test test/codex-extraction-client.test.ts test/openclaw-gbrain-plugin-contract.test.ts test/install-contract.test.ts test/media-ingest-openclaw.serial.test.ts
bun run verify

Results:

  • Focused gateway/search/sync tests: 149 pass, 0 fail.
  • Focused OpenClaw/media/install tests: 25 pass, 0 fail.
  • bun run verify: passed, including privacy checks, admin build, admin scope drift, wasm embedded check, system-of-record check, synthetic corpus privacy, and tsc --noEmit.
  • git diff --check: clean.
  • Conflict-marker scan: clean.

Confidence

I am >95% confident this is the right catch-up path because the PR accepts upstream's merged core improvements, preserves every known Eva product boundary, and has focused local coverage over the exact collision surfaces: provider defaults, dynamic embedding columns, source sync, OpenClaw extraction, media ingestion, install docs, and privacy/static gates. Full-suite confidence should come from GitHub Actions rather than a heavy local run.

garrytan and others added 17 commits May 17, 2026 08:00
… placement (garrytan#1053)

* refactor(mcp): centralize ParamDef→JSON Schema via shared paramDefToSchema

Three duplicate inline mappers existed across the MCP surface:
- src/mcp/tool-defs.ts (stdio MCP buildToolDefs)
- src/commands/serve-http.ts:837 (live HTTP MCP tools/list)
- src/core/minions/tools/brain-allowlist.ts:84 (subagent tool registry)

Each had subtly different items propagation. The HTTP MCP variant dropped
items entirely, leaving extract_facts.entity_hints broken for OAuth-
authenticated remote agents even after a buildToolDefs-only patch. The
subagent variant propagated one level of items but used the same shallow
shape so nested arrays would silently drop.

Extract a single recursive paramDefToSchema helper exported from
src/mcp/tool-defs.ts and have all three mappers consume it. Closes the
bug class at the architecture level instead of patching one site at a
time. The helper copies type, description, enum, default, and recursively
rebuilds items so array-of-arrays preserves inner shape.

Key ordering (type, description, enum, default, items) matches the
pre-v0.34 inline mappers so JSON.stringify output stays byte-stable for
every existing operation that does not use nested arrays.
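
A minimal sketch of the recursive mapping described above. The `ParamDef` shape here is an assumption for illustration; the real interface in src/mcp/tool-defs.ts may carry more fields.

```typescript
// Hypothetical ParamDef shape; the real interface may differ.
interface ParamDef {
  type: string;
  description?: string;
  enum?: string[];
  default?: unknown;
  items?: ParamDef;
}

type JsonSchema = Record<string, unknown>;

// Copy fields in a fixed order (type, description, enum, default, items)
// so JSON.stringify output stays stable, and recurse into items so an
// array-of-arrays keeps its inner shape.
function paramDefToSchema(def: ParamDef): JsonSchema {
  const schema: JsonSchema = { type: def.type };
  if (def.description !== undefined) schema.description = def.description;
  if (def.enum !== undefined) schema.enum = def.enum;
  if (def.default !== undefined) schema.default = def.default;
  if (def.items !== undefined) schema.items = paramDefToSchema(def.items);
  return schema;
}

const nested: ParamDef = {
  type: "array",
  description: "rows",
  items: { type: "array", items: { type: "string" } },
};
console.log(JSON.stringify(paramDefToSchema(nested)));
```

A shallow mapper would stop at the first `items` level; the recursive call is what carries the innermost `type: "string"` through.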

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(schema): add items to extract_facts.entity_hints and handle-to-tweet candidates

Two array fields shipped without the items property required by JSON
Schema. Strict-mode validators (Gemini Pro structured outputs, OpenAI
strict tool definitions) reject the entire schema when any type:'array'
lacks items. Downstream agents on those providers couldn't use
extract_facts or the x_handle_to_tweet resolver.

extract_facts.entity_hints — declared items: { type: 'string' } matching
the handler at src/core/operations.ts:2733 which already coerces the
runtime value to string[].

handle_to_tweet outputSchema.candidates — full XTweetCandidate spec
including required + additionalProperties: false. The XTweetCandidate
TypeScript interface declares all five fields as required; without
required in the JSON Schema, a validator would accept {} as a valid
candidate. additionalProperties: false closes the OpenAI strict-mode
contract.

19 community PRs (garrytan#1028 garrytan#999 garrytan#980 garrytan#979 garrytan#910 garrytan#904 garrytan#847 garrytan#832 garrytan#863 garrytan#862
garrytan#812 for entity_hints; garrytan#910 caught candidates) converged on these
locations. This wave cherry-picks the deepest variant (garrytan#910 surfaced
both bugs) and centralizes via the paramDefToSchema helper from the
preceding commit so the live HTTP MCP tools/list path is also fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: DmitryBMsk (PR garrytan#910)

* fix(git-remote): move --no-recurse-submodules after the subcommand verb

Git CLI accepts two flag positions:
  git [global -c flags] <subcommand> [subcommand flags] [args]

Global -c config flags belong before the verb. Subcommand-specific
flags (like --no-recurse-submodules) belong after. Pre-v0.34
GIT_SSRF_FLAGS spliced both kinds before the verb, so cloneRepo
invoked:
  git -c http.followRedirects=false ... --no-recurse-submodules clone URL DIR

Real git rejects this with exit 129 ("unknown option:
--no-recurse-submodules") because --no-recurse-submodules is a clone
subcommand flag, not a global config flag. Every remote-source clone
broke in production from v0.28 onward. The fake-git harness in
test/git-remote.test.ts exits 0 regardless of argv shape, which is
why CI never caught it.

Split GIT_SSRF_FLAGS (3 -c config flags, spread BEFORE the verb) from
GIT_SSRF_SUBCOMMAND_FLAGS (--no-recurse-submodules, spread AFTER the
verb). cloneRepo and pullRepo both spread the new constant after
their respective verbs. The constant names signal the position rule
so future additions land in the right place.
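
The position rule can be sketched as follows. Only `http.followRedirects=false` is named in this message, so the sketch carries that one `-c` entry and does not guess the other two; `buildGitArgv` is a hypothetical helper, not the repo's `cloneRepo`.

```typescript
// Global config flags: spread BEFORE the verb. The remaining -c entries
// are elided in the commit message and intentionally left out here.
const GIT_SSRF_FLAGS: readonly string[] = ["-c", "http.followRedirects=false"];

// Subcommand-scoped flags: spread AFTER the verb.
const GIT_SSRF_SUBCOMMAND_FLAGS: readonly string[] = ["--no-recurse-submodules"];

// Hypothetical helper showing the argv layout for clone and pull.
function buildGitArgv(verb: "clone" | "pull", args: readonly string[]): string[] {
  return [...GIT_SSRF_FLAGS, verb, ...GIT_SSRF_SUBCOMMAND_FLAGS, ...args];
}

const cloneArgv = buildGitArgv("clone", ["https://example.com/repo.git", "dest"]);
console.log(["git", ...cloneArgv].join(" "));
```

A regression guard then only needs to assert that the index of `--no-recurse-submodules` is greater than the index of the verb.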

7 community PRs converged on this location (garrytan#1023 garrytan#1020 garrytan#985 garrytan#963
garrytan#846 garrytan#842; garrytan#800 doesn't exist). This wave cherry-picks the semantic-
constant approach from garrytan#846's GIT_SSRF_SUBCOMMAND_FLAGS name (the
clearest signal of the position rule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(mcp+git+resolvers): structural array-items + subcommand-position guards

Three new tests / test groups close the bug classes the wave fixes:

test/mcp-tool-defs.test.ts — recursive structural guard walks every
operation's inputSchema and fails with a property path if any
type:'array' lacks items.type. Explicit fixture assertions for
extract_facts.entity_hints.items.type and a synthetic nested-array
ParamDef pinning items.items.type recursion. Without the explicit
fixtures the legacyInlineMap byte-equality test is mirror-theater —
mirroring both sides of the equality preserves the blind spot.

test/git-remote.test.ts — split snapshot test into GIT_SSRF_FLAGS
(3 global -c entries) and GIT_SSRF_SUBCOMMAND_FLAGS
(--no-recurse-submodules). cloneRepo + pullRepo argv tests now assert
the subcommand flag appears AFTER the verb index. Pre-v0.34 the
pinned argv slice prefix included --no-recurse-submodules, which
baked the bug into the test suite (codex catch).

test/resolvers.test.ts — recursive walk over both inputSchema AND
outputSchema for builtin resolvers (xHandleToTweetResolver,
urlReachableResolver). Explicit imports rather than
getDefaultRegistry(), which starts empty until commands/resolvers.ts
runs — codex catch on a hollow-walk failure mode. Dedicated case
pins candidates items shape including required + additionalProperties.

Reference legacyInlineMap in mcp-tool-defs.test.ts mirrors the new
recursive paramDefToSchema helper. No current op uses nested arrays so
the byte-equality test stays green for every existing operation.
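
A structural guard of this kind might look like the sketch below. `findArraysMissingItems` is a hypothetical name, and the walk covers only `items` and `properties`, which is an assumption about which schema keywords the operations use.

```typescript
type SchemaNode = { [key: string]: unknown };

function isObject(value: unknown): value is SchemaNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return the property path of every type:'array' node that lacks items.type.
function findArraysMissingItems(node: unknown, path = "$"): string[] {
  if (!isObject(node)) return [];
  const problems: string[] = [];
  if (node.type === "array") {
    const items = node.items;
    if (!isObject(items) || typeof items.type !== "string") problems.push(path);
  }
  // Recurse into nested arrays so items.items is checked as well.
  if (isObject(node.items)) {
    problems.push(...findArraysMissingItems(node.items, `${path}.items`));
  }
  if (isObject(node.properties)) {
    for (const [key, child] of Object.entries(node.properties)) {
      problems.push(...findArraysMissingItems(child, `${path}.properties.${key}`));
    }
  }
  return problems;
}

const broken = {
  type: "object",
  properties: {
    entity_hints: { type: "array" },
    tags: { type: "array", items: { type: "string" } },
  },
};
console.log(findArraysMissingItems(broken));
```

Returning paths rather than a boolean lets a failing test name the exact offending property.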

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): raise rerank timeouts for ZE live cold-start

The first rerank call of a CI run hits ZeroEntropy's cold-start latency
(observed ~5-6s on Tier 2 LLM Skills runners; subsequent calls < 500ms).
Two timeouts fired simultaneously at ~5s:

1. bun:test's default 5000ms per-test timeout caused (fail).
2. gateway.rerank's DEFAULT_RERANK_TIMEOUT_MS = 5000 fired right after,
   reported as "Unhandled error between tests".

The next rerank test (top_n=2) ran in 409ms because the API was already
warm. Cold-start is the only issue.

Pass explicit timeoutMs to each rerank() call and a longer per-test
timeout (30s) on both ZE rerank tests. Production DEFAULT_RERANK_TIMEOUT_MS
stays at 5s for the search hot path — these E2E tests bypass it locally
without changing the default that protects user latency.

Unrelated to the fix-wave in this PR (mcp-tool-defs + git-remote + resolver
guards). Lands here to keep Tier 2 LLM Skills green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.35.2.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync for v0.35.2.0

Update CLAUDE.md Key files annotations for the v0.35.2.0 fix wave:

- src/mcp/tool-defs.ts: document new exported recursive paramDefToSchema
  helper and the three-consumer centralization (stdio MCP, HTTP MCP
  tools/list, subagent registry).
- src/core/minions/tools/brain-allowlist.ts: paramsToInputSchema now
  consumes the shared helper.
- src/commands/serve-http.ts: tools/list handler now consumes the shared
  helper (closes the HTTP MCP items-dropped bug class).
- src/core/git-remote.ts: new entry. Documents the GIT_SSRF_FLAGS (global
  config, pre-verb) vs GIT_SSRF_SUBCOMMAND_FLAGS (subcommand-scoped,
  post-verb) split, the 7-month silent regression, and the position-anchored
  regression guard in test/git-remote.test.ts.

Regenerated llms-full.txt to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: rebump version to v0.35.3.0

Queue moved while this PR was open — v0.35.2.0 was claimed by master's
v0.35.1.0 sibling work. Advancing one slot. No code changes; only:
- VERSION + package.json: 0.35.2.0 → 0.35.3.0
- CHANGELOG.md: rewritten header + inline references
- CLAUDE.md: rewritten 4 key-file annotations
- llms-full.txt + llms.txt: regenerated to mirror CLAUDE.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…um (garrytan#1052)

* rfc: temporal axis for contradiction probe

Field report on residual HIGH findings from gbrain eval suspected-contradictions
and proposal for a 4-phase fix (Phase 1 = judge prompt + verdict enum is the
recommended starting point).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval): pass effective_date to judge prompt; bump PROMPT_VERSION

Lane A1 of the temporal-contradiction-probe wave. Threads page-level
effective_date through the search projection into the contradiction judge so
the LLM can reason about supersession instead of treating every dated pair as
a contradiction.

Changes:
- SearchResult interface adds optional effective_date + effective_date_source
  fields; rowToSearchResult populates them from the row data with date-only
  YYYY-MM-DD normalization (handles both postgres.js Date and PGLite string).
- 8 SELECT projection sites (3 in postgres-engine, 5 in pglite-engine) now
  carry p.effective_date + p.effective_date_source through their inner CTEs
  and outer SELECTs so search results expose the field on both engines.
- PairMember (eval-contradictions/types.ts) gets the two fields as required
  (string | null) so the type forces every constructor to think about temporal
  anchoring. Runner's searchResultToMember + takeToMember handle the
  normalization; takes inherit the chunk's page-level date.
- buildJudgePrompt emits `Statement A (from: YYYY-MM-DD)` when effective_date
  is non-null, else `(date unknown)`. Prompt instructions explain the tag so
  the model knows what to do with it.
- PROMPT_VERSION bumps '1' → '2'. Cache-key tuple shape unchanged; old rows
  miss naturally on first run against the new prompt.
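
A sketch of the date normalization and prompt tag, under stated assumptions: `toDateOnly` and `statementLabel` are hypothetical names, and the exact wording of the unknown-date label is inferred from the `(date unknown)` fragment above.

```typescript
// Normalize a Date or string to date-only YYYY-MM-DD; null when unusable.
// Strings without a timezone are parsed in local time by Date, so a real
// implementation may need stricter parsing than this sketch.
function toDateOnly(value: Date | string | null | undefined): string | null {
  if (value === null || value === undefined || value === "") return null;
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) return null;
  return date.toISOString().slice(0, 10);
}

// Tag the statement with its effective date, or mark the date as unknown.
function statementLabel(name: "A" | "B", effectiveDate: string | null): string {
  return effectiveDate !== null
    ? `Statement ${name} (from: ${effectiveDate})`
    : `Statement ${name} (date unknown)`;
}

console.log(statementLabel("A", toDateOnly(new Date(Date.UTC(2017, 0, 15)))));
console.log(statementLabel("B", toDateOnly(null)));
```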

Test fixtures in 5 files updated to include the new required fields. All 205
eval-contradictions unit tests + 101 search-related tests pass. Typecheck
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval): replace contradicts:boolean with verdict:enum (6 members)

Lane A2 of the temporal-contradiction-probe wave. Expands the judge's
classification vocabulary from a binary contradicts:bool to a six-member
verdict enum so the probe can distinguish "this changed" from "this is wrong".

Verdict taxonomy:
  no_contradiction       — drop from findings
  contradiction          — genuine conflict at same point in time
  temporal_supersession  — newer claim updates/replaces older; not an error
  temporal_regression    — metric/status went backwards over time (signal)
  temporal_evolution     — legitimate change, neither supersession nor regression
  negation_artifact      — judge misread an explicit negation

Changes:
- types.ts: Verdict union (6 members); Severity gains 'info'; ResolutionKind
  extended with temporal_supersede, flag_for_review, log_timeline_change;
  JudgeVerdict.contradicts → verdict; ContradictionFinding now carries verdict;
  ProbeReport adds queries_with_any_finding + verdict_breakdown (additive).
- judge.ts: parseResolutionKind + parseVerdict guards; normalizeVerdict reads
  the new field and applies the C1 confidence floor only to verdict='contradiction'
  (the new verdicts are informational classifications, no floor). Prompt rubric
  rewritten to ask for verdict + extended severity scale.
- severity-classify.ts: 'info' joins the rank with value 0; defaultSeverityForVerdict
  maps each verdict to its baseline severity (D7 — supersession=info, regression=high,
  etc.). parseSeverity gains a fallback param so consumers can override 'low' default.
- auto-supersession.ts: classifyResolution + renderResolutionCommand handle the
  three new resolution kinds. Probe still NEVER auto-mutates — the new kinds
  render paste-ready commands or informational lines.
- cache.ts: isJudgeVerdict shape check matches the new verdict field; old v1
  rows fail the guard and treat as misses.
- runner.ts: emit predicate at cache-hit and judge-success branches changes
  from `verdict.contradicts` to `verdict.verdict !== 'no_contradiction'`.
  Without this, the new verdicts vanish from the report. Added per-verdict
  tally + queriesWithAnyFinding alongside the strict queriesWithContradiction.
- trends.ts: latest run verdict breakdown surfaces in the trend chart.
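
The emit predicate and per-verdict tally can be sketched like this. The `Verdict` members come from the taxonomy above; `isFinding` and `verdictBreakdown` are hypothetical helper names rather than the runner's actual functions.

```typescript
type Verdict =
  | "no_contradiction"
  | "contradiction"
  | "temporal_supersession"
  | "temporal_regression"
  | "temporal_evolution"
  | "negation_artifact";

// Emit rule: everything except no_contradiction surfaces as a finding.
function isFinding(verdict: Verdict): boolean {
  return verdict !== "no_contradiction";
}

// Tally verdicts for one run (hypothetical helper).
function verdictBreakdown(verdicts: readonly Verdict[]): Record<Verdict, number> {
  const tally: Record<Verdict, number> = {
    no_contradiction: 0,
    contradiction: 0,
    temporal_supersession: 0,
    temporal_regression: 0,
    temporal_evolution: 0,
    negation_artifact: 0,
  };
  for (const verdict of verdicts) tally[verdict] += 1;
  return tally;
}

const run: Verdict[] = ["contradiction", "temporal_supersession", "no_contradiction"];
console.log(run.filter(isFinding).length, verdictBreakdown(run));
```

Keeping the strict `contradiction` count separate from the any-finding count is what lets the older metric stay comparable across runs.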

Test fixtures updated across 8 test files. All 210 eval-contradictions unit
tests pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval): relax date-filter rule 3 when both sides dated

Lane B of the temporal-contradiction-probe wave. The v1 date pre-filter
skipped pairs whose chunk-text-extracted dates differed by >30 days as a
cost-saving heuristic. That heuristic silently killed exactly the cases the
new verdict taxonomy exists to surface — role transitions across years
(e.g. a 2017 historical record vs. a 2025 current state), MRR claims years
apart, status changes recorded over time.

Lane A1+A2 made temporal supersession explicit and cheap to classify. The
filter no longer needs to skip these pairs; the judge can label them.

Changes:
- date-filter.ts: shouldSkipForDateMismatch accepts optional effectiveDateA
  and effectiveDateB. When BOTH are non-null, returns skip=false with the new
  'both_have_effective_date' reason — the judge will see the dates via the
  (from: YYYY-MM-DD) prompt tag from Lane A1. Other rules (same-paragraph
  dual-date override, missing-date fallback) preserved verbatim and still
  run first.
- runner.ts: threads pair.{a,b}.effective_date into the date-filter call.
  Pairs that previously vanished into the skip bucket now reach the judge.

Tests (R1 IRON RULE regression suite, 6 new cases):
- both sides effective_date → not skipped
- both sides effective_date overrides >30d chunk-text rule
- rule 1 (same-paragraph dual-date) still wins over effective_date relaxation
- rule 2 (missing chunk dates) still applies when effective_date partially present
- undefined effective_dates fall through to v1 behavior (back-compat)
- empty-string effective_date treated as missing (only real dates enable the relaxation)
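
A simplified sketch of the relaxation. It models only the 30-day rule and the new both-dated override; rules 1 and 2, which the real filter runs first, are omitted. The `chunkGapDays` parameter and every reason string other than `both_have_effective_date` are assumptions for illustration.

```typescript
interface SkipDecision {
  skip: boolean;
  reason: string;
}

const MAX_GAP_DAYS = 30;

// Only real, non-empty date strings enable the relaxation.
function hasRealDate(value: string | null | undefined): value is string {
  return typeof value === "string" && value.length > 0;
}

// Simplified: the same-paragraph and missing-date rules are not modeled.
function shouldSkipForDateMismatch(
  chunkGapDays: number,
  effectiveDateA?: string | null,
  effectiveDateB?: string | null,
): SkipDecision {
  if (hasRealDate(effectiveDateA) && hasRealDate(effectiveDateB)) {
    return { skip: false, reason: "both_have_effective_date" };
  }
  if (chunkGapDays > MAX_GAP_DAYS) {
    return { skip: true, reason: "chunk_dates_too_far_apart" }; // hypothetical name
  }
  return { skip: false, reason: "within_window" }; // hypothetical name
}

console.log(shouldSkipForDateMismatch(2900, "2017-01-15", "2025-03-04"));
console.log(shouldSkipForDateMismatch(2900, "2017-01-15", ""));
```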

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): cost-estimate prompt + --budget-usd + Haiku routing

Lane C of the temporal-contradiction-probe wave. Three layers of cost
guardrail, all stacked:

(a) cost-estimate prompt at probe-run-time. Before the runner spends any
    tokens after a PROMPT_VERSION change, eval-suspected-contradictions
    reads the most recent persisted prompt_version from
    eval_contradictions_runs and compares. When they differ:
      - TTY: prints an upper-bound estimate + Ctrl-C window (default 10s,
        override via GBRAIN_PROBE_PROMPT_GRACE_SECONDS).
      - non-TTY: prints the estimate + auto-proceeds (autopilot path).
      - --yes override or GBRAIN_NO_PROBE_PROMPT=1: skip entirely.
    Mirrors the v0.32.7 runPostUpgradeReembedPrompt pattern.

(b) --budget-usd N hard cap (pre-existing; PreFlightBudgetError surfaces
    when the estimate alone exceeds the cap, and CostTracker halts the
    run mid-flight when cumulative cost exceeds it). Documented in the
    help text alongside (a).

(c) Judge model now routes through resolveModel() with configKey
    'models.eval.contradictions_judge', tier 'utility' (Haiku-class
    default), and env var GBRAIN_CONTRADICTIONS_JUDGE_MODEL. The legacy
    --judge CLI flag still wins as the highest-precedence override.
    Doctor's model touchpoint registry (src/commands/models.ts:50) carries
    the new key so `gbrain models` and `gbrain models doctor` surface it.

Also in this lane:
- CLI: --severity accepts 'info' (the new Severity member from Lane A2).
- CLI: --severity output shows [verdict] tag alongside slug pairs so
  operators distinguish genuine contradictions from temporal classifications.
- Human summary: prints the new queries_with_any_finding metric and the
  per-verdict breakdown table.
- Help text: explains the cost-prompt + budget-cap + model-routing
  interactions in one paragraph.

New tests (9 cases on the cost-prompt helper):
- --yes override skips
- GBRAIN_NO_PROBE_PROMPT=1 skips
- prompt_version unchanged → skips
- non-TTY auto-proceeds with stderr note
- TTY proceeds after grace
- TTY aborts on Ctrl-C
- fresh brain (no prior runs) fires the prompt
- GBRAIN_PROBE_PROMPT_GRACE_SECONDS override honored
- estimate banner contains query count + judge model + dollar amount
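
The decision table for the cost-estimate prompt can be sketched as a pure function. The input shape and the names `decideCostPrompt` and `PromptAction` are hypothetical; the environment variable names and the 10s default are taken from the description above.

```typescript
type PromptAction = "skip" | "auto_proceed" | "prompt_with_grace";

interface PromptInputs {
  yesFlag: boolean;
  env: Record<string, string | undefined>;
  currentPromptVersion: string;
  lastRunPromptVersion: string | null; // null on a fresh brain with no prior runs
  isTTY: boolean;
}

function decideCostPrompt(input: PromptInputs): { action: PromptAction; graceSeconds: number } {
  const raw = input.env.GBRAIN_PROBE_PROMPT_GRACE_SECONDS;
  const parsed = raw ? Number(raw) : Number.NaN;
  const graceSeconds = Number.isFinite(parsed) && parsed >= 0 ? parsed : 10;

  // Explicit overrides skip the prompt entirely.
  if (input.yesFlag || input.env.GBRAIN_NO_PROBE_PROMPT === "1") {
    return { action: "skip", graceSeconds };
  }
  // An unchanged prompt version means the cache is still warm.
  if (input.lastRunPromptVersion === input.currentPromptVersion) {
    return { action: "skip", graceSeconds };
  }
  // Non-TTY (autopilot) prints the estimate and proceeds.
  if (!input.isTTY) return { action: "auto_proceed", graceSeconds };
  return { action: "prompt_with_grace", graceSeconds };
}

console.log(
  decideCostPrompt({
    yesFlag: false,
    env: {},
    currentPromptVersion: "2",
    lastRunPromptVersion: null,
    isTTY: true,
  }),
);
```

Separating the decision from the terminal I/O is one way to keep cases like the nine listed above hermetic.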

All 225 eval-contradictions tests + 25 model-config tests pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(eval): R4/R5/R6 IRON-RULE regressions for the verdict-enum wave

Lane D of the temporal-contradiction-probe wave. Lanes A1/A2/B/C
landed the behavior; this lane pins the regressions that protect the wave
against future drift.

R4 (runner emit predicate): five new tests, one per non-no_contradiction
verdict, prove the runner.ts emit rule surfaces each one as a finding with
the correct verdict tag, and that:
  - queries_with_contradiction (Wilson-CI denominator) ONLY counts verdict
    ='contradiction' — the strict metric is preserved
  - queries_with_any_finding counts every non-no_contradiction verdict
  - verdict_breakdown tallies correctly
Plus one negative case: verdict='no_contradiction' produces zero findings.
Without R4, a future runner refactor could collapse the new verdicts back
to /dev/null and the report would silently shrink.

R5 (cache key shape): direct shape assertion on buildCacheKey output. The
key tuple is exactly 5 fields (chunk_a_hash, chunk_b_hash, model_id,
prompt_version, truncation_policy). Adding a 6th field would silently break
every operator's brain (no migration path).
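
A shape assertion in the spirit of R5 might look like the sketch below. The five field names come from the text; typing every value as a string and the helper `hasExactKeyShape` are assumptions.

```typescript
// Field names from the commit message; value types are assumed to be strings.
interface CacheKey {
  chunk_a_hash: string;
  chunk_b_hash: string;
  model_id: string;
  prompt_version: string;
  truncation_policy: string;
}

const EXPECTED_FIELDS: readonly string[] = [
  "chunk_a_hash",
  "chunk_b_hash",
  "model_id",
  "prompt_version",
  "truncation_policy",
];

// True only when the object has exactly the five expected fields.
function hasExactKeyShape(key: object): boolean {
  const actual = Object.keys(key).sort();
  const expected = [...EXPECTED_FIELDS].sort();
  return actual.length === expected.length && actual.every((name, i) => name === expected[i]);
}

const sample: CacheKey = {
  chunk_a_hash: "a",
  chunk_b_hash: "b",
  model_id: "m",
  prompt_version: "2",
  truncation_policy: "t",
};
console.log(hasExactKeyShape(sample));
```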

R6 (contradiction severity unchanged): four tests on normalizeVerdict pin
the legacy semantics — judge-supplied severity wins (whether 'high' or
'low'), and on garbage severity input the fallback is 'medium' (per
defaultSeverityForVerdict('contradiction')) NOT 'low'. The contradiction
verdict's severity must never default to 'low', which would silently mask
genuine conflicts as cosmetic naming issues. The temporal_regression case
is included for parity (garbage → 'high' since regressions are real
investor red flags).
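
The fallback semantics can be sketched as below. Only the three verdict defaults stated in these messages are modeled (contradiction → medium, temporal_regression → high, temporal_supersession → info); the four-member `Severity` union and the helper `severityFor` are assumptions.

```typescript
// Assumed severity scale; the real union may have more members.
type Severity = "info" | "low" | "medium" | "high";
const SEVERITIES: readonly Severity[] = ["info", "low", "medium", "high"];

// Baseline severities stated in the commit messages. Other verdicts are
// left out because their defaults are not given here.
const KNOWN_DEFAULTS = {
  contradiction: "medium",
  temporal_regression: "high",
  temporal_supersession: "info",
} as const;
type KnownVerdict = keyof typeof KNOWN_DEFAULTS;

// A valid judge-supplied severity wins; anything else takes the fallback.
function parseSeverity(raw: unknown, fallback: Severity = "low"): Severity {
  const match = SEVERITIES.find((severity) => severity === raw);
  return match ?? fallback;
}

function severityFor(verdict: KnownVerdict, judgeSeverity: unknown): Severity {
  return parseSeverity(judgeSeverity, KNOWN_DEFAULTS[verdict]);
}

console.log(severityFor("contradiction", "garbage"));
console.log(severityFor("contradiction", "low"));
```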

236 eval-contradictions tests pass (216 + 6 R4 + 1 R5 + 4 R6 + 9 cost-prompt
from Lane C).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ci): privacy lint for docs/proposals/*.md

Captures the residual TODO from the temporal-contradiction-probe wave's
plan: prevent the bug class where an RFC lands in docs/proposals/ with
PII that should never appear in a public technical artifact. The
original RFC had to be scrubbed at force-push time (Step 0); this lint
catches the same patterns at CI time so the next one can't slip through.

Sibling to scripts/check-privacy.sh:
- check-privacy.sh: bans the literal "Wintermute" repo-wide.
- check-proposal-pii.sh: focuses on docs/proposals/*.md and the OTHER
  PII classes — personal-relationship vocabulary, private repo refs.

Design contract: the denylist names PATTERNS, not real people. Naming
specific real names (deceased relatives, therapist first names,
dealflow contacts) inside this script would leak PII into the repo
just by appearing here. The structural patterns below catch the
SURROUNDING vocabulary that always accompanies such content in
personal RFC prose. Trade-off: a future RFC that names a real person
without any contextual markers won't be caught — accepted as residual
risk handled by human review.

Patterns flagged in docs/proposals/*.md:
- garrytan/brain (private repo reference)
- trial separation, permanent separation
- couples session, couples therapist
- divorce attorney(s)
- grandmother's funeral, aunt's funeral
- wintermute (also caught by check-privacy.sh; listed here for
  proposal-scoped clarity)

Bare common words (separation, funeral) are NOT banned — only the
combined personal-context phrases. "Separation of concerns" and other
software vocabulary survives.
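
The "combined phrases, not bare words" rule can be illustrated as follows. The real lint is a shell script; this TypeScript sketch, with the hypothetical helper `findPiiHits` and a subset of the listed patterns, only shows the matching idea.

```typescript
// A subset of the phrases listed above, written as combined-phrase regexes.
// Bare words such as "separation" or "funeral" are deliberately absent.
const BANNED_PATTERNS: readonly RegExp[] = [
  /garrytan\/brain\b/i,
  /\b(trial|permanent) separation\b/i,
  /\bcouples (session|therapist)\b/i,
  /\bdivorce attorneys?\b/i,
  /\b(grandmother|aunt)'s funeral\b/i,
];

// Return every line that matches at least one banned pattern.
function findPiiHits(text: string): string[] {
  const hits: string[] = [];
  for (const line of text.split("\n")) {
    if (BANNED_PATTERNS.some((pattern) => pattern.test(line))) hits.push(line);
  }
  return hits;
}

const proposal = [
  "Separation of concerns keeps the judge and the filter independent.",
  "Notes from the couples session last spring.",
].join("\n");
console.log(findPiiHits(proposal));
```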

Wired into:
- `bun run verify` (gates every push)
- `bun run check:all`
- `bun run check:proposal-pii` (standalone)

Tests: 15 cases in test/scripts/check-proposal-pii.test.ts.
- Each pattern flagged when present, plus exit-code + stderr signal.
- Two negative cases (separation-of-concerns, funeral metaphor) prove
  the lint doesn't false-positive on legitimate software prose.
- No-proposals-dir → exit 0 (not a failure).
- Multi-hit case proves all patterns surface together with a summary
  count.
- The two test fixtures that name "Wintermute" / "WINTERMUTE" as
  sentinel literals are allowlisted in check-test-real-names.sh per
  the same meta-rule-enforcement exception as check-privacy.sh itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(privacy): allowlist new privacy-guard files in check-privacy.sh

check-privacy.sh bans the literal Wintermute repo-wide. The two new files
from the v0.34 privacy lint (scripts/check-proposal-pii.sh and its test)
necessarily name the token to do their job. Same meta-rule-enforcement
exception as scripts/check-privacy.sh itself, scripts/check-test-real-names.sh,
test/recency-decay.test.ts, and the existing entries — describing what
the rule forbids requires naming it.

Without this allowlist, `bun run verify` fails on check:privacy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.35.1.0)

Temporal-contradiction-probe wave — Phase 1 of the RFC at
docs/proposals/temporal-contradiction-probe.md.

Headline: the contradiction probe now classifies pairs into a 6-member
verdict enum (no_contradiction, contradiction, temporal_supersession,
temporal_regression, temporal_evolution, negation_artifact) and sees the
page-level effective_date for each chunk via a (from: YYYY-MM-DD) tag in
the prompt. The pre-judge date filter no longer skips dated wide-gap pairs,
so the role-transition class (e.g. a 2017 historical record vs. a 2025
current state) reaches the judge and gets classified as
temporal_supersession instead of vanishing into the skip bucket.

PROMPT_VERSION bumped 1 → 2 (cache fully invalidated). Three-layer cost
guardrail: TTY-only cost-estimate prompt with Ctrl-C window, --budget-usd
hard cap, Haiku-tier routing via new models.eval.contradictions_judge
config key.

Also adds a CI privacy lint (scripts/check-proposal-pii.sh) wired into
bun run verify that catches PII patterns in docs/proposals/*.md so future
RFCs can't ship with personal-context vocabulary the way this wave's
source RFC did at draft time.

Phases 2-4 deferred to follow-up RFCs per the plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e-name resolver + 58x perf + stub guard observability (garrytan#1085)

* fix(doctor,entities): supervisor crash classification + bare-name resolver + stub guard

- doctor.ts/jobs.ts: classify worker exits with code !== 0 as real crashes
  vs code === 0 clean restarts (separate counter); fixes false-positive
  WARN on healthy supervisors
- entities/resolve.ts: prefix-expansion step between fuzzy match and
  slugify fallback catches bare first names that score too low on pg_trgm;
  picks highest-connection candidate as tiebreaker
- facts/fence-write.ts: stub-creation guard refuses to spawn unprefixed
  entity pages at brain root
- facts/backstop.ts: routes stubGuardBlocked facts to engine.insertFact
  so the fact still persists even when no markdown file is created
- docs/issues/doctor-auto-heal-and-scoring.md: spec for follow-up doctor
  health-score improvements
- .gitignore: guard reports/network-intelligence/ (private brain exports)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(privacy): scrub real names from entity-resolve test fixtures and JSDoc

Replace YC partner names with placeholders per CLAUDE.md privacy rule:
alice-example, bob-example, charlie-example, dave-example. Stripe and
Stripe Atlas retained (allowed household brands; exercises the two-word
company-prefix case).

Test semantics preserved:
- Alice / Dave: single-match cases
- Bob / Charlie: multi-match tiebreaker cases (winner has more chunks)

All 13 entity-resolve cases pass with the scrubbed fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(supervisor): extract classifyWorkerExit() helper (DRY)

Three call sites were inline-classifying worker exits: supervisor's
restart policy (child-worker-supervisor.ts:291), doctor's supervisor
check (doctor.ts:1016), and jobs supervisor status (jobs.ts:806). Same
rule, three copies — drift risk if one is updated without the others.

Extract to src/core/minions/exit-classification.ts as a pure function.
Signature consumes audit-JSON shape ({ code: number | null }) so doctor
and jobs (which read serialized events from JSONL) and supervisor (which
reads Node's exit callback) call the same function. Helper's classification
rule: code === 0 → clean_exit, everything else (non-zero, null, undefined,
missing) → crash. Default-to-crash prevents corrupted rows from silently
demoting into the clean-restart bucket.
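
The classification rule is small enough to sketch in full. The optional `code` field is an assumption made so the sketch can also accept rows where the field is missing.

```typescript
type ExitClass = "clean_exit" | "crash";

// code === 0 is a clean exit; non-zero, null, undefined, or missing
// all default to crash so corrupted rows are never counted as clean.
function classifyWorkerExit(event: { code?: number | null }): ExitClass {
  return event.code === 0 ? "clean_exit" : "crash";
}

console.log(classifyWorkerExit({ code: 0 }));
console.log(classifyWorkerExit({ code: null }));
```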

5 hermetic unit tests (test/exit-classification.test.ts) pin all edge cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(facts): audit + sunset comment for stub-guard fires

Wire telemetry into the v0.34.5 stub-guard at fence-write.ts:190. Every
guard fire now appends a JSONL line to
~/.gbrain/audit/stub-guard-YYYY-Www.jsonl with {ts, slug, source_id,
fact_count}. Operator visibility for the sunset criterion: when the new
audit log reads <5 hits/week for 3 consecutive weeks on production
brains, the prefix-expansion in resolveEntitySlug is sufficient and the
guard can be removed in v0.36.

Reader (readRecentStubGuardEvents) deliberately diverges from
supervisor-audit.ts:readSupervisorEvents — it reads BOTH the current AND
previous ISO-week file before filtering by ts. supervisor-audit's reader
only reads the current week, which loses 24h-window correctness across
Monday 00:00 UTC (a Sunday 23:55 event lives in last week's file). The
2-file read costs nothing and makes the window actually 24h.
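
A sketch of the filename math and the two-file read, computed in UTC. The helper names are hypothetical and the sketch only returns file names; reading and filtering by `ts` is left out.

```typescript
const DAY_MS = 86_400_000;

// ISO-8601 week number and week-year for a date, computed in UTC.
function isoWeekParts(date: Date): { year: number; week: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = d.getUTCDay() === 0 ? 7 : d.getUTCDay(); // Mon=1 .. Sun=7
  d.setUTCDate(d.getUTCDate() + 4 - weekday); // move to the Thursday of this ISO week
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / DAY_MS + 1) / 7);
  return { year: d.getUTCFullYear(), week };
}

function auditFileName(date: Date): string {
  const { year, week } = isoWeekParts(date);
  return `stub-guard-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}

// Read the previous AND current ISO-week files so a 24h window that
// straddles Monday 00:00 UTC does not lose Sunday-night events.
function filesForLast24h(now: Date): string[] {
  const previousWeek = new Date(now.getTime() - 7 * DAY_MS);
  return [auditFileName(previousWeek), auditFileName(now)];
}

console.log(filesForLast24h(new Date("2026-05-18T00:05:00Z")));
```

With a single-file reader, an event logged on Sunday at 23:55 UTC would sit in the previous week's file and be invisible ten minutes later.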

9 hermetic unit tests pin filename math, the writer's
swallows-errors contract, the cross-week-boundary read, sort order,
missing-file behavior, and malformed-row tolerance. The cross-week test
is the regression guard: if a future refactor copies the supervisor's
single-file pattern, that test fails.

Follow-up TODO (not in this PR): fix readSupervisorEvents to use the
same 2-file pattern. The new stub-guard reader becomes the canonical
template to copy back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): stub_guard_24h check surfaces resolver gaps

Adds a new doctor check that reads ~/.gbrain/audit/stub-guard-YYYY-Www.jsonl
(via the dual-week-aware reader from T8) and surfaces the 24h fire count.
WARN at >10 fires — at that rate the prefix-expansion in resolveEntitySlug
is probably missing a case (typo prefix, alias, non-Latin script) and
operators should grep the audit log for the offending slugs. Below the
threshold but non-zero shows as OK with a count, so operators can watch
the v0.36 sunset criterion (<5/week for 3 weeks → guard can be removed).
Zero hits emits no check, keeping the doctor output clean on healthy
brains.
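
The three output states can be sketched as a small pure function. The check name and the >10 threshold come from the text above; the `DoctorCheck` shape and the message wording are assumptions for illustration:

```typescript
// Hypothetical check shape; the real doctor check type may differ.
type DoctorCheck = { name: string; status: "ok" | "warn"; message: string };

const STUB_GUARD_WARN_THRESHOLD = 10;

function stubGuard24hCheck(fires24h: number): DoctorCheck | null {
  // Zero hits: emit nothing so healthy brains keep a clean doctor output.
  if (fires24h <= 0) return null;
  if (fires24h > STUB_GUARD_WARN_THRESHOLD) {
    return {
      name: "stub_guard_24h",
      status: "warn",
      message: `${fires24h} stub-guard fires in 24h; check the audit log and resolveEntitySlug`,
    };
  }
  // Non-zero but under the threshold: OK, with the count visible.
  return { name: "stub_guard_24h", status: "ok", message: `${fires24h} stub-guard fires in 24h` };
}
```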

5 source-grep regression tests pin the contract: check name, WARN
threshold, fix hint mentions the audit log + the resolver function name,
reader is the dual-week-aware variant (NOT the supervisor-audit single-
week pattern), and zero-hits stays silent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(facts): pin stub-guard contract at writeFactsToFence + backstop layers

- fence-write.test.ts: 3 new cases for the v0.34.5 stub guard. Bare slugs
  return {inserted: 0, stubGuardBlocked: true, ids: []} and create no
  file/.tmp at brain root. Prefixed slugs bypass the guard (regression
  guard against accidentally inverting the slug.includes('/') check).
  Empty facts array short-circuits before the guard fires.
- facts-backstop.test.ts: 1 new case for the end-to-end routing. A
  bare-name LLM extraction resolves through to a bare slug, hits the
  guard, and lands in the facts table via engine.insertFact (DB-only).
  No phantom .md file; entity_slug stores the bare slug;
  source_markdown_slug is null. This is the routing contract Codex
  flagged as a "split-brain" data shape — the test pins the by-design
  behavior so a future refactor can't silently drop these facts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(supervisor): pin classifyWorkerExit consumer wire-up + regressions

12 new cases on top of the 5 helper unit tests:
- doctor.ts / jobs.ts / child-worker-supervisor.ts each import the helper
- All three call classifyWorkerExit at least once
- doctor.ts and jobs.ts no longer carry the pre-T7 inline filter
- supervisor uses the helper result to choose the clean_exit branch
- audit-event shape round-trip: code=0 → clean_exit, code=1 → crash,
  code=null+SIGKILL → crash (catches future shape changes)

The regression guards (3) and the wire-up checks (6) close the gap that
motivated T7 in the first place: if a future change accidentally re-inlines
the filter or shifts the audit event shape, the test fails before
production sees the silent divergence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(entities): correlated subqueries scoped to slug-LIKE candidates

Replace the derived-table JOIN shape in tryPrefixExpansion with
correlated subqueries. The pre-fix SQL did

  LEFT JOIN (SELECT to_page_id, COUNT(*) FROM links GROUP BY to_page_id) li ON ...

which forced the planner to aggregate the entire links + content_chunks
tables on every prefix-expansion call — O(N) per call where N is total
links/chunks in the brain. On a 100K-link / 50K-chunk brain that's slow
enough to bottleneck fact-extraction.

New shape uses correlated subqueries:

  (SELECT COUNT(*) FROM links WHERE to_page_id = p.id)
    + (SELECT COUNT(*) FROM links WHERE from_page_id = p.id)
    + (SELECT COUNT(*) FROM content_chunks WHERE page_id = p.id)

The slug LIKE filter is already selective (typical brain has 0-5 pages
per prefix), so the three subqueries run once each per matched row
against the existing indexes on links.to_page_id, links.from_page_id,
and content_chunks.page_id. Behavior preserved: 13/13 entity-resolve
tests pass (single-match + multi-match tiebreaker + edge cases).

Codex's outside-voice review caught the dead-end design that an earlier
draft of this plan proposed (a CTE with `LIMIT 50` candidate cap — would
have excluded correct high-connection candidates if their slug sorted
late). Correlated subqueries without a candidate cap are the cleaner
shape that lets the LIKE filter do the bounding work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(entities): perf regression guard for prefix-expansion (58x speedup)

Hermetic PGLite benchmark with 5K pages + 50K links + 25K chunks. Runs
the pre-T12 derived-table shape and the new correlated-subquery shape
side-by-side against the same fixture, asserts NEW >= 5x faster than OLD.
Baseline-ratio, not absolute wall-clock — different machines / Bun
versions / CI load can shift absolute timings by 10x without indicating
a real regression, but the SHAPE difference between "aggregate the full
tables" and "correlated subquery per candidate" is what we care about.

Measured: old_median=18.16ms, new_median=0.31ms, speedup=58.22x.
The 5x assertion has plenty of headroom.
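
A baseline-ratio guard of this kind could look like the sketch below. The helper names are hypothetical and the timing arrays stand in for real measurements of the two query shapes:

```typescript
// Median is less sensitive than the mean to a single slow outlier run.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const s = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Ratio of medians: how many times faster the new shape is than the old one.
function speedup(oldMs: number[], newMs: number[]): number {
  return median(oldMs) / median(newMs);
}

// Assert on the ratio rather than on absolute wall-clock time.
function assertSpeedup(oldMs: number[], newMs: number[], minRatio = 5): void {
  const ratio = speedup(oldMs, newMs);
  if (!(ratio >= minRatio)) {
    throw new Error(`expected >= ${minRatio}x speedup, got ${ratio.toFixed(2)}x`);
  }
}
```

Because both shapes run against the same fixture on the same machine, machine speed cancels out of the ratio.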

The OLD SQL is embedded verbatim as the regression baseline. If a future
refactor re-introduces full-table aggregation (LEFT JOIN against
SELECT...GROUP BY over the whole links or content_chunks table), the
test fails. PGLite-only — Postgres planner can shape derived-table
JOINs differently enough that the 5x ratio could be noise on a 5K-page
fixture. The structural correctness of the rewrite is the same on both;
this is purely a planner-shape regression guard.

.slow.test.ts suffix keeps it out of the fast loop (run via
`bun run test:slow`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.35.2.0)

Wave content:
- Privacy scrub: PII rebuilt out of branch history; real names → placeholders
- Bug fix: doctor + jobs no longer count clean worker exits as crashes
- Bug fix: entity resolver prefix-expansion catches bare first names
- DRY refactor: classifyWorkerExit() helper (one rule, 3 call sites)
- Observability: stub_guard_24h doctor check + ISO-week audit log
- Perf: 58x speedup on tryPrefixExpansion query shape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: rebump v0.35.2.0 → v0.35.4.0 + scrub TODOS.md privacy violation

VERSION/package.json/CHANGELOG header rebumped to v0.35.4.0 per user
request (queue allocation). TODOS.md rephrased to not literally name
the banned private-agent string — that was the CI failure root cause
on the v0.35.2.0 push. CHANGELOG.md is on check-privacy.sh's allow-list
(meta-documentation exception); TODOS.md is not.

CI re-runs against this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er (garrytan#1111)

* fix(bootstrap): extend probes for files/oauth_clients/sources.archived* + add MIGRATIONS introspection guard

Adds 7 new forward-reference probes to applyForwardReferenceBootstrap on
both engines, closes the column-only forward-ref class via a new
MIGRATIONS-source introspection contract test.

New probes:
- files.source_id + files.page_id (v18 forward refs)
- oauth_clients.source_id + oauth_clients.federated_read (v60+v61+v65)
- sources.archived + archived_at + archive_expires_at (v34 promoted from JSONB)

The sources.archived* columns are the codex-flagged class: they're added
inline in v34's CREATE TABLE definition but `CREATE TABLE IF NOT EXISTS
sources` is a no-op on pre-v34 brains, so downstream visibility filters
(search/list_pages) trip on old brains. needsPagesBootstrap now folds
archive columns into its CREATE TABLE so pre-v0.18 brains get a v34-shape
sources in one go; needsSourcesArchive then only fires on the pre-v34
case (sources exists, archive cols don't).

Closes the structural bug class via test/helpers/extract-added-columns.ts:
reads src/core/migrate.ts as text and extracts every ALTER TABLE ADD
COLUMN. The new contract test asserts every (table, column) pair is
covered by EITHER the bootstrap's ALTER TABLE statements, the bootstrap's
CREATE TABLE definitions, OR the schema blob's CREATE TABLE bodies. The
column-only class (no index, no FK; just an inline CREATE TABLE column
the schema blob can't add to existing tables) is now caught at PR time.
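
The source-text extraction step might be approximated as below. This is a regex sketch under stated assumptions (statements end with `;`, identifiers are plain or double-quoted words), not the actual test/helpers/extract-added-columns.ts:

```typescript
interface AddedColumn {
  table: string;
  column: string;
}

// Pull every (table, column) pair out of ALTER TABLE ... ADD COLUMN text.
function extractAddedColumns(sql: string): AddedColumn[] {
  const found: AddedColumn[] = [];
  // Group 1: table name. Group 2: the rest of the statement up to the next ';'.
  const stmtRe = /ALTER\s+TABLE\s+(?:IF\s+EXISTS\s+)?(?:ONLY\s+)?"?(\w+)"?\s+([^;]*)/gi;
  let stmt: RegExpExecArray | null;
  while ((stmt = stmtRe.exec(sql)) !== null) {
    // One ALTER TABLE may carry several ADD COLUMN clauses.
    const colRe = /ADD\s+COLUMN\s+(?:IF\s+NOT\s+EXISTS\s+)?"?(\w+)"?/gi;
    let col: RegExpExecArray | null;
    while ((col = colRe.exec(stmt[2])) !== null) {
      found.push({ table: stmt[1], column: col[1] });
    }
  }
  return found;
}
```

A coverage test can then check each extracted pair against the set of columns the bootstrap or schema blob defines.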

Source-text introspection catches all three migration shapes uniformly:
- top-level `sql:` field
- `sqlFor.postgres` / `sqlFor.pglite` overrides
- handler-body `engine.runMigration(N, \`ALTER TABLE ...\`)` (v34 shape)

Pre-existing parseBaseTableColumns parser bug fixed: now strips `--` line
comments and `/* ... */` blocks before identifying column names. Without
this, a column preceded by a comment was silently dropped. Catches
pages.page_kind and others that were silently uncovered.

13 columns added by migrations but not in PGLITE_SCHEMA_SQL are exempted
with a unified rationale: they have no schema-blob forward reference;
migration handles all upgrade paths cleanly. Refreshing the schema blob
is a separate concern.

Issues closed: garrytan#1018 (v60 oauth_clients), garrytan#974 (files.source_id/page_id),
garrytan#820 (v0.13.0 migration files.page_id cascade); pre-empts the
sources.archived class before any pre-v34 brain trips on it.

Tests:
- 9 cases in test/schema-bootstrap-coverage.test.ts (5 existing + 4 new)
- helper-level unit tests cover SQL shape variants (IF NOT EXISTS,
  quoted identifiers, ALTER TABLE IF EXISTS ONLY, multi-statement)
- planted-bug regression verifies the gate actually catches new uncovered
  columns

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(orphans): filter soft-deleted pages on both candidate and link-source sides

Closes garrytan#1021. The v0.26.5 soft-delete invariant requires that
findOrphanPages exclude both:
  1. Candidate pages that are themselves soft-deleted
  2. Inbound links from soft-deleted source pages

Pre-fix, findOrphanPages had no deleted_at filter at all. Soft-deleted
pages with no inbound links were counted as orphans (inflating counts).
Pre-codex-tension-D11, only the candidate-side filter was planned.
Codex C11 caught the second case: a live page that has ONE inbound link
from a soft-deleted source page was hidden from orphan results — the
link still existed in the links table, the EXISTS subquery saw it, the
page looked "linked." Now the inner JOIN on pages enforces
src.deleted_at IS NULL.
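
The two-sided rule can be modeled in memory as follows. This is an illustrative model of the invariant, not the engine SQL; the interfaces are assumptions that borrow the column names used in this log:

```typescript
interface Page {
  id: number;
  deleted_at: string | null;
}

interface Link {
  from_page_id: number;
  to_page_id: number;
}

function findOrphanPages(pages: Page[], links: Link[]): number[] {
  const liveIds = new Set(pages.filter((p) => p.deleted_at === null).map((p) => p.id));
  return pages
    // Candidate side: a soft-deleted page is never reported as an orphan.
    .filter((p) => p.deleted_at === null)
    // Link-source side: only inbound links from live pages count as "linked".
    .filter((p) => !links.some((l) => l.to_page_id === p.id && liveIds.has(l.from_page_id)))
    .map((p) => p.id);
}
```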

Three regression tests pin the contract:
- soft-deleted page with no inbound → NOT orphan
- live page with ONLY inbound link from soft-deleted source → IS orphan
- live page with live inbound → NOT orphan (smoke check that the new
  filters don't break unchanged behavior)

Engine parity: same SQL shape on both Postgres and PGLite engines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(think): route runThink through gateway.chat adapter (closes garrytan#952)

Pre-fix, runThink instantiated `new Anthropic()` directly and read
ANTHROPIC_API_KEY from process.env. Claude Desktop's stdio MCP launch
doesn't inherit shell env, so `gbrain config set anthropic_api_key sk-...`
(writes to ~/.gbrain/config.json) never reached the SDK and every MCP
think call degraded to "no LLM available."

The adapter routes through gateway.chat() — the canonical seam per
CLAUDE.md. Gateway reads the API key from gbrain config OR env, picks
up prompt caching, rate-leases, retry, and the test seam
(__setChatTransportForTests) that v0.31.12 established.

Per plan-eng-review D10 (cross-model tension with codex C7+C8+C9+C10),
four spec points landed:

  1. Drop `new Anthropic()` direct path entirely. Every non-stub LLM
     call from runThink routes through gateway.

  2. Real availability check (NOT a false-positive `getChatModel()`
     truthy). `tryBuildGatewayClient` probes both the recipe (resolveRecipe
     throws AIConfigError on unknown providers) AND the API key (reads
     process.env + loadConfig at the gbrain config layer for parity with
     gateway's own auth resolution). Returns null on miss; runThink takes
     the graceful "no LLM available" early-return preserving the legacy
     NO_ANTHROPIC_API_KEY warning signal.

  3. Model-id normalization. resolveModel returns bare anthropic ids
     (claude-opus-4-7); gateway.chat needs provider:model. Adapter
     auto-prefixes anthropic: when the id is bare. Provider:model strings
     pass through unchanged.

  4. Response-shape conversion. ChatResult → Anthropic.Message via
     chatResultToMessage. mapStopReason translates gateway's
     provider-neutral stop reasons (end / length / tool_calls / refusal /
     content_filter / other) to Anthropic's stop_reason ('end_turn' /
     'max_tokens' / 'tool_use'); refusal/content_filter/other fall through
     to end_turn (no Anthropic equivalent). Usage tokens pass through.
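
Points 3 and 4 can be sketched as two small pure functions. The stop-reason names are taken from the list above; pairing `length` with `max_tokens` and `tool_calls` with `tool_use` is an assumption inferred from the names, and `normalizeModelId` is a hypothetical helper name:

```typescript
type GatewayStopReason = "end" | "length" | "tool_calls" | "refusal" | "content_filter" | "other";
type AnthropicStopReason = "end_turn" | "max_tokens" | "tool_use";

// Bare ids get the anthropic: prefix; provider:model strings pass through unchanged.
function normalizeModelId(id: string): string {
  return id.includes(":") ? id : `anthropic:${id}`;
}

function mapStopReason(reason: GatewayStopReason): AnthropicStopReason {
  switch (reason) {
    case "length":
      return "max_tokens";
    case "tool_calls":
      return "tool_use";
    default:
      // end, refusal, content_filter, and other all collapse to end_turn.
      return "end_turn";
  }
}
```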

`opts.client` injection preserved (test seam — see ThinkLLMClient).
`opts.stubResponse` preserved (pure-test escape).

Tests:
  - test/think-gateway-adapter.test.ts (9 cases): response shape, stop
    reason mapping, model-id normalization (bare + prefixed), provider
    unknown returns null, ANTHROPIC_API_KEY absent returns null
    (regression for legacy graceful degradation), hasAnthropicKey reads
    process.env correctly. Uses withEnv per the test-isolation contract.
  - test/think-pipeline.serial.test.ts (17 existing cases): unchanged;
    the graceful-degradation case at line 213 still produces the
    NO_ANTHROPIC_API_KEY warning because tryBuildGatewayClient returns
    null when no key is configured, taking the legacy early-return path.

Closes garrytan#952.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sync): distinguish git worktree from submodule via path-segment match (closes garrytan#889)

Pre-fix, `manageGitignore` treated every `.git`-as-file as a submodule
and skipped gitignore management. Both submodules AND worktrees use
`.git` as a file (not a directory), so the legacy
`statSync.isFile()` check couldn't discriminate. Worktrees got
misclassified as submodules and their .gitignore wasn't managed.

Per plan-eng-review D4 (chose path-segment match over absolute-vs-
relative path heuristic): the gitdir path contains:
  - `/modules/<name>` for submodules (skip — managed by parent repo)
  - `/worktrees/<name>` for worktrees (MANAGE — first-class repo)

Both are documented Git internal layouts, stable across all 4
{relative, absolute} × {modules, worktrees} combinations including the
absorbed-submodule edge case from `git submodule absorbgitdirs` (where
the submodule's gitdir flips to an absolute path).

Malformed `.git` file (no `gitdir:` prefix, IO error) → MANAGE, preserving
the pre-garrytan#889 catch{} fail-closed-toward-managing semantics.
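
A sketch of the discriminator, assuming the `.git` file contents are passed in as a string. `decideFromGitFile` is a hypothetical name; the real logic lives inside `manageGitignore`:

```typescript
type GitignoreDecision = "manage" | "skip";

function decideFromGitFile(contents: string): GitignoreDecision {
  const match = /^gitdir:\s*(.+)$/m.exec(contents);
  // Malformed file (no gitdir: line): manage, matching the pre-fix catch behavior.
  if (match === null) return "manage";
  const gitdir = match[1].trim();
  // Submodule gitdirs contain a /modules/<name> segment: the parent repo owns them.
  if (gitdir.includes("/modules/")) return "skip";
  // Worktrees (/worktrees/<name>) and anything else are managed.
  return "manage";
}
```

Matching on the path segment rather than on relative-vs-absolute form is what keeps the absorbed-submodule case correct.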

Tests (5 new + 1 regression renamed):
  - REGRESSION: submodule relative gitdir/modules/ → skip (D49 contract)
  - absorbed submodule absolute gitdir/modules/ → skip (edge case)
  - CRITICAL: worktree absolute gitdir/worktrees/ → MANAGE (closes garrytan#889)
  - worktree relative gitdir/worktrees/ → MANAGE
  - malformed .git file → MANAGE (preserves catch behavior)
  - regular .git directory → MANAGE (existing smoke)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(walkers): pruneDir helper + descent-time exclusion + transcript predicate (closes garrytan#923, garrytan#202)

Per plan-eng-review D12 (cross-model tension with codex C12+C13), three
structural changes:

1. Extract `pruneDir(name)` helper in src/core/sync.ts. Returns false for
   directory names walkers must NEVER descend into: `node_modules` (latent
   bug — no leading dot), dot-prefix dirs (`.git`, `.obsidian`, `.raw`,
   `.cache`, etc.), `ops`, and `*.raw` sidecar dirs (gbrain convention —
   `people/pedro.raw/` holds raw source for pedro.md). Walkers consult it
   at descent time BEFORE recursion, saving the IO cost of walking entire
   vendor / hidden / sidecar subtrees only to filter them at file-emit time.

2. `isSyncable` itself gains the same exclusion set (via pruneDir on each
   path segment). Closes the latent bug where node_modules markdown files
   slipped through: `node_modules/some-pkg/README.md` returned true pre-fix
   because the legacy dot-prefix check only blocked `.node_modules` (with
   a leading dot), not the actual `node_modules`. CRITICAL regression test
   in test/sync.test.ts pins the contract per IRON RULE.

3. Two walkers rewritten to use pruneDir at descent + per-walker file
   predicate at emit:
   - `walkMarkdownFiles` (src/commands/extract.ts): pruneDir + isSyncable
     ({strategy:'markdown'}). Pre-fix this walker had ONLY an ad-hoc
     dot-prefix exclusion and didn't call isSyncable at all — descended
     into node_modules, emitted markdown files from there, ignored README/
     ops/.raw filters.
   - `listTextFiles` (src/core/cycle/transcript-discovery.ts): pruneDir +
     own .txt/.md predicate. DOES NOT use isSyncable({strategy:'markdown'})
     because transcripts accept .txt and don't share markdown sync's
     README/ops exclusions (codex C12). Also made RECURSIVE — pre-fix
     it walked only the top dir, so transcripts in `corpus/2026/` were
     invisible (codex C14 — descent-time pruning is the right shape but
     the test would have passed vacuously on a non-recursive walker).
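
The exclusion set can be sketched as below. The return polarity (false means "do not descend") follows the description in point 1; `hasPrunedAncestor` is a hypothetical helper showing the per-segment check, applied here to directory segments only:

```typescript
// Returns false for directory names a walker must never descend into.
function pruneDir(name: string): boolean {
  if (name === "node_modules" || name === "ops") return false;
  if (name.startsWith(".")) return false; // .git, .obsidian, .raw, .cache, ...
  if (name.endsWith(".raw")) return false; // sidecar dirs such as pedro.raw
  return true;
}

// True when any directory segment above the file would have been pruned.
function hasPrunedAncestor(relPath: string): boolean {
  const segments = relPath.split("/");
  return segments.slice(0, -1).some((segment) => !pruneDir(segment));
}
```

A walker would call `pruneDir` on each directory entry before recursing, so excluded subtrees are never read at all.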

Verified blast radius before adding node_modules: every existing
isSyncable caller (sync.ts:558-561 sync filter, frontmatter.ts:264 validate,
brain-writer.ts:305 reverse-write, import.ts:454 import filter) wants
node_modules excluded — this is a latent-bug fix, not a behavior change
for any legitimate caller.

Tests:
- 7 new isSyncable cases including the node_modules CRITICAL regression
- 6 new pruneDir cases (node_modules, dot-prefix, ops, *.raw, content
  dirs that should pass, empty-string default)
- Existing extract.test.ts + extract-fs.test.ts unchanged and passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(todos): file v0.36.x follow-ups for runThink rewrite + Supabase bootstrap parity

Two follow-up TODOs filed during the v0.36 dreamy-thompson wave:

1. runThink full rewrite (D5+D7 from plan-eng-review): drop the
   ThinkLLMClient indirection now that v0.36 routes through gateway.chat.
   12+ tests need migration to __setChatTransportForTests. Blocked by
   this wave landing.

2. Supabase parity test for applyForwardReferenceBootstrap (codex C6
   residual): real Docker Postgres E2E catches schema correctness but
   not Supabase pooler/direct-pool routing. The probe uses this.sql but
   PostgresEngine.initSchema chooses a DDL connection; the divergence
   has caused multiple historical wedges (garrytan#699, garrytan#820 lineage).

Both entries include full context per the CLAUDE.md TODOS-format spec
(what, why, pros, cons, blocked-by, plan reference).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bootstrap): thread DDL connection through applyForwardReferenceBootstrap

Codex adversarial review during /ship caught a P1: initSchema selected a
DDL connection, took pg_advisory_lock(42) on it, but
applyForwardReferenceBootstrap used `this.sql` (the instance pool) inside.
Bootstrap probes ran outside the lock scope on a different connection.

Failure mode: two concurrent gbrain instances could BOTH enter the
bootstrap block on Supabase transaction-pooler setups because the
advisory lock was held on a different connection than the one running
ALTER TABLE. The pooler's statement_timeout could also kill the probes
mid-flight without affecting the lock-holder, leaving an inconsistent
schema state.

Fix: applyForwardReferenceBootstrap now accepts an optional connection
parameter. initSchema passes the DDL conn (the one holding the lock).
this.sql remains the fallback for any unit-test path that calls bootstrap
directly. PGLite engine doesn't need this change — single connection,
no pooler.

This was pre-existing (every prior probe used this.sql), but the v0.36
wave is explicitly about fixing the Supabase upgrade-wedge class. Codex's
position was correct: don't ship the wave with the underlying connection
mismatch still there. The Supabase parity TEST FIXTURE follow-up remains
on TODOS.md (test infra needed to PROVE the fix works under real pooler
topology), but the bug itself is closed.

15/15 bootstrap tests pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.35.5.0)

Six-correctness-fix wave: bootstrap forward-ref class (4 issues + 1 pre-empt),
orphans soft-delete leak (both sides), runThink → gateway.chat adapter,
git worktree vs submodule discriminator, walker pruneDir + descent-time
exclusion, plus a Codex-P1 catch during /ship that threaded the DDL
connection through applyForwardReferenceBootstrap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update CLAUDE.md for v0.35.5.0 backend correctness wave

Fold v0.35.5.0 file-level annotations into CLAUDE.md:
- postgres-engine.ts + pglite-engine.ts: 7 new applyForwardReferenceBootstrap
  probes (files.source_id/page_id, oauth_clients.source_id/federated_read,
  sources.archived/archived_at/archive_expires_at) + DDL connection threading
- test/schema-bootstrap-coverage.test.ts: new MIGRATIONS-source introspection
  guard + parseBaseTableColumns comment-stripping fix
- src/core/sync.ts: new pruneDir helper + manageGitignore worktree
  discriminator
- src/core/think/index.ts (new entry): runThink gateway adapter for MCP
  stdio key resolution
- src/core/operations.ts (new entry): findOrphanPages soft-delete filter

Regenerate llms-full.txt via bun run build:llms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan#1108)

* feat(supervisor-audit): shared isCrashExit + summarizeCrashes classifier

Adds the read-side foundation for reading `likely_cause` off `worker_exited`
audit events. Denylist semantics — only `clean_exit` and `graceful_shutdown`
are non-crashes. Future unrecognized causes surface by default.

`isCrashExit(event)` classifies a single audit event with legacy
`code !== 0` fallback for pre-v0.34 entries lacking `likely_cause`.

`summarizeCrashes(events)` aggregates a 24h window into a `CrashSummary`
with per-cause counts (runtime_error, oom_or_external_kill, unknown,
legacy) and a `clean_exits` total.

Both helpers live next to `readSupervisorEvents` so the producer (the
JSONL writer) and the consumers (doctor + jobs CLI) share one regression
point. Test matrix pins all 9 isCrashExit branches plus 5 summarizeCrashes
aggregation cases including the future-cause denylist regression guard.
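
A sketch of the two helpers under the denylist rule. The export names and bucket names come from this commit; the `total` field and the choice to fold unrecognized future causes into `unknown` are assumptions made for illustration:

```typescript
interface WorkerExitedEvent {
  code?: number | null;
  likely_cause?: string;
}

interface CrashSummary {
  total: number; // assumption: not named in the commit message
  runtime_error: number;
  oom_or_external_kill: number;
  unknown: number;
  legacy: number;
  clean_exits: number;
}

const CLEAN_EXIT_CAUSES = new Set(["clean_exit", "graceful_shutdown"]);

function isCrashExit(event: WorkerExitedEvent): boolean {
  // Legacy rows without likely_cause fall back to the exit code.
  if (event.likely_cause === undefined) return event.code !== 0;
  // Denylist: anything not explicitly clean counts as a crash.
  return !CLEAN_EXIT_CAUSES.has(event.likely_cause);
}

function summarizeCrashes(events: WorkerExitedEvent[]): CrashSummary {
  const s: CrashSummary = {
    total: 0, runtime_error: 0, oom_or_external_kill: 0, unknown: 0, legacy: 0, clean_exits: 0,
  };
  for (const e of events) {
    if (!isCrashExit(e)) {
      s.clean_exits += 1;
      continue;
    }
    s.total += 1;
    if (e.likely_cause === undefined) s.legacy += 1;
    else if (e.likely_cause === "runtime_error") s.runtime_error += 1;
    else if (e.likely_cause === "oom_or_external_kill") s.oom_or_external_kill += 1;
    else s.unknown += 1; // includes future causes this sketch does not know about
  }
  return s;
}
```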

* fix(doctor,jobs): wire supervisor check to summarizeCrashes

`gbrain doctor` and `gbrain jobs supervisor status` both counted every
`worker_exited` audit event as a crash, regardless of `likely_cause`.
After v0.34.3.0 added RSS-watchdog drains (code=0), the count inflated
to 120+/day on a healthy brain — the alarm pattern users reported.

Both surfaces now go through `summarizeCrashes(events)` (single
regression point, can't drift). The warn threshold drops from `>3`
to `>=1` now that the counter is calibrated; the per-cause breakdown
(runtime=N oom=M unknown=K legacy=L) gives operators triage context
in the message without grep'ing the JSONL audit.

`gbrain jobs supervisor status --json` adds `crashes_by_cause` and
`clean_exits_24h` fields so monitoring dashboards bind to the named
buckets.

4 source-grep wiring assertions in doctor.test.ts pin both call sites
against drift.

* chore: bump version and changelog (v0.35.5.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: document v0.35.5.0 supervisor-audit crash classifier

Add CLAUDE.md entry for src/core/minions/handlers/supervisor-audit.ts
covering the new isCrashExit/summarizeCrashes/CrashSummary/CLEAN_EXIT_CAUSES
exports. Extend doctor.ts and jobs.ts entries with the v0.35.5.0
wire-up: shared helper, denylist semantics, >=1 warn threshold, per-cause
breakdown in messages, crashes_by_cause + clean_exits_24h in JSON.
Regenerate llms-full.txt to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…loses garrytan#1091) (garrytan#1129)

* v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages

Opt-in score-based gate on the three metadata-axis boost stages (backlink,
salience, recency) inside `runPostFusionStages`. When `SearchOpts.floorRatio`
or `search.floor_ratio` config is set, each stage skips results whose
post-cosine-rescore score is below `floorRatio * topScore`. Default
undefined preserves prior behavior bit-for-bit. Prevents weak-overlap
candidates from accumulating metadata boosts and leapfrogging the
legitimate primary hit on dense-embedder corpora.

Built on the contributor PR from @jayzalowitz (PR garrytan#1091, SkyTwin
twin-memory layer). Refactored on top: threshold is computed ONCE at
runPostFusionStages entry instead of per-stage (single-baseline semantic,
order-independent); knobsHash bumped 2->3 so a no-floor cache write can't
be served to a floor-enabled lookup; NaN scores skip the boost instead of
bypassing the gate; SearchOpts/config/MODE_BUNDLES integration replaces
the PR's PostFusionOpts-only surface; no env var (resolveSearchMode is
pure by design).

Three correctness issues that codex outside-voice review caught, all
fixed before this landed:
- Cache contamination via knobsHash() (same bug class as v0.32.3 CDX-4
  hotfix for the other search-lite knobs)
- NaN scores would have bypassed the gate (NaN < threshold is false in
  JS); realistic on Voyage flexible-dim / zembed-1 Matryoshka dim drift
- Negative top scores would have broken the "single result trivially
  eligible" claim; gate now disables on no-positive-signal inputs
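
The gate can be sketched as a threshold computed once plus a per-result eligibility check. `computeFloorThreshold` is named in the test list for this commit; its signature here, the `null` return for a disabled gate, and the `eligibleForBoost` helper are assumptions:

```typescript
// Returns null when the gate is off: no floorRatio set, or no positive signal.
function computeFloorThreshold(scores: number[], floorRatio?: number): number | null {
  if (floorRatio === undefined) return null;
  const finite = scores.filter((s) => Number.isFinite(s));
  if (finite.length === 0) return null;
  const top = Math.max(...finite);
  // A non-positive top score would make floorRatio * top meaningless as a floor.
  if (top <= 0) return null;
  return floorRatio * top;
}

function eligibleForBoost(score: number, threshold: number | null): boolean {
  // Gate off: behave as before, every result may be boosted.
  if (threshold === null) return true;
  // Explicit NaN check: `NaN < threshold` is false, which would slip past a naive gate.
  if (Number.isNaN(score)) return false;
  return score >= threshold;
}
```

Computing the threshold once at entry and passing it to each stage keeps the three boosts measured against the same baseline regardless of stage order.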

Scope: gates metadata stages only. Exact-match boost
(applyExactMatchBoost) runs independently as a lexical-relevance signal
by design. Cross-source floor stays global (per-source deferred to
v0.36 if federated-read users hit the suppression). Default-on for any
mode bundle deferred until gbrain-side ablation against longmemeval /
whoknows / suspected-contradictions / BrainBench-Real (TODOS.md).

Plan + 9-decision review trail (D1-D9): ~/.claude/plans/swift-sniffing-nygaard.md.
Empirical motivation, failure-mode framing, dense-embedder targeting, and
the 0.85 starting value all from @jayzalowitz's labeled-retrieval
ablation. Integration shape is gbrain-side.

Test surface: 30+ new cases (computeFloorThreshold edge cases including
T1a NaN / T1b negative top, three boost-function gate parity tests
including T6 IRON-RULE applyRecencyBoost regression, runPostFusionStages
single-baseline composition pin, KNOBS_HASH_VERSION bump from 2 to 3,
floor-ratio-changes-hash cache-contamination prevention,
loadOverridesFromConfig coverage for search.floor_ratio config key).
bun run verify clean; full unit suite 6753 pass / 0 fail.

Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rewrite v0.35.6.0 CHANGELOG ELI10-lead-first; codify the rule in CLAUDE.md

CHANGELOG entry for v0.35.6.0 was readable only by someone who already
understood gbrain's internals (RRF, knobsHash, MODE_BUNDLES, runPostFusionStages,
Matryoshka, CDX-4). Rewrote it so the first ~150 words explain what
shipped in everyday English, with a concrete worked example, before any
file paths or function names appear. Itemized changes section keeps the
technical precision for engineers who need it.

Then codified the rule in CLAUDE.md so future release entries land the same
way. The "Release-summary template" section now has an iron rule:
"lead ELI10, get precise after." No file paths or internal constants in
the first 150 words; user-visible behavior change first; everyday-language
column headers in any tables. Technical precision is required (the entry
is still the technical record) but lives BELOW the plain-English lead,
never before it.

Smell test: if a reader who has never opened gbrain can walk away from
the first 150 words knowing what shipped and whether they care, the entry
passes.

bun run build:llms regenerated to pick up the CLAUDE.md change (CI guard
test/build-llms.test.ts pins committed bundles against fresh generator
output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jay Zalowitz <jayzalowitz@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arrytan#1131)

* feat(facts): typed-claim substrate + cycle correctness fixes (v0.35.6 wave 1/3)

Schema (migration v67):
- Add four optional typed-claim columns to facts: claim_metric TEXT,
  claim_value DOUBLE PRECISION, claim_unit TEXT, claim_period TEXT
- Partial index facts_typed_claim_idx ON (entity_slug, claim_metric, valid_from)
  WHERE claim_metric IS NOT NULL
- All nullable, metadata-only on both engines

Fence layer:
- ParsedFact (facts-fence.ts) gains optional claimMetric/Value/Unit/Period
- Parser tolerates both 10-cell (legacy) and 14-cell (widened) rows
- Renderer emits 14 cells iff any row has typed data; otherwise stays
  10-cell so existing fences don't widen on unrelated edits
- Numeric value cell tolerates comma thousand separators (50,000 -> 50000)
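
The comma tolerance could be implemented along these lines. `parseNumericCell` is a hypothetical name, and returning `null` for empty or non-numeric cells is an assumption about how the parser treats them:

```typescript
// Parse a fence cell such as "50,000" into a number, or null when it is not numeric.
function parseNumericCell(cell: string): number | null {
  const cleaned = cell.trim().replace(/,/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}
```

This sketch strips every comma without validating digit grouping, so a stricter parser may be preferable for untrusted input.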

Extract pipeline (D-CDX-2, D-ENG-1):
- src/core/facts/extract.ts (the actual Haiku call site, NOT extract-facts.ts
  cycle phase) extends its system prompt to emit typed fields for metric-shaped
  claims
- extractFactsFromFenceText gains optional pageEffectiveDate. Precedence:
  fence-row validFrom > pageEffectiveDate > undefined (engine defaults to now)
- normalizeMetricLabel: 15-entry seed map for common founder metrics (mrr,
  arr, runway, headcount, team_size, cac, ltv, gross_margin, burn_rate, cash,
  users, mau, dau, churn_rate, revenue); unknown labels lowercase + space->_
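
Two of the rules above, sketched under assumptions: `resolveValidFrom` is a hypothetical helper for the stated precedence, and `normalizeUnknownMetricLabel` shows only the fallback path for labels outside the seed map (the seed map's alias entries are not reproduced here):

```typescript
// Precedence: fence-row validFrom, then pageEffectiveDate, then undefined
// (the engine then defaults to now).
function resolveValidFrom(rowValidFrom?: string, pageEffectiveDate?: string): string | undefined {
  return rowValidFrom ?? pageEffectiveDate;
}

// Fallback for labels not in the seed map: lowercase, spaces become underscores.
function normalizeUnknownMetricLabel(label: string): string {
  return label.toLowerCase().replace(/ /g, "_");
}
```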

Engine extensions:
- NewFact + insertFact + insertFacts in both engines accept the four typed
  columns (all nullable)
- Cycle phase extract-facts.ts threads page.effective_date through AND
  batch-embeds via gateway.embed() before insertFacts (D-CDX-3 fix for
  cycle-inserted facts arriving with embedding=NULL)

Consolidate fix (D-CDX-4 — Codex F4):
- Replace MAX(row_num)+1 INSERT with semantic upsert on (page_id, claim,
  since_date). Re-running the full cycle on stable input produces zero new
  takes — fixes the pre-existing duplicate-takes bug after extract_facts
  wipes consolidated_at
- Chronological valid_until writeback per cluster: sort by (valid_from ASC,
  id ASC), walk pairs, set older.valid_until = newer.valid_from
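
The writeback walk can be sketched in memory as below. The `ClusterFact` shape is an assumption, with dates held as ISO `YYYY-MM-DD` strings so that string comparison matches chronological order:

```typescript
interface ClusterFact {
  id: number;
  valid_from: string; // ISO date string
  valid_until: string | null;
}

function writeValidUntil(cluster: ClusterFact[]): ClusterFact[] {
  // Sort by (valid_from ASC, id ASC); the id tiebreaker keeps same-day facts stable.
  const sorted = cluster
    .map((f) => ({ ...f }))
    .sort((a, b) => {
      if (a.valid_from < b.valid_from) return -1;
      if (a.valid_from > b.valid_from) return 1;
      return a.id - b.id;
    });
  // Each older fact ends where the next newer fact begins; the newest stays open.
  for (let i = 0; i + 1 < sorted.length; i++) {
    sorted[i].valid_until = sorted[i + 1].valid_from;
  }
  return sorted;
}
```

Running the function on its own output yields the same result, which is the idempotency property the tests in this commit pin.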

Tests:
- test/migrate.test.ts +6 cases for v67 shape + materialization + nullable
  backward compat
- test/facts-fence-typed.test.ts (new, 17 cases): parser+renderer round-trip,
  normalization seed map coverage, valid_from precedence three-branch
- test/consolidate-valid-until.test.ts (new, 4 cases): chronological
  writeback (R4a), same-day id tiebreaker, cycle re-run zero duplicates
  (R4b/R7), valid_until idempotency
- test/schema-bootstrap-coverage.test.ts: add four typed-claim columns to
  COLUMN_EXEMPTIONS (migration co-defines the partial index, no forward
  reference to bootstrap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(trajectory): find_trajectory MCP op + eval/founder CLIs (v0.35.6 wave 2/3)

Engine method (D-CDX-1, D-CDX-6):
- BrainEngine.findTrajectory(opts) on both Postgres and PGLite
- TrajectoryOpts: scalar sourceId fast path + sourceIds federated array
  (mirrors v0.34.1.0 search* dual pattern)
- opts.remote: when true, SQL adds AND visibility='world' so OAuth read
  clients see only world-visibility facts (mirrors recall's posture —
  closes the F7 privacy regression Codex caught in plan review)
- Single SQL query, ORDER BY valid_from ASC, id ASC for deterministic
  output (R3 pin). Returns TrajectoryPoint[] including raw embedding so
  the caller can compute drift without a second round-trip

Pure function library (src/core/trajectory.ts, new):
- detectRegressions(points, threshold): walks consecutive (metric, value)
  pairs per metric; emits when newer drops >= threshold below older.
  10% default, override via GBRAIN_TRAJECTORY_REGRESSION_THRESHOLD
- computeDriftScore(points): 1 - mean(cosine(emb[i], emb[i-1])) over
  embedded points; clamped [0,1]; null when <3 embedded points (D-ENG-3
  graceful degradation)
- computeTrajectoryStats(points): composed shape returning both
- TRAJECTORY_SCHEMA_VERSION = 1 — additive-only across releases (R5)
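
A rough sketch of the two pure functions described above, not the contents of src/core/trajectory.ts. The `TrajectoryPoint` fields are assumptions, the input is assumed to be already in chronological order, and "drops >= threshold below older" is read here as a relative drop, which the message does not spell out.

```typescript
// Hypothetical point shape; the real TrajectoryPoint has more fields.
interface TrajectoryPoint {
  metric: string;
  value: number;
  embedding: number[] | null;
}

interface Regression {
  metric: string;
  from: number;
  to: number;
  drop: number; // relative drop, e.g. 0.2 for a 20% decline
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// 1 - mean cosine similarity of consecutive embedded points, clamped to [0, 1].
// Returns null when fewer than 3 points carry an embedding.
function computeDriftScore(points: TrajectoryPoint[]): number | null {
  const embs = points
    .map((p) => p.embedding)
    .filter((e): e is number[] => e !== null);
  if (embs.length < 3) return null;
  let sum = 0;
  for (let i = 1; i < embs.length; i++) sum += cosine(embs[i], embs[i - 1]);
  const drift = 1 - sum / (embs.length - 1);
  return Math.min(1, Math.max(0, drift));
}

// Walk consecutive values per metric; emit when the newer value sits at
// least `threshold` (relative, assumed) below the older one.
function detectRegressions(points: TrajectoryPoint[], threshold = 0.1): Regression[] {
  const last = new Map<string, number>();
  const out: Regression[] = [];
  for (const p of points) {
    const prev = last.get(p.metric);
    if (prev !== undefined && prev > 0) {
      const drop = (prev - p.value) / prev;
      if (drop >= threshold) out.push({ metric: p.metric, from: prev, to: p.value, drop });
    }
    last.set(p.metric, p.value);
  }
  return out;
}
```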

MCP op (src/core/operations.ts):
- find_trajectory: scope read, NOT localOnly. Routes through
  sourceScopeOpts(ctx) for federated isolation AND threads ctx.remote
  for visibility filtering. Strips raw Float32Array embeddings from the
  wire shape; converts valid_from to YYYY-MM-DD string
- Registered in operations array after find_experts
- FIND_TRAJECTORY_DESCRIPTION in operations-descriptions.ts
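
The wire-shape conversion described above might look roughly like this. It is a sketch under assumed field names (`id`, `metric`, `value`, `validFrom`), not the code in operations.ts, and it formats dates in UTC, which the message does not specify.

```typescript
const TRAJECTORY_SCHEMA_VERSION = 1;

// Hypothetical engine-side shape; real field names may differ.
interface EnginePoint {
  id: number;
  metric: string;
  value: number;
  validFrom: Date;
  embedding: Float32Array | null;
}

interface WirePoint {
  id: number;
  metric: string;
  value: number;
  valid_from: string; // YYYY-MM-DD
}

// Build the JSON envelope: copy fields explicitly so the raw embedding
// can never leak onto the wire, and flatten valid_from to a date string.
function toWireEnvelope(points: EnginePoint[]): { schema_version: number; points: WirePoint[] } {
  return {
    schema_version: TRAJECTORY_SCHEMA_VERSION,
    points: points.map((p) => ({
      id: p.id,
      metric: p.metric,
      value: p.value,
      valid_from: p.validFrom.toISOString().slice(0, 10),
    })),
  };
}
```

Copying fields by name rather than spreading the engine object means a new engine-side field stays off the wire until someone adds it deliberately.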

CLIs:
- gbrain eval trajectory <entity> [--metric M] [--since D] [--until D]
  [--limit N] [--json] — chronological human view with [REGRESSION] inline
  annotation; thin-client routing via callRemoteTool(find_trajectory).
  Dispatched in src/commands/eval.ts sub-subcommand block
- gbrain founder scorecard <entity> [--since D] [--until D] [--json] —
  pure aggregation over Phase 2's substrate. Four signals:
  claim_accuracy (over resolved takes), consistency, growth_trajectory,
  red_flags. computeFounderScorecard exported for tests.
  Registered as top-level command in cli.ts; added to CLI_ONLY set

Tests (45 cases across 5 files):
- test/engine-find-trajectory.test.ts: 18 cases — chronological order,
  source scoping (scalar + federated), visibility filter on remote=true,
  metric + since/until filters, regression detection at threshold
  boundaries, drift score with various embedding states
- test/operations-find-trajectory.test.ts: 9 cases — op registration,
  param validation, JSON envelope shape, R5 schema_version: 1,
  embedding stripped from wire, R6 visibility filter, source scoping
- test/eval-trajectory.test.ts: 7 cases — arg parsing, --help,
  --json envelope, regression annotation, --metric filter, empty entity
- test/founder-scorecard.test.ts: 9 cases — empty inputs no-NaN (G2),
  claim_accuracy math, consistency math, growth_trajectory math,
  red_flags fire for regression / narrative_drift / missed_prediction
- test/eval-contradictions/no-valid-until-write.test.ts: 4 cases —
  R1 (probe never writes valid_until under eval-contradictions/) +
  R8 (only allow-listed files write valid_until anywhere in src/)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: v0.35.6.0 — CHANGELOG + VERSION + docs + migration note

Bumps to v0.35.6.0 (next-minor after master's v0.35.5.1 — typed-claim
substrate + trajectory + founder scorecard is a new user-facing
feature surface, not a fix).

- VERSION + package.json synced
- CHANGELOG.md release-summary block in the wave-style voice, lead with
  what the user can now DO. Sections: typed metric claims in the fence,
  chronological metric trajectories, founder scorecard, MCP
  find_trajectory op, cycle re-run idempotency fix, embedding-on-insert
  fix, valid_from precedence fix. To-take-advantage-of block with
  verification + opt-in fence syntax example
- CLAUDE.md Key Files entry consolidating the wave across
  eval-trajectory.ts + founder-scorecard.ts + trajectory.ts. Names every
  D-ENG / D-CDX decision and the Codex outside-voice F-numbers
- skills/migrations/v0.35.6.md agent-readable migration note. Includes
  fence-syntax example for typed-claim rows so downstream agents start
  emitting them. Iron-rule contracts called out (R1 + R8 + R7 + visibility)
- llms-full.txt regenerated to reflect the new CLAUDE.md entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: post-ship sync for v0.35.7.0 — trajectory + founder scorecard

- README.md: add `gbrain eval trajectory` to EVAL section, add new
  TEMPORAL block covering `gbrain founder scorecard` + the
  GBRAIN_TRAJECTORY_REGRESSION_THRESHOLD env override; add v0.35.7
  "What's new" paragraph below the v0.28.8 LongMemEval blurb
- AGENTS.md: new bullet under Common tasks teaching agents to reach for
  `gbrain eval trajectory` / `gbrain founder scorecard` / the
  `find_trajectory` MCP op when asked to evaluate a founder/company
  over time
- docs/contradictions.md: append "Temporal axis follow-on (v0.35.3.1 +
  v0.35.7)" subsection under See also, cross-linking the trajectory
  substrate and naming the auto-supersession.ts:4 invariant preserved
  by both the verdict enum (probe side) and consolidate's valid_until
  writeback (cycle side)
- CLAUDE.md: fix stale (v0.35.4) tag on the trajectory entry to
  (v0.35.7) — version got rebumped twice during the merge wave
- skills/migrations/v0.35.7.md renamed to v0.35.7.0.md for consistency
  with the v0.35.0.0.md / v0.14.0.md / etc naming convention
- llms-full.txt regenerated to reflect the CLAUDE.md edit

Coverage map (Diataxis):
  /eval trajectory CLI       ✅ ref (README, AGENTS) ✅ how-to (CHANGELOG) ❌ tutorial
  /founder scorecard CLI     ✅ ref (README, AGENTS) ✅ how-to (CHANGELOG) ❌ tutorial
  find_trajectory MCP op     ✅ ref (CLAUDE.md, AGENTS, contradictions.md)
  typed-claim fence cols     ✅ ref (skills/migrations/v0.35.7.0.md, CHANGELOG)
  Migration v67              ✅ ref (CLAUDE.md, CHANGELOG)

No tutorial / explanation gaps worth filling in this PR — the migration
note's fence-syntax example already covers the "first typed claim"
walkthrough. ARCHITECTURE diagrams not drifted (the trajectory work
extends existing facts/takes infrastructure; no new component boxes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rrytan#1138)

* feat(cycle): phantom-page redirect inside extract_facts (v0.35.8.0)

Drains the existing pile of unprefixed entity pages (alice.md, acme.md)
that pre-PR-garrytan#1010 routing left behind. Folds the cleanup into the existing
extract_facts cycle phase via two new lossless engine primitives so the
v0.32.2 reconciliation contract owns drift handling instead of a parallel
implementation duplicating it.

Layers:
- engine: refreshPageBody + migrateFactsToCanonical on Postgres + PGLite
- resolver: resolvePhantomCanonical + findPrefixCandidates (codex #1/#11)
- orchestrator: src/core/cycle/phantom-redirect.ts + phantom-audit JSONL
- cycle: sourceId/brainDir threaded; 3 new totals counters
- tests: 38 unit + 6 parity + 4 E2E (48 total) pinning all 12 codex findings

* fix(test): pin clock in sync_freshness boundary tests (CI flake)

CI test (1) failed: `sync_freshness check > exact 72h boundary → warn`.
The test set `last_sync_at = Date.now() - 72h`, then checkSyncFreshness
called Date.now() again to compute ageMs. Between the two reads the
clock advanced (0.43ms in this CI run, microseconds locally) which
pushed ageMs above the strict 72h fail threshold and flipped the
status from warn to fail.

Same shape latent in the 24h boundary test — fixed both.

Fix:
- checkSyncFreshness gains an optional `opts.nowMs` test-only seam.
  Production callers omit it and get live wall-clock semantics.
- Both boundary tests now capture nowMs once and thread it through
  both `last_sync_at` and the check, eliminating drift between reads.

Verified deterministic: 10 consecutive runs of the 72h boundary test
pass on this machine (was occasionally failing before).
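
The test seam can be illustrated with a simplified sketch. The real checkSyncFreshness takes other inputs and returns a richer result; only the `opts.nowMs` seam and the strict `>` comparisons come from the description above, and treating 24h as the warn threshold is an assumption.

```typescript
type FreshnessStatus = "ok" | "warn" | "fail";

const HOUR_MS = 60 * 60 * 1000;

// Simplified signature for illustration. Production callers omit opts.nowMs
// and get the live wall clock; tests pass a pinned value.
function checkSyncFreshness(
  lastSyncAtMs: number,
  opts: { nowMs?: number } = {},
): FreshnessStatus {
  const nowMs = opts.nowMs ?? Date.now();
  const ageMs = nowMs - lastSyncAtMs;
  if (ageMs > 72 * HOUR_MS) return "fail"; // strict: exactly 72h is still warn
  if (ageMs > 24 * HOUR_MS) return "warn"; // strict: exactly 24h is still ok
  return "ok";
}

// Boundary test pattern: capture the clock once and use it on both sides.
const pinnedNowMs = 1_700_000_000_000;
const atBoundary = checkSyncFreshness(pinnedNowMs - 72 * HOUR_MS, { nowMs: pinnedNowMs });
console.log(atBoundary);
```

With the clock pinned, ageMs equals the boundary exactly, so the result no longer depends on how long the scheduler takes between the two reads.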
…aged-block install) (garrytan#1130)

* feat(skillpack): extract copyArtifacts shared helper (T1)

Pure file-copy primitive for scaffold (gbrain→host) and harvest (host→gbrain).
Atomic-refusal contract: symlink-reject + canonical-path containment validate
every item before any write. Used by both directions of the v0.33 loop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skillpack): scaffold subcommand + SKILL.md frontmatter sources (T2)

New scaffold.ts replaces the managed-block installer. One-time additive copy
into the user's repo via copyArtifacts; refuses to overwrite existing files
(user owns them). Partial-state policy: copies missing paired sources even
when the skill dir already exists.

bundle.ts extended with loadSkillSources + enumerateScaffoldEntries — paired
source files declared in each SKILL.md's frontmatter sources: array, not in
openclaw.plugin.json. Single source of truth, co-located with the skill.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skillpack): reference command + apply-clean-hunks (T4 + T15)

reference is the read-only diff lens with an agent-readable framing line. Pure-JS
unified-diff producer + parser + applier (no patch(1) dependency). Two-way merge
with documented limitation: without scaffold-time base tracking, applied hunks
align everything to gbrain. The agent dry-runs reference first, then decides.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skillpack): migrate-fence + scrub-legacy-fence-rows (T5 + T16)

migrate-fence is the one-shot transition from the pre-v0.36 managed-block model.
Strips begin/end markers and the cumulative-slugs receipt comment; preserves
fence rows verbatim as user-owned routing during the transition to frontmatter
discovery. Receipt-then-row fallback (F-CDX-8) covers stale/missing receipts.

scrub-legacy-fence-rows is the opt-in cleanup after migrate-fence. Two-condition
gate: removes a row only when skills/<slug>/ exists AND that skill's frontmatter
declares non-empty triggers (proof frontmatter discovery covers it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skillpack): harvest + privacy linter (T6 + T7)

The inverse loop: lift a proven skill from a host repo (~/git/wintermute, etc.)
back into gbrain so other clients can scaffold it. --from <host-repo-root> is
symmetric with scaffold's --workspace.

Security: symlink rejection + canonical-path containment (mirrors validateUploadPath).
Privacy: default-on linter scans harvested files against ~/.gbrain/harvest-private-patterns.txt
plus built-in defaults (Wintermute, email, Slack channel patterns). Any match
rolls back the copy and exits non-zero. --no-lint bypasses for the editorial
workflow after a manual scrub.
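
A minimal sketch of the lint gate described above. The regex here is illustrative only; the built-in defaults are not listed in this message. The real command copies first and rolls back on a match, whereas this sketch models just the resulting outcome, and the `noLint` option stands in for the `--no-lint` flag.

```typescript
// Illustrative default; the real built-in pattern list is not shown in the source.
const DEFAULT_PATTERNS: RegExp[] = [
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, // email address
];

interface LintHit {
  file: string;
  line: number;
  pattern: string;
}

// Scan every line of every file against the default and user-supplied patterns.
function lintFiles(files: Record<string, string>, extra: RegExp[] = []): LintHit[] {
  const patterns = [...DEFAULT_PATTERNS, ...extra];
  const hits: LintHit[] = [];
  for (const [file, content] of Object.entries(files)) {
    content.split("\n").forEach((text, i) => {
      for (const re of patterns) {
        if (re.test(text)) hits.push({ file, line: i + 1, pattern: re.source });
      }
    });
  }
  return hits;
}

// Any hit means nothing is kept and the exit code is non-zero,
// unless the caller explicitly bypasses the linter.
function harvestOutcome(
  files: Record<string, string>,
  opts: { extra?: RegExp[]; noLint?: boolean } = {},
): { copied: string[]; exitCode: number; hits: LintHit[] } {
  const hits = opts.noLint ? [] : lintFiles(files, opts.extra ?? []);
  if (hits.length > 0) return { copied: [], exitCode: 1, hits };
  return { copied: Object.keys(files), exitCode: 0, hits };
}
```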

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(repo-root): cwd_walk_up tier for non-OpenClaw hosts (T9 + D3)

autoDetectSkillsDir now walks up from cwd looking for any skills/ directory,
ahead of the implicit ~/.openclaw/workspace fallback. cd ~/git/wintermute &&
gbrain skillpack scaffold ... finds wintermute automatically without requiring
a RESOLVER.md/AGENTS.md to exist yet.

R5 regression preserved: $OPENCLAW_WORKSPACE still wins when explicitly set.
+5 test cases in test/repo-root.test.ts pin the new tier order and the R5 guard.
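
The tier order can be sketched as below. This is a reduced model with three tiers and an injected `isDir` predicate so it runs without touching the filesystem; the real autoDetectSkillsDir has more tiers, and every source label except `cwd_walk_up` is a hypothetical name.

```typescript
type Tier = "openclaw_workspace_env" | "cwd_walk_up" | "openclaw_workspace_default";

interface Detected {
  dir: string;
  source: Tier;
}

// Parent of a POSIX-style absolute path, or null at the root.
function parentOf(p: string): string | null {
  if (p === "/") return null;
  const idx = p.lastIndexOf("/");
  return idx <= 0 ? "/" : p.slice(0, idx);
}

function detectSkillsDir(
  cwd: string,
  env: Record<string, string | undefined>,
  home: string,
  isDir: (path: string) => boolean,
): Detected | null {
  // Tier 1: an explicit $OPENCLAW_WORKSPACE always wins (the R5 guard).
  const ws = env.OPENCLAW_WORKSPACE;
  if (ws) return { dir: `${ws}/skills`, source: "openclaw_workspace_env" };

  // Tier 2: walk up from cwd looking for any skills/ directory.
  for (let dir: string | null = cwd; dir !== null; dir = parentOf(dir)) {
    const candidate = dir === "/" ? "/skills" : `${dir}/skills`;
    if (isDir(candidate)) return { dir: candidate, source: "cwd_walk_up" };
  }

  // Tier 3: implicit fallback to the default OpenClaw workspace.
  const fallback = `${home}/.openclaw/workspace/skills`;
  return isDir(fallback) ? { dir: fallback, source: "openclaw_workspace_default" } : null;
}
```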

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skillpack): rewrite CLI dispatch, drop install + uninstall (T3 + T10)

skillpack.ts dispatcher rewritten for the v0.36 contract: scaffold, reference
(+ --apply-clean-hunks), migrate-fence, scrub-legacy-fence-rows, harvest, plus
the existing list / diff / check.

install and uninstall are gone — both exit non-zero with a hint pointing at
scaffold / migrate-fence. Clean break, no deprecated alias.

skillpack-check gains --strict for CI gating. When invoked as the subcommand
`gbrain skillpack check`, default is informational (exit 0 even with drift);
--strict opts back into the cron-friendly exit-1-on-issues behavior. Top-level
gbrain skillpack-check preserves its existing exit semantics for backwards compat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skills): skillpack-harvest editorial workflow + resolver wiring (T8)

The companion editorial skill for the gbrain skillpack harvest CLI. Walks the
genericization checklist (scrub fork names, generalize triggers, lift fork-
specific conventions to references) before the CLI runs. Routing-eval fixtures
use paraphrased intents to avoid the intent_copies_trigger lint.

Wires the new slug into openclaw.plugin.json#skills, skills/manifest.json, and
skills/RESOLVER.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(skillpack): 9-case real-subprocess E2E flow (T11)

Spawns gbrain as a subprocess against tempdir workspaces. Covers: scaffold
first-run + re-run no-op, reference diff + --apply-clean-hunks, migrate-fence,
scrub-legacy-fence-rows, harvest privacy-lint catch + --no-lint bypass, and
the install removed-error path. No DATABASE_URL needed — skillpack is
filesystem-only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: docs + VERSION + CHANGELOG for v0.36.0.0 (T13 + T14)

Skillpacks as scaffolding, not amber.

v0.36 retires the managed-block install model. Six new subcommands replace
install + uninstall: scaffold, reference (with --apply-clean-hunks), migrate-fence,
scrub-legacy-fence-rows, harvest, plus the existing list / diff / check
(check gains --strict for CI gating). Routing comes from each skill's
frontmatter triggers — gbrain does not touch your RESOLVER.md or AGENTS.md.

Companion editorial skill skillpack-harvest drives the genericization
checklist; default-on privacy linter catches Wintermute / email / Slack
references before they leak into gbrain core.

New docs guide at docs/guides/skillpacks-as-scaffolding.md walks the model
and the migration path for pre-v0.36 installs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): privacy checks — allow-list harvest-lint tests, scrub user-facing fork-name references

CI's check-privacy.sh and check-test-real-names.sh both flagged the literal
fork name across the v0.36 skillpack diff. Two failure modes, two fixes:

1. **Meta-rule-enforcement files** added to both allow-lists. The harvest
   privacy linter's whole job is to catch the banned literal leaking into
   gbrain; its source has the regex pattern, its tests verify the linter
   fires by feeding it the banned string, and the skill markdown documents
   the substitution policy. Same exception status as check-privacy.sh and
   check-proposal-pii.sh themselves. Files allow-listed:
   - src/core/skillpack/harvest-lint.ts
   - test/skillpack-harvest-lint.test.ts
   - test/skillpack-harvest.test.ts
   - test/e2e/skillpack-flow.test.ts
   - skills/skillpack-harvest/SKILL.md

2. **User-facing references** swapped for canonical phrasing per CLAUDE.md's
   responsible-disclosure rule. README + new docs guide + 4 src docstrings
   + 1 test now say 'your OpenClaw' / 'host agent repo' / 'agentRepo' var
   name. Behavior unchanged — only documentation strings touched.

Verify gate (the script CI runs) passes locally: EXIT=0.
Tests still pass: 60/60 across the affected files.
llms-full.txt regenerated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): update check-resolvable-cli expectation for cwd_walk_up tier

Sister fix to the test/repo-root.test.ts update in commit a31418e. The new
v0.33 cwd_walk_up tier fires before repo_root when running from inside the
gbrain repo — same skills/ dir matched, different source label. Behavior
unchanged; the legacy repo_root tier is now functionally subsumed (kept in
the type union for back-compat).

CI shard 3 failure: test/check-resolvable-cli.test.ts:171.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): pin clock in sync_freshness boundary tests (CI flake)

The 24h and 72h exact-boundary tests scheduled last_sync_at relative to
Date.now() at construction time, then let the check call Date.now() again
internally. CI scheduler jitter between the two reads pushed ageMs past
the strict > thresholds by microseconds, dropping the 72h-boundary case
into the fail branch instead of warn.

Fix: add an optional `opts.now` test seam to checkSyncFreshness. The two
boundary tests now capture t0 once and pass it both to the timestamp
constructor and to the check, making ageMs deterministically equal to
the boundary. The non-boundary tests (4d, 30h, 2h, etc.) don't need
pinning — they're comfortably away from the > comparison.

CI shard 1 flake: test/doctor.test.ts:479. Locally 48/48 doctor tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skillpack): agent-onboarding readme + next-action hints on every CLI surface (DX review)

DX audit of the v0.36 scaffold model surfaced one structural gap and four
output gaps. When scaffolded files land on a downstream agent's disk, the
agent had no agent-facing manifest telling it what to do — no routing
contract, no upgrade flow, no two-way merge warning at the right surface.

Fixes:

1. **New shared dep: skills/_AGENT_README.md.** Lands on every scaffold +
   migrate-fence alongside the existing _brain-filing-rules.md and
   _output-rules.md. Short, agent-readable contract: walk *.SKILL.md
   frontmatter triggers: for routing, gbrain is reference not law on
   upgrade, no managed-block fence anymore, two-way merge has known
   limitations. Single source of truth for the agent operating contract.

2. **scaffold stdout** prints a next-action hint pointing at the readme
   (with absolute path) and the reference --all upgrade-sweep command.

3. **reference stdout** adds per-category decision policy:
   - missing → scaffold again
   - differs → if the edit was intentional, keep it; if accidental, patch by
     hand or run --apply-clean-hunks after reading the two-way merge warning.


4. **reference --apply-clean-hunks** prints the two-way merge WARNING
   BEFORE the apply (to stderr, survives stdout redirect). Spells out
   that gbrain has no scaffold-time base and local edits in differing
   sections WILL be aligned to gbrain. Skipped in --json mode for
   machine consumers. On conflicts, prints how to inspect and patch.

5. **migrate-fence stdout** tells the agent its routing model just
   changed (fence gone, walk frontmatter now) and points at
   scrub-legacy-fence-rows as the eventual cleanup. References the new
   _AGENT_README for fresh-install agents.

Smoke verified end-to-end: 16 files land (was 15, +1 for _AGENT_README),
hint prints with absolute path, readme lands on disk. Tests + verify gate
pass clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skillpack): upgrade-time reference sweep + reference --since version filter (DX deferred items)

Closes the last two DX gaps from the v0.36 audit:

1. **Post-upgrade reference sweep.** New `postUpgradeReferenceSweep`
   helper called at the end of `gbrain post-upgrade`. After migrations
   apply, auto-runs `reference --all` against the detected host
   workspace and prints a one-line-per-skill summary of drift. Five
   gates: GBRAIN_SKIP_REFERENCE_SWEEP env-var bypass, no detected
   workspace (silent), workspace IS gbrain repo (dev-mode silent),
   zero drift (silent), and pure-missing skills the host never
   scaffolded are filtered out as noise. All errors swallowed —
   never blocks post-upgrade. Helper accepts test-seam opts
   (gbrainRoot, targetWorkspace) for unit testability.

2. **`reference --all --since <version>`.** Filters the sweep to
   skills whose source actually changed in gbrain between
   <version> and HEAD, using a new `changedSlugsSinceVersion`
   helper in bundle.ts. Pure-JS git wrapper (spawnSync), no deps.
   Accepts bare '0.X.Y.Z' or 'v0.X.Y.Z' or commit SHA. Falls back
   loudly to full sweep when git can't resolve the ref (tarball
   install, missing tag).
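
Two small pieces of the `--since` filter lend themselves to a sketch: normalizing the version argument and reducing changed paths to skill slugs. Both functions below are hypothetical helpers, not the body of changedSlugsSinceVersion. The git invocation is omitted, and the assumption that tags may exist in either the `v`-prefixed or bare form is mine.

```typescript
// Turn the user's --since argument into git refs to try, in order.
// Version-like input yields both the v-prefixed and bare forms;
// anything else (for example a commit SHA) is passed through unchanged.
function refCandidates(input: string): string[] {
  const v = input.trim();
  if (/^v?\d+(\.\d+){2,3}$/.test(v)) {
    const bare = v.replace(/^v/, "");
    return [`v${bare}`, bare];
  }
  return [v];
}

// Reduce a list of changed file paths to the skill slugs they touch:
// only skills/<slug>/... paths count, and the result is deduped and sorted.
function slugsFromPaths(paths: string[]): string[] {
  const slugs = new Set<string>();
  for (const p of paths) {
    const m = /^skills\/([^/]+)\//.exec(p);
    if (m) slugs.add(m[1]);
  }
  return [...slugs].sort();
}
```

A caller would try each candidate ref in turn and fall back to the full sweep, with a warning, when none resolves.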

Test coverage added — total +32 new test cases:

UNIT (15 cases):
- test/skillpack-changed-since-version.test.ts (9 cases): git-aware
  filter against a fixture git repo. Covers null on non-repo,
  null on bad tag, empty array on no changes, single + multi-slug
  drift (deduped + sorted), bare + v-prefix version forms, non-
  skills/ path filtering, SHA-prefix ref form.
- test/upgrade-reference-sweep.test.ts (6 cases): gate logic.
  Covers env-var bypass, zero drift, empty-host suppression,
  drift-detected output shape, dev-mode workspace==gbrain guard,
  error-swallowing contract.

E2E (8 new cases in test/e2e/skillpack-flow.test.ts):
- 10: scaffold lands skills/_AGENT_README.md
- 11: scaffold stdout prints the Next: hint
- 12: scaffold re-run (skipped-existing) suppresses the hint
- 13: reference stdout prints per-category decision policy
- 14: --apply-clean-hunks WARNING on stderr, not stdout
- 15: --apply-clean-hunks --json suppresses the WARNING (bug fix
  surfaced here: code originally printed unconditionally, now
  gated on !json)
- 16: migrate-fence stdout points at the new routing model
- 17: --since with a bad tag falls back to full sweep with warn

Local sweep: 579/579 pass across 18 affected test files, verify
gate EXIT=0, llms regenerated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(README): zero-base rewrite — 921 → 422 lines, refreshed catalog, MECE structure

The README had drifted into a changelog dumping ground. Four 'New in vX.Y'
paragraphs competed for the lead, 16 version tags scattered through
headings, the production-numbers hook (17,888 pages, 4,383 people) was
six months stale, and skills were described in three places (Skills section,
Commands section, inline marketing prose).

Zero-based rewrite:

**Refreshed catalog** (surveyed live brain + live agent fork, broad strokes
per CLAUDE.md privacy rules):
- ~100K total brain items (was 17,888 in the old README — 6x stale)
- ~16K people (was 4,383)
- ~5K companies (was 723)
- ~8K concepts, ~4K originals, ~3.5K daily notes
- ~31K media (30K tweets, 179 books, papers/films/games/interviews)
- 108 cron jobs running (was 21)
- 273 skills in the live agent fork (35 bundled + 238 user-built)

**Structure** — MECE, single source of truth per concept:
1. Hook + at-a-glance table (refreshed numbers)
2. Install (3 paths, terse)
3. What it does (5 capability areas — replaces 12 scattered sections)
4. Skills (categorized one-liners — 35 lines, was ~200)
5. How it works (one coherent flow — replaces 5 overlapping sections:
   Architecture, Knowledge Model, Knowledge Graph, Search, Why It Works)
6. Commands (terse cheatsheet — every command, one line each)
7. Docs (link map — points to docs/ for the heavy stuff)
8. Origin / Contributing / License

**Cut entirely** (moved or deleted):
- 4 'New in vX.Y' leads (→ CHANGELOG.md is the changelog)
- 16 (vX.Y) version tags in section headings
- Minions stats subsection (subsumed into hook + 'durable background work')
- Voice section (was 12 lines of brand prose)
- Engine Architecture detail (→ docs/architecture/)
- File Storage section (→ docs/guides/storage-tiering.md)
- Per-skill marketing prose (one-liner per skill in the table)

The README is no longer the changelog. Future releases append to
CHANGELOG.md; the README only changes when a structural capability does.

llms-full.txt regenerated. Privacy check + verify gate pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(README): fix line-start '+' rendering bug + lead with eval evidence

Two fixes in one:

1. **Markdown bug fix.** The OAuth 2.1 paragraph had `+ PKCE,` on a line
   start (column 1), which GitHub-flavored markdown interprets as a list
   marker — the line break before it broke the paragraph and rendered as
   an orphan first line followed by a bullet. Rewrote the OAuth 2.1
   capabilities as inline-comma-separated, escaped the `+` semantics.
   Swept the whole file for the same bug class — no other instances.

2. **Maximum-sell mode for evals.** Surveyed every published benchmark
   in both this repo and ~/git/gbrain-evals. Strongest evidence pulled
   to the top:

   - **97.60% R@5 on the public LongMemEval _s (500 questions).** No LLM
     in the retrieval loop. $0.50 per 1000 queries. Beats MemPalace raw
     by a point on the same dataset, beats every academic dense
     retriever (Stella, Contriever, BM25). Mastra/Supermemory measure
     a different metric (QA accuracy with LLM judge) — flagged honestly.

   - **+31.4 points P@5 from the self-wiring knowledge graph** on
     BrainBench v0.20.0 (240-page rich-prose corpus, 145 relational
     gold queries). Separable, measured, load-bearing. Zero retrieval
     regression across seven releases (v0.16 → v0.20).

   New '## Benchmarks' section after Install:
   - Public benchmark table with cross-system comparison
   - In-house BrainBench scorecard with per-adapter Δ vs gbrain
   - Source-swamp resistance result (93.3% top-1 vs 80% grep-only)
   - Skill/prompt compression: 25KB → 13KB AGENTS.md, +13-17pp accuracy
     across Opus 4.7 / Sonnet 4.6 / Haiku 4.5
   - 'Run your own evals' subsection with copy-pasteable commands for
     every eval surface (longmemeval, cross-modal, eval capture/replay,
     BrainBench)

   Tightened the lead's cost-comparison claim to what's defensible per
   the underlying eval doc (MemPal LLM-rerank $0.001/q vs gbrain
   $0.0005/q; dropped the overstated '6x' I'd written initially).

Privacy + verify gate + build-llms test all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(README): integrate the eval story into the lead, move jargon into 'Receipts on the evals'

Previous lead dumped metric acronyms (R@5, P@5, P@5 deltas, MemPalace,
Stella, Contriever, BM25) before the reader knew what gbrain does. A
'somewhat technical' reader hits the wall of jargon and bounces.

Rewritten:

**Lead (jargon-free, 3 paragraphs)** — describes the value in plain
English, with two anchor numbers:
- 'right answer in top 5 results 97.6% of the time' (not 'R@5 97.60%')
- 'roughly 4x more relevant than plain vector RAG' (not '+31.4 pts P@5')
- 'better than every comparable system that doesn't pay for a language-
  model call on every retrieval' (the load-bearing honest framing,
  without naming the competitors mid-hook)
- ends with '[Receipts on the evals →]' linking down

**'## Benchmarks' renamed '## Receipts on the evals'** with a glossary
at the top defining R@5, P@5, and 'no LLM in the loop' in one line each.
Then the full tables: LongMemEval cross-system (with the metric-mismatch
flag for Mastra/Supermemory), in-house BrainBench scorecard, source-swamp
resistance, and prompt compression. The competitor names + metrics stay
here where readers who want the receipts can find them, with the
glossary so the acronyms don't tax cold readers.

Net: lead reads as 'here's what it does and the proof' instead of 'here
are the benchmark numbers, figure out what they mean.' Comparison facts
unchanged.

Privacy + verify gate + build-llms test all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(README): name LongMemEval explicitly + first-person voice in lead

Two specific edits from user feedback:

1. 'the standard public benchmark for AI memory systems' → 'LongMemEval'
   (linked to the HuggingFace dataset). The benchmark has a name; use it.

2. 'Built by the President and CEO of Y Combinator to run his own AI
   agents' (passive third-person) → 'I'm the President and CEO of Y
   Combinator, and I use this 16 hours a day' (active first-person).
   Carried the voice change through the rest of the README — the
   downstream 'Garry's personal agent' line and the Origin section's
   'Garry Tan needed... he'd ever drafted... so he built one' all flip
   to first person ('my personal agent', 'I needed', 'I'd ever drafted',
   'so I built one'). The README is now consistently first-person from
   the author's voice instead of a hagiographic third-person framing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(README): add Multi-player and company brains section

Three deployment patterns documented:

1. Single GBrain server + thin MCP clients (recommended). Tailscale
   private networking, OAuth scope, source-scoped clients, exhaustive
   what-clients-can/cannot-do lists.
2. Local PGLite + GStack for per-worktree code search.
3. Federated repos (advanced) — multiple servers indexing the same
   brain repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(README): tighten install path + add tech-orientation block + visceral query example

Self-eval as a cold reader surfaced four gaps blocking a 10/10 first read:

1. Lead never says WHAT it is technically — CLI? service? cloud? local?
   Added a "What it is, technically" block right after the hook: open-source
   MIT, Bun CLI + MCP server, local-first, data stays on disk, MCP-native.
2. Install path was optimized for committed users, not evaluators. The old
   "recommended" path (deploy OpenClaw on Render, 8GB RAM) blocked anyone
   trying gbrain for the first time. Reordered into 3 paths by commitment:
   60-second standalone CLI first, MCP for Claude Code / Cursor second,
   full agentic install third.
3. No example output showing what success looks like. Added a real sample
   `gbrain query` invocation with the hybrid-search result format so a
   reader can feel the experience before they install.
4. Privacy / data-locality unaddressed in lead. Now stated up front:
   embedding calls only hit external APIs if you configure them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… wrong (garrytan#1139)

* schema: v0.36.0.0 Hindsight calibration tables (migrations v67-v71)

Foundation commit for the Hindsight-inspired calibration wave. Adds four
new tables + one perf index, all source-scoped from day 1 per v0.34.1
discipline:

- calibration_profiles (v67): per-holder LLM-narrative aggregation of
  TakesScorecard data. published BOOL gates E8 cross-brain mount sharing
  (default false). grade_completion REAL surfaces partial-grade state to
  the dashboard. active_bias_tags TEXT[] with GIN index feeds E3 (calibration-
  aware contradictions) and E7 (real-time nudge matching).

- take_proposals (v68): propose_takes phase queue. Idempotency cache via
  (source_id, page_slug, content_hash, prompt_version) unique index mirrors
  the v0.23 dream_verdicts pattern. proposal_run_id supports --rollback by
  run. dedup_against_fence_rows JSONB audit column records what canonical
  takes the LLM was told to dedupe against at proposal time.

- take_grade_cache (v69): grade_takes verdict cache. Composite PK on
  (take_id, prompt_version, judge_model_id, evidence_signature) — prompt
  edits OR evidence changes cleanly invalidate prior verdicts. applied=false
  default + auto-resolve-off-by-default (D17) means every fresh install
  needs operator opt-in before grade verdicts mutate the takes table.

- take_nudge_log (v70): E7 nudge cooldown state. Polymorphic FK — a nudge
  fires on either a canonical take OR a pending proposal (CDX-5 fix). CHECK
  constraint enforces exactly-one-set. channel column lets future routing
  (webhook, admin SPA toast) reuse the same cooldown semantics.

- takes_resolved_at_idx (v71): partial index for the Brier-trend
  aggregation queries. Engine-aware handler — Postgres uses CONCURRENTLY
  to avoid the ShareLock; PGLite uses plain CREATE.
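
The invalidation behavior of the take_grade_cache composite key can be modeled in memory. This is an illustrative stand-in for the table, not the engine code; the `GradeKey` field names are camel-cased guesses at the four key columns.

```typescript
// The four parts of the composite primary key described above.
interface GradeKey {
  takeId: number;
  promptVersion: string;
  judgeModelId: string;
  evidenceSignature: string;
}

interface CachedVerdict {
  verdict: string;
  applied: boolean;
}

// In-memory stand-in for the table: a row is found only when all four
// key parts match, so editing the prompt or changing the evidence
// produces a miss and forces a fresh grade.
class GradeCache {
  private rows = new Map<string, CachedVerdict>();

  private static keyOf(k: GradeKey): string {
    return JSON.stringify([k.takeId, k.promptVersion, k.judgeModelId, k.evidenceSignature]);
  }

  get(k: GradeKey): CachedVerdict | undefined {
    return this.rows.get(GradeCache.keyOf(k));
  }

  // New verdicts start with applied=false, mirroring the column default.
  put(k: GradeKey, verdict: string): void {
    this.rows.set(GradeCache.keyOf(k), { verdict, applied: false });
  }
}
```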

Every table carries wave_version TEXT NOT NULL DEFAULT 'v0.36.0.0' so the
v0.36.0.0 calibration --undo-wave command (lands later in the wave) can
reverse just this wave's writes.

Plan: ~/.claude/plans/system-instruction-you-are-working-rippling-knuth.md
covers the design rationale (D17/D18/D21 + CDX findings).

Schema parity:
- src/schema.sql for fresh Postgres installs
- src/core/pglite-schema.ts for fresh PGLite installs
- src/core/schema-embedded.ts auto-regenerated from schema.sql
- src/core/migrate.ts for upgrade-in-place from older brains

VERSION bumped to 0.36.0.0 for the wave. CHANGELOG entry lands at /ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* core: BaseCyclePhase abstract class enforces source-scope + budget contracts

D21 from the eng review. Three new v0.36.0.0 cycle phases (propose_takes,
grade_takes, calibration_profile) share enough structure that the
duplication-vs-abstraction trade tips toward a shared base. Without this
scaffold, source-isolation discipline would drift exactly the way it
drifted in v0.34.1 — except this time across three new surfaces at once.

What this enforces:

1. Phase signature is uniform: run(ctx, opts) → PhaseResult.

2. ctx.sourceId / ctx.auth.allowedSources MUST be threaded through every
   engine call. The base class surfaces a scope() helper that wraps
   sourceScopeOpts(ctx) and is the only sanctioned way to read source-
   scoped data. Forgetting to thread source scope becomes a TypeScript
   compile error, not a runtime leak. Closes the v0.34.1 leak class
   structurally for every new phase.

3. Budget meter wraps run() automatically. Subclass declares budgetUsdKey
   + budgetUsdDefault; base reads the resolved cap from config and creates
   the BudgetMeter. Subclass calls this.checkBudget() before each LLM
   submit; budget-exhausted phase still returns status='ok' (clean abort)
   so the cycle report shows partial completion, not failure.

4. Error envelope is uniform. Thrown errors get caught and converted to
   status='fail' with a phase-specific error.code via the subclass's
   mapErrorCode() hook.

5. Progress reporter integration. Base accepts the reporter via opts;
   subclasses call this.tick() instead of touching the reporter directly,
   so the phase name in the progress stream is always correct.
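
A minimal sketch of the five contracts above, not the shipped class. The type names (PhaseContext, SourceScope), the execute() hook, the boolean checkBudget(nextCostUsd) signature, and DemoPhase are assumptions made for illustration; run() also omits the opts argument and the progress reporter.

```typescript
// Sketch only. Type names, the checkBudget() signature, and execute() are
// assumptions for illustration; the real base class may differ.
type PhaseStatus = "ok" | "warn" | "fail";

interface PhaseResult {
  phase: string;
  status: PhaseStatus;
  details: Record<string, unknown>;
  error?: { code: string; message: string };
}

interface PhaseContext {
  sourceId?: string;
  auth?: { allowedSources?: string[] };
  config: Record<string, number | undefined>;
}

interface SourceScope {
  sourceId?: string;
  sourceIds?: string[];
}

// A scalar sourceId wins, then the allow-list. An empty allow-list stays an
// empty filter and never widens to "no filter".
function sourceScopeOpts(ctx: PhaseContext): SourceScope {
  if (ctx.sourceId !== undefined) return { sourceId: ctx.sourceId };
  const allowed = ctx.auth?.allowedSources;
  if (allowed !== undefined) return { sourceIds: allowed.slice() };
  return {};
}

abstract class BaseCyclePhase {
  protected abstract readonly name: string;
  protected abstract readonly budgetUsdKey: string;
  protected abstract readonly budgetUsdDefault: number;

  private ctx: PhaseContext | undefined;
  private capUsd = 0;
  private spentUsd = 0;

  protected abstract execute(): Promise<Record<string, unknown>>;
  protected abstract mapErrorCode(err: unknown): string;

  // The only sanctioned way for a subclass to read its source scope.
  protected scope(): SourceScope {
    if (this.ctx === undefined) throw new Error("scope() called outside run()");
    return sourceScopeOpts(this.ctx);
  }

  // Returns false once the next call would exceed the cap, so the subclass
  // can stop cleanly instead of throwing.
  protected checkBudget(nextCostUsd: number): boolean {
    if (this.spentUsd + nextCostUsd > this.capUsd) return false;
    this.spentUsd += nextCostUsd;
    return true;
  }

  async run(ctx: PhaseContext): Promise<PhaseResult> {
    this.ctx = ctx;
    this.capUsd = ctx.config[this.budgetUsdKey] ?? this.budgetUsdDefault;
    this.spentUsd = 0;
    try {
      return { phase: this.name, status: "ok", details: await this.execute() };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      const code = this.mapErrorCode(err);
      return { phase: this.name, status: "fail", details: {}, error: { code, message } };
    }
  }
}

// Hypothetical subclass: up to five LLM calls at $0.40 each against a $1 cap.
class DemoPhase extends BaseCyclePhase {
  protected readonly name = "demo";
  protected readonly budgetUsdKey = "cycle.demo.budget_usd";
  protected readonly budgetUsdDefault = 1;

  protected async execute(): Promise<Record<string, unknown>> {
    let processed = 0;
    for (let i = 0; i < 5; i++) {
      if (!this.checkBudget(0.4)) return { processed, budget_exhausted: true, scope: this.scope() };
      processed++;
    }
    return { processed, scope: this.scope() };
  }

  protected mapErrorCode(): string {
    return "DEMO_FAILED";
  }
}
```

In this sketch the budget-exhausted run still resolves with status 'ok' and a partial count, which matches the clean-abort wording in item 3.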

Tests: 13 cases in test/core/base-phase.test.ts cover source-scope
threading (5 cases including the empty-allowedSources-MUST-NOT-widen-scope
regression), PhaseResult shape including the error envelope path (3
cases), dry-run propagation (2 cases), and budget meter construction
(3 cases including config-key override).

synthesize.ts and patterns.ts (existing pre-v0.36 phases) are deliberately
NOT retrofitted onto this base in v0.36.0.0: too much churn for a refactor
that doesn't pay off until v0.37+. Future phases use this base by default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cycle: propose_takes phase + take_proposals queue write path (T3)

LLM-based take extraction from markdown prose. Walks pages updated since
last cycle, sends each page's body to a tuned extractor, writes the
extracted gradeable claims to the take_proposals queue. User accepts /
rejects via `gbrain takes propose --review` (lands in Lane C).

Cycle wiring:
  lint → backlinks → sync → synthesize → extract → extract_facts →
    resolve_symbol_edges → patterns → recompute_emotional_weight →
    consolidate → propose_takes (NEW) → grade_takes (NEW; T4) →
    calibration_profile (NEW; T6) → embed → orphans → purge

CyclePhase enum extended with 3 new entries; ALL_PHASES + NEEDS_LOCK_PHASES
updated. All three new phases acquire the cycle lock (writes to
take_proposals / take_grade_cache / calibration_profiles).

Idempotency contract:
  The (source_id, page_slug, content_hash, prompt_version) composite unique
  index on take_proposals means an unchanged page never re-spends LLM
  tokens. Bumping PROPOSE_TAKES_PROMPT_VERSION cleanly invalidates the
  cache so a tuned prompt re-runs proposals on every page. Mirrors the
  v0.23 dream_verdicts pattern.

F2 fence dedup:
  The phase reads the page's existing `<!-- gbrain:takes:begin -->` fence
  (when present) and passes the canonical take rows to the extractor as
  "things you have already captured." Prevents duplicate proposals when
  prose is appended to a page that already has takes. Records the fence
  rows the LLM was told to dedupe against on the take_proposals row for
  audit (dedup_against_fence_rows JSONB).

Auto-resolve posture:
  propose_takes only WRITES proposals to the queue. Nothing in this phase
  mutates the canonical takes table. Operator opt-in via the queue review
  CLI (Lane C) is the only path from queue to canonical fence (D17).

Prompt tuning status (v0.36.0.0 ship state):
  The default extractor prompt is annotated `v0.36.0.0-stub`. The real
  tuned prompt arrives via T19 synthetic corpus build (50 anonymized
  pages, 3-model parallel extraction, user reviews disagreement set,
  F1 ≥ 0.85 on training corpus + F1 ≥ 0.8 on ground-truth holdout).
  Until T19 lands, propose_takes runs but produces best-effort candidates
  the user reviews manually.

Architecture:
  ProposeTakesPhase extends BaseCyclePhase (T2). Inherits source-scope
  threading via scope(), budget metering via this.checkBudget(), error
  envelope wrapping. budgetUsdKey: cycle.propose_takes.budget_usd
  (default $5/cycle). Budget exhaustion mid-page returns status='warn'
  with details.budget_exhausted=true — clean partial-completion semantics.

  Test seam: opts.extractor injection so the phase can run hermetically
  without touching the gateway. defaultExtractor (production path) calls
  gateway.chat with the EXTRACT_TAKES_PROMPT and parses the JSON array
  output via parseExtractorOutput.

  parseExtractorOutput defends against common LLM output sins: markdown
  code fence wrapping, leading prose, single-object instead of array,
  unknown kind values, weight out of [0,1], rows missing claim_text or
  exceeding 500 chars.
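
A rough sketch of the defensive parse described above. The kind list, the ProposedTake shape, and the choice to clamp weight (rather than drop the row) are assumptions; only the 500-char cap and the list of failure modes come from this commit.

```typescript
// Hypothetical row shape and kind list; the real extractor contract may differ.
const KNOWN_KINDS = ["fact", "take", "bet", "hunch"];
const MAX_CLAIM_CHARS = 500;

interface ProposedTake {
  claim_text: string;
  kind: string;
  weight: number;
}

function parseExtractorOutput(raw: string): ProposedTake[] {
  // 1. Unwrap a markdown code fence if the model added one.
  const fenced = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  const text = fenced?.[1] ?? raw;

  // 2. Skip leading and trailing prose: keep the outermost JSON-looking span.
  const start = text.search(/[\[{]/);
  const end = Math.max(text.lastIndexOf("]"), text.lastIndexOf("}"));
  if (start === -1 || end < start) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(text.slice(start, end + 1));
  } catch {
    return [];
  }

  // 3. A single object is treated as a one-element array.
  const rows: unknown[] = Array.isArray(parsed) ? parsed : [parsed];

  const out: ProposedTake[] = [];
  for (const row of rows) {
    if (typeof row !== "object" || row === null) continue;
    const r = row as Record<string, unknown>;

    const rawClaim = r["claim_text"];
    const claim = typeof rawClaim === "string" ? rawClaim.trim() : "";
    if (claim.length === 0 || claim.length > MAX_CLAIM_CHARS) continue;

    const kind = r["kind"];
    if (typeof kind !== "string" || KNOWN_KINDS.indexOf(kind) === -1) continue;

    const w = r["weight"];
    if (typeof w !== "number" || Number.isNaN(w)) continue;

    // Assumption: out-of-range weights are clamped into [0, 1].
    out.push({ claim_text: claim, kind, weight: Math.min(1, Math.max(0, w)) });
  }
  return out;
}
```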

Tests: 25 cases in test/propose-takes.test.ts cover the 4 pure helpers
(parseExtractorOutput, contentHash, hasCompleteFence,
extractExistingTakesForDedup) + 7 phase integration scenarios (happy path,
cache hit, fence dedup, extractor failure, empty pages, skipPagesWithFence,
proposal_run_id stability).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cycle: grade_takes phase + take_grade_cache verdict pipeline (T4)

Walks unresolved takes that are old enough to have outcome data, retrieves
evidence from the brain, and asks a judge model for a verdict on each one. Writes
verdicts to take_grade_cache. Optionally — only when operator has flipped
the opt-in config flag — auto-applies high-confidence verdicts to the
canonical takes table via engine.resolveTake.

Auto-resolve posture (D17 — DISABLED by default):
  On a fresh install, grade_takes runs and writes verdicts to the cache,
  but applied=false on every row. Operator reviews the queue, then flips
  `cycle.grade_takes.auto_resolve.enabled: true` once trust is earned.
  Mirrors the propose_takes review-queue posture: queue exists, mutation
  requires explicit opt-in.

Conservative threshold (D12):
  When auto_resolve.enabled is true, a verdict auto-applies only when
  confidence >= 0.95 (single-judge path). The T5 ensemble path lands next,
  tightening this further with a 3/3 unanimous requirement.

  'unresolvable' verdict NEVER auto-applies even at confidence=1.0 —
  there's no canonical column for "we tried and there's no evidence yet."
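
The single-judge gate above reduces to three checks. A small sketch, assuming a hypothetical AutoResolveConfig shape and the four verdict labels named elsewhere in this wave:

```typescript
// Sketch only: AutoResolveConfig is a hypothetical shape for illustration.
type Verdict = "correct" | "incorrect" | "partial" | "unresolvable";

interface JudgeVerdict {
  verdict: Verdict;
  confidence: number;
}

interface AutoResolveConfig {
  enabled: boolean;
  threshold: number;
}

// D17: disabled by default. D12: conservative 0.95 threshold.
const DEFAULT_AUTO_RESOLVE: AutoResolveConfig = { enabled: false, threshold: 0.95 };

function shouldAutoApply(v: JudgeVerdict, cfg: AutoResolveConfig = DEFAULT_AUTO_RESOLVE): boolean {
  if (!cfg.enabled) return false;                     // operator has not opted in
  if (v.verdict === "unresolvable") return false;     // never auto-applies
  return v.confidence >= cfg.threshold;
}
```

With the defaults, every verdict stays cache-only; flipping enabled to true lets a 0.95+ verdict through while 'unresolvable' at confidence 1.0 is still held back.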

Evidence retrieval status (v0.36.0.0 ship state):
  The default evidence retriever returns an "evidence-retrieval not yet
  wired" placeholder. Most verdicts produced by the stub-judge against
  the stub-evidence will be 'unresolvable'. Real retrieval (hybrid search
  over pages newer than the take's since_date, optionally augmented by a
  gateway web-search recipe in v0.37+) lands as a follow-up. Documented
  limitation per CDX-8 + D17 — the phase ships now so the wiring is real
  and the cache table accumulates verdicts even if early ones are
  conservative.

Cache key:
  Composite primary key on take_grade_cache is
  (take_id, prompt_version, judge_model_id, evidence_signature). Prompt
  edits OR evidence changes OR judge swap cleanly invalidate prior
  verdicts. Mirrors the v0.32.6 eval_contradictions_cache pattern.

  evidence_signature = SHA-256 of (judge_model_id + '|' + evidence_text)
  so identical evidence under a different judge does NOT collide.
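
For illustration, the signature formula above could be computed as follows. The hex encoding and the function signature are assumptions; only the SHA-256 over judge_model_id + '|' + evidence_text comes from this commit.

```typescript
import { createHash } from "node:crypto";

// Sketch only: hex output is an assumption about the stored format.
function evidenceSignature(judgeModelId: string, evidenceText: string): string {
  return createHash("sha256")
    .update(judgeModelId + "|" + evidenceText, "utf8")
    .digest("hex");
}
```

Because the judge id is part of the hashed input, swapping the judge changes the signature even when the evidence text is byte-identical.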

Architecture:
  GradeTakesPhase extends BaseCyclePhase. Inherits source-scope threading,
  budget metering (cycle.grade_takes.budget_usd, default $3/cycle), error
  envelope. Test seam: opts.judge + opts.evidenceRetriever injection so
  the phase runs hermetically.

  parseJudgeOutput defends against fence-wrapping, leading prose,
  out-of-range confidence (clamps to [0,1]), invalid verdict labels,
  oversized reasoning (truncated at 400 chars). Returns null on
  unrecoverable parse — caller treats null as "judge_output_parse_failed
  / unresolvable at confidence 0.0" so the row still lands in cache with
  the parse failure surfaced via warnings.

  takeIsOldEnough gates on since_date (default 6 months). Tolerates
  YYYY-MM-DD and YYYY-MM formats. Returns false on null/unparseable
  since_date so takes without dates never get graded (we'd be
  hallucinating temporal context).
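
A sketch of the age gate, under stated assumptions: YYYY-MM is read as the first of that month, dates are compared in UTC, and `now` is injected so the function stays testable. The real helper may resolve these details differently.

```typescript
// Sketch only: first-of-month handling for YYYY-MM and UTC math are assumptions.
function parseSinceDate(s: string | null | undefined): Date | null {
  if (typeof s !== "string") return null;
  const m = /^(\d{4})-(\d{2})(?:-(\d{2}))?$/.exec(s.trim());
  if (m === null) return null;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = m[3] === undefined ? 1 : Number(m[3]);
  if (month < 1 || month > 12 || day < 1 || day > 31) return null;
  const d = new Date(Date.UTC(year, month - 1, day));
  // Reject dates that roll over into the next month, e.g. 2024-02-31.
  if (d.getUTCMonth() !== month - 1 || d.getUTCDate() !== day) return null;
  return d;
}

function takeIsOldEnough(
  sinceDate: string | null | undefined,
  now: Date,
  minAgeMonths = 6,
): boolean {
  const since = parseSinceDate(sinceDate);
  if (since === null) return false; // no date, no grading
  const cutoff = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - minAgeMonths, now.getUTCDate()),
  );
  return since.getTime() <= cutoff.getTime();
}
```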

Tests: 23 cases covering parseJudgeOutput (7 cases), evidenceSignature
(3), takeIsOldEnough (5), and 8 phase integration scenarios — happy path,
D17 auto-resolve-off default, D12 above-threshold auto-apply, below-
threshold cache-only, unresolvable-NEVER-applies, cache hit, too-recent
gate, judge-throw warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cycle: grade_takes ensemble tiebreaker for borderline verdicts (T5 / E2)

Multi-judge ensemble tiebreaker, additive on top of T4's single-judge
foundation. Reuses gateway.chat as the per-model judge interface; runs
three judges in parallel via Promise.allSettled. Pure aggregation logic
in aggregateEnsemble() — no SQL, no LLM, hermetically testable.

When ensemble fires (T5 trigger band):
  Only when ALL of:
    - opts.useEnsemble === true (default false)
    - opts.ensembleJudges array is non-empty
    - single-model confidence in [0.6, 0.95) (configurable via
      opts.ensembleTriggerBand)
    - single-model verdict !== 'unresolvable'

  Above 0.95 the single judge is already sufficient (T4 path). Below 0.6
  the verdict is clearly review-only — ensemble wouldn't change the
  posture. 'unresolvable' from single-judge means no evidence yet; calling
  three more judges on the same evidence won't manufacture some.
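
The four trigger conditions can be read as one predicate. A sketch, assuming ensembleJudges is a list of model ids and ensembleTriggerBand is a [low, high) tuple (both shapes are guesses):

```typescript
// Sketch only: the option shapes are assumptions for illustration.
type Verdict = "correct" | "incorrect" | "partial" | "unresolvable";

interface SingleVerdict {
  verdict: Verdict;
  confidence: number;
}

interface EnsembleOpts {
  useEnsemble?: boolean;
  ensembleJudges?: string[];
  ensembleTriggerBand?: [number, number]; // inclusive low, exclusive high
}

function shouldRunEnsemble(single: SingleVerdict, opts: EnsembleOpts): boolean {
  if (opts.useEnsemble !== true) return false;
  if (opts.ensembleJudges === undefined || opts.ensembleJudges.length === 0) return false;
  if (single.verdict === "unresolvable") return false;
  const [low, high] = opts.ensembleTriggerBand ?? [0.6, 0.95];
  return single.confidence >= low && single.confidence < high;
}
```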

Conservative auto-apply (D12):
  Ensemble verdict auto-applies via engine.resolveTake only when ALL of:
    - autoResolve === true (operator opt-in per D17)
    - ensemble.agreement === 3 (3/3 unanimous)
    - ensemble.minConfidence >= ensembleThreshold (default 0.85)
    - winning verdict !== 'unresolvable'

  Schema-level monotonic-tightening guard for ensembleThreshold lives in
  the takes resolution layer.

Cache identity:
  When ensemble fires, the cache row's judge_model_id becomes
  'ensemble:<modelA>+<modelB>+<modelC>' — a future re-run with different
  ensemble membership doesn't collide with prior verdicts. evidence_signature
  is recomputed because it includes the judge_model_id.

aggregateEnsemble (pure):
  - 3/3 unanimous → agreement=3, minConfidence=min across the three
  - 2/3 majority → agreement=2, minConfidence across the agreeing two
  - 1/1/1 disagreement → tie-break: prefer non-'unresolvable', then
    alphabetical for determinism
  - 'unresolvable' from one model NEVER tips a 2-vote majority toward
    'unresolvable' — by-label tally only counts a model toward its own
    label
  - All three judges failing (allSettled rejected) → verdict='unresolvable'
    with agreement=0; auto-apply path blocked
  - Single judge survives + two fail → agreement=1; the lone verdict wins
    but auto-apply gated by the 3/3 requirement
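
One way the aggregation rules above could be written as a pure function. The input convention (null for a judge whose Promise.allSettled result was rejected), the result shape, and minConfidence=0 when every judge fails are assumptions.

```typescript
// Sketch only: a null vote stands for a judge that failed.
type Verdict = "correct" | "incorrect" | "partial" | "unresolvable";

interface JudgeVerdict {
  verdict: Verdict;
  confidence: number;
}

interface EnsembleResult {
  verdict: Verdict;
  agreement: number;
  minConfidence: number;
}

const ALL_FAILED: EnsembleResult = { verdict: "unresolvable", agreement: 0, minConfidence: 0 };

function aggregateEnsemble(votes: Array<JudgeVerdict | null>): EnsembleResult {
  // By-label tally: each surviving judge counts only toward its own label.
  const byLabel = new Map<Verdict, number[]>();
  for (const v of votes) {
    if (v === null) continue;
    const list = byLabel.get(v.verdict) ?? [];
    list.push(v.confidence);
    byLabel.set(v.verdict, list);
  }

  const ranked = Array.from(byLabel.entries()).sort(([a, ca], [b, cb]) => {
    if (cb.length !== ca.length) return cb.length - ca.length; // most votes first
    const au = a === "unresolvable" ? 1 : 0;
    const bu = b === "unresolvable" ? 1 : 0;
    if (au !== bu) return au - bu;                             // prefer non-unresolvable
    return a < b ? -1 : a > b ? 1 : 0;                         // alphabetical for determinism
  });

  const top = ranked[0];
  if (top === undefined) return ALL_FAILED;
  const [verdict, confidences] = top;
  return { verdict, agreement: confidences.length, minConfidence: Math.min(...confidences) };
}

// D12 auto-apply gate for the ensemble path.
function shouldAutoApplyEnsemble(
  e: EnsembleResult,
  autoResolve: boolean,
  ensembleThreshold = 0.85,
): boolean {
  return (
    autoResolve &&
    e.agreement === 3 &&
    e.minConfidence >= ensembleThreshold &&
    e.verdict !== "unresolvable"
  );
}
```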

Tests: 16 cases.
  aggregateEnsemble (6): 3/3, 2/3, 1/1/1, unresolvable-tipping-resistance,
  all-failed, partial-failed-but-survives.
  Phase trigger conditions (5): useEnsemble=false default, useEnsemble=true
  in borderline band, single >= 0.95 skip, single < 0.6 skip, single =
  'unresolvable' skip.
  Phase auto-apply rules (5): 3/3+threshold+autoResolve, 2/3 majority no
  apply, 3/3 below threshold no apply, one ensemble judge throws still
  aggregates from allSettled, empty ensembleJudges falls through to
  single.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cycle: calibration_profile phase + shared voice gate across surfaces (T6)

The calibration narrative layer. Reads TakesScorecard, asks an LLM to
write 2-4 conversational pattern statements ("right on tactics, late on
macro by 18 months"), passes them through the voice gate, derives active
bias tags, writes the row to calibration_profiles. This is the read-side
that E1 (think anti-bias rewrite), E3 (contradictions join), E6
(dashboard), and E7 (real-time nudges) all consume.

Voice gate (D24 — single function, multiple surfaces):
  ALL five calibration UX surfaces import the same gateVoice() function
  from src/core/calibration/voice-gate.ts. Mode parameter
  ('pattern_statement' | 'nudge' | 'forecast_blurb' | 'dashboard_caption'
  | 'morning_pulse') drives surface-specific tuning via the rubric the
  gate ships to its Haiku judge. NO forked implementations — voice
  rubric drift would defeat the gate.

  Each mode's rubric explicitly forbids preachy / clinical / corporate
  voice; a structural test pins this. Anchors the cross-cutting voice
  rule from /plan-ceo-review D2-D8.

Fallback policy (D11):
  Up to 2 generation attempts (configurable). On both rejects → fall back
  to a hand-written template from src/core/calibration/templates.ts.
  Templates are intentionally short and a little "robotic" — they're the
  safety net, not the destination. voice_gate_passed=false +
  voice_gate_attempts get persisted on the calibration_profiles row so
  the operator can review the failing examples and tune the rubric over
  time. Suppressing the surface silently is NEVER an option — that's how
  voice quality silently degrades.

  parseJudgeOutput defaults to 'academic' on parse failure (NEVER a
  pass-through) so garbled Haiku output falls through to the template
  rather than letting unverified text reach the user.
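
A sketch of the D11 fallback loop with injected generator and judge. The 'pass' label, the option names, and the result shape are assumptions; 'academic' as the parse-failure default, the 2-attempt cap, and the persisted audit fields come from this commit.

```typescript
// Sketch only: option names and the "pass" label are hypothetical.
type VoiceMode =
  | "pattern_statement"
  | "nudge"
  | "forecast_blurb"
  | "dashboard_caption"
  | "morning_pulse";

type JudgeLabel = "pass" | "academic";

interface GateVoiceOpts {
  mode: VoiceMode;
  generate: (attempt: number) => Promise<string>;
  judge: (text: string, mode: VoiceMode) => Promise<string>; // raw judge output
  fallbackTemplate: () => string;
  maxAttempts?: number;
}

interface GateVoiceResult {
  text: string;
  voice_gate_passed: boolean;
  voice_gate_attempts: number;
}

// Anything that is not an unambiguous pass is treated as a reject.
function parseJudgeOutput(raw: string): JudgeLabel {
  return raw.trim().toLowerCase() === "pass" ? "pass" : "academic";
}

async function gateVoice(opts: GateVoiceOpts): Promise<GateVoiceResult> {
  const maxAttempts = opts.maxAttempts ?? 2;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await opts.generate(attempt);
    const label = parseJudgeOutput(await opts.judge(candidate, opts.mode));
    if (label === "pass") {
      return { text: candidate, voice_gate_passed: true, voice_gate_attempts: attempt };
    }
  }
  // Every attempt was rejected: ship the template and record the failure.
  return {
    text: opts.fallbackTemplate(),
    voice_gate_passed: false,
    voice_gate_attempts: maxAttempts,
  };
}
```

The surface always renders something: either gated text or the template, with the audit fields telling the operator which one.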

calibration_profile phase:
  Extends BaseCyclePhase. Cold-brain skip: <5 resolved takes → no row
  written, no LLM call. Otherwise: scorecard via engine.getScorecard()
  → patterns via voice-gated generator → bias tags via separate
  generator (best-effort; failure logs warning, phase continues).

  The DB INSERT lands in the v67 calibration_profiles row with
  source_id, holder, the patterns, voice gate audit fields, active bias
  tags, and grade_completion (F1 fix — partial-grade state surfaces to
  the dashboard "60% graded" badge).

  Budget gate at $0.50/cycle default (mostly Haiku). If the budget check
  that runs before the LLM call fails, the phase returns status='warn'
  without writing the row.

  Per-domain scorecards are a placeholder for v0.36.0.0 ship state —
  the F12 batchGetTakesScorecards() engine method that powers per-domain
  rendering lands in Lane C alongside the CLI/MCP surface.

Architecture:
  parsePatternStatementsOutput is tolerant of LLM emitting numbered
  lists / bulleted lines despite the prompt asking for plain lines.
  Caps at 4 patterns + drops excessively long lines (>200 chars).

  parseBiasTagsOutput lowercases input + drops non-kebab-case tokens
  (defends against the LLM emitting "Over-Confident Geography" with
  spaces or capitals). Caps at 4 tags.
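
Both parsers are small enough to sketch. The splitting rules (newlines for patterns, commas or newlines for tags), the bullet characters stripped, and the filter-then-cap order are assumptions; the caps and the 200-char limit come from the text above.

```typescript
// Sketch only: tokenization details are assumptions for illustration.
const MAX_PATTERNS = 4;
const MAX_PATTERN_CHARS = 200;
const MAX_TAGS = 4;

function parsePatternStatementsOutput(raw: string): string[] {
  return raw
    .split(/\r?\n/)
    // Strip "1. ", "2) ", "- ", "* " and "• " prefixes the model may add.
    .map((line) => line.replace(/^\s*(?:\d+[.)]|[-*•])\s+/, "").trim())
    .filter((line) => line.length > 0 && line.length <= MAX_PATTERN_CHARS)
    .slice(0, MAX_PATTERNS);
}

function parseBiasTagsOutput(raw: string): string[] {
  const kebab = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
  return raw
    .toLowerCase()
    .split(/[,\n]/)
    .map((t) => t.trim())
    .filter((t) => kebab.test(t)) // "over-confident geography" has a space, so it is dropped
    .slice(0, MAX_TAGS);
}
```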

Tests: 43 cases across two new test files.
  voice-gate.test.ts (24): parseJudgeOutput (7), gateVoice happy path
  (3), fallback path (5), mode parity (2), templates (7).
  calibration-profile.test.ts (19): parsers (10), pickFallbackSlots
  (3), phase integration (6 — cold-brain skip, happy path, voice gate
  fallback, grade_completion plumbed through, bias-tags failure
  non-fatal, source_id scope reaches INSERT).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cli: gbrain calibration + get_calibration_profile MCP op (T7)

Public-facing read surface for the v0.36.0.0 calibration wave. CLI prints
the active calibration profile; MCP op exposes the same data path for
agents. Mirror of the v0.29 salience/anomalies shape (pure data fn + JSON
formatter + human formatter + thin CLI dispatch).

CLI: `gbrain calibration`
  Flags:
    --holder <id>         specific holder (default 'garry')
    --json                machine output for piping
    --regenerate          run calibration_profile phase now
    --undo-wave <ver>     [placeholder — wires in Lane D / T17]
    ab-report             [placeholder — wires in Lane D / T18]

  Human output:
    Calibration profile — holder: garry, source: default
    Generated: <local timestamp>
    [Note: built on 60% graded — partial completion this cycle.]   (when grade_completion < 0.9)
    [Note: voice gate fell back to template (2 attempts).]         (when voice_gate_passed=false)

    Resolved: 12 takes
    Brier:    0.210 (lower is better)
    Accuracy: 60.0%
    Partial:  10.0%

    Pattern statements:
      • You called early-stage tactics well — 8 of 10 held up.

    Active bias tags: over-confident-geography

  Cold-brain fallback message names the exact dream command to run.

MCP: `get_calibration_profile` (scope: read)
  Param: holder?: string (defaults to 'garry')
  Returns: latest CalibrationProfileRow | null

  Source-scoping via sourceScopeOpts(ctx): scalar source-bound clients see
  only their source; federated_read scopes see the union of allowed sources;
  no source filter when neither is set (CLI default path).

  Throws GBrainError('INVALID_HOLDER') on empty/non-string holder so
  remote callers get a structured error instead of a SQL-shape failure.

Architecture:
  getLatestProfile is the pure data fn — engine + opts → CalibrationProfileRow | null.
  Reused by both the CLI and the MCP op. Source-scoped via the standard
  v0.34.1 spread pattern (scalar sourceId vs sourceIds array).

  formatProfileText is pure — null → cold-brain message, populated → full
  printout. Annotates partial-grade rows and voice-gate-fallback rows so
  the operator sees data-quality status inline.

  parseArgs is exported via __testing for unit coverage. Sub-command
  ('ab-report') vs flag distinction is intentional — keeps the surface
  parallel with `gbrain eval cross-modal` etc.

Tests: 21 cases.
  parseArgs (6 cases): empty, --holder, --json, --regenerate, --undo-wave, ab-report.
  getLatestProfile (5 cases): happy, null, scalar source scope, federated array
    scope, no-source-filter default.
  formatProfileText (5 cases): cold-brain, happy, partial-grade note, voice-fallback
    note, published-to-mounts note.
  getCalibrationProfileOp (5 cases): default holder, scalar source scope,
    federated scope union, returns-null-on-unknown-holder, throws on empty holder.

Lane D follow-ups: --undo-wave (T17) and ab-report (T18) print a clear
"lands in Lane D" stderr line + exit 2; the surfaces exist for early
testers, the implementations land next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* think: --with-calibration + anti-bias prompt rewrite (T8 / E1, D22)

Optional anti-bias rewrite mode for `gbrain think`. When set, the active
calibration profile gets injected per the D22 placement spec (AFTER
retrieval evidence, BEFORE the user's question). The bias filter applies
to QUESTION FRAMING, not evidence interpretation — matches LLM-as-judge
best practice (bias prompts near end of context perform better).

Default behavior unchanged (R1 regression guard): omitting
--with-calibration produces the v0.28-vintage user-message shape with the
question first, then retrieval. Existing think users see no change.

Two user-message shapes in buildThinkUserMessage:

  Default (no calibration):
    Question: X
    <pages>...</pages>
    <takes>...</takes>
    <graph>...</graph>
    Respond with a single JSON object...

  With calibration (D22):
    <pages>...</pages>
    <takes>...</takes>
    <graph>...</graph>
    <calibration holder="garry">
      Track record: Brier 0.210 (lower is better).
      Active patterns:
        - You called early-stage tactics well — 8 of 10 held up.
      Active bias tags: over-confident-geography
    </calibration>
    Question: X
    Respond...

  Calibration block is built by buildCalibrationBlock (exported for the
  E3 contradictions probe to render the same shape).
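
The two shapes differ only in ordering, which a sketch makes easy to see. Input field names are assumptions, the instruction line is kept elided exactly as in the shapes above, and omitting empty pattern/tag lines is one possible reading of "still well-formed".

```typescript
// Sketch only: input shapes are hypothetical; the instruction text is elided
// here just as it is in the commit message.
interface CalibrationInput {
  holder: string;
  brier: number | null;
  patterns: string[];
  biasTags: string[];
}

interface ThinkMessageInput {
  question: string;
  pages: string;
  takes: string;
  graph: string;
  calibration?: CalibrationInput;
}

const INSTRUCTION = "Respond with a single JSON object...";

function buildCalibrationBlock(c: CalibrationInput): string {
  const lines: string[] = [`<calibration holder="${c.holder}">`];
  // A null Brier is omitted entirely rather than rendered as "Brier null".
  if (c.brier !== null) lines.push(`  Track record: Brier ${c.brier.toFixed(3)} (lower is better).`);
  if (c.patterns.length > 0) {
    lines.push("  Active patterns:");
    for (const p of c.patterns) lines.push(`    - ${p}`);
  }
  if (c.biasTags.length > 0) lines.push(`  Active bias tags: ${c.biasTags.join(", ")}`);
  lines.push("</calibration>");
  return lines.join("\n");
}

function buildThinkUserMessage(input: ThinkMessageInput): string {
  const retrieval = [
    `<pages>${input.pages}</pages>`,
    `<takes>${input.takes}</takes>`,
    `<graph>${input.graph}</graph>`,
  ];
  const question = `Question: ${input.question}`;
  const parts =
    input.calibration === undefined
      ? [question, ...retrieval, INSTRUCTION] // R1: question first
      : [...retrieval, buildCalibrationBlock(input.calibration), question, INSTRUCTION]; // D22
  return parts.join("\n");
}
```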

System prompt extension (withCalibration:true):
  - Names BOTH the user's PRIOR (default reasoning) AND the COUNTER-PRIOR
    from their hedged-domain self.
  - References active bias tags by name when relevant ("this fits the
    over-confident-geography pattern").
  - Does NOT silently substitute the debiased answer. ALWAYS surfaces
    both priors transparently.
  - Adds a "Calibration" section between Conflicts and Gaps in the
    answer body.

RunThinkOpts extension:
  - withCalibration?: boolean — opt-in
  - calibrationHolder?: string — defaults to 'garry'

  When withCalibration=true and no profile exists, runThink falls back to
  baseline behavior + pushes NO_CALIBRATION_PROFILE to warnings (visible
  to the operator). When the calibration fetch fails, CALIBRATION_FETCH_FAILED
  warning surfaces with the underlying error. Either path keeps think working;
  the calibration loop is enhancement, not requirement.

CLI: `gbrain think "<q>" --with-calibration [--calibration-holder <id>]`

Tests: 11 cases.
  buildThinkSystemPrompt (4 cases): R1 regression — default/false/omitted
  → no anti-bias rules; with calibration → adds PRIOR + COUNTER-PRIOR +
  bias-tag reference; preserves existing hard rules.

  buildCalibrationBlock (3 cases): happy path, null brier omitted (not
  "Brier null"), empty patterns + tags still well-formed.

  buildThinkUserMessage (4 cases): R1 regression — without calibration:
  question first; D22 placement — retrieval → calibration → question →
  instruction; graph + calibration ordering; empty retrieval blocks render
  placeholders without breaking shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* contradictions: calibration-profile join (T9 / E3)

Cross-references each contradiction finding against the active calibration
profile. When a contradiction's domain matches an active bias tag (e.g.
"over-confident-geography" or "late-on-macro-tech"), the output gains a
one-line bias context explaining which pattern this fits.

Pure functions only — no DB writes, no LLM calls. The probe runner imports
tagFindingWithCalibration() and applies it to each finding before emitting.
When no profile exists or no tags match, the helper returns null and the
runner emits the unchanged finding (regression R2 — contradictions output
is byte-identical to v0.32.6 when no calibration profile is present).

Match heuristic (v0.36.0.0 ship-state):
  Bias tags are kebab-case axis-then-domain slugs ('over-confident-geography').
  computeDomainHint() extracts a domain hint from the finding's slugs +
  holder + verdict text:
    - wiki/companies/... → hiring | market-timing
    - wiki/people/... → founder-behavior
    - macro / geography / tactics / ai segments in slug → matching tag
  First-match-wins for ordering determinism.

  Match is intentionally fuzzy — the v0.32.6 contradictions probe doesn't
  yet carry structured domain metadata. v0.37+ structured-domain-on-takes
  (Hindsight-style enum) tightens this.

Output:
  Returns { bias_tag: string, context: string } | null.
  Context format: "This contradiction fits your active bias pattern
  \"<tag>\" (Brier 0.31). Verdict: contradiction; severity: medium.
  Consider reviewing both sides through the lens of that pattern."
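
A sketch of the join, with loud caveats: domainHintCandidates is a hypothetical stand-in for computeDomainHint() that looks only at slugs (the real helper also reads holder and verdict text), and matching a hint against whole kebab segments of a tag is a guess at the fuzzy rule.

```typescript
// Sketch only: the hint extraction and tag matching here are assumptions.
interface Finding {
  slugs: string[];
  verdict: string;
  severity: string;
}

interface Profile {
  active_bias_tags: string[];
  brier: number | null;
}

const DOMAIN_SEGMENTS = ["macro", "geography", "tactics", "ai"];

function domainHintCandidates(slugs: string[]): string[] {
  const hints: string[] = [];
  for (const slug of slugs) {
    if (slug.indexOf("wiki/companies/") === 0) hints.push("hiring", "market-timing");
    else if (slug.indexOf("wiki/people/") === 0) hints.push("founder-behavior");
    for (const seg of slug.split(/[\/-]/)) {
      if (DOMAIN_SEGMENTS.indexOf(seg) !== -1) hints.push(seg);
    }
  }
  return hints;
}

function buildBiasContextString(tag: string, f: Finding, brier: number | null): string {
  const brierPart = brier === null ? "" : ` (Brier ${brier.toFixed(2)})`;
  return (
    `This contradiction fits your active bias pattern "${tag}"${brierPart}. ` +
    `Verdict: ${f.verdict}; severity: ${f.severity}. ` +
    "Consider reviewing both sides through the lens of that pattern."
  );
}

function tagFindingWithCalibration(
  f: Finding,
  p: Profile | null,
): { bias_tag: string; context: string } | null {
  if (p === null || p.active_bias_tags.length === 0) return null; // R2: output unchanged
  for (const hint of domainHintCandidates(f.slugs)) {
    for (const tag of p.active_bias_tags) {
      // Whole-segment match, so "ai" does not match inside another word.
      if (`-${tag}-`.indexOf(`-${hint}-`) !== -1) {
        return { bias_tag: tag, context: buildBiasContextString(tag, f, p.brier) }; // first match wins
      }
    }
  }
  return null;
}
```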

Tests: 13 cases.
  R2 regression (2): null profile → null tag; empty active_bias_tags → null tag.
  computeDomainHint (5): companies / people / macro / geography / unknown
  paths produce expected hints.
  Match path (4): macro→late-on-macro-tech, geography→over-confident-geography,
  mismatch returns null, first-match-wins with multiple candidate tags.
  buildBiasContextString (2): emits tag+verdict+severity+Brier; omits
  Brier when null (no "Brier null" leak).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: Brier-trend forecast at write time (T10 / E5)

Pure math layer over existing TakesScorecard data. Zero new LLM cost, zero
new schema. Surfaces the user's historical Brier for the take's
(holder, domain) bucket at write time so they see "your historical Brier
in macro takes is 0.31" before committing the take.

Voice-gate-rendered output:
  The user-facing string goes through gateVoice mode='forecast_blurb' via
  templates.ts (already in T6). This module is the pure data layer; the
  template renders the math into the conversational voice.

v0.36.0.0 ship state:
  Bucket dimension is the DOMAIN (slug-prefix). The conviction-weight
  bucket dimension would need a new engine method
  (engine.batchGetTakeBucketStats per F11) — deferred to v0.37+. Until
  then, forecast = historical Brier in this holder's domain.

  resolveDomainPrefix() keeps slug-prefix-looking domain hints
  ('companies/', 'wiki/macro') and falls back to overall for free-form
  hints ('macro tech', 'geography'). Hindsight-style structured domain
  on takes (CDX-11 mitigation TODO) tightens this in v0.37+.

MIN_BUCKET_N = 5:
  Below this sample size, the forecast returns predicted_brier=null with
  insufficient_data=true. Template renders "Forecast unavailable: only N
  resolved takes at this conviction yet" instead of a noisy estimate.

Architecture:
  computeForecast(input) — pure function, takes scorecards already
  fetched; ideal for tests + reuse across batched paths.
  forecastForTake(engine, input) — convenience wrapper, 1-2 engine
  round-trips (no domain → 1; with domain → 2).
  batchForecast(engine, inputs[]) — memoizes per (holder, domainPrefix);
  N inputs collapse to ≤2*unique_holders unique engine calls. Used by
  the propose-queue review flow (50 candidates → 1-2 scorecard fetches).
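
A sketch of the pure layer only (no engine calls). The Scorecard and Forecast shapes are assumptions, as is the rule that a slug-prefix hint must contain '/' and no whitespace; MIN_BUCKET_N and the insufficient_data branch come from this commit.

```typescript
// Sketch only: shapes and the slug-prefix heuristic are assumptions.
const MIN_BUCKET_N = 5;

interface Scorecard {
  resolved: number;
  brier: number | null;
}

interface Forecast {
  predicted_brier: number | null;
  insufficient_data: boolean;
  n: number;
  basis: "domain" | "overall";
}

function resolveDomainPrefix(hint?: string): string | undefined {
  const t = (hint ?? "").trim();
  // Free-form hints ("macro tech", "geography") fall back to overall.
  if (t.length === 0 || /\s/.test(t) || t.indexOf("/") === -1) return undefined;
  return t;
}

function computeForecast(input: { overall: Scorecard; domain?: Scorecard }): Forecast {
  const basis: "domain" | "overall" = input.domain !== undefined ? "domain" : "overall";
  const card = input.domain ?? input.overall;
  if (card.resolved < MIN_BUCKET_N || card.brier === null) {
    return { predicted_brier: null, insufficient_data: true, n: card.resolved, basis };
  }
  // Ship state: the forecast is simply the historical Brier for the bucket.
  return { predicted_brier: card.brier, insufficient_data: false, n: card.resolved, basis };
}
```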

Tests: 14 cases.
  computeForecast (4): insufficient_data branch, stable forecast,
    overall fallback, MIN_BUCKET_N export.
  resolveDomainPrefix (5): undefined/empty/whitespace → undefined;
    slug-prefix → kept; free-form → undefined.
  forecastForTake (3): 1-call overall, 2-call domain, free-form fallback.
  batchForecast (2): cache collapse for repeat queries; different holders
    do not collapse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: gstack-learnings coupling on incorrect resolutions (T11 / E4)

When the grade_takes phase auto-resolves a take as 'incorrect' or 'partial',
optionally write a learning entry to gstack's per-project learnings.jsonl
so other gstack skills (plan-ceo-review, ship, investigate, ...) can pull
it as context when relevant. The brain teaches every other tool about
the user's track record.

Config gate (D5 / CDX-17 mitigation):
  `cycle.grade_takes.write_gstack_learnings` defaults FALSE. External
  users may not have gstack installed; the gstack-learnings binary API
  isn't stable yet. Garry's brain flips it true to opt in.

Quality gate:
  Only 'incorrect' and 'partial' verdicts trigger the write. 'correct'
  resolutions are noise (we expected the take to hold up — no learning).
  'unresolvable' has no canonical column. Defense-in-depth runtime guard
  in writeIncorrectResolution() rejects ineligible qualities with
  reason='quality_not_eligible' so a caller misuse never surfaces a
  malformed learning entry.

Auto-apply only:
  Coupling fires only when grade_takes auto-applies AND the verdict
  is incorrect/partial AND the config flag is enabled. Manual resolutions
  via `gbrain takes resolve` intentionally DO NOT propagate to gstack —
  manual writes already carry operator intent; the calibration loop is
  the noise-prone path that earns coupling.

Namespace:
  Every entry's key starts with 'gbrain:calibration:v0.36.0.0:'. Lane D
  `gbrain calibration --undo-wave v0.36.0.0` (T17) filters on this prefix
  for the optional gstack-scrub step. First active bias tag suffixes the
  key (e.g. 'take-42:over-confident-geography') so future analysis can
  group learnings by bias pattern.

Architecture:
  buildLearningEntry — pure. Truncates claim at 200 chars + ellipsis;
  emits Pattern: line when activeBiasTags present; defaults confidence
  to 0.8 when caller omits it.

  writeIncorrectResolution — async wrapper. Honors config gate; honors
  quality gate; calls the injected writer (or defaultGstackWriter in
  production). Failures are non-fatal: returns
  { written: false, reason: 'write_failed' | 'binary_missing', error }.
  The grade_takes phase logs to result.warnings and continues — gstack
  coupling failure NEVER aborts a cycle.

  defaultGstackWriter — shells out to gstack-learnings-log binary via
  execFileSync. Throws GBrainError('GSTACK_BINARY_NOT_FOUND') when the
  binary isn't on PATH; writeIncorrectResolution classifies that error
  to reason='binary_missing' so the operator sees the install hint
  instead of a generic write_failed.

  Wired into grade-takes.ts after engine.resolveTake() inside the
  auto-apply block. Only fires when shouldApply=true.
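
A sketch of the entry builder and the gated writer with an injected writer function. The LearningEntry fields, the insight wording, the 'disabled' reason, the three-dot ellipsis, and reading the error code off a `code` property are assumptions; the key prefix, the 200-char truncation, the 0.8 default, and the other reason strings come from this commit.

```typescript
// Sketch only: entry fields and error-shape detection are hypothetical.
const KEY_PREFIX = "gbrain:calibration:v0.36.0.0:";

type Quality = "correct" | "incorrect" | "partial" | "unresolvable";

interface LearningInput {
  takeId: number;
  claim: string;
  quality: Quality;
  activeBiasTags?: string[];
  confidence?: number;
  reasoning?: string;
}

interface LearningEntry {
  key: string;
  insight: string;
  confidence: number;
}

type WriteResult =
  | { written: true }
  | {
      written: false;
      reason: "disabled" | "quality_not_eligible" | "write_failed" | "binary_missing";
      error?: string;
    };

function buildLearningEntry(input: LearningInput): LearningEntry {
  const claim = input.claim.length > 200 ? input.claim.slice(0, 200) + "..." : input.claim;
  const tags = input.activeBiasTags ?? [];
  const firstTag = tags[0];
  const key = KEY_PREFIX + `take-${input.takeId}` + (firstTag !== undefined ? `:${firstTag}` : "");
  const lines = [`Take resolved ${input.quality}: ${claim}`];
  if (tags.length > 0) lines.push(`Pattern: ${tags.join(", ")}`);
  if (input.reasoning) lines.push(`Reasoning: ${input.reasoning}`);
  return { key, insight: lines.join("\n"), confidence: input.confidence ?? 0.8 };
}

async function writeIncorrectResolution(
  input: LearningInput,
  opts: { enabled: boolean; writer: (entry: LearningEntry) => void | Promise<void> },
): Promise<WriteResult> {
  if (!opts.enabled) return { written: false, reason: "disabled" }; // config gate
  if (input.quality !== "incorrect" && input.quality !== "partial") {
    return { written: false, reason: "quality_not_eligible" };      // quality gate
  }
  try {
    await opts.writer(buildLearningEntry(input));
    return { written: true };
  } catch (err) {
    // Failures are non-fatal: classify and hand back to the caller's warnings.
    const code = typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
    const error = err instanceof Error ? err.message : String(err);
    const reason = code === "GSTACK_BINARY_NOT_FOUND" ? "binary_missing" : "write_failed";
    return { written: false, reason, error };
  }
}
```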

Tests: 14 cases.
  buildLearningEntry (7): canonical shape, partial vs incorrect wording,
  bias-tag suffix, no-tag fallback, claim truncation, default confidence,
  no-reasoning omission.
  writeIncorrectResolution (7): config gate, quality gate, happy path,
  writer-throw graceful degrade, binary-missing classification, async
  writer awaited, partial quality writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* doctor: 4 calibration checks — abandoned/freshness/drift/voice (T12)

Adds the four calibration doctor checks per the eng-review spec.

abandoned_threads:
  Counts active high-conviction takes (weight >= 0.7) older than 12 months
  that have never been superseded. Signal, not error — always status='ok'
  with a count. The hint sends users to `gbrain calibration` for details.

calibration_freshness:
  Warns when the active profile is older than 7 days (configurable via
  the same env-var pattern other freshness checks use). Cold-brain branch
  (no profile yet) returns ok without scolding. Hint points at
  `gbrain calibration --regenerate`.

grade_confidence_drift (CDX-11 mitigation):
  Surfaces the count of auto-applied grade verdicts. Below 30: returns
  "need 30+ for drift detection". At/above 30: returns "drift math
  arrives in v0.37+". The surface is wired; the actual
  confidence-vs-accuracy correlation math is a v0.37+ follow-up once we
  have 30+ auto-applied verdicts to measure against. Closes the CDX-11
  hole structurally — the operator sees the surface even before the math
  is meaningful.

voice_gate_health:
  Tracks voice gate failure rate over the last 7 days. <30% fail rate →
  ok (template fallback is fine in isolation). >=30% → warn with hint
  to review src/core/calibration/voice-gate.ts rubric. Anchors the
  cross-cutting voice rule observability story.

All four checks return status='warn' with a diagnostic message on
engine errors — non-blocking, never throws. Matches the existing doctor
check pattern (see checkSyncFreshness for prior art).
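
The threshold logic of three of the checks, sketched as pure synchronous functions over pre-fetched numbers (the real checks query the engine). The CheckResult shape, the 'ok' status on both drift branches, treating a profile exactly 7 days old as still fresh, and the safeCheck wrapper name are assumptions.

```typescript
// Sketch only: shapes and boundary choices are assumptions for illustration.
type CheckStatus = "ok" | "warn";

interface CheckResult {
  name: string;
  status: CheckStatus;
  message: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function checkCalibrationFreshness(generatedAt: Date | null, now: Date, maxAgeDays = 7): CheckResult {
  const name = "calibration_freshness";
  if (generatedAt === null) return { name, status: "ok", message: "no calibration profile yet" };
  const ageDays = (now.getTime() - generatedAt.getTime()) / DAY_MS;
  return ageDays > maxAgeDays
    ? { name, status: "warn", message: "profile is stale; run `gbrain calibration --regenerate`" }
    : { name, status: "ok", message: "profile is fresh" };
}

function checkGradeConfidenceDrift(appliedCount: number): CheckResult {
  const name = "grade_confidence_drift";
  const message =
    appliedCount < 30
      ? `${appliedCount} auto-applied verdicts; need 30+ for drift detection`
      : `${appliedCount} auto-applied verdicts; drift math arrives in v0.37+`;
  return { name, status: "ok", message };
}

function checkVoiceGateHealth(total: number, failed: number): CheckResult {
  const name = "voice_gate_health";
  if (total === 0) return { name, status: "ok", message: "no voice gate runs in window" };
  const rate = failed / total;
  return rate >= 0.3
    ? { name, status: "warn", message: "voice gate fail rate at or above 30%; review the rubric" }
    : { name, status: "ok", message: "voice gate fail rate below 30%" };
}

// Engine errors become a warn with a diagnostic; a check never throws.
function safeCheck(name: string, run: () => CheckResult): CheckResult {
  try {
    return run();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { name, status: "warn", message: `check failed: ${detail}` };
  }
}
```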

Wired into runDoctor after checkRerankerHealth (the v0.35 cluster), in
the canonical block 10 slot.

Tests: 15 cases. 4 per check (happy path, alt-status, engine-throw
diagnostic, plus boundary tests for the freshness staleness gate at
exactly 7 days and the grade drift gate at 30 applied verdicts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: E7 nudge + 14-day cooldown (T13 / D16 F3)

Real-time pattern surfacing when a newly-committed high-conviction take
matches an active bias pattern. Conversational nudge text via the
templates module; 14-day cooldown per (take_id, nudge_pattern) via
take_nudge_log to prevent the feedback loop where each cycle re-fires
the same nudge on the same take.

Threshold gates (D16 F3):
  - holder match (profile.holder === take.holder)
  - conviction-weight > 0.7 (strict greater than)
  - take's slug-derived domain hint matches an active bias tag
    (takeDomainHint — same heuristic as eval-contradictions/calibration-join.ts
    for cross-surface consistency)

Cooldown gate:
  Before firing, probe take_nudge_log for (take_id, nudge_pattern) rows
  with fired_at >= now() - 14 days. Any hit → silently skip. After firing,
  insert a new row with channel='stderr' so the next 14 days are gated.

Feedback-loop prevention:
  User hedges a take in response to a nudge (e.g. weight 0.85 → 0.65).
  Even though the take's `weight` field changed, the cooldown row for
  the over-confident-geography pattern is still there from the original
  fire — so the next cycle's evaluateAndFireNudge() silently skips. The
  user reset path (gbrain takes nudge --reset N) clears the cooldown to
  re-arm.
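
A rough sketch of why hedging the weight does not re-arm the nudge: the
cooldown is keyed on (take_id, nudge_pattern), not on the take's weight.
The in-memory array below stands in for take_nudge_log, and the function
names and numeric take id are hypothetical.

```typescript
// In-memory stand-in for take_nudge_log; the real table is probed via the engine.
interface NudgeLogRow { takeId: number; pattern: string; firedAt: Date; channel: string }

const COOLDOWN_MS = 14 * 24 * 60 * 60 * 1000;

// True when a row for (takeId, pattern) fired within the last 14 days.
function inCooldown(log: NudgeLogRow[], takeId: number, pattern: string, now: Date): boolean {
  const cutoff = now.getTime() - COOLDOWN_MS;
  return log.some(
    (r) => r.takeId === takeId && r.pattern === pattern && r.firedAt.getTime() >= cutoff,
  );
}

// Fire at most once per cooldown window. Returns true when a nudge was emitted.
function fireIfAllowed(log: NudgeLogRow[], takeId: number, pattern: string, now: Date): boolean {
  if (inCooldown(log, takeId, pattern, now)) return false; // silent skip
  log.push({ takeId, pattern, firedAt: now, channel: "stderr" });
  return true;
}

// Drop every cooldown row for the take and report how many were removed,
// in the spirit of `gbrain takes nudge --reset N`.
function resetCooldown(log: NudgeLogRow[], takeId: number): number {
  const kept = log.filter((r) => r.takeId !== takeId);
  const removed = log.length - kept.length;
  log.length = 0;
  log.push(...kept);
  return removed;
}
```

Because the take's weight never enters `inCooldown`, a change from 0.85 to
0.65 leaves the gate closed until the window expires or the row is reset.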

Output channel (v0.36.0.0 ship state):
  STDERR only. Schema's `channel` column already supports multi-channel
  (webhook, admin SPA toast); routing those is a v0.37+ follow-up.

Architecture:
  evaluateNudgeRule(take, profile) — pure rule check. Returns
  { matched, reason, matchedTag }. No engine call.
  checkCooldown(engine, takeId, pattern) — engine probe, returns boolean.
  recordNudgeFire(engine, opts) — INSERT into take_nudge_log.
  evaluateAndFireNudge(opts) — full pipeline. Returns NudgeDecision.
  resetNudgeCooldown(engine, takeId) — DELETE...RETURNING for the CLI.

  buildNudgeText delegates to templates.ts nudgeTemplate (D24 mode='nudge'
  voice). v0.36.0.0 ship state uses the template directly; LLM-generated
  nudge text via the voice gate lands in v0.37+ when we have production
  examples to tune from.

Tests: 22 cases.
  takeDomainHint (5): companies/people/macro/geography/unrecognized.
  evaluateNudgeRule (6): no_profile, wrong_holder, conviction-at-threshold-
  is-NOT-eligible (strict >), no matching tag, happy match,
  first-match-wins for multiple candidate tags.
  checkCooldown (3): true on row hit, false on no row, cutoff date param
  verifies the 14-day boundary.
  evaluateAndFireNudge (4): happy fire (text contains hush command +
  matched tag), cooldown silent skip (no INSERT, no stderr), no_profile
  short-circuit, below-conviction short-circuit (no cooldown query fired).
  buildNudgeText (2): hush command shape, conviction value embedded.
  resetNudgeCooldown (2): returns count, idempotent on zero rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: E8 team-brain sharing + D18 cross-brain query semantics (T14)

Cross-brain calibration profile resolution per the D18 4-rule contract.
Pins all four cross-brain leak surfaces in dedicated unit tests so future
mount features can't silently regress this security model.

D18 semantics (committed):

  Rule 1 — LOCAL-FIRST ORDERING.
    Query the local brain first. If a profile exists, return it. Do NOT
    also query mounts (avoids stale-mount-overrides-fresh-local).
    Verified: mountResolver is NOT called when local has a hit.

  Rule 2 — MOUNT FALLBACK.
    Only when local has no profile AND canReadMounts=true, walk the
    mounts in priority order. First match wins. Each mount-side row
    must have published=true to be visible (D15 asymmetric opt-in).

  Rule 3 — CROSS-BRAIN ATTRIBUTION.
    Every returned profile carries source_brain_id + from_mount flag.
    Consumers (E1 think rewrite, E3 contradictions, E7 nudge, E6
    dashboard) MUST surface this via attributionSuffix() so the user
    sees which brain answered.

  Rule 4 — SUBAGENT PROHIBITION.
    canReadMountsForCtx() classifier returns FALSE for subagent loops
    without trusted-workspace allowedSlugPrefixes. Closes the
    OAuth-token-to-cross-brain-leak surface — subagents see ONLY their
    local-brain results regardless of which holder they query.

    Exception: trusted cycle phases (synthesize/patterns) pass
    allowedSlugPrefixes set and ARE allowed to read mounts. Pinned in
    the classifier test.

Architecture:
  queryAcrossBrains(localEngine, opts) — pure orchestrator. Composes
  getLatestProfile() from src/commands/calibration.ts. Mount engine
  access is via opts.mountResolver — production wires this to the
  v0.19+ gbrain mounts subsystem; tests inject a stub returning an
  ordered list of mocked engines. Decouples cross-brain LOGIC from
  multi-engine PLUMBING.

  canReadMountsForCtx(ctx) — pure classifier table. Drives the rule-4
  gate. Production callers compose it from OperationContext.

  attributionSuffix(result) — pure formatter. Emits the "(from mounted
  brain: <id>)" suffix when from_mount=true; empty string when local.
  Mandatory for user-visible cross-brain consumers.
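
The D18 rules can be sketched in a few lines. This is a simplified,
synchronous illustration: the `Brain` interface, the `ProfileRow` fields,
and the leading space in the suffix are assumptions, and the real
orchestrator composes getLatestProfile() against real engines.

```typescript
// Hypothetical minimal shapes; real engines and profile rows carry more fields.
interface ProfileRow { holder: string; published: boolean; brier: number }
interface Brain { id: string; getLatestProfile(holder: string): ProfileRow | null }
interface CrossBrainResult { profile: ProfileRow; source_brain_id: string; from_mount: boolean }
interface QueryOpts { holder: string; canReadMounts: boolean; mountResolver: () => Brain[] }

function queryAcrossBrains(local: Brain, opts: QueryOpts): CrossBrainResult | null {
  // Rule 1: local first; a local hit never consults the mounts.
  const localHit = local.getLatestProfile(opts.holder);
  if (localHit) return { profile: localHit, source_brain_id: local.id, from_mount: false };
  // Rule 4: callers without mount access stop at the local brain.
  if (!opts.canReadMounts) return null;
  // Rule 2: walk mounts in priority order; only published rows are visible.
  for (const mount of opts.mountResolver()) {
    const row = mount.getLatestProfile(opts.holder);
    if (row && row.published) {
      return { profile: row, source_brain_id: mount.id, from_mount: true };
    }
  }
  return null;
}

// Rule 3: user-visible consumers append this suffix to show which brain answered.
function attributionSuffix(result: CrossBrainResult): string {
  return result.from_mount ? ` (from mounted brain: ${result.source_brain_id})` : "";
}
```

Passing the resolver as a function means a test can count its calls to
confirm that a local hit never touches the mounts.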

Tests: 15 cases pinned to the 4 D18 rules + 4 supplementary structural
checks.
  D18-1: published=false profile on mount stays hidden.
  D18-2/3: subagent context cannot fall back to mounts (2 cases — null
    on local-empty + canReadMounts=false, local hit still returned).
  D18-4: attribution surfaces source_brain_id (3 cases — mount answer
    flag, local answer flag, attributionSuffix formatter).
  Rule 1 local-first ordering (2 cases — mountResolver NOT called on
    local hit, IS called on local empty).
  Mount priority order (3 cases — first published=true wins, all
    published=false returns null, no mounts configured returns null
    without throwing).
  canReadMountsForCtx classifier (4 cases — local CLI true, MCP
    non-subagent true, subagent without trusted-workspace false,
    subagent WITH trusted-workspace true).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* admin: E6 Calibration tab + D23 server-rendered SVG + TD2 contrast bump (T15)

Adds the v0.36.0.0 admin SPA Calibration tab. Per the design review,
the approved variant-B (Linear calm clarity) layout: single-column flow,
generous whitespace, ONE big sparkline as hero, then patterns, then
domain bars, then abandoned threads.

D23 server-rendered SVG architecture:

  src/core/calibration/svg-renderer.ts — pure functions. data → SVG
  string. No DOM, no React, no chart library dep. Inlines the admin
  design tokens (#0a0a0f bg, #3b82f6 accent, etc.) so the SVG is
  visually consistent with the rest of the admin SPA.

  Four chart renderers:
    - renderBrierTrend({ series }) — sparkline w/ baseline reference
      at 0.25 (always-50% baseline)
    - renderDomainBars({ bars }) — horizontal accuracy bars per domain
    - renderAbandonedThreadsCard(threads) — D30/TD4 'revisit now' link
      per row, points at /admin/calibration/revisit/<takeId>
    - renderPatternStatementsCard(statements) — D29/TD3 clickable
      drill-down links per row, point at /admin/calibration/pattern/<i>

  XSS posture: all caller-controlled strings pass through escapeXml().
  Numeric inputs are .toFixed()-coerced. Admin SPA renders via
  dangerouslySetInnerHTML inside a TrustedSVG wrapper component;
  endpoint is gated by requireAdmin middleware.
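
To illustrate the "data in, SVG string out" shape and the escaping
posture, here is a small sketch. Only the `escapeXml` name and the
#3b82f6 accent come from this message; `renderSparkline`, its
dimensions, the yMax default, and the empty-state text are hypothetical
and far simpler than the shipped renderers.

```typescript
// Escape the five XML special characters in caller-controlled strings.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface BrierPoint { date: string; brier: number }

// Hypothetical pure renderer: no DOM, no chart library, just string assembly.
function renderSparkline(series: BrierPoint[], width = 300, height = 60, yMax = 0.5): string {
  const open = `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`;
  if (series.length === 0) return `${open}<text x="8" y="20">No data yet</text></svg>`;
  const step = series.length > 1 ? width / (series.length - 1) : 0;
  const points = series
    .map((p, i) => {
      const clamped = Math.min(Math.max(p.brier, 0), yMax); // clamp out-of-range values
      const y = height - (clamped / yMax) * height;
      return `${(i * step).toFixed(1)},${y.toFixed(1)}`; // numbers are toFixed-coerced
    })
    .join(" ");
  const last = series[series.length - 1];
  return (
    `${open}<polyline fill="none" stroke="#3b82f6" points="${points}"/>` +
    `<text x="${width}" y="12" text-anchor="end">${escapeXml(last.date)}</text></svg>`
  );
}
```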

  /admin/api/calibration/profile — returns the active profile row as JSON.
  /admin/api/calibration/charts/:type — returns image/svg+xml markup
    for type ∈ {brier-trend, domain-bars, pattern-statements,
                abandoned-threads}. Cache-Control: private, max-age=60.

  brier-trend currently renders a single-point series from the active
  profile (the time-series view across calibration_profiles.generated_at
  history is a v0.37 follow-up once we have multiple snapshots).
  abandoned-threads pulls the top 5 abandoned rows via the same SQL the
  doctor check uses.

CalibrationPage React component (admin/src/pages/Calibration.tsx):
  Fetches profile + 4 charts. Loading / error / cold-brain states all
  handled. Layout includes the audit annotations (partial-grade badge,
  voice-gate-fell-back-to-template badge) per the approved mockup.
  TrustedSVG wrapper isolates the dangerouslySetInnerHTML to the SVG
  surface only.

App.tsx nav: added 'calibration' page route + sidebar nav item, hash
routing extended to support #calibration.

TD2 contrast bump:
  admin/src/index.css --text-muted: #555 → #777. Old value was contrast
  4.0 on the #0a0a0f bg — below WCAG AA 4.5 for body text. New value is
  ~5.5, passes AA. Improvement is global across Dashboard, Agents,
  RequestLog, and the new Calibration tab — single-line CSS change with
  ~10x the impact.

admin/dist/ rebuilt via `bun run build` (vite). 36 modules transformed.

Tests: 19 cases in test/svg-renderer.test.ts.
  escapeXml (1): canonical entities.
  renderBrierTrend (6): empty state, polyline for 2+ points, clamp
  beyond yMax, design tokens inlined, XSS safety on date strings,
  text-anchor end on right label.
  renderDomainBars (4): empty state, label/accuracy/n rendering,
  out-of-range accuracy clamp, XSS safety on labels.
  renderAbandonedThreadsCard (4): empty state, row rendering with
  revisit link, claim truncation at 70 chars, custom revisitHref override.
  renderPatternStatementsCard (4): empty state, anchor count matches
  statement count, XSS safety, custom drillHref override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* recall: calibration footer formatter for morning pulse (T16)

Pure formatter that turns a CalibrationProfileRow + optional abandoned-
threads list into the conversational block the morning pulse will surface:

  Calibration this quarter:
    Brier 0.18 (solid).
    Right on early-stage tactics, late on macro by 18 months.
    Over-confident on team execution; under-calibrated on regulatory risk.

  Threads you opened and never came back to:
    · AI search platform differentiation         (17 months silent)
    · International expansion playbook           (12 months silent)

Cold-brain branch: returns empty string when no profile or < 5 resolved
takes. Caller decides whether to render the block; cold-brain absence
is the cleanest non-event.

Brier trend note maps the absolute value to conversational copy:
  <= 0.10 → "(strong calibration)"
  <= 0.20 → "(solid)"
  <= 0.25 → "(near baseline)"
  > 0.25  → "(worse than always-50% baseline — review your high-conviction calls)"

  v0.36.0.0 ship state has only the current profile snapshot. The
  "was 0.22 90d ago — improving" comparison shape arrives when we
  accumulate generated_at history across multiple cycles.
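
A short sketch of the Brier mapping above, plus the arithmetic behind the
0.25 reference: a forecaster who always says 50% scores (0.5 - outcome)^2
= 0.25 on every take. The band names, the `resolvedCount` field, and the
function names are hypothetical; the conversational copy itself stays in
the formatter.

```typescript
type BrierBand = "strong" | "solid" | "near_baseline" | "worse_than_baseline";

// Mean squared error between forecast probability and the 0/1 outcome.
function brierScore(forecasts: { p: number; outcome: 0 | 1 }[]): number {
  if (forecasts.length === 0) throw new Error("need at least one resolved forecast");
  const total = forecasts.reduce((sum, f) => sum + (f.p - f.outcome) ** 2, 0);
  return total / forecasts.length;
}

// Map an absolute Brier score to the band that selects the footer copy.
// Upper edges are inclusive, matching the <= table above.
function brierBand(brier: number): BrierBand {
  if (brier <= 0.1) return "strong";
  if (brier <= 0.2) return "solid";
  if (brier <= 0.25) return "near_baseline";
  return "worse_than_baseline";
}

// Cold-brain guard: no profile, or fewer than 5 resolved takes, renders nothing.
function shouldRenderFooter(profile: { resolvedCount: number } | null): boolean {
  return profile !== null && profile.resolvedCount >= 5;
}
```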

R3 regression posture:
  This module is the FORMATTER only. Wiring into `gbrain recall`'s text
  output is intentionally NOT in this commit — runRecall's surface
  stays unchanged. v0.37 wires it under --show-calibration (opt-in
  initially, default-on later). For now the formatter is callable from
  the admin tab + custom CLI scripts that want it.

Architecture:
  buildRecallCalibrationFooter(opts) — pure. opts.profile required,
  opts.abandonedThreads optional, opts.threadColumnWidth defaults to 50.

  Caps at 4 patterns + 5 abandoned threads to keep the footer scannable.
  Truncates long abandoned-thread claim text to fit the column width with
  a trailing ellipsis.

Tests: 14 cases.
  Cold-brain branch (3): null profile, < 5 resolved, zero resolved.
  Happy path (7): header + Brier + patterns, trend note ranges (4
  brackets), null brier omits the Brier line but keeps header, caps at
  4 patterns.
  Abandoned threads (4): omit section when none, emit when present,
  cap at 5, truncate long claim with column-width override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: --undo-wave reversal command (T17 / D18 CDX-3)

Implements the undo-wave reversal flow. Every new row written by the
v0.36.0.0 calibration wave carries wave_version='v0.36.0.0' so a precise
revert is possible without touching pre-wave data.

CLI surface (replaces the v0.36.0.0 ship-state placeholder):
  gbrain calibration --undo-wave v0.36.0.0 [--dry-run] [--scrub-gstack] [--json]

Reversal scope (4 steps):

  Step 1 — UNSET takes.resolved_* columns for takes auto-applied by this
  wave. Identifies wave-applied takes via take_grade_cache.applied=true
  + wave_version match. Cross-checks resolved_by='gbrain:grade_takes' to
  ensure we're not un-resolving a take a manual `gbrain takes resolve`
  override has since claimed. Manual resolutions persist; only auto-grade
  resolutions revert.

  Step 1b — Mark take_grade_cache rows applied=false post-undo so the
  audit trail shows they WERE applied but this wave was reverted. The
  CDX-11 confidence-drift check filters on applied=true and gets a
  cleaner sample post-undo.

  Step 2 — DELETE FROM calibration_profiles WHERE wave_version = ?.

  Step 3 — DELETE FROM take_nudge_log WHERE wave_version = ?.

  Step 4 — Optional gstack-learnings-prune via the binary, scoped to the
  GSTACK_LEARNING_NAMESPACE prefix. Opt-in via --scrub-gstack. Best-effort:
  binary-missing or failure logs a warning + suggests the manual command;
  the rest of the undo still succeeded.

Dry-run posture:
  --dry-run computes the counts via SELECT COUNT(*) shapes without
  emitting any UPDATE or DELETE. Same UndoWaveResult shape returned so
  operator sees exactly what would be reverted before committing.

  --dry-run intentionally skips the gstack scrub (filesystem write) too;
  ship-state safety call.

Idempotency:
  Re-running --undo-wave on a brain that's already reverted is a no-op.
  Each query filters on wave_version; no matching rows → zero counts.

Architecture:
  undoWave(engine, opts) — async, returns UndoWaveResult. Pure data
  layer; no stderr writes, no process exits. CLI dispatch in
  src/commands/calibration.ts handles printing.

  v0.36.0.0 ship state runs steps 1-3 sequentially (no transaction).
  Partial reversal is recoverable via re-run since each step is
  idempotent on wave_version match. A future enhancement (v0.37+) can
  wrap in engine.transaction once that surface lands in BrainEngine.
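
The reversal scope, dry-run posture, and idempotency can be illustrated
against in-memory tables. Everything below is a sketch under assumptions:
the `Store` layout, the result field names, and the choice to un-apply
every matching cache row are hypothetical, and the real undoWave issues
SQL through the engine. The gstack scrub (step 4) is omitted.

```typescript
// In-memory stand-ins for the tables touched by steps 1-3.
interface TakeRow { id: number; resolved_by: string | null; resolved_at: string | null }
interface GradeCacheRow { take_id: number; applied: boolean; wave_version: string }
interface Versioned { wave_version: string }
interface Store {
  takes: TakeRow[];
  gradeCache: GradeCacheRow[];
  profiles: Versioned[];
  nudgeLog: Versioned[];
}
interface UndoWaveResult {
  takesUnresolved: number;
  cacheUnapplied: number;
  profilesDeleted: number;
  nudgesDeleted: number;
}

function undoWave(store: Store, wave: string, dryRun: boolean): UndoWaveResult {
  const cacheRows = store.gradeCache.filter((r) => r.applied && r.wave_version === wave);
  const appliedIds = new Set(cacheRows.map((r) => r.take_id));
  // Step 1 targets auto-grade resolutions only; manual overrides carry a different resolved_by.
  const takes = store.takes.filter(
    (t) => appliedIds.has(t.id) && t.resolved_by === "gbrain:grade_takes",
  );
  const result: UndoWaveResult = {
    takesUnresolved: takes.length,
    cacheUnapplied: cacheRows.length,
    profilesDeleted: store.profiles.filter((p) => p.wave_version === wave).length,
    nudgesDeleted: store.nudgeLog.filter((n) => n.wave_version === wave).length,
  };
  if (dryRun) return result; // counts only, no writes
  takes.forEach((t) => { t.resolved_by = null; t.resolved_at = null; });   // step 1
  cacheRows.forEach((r) => { r.applied = false; });                        // step 1b
  store.profiles = store.profiles.filter((p) => p.wave_version !== wave);  // step 2
  store.nudgeLog = store.nudgeLog.filter((n) => n.wave_version !== wave);  // step 3
  return result;
}
```

Every selection filters on the wave version, so a second run finds nothing
to change and reports zero counts.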

Tests: 8 cases in test/undo-wave.test.ts.
  Dry-run posture (1): counts emitted, NO UPDATE/DELETE SQL fired.
  Happy path (3): all 4 steps execute, resolved_by filter scopes UPDATE
  to wave-applied resolutions, custom resolvedByLabel honored.
  Empty wave (2): zero counts when no matching rows, idempotent re-run.
  Wave-version parameter threading (2): supplied version threads
  through all queries, different wave versions don't collide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: A/B harness for think + ab-report (T18 / D19 CDX-18)

Structural answer to CDX-18 (anti-bias rewrite may make advice worse).
We don't have to guess whether calibration helps — we measure.

Architecture:
  runAbTrial(input) — calls thinkRunner TWICE on the same question
  (baseline + --with-calibration), surfaces both answers to a
  preferenceResolver, persists the trial to think_ab_results.

  buildAbReport(engine, { days }) — aggregates the table over the last
  N days (default 30). Computes win counts, ties, neither, and a
  with_calibration_win_rate over DECISIVE trials only (excludes
  neither/tie). Flags calibration_net_negative when n >= 20 AND win
  rate < 45%.

  formatAbReport(report, days) — pretty-prints for stdout; emits the
  calibration_net_negative warning block when triggered.

CLI:
  gbrain calibration ab-report [--days N] [--json]
    Reads the table, prints the breakdown. Replaces the v0.36.0.0
    ship-state placeholder in src/commands/calibration.ts.

  gbrain think --ab "<question>"
    Wires into runAbTrial via the dispatch in src/commands/think.ts —
    follow-up commit. This commit lands the harness layer + schema +
    report surface; the --ab flag itself flips on in a one-line wiring
    commit when the runRecall path is ready.

Schema (migration v72 / think_ab_results):
  source_id, wave_version, ran_at, question, baseline_answer,
  with_calibration_answer, preferred (CHECK in {baseline,
  with_calibration, neither, tie}), model_id, notes.

  CHECK constraint enforces preferred enum. Default wave_version
  'v0.36.0.0' stamped so --undo-wave can scrub these too.

  Index on (source_id, ran_at DESC) supports the report's
  "last N days" query.

  schema.sql + pglite-schema.ts both updated for fresh-install parity.
  schema-embedded.ts regenerated via build:schema.

calibration_net_negative threshold (D19):
  Triggers when:
    - decisive_trials (baseline + with_calibration) >= 20
    - with_calibration_win_rate < 0.45 (NOT <= — exact 45% is OK)

  Small-sample guard (n < 20) prevents the warning from firing on
  early data with sampling noise. Confidence-flat threshold (no Wilson
  CI yet) keeps the math simple; v0.37+ adds CI bounds.
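
A compact sketch of the D19 threshold logic. It aggregates a plain array
of outcomes rather than reading think_ab_results, and the `AbReport`
field layout is an assumption modeled on the names used above.

```typescript
type Preferred = "baseline" | "with_calibration" | "neither" | "tie";

interface AbReport {
  baseline: number;
  with_calibration: number;
  neither: number;
  tie: number;
  decisive_trials: number;
  with_calibration_win_rate: number | null; // null when there are no decisive trials
  calibration_net_negative: boolean;
}

function buildAbReport(trials: Preferred[]): AbReport {
  const counts = { baseline: 0, with_calibration: 0, neither: 0, tie: 0 };
  for (const p of trials) counts[p] += 1;
  // Only trials with a clear winner count toward the win rate.
  const decisive = counts.baseline + counts.with_calibration;
  const winRate = decisive === 0 ? null : counts.with_calibration / decisive;
  return {
    ...counts,
    decisive_trials: decisive,
    with_calibration_win_rate: winRate,
    // Small-sample guard (n >= 20) plus a strict < 0.45 comparison.
    calibration_net_negative: decisive >= 20 && winRate !== null && winRate < 0.45,
  };
}
```

For example, 9 wins out of 20 decisive trials is exactly 45% and does not
trigger the warning, while 8 out of 20 (40%) does.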

Tests: 12 cases in test/think-ab.test.ts.
  runAbTrial (4): both runner calls fire, preferenceResolver receives
    both answers, INSERT row params shape, throws when thinkRunner
    missing.
  buildAbReport (5): zero trials, aggregation, net_negative trigger at
    n>=20 + win<45%, no trigger at n<20 (small-sample guard), no
    trigger at exact 45% boundary.
  formatAbReport (3): zero-state message, decisive-trials breakdown,
    net_negative warning block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: pattern drill-down route + revisit-now CLI (TD3 / D29 + TD4 / D30)

TD3 (D29) — clickable pattern drill-down endpoint:
  GET /admin/api/calibration/pattern/:id (requireAdmin)
  Returns the pattern statement at index `id` plus the top 25 resolved
  takes for the holder, sorted by weight desc. v0.36.0.0 ship-state
  approximation: surfaces broad provenance evidence (top resolved
  takes). v0.37+ stores per-pattern source_take_ids[] on a
  calibration_profile_patterns join table so the drill-down shows the
  EXACT takes that drove the pattern.

  Surfaces a `provenance_note` field in the response so the operator
  sees the v0.36.0.0-vs-v0.37 fidelity boundary inline.

  The admin SPA's renderPatternStatementsCard SVG already emits anchor
  tags pointing at /admin/calibration/pattern/<i> (T15 ship state).
  This route makes those anchors clickable — closes the trust loop that
  was the rationale for D29 ("pattern statements without their evidence
  are dressed-up LLM hallucinations").

TD4 (D30) — `gbrain takes revisit <slug>` editor-open action:
  Adds the `revisit` subcommand to gbrain takes. Opens $EDITOR (falling
  back to vi) on the source markdown file for the slug. Appends a
  `<!-- gbrain:revisit -->` cursor marker at the bottom of the page on
  first invocation so the editor opens with intent visible.

  Reads sync.repo_path from config to locate the brain repo. Refuses to
  proceed with a clear error when the repo isn't configured or the page
  doesn't exist.

  spawnSync with stdio:'inherit' so the editor takes the terminal. Exit
  status surfaced on failure.
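
The editor fallback and the once-only marker can be sketched as two pure
helpers. The function names, the treatment of an empty $EDITOR as unset,
and the newline handling are assumptions; only the marker text and the
vi fallback come from this message.

```typescript
const REVISIT_MARKER = "<!-- gbrain:revisit -->";

// Pick the editor command: $EDITOR when set and non-empty, otherwise vi.
function resolveEditor(env: Record<string, string | undefined>): string {
  const editor = (env["EDITOR"] ?? "").trim();
  return editor.length > 0 ? editor : "vi";
}

// Append the cursor marker once; later invocations leave the page unchanged.
function withRevisitMarker(page: string): string {
  if (page.includes(REVISIT_MARKER)) return page;
  const separator = page.endsWith("\n") ? "" : "\n";
  return `${page}${separator}${REVISIT_MARKER}\n`;
}

// In a Node or Bun CLI these would typically feed child_process.spawnSync:
//   spawnSync(resolveEditor(process.env), [pagePath], { stdio: "inherit" });
// which hands the terminal to the editor and returns its exit status.
```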

  The SVG renderer's revisit-now anchor for each abandoned thread row
  emits /admin/calibration/revisit/<takeId>. A small route handler that
  resolves take_id → page_slug then dispatches `gbrain takes revisit`
  via spawn is a v0.37 follow-up — the CLI command exists now so
  developers can wire it directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: DESIGN.md — formalize de facto design tokens (TD1)

Promotes the admin SPA's de facto design tokens (landed v0.26.0) to a
canonical DESIGN.md at the repo root. This is the calibration target
for /plan-design-review and /design-review going forward — when a
question is "does this UI fit the system?", the answer is here.

Captures the system as it stands today:

  Voice (5 surfaces, all routed through gateVoice() with mode-specific
  rubrics): pattern_statement, nudge, forecast_blurb, dashboard_caption,
  morning_pulse. Friend-not-doctor; concrete data over abstract metrics;
  no preachy / clinical / corporate language.

  Color tokens: 10 CSS variables from admin/src/index.css inlined into
  the SVG renderer (src/core/calibration/svg-renderer.ts). Dark theme
  is the only theme — admin is an operator tool. WCAG contrast
  documented per token; TD2's #555 → #777 bump on --text-muted noted.

  Typography: Inter for UI, JetBrains Mono for numbers/slugs/data.
  Type scale (18 / 14 / 13 / 12 / 11) documented as de facto, not yet
  formalized.

  Spacing scale: 4 / 8 / 16 / 24 / 32px. Linear-app density.

  Layout: sidebar 200px, max content 720px (text) / 960px (tables).
  No 3-column feature grids, no icons in colored circles, no
  decorative blobs.

  Charts: server-rendered SVG via pure functions in
  src/core/calibration/svg-renderer.ts. XSS posture documented:
  server-side escapeXml on caller-controlled strings, numeric inputs
  .toFixed()-coerced, admin SPA renders via <TrustedSVG> wrapper.

  Interaction patterns: keyboard nav required (J/K/space/u/q on the
  propose-queue), loading/empty/error states ARE features.

  v0.37+ roadmap: type scale formalization, animation tokens, component
  library extraction. Light mode explicitly NOT planned.

The doc is a living target, not a frozen spec. Major changes route
through /plan-design-review per the existing review chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: synthetic corpus scaffold + privacy CI guard (T19 + T20)

T19 — synthetic corpus scaffold for extract-takes prompt tuning.
  test/fixtures/calibration/extract-takes-corpus/ — 5 representative
  pages across 4 genres (essay, people, companies, meetings/decisions).
  v0.36.0.0 ships a SMALL representative corpus as proof of structure;
  the full 50-page training set + 10-page holdout gets generated by the
  operator via `gbrain calibration build-corpus` (v0.37 follow-up
  subcommand) or by hand with the privacy guard catching violations
  either way.

  Privacy contract per D13': every page is SYNTHETIC. None of the
  names/companies/funds/deals/events refer to anything real. Placeholder
  names per CLAUDE.md: alice-example, charlie-example, acme-example,
  widget-co, fund-a/b/c, acme-seed, widget-series-a, meetings/2026-04-03.

  test/fixtures/calibration/README.md spells out the privacy contract,
  generation flow, and what the corpus is (stable regression set for
  the extract-takes prompt) vs is not (real anything).

T20 — privacy CI guard (CDX-14 mitigation).
  scripts/check-synthetic-corpus-privacy.sh greps the corpus for:
    1. Explicit dollar amounts ($50M, $1.2B etc) — would suggest the
       page memorized a real round size.
    2. Out-of-range year references (informational only for v0.36.0.0;
       deferred to a manual review checklist).
    3. Pages that reference ZERO placeholder names — suggests the page
       might be referring to real entities. Essay-genre fixtures
       exempt (they're anonymized PG-style writing by design).

  Wired into `bun run verify` (CI gate) so contributors can't accidentally
  land a synthetic fixture that leaks real-world specificity. The intent
  is fail-fast on accidental leakage; the operator can update the
  allowlist if a generic dollar amount is intentional.
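
The shipped guard is a shell script built on grep; the TypeScript below is
only a sketch of checks 1 and 3 so the logic is easy to read. The regex,
the `Page` shape, and the shortened placeholder list are assumptions and
may differ from what the script matches.

```typescript
// Placeholder names quoted in this message; the real script may use a longer list.
const PLACEHOLDERS = [
  "alice-example", "charlie-example", "acme-example",
  "widget-co", "fund-a", "fund-b", "fund-c",
];

// Matches explicit amounts such as $50M or $1.2B (assumed pattern).
const DOLLAR_AMOUNT = /\$\d+(?:\.\d+)?\s?[KMB]\b/;

interface Page { path: string; genre: string; body: string }

function privacyViolations(page: Page): string[] {
  const violations: string[] = [];
  // Check 1: an explicit amount suggests a memorized real round size.
  if (DOLLAR_AMOUNT.test(page.body)) violations.push("explicit dollar amount");
  // Check 3: a page with zero placeholder names may refer to real entities.
  const hasPlaceholder = PLACEHOLDERS.some((name) => page.body.includes(name));
  // Essay-genre fixtures are exempt from the placeholder requirement.
  if (!hasPlaceholder && page.genre !== "essay") violations.push("no placeholder names");
  return violations;
}
```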

  Closes CDX-14: 'CC reads real brain pages locally, writes nothing
  still risks privacy if any generated synthetic fixture memorizes
  structure-specific facts. Placeholder names are not enough.'

The corpus shipped here is intentionally small but covers the four
core gbrain page genres (essay, people, companies, meetings/decisions).
The v0.37 corpus-build subcommand will fan out to 50 with the operator
spot-checking + the CI guard enforcing the privacy contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: R1-R5 IRON RULE regression inventory (T21)

Per /plan-eng-review D26 IRON RULE: regressions get added to the test
suite as critical requirements, no AskUserQuestion needed. Pins five
regressions identified during the v0.36.0.0 wave's coverage diagram:

  R1: think baseline UNCHANGED when --with-calibration absent.
      Covered structurally by test/think-with-calibration.test.ts plus
      assertion-pinned in this file (default user message: question
      first, then retrieval; system prompt: no anti-bias section).

  R2: contradictions probe output UNCHANGED when no calibration profile.
      Covered structurally by test/eval-contradictions-calibration-join.test.ts
      plus pinned here (null profile → null tag, byte-identical to v0.32.6).

  R3: takes resolution flow works when grade_takes phase disabled.
      Pinned import-surface coupling: takes-resolution.ts has zero
      dependency on grade_takes module. If a future refactor accidentally
      couples them, this test fails to compile.

  R4: search/list_pages/get_page work identically through new source_id paths.
      Marker test referencing existing v0.34.1 source-isolation suite at
      test/source-isolation-pglite.test.ts. v0.36.0.0 does NOT modify
      those code paths; the existing tests catch any accidental coupling.

  R5: existing search modes (conservative/balanced/tokenmax) unaffected.
      Marker test referencing existing test/search-mode.test.ts. The
      calibration code DOES NOT IMPORT from src/core/search/mode.ts.

Plus an inventory test that confirms all 5 regressions have an
'addressed' status — fail-loud if a future contributor removes a
guard without updating the inventory.

7 tests total. Pure functions, no engine, hermetic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: v0.36.0.0 CHANGELOG + CLAUDE.md anchors + calibration convention skill

CHANGELOG entry: the user-facing release notes. Leads with the headline
("the brain learns how you tend to be wrong, then argues against your
blind spots on every advice call"), 5 'what you can now do' bullets in
GStack voice, itemized changes by lane, and the 'To take advantage of
v0.36.0.0' upgrade checklist per the CLAUDE.md required-block contract.

CLAUDE.md anchors: new 'v0.36.0.0 Hindsight calibration wave (key files
cluster)' block inserted before the v0.31.1 thin-client section. 23 new
files / extensions annotated with one-paragraph descriptions each,
linking back to the convention skill at skills/conventions/calibration.md
for the agent-facing rules.

skills/conventions/calibration.md: the agent-facing convention skill.
Tells future contributors which calibration touchpoint applies to
their task — voice gate? BaseCyclePhase? source-scope thread? doctor
warning? cross-brain query rules? auto-resolve threshold posture? Test
seam patterns. Bug class to avoid (the v0.34.1 source-isolation leak
shape).

Version trio (per CLAUDE.md mandatory audit):
  VERSION:     0.36.0.0
  package.json: 0.36.0.0
  CHANGELOG:   ## [0.36.0.0] - 2026-05-17

llms.txt + llms-full.txt regenerated via `bun run build:llms` after
the CLAUDE.md edit (per the explicit CLAUDE.md mandate "Any CLAUDE.md
edit MUST be followed by `bun run build:llms`"). The `test/build-llms.test.ts`
guard runs in CI shard 1; the committed bundles are checked against
fresh generator output.

bun run verify is clean. typecheck clean. Privacy CI guard passes
(0 violations across 6 corpus pages). All ready for /ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cycle: wire propose_takes / grade_takes / calibration_profile into runCycle (T-fix)

The three new v0.36.0.0 phases were declared in CyclePhase / ALL_PHASES /
NEEDS_LOCK_PHASES but the runCycle orchestrator never dispatched them.
ALL_PHASES advertised them, gbrain dream --phase propose_takes accepted
them, but `gbrain dream` (default) silently skipped all three.

Adds a single dispatch block between consolidate and embed that:
  - builds an OperationContext on the fly (trusted-workspace caller,
    remote: false, sourceId resolved via the same helper sync uses)
  - dispatches the three phases in the order ALL_PHASES declares
  - records the same skipped-phase shape (no_database) when engine is null

Pinned by test/core/cycle.serial.test.ts "default: all 6 phases run in
order" which was already failing against ALL_PHASES (the test name lags
the actual phase count; left as-is since renaming churns history).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: expand synthetic corpus + add hand-labeled ground-truth (T19)

Adds 8 new synthetic pages modeled on the genre mix observed in the
real brain (concepts-with-timeline, meeting-notes, daily-journal,
people-pages, essays). Companion .gradeable-claims.json files carry
hand-labeled answer keys — what a tuned propose_takes prompt SHOULD
extract per page. Closes the F1 gate gap from the plan's T19/D19:

  Training corpus (test/fixtures/calibration/extract-takes-corpus/):
    + concept-startup-market-dynamics.md     (10 claims)
    + meeting-2026-04-10-fundraise-fund-a.md (6 claims)
    + daily-2026-04-15.md                    (5 claims)

  Blind holdout (test/fixtures/calibration/holdout/):
    + concept-founder-execution.md           (6 claims, F1 >= 0.80)
    + daily-2026-04-18.md                    (4 claims, F1 >= 0.80)
    + meeting-2026-04-17-hiring-charlie.md   (5 claims, F1 >= 0.80)
    + essay-on-conviction.md                 (7 claims, F1 >= 0.80)
    + people-bob-example.md                  (5 claims, F1 >= 0.80)

Privacy:
  - No real-brain content read into any committed artifact. Pages
    written from scratch using the canonical placeholder set
    (alice-example, charlie-example, bob-example, acme-example,
    widget-co, fund-a/b/c). Real-name grep confirms zero leakage:
    wintermute, garrytan, paul-graham, sam-altman, etc. → 0 hits.
  - scripts/check-synthetic-corpus-privacy.sh passes: 0 violations
    across 14 pages (was 6).

Genre fidelity:
  - concept-with-timeline pages mirror the dated-assertion structure
    real brain uses (verb framing varies: "argues / predicts / I
    think / I bet / strong conviction / moderate conviction").
  - meeting-notes pages carry both prose claims (extracted via
    hedging language) and explicit ## Takes sections.
  - daily-journal pages test probabilistic framing ("75/25 in favor",
    "call it ~0.5") and self-tagged conviction values.
  - essay-on-conviction is the meta-page that names the author's
    own bias patterns — primary signal for calibration_profile.
  - people pages test claim-about-third-party extraction.

Each JSON ground-truth lists per-claim:
  - claim_text + kind (prediction|judgment|bet) + domain
  - conviction (0..1)
  - since_date
  - rationale (why this claim is gradeable + how a tuned prompt
    should infer conviction from the prose)

This is the corpus that gates the T19 prompt-tune iteration:
  - F1 >= 0.85 on training (10+6+5 = 21 claims across 3 pages
    plus the existing 5 fixtures already shipped)
  - F1 >= 0.80 on holdout (27 claims across 5 pages)
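
For reference, the F1 gate can be sketched as follows. Matching claims by
normalized exact text is an assumption made to keep the example
self-contained; the real prompt-tune harness may match claims more
loosely. Function names are hypothetical.

```typescript
// F1 over extracted vs. hand-labeled claims.
// Assumption: two claims match when their trimmed, lowercased text is equal.
function f1Score(extracted: string[], labeled: string[]): number {
  const norm = (s: string) => s.trim().toLowerCase();
  const truth = new Set(labeled.map(norm));
  const predicted = new Set(extracted.map(norm));
  let truePositives = 0;
  predicted.forEach((claim) => {
    if (truth.has(claim)) truePositives += 1;
  });
  if (truePositives === 0) return 0;
  const precision = truePositives / predicted.size; // share of extractions that are labeled
  const recall = truePositives / truth.size;        // share of labeled claims that were found
  return (2 * precision * recall) / (precision + recall);
}

// Gate thresholds from the text above: 0.85 on training, 0.80 on holdout.
function passesGate(f1: number, split: "training" | "holdout"): boolean {
  return f1 >= (split === "training" ? 0.85 : 0.8);
}
```

With 4 extracted claims of which 3 match a 4-claim answer key, precision
and recall are both 0.75, so F1 is 0.75 and the holdout gate fails.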

Plan reference: ~/.claude/plans/system-instruction-you-are-working-rippling-knuth.md
Privacy gate: scripts/check-synthetic-corpus-privacy.sh (wired into bun run verify).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* calibration: tune propose…
)

* fix(sync): accept .tf / .tfvars / .hcl in CODE_EXTENSIONS

Terraform repos were invisible to `gbrain sync --strategy code` because
the three HCL-family extensions never reached the file walker. Silent
data loss — the user thinks the sync covered the repo but the IaC layer
was dropped on the floor.

detectCodeLanguage() returns null for these extensions, so the chunker
falls back to recursive (no tree-sitter grammar for HCL) — the same
path toml/yaml take.

Closes garrytan#878.

Co-Authored-By: johnybradshaw <johnybradshaw@users.noreply.github.com>

* fix(upgrade): run `bun update gbrain` from Bun's global install root

`gbrain upgrade --strategy bun` was failing on canonical
`bun install -g github:garrytan/gbrain` installs because `execSync('bun
update gbrain')` ran in the user's shell cwd. Bun's update operates on
whatever package.json it finds via cwd-walk, so a user not standing in
the global root got "No package.json, so nothing to update".

resolveBunGlobalRoot() returns the right directory:
1. `$BUN_INSTALL/install/global` when set (operator override).
2. `~/.bun/install/global` (Bun's documented default).
3. Walk up from realpath(argv[1]) looking for `node_modules/gbrain` —
   handles non-standard installs without trusting argv naming.

execFileSync replaces execSync (no shell), with cwd pinned. Error path
prints the exact `cd && bun update` recovery command instead of a vague
hint.
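
The three-step lookup can be sketched with filesystem access injected so
it stays testable. The `Deps` shape, the existence check that lets steps
1 and 2 fall through, and the POSIX-only path handling are assumptions,
not the shipped resolveBunGlobalRoot().

```typescript
// Hypothetical dependency bundle; the real helper reads env and the filesystem directly.
interface Deps {
  env: Record<string, string | undefined>;
  home: string;
  entryPath: string; // realpath of argv[1]
  exists: (path: string) => boolean;
}

function resolveBunGlobalRoot(deps: Deps): string | null {
  // 1. Operator override via $BUN_INSTALL.
  const override = deps.env["BUN_INSTALL"];
  if (override) {
    const root = `${override}/install/global`;
    if (deps.exists(root)) return root;
  }
  // 2. Bun's default location under the home directory.
  const fallback = `${deps.home}/.bun/install/global`;
  if (deps.exists(fallback)) return fallback;
  // 3. Walk up from the entry path looking for node_modules/gbrain.
  let dir = deps.entryPath;
  for (;;) {
    const idx = dir.lastIndexOf("/");
    if (idx <= 0) return null; // reached the filesystem root without a match
    dir = dir.slice(0, idx);
    if (deps.exists(`${dir}/node_modules/gbrain`)) return dir;
  }
}

// The resolved root would then be used as the working directory, for example:
//   execFileSync("bun", ["update", "gbrain"], { cwd: root, stdio: "inherit" });
```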

Closes garrytan#1029. Cherry-picked from PR garrytan#1032.

Co-Authored-By: mvanhorn <mvanhorn@users.noreply.github.com>

* fix(config): redact sensitive values in `config set` output (closes garrytan#892)

`gbrain config set openai_api_key sk-...` was echoing the full key to
stdout via `console.log('Set %s = %s', key, value)`. Shell scrollback and
tmux scroll buffers commonly retain terminal output for hours; a
screen-share or shoulder-glance during set leaked the secret.

The `show` path already redacted, but it used a naive `.includes('key')`
substring check that would also mask 'monkey' or 'parsekey' (a false
positive rather than a leak, but ugly).

Single source of truth: `isSensitiveConfigKey()` uses a word-boundary
regex (`/(^|[._-])(key|secret|token|password|pwd|passwd|auth)([._-]|$)/i`)
so 'openai_api_key' matches but 'monkey' doesn't. `redactConfigValue()`
composes the postgresql:// URL redactor + sensitive-key check, used by
both `show` and `set`. Helpers exported for unit tests.
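
A short sketch of how the two helpers could fit together. The regex is the one quoted above; the `***` mask and the URL-password pattern are assumptions for illustration, not the shipped output format.

```typescript
// Word-boundary match: the sensitive word must be delimited by start/end
// of string or by one of '.', '_', '-'.
const SENSITIVE_KEY = /(^|[._-])(key|secret|token|password|pwd|passwd|auth)([._-]|$)/i;

export function isSensitiveConfigKey(key: string): boolean {
  return SENSITIVE_KEY.test(key);
}

// Assumed mask format ("***"); the real helper may print something else.
export function redactConfigValue(key: string, value: string): string {
  if (isSensitiveConfigKey(key)) return "***";
  // For non-sensitive keys, still hide the password inside a postgres URL.
  return value.replace(/^(postgres(?:ql)?:\/\/[^:/@]+):[^@]*@/, "$1:***@");
}

console.log(isSensitiveConfigKey("openai_api_key")); // true
console.log(isSensitiveConfigKey("monkey"));         // false
console.log(redactConfigValue("database_url", "postgresql://u:pw@h/db"));
// postgresql://u:***@h/db
```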

Closes garrytan#892. Cherry-pick of @sharziki's PR garrytan#918 (config.ts hunk only —
the extract.ts walker change in that PR is unrelated and tracked in garrytan#202).

Co-Authored-By: sharziki <sharziki@users.noreply.github.com>

* fix(oauth): throw InvalidTokenError so bearerAuth returns 401, not 500

`verifyAccessToken` was throwing bare `Error` on expired or invalid
tokens. The MCP SDK's `requireBearerAuth` middleware catches
`InvalidTokenError` and returns 401 with WWW-Authenticate; bare Error
falls through to 500. Result: legitimate clients with stale tokens hit
500-not-401, so token-refresh logic (which keys off 401) never fires.

Two call sites in verifyAccessToken: token-expired path and
invalid-token path. Both now throw InvalidTokenError. Existing tests
continue to pass because they assert only that a throw happens, not on
the error class or message.
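
To illustrate why the error class matters, here is a simplified model. `InvalidTokenError` below is a local stand-in so the sketch runs on its own, and `statusFor` is a hypothetical reduction of the middleware's behavior, not the SDK's code.

```typescript
// Local stand-in for the SDK's typed auth error.
class InvalidTokenError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, InvalidTokenError.prototype);
    this.name = "InvalidTokenError";
  }
}

interface StoredToken {
  expiresAt: number; // epoch milliseconds
}

function verifyAccessToken(
  tokens: Map<string, StoredToken>,
  token: string,
  now: number,
): StoredToken {
  const found = tokens.get(token);
  if (!found) throw new InvalidTokenError("Invalid token");
  if (found.expiresAt <= now) throw new InvalidTokenError("Token expired");
  return found;
}

// Simplified mapping: typed auth errors become 401, anything else 500.
function statusFor(fn: () => unknown): number {
  try {
    fn();
    return 200;
  } catch (err) {
    return err instanceof InvalidTokenError ? 401 : 500;
  }
}
```

With a bare `Error` in either throw site, `statusFor` would return 500, which is the behavior this commit removes.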

Closes garrytan#935. Cherry-picked from PR garrytan#1012.

Co-Authored-By: Aashiqe10 <Aashiqe10@users.noreply.github.com>

* fix(serve): return 405 on GET /mcp instead of 404

MCP Streamable HTTP spec says GET /mcp opens an optional SSE backchannel
for server-initiated messages. gbrain's transport is stateless and
doesn't push server-initiated messages, so per spec we MUST return 405
with Allow: POST, DELETE — not 404. Probing clients (claude.ai, etc.)
distinguish "endpoint exists, no SSE channel" from "endpoint missing"
on this status code; 404 makes them give up.
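
A minimal sketch of the status decision, assuming a stateless transport that handles only POST and DELETE. `respondToMcp` and the `MiniResponse` shape are hypothetical; the 200 for handled methods is a placeholder for the real handler.

```typescript
interface MiniResponse {
  status: number;
  headers: Record<string, string>;
}

// Hypothetical method gate for the /mcp route.
export function respondToMcp(method: string): MiniResponse {
  const m = method.toUpperCase();
  if (m === "POST" || m === "DELETE") {
    return { status: 200, headers: {} }; // placeholder for the real handler
  }
  // The endpoint exists but this method is not offered: 405 plus Allow.
  return { status: 405, headers: { Allow: "POST, DELETE" } };
}
```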

Cherry-picked from PR garrytan#1076.

Co-Authored-By: lukejduncan <lukejduncan@users.noreply.github.com>

* fix(doctor): resolve whoknows fixture from module location, not cwd

`gbrain doctor` warned about a missing whoknows fixture for every install
that wasn't standing in the gbrain source repo at run time — which is
everyone. The check used `process.cwd()` to locate the fixture, so any
real user (running doctor against `~/.gbrain`) saw a spurious warning.

`resolveWhoknowsFixturePath()` walks up from `import.meta.url` looking
for the source-repo signature (`src/cli.ts` + `skills/RESOLVER.md`),
respects `GBRAIN_WHOKNOWS_FIXTURE_PATH` env override (absolute or
cwd-relative), and returns null with an actionable warning when the
fixture can't be located.

Closes garrytan#969. Cherry-picked from PR garrytan#1034.

Co-Authored-By: mvanhorn <mvanhorn@users.noreply.github.com>

* fix(frontmatter): centralize --fix backups under ~/.gbrain/backups/

`gbrain frontmatter validate --fix` and `gbrain frontmatter generate
--fix` wrote `<file>.bak` siblings into the source tree. Users running
gbrain over a brain repo found .bak files scattered through people/,
companies/, etc. that broke gitignore expectations and showed up in
`git status` after every fix pass.

Backups now land under `~/.gbrain/backups/frontmatter/<run-id>/<rel>.bak`
with an iso-week-sorted run-id so a multi-fix session keeps the same
parent directory. Backup directory + per-file structure mirrored from
the original file's relative path. The .bak safety contract is intact
for both git and non-git brain repos.

Also adds `--include-catch-all` opt-in to `frontmatter generate` so the
default catch-all rule (`type: note`) is no longer applied to arbitrary
workspace documents that happen to live under a brain root.

Closes garrytan#902. Cherry-picked from PR garrytan#903.

Co-Authored-By: 100yenadmin <100yenadmin@users.noreply.github.com>

* fix(config): use path.isAbsolute() for GBRAIN_HOME on Windows

The GBRAIN_HOME validator rejected every valid Windows path (`C:\Users\...`,
`D:\gbrain`, etc.) because it used `trimmed.startsWith('/')` to check for
absoluteness — only POSIX absolute paths pass that. `path.isAbsolute()` is
the cross-platform check.

Same fix for the `..` traversal check: split on both `/` and `\` so
Windows path separators don't sneak `..` through.
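
The difference between the two checks can be seen in a few lines. This sketch calls `path.win32` and `path.posix` explicitly so it behaves the same on any host; `hasTraversal` is a hypothetical name for the separator-aware `..` check.

```typescript
import * as path from "node:path";

// Split on both separators so "..\\" and "../" are treated alike.
export function hasTraversal(p: string): boolean {
  return p.split(/[\\/]/).includes("..");
}

const winPath = "C:\\Users\\me\\gbrain";
console.log(winPath.startsWith("/"));                    // false
console.log(path.win32.isAbsolute(winPath));             // true
console.log(path.posix.isAbsolute("/home/me/.gbrain"));  // true
console.log(hasTraversal("C:\\Users\\..\\secret"));      // true
```

On a real Windows host, plain `path.isAbsolute()` resolves to the `win32` variant, which is why it accepts the paths the old check rejected.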

Closes garrytan#1019. Cherry-picked from PR garrytan#1083.

Co-Authored-By: sharziki <sharziki@users.noreply.github.com>

* fix(ai): warn only for the configured embedding provider, not all recipes

Gateway construction was warning on stderr for every recipe with an
embedding touchpoint missing max_batch_tokens — including providers the
brain isn't using. Users on Voyage saw noise about OpenAI / Google /
DashScope / etc. recipes that never get loaded.

Filter the warning to recipes whose provider id is referenced by
`embedding_model` or `embedding_multimodal_model` in the active config.
The structural protection against forgetting max_batch_tokens stays in
place for the recipes that actually run; the noise for unrelated recipes
goes away.

Cherry-picked from PR garrytan#1117.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(sync): skip git pull when repo has no origin remote

`gbrain sync` ran `git pull` unconditionally and printed scary stderr
on every cycle for brains that have no `origin` remote (local-only
workflows, single-machine setups, brains initialized via `gbrain init
--pglite` against an arbitrary directory). The pull failed harmlessly
but the noise was confusing and made operators think sync was broken.

`hasOriginRemote()` probes `git remote get-url origin` with stdio
ignored; on failure (`no such remote`), skip the pull, print a single
informational line, and proceed with the local working tree.
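
A sketch of the probe under the assumptions stated above: run `git remote get-url origin` with stdio ignored and treat any failure as "no origin". The `repoDir` parameter is an assumption for illustration.

```typescript
import { execFileSync } from "node:child_process";

// Returns true only when git exits 0 for `remote get-url origin`.
export function hasOriginRemote(repoDir: string): boolean {
  try {
    execFileSync("git", ["remote", "get-url", "origin"], {
      cwd: repoDir,
      stdio: "ignore", // keep git's "no such remote" message out of the log
    });
    return true;
  } catch {
    // Non-zero exit, missing git binary, or missing directory.
    return false;
  }
}
```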

Cherry-picked from PR garrytan#1119.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(query): drain cache writes before CLI exit

The query cache write was fired with `void promise.catch(...)` — true
fire-and-forget. On a fast CLI invocation (`gbrain query <q>` exits in
~50ms), the process terminates before the cache write commits. Result:
the cache effectively never warms from CLI use; every query is a miss.

`awaitPendingSearchCacheWrites()` tracks each in-flight cache write in a
module-level Set. The CLI dispatcher awaits the set after `query`
finishes formatting output but before the process exits. MCP server path
unchanged (long-lived process, fire-and-forget remains correct).
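
The tracking pattern can be sketched as follows. `trackCacheWrite` is a hypothetical name for whatever registers a write; only `awaitPendingSearchCacheWrites` is named in this commit.

```typescript
// Module-level set of cache writes that have started but not settled.
const pendingWrites = new Set<Promise<void>>();

// Register a write. Failures are swallowed (the cache is best-effort),
// and the entry removes itself once the write settles either way.
export function trackCacheWrite(write: Promise<void>): void {
  const tracked: Promise<void> = write.then(
    () => { pendingWrites.delete(tracked); },
    () => { pendingWrites.delete(tracked); },
  );
  pendingWrites.add(tracked);
}

// Called by the CLI after output is formatted and before exit.
export async function awaitPendingSearchCacheWrites(): Promise<void> {
  await Promise.all(Array.from(pendingWrites));
}
```

A long-lived server never needs to call the drain function, since its process outlives every write.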

Cherry-picked from PR garrytan#1125.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(backlinks): dedupe (source, target) pairs within a single source page

A source page that mentions the same entity N times produced N
duplicate "Referenced in" lines on the target. `extractEntityRefs`
returns one EntityRef per occurrence, and the per-ref `hasBacklink`
check reads a snapshot of `target.content` that's frozen at outer
scope — so every iteration sees "no backlink yet" and appends another
gap. The cumulative effect on a long meeting note with multiple
mentions of the same person was visible in PRs landing 3-5 identical
Timeline entries.

Track seen target slugs per source page; cap gaps at one per (source,
target) pair.
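
The dedupe amounts to a per-source-page Set. In this sketch `EntityRef` is reduced to a single `slug` field and `uniqueTargets` is a hypothetical helper name.

```typescript
interface EntityRef {
  slug: string; // target page slug
}

// Keep the first occurrence of each target slug, preserving order.
export function uniqueTargets(refs: EntityRef[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const ref of refs) {
    if (seen.has(ref.slug)) continue;
    seen.add(ref.slug);
    out.push(ref.slug);
  }
  return out;
}
```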

Cherry-picked from PR garrytan#967 with a current-master regression test
covering both markdown-link and Obsidian-wikilink formats in the same
source page.

Co-Authored-By: p3ob7o <p3ob7o@users.noreply.github.com>

* fix(dream): audit backlinks without mutating pages during cycle

The dream/autopilot maintenance cycle ran the backlinks phase in 'fix'
mode, which writes "Referenced in" timeline bullets into entity pages
every sync. The graph extractor + auto-link path is the canonical link
store during sync/dream/autopilot — the legacy filesystem fixer wrote
markdown that fought with both the user's manual edits and the graph
layer's own timeline.

Cycle now runs backlinks in 'check' mode (audit-only); the materializer
remains available via `gbrain check-backlinks fix` for users who really
want markdown backlinks committed to disk.

Cherry-picked from PR garrytan#1027.

Co-Authored-By: sliday <sliday@users.noreply.github.com>

* fix(autopilot --install): source ~/.zshenv before zshrc/bashrc

zshenv is the canonical place for env vars in zsh on macOS — zshrc is
sourced only for interactive shells, so vars exported in zshrc don't
reach a non-interactive subprocess like the autopilot wrapper. Users
who exported GBRAIN_DATABASE_URL, OPENAI_API_KEY, or ANTHROPIC_API_KEY
in zshrc and assumed autopilot would inherit them hit silent missing-
secret failures on the LaunchAgent.

Source ~/.zshenv first (always reaches non-interactive shells per zsh
docs), then fall back to ~/.zshrc / ~/.bashrc for users on other
profile conventions.

Cherry-picked from PR garrytan#966.

Co-Authored-By: p3ob7o <p3ob7o@users.noreply.github.com>

* fix(apply-migrations): return exit 0 on list/dry-run/up-to-date

`gbrain apply-migrations list`, `gbrain apply-migrations --dry-run`, and
the "All migrations up to date" path were returning from the async
function but never calling `process.exit(0)`. The CLI dispatcher in
cli.ts treated the implicit fall-through as exit 1 when the parent
process inspected status via shell scripts, breaking automation that
gates on `apply-migrations list && do-something`.

Three call sites: list, dry-run, and the no-op path. All three now
exit(0) explicitly.

Cherry-picked from PR garrytan#1062.

Co-Authored-By: nezovskii <nezovskii@users.noreply.github.com>

* fix(sync): scope auto-embed to source on incremental syncs

`gbrain sync --source-id X` triggered auto-embed for the affected slugs
but `runEmbed` ran with no `--source` flag, so it fell back to the
default source. For non-default-source syncs the page row lives at
(sourceId, slug) — the embed code saw "Page not found" for the right
slug under the wrong source, swallowed the error as best-effort, and
the sync result reported `embedded: 0` for the wrong reason.

`buildAutoEmbedArgs(slugs, sourceId)` is the new helper: when sourceId
is set, prepends `--source X`. Exported for the regression test.
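
A sketch of the helper as described. Passing the slugs as trailing positional arguments is an assumption; the actual argument shape of `runEmbed` is not shown in this commit.

```typescript
// Prepend the source flag only when a source id was given.
export function buildAutoEmbedArgs(slugs: string[], sourceId?: string): string[] {
  return sourceId ? ["--source", sourceId, ...slugs] : [...slugs];
}
```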

Pairs with the upcoming source-id write-path audit (P1 #8). Cherry-picked
from PR garrytan#1120.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(query): honor source_id with no-expand for cross-source search

Two related corrections:

1. `gbrain query --no-expand` parsed `--no-expand` as the literal key
   `no_expand` instead of negating the boolean `expand` param. Result:
   the flag was silently ignored and expansion always ran. Now any
   `--no-<key>` where `<key>` is a boolean param flips it false.

2. The `query` op's source-id resolution treated `ctx.sourceId` as
   authoritative, so an explicit per-call `source_id` was overridden by
   the federated read scope. Now per-call `source_id` wins;
   `source_id=__all__` is an explicit opt-out for local cross-source
   search.
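
Both corrections can be sketched in a few lines. `parseFlags` and `resolveSourceScope` are hypothetical names, and returning `null` to mean "search every source" is an assumption made for this example.

```typescript
// 1. `--no-<key>` flips a known boolean param to false instead of
//    creating a literal `no_<key>` entry.
export function parseFlags(
  argv: string[],
  booleanParams: ReadonlySet<string>,
): Record<string, string | boolean> {
  const out: Record<string, string | boolean> = {};
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue;
    const eq = arg.indexOf("=");
    const rawKey = eq === -1 ? arg.slice(2) : arg.slice(2, eq);
    const key = rawKey.replace(/-/g, "_");
    if (key.startsWith("no_") && booleanParams.has(key.slice(3))) {
      out[key.slice(3)] = false;
    } else if (booleanParams.has(key)) {
      out[key] = true;
    } else {
      out[key] = eq === -1 ? "" : arg.slice(eq + 1);
    }
  }
  return out;
}

// 2. A per-call source_id beats the context scope; "__all__" opts out.
export function resolveSourceScope(
  perCall: string | undefined,
  ctxSourceId: string | undefined,
): string | null {
  if (perCall === "__all__") return null; // null = no source filter
  return perCall ?? ctxSourceId ?? null;
}
```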

Cherry-picked from PR garrytan#1124.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>

* fix(doctor): child-table orphan detection (closes garrytan#1063)

The autopilot orphans phase detects orphan PAGES (no inbound links via
page-graph) but never scans FK-child tables. After a bulk delete or a
pre-FK-migration code path, orphan rows can persist indefinitely in
content_chunks, page_versions, tags, takes, raw_data, timeline_entries,
or links — all declared ON DELETE CASCADE, so any orphan row is
unexpected.

`childTableOrphansCheck` enumerates 10 FK columns across 8 tables:
- 8 NOT NULL columns (cascade): any value not in pages.id is an orphan.
- 2 nullable SET NULL columns (links.origin_page_id, files.page_id):
  NULL is valid; only NOT-NULL-but-missing-in-pages counts.

Surfaces paste-ready cleanup SQL when orphans are found.

Cherry-picked from PR garrytan#1064.

Co-Authored-By: vincedk-alt <vincedk-alt@users.noreply.github.com>

* fix(autopilot,cycle): stop respawn-storm from steady-state 'partial' cycles

Two compounding bugs under KeepAlive=true:

1. Autopilot tripped its circuit breaker on cycle.status === 'partial',
   not just 'failed'. 'partial' means at least one phase warned/failed
   while others ran — a soft signal, not fatal. On every cycle that
   warned, autopilot logged a failure and the supervisor respawned the
   worker.

2. The orphans phase emitted 'warn' when `count > 20` orphan pages.
   That threshold was tuned for small dev brains; on any corpus past a
   few hundred pages it fires every cycle in steady state. Together
   with bug 1, this produced visible respawn storms.

Fix:
- Autopilot trips only on cycle.status === 'failed'.
- Orphans phase warns by ratio: orphans / total_pages > 0.5 (the real
  "your graph fell apart" signal), not by absolute count.

Cherry-picked from PR garrytan#1113.
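
The two rules from the fix list, sketched as pure functions. The `'ok'` status name and the zero-page guard are assumptions; only `'partial'` and `'failed'` appear in this commit.

```typescript
type CycleStatus = "ok" | "partial" | "failed"; // "ok" is an assumed name

// Only a hard failure counts toward the circuit breaker.
export function shouldTripBreaker(status: CycleStatus): boolean {
  return status === "failed";
}

// Warn by ratio rather than absolute count.
export function orphansWarn(orphans: number, totalPages: number): boolean {
  if (totalPages === 0) return false; // assumed guard for empty brains
  return orphans / totalPages > 0.5;
}
```

Under the old rule, 21 orphans in a 5,000-page brain warned every cycle; under the ratio rule the same brain stays quiet until more than half its pages are orphaned.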

Co-Authored-By: sergeclaesen <sergeclaesen@users.noreply.github.com>

* fix(ai): reject partial embedding responses before indexing

`embedSubBatch` only validated the FIRST embedding's dimension and never
asserted the response length matched the input length. If a provider
returned fewer embeddings than requested (rate-limit truncation,
malformed response, etc.), the gateway silently indexed an offset-shifted
result — every page after the missing index got the embedding of a
different page's chunk.

Two new guards:
1. `result.embeddings.length === texts.length` — fail loud if any count
   mismatch, with a paste-ready retry hint.
2. Validate dim on EVERY embedding, not just the first.
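
A minimal sketch of the two guards. `assertEmbeddings` and the error wording are hypothetical; the real gateway raises its own error type with a retry hint.

```typescript
export function assertEmbeddings(
  texts: string[],
  embeddings: number[][],
  expectedDim: number,
): void {
  // Guard 1: one embedding per input, otherwise indexes would shift.
  if (embeddings.length !== texts.length) {
    throw new Error(
      `Embedding count mismatch: sent ${texts.length}, got ${embeddings.length}`,
    );
  }
  // Guard 2: check every vector, not just the first.
  embeddings.forEach((vec, i) => {
    if (vec.length !== expectedDim) {
      throw new Error(
        `Embedding ${i} has ${vec.length} dims, expected ${expectedDim}`,
      );
    }
  });
}
```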

Cherry-picked from PR garrytan#926.

Co-Authored-By: 100yenadmin <100yenadmin@users.noreply.github.com>

* fix(serve): admin register-client supports auth_code + PKCE public clients

The admin dashboard's /admin/api/register-client endpoint hardcoded
client_credentials and ignored grantTypes, redirectUris, and
tokenEndpointAuthMethod. Result: you couldn't register a browser-based
PKCE client (claude.ai Custom Connector, Cursor, etc.) through the
dashboard — only confidential machine-to-machine clients worked.

Pass grantTypes / redirectUris through to registerClientManual. When
tokenEndpointAuthMethod === 'none', NULL out client_secret_hash so the
SDK's clientAuth middleware skips the hash-vs-plaintext compare that
would otherwise reject the no-secret PKCE flow.

Cherry-picked from PR garrytan#1077.

Co-Authored-By: lukejduncan <lukejduncan@users.noreply.github.com>

* fix(extract-facts): treat slugs:[] as no-op, not unscoped full-walk

`runExtractFacts` checked `opts.slugs && opts.slugs.length > 0` to
decide between scoped and full-brain walk. Both `undefined` (caller
omits → full walk intended) AND `[]` (sync no-op → zero work intended)
fall through to the same `else` branch and triggered
`engine.getAllSlugs()`.

On a multi-thousand-page brain, the unintended full walk exceeded
the autopilot-cycle ~600s timeout and dead-lettered the job — visible
in production as `[cycle.extract_facts] start` followed by silence
until `Autopilot stopping (cycle-failure-cap)`.

Use presence (`opts.slugs !== undefined`), not truthiness, to
distinguish the two modes. Empty array is a real incremental no-op.
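
The presence-versus-truthiness distinction, sketched with a hypothetical `chooseWalk` helper and `WalkMode` type:

```typescript
type WalkMode = { kind: "full" } | { kind: "scoped"; slugs: string[] };

// undefined means "caller did not scope, walk everything";
// any array, including an empty one, means "walk exactly these".
export function chooseWalk(slugs?: string[]): WalkMode {
  return slugs !== undefined ? { kind: "scoped", slugs } : { kind: "full" };
}
```

With the old `slugs && slugs.length > 0` test, `chooseWalk([])` would have landed in the full-walk branch.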

Closes garrytan#1096. Three regression cases in test/extract-facts-phase.test.ts:
slugs=[] no-op, slugs=undefined still walks, slugs=['a'] walks just one.

Co-Authored-By: navin-moorthy <navin-moorthy@users.noreply.github.com>

* fix(serve): embed admin/dist into binary; serve from manifest (closes garrytan#1090)

Pre-fix, /admin returned 404 on every globally-installed binary because
serve-http.ts:780 resolved admin/dist via process.cwd(). The admin SPA
files are checked into git but `bun build --compile` does NOT embed
arbitrary directories — only assets imported via `with { type: 'file' }`
ESM imports land in the compiled binary.

Wire:

- scripts/build-admin-embedded.ts walks admin/dist/, emits
  src/admin-embedded.ts with one `with { type: 'file' }` import per
  file + a manifest map (request path → resolved path + mime).
  Auto-invoked by `bun run build:admin`.

- src/admin-embedded.ts is the auto-generated module. Bun resolves
  every file: import to a path that works at runtime inside the
  compiled binary (same pattern as src/core/chunkers/code.ts WASM
  imports).

- serve-http.ts switches to two-tier resolution: cwd-relative
  admin/dist for dev (Vite hot-rebuild), embedded manifest otherwise.
  Embedded path reads bytes lazily and caches per-asset for the
  lifetime of the process.

- scripts/check-admin-embedded.sh CI gate re-runs the generator and
  fails on drift (mirrors check-wasm-embedded.sh). PRs that rebuild
  admin/dist but forget to regenerate the embedded module fail loud.

- package.json wires build:admin-embedded + check:admin-embedded.

Closes garrytan#1090.

* test(source-id): lock in routing regression coverage (closes garrytan#891 garrytan#978 garrytan#1078)

Audit of every page write path (sync, embed, extract, dream, autopilot,
wikilinks, tags, chunks) confirmed that sourceId already threads
correctly through importFromContent → engine.putPage → SQL INSERT
since v0.18.0. The original bug reports from garrytan#891, garrytan#978, garrytan#1078 were
real at the time and got swept by the multi-source refactor; today's
master is correct.

This commit locks in that correctness with six PGLite regression cases
(no Postgres fixture needed; runs in CI everywhere):

1. importFromContent({sourceId:"work"}) lands at source_id=work, not
   the silent 'default' fallback.
2. Two sources hold the same slug independently.
3. Omitting sourceId falls through to 'default' (legacy contract).
4. Chunks land under the requested source.
5. Tags land under the requested source.
6. FK integrity smoke (originally garrytan#1078).

The earlier issue reports stay closed by the existing threading; this
suite ensures any future refactor of the write path can't silently
re-introduce the wrong-source-default bug. The 90-minute write-path
audit budget from the plan resolves here.

* fix(apply-migrations): unblock PGLite chain (closes garrytan#1100)

`gbrain apply-migrations --yes` was wedging on the v0.11.0 (Minions)
schema phase for PGLite installs. Two compounding bugs:

1. `apply-migrations` pre-flight schema-version warning connects to
   PGLite to read config.version, then disconnects. The brief lock
   hold races with downstream subprocess spawns that try to re-acquire
   it; the 30s lock timeout fires before the parent fully releases.
   Pre-flight is a *warning*; on PGLite it adds no information the
   orchestrators don't already handle. Skip the probe for PGLite.

2. v0.11.0 phase A spawned `gbrain init --migrate-only` as an execSync
   subprocess to apply schema migrations. PGLite is single-writer;
   the subprocess inherits HOME and tries to lock the same DB. On
   Postgres this works (concurrent connections OK); on PGLite it
   deadlocks. Route in-process for PGLite — create + connect +
   initSchema + disconnect directly, skipping the subprocess hop.
   Postgres keeps the legacy execSync path.

Verified: fresh PGLite install now walks the full migration chain
through v0.32.2 (Facts SoR) and lands "All migrations up to date" on
re-run.

Closes garrytan#1100.

* fix(serve): bootstrap token env override + suppress flag (closes garrytan#1024)

`gbrain serve --http` regenerated the admin bootstrap token on every
restart and printed it to stderr. In supervisor-managed production
deployments (LaunchAgent, systemd, k8s) every restart leaks the value
into log aggregators and invalidates the token for any agent that had
copied it.

Two new knobs:

- **GBRAIN_ADMIN_BOOTSTRAP_TOKEN** env var: when set, used as the
  bootstrap secret instead of a fresh per-process token. Validated:
  must match `^[A-Za-z0-9_-]{32,}$` (32-char minimum), else refuse to
  start with a paste-ready generator hint. Failing closed beats
  silently accepting a weak token.

- **--suppress-bootstrap-token** CLI flag: suppresses the printed
  token line entirely. Operator takes responsibility for tracking the
  value out-of-band.

Startup banner now reflects the chosen source:
- `Admin Token: suppressed` when the flag is set.
- `Admin Token: from $GBRAIN_ADMIN_BOOTSTRAP_TOKEN` when env-sourced.
- Full token print only when both are absent (default behavior, dev
  installs).
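
A sketch of the validation rule as a pure function. The `source` field, the error wording, and generating the fallback token as 64 hex characters are assumptions; the regex is the one quoted above.

```typescript
import { randomBytes } from "node:crypto";

type TokenResult =
  | { kind: "ok"; token: string; source: "env" | "generated" }
  | { kind: "error"; message: string };

const TOKEN_RE = /^[A-Za-z0-9_-]{32,}$/;

export function resolveBootstrapToken(envValue: string | undefined): TokenResult {
  if (envValue === undefined) {
    // No override: fresh per-process token (hex keeps it inside TOKEN_RE).
    return { kind: "ok", token: randomBytes(32).toString("hex"), source: "generated" };
  }
  const token = envValue.trim();
  if (!TOKEN_RE.test(token)) {
    // Fail closed: a set-but-weak value is an error, not a fallback.
    return {
      kind: "error",
      message: "GBRAIN_ADMIN_BOOTSTRAP_TOKEN must match ^[A-Za-z0-9_-]{32,}$",
    };
  }
  return { kind: "ok", token, source: "env" };
}
```

Keeping the function free of `process.exit` and logging leaves the refuse-to-start decision with the caller.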

Closes garrytan#1024.

Co-Authored-By: billy-armstrong <billy-armstrong@users.noreply.github.com>

* fix(config): migrate legacy 'provider' + 'model' to 'embedding_model'

Pre-v0.32 docs and some community templates used a config shape:

  { "provider": "voyage", "model": "voyage-4-large" }

The canonical shape (since the v0.31.12 gateway seam) is:

  { "embedding_model": "voyage:voyage-4-large" }

Users on the legacy shape hit silent fallthrough to the hardcoded
OpenAI default; sync + embed errored out with "OpenAI embedding
requires OPENAI_API_KEY" regardless of their actual provider config.

loadConfig() now translates the legacy keys at parse time:
- emits a one-line stderr nudge with the paste-ready canonical key
- preserves the rest of the config unchanged
- skipped when `embedding_model` is already set (forward-compat)
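
The translation step can be sketched as below. `migrateLegacyEmbeddingConfig`, the `warn` callback, and the nudge wording are hypothetical; leaving the legacy keys in place is an assumption based on "preserves the rest of the config unchanged".

```typescript
interface RawConfig {
  provider?: string;
  model?: string;
  embedding_model?: string;
  [key: string]: unknown;
}

export function migrateLegacyEmbeddingConfig(
  raw: RawConfig,
  warn: (msg: string) => void = (msg) => console.error(msg),
): RawConfig {
  // Canonical key already present, or nothing to migrate: leave untouched.
  if (raw.embedding_model || !raw.provider || !raw.model) return raw;
  const embedding_model = `${raw.provider}:${raw.model}`;
  warn(`Legacy config keys detected; set "embedding_model": "${embedding_model}"`);
  return { ...raw, embedding_model };
}
```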

Closes garrytan#1086.

Co-Authored-By: jeunessima <jeunessima@users.noreply.github.com>

* chore(test): quarantine upgrade tests (process.env mutation)

PR garrytan#1032's cherry-picked tests use the static-snapshot + try/finally
pattern for env vars instead of the project's withEnv() helper. The
test-isolation lint catches process.env mutations outside withEnv to
prevent cross-test leakage in parallel runs.

Renaming to *.serial.test.ts (the quarantine convention) is the
documented out: runs sequentially, no cross-file race. A future cleanup
PR can migrate the tests to withEnv() and drop the quarantine.

* fix(test): update brain-writer .bak assertion for centralized backup path

The v0.36.x frontmatter backup change (bd60cdf, closes garrytan#902) moved
.bak files from sibling-of-source to ~/.gbrain/backups/frontmatter/...
The old test still asserted on the sibling path, so CI failed even
though the production behavior was correct.

Updated assertion contract: backup lands under the injected backupRoot
(test-isolated), the returned backupPath ends in .bak and exists, and
no sibling .bak is created next to the source file. The pre-fix
sibling-path is now a negative assertion.

* chore: bump version and changelog (v0.36.1.0)

v0.36.1.0 — community fix wave (28 atomic fixes + 22 PRs closed as
already-shipped + 14 issues triaged).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(fix-wave): close test gaps surfaced by post-ship audit

After the fix-wave shipped, an audit found 11 commits with no new test
file. Some were inherently structural (build pipelines, shell content)
or had existing test coverage that worked either way; others had real
regression risk with no guard. This commit closes the gaps that matter.

New regression tests for:

- OAuth `verifyAccessToken` throws `InvalidTokenError` (not bare Error)
  on both expired and unknown token paths. Pre-fix, the SDK's
  `requireBearerAuth` middleware fell through to 500 instead of 401 →
  client token-refresh logic never fired (garrytan#935).

- `loadConfig` translates legacy `{provider, model}` config shape to
  the canonical `embedding_model: <provider>:<model>`. 3 cases: pure
  legacy → migrated; canonical wins over legacy when both present;
  canonical-only is untouched. Pre-fix, Voyage/Cohere/Mistral users
  silently fell through to OpenAI (garrytan#1086).

- `configDir` rejects relative paths; rejects `..` segments via both
  separators (regression guard for the Windows path acceptance fix
  garrytan#1019 / cherry-pick garrytan#1083).

- `resolveBootstrapToken` (new exported helper extracted from
  `runServeHttp`). 9 cases: unset env generates fresh, valid env
  accepted, hyphens/underscores accepted, < 32 chars rejected, special
  chars rejected, whitespace trimmed, empty string rejected, 32-char
  boundary accepted, 31-char one-short rejected. Security-critical
  validation surface (garrytan#1024).

- GET /mcp returns 405 with `Allow: POST, DELETE` (E2E case in
  `serve-http-oauth.test.ts`). Pre-fix, claude.ai and other probing
  MCP clients saw 404 and gave up (garrytan#1076).

- apply-migrations `process.exit(0)` on list / dry-run / up-to-date
  paths. Source-shape assertion locks the rule in; shell scripts
  gating on `$?` work (garrytan#1062).

- Autopilot wrapper sources `~/.zshenv` BEFORE `~/.zshrc`. zshenv is
  the canonical place for env vars in non-interactive zsh; without
  this ordering, LaunchAgent subprocesses never inherit secrets
  exported in zshrc (garrytan#966).

- `test/fix-wave-structural.test.ts` consolidates source-shape
  regression guards for fixes whose behavior is hard to runtime-test
  without heavy mocking: query cache drain (garrytan#1125), admin embed
  manifest + handler (garrytan#1090), admin register-client PKCE branch
  (garrytan#1077), PGLite v0.11.0 phase A in-process routing (garrytan#1100), query
  `--no-expand` negation (garrytan#1124). 9 source-grep assertions.

Refactored `runServeHttp` to extract `resolveBootstrapToken` as a pure
helper. The boot path now consumes the helper's tagged-union result
({kind:'ok'|'error'}); side effects (`process.exit`, `console.error`)
moved to the caller. Unit-testable without spinning up Express.

Test counts: oauth 71 (was 69), config 20 (was 14), apply-migrations
19 (was 18), autopilot-install 5 (was 4), serve-http-bootstrap-token
9 (new file), fix-wave-structural 9 (new file). Net: +28 cases across
6 files; +1 new exported function with full coverage.

Remaining audit gaps (deferred):
- e82dda0 admin embed E2E (post-deploy curl smoke covers this)
- d93fa81 apply-migrations PGLite chain E2E (already smoke-tested
  manually in the original commit; subprocess test would be flaky in
  CI without DATABASE_URL gating)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: close the two deferred E2E gaps from the post-ship audit

Both gaps now have real behavior coverage. No DATABASE_URL needed (PGLite
engine), so they run in standard unit CI alongside the rest of the suite.
Serial quarantine because both spawn subprocesses + bind ports / write
tmpdirs.

test/admin-embed-spawn.serial.test.ts (4 cases, ~6s wall-clock):
  - Spawns `gbrain serve --http` from a fresh tmpdir so `process.cwd()/
    admin/dist` does not exist — this forces the embedded-manifest
    branch (the one under test). Pre-fix, this exact setup hit 404.
  - GET /admin/ → 200 + SPA shell HTML (title + #root div), content-type
    text/html.
  - GET /admin/index.html → same body via explicit path.
  - GET /admin/agents → SPA fallback returns index.html for deep links.
  - GET /admin/api/stats → NOT 200 (regression guard: SPA fallback must
    not swallow /admin/api/* routes and silently return HTML to a JSON
    client). Closes garrytan#1090.

test/apply-migrations-pglite-spawn.serial.test.ts (3 cases, ~25s):
  - Seeds a fresh PGLite config in a tmpdir, runs `gbrain init
    --migrate-only` + `gbrain apply-migrations --yes --non-interactive`.
    Pre-fix this hit "GBrain: Timed out waiting for PGLite lock" because
    apply-migrations' pre-flight probe + v0.11.0's phase A subprocess
    both wanted the single-writer lock.
  - Asserts exit 0, no "Timed out" string, no "Phase A failed" string,
    brain.pglite file written.
  - Re-run case: idempotent — "All migrations up to date" exits 0
    (also locks in the garrytan#1062 exit-code fix end-to-end).
  - --list path exits 0 (third leg of the garrytan#1062 contract).
  Closes garrytan#1100.

Pinned bootstrap token via GBRAIN_ADMIN_BOOTSTRAP_TOKEN env so the
admin test doesn't have to scrape stderr; the startup banner format
is allowed to drift, the /health probe is the readiness contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): consolidate PGLite spawn test to one end-to-end pass

CI failed on test/apply-migrations-pglite-spawn.serial.test.ts (Ubuntu,
bun 1.3.14). The previous shape ran 3 tests × ~3 spawns each. Each
`bun run /abs/src/cli.ts` from a tmpdir cwd pays a full parse/transpile
cost (no near-cwd .bun cache); on Ubuntu CI that compounds past the
runner's per-test budget.

Consolidated to ONE test that exercises the full lifecycle in one
brain: init --migrate-only → apply-migrations --yes → re-run → --list.

Four spawns instead of eight. Local wall-clock: 32s → 11.5s. All four
assertion buckets preserved: no PGLite lock timeout, no Phase A
failure, brain.pglite written, idempotent re-run "All migrations up
to date" exits 0 (garrytan#1062 end-to-end), --list exits 0.

Per-test timeout 480_000ms as insurance against the runner's
--timeout=60000 default (bun's API spec: per-test wins).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(diag): dump apply-migrations output when CI exit != 0

The PGLite spawn test passes locally on macOS/bun 1.3.13 in ~11s
end-to-end but fails on Ubuntu/bun 1.3.14 in 4.92s with apply.exitCode
= 1 — fast enough that something is failing early, not timing out.
The runCli helper captured stdout+stderr but never printed them, so
the CI log only showed the bare assertion failure.

This commit prints the captured streams from BOTH init and apply
when the exit code mismatches expectation. After the next CI run we
can read the actual error message and diagnose the Ubuntu-specific
failure mode (likely BUN_INSTALL / HOME / PGLite WASM env quirk).
No behavior change; pure diagnostic output gate on failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): shim `gbrain` on PATH for PGLite spawn test

Root cause of the Ubuntu CI failure: the v0.11.0 orchestrator's phase B
runs `execSync('gbrain jobs smoke')`. PGLite phase A now routes
in-process (the garrytan#1100 fix), but phase B and several follow-up phases
still shell out to the `gbrain` binary on PATH. Locally the binary
resolves via `bun link`; on CI Ubuntu it does not exist on PATH, so
execSync exits 127 → orchestrator returns 'failed' → apply-migrations
exits 1. Test failed at 4.92s with exitCode=1, well before any timeout.

Verified locally by removing ~/.bun/bin/gbrain to simulate CI:
  pre-shim:  apply.exitCode=1 (same as CI)
  post-shim: apply.exitCode=0 in 8.4s

The shim writes a tiny `gbrain` executable to a tmpdir that just
`exec`s `bun run <repo>/src/cli.ts "$@"`. Prepended to PATH for the
spawned subprocesses. Mirrors the production contract (gbrain on
PATH) without depending on `bun link` having run in the CI image.

Diagnostic dump from the previous commit stays — useful insurance for
the next time something silently fails inside a spawned binary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: johnybradshaw <johnybradshaw@users.noreply.github.com>
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-authored-by: sharziki <sharziki@users.noreply.github.com>
Co-authored-by: Aashiqe10 <Aashiqe10@users.noreply.github.com>
Co-authored-by: lukejduncan <lukejduncan@users.noreply.github.com>
Co-authored-by: 100yenadmin <100yenadmin@users.noreply.github.com>
Co-authored-by: hnshah <hnshah@users.noreply.github.com>
Co-authored-by: p3ob7o <p3ob7o@users.noreply.github.com>
Co-authored-by: sliday <sliday@users.noreply.github.com>
Co-authored-by: nezovskii <nezovskii@users.noreply.github.com>
Co-authored-by: vincedk-alt <vincedk-alt@users.noreply.github.com>
Co-authored-by: sergeclaesen <sergeclaesen@users.noreply.github.com>
Co-authored-by: navin-moorthy <navin-moorthy@users.noreply.github.com>
Co-authored-by: billy-armstrong <billy-armstrong@users.noreply.github.com>
Co-authored-by: jeunessima <jeunessima@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…arrytan#1136)

* feat(dims): OpenAI text-embedding-3 Matryoshka range validation (D13)

dimsProviderOptions now fail-loud at the embed boundary when the
configured embedding_dimensions is outside the model's native range
(1..1536 for -small, 1..3072 for -large). Paste-ready fix hint in the
AIConfigError.fix field. Closes the silent-HTTP-400 path that would
have bitten OpenAI-fallback users on v0.36.0.0 ZE-default installs.

16 new test cases in test/ai/dims-openai.test.ts pinning the contract
across native-openai and openai-compatible adapter paths.
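
The range check can be sketched as follows. This is an illustration under stated assumptions, not the code in dims.ts: `DimsConfigError` stands in for the real AIConfigError, and `assertDimsInRange` is a hypothetical name. Only the ranges (1..1536 for -small, 1..3072 for -large) come from this commit.

```typescript
// Hypothetical stand-in for AIConfigError: carries a paste-ready fix hint.
class DimsConfigError extends Error {
  readonly fix: string;
  constructor(message: string, fix: string) {
    super(message);
    this.name = "DimsConfigError";
    this.fix = fix;
  }
}

// Native maximum widths as stated in the commit message.
const MAX_DIMS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
};

// Fail loudly before any request is sent when dims fall outside 1..max.
function assertDimsInRange(model: string, dims: number): void {
  const max = MAX_DIMS[model];
  if (max === undefined) return; // unknown model: nothing to enforce in this sketch
  if (!Number.isInteger(dims) || dims < 1 || dims > max) {
    throw new DimsConfigError(
      `embedding_dimensions=${dims} is outside 1..${max} for ${model}`,
      `set embedding_dimensions to a value in 1..${max}`,
    );
  }
}
```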

* feat(ai): flip defaults to ZeroEntropy zembed-1 1280d + zerank-2 reranker

Default embedding model is now zeroentropyai:zembed-1 at 1280d via
Matryoshka. Real-corpus benchmark: 2.2x faster than OpenAI, 2.6x
cheaper at regular pricing, wins 11/20 head-to-head queries.

1280 is the closest valid ZE Matryoshka step to the prior OpenAI 1536d
default (valid set: 2560/1280/640/320/160/80/40). 1024 (Voyage's step)
is NOT on ZE's list — pinned by AIConfigError fail-loud in dims.ts.
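
The "closest valid step" arithmetic above can be checked with a short sketch. The step list comes from this commit; `closestStep` and `isValidStep` are hypothetical helper names.

```typescript
// Valid ZE Matryoshka widths as listed in the commit message.
const ZE_STEPS: readonly number[] = [2560, 1280, 640, 320, 160, 80, 40];

// Pick the step with the smallest absolute distance to the target width.
function closestStep(target: number, steps: readonly number[] = ZE_STEPS): number {
  return steps.reduce((best, s) =>
    Math.abs(s - target) < Math.abs(best - target) ? s : best,
  );
}

// True only for widths that appear on the step list.
function isValidStep(dims: number, steps: readonly number[] = ZE_STEPS): boolean {
  return steps.includes(dims);
}
```

For the prior 1536d default, 1280 is 256 away and 2560 is 1024 away, so 1280 is the nearest step.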

balanced mode bundle now defaults reranker_enabled=true. zerank-2
reshuffles 60% of top-1 results in benchmarks. Missing-key fail-open
contract in src/core/search/rerank.ts handles unauthenticated cases.
Opt out with: gbrain config set search.reranker.enabled false

Existing tests updated (gateway.test.ts, search-mode.test.ts) and a
new test/balanced-reranker-default.test.ts (10 cases) pins the fail-
open invariants.

* feat(retrieval-upgrade): RetrievalUpgradePlanner + interactive prompt UX

New src/core/retrieval-upgrade-planner.ts is the consolidated planner
that computes the brain's pending retrieval-upgrade work (chunker
bumps + ZE switch) in one pass and applies the schema transition +
config updates atomically.

Tagged-union ApplyResult enum (D15): 'applied' |
'skipped_already_applied' | 'skipped_no_work' | 'declined' |
'planned' | 'failed'. No string-parsing reasons.
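
A minimal sketch of what a tagged-union result buys over string reasons. The six tags come from this commit. The `kind` discriminant, the `error` payload, and `describeResult` are assumptions for illustration, and the planner's real shape may differ.

```typescript
// Six outcomes from the commit message, modelled as a discriminated union.
type ApplyResult =
  | { kind: "applied" }
  | { kind: "skipped_already_applied" }
  | { kind: "skipped_no_work" }
  | { kind: "declined" }
  | { kind: "planned" }
  | { kind: "failed"; error: string }; // `error` field is an assumption

// Callers branch on the tag. The `never` check makes a missing case a
// compile-time error instead of a silently unhandled string.
function describeResult(r: ApplyResult): string {
  switch (r.kind) {
    case "applied":
      return "switch applied";
    case "skipped_already_applied":
      return "nothing to do: already applied";
    case "skipped_no_work":
      return "nothing to do: no pending work";
    case "declined":
      return "user declined";
    case "planned":
      return "dry run: plan only";
    case "failed":
      return `failed: ${r.error}`;
    default: {
      const unreachable: never = r;
      return unreachable;
    }
  }
}
```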

Three config keys (D12): ze_switch_prompt_shown (UI state),
ze_switch_requested (user intent), ze_switch_applied (work done).
Plus ze_switch_previous_snapshot (JSON, full prior config for --undo
per D16) and ze_switch_declined_at (90-day re-ask window).

Schema transition (D18) is atomic: DROP indexes + ALTER COLUMN +
CREATE INDEX inside a single engine.transaction(). HNSW recreation
is part of the same transaction — no silent slow-search window.

C3 eligibility logic: ze_switch_offered iff NOT on ZE + NOT declined
recently + NOT applied + (legacy default OR >100 pages).

C4 cost math: MAX(chunker_pending, dim_pending) not SUM — one
re-embed pass invalidates both surfaces simultaneously.
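
The C3 and C4 rules above reduce to two small pure functions. This is a sketch, assuming the planner has already computed the boolean inputs. The `SwitchState` shape and both function names are hypothetical.

```typescript
// Hypothetical snapshot of the inputs named in the C3 rule.
interface SwitchState {
  onZE: boolean;             // already on a ZeroEntropy embedding model
  declinedRecently: boolean; // inside the re-ask window
  applied: boolean;          // ze_switch_applied
  legacyDefault: boolean;    // still on the legacy default model
  pageCount: number;
}

// C3: offered iff NOT on ZE, NOT declined recently, NOT applied,
// and (legacy default OR more than 100 pages).
function zeSwitchOffered(s: SwitchState): boolean {
  return !s.onZE && !s.declinedRecently && !s.applied &&
    (s.legacyDefault || s.pageCount > 100);
}

// C4: one re-embed pass covers both surfaces, so pending work is the
// larger of the two counts rather than their sum.
function pendingReembedChunks(chunkerPending: number, dimPending: number): number {
  return Math.max(chunkerPending, dimPending);
}
```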

New src/core/retrieval-upgrade-prompt.ts wires the planner to a
TTY-only interactive prompt with two-line cost split (D10) and
privacy callout for the reranker flip.

Tests: test/retrieval-upgrade-planner.test.ts (24 cases) pins the
state machine. test/asymmetric-encoding-contract.test.ts (6 cases)
pins D17: search read path uses gateway.embedQuery() not embed(),
asserted via __setEmbedTransportForTests mock.

* feat(cli): gbrain ze-switch — manual lever for the ZE switch

New gbrain ze-switch CLI with --dry-run, --json, --resume, --force,
--undo, --non-interactive, --confirm-reembed, --ignore-missing-key
flags. Mirrors the upgrade prompt's UX symmetry: --undo presents a
cost-warning before re-embedding back to the prior width.

src/cli.ts: dispatch case + CLI_ONLY entry. ze-switch owns its own
engine lifecycle (mirrors the doctor pattern).

test/ze-switch-cli.test.ts (11 cases): --help, --dry-run, --json,
--non-interactive, --ignore-missing-key, --resume, --undo,
--confirm-reembed. Uses captureExit harness to test process.exit()
paths without breaking the test process.

* feat(doctor): ze_embedding_health + embedding_width_consistency checks

Two new doctor checks (D-A5):

ze_embedding_health: when embedding_model starts with zeroentropyai:,
verify ZEROENTROPY_API_KEY is set (env or config). Paste-ready setup
hint with the signup URL on failure.

embedding_width_consistency: cross-check that the configured
embedding_dimensions matches the actual vector(N) column width on
content_chunks.embedding. Catches the half-applied switch state
(schema migrated but config write crashed) with a paste-ready
gbrain ze-switch --resume hint.

Wired into runDoctor between reranker_health and the existing
sync_freshness checks. Both checks gracefully no-op on non-ZE
embedding configs.
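
The width cross-check can be sketched as a pure comparison once the column type string is in hand. This assumes the probe yields a string such as `vector(1280)`. `parseVectorWidth` and `checkWidthConsistency` are hypothetical names, and the real check's result shape is not shown in this commit.

```typescript
// Extract N from "vector(N)" or "halfvec(N)"; null if the type is unrecognized.
function parseVectorWidth(formatType: string): number | null {
  const m = /^(?:vector|halfvec)\((\d+)\)$/.exec(formatType.trim());
  return m ? Number(m[1]) : null;
}

type WidthCheck = { status: "ok" } | { status: "fail"; message: string };

// Compare the configured embedding_dimensions with the actual column width.
function checkWidthConsistency(configuredDims: number, formatType: string): WidthCheck {
  const actual = parseVectorWidth(formatType);
  if (actual === null) {
    return { status: "fail", message: `unrecognized column type: ${formatType}` };
  }
  if (actual !== configuredDims) {
    return {
      status: "fail",
      message:
        `config says ${configuredDims}d but column is ${formatType}; ` +
        `try: gbrain ze-switch --resume`,
    };
  }
  return { status: "ok" };
}
```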

test/doctor-ze-checks.test.ts (8 cases) pins both checks across
happy + missing-key + missing-config + drift paths. Uses withEnv()
helper to clear ZEROENTROPY_API_KEY for the no-key path so tests
are hermetic against contributor env state.

test/e2e/v0_28_5-fix-wave.test.ts + test/openai-compat-multimodal.test.ts:
updated to explicit-configure the gateway when the test depends on
specific dims that diverge from the v0.36.0.0 default (1280d).

* docs: README zero-based rewrite (884 -> 139 lines) + new docs files

Strip 4 months of accreted "New in v0.X.Y" hero blocks and reorganize
around what gbrain does today. 33 H2s -> 8. The Commands section
(136 lines duplicating gbrain --help) moved out; the 6-table skills
enumeration collapsed to a one-paragraph capability description with
a link to skills/RESOLVER.md.

Hero retains load-bearing facts: OpenClaw + Hermes credit, production
numbers (17,888 pages / 4,383 people / 723 companies), BrainBench
numbers (P@5 49.1% / R@5 97.9% / +31.4 lift), ZE comparison numbers,
30-min install claim. Adds one paragraph announcing the v0.36.0.0 ZE
default with the explicit gbrain config set escape for OpenAI/Voyage
users.

New files:
- docs/INSTALL.md: every install path consolidated (agent platform,
  CLI standalone, MCP server). Thin-client mode covered.
- docs/architecture/RETRIEVAL.md: why the hybrid + graph stack works.
  BrainBench numbers, why each strategy alone fails, the source-aware
  ranking + intent classification + multi-query expansion story.
- docs/ethos/ORIGIN.md: origin story lifted from the old README so
  the front door stays factual + concrete.

test/readme-hero-anchors.test.ts (5 cases) is the D9 regression
guard. Five load-bearing strings: OpenClaw, Hermes, ZE,
production-numbers regex, P@5/R@5. Light anchors that let voice/
structure evolve but block accidental loss of headline facts.

scripts/check-test-real-names.sh: allowlist entries for OpenClaw +
Hermes literals in the anchor test (it explicitly asserts those
strings appear in README).

* chore: bump version and changelog (v0.36.0.0)

ZeroEntropy as the new default for embedding (zembed-1 at 1280d via
Matryoshka) and reranker (zerank-2 cross-encoder, on by default in
balanced mode bundle). README zero-based rewrite (884 -> 139 lines).
3 new docs files. Two new doctor checks. New gbrain ze-switch CLI
with --undo for symmetric reversibility.

skills/migrations/v0.36.0.0.md tells the agent how to surface the
retrieval-upgrade prompt post-upgrade.

llms-full.txt regenerated via bun run build:llms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docs): scrub Wintermute from RETRIEVAL.md per privacy rule

* chore: rebump version 0.36.0.0 → 0.36.2.0 (queue collision)

Three open PRs were claiming v0.36.0.0 (garrytan#1130 skillpack, garrytan#1139
hindsight, garrytan#1136 this PR). Ship-aware queue allocator says this
branch lands at v0.36.2.0.

Trio audit:
  VERSION       0.36.2.0
  package.json  0.36.2.0
  CHANGELOG     ## [0.36.2.0] - 2026-05-17

Updates: VERSION, package.json, CHANGELOG header + body refs,
README "New default in v0.36.2.0" announcement + credit line,
skills/migrations/v0.36.0.0.md renamed to v0.36.2.0.md with
frontmatter + body refs updated. llms-full.txt regenerated.

* fix(test): pin gateway dim=1536 in cross-file-stateful PGLite tests

CI shard 1 reported 10 failures across `query-cache.test.ts` (6) and
`consolidate-valid-until.test.ts` (4). Both files hardcode 1536-dim
vectors but rely on `PGLiteEngine.initSchema()` to size
`vector(__EMBEDDING_DIMS__)` at the right width.

Root cause: v0.36.2.0 flipped DEFAULT_EMBEDDING_DIMENSIONS from 1536
to 1280 (ZE Matryoshka step). The gateway module is process-singleton;
when ANOTHER test file in the same shard's bun-test process configures
the gateway before us, `pglite-engine.ts:216` reads
`getEmbeddingDimensions() === 1280` and sizes the schema columns at
vector(1280). The hardcoded 1536-dim INSERTs then fail with
"expected 1280 dimensions, not 1536".

Locally these tests pass in isolation because the gateway falls back
through the try/catch at pglite-engine.ts:218 (1536 default). CI runs
multiple test files in one process, so cross-file state poisons the
schema width.

Fix: explicit `resetGateway()` + `configureGateway({embedding_dimensions:
1536, ...})` at the top of `beforeAll`, plus `resetGateway()` in
`afterAll`. Pins the schema width regardless of cross-file state.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…an#1164)

* feat: migration v68 — eval_candidates.embedding_column

Schema migration ALTERs eval_candidates to add a nullable
embedding_column TEXT column. Per-row capture metadata so
`gbrain eval replay` reproduces the same column the
capture ran against (D16 / CDX-10). NULL-tolerant: pre-v0.36
rows fall back to current default.

Renumbered v67→v68 because master claimed v67 for
facts_typed_claim_columns during this branch's lifetime.

PGLite parity via sqlFor.pglite — same ALTER IF NOT EXISTS.

* feat: dynamic embedding column — core (resolver, types, gateway, engines)

The read-path foundation for routing search through any
populated embedding column, not just OpenAI 1536.

src/core/search/embedding-column.ts (new) is the canonical
seam. Single source of truth for column → provider/dim/type
lookup. Validates registry keys via regex
(/^[a-z_][a-z0-9_]*$/), uses Object.create(null) +
Object.hasOwn so 'constructor' and other inherited names
can't masquerade as registered columns. Identifier-quoting
on SQL interpolation as defense in depth.
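
A sketch of the registry hardening described above, under stated assumptions: `ColumnConfig`, `buildRegistry`, `resolveColumn`, and `quoteIdent` are hypothetical names. The sketch uses `Object.prototype.hasOwnProperty.call` as a portable equivalent of the `Object.hasOwn` call named in the commit.

```typescript
// Registry-key pattern quoted in the commit message.
const COLUMN_NAME = /^[a-z_][a-z0-9_]*$/;

interface ColumnConfig {
  provider: string;
  dimensions: number;
}

// Build a prototype-less registry so inherited names never resolve.
function buildRegistry(entries: Record<string, ColumnConfig>): Record<string, ColumnConfig> {
  const registry: Record<string, ColumnConfig> = Object.create(null);
  for (const [name, cfg] of Object.entries(entries)) {
    if (!COLUMN_NAME.test(name)) throw new Error(`invalid column name: ${name}`);
    registry[name] = cfg;
  }
  return registry;
}

// Own-property lookup only: 'constructor' passes the regex but is not registered.
function resolveColumn(
  registry: Record<string, ColumnConfig>,
  name: string,
): ColumnConfig | undefined {
  return Object.prototype.hasOwnProperty.call(registry, name) ? registry[name] : undefined;
}

// Defense in depth: quote the identifier before SQL interpolation.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}
```

Note that `constructor` matches the regex on its own, which is why the null prototype and the own-property check are both needed.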

src/core/types.ts widens SearchOpts.embeddingColumn to
accept ResolvedColumn descriptors at the engine boundary;
adds EmbeddingColumnConfig + ResolvedColumn exports.

src/core/config.ts merges embedding_columns +
search_embedding_column from the DB plane via
loadConfigWithEngine, mirroring the existing
embedding_multimodal_model pattern. Handles the no-file
case so env-only Postgres installs see DB-plane overrides
(codex /ship #3).

src/core/ai/gateway.ts: embedQuery(text, opts) +
embed(texts, opts) accept embeddingModel + dimensions
overrides. isAvailable(touchpoint, modelOverride?) so
hybrid asks 'is the active column's provider reachable?'
not 'is the global default reachable?' (CDX-4 / D10).

Engines: searchVector accepts ResolvedColumn descriptors via
normalizeEngineColumn; engine code is config-free and
unit-testable. getEmbeddingsByChunkIds(ids, column?) so
cosineReScore hydrates from the active column instead of
always 'embedding' (CDX-3 / D9). Identifier-quoting belt at
the SQL boundary.

src/core/eval-capture.ts threads embedding_column from
hybridSearch meta into the persisted capture row.

* feat: dynamic embedding column — integration (hybrid, ops, doctor)

Wires the resolver into hybridSearch, the query op, doctor,
and the config command.

src/core/search/hybrid.ts: resolves the column once at the
boundary, threads the descriptor into engine calls, routes
embedQuery through the resolved column's provider/dims, and
calls isCacheSafe (not isDefaultColumn) for cache skip so
user overrides of the 'embedding' builtin can't leak across
vector spaces (CDX-4). cosineReScore now hydrates from the
active column.

src/core/search/mode.ts: KNOBS_HASH_VERSION 2→3, append-only
new fields col= and prov= alongside floor_ratio. Cache rows
from different columns or providers now sit in different
keyspaces — cross-column contamination impossible.
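
Why the extra fields separate keyspaces can be shown with a sketch of the pre-hash key material. The `v=`, `floor_ratio`, `col=`, and `prov=` names come from this commit. The delimiter, field order, and `knobsKey` itself are assumptions, and the real hash input in mode.ts may differ.

```typescript
// Hypothetical subset of the knobs that feed the cache key.
interface Knobs {
  floorRatio: number;
  column: string;   // active embedding column
  provider: string; // provider behind that column
}

const KNOBS_HASH_VERSION = 3;

// Append-only key material: new fields go at the end so older fields keep
// their position. Any difference in col= or prov= yields a different key.
function knobsKey(k: Knobs): string {
  return [
    `v=${KNOBS_HASH_VERSION}`,
    `floor_ratio=${k.floorRatio}`,
    `col=${k.column}`,
    `prov=${k.provider}`,
  ].join("|");
}
```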

src/core/operations.ts: query op accepts embedding_column
param for per-call A/B benchmarking. search op (keyword-only)
deliberately does NOT (CDX-9 / D15): the param would be silently
ignored there.

src/commands/doctor.ts: new embedding_column_registry
check. Batch format_type probe (D13) catches dim drift
that information_schema.columns.udt_name can't.
Batch pg_indexes probe (D5) warns on missing HNSW. Coverage
% on active column, gates at <90% (D14), short-circuits on
empty brains (codex /ship #5).

src/commands/config.ts: validates embedding_columns JSON
shape at set time, runs the coverage gate when setting
search_embedding_column, uses Object.hasOwn for the
registry lookup.

src/commands/eval-replay.ts: replay re-runs queries against
the captured embedding_column so post-flip-config replays
don't surface as false-positive regressions.

* test: dynamic embedding column — unit + e2e coverage

50 unit cases for the resolver (resolution chain, registry
merge, validation, prototype pollution, descriptor
passthrough, isCacheSafe, normalizeEngineColumn).

8 gateway override cases — embeddingModel + dimensions
flow into providerOptions, isAvailable(touchpoint, override)
routes to the right recipe, unknown models throw clean.

4 cosineReScore + 6 ops + 5 knobs-hash + 7 mode + 9 PGLite
E2E + 7 Postgres E2E + 5 eval-replay column metadata.

Postgres E2E (gated on DATABASE_URL) covers halfvec(2560)
end-to-end on real pgvector, EXPLAIN-visible HNSW index
on the alternate column, format_type-based dim drift catch,
and the <90% coverage gate.

Pins every codex /ship fix: prototype-pollution rejection
('constructor' as column name), descriptor passthrough
validation (rejects SQL-shaped strings in dimensions),
isCacheSafe semantics (space-based, not name-based).

Total: 141 new + extended cases, all green.

* chore: bump version and changelog (v0.36.3.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync to v0.36.3.0

Add CLAUDE.md key-files entry for src/core/search/embedding-column.ts.
Annotate hybrid.ts, gateway.ts, doctor.ts, and migrate.ts entries with
v0.36.3.0 wave changes (ResolvedColumn threading, embedQuery model
override, embedding_column_registry check, migration v68). Document
knobs_hash v=2 → v=3 bump under the Search Mode section.

Regenerate llms-full.txt from the updated CLAUDE.md so the auto-checked
bundle matches source (build-llms.test.ts CI guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): two CI failures from v0.36.3.0

1. test/loadConfig-merge.test.ts: update the 'returns null when base
   config is null' contract test. Pre-v0.36 the function returned null
   for null base; the codex /ship #3 fix changed that to synthesize a
   minimal `{ engine: 'postgres' }` so env-only installs see DB-plane
   overrides. Test now pins the new contract + adds a round-trip case
   asserting the merge actually surfaces `embedding_columns` /
   `search_embedding_column` set via gbrain config set on a null base.

2. test/schema-bootstrap-coverage.test.ts was failing because
   eval_candidates.embedding_column (added by migration v68) wasn't
   covered by applyForwardReferenceBootstrap. Fix: add the column to
   PGLITE_SCHEMA_SQL's eval_candidates CREATE TABLE definition (and
   src/schema.sql for parity) so fresh installs get it natively. The
   coverage test's third tier (schemaCreateTableCols) now finds it.
   Regenerated schema-embedded.ts via bun run build:schema.

Schema-blob path is cleaner than COLUMN_EXEMPTIONS — fresh installs
skip the migration entirely; upgrade installs still run v68.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stale refs (garrytan#1201)

A community member reported docs 'have quite a bit of drift and some broken
links' and contradictions like 'says don't use bun but also to use bun.' This
PR is a top-to-bottom audit + fix across every doc file at the repo root and
under docs/. Where docs disagreed with each other, the code was the tie-breaker.

## Categories of fix

### 1. Stale CLI commands (skillpack install → scaffold)

`gbrain skillpack install` was retired in v0.36.0.0 (replaced by the
scaffold/reference/migrate-fence model). The CLI now errors out with a hint:

    $ gbrain skillpack install
    Error: 'gbrain skillpack install' was removed in v0.33.
    Use 'gbrain skillpack scaffold <name>' instead.

But the docs still recommended it:

- README.md line 29 — primary install path
- docs/INSTALL.md line 12 — primary install path

Both updated to `gbrain skillpack scaffold --all` with the v0.36.0.0 retirement
explained inline + the migrate-fence escape hatch for users upgrading from older
releases.

### 2. The 'bun install -g vs bun link' contradiction

The community member's exact complaint. The drift:

- README.md + docs/INSTALL.md: recommended `bun install -g github:garrytan/gbrain`
- INSTALL_FOR_AGENTS.md line 29: 'Do NOT use `bun install -g github:garrytan/gbrain`.'

Reading the code + CHANGELOG: `bun install -g` IS the canonical path. Bun
occasionally blocks the top-level postinstall hook on global installs (issue garrytan#218),
but the postinstall now prints a loud recovery hint when that happens, and
`gbrain doctor` flags `schema_version: 0` and routes users to
`gbrain apply-migrations --yes`. The 'do not use' warning was correct in 2024
when the postinstall silently swallowed errors with `|| true`; it's stale now.

Reconciled:

- INSTALL_FOR_AGENTS.md Step 1: now recommends `bun install -g` as the primary
  path, documents garrytan#218 as a known issue with the recovery command, and keeps
  `git clone + bun link` as a documented fallback.
- AGENTS.md Install (5 min): same reconciliation; clone path is the fallback,
  not the default.
- docs/INSTALL.md CLI standalone: added the garrytan#218 callout so the deterministic
  fallback is one click away when the default fails.

### 3. Broken internal links

- README.md → `docs/integrations/voice.md` (file doesn't exist). The real voice
  recipe lives at `recipes/twilio-voice-brain.md` (Twilio + OpenAI Realtime).
  Fixed to point there with an accurate one-line summary.
- CONTRIBUTING.md → `docs/SQLITE_ENGINE.md` (file doesn't exist; superseded by
  PGLite per docs/ENGINES.md). Replaced with a paragraph explaining the
  supersession and pointing at the live ENGINES.md.
- docs/GBRAIN_V0.md → `docs/SQLITE_ENGINE.md` (2 references; same supersession).
  Added a historical-doc banner at the top + rewrote both references to point at
  the current ENGINES.md.

### 4. Stale API key recommendations

INSTALL_FOR_AGENTS.md Step 2 only mentioned OpenAI + Anthropic. As of v0.36.2.0
ZeroEntropy is the default embedding + reranker stack (README opens with this);
the agent install guide didn't reflect it. Added `ZEROENTROPY_API_KEY` as the
default, kept OpenAI/Voyage as documented fallbacks, noted that keys can live in
`~/.gbrain/config.json` (file plane) or env.

### 5. Stale upgrade workflow

INSTALL_FOR_AGENTS.md 'Upgrade' section assumed the clone+bun-install model
(`cd ~/gbrain && git pull && bun install && gbrain init && gbrain post-upgrade`)
and didn't mention `gbrain upgrade` (the single-command path that exists in the
CLI today: binary self-update + schema migrations + post-upgrade prompts in one).
Split into two paths — `gbrain upgrade` for the bun-install-g case (now the
default per Step 1), clone-path for the fallback case.

Also fixed AGENTS.md 'Migrate' bullet (was `gbrain apply-migrations` only;
now leads with `gbrain upgrade` and keeps apply-migrations as the manual
schema-only path).

### 6. Stale cron-workflow

INSTALL_FOR_AGENTS.md Step 7 referenced cron docs but didn't mention
`gbrain autopilot --install` (the built-in self-maintaining daemon that
exists in the CLI today) or `gbrain sync --watch` (continuous loop). Added
both as alternatives to platform-cron glue.

### 7. ZeroEntropy version typo

docs/INSTALL.md said 'the v0.36.0.0 ZE switch' — ZE landed in v0.36.2.0
(v0.36.0.0 was the skillpack-scaffold retirement). Fixed.

## What I did NOT change

- CHANGELOG.md, CLAUDE.md, TODOS.md prose mentions of historical commands like
  `gbrain skillpack install` are correct as history — they're documenting what
  was true in past releases. Only forward-looking docs got updated.
- The 'broken link' false-positive matches in CHANGELOG / CLAUDE / TODOS are
  inside code-fence examples or regex patterns (`[Name](people/slug)`,
  `[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])`, `[--json](interrupted)`); they're
  illustrative syntax, not real links. Leaving alone.
- llms.txt / llms-full.txt regenerated via `bun run build:llms` so the
  agent-fetch documentation map matches the new content.

## Verification

- `bun run src/cli.ts --help` cross-checked against every command/flag the
  install docs reference: init, doctor, apply-migrations, upgrade, post-upgrade,
  skillpack scaffold/reference/migrate-fence, embed --stale, sync --watch,
  autopilot --install, dream, integrations list, extract links/timeline,
  graph-query, query, search modes — all real, all current.
- `bun run src/cli.ts skillpack install` confirmed to error out with the
  retirement hint pointing at scaffold (proves the README guidance was actively
  misleading users into a dead-end).
- Re-ran the broken-internal-link scanner across all root .md + docs/**/*.md;
  zero real broken links remain (5 residual matches are illustrative syntax
  inside prose, not actionable links).

Co-authored-by: garrytan-agents <agents@garrytan-agents.local>
…--remediate + Minions (garrytan#1193)

* feat(schema): op_checkpoints table + doctor_run_id partial GIN (v67+v68)

T1 of brain-health-100 wave. Two new migrations underpin autonomous
remediation via Minions:

- v67 op_checkpoints — shared checkpoint table for long-running ops
  (embed, extract, lint, backlinks, reindex, integrity). Pre-fix each
  op had its own file-backed checkpoint or none. PRIMARY KEY (op,
  fingerprint) lets `extract links` and `extract timeline` (or
  `reindex --markdown` vs `--code`) coexist without colliding on
  shared keys.

- v68 minion_jobs_doctor_run_id_idx — partial GIN on
  `minion_jobs.data WHERE data ? 'doctor_run_id'`. Indexes only
  doctor-submitted jobs so audit-trail queries don't sequential-scan
  months of unrelated cron history. PGLite skips via empty sqlFor.

Applied to src/schema.sql + src/core/pglite-schema.ts so both engines
get the table on fresh-install. Bootstrap coverage test +
122-case migrate test both pass.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(D12 + folded scope B from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): op-checkpoint module — DB-backed checkpoint primitive

T2 of brain-health-100 wave. Six exports plus per-op fingerprint helpers:

  loadOpCheckpoint(engine, key)     → string[]   (completed keys; [] if none)
  recordCompleted(engine, key, ks)  → void       (UPSERT atomic)
  clearOpCheckpoint(engine, key)    → void       (clean-exit drop)
  resumeFilter(all, completed)      → string[]   (pure; drives batched walks)
  purgeStaleCheckpoints(engine, ttl)→ number     (cycle purge phase consumer)

Fingerprint helpers:
  fingerprint(params)               — sha8 of canonical-JSON
  embedFingerprint(p)               — model+dim+slug+source variation
  extractFingerprint(p)             — mode (links vs timeline)
  reindexFingerprint(p)             — markdown vs code vs slug + chunker_version
  lintFingerprint, backlinksFingerprint, integrityFingerprint, importFingerprint

Canonical-JSON over keys-sorted ensures the same params produce the
same fingerprint across runs and hosts. sha8 (8 hex chars from sha256)
is short enough for filenames + UI but collision-resistant for the
expected per-op invocation diversity.
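
A minimal sketch of a sha8-over-canonical-JSON fingerprint, assuming Node's `node:crypto`. The `fingerprint` name comes from the list above, but this signature, the `Json` type, and `canonicalJson` are assumptions, not the module's actual code.

```typescript
import { createHash } from "node:crypto";

type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Serialize with object keys sorted at every level so that key order in the
// input never changes the output.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const keys = Object.keys(value).sort();
  const body = keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k] as Json)}`);
  return `{${body.join(",")}}`;
}

// sha8: the first 8 hex characters of the sha256 digest.
function fingerprint(params: Json): string {
  return createHash("sha256").update(canonicalJson(params)).digest("hex").slice(0, 8);
}
```

With this shape, `{ mode: "links" }` and `{ mode: "timeline" }` hash to different fingerprints, which is what lets the two extract modes hold separate checkpoints.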

DB-backed for both engines (PGLite has the table too via v67). Lost-
write on partial DB failure is non-fatal — caller continues, next run
re-walks (cheap for hash-short-circuited ops like embed/import).
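
The resume step is the pure part of the design and is easy to sketch. The name and arrow signature follow the export list above. The body is an assumption about the obvious implementation.

```typescript
// Pure: returns the keys still to process, preserving the order of `all`.
// No I/O, so a batched walk can call it after each checkpoint load.
function resumeFilter(all: readonly string[], completed: readonly string[]): string[] {
  const done = new Set(completed);
  return all.filter((key) => !done.has(key));
}
```

If a checkpoint write is lost, `completed` is simply shorter on the next run and the missing keys are walked again, which matches the non-fatal lost-write behavior described above.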

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(D12 + codex #10–16 from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): brain-score-recommendations — shared data layer

T4 of brain-health-100 wave. Pure module — no engine I/O. Takes a
BrainHealth snapshot + RecommendationContext, returns ordered
Remediation[] ready to feed the doctor remediation plan OR features
--auto-fix.

Three public exports:
  computeRecommendations(health, ctx)  → Remediation[]
  classifyChecks(checks, ctx)          → CheckClassification[]
  maxReachableScore(health, classes)   → number (0-100 ceiling)

D13 — three-state classification per check: remediable / human_only /
blocked. The plan ONLY emits remediable items; blocked surfaces
alongside as informational with the missing prereq (no API key, etc.).
Closes the spin-loop bug on empty / API-key-missing brains (codex #20).

D14 — every Remediation has a stable string id (sync.repo, embed.stale,
backlinks.fix, extract.all). depends_on references ids, not check names.

D9 — idempotency_key is content-hash from canonical-JSON of params.
Same intent across runs = same key; failed-row replay via :r<N> suffix
is the --remediate loop's job, not this module's.

Scope item +A (cost-budget gate) — Remediation.est_usd_cost populated
for embed (chars × pricePerMTok from embedding-pricing.ts) and Anthropic
jobs (estimateAnthropicCost helper). doctor --remediate --max-usd N
gates submission against est_total_usd_cost.

Both consumers (doctor + features per D15) import from here. Features
executes inline (D15 contract preserved), doctor submits via queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handlers): 11 new Minion handlers + 3 added to PROTECTED + sync noExtract fix

T5 of brain-health-100 wave.

PROTECTED_JOB_NAMES extension (D11): synthesize, patterns, consolidate.
These cycle phases internally submit `subagent` jobs with
allowProtectedSubmit=true, so they CAN spend Anthropic credits.
Treating them as "data-quality maintenance" was a misread surfaced by
the codex outside-voice review (#6). Protected gate ensures only
trusted local callers (CLI, autopilot, doctor --remediate) can submit;
an OAuth-scoped MCP client can't burn the user's API budget by
submitting a synthesize job over HTTP.

11 new handlers registered in jobs.ts registerBuiltinHandlers:

  PROTECTED (3) — phase-wrappers that spawn subagent children:
    synthesize, patterns, consolidate

  Open (8) — DB/fs writes only, no LLM spend:
    reindex, repair-jsonb, orphans, integrity, purge,
    extract_facts, resolve_symbol_edges, recompute_emotional_weight

Phase-wrappers all delegate to `runCycle({ phases: [name] })` rather
than extracting standalone phase functions. Cycle.ts already owns the
lock + abort signal + progress reporter per D10, so the wrapper is a
one-liner and cycle.ts remains the single source of truth for phase
semantics. Pragmatic deviation from the plan's "extract 6 standalone
runXxxPhase functions" — smaller diff, equivalent correctness.

Standalone `sync` handler now passes `noExtract: true` (codex #5 fix).
Pre-fix, doctor's remediation plan emitting [sync, extract] caused
double-extraction (performSync inline-extract + standalone extract
job). Now sync defers extract to the dedicated handler. Callers that
want inline extract pass { noExtract: false } in job params.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T5 + D10 + D11 + codex #5/#6 from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): --remediation-plan + --remediate CLI surfaces

T6 of brain-health-100 wave. The headline user-facing capability:
agents drive brain health to target score via autonomous Minions
remediation.

Two new flags on `gbrain doctor`:

  --remediation-plan [--json] [--target-score N]
    Read-only. Emits ordered Remediation[] from BrainHealth + context.
    Uses cheap path (D7) — engine.getHealth() + computeRecommendations,
    NOT a full doctor walk. JSON shape is stable agent contract.

  --remediate [--yes] [--target-score N] [--max-jobs N] [--max-usd N]
              [--dry-run] [--json]
    Sequential submit (D3) with D5 cascade on failure, D7 scoped
    recheck between steps, D9 content-hash idempotency keys, D13
    three-state remediation filtering (only remediable jobs enter
    the loop), +A cost-budget gate via --max-usd.

Check.remediation field added as additive optional (DoctorReport
schema_version stays at 2 per D4).

PGLite path: synchronous in-process execution with short polling.
Postgres path: durable queue submission with waitForCompletion.

The --remediate loop:
  1. Compute initial plan from BrainHealth
  2. Refuse if --target-score > maxReachableScore(health, classes)
  3. Refuse if est_total_usd_cost > --max-usd
  4. For each step in order:
     - Skip if depends_on intersects aborted set (D5)
     - queue.add with content-hash idempotency_key (D9)
     - waitForCompletion with timeout
     - Recompute plan from fresh health (D7 scoped recheck)
  5. Exit 0 if all completed; 1 if any failed/aborted
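
The cascade in step 4 and the exit rule in step 5 can be sketched as a synchronous loop. This is a simplification under stated assumptions: the real loop submits to a queue, waits for completion, and recomputes the plan between steps, all of which are omitted here. `Step`, `Outcome`, and `runPlan` are hypothetical names, and treating a skipped step as aborted (so the cascade is transitive) is an assumption.

```typescript
interface Step {
  id: string;            // stable id, e.g. "sync.repo"
  depends_on: string[];  // ids of prerequisite steps
}

type Outcome = "completed" | "failed" | "skipped";

// `run` stands in for submit + wait; it returns true on success.
function runPlan(
  steps: readonly Step[],
  run: (step: Step) => boolean,
): { outcomes: Map<string, Outcome>; exitCode: 0 | 1 } {
  const aborted = new Set<string>();
  const outcomes = new Map<string, Outcome>();
  for (const step of steps) {
    // Skip anything whose prerequisites failed or were themselves skipped.
    if (step.depends_on.some((dep) => aborted.has(dep))) {
      aborted.add(step.id);
      outcomes.set(step.id, "skipped");
      continue;
    }
    const ok = run(step);
    if (!ok) aborted.add(step.id);
    outcomes.set(step.id, ok ? "completed" : "failed");
  }
  // Exit 0 only if every step completed.
  return { outcomes, exitCode: aborted.size === 0 ? 0 : 1 };
}
```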

doctor_run_id UUID stamps every submitted job's data field so
operators can later query `SELECT * FROM minion_jobs WHERE
data->>'doctor_run_id' = '<uuid>'` (indexed via v68 partial GIN).

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T6 + D1/D3/D5/D7/D9/D13 + folded scope A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): maybeBackground helper + apply --background to embed

T7 of brain-health-100 wave. New helper in src/core/cli-options.ts
formalizes the --background flag pattern. Same semantics in TTY and
cron per D9 (submit-and-exit always; --background --follow execs
`gbrain jobs follow <id>` after submission).

  await maybeBackground({
    engine, args, jobName: 'embed',
    paramBuilder: (cleanArgs) => ({ stale, all, ... }),
  })
  // returns true if backgrounded → caller exits

Content-hash idempotency key (D9): `cli:embed:sha8(canonical-JSON(params))`.
No time-slot. Same intent across runs = same key. Failed-row replay
is the doctor --remediate loop's job, not this path's.

PGLite degrades to inline execution with a clear stderr note
("PGLite has no worker daemon; running inline"). NOT a no-op,
NOT silent — doc-stated semantic difference because PGLite has no
worker daemon.

Applied to `gbrain embed` as the reference integration. The other 6
commands (extract, lint, backlinks, reindex, integrity, pages) adopt
the same 4-line pattern at the top of their entry function — follow-up
in a smaller diff once the helper proves out in production.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T7 + D9 + Gap 6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(autopilot): targeted-submit loop + op_checkpoints GC in purge phase

T8 of brain-health-100 wave.

Autopilot dispatch changes (src/commands/autopilot.ts):

Pre-fix: every tick submitted ONE autopilot-cycle job, full phase
set, regardless of brain state. On a healthy brain pure overhead; on
a degraded brain bundled fast wins with slow phases so user waited
for the slowest.

New decision logic (T8 from plan):
  - score >= 95 AND empty plan AND <60min since last full → SLEEP
  - score >= 95 AND empty plan AND >=60min → submit autopilot-cycle
    (phase-coupling exercise)
  - plan <= 3 steps AND est_total < 5min → submit individual handlers
    (targeted; uses D9 content-hash idempotency keys per step;
    maxWaiting:1 per submit per codex #17)
  - else → submit autopilot-cycle (the hammer)
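
The decision table above can be sketched as a pure function, evaluating the rules in the order listed. `TickInput`, `Decision`, and `decideTick` are hypothetical names. The thresholds (95, 60 minutes, 3 steps, 5 minutes) come from this commit.

```typescript
type Decision = "sleep" | "autopilot-cycle" | "targeted";

// Hypothetical per-tick inputs derived from the cheap health path.
interface TickInput {
  score: number;
  planSteps: number;           // number of remediation steps in the plan
  estTotalMinutes: number;     // estimated total runtime of the plan
  minutesSinceLastFull: number;
}

function decideTick(t: TickInput): Decision {
  // Healthy brain with nothing to do: sleep, unless a full cycle is overdue.
  if (t.score >= 95 && t.planSteps === 0) {
    return t.minutesSinceLastFull < 60 ? "sleep" : "autopilot-cycle";
  }
  // Small, fast plan: submit the individual handlers.
  if (t.planSteps <= 3 && t.estTotalMinutes < 5) return "targeted";
  // Everything else: the full cycle.
  return "autopilot-cycle";
}
```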

D10 cycle-lock invariant guarantees targeted-submit and autopilot-cycle
can never run concurrently (both acquire gbrain-cycle), closing the
"60-min floor double-processes queued targeted jobs" failure mode.

Computation uses cheap path (D7) — engine.getHealth() + computeRecommendations,
NOT a full doctor walk. Adds ~1 SQL count query per tick; negligible
on a 50K-page brain.

PROTECTED handlers (synthesize/patterns/consolidate) are submitted with
allowProtectedSubmit:true; autopilot is a trusted local caller.

Cycle purge phase (src/core/cycle.ts):

Added op_checkpoints GC (+C folded scope item). 7-day TTL — any
reasonable long-running op finishes inside that window. Non-fatal
on pre-v67 brains (table missing).
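
As a rough illustration of the 7-day TTL rule, the sketch below filters an in-memory list. The real GC runs against the op_checkpoints table; `Checkpoint`, `keepFreshCheckpoints`, and the `updatedAt` field are hypothetical names used only for this example.

```typescript
// Hypothetical in-memory model of the op_checkpoints TTL rule.
interface Checkpoint {
  fingerprint: string;
  updatedAt: Date;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Returns the checkpoints that survive a purge at time `now`.
function keepFreshCheckpoints(
  rows: Checkpoint[],
  now: Date,
  ttlMs: number = SEVEN_DAYS_MS,
): Checkpoint[] {
  return rows.filter((r) => now.getTime() - r.updatedAt.getTime() <= ttlMs);
}
```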

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T8 + D7/D9/D10 + codex #17 + folded scope +C).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(core): brain-score-recommendations + op-checkpoint unit tests

T10 of brain-health-100 wave — load-bearing decision-pinning tests.

test/brain-score-recommendations.test.ts (22 cases):
  - Healthy brain → empty plan
  - Per-component remediation paths (sync, embed, backlinks, extract)
  - depends_on wiring (extract → sync; embed → sync when stale)
  - Severity ordering (critical > high > medium > low)
  - D6 #5 determinism: same input twice → byte-identical output
  - D9 idempotency keys: content-hash format, no time-slot
  - D9 source isolation: different --source → different key
  - D13 status field always 'remediable' in output
  - +A cost-estimate populated for embed
  - classifyChecks: remediable / blocked / human_only triage
  - maxReachableScore: all-remediable → 100; all-blocked → current

test/op-checkpoint.test.ts (20 cases):
  - fingerprint stability + key-order invariance (canonical-JSON)
  - codex #11: extract links vs timeline get different fingerprints
  - codex #12: reindex markdown vs code get different fingerprints
  - codex #15: embed model+dim variation produces different fingerprints
  - reindex chunker_version bump invalidates checkpoint
  - DB round-trip (load → record → load)
  - Cross-fingerprint isolation (linksKey vs timelineKey)
  - clearOpCheckpoint idempotency on missing rows
  - resumeFilter purity (no I/O, deterministic)
  - purgeStaleCheckpoints TTL respect
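
The key-order invariance tested above follows from hashing a canonical JSON form of the operation parameters. A minimal sketch, under stated assumptions: `canonicalize` and `opFingerprint` are hypothetical names, and FNV-1a is a self-contained stand-in because the commit message does not say which hash the real code uses.

```typescript
// Hypothetical sketch: canonical-JSON fingerprinting with a stand-in hash.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every level so key order never matters.
function canonicalize(v: Json): string {
  if (v === null || typeof v !== "object") return JSON.stringify(v);
  if (Array.isArray(v)) return "[" + v.map((x) => canonicalize(x)).join(",") + "]";
  const keys = Object.keys(v).sort();
  return (
    "{" +
    keys.map((k) => JSON.stringify(k) + ":" + canonicalize(v[k] as Json)).join(",") +
    "}"
  );
}

// 32-bit FNV-1a, used here only to keep the example dependency-free.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// The fingerprint covers the op name plus every parameter that changes the work.
function opFingerprint(op: string, params: Json): string {
  return op + ":" + fnv1a(canonicalize(params));
}
```

Because every parameter feeds the hash, changing the embed model or dimension, or the reindex chunker version, yields a different key, so a checkpoint recorded for one configuration is not resumed under another.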

42 new tests, all pass. PGLite engine + resetPgliteState pattern per
CLAUDE.md test-isolation guide.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T10 + D6 #5 + D9 + D12 + D13 + codex #11/#12/#15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v0.36.0.0 — brain-health-100 wave + docs/llms refresh

T12 of brain-health-100 wave. VERSION + package.json bumped 0.35.6.0
→ 0.36.0.0. CHANGELOG entry leads ELI10 ("your agent can now drive
your brain to 90/100 by itself, on a cron, without you watching")
then drills into the precise mechanics per CLAUDE.md voice rules.

llms.txt + llms-full.txt regenerated via bun run build:llms.

Trio audit (CLAUDE.md mandatory pre-push check):
  VERSION:     0.36.0.0
  package.json: 0.36.0.0
  CHANGELOG:   ## [0.36.0.0] - 2026-05-18

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update README/CLAUDE/AGENTS/maintain for v0.36.4.0 brain-health-100 wave

- README.md: New-in-v0.36.4.0 callout — `gbrain doctor --remediate` headline,
  autopilot health-aware tick, eleven new background-job types, three PROTECTED.
- CLAUDE.md: Key Files entries for `op-checkpoint.ts`, `brain-score-recommendations.ts`,
  doctor.ts / jobs.ts / protected-names.ts / autopilot.ts / cycle.ts / embed.ts /
  cli-options.ts extensions; new "Key commands added in v0.36.4.0" section.
- AGENTS.md: Common-tasks entry pointing agents at the one-command remediation loop.
- skills/maintain/SKILL.md: Autonomous Phase (gbrain doctor --remediate) at the top,
  manual per-dimension walk preserved as the fallback path.
- llms-full.txt: regenerated to pick up the CLAUDE.md changes (project rule).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): respectful tone on spend caps for v0.36.4.0

Reframed the cost-budget callout. Pre-fix language said the spend cap
prevents a synthesize loop from "burning $100 of Anthropic credits
while you're at lunch" — casually treating $100 as the throwaway number
is tone-deaf. $100 is a meaningful amount for many people.

New language: "spend cap so a synthesize loop can't run up your
Anthropic bill while you're at lunch. The cap is yours to set per run."
And: "Pass --max-usd 5 (or whatever cap you're comfortable with)."
And: "Pick the cap that fits your wallet."

Also reframed three adjacent lines:
- "healthy brains stop burning cycles" → "stop spending tokens on
  work that has nothing to do"
- "agent can't submit them and burn your API budget" → "can't submit
  them on your behalf. Your provider bill stays in your hands"
- Table cell "Cron with cost cap" / "--max-usd 5" → "Cron with spend
  cap" / "--max-usd N"

llms-full.txt regenerated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…"database_url"]) (garrytan#1192)

* v0.36.5.0 feat: secure DATABASE_URL access for shell jobs (inherit: ["database_url"])

Replaces PR garrytan#1137's plaintext-config / plaintext-env workarounds with code.
Shell-job params gain `inherit: ["database_url"]`, validated pre-enqueue in
both the CLI (`gbrain jobs submit`) and `submit_job` MCP op handler. Worker
resolves the value from its own loadConfig() at child-spawn time; the
persisted `minion_jobs.data` row stores only the name. Plain
`env: { GBRAIN_DATABASE_URL: ... }` / `env: { DATABASE_URL: ... }` /
`env: { GBRAIN_DIRECT_DATABASE_URL: ... }` are rejected pre-enqueue with a
paste-ready hint pointing at `inherit:`.

Codex pre-landing review caught two bypasses + one missing shadow name:
- H1: cmd/argv inline-secret regex scan (cmd:"GBRAIN_DATABASE_URL=... gbrain
  sync" was a clean bypass — fixed)
- H3: GBRAIN_DIRECT_DATABASE_URL added to shadowKeys
- H2: honest docs about output-side leakage (stdout_tail/stderr_tail can still
  carry the value if the script prints it; that's the script author's
  responsibility, not gbrain's)

Also: gbrain doctor learns home_dir_in_worktree (warns when ~/.gbrain lives
inside a git worktree); ~/.gbrain/.gitignore retroactive via saveConfig +
post-upgrade.

New canonical guide: docs/guides/agent-to-gbrain.md (two-domain framing for
downstream agent authors: MCP ops via OAuth vs localOnly admin ops via
shell-job inherit:).

Closes garrytan#1137. Tests: +53 new (21 validator + 12 inherit-record + 6
ensureGitignore + 5 doctor + 2 PGLite E2E + 7 codex-driven H1/H3 cases).

Credit: @WinterMute filed PR garrytan#1137 which made the env-stripping gap visible
enough to fix in code. Thank you.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* v0.36.5.0 redesign: free-form inherit:, drop closed enum

User feedback: "agent spawning minions should have agency to do what it wants
with secrets and pass only the ones that it needs. don't be a security nazi
please."

Replaces the closed INHERITABLE enum (database_url only) with three small
helpers in shell-inherit.ts:

- INHERIT_NAME_RE: snake_case shape guard. Rejects __proto__, leading
  underscore, uppercase, path-traversal. Prototype-pollution defense.
- deriveEnvKey(name): config-key → child-env-key. Uppercase by default with
  one override: database_url → GBRAIN_DATABASE_URL.
- resolveInheritValue(cfg, name): value lookup with Object.hasOwn.
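
A minimal sketch of how the three helpers could fit together. The helper names come from the commit message, but the bodies, the exact regex, and the config type are assumptions; the actual shell-inherit.ts may differ. `Object.prototype.hasOwnProperty.call` is used in place of `Object.hasOwn` only so the example compiles on older targets.

```typescript
// Assumed shape guard: lowercase snake_case, must start with a letter,
// no leading, trailing, or doubled underscores.
const INHERIT_NAME_RE = /^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$/;

// The one override named in the commit message.
const ENV_KEY_OVERRIDES = new Map<string, string>([
  ["database_url", "GBRAIN_DATABASE_URL"],
]);

// config-key -> child-env-key: uppercase unless an override exists.
function deriveEnvKey(name: string): string {
  if (!INHERIT_NAME_RE.test(name)) {
    throw new Error("invalid inherit name: " + name);
  }
  return ENV_KEY_OVERRIDES.get(name) ?? name.toUpperCase();
}

// Own-property lookup only, so inherited keys such as "constructor"
// never resolve to a value.
function resolveInheritValue(
  cfg: Record<string, unknown>,
  name: string,
): string | undefined {
  if (!INHERIT_NAME_RE.test(name)) return undefined;
  if (!Object.prototype.hasOwnProperty.call(cfg, name)) return undefined;
  const v = cfg[name];
  return typeof v === "string" && v.length > 0 ? v : undefined;
}
```

With this shape, `__proto__`, `_x`, `Database_Url`, and `../etc` all fail the regex before any lookup happens, and a missing value comes back as `undefined` so the caller can fail fast.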

inherit: now accepts any snake_case config-key the worker has. Agent picks
what it needs per-job (database_url, anthropic_api_key, voyage_api_key, or
any custom field). Validator does NOT police WHICH keys — single-uid trust
model treats agent as peer of worker.

Drops the v0.36.5.0-RC rules that were paternalistic for the actual threat
model:
- closed-enum check
- env-shadow rejection
- cmd/argv inline-secret scan

Keeps the parts that defend real problems:
- pre-enqueue validation (closes the persistence-before-throw window)
- snake_case regex (prototype-pollution + audit-log readability)
- fail-fast on missing config value (UX guardrail, not security)

Tests: shell-validate (existing rules + new free-form + prototype-pollution
defense + T1 regression guard) and shell-inherit (regex matrix, deriveEnvKey
per-name, resolveInheritValue with hasOwn defense). E2E case now exercises
inherit:["anthropic_api_key"] to prove genuinely free-form.

Docs and CHANGELOG rewritten to reflect the open design + the design-arc
story (closed → cut → free-form). Migration file too.

7653 unit tests green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* v0.36.5.0 add: redact_secrets opt-in for stdout/stderr scrubbing

Honest defense for the documented output-side leakage. When a script prints
an inherited secret, the value lands plaintext in
result.stdout_tail / result.stderr_tail / error_text. v0.36.5.0 adds:

- `redact_secrets: true` ShellJobParams field
- `--redact-secrets` CLI convenience flag on `gbrain jobs submit shell`
- shell-redact.ts: pure `redactSecretsInText(text, secrets)` helper
  (string-mode replaceAll; regex metachars in values stay literal)
- Handler post-processes both tails before throw/return, so the persisted
  row carries `<REDACTED:name>` tokens instead of values

Only inherit-resolved values are scrubbed. env: values are not (those are
the agent's "fine in the row" channel by design). Heuristic — defeats
accidental `echo "$GBRAIN_DATABASE_URL"`, not adversarial encode-then-print.
Default false for back-compat.
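
The literal, non-regex replacement described above can be sketched as follows. This is an assumption-laden illustration, not the contents of shell-redact.ts: the `secrets` parameter is modeled as a name-to-value record, `split`/`join` stands in for string-mode `replaceAll`, and replacing longer values first is one possible way to handle substring overlap.

```typescript
// Hypothetical sketch of a pure redaction helper.
function redactSecretsInText(
  text: string,
  secrets: Record<string, string>,
): string {
  const entries = Object.keys(secrets)
    .map((name) => ({ name, value: secrets[name] ?? "" }))
    // An empty value would match everywhere, so skip it.
    .filter((e) => e.value.length > 0)
    // Longest first, so a secret that contains another is replaced whole.
    .sort((a, b) => b.value.length - a.value.length);

  let out = text;
  for (const { name, value } of entries) {
    // split/join treats the value as a literal string, so regex
    // metacharacters inside a secret have no special meaning.
    out = out.split(value).join("<REDACTED:" + name + ">");
  }
  return out;
}
```

As the commit message notes, this is a heuristic: it catches a script that echoes the value verbatim, not one that encodes it before printing.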

Tests:
- test/minions-shell-redact.test.ts (9 cases): pure-function behavior,
  regex-metachar safety, multi-secret independent redaction, substring
  overlap, empty-input/map edge cases
- test/minions-shell-validate.test.ts: +4 cases for redact_secrets shape
- test/e2e/minions-shell-pglite.test.ts: +2 cases proving redact_secrets:
  true scrubs persisted row AND redact_secrets:false preserves plaintext
  (back-compat regression guard)

Docs + CHANGELOG + migration file + CLAUDE.md updated.

7667 unit tests green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 19, 2026

Important

Review skipped

Too many files!

This PR contains 296 files, which is 146 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b14665e4-e014-466d-83e0-234e358afa1f

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5df13 and 6cfdaa0.

⛔ Files ignored due to path filters (4)
  • admin/dist/assets/index-DlF4rPGd.js is excluded by !**/dist/**
  • admin/dist/assets/index-GxkWX7v3.css is excluded by !**/dist/**
  • admin/dist/assets/index-jKxMqIPg.js is excluded by !**/dist/**
  • admin/dist/index.html is excluded by !**/dist/**
📒 Files selected for processing (296)
  • .gitignore
  • CHANGELOG.md
  • CLAUDE.md
  • CONTRIBUTING.md
  • DESIGN.md
  • README.md
  • TODOS.md
  • VERSION
  • admin/src/App.tsx
  • admin/src/api.ts
  • admin/src/index.css
  • admin/src/pages/Calibration.tsx
  • docs/GBRAIN_V0.md
  • docs/INSTALL.md
  • docs/UPGRADING_DOWNSTREAM_AGENTS.md
  • docs/architecture/RETRIEVAL.md
  • docs/contradictions.md
  • docs/ethos/ORIGIN.md
  • docs/guides/agent-to-gbrain.md
  • docs/guides/minions-shell-jobs.md
  • docs/guides/skillpacks-as-scaffolding.md
  • docs/issues/doctor-auto-heal-and-scoring.md
  • docs/proposals/temporal-contradiction-probe.md
  • llms-full.txt
  • openclaw.plugin.json
  • package.json
  • plugins/gbrain-codex/.codex-plugin/plugin.json
  • plugins/gbrain-codex/package.json
  • scripts/build-admin-embedded.ts
  • scripts/check-admin-embedded.sh
  • scripts/check-privacy.sh
  • scripts/check-proposal-pii.sh
  • scripts/check-synthetic-corpus-privacy.sh
  • scripts/check-test-real-names.sh
  • skills/RESOLVER.md
  • skills/_AGENT_README.md
  • skills/conventions/calibration.md
  • skills/maintain/SKILL.md
  • skills/manifest.json
  • skills/migrations/v0.35.7.0.md
  • skills/migrations/v0.36.2.0.md
  • skills/migrations/v0.36.5.0.md
  • skills/skillpack-harvest/SKILL.md
  • skills/skillpack-harvest/routing-eval.jsonl
  • src/admin-embedded.ts
  • src/cli.ts
  • src/commands/apply-migrations.ts
  • src/commands/autopilot.ts
  • src/commands/backlinks.ts
  • src/commands/calibration.ts
  • src/commands/check-resolvable.ts
  • src/commands/config.ts
  • src/commands/doctor.ts
  • src/commands/embed.ts
  • src/commands/eval-replay.ts
  • src/commands/eval-suspected-contradictions.ts
  • src/commands/eval-trajectory.ts
  • src/commands/eval.ts
  • src/commands/extract.ts
  • src/commands/founder-scorecard.ts
  • src/commands/jobs.ts
  • src/commands/migrations/v0_11_0.ts
  • src/commands/models.ts
  • src/commands/search.ts
  • src/commands/serve-http.ts
  • src/commands/serve.ts
  • src/commands/skillpack-check.ts
  • src/commands/skillpack.ts
  • src/commands/sync.ts
  • src/commands/takes.ts
  • src/commands/think.ts
  • src/commands/upgrade.ts
  • src/commands/ze-switch.ts
  • src/core/ai/dims.ts
  • src/core/ai/gateway.ts
  • src/core/brain-score-recommendations.ts
  • src/core/calibration/cross-brain.ts
  • src/core/calibration/gstack-coupling.ts
  • src/core/calibration/nudge.ts
  • src/core/calibration/recall-footer.ts
  • src/core/calibration/svg-renderer.ts
  • src/core/calibration/take-forecast.ts
  • src/core/calibration/templates.ts
  • src/core/calibration/think-ab.ts
  • src/core/calibration/undo-wave.ts
  • src/core/calibration/voice-gate.ts
  • src/core/cli-options.ts
  • src/core/config.ts
  • src/core/cycle.ts
  • src/core/cycle/base-phase.ts
  • src/core/cycle/calibration-profile.ts
  • src/core/cycle/extract-facts.ts
  • src/core/cycle/grade-takes.ts
  • src/core/cycle/phantom-redirect.ts
  • src/core/cycle/phases/consolidate.ts
  • src/core/cycle/propose-takes.ts
  • src/core/cycle/transcript-discovery.ts
  • src/core/embedding.ts
  • src/core/engine.ts
  • src/core/entities/resolve.ts
  • src/core/eval-capture.ts
  • src/core/eval-contradictions/auto-supersession.ts
  • src/core/eval-contradictions/cache.ts
  • src/core/eval-contradictions/calibration-join.ts
  • src/core/eval-contradictions/cost-prompt.ts
  • src/core/eval-contradictions/date-filter.ts
  • src/core/eval-contradictions/judge.ts
  • src/core/eval-contradictions/runner.ts
  • src/core/eval-contradictions/severity-classify.ts
  • src/core/eval-contradictions/trends.ts
  • src/core/eval-contradictions/types.ts
  • src/core/facts-fence.ts
  • src/core/facts/backstop.ts
  • src/core/facts/extract-from-fence.ts
  • src/core/facts/extract.ts
  • src/core/facts/fence-write.ts
  • src/core/facts/phantom-audit.ts
  • src/core/facts/stub-guard-audit.ts
  • src/core/git-remote.ts
  • src/core/migrate.ts
  • src/core/minions/child-worker-supervisor.ts
  • src/core/minions/exit-classification.ts
  • src/core/minions/handlers/shell-audit.ts
  • src/core/minions/handlers/shell-inherit.ts
  • src/core/minions/handlers/shell-redact.ts
  • src/core/minions/handlers/shell-validate.ts
  • src/core/minions/handlers/shell.ts
  • src/core/minions/handlers/supervisor-audit.ts
  • src/core/minions/protected-names.ts
  • src/core/oauth-provider.ts
  • src/core/op-checkpoint.ts
  • src/core/operations-descriptions.ts
  • src/core/operations.ts
  • src/core/pglite-engine.ts
  • src/core/pglite-schema.ts
  • src/core/postgres-engine.ts
  • src/core/repo-root.ts
  • src/core/resolvers/builtin/x-api/handle-to-tweet.ts
  • src/core/retrieval-upgrade-planner.ts
  • src/core/retrieval-upgrade-prompt.ts
  • src/core/schema-embedded.ts
  • src/core/search/embedding-column.ts
  • src/core/search/hybrid.ts
  • src/core/search/mode.ts
  • src/core/skillpack/apply-hunks.ts
  • src/core/skillpack/bundle.ts
  • src/core/skillpack/copy.ts
  • src/core/skillpack/diff-text.ts
  • src/core/skillpack/harvest-lint.ts
  • src/core/skillpack/harvest.ts
  • src/core/skillpack/migrate-fence.ts
  • src/core/skillpack/reference.ts
  • src/core/skillpack/scaffold.ts
  • src/core/skillpack/scrub-legacy.ts
  • src/core/sync.ts
  • src/core/think/index.ts
  • src/core/think/prompt.ts
  • src/core/trajectory.ts
  • src/core/types.ts
  • src/core/utils.ts
  • src/mcp/tool-defs.ts
  • src/schema.sql
  • test/admin-embed-spawn.serial.test.ts
  • test/ai/adaptive-embed-batch.test.ts
  • test/ai/dims-openai.test.ts
  • test/ai/gateway.test.ts
  • test/ai/no-batch-cap-suppression.serial.test.ts
  • test/apply-migrations-pglite-spawn.serial.test.ts
  • test/apply-migrations.test.ts
  • test/asymmetric-encoding-contract.test.ts
  • test/autopilot-cycle-failure-classification.test.ts
  • test/autopilot-install.test.ts
  • test/backlinks.test.ts
  • test/balanced-reranker-default.test.ts
  • test/brain-score-recommendations.test.ts
  • test/brain-writer.test.ts
  • test/calibration-cli.test.ts
  • test/calibration-profile.test.ts
  • test/check-resolvable-cli.test.ts
  • test/cli-args.test.ts
  • test/config-ensure-gitignore.test.ts
  • test/config.test.ts
  • test/consolidate-valid-until.test.ts
  • test/core/base-phase.test.ts
  • test/core/cycle.serial.test.ts
  • test/cosine-rescore-column.test.ts
  • test/cross-brain-calibration.test.ts
  • test/doctor-calibration-checks.test.ts
  • test/doctor-child-orphans.test.ts
  • test/doctor-home-dir-in-worktree.test.ts
  • test/doctor-ze-checks.test.ts
  • test/doctor.test.ts
  • test/e2e/embedding-column-pglite.test.ts
  • test/e2e/embedding-column-postgres.test.ts
  • test/e2e/eval-contradictions-postgres.test.ts
  • test/e2e/eval-replay-column.test.ts
  • test/e2e/minions-shell-pglite.test.ts
  • test/e2e/phantom-redirect.test.ts
  • test/e2e/serve-http-oauth.test.ts
  • test/e2e/skillpack-flow.test.ts
  • test/e2e/v0_28_5-fix-wave.test.ts
  • test/e2e/zeroentropy-live.test.ts
  • test/engine-find-trajectory.test.ts
  • test/entity-resolve-perf.slow.test.ts
  • test/entity-resolve.test.ts
  • test/eval-contradictions-auto-supersession.test.ts
  • test/eval-contradictions-cache.test.ts
  • test/eval-contradictions-calibration-join.test.ts
  • test/eval-contradictions-cost-prompt.test.ts
  • test/eval-contradictions-cross-source.test.ts
  • test/eval-contradictions-date-filter.test.ts
  • test/eval-contradictions-engine.test.ts
  • test/eval-contradictions-integrations.test.ts
  • test/eval-contradictions-judge.test.ts
  • test/eval-contradictions-runner.test.ts
  • test/eval-contradictions-severity.test.ts
  • test/eval-contradictions-trends.test.ts
  • test/eval-contradictions/no-valid-until-write.test.ts
  • test/eval-trajectory.test.ts
  • test/exit-classification.test.ts
  • test/extract-facts-phase.test.ts
  • test/facts-backstop.test.ts
  • test/facts-fence-typed.test.ts
  • test/fence-write.test.ts
  • test/fix-wave-structural.test.ts
  • test/fixtures/calibration/README.md
  • test/fixtures/calibration/extract-takes-corpus/companies-acme-example.md
  • test/fixtures/calibration/extract-takes-corpus/concept-startup-market-dynamics.gradeable-claims.json
  • test/fixtures/calibration/extract-takes-corpus/concept-startup-market-dynamics.md
  • test/fixtures/calibration/extract-takes-corpus/daily-2026-04-15.gradeable-claims.json
  • test/fixtures/calibration/extract-takes-corpus/daily-2026-04-15.md
  • test/fixtures/calibration/extract-takes-corpus/decision-log-2025-q3.md
  • test/fixtures/calibration/extract-takes-corpus/essay-cities-and-ambition.md
  • test/fixtures/calibration/extract-takes-corpus/meeting-2026-04-03.md
  • test/fixtures/calibration/extract-takes-corpus/meeting-2026-04-10-fundraise-fund-a.gradeable-claims.json
  • test/fixtures/calibration/extract-takes-corpus/meeting-2026-04-10-fundraise-fund-a.md
  • test/fixtures/calibration/extract-takes-corpus/people-alice-example.md
  • test/fixtures/calibration/holdout/concept-founder-execution.gradeable-claims.json
  • test/fixtures/calibration/holdout/concept-founder-execution.md
  • test/fixtures/calibration/holdout/daily-2026-04-18.gradeable-claims.json
  • test/fixtures/calibration/holdout/daily-2026-04-18.md
  • test/fixtures/calibration/holdout/essay-on-conviction.gradeable-claims.json
  • test/fixtures/calibration/holdout/essay-on-conviction.md
  • test/fixtures/calibration/holdout/meeting-2026-04-17-hiring-charlie-example.gradeable-claims.json
  • test/fixtures/calibration/holdout/meeting-2026-04-17-hiring-charlie-example.md
  • test/fixtures/calibration/holdout/people-bob-example.gradeable-claims.json
  • test/fixtures/calibration/holdout/people-bob-example.md
  • test/founder-scorecard.test.ts
  • test/gateway-embed-model-override.test.ts
  • test/git-remote.test.ts
  • test/grade-takes-ensemble.test.ts
  • test/grade-takes.test.ts
  • test/gstack-learnings-coupling.test.ts
  • test/helpers/extract-added-columns.ts
  • test/loadConfig-merge.test.ts
  • test/mcp-eval-capture.test.ts
  • test/mcp-tool-defs.test.ts
  • test/migrate.test.ts
  • test/minions-shell-inherit.test.ts
  • test/minions-shell-redact.test.ts
  • test/minions-shell-validate.test.ts
  • test/nudge.test.ts
  • test/oauth.test.ts
  • test/op-checkpoint.test.ts
  • test/openai-compat-multimodal.test.ts
  • test/operations-embedding-column.test.ts
  • test/operations-find-trajectory.test.ts
  • test/orphans.test.ts
  • test/phantom-redirect-engine-parity.test.ts
  • test/phantom-redirect.test.ts
  • test/propose-takes.test.ts
  • test/query-cache.test.ts
  • test/readme-hero-anchors.test.ts
  • test/recall-footer.test.ts
  • test/regressions/v0.36.1.0-iron-rule.test.ts
  • test/repo-root.test.ts
  • test/resolvers.test.ts
  • test/retrieval-upgrade-planner.test.ts
  • test/schema-bootstrap-coverage.test.ts
  • test/scripts/check-proposal-pii.test.ts
  • test/search-mode.test.ts
  • test/search.test.ts
  • test/search/embedding-column.test.ts
  • test/search/hybrid-reranker-integration.test.ts
  • test/search/knobs-hash-reranker.test.ts
  • test/serve-http-bootstrap-token.test.ts
  • test/skillpack-apply-hunks.test.ts
  • test/skillpack-changed-since-version.test.ts
  • test/skillpack-copy.test.ts
  • test/skillpack-frontmatter-sources.test.ts
  • test/skillpack-harvest-lint.test.ts
  • test/skillpack-harvest.test.ts
  • test/skillpack-migrate-fence.test.ts
  • test/skillpack-reference-apply.test.ts
  • test/skillpack-reference.test.ts
  • test/skillpack-scaffold.test.ts

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

3 participants