feat(state): port gro to cli-common state components [MON-5371] by rianjs · Pull Request #134 · open-cli-collective/google-readonly

rianjs · 2026-05-19T22:18:22Z

Closes #131. Implements the gro half of working-with-state.md §6.4 unit 3:
config-dir resolver via cli-common/statedir; atomic config writes; one-time
copy-leave-old relocation on macOS/Windows; cli-common/cache envelope with
hard-coded per-resource TTL; removal of the user-configurable cache TTL knob
and the `gro config cache` subcommand.

§3.2 acceptance matrix (per merge gate)

Config policy: copy-leave-old at `gro init`; runtime mutation-free w/
ErrRelocationConflict soft-fallback (`LoadConfigForRuntime`); divergent
old/new fails loud naming both paths and mutating nothing. Linux: zero-byte
op (old path ≡ new path). `token.json` is excluded from the copy and
handled exclusively by the existing §1.8 token migrator, which now probes
both old and new paths with full conflict semantics.

Cache policy: drop-on-floor. The new cli-common envelope schema makes
the old `DriveCache` shape an identity/version-mismatch miss; a malformed
envelope is mapped to a miss inside gro's wrapper (cache is disposable per §4).

Case	Config	Cache
old-only	TestRelocate_OldOnly_CopiedAtInit	TestMigrateLegacyCacheDir/carries warm cache
new-only	TestRelocate_NewOnly_LeftUntouched	TestCache_GetSetDrives
same value both	TestRelocate_Equal_NoOp	TestCache_CorruptedCache/old DriveCache shape → miss
conflicting	TestRelocate_Divergent_FailLoudNamesBothPaths_MutatesNothing	(disposable: schema-mismatch → miss; covered above)
malformed old	TestRelocate_MalformedOld_FailLoud	TestMigrateLegacyCacheDir/unreadable legacy preserved
malformed new	TestRelocate_MalformedNew_FailLoud	TestCache_CorruptedCache/malformed JSON treated as miss
neither	TestRelocate_Neither_PathResolvedNotCreated	TestCache_GetSetDrives/returns nil for missing
no real-dir writes	statedirtest.Hermetic everywhere	statedirtest.Hermetic

Surfaces

Config dir resolver: `configDirPath()` → `statedir.Scope{Name:"google-readonly"}.ConfigDir()/ConfigDirEnsured()`.
Atomic SaveConfig: `os.CreateTemp` + `Chmod 0600` + `Rename`.
Relocation gate: new `internal/config/relocate.go` — `DetectConfigRelocation` (mutation-free), `ApplyConfigRelocation` (whole-dir copy minus token.json, atomic per-file, idempotent), `LoadConfigForRuntime` (soft-conflict wrapper for non-init callers). Init wires the gate ahead of `EnsureMigrated`.
Cache adoption: `internal/cache/cache.go` now wraps `cli-common/cache.Locator{Root: statedir.Cache{Tool:"google-readonly"}.CacheDir(), InstanceKey:"default"}` with `ReadResource`/`WriteResource` and hard-coded 24h TTL for drives.
Drop user-TTL: `Config.CacheTTLHours`, `DefaultCacheTTLHours`, `GetCacheTTL{,Hours}`, `Prompter.AskCacheTTL`, init step 6 TTL block — gone. Older config.yml with `cache_ttl_hours:` still loads (yaml ignores unknowns).
Remove cache subcommand: deleted `gro config cache {show,clear,ttl}` (clearing folds into `gro config clear --all`).
Hermetic test helper: `credtest.Setup` delegates to `statedirtest.Hermetic` (full 7-var set — closes a Windows real-dir leak). New `credtest.ConfigDir(t)` for tests that need the resolved dir.

Token-secret hygiene

`ApplyConfigRelocation` deliberately excludes `token.json` — copying it would
leave a stale plaintext at the old path (the exact class MON-5370 guarded
against). Instead, `keychain/migrate.go` `discover()` was extended to
enumerate the old hand-rolled token path alongside the new one. Conflict
detection (`planMigration`) treats divergent old/new tokens as a §1.8
conflict and refuses to migrate without `--overwrite` — no side-channel scrub
can race ahead of the planner.

Out of scope

Secrets/keyring storage (location-independent via cli-common/credstore).
Cutting the cli-common semver tag (blocked on MON-5375).
Pre-B2b `/cache/` migrator stays — its purpose is location, not
schema, and it's still useful for installs that pre-date B2b.

Verification

`make test` ✅ 1368 tests pass.
`make lint` ✅ 0 issues.
Codex architect review (session `019e4019`, reused across the whole INT-310
state-port batch) converged at blockers=0 majors=0 minors=0 nits=1 (cosmetic).

Adopt cli-common/statedir for the config-dir resolver (native per OS: $XDG_CONFIG_HOME→Linux, ~/Library/Application Support→macOS, %APPDATA%→Windows; relative $XDG_CONFIG_HOME now errors per the §1.1 intentional tightening). Make SaveConfig atomic via os.CreateTemp + rename at 0600 under a 0700 dir. Add a macOS/Windows config-relocation gate (copy-leave-old; fail loud on divergence; mutation-free runtime read-fallback via the new LoadConfigForRuntime wrapper used by non-init callers). The gate runs ahead of EnsureMigrated in `gro init` so divergent old/new can't be silently papered over. Whole-dir copy excludes token.json (a secret); the existing §1.8 token migrator gains the old-hand-rolled token path as an additional legacy candidate with full conflict semantics (equal→migrate+delete-both; divergent→fail loud). Replace internal/cache with a thin wrapper over cli-common/cache: atomic temp+rename envelope, version-mismatch-as-miss, hard-coded per-resource TTL (§4.4). The pre-MON-5371 `cache_ttl_hours` config field, `DefaultCacheTTLHours`, GetCacheTTL{,Hours}, the AskCacheTTL prompt and `gro config cache {show,clear,ttl}` subcommand are removed (older config.yml that still carries the field loads cleanly — yaml ignores unknowns; clearing the cache folds into `gro config clear --all`). Switch internal/credtest to delegate state-dir isolation to cli-common/statedirtest (full 7-var set) — closes a pre-existing Windows real-dir leak (AppData/USERPROFILE/XDG_DATA_HOME were missing). Tests that hand-built `<tmp>/xdgconfig/<DirName>/…` paths now resolve via the resolver so they're correct on every OS. §3.2 acceptance matrix coverage: 8 cases × 2 surfaces (config in internal/config/relocate_test.go; cache in internal/cache/cache_test.go). Init reconcile coverage: detect-runs-before-migrate, divergent-aborts- mutates-nothing, copy-needed-triggers-apply. Closes #131 [MON-5371]

rianjs · 2026-05-19T22:22:25Z

Findings

Blocker: oauth_client_path is durable config but is excluded from relocation conflict detection.
internal/config/relocate.go:157-173 ignores OAuthClientPath entirely, even though internal/config/config.go documents it as an org/user override. That means old/new config.yml files with different explicit OAuth client paths are classified as equal and runtime silently uses the new side. The §3.2 conflicting-config row should fail loud for this. Treat old-default vs new-default as equivalent if needed, but explicit non-default divergence needs to conflict.

Blocker: old-only config.json installs are not relocated or read from the old hand-rolled dir.
detectRelocation only checks old/config.yml (internal/config/relocate.go:103-113). LoadConfig only falls back to old/config.yml (internal/config/config.go:247-274), and loadLegacyJSON reads config.json only from the new statedir dir (internal/config/config.go:285-300). A pre-port macOS/Windows user with only old config.json is treated like a fresh install, so the existing per-file migrators never see the old config. This breaks the “whole-dir copy so existing migrators still work” plan.

Major: pre-B2b cache relocation does not cover the old hand-rolled config dir.
LegacyCacheDir now resolves under the new statedir config dir (internal/config/config.go:145-153), while ApplyConfigRelocation skips directories (internal/config/relocate.go:208-211). So <old hand-rolled>/google-readonly/cache/drives.json is neither copied at init nor found by migrateLegacyCacheDir (internal/cache/cache.go:75-108). The PR matrix’s old-only cache carry test covers only the new-dir legacy-cache seam, not the macOS/Windows old-dir state.

Minor: the runtime soft-conflict test is a stub.
TestLoadConfigForRuntime_SoftConflict sets up old/new divergent configs but never calls LoadConfigForRuntime or asserts returned canonical config / nil error / warning behavior (internal/config/relocate_test.go:283-316). Given runtime soft fallback is a key architectural distinction from init fail-loud, this should be pinned.

rianjs · 2026-05-19T22:23:52Z

TDD coverage assessment

Verdict: Gaps

The relocation gate, cache adoption, and init ordering invariant are well-covered. Two genuine gaps where a regression could ship green:

Gap 1 — keychain `discover()` old-path candidate is untested

internal/keychain/migrate.go:243-253 adds a new OldHandRolledTokenPath() candidate with a dedupe against the new path. There are no tests in internal/keychain/ exercising discover() at all (no migrate_test.go), so:

The macOS/Windows "old != new emits both candidates" behavior is unverified.
The Linux "old == new must dedupe (no double candidate with identical bytes)" invariant is unverified.
Worse: a regression that drops the new branch entirely (leaving only the new statedir path) would not be caught — and that would silently strand pre-MON-5371 token.json files for upgraders on macOS/Windows, exactly the platforms this PR's relocation is targeted at.

Suggest a discover() test with the legacy-scan env vars set, two token.json files seeded under injected old/new paths, asserting candidate count and locations across the path-divergence and path-identity cases.

Gap 2 — `LoadConfigForRuntime` soft-conflict path doesn't actually assert behavior

relocate_test.go:283-316 (TestLoadConfigForRuntime_SoftConflict) sets up divergent old/new, then never calls LoadConfigForRuntime — it t.Skips on Linux and otherwise just falls off the end without invoking the SUT or checking the once-warn fires / canonical-new is returned. The soft-conflict semantic (warn once, return new cfg) is currently unproven; a regression that returned the old cfg, or panicked, or warned every call, would ship green.

Either inject a resolver seam so the case is exercised on Linux, or drop the test and add a unit test on warnReloConflictOnce plus a small refactor that makes the LoadConfigForRuntime branch table-testable without t.Setenv gymnastics.

Everything else (cache parse-error→miss, pre-5371 shape→miss, drop-on-floor Clear semantics, detect-runs-before-migrate ordering, copy-needed→apply ordering, idempotent apply, token.json excluded from copy) is adequately proven.

- relocate detector now treats `old/config.json` as a valid legacy source (was YAML-only); LoadConfig falls back to old/config.json when new/yml is absent. A pre-MON-5371 macOS/Windows user on the legacy JSON form is no longer treated as a fresh install. - configsMaterialEqual now compares `oauth_client_path` via oauthClientPathEquiv: each-side-at-its-own-default → equal (location artifact); explicit non-default divergence → fail loud. - migrateLegacyCacheDir now probes both the new statedir LegacyCacheDir and the old hand-rolled `<old>/google-readonly/cache/` (new export config.OldHandRolledLegacyCacheDir). Dedupes on Linux. - Replace the stubbed TestLoadConfigForRuntime_SoftConflict with two real assertions (soft-conflict returns canonical+nil-err; no-conflict pass-through) via a new testable seam. - Add TestMigrateTokenFile_OldHandRolledPath and TestMigrateTokenFile_OldAndNewDivergent for the new old-token enumeration; covers §1.8 equal-migrate and divergent-fail-loud paths. [MON-5371]

rianjs · 2026-05-19T22:29:47Z

Findings

Blocker: old-only malformed config still mutates before failing.
detectRelocation now recognizes old config.yml or config.json, but in the oldPresent && !newPresent branch it returns CopyNeeded without parsing the old file first (internal/config/relocate.go:121-128). gro init then applies the relocation before EnsureMigrated (internal/cmd/initcmd/init.go:290-299), so an old-only malformed YAML/JSON file gets copied into the new dir before LoadConfig later fails parsing it (internal/config/config.go:262-284). That breaks the §3.2 malformed-old row: fail loud, mutate nothing. Please parse oldPath before returning relocOldOnly, and add old-only malformed YAML + JSON tests asserting the new dir/file is not created.

Codex caught: detectRelocation's oldPresent+!newPresent branch was returning CopyNeeded=true without parsing the old file, so init's ApplyConfigRelocation would carry a malformed YAML/JSON into the new dir before LoadConfig later choked on it — that's a §3.2 row-5 (malformed old) regression: the contract is fail loud, mutate nothing. Parse the old file in the old-only branch and surface ErrRelocationConflict ahead of any copy. Add TestRelocate_OldOnlyMalformedYAML_FailLoud_MutatesNothing and TestRelocate_OldOnlyMalformedJSON_FailLoud_MutatesNothing asserting the new dir is not created and CopyNeeded is false. [MON-5371]

rianjs · 2026-05-20T00:42:43Z

No findings. The old-only malformed branch now validates before CopyNeeded, preserving the fail-loud/mutate-nothing invariant for both YAML and JSON. The prior oauth path, config.json, cache relocation, runtime soft-conflict, and token-source concerns also look aligned with the batch architecture.

Focused verification run: go test ./internal/config ./internal/cache ./internal/keychain passed.

monit-reviewer

Automated PR Review

Reviewed commit: 7ba3105

Summary

Reviewer	Findings
harness-engineering:harness-architecture-reviewer	3
harness-engineering:harness-enforcement-reviewer	3
harness-engineering:harness-knowledge-reviewer	1
harness-engineering:harness-self-documenting-code-reviewer	1

harness-engineering:harness-architecture-reviewer (3 findings)

💡 Suggestion - internal/config/relocate.go:325

reloConflictOnce is a package-level sync.Once that is never reset between test runs in the same binary. Any test that exercises LoadConfigForRuntime in a conflict state will exhaust the Once, silently suppressing the warning for all subsequent tests in the same binary. If a future test ever asserts on the stderr warning, it will flake depending on execution order. Consider a reset hook or injecting an io.Writer so tests can verify the behavior independently.

💡 Suggestion - internal/cache/cache.go:47

drivesTTL = "24h" is a string constant parsed via time.ParseDuration at every WriteResource call. A malformed string after a refactor surfaces only at the first cache write in production. Add a package-level init-time validation (e.g. panic on bad parse) or a TestMain assertion so the invariant is caught at test time.

💡 Suggestion - internal/cache/cache.go:170

nowFn is an unexported package-level variable mutated in tests without a t.Parallel() guard. If anyone adds t.Parallel() to the outer test or its subtests, writes to nowFn from concurrent subtests will be a data race caught by -race. Prefer passing a clock func through Cache at construction time rather than a mutable global.

harness-engineering:harness-enforcement-reviewer (3 findings)

⚠️ Should Fix - internal/config/relocate.go:229

configsMaterialEqual only checks Keyring.Backend. If KeyringConfig has additional fields (e.g. a passphrase env var for the file backend), two configs differing only on those fields would be classified as equal, silently ignoring a genuine user divergence. The equality check should use reflect.DeepEqual on the entire Keyring sub-struct, or explicitly enumerate every field.

⚠️ Should Fix - internal/config/config.go:248

When both old and new configs are present and the new config.yml is malformed, detectRelocation returns ErrRelocationConflict (wrapped), so LoadConfig sets relErr and continues. It then tries to re-parse the malformed YAML and returns a bare parse error. LoadConfigForRuntime does not recognize this as ErrRelocationConflict, so it propagates the hard parse error instead of soft-degrading. The test matrix covers detectRelocation in isolation but has no test exercising the full LoadConfig → LoadConfigForRuntime stack for the malformed-new, old-present scenario.

💡 Suggestion - internal/cache/cache.go:68

migrateLegacyCacheDir now probes two legacy locations: LegacyCacheDir (statedir-resolved) and OldHandRolledLegacyCacheDir (pre-MON-5371 hand-rolled). The second probe is the one that matters on a macOS/Windows install that pre-dates both B2b and MON-5371, yet cache_test.go has no subtest that seeds a warm cache exclusively at the OldHandRolledLegacyCacheDir path and verifies it is carried. The dual-probe logic is new risky code with only one probe tested.

harness-engineering:harness-knowledge-reviewer (1 findings)

💡 Suggestion - internal/cmd/initcmd/init.go:289

The nil guard if reloc.CopyNeeded && d.ApplyConfigRelocation != nil silently skips the copy when ApplyConfigRelocation is nil but DetectConfigRelocation signalled CopyNeeded. A test stub injecting only DetectConfigRelocation can hit this path and leave the config unrelocated with no error or log. Either remove the nil guard and require ApplyConfigRelocation in initDeps, or return an error when CopyNeeded is true and the func is nil.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/cmd/initcmd/init.go:394

The configExistedBefore variable and its initialization are dead code suppressed with _ = configExistedBefore rather than removed. The snapshot-before-save gate it guarded (AskCacheTTL) is gone; the variable, its assignment in runWith, and the parameter in finishExisting with its blank-identifier suppressor should all be deleted to avoid misleading future readers.

4 info-level observations excluded. Run with --verbose to include.

4 PR discussion threads considered.

Completed in 4m 06s | $1.84 | sonnet | daemon 0.2.120 | Glorfindel

Field	Value
Model	sonnet
Reviewers	hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	4m 06s wall · 13m 04s compute (Reviewers: 3m 10s · Synthesis: 54s)
Cost	$1.84
Tokens	301.8k in / 45.1k out
Turns	6

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	33.1k	3.0k	0	33.1k (1h)	$0.18
harness-engineering:harness-architecture-reviewer	sonnet	53.9k	10.0k	2.1k	51.8k (1h)	$0.39
harness-engineering:harness-enforcement-reviewer	sonnet	53.9k	10.1k	2.1k	51.8k (1h)	$0.39
harness-engineering:harness-knowledge-reviewer	sonnet	53.9k	10.6k	2.1k	51.8k (1h)	$0.40
harness-engineering:harness-self-documenting-code-reviewer	sonnet	53.3k	7.6k	2.1k	51.2k (1h)	$0.35
security:security-code-auditor	haiku	53.6k	3.7k	0	53.6k (1h)	$0.13

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

- configsMaterialEqual now uses reflect.DeepEqual on the whole Keyring sub-struct, not just Backend, so any future KeyringConfig field is auto-covered as a divergence. - LoadConfig skips the new-dir re-parse and legacy-JSON re-read when relErr is already a detect-time ErrRelocationConflict (which includes the malformed-new-wrapped-in-conflict case). The wrapped conflict error wins, so LoadConfigForRuntime soft-degrades cleanly instead of hard-failing on a bare parse error. - Add cache_test subtest seeding a warm cache exclusively at the old hand-rolled legacy cache subdir and asserting the dual-probe carry + legacy-dir removal. - Delete configExistedBefore plumbing (variable + threading through tryExistingToken / finishExisting); the AskCacheTTL gate it guarded was removed in this same PR so the suppressor was dead code. [MON-5371]

Superseded by updated review

monit-reviewer

Automated PR Review

Reviewed commit: 36c1355 | Previous: 7ba3105 (incremental)

Summary

Reviewer	Findings
harness-engineering:harness-architecture-reviewer	1
harness-engineering:harness-enforcement-reviewer	1
harness-engineering:harness-self-documenting-code-reviewer	1

harness-engineering:harness-architecture-reviewer (1 findings)

⚠️ Should Fix - internal/config/config.go:219

LoadConfig doc comment says the canonical new-dir config is returned alongside ErrRelocationConflict so callers can soft-degrade, but the implementation never reads the new-dir config when relErr != nil — the read block is guarded by reloc.NewPath != "" && relErr == nil. LoadConfigForRuntime therefore returns an all-defaults *Config (empty OAuthClientPath, empty GrantedScopes) instead of the actual new-dir settings. Consequences: (1) users with a non-default oauth_client_path get silent auth failures ('unable to read OAuth client JSON') during the conflict window; (2) the stale-scope check in me.go is silently skipped (CheckScopesMigration returns '' for an empty slice), masking a re-auth requirement. Fix: attempt the new-dir read unconditionally, store it separately from the relErr gate, and return it alongside the conflict error.

harness-engineering:harness-enforcement-reviewer (1 findings)

💡 Suggestion - internal/config/relocate.go:225

oauthClientPathEquiv compares OAuthClientPath values from loadConfigFromFile, which deliberately does not call applyDefaults (and therefore does not call ExpandPath). A user whose old config.yml has oauth_client_path: ~/oauth_client.json and whose new config.yml has the same value as a fully-expanded absolute path will see a false ErrRelocationConflict because the tilde form and the expanded form differ as strings. Both sides should be normalized (via ExpandPath or equivalent) before the equality comparison.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/cache/cache.go:96

Cache.Clear() calls os.RemoveAll(c.loc.Root), which removes the entire tool-level cache root (os.UserCacheDir()/google-readonly), not just the current instance subtree. If a future cli-common Locator introduces a second instance key, any Clear() call silently evicts all instances. The safer scope is os.RemoveAll(filepath.Join(c.loc.Root, c.loc.InstanceKey)) to match only the instance being managed.

12 PR discussion threads considered.

Completed in 4m 03s | $1.72 | sonnet | daemon 0.2.120 | Glorfindel

Field	Value
Model	sonnet
Mode	Re-review · Cycle 2 · Session resumed
Reviewers	hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	4m 03s wall · 10m 31s compute (Reviewers: 2m 49s · Synthesis: 47s)
Cost	$1.72
Tokens	334.0k in / 34.5k out
Turns	6

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	42.2k	2.6k	13.6k	28.5k (1h)	$0.15
harness-engineering:harness-architecture-reviewer	sonnet	54.0k	8.5k	2.1k	51.9k (1h)	$0.37
harness-engineering:harness-enforcement-reviewer	sonnet	54.0k	7.2k	2.1k	51.9k (1h)	$0.35
harness-engineering:harness-knowledge-reviewer	sonnet	54.0k	6.2k	2.1k	51.9k (1h)	$0.33
harness-engineering:harness-self-documenting-code-reviewer	sonnet	53.3k	7.2k	2.1k	51.2k (1h)	$0.35
security:security-code-auditor	haiku	53.6k	1.9k	0	53.6k (1h)	$0.12
discussion-summarizer	—	23.0k	828	13.6k	9.3k (1h)	$0.05

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

- LoadConfig: attempt new-dir read unconditionally so callers soft-degrading via LoadConfigForRuntime get the user's actual settings (OAuthClientPath, GrantedScopes, …) alongside ErrRelocationConflict instead of an empty Config. The relErr-set branch still suppresses bare parse-error propagation so the soft-degrade contract holds. - oauthClientPathEquiv: ExpandPath both sides before comparing so tilde vs fully-expanded forms don't false-positive as divergent. - Cache.Clear: scope to <Root>/<InstanceKey> instead of the tool-level Root so a future multi-instance Locator can't have one instance's Clear evict the others. [MON-5371]

Superseded by updated review

monit-reviewer

Automated PR Review

Reviewed commit: 94f60bb | Previous: 36c1355 (incremental)

Approved with 6 non-blocking suggestions below. Address at your discretion.

Summary

Reviewer	Findings
harness-engineering:harness-architecture-reviewer	1
harness-engineering:harness-enforcement-reviewer	2
harness-engineering:harness-knowledge-reviewer	2
harness-engineering:harness-self-documenting-code-reviewer	1

harness-engineering:harness-architecture-reviewer (1 findings)

💡 Suggestion - internal/cache/cache.go:88

GetDrives catches json.SyntaxError and json.UnmarshalTypeError via errors.As as cache misses, but not json.InvalidUnmarshalError or io.ErrUnexpectedEOF (which cli-common/cache.ReadResource may wrap around a truncated read). A partially-written cache file producing one of these unwrapped error types would propagate to the caller instead of being demoted to a miss, breaking the 'corrupt → miss' contract. Consider widening the catch or adding a comment that names the specific cli-common wrapping guarantees this relies on.

harness-engineering:harness-enforcement-reviewer (2 findings)

💡 Suggestion - internal/config/relocate_test.go:287

TestRelocate_MalformedOld_FailLoud seeds both old and new dirs with config files (malformed old, valid new), so it exercises the both-present branch in detectRelocation. The old-only-malformed code path (oldPresent && !newPresent with invalid YAML) is handled in detectRelocation but has no dedicated test variant. The acceptance matrix lists 'malformed old' as a covered case, but this path is untested.

💡 Suggestion - internal/config/config.go:219

LoadConfig returns a non-nil *Config alongside a non-nil error (ErrRelocationConflict), intentionally breaking Go's convention that when err != nil the primary return value is unusable. Future callers of the exported LoadConfig who follow the standard if err != nil { return err } pattern will silently discard the valid config and lose the soft-degrade semantics. Making LoadConfig unexported and exposing only LoadConfigForRuntime as the public runtime entry point would eliminate this footgun.

harness-engineering:harness-knowledge-reviewer (2 findings)

💡 Suggestion - internal/config/relocate.go:182

When oldPresent && !newPresent and the old config is malformed, detectRelocation returns Kind: relocBothDivergent rather than a distinct kind. LoadConfig's read-priority logic only special-cases relocOldOnly for reading the old file; relocBothDivergent is not in that branch, so a user with a malformed old-only config ends up with all-defaults returned by LoadConfigForRuntime instead of a clear parse error surfaced to the user. A dedicated relocMalformedOld kind (or relocOldOnly with CopyNeeded:false and the parse error) would make the behavior both correct and self-documenting.

💡 Suggestion - internal/cache/cache.go:170

GetDir() returns c.loc.Root (the tool-level cache root: /google-readonly), not the per-instance directory where files are actually written (/). TestCache_Clear confirms real data lives one level deeper than GetDir() reports. A future caller using GetDir() to construct data paths will look in the wrong place. Either return filepath.Join(c.loc.Root, c.loc.InstanceKey) or rename to GetRootDir and add a doc comment clarifying this is a tool-root accessor.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/config/config.go:258

Double-failure path: when relErr is ErrRelocationConflict and the new config's YAML is also malformed, LoadConfig resets cfg to &Config{} and returns (empty-defaults, relErr). LoadConfigForRuntime then swallows relErr and returns those empty defaults with nil error. Downstream callers receive a wrong OAuthClientPath (empty → default, possibly non-existent) and empty GrantedScopes with no error surfaced. The code comment acknowledges the trade-off, but the consequence — silently running with no config on a double-failure — merits an explicit note on LoadConfigForRuntime's doc comment so callers know this scenario exists.

3 info-level observations excluded. Run with --verbose to include.

15 PR discussion threads considered.

Completed in 6m 15s | $1.97 | sonnet | daemon 0.2.120 | Glorfindel

Field	Value
Model	sonnet
Mode	Re-review · Cycle 3 · Session resumed
Reviewers	hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	6m 15s wall · 16m 02s compute (Reviewers: 4m 14s · Synthesis: 1m 47s)
Cost	$1.97
Tokens	342.4k in / 53.1k out
Turns	6

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	52.2k	5.7k	13.6k	38.5k (1h)	$0.23
harness-engineering:harness-architecture-reviewer	sonnet	53.9k	5.7k	2.1k	51.8k (1h)	$0.33
harness-engineering:harness-enforcement-reviewer	sonnet	53.9k	13.0k	2.1k	51.8k (1h)	$0.44
harness-engineering:harness-knowledge-reviewer	sonnet	53.9k	10.4k	2.1k	51.8k (1h)	$0.40
harness-engineering:harness-self-documenting-code-reviewer	sonnet	53.3k	10.4k	2.1k	51.1k (1h)	$0.39
security:security-code-auditor	haiku	53.5k	7.6k	0	53.5k (1h)	$0.15
discussion-summarizer	—	21.7k	345	13.6k	8.1k (1h)	$0.04

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

monit-reviewer · 2026-05-20T01:02:28Z

+}
+
+// configsMaterialEqual compares the user-meaningful subset of two Configs.
+// CredentialRef / GrantedScopes / Keyring.Backend are real user settings and


🔵 Low (harness-engineering:harness-knowledge-reviewer): When oldPresent && !newPresent and the old config is malformed, detectRelocation returns Kind: relocBothDivergent rather than a distinct kind. LoadConfig's read-priority logic only special-cases relocOldOnly for reading the old file; relocBothDivergent is not in that branch, so a user with a malformed old-only config ends up with all-defaults returned by LoadConfigForRuntime instead of a clear parse error surfaced to the user. A dedicated relocMalformedOld kind (or relocOldOnly with CopyNeeded:false and the parse error) would make the behavior both correct and self-documenting.

Reply to this thread when addressed.

monit-reviewer · 2026-05-20T01:02:28Z

+	oldDir, newDir := reloctest(t)
+	if err := os.MkdirAll(oldDir, 0o700); err != nil {
+		t.Fatal(err)
+	}


🔵 Low (harness-engineering:harness-enforcement-reviewer): TestRelocate_MalformedOld_FailLoud seeds both old and new dirs with config files (malformed old, valid new), so it exercises the both-present branch in detectRelocation. The old-only-malformed code path (oldPresent && !newPresent with invalid YAML) is handled in detectRelocation but has no dedicated test variant. The acceptance matrix lists 'malformed old' as a covered case, but this path is untested.

Reply to this thread when addressed.

monit-reviewer · 2026-05-20T01:02:28Z

+		config.OldHandRolledLegacyCacheDir,
+	}
+	tried := map[string]bool{}
+	for _, p := range probes {


🔵 Low (harness-engineering:harness-architecture-reviewer): GetDrives catches json.SyntaxError and json.UnmarshalTypeError via errors.As as cache misses, but not json.InvalidUnmarshalError or io.ErrUnexpectedEOF (which cli-common/cache.ReadResource may wrap around a truncated read). A partially-written cache file producing one of these unwrapped error types would propagate to the caller instead of being demoted to a miss, breaking the 'corrupt → miss' contract. Consider widening the catch or adding a comment that names the specific cli-common wrapping guarantees this relies on.

Reply to this thread when addressed.

monit-reviewer · 2026-05-20T01:02:28Z


-// GetDir returns the cache directory path
+// GetDir returns the cache directory path.
 func (c *Cache) GetDir() string {


🔵 Low (harness-engineering:harness-knowledge-reviewer): GetDir() returns c.loc.Root (the tool-level cache root: /google-readonly), not the per-instance directory where files are actually written (/). TestCache_Clear confirms real data lives one level deeper than GetDir() reports. A future caller using GetDir() to construct data paths will look in the wrong place. Either return filepath.Join(c.loc.Root, c.loc.InstanceKey) or rename to GetRootDir and add a doc comment clarifying this is a tool-root accessor.

Reply to this thread when addressed.

monit-reviewer · 2026-05-20T01:02:28Z

+	// error: the wrapped conflict already names both paths and the cause,
+	// and a hard-fail here would defeat the soft-degrade contract.
+	cfg := &Config{}
+	read := false


🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): Double-failure path: when relErr is ErrRelocationConflict and the new config's YAML is also malformed, LoadConfig resets cfg to &Config{} and returns (empty-defaults, relErr). LoadConfigForRuntime then swallows relErr and returns those empty defaults with nil error. Downstream callers receive a wrong OAuthClientPath (empty → default, possibly non-existent) and empty GrantedScopes with no error surfaced. The code comment acknowledges the trade-off, but the consequence — silently running with no config on a double-failure — merits an explicit note on LoadConfigForRuntime's doc comment so callers know this scenario exists.

Reply to this thread when addressed.

monit-reviewer · 2026-05-20T01:02:28Z

@@ -226,33 +219,98 @@ func ShortenPath(path string) string {
 	return path


🔵 Low (harness-engineering:harness-enforcement-reviewer): LoadConfig returns a non-nil *Config alongside a non-nil error (ErrRelocationConflict), intentionally breaking Go's convention that when err != nil the primary return value is unusable. Future callers of the exported LoadConfig who follow the standard if err != nil { return err } pattern will silently discard the valid config and lose the soft-degrade semantics. Making LoadConfig unexported and exposing only LoadConfigForRuntime as the public runtime entry point would eliminate this footgun.

Reply to this thread when addressed.

rianjs · 2026-05-20T01:04:06Z

Findings

Blocker: malformed durable config still soft-degrades to defaults on runtime paths.
DetectConfigRelocation correctly returns ErrRelocationConflict for malformed old/new configs (internal/config/relocate.go:131-149), but LoadConfig now suppresses canonical read/parse failures whenever relErr is set and leaves cfg empty (internal/config/config.go:262-279). LoadConfigForRuntime then suppresses every ErrRelocationConflict and returns that defaulted config with nil error (internal/config/relocate.go:335-340). So a malformed new config, or old-only malformed config, becomes “warn + use defaults” for non-init commands. That violates the durable-state §3.2 malformed-old/malformed-new fail-loud contract and can silently switch credential_ref back to google-readonly/default.

I’d distinguish parse/corrupt relocation errors from parseable old/new divergence, or only soft-suppress when the canonical new config was successfully read.

Codex final-pass blocker: LoadConfig was suppressing canonical read failures whenever relErr was set, returning all-defaults cfg + nil through LoadConfigForRuntime. A malformed new config (under a both-present detect-time conflict) would therefore silently warn-and- default, swapping CredentialRef back to google-readonly/default for non-init commands and masking the corrupt file. LoadConfig now tracks whether a canonical config was actually read; if relErr is set AND no read succeeded, return (nil, relErr) instead of (empty cfg, relErr). LoadConfigForRuntime then only soft-degrades when cfg is non-nil, so the runtime command hard-fails on a malformed canonical config and the user sees the conflict instead of silent default swap. Added TestLoadConfigForRuntime_MalformedCanonicalUnderConflict_HardFails pinning the new contract via the testable seam (which now mirrors the real LoadConfig + LoadConfigForRuntime branching). [MON-5371]

rianjs · 2026-05-20T01:06:31Z

No findings. The final changes line up with the intended split: init remains fail-loud and mutation-gated, runtime only soft-degrades when it has a successfully read canonical config, malformed canonical config hard-fails, token handling stays in the §1.8 migrator, and cache behavior remains disposable and scoped.

Focused verification passed: go test ./internal/config ./internal/cache ./internal/keychain.

monit-reviewer previously requested changes May 20, 2026

View reviewed changes

Comment thread internal/config/config.go

Comment thread internal/config/relocate.go

Comment thread internal/cache/cache.go

monit-reviewer approved these changes May 20, 2026

View reviewed changes

rianjs merged commit 1f472eb into main May 20, 2026
2 checks passed

rianjs deleted the ktlo/MON-5371-cli-common-state-port branch May 20, 2026 01:07

This was referenced May 21, 2026

config clear --all removes the legacy Drive cache but not the legacy config dir #137

Closed

fix(config): clear --all removes legacy config file, not just legacy cache #138

Merged

		@@ -226,33 +219,98 @@ func ShortenPath(path string) string {
		return path

Conversation

rianjs commented May 19, 2026

§3.2 acceptance matrix (per merge gate)

Surfaces

Token-secret hygiene

Out of scope

Verification

Uh oh!

rianjs commented May 19, 2026

Findings

Uh oh!

rianjs commented May 19, 2026

TDD coverage assessment

Gap 1 — keychain discover() old-path candidate is untested

Gap 2 — LoadConfigForRuntime soft-conflict path doesn't actually assert behavior

Uh oh!

rianjs commented May 19, 2026

Findings

Uh oh!

rianjs commented May 20, 2026

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

monit-reviewer May 20, 2026

Choose a reason for hiding this comment

Uh oh!

monit-reviewer May 20, 2026

Choose a reason for hiding this comment

Uh oh!

monit-reviewer May 20, 2026

Choose a reason for hiding this comment

Uh oh!

monit-reviewer May 20, 2026

Choose a reason for hiding this comment

Uh oh!

monit-reviewer May 20, 2026

Choose a reason for hiding this comment

Uh oh!

monit-reviewer May 20, 2026

Choose a reason for hiding this comment

Uh oh!

rianjs commented May 20, 2026

Findings

Uh oh!

rianjs commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Gap 1 — keychain `discover()` old-path candidate is untested

Gap 2 — `LoadConfigForRuntime` soft-conflict path doesn't actually assert behavior