Skip to content

feat(state): port gro to cli-common state components [MON-5371]#134

Merged
rianjs merged 6 commits into
mainfrom
ktlo/MON-5371-cli-common-state-port
May 20, 2026
Merged

feat(state): port gro to cli-common state components [MON-5371]#134
rianjs merged 6 commits into
mainfrom
ktlo/MON-5371-cli-common-state-port

Conversation

@rianjs
Copy link
Copy Markdown
Contributor

@rianjs rianjs commented May 19, 2026

Closes #131. Implements the gro half of working-with-state.md §6.4 unit 3:
config-dir resolver via cli-common/statedir; atomic config writes; one-time
copy-leave-old relocation on macOS/Windows; cli-common/cache envelope with
hard-coded per-resource TTL; removal of the user-configurable cache TTL knob
and the `gro config cache` subcommand.

§3.2 acceptance matrix (per merge gate)

Config policy: copy-leave-old at `gro init`; runtime mutation-free w/
ErrRelocationConflict soft-fallback (`LoadConfigForRuntime`); divergent
old/new fails loud naming both paths and mutating nothing. Linux: zero-byte
op (old path ≡ new path). `token.json` is excluded from the copy and
handled exclusively by the existing §1.8 token migrator, which now probes
both old and new paths with full conflict semantics.

Cache policy: drop-on-floor. The new cli-common envelope schema makes
the old `DriveCache` shape an identity/version-mismatch miss; a malformed
envelope is mapped to a miss inside gro's wrapper (cache is disposable per §4).

Case Config Cache
old-only TestRelocate_OldOnly_CopiedAtInit TestMigrateLegacyCacheDir/carries warm cache
new-only TestRelocate_NewOnly_LeftUntouched TestCache_GetSetDrives
same value both TestRelocate_Equal_NoOp TestCache_CorruptedCache/old DriveCache shape → miss
conflicting TestRelocate_Divergent_FailLoudNamesBothPaths_MutatesNothing (disposable: schema-mismatch → miss; covered above)
malformed old TestRelocate_MalformedOld_FailLoud TestMigrateLegacyCacheDir/unreadable legacy preserved
malformed new TestRelocate_MalformedNew_FailLoud TestCache_CorruptedCache/malformed JSON treated as miss
neither TestRelocate_Neither_PathResolvedNotCreated TestCache_GetSetDrives/returns nil for missing
no real-dir writes statedirtest.Hermetic everywhere statedirtest.Hermetic

Surfaces

  1. Config dir resolver: `configDirPath()` → `statedir.Scope{Name:"google-readonly"}.ConfigDir()/ConfigDirEnsured()`.
  2. Atomic SaveConfig: `os.CreateTemp` + `Chmod 0600` + `Rename`.
  3. Relocation gate: new `internal/config/relocate.go` — `DetectConfigRelocation` (mutation-free), `ApplyConfigRelocation` (whole-dir copy minus token.json, atomic per-file, idempotent), `LoadConfigForRuntime` (soft-conflict wrapper for non-init callers). Init wires the gate ahead of `EnsureMigrated`.
  4. Cache adoption: `internal/cache/cache.go` now wraps `cli-common/cache.Locator{Root: statedir.Cache{Tool:"google-readonly"}.CacheDir(), InstanceKey:"default"}` with `ReadResource`/`WriteResource` and hard-coded 24h TTL for drives.
  5. Drop user-TTL: `Config.CacheTTLHours`, `DefaultCacheTTLHours`, `GetCacheTTL{,Hours}`, `Prompter.AskCacheTTL`, init step 6 TTL block — gone. Older config.yml with `cache_ttl_hours:` still loads (yaml ignores unknowns).
  6. Remove cache subcommand: deleted `gro config cache {show,clear,ttl}` (clearing folds into `gro config clear --all`).
  7. Hermetic test helper: `credtest.Setup` delegates to `statedirtest.Hermetic` (full 7-var set — closes a Windows real-dir leak). New `credtest.ConfigDir(t)` for tests that need the resolved dir.

Token-secret hygiene

`ApplyConfigRelocation` deliberately excludes `token.json` — copying it would
leave a stale plaintext at the old path (the exact class MON-5370 guarded
against). Instead, `keychain/migrate.go` `discover()` was extended to
enumerate the old hand-rolled token path alongside the new one. Conflict
detection (`planMigration`) treats divergent old/new tokens as a §1.8
conflict and refuses to migrate without `--overwrite` — no side-channel scrub
can race ahead of the planner.

Out of scope

  • Secrets/keyring storage (location-independent via cli-common/credstore).
  • Cutting the cli-common semver tag (blocked on MON-5375).
  • Pre-B2b `/cache/` migrator stays — its purpose is location, not
    schema, and it's still useful for installs that pre-date B2b.

Verification

  • `make test` ✅ 1368 tests pass.
  • `make lint` ✅ 0 issues.
  • Codex architect review (session `019e4019`, reused across the whole INT-310
    state-port batch) converged at blockers=0 majors=0 minors=0 nits=1 (cosmetic).

Adopt cli-common/statedir for the config-dir resolver (native per OS:
$XDG_CONFIG_HOME→Linux, ~/Library/Application Support→macOS,
%APPDATA%→Windows; relative $XDG_CONFIG_HOME now errors per the §1.1
intentional tightening). Make SaveConfig atomic via os.CreateTemp +
rename at 0600 under a 0700 dir.

Add a macOS/Windows config-relocation gate (copy-leave-old; fail loud
on divergence; mutation-free runtime read-fallback via the new
LoadConfigForRuntime wrapper used by non-init callers). The gate runs
ahead of EnsureMigrated in `gro init` so divergent old/new can't be
silently papered over. Whole-dir copy excludes token.json (a secret);
the existing §1.8 token migrator gains the old-hand-rolled token path
as an additional legacy candidate with full conflict semantics
(equal→migrate+delete-both; divergent→fail loud).

Replace internal/cache with a thin wrapper over cli-common/cache:
atomic temp+rename envelope, version-mismatch-as-miss, hard-coded
per-resource TTL (§4.4). The pre-MON-5371 `cache_ttl_hours` config
field, `DefaultCacheTTLHours`, GetCacheTTL{,Hours}, the AskCacheTTL
prompt and `gro config cache {show,clear,ttl}` subcommand are removed
(older config.yml that still carries the field loads cleanly — yaml
ignores unknowns; clearing the cache folds into `gro config clear
--all`).

Switch internal/credtest to delegate state-dir isolation to
cli-common/statedirtest (full 7-var set) — closes a pre-existing
Windows real-dir leak (AppData/USERPROFILE/XDG_DATA_HOME were missing).
Tests that hand-built `<tmp>/xdgconfig/<DirName>/…` paths now resolve
via the resolver so they're correct on every OS.

§3.2 acceptance matrix coverage: 8 cases × 2 surfaces (config in
internal/config/relocate_test.go; cache in internal/cache/cache_test.go).
Init reconcile coverage: detect-runs-before-migrate, divergent-aborts-
mutates-nothing, copy-needed-triggers-apply.

Closes #131
[MON-5371]
@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 19, 2026

Findings

Blocker: oauth_client_path is durable config but is excluded from relocation conflict detection.
internal/config/relocate.go:157-173 ignores OAuthClientPath entirely, even though internal/config/config.go documents it as an org/user override. That means old/new config.yml files with different explicit OAuth client paths are classified as equal and runtime silently uses the new side. The §3.2 conflicting-config row should fail loud for this. Treat old-default vs new-default as equivalent if needed, but explicit non-default divergence needs to conflict.

Blocker: old-only config.json installs are not relocated or read from the old hand-rolled dir.
detectRelocation only checks old/config.yml (internal/config/relocate.go:103-113). LoadConfig only falls back to old/config.yml (internal/config/config.go:247-274), and loadLegacyJSON reads config.json only from the new statedir dir (internal/config/config.go:285-300). A pre-port macOS/Windows user with only old config.json is treated like a fresh install, so the existing per-file migrators never see the old config. This breaks the “whole-dir copy so existing migrators still work” plan.

Major: pre-B2b cache relocation does not cover the old hand-rolled config dir.
LegacyCacheDir now resolves under the new statedir config dir (internal/config/config.go:145-153), while ApplyConfigRelocation skips directories (internal/config/relocate.go:208-211). So <old hand-rolled>/google-readonly/cache/drives.json is neither copied at init nor found by migrateLegacyCacheDir (internal/cache/cache.go:75-108). The PR matrix’s old-only cache carry test covers only the new-dir legacy-cache seam, not the macOS/Windows old-dir state.

Minor: the runtime soft-conflict test is a stub.
TestLoadConfigForRuntime_SoftConflict sets up old/new divergent configs but never calls LoadConfigForRuntime or asserts returned canonical config / nil error / warning behavior (internal/config/relocate_test.go:283-316). Given runtime soft fallback is a key architectural distinction from init fail-loud, this should be pinned.

@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 19, 2026

TDD coverage assessment

Verdict: Gaps

The relocation gate, cache adoption, and init ordering invariant are well-covered. Two genuine gaps where a regression could ship green:

Gap 1 — keychain discover() old-path candidate is untested

internal/keychain/migrate.go:243-253 adds a new OldHandRolledTokenPath() candidate with a dedupe against the new path. There are no tests in internal/keychain/ exercising discover() at all (no migrate_test.go), so:

  • The macOS/Windows "old != new emits both candidates" behavior is unverified.
  • The Linux "old == new must dedupe (no double candidate with identical bytes)" invariant is unverified.
  • Worse: a regression that drops the new branch entirely (leaving only the new statedir path) would not be caught — and that would silently strand pre-MON-5371 token.json files for upgraders on macOS/Windows, exactly the platforms this PR's relocation is targeted at.

Suggest a discover() test with the legacy-scan env vars set, two token.json files seeded under injected old/new paths, asserting candidate count and locations across the path-divergence and path-identity cases.

Gap 2 — LoadConfigForRuntime soft-conflict path doesn't actually assert behavior

relocate_test.go:283-316 (TestLoadConfigForRuntime_SoftConflict) sets up divergent old/new, then never calls LoadConfigForRuntime — it t.Skips on Linux and otherwise just falls off the end without invoking the SUT or checking the once-warn fires / canonical-new is returned. The soft-conflict semantic (warn once, return new cfg) is currently unproven; a regression that returned the old cfg, or panicked, or warned every call, would ship green.

Either inject a resolver seam so the case is exercised on Linux, or drop the test and add a unit test on warnReloConflictOnce plus a small refactor that makes the LoadConfigForRuntime branch table-testable without t.Setenv gymnastics.

Everything else (cache parse-error→miss, pre-5371 shape→miss, drop-on-floor Clear semantics, detect-runs-before-migrate ordering, copy-needed→apply ordering, idempotent apply, token.json excluded from copy) is adequately proven.

- relocate detector now treats `old/config.json` as a valid legacy source
  (was YAML-only); LoadConfig falls back to old/config.json when new/yml
  is absent. A pre-MON-5371 macOS/Windows user on the legacy JSON form is
  no longer treated as a fresh install.
- configsMaterialEqual now compares `oauth_client_path` via
  oauthClientPathEquiv: each-side-at-its-own-default → equal (location
  artifact); explicit non-default divergence → fail loud.
- migrateLegacyCacheDir now probes both the new statedir LegacyCacheDir
  and the old hand-rolled `<old>/google-readonly/cache/` (new export
  config.OldHandRolledLegacyCacheDir). Dedupes on Linux.
- Replace the stubbed TestLoadConfigForRuntime_SoftConflict with two
  real assertions (soft-conflict returns canonical+nil-err;
  no-conflict pass-through) via a new testable seam.
- Add TestMigrateTokenFile_OldHandRolledPath and
  TestMigrateTokenFile_OldAndNewDivergent for the new old-token
  enumeration; covers §1.8 equal-migrate and divergent-fail-loud paths.

[MON-5371]
@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 19, 2026

Findings

Blocker: old-only malformed config still mutates before failing.
detectRelocation now recognizes old config.yml or config.json, but in the oldPresent && !newPresent branch it returns CopyNeeded without parsing the old file first (internal/config/relocate.go:121-128). gro init then applies the relocation before EnsureMigrated (internal/cmd/initcmd/init.go:290-299), so an old-only malformed YAML/JSON file gets copied into the new dir before LoadConfig later fails parsing it (internal/config/config.go:262-284). That breaks the §3.2 malformed-old row: fail loud, mutate nothing. Please parse oldPath before returning relocOldOnly, and add old-only malformed YAML + JSON tests asserting the new dir/file is not created.

Codex caught: detectRelocation's oldPresent+!newPresent branch was
returning CopyNeeded=true without parsing the old file, so init's
ApplyConfigRelocation would carry a malformed YAML/JSON into the new
dir before LoadConfig later choked on it — that's a §3.2 row-5 (malformed
old) regression: the contract is fail loud, mutate nothing.

Parse the old file in the old-only branch and surface
ErrRelocationConflict ahead of any copy. Add
TestRelocate_OldOnlyMalformedYAML_FailLoud_MutatesNothing and
TestRelocate_OldOnlyMalformedJSON_FailLoud_MutatesNothing asserting
the new dir is not created and CopyNeeded is false.

[MON-5371]
@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 20, 2026

No findings. The old-only malformed branch now validates before CopyNeeded, preserving the fail-loud/mutate-nothing invariant for both YAML and JSON. The prior oauth path, config.json, cache relocation, runtime soft-conflict, and token-source concerns also look aligned with the batch architecture.

Focused verification run: go test ./internal/config ./internal/cache ./internal/keychain passed.

Copy link
Copy Markdown

@monit-reviewer monit-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 7ba3105

Summary

Reviewer Findings
harness-engineering:harness-architecture-reviewer 3
harness-engineering:harness-enforcement-reviewer 3
harness-engineering:harness-knowledge-reviewer 1
harness-engineering:harness-self-documenting-code-reviewer 1
harness-engineering:harness-architecture-reviewer (3 findings)

💡 Suggestion - internal/config/relocate.go:325

reloConflictOnce is a package-level sync.Once that is never reset between test runs in the same binary. Any test that exercises LoadConfigForRuntime in a conflict state will exhaust the Once, silently suppressing the warning for all subsequent tests in the same binary. If a future test ever asserts on the stderr warning, it will flake depending on execution order. Consider a reset hook or injecting an io.Writer so tests can verify the behavior independently.

💡 Suggestion - internal/cache/cache.go:47

drivesTTL = "24h" is a string constant parsed via time.ParseDuration at every WriteResource call. A malformed string after a refactor surfaces only at the first cache write in production. Add a package-level init-time validation (e.g. panic on bad parse) or a TestMain assertion so the invariant is caught at test time.

💡 Suggestion - internal/cache/cache.go:170

nowFn is an unexported package-level variable mutated in tests without a t.Parallel() guard. If anyone adds t.Parallel() to the outer test or its subtests, writes to nowFn from concurrent subtests will be a data race caught by -race. Prefer passing a clock func through Cache at construction time rather than a mutable global.

harness-engineering:harness-enforcement-reviewer (3 findings)

⚠️ Should Fix - internal/config/relocate.go:229

configsMaterialEqual only checks Keyring.Backend. If KeyringConfig has additional fields (e.g. a passphrase env var for the file backend), two configs differing only on those fields would be classified as equal, silently ignoring a genuine user divergence. The equality check should use reflect.DeepEqual on the entire Keyring sub-struct, or explicitly enumerate every field.

⚠️ Should Fix - internal/config/config.go:248

When both old and new configs are present and the new config.yml is malformed, detectRelocation returns ErrRelocationConflict (wrapped), so LoadConfig sets relErr and continues. It then tries to re-parse the malformed YAML and returns a bare parse error. LoadConfigForRuntime does not recognize this as ErrRelocationConflict, so it propagates the hard parse error instead of soft-degrading. The test matrix covers detectRelocation in isolation but has no test exercising the full LoadConfig → LoadConfigForRuntime stack for the malformed-new, old-present scenario.

💡 Suggestion - internal/cache/cache.go:68

migrateLegacyCacheDir now probes two legacy locations: LegacyCacheDir (statedir-resolved) and OldHandRolledLegacyCacheDir (pre-MON-5371 hand-rolled). The second probe is the one that matters on a macOS/Windows install that pre-dates both B2b and MON-5371, yet cache_test.go has no subtest that seeds a warm cache exclusively at the OldHandRolledLegacyCacheDir path and verifies it is carried. The dual-probe logic is new risky code with only one probe tested.

harness-engineering:harness-knowledge-reviewer (1 findings)

💡 Suggestion - internal/cmd/initcmd/init.go:289

The nil guard if reloc.CopyNeeded && d.ApplyConfigRelocation != nil silently skips the copy when ApplyConfigRelocation is nil but DetectConfigRelocation signalled CopyNeeded. A test stub injecting only DetectConfigRelocation can hit this path and leave the config unrelocated with no error or log. Either remove the nil guard and require ApplyConfigRelocation in initDeps, or return an error when CopyNeeded is true and the func is nil.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/cmd/initcmd/init.go:394

The configExistedBefore variable and its initialization are dead code suppressed with _ = configExistedBefore rather than removed. The snapshot-before-save gate it guarded (AskCacheTTL) is gone; the variable, its assignment in runWith, and the parameter in finishExisting with its blank-identifier suppressor should all be deleted to avoid misleading future readers.

4 info-level observations excluded. Run with --verbose to include.

4 PR discussion threads considered.


Completed in 4m 06s | $1.84 | sonnet | daemon 0.2.120 | Glorfindel
Field Value
Model sonnet
Reviewers hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 4m 06s wall · 13m 04s compute (Reviewers: 3m 10s · Synthesis: 54s)
Cost $1.84
Tokens 301.8k in / 45.1k out
Turns 6

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 33.1k 3.0k 0 33.1k (1h) $0.18
harness-engineering:harness-architecture-reviewer sonnet 53.9k 10.0k 2.1k 51.8k (1h) $0.39
harness-engineering:harness-enforcement-reviewer sonnet 53.9k 10.1k 2.1k 51.8k (1h) $0.39
harness-engineering:harness-knowledge-reviewer sonnet 53.9k 10.6k 2.1k 51.8k (1h) $0.40
harness-engineering:harness-self-documenting-code-reviewer sonnet 53.3k 7.6k 2.1k 51.2k (1h) $0.35
security:security-code-auditor haiku 53.6k 3.7k 0 53.6k (1h) $0.13

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

Comment thread internal/config/relocate.go
Comment thread internal/config/config.go Outdated
Comment thread internal/cache/cache.go
Comment thread internal/config/relocate.go
Comment thread internal/cache/cache.go
Comment thread internal/cache/cache.go
Comment thread internal/cmd/initcmd/init.go
Comment thread internal/cmd/initcmd/init.go
- configsMaterialEqual now uses reflect.DeepEqual on the whole Keyring
  sub-struct, not just Backend, so any future KeyringConfig field is
  auto-covered as a divergence.
- LoadConfig skips the new-dir re-parse and legacy-JSON re-read when relErr
  is already a detect-time ErrRelocationConflict (which includes the
  malformed-new-wrapped-in-conflict case). The wrapped conflict error wins,
  so LoadConfigForRuntime soft-degrades cleanly instead of hard-failing on
  a bare parse error.
- Add cache_test subtest seeding a warm cache exclusively at the old
  hand-rolled legacy cache subdir and asserting the dual-probe carry +
  legacy-dir removal.
- Delete configExistedBefore plumbing (variable + threading through
  tryExistingToken / finishExisting); the AskCacheTTL gate it guarded was
  removed in this same PR so the suppressor was dead code.

[MON-5371]
@monit-reviewer monit-reviewer dismissed their stale review May 20, 2026 00:54

Superseded by updated review

Copy link
Copy Markdown

@monit-reviewer monit-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 36c1355 | Previous: 7ba3105 (incremental)

Summary

Reviewer Findings
harness-engineering:harness-architecture-reviewer 1
harness-engineering:harness-enforcement-reviewer 1
harness-engineering:harness-self-documenting-code-reviewer 1
harness-engineering:harness-architecture-reviewer (1 findings)

⚠️ Should Fix - internal/config/config.go:219

LoadConfig doc comment says the canonical new-dir config is returned alongside ErrRelocationConflict so callers can soft-degrade, but the implementation never reads the new-dir config when relErr != nil — the read block is guarded by reloc.NewPath != "" && relErr == nil. LoadConfigForRuntime therefore returns an all-defaults *Config (empty OAuthClientPath, empty GrantedScopes) instead of the actual new-dir settings. Consequences: (1) users with a non-default oauth_client_path get silent auth failures ('unable to read OAuth client JSON') during the conflict window; (2) the stale-scope check in me.go is silently skipped (CheckScopesMigration returns '' for an empty slice), masking a re-auth requirement. Fix: attempt the new-dir read unconditionally, store it separately from the relErr gate, and return it alongside the conflict error.

harness-engineering:harness-enforcement-reviewer (1 findings)

💡 Suggestion - internal/config/relocate.go:225

oauthClientPathEquiv compares OAuthClientPath values from loadConfigFromFile, which deliberately does not call applyDefaults (and therefore does not call ExpandPath). A user whose old config.yml has oauth_client_path: ~/oauth_client.json and whose new config.yml has the same value as a fully-expanded absolute path will see a false ErrRelocationConflict because the tilde form and the expanded form differ as strings. Both sides should be normalized (via ExpandPath or equivalent) before the equality comparison.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/cache/cache.go:96

Cache.Clear() calls os.RemoveAll(c.loc.Root), which removes the entire tool-level cache root (os.UserCacheDir()/google-readonly), not just the current instance subtree. If a future cli-common Locator introduces a second instance key, any Clear() call silently evicts all instances. The safer scope is os.RemoveAll(filepath.Join(c.loc.Root, c.loc.InstanceKey)) to match only the instance being managed.

12 PR discussion threads considered.


Completed in 4m 03s | $1.72 | sonnet | daemon 0.2.120 | Glorfindel
Field Value
Model sonnet
Mode Re-review · Cycle 2 · Session resumed
Reviewers hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 4m 03s wall · 10m 31s compute (Reviewers: 2m 49s · Synthesis: 47s)
Cost $1.72
Tokens 334.0k in / 34.5k out
Turns 6

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 42.2k 2.6k 13.6k 28.5k (1h) $0.15
harness-engineering:harness-architecture-reviewer sonnet 54.0k 8.5k 2.1k 51.9k (1h) $0.37
harness-engineering:harness-enforcement-reviewer sonnet 54.0k 7.2k 2.1k 51.9k (1h) $0.35
harness-engineering:harness-knowledge-reviewer sonnet 54.0k 6.2k 2.1k 51.9k (1h) $0.33
harness-engineering:harness-self-documenting-code-reviewer sonnet 53.3k 7.2k 2.1k 51.2k (1h) $0.35
security:security-code-auditor haiku 53.6k 1.9k 0 53.6k (1h) $0.12
discussion-summarizer 23.0k 828 13.6k 9.3k (1h) $0.05

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

Comment thread internal/config/config.go
Comment thread internal/config/relocate.go
Comment thread internal/cache/cache.go
- LoadConfig: attempt new-dir read unconditionally so callers soft-degrading
  via LoadConfigForRuntime get the user's actual settings (OAuthClientPath,
  GrantedScopes, …) alongside ErrRelocationConflict instead of an empty
  Config. The relErr-set branch still suppresses bare parse-error
  propagation so the soft-degrade contract holds.
- oauthClientPathEquiv: ExpandPath both sides before comparing so tilde
  vs fully-expanded forms don't false-positive as divergent.
- Cache.Clear: scope to <Root>/<InstanceKey> instead of the tool-level
  Root so a future multi-instance Locator can't have one instance's Clear
  evict the others.

[MON-5371]
@monit-reviewer monit-reviewer dismissed their stale review May 20, 2026 01:02

Superseded by updated review

Copy link
Copy Markdown

@monit-reviewer monit-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 94f60bb | Previous: 36c1355 (incremental)

Approved with 6 non-blocking suggestions below. Address at your discretion.

Summary

Reviewer Findings
harness-engineering:harness-architecture-reviewer 1
harness-engineering:harness-enforcement-reviewer 2
harness-engineering:harness-knowledge-reviewer 2
harness-engineering:harness-self-documenting-code-reviewer 1
harness-engineering:harness-architecture-reviewer (1 findings)

💡 Suggestion - internal/cache/cache.go:88

GetDrives catches json.SyntaxError and json.UnmarshalTypeError via errors.As as cache misses, but not json.InvalidUnmarshalError or io.ErrUnexpectedEOF (which cli-common/cache.ReadResource may wrap around a truncated read). A partially-written cache file producing one of these unwrapped error types would propagate to the caller instead of being demoted to a miss, breaking the 'corrupt → miss' contract. Consider widening the catch or adding a comment that names the specific cli-common wrapping guarantees this relies on.

harness-engineering:harness-enforcement-reviewer (2 findings)

💡 Suggestion - internal/config/relocate_test.go:287

TestRelocate_MalformedOld_FailLoud seeds both old and new dirs with config files (malformed old, valid new), so it exercises the both-present branch in detectRelocation. The old-only-malformed code path (oldPresent && !newPresent with invalid YAML) is handled in detectRelocation but has no dedicated test variant. The acceptance matrix lists 'malformed old' as a covered case, but this path is untested.

💡 Suggestion - internal/config/config.go:219

LoadConfig returns a non-nil *Config alongside a non-nil error (ErrRelocationConflict), intentionally breaking Go's convention that when err != nil the primary return value is unusable. Future callers of the exported LoadConfig who follow the standard if err != nil { return err } pattern will silently discard the valid config and lose the soft-degrade semantics. Making LoadConfig unexported and exposing only LoadConfigForRuntime as the public runtime entry point would eliminate this footgun.

harness-engineering:harness-knowledge-reviewer (2 findings)

💡 Suggestion - internal/config/relocate.go:182

When oldPresent && !newPresent and the old config is malformed, detectRelocation returns Kind: relocBothDivergent rather than a distinct kind. LoadConfig's read-priority logic only special-cases relocOldOnly for reading the old file; relocBothDivergent is not in that branch, so a user with a malformed old-only config ends up with all-defaults returned by LoadConfigForRuntime instead of a clear parse error surfaced to the user. A dedicated relocMalformedOld kind (or relocOldOnly with CopyNeeded:false and the parse error) would make the behavior both correct and self-documenting.

💡 Suggestion - internal/cache/cache.go:170

GetDir() returns c.loc.Root (the tool-level cache root: /google-readonly), not the per-instance directory where files are actually written (/). TestCache_Clear confirms real data lives one level deeper than GetDir() reports. A future caller using GetDir() to construct data paths will look in the wrong place. Either return filepath.Join(c.loc.Root, c.loc.InstanceKey) or rename to GetRootDir and add a doc comment clarifying this is a tool-root accessor.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/config/config.go:258

Double-failure path: when relErr is ErrRelocationConflict and the new config's YAML is also malformed, LoadConfig resets cfg to &Config{} and returns (empty-defaults, relErr). LoadConfigForRuntime then swallows relErr and returns those empty defaults with nil error. Downstream callers receive a wrong OAuthClientPath (empty → default, possibly non-existent) and empty GrantedScopes with no error surfaced. The code comment acknowledges the trade-off, but the consequence — silently running with no config on a double-failure — merits an explicit note on LoadConfigForRuntime's doc comment so callers know this scenario exists.

3 info-level observations excluded. Run with --verbose to include.

15 PR discussion threads considered.


Completed in 6m 15s | $1.97 | sonnet | daemon 0.2.120 | Glorfindel
Field Value
Model sonnet
Mode Re-review · Cycle 3 · Session resumed
Reviewers hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 6m 15s wall · 16m 02s compute (Reviewers: 4m 14s · Synthesis: 1m 47s)
Cost $1.97
Tokens 342.4k in / 53.1k out
Turns 6

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 52.2k 5.7k 13.6k 38.5k (1h) $0.23
harness-engineering:harness-architecture-reviewer sonnet 53.9k 5.7k 2.1k 51.8k (1h) $0.33
harness-engineering:harness-enforcement-reviewer sonnet 53.9k 13.0k 2.1k 51.8k (1h) $0.44
harness-engineering:harness-knowledge-reviewer sonnet 53.9k 10.4k 2.1k 51.8k (1h) $0.40
harness-engineering:harness-self-documenting-code-reviewer sonnet 53.3k 10.4k 2.1k 51.1k (1h) $0.39
security:security-code-auditor haiku 53.5k 7.6k 0 53.5k (1h) $0.15
discussion-summarizer 21.7k 345 13.6k 8.1k (1h) $0.04

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

}

// configsMaterialEqual compares the user-meaningful subset of two Configs.
// CredentialRef / GrantedScopes / Keyring.Backend are real user settings and
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-knowledge-reviewer): When oldPresent && !newPresent and the old config is malformed, detectRelocation returns Kind: relocBothDivergent rather than a distinct kind. LoadConfig's read-priority logic only special-cases relocOldOnly for reading the old file; relocBothDivergent is not in that branch, so a user with a malformed old-only config ends up with all-defaults returned by LoadConfigForRuntime instead of a clear parse error surfaced to the user. A dedicated relocMalformedOld kind (or relocOldOnly with CopyNeeded:false and the parse error) would make the behavior both correct and self-documenting.

Reply to this thread when addressed.

oldDir, newDir := reloctest(t)
if err := os.MkdirAll(oldDir, 0o700); err != nil {
t.Fatal(err)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-enforcement-reviewer): TestRelocate_MalformedOld_FailLoud seeds both old and new dirs with config files (malformed old, valid new), so it exercises the both-present branch in detectRelocation. The old-only-malformed code path (oldPresent && !newPresent with invalid YAML) is handled in detectRelocation but has no dedicated test variant. The acceptance matrix lists 'malformed old' as a covered case, but this path is untested.

Reply to this thread when addressed.

Comment thread internal/cache/cache.go
config.OldHandRolledLegacyCacheDir,
}
tried := map[string]bool{}
for _, p := range probes {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-architecture-reviewer): GetDrives catches json.SyntaxError and json.UnmarshalTypeError via errors.As as cache misses, but not json.InvalidUnmarshalError or io.ErrUnexpectedEOF (which cli-common/cache.ReadResource may wrap around a truncated read). A partially-written cache file producing one of these unwrapped error types would propagate to the caller instead of being demoted to a miss, breaking the 'corrupt → miss' contract. Consider widening the catch or adding a comment that names the specific cli-common wrapping guarantees this relies on.

Reply to this thread when addressed.

Comment thread internal/cache/cache.go

// GetDir returns the cache directory path
// GetDir returns the cache directory path.
func (c *Cache) GetDir() string {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-knowledge-reviewer): GetDir() returns c.loc.Root (the tool-level cache root: /google-readonly), not the per-instance directory where files are actually written (/). TestCache_Clear confirms real data lives one level deeper than GetDir() reports. A future caller using GetDir() to construct data paths will look in the wrong place. Either return filepath.Join(c.loc.Root, c.loc.InstanceKey) or rename to GetRootDir and add a doc comment clarifying this is a tool-root accessor.

Reply to this thread when addressed.

Comment thread internal/config/config.go
// error: the wrapped conflict already names both paths and the cause,
// and a hard-fail here would defeat the soft-degrade contract.
cfg := &Config{}
read := false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): Double-failure path: when relErr is ErrRelocationConflict and the new config's YAML is also malformed, LoadConfig resets cfg to &Config{} and returns (empty-defaults, relErr). LoadConfigForRuntime then swallows relErr and returns those empty defaults with nil error. Downstream callers receive a wrong OAuthClientPath (empty → default, possibly non-existent) and empty GrantedScopes with no error surfaced. The code comment acknowledges the trade-off, but the consequence — silently running with no config on a double-failure — merits an explicit note on LoadConfigForRuntime's doc comment so callers know this scenario exists.

Reply to this thread when addressed.

Comment thread internal/config/config.go
@@ -226,33 +219,98 @@ func ShortenPath(path string) string {
return path
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-enforcement-reviewer): LoadConfig returns a non-nil *Config alongside a non-nil error (ErrRelocationConflict), intentionally breaking Go's convention that when err != nil the primary return value is unusable. Future callers of the exported LoadConfig who follow the standard if err != nil { return err } pattern will silently discard the valid config and lose the soft-degrade semantics. Making LoadConfig unexported and exposing only LoadConfigForRuntime as the public runtime entry point would eliminate this footgun.

Reply to this thread when addressed.

@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 20, 2026

Findings

Blocker: malformed durable config still soft-degrades to defaults on runtime paths.
DetectConfigRelocation correctly returns ErrRelocationConflict for malformed old/new configs (internal/config/relocate.go:131-149), but LoadConfig now suppresses canonical read/parse failures whenever relErr is set and leaves cfg empty (internal/config/config.go:262-279). LoadConfigForRuntime then suppresses every ErrRelocationConflict and returns that defaulted config with nil error (internal/config/relocate.go:335-340). So a malformed new config, or old-only malformed config, becomes “warn + use defaults” for non-init commands. That violates the durable-state §3.2 malformed-old/malformed-new fail-loud contract and can silently switch credential_ref back to google-readonly/default.

I’d distinguish parse/corrupt relocation errors from parseable old/new divergence, or only soft-suppress when the canonical new config was successfully read.

Codex final-pass blocker: LoadConfig was suppressing canonical read
failures whenever relErr was set, returning all-defaults cfg + nil
through LoadConfigForRuntime. A malformed new config (under a
both-present detect-time conflict) would therefore silently warn-and-
default, swapping CredentialRef back to google-readonly/default for
non-init commands and masking the corrupt file.

LoadConfig now tracks whether a canonical config was actually read; if
relErr is set AND no read succeeded, return (nil, relErr) instead of
(empty cfg, relErr). LoadConfigForRuntime then only soft-degrades when
cfg is non-nil, so the runtime command hard-fails on a malformed
canonical config and the user sees the conflict instead of silent
default swap.

Added TestLoadConfigForRuntime_MalformedCanonicalUnderConflict_HardFails
pinning the new contract via the testable seam (which now mirrors the
real LoadConfig + LoadConfigForRuntime branching).

[MON-5371]
@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 20, 2026

No findings. The final changes line up with the intended split: init remains fail-loud and mutation-gated, runtime only soft-degrades when it has a successfully read canonical config, malformed canonical config hard-fails, token handling stays in the §1.8 migrator, and cache behavior remains disposable and scoped.

Focused verification passed: go test ./internal/config ./internal/cache ./internal/keychain.

@rianjs rianjs merged commit 1f472eb into main May 20, 2026
2 checks passed
@rianjs rianjs deleted the ktlo/MON-5371-cli-common-state-port branch May 20, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Port gro to cli-common state components (statedir config + cli-common/cache; drop user-TTL & config cache cmds)

2 participants