Skip to content

fix(config): clear --all removes legacy config file, not just legacy cache#138

Merged
rianjs merged 2 commits into
mainfrom
fix/MON-5433-clear-all-legacy-config
May 21, 2026
Merged

fix(config): clear --all removes legacy config file, not just legacy cache#138
rianjs merged 2 commits into
mainfrom
fix/MON-5433-clear-all-legacy-config

Conversation

@rianjs
Copy link
Copy Markdown
Contributor

@rianjs rianjs commented May 21, 2026

Summary

gro config clear --all removed the legacy Drive cache (via LegacyCacheDir) but did NOT remove the legacy config file at <old-hand-rolled-dir>/config.{yml,json}. After MON-5371 (#134) relocated the config dir to OS-native paths on macOS/Windows, this asymmetry re-armed the DetectConfigRelocation divergence trap on the next gro init. (The cache side got the both-dir treatment in B2b / #130; the config side was missed when #134 introduced the relocation.)

This change mirrors the cache-side symmetry on the config side and hardens --all against the malformed state it exists to wipe.

Changes

  • config.OldHandRolledConfigDir — exported primitive, symmetric with the existing OldHandRolledTokenPath / LegacyCacheDir.
  • config.GetConfigDirNoCreate — symmetric with GetConfigPathNoCreate; lets the candidate helper resolve without side-effects.
  • internal/cmd/config.configFilesForClear — per-file deduped candidate list covering both canonical config.{yml,json} and the legacy hand-rolled dir's config.{yml,json} (canonical .json is included because LoadConfig reads it when .yml is absent). configFilesForClearFn is a package-var test seam so Linux CI can exercise the macOS/Windows divergent-dir branch.
  • runClear hardening (§6.6 pattern 7) — under --all, an OpenNoMigrate failure (e.g. invalid keyring.backend from a malformed canonical config) soft-degrades with a warning so the file scrub still completes. Plain clear (no --all) still hard-fails on a broken keyring; the user didn't ask for a wipe there.

OAuth client JSON is intentionally NOT removed (deployment material exclusion preserved per the existing trailing note).

Tests

internal/cmd/config/clear_legacy_test.go:

  • TestConfigClearAll_RemovesAllConfigCandidates — seeds all four candidate files via the synthetic-distinct-dir seam; asserts all removed.
  • TestConfigClearAll_DryRunReportsAllCandidates — dry-run names every candidate; files still on disk.
  • TestConfigClearAll_LegacyAbsent_OnlyCanonical — absent legacy files don't produce spurious "Removed" lines.
  • TestConfigFilesForClear_PathIdentityDedup — per-file dedup pin; also pins canonical config.json inclusion (Codex r2 regression).
  • TestConfigClearAll_MalformedCanonicalConfig_StillScrubsFiles — §6.6 pattern 7 recovery-contract pin (matches nrq's TestConfigClear_All_MalformedConfig_StillScrubsFiles).
  • TestConfigClear_PlainStillHardFailsOnBrokenKeyring — non---all path stays strict.

make check passes locally: 1384 tests across 32 packages, 0 lint issues.

Test plan

  • make check green
  • Codex architect plan review converged (blockers=0 majors=0 minors=0 nits=0 on r3)
  • Manual repro from issue#137 "Repro" section no longer reproduces (will verify in CI environment)

Closes #137
[MON-5433]

…cache

Mirrors the LegacyCacheDir both-dir cleanup pattern on the config side.
After MON-5371 relocated the config dir from ~/.config/google-readonly
to the OS-native dir on macOS/Windows, clear --all only removed the
new-location config.yml; the legacy file then re-armed the
DetectConfigRelocation divergence trap on next gro init.

Adds:
- config.OldHandRolledConfigDir exported primitive (symmetric with
  OldHandRolledTokenPath, LegacyCacheDir).
- config.GetConfigDirNoCreate (symmetric with GetConfigPathNoCreate)
  so the candidate helper resolves without side-effects.
- internal/cmd/config.configFilesForClear: per-file deduped candidate
  list covering both canonical config.{yml,json} and the legacy hand-
  rolled dir's config.{yml,json}. configFilesForClearFn is a package-
  var test seam so Linux CI can exercise the macOS/Windows divergent-
  dir branch.
- runClear hardening (§6.6 pattern 7): under --all an OpenNoMigrate
  failure (e.g. invalid keyring.backend) soft-degrades with a warning
  so the file scrub still completes. Plain clear (no --all) still
  hard-fails on a broken keyring.

OAuth client JSON is intentionally NOT removed (deployment material
exclusion preserved).

Closes #137
[MON-5433]
@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 21, 2026

Major

  • internal/cmd/config/config.go:278 still lets a keyring query failure block the --all file scrub. The PR soft-degrades only OpenNoMigrate() failure, but if st.HasToken() returns an error, runClear(true, false) returns before removing any cfgPaths. That leaves the legacy config files in place in another broken-keyring state, which conflicts with the §6.6 recovery contract. Under all, handle HasToken() failure like open failure: warn, skip token cleanup, continue file scrub; keep non---all strict.

Minor

  • internal/cmd/config/clear_legacy_test.go:143 overclaims the malformed-config recovery pin. The test name/comment says “MalformedCanonicalConfig”, but it writes a valid synthetic config file and forces failure via GOOGLE_READONLY_KEYRING_BACKEND. Either rename it to a generic OpenNoMigrateFailure_StillScrubsFiles, or add a real canonical malformed config.yml case so the product-intent claim is actually verified.

Nit

  • internal/cmd/config/clear_legacy_test.go:186 reimplements config.ShortenPath in test code. This package already imports internal/config elsewhere as appconfig; using the production helper avoids a tiny parallel implementation.

@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 21, 2026

TDD coverage assessment

Verdict: close to sufficient — one real gap that Codex also flagged.

The new tests cleanly cover the bug-fix intent (legacy-file scrub, dry-run parity, per-file dedup, canonical config.json regression pin) and the §6.6 pattern-7 recovery contract for OpenNoMigrate failure. The synthetic-candidates seam is a good fit for Linux CI exercising the macOS/Windows divergent-dir branch.

Real gap

  • st.HasToken() failure under --all is untested. runClear soft-degrades OpenNoMigrate failure but HasToken() returning an error still short-circuits the scrub at internal/cmd/config/config.go:278. There's no test where the keyring opens but HasToken errors (e.g. partial backend corruption). Add a test that pins this — same soft-degrade should apply, since the contract is "broken state must not block its own wipe." (Codex flagged this independently; not novel here.)

Minor

  • TestConfigClearAll_MalformedCanonicalConfig_StillScrubsFiles is mis-named. It seeds a valid YAML and forces failure via GOOGLE_READONLY_KEYRING_BACKEND. Since runClear no longer calls LoadConfig (paths come from configFilesForClearFn), there is no malformed-canonical path to verify here — the actual contract being pinned is "broken keyring under --all." Rename to TestConfigClearAll_BrokenKeyring_StillScrubsFiles for honesty, OR add a second case that writes genuinely malformed YAML to verify it's tolerated (it should be — paths are read without parsing — but no test confirms).
  • Stderr warning not asserted. The soft-degrade emits warning: could not open keyring (...) on stderr. Tests don't capture stderr, so user-visibility is unverified.
  • OAuth client JSON exclusion is not pinned by the new tests. The "deployment material is never removed" claim relies on it not appearing in configFilesForClear's output. The dedup test could cheaply assert OAuthClientFile is absent from paths.

Acceptable as-is

  • No os.Remove error-mid-loop test (hard to portably trigger; the failure mode is documented and the error is surfaced, not swallowed).
  • Real-resolver configFilesForClear test runs old==new on Linux — that's a known limitation of statedirtest, accepted via the seam.

Recommend: add the HasToken soft-degrade test (the only real gap) and rename the malformed-config test for honesty. Other items are optional polish.

…ames

Codex/TDD PR review caught two issues:

1. (Major) runClear's --all soft-degrade only covered OpenNoMigrate
   failure; HasToken failure still short-circuited the file scrub.
   Symmetric soft-degrade now applied — emits a warning, closes the
   store eagerly, drops the reference so token branches skip cleanly.
   Defer is nil-guarded since st may be set to nil mid-function.

2. (Minor) The "MalformedCanonicalConfig" test seeded valid YAML and
   forced failure via the keyring backend env — overclaiming. Split
   into two honest tests: BrokenKeyringOpen (forces OpenNoMigrate
   failure) and MalformedCanonicalConfig (genuinely malformed YAML
   on disk; pins that runClear no longer reads the config). Added
   ExcludesOAuthClientJSON to pin the deployment-material exclusion.

Also switched the test file to use appconfig.ShortenPath directly
instead of a parallel implementation (Codex nit).
@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 21, 2026

Addressed Codex r1 + TDD findings in 3d4989a:

  • Codex Major (HasToken soft-degrade): symmetric soft-degrade now applied. HasToken failure under --all emits a warning, eagerly closes the store, drops the reference. Nil-guarded the deferred close.
  • Codex Minor / TDD Minor (mis-named malformed-config test): split into two honest tests — BrokenKeyringOpen_StillScrubsFiles (forces OpenNoMigrate failure) and MalformedCanonicalConfig_StillScrubsFiles (genuinely malformed YAML on disk; pins that runClear no longer reads the config). Plain-clear hard-fail test stays.
  • Codex Nit (parallel ShortenPath): test file now uses appconfig.ShortenPath directly.
  • TDD Minor (OAuth client JSON exclusion): added TestConfigFilesForClear_ExcludesOAuthClientJSON.
  • TDD Minor (stderr warning capture): deferred — keeping the production code minimal; the warning text is non-load-bearing and code review covers the symmetric branch.

make check green: 1387 tests, 0 lint issues.

@rianjs
Copy link
Copy Markdown
Contributor Author

rianjs commented May 21, 2026

Codex r2: 0 blockers, 0 majors, 1 minor, 0 nits — convergence reached.

Acknowledging the remaining minor: TestConfigClearAll_MalformedCanonicalConfig_StillScrubsFiles seeds malformed YAML at a synthetic path, so it doesn't exercise a malformed real-canonical config blocking keychain.OpenNoMigrate(). The production fact being claimed (runClear no longer reads the canonical config — paths come from configFilesForClearFn only) is verified by code inspection rather than this test. Leaving the test as-is per family precedent of buggering off marginal test-naming nits. Moving to daemon pass.

Copy link
Copy Markdown

@monit-reviewer monit-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 3d4989a

Approved with 4 non-blocking suggestions below. Address at your discretion.

Summary

Reviewer Findings
harness-engineering:harness-enforcement-reviewer 1
harness-engineering:harness-knowledge-reviewer 2
harness-engineering:harness-self-documenting-code-reviewer 1
harness-engineering:harness-enforcement-reviewer (1 findings)

💡 Suggestion - internal/cmd/config/config.go:304

Dry-run and live-run diverge on absent candidate files: --dry-run --all unconditionally prints 'Would remove: X' for every candidate path returned by configFilesForClearFn(), but the live path silently skips non-existent files with no output. A user previewing with dry-run will see removal lines for files the real run would silently skip, making dry-run output unreliable as a preview. Consider checking os.Stat before printing, or using 'Would remove (if present):' phrasing for candidates that may not exist.

harness-engineering:harness-knowledge-reviewer (2 findings)

💡 Suggestion - internal/cmd/config/config.go:337

The os.IsNotExist branch is now silent. The old code printed 'No X to remove.' for absent files; the new multi-candidate code skips with no output. On an already-clean system (the common idempotent re-run case), the user gets zero confirmation that any config-file candidates were checked. A single summary line such as 'No legacy config files found.' when all candidates are absent would preserve the UX parity the old code had.

💡 Suggestion - internal/cmd/config/config.go:35

OldHandRolledConfigDir() error is silently discarded (oldDir, _ := ...). The comment documents 'best-effort: no HOME → skip legacy', but any unexpected os.UserHomeDir error will also silently omit legacy scrub candidates with no user-visible feedback. Under --all (a recovery path), silent partial cleanup could leave the user in a confusing half-wiped state. A fmt.Fprintf(os.Stderr, ...) warning on non-nil err — symmetric with the OpenNoMigrate soft-degrade pattern used elsewhere in the same function — would make degradation visible.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/cmd/config/clear_legacy_test.go:216

newDirFor derives the canonical config dir by calling configFilesForClear() and taking filepath.Dir(paths[0]), silently assuming canonical paths are always returned first. This is an implicit ordering contract on a private implementation detail with no enforcement. If configFilesForClear() is ever reordered, TestConfigFilesForClear_PathIdentityDedup will silently test the wrong directory without a diagnostic error. A direct call to config.GetConfigDirNoCreate() (already exported by this PR) would make the intent explicit and eliminate the fragility.

4 info-level observations excluded. Run with --verbose to include.

4 PR discussion threads considered.


Completed in 2m 30s | $0.73 | sonnet | daemon 0.2.120 | Glorfindel
Field Value
Model sonnet
Reviewers hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 2m 30s wall · 6m 14s compute (Reviewers: 1m 10s · Synthesis: 1m 16s)
Cost $0.73
Tokens 118.4k in / 22.8k out
Turns 6

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 31.7k 4.7k 0 31.7k (1h) $0.20
harness-engineering:harness-architecture-reviewer sonnet 17.3k 3.0k 2.1k 15.1k (1h) $0.11
harness-engineering:harness-enforcement-reviewer sonnet 17.3k 3.6k 2.1k 15.1k (1h) $0.12
harness-engineering:harness-knowledge-reviewer sonnet 17.3k 3.9k 2.1k 15.1k (1h) $0.12
harness-engineering:harness-self-documenting-code-reviewer sonnet 17.3k 4.1k 2.1k 15.1k (1h) $0.13
security:security-code-auditor haiku 17.5k 3.6k 0 17.5k (1h) $0.05

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

if hasTok {
switch {
case st == nil:
fmt.Println("Would remove: (keyring unavailable — token state unknown)")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-enforcement-reviewer): Dry-run and live-run diverge on absent candidate files: --dry-run --all unconditionally prints 'Would remove: X' for every candidate path returned by configFilesForClearFn(), but the live path silently skips non-existent files with no output. A user previewing with dry-run will see removal lines for files the real run would silently skip, making dry-run output unreliable as a preview. Consider checking os.Stat before printing, or using 'Would remove (if present):' phrasing for candidates that may not exist.

Reply to this thread when addressed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged as low/marginal — keeping as-is. The dry-run/live print divergence is intentional (dry-run answers 'what would be removed' speculatively; live confirms what actually happened). Per the family precedent ([[feedback-pr-review-loop]]) we bugger off marginal nits to avoid review-loop churn on already-approved PRs.

fmt.Printf("No %s to remove.\n", config.ShortenPath(cfgPath))
default:
return fmt.Errorf("removing %s: %w", config.ShortenPath(cfgPath), err)
for _, p := range cfgPaths {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-knowledge-reviewer): The os.IsNotExist branch is now silent. The old code printed 'No X to remove.' for absent files; the new multi-candidate code skips with no output. On an already-clean system (the common idempotent re-run case), the user gets zero confirmation that any config-file candidates were checked. A single summary line such as 'No legacy config files found.' when all candidates are absent would preserve the UX parity the old code had.

Reply to this thread when addressed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged as low/marginal — keeping as-is. The dry-run/live print divergence is intentional (dry-run answers 'what would be removed' speculatively; live confirms what actually happened). Per the family precedent ([[feedback-pr-review-loop]]) we bugger off marginal nits to avoid review-loop churn on already-approved PRs.

newDir, err := config.GetConfigDirNoCreate()
if err != nil {
return nil, err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-knowledge-reviewer): OldHandRolledConfigDir() error is silently discarded (oldDir, _ := ...). The comment documents 'best-effort: no HOME → skip legacy', but any unexpected os.UserHomeDir error will also silently omit legacy scrub candidates with no user-visible feedback. Under --all (a recovery path), silent partial cleanup could leave the user in a confusing half-wiped state. A fmt.Fprintf(os.Stderr, ...) warning on non-nil err — symmetric with the OpenNoMigrate soft-degrade pattern used elsewhere in the same function — would make degradation visible.

Reply to this thread when addressed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged as low/marginal — keeping as-is. The dry-run/live print divergence is intentional (dry-run answers 'what would be removed' speculatively; live confirms what actually happened). Per the family precedent ([[feedback-pr-review-loop]]) we bugger off marginal nits to avoid review-loop churn on already-approved PRs.

if err == nil {
t.Fatal("plain `clear` (no --all) must surface a keyring open failure; got nil")
}
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): newDirFor derives the canonical config dir by calling configFilesForClear() and taking filepath.Dir(paths[0]), silently assuming canonical paths are always returned first. This is an implicit ordering contract on a private implementation detail with no enforcement. If configFilesForClear() is ever reordered, TestConfigFilesForClear_PathIdentityDedup will silently test the wrong directory without a diagnostic error. A direct call to config.GetConfigDirNoCreate() (already exported by this PR) would make the intent explicit and eliminate the fragility.

Reply to this thread when addressed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged as low/marginal — keeping as-is. The dry-run/live print divergence is intentional (dry-run answers 'what would be removed' speculatively; live confirms what actually happened). Per the family precedent ([[feedback-pr-review-loop]]) we bugger off marginal nits to avoid review-loop churn on already-approved PRs.

@rianjs rianjs merged commit 93df86f into main May 21, 2026
2 checks passed
@rianjs rianjs deleted the fix/MON-5433-clear-all-legacy-config branch May 21, 2026 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

config clear --all removes the legacy Drive cache but not the legacy config dir

2 participants