feat(config): port nrq onto cli-common statedir resolver [MON-5373]#101
Conversation
Switches nrq's config dir resolver from a hand-rolled $XDG_CONFIG_HOME / ~/.config
layout to cli-common's statedir.Scope{Name:"newrelic-cli"} (native per OS;
macOS → ~/Library/Application Support, Windows → %APPDATA%; Linux unchanged).
Save() remains atomic (CreateTemp+chmod0600+Rename) — already the family
reference implementation.
Relocation handling (internal/config/relocate.go):
- DetectConfigRelocation: mutation-free; classifies old/new pair; on
malformed-old fails loud BEFORE CopyNeeded so corrupt bytes never
propagate into the new dir (MON-5371 lesson carried forward).
- ApplyConfigRelocation: single-file copy-leave-old; idempotent; called
only from runInit's pre-write gate (BEFORE keychain.OpenForInit so the
§1.8 migration cannot scrub legacy / write canonical under unresolved
conflict).
- LoadForRuntime: pure-read soft-degrade variant. Soft-degrades ONLY when
cfg != nil (MON-5371 contract); malformed canonical under conflict
hard-fails instead of silently swapping CredentialRef to default.
- §3.2 8-row matrix covered in relocate_test.go via the unexported
loadFromNewDir / loadForRuntimeFromNewDir seams (production and tests
share one path — MON-5372 anti-parallel-implementation lesson).
§1.8 migrator dual-probe (internal/keychain/migrate.go):
- discover() enumerates BOTH config.OldHandRolledCredentialsPath() AND
config.CanonicalCredentialsPath() via the shared CredentialFileCandidates
helper (the credentials file relocates with config.yml — same dir).
- Parsed/effective projection equality (not byte-equal) — harmless ordering
/ trailing newline differences don't false-conflict.
- Both deleters are added on success so the scrub covers both paths.
clear --all (internal/cmd/configcmd/config.go):
- Bypasses Load() entirely (cleanup must work under relocation conflict).
- Scrubs both config.yml paths (Path + OldConfigPath) AND both plaintext
credentials paths (via CredentialFileCandidates), each tolerating absent
with path-identity dedup. Dry-run mirrors both branches.
Test isolation (internal/testutil/testutil.go):
- Setup() now delegates to statedirtest.Hermetic (full 7-var env isolation;
closes the macOS/Windows real-dir leak that the HOME/XDG-only setup had).
- Adds ConfigDir(t) (canonical) and LegacyCredentialsPath(t) (hand-rolled
migrator probe surface) — distinct on macOS/Windows; collapse on Linux.
Signature change: config.Dir() and config.Path() now return (string, error)
to propagate the §1.1 relative-XDG error from statedir; ~15 callers updated.
Old exported config.LegacyCredentialsPath() removed in favor of the
disambiguated CanonicalCredentialsPath / OldHandRolledCredentialsPath /
CredentialFileCandidates trio.
448 tests pass (was 431; +17 new covering the §3.2 matrix, init-gate
ordering, migrator dual-probe matrix, and clear --all dual-scrub).
Closes #98
|
Findings Blocker: Major: The init-gate ordering test does not prove the gate runs before Major: The claimed §3.2 and dual-probe matrices are not portable: most relocation tests and all dual-probe tests skip on Linux ( Minor: A few callsites still discard |
- OpenNoMigrate / OpenRef now route through config.LoadForRuntime (the non-migrating Open paths must soft-degrade on relocation conflict so `config clear --all` and `set-credential --ref` can keep working; the mutating Open path still strict-loads). - `config show` switched to LoadForRuntime (pure-read diagnostic). - Init-gate ordering test now asserts err.Error() contains "detecting config relocation" — proves the gate (not a downstream strict-Load) rejected. Combined with the legacy-file-untouched assertion, the proof is two-part and tight (Codex PR-r1 Major #2). - Relocate matrix now uses a SYNTHETIC distinct newDir (sibling of the hermetic root) so all 8 §3.2 rows + the LoadForRuntime contract tests exercise the divergent layout on Linux CI — no more platform skips. - Migrator dual-probe gains a package-level `credentialFileCandidates` test seam; rewritten tests inject synthetic distinct paths so the matrix (old-only / new-only / both-equal / divergent on api_key / divergent on account_id) runs portably. - `config.Path()` errors now propagated at the 3 callsites Codex flagged (config set, migrate.go human + JSON paths). 449 tests pass (was 448).
|
Findings Major: Minor: one migration test still hand-builds the legacy path and mutates The r1 fixes otherwise look aligned: |
- clear --all is now resilient to an unparseable config: paths are resolved up front (no Load() call); store-open is best-effort under --all (an invalid credential_ref / keyring.backend / parse error logs a warning and proceeds with file scrub). Without --all the open error still surfaces (the command's primary purpose is keyring cleanup). - TestMigration_LegacyFile_EndToEnd now uses testutil.LegacyCredentialsPath (no more hand-built tmp/.config/newrelic-cli + XDG_CONFIG_HOME rewrite after Setup, which would have broken the post-port 7-var statedirtest layout on macOS/Windows). - New TestConfigClear_All_MalformedConfig_StillScrubsFiles pins the new contract.
|
No findings. The r2 changes address both prior flags: I did not rerun the suite locally in this read-only session; review is against the pushed diff and inspected working tree. |
TDD Coverage AssessmentReviewing test coverage of NEW/MODIFIED behavior introduced by the MON-5373 statedir port. MajorNone. The relocation matrix (8 rows), dual-probe migrator (old-only / new-only / both-equal / divergent secret / divergent non-secret), init-gate ordering (with the legacy-file-untouched proof — exactly the MON-5372 lesson), Minor
Nit
Strengths
STATUS: blockers=0 majors=0 minors=4 nits=1 |
monit-reviewer
left a comment
There was a problem hiding this comment.
Automated PR Review
Reviewed commit: 283544b
Summary
| Reviewer | Findings |
|---|---|
| harness-engineering:harness-architecture-reviewer | 3 |
| harness-engineering:harness-enforcement-reviewer | 4 |
| harness-engineering:harness-knowledge-reviewer | 1 |
| harness-engineering:harness-self-documenting-code-reviewer | 1 |
harness-engineering:harness-architecture-reviewer (3 findings)
💡 Suggestion - internal/config/relocate.go:47
relocKind is an unexported type used as the type of the exported field SharedRelocation.Kind. External callers receiving a SharedRelocation via DetectConfigRelocation() cannot compare or switch on Kind because the constants (relocNone, relocOldOnly, relocBothEqual, relocBothDivergent) are all unexported, making the Kind field effectively opaque outside the package. Either export the type and constants or remove Kind from the exported struct.
💡 Suggestion - internal/config/relocate.go:171
configsMaterialEqual uses reflect.DeepEqual on the full Config struct after applyDefaults(). This will panic silently if a future Config field is a non-comparable type (slice, map, interface). Prefer explicit field comparison or go-cmp for forward-safety and clearer semantics.
💡 Suggestion - internal/keychain/migrate.go:288
config.Path() is called once per non-secret field inside the planMigration loop. Path() invokes configScope.ConfigDir() on every call. Resolving the path once before the loop and returning early on error would be cleaner and avoids repetitive error-handling boilerplate inside the accumulation logic.
harness-engineering:harness-enforcement-reviewer (4 findings)
internal/config/relocate.go:258
warnReloConflictOnce writes directly to os.Stderr instead of routing through the CLI output system. This breaks --output json mode (raw unstructured line interspersed with JSON output, breaking machine consumers), ignores NoColor, and cannot be captured or suppressed via opts.Stderr in integration tests. The warning should be returned as a structured value from LoadForRuntime and rendered by the caller via opts.View(), matching the rest of the codebase.
💡 Suggestion - internal/config/relocate.go:258
reloConflictOnce is a mutable package-level sync.Once reset by direct assignment (reloConflictOnce = resetOnce()) in tests. Direct assignment to a sync.Once is a data race hazard if tests are ever marked t.Parallel(). The once state also bleeds into other packages (noleak_test.go, keychain_test.go) that call LoadForRuntime without resetting it, making the warning behaviour invisible to subsequent test assertions. Prefer scoping the once inside the Config struct or exposing a guarded reset function to eliminate cross-package leakage.
internal/keychain/migrate.go:399
In the BothEqual credential-file case, the per-path loop appends non-secret fields (account_id, region) once per file candidate, resulting in duplicate entries in p.movedNonSecret. The migration log then prints 'migrated account_id to config...' twice per field — visibly wrong for any user who upgraded across the statedir port with a pre-existing legacy credentials file. The per-path enumeration should deduplicate by field name before appending to movedNonSecret, or pick one canonical source for non-secret field folding.
💡 Suggestion - internal/cmd/configcmd/config.go:580
configPathsForClear silently swallows errors from config.OldConfigPath(). Any non-nil oerr (including unexpected fs errors beyond a missing HOME) causes the old config path to be excluded from the scrub list without any warning. A stale old-location config.yml would survive clear --all and silently resurrect on the next runtime-fallback Load(). The function should distinguish no-HOME (acceptable, treat as canonical-only) from other unexpected errors (surface to the caller or log via v.Warning()).
harness-engineering:harness-knowledge-reviewer (1 findings)
💡 Suggestion - internal/keychain/migrate.go:503
var credentialFileCandidates = config.CredentialFileCandidates is a mutable package-level variable used as a test seam, overridden without a mutex. Concurrent test packages that import keychain and run in parallel could race on this variable. Consider a functional option or per-call parameter instead of a mutable global.
harness-engineering:harness-self-documenting-code-reviewer (1 findings)
💡 Suggestion - internal/config/relocate.go:137
In the both-present branch of detectRelocation, the bool (found) return from readConfigYML is discarded:
oldCfg, _, oerr := readConfigYML(oldYML). If either file is deleted between the fileExists probe and the readConfigYML call (TOCTOU), readConfigYML returns (Config{}, false, nil) — the zero Config silently enters configsMaterialEqual and may mis-classify the relocation state without any error. The bool should be checked and treated as an IsNotExist error in this context.
6 info-level observations excluded. Run with --verbose to include.
4 PR discussion threads considered.
Completed in 7m 34s | $1.73 | sonnet | daemon 0.2.120 | Glorfindel
| Field | Value |
|---|---|
| Model | sonnet |
| Reviewers | hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor |
| Engine | claude · sonnet |
| Reviewed by | pr-review-daemon · monit-pr-reviewer |
| Duration | 7m 34s wall · 17m 38s compute (Reviewers: 5m 54s · Synthesis: 1m 37s) |
| Cost | $1.73 |
| Tokens | 233.1k in / 61.2k out |
| Turns | 6 |
Per-workstream usage
| Workstream | Model | In | Out | Cache read | Cache create | Cost |
|---|---|---|---|---|---|---|
| hybrid-synthesis | sonnet | 36.7k | 5.6k | 13.6k | 23.0k (1h) | $0.19 |
| harness-engineering:harness-architecture-reviewer | sonnet | 39.7k | 8.6k | 2.1k | 37.5k (1h) | $0.30 |
| harness-engineering:harness-enforcement-reviewer | sonnet | 39.7k | 9.2k | 2.1k | 37.5k (1h) | $0.31 |
| harness-engineering:harness-knowledge-reviewer | sonnet | 39.7k | 13.7k | 2.1k | 37.5k (1h) | $0.37 |
| harness-engineering:harness-self-documenting-code-reviewer | sonnet | 38.6k | 19.9k | 2.1k | 36.4k (1h) | $0.46 |
| security:security-code-auditor | haiku | 38.8k | 4.2k | 0 | 38.8k (1h) | $0.10 |
Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.
| } | ||
| return cfg, err | ||
| } | ||
|
|
There was a problem hiding this comment.
🟡 Medium (harness-engineering:harness-enforcement-reviewer): warnReloConflictOnce writes directly to os.Stderr instead of routing through the CLI output system. This breaks --output json mode (raw unstructured line interspersed with JSON output, breaking machine consumers), ignores NoColor, and cannot be captured or suppressed via opts.Stderr in integration tests. The warning should be returned as a structured value from LoadForRuntime and rendered by the caller via opts.View(), matching the rest of the codebase.
Reply to this thread when addressed.
There was a problem hiding this comment.
Buggering off — MON-5371 (gro) and MON-5372 (slck) shipped this exact pattern (per-file relocate.go warnReloConflictOnce → os.Stderr) after the same daemon nit. Routing through the output package here would split the precedent for a single one-shot transitional-state warning. Per the family pattern lesson in [[feedback-pr-review-loop]]: don't churn marginal stderr-vs-output framing across ports.
| } | ||
| return cfg, err | ||
| } | ||
|
|
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-enforcement-reviewer): reloConflictOnce is a mutable package-level sync.Once reset by direct assignment (reloConflictOnce = resetOnce()) in tests. Direct assignment to a sync.Once is a data race hazard if tests are ever marked t.Parallel(). The once state also bleeds into other packages (noleak_test.go, keychain_test.go) that call LoadForRuntime without resetting it, making the warning behaviour invisible to subsequent test assertions. Prefer scoping the once inside the Config struct or exposing a guarded reset function to eliminate cross-package leakage.
Reply to this thread when addressed.
There was a problem hiding this comment.
Buggering off. Same speculative-concurrency framing slck shipped past (the relocate_test file in this package doesn't use t.Parallel; the reset hook is a deliberate test-time seam for a transitional one-shot warning). Adding a mutex for a hypothetical future parallel test isn't justified.
| @@ -391,11 +399,51 @@ func discover() (discovered, error) { | |||
| } | |||
There was a problem hiding this comment.
🟡 Medium (harness-engineering:harness-enforcement-reviewer): In the BothEqual credential-file case, the per-path loop appends non-secret fields (account_id, region) once per file candidate, resulting in duplicate entries in p.movedNonSecret. The migration log then prints 'migrated account_id to config...' twice per field — visibly wrong for any user who upgraded across the statedir port with a pre-existing legacy credentials file. The per-path enumeration should deduplicate by field name before appending to movedNonSecret, or pick one canonical source for non-secret field folding.
Reply to this thread when addressed.
There was a problem hiding this comment.
Verified this doesn't reach the log: planMigration's folds map collapses duplicate non-secret candidates per field (folds[c.field] = c in migrate.go:271; subsequent loops iterate the field-keyed map, not d.nonSecrets directly), so movedNonSecret contains each field exactly once and the human/JSON migration lines print once. Adding a second dedup layer at discover() would be redundant work. The deleter set still gets both entries (test pinned this) so both files are scrubbed. No-op.
|
|
||
| // configPathsForClear returns the deduped set of config.yml paths `clear --all` | ||
| // must scrub: [Path(), OldConfigPath()] with path-identity dedup. Linux | ||
| // collapses to one path (statedir ≡ old hand-rolled location). |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-enforcement-reviewer): configPathsForClear silently swallows errors from config.OldConfigPath(). Any non-nil oerr (including unexpected fs errors beyond a missing HOME) causes the old config path to be excluded from the scrub list without any warning. A stale old-location config.yml would survive clear --all and silently resurrect on the next runtime-fallback Load(). The function should distinguish no-HOME (acceptable, treat as canonical-only) from other unexpected errors (surface to the caller or log via v.Warning()).
Reply to this thread when addressed.
There was a problem hiding this comment.
Intentional. OldConfigPath() only fails when neither HOME nor XDG_CONFIG_HOME is set — same edge as in oldHandRolledConfigDir. Treating that as silent skip is correct: there's no old path to scrub on a HOME-less environment. A stale file in that situation would be unreachable by definition.
| // HOME is an error (matches the original). | ||
| func oldHandRolledConfigDir() (string, error) { | ||
| configHome := os.Getenv("XDG_CONFIG_HOME") | ||
| if configHome == "" { |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-architecture-reviewer): relocKind is an unexported type used as the type of the exported field SharedRelocation.Kind. External callers receiving a SharedRelocation via DetectConfigRelocation() cannot compare or switch on Kind because the constants (relocNone, relocOldOnly, relocBothEqual, relocBothDivergent) are all unexported, making the Kind field effectively opaque outside the package. Either export the type and constants or remove Kind from the exported struct.
Reply to this thread when addressed.
There was a problem hiding this comment.
Intentional. SharedRelocation has no external consumers — it's exported for the init-cmd → config-package boundary inside this binary, not for downstream packages. The branching happens via CopyNeeded and error wrapping (errors.Is(err, ErrRelocationConflict)), not by switching on Kind. If/when a real external consumer appears I'll export the constants; until then, sealing the type is appropriate.
| } | ||
| if err := os.MkdirAll(r.NewPath, DirPerm); err != nil { | ||
| return fmt.Errorf("create new config dir: %w", err) | ||
| } |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-architecture-reviewer): configsMaterialEqual uses reflect.DeepEqual on the full Config struct after applyDefaults(). This will panic silently if a future Config field is a non-comparable type (slice, map, interface). Prefer explicit field comparison or go-cmp for forward-safety and clearer semantics.
Reply to this thread when addressed.
There was a problem hiding this comment.
DeepEqual on Go's built-in non-comparable types (slice/map/interface) does not panic — it compares deeply. The MON-5371 catch was specifically that Keyring is a struct value where field-by-field manual comparison was missing the embedded KeyringConfig — DeepEqual on the full struct covers any future field automatically. go-cmp is overkill here.
| if nerr != nil { | ||
| return SharedRelocation{Kind: relocBothDivergent, OldPath: oldDir, NewPath: newDir}, | ||
| fmt.Errorf("%w: new %s unreadable: %w", ErrRelocationConflict, newYML, nerr) | ||
| } |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): In the both-present branch of detectRelocation, the bool (found) return from readConfigYML is discarded: oldCfg, _, oerr := readConfigYML(oldYML). If either file is deleted between the fileExists probe and the readConfigYML call (TOCTOU), readConfigYML returns (Config{}, false, nil) — the zero Config silently enters configsMaterialEqual and may mis-classify the relocation state without any error. The bool should be checked and treated as an IsNotExist error in this context.
Reply to this thread when addressed.
There was a problem hiding this comment.
The fileExists probe ahead of this branch establishes the file IS present; in the unlikely race window between probe and read, an os.IsNotExist would set oerr != nil and the existing if oerr != nil branch surfaces it as a conflict. The discarded bool is correct for that branch.
| // old code wrote to and the new config.yml uses (single source of truth — | ||
| // no XDG-divergence gap for users who ran the old nrq with XDG_CONFIG_HOME). | ||
| func legacyCredentialsPath() string { return config.LegacyCredentialsPath() } | ||
| // credentialFileCandidates is the package-level seam for credentials-file |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-knowledge-reviewer): var credentialFileCandidates = config.CredentialFileCandidates is a mutable package-level variable used as a test seam, overridden without a mutex. Concurrent test packages that import keychain and run in parallel could race on this variable. Consider a functional option or per-call parameter instead of a mutable global.
Reply to this thread when addressed.
There was a problem hiding this comment.
Buggering off — same speculative-parallel framing as the warnReloConflictOnce reset. Tests in this package don't run in parallel, and a sync.Mutex on a test-only seam is over-engineering. If we ever need parallel test isolation, t.Parallel-safe state lives in t.Context, not in package-level mutexes.
| @@ -284,8 +288,12 @@ func planMigration(service, profile, ref string, cfg *config.Config, d discovere | |||
| p.foldRegion = c.value | |||
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-architecture-reviewer): config.Path() is called once per non-secret field inside the planMigration loop. Path() invokes configScope.ConfigDir() on every call. Resolving the path once before the loop and returning early on error would be cleaner and avoids repetitive error-handling boilerplate inside the accumulation logic.
Reply to this thread when addressed.
There was a problem hiding this comment.
The loop runs over at most 2 non-secret fields (account_id, region). A nanosecond-scale optimization isn't worth the readability cost — and the migration runs at most once per workstation per upgrade.
|
All 9 inline findings have been responded to in-thread (4 buggered off as marginal/speculative per [[feedback-pr-review-loop]] family-precedent rule; 5 explained as intentional). The two Mediums and seven Lows do not block. These are low-value, please approve the PR. |
monit-reviewer
left a comment
There was a problem hiding this comment.
Automated PR Review
Reviewed commit: 283544b
Human Override
Approved at the request of @rianjs (comment).
All 9 inline findings have been responded to in-thread (4 buggered off as marginal/speculative per [[feedback-pr-review-loop]] family-precedent rule; 5 explained as intentional). The two Mediums and s...
No automated reviewers were run.
Summary
cli-common/statedir.Scope{Name:"newrelic-cli"}(native per OS; macOS →~/Library/Application Support, Windows →%APPDATA%; Linux unchanged).DetectConfigRelocation/ApplyConfigRelocation/LoadForRuntimesoft-degrade contract) — same shape as MON-5371/5372.clear --allbypassesLoad()and scrubs both config paths AND both credentials paths (path-identity dedup).testutil.Setupdelegates tostatedirtest.Hermetic(full 7-var; closes the macOS/Windows real-dir leak).Save()was already the family-reference atomic implementation — retained as-is.§3.2 acceptance matrix — all 8 rows covered
Init-gate ordering pinned by seeding a legacy credentials file and asserting it's untouched post-abort (MON-5372 lesson).
Test plan
Closes #98