feat(auth): two-phase migrate write for the bundle-shape contract#38
Conversation
…ck-first record PR2 of the refresh-token series. Hardens `migrateLegacyAuth` against two classes of issue exposed by PR1's bundle contract: 1. Retry-after-failure clobbers a freshly-logged-in v2 record. Pre-existing behaviour used `upsert` (replace-not-merge), which would destroy any newer state if a previous run's `markMigrated()` had failed and the user has since logged in afresh between attempts. 2. A partial migration leaves the user without working credentials. The keyring-first write meant a crash between `setSecret` and `upsert` left a keyring orphan and no record — the CLI couldn't read the token until the next migration attempt completed. Both are solved by restructuring the migration write into two phases: - **Phase 1 (record-first)**: write the v2 record with `fallbackToken: legacyToken` and `hasRefreshToken: false`. Atomic via the new `UserRecordStore.tryInsert?` hook (added in PR1) when the store implements it; falls back to list-then-upsert otherwise (narrow TOCTOU window, tolerable for a one-time migration). Returns `false` when a v2 record for the account already exists — Phase 2 is then skipped to preserve the live state. - **Phase 2 (keyring move)**: only when Phase 1 wrote the record. Delegates to `writeRecordWithKeyringFallback`, which moves the token to the per-user keyring slot and upserts a clean record without `fallbackToken`. Best-effort: a failure leaves the Phase 1 fallback in place and the CLI continues to work via the plaintext fallback; a later `set()` or `setBundle()` upgrades the credential naturally. Behaviour preserved for all existing callers: - The contract surface (`migrateLegacyAuth` signature, `MigrateAuthResult`, `MigrateSkipReason`) is unchanged. - Single-token records still end up with `hasRefreshToken: false` (PR1's `writeRecordWithKeyringFallback` already writes this). - Migrated records get `setDefaultId(account.id)` best-effort unless another default is already pinned (existing behaviour). - `markMigrated` is still the one-way gate; cleanup still runs concurrently. Test surface (382 → 386): - `tryInsert` path exercised when the store opts in. - Pre-existing v2 record skips Phase 2 (both atomic and legacy paths). - Phase 2 keyring failure is benign — Phase 1 fallback survives, marker + cleanup still run. - Existing legacy-keyring / WSL / cleanup tests updated for the new two-upsert happy-path call count. - `buildUserRecords` test harness now accepts `withTryInsert: true` so tests can pick the atomic vs legacy code path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
doistbot
left a comment
There was a problem hiding this comment.
This PR addresses migration data loss and race conditions by restructuring the legacy auth migration into a two-phase process that secures a fallback record before modifying the keyring. The approach effectively mitigates TOCTOU vulnerabilities and crash-related token loss, providing a much safer transition path. There are a few areas to refine, such as ensuring existing v2 records are fully readable before skipping the second phase, tightening error handling during the keyring move, and strengthening the test suite against implementation coupling and repetitive setup.
8 findings on #38. All resolved. P1 — correctness - Verify the existing v2 record is readable before retiring the legacy state. A clean record from a prior `set()` / `setBundle()` has no `fallbackToken` and depends on the keyring being reachable; if it isn't, cleaning up the legacy slot would brick the user. New `verifyExistingRecordReadable` short-circuits with `skipped(legacy-keyring-unreachable)` so the next retry has a working credential to fall back to. P2 — correctness - Inline Phase 2 instead of delegating to `writeRecordWithKeyringFallback`. The delegated path double-upserted the same fallback record on the offline branch (Phase 1 had already written it). Phase 2 now calls `setSecret` directly, falls through on `SecureStoreUnavailableError`, and only upserts the clean shape when the keyring write actually succeeded. - Narrow the Phase 2 catch. Only `SecureStoreUnavailableError` is swallowed; other failures surface as `skipped(user-record-write-failed)` so operators see transient issues and the marker stays unset for retry. P2 — refactor / DRY - Extract `buildSingleTokenRecord(account, fallbackToken?)` into `record-write.ts`. Migration Phase 1 and `writeBundleWithKeyringFallback` both use it — no more drift between the two record-shape sites. - `runMigration` test helper takes `withTryInsert?: boolean` and threads it through `buildUserRecords`. The three new manually-wired-up tests collapse into the standard helper, and pre-existing v2 record cases use `it.each([true, false])` to cover both paths with one body. P2 — tests - Replace the "exact upsert count" assertion with end-state assertions (token in keyring, no plaintext residue, legacy slot empty). - New `listSpy` on `buildUserRecords` so the `tryInsert` test can assert `list()` was never consulted — the TOCTOU race the atomic path is meant to prevent. - New regression: existing v2 record + keyring offline returns `skipped(legacy-keyring-unreachable)` without touching legacy state. - New regression: existing v2 record with `fallbackToken` skips the readability probe entirely (record is self-sufficient). - New regression: Phase 2 `SecureStoreUnavailable` falls back silently (Phase 1 record survives, marker still set, cleanup runs). - New regression: Phase 2 non-keyring `setSecret` failure surfaces as `skipped(user-record-write-failed)`. P3 — comment hygiene - Drop the "Phase 1 + Phase 2 = two upserts" restatement comment. - Drop the meta-comment justifying end-state assertions. - JSDoc on `ensureV2Record` now spells out "time-of-check, time-of-use" rather than the bare TOCTOU acronym. 386 → 389 tests pass. Check / type-check / build all clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… → phase1Written No behaviour change. Two passes: 1. JSDoc audit — collapsed restated detail across the main function's numbered list, the two helper JSDocs, and the inline comments. The main `migrateLegacyAuth` block goes from 32 lines to 19; helper docs compress to 6 lines each; inline comments either drop (when they duplicated JSDoc) or trim to the WHY only. 2. Naming nit (per Scott on #38): `phase1Wrote` → `phase1Written`. Past participle reads cleanly as a boolean flag. Net -52 LOC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@doistbot /review |
doistbot
left a comment
There was a problem hiding this comment.
This PR successfully implements the two-phase migration strategy, significantly improving the resilience of the authentication migration against partial failures and data loss. The structured approach to fallback handling provides a much safer upgrade path for legacy credentials. There are a few critical edge cases to resolve regarding unhandled storage rejections that could crash the CLI and a retry short-circuit that might leave tokens in plaintext indefinitely. Additionally, there are some opportunities to deduplicate token read logic, optimize configuration reads, and tighten up test assertions and observability logs.
10 findings on #38. All resolved. P1 — correctness - Phase 2 retry on existing fallback-bearing record. When the existence check returns false AND the existing record's `fallbackToken` matches our legacy token, treat it as a prior incomplete migration and finish Phase 2 (move to keyring, clear fallback). Without this, a transient Phase 2 failure stuck the user on plaintext fallback indefinitely; future retries skipped Phase 2 because "record exists". - `userRecords.list()` failures inside the readability branch are now caught — the existence/readability path runs through the outer `migrateLegacyAuth` try/catch via `ensureV2Record`'s richer return, and `runPhase2` similarly bubbles errors to the caller as `skipped(user-record-write-failed)`. P2 — refactor / DRY - New `keyring/internal.ts` with `readAccessTokenForRecord(record, secureStore)` returning a structured `ReadAccessTokenOutcome` union. Single source for "fallback first, then keyring slot, then error variants". `KeyringTokenStore.active()` and the migration's readability probe both call it; consumers map the outcome to their own error contract (typed `CliError` vs. `MigrateSkipReason`). - `ensureV2Record` returns `{ written: true } | { written: false; existing }` so the orchestration doesn't pay a second `list()` to re-fetch the existing record. Phase 2 retry and the external-record readability branch both reuse the returned record. - Phase 2 extracted into `runPhase2(userRecords, userSlot, account, legacyToken)` so the new "initial write" and "retry" paths share one implementation. P2 — observability - New `user-keyring-unreachable` skip reason. The per-user slot being offline is now distinct from the legacy slot being offline; operators no longer see "legacy credential unreachable" when the legacy slot was just read successfully. Both reasons stay retryable. P2 — docs - `buildSingleTokenRecord` JSDoc no longer claims `writeBundleWithKeyringFallback` uses it. The bundle path carries expiry fields the single-token shape doesn't, so they're intentionally separate; the helper captures only the `hasRefreshToken: false` + optional `fallbackToken` pair that the migration paths share. P3 — tests - Dropped the `upsertSpy` count assertion in the Phase 2 SecureStoreUnavailable test (coupled to non-atomic Phase 1). - New regression: existing clean v2 record + per-user slot empty → `skipped(user-record-write-failed)`, marker stays unset. - New regression: existing record with external `fallbackToken` skips Phase 2 AND never probes the per-user slot (`km.getCalls` assertion locks the no-IPC behaviour). - New regression: existing record carrying our legacy token retries Phase 2 cleanly (setSecret + clean upsert; final state matches a one-shot migration). - Dropped the "One handle for both…" restated comment per the repo's "comments explain why, not what" guidance. 386 → 391 tests pass. Check / type-check / build all clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## [0.18.0](v0.17.0...v0.18.0) (2026-05-19) ### Features * **auth:** two-phase migrate write for the bundle-shape contract ([#38](#38)) ([9cf5e9c](9cf5e9c))
|
🎉 This PR is included in version 0.18.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Summary
PR2 of three. Hardens
migrateLegacyAuthagainst the two ways the storage contract change in #37 can bite a partially-completed migration:upsert(replace-not-merge), which would destroy any newer state if a previous run'smarkMigrated()had failed and the user has since logged in afresh.setSecretandupsertleft a keyring orphan + no record — the CLI couldn't read the token until the next migration attempt completed.Both solved by restructuring the migration write into two phases.
Approach
fallbackToken: legacyTokenandhasRefreshToken: false. Atomic via the new optionalUserRecordStore.tryInsert?hook (introduced by #37) when present; falls back to list-then-upsert otherwise — narrow TOCTOU window, tolerable for a one-time migration. Returnsfalsewhen a v2 record for the account already exists; Phase 2 is then skipped so the live record stays untouched.writeRecordWithKeyringFallback(single-source for the keyring-then-clean-upsert flow) which moves the token into the per-user slot and clears thefallbackToken. Best-effort: a failure leaves the Phase 1 fallback in place; reads keep working; a laterset()/setBundle()upgrades the credential naturally.Marker, default promotion, and cleanup ordering all unchanged.
Contract surface
No public API changes.
MigrateLegacyAuthOptions,MigrateAuthResult,MigrateSkipReasonare byte-for-byte identical. Stores that don't implement the optionaltryInsert?get the legacy code path.Test plan
npm run check— cleannpm run type-check— cleannpm test— 386 pass (382 → +4 new: tryInsert atomic path, pre-existing v2 record skips Phase 2 atomic, pre-existing v2 record skips Phase 2 legacy, Phase 2 keyring failure benign)npm run build— cleanNote on PR3
PR3 (refresh helper + PKCE.refreshToken via
oauth4webapi+flow.tsswap) lands next. Together the three PRs ship the full refresh-token feature.🤖 Generated with Claude Code