Skip to content

feat(auth): add refresh-token support to TokenStore + PKCE provider#36

Open
lmjabreu wants to merge 10 commits into
mainfrom
feat/auth-refresh-tokens
Open

feat(auth): add refresh-token support to TokenStore + PKCE provider#36
lmjabreu wants to merge 10 commits into
mainfrom
feat/auth-refresh-tokens

Conversation

@lmjabreu
Copy link
Copy Markdown

Summary

Adds OAuth refresh-token support to cli-core's auth surface so consumers can silently extend OAuth sessions instead of forcing the user to re-run <cli> auth login on every access-token expiry.

Motivated by Doist/outline-cli#74: Outline workspaces with short-lived OAuth access tokens were pushing users into multiple ol auth login runs per week even though Outline issues refresh tokens at login.

What's new

Contract additions (all non-breaking)

  • TokenStore.active() now returns an optional bundle?: TokenBundle alongside token / account. Stores that don't track refresh state can keep returning the old shape; the built-in keyring store always populates it.
  • TokenStore.set() accepts string | TokenBundle (parameter widening — existing string callers still work).
  • AuthProvider gained an optional refreshToken?(input) method.
  • UserRecord gained accessTokenExpiresAt, refreshTokenExpiresAt, fallbackRefreshToken (all optional).
  • status.fetchLive context gained bundle so consumers can render expiry.

New behaviour

  • refreshAccessToken({ store, provider, ref?, skewMs?, force? }) — proactive (refresh when within 60s of expiry by default) and reactive (force: true after a server 401) refresh. Persists the new bundle back to the store. Falls through to AUTH_REFRESH_UNAVAILABLE for stores/providers that can't refresh.
  • Concurrency: best-effort O_EXCL file lock on ${recordsLocation}.refresh.lock so two parallel CLI invocations don't both POST refresh and race the server's rotation logic. On contention waits up to 2s, then re-reads — if the holder refreshed already, returns its result.
  • createPkceProvider implements refreshToken: POST grant_type=refresh_token to the existing tokenUrl, no client_secret, distinguishes AUTH_REFRESH_EXPIRED (400/401 invalid_grant) from AUTH_REFRESH_TRANSIENT (5xx, network).
  • Keyring TokenStore writes refresh tokens to a sibling slot (<account>/refresh) and exposes getRecordsLocation() so refresh helpers can derive the lock path without re-plumbing options.
  • runOAuthFlow persists the full bundle (access + refresh + expiry) on login.

Test plan

  • npm run type-check
  • npm test — 375 passed (was 364), incl. 11 new tests for the PKCE refresh grant + the refresh helper (proactive, force, no-refresh-token, lock re-read, expiry pass-through)
  • npm run check (oxlint + oxfmt)
  • npm run build
  • Manual smoke via Doist/outline-cli#74 — ol auth login captures refresh token, getApiToken refreshes silently, 401 retry works against a real Outline workspace

Backward compatibility

No published API breaks. Custom consumer TokenStore implementations that return just { token, account } continue to type-check and run. The new bundle field is optional both on the type and in runtime fall-throughs.

🤖 Generated with Claude Code

- TokenStore.active() now returns an optional `bundle` carrying refresh
  token + expiry alongside the access token. The built-in keyring store
  always populates it; custom stores can omit it without breakage.
- TokenStore.set() accepts a TokenBundle or a bare string (widening).
- AuthProvider gained an optional `refreshToken` method; the PKCE provider
  implements it (grant_type=refresh_token, no client_secret).
- New refreshAccessToken() helper drives proactive (skew window) and
  reactive (force) refresh, with file-lock concurrency on a sidecar
  `<records>.refresh.lock` and typed AUTH_REFRESH_EXPIRED / _TRANSIENT /
  _UNAVAILABLE errors.
- Keyring store writes refresh tokens to a sibling slot per account
  (`<account>/refresh`) and exposes getRecordsLocation() so refresh helpers
  can derive a lock path without re-plumbing options.
- runOAuthFlow persists the full bundle (access + refresh + expiry) on
  login instead of just the access token.
- status.fetchLive context gained `bundle` so consumers can render expiry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition that introduces robust OAuth refresh-token support to cli-core, allowing consumers to seamlessly extend sessions while handling parallel invocations. The proactive refresh strategy and file locking mechanisms are well-considered and will significantly improve the user experience for short-lived token environments. There are a few areas to refine, particularly around mitigating unintended breaking changes to the TokenStore and AuthProvider contracts, addressing a lock contention bug on forced refreshes, and optimizing the hot path to prevent doubled IPC latency during OS keyring reads. Additionally, consolidating duplicated fetch logic, expanding test coverage for the offline fallback paths, and updating the README per our project guidelines will help polish the implementation.

Share FeedbackReview Logs

Comment thread src/auth/index.ts
export type { AuthErrorCode } from './errors.js'
export { runOAuthFlow } from './flow.js'
export type { RunOAuthFlowOptions, RunOAuthFlowResult } from './flow.js'
export { refreshAccessToken } from './refresh.js'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P0] README.md must be updated when adding new exports to a sub-path entry, per AGENTS.md. The refreshAccessToken helper and TokenBundle additions need to be documented in the "What's in it" table and usage blocks.

Comment thread src/auth/types.ts Outdated
* `CliError` for typed failures; other thrown values become
* `AUTH_STORE_WRITE_FAILED`.
*/
set(account: TAccount, credentials: string | TokenBundle): Promise<void>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Widening the credentials parameter to string | TokenBundle is a breaking change for consumers who implement their own TokenStore. TypeScript requires class implementations to accept at least the interface's full parameter type, so existing set(account, token: string) methods will fail to compile. Consider restoring set's signature and adding an optional setBundle?(account, bundle) method, or update the PR description/release notes to acknowledge the break.

Comment thread src/auth/keyring/token-store.ts Outdated
getLastStorageResult(): TokenStorageResult | undefined
/** Storage result from the most recent `clear()` call, or `undefined` before any (and reset to `undefined` when the most recent `clear()` threw or was a no-op). */
getLastClearResult(): TokenStorageResult | undefined
/** Human-readable location of the underlying record store. Surfaced so `refreshAccessToken` can derive a sidecar lock path without re-plumbing options. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] recordsLocation is documented in CreateKeyringTokenStoreOptions as a "Human-readable location... Plain string; cli-core does not interpret it" (e.g. ~/.config/...). Re-purposing it here to derive a literal filesystem lock path means locks will silently fail to acquire (due to ENOENT failing to resolve ~) for any consumer following the existing documentation. Consider requiring an explicit lockPath option or fully expanding the path.

Comment thread src/auth/refresh.ts
const fresh = await options.store.active(options.ref)
if (fresh) {
const freshBundle = fresh.bundle ?? { accessToken: fresh.token }
if (!shouldRefresh(freshBundle, skewMs, options.force ?? false)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Concurrency bug on the force: true path (e.g. 401 retries). If Process A refreshes while Process B waits for the lock, Process B will re-read the updated store but shouldRefresh will still return true because force is true. Process B will then proceed to refresh again using the original, now-stale bundle.refreshToken (line 111), leading to invalid_grant on servers with token rotation. To fix, ignore the force flag if the access token changed (fresh.token !== snapshot.token), and update bundle with freshBundle before line 111 if a refresh is still actually needed.

Comment thread src/auth/keyring/record-write.ts Outdated
} else {
// No refresh token in this bundle — purge any previous secret.
try {
await refreshSecureStore.deleteSecret()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Swallowing a refresh-slot delete failure here can resurrect a stale refresh token. If an older login stored a refresh secret and a newer login returns no refresh token, a failed delete leaves the old secret in the keyring but the record has no marker that it should be ignored, so active() can later surface and use that stale refresh token. Please treat this as a write/fallback failure, or persist an explicit "no refresh token" marker that suppresses refresh-slot reads.

Comment thread src/auth/refresh.test.ts Outdated
bundle: initial,
setCalls: [],
}
let recordsLocation = '/tmp/test-records'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] recordsLocation defaults to a hardcoded /tmp path. Tests that trigger a refresh without calling setRecordsLocation (like the force: true test) will perform actual lock file I/O against this global path, risking test collisions and orphaned files. Consider wiring the managed tempDir directly into buildStore by default.

Comment thread src/auth/refresh.test.ts
return Object.assign(provider, { refreshSpy })
}

describe('refreshAccessToken', () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] The highest-value concurrency scenario introduced by this diff (the acquireFileLock contention and subsequent store re-read) is untested. The PR description explicitly mentions a lock re-read test, but it appears to have been omitted from the final commit.

Comment thread src/auth/flow.test.ts Outdated
: null
},
async set(account, token) {
async set(account, credentials) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This fake store now discards everything except credentials.accessToken, so the updated runOAuthFlow test still won't fail if runOAuthFlow stops persisting the new refresh/expiry fields. Keep the full TokenBundle in the test double and add an assertion with a provider response that includes refreshToken / expiry timestamps.

})

it('falls back to fallbackToken on the user record when the keyring is offline', async () => {
it('falls back to fallback tokens on the user record when the keyring is offline', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The new partial-offline path is still uncovered: when the access-slot write succeeds but refreshSecureStore.setSecret() throws SecureStoreUnavailableError, the helper is supposed to roll back the access-slot write and fall back to plaintext for both tokens. Without a test for that branch, a regression could leave split state across keyring and fallback storage.

expect(store.getLastStorageResult()).toEqual({ storage: 'secure-store' })

await expect(store.active()).resolves.toEqual({ token: 'tok_secret', account })
await expect(store.active()).resolves.toEqual({
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This round-trip still only covers the access-token-only shape. The public behavior added here is store.set(account, TokenBundle) / active().bundle.refreshToken backed by the /refresh sibling slot, but there's no end-to-end store test for that integration yet. Add one case that writes a bundle with a refresh token, reads it back through active(), and verifies clear() removes the refresh slot too.

- Restore TokenStore.set(account, token: string) signature; add separate
  optional setBundle?(account, bundle) so custom TokenStore implementations
  don't need to widen their existing set() to accept TokenBundle. cli-core
  helpers call setBundle when available, otherwise set(token).
- Revert the ExchangeResult.expiresAt → accessTokenExpiresAt rename. The
  old field stays, plus a new optional refreshTokenExpiresAt. Custom
  providers returning expiresAt keep working.
- Fix concurrency bug on the force=true (reactive 401) path: if the
  post-lock re-read shows the access token rotated, return the fresh
  snapshot instead of POSTing refresh with a now-stale refresh token
  (which would yield invalid_grant on servers with refresh rotation).
- Require explicit lockPath from the caller; drop the implicit cast on
  getRecordsLocation. Avoids silently failing to acquire locks when
  consumers pass `~`-prefixed recordsLocation per the doc.
- Add hasRefreshToken bit on UserRecord; gate the refresh-slot keyring
  read on it so accounts without a refresh token don't pay a second IPC
  round-trip per `active()` call. Parallel-fetch access + refresh secrets
  via Promise.all.
- record-write: roll back the access slot when the refresh-slot write
  throws (any error class) so we never leave an orphan access credential
  with no matching user record. A non-keyring refresh-slot setSecret
  rethrows; a keyring-unavailable refresh-slot rolls back to the
  fallback record so both tokens travel together.
- record-write: treat a refresh-slot delete failure (when writing a
  no-refresh bundle) as a write failure — otherwise a stale refresh
  token from an earlier login can survive and be served by active().
- Extract shared postToTokenEndpoint helper in PKCE provider so
  exchangeCode and refreshToken share fetch + status handling + JSON
  parsing instead of duplicating it.
- refresh.ts: use requireSnapshotForRef instead of reimplementing
  snapshot lookup.
- migrate.ts: use the exported refreshAccountSlot helper instead of
  manually interpolating the suffix.
- runOAuthFlow: call setBundle when the store implements it (preserves
  refresh/expiry on login) and fall back to set(accessToken) otherwise.
- Update README "What's in it" + add a Silent OAuth refresh section
  covering refreshAccessToken, TokenBundle, lockPath, and the typed
  AUTH_REFRESH_* error codes.

Tests:
- flow.test fake store now keeps the full TokenBundle (and implements
  setBundle) so refresh/expiry persistence is asserted, not silently
  discarded. New test asserts the bundle round-trip end-to-end through
  runOAuthFlow.
- refresh.test: drop the hardcoded /tmp records path; tests now use the
  tempDir-scoped lockPath. New tests cover the force+concurrency
  scenario (losing process must return the rotated snapshot without
  POSTing) and the non-force lock contention path. Removed signature-
  restating comment.
- record-write.test: new tests for partial-offline (access succeeds,
  refresh slot offline → both fall back to record), refresh-slot
  setSecret non-keyring error (rollback + rethrow), refresh-slot delete
  failure on no-refresh bundle.
- token-store.test: new bundle round-trip test (setBundle → active →
  clear with refresh in the sibling slot) and hot-path optimisation
  test asserting the refresh slot isn't read when hasRefreshToken=false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Addressed all 19 findings from the previous review. Summary of changes in 2a92956:

Contract

  • Reverted the ExchangeResult.expiresAtaccessTokenExpiresAt rename (kept expiresAt; added refreshTokenExpiresAt only).
  • Restored TokenStore.set(account, token: string); added separate optional setBundle?(account, bundle). cli-core helpers prefer setBundle when present.

Correctness

  • Fixed the force: true concurrency bug — post-lock re-read now returns the rotated snapshot when access token changed, even on the force path.
  • record-write: rolls back the access slot on any refresh-slot setSecret error (keyring or otherwise) so an orphan access credential is never left behind.
  • record-write: refresh-slot delete failure on a no-refresh bundle is now treated as a write failure so stale refresh tokens can't resurface.

Hot path

  • Added hasRefreshToken bit on UserRecord; refresh-slot read is gated on it, and the access + refresh reads now run in parallel via Promise.all.

Footprint

  • Dropped the getRecordsLocation cast / implicit lock-path derivation. Callers now pass an explicit lockPath (cli-core does not interpret ~).
  • Extracted postToTokenEndpoint so exchangeCode and refreshToken share fetch + status + JSON handling.
  • refresh.ts reuses requireSnapshotForRef; migrate.ts uses the exported refreshAccountSlot helper.

Tests (375 → 383)

  • New: bundle round-trip through the keyring store; partial-offline path (access succeeds, refresh slot offline); refresh-slot setSecret non-keyring rollback; refresh-slot delete failure on no-refresh bundle; force+concurrency (losing process returns rotated snapshot without POSTing); non-force lock contention; hot-path gating (refresh slot not read when hasRefreshToken=false).
  • flow.test fake store now keeps the full bundle and asserts refresh + expiry persistence end-to-end.
  • refresh.test uses tempDir-scoped lockPath; removed signature-restating comment.

Docs

  • Updated README "What's in it" entry for the new exports + added a "Silent OAuth refresh" section covering refreshAccessToken, TokenBundle, lockPath, and the typed AUTH_REFRESH_* codes.

Ready for another pass when you have a moment.

@lmjabreu lmjabreu requested a review from doistbot May 18, 2026 22:28
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces highly anticipated OAuth refresh-token capabilities to the core authentication surface, enabling consumers to seamlessly extend sessions in the background. The additions thoughtfully expand the existing contracts while maintaining backward compatibility for current integrations. A few adjustments are needed to polish the implementation, primarily around preventing the legacy migration from inadvertently deleting newer refresh slots, avoiding unintended default-account promotion during silent refreshes, and centralizing some duplicated logic across the persistence and error-handling paths.

Share FeedbackReview Logs

Comment thread src/auth/keyring/migrate.ts Outdated
// Legacy single-user state never carried a refresh token, but
// wire the sibling slot anyway so a defensive delete clears any
// junk that may have been parked there by a hand-edit.
refreshSecureStore: createSecureStore({
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Wiring the sibling refresh slot into this access-only migration makes retry runs destructive. migrateLegacyAuth explicitly retries after marker-write-failed, and by then the same account may already have logged in through the v2 flow and stored a valid /refresh secret. Passing { accessToken: legacyToken.token } through writeRecordWithKeyringFallback will now delete that newer refresh slot and flip hasRefreshToken back to false, silently disabling refresh for the account. Please skip the refresh-slot purge in the legacy-migration path (or no-op when the target account already exists) instead of treating the legacy token as authoritative about refresh state.

Comment thread src/auth/keyring/record-write.ts Outdated
try {
await refreshSecureStore.deleteSecret()
} catch (error) {
try {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The rollback logic in this catch block exactly duplicates the error handling on lines 77-83. Consider extracting a local rollbackAccess(error) helper to share this logic and prevent the two fallback paths from drifting.

refreshTokenExpiresAt: bundle.refreshTokenExpiresAt,
hasRefreshToken: Boolean(trimmedRefresh),
}
: {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] Both branches duplicate the mapping of the core record fields. Consider extracting a base object to share account, hasRefreshToken, and the expiry fields, then spreading it into the fallback branch to reduce duplication.

Comment thread src/auth/providers/pkce.ts Outdated
}
return {
accessToken: payload.access_token,
accessToken: payload.access_token!,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The translation of the JSON payload to the ExchangeResult shape (including the expiresAt timestamp calculation) duplicates the logic in exchangeCode on lines 122-128. Extracting this to a shared mapping helper would keep the response parsing centralized.

Comment thread src/auth/flow.ts Outdated

try {
await options.store.set(account, exchange.accessToken)
// Use setBundle when the store implements it so refresh + expiry
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This login path now open-codes the same setBundle ? setBundle(...) : set(...) persistence policy that refresh.ts already centralizes in persistBundle(). Please move that helper to a shared internal utility and reuse it here so login and refresh can't drift on how bundles are stored.

Comment thread src/auth/keyring/token-store.ts Outdated
await secureStoreFor(record.account).deleteSecret()
lastClearResult =
record.fallbackToken !== undefined ? fallbackClear : { storage: 'secure-store' }
await accessStoreFor(record.account).deleteSecret()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The deleteSecret calls for the access and refresh slots are independent but execute sequentially, doubling the IPC latency during clear(). Consider running them concurrently via Promise.allSettled to reduce the cleanup overhead.

Comment thread src/auth/flow.test.ts Outdated
state.last = { account, token }
state.last = { account, bundle: { accessToken: token } }
},
async setBundle(account, bundle) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Adding setBundle() to this fake store makes the existing abort-path test later in the file weaker: it still spies only on store.set, so a regression where runOAuthFlow persists after abort via setBundle would now pass. Please assert against both write methods (or assert store.last stays undefined) so the test still protects the “no persistence after abort” behavior.

Comment thread src/auth/flow.test.ts
const { provider, getRedirect } = instrument({
exchangeCode: async () => ({
accessToken: 'tok-1',
refreshToken: 'rt-1',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new persistence regression test still leaves refreshTokenExpiresAt unexercised. Since runOAuthFlow now forwards that field too, a regression dropping only the refresh-token expiry would still pass here. Add a concrete refreshTokenExpiresAt in the fixture and assert it in store.last?.bundle.

Comment thread src/auth/providers/pkce.test.ts Outdated
expect(result.account).toEqual(account)
})

it('throws AUTH_REFRESH_EXPIRED on 400 invalid_grant (refresh token revoked or expired)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The new refresh classification coverage only exercises the 400 half of the documented “400/401 + invalid_grant => AUTH_REFRESH_EXPIRED” behavior. Providers commonly use 401 for revoked refresh tokens, so add a 401 invalid_grant case here to keep that branch from regressing unnoticed.

Comment thread README.md
// Sidecar O_EXCL lock so two parallel CLI invocations don't both
// POST refresh and race the server's rotation logic. Pass an
// already-expanded absolute path (no `~`).
lockPath: `${getConfigPath('todoist-cli')}.refresh.lock`,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] This new README usage block is not self-contained as written: it calls getConfigPath() here, but the snippet only imports refreshAccessToken. Since README is the documented public API surface in this repo, please either import getConfigPath from @doist/cli-core or rewrite the example to use a precomputed lockPath.

Correctness:
- migrate.ts: pass `purgeRefreshSlot: false` through writeRecordWithKeyring
  Fallback so a legacy-migration retry after `marker-write-failed` doesn't
  destructively wipe a /refresh secret that may have been written by a v2
  login between the failed marker and the retry. The legacy access-only
  token has no authority over refresh state.
- token-store: `setBundle()` no longer triggers the default-promotion side
  effect that `set()` (login) has. Silent refresh on a config with no
  pinned default would otherwise silently pin the refreshed account,
  mutating account selection without user intent.
- refreshAccessToken: honour `exchange.account` returned by a provider's
  refreshToken (e.g. for a server-side rename); fall back to the
  pre-refresh account. PKCE provider's refreshToken stops fabricating
  an `account` since the in-repo caller now threads it through directly.
- token-store: clear() runs the access + refresh keyring deletes via
  Promise.allSettled instead of sequentially — halves clear() latency.

Refactor:
- record-write: extracted `rollbackAccess(error)` helper to dedupe the
  two-branch rollback logic. Spread a `baseRecord` into the offline-
  fallback branch so the shared fields don't duplicate.
- pkce: extracted `mapTokenPayloadToExchangeResult` so the OAuth payload
  shape lives in one place for both exchangeCode + refreshToken.
- New `auth/persist.ts` exports `persistBundle(store, account, bundle)`.
  Both `runOAuthFlow` and `refreshAccessToken` now call it instead of
  open-coding the `setBundle ? … : set(…)` policy — login and silent
  refresh can no longer drift on how bundles get stored.

Tests (383 -> 386):
- flow.test abort-path now also spies on `setBundle` and asserts
  `store.last === undefined`, so a regression where runOAuthFlow
  persists after abort via the new bundle path can't slip through.
- flow.test bundle-persistence test exercises `refreshTokenExpiresAt`
  (was unasserted) so a regression dropping just refresh-expiry would
  no longer pass.
- pkce.test refresh-expiry case parameterised over 400 + 401, since
  many OAuth servers and reverse proxies return 401 for revoked
  refresh tokens.
- token-store.test: new test asserting setBundle does NOT promote
  default — guards the silent-refresh-account-mutation regression.
- record-write.test: new test for `purgeRefreshSlot: false` — pre-
  existing refresh secret survives and `hasRefreshToken` is left
  unset (we have no authority over it from a legacy access-only token).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Round 2 fixes in db8235d. All 11 findings from the previous review addressed:

Correctness

  • P1 migrate.ts — passes purgeRefreshSlot: false so a legacy-migration retry no longer destructively wipes a v2-written /refresh secret.
  • P2 token-storesetBundle() no longer triggers default-promotion. Silent refresh can't mutate account selection.
  • P2 pkce.refreshToken — stops fabricating account (was dead data); refreshAccessToken now threads through exchange.account ?? prevAccount so a provider can still update identity (e.g. server-side rename) if it wants to.
  • P2 token-store.clear() — access + refresh keyring deletes via Promise.allSettled.

Refactor / dedup

  • P2 record-writerollbackAccess(error) helper; baseRecord spread for the fallback branch.
  • P3 pkcemapTokenPayloadToExchangeResult shared by exchangeCode + refreshToken.
  • P2 flow.ts + refresh.ts — new shared auth/persist.ts exports persistBundle. Both call it instead of open-coding the setBundle ? … : set(…) policy.

Tests (383 → 386)

  • flow.test abort path — spies on both set and setBundle, plus store.last === undefined. Regression where runOAuthFlow persists after abort via the bundle path can't slip through.
  • flow.test bundle persistence — now exercises refreshTokenExpiresAt too.
  • pkce.test refresh-expired — parameterised over 400 and 401, since reverse proxies + many OAuth servers return 401 for revoked refresh tokens.
  • New test: setBundle does not promote default.
  • New test: purgeRefreshSlot: false leaves pre-existing refresh secret intact and keeps hasRefreshToken unset.

Ready for another pass.

@lmjabreu lmjabreu requested a review from doistbot May 18, 2026 23:01
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces valuable OAuth refresh-token capabilities to cli-core, extending the authentication surface to silently handle session extensions for short-lived tokens. The implementation establishes a solid foundation that will greatly improve the user experience by reducing manual logins. The provided feedback highlights a few logical adjustments needed to prevent migration paths from inadvertently hiding or overwriting refresh tokens, alongside suggestions to optimize rollback concurrency, consolidate bundle mappers, and expand test coverage to meet module layout guidelines.

Share FeedbackReview Logs

Comment thread README.md
For OAuth providers whose servers issue refresh tokens (Outline, anything ALCs / DCR-based), `refreshAccessToken` keeps users signed in past the access token's expiry without re-running `<cli> auth login`. Wire it through your CLI's `getApiToken()` so every authenticated request consults the same code path. The `createPkceProvider` ships with a `refreshToken` implementation; bespoke providers add their own.

```ts
import { refreshAccessToken } from '@doist/cli-core/auth'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The snippet is still missing the getConfigPath import requested in the previous review. Since this is the public documentation, please add getConfigPath (from @doist/cli-core/config) to the imports here.

Comment thread src/auth/keyring/token-store.ts Outdated
throw error
}
})
const refreshPromise = record.hasRefreshToken
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] When purgeRefreshSlot: false is used (e.g. during legacy migration), the written record leaves hasRefreshToken: undefined. Because this ternary checks for strict truthiness, undefined causes it to skip reading the refresh slot entirely. This effectively hides the v2 refresh token from the CLI, which is the exact destructive behavior purgeRefreshSlot: false was meant to prevent. Change this condition to record.hasRefreshToken !== false so that undefined safely falls back to checking the slot.

Comment thread src/auth/keyring/token-store.ts Outdated
async setBundle(account, bundle) {
// No default promotion: a silent refresh shouldn't mutate
// account selection. The caller already chose this account.
await persistBundle(account, bundle, false)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Disabling default promotion unconditionally inside setBundle() fixes the silent refresh issue, but it inadvertently breaks the explicit login path because runOAuthFlow now also persists via setBundle() (via the shared persistBundle helper). This means new explicit logins will no longer promote the user to default. Consider adding an optional promoteDefault?: boolean flag to TokenStore.setBundle (and threading it through persistBundle) so runOAuthFlow can explicitly request promotion while refreshAccessToken skips it.

Comment thread src/auth/keyring/migrate.ts Outdated
account,
token: legacyToken.token,
bundle: { accessToken: legacyToken.token },
purgeRefreshSlot: false,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] purgeRefreshSlot: false preserves the sibling secret, but this retry path still overwrites any existing v2 record with one whose hasRefreshToken and expiry fields are undefined. UserRecordStore.upsert is replace-not-merge, and active() only reads the /refresh slot when hasRefreshToken is truthy, so a marker-write-failed retry still disables silent refresh for an account that logged in via v2 in the meantime. Skip this write when the target account already exists, or carry forward the existing refresh metadata on the migration path.

Comment thread src/auth/keyring/record-write.ts Outdated
} catch {
// best-effort — the user record failure is the real cause
}
if (wroteRefreshSecurely) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The access and refresh secret deletes are executed sequentially in this rollback block, duplicating the try/catch logic and doubling the IPC latency on failure. This mirrors the issue previously fixed in token-store.ts clear(). Consider running both deletes concurrently via Promise.allSettled to reuse the optimized cleanup pattern.

Comment thread src/auth/keyring/token-store.ts Outdated
* pin the refreshed account as the new default — refresh is a
* credential-rotation operation, not an account-selection signal.
*/
async function persistBundle(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] This local function shares the exact name persistBundle with the newly introduced shared helper in src/auth/persist.ts. To avoid confusion when navigating the codebase, consider renaming this internal method to writeBundle or storeBundle.

Comment thread src/auth/flow.test.ts Outdated
state.last = { account, token }
state.last = { account, bundle: { accessToken: token } }
},
async setBundle(account, bundle) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] All of the new runOAuthFlow coverage now uses a store that implements setBundle(), so the backward-compatible fallback added in persistBundle() (set(account, bundle.accessToken)) is still unexercised. Add one login test with a legacy TokenStore that omits setBundle and assert persistence still goes through set(), otherwise a regression would break custom stores despite the PR's compatibility goal.

expect(setDefaultSpy).not.toHaveBeenCalled()
})

it('skips the refresh-slot keyring read when hasRefreshToken is false (hot-path optimisation)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This only covers the hasRefreshToken: false fast path. The companion branch introduced in active()hasRefreshToken: true with the sibling refresh-slot read failing—is still untested, even though the implementation intentionally downgrades that case to “no refresh token” so valid access-token lookups keep working. Add a case with refreshKeyring.getSpy rejecting and assert store.active() still returns the access token with bundle.refreshToken === undefined.

},
)

it('throws AUTH_REFRESH_TRANSIENT on non-invalid_grant errors (5xx, network)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The new refresh-error coverage still misses the documented AUTH_REFRESH_TRANSIENT branch for a 2xx/non-JSON token response. exchangeCode() already has that shape covered, but refreshToken() passes a different error code into the shared helper, so a regression here would slip through. Add a 200 text/html (or malformed JSON) refresh response and assert AUTH_REFRESH_TRANSIENT.

Comment thread src/auth/persist.ts Outdated
@@ -0,0 +1,25 @@
import type { AuthAccount, TokenBundle, TokenStore } from './types.js'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Per AGENTS.md, every module file must have a colocated test file with the same name. Although this is a small shared helper already exercised by other suites, please add a src/auth/persist.test.ts file to maintain the documented module layout standard.

Correctness (3 P1):
- token-store: `active()`'s refresh-slot gate now `record.hasRefreshToken
  !== false` (was truthy). A legacy-migration record leaves the field
  `undefined` ("unknown") and the truthy check would otherwise hide a v2
  refresh secret that was put in the sibling slot by a v2 login — the
  exact bug `purgeRefreshSlot: false` was meant to prevent.
- token-store: thread an optional `{ promoteDefault?: boolean }` through
  `setBundle`. `runOAuthFlow` passes `true` so explicit login still
  auto-pins the first account as default; `refreshAccessToken` omits it
  so silent refresh never mutates account selection.
- migrate.ts: skip the `writeRecordWithKeyringFallback` call entirely
  when the v2 record for the account already exists. `upsert` is
  replace-not-merge, so writing a legacy access-only bundle would clobber
  the v2 record's `hasRefreshToken` / `accessTokenExpiresAt` and silently
  disable silent refresh.

Refactor / dedup (P2/P3):
- record-write: upsert-failure rollback uses Promise.allSettled for the
  two keyring deletes — matches the optimised cleanup in `clear()`.
- New `bundleFromExchange(exchange, prev?)` helper in `auth/persist.ts`.
  Both `runOAuthFlow` and `refreshAccessToken` use it so a new
  ExchangeResult field can't drift between login and refresh.
- token-store: rename internal `persistBundle` → `writeBundle` so it
  doesn't shadow the shared module's `persistBundle` from `persist.ts`.
- token-store: extracted `refOnlySnapshot(records)` helper so `active()`
  and `setDefault()` build the explicit-ref snapshot the same way.
- flow.ts: dropped the meta comment justifying the persistence refactor
  (per AGENTS.md — comments explain non-obvious business logic, not
  refactoring decisions).

Tests (386 -> 397):
- New `persist.test.ts` covers the persistBundle helper (setBundle vs
  set fallback, promoteDefault threading) and bundleFromExchange
  (refresh-token rotation on/off, carry-forward of refresh expiry).
- flow.test: new test for the `set(token)` fallback path when a custom
  store omits `setBundle` — the back-compat floor was previously
  unexercised because the in-test store implements both.
- token-store.test: new test for `hasRefreshToken: true` + refresh-slot
  read failure (downgrades to "no refresh", access token still served).
- token-store.test: new test for `hasRefreshToken: undefined` (legacy-
  migration state) — must read the refresh slot, not skip it.
- pkce.test: new test for AUTH_REFRESH_TRANSIENT on a 2xx non-JSON
  refresh response (`exchangeCode` already covered the analogous case).

README:
- Added `getConfigPath` import line to the silent-refresh example so the
  snippet is actually copy-pasteable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Round 3 fixes in d4cd62e. All 13 findings addressed:

P1 — correctness (all three connected; the migration story now actually preserves v2 state)

  • token-store active(): refresh-slot gate is hasRefreshToken !== false so an undefined value (legacy-migration record with no authority over refresh state) falls through to reading the slot instead of hiding the v2 refresh secret. The exact bug the previous fix was meant to prevent.
  • token-store setBundle: threads optional { promoteDefault?: boolean }. runOAuthFlow passes true (login still auto-pins first account as default); refreshAccessToken omits it (silent refresh doesn't mutate selection).
  • migrate.ts: skip the upsert entirely when the v2 record for the account already exists — upsert is replace-not-merge, so writing a legacy access-only bundle would clobber the v2 record's hasRefreshToken/expiry metadata.

P2 — refactor / coverage

  • record-write upsert-failure rollback: Promise.allSettled for the two keyring deletes.
  • New bundleFromExchange(exchange, prev?) helper. runOAuthFlow + refreshAccessToken both use it.
  • New colocated persist.test.ts (per AGENTS.md): covers persistBundle (setBundle vs set fallback, promoteDefault threading) + bundleFromExchange (rotation on/off, carry-forward of refresh expiry).
  • flow.test: new legacy-store test (custom TokenStore without setBundle) — back-compat floor now exercised.
  • token-store.test: new tests for refresh-slot read failure (downgrade to no-refresh) and hasRefreshToken: undefined (legacy-migration state must read the slot).
  • pkce.test: new AUTH_REFRESH_TRANSIENT test for a 2xx non-JSON refresh response.

P3 — polish

  • README silent-refresh example now imports getConfigPath.
  • token-store: extracted refOnlySnapshot(records) so active() and setDefault() build the explicit-ref snapshot identically.
  • flow.ts: dropped the meta comment justifying the persistence refactor.
  • token-store: renamed internal persistBundlewriteBundle so it no longer shadows the shared persistBundle from persist.ts.

Test count: 386 → 397.

@lmjabreu lmjabreu requested a review from doistbot May 18, 2026 23:12
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces highly valuable OAuth refresh-token support to the authentication surface, providing a great developer experience improvement by silently extending sessions instead of forcing manual re-logins. The overall approach is well-thought-out, particularly the integration of proactive refreshing, PKCE provider updates, and concurrency management. A few refinements have been noted to further strengthen the implementation, primarily focusing on preventing race conditions during migration, ensuring the lock mechanism accurately re-reads state on timeout, narrowing error catch blocks, and tightening up types and internal exports.

Share FeedbackReview Logs

Comment thread src/auth/keyring/migrate.ts Outdated
return skipped(silent, logPrefix, 'user-record-write-failed', getErrorMessage(error))
}
const existing = existingRecords.find((r) => r.account.id === account.id)
if (!existing) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] This list()-then-if (!existing) check is still racy. If another process completes a v2 login for the same account after list() returns but before writeRecordWithKeyringFallback() upserts, this migration will still replace that fresh record with the legacy access-only bundle. In the keyring-offline case that can drop the newly written fallbackRefreshToken; otherwise it still wipes expiry metadata. Please make this write conditional on the record still being absent (e.g. an atomic insert-if-absent / compare-and-swap in the record store, or a revalidation step inside the write path that aborts instead of replacing an existing record).

Comment thread src/auth/persist.test.ts Outdated
expect(store.setBundleCalls[0].promoteDefault).toBe(true)
})

it('passes promoteDefault: undefined to setBundle when omitted (silent refresh path)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This test locks in persistBundle() always passing a third-arg object ({ promoteDefault: undefined }) on the silent-refresh path instead of actually omitting the optional setBundle(..., options?) argument. That weakens the public contract for custom stores, because presence-based handling (if (options), future option branching, etc.) will now run during silent refresh even though the docs say the flag is omitted. Please have persistBundle() call setBundle(account, bundle) when promoteDefault is unset, and update this test to assert omission rather than undefined.


export type KeyringTokenStore<TAccount extends AuthAccount> = TokenStore<TAccount> & {
/** Storage result from the most recent `set()` call, or `undefined` before any (and reset to `undefined` when the most recent `set()` threw). */
/** Storage result from the most recent `set()` / `setBundle()` call, or `undefined` before any (and reset to `undefined` when the most recent call threw). */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] KeyringTokenStore still inherits TokenStore's optional setBundle?, even though this implementation now always provides it. That leaves the concrete return type weaker than the implementation and forces downstream non-null assertions/guards (the new tests already need store.setBundle!). Please override setBundle as required on KeyringTokenStore so createKeyringTokenStore() exposes the stronger, accurate API.

Comment thread src/auth/keyring/token-store.ts Outdated
record.hasRefreshToken !== false
? refreshStoreFor(record.account)
.getSecret()
.catch(() => null)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] .catch(() => null) is swallowing every refresh-slot read failure, not just keyring-unavailable cases. That means unexpected backend/programming errors get silently downgraded into "no refresh token", which makes the abstraction harder to debug and can mask real breakage. Please narrow this to the explicitly tolerated keyring-read failures (for example SecureStoreUnavailableError) and let anything else surface as an AUTH_STORE_READ_FAILED-style error.

Comment thread src/auth/keyring/migrate.ts Outdated
userRecords,
account,
bundle: { accessToken: legacyToken.token },
purgeRefreshSlot: false,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Removing this flag avoids a persistent IPC latency penalty for all migrated users. Because the if (!existing) guard on line 173 already prevents the migration from overwriting a newer v2 login, opting out of the defensive delete here is redundant. By leaving purgeRefreshSlot: false, the migrated record gets hasRefreshToken: undefined, which forces active() to make a useless keyring read for the refresh slot on every single CLI command. Removing this line lets it default to true, correctly setting hasRefreshToken: false for legacy users.

Comment thread README.md
}
```

`refreshAccessToken` returns the active credentials unchanged when the stored access token has more than `skewMs` (default 60s) of life left. When refresh is needed, it acquires the file lock, POSTs `grant_type=refresh_token` via the provider's `refreshToken` method, persists the new bundle through `store.setBundle ?? store.set`, and returns the new token. On contention it waits up to `lockTimeoutMs` (default 2s); if another process refreshed first it returns the rotated snapshot without firing its own POST — even on the `force` path — because POSTing with the now-rotated refresh token would yield `invalid_grant`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] This contention-path description is stronger than the current implementation. When acquireFileLock() times out, refreshAccessToken() gets null and skips the post-wait store re-read, so the waiter still sends its own refresh POST instead of reusing the winner's rotated snapshot. That means extra network work under contention and defeats the main load-saving benefit of the lock. Either add the re-read on the timeout path or soften this claim.

Comment thread src/auth/flow.test.ts Outdated
state.last = { account, token }
state.last = { account, bundle: { accessToken: token } }
},
async setBundle(account, bundle) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] fakeStore().setBundle drops the new third argument, so none of the added runOAuthFlow coverage proves that explicit login is actually opting into promoteDefault: true. If that flag is accidentally removed from flow.ts, this suite still stays green. Consider recording setOptions here and asserting it in one of the new persistence tests.

// refresh slot mock; everything else in the access slot mock. Keeps
// single-user tests' assertions about `keyring.deleteSpy` honest by
// isolating refresh-slot side effects.
mockedCreateSecureStore.mockImplementation(({ account }) =>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] This fixture now hard-codes the /refresh suffix to distinguish the sibling slot. That duplicates the store's slot-formatting detail inside the test, so a future rename will break the fixture rather than the behavior under test. Prefer routing via refreshAccountSlot() here.

Comment thread src/auth/keyring/record-write.ts Outdated
// refresh wins; when the caller asked us not to touch the refresh slot
// and the bundle has none, we have no authority to flip the bit so
// leave the field unset (the upsert is `replace, not merge`, but
// `undefined` is the contract's "I don't know" — readers fall through
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The comment says undefined means "I don't know" — readers fall through to the access-token-only path. This slightly contradicts the actual reader behavior: token-store.ts explicitly checks the slot when hasRefreshToken is undefined (record.hasRefreshToken !== false), only falling back to access-token-only if that read yields nothing. Consider updating this to clarify that readers will still check the slot.

Comment thread src/auth/keyring/token-store.ts Outdated
): boolean => account.id === ref || account.label === ref

/** Sibling keyring slot for the refresh token. Single source of truth for the wire format. */
export function refreshAccountSlot(accessSlot: string): string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] refreshAccountSlot is exported even though it's only used by sibling implementation code and isn't part of the package surface or test API. This repo's guidance avoids dead/internal-only exports; please make it file-local or move it to a shared private helper module instead of exporting it from token-store.ts.

Correctness (3 P1):
- migrate.ts no longer races between the existence check and the upsert.
  New optional `UserRecordStore.tryInsert(record): Promise<boolean>`
  contract lets consumers commit to an atomic insert-if-absent; migrate
  uses it when available, falls back to list-then-write otherwise (small
  race window, acceptable for postinstall-style invocations).
- migrate.ts drops `purgeRefreshSlot: false`. The existence guard already
  prevents clobbering v2 refresh state, and opting out caused the
  migrated record to ship `hasRefreshToken: undefined`, forcing a useless
  keyring round-trip on every CLI command. Now the legacy record
  persists `hasRefreshToken: false` and `active()` skips the refresh
  slot for migrated users.
- refresh.ts re-reads the store after a lock-acquisition TIMEOUT, not
  just on successful acquisition. The lock holder may have completed the
  refresh while we were waiting; without the re-read the waiter still
  POSTs its own refresh and defeats the lock's main load-saving purpose.
  Brings runtime behaviour in line with the README claim.

P2:
- `persistBundle` omits the `setBundle` third arg entirely when
  `promoteDefault` is unset (silent-refresh path). Lets custom stores
  using presence-based handling distinguish login from refresh on the
  option-presence axis.
- `KeyringTokenStore.setBundle` overridden as required (the impl always
  provides it), so callers/tests can drop the `store.setBundle!`
  non-null assertion.
- Refresh-slot read `.catch` narrowed to `SecureStoreUnavailableError` —
  unexpected backend/programming errors propagate as
  `AUTH_STORE_READ_FAILED` instead of silently downgrading to
  "no refresh".
- `flow.test` fake store now captures the literal `setBundle` options
  arg. New assertion proves `runOAuthFlow` opts into
  `promoteDefault: true` on the explicit login path.
- `refreshAccountSlot` moved to a new internal `keyring/slot-naming.ts`
  module (not re-exported). Production code + tests + migrate all
  import from there, so the wire format is single-sourced.

P3:
- `token-store.test` fixture routes via `refreshAccountSlot()` instead
  of hard-coding `/refresh`.
- `record-write.ts` comment about `hasRefreshToken: undefined` corrected
  to reflect that readers DO probe the slot (vs the previous wording
  that suggested they fall straight through to access-only).

Tests (397 -> 400):
- migrate.test: new tests for the `tryInsert` happy path (atomic insert
  → record + access slot updated) and the "v2 record already exists"
  path (`tryInsert` returns false → v2 record and both keyring slots
  left untouched).
- refresh.test: new test for the lock-timeout re-read (lock held by
  another process → acquireFileLock returns null → re-read still
  surfaces the rotated snapshot, no duplicate POST).
- persist.test rewritten to assert literal omission of the `setBundle`
  options arg on the silent-refresh path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Round 4 fixes in 05463f7. All 10 findings addressed:

P1 — correctness

  • migrate.ts: new optional UserRecordStore.tryInsert(record): Promise<boolean> contract. Migration uses the atomic insert when consumers supply it (eliminates the TOCTOU race between existence check and upsert); falls back to list-then-write otherwise with a documented (small) race window.
  • migrate.ts: dropped purgeRefreshSlot: false — the existence guard prevents clobbering, and the flag caused migrated records to ship hasRefreshToken: undefined (useless refresh-slot keyring read every command). Now persists hasRefreshToken: false and active() short-circuits for migrated users.
  • refresh.ts: re-reads the store after lock-acquisition timeout, not just on successful acquisition. The waiter no longer POSTs a duplicate refresh when the lock-holder finished mid-wait. Runtime now matches the README claim.

P2

  • persistBundle omits the setBundle options arg entirely when promoteDefault is unset (silent-refresh path). Lets presence-based handling in custom stores stay honest.
  • KeyringTokenStore.setBundle overridden as required (impl always provides it). Drops the store.setBundle! non-null assertions.
  • Refresh-slot read .catch narrowed to SecureStoreUnavailableError. Unexpected backend errors now propagate as AUTH_STORE_READ_FAILED instead of silently downgrading.
  • flow.test fake store captures the literal setBundle options arg; new assertion proves runOAuthFlow passes promoteDefault: true on login.
  • refreshAccountSlot moved to a new internal keyring/slot-naming.ts (not re-exported). token-store, migrate, and tests all import from there.

P3

  • token-store.test fixture routes via refreshAccountSlot() instead of hard-coding /refresh.
  • record-write.ts comment corrected: readers DO probe the slot when hasRefreshToken is undefined.

Tests: 397 → 400.

@lmjabreu lmjabreu requested a review from doistbot May 18, 2026 23:23
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces valuable OAuth refresh-token capabilities to the auth surface, seamlessly integrating proactive background refreshing and expanded TokenStore contracts. Extending these sessions silently is a great quality-of-life improvement that will significantly reduce re-login friction for users. Before merging, there are a few important edge cases to address regarding corrupted record states during offline migration and logout operations, alongside some needed test coverage for failure states, a hot-path performance fix for older records, and minor cleanup around dead code and project conventions.

Share FeedbackReview Logs

Comment thread src/auth/keyring/token-store.ts Outdated
)
},
const accessToken = rawAccess?.trim()
if (!accessToken) return null
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Returning null here when the access keyring slot is empty (e.g. deleted out-of-band) introduces a regression. Because attachLogoutCommand relies on active() and throws ACCOUNT_NOT_FOUND / NOT_AUTHENTICATED on null, the user will be unable to run logout to clear the corrupted record. Restore the previous behavior of throwing new CliError('AUTH_STORE_READ_FAILED', ...) so the CLI recognizes the corrupted state and logout's error handler can clear it.

Comment thread src/auth/keyring/migrate.ts Outdated
})
const inserted = await userRecords.tryInsert({
account,
fallbackToken: undefined,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] If the process crashes between tryInsert and setSecret, or if setSecret throws an unexpected error, the user record is left with fallbackToken: undefined and an empty keyring slot. Subsequent migrations will skip because tryInsert returns false, permanently breaking the CLI for this user with AUTH_STORE_READ_FAILED. Write fallbackToken: legacyToken.token initially during tryInsert, then upsert to clear the fallback token only after setSecret succeeds.

Comment thread src/auth/keyring/migrate.ts Outdated
// `active()` skips the refresh-slot round-trip per command.
// When the record already exists, `tryInsert` returns false
// and we leave the v2 record alone.
const refreshStore = createSecureStore({
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The createSecureStore instantiations for the access and refresh slots are duplicated across the tryInsert and else branches. Consider creating accessStore and refreshStore once before the if (userRecords.tryInsert) block to reuse the instances and reduce boilerplate.

* has no authority over refresh state and must not erase it. Defaults
* to `true` (the safe default for fresh logins).
*/
purgeRefreshSlot?: boolean
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] purgeRefreshSlot appears to be dead code. The comment mentions it is used by migrateLegacyAuth, but migrateLegacyAuth was refactored (using tryInsert and an existence check) and no longer passes purgeRefreshSlot: false. Consider removing this parameter and simplifying the hasRefreshToken and defensive delete logic below.

Comment thread src/auth/keyring/token-store.test.ts Outdated
const { keyring, refreshKeyring, store, state } = fixture()
const expiresAt = Date.now() + 3_600_000

await store.setBundle!(account, {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The non-null assertion ! is unnecessary here (and on line 167). store is inferred as KeyringTokenStore<Account>, which explicitly overrides setBundle to be NonNullable specifically so consumers and tests can omit this assertion.

Comment thread src/auth/keyring/migrate.ts Outdated
} else {
const existing = (await userRecords.list()).find((r) => r.account.id === account.id)
if (!existing) {
await writeRecordWithKeyringFallback({
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This non-tryInsert fallback still relies on writeRecordWithKeyringFallback()'s default purgeRefreshSlot: true. For consumers that don't implement tryInsert, the accepted list()write race can still erase a refresh secret that a parallel v2 login stored moments earlier, even though this PR added purgeRefreshSlot: false specifically so migration doesn't claim authority over refresh state. Please pass purgeRefreshSlot: false here.

// token". The access-token path runs in parallel and its own
// catch maps a keyring-offline failure to `AUTH_STORE_READ_FAILED`.
const refreshPromise =
record.hasRefreshToken !== false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Treating hasRefreshToken: undefined as "try the slot" is correct for the migration-retry case, but it also means every pre-refresh keyring-backed record written by older versions now does a second keyring read on every active() forever. Those older records never had this bit, so upgraded users without refresh tokens pick up a permanent hot-path IPC regression. Consider backfilling hasRefreshToken: false after the first null refresh-slot read (or via a one-off record migration) so the extra round-trip is only paid once.

expect(setDefaultSpy).not.toHaveBeenCalled()
})

it('downgrades to no-refresh when hasRefreshToken is true but the refresh-slot read fails', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new case locks in the SecureStoreUnavailableError downgrade, but it leaves the companion branch untested: non-SecureStoreUnavailableError failures from the refresh-slot read are now supposed to propagate. Please add a sibling test with refreshKeyring.getSpy.mockRejectedValueOnce(new Error('boom')) and assert store.active() rejects, otherwise a future broad .catch(() => null) regression would slip through.

Comment thread src/auth/keyring/migrate.test.ts Outdated
expect(lines).not.toContain('sensitive')
})

it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The new tryInsert coverage only exercises the all-online happy path and the false/existing-record path. The new atomic branch also has its own keyring-offline fallback (tryInsert succeeds, then the per-user setSecret throws SecureStoreUnavailableError and migration must upsert a fallbackToken). Please add that case here; the older offline-migration tests go through the pre-tryInsert path and won’t catch regressions in this new branch.

@@ -0,0 +1,12 @@
/**
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new sibling module ships without a colocated slot-naming.test.ts. The repo’s module layout expects files under src/<area>/ to follow the same colocated-test rule, so please either add a small direct test for the slot-format helper or keep this logic inside an already-tested module.

Correctness (2 P1):
- migrate.ts: two-phase write in the tryInsert path. Phase 1 inserts the
  record with `fallbackToken: legacyToken.token` so a crash between
  tryInsert and setSecret leaves a self-sufficient record (still readable
  via the plaintext fallback). Phase 2 moves the secret into the keyring,
  then upserts to clear the fallback. Without this, a crash mid-migration
  could permanently brick the user with AUTH_STORE_READ_FAILED on every
  subsequent retry.
- token-store.ts: AUTH_STORE_READ_FAILED is now thrown directly from
  `readBundleForRecord` instead of smearing the corrupted-state signal
  across two functions (caller used to translate null into the throw).
  attachLogoutCommand's existing catch still recognises the code and
  clears the corrupted record.

P2:
- migrate.ts non-tryInsert fallback now passes `purgeRefreshSlot: false`.
  Consumers without atomic insert still see a small list-then-write race,
  and the legacy access-only bundle has no authority over refresh state
  even on that path.
- token-store: backfill `hasRefreshToken: false` best-effort on the
  record after the refresh slot read returns null for an undefined bit
  (pre-PR records). Subsequent active() calls skip the extra IPC.
  Backfill is fire-and-forget and tolerates a vanishingly rare race
  with a concurrent setBundle.
- Refresh-slot read tolerance is now tightened: only
  SecureStoreUnavailableError downgrades to "no refresh"; other errors
  propagate so a programming bug can't silently turn into "no refresh
  token available".
- Drop `store.setBundle!` non-null assertions in token-store.test
  (KeyringTokenStore.setBundle is required, the `!` was redundant).
- migrate.ts: share `accessStore` + `refreshStore` createSecureStore
  instances across both branches (dedups the per-branch instantiation).

Tests (400 -> 405):
- token-store.test: new test for the non-keyring refresh-slot read
  failure propagating (companion to the SecureStoreUnavailableError
  downgrade test).
- token-store.test: new test asserting the `hasRefreshToken: false`
  backfill on first null refresh-slot read for pre-PR records.
- migrate.test: new test for the tryInsert keyring-offline fallback
  (setSecret fails → record keeps the plaintext fallback, follow-up
  upsert doesn't run).
- New colocated `slot-naming.test.ts` per AGENTS.md (pins the wire
  format + guards the fixture-routing suffix invariant).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu lmjabreu requested a review from doistbot May 18, 2026 23:33
@lmjabreu
Copy link
Copy Markdown
Author

Round 5 fixes in c8d224e. All 10 findings addressed:

P1 — correctness

  • migrate.ts: two-phase write closes the crash window. Phase 1 tryInsert writes fallbackToken: legacyToken.token so a crash before the keyring write leaves a self-sufficient record. Phase 2 setSecret moves the token into the keyring, then upserts to clear the fallback. A retry after a phase-1-only crash still has a recoverable record (legacy path).
  • token-store: AUTH_STORE_READ_FAILED now thrown directly from readBundleForRecord — single source of truth for the corruption signal; active() no longer translates null into a throw.

P2

  • migrate.ts non-tryInsert fallback now passes purgeRefreshSlot: false (still racy without atomic insert, but the legacy bundle no longer claims authority over refresh state).
  • token-store: backfill hasRefreshToken: false best-effort after the first null refresh-slot read for pre-PR records. Fire-and-forget; the rare race with concurrent setBundle is documented but vanishingly unlikely (silent refresh requires a stored refresh token in the first place).
  • Refresh-slot read tolerance narrowed: SecureStoreUnavailableError still downgrades to "no refresh"; everything else propagates.
  • token-store.test: new test for non-keyring error propagation (companion to the existing downgrade test).
  • migrate.test: new test for the tryInsert + keyring-offline path (fallback survives, follow-up upsert skipped).
  • New colocated slot-naming.test.ts per AGENTS.md.

P3

  • migrate.ts dedups the access + refresh createSecureStore instantiations across the tryInsert and fallback branches.
  • token-store.test drops store.setBundle! non-null assertions (now redundant given the required override on KeyringTokenStore).

Tests: 400 → 405.

Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces a robust mechanism for OAuth refresh-token handling across the auth surface, thoughtfully extending stores and providers to handle session renewals without breaking existing consumers. The implementation establishes a solid foundation for token lifecycles and backward compatibility, though a few edge cases around concurrency and state synchronization need tightening. Specifically, there are some race conditions in the migration and lock checks that could clobber concurrent updates, alongside a few missing state promotions, brittle test assertions, and a minor omission in the README's API documentation.

Share FeedbackReview Logs

Comment thread src/auth/refresh.ts Outdated
const fresh = await requireSnapshotForRef(options.store, options.ref)
if (fresh) {
const freshBundle = fresh.bundle ?? { accessToken: fresh.token }
if (fresh.token !== snapshot.token) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This post-lock race check only looks at fresh.token !== snapshot.token. If a provider reuses the access-token string across refreshes but rotates the refresh token or expiry, force: true will still fall through to a second refresh POST and immediately supersede the bundle the lock holder just wrote. Compare the whole bundle (at least refresh token / expiry metadata) before deciding that no other process refreshed.

Comment thread src/auth/keyring/migrate.ts Outdated
// Phase 2: move the secret into the keyring, then upsert to
// clear the fallback. If the keyring is offline we leave the
// fallback in place (the correct degraded state).
const inserted = await userRecords.tryInsert({
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] tryInsert() only proves the record was absent at insert time; it does not give this migration exclusive ownership afterward. A concurrent v2 login that lands after this insert can have its fresh access token overwritten by setSecret(), its record reset by the later upsert(), and its refresh slot deleted below. This path still needs a CAS/ownership check before finalizing, otherwise it can clobber the very v2 login it is trying to preserve.

Comment thread src/auth/keyring/token-store.ts Outdated
// (login is foreground; silent refresh requires an already-stored
// refresh token). Failures here are swallowed: the worst case is
// the next `active()` pays the same extra IPC and tries again.
if (record.hasRefreshToken === undefined && refreshToken === undefined) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This read-path backfill does a full replace upsert() from a stale snapshot, but UserRecordStore.upsert is replace-not-merge. If a concurrent login/refresh writes new expiry fields or hasRefreshToken: true before this background promise settles, this backfill can silently wipe that metadata. Please avoid doing this without a CAS/patch primitive (or drop the backfill and pay the extra probe).


expect(snapshot?.token).toBe('at-from-migration')
expect(snapshot?.bundle?.refreshToken).toBe('rt-survived-migration')
expect(refreshKeyring.getSpy).toHaveBeenCalled()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This hasRefreshToken: undefined path never gets promoted to true after a successful refresh-slot read. In the migration-race case this test models, that means active() will keep probing the refresh keyring slot on every command forever. Please backfill hasRefreshToken: true when the slot read succeeds so affected accounts only pay the extra IPC once.

Comment thread src/auth/keyring/migrate.test.ts Outdated
expect(lines).not.toContain('sensitive')
})

it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This test is meant to prove the tryInsert path avoids the racy existence check, but it never asserts that userRecords.list() is skipped. A regression to await list(); await tryInsert(...) would still pass here while reintroducing the race. Please wrap or spy on list() in this branch and assert it is not called.

Comment thread src/auth/keyring/slot-naming.test.ts Outdated
expect(refreshAccountSlot('user-42')).toBe('user-42/refresh')
})

it('does not collapse an empty access slot to a bare suffix', () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This test locks in refreshAccountSlot('') semantics only because the token-store.test.ts fixture routes mocks with endsWith(refreshAccountSlot('')); the production code does not rely on that empty-string contract. That makes future slot-format refactors fail for fixture reasons rather than auth behavior. Prefer routing the fixture by explicit account names and drop this empty-input assertion.

Comment thread README.md Outdated

Errors are typed: `AUTH_REFRESH_EXPIRED` (server returned 400/401 + `invalid_grant` — refresh token revoked, forced re-login required), `AUTH_REFRESH_TRANSIENT` (network, 5xx, non-JSON — safe to retry), `AUTH_REFRESH_UNAVAILABLE` (no refresh token stored or provider doesn't implement `refreshToken`). Map these to your CLI's existing "run `<cli> auth login`" hint as makes sense.

`TokenBundle` is the persisted credential triple — `{ accessToken, refreshToken?, accessTokenExpiresAt?, refreshTokenExpiresAt? }`. Stores opting into refresh implement `setBundle?(account, bundle)` alongside `set(account, token)`; `createKeyringTokenStore` implements both. Stores that omit `setBundle` lose refresh + expiry metadata on persistence — `refreshAccessToken` against them will surface `AUTH_REFRESH_UNAVAILABLE` on the first attempted refresh, which is the correct degraded behaviour.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] This README paragraph documents the new TokenStore.setBundle contract as setBundle?(account, bundle), but the exported type now also accepts options?: { promoteDefault?: boolean }. Please update the docs to include that third argument so the public API docs stay in sync and custom store implementers don’t miss the explicit-login default-promotion hook.

`tryInsert` proves the record was absent at insert time but gives no
exclusive ownership afterward. A v2 login that completes between our
phase-1 insert and the follow-up keyring/upsert steps would be clobbered:
our setSecret would overwrite their access token, our upsert would reset
their record, our refresh-slot delete would remove their refresh secret.

Mitigate with ownership-check re-reads before each follow-up:
- Use a unique placeholder signature on tryInsert (`fallbackToken:
  legacyToken.token` + `hasRefreshToken: undefined`) so the re-read can
  distinguish "still ours" from "v2 login took over" — no v2 login ever
  produces that exact shape (v2 always writes hasRefreshToken: true or
  false, never undefined).
- Re-read before setSecret, before the post-setSecret upsert, and before
  the refresh-slot delete. If the record has been replaced at any point,
  abort the rest. If we already wrote our access token to the keyring
  before discovering we lost ownership, roll that write back so the v2
  login's setSecret is the authoritative one.

The residual race window is now bounded by the gap between a re-read and
the very next call (microseconds) rather than the entire multi-step
sequence. Truly atomic CAS would need a contract-level
`compareAndSet`/`updateIf` op — deferred until a concrete consumer asks
for it.

Per the user's instruction, only the P1 is addressed this round; the
five P2s and one P3 are noted in the PR comment for follow-up.

Tests (405 -> 406):
- migrate.test: new test exercising the race — tryInsert succeeds with
  our placeholder, then a v2 login completes between phases, and the
  ownership-check re-read aborts our keyring writes so v2's state
  survives intact. Also updated the existing `tryInsert` tests to
  reflect the new placeholder signature (hasRefreshToken: undefined)
  and to actually insert into the harness state so the re-reads see
  the placeholder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Round 6 — P1 only (per the user's instruction for the next few rounds; deferring P2/P3 to follow-up).

P1a7fd8e8: migrate.ts ownership-check race. tryInsert only proves absence at insert time; a v2 login that completes between phases would have its access token, record, and refresh secret all clobbered by our follow-up writes.

Fix:

  • The migration's tryInsert payload uses a unique placeholder signature (fallbackToken: legacyToken.token + hasRefreshToken: undefined) that no v2 login ever produces.
  • Re-read the record before each follow-up step (setSecret, the post-setSecret upsert, the refresh-slot delete). If the shape changed, abort the rest.
  • If we already wrote our access token to the keyring before noticing we lost ownership, roll the keyring write back so v2's setSecret is the authoritative one.

Race window now bounded by the gap between a re-read and the next call (microseconds) rather than the full multi-step sequence. True atomic CAS would need a contract-level compareAndSet/updateIf op — deferred until a concrete consumer asks for it.

Deferred (P2/P3) — noted for follow-up:

  • [P2] refresh.ts post-lock race compares only access tokens (provider could reuse string with rotated refresh)
  • [P2] token-store backfill upsert is replace-not-merge from a stale snapshot
  • [P2] token-store.test missing hasRefreshToken: undefined → true after successful refresh-slot read
  • [P2] migrate.test tryInsert test doesn't assert userRecords avoids the racy list() call
  • [P2] slot-naming.test fixture-coupled
  • [P3] README documents setBundle?(account, bundle) without the options arg

Tests: 405 → 406.

@lmjabreu lmjabreu requested a review from doistbot May 18, 2026 23:42
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces much-needed refresh-token support to the auth surface, adding proactive refresh capabilities, concurrency locks, and updated token storage contracts to minimize user re-authentication prompts. The approach effectively extends session lifecycles, though the new persistence and migration paths introduce some edge cases in state management. A few adjustments are needed to resolve race conditions in secret rollbacks and lock freshness checks, safely handle concurrent upserts without dropping metadata, and consolidate duplicated keyring fallback logic.

Share FeedbackReview Logs

Comment thread README.md Outdated

Errors are typed: `AUTH_REFRESH_EXPIRED` (server returned 400/401 + `invalid_grant` — refresh token revoked, forced re-login required), `AUTH_REFRESH_TRANSIENT` (network, 5xx, non-JSON — safe to retry), `AUTH_REFRESH_UNAVAILABLE` (no refresh token stored or provider doesn't implement `refreshToken`). Map these to your CLI's existing "run `<cli> auth login`" hint as makes sense.

`TokenBundle` is the persisted credential triple — `{ accessToken, refreshToken?, accessTokenExpiresAt?, refreshTokenExpiresAt? }`. Stores opting into refresh implement `setBundle?(account, bundle)` alongside `set(account, token)`; `createKeyringTokenStore` implements both. Stores that omit `setBundle` lose refresh + expiry metadata on persistence — `refreshAccessToken` against them will surface `AUTH_REFRESH_UNAVAILABLE` on the first attempted refresh, which is the correct degraded behaviour.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The public API documentation for TokenStore.setBundle is missing the newly added options?: { promoteDefault?: boolean } argument.

Comment thread src/auth/keyring/migrate.test.ts Outdated
expect(lines).not.toContain('sensitive')
})

it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The test for the tryInsert path fails to verify that the racy userRecords.list() call is actually skipped (e.g., via a spy).

Comment thread src/auth/keyring/slot-naming.test.ts Outdated
expect(refreshAccountSlot('user-42')).toBe('user-42/refresh')
})

it('does not collapse an empty access slot to a bare suffix', () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The test locks in an empty-string contract (refreshAccountSlot('')) due to test fixture routing, creating unnecessary coupling.


expect(snapshot?.token).toBe('at-from-migration')
expect(snapshot?.bundle?.refreshToken).toBe('rt-survived-migration')
expect(refreshKeyring.getSpy).toHaveBeenCalled()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The hasRefreshToken: undefined legacy migration state is never promoted to true after a successful refresh-slot read. Affected accounts will repeatedly incur the IPC penalty until this is backfilled in the implementation.

Comment thread README.md
}
```

`refreshAccessToken` returns the active credentials unchanged when the stored access token has more than `skewMs` (default 60s) of life left. When refresh is needed, it acquires the file lock, POSTs `grant_type=refresh_token` via the provider's `refreshToken` method, persists the new bundle through `store.setBundle ?? store.set`, and returns the new token. On contention it waits up to `lockTimeoutMs` (default 2s); if another process refreshed first it returns the rotated snapshot without firing its own POST — even on the `force` path — because POSTing with the now-rotated refresh token would yield `invalid_grant`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] This guarantee is stronger than the current implementation. refreshAccessToken() only treats the lock holder as the winner when the reread sees a different access-token string; if a provider reuses the access token but rotates the refresh token or expiry, the waiter still fires its own refresh POST and can hit invalid_grant. Either broaden the post-lock freshness check to compare the stored bundle, or soften this sentence until that case is handled.

Comment thread src/auth/keyring/migrate.ts Outdated
}
}
} else {
const existing = (await userRecords.list()).find((r) => r.account.id === account.id)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The logic to find the existing record (await userRecords.list()).find((r) => r.account.id === account.id) is duplicated here and in recordStillOurs() (L211). Consider extracting a local helper (e.g., getRecord()) to reuse this look-up path.

account: refreshAccountSlot(accountSlot),
})

if (userRecords.tryInsert) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new tryInsert branch is hand-rolling the same keyring-write/fallback policy that writeRecordWithKeyringFallback() already owns: move the access token into the keyring when possible, leave plaintext fallback on SecureStoreUnavailableError, clear stale refresh state, and flip hasRefreshToken. Keeping that logic in two places means future fixes to trimming, rollback, or refresh-slot handling can drift again. Please extract the ownership-check-specific pieces into a shared internal helper (or extend writeRecordWithKeyringFallback with a hook/callback) so migration and normal writes reuse the same persistence path.

// the assertion isn't racy.
await new Promise((resolve) => setTimeout(resolve, 0))

expect(upsertSpy).toHaveBeenCalled()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new backfill path depends on active() doing an upsert({ ...record, hasRefreshToken: false }) from a stale snapshot. With the current UserRecordStore contract, upsert is replace-not-merge, so a concurrent login/refresh can silently drop newer metadata (hasRefreshToken: true, fresh expiry fields, fallback tokens, etc.). Please avoid mutating the whole record from the read path unless the store can do a compare-and-set or other merge-safe update.

Comment thread src/auth/keyring/migrate.ts Outdated
)
}
const inserted = await userRecords.tryInsert(placeholder)
if (inserted && (await recordStillOurs())) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Redundant file read. tryInsert already ensures the placeholder was just inserted. Immediately calling recordStillOurs() performs a redundant file read without meaningfully closing the race window before the slow setSecret operation. You can safely remove this initial check to save disk I/O.

Comment thread src/auth/keyring/migrate.ts Outdated
await accessStore.deleteSecret().catch(() => undefined)
}
}
if (await recordStillOurs()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Redundant and logically flawed check. recordStillOurs() requires hasRefreshToken === undefined, but line 229 just mutated the record to hasRefreshToken: false. This subsequent check will always fail, causing an unnecessary file read and silently skipping the best-effort cleanup of the refresh slot. Move the cleanup before the upsert or cache the ownership state.

All three P1s are direct consequences of round-6's ownership-check
introduction:

1. refresh.ts post-lock re-read: also compare the refresh token, not just
   the access-token string. A provider that rotates only the refresh
   token (or reuses the access-token string) would otherwise let the
   waiter POST with a now-stale refresh and hit `invalid_grant`.

2. migrate.ts rollback guard: only delete the access slot when it still
   contains OUR legacy token. The previous unconditional delete could
   remove a v2 login's access token (its `setSecret` may have run after
   ours), leaving the preserved v2 record permanently unreadable.
   Reading the slot first is racy at the absolute limit but covers the
   realistic case (the race window is microseconds).

3. migrate.ts ownership-check caching: the post-upsert re-read of
   `recordStillOurs()` was guaranteed to return false because the upsert
   itself flipped `hasRefreshToken` away from the placeholder signature.
   Net effect: the refresh-slot cleanup was silently skipped on every
   run, plus a wasted file read. One re-read after the keyring write
   now gates both the upsert and the refresh-slot cleanup.

Tests (406 -> 407):
- migrate.test: new test for the rollback-guard. Simulates v2 login
  overwriting the slot mid-migration; asserts the guarded delete
  declines to remove v2's access token and v2's record stays intact.

Deferred P2/P3 (per the user's instruction):
- P2 token-store backfill upsert is replace-not-merge from stale snapshot
- P2 token-store.test missing `undefined → true` promotion path
- P2 migrate.ts hand-rolls keyring-write/fallback policy
- P2 migrate.ts redundant tryInsert+recordStillOurs (small)
- P3 README setBundle docs missing options arg
- P3 migrate.test tryInsert mock duplicated across cases
- P3 migrate.ts `list().find` duplicated
- P3 slot-naming.test fixture-coupled

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Round 7 — P1 only (3 of 3 addressed in 0fca19e).

P1 #1 — refresh.ts post-lock check: now compares the refresh token too, not just fresh.token !== snapshot.token. A provider that rotates only the refresh token (or reuses the access-token string) no longer leaves the waiter holding a stale refresh.

P1 #2 — migrate.ts rollback guard: only delete the access slot when it still contains the legacy token. A blind delete could remove a v2 login's access token (their setSecret may have run after ours), leaving v2's preserved record permanently unreadable. Read-then-conditional-delete is still racy at the absolute limit but covers the realistic case.

P1 #3 — migrate.ts ownership-check caching: the post-upsert recordStillOurs() re-read was guaranteed to return false (upsert flipped hasRefreshToken away from the placeholder signature), so the refresh-slot cleanup was silently skipped on every run. One re-read after the keyring write now gates both the upsert AND the cleanup.

Deferred per the user's instruction (P2/P3, noted for follow-up):

  • token-store backfill is replace-not-merge from a stale snapshot
  • token-store.test missing the undefined → true promotion-after-successful-read path
  • migrate.ts hand-rolls keyring-write policy (could reuse writeRecordWithKeyringFallback)
  • migrate.ts redundant tryInsert+recordStillOurs immediately after
  • README setBundle docs missing the options arg
  • migrate.test tryInsert mock duplicated across cases
  • migrate.ts list().find duplicated between branches
  • slot-naming.test fixture-coupled

Tests: 406 → 407.

@lmjabreu lmjabreu requested a review from doistbot May 18, 2026 23:49
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR successfully introduces refresh-token support and PKCE provider enhancements to the CLI core, which will significantly improve the user experience by reducing manual re-authentications. The proactive concurrency handling and backward-compatible contract updates are well-considered and cleanly implemented. A few adjustments are noted to prevent potential concurrent overwrite risks during legacy backfills, unify duplicated persistence and test logic, ensure the new options arguments are fully documented, and expand test coverage for the legacy record upgrade path.

Share FeedbackReview Logs

Comment thread src/auth/keyring/token-store.ts Outdated
// refresh token). Failures here are swallowed: the worst case is
// the next `active()` pays the same extra IPC and tries again.
if (record.hasRefreshToken === undefined && refreshToken === undefined) {
void userRecords.upsert({ ...record, hasRefreshToken: false }).catch(() => undefined)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This fire-and-forget upsert({ ...record, hasRefreshToken: false }) writes a stale full snapshot from the earlier list() read. Because UserRecordStore.upsert is replace-not-merge, a concurrent login/refresh for the same account can be overwritten here: newly written expiry fields or hasRefreshToken: true get reset, and in the worst case a just-written refresh token becomes invisible because this flips the bit back to false. Please gate this backfill on a fresh ownership check (or a patch/CAS-style API) instead of replacing the whole record from the stale snapshot.

Comment thread README.md Outdated

Errors are typed: `AUTH_REFRESH_EXPIRED` (server returned 400/401 + `invalid_grant` — refresh token revoked, forced re-login required), `AUTH_REFRESH_TRANSIENT` (network, 5xx, non-JSON — safe to retry), `AUTH_REFRESH_UNAVAILABLE` (no refresh token stored or provider doesn't implement `refreshToken`). Map these to your CLI's existing "run `<cli> auth login`" hint as makes sense.

`TokenBundle` is the persisted credential triple — `{ accessToken, refreshToken?, accessTokenExpiresAt?, refreshTokenExpiresAt? }`. Stores opting into refresh implement `setBundle?(account, bundle)` alongside `set(account, token)`; `createKeyringTokenStore` implements both. Stores that omit `setBundle` lose refresh + expiry metadata on persistence — `refreshAccessToken` against them will surface `AUTH_REFRESH_UNAVAILABLE` on the first attempted refresh, which is the correct degraded behaviour.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The public contract here is still documented as setBundle?(account, bundle), but the actual API now also accepts options?: { promoteDefault?: boolean }. That omission matters for custom stores: a 2-arg implementation still type-checks, but runOAuthFlow now relies on promoteDefault: true to preserve the existing “first login becomes default” behavior. Please document the options arg here so consumers don’t accidentally drop that behavior.

Comment thread src/auth/keyring/migrate.test.ts Outdated
const harness = buildUserRecords<Account>()
// Stub tryInsert that ALSO actually inserts into the harness so
// the migration's ownership-check re-reads see the placeholder.
const tryInsert = vi.fn(async (record: UserRecord<Account>) => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] This tryInsert mock implementation is duplicated identically across multiple test cases (lines 359, 408, 461, 514). Extracting this into a shared test utility or integrating it into buildUserRecords() would reduce boilerplate.

Comment thread src/auth/keyring/migrate.ts Outdated
}
const inserted = await userRecords.tryInsert(placeholder)
if (inserted && (await recordStillOurs())) {
let movedToKeyring = false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This branch hand-rolls the access token keyring write, offline error swallowing, and conditional rollback logic. This duplicates the persistence policy already encapsulated in writeRecordWithKeyringFallback(). Consider extending writeRecordWithKeyringFallback to accept an ownership-check predicate or pre-upsert hook so this branch can reuse the shared helper.

Comment thread src/auth/keyring/migrate.ts Outdated
}
}
} else {
const existing = (await userRecords.list()).find((r) => r.account.id === account.id)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The account lookup logic (await userRecords.list()).find((r) => r.account.id === account.id) is identical to the read inside recordStillOurs() (line 211). Extracting this to a shared findRecord helper would avoid repeating the list-and-find pattern.

Comment thread src/auth/keyring/slot-naming.test.ts Outdated
})

it('does not collapse an empty access slot to a bare suffix', () => {
// Defensive: `endsWith(refreshAccountSlot(''))` is how the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] This test is unnecessarily coupled to the test fixture's string-matching implementation (endsWith). Tests should verify the module's behavior directly rather than its suitability for a specific mock routing implementation elsewhere.

await expect(store.active()).rejects.toThrow(/boom/)
})

it('backfills hasRefreshToken: false on the record when undefined and the refresh slot is empty', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This covers the hasRefreshToken: undefined -> false backfill, but the suite still misses the symmetric upgrade path: an existing pre-PR record being overwritten via setBundle(...) should persist hasRefreshToken: true. That's the real upgrade path for legacy records after the first successful login/refresh, and without a test a regression can leave those accounts probing the refresh slot forever.

Comment thread src/auth/keyring/migrate.test.ts Outdated
expect(lines).not.toContain('sensitive')
})

it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The test title says we're avoiding the racy existence check, but the assertions only prove that tryInsert() ran. Since migrateLegacyAuth still legitimately calls list() later for ownership checks, a regression that reintroduced an initial list() -> tryInsert() sequence would still pass here. Please spy/stub userRecords.list() so this case distinguishes the pre-insert check from the post-insert re-reads.

P2:
- token-store backfill no longer issues `upsert({ ...staleRecord, ... })`.
  New `backfillHasRefreshToken` helper re-reads the record inside the
  upsert and writes only when every field still matches the snapshot we
  made the backfill decision on. Avoids clobbering a concurrent
  setBundle's richer state (upsert is replace-not-merge per contract).
- The same helper now also handles the symmetric `undefined → true`
  promotion when the refresh slot turns out to be populated, so pre-PR
  records with refresh tokens stop probing the slot every active() call.
- New `keyring/internal.ts` exports `findById` and `trySetSecret`.
  migrate.ts and record-write.ts route through `trySetSecret` so the
  "only SecureStoreUnavailableError downgrades to fallback" policy
  lives in one place. token-store.ts + migrate.ts use `findById` to
  drop the duplicated `(await list()).find(...)` pattern.
- migrate.test extracts `stubTryInsert(harness)` helper — the
  "if-has-set-return-false-else-insert" tryInsert mock pattern was
  identical across 3 test cases. Tests with bespoke race emulation in
  their tryInsert (where the duplication wasn't actually a duplicate)
  keep their inline implementation.

P3:
- README `setBundle` docs now show the full signature including
  `options?: { promoteDefault?: boolean }` and explain the
  presence-vs-absence contract for login vs refresh callers.
- migrate.test "tryInsert avoids the racy existence check" test now
  spies on `userRecords.list()` and explicitly proves `tryInsert` ran
  (the old assertions only proved the mock function fired).
- slot-naming.test asserts the module's behavior directly (the result
  contains the access slot, is deterministic, and is unique per
  access slot) instead of locking in the literal suffix or coupling
  to the fixture's `endsWith` routing.

Tests (407 -> 410):
- New test: undefined → true promotion when refresh slot populated.
- New test: backfill skips when a concurrent write replaced the record
  between the active() read and the upsert (replace-not-merge guard).
- slot-naming.test rewritten to property-based assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Round 8 P2/P3 follow-up in aca2702. All 8 deferred findings addressed:

P2 — correctness/safety

  • token-store.ts backfill no longer issues upsert({ ...staleRecord, hasRefreshToken: false }). New backfillHasRefreshToken helper re-reads inside the upsert and writes only when every field still matches the snapshot we made the decision on — avoids clobbering a concurrent setBundle's richer state.
  • Symmetric undefined → true promotion added: pre-PR records that DO have a refresh token now backfill to true so subsequent active() calls don't keep probing the slot.
  • New keyring/internal.ts exports findById + trySetSecret. migrate.ts and record-write.ts route through trySetSecret so the "only SecureStoreUnavailableError downgrades to fallback" policy lives in one place. The tryInsert branch's keyring-write policy is now shared with writeRecordWithKeyringFallback.
  • migrate.test extracts stubTryInsert(harness) — the basic mock pattern was identical across 3 tests. Tests with bespoke race emulation in their tryInsert keep their inline implementations (the duplication was genuine variation there).

P3 — polish

  • README setBundle docs show the full signature including options?: { promoteDefault?: boolean } and explain the presence-vs-absence contract for login vs refresh callers.
  • migrate.test tryInsert test now spies on userRecords.list() to prove the atomic path actually ran.
  • slot-naming.test asserts module behaviour directly (contains access slot, deterministic, unique per access slot) instead of pinning the literal suffix or coupling to the fixture's endsWith.
  • findById helper drops the duplicated (await list()).find(...) pattern in both migrate branches.

Tests: 407 → 410.

@lmjabreu lmjabreu requested a review from doistbot May 19, 2026 06:06
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR thoughtfully enhances the auth surface by introducing OAuth refresh-token support to TokenStore and the PKCE provider, which will vastly improve the CLI experience for users with short-lived sessions. The implementation provides a solid foundation for seamless token management while keeping the public contracts flexible. A few areas require slight adjustments, including addressing atomicity and rollback safety during keyring token updates, handling crash-recovery states in the migration path, consolidating persistence logic for better performance, and ensuring new module layouts and concrete wire formats have comprehensive test coverage.

Share FeedbackReview Logs

Comment thread src/auth/keyring/migrate.ts Outdated
// risk a destructive delete.
}
}
if (stillOurs) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] stillOurs is computed before the upsert({ account, hasRefreshToken: false }) call and then reused here to decide whether to clear the refresh slot. If a concurrent v2 login lands after that upsert but before this delete, its newly-written refresh token gets wiped while its record now says hasRefreshToken: true, so the account can no longer refresh. Please re-check ownership after the upsert using a post-upsert signature (or otherwise guard the delete on current ownership) before calling refreshStore.deleteSecret().

account: refreshAccountSlot(accountSlot),
})

if (userRecords.tryInsert) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This tryInsert branch now open-codes the same keyring/record finalization policy that the non-atomic branch delegates to writeRecordWithKeyringFallback. That leaves two persistence implementations to keep in sync whenever the stored shape changes again (hasRefreshToken, fallback fields, expiry metadata, refresh-slot cleanup, etc.). Consider extracting the post-insert finalization into a shared helper, or teaching writeRecordWithKeyringFallback to operate on an already-inserted placeholder, so both migration paths reuse one write policy.

Comment thread src/auth/keyring/record-write.ts Outdated
// `migrateLegacyAuth`) pass `purgeRefreshSlot: false` to opt out
// of this purge entirely.
try {
await refreshSecureStore.deleteSecret()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Deleting the old refresh slot before userRecords.upsert(record) commits is not rollback-safe. If the later upsert fails, the catch block only removes the new access token and never restores the deleted refresh token, so a failed no-refresh write can destroy the caller's last good refresh credential while leaving the old record in place. Please either defer this delete until after the record write succeeds or capture/restore the prior refresh secret on rollback.

Comment thread src/auth/keyring/migrate.ts Outdated
)
}
const inserted = await userRecords.tryInsert(placeholder)
if (inserted && (await recordStillOurs())) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This treats tryInsert() === false as "some other v2 record already exists", but it is also the state left by an earlier crash after the placeholder write. On that retry path we skip the keyring move/finalization here, then still call markMigrated() and clean up the legacy slot below, which permanently freezes the account in the placeholder/plaintext state. Consider detecting the placeholder signature when tryInsert returns false and resuming/normalizing it before considering the migration complete.

Comment thread src/auth/keyring/record-write.ts Outdated
if (storedSecurely) {
if (trimmedRefresh) {
try {
await refreshSecureStore.setSecret(trimmedRefresh)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] By the time we reach this line, the access-slot setSecret has already completed, so every setBundle/silent-refresh with a refresh token now pays two keyring IPC round-trips in series before the record write. These two slot writes are independent, so consider issuing them together (and rolling back whichever succeeded on partial failure) to avoid adding avoidable latency to the refresh path.

Comment thread src/auth/keyring/slot-naming.test.ts Outdated
it('derives a per-access-slot refresh slot name that includes the access slot', () => {
// The contract: the returned slot name is a function of the
// access slot name AND uniquely identifies the refresh slot for
// it. We don't pin the exact suffix here — the wire format is
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This test explicitly avoids pinning the exact refresh-slot slug, but that slug is persisted user state in the OS keyring. If refreshAccountSlot() ever changes from /refresh to something else, this test (and the token-store.test fixture, which also derives its routing from the helper) will still pass while existing users' stored refresh tokens become unreadable after upgrade. Please lock the concrete wire format down in at least one test so a compatibility-breaking rename fails the suite.

@@ -0,0 +1,36 @@
import type { AuthAccount } from '../types.js'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new module breaks the repo’s module-layout convention that src/<area>/<file>.ts files ship with a colocated <file>.test.ts. Please either add src/auth/keyring/internal.test.ts for findById/trySetSecret or fold these helpers back into an existing tested module.

Reframing in response to the bouncing P1 count across rounds 5-9: each
"fix" to the multi-step migration was patching a local symptom while
opening an adjacent race window in the same area. The fundamental
problem was the migration trying to move the legacy token into the
per-user keyring slot under a contract (UserRecordStore) with no
transactional atomicity — every workaround (ownership-check re-reads,
cached `stillOurs`, guarded delete) closed one window and opened
another.

This change accepts a slight functional trade-off in exchange for
complete race-freedom:

- migrate.ts no longer writes to the per-user keyring slot. It writes
  a v2 record with the legacy token as `fallbackToken` and stops. The
  v2 record format ALREADY supports fallbackToken (it's how WSL /
  headless Linux works), and the runtime reads it before consulting
  the keyring slot. The first subsequent v2 login moves the secret
  into the keyring atomically via writeRecordWithKeyringFallback.
- Trade-off: legacy token sits in plaintext on disk for one login
  cycle. It was already plaintext in v1 config, so this is deferral,
  not new exposure.
- Eliminated: ownership-check re-reads, the placeholder signature
  trick, the cached stillOurs flag, the guarded rollback delete, the
  refresh-slot purge, and the entire "did we lose ownership" decision
  tree. Multi-step write becomes single-call.

record-write.ts: the defensive refresh-slot purge for no-refresh
bundles now happens AFTER the upsert commits (round-9 P1.2). A failed
upsert can't destroy a previous refresh secret. The post-upsert delete
is best-effort because the upserted `hasRefreshToken: false` already
prevents readers from consulting the slot.

Tests (410 → 414):
- Migration tests rewritten for the simpler shape (no per-user keyring
  assertions). Race tests collapse to a single "concurrent v2 login
  is harmless" property test — the simplification removes the race
  surface they used to probe.
- record-write test for refresh-slot delete failure rewritten:
  best-effort post-upsert, no rollback.
- New colocated `internal.test.ts` covers `findById` + `trySetSecret`.
- `slot-naming.test` restores the literal `/refresh` suffix pin —
  the slug is persisted state in the OS keyring, so a rename WOULD
  break upgraders and should be caught loudly.

Compatibility notes:
- `MigrateLegacyAuthOptions.accountForUser` is now @deprecated and a
  no-op (migration doesn't write per-user keyring slots anymore).
  Kept on the type for back-compat; the first subsequent v2 login
  respects the consumer's `createKeyringTokenStore({ accountForUser })`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmjabreu
Copy link
Copy Markdown
Author

Round 10 in ae34bb6 — stepping back instead of patching another symptom.

The pattern, finally addressed at the source. Across rounds 5–9 the migration kept oscillating between competing races: tryInsert gives no ongoing ownership; cached stillOurs always returns false after the upsert flips the signature; rollback delete can wipe a parallel v2 login's credential; the post-upsert ownership check has to use a different signature than the pre-upsert one. Every fix opened another window.

Root cause: the migration was doing too much. Moving the legacy token into the per-user keyring slot atop a UserRecordStore contract with no transactional primitives is inherently multi-step.

Change: migration now writes a v2 record with fallbackToken: legacyToken.token and stops. No per-user keyring writes. The v2 record format ALREADY supports fallbackToken (it's the WSL/headless Linux path), and the runtime reads it before any keyring slot. The first subsequent v2 login moves the secret into the keyring atomically via writeRecordWithKeyringFallback.

Trade-off: legacy token sits in plaintext on disk for one login cycle. But it was already plaintext in v1 config — this is deferral, not new exposure.

Eliminated: ownership-check re-reads, placeholder signature trick, cached stillOurs, guarded rollback delete, refresh-slot purge, the entire multi-step decision tree. The migration is now ~10 lines.

Round-9 P1.2 (record-write.ts) addressed in the same commit: the no-refresh defensive purge now runs AFTER the upsert commits, best-effort. A failed upsert can no longer destroy a previous refresh secret.

Tests: 410 → 414. Most race tests deleted (no race surface left to test). New "concurrent v2 login mid-migration is harmless" property test, plus internal.test.ts (P2 from round 8), and slot-naming.test restores the literal /refresh suffix pin (slug is persisted user state — a rename SHOULD be caught loudly).

accountForUser option deprecated as @deprecated no-op for migration. v2 logins still respect the consumer's createKeyringTokenStore({ accountForUser }).

@lmjabreu lmjabreu requested a review from doistbot May 19, 2026 06:43
Copy link
Copy Markdown
Member

@doistbot doistbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces an excellent quality-of-life improvement by adding OAuth refresh-token support to the auth surface, ensuring users are no longer frequently prompted to re-authenticate. The implementation thoughtfully extends the existing TokenStore contract and introduces a robust PKCE provider to smoothly handle token expirations. There are just a few remaining details to refine around optimizing concurrent keyring writes, safeguarding the legacy token migration logic against premature deletion, cleaning up leftover state flags from previous iterations, and rounding out the test coverage and documentation.

Share FeedbackReview Logs

let storedSecurely = await trySetSecret(secureStore, trimmedAccess)

let wroteRefreshSecurely = false
if (storedSecurely && trimmedRefresh) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The access and refresh tokens are written to the OS keyring sequentially, which doubles the IPC latency on the hot path (as flagged in Round 9 of the PR review). Issuing these writes concurrently (e.g., via Promise.allSettled) and rolling back the successful one if the other fails would avoid this latency penalty.

// `false` means a v2 record already exists for this account
// — leave it alone (it may carry refresh + expiry that the
// legacy token has no authority over).
await userRecords.tryInsert(record)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This ignores the tryInsert() result and the helper still goes on to mark migration complete and delete the legacy token later. If a same-id v2 record already exists but its credential is broken/missing (for example the keyring entry was removed externally), migration will retire the still-valid legacy token and leave active() failing with AUTH_STORE_READ_FAILED. Please treat the "record already exists" case here (and the findById() short-circuit below) as a non-destructive skip unless you can verify the existing v2 record is actually readable.

// first subsequent v2 login moves the secret atomically via
// `writeRecordWithKeyringFallback`.
expect(km.slots.has('user-1')).toBe(false)
expect(km.slots.has('user-1/refresh')).toBe(false)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] Missed reuse: consider using the exported refreshAccountSlot('user-1') helper here (and on lines 464, 487, 509, 531) instead of hardcoding the '/refresh' string literal. This avoids duplication and ensures the test stays aligned with the single source of truth for the slot wire format.

let wroteRefreshSecurely = false
if (storedSecurely && trimmedRefresh) {
try {
await refreshSecureStore.setSecret(trimmedRefresh)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This refresh-slot write open-codes the same SecureStoreUnavailableError downgrade/rethrow policy that trySetSecret() was just added to centralize for the access slot. Please route this branch through the shared helper too, or extract a small two-slot write helper, so the keyring-offline policy only lives in one place.

const record = {
account,
fallbackToken: legacyToken.token,
hasRefreshToken: false,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] hasRefreshToken: false is stronger than this helper can guarantee now that migration no longer reads or owns the sibling refresh slot. false tells createKeyringTokenStore.active() to skip probing that slot entirely, so in the remaining list-then-upsert race (or any retry state where a v2 refresh slot already exists) this record will hide a real refresh credential. Leave the bit unset here so the migrated record stays in the intended "unknown" state.


/**
* Stub `tryInsert` for a harness so that successful inserts actually land
* in the harness state — the migration's ownership-check re-reads need to
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] The comment references the migration's ownership-check re-reads which were removed in the Round 10 simplification (migration now simply inserts a fallback record). This is a stale leftover from previous iterations and can be removed.

expect(km.slots.get('user-1/refresh')?.secret).toBe('v2_refresh')
})

it('leaves the v2 record alone when tryInsert returns false (existing v2 login)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This test is functionally identical to the preceding test survives a concurrent v2 login mid-migration (race-free property) (line 441). Both seed a v2 record with a refresh token, set up keyring slots, ensure tryInsert returns false, and assert that the record and keyring are untouched. One of them should be removed to avoid test duplication.

expect(store.getLastClearResult()).toEqual({ storage: 'secure-store' })
})

it('round-trips setBundle → active → clear with refresh token in the sibling slot', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new refresh-slot round-trip still only exercises the default user-${id} naming. Since this PR adds a sibling refresh slot and keeps accountForUser as an override point, a regression where refreshStoreFor() forgets to apply the custom slug would slip past the suite: the existing custom-slug test only covers set() without a refresh token. Please add one setBundle/active/clear case with factoryOpts.accountForUser so both the custom access slot and custom refresh slot are pinned.

Comment thread src/auth/flow.test.ts
expect(store.last?.setBundleOptions).toEqual({ promoteDefault: true })
})

it('falls back to set(token) when the store does not implement setBundle (custom-store back-compat)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] This back-compat case replays the whole OAuth flow just to prove the setBundle ?? set fallback, but src/auth/persist.test.ts already covers that branch directly. Keeping the fallback assertion at the helper layer (and relying on the other flow tests to cover that runOAuthFlow persists at all) would remove a lot of fake-provider/fake-browser setup without losing meaningful coverage.

Comment thread README.md
| `testing` (subpath) | `describeEmptyMachineOutput` | Vitest helpers reusable by consuming CLIs (e.g. parametrised empty-state suite covering `--json` / `--ndjson` / human modes). |
| Module | Key exports | Purpose |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `auth` (subpath) | `attachLoginCommand`, `attachLogoutCommand`, `attachStatusCommand`, `attachTokenViewCommand`, `runOAuthFlow`, `refreshAccessToken`, `createPkceProvider`, `createSecureStore`, `createKeyringTokenStore`, `migrateLegacyAuth`, PKCE helpers, `AuthProvider` / `TokenStore` / `TokenBundle` / `RefreshInput` / `AccountRef` / `SecureStore` / `UserRecordStore` types, `AttachLogoutRevokeContext` | OAuth runtime plus the Commander attachers for `<cli> [auth] login` / `logout` / `status` / `token`. `attachLogoutCommand` accepts an optional `revokeToken` hook for best-effort server-side token revocation. Ships the standard public-client PKCE flow (`createPkceProvider` — implements both `exchangeCode` and the optional `refreshToken` grant), a thin cross-platform OS-keyring wrapper (`createSecureStore`), and a multi-account keyring-backed `TokenStore` (`createKeyringTokenStore`) that stores access + refresh secrets in the OS credential manager (refresh in a sibling slot, gated by a `hasRefreshToken` bit on the user record so accounts without one don't pay an extra keyring round-trip per command) and degrades to plaintext in the consumer's config when the keyring is unavailable (WSL/headless Linux/containers). `refreshAccessToken` drives proactive (skew window) and reactive (`force: true`, for 401 retries) silent re-auth, with optional sidecar `lockPath` for cross-process safety. `AuthProvider` and `TokenStore` remain the escape hatches for DCR or fully bespoke backends. `logout` / `status` / `token` always attach `--user <ref>` and thread the parsed ref to `store.active(ref)` (and `store.clear(ref)` on `logout`). `commander` (when using the attachers), `open` (browser launch), and `@napi-rs/keyring` (when using `createSecureStore` or the keyring `TokenStore`) are optional peer/optional deps. |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This auth export list still omits the newly exported RefreshAccessTokenOptions type from src/auth/index.ts. Repo guidance requires README updates to reflect newly added public exports in the same PR, so please add it here (or document it elsewhere in the auth API section).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants