Skip to content

Handle nil keyring gracefully and fix gRPC v3 broker startup race#1603

Open
vzhang-stripe wants to merge 2 commits into
masterfrom
vzhang/keyring-nil-safety-and-grpc-broker-fix
Open

Handle nil keyring gracefully and fix gRPC v3 broker startup race#1603
vzhang-stripe wants to merge 2 commits into
masterfrom
vzhang/keyring-nil-safety-and-grpc-broker-fix

Conversation

@vzhang-stripe
Copy link
Copy Markdown
Contributor

@vzhang-stripe vzhang-stripe commented May 22, 2026

Fix intermittent V3 plugin keychain initialization failures

Summary

Fix an intermittent directory plugin failure where plugin commands can exit with:

Error: Keychain not initialized. Call initKeychain(coreCLIHelper) first.

The main fix is in the V3 host/plugin gRPC sequencing. We now start the plugin RunCommand first and publish the CoreCLIHelper broker slightly later, which avoids a race in the current non-Go plugin bootstrap. This PR also makes true keyring/backend issues fail with a clearer host-side error instead of the misleading initialization message.

Root cause

The directory plugin uses the TS plugin bootstrap, which can drop unsolicited broker ConnInfo if it arrives before the plugin has registered a pending dial for that service ID.

Before this change, the host published the CoreCLIHelper broker immediately. If the plugin had not yet started waiting on that broker ID, the connection info could be lost. The plugin would then time out during helper setup, never finish initKeychain(coreCLIHelper), and later fail with Keychain not initialized.

This matches the observed behavior:

  • intermittent failures on macOS
  • the same error reported on Linux
  • a ~5s gap before the plugin surfaces the keychain initialization error

What changed

  • Reordered V3 RunCommand setup so the plugin RPC starts before helper broker publication.
  • Added a small host-side delay before publishing the helper broker as a compatibility workaround for the current TS bootstrap behavior.
  • Factored helper broker startup into a helper with explicit startup failure handling and cleanup.
  • Added a nil-keyring guard on the host so real system keyring issues return system keychain is unavailable instead of surfacing as Keychain not initialized.
  • Added regression tests for:
    • legacy ordering reproducing the dropped-publication failure
    • new ordering avoiding the race
    • helper broker startup behavior
    • nil keyring behavior

Why this approach

The more principled fix is in the TS bootstrap: buffer early broker ConnInfo instead of dropping it. This PR fixes the problem from the CLI host side so existing plugins work reliably without requiring an immediate bootstrap rollout.

Testing

  • go test ./pkg/plugins
  • go test -race ./pkg/plugins

Return a typed errKeychainUnavailable error when config.KeyRing is nil
instead of panicking on a nil pointer dereference. Refactor the gRPC v3
plugin broker to start the CoreCLIHelper server with a short publish
delay so non-Go plugins can register pending dials before the connection
info arrives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
@vzhang-stripe vzhang-stripe requested a review from a team as a code owner May 22, 2026 01:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant