test(onboard): retry callback request to absorb listener-startup race #103
Merged
TestBuild_OAuthE2E (and the two Microsoft variants) flakes intermittently on CI with `Get http://127.0.0.1:NNNN/callback...: EOF`. The binary binds the local listener and prints the auth URL as soon as net.Listen returns, but the goroutine that calls Accept may not be scheduled yet on a busy runner — the test then fires the callback GET into a half-ready server and trips EOF. simulateCallback now retries on transport errors for up to ~1s (10 attempts at 100ms backoff) and respects the test context. The callback handler is single-use (it triggers server.Shutdown after a successful response), so retries only execute when the first attempt never reached the handler — they cannot mask a real callback failure. Verified by running the three E2E tests with -race -count=5 locally (15 runs, all green).
arnaugiralt
approved these changes
May 15, 2026
Summary
Hardens `TestBuild_OAuthE2E` (and its Microsoft variants) against a flake observed on CI for PR #102 (`Get http://127.0.0.1:NNNN/callback...: EOF`).

The binary in `cmd/chaperone-onboard` binds the callback listener and prints the auth URL as soon as `net.Listen` returns. The goroutine that actually calls `Accept` may not be scheduled yet on a busy runner, so the test fires `simulateCallback` into a half-ready server and trips EOF.

`simulateCallback` now retries on transport errors for up to ~1s (10 attempts × 100ms backoff) and respects the test context. The callback handler is single-use (it triggers `server.Shutdown` after a successful response), so retries only execute when the first attempt never reached the handler; they cannot mask a real callback failure.

Test-only change; no production code touched.
Test plan
- `go test -race -run 'TestBuild_OAuthE2E|TestBuild_MicrosoftE2E' -count=5 ./cmd/chaperone-onboard/`: 15 runs, all green
- `go test -race ./cmd/chaperone-onboard/`: full package green