fix: close and delete under re-checked lock to fix race conditions#81
Merged
fix: close and delete under re-checked lock to fix race conditions#81
Conversation
Reviewer's GuideRefactors the sweeper to first collect expired servers under lock and then perform close and delete operations with additional under-lock rechecks to eliminate race conditions with concurrent server creation and revival. Sequence diagram for sweeper close/delete with concurrent getServer race conditionssequenceDiagram
actor Client
participant getServer
participant sweeper
participant mu
participant servers
participant Server
rect rgb(235, 245, 255)
sweeper->>mu: Lock()
sweeper->>servers: Scan for expired entries
servers-->>sweeper: Return expired srv for url
sweeper->>mu: Unlock()
end
par Concurrent_getServer
Client->>getServer: Request server for url
getServer->>mu: Lock()
alt Existing_draining_server
getServer->>servers: Lookup url
servers-->>getServer: Return srv
getServer->>Server: Reset DrainedAt to zero
else New_server_for_url
getServer->>servers: Create and store new *Server for url
end
getServer->>mu: Unlock()
and Sweeper_processing_expired
sweeper->>mu: Lock()
alt Entry_changed_or_revived
sweeper->>servers: Check servers[url]
sweeper-->>sweeper: servers[url] != srv OR DrainedAt is_zero
sweeper->>mu: Unlock()
sweeper-->>sweeper: Skip close and delete
else Entry_still_same_and_draining
sweeper->>servers: Confirm servers[url] == srv AND DrainedAt non_zero
sweeper->>mu: Unlock()
sweeper->>Server: Instance.Close()
sweeper->>mu: Lock()
alt No_new_server_created
sweeper->>servers: Check servers[url] == srv
sweeper->>servers: delete(servers, url)
else New_server_replaced_entry
sweeper-->>sweeper: servers[url] != srv
sweeper-->>sweeper: Skip delete to avoid removing new server
end
sweeper->>mu: Unlock()
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The re-check condition before closing appears inverted relative to the comment/intent: to skip closing when an entry has been revived you probably want to continue when
servers[e.url] != e.srv || e.srv.DrainedAt.IsZero()rather than|| !e.srv.DrainedAt.IsZero(). - Within the per-entry loop you can reduce lock churn and make the control flow clearer by holding
mufor the entire re-check block per phase (e.g., lock, validate conditions and capture a local flag, unlock, then close) instead of multiple short lock/unlock pairs around individual conditionals.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The re-check condition before closing appears inverted relative to the comment/intent: to skip closing when an entry has been revived you probably want to continue when `servers[e.url] != e.srv || e.srv.DrainedAt.IsZero()` rather than `|| !e.srv.DrainedAt.IsZero()`.
- Within the per-entry loop you can reduce lock churn and make the control flow clearer by holding `mu` for the entire re-check block per phase (e.g., lock, validate conditions and capture a local flag, unlock, then close) instead of multiple short lock/unlock pairs around individual conditionals.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
25870d9 to
ae7bed1
Compare
Extract tryCloseAndDelete(url, srv) to hold lock per phase with defer, making the two-phase pattern easy to read and reuse. Phase 1 (check): servers[url] == srv && !DrainedAt.IsZero() → unlock, then close Phase 2 (delete): servers[url] == srv → delete under lock with defer This closes the revive race (getServer resetting DrainedAt between collection and close) and the delete race (setServer creating a new entry for the same URL between close and delete).
ae7bed1 to
1c940e0
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #81 +/- ##
=========================================
+ Coverage 8.93% 12.71% +3.77%
=========================================
Files 27 27
Lines 1533 1565 +32
=========================================
+ Hits 137 199 +62
+ Misses 1381 1351 -30
Partials 15 15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
added 2 commits
March 30, 2026 12:05
Add xray_test.go covering: - setServer/getServer basic operation - Close marks server as draining (DrainedAt non-zero) - getServer revives a draining server (DrainedAt reset to zero) - Close is idempotent - CloseAll drains all servers - Sweeper removes expired entries after DrainTimeout - Sweeper skips revived entries (DrainedAt reset to zero) - Sweeper skips active entries (DrainedAt is zero) - tryCloseAndDelete: nil srv, wrong pointer, revived entry, replaced entry - Concurrent getServer/setServer/Close with race detector Production code changes: - Make DrainTimeout/SweepInterval package vars (configurable per-test) - startSweeper() lazily starts sweeper via sync.Once (not in init) - StopSweeper() stops running sweeper and waits for goroutine exit - ResetForTest() clears map and stops sweeper between tests - Add nil guard on srv.Instance in tryCloseAndDelete for testability - Add quit channel to sweeper so it can be stopped cleanly
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
Follow-up to #79/#80 — two race conditions found in review:
Changes
Before closing: re-verify
servers[e.url] == e.srvANDDrainedAtstill non-zero.Before delete: re-verify
servers[e.url] == e.srv— skip if a new server was created for that URL.Both checks are under lock, making the operations effectively race-free.
Acceptance Criteria
Summary by Sourcery
Prevent race conditions in the xray server sweeper by decoupling collection from closing and re-validating entries under lock before closing and deletion.
Bug Fixes: