fix: release mu before Instance.Close() in sweeper (follow-up to #79) #80
Conversation
Reviewer's Guide

Introduces a background sweeper that drains and later closes xray core instances outside the global mutex, adds drain/cleanup timing constants, and updates server lifecycle management (`Close`, `CloseAll`, `getServer`, `setServer`) to support delayed, non-blocking cleanup while keeping map operations safe.

Sequence diagram for delayed server close via sweeper:

```mermaid
sequenceDiagram
    actor Caller
    participant API as xray_package
    participant Map as servers_map
    participant Sweeper as sweeper_goroutine
    participant Inst as core_Instance
    Caller->>API: Close(proxyURL)
    API->>Map: lock mu
    API->>Map: lookup servers[proxyURL]
    alt server_found_and_active
        API->>Map: set DrainedAt = now
    end
    API->>Map: unlock mu
    loop every_sweepInterval
        Sweeper->>Sweeper: sleep sweepInterval
        Sweeper->>Map: lock mu
        Sweeper->>Map: scan servers for DrainedAt != zero
        Sweeper->>Sweeper: collect expired entries
        Sweeper->>Map: unlock mu
        loop for_each_expired_entry
            Sweeper->>Inst: Instance.Close()
            Sweeper->>Map: lock mu
            Sweeper->>Map: delete servers[url]
            Sweeper->>Map: unlock mu
        end
    end
```
Class diagram for Server struct and lifecycle functions:

```mermaid
classDiagram
    class Server {
        +core.Instance Instance
        +int SocksPort
        +time.Time DrainedAt
    }
    class Package_xray {
        <<module>>
        +map~string, *Server~ servers
        +sync.Mutex mu
        +const time.Duration drainTimeout
        +const time.Duration sweepInterval
        +getServer(proxyURL string) *Server
        +setServer(proxyURL string, instance *core.Instance, port int) void
        +Close(proxyURL string) void
        +CloseAll() void
        +sweeper() void
    }
    Package_xray "1" o-- "*" Server : manages
    Server --> "1" core.Instance : contains
```
Hey - I've found 1 issue, and left some high level feedback:

- There is a potential logic race in `sweeper`: you collect `url`/`*Server` pairs under the lock, then later `delete(servers, e.url)` under a new lock; if a new server is created for the same URL after the first unlock, the sweeper will delete the new entry instead of the expired one—consider only deleting if `servers[e.url] == e.srv` or otherwise tracking generations.
- Similarly, `getServer` can clear `DrainedAt` (revive) after the sweeper has already decided an entry is expired but before it is closed, leading to a server that appears active but will still be closed shortly; you may want to guard against this by re-checking `DrainedAt` or an additional state flag when deciding to close.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There is a potential logic race in `sweeper`: you collect `url`/`*Server` pairs under the lock, then later `delete(servers, e.url)` under a new lock; if a new server is created for the same URL after the first unlock, the sweeper will delete the new entry instead of the expired one—consider only deleting if `servers[e.url] == e.srv` or otherwise tracking generations.
- Similarly, `getServer` can clear `DrainedAt` (revive) after the sweeper has already decided an entry is expired but before it is closed, leading to a server that appears active but will still be closed shortly; you may want to guard against this by re-checking `DrainedAt` or an additional state flag when deciding to close.
## Individual Comments
### Comment 1
<location path="xray/xray.go" line_range="92-101" />
<code_context>
+
+ // Collect expired URLs under lock, then release lock before closing
+ // to avoid blocking all map operations while Instance.Close() runs.
+ var expired []struct {
+ url string
+ srv *Server
+ }
+ mu.Lock()
+ now := time.Now()
+ for url, srv := range servers {
+ if !srv.DrainedAt.IsZero() && now.Sub(srv.DrainedAt) > drainTimeout {
+ expired = append(expired, struct {
+ url string
+ srv *Server
+ }{url, srv})
+ }
+ }
+ mu.Unlock()
+
+ // Close instances outside the critical section.
+ for _, e := range expired {
+ e.srv.Instance.Close() //nolint: errcheck
+ mu.Lock()
</code_context>
<issue_to_address>
**issue (bug_risk):** Reviving a draining server can race with the sweeper and still result in the instance being closed and deleted.
The sweeper computes expiry under the lock, releases it, and later re-enters the lock only to delete. In between, `getServer` can revive a draining server by resetting `DrainedAt` to zero, but the sweeper will still close and delete it because it never re-checks the expiry condition.
When re-entering the critical section, re-validate expiry before deletion. For example, store only URLs in `expired`, then on delete re-fetch `srv` from the map and confirm `DrainedAt` is still non-zero and past `drainTimeout`; if it was revived, skip closing and deleting.
</issue_to_address>
```go
var expired []struct {
	url string
	srv *Server
}
mu.Lock()
now := time.Now()
for url, srv := range servers {
	if !srv.DrainedAt.IsZero() && now.Sub(srv.DrainedAt) > drainTimeout {
		expired = append(expired, struct {
			url string
```
**issue (bug_risk):** Reviving a draining server can race with the sweeper and still result in the instance being closed and deleted.

The sweeper computes expiry under the lock, releases it, and later re-enters the lock only to delete. In between, `getServer` can revive a draining server by resetting `DrainedAt` to zero, but the sweeper will still close and delete it because it never re-checks the expiry condition.

When re-entering the critical section, re-validate expiry before deletion. For example, store only URLs in `expired`, then on delete re-fetch `srv` from the map and confirm `DrainedAt` is still non-zero and past `drainTimeout`; if it was revived, skip closing and deleting.
Collect expired entries under lock, then release lock and close instances outside the critical section to avoid blocking all map operations if Close() is slow.
Force-pushed from aba44f0 to 4471fb5 (Compare)
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##            main     #80      +/-   ##
========================================
+ Coverage   8.79%   8.93%   +0.13%
========================================
  Files         27      27
  Lines       1523    1533      +10
========================================
+ Hits         134     137       +3
- Misses      1374    1381       +7
  Partials      15      15
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Background

Follow-up to #79 — the previous fix held the mutex while calling `srv.Instance.Close()`, which could block all map operations if `Close()` is slow.

Changes

Collect expired entries under lock, release lock, then close instances outside the critical section. The `delete()` call runs under lock again but is fast.

Acceptance Criteria

- `Close()` and `CloseAll()` work correctly
- The mutex is not held while calling `Instance.Close()`

Summary by Sourcery
Introduce a background sweeper to drain and close xray server instances asynchronously after a timeout, avoiding holding the global mutex during potentially slow Close calls.
Bug Fixes:
Enhancements: