fix: release mu before Instance.Close() in sweeper (follow-up to #79) #80

Merged: cnlangzi merged 1 commit into main from fix/sweeper-mutex-close on Mar 30, 2026

Conversation

cnlangzi (Owner) commented Mar 30, 2026

Background

Follow-up to #79 — the previous fix held the mutex while calling srv.Instance.Close(), which could block all map operations if Close() is slow.

Changes

Collect expired entries under lock, release lock, then close instances outside the critical section. The delete() call runs under lock again but is fast.

Acceptance Criteria

  • No goroutine accumulation
  • Memory stays stable
  • Close() and CloseAll() work correctly
  • sweeper does not hold mu while calling slow Instance.Close()

Summary by Sourcery

Introduce a background sweeper to drain and close xray server instances asynchronously after a timeout, avoiding holding the global mutex during potentially slow Close calls.

Bug Fixes:

  • Prevent the sweeper from blocking all server map operations by releasing the mutex before invoking potentially slow Instance.Close calls.
  • Ensure Close and CloseAll mark servers as draining and rely on the sweeper to close instances, reducing goroutine and memory leaks.

Enhancements:

  • Track server drain state and support reviving draining servers on access to allow in-flight operations to complete cleanly before closure.

sourcery-ai Bot commented Mar 30, 2026

Reviewer's Guide

Introduces a background sweeper that drains and later closes xray core instances outside the global mutex, adds drain/cleanup timing constants, and updates server lifecycle management (Close, CloseAll, getServer, setServer) to support delayed, non-blocking cleanup while keeping map operations safe.

Sequence diagram for delayed server close via sweeper

sequenceDiagram
    actor Caller
    participant API as xray_package
    participant Map as servers_map
    participant Sweeper as sweeper_goroutine
    participant Inst as core_Instance

    Caller->>API: Close(proxyURL)
    API->>Map: lock mu
    API->>Map: lookup servers[proxyURL]
    alt server_found_and_active
        API->>Map: set DrainedAt = now
    end
    API->>Map: unlock mu

    loop every_sweepInterval
        Sweeper->>Sweeper: sleep sweepInterval
        Sweeper->>Map: lock mu
        Sweeper->>Map: scan servers for DrainedAt != zero
        Sweeper->>Sweeper: collect expired entries
        Sweeper->>Map: unlock mu

        loop for_each_expired_entry
            Sweeper->>Inst: Instance.Close()
            Sweeper->>Map: lock mu
            Sweeper->>Map: delete servers[url]
            Sweeper->>Map: unlock mu
        end
    end

Class diagram for Server struct and lifecycle functions

classDiagram
    class Server {
        +core.Instance Instance
        +int SocksPort
        +time.Time DrainedAt
    }

    class Package_xray {
        <<module>>
        +map~string, *Server~ servers
        +sync.Mutex mu
        +const time.Duration drainTimeout
        +const time.Duration sweepInterval
        +getServer(proxyURL string) *Server
        +setServer(proxyURL string, instance *core.Instance, port int) void
        +Close(proxyURL string) void
        +CloseAll() void
        +sweeper() void
    }

    Package_xray "1" o-- "*" Server : manages
    Server --> "1" core.Instance : contains

File-Level Changes

Change Details Files
Add background sweeper goroutine with drain timeout and sweep interval to close instances outside the mutex.
  • Define drainTimeout and sweepInterval constants governing delayed close and sweep frequency.
  • Introduce Server.DrainedAt timestamp field to track when an instance entered draining state.
  • Start sweeper() in init() to periodically scan servers for expired draining instances.
  • In sweeper(), collect expired servers under lock using current time, then unlock, Close() their instances, and delete them from the map with brief re-locks.
xray/xray.go
Adjust server lifecycle helpers to support draining semantics instead of immediate close.
  • In getServer, if a server is in draining state, reset DrainedAt to zero to revive it.
  • In setServer, initialize DrainedAt to the zero time for new servers.
  • Rewrite Close(proxyURL) to only mark a server as draining (set DrainedAt) when first closed, without closing the instance or deleting the entry immediately.
  • Rewrite CloseAll() to mark all active servers as draining using a shared timestamp instead of immediately closing instances and deleting them.
xray/xray.go
Minor import formatting cleanup.
  • Remove extra blank lines between grouped imports to tighten import block formatting.
xray/xray.go
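The mark-only semantics of Close and CloseAll described above can be sketched as follows, again with types reduced to the draining field; the shared timestamp in CloseAll avoids repeated time.Now() calls and gives all entries the same expiry.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Server reduced to the drain-tracking field.
type Server struct {
	DrainedAt time.Time
}

var (
	mu      sync.Mutex
	servers = map[string]*Server{}
)

// Close only marks the server as draining; the sweeper closes and
// deletes it later, after drainTimeout.
func Close(proxyURL string) {
	mu.Lock()
	defer mu.Unlock()
	if srv, ok := servers[proxyURL]; ok && srv.DrainedAt.IsZero() {
		srv.DrainedAt = time.Now()
	}
}

// CloseAll marks every active server as draining with one shared timestamp.
func CloseAll() {
	mu.Lock()
	defer mu.Unlock()
	now := time.Now()
	for _, srv := range servers {
		if srv.DrainedAt.IsZero() {
			srv.DrainedAt = now
		}
	}
}

func main() {
	servers["a"] = &Server{}
	servers["b"] = &Server{}
	Close("a")
	CloseAll()
	fmt.Println(servers["a"].DrainedAt.IsZero(), servers["b"].DrainedAt.IsZero()) // false false
}
```

Neither function touches the instance or the map entry itself, so both return quickly even when the underlying Close() would be slow.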


sourcery-ai Bot left a comment
Hey - I've found 1 issue and left some high-level feedback:

  • There is a potential logic race in sweeper: you collect url/*Server pairs under the lock, then later delete(servers, e.url) under a new lock; if a new server is created for the same URL after the first unlock, the sweeper will delete the new entry instead of the expired one—consider only deleting if servers[e.url] == e.srv or otherwise tracking generations.
  • Similarly, getServer can clear DrainedAt (revive) after the sweeper has already decided an entry is expired but before it is closed, leading to a server that appears active but will still be closed shortly; you may want to guard against this by re-checking DrainedAt or an additional state flag when deciding to close.
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location path="xray/xray.go" line_range="92-101" />
<code_context>
+
+		// Collect expired URLs under lock, then release lock before closing
+		// to avoid blocking all map operations while Instance.Close() runs.
+		var expired []struct {
+			url  string
+			srv  *Server
+		}
+		mu.Lock()
+		now := time.Now()
+		for url, srv := range servers {
+			if !srv.DrainedAt.IsZero() && now.Sub(srv.DrainedAt) > drainTimeout {
+				expired = append(expired, struct {
+					url  string
+					srv  *Server
+				}{url, srv})
+			}
+		}
+		mu.Unlock()
+
+		// Close instances outside the critical section.
+		for _, e := range expired {
+			e.srv.Instance.Close() //nolint: errcheck
+			mu.Lock()
</code_context>
<issue_to_address>
**issue (bug_risk):** Reviving a draining server can race with the sweeper and still result in the instance being closed and deleted.

The sweeper computes expiry under the lock, releases it, and later re-enters the lock only to delete. In between, `getServer` can revive a draining server by resetting `DrainedAt` to zero, but the sweeper will still close and delete it because it never re-checks the expiry condition.

When re-entering the critical section, re-validate expiry before deletion. For example, store only URLs in `expired`, then on delete re-fetch `srv` from the map and confirm `DrainedAt` is still non-zero and past `drainTimeout`; if it was revived, skip closing and deleting.
</issue_to_address>



Collect expired entries under lock, then release lock and close
instances outside the critical section to avoid blocking all map
operations if Close() is slow.
cnlangzi force-pushed the fix/sweeper-mutex-close branch from aba44f0 to 4471fb5 on March 30, 2026 03:14

codecov Bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 8.93%. Comparing base (0b5ab19) to head (4471fb5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
xray/xray.go | 25.00% | 9 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main     #80      +/-   ##
========================================
+ Coverage   8.79%   8.93%   +0.13%     
========================================
  Files         27      27              
  Lines       1523    1533      +10     
========================================
+ Hits         134     137       +3     
- Misses      1374    1381       +7     
  Partials      15      15              
Flag | Coverage Δ
Tests | 8.93% <25.00%> (+0.13%) ⬆️


@cnlangzi cnlangzi merged commit 294a699 into main Mar 30, 2026
7 checks passed
@cnlangzi cnlangzi deleted the fix/sweeper-mutex-close branch March 30, 2026 03:18
@cnlangzi cnlangzi linked an issue Mar 30, 2026 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

fix: xray goroutine leak in servers map (sweeper solution)