Add io_close hook to TestScheduler #171

Merged
ioquatix merged 2 commits into main from test-ruby-gc-bug
May 12, 2026

Conversation

@samuel-williams-shopify (Contributor) commented May 9, 2026

When a TestScheduler is active and IO#close is invoked from inside a scheduled fiber, Ruby looks for an io_close scheduler hook. Until now the test scheduler did not define one, so Ruby fell back to rb_nogvl and blocking_operation_wait — routing every close through the worker pool unnecessarily.

This PR adds the hook:

  • If the configured selector supports io_close (e.g. URing), delegate to it.
  • Otherwise close the descriptor synchronously inside Fiber.blocking { ... }, which suppresses scheduler hooks for the duration of the close.

Handles both legacy IO objects and raw Integer file descriptors — Ruby 4.0+ passes the raw fd to the hook.

The change is one method, fourteen lines.
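
The two branches described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `SketchScheduler` and the `@selector` instance variable are hypothetical, and it assumes Ruby 3.2 or later for `Fiber.blocking`. An Integer argument is wrapped with `IO.for_fd` so that closing the wrapper closes the descriptor.

```ruby
# Hypothetical sketch of an io_close scheduler hook; names are illustrative only.
class SketchScheduler
  def initialize(selector)
    @selector = selector
  end

  # Accepts either an IO object or a raw Integer file descriptor.
  def io_close(io)
    if @selector.respond_to?(:io_close)
      # The selector knows how to close descriptors itself, so delegate.
      @selector.io_close(io)
    else
      # Run the close in a blocking context so scheduler hooks are not
      # re-entered while the descriptor is being closed.
      Fiber.blocking do
        descriptor = io.is_a?(Integer) ? IO.for_fd(io) : io
        descriptor.close
      end
    end
  end
end
```

With a selector that does not respond to `io_close`, calling `SketchScheduler.new(selector).io_close(io)` simply closes `io` synchronously. With one that does, the argument is passed through unchanged.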

samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 9, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, the C frame of rb_fiber_scheduler_blocking_operation_wait
is no longer active. In optimised builds (-O3 --enable-shared), blocking_operation
may be held only in a machine register not saved/scanned by the conservative GC,
allowing it to be collected. get_blocking_operation() at line 1104 then reads
freed/reused memory, crashing with rb_unexpected_object_type.

Confirmed by reproducing the crash using:
  ./configure --enable-shared --disable-install-doc --enable-yjit cppflags=-DENABLE_PATH_CHECK=0

RB_GC_GUARD(blocking_operation) after rb_funcall forces the compiler to keep
the VALUE on the stack (volatile read), ensuring the GC always finds it.

See: socketry/io-event#170
     socketry/io-event#171
Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 9, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, blocking_operation may only be in a machine register
not scanned by the conservative GC, allowing collection. Confirmed by
reproducing the crash (segfault in get_blocking_operation) with:
  ./configure --enable-shared --disable-install-doc --enable-yjit
RB_GC_GUARD forces the VALUE onto the stack ensuring the GC always finds it.

See: socketry/io-event#171
Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 10, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, blocking_operation may not be reachable via the
conservative GC scan of the suspended fiber's C stack.

rb_gc_register_address pins blocking_operation in the global GC root list,
which is always walked regardless of fiber state. The address is kept
registered through the last implicit use of the VALUE, including all accesses
via the raw C pointer derived from it, so that a compacting GC
cannot move the object and leave that pointer dangling.

Confirmed by reproducing the crash in io-event CI:
  ./configure --enable-shared --disable-install-doc --enable-yjit
See: socketry/io-event#171
     ruby#16908

Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 10, 2026
…eration_wait

Use rb_gc_register_address to pin blocking_operation as a precise GC root
during rb_funcall. The scheduler's blocking_operation_wait may cause a fiber
switch via rb_fiber_scheduler_block, which suspends the calling fiber. The
conservative GC does not find the VALUE on the suspended fiber's C stack
(possibly due to it being in a machine register not captured in the saved
context), so the object can be collected or moved without updating the local
VALUE. rb_gc_register_address ensures the object is a precise root that is
always found and properly handled by both the regular and compacting GC.
rb_gc_unregister_address is called after the last use of the raw
pointer (which is derived from blocking_operation) to avoid a dangling
registered address.

Confirmed by io-event CI which reliably crashes without this fix and passes
with it: socketry/io-event#171

Co-authored-by: Cursor <cursoragent@cursor.com>
@samuel-williams-shopify force-pushed the test-ruby-gc-bug branch 3 times, most recently from 31cc39f to 10807b5, May 10, 2026 02:13
@samuel-williams-shopify changed the title from "Test io-event against Ruby GC bug (blocking_operation safety)" to "Add io_close to TestScheduler", May 10, 2026
@samuel-williams-shopify force-pushed the test-ruby-gc-bug branch 16 times, most recently from 099bcf4 to 92d1830, May 12, 2026 02:22
@samuel-williams-shopify changed the title from "Add io_close to TestScheduler" to "Add io_close hook to TestScheduler", May 12, 2026
When a TestScheduler is active, IO#close was previously falling back
to rb_nogvl and blocking_operation_wait. Adding the io_close scheduler
hook routes the close through the selector (when supported) or closes
the descriptor synchronously via Fiber.blocking, avoiding an
unnecessary trip through the worker pool.

Handles both legacy IO objects and raw Integer file descriptors (Ruby
4.0+ passes the raw fd to the hook).

Co-authored-by: Cursor <cursoragent@cursor.com>
@samuel-williams-shopify force-pushed the test-ruby-gc-bug branch 7 times, most recently from f6ee054 to 9d5506c, May 12, 2026 03:53
Ruby's fiber-scheduler `io_close` hook (Ruby 4.0+, see `rb_fiber_scheduler_io_close` in CRuby) is invoked with a raw integer file descriptor — never an `IO` object. Earlier Rubies don't invoke the hook at all.

Only `URing` implements `io_close` (async close via the ring); other selectors let Ruby use its default `IO#close` path. Both `Debug::Selector` and `TestScheduler` now define a small `Forwarders` module whose methods are mixed into their singleton class only when the wrapped selector actually implements the corresponding method. This preserves async close when wrapping `URing` and keeps `respond_to?` reflecting the real backend.

Drops the dead `IO`-object branch from `uring.c`, the `Forwarders` doc, and the test — Ruby's contract is integer-only.

Co-authored-by: Cursor <cursoragent@cursor.com>
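
The conditional mix-in described in this commit message could look something like the sketch below. It is an assumption about the general shape, not the merged code: `SketchForwarders`, its nested `IOClose` module, the `apply` helper, and `SketchWrapper` are hypothetical names. `Object#extend` is used because it adds a module's methods to the receiver's singleton class.

```ruby
# Hypothetical sketch: forward io_close only when the backend implements it.
module SketchForwarders
  # One small module per forwarded method, so each can be mixed in on its own.
  module IOClose
    def io_close(fd)
      @selector.io_close(fd)
    end
  end

  # Extend the wrapper's singleton class only if the selector supports the call.
  def self.apply(wrapper, selector)
    wrapper.extend(IOClose) if selector.respond_to?(:io_close)
  end
end

class SketchWrapper
  def initialize(selector)
    @selector = selector
    SketchForwarders.apply(self, selector)
  end
end
```

A wrapper around a selector without `io_close` then reports `respond_to?(:io_close)` as false, so a caller probing for the hook sees the capabilities of the real backend rather than those of the wrapper.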
@ioquatix merged commit 7eb061e into main May 12, 2026
54 of 60 checks passed
@ioquatix deleted the test-ruby-gc-bug branch May 12, 2026 04:04
@samuel-williams-shopify added this to the v1.16.0 milestone May 12, 2026

2 participants