Optimize asyncio shared router for reduced NIF overhead and lock contention#6

Closed
benoitc wants to merge 14 commits into main from asyncio-router-optimization

Conversation

benoitc (Owner) commented Feb 23, 2026

Summary

  • Increase PENDING_HASH_SIZE from 128 to 512 for higher capacity before rejection
  • Add off_heap mailbox to router for reduced GC pressure under high message load
  • Add combined handle_fd_event_and_reselect/2 NIF that reduces NIF call overhead
  • Only signal pthread_cond on 0->1 queue transition to reduce contention
  • Implement snapshot-under-lock in py_run_once for reduced lock contention

Also adds test/py_event_loop_bench.erl for measuring event throughput.
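The "signal only on the 0->1 transition" optimization can be sketched in Python (a hedged analogue of the C pthread_cond code, assuming a single consumer; class and method names are illustrative, not the project's API):

```python
import threading
from collections import deque

class WakeOnFirstQueue:
    """Queue that signals its condition variable only on the
    empty -> non-empty transition. A consumer only sleeps when the
    queue is empty, so producers appending to a non-empty queue can
    skip the (contended) notify. Single-consumer sketch."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            was_empty = not self._items
            self._items.append(item)
            if was_empty:
                # 0 -> 1 transition: the consumer may be asleep.
                self._cond.notify()

    def drain(self, timeout=None):
        with self._cond:
            # wait_for handles the predicate + timeout correctly,
            # including spurious wakeups.
            self._cond.wait_for(lambda: self._items, timeout=timeout)
            items = list(self._items)
            self._items.clear()
            return items
```

With multiple consumers this scheme would lose wakeups; the single-consumer assumption mirrors one router thread draining the queue.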

🤖 Generated with Claude Code

Optimize asyncio shared router for reduced NIF overhead and lock contention

- Increase PENDING_HASH_SIZE from 128 to 512 for higher capacity
- Add off_heap mailbox to router for reduced GC pressure
- Add combined handle_fd_event_and_reselect/2 NIF (reduces NIF calls)
- Only signal pthread_cond on 0->1 queue transition
- Implement snapshot-under-lock in py_run_once for reduced contention

Also adds test/py_event_loop_bench.erl for measuring event throughput.
New architecture uses Erlang mailbox as event queue instead of pthread_cond:
- py_event_loop_proc.erl: Event process receives FD/timer events directly
- py_event_loop_v2.erl: Drop-in replacement for py_event_router
- Timers fire directly to event process (no dispatch_timer NIF hop)
- FD events from enif_select go directly to event process

New NIFs:
- event_loop_set_event_proc/2: Set event process for a loop
- poll_via_proc/2: Poll via event process message passing

Backward compatible: legacy py_event_router still works.
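The mailbox-as-event-queue idea above can be sketched in Python, with a queue.Queue standing in for the Erlang process mailbox (illustrative names only, not the actual py_event_loop_proc API):

```python
import queue
import threading

class EventProc:
    """Producers (timer callbacks, FD watchers) post events straight
    into one mailbox; poll() just reads from it. There is no
    condition-variable dispatch layer between producer and consumer,
    which is the point of the V2 architecture."""

    def __init__(self):
        self._mailbox = queue.Queue()

    def send_event(self, event):
        # e.g. an FD-readiness event delivered directly, no extra hop
        self._mailbox.put(event)

    def start_timer(self, delay_s, event):
        # Timer fires directly into the mailbox (no dispatch hop).
        threading.Timer(delay_s, self._mailbox.put, args=(event,)).start()

    def poll(self, timeout_s):
        """Block for the first event, then drain whatever else is queued."""
        events = []
        try:
            events.append(self._mailbox.get(timeout=timeout_s))
            while True:
                events.append(self._mailbox.get_nowait())
        except queue.Empty:
            pass
        return events
```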
benoitc commented Feb 23, 2026

Performance Improvements

Commit 1: Shared Router Optimizations

  • Increased PENDING_HASH_SIZE from 128 to 512
  • Added off_heap mailbox to router
  • Combined handle_fd_event_and_reselect/2 NIF (reduces NIF calls)
  • Wake pthread_cond only on 0→1 queue transition
  • Snapshot-under-lock in py_run_once for reduced contention

Commit 2: Event Process Architecture (New)

Introduces py_event_loop_proc - uses Erlang mailbox as event queue:

  Metric            V1 (Router)   V2 (Event Process)   Improvement
  Timer throughput  49,104/sec    1,327,669/sec        27x faster

Why it's faster:

  • Timers fire directly to event process (no dispatch_timer NIF hop)
  • FD events from enif_select go directly to event process
  • No pthread_cond signaling for Erlang-side event collection

Usage:

{ok, LoopRef, EventProc} = py_event_loop_v2:new(),
Events = py_event_loop_v2:poll(EventProc, TimeoutMs).

Backward compatible - legacy py_event_router still works.

benoitc commented Feb 23, 2026

Extended Events (Commit 3)

Event process now handles all event types through unified mailbox:

  Event Type                                             Use Case
  call_result, call_error                                Sync Python call completions
  async_result, async_error                              Async Python call completions
  subprocess_exit, subprocess_stdout, subprocess_stderr  Subprocess I/O
  socket_data, socket_closed, socket_error               Network events
  {tcp, ...}, {udp, ...} (native)                        Direct gen_tcp/gen_udp handling
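Routing events from the unified mailbox might look like the following Python sketch (handler names and the dispatch shape are assumptions for illustration, not the project's actual code):

```python
def dispatch(event, handlers):
    """Route one event tuple from the unified mailbox to its handler
    group, following the event-type table above."""
    kind = event[0]
    if kind in ("call_result", "call_error",
                "async_result", "async_error"):
        handlers["calls"](event)           # sync/async call completions
    elif kind.startswith("subprocess_"):
        handlers["subprocess"](event)      # exit / stdout / stderr
    elif kind.startswith("socket_"):
        handlers["sockets"](event)         # data / closed / error
    elif kind in ("tcp", "udp"):
        handlers["native"](event)          # raw gen_tcp/gen_udp-style messages
    else:
        raise ValueError(f"unknown event type: {kind}")
```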

Benchmark

=== Extended Events Benchmark ===
Events per type: 50000

  call_result          5,626,829 events/sec
  async_result         5,703,205 events/sec
  subprocess_stdout    6,165,228 events/sec
  socket_data          5,469,861 events/sec

  Average: 5,741,281 events/sec

All 113 tests pass.

benoitc commented Feb 23, 2026

Python asyncio Integration (Commit 4)

The ErlangEventLoop now handles all extended event types for unified event processing:

Event Types Added

  Constant                      Value  Use Case
  EVENT_TYPE_CALL_RESULT        10     Sync call succeeded
  EVENT_TYPE_CALL_ERROR         11     Sync call failed
  EVENT_TYPE_ASYNC_RESULT       12     Async call succeeded
  EVENT_TYPE_ASYNC_ERROR        13     Async call failed
  EVENT_TYPE_SUBPROCESS_EXIT    20     Process exited
  EVENT_TYPE_SUBPROCESS_STDOUT  21     Stdout data
  EVENT_TYPE_SUBPROCESS_STDERR  22     Stderr data
  EVENT_TYPE_SOCKET_DATA        30     Socket received data
  EVENT_TYPE_SOCKET_CLOSED      31     Socket closed
  EVENT_TYPE_SOCKET_ERROR       32     Socket error

Registration API

loop = ErlangEventLoop()

# For py.call_async results
callback_id = loop._next_callback_id()
loop._register_async_future(callback_id, my_future)

# For subprocess
loop._register_subprocess(callback_id, protocol, transport)

# For socket/TCP/UDP
loop._register_socket(callback_id, protocol, transport)

All 24 event loop tests pass.

benoitc force-pushed the asyncio-router-optimization branch from 1e358ad to 2e14f78 on February 23, 2026 at 14:25
Phase 1 of unified event-driven architecture.

- Add py_callback_id module with atomic counter
- Initialize counter in erlang_python_sup
- Uses persistent_term + atomics for lock-free, thread-safe ID generation
- IDs are monotonically increasing positive integers starting from 1

This provides unique callback IDs for correlating async operations
with their results in subsequent phases.
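A loose Python analogue of the py_callback_id scheme (CPython's itertools.count makes next() effectively atomic, roughly mirroring the persistent_term + atomics approach; this is a sketch, not the Erlang module):

```python
import itertools

# Process-wide, monotonically increasing callback ID source.
# IDs start from 1, as described in the commit.
_counter = itertools.count(1)

def next_callback_id():
    """Return a unique positive integer ID, safe to call from
    multiple threads without an explicit lock."""
    return next(_counter)
```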
Add call_handlers map to state for tracking pending call results.
New message handlers:
- {register_call, CallbackId, Caller, Ref} - Register call handler
- {unregister_call, CallbackId} - Unregister before result arrives
- {call_result, CallbackId, Result} - Dispatch result to caller
- {call_error, CallbackId, Error} - Dispatch error to caller

Results are delivered as {py_result, Ref, Result} or {py_error, Ref, Error}
to the registered caller. Handlers work in both normal loop and wait_loop.
Safe to unregister before result arrives.
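The call_handlers bookkeeping described above can be sketched as follows (the caller's mailbox is modeled as a plain list; message shapes follow the commit text, the function names are illustrative):

```python
# Map of callback ID -> (caller mailbox, ref) for pending calls.
call_handlers = {}

def register_call(callback_id, caller_mailbox, ref):
    call_handlers[callback_id] = (caller_mailbox, ref)

def unregister_call(callback_id):
    # Safe to call before the result arrives; a late result for this
    # ID is then dropped silently in deliver().
    call_handlers.pop(callback_id, None)

def deliver(callback_id, tag, payload):
    """Dispatch a call_result/call_error to the registered caller as
    ("py_result", Ref, Result) or ("py_error", Ref, Error)."""
    handler = call_handlers.pop(callback_id, None)
    if handler is None:
        return                      # caller unregistered: drop
    mailbox, ref = handler
    mailbox.append((tag, ref, payload))
```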

Phase 2 of unified event-driven architecture.
Submit Python calls to a background worker thread that delivers
results via enif_send to py_event_loop_proc. Worker thread is
lazily started after Python initialization.

New files: c_src/py_submit.{c,h}, test/py_submit_test.erl
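A rough Python analogue of the py_submit worker: calls go onto a job queue served by a lazily started background thread, which posts results into the event process's mailbox (here a list; enif_send is not modeled, and all names are illustrative):

```python
import queue
import threading

_jobs = queue.Queue()
_worker = None

def _run(mailbox):
    # Background worker: execute each call and deliver the outcome
    # as a call_result/call_error event.
    while True:
        callback_id, fn, args = _jobs.get()
        try:
            mailbox.append(("call_result", callback_id, fn(*args)))
        except Exception as exc:
            mailbox.append(("call_error", callback_id, exc))

def submit(mailbox, callback_id, fn, *args):
    """Queue a call for the worker, starting it lazily on first use
    (as the commit describes for post-Python-init startup)."""
    global _worker
    if _worker is None:
        _worker = threading.Thread(target=_run, args=(mailbox,), daemon=True)
        _worker.start()
    _jobs.put((callback_id, fn, args))
```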

- Delete py_async_worker.erl, py_async_worker_sup.erl, py_async_pool.erl
- Remove async worker supervision from erlang_python_sup.erl
- Update py:async_gather to use py_async_driver (submit all, await all)
- Update py:async_stream to use async_stream_helper Python module
- Remove legacy async NIF exports from py_nif.erl
- Remove legacy async NIF table entries from py_nif.c
- Add priv/async_stream_helper.py for async generator collection

All async operations now go through py_async_driver which uses
the unified ErlangEventLoop via py_event_loop_proc.
- Add test/py_unified_bench.erl with benchmarks for:
  - Synchronous py:call throughput and latency
  - Async py:async_call with latency percentiles (p50, p90, p99, p999)
  - Concurrent request handling at various concurrency levels
  - Async gather batch performance

- Add docs/architecture.md documenting:
  - Component architecture diagram
  - Event-driven async flow
  - NIF architecture and GIL management
  - ASGI integration
  - Callback mechanism
  - Performance characteristics

- Update README.md with link to architecture docs
- Update docs/scalability.md to remove deprecated num_async_workers config

Run benchmarks: rebar3 as test shell, then py_unified_bench:run_all()
benoitc commented Feb 23, 2026

Benchmark Results

Ran the new unified architecture benchmarks (py_unified_bench:run_all()):

--- Synchronous py:call ---
  71,438 ops/sec, 13 μs avg

--- Async py:async_call ---
  15,968 ops/sec
  Latency: p50=60μs, p90=69μs, p99=98μs, p999=133μs

--- Concurrent Requests ---
  10 workers:  182,049 ops/sec (p50=51μs)
  50 workers:  102,732 ops/sec (p50=145μs)
  100 workers:  38,032 ops/sec (p50=172μs)

--- Async Gather ---
  Batch of 10: 18,440 ops/sec, 542 μs/batch
  Batch of 50: 18,088 ops/sec, 2,764 μs/batch

The unified event loop handles async operations well. Sync calls hit ~71K/sec which is solid given GIL constraints. Async has more overhead (~16K/sec) but provides non-blocking behavior and good tail latencies.

Concurrent throughput peaks around 10 workers and degrades beyond ~50 workers as contention kicks in.

- Update waiter field type spec to match actual 4-tuple storage
- Fix pattern match in handle_msg for DOWN message
- Update test_error_handling to accept flexible error formats
- Fix wait_loop escaping when cancel_timer arrives: inline timer
  cancellation instead of calling handle_cancel_timer which tail-calls
  loop/1 and exits wait mode, causing poll to hang indefinitely

- Fix async_gather mailbox leak: drain remaining py_result/py_error
  messages when an early error occurs to prevent leftover messages
  in caller's mailbox
- py_asgi:run_async/5: use Opts parameter for custom runner
- py_event_loop.c: fix OOM cleanup to return ALL events to freelist
- py_async_driver: cache event_proc pid in persistent_term for fast lookup
- py_event_loop_proc: simplify handle_msg DOWN, add dialyzer nowarn
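The async_gather drain fix can be sketched in Python (the caller's mailbox is modeled as a list of tuples; function and variable names are illustrative, not the project's code):

```python
def drain_gather_mailbox(mailbox, pending_refs):
    """On an early error, consume the py_result/py_error messages
    still owed to the other pending refs so they do not linger in
    the caller's mailbox, while leaving unrelated messages intact."""
    remaining = []
    for msg in mailbox:
        if msg[0] in ("py_result", "py_error") and msg[1] in pending_refs:
            continue                  # drop the stale reply
        remaining.append(msg)         # keep unrelated messages
    mailbox[:] = remaining
```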
@benoitc benoitc closed this Feb 23, 2026
@benoitc benoitc deleted the asyncio-router-optimization branch February 23, 2026 20:34