Optimize asyncio shared router for reduced NIF overhead and lock contention#6
Closed
Conversation
Optimize asyncio shared router for reduced NIF overhead and lock contention:

- Increase PENDING_HASH_SIZE from 128 to 512 for higher capacity
- Add off_heap mailbox to router for reduced GC pressure
- Add combined handle_fd_event_and_reselect/2 NIF (reduces NIF calls)
- Only signal pthread_cond on 0->1 queue transition
- Implement snapshot-under-lock in py_run_once for reduced contention

Also adds test/py_event_loop_bench.erl for measuring event throughput.
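The "signal only on the 0->1 transition" optimization avoids a pthread_cond_signal on every enqueue: a waiter only needs waking when the queue was empty, because a non-empty queue means any waiter has already been signaled or will drain without waiting. The real queue lives in the C NIF layer; the sketch below is an illustrative Python model of the same idea, not the project's code:

```python
import threading
from collections import deque

class EventQueue:
    """Illustrative model: signal the condition variable only when the
    queue goes from empty to non-empty (0 -> 1). Pushes onto an already
    non-empty queue skip the (comparatively expensive) notify call."""
    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def push(self, item):
        with self._cond:
            was_empty = not self._items
            self._items.append(item)
            if was_empty:               # signal only on the 0 -> 1 transition
                self._cond.notify()

    def pop(self, timeout=None):
        with self._cond:
            if not self._items:
                self._cond.wait(timeout)
            return self._items.popleft() if self._items else None
```

Under a producer that enqueues in bursts, most pushes hit a non-empty queue and pay no signaling cost at all.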
New architecture uses the Erlang mailbox as the event queue instead of pthread_cond:

- py_event_loop_proc.erl: Event process receives FD/timer events directly
- py_event_loop_v2.erl: Drop-in replacement for py_event_router
- Timers fire directly to the event process (no dispatch_timer NIF hop)
- FD events from enif_select go directly to the event process

New NIFs:

- event_loop_set_event_proc/2: Set the event process for a loop
- poll_via_proc/2: Poll via event process message passing

Backward compatible: the legacy py_event_router still works.
Performance Improvements

Commit 1: Shared Router Optimizations

Commit 2: Event Process Architecture (New)

Introduces the event-process architecture described in the commit message above.

Why it's faster: timer and FD events are delivered straight to the event process mailbox, eliminating the dispatch_timer NIF hop and pthread_cond signaling.

Usage:

```erlang
{ok, LoopRef, EventProc} = py_event_loop_v2:new(),
Events = py_event_loop_v2:poll(EventProc, TimeoutMs),
```

Backward compatible: the legacy py_event_router still works.
Extended Events (Commit 3)

The event process now handles all event types through the unified mailbox.

Benchmark

All 113 tests pass.
Python asyncio Integration (Commit 4)

The Event Types Added

Registration API:

```python
loop = ErlangEventLoop()

# For py.call_async results
callback_id = loop._next_callback_id()
loop._register_async_future(callback_id, my_future)

# For subprocess
loop._register_subprocess(callback_id, protocol, transport)

# For socket/TCP/UDP
loop._register_socket(callback_id, protocol, transport)
```

All 24 event loop tests pass.
Force-pushed from 1e358ad to 2e14f78.
Phase 1 of unified event-driven architecture.

- Add py_callback_id module with atomic counter
- Initialize counter in erlang_python_sup
- Uses persistent_term + atomics for lock-free, thread-safe ID generation
- IDs are monotonically increasing positive integers starting from 1

This provides unique callback IDs for correlating async operations with their results in subsequent phases.
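The real counter is built on Erlang's atomics plus persistent_term; a Python analogue of the same contract (unique, monotonically increasing IDs starting from 1, safe across threads) can be sketched like this:

```python
import itertools
import threading

class CallbackIdCounter:
    """Illustrative analogue of py_callback_id: hand out unique,
    monotonically increasing positive integer IDs starting from 1.
    The Erlang version is lock-free via atomics; here an explicit
    lock keeps next_id() thread-safe regardless of interpreter details."""
    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            return next(self._counter)
```

Each async operation takes one ID at submission time and the same ID travels with its eventual result, which is what makes the correlation in later phases possible.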
Add call_handlers map to state for tracking pending call results.
New message handlers:
- {register_call, CallbackId, Caller, Ref} - Register call handler
- {unregister_call, CallbackId} - Unregister before result arrives
- {call_result, CallbackId, Result} - Dispatch result to caller
- {call_error, CallbackId, Error} - Dispatch error to caller
Results are delivered as {py_result, Ref, Result} or {py_error, Ref, Error}
to the registered caller. Handlers work in both normal loop and wait_loop.
Safe to unregister before result arrives.
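The register/unregister/dispatch protocol above can be modeled compactly. This is a hedged Python sketch of the call_handlers map (the real code is the Erlang process py_event_loop_proc); the class and method names are illustrative, and a plain callable stands in for sending to an Erlang pid:

```python
class CallRegistry:
    """Model of the event process's call_handlers map: each callback ID
    maps to (caller, ref) so that a later call_result/call_error message
    can be routed back to whoever registered it."""
    def __init__(self):
        self._handlers = {}

    def register_call(self, callback_id, caller, ref):
        self._handlers[callback_id] = (caller, ref)

    def unregister_call(self, callback_id):
        # Safe even if the result never arrived (pop with default).
        self._handlers.pop(callback_id, None)

    def call_result(self, callback_id, result):
        entry = self._handlers.pop(callback_id, None)
        if entry:
            caller, ref = entry
            caller(("py_result", ref, result))   # {py_result, Ref, Result}

    def call_error(self, callback_id, error):
        entry = self._handlers.pop(callback_id, None)
        if entry:
            caller, ref = entry
            caller(("py_error", ref, error))     # {py_error, Ref, Error}
```

A result arriving for an already-unregistered ID is silently dropped, which is what makes early unregistration safe.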
Phase 2 of unified event-driven architecture.
Submit Python calls to a background worker thread that delivers
results via enif_send to py_event_loop_proc. Worker thread is
lazily started after Python initialization.
New files: c_src/py_submit.{c,h}, test/py_submit_test.erl
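The submit path in py_submit.c can be pictured as a lazily started worker thread pulling jobs from a queue and pushing results to the event process. The sketch below is an assumption-laden Python model (a callable stands in for enif_send to py_event_loop_proc; names are illustrative, not the C API):

```python
import queue
import threading

class SubmitWorker:
    """Model of py_submit's background worker: calls are queued, executed
    on a worker thread, and results delivered to a sink callback standing
    in for enif_send. The thread is started lazily on first submit, mirroring
    the 'lazily started after Python initialization' behavior."""
    def __init__(self, deliver):
        self._deliver = deliver           # e.g. a mailbox list's .append
        self._jobs = queue.Queue()
        self._thread = None

    def submit(self, callback_id, fn, *args):
        if self._thread is None:          # lazy start
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        self._jobs.put((callback_id, fn, args))

    def _run(self):
        while True:
            callback_id, fn, args = self._jobs.get()
            if fn is None:                # stop sentinel
                break
            try:
                self._deliver(("call_result", callback_id, fn(*args)))
            except Exception as e:
                self._deliver(("call_error", callback_id, e))

    def stop(self):
        self._jobs.put((None, None, None))
        if self._thread:
            self._thread.join()
```

Because results carry the callback ID, the event process can route them with the call_handlers map from Phase 1 without the worker knowing anything about callers.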
…ture

- Delete py_async_worker.erl, py_async_worker_sup.erl, py_async_pool.erl
- Remove async worker supervision from erlang_python_sup.erl
- Update py:async_gather to use py_async_driver (submit all, await all)
- Update py:async_stream to use async_stream_helper Python module
- Remove legacy async NIF exports from py_nif.erl
- Remove legacy async NIF table entries from py_nif.c
- Add priv/async_stream_helper.py for async generator collection

All async operations now go through py_async_driver, which uses the unified ErlangEventLoop via py_event_loop_proc.
- Add test/py_unified_bench.erl with benchmarks for:
  - Synchronous py:call throughput and latency
  - Async py:async_call with latency percentiles (p50, p90, p99, p999)
  - Concurrent request handling at various concurrency levels
  - Async gather batch performance
- Add docs/architecture.md documenting:
  - Component architecture diagram
  - Event-driven async flow
  - NIF architecture and GIL management
  - ASGI integration
  - Callback mechanism
  - Performance characteristics
- Update README.md with link to architecture docs
- Update docs/scalability.md to remove deprecated num_async_workers config

Run benchmarks: rebar3 as test shell, then py_unified_bench:run_all()
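For readers unfamiliar with the p50/p90/p99/p999 figures the benchmark reports: they are latency percentiles over the sorted sample set. A small sketch using the nearest-rank method (the actual py_unified_bench.erl may compute them differently):

```python
def percentiles(latencies_us, points=(50, 90, 99, 99.9)):
    """Nearest-rank percentiles over a list of latency samples.
    p999 is the 99.9th percentile, i.e. the tail latency that only
    1 in 1000 requests exceeds."""
    s = sorted(latencies_us)
    out = {}
    for p in points:
        rank = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
        out[f"p{str(p).replace('.', '')}"] = s[rank]
    return out
```

Tail percentiles are the interesting numbers for an event loop: a good p50 with a bad p999 usually points at contention or GC pauses rather than raw overhead.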
Owner
Author
Benchmark Results

Ran the new unified architecture benchmarks. The unified event loop handles async operations well. Sync calls hit ~71K/sec, which is solid given GIL constraints. Async has more overhead (~16K/sec) but provides non-blocking behavior and good tail latencies. Concurrent throughput scales nicely up to ~50 workers before contention kicks in.
- Update waiter field type spec to match actual 4-tuple storage - Fix pattern match in handle_msg for DOWN message - Update test_error_handling to accept flexible error formats
- Fix wait_loop escaping when cancel_timer arrives: inline timer cancellation instead of calling handle_cancel_timer, which tail-calls loop/1 and exits wait mode, causing poll to hang indefinitely
- Fix async_gather mailbox leak: drain remaining py_result/py_error messages when an early error occurs to prevent leftover messages in the caller's mailbox
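The mailbox-drain fix follows a common pattern: when bailing out early, consume any results you are still owed so they do not linger in the caller's mailbox. A hedged Python sketch of that drain step (a queue stands in for the Erlang mailbox; the names are illustrative):

```python
import queue

def drain_pending(mailbox, pending_refs):
    """On an early error, pull any remaining py_result/py_error messages
    for refs still outstanding, so they don't leak to the caller."""
    while pending_refs:
        try:
            tag, ref, _payload = mailbox.get_nowait()
        except queue.Empty:
            break
        if tag in ("py_result", "py_error"):
            pending_refs.discard(ref)
```

Without this step, a result arriving after the error would sit in the mailbox and could be mistaken for the reply to a later, unrelated call.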
- py_asgi:run_async/5: use Opts parameter for custom runner
- py_event_loop.c: fix OOM cleanup to return ALL events to freelist
- py_async_driver: cache event_proc pid in persistent_term for fast lookup
- py_event_loop_proc: simplify handle_msg DOWN, add dialyzer nowarn
Summary
- Combined handle_fd_event_and_reselect/2 NIF that reduces NIF call overhead
- Snapshot-under-lock in py_run_once for reduced lock contention

Also adds test/py_event_loop_bench.erl for measuring event throughput.

🤖 Generated with Claude Code