Commit 4bf88eb

src: implement a prototype for uv coroutines
Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode/Open 4.6

src: add async context tracking for uv coroutines
src: handle microtask/nextTick draining in coroutines
src: fill in more details of the coroutine impl
src: add coroutine README.md
src: improve performance of coroutine implementation
src: update coro readme details
src: use per-env cached strings for coroutine type names
1 parent f3633ef commit 4bf88eb

File tree

13 files changed

+2434
-108
lines changed

node.gyp

Lines changed: 6 additions & 2 deletions
@@ -217,8 +217,12 @@
     'src/cleanup_queue.h',
     'src/cleanup_queue-inl.h',
     'src/compile_cache.h',
-    'src/connect_wrap.h',
-    'src/connection_wrap.h',
+    'src/connect_wrap.h',
+    'src/connection_wrap.h',
+    'src/coro/uv_task.h',
+    'src/coro/uv_tracked_task.h',
+    'src/coro/uv_awaitable.h',
+    'src/coro/uv_promise.h',
     'src/cppgc_helpers.h',
     'src/cppgc_helpers.cc',
     'src/dataqueue/queue.h',

src/api/callback.cc

Lines changed: 15 additions & 8 deletions
@@ -22,7 +22,7 @@ using v8::Value;
 CallbackScope::CallbackScope(Isolate* isolate,
                              Local<Object> object,
                              async_context async_context)
-    : CallbackScope(Environment::GetCurrent(isolate), object, async_context) {}
+    : CallbackScope(Environment::GetCurrent(isolate), object, async_context) {}

 CallbackScope::CallbackScope(Environment* env,
                              Local<Object> object,
@@ -52,8 +52,7 @@ CallbackScope::CallbackScope(Environment* env,
 }

 CallbackScope::~CallbackScope() {
-  if (try_catch_.HasCaught())
-    private_->MarkAsFailed();
+  if (try_catch_.HasCaught()) private_->MarkAsFailed();
   delete private_;
 }

@@ -86,7 +85,15 @@ InternalCallbackScope::InternalCallbackScope(
   } else {
     object = std::get<Global<Object>*>(object_arg);
   }
-  std::visit([](auto* ptr) { CHECK_NOT_NULL(ptr); }, object);
+  // Global<Object>* may be null when no resource object was created
+  // (e.g., coroutine tasks when async_hooks are not active).
+  // push_async_context already handles the null case by skipping the
+  // native_execution_async_resources_ store.
+  if (auto* gptr = std::get_if<Global<Object>*>(&object)) {
+    CHECK_IMPLIES(*gptr != nullptr, !(*gptr)->IsEmpty());
+  } else {
+    std::visit([](auto* ptr) { CHECK_NOT_NULL(ptr); }, object);
+  }

   env->PushAsyncCallbackScope();

@@ -217,8 +224,7 @@ MaybeLocal<Value> InternalMakeCallback(Environment* env,
                                        Local<Value> context_frame) {
   CHECK(!recv.IsEmpty());
 #ifdef DEBUG
-  for (int i = 0; i < argc; i++)
-    CHECK(!argv[i].IsEmpty());
+  for (int i = 0; i < argc; i++) CHECK(!argv[i].IsEmpty());
 #endif

   Local<Function> hook_cb = env->async_hooks_callback_trampoline();
@@ -231,8 +237,9 @@ MaybeLocal<Value> InternalMakeCallback(Environment* env,
     flags = InternalCallbackScope::kSkipAsyncHooks;
     use_async_hooks_trampoline =
         async_hooks->fields()[AsyncHooks::kBefore] +
-        async_hooks->fields()[AsyncHooks::kAfter] +
-        async_hooks->fields()[AsyncHooks::kUsesExecutionAsyncResource] > 0;
+        async_hooks->fields()[AsyncHooks::kAfter] +
+        async_hooks->fields()[AsyncHooks::kUsesExecutionAsyncResource] >
+        0;
   }

   InternalCallbackScope scope(

src/coro/README.md

Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
# C++20 Coroutine support for libuv

This directory contains an experimental C++20 coroutine layer for writing
asynchronous libuv operations as sequential C++ code using `co_await`.

The primary goal is to allow multi-step async operations (such as
open + stat + read + close) to be written as straight-line C++ instead of
callback chains, while maintaining full integration with Node.js async\_hooks,
AsyncLocalStorage, microtask draining, and environment lifecycle management.

## File overview

* `uv_task.h` -- `UvTask<T>`: The lightweight, untracked coroutine return type.
  No V8 or Node.js dependencies. Suitable for internal C++ coroutines that do
  not need async\_hooks visibility or task queue draining.

* `uv_tracked_task.h` -- `UvTrackedTask<T, Name>`: The fully-integrated
  coroutine return type. Each resume-to-suspend segment is wrapped in an
  `InternalCallbackScope`, giving it the same semantics as any other callback
  entry into Node.js. The `Name` template parameter is a compile-time string
  that identifies the async resource type visible to `async_hooks.createHook()`.

* `uv_awaitable.h` -- Awaitable wrappers for libuv async operations:
  `UvFsAwaitable` (fs operations), `UvFsStatAwaitable` (stat-family),
  `UvWorkAwaitable` (thread pool work), and `UvGetAddrInfoAwaitable`
  (DNS resolution). Each embeds the libuv request struct directly in the
  coroutine frame, avoiding separate heap allocations. Each also exposes a
  `cancelable_req()` method returning the underlying `uv_req_t*` for
  cancellation support during environment teardown.

* `uv_promise.h` -- Helpers for bridging coroutines to JavaScript Promises:
  `MakePromise()`, `ResolvePromise()`, `RejectPromiseWithUVError()`. The
  resolve and reject helpers guard against calling V8 APIs when the
  environment is shutting down (`can_call_into_js()` check).

## Usage

### Basic pattern (binding function)

```cpp
// The coroutine. The return type carries the async resource name as
// a compile-time template argument.
static coro::UvTrackedTask<void, "FSREQPROMISE"> DoAccessImpl(
    Environment* env,
    v8::Global<v8::Promise::Resolver> resolver,
    std::string path,
    int mode) {
  ssize_t result = co_await coro::UvFs(
      env->event_loop(), uv_fs_access, path.c_str(), mode);
  if (result < 0)
    coro::RejectPromiseWithUVError(env, resolver, result, "access",
                                   path.c_str());
  else
    coro::ResolvePromiseUndefined(env, resolver);
}

// The binding entry point (called from JavaScript).
static void Access(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);
  // ... parse args, check permissions ...

  auto resolver = coro::MakePromise(env, args);
  auto task = DoAccessImpl(env, std::move(resolver), path, mode);
  task.InitTracking(env);  // assigns async_id, captures context, emits init
  task.Start();            // begins execution (fire-and-forget)
}
```

### Multi-step operations

Multiple libuv calls within a single coroutine are sequential co\_await
expressions. The intermediate steps (between two co\_await points) are pure C++
with no V8 overhead:

```cpp
static coro::UvTrackedTask<void, "COROREADFILE"> ReadFileImpl(
    Environment* env,
    v8::Global<v8::Promise::Resolver> resolver,
    std::string path) {
  ssize_t fd = co_await coro::UvFs(
      env->event_loop(), uv_fs_open, path.c_str(), O_RDONLY, 0);
  if (fd < 0) { /* reject and co_return */ }

  auto [err, stat] = co_await coro::UvFsStat(
      env->event_loop(), uv_fs_fstat, static_cast<uv_file>(fd));
  // ... read, close, resolve ...
}
```

### Coroutine composition

`UvTask<T>` and `UvTrackedTask<T, Name>` can be co\_awaited from other
coroutines. This allows factoring common operations into reusable helpers:

```cpp
UvTask<ssize_t> OpenFile(uv_loop_t* loop, const char* path, int flags) {
  co_return co_await UvFs(loop, uv_fs_open, path, flags, 0);
}

UvTrackedTask<void, "MYOP"> OuterCoroutine(Environment* env, ...) {
  ssize_t fd = co_await OpenFile(env->event_loop(), path, O_RDONLY);
  // ...
}
```

## Lifecycle

### UvTask (untracked)

`UvTask<T>` uses lazy initialization. The coroutine does not run until it is
either co\_awaited from another coroutine (symmetric transfer) or explicitly
started with `Start()`. When `Start()` is called, the coroutine runs until its
first `co_await`, then control returns to the caller. The coroutine frame
self-destructs when the coroutine completes.

### UvTrackedTask (tracked)

`UvTrackedTask<T, Name>` follows the same lazy/fire-and-forget pattern but
goes through four phases, with `Start()` as the third:

1. **Creation**: The coroutine frame is allocated from the thread-local
   free-list (see "Frame allocator" below). The coroutine is suspended at
   `initial_suspend` (lazy).

2. **`InitTracking(env)`**: Assigns an `async_id`, captures the current
   `async_context_frame` (for AsyncLocalStorage propagation), emits a trace
   event, and registers in the Environment's coroutine task list for
   cancellation during teardown. If async\_hooks listeners are active
   (`kInit > 0` or `kUsesExecutionAsyncResource > 0`), a resource object
   is created for `executionAsyncResource()` and the `init` hook is emitted.
   The type name V8 string is cached per Isolate in
   `IsolateData::static_str_map`, so only the first coroutine of a given
   type per Isolate pays the `String::NewFromUtf8` cost.

3. **`Start()`**: Marks the task as detached (fire-and-forget) and resumes
   the coroutine. Each resume-to-suspend segment is wrapped in an
   `InternalCallbackScope` that provides:
   * async\_hooks `before`/`after` events
   * `async_context_frame` save/restore (AsyncLocalStorage)
   * Microtask and `process.nextTick` draining on close
   * `request_waiting_` counter management for event loop liveness

4. **Completion**: At `final_suspend`, the last `InternalCallbackScope` is
   closed (draining task queues), the async\_hooks `destroy` event is emitted,
   the task is unregistered from the Environment, and the coroutine frame is
   returned to the thread-local free-list. If a detached coroutine has a
   captured C++ exception that was never observed, `std::terminate()` is
   called rather than silently discarding it.

## How the awaitable dispatch works

The `UvFs()` factory function returns a `UvFsAwaitable` that embeds a `uv_fs_t`
directly in the coroutine frame. When the coroutine hits `co_await`:

1. `await_transform()` on the promise wraps it in a `TrackedAwaitable`.
2. `TrackedAwaitable::await_suspend()`:
   * Closes the current `InternalCallbackScope` (drains microtasks/nextTick).
   * Records the `uv_req_t*` for cancellation support (via `cancelable_req()`).
   * Increments `request_waiting_` (event loop liveness).
   * Calls the inner `await_suspend()`, which dispatches the libuv call with
     `req_.data = this` pointing back to the awaitable.
3. The coroutine is suspended. Control returns to the event loop.
4. When the libuv operation completes, `OnComplete()` calls
   `handle_.resume()` to resume the coroutine.
5. `TrackedAwaitable::await_resume()`:
   * Decrements `request_waiting_`.
   * Clears the cancellation pointer.
   * Opens a new `InternalCallbackScope` for the next segment.
   * Returns the result (e.g., `req_.result` for fs operations).

The liveness counter and cancellation tracking are conditional on the inner
awaitable having a `cancelable_req()` method (checked at compile time via a
`requires` expression). When co\_awaiting another `UvTask` or `UvTrackedTask`
(coroutine composition), these steps are skipped.
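
The compile-time branch described above can be illustrated with a small standalone sketch. This is not the code from `uv_tracked_task.h`. `FakeReq`, `FakeFsAwaitable`, `FakeTaskAwaitable`, `HasCancelableReq`, and `Tracker` are hypothetical names made up for illustration. The sketch also swaps the `requires` expression mentioned above for the C++17 detection idiom (`std::void_t`), so that it builds on pre-C++20 toolchains. In C++20 the same check could presumably be written as `requires(T& t) { t.cancelable_req(); }`.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for uv_req_t; the real type comes from libuv.
struct FakeReq {
  int id = 0;
};

// Models an awaitable that wraps a libuv request and can be cancelled.
struct FakeFsAwaitable {
  FakeReq req_;
  FakeReq* cancelable_req() { return &req_; }
};

// Models co_awaiting another task: there is no request to cancel.
struct FakeTaskAwaitable {};

// Detection idiom: true only if T has a callable cancelable_req() member.
template <typename T, typename = void>
struct HasCancelableReq : std::false_type {};

template <typename T>
struct HasCancelableReq<
    T, std::void_t<decltype(std::declval<T&>().cancelable_req())>>
    : std::true_type {};

static_assert(HasCancelableReq<FakeFsAwaitable>::value,
              "request-backed awaitable should be detected");
static_assert(!HasCancelableReq<FakeTaskAwaitable>::value,
              "task awaitable should not be detected");

// Models only the bookkeeping part of the suspend/resume steps.
struct Tracker {
  int request_waiting = 0;
  FakeReq* current_req = nullptr;

  template <typename Inner>
  void OnSuspend(Inner& inner) {
    if constexpr (HasCancelableReq<Inner>::value) {
      current_req = inner.cancelable_req();  // record for cancellation
      ++request_waiting;                     // keep the loop alive
    }
  }

  template <typename Inner>
  void OnResume(Inner&) {
    if constexpr (HasCancelableReq<Inner>::value) {
      --request_waiting;
      current_req = nullptr;  // nothing left to cancel
    }
  }
};
```

With this shape, suspending on a `FakeTaskAwaitable` leaves both fields untouched, while suspending on a `FakeFsAwaitable` records the request and bumps the counter until the matching resume.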

## Environment teardown

During `Environment::CleanupHandles()`, the coroutine task list is iterated and
`Cancel()` is called on each active task. This calls `uv_cancel()` on the
in-flight libuv request (if any), which causes the libuv callback to fire with
`UV_ECANCELED`. The coroutine resumes, sees the error, and completes normally.
The `request_waiting_` counter ensures the teardown loop waits for all
coroutine I/O to finish before destroying the Environment.

## Frame allocator

Coroutine frames are allocated from a thread-local free-list rather than going
through `malloc`/`free` on every creation and destruction. This is implemented
via `promise_type::operator new` and `operator delete` in `TrackedPromiseBase`,
which route through `CoroFrameAlloc()` and `CoroFrameFree()`.

The free-list uses size-class buckets with 256-byte granularity, covering
frames up to 4096 bytes (which covers typical coroutine frames). Frames larger
than 4096 bytes fall through to the global `operator new`. Since all coroutines
run on the event loop thread, the free-list requires no locking.

Each bucket has a high-water mark of 32 cached frames. When a frame is freed
and its bucket is already at capacity, the frame is returned directly to the
system allocator instead of being cached. This bounds the retained memory
per bucket to at most 32 \* bucket\_size bytes (e.g., 32 \* 1024 = 32KB for the
1024-byte size class), preventing unbounded growth after a burst of concurrent
coroutines.

After the first coroutine of a given size class completes, subsequent
coroutines of the same size class are allocated from the free-list with zero
`malloc` overhead.
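
To make the bucketing and the retention bound concrete, here is a minimal standalone sketch of a size-class free-list using the parameters stated above (256-byte granularity, 4096-byte ceiling, 32 cached frames per bucket). It is not the implementation behind `CoroFrameAlloc()`/`CoroFrameFree()`. The `sketch` namespace, `FrameFreeList`, `BucketIndex`, `BucketSize`, and `MaxRetainedBytes` are hypothetical names, and the 16-bucket layout is an assumption derived from 4096 / 256. For brevity each bucket is a `std::vector`, which itself allocates. A production free-list would more likely link freed frames through their own storage and live in a `thread_local` variable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

namespace sketch {

constexpr std::size_t kGranularity = 256;
constexpr std::size_t kMaxFrameSize = 4096;
constexpr std::size_t kNumBuckets = kMaxFrameSize / kGranularity;  // assumed 16
constexpr std::size_t kMaxCachedPerBucket = 32;

// Bucket 0 holds sizes 1..256, bucket 1 holds 257..512, and so on.
constexpr std::size_t BucketIndex(std::size_t size) {
  return size == 0 ? 0 : (size - 1) / kGranularity;
}

// Every frame in a bucket is allocated at the bucket's full size.
constexpr std::size_t BucketSize(std::size_t index) {
  return (index + 1) * kGranularity;
}

class FrameFreeList {
 public:
  FrameFreeList() = default;
  FrameFreeList(const FrameFreeList&) = delete;
  FrameFreeList& operator=(const FrameFreeList&) = delete;

  ~FrameFreeList() {
    for (auto& bucket : buckets_)
      for (void* p : bucket) ::operator delete(p);
  }

  void* Alloc(std::size_t size) {
    // Oversized frames bypass the cache entirely.
    if (size > kMaxFrameSize) return ::operator new(size);
    const std::size_t index = BucketIndex(size);
    auto& bucket = buckets_[index];
    if (!bucket.empty()) {
      void* p = bucket.back();  // free-list hit: no system allocation
      bucket.pop_back();
      return p;
    }
    return ::operator new(BucketSize(index));
  }

  void Free(void* p, std::size_t size) {
    assert(p != nullptr);
    if (size > kMaxFrameSize) {
      ::operator delete(p);
      return;
    }
    auto& bucket = buckets_[BucketIndex(size)];
    if (bucket.size() >= kMaxCachedPerBucket) {
      ::operator delete(p);  // bucket at its high-water mark
      return;
    }
    bucket.push_back(p);
  }

  // Number of frames currently cached for the size class of `size`.
  std::size_t Cached(std::size_t size) const {
    if (size > kMaxFrameSize) return 0;
    return buckets_[BucketIndex(size)].size();
  }

 private:
  std::array<std::vector<void*>, kNumBuckets> buckets_;
};

// Worst case bytes retained if every bucket is full.
constexpr std::size_t MaxRetainedBytes() {
  std::size_t total = 0;
  for (std::size_t i = 0; i < kNumBuckets; ++i)
    total += kMaxCachedPerBucket * BucketSize(i);
  return total;
}

}  // namespace sketch
```

Under the 16-bucket assumption, the worst case retained per thread is 32 \* 256 \* (1 + 2 + ... + 16) = 1,114,112 bytes, a little over 1 MiB. The real figure depends on how the actual allocator lays out its buckets.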

## Allocation comparison with ReqWrap

For a single async operation (e.g., `fsPromises.access`):

|                      | ReqWrap pattern | Coroutine (no hooks) | Coroutine (hooks active) |
| -------------------- | --------------- | -------------------- | ------------------------ |
| C++ heap allocations | 3               | 0 (free-list hit)    | 0 (free-list hit)        |
| V8 heap objects      | 7               | 2 (resolver+promise) | 3 (+ resource object)    |
| Total allocations    | 10              | 2                    | 3                        |

For a multi-step operation (open + stat + read + close):

|                               | 4x ReqWrap | Single coroutine (no hooks) | Single coroutine (hooks active) |
| ----------------------------- | ---------- | --------------------------- | ------------------------------- |
| C++ heap allocations          | 12         | 0 (free-list hit)           | 0 (free-list hit)               |
| V8 heap objects               | 28         | 2                           | 3                               |
| Total allocations             | 40         | 2                           | 3                               |
| InternalCallbackScope entries | 4          | 5 (one per segment)         | 5                               |

The coroutine frame embeds the `uv_fs_t` (\~440 bytes) directly. The compiler
may overlay non-simultaneously-live awaitables in the frame, so a multi-step
coroutine does not necessarily pay N times the `uv_fs_t` cost.

## Known limitations

* **Heap snapshot visibility**: The coroutine frame is not visible to V8 heap
  snapshots or `MemoryRetainer`. The thread-local free-list allocator reduces
  malloc pressure but does not provide V8 with per-frame memory accounting.
  The exact frame contents are not inspectable from heap snapshot tooling.

* **Snapshot serialization**: `UvTrackedTask` holds `v8::Global` handles that
  cannot be serialized into a startup snapshot. There is currently no safety
  check to prevent snapshotting while coroutines are active. In practice this
  is not a problem because snapshots are taken at startup before I/O begins.

* **Trace event names**: The existing `AsyncWrap` trace events use a
  `ProviderType` enum with a switch statement to select the trace event name.
  The coroutine pattern uses free-form string names. The `init` trace event
  uses the provided name; the `destroy` trace event currently uses a generic
  `"coroutine"` category name rather than the per-instance name.

* **Free-list retention**: The thread-local free-list retains up to 32 frames
  per size class bucket after a burst of concurrent coroutines. These frames
  are held until reused or the thread exits. The bound is configurable via
  `kMaxCachedPerBucket`.

* **Cached type name strings**: The type name `v8::Eternal<v8::String>` is
  cached in `IsolateData::static_str_map`, keyed by the `const char*` from
  the `ConstString` template parameter. This is per-Isolate and safe with
  Worker threads (each Worker has its own `IsolateData`). The Eternal handles
  are never freed, but there is at most one per unique type name string per
  Isolate.
