| 1 | +# C++20 Coroutine support for libuv |
| 2 | + |
| 3 | +This directory contains an experimental C++20 coroutine layer for writing |
| 4 | +asynchronous libuv operations as sequential C++ code using `co_await`. |
| 5 | + |
| 6 | +The primary goal is to allow multi-step async operations (such as |
| 7 | +open + stat + read + close) to be written as straight-line C++ instead of |
| 8 | +callback chains, while maintaining full integration with Node.js async\_hooks, |
| 9 | +AsyncLocalStorage, microtask draining, and environment lifecycle management. |
| 10 | + |
| 11 | +## File overview |
| 12 | + |
| 13 | +* `uv_task.h` -- `UvTask<T>`: The lightweight, untracked coroutine return type. |
| 14 | + No V8 or Node.js dependencies. Suitable for internal C++ coroutines that do |
| 15 | + not need async\_hooks visibility or task queue draining. |
| 16 | + |
| 17 | +* `uv_tracked_task.h` -- `UvTrackedTask<T, Name>`: The fully-integrated |
| 18 | + coroutine return type. Each resume-to-suspend segment is wrapped in an |
| 19 | + `InternalCallbackScope`, giving it the same semantics as any other callback |
| 20 | + entry into Node.js. The `Name` template parameter is a compile-time string |
| 21 | + that identifies the async resource type visible to `async_hooks.createHook()`. |
| 22 | + |
| 23 | +* `uv_awaitable.h` -- Awaitable wrappers for libuv async operations: |
| 24 | + `UvFsAwaitable` (fs operations), `UvFsStatAwaitable` (stat-family), |
| 25 | + `UvWorkAwaitable` (thread pool work), and `UvGetAddrInfoAwaitable` |
| 26 | + (DNS resolution). Each embeds the libuv request struct directly in the |
| 27 | + coroutine frame, avoiding separate heap allocations. Each also exposes a |
| 28 | + `cancelable_req()` method returning the underlying `uv_req_t*` for |
| 29 | + cancellation support during environment teardown. |
| 30 | + |
| 31 | +* `uv_promise.h` -- Helpers for bridging coroutines to JavaScript Promises: |
| 32 | + `MakePromise()`, `ResolvePromise()`, `RejectPromiseWithUVError()`. The |
| 33 | + resolve and reject helpers guard against calling V8 APIs when the |
| 34 | + environment is shutting down (`can_call_into_js()` check). |
| 35 | + |
| 36 | +## Usage |
| 37 | + |
| 38 | +### Basic pattern (binding function) |
| 39 | + |
| 40 | +```cpp |
| 41 | +// The coroutine. The return type carries the async resource name as |
| 42 | +// a compile-time template argument. |
| 43 | +static coro::UvTrackedTask<void, "FSREQPROMISE"> DoAccessImpl( |
| 44 | + Environment* env, |
| 45 | + v8::Global<v8::Promise::Resolver> resolver, |
| 46 | + std::string path, |
| 47 | + int mode) { |
| 48 | + ssize_t result = co_await coro::UvFs( |
| 49 | + env->event_loop(), uv_fs_access, path.c_str(), mode); |
| 50 | + if (result < 0) |
| 51 | + coro::RejectPromiseWithUVError(env, resolver, result, "access", |
| 52 | + path.c_str()); |
| 53 | + else |
| 54 | + coro::ResolvePromiseUndefined(env, resolver); |
| 55 | +} |
| 56 | + |
| 57 | +// The binding entry point (called from JavaScript). |
| 58 | +static void Access(const FunctionCallbackInfo<Value>& args) { |
| 59 | + Environment* env = Environment::GetCurrent(args); |
| 60 | + // ... parse args, check permissions ... |
| 61 | + |
| 62 | + auto resolver = coro::MakePromise(env, args); |
| 63 | + auto task = DoAccessImpl(env, std::move(resolver), path, mode); |
| 64 | + task.InitTracking(env); // assigns async_id, captures context, emits init |
| 65 | + task.Start(); // begins execution (fire-and-forget) |
| 66 | +} |
| 67 | +``` |
| 68 | + |
| 69 | +### Multi-step operations |
| 70 | + |
| 71 | +Multiple libuv calls within a single coroutine are sequential co\_await |
| 72 | +expressions. The intermediate steps (between two co\_await points) are pure C++ |
| 73 | +with no V8 overhead: |
| 74 | + |
| 75 | +```cpp |
| 76 | +static coro::UvTrackedTask<void, "COROREADFILE"> ReadFileImpl( |
| 77 | + Environment* env, |
| 78 | + v8::Global<v8::Promise::Resolver> resolver, |
| 79 | + std::string path) { |
| 80 | + ssize_t fd = co_await coro::UvFs( |
| 81 | + env->event_loop(), uv_fs_open, path.c_str(), O_RDONLY, 0); |
| 82 | + if (fd < 0) { /* reject and co_return */ } |
| 83 | + |
| 84 | + auto [err, stat] = co_await coro::UvFsStat( |
| 85 | + env->event_loop(), uv_fs_fstat, static_cast<uv_file>(fd)); |
| 86 | + // ... read, close, resolve ... |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +### Coroutine composition |
| 91 | + |
| 92 | +`UvTask<T>` and `UvTrackedTask<T, Name>` can be co\_awaited from other |
| 93 | +coroutines. This allows factoring common operations into reusable helpers: |
| 94 | + |
| 95 | +```cpp |
| 96 | +UvTask<ssize_t> OpenFile(uv_loop_t* loop, const char* path, int flags) { |
| 97 | + co_return co_await UvFs(loop, uv_fs_open, path, flags, 0); |
| 98 | +} |
| 99 | + |
| 100 | +UvTrackedTask<void, "MYOP"> OuterCoroutine(Environment* env, ...) { |
| 101 | + ssize_t fd = co_await OpenFile(env->event_loop(), path, O_RDONLY); |
| 102 | + // ... |
| 103 | +} |
| 104 | +``` |
| 105 | + |
| 106 | +## Lifecycle |
| 107 | + |
| 108 | +### UvTask (untracked) |
| 109 | + |
| 110 | +`UvTask<T>` uses lazy initialization. The coroutine does not run until it is |
| 111 | +either co\_awaited from another coroutine (symmetric transfer) or explicitly |
| 112 | +started with `Start()`. When `Start()` is called, the coroutine runs until its |
| 113 | +first `co_await`, then control returns to the caller. The coroutine frame |
| 114 | +self-destructs when the coroutine completes. |
| 115 | + |
| 116 | +### UvTrackedTask (tracked) |
| 117 | + |
| 118 | +`UvTrackedTask<T, Name>` follows the same lazy/fire-and-forget pattern but |
| 119 | +goes through four phases: |
| 120 | + |
| 121 | +1. **Creation**: The coroutine frame is allocated from the thread-local |
| 122 | + free-list (see "Frame allocator" below). The coroutine is suspended at |
| 123 | + `initial_suspend` (lazy). |
| 124 | + |
| 125 | +2. **`InitTracking(env)`**: Assigns an `async_id`, captures the current |
| 126 | + `async_context_frame` (for AsyncLocalStorage propagation), emits a trace |
| 127 | + event, and registers in the Environment's coroutine task list for |
| 128 | + cancellation during teardown. If async\_hooks listeners are active |
| 129 | + (`kInit > 0` or `kUsesExecutionAsyncResource > 0`), a resource object |
| 130 | + is created for `executionAsyncResource()` and the `init` hook is emitted. |
| 131 | + The type name V8 string is cached per Isolate in |
| 132 | + `IsolateData::static_str_map`, so only the first coroutine of a given |
| 133 | + type per Isolate pays the `String::NewFromUtf8` cost. |
| 134 | + |
| 135 | +3. **`Start()`**: Marks the task as detached (fire-and-forget) and resumes |
| 136 | + the coroutine. Each resume-to-suspend segment is wrapped in an |
| 137 | + `InternalCallbackScope` that provides: |
| 138 | + * async\_hooks `before`/`after` events |
| 139 | + * `async_context_frame` save/restore (AsyncLocalStorage) |
| 140 | + * Microtask and `process.nextTick` draining on close |
| 141 | + * `request_waiting_` counter management for event loop liveness |
| 142 | + |
| 143 | +4. **Completion**: At `final_suspend`, the last `InternalCallbackScope` is |
| 144 | + closed (draining task queues), the async\_hooks `destroy` event is emitted, |
| 145 | + the task is unregistered from the Environment, and the coroutine frame is |
| 146 | + returned to the thread-local free-list. If a detached coroutine has a |
| 147 | + captured C++ exception that was never observed, `std::terminate()` is |
| 148 | + called rather than silently discarding it. |
| 149 | + |
| 150 | +## How the awaitable dispatch works |
| 151 | + |
| 152 | +The `UvFs()` factory function returns a `UvFsAwaitable` that embeds a `uv_fs_t` |
| 153 | +directly in the coroutine frame. When the coroutine hits `co_await`: |
| 154 | + |
| 155 | +1. `await_transform()` on the promise wraps it in a `TrackedAwaitable`. |
| 156 | +2. `TrackedAwaitable::await_suspend()`: |
| 157 | + * Closes the current `InternalCallbackScope` (drains microtasks/nextTick). |
| 158 | + * Records the `uv_req_t*` for cancellation support (via `cancelable_req()`). |
| 159 | + * Increments `request_waiting_` (event loop liveness). |
| 160 | + * Calls the inner `await_suspend()`, which dispatches the libuv call with |
| 161 | + `req_.data = this` pointing back to the awaitable. |
| 162 | +3. The coroutine is suspended. Control returns to the event loop. |
| 163 | +4. When the libuv operation completes, `OnComplete()` calls |
| 164 | + `handle_.resume()` to resume the coroutine. |
| 165 | +5. `TrackedAwaitable::await_resume()`: |
| 166 | + * Decrements `request_waiting_`. |
| 167 | + * Clears the cancellation pointer. |
| 168 | + * Opens a new `InternalCallbackScope` for the next segment. |
| 169 | + * Returns the result (e.g., `req_.result` for fs operations). |
| 170 | + |
| 171 | +The liveness counter and cancellation tracking are conditional on the inner |
| 172 | +awaitable having a `cancelable_req()` method (checked at compile time via a |
| 173 | +`requires` expression). When co\_awaiting another `UvTask` or `UvTrackedTask` |
| 174 | +(coroutine composition), these steps are skipped. |
| 175 | + |
| 176 | +## Environment teardown |
| 177 | + |
| 178 | +During `Environment::CleanupHandles()`, the coroutine task list is iterated and |
| 179 | +`Cancel()` is called on each active task. This calls `uv_cancel()` on the |
| 180 | +in-flight libuv request (if any), which causes the libuv callback to fire with |
| 181 | +`UV_ECANCELED`. The coroutine resumes, sees the error, and completes normally. |
| 182 | +The `request_waiting_` counter ensures the teardown loop waits for all |
| 183 | +coroutine I/O to finish before destroying the Environment. |
| 184 | + |
| 185 | +## Frame allocator |
| 186 | + |
| 187 | +Coroutine frames are allocated from a thread-local free-list rather than going |
| 188 | +through `malloc`/`free` on every creation and destruction. This is implemented |
| 189 | +via `promise_type::operator new` and `operator delete` in `TrackedPromiseBase`, |
| 190 | +which route through `CoroFrameAlloc()` and `CoroFrameFree()`. |
| 191 | + |
| 192 | +The free-list uses size-class buckets with 256-byte granularity for frames up |
| 193 | +to 4096 bytes, which is enough for typical coroutine frames. Frames larger |
| 194 | +than 4096 bytes fall through to the global `operator new`. Since all coroutines |
| 195 | +run on the event loop thread, the free-list requires no locking. |
| 196 | + |
| 197 | +Each bucket has a high-water mark of 32 cached frames. When a frame is freed |
| 198 | +and its bucket is already at capacity, the frame is returned directly to the |
| 199 | +system allocator instead of being cached. This bounds the retained memory |
| 200 | +per bucket to at most 32 \* bucket\_size bytes (e.g., 32 \* 1024 = 32 KiB for the |
| 201 | +1024-byte size class), preventing unbounded growth after a burst of concurrent |
| 202 | +coroutines. |
| 203 | + |
| 204 | +After the first coroutine of a given size class completes, subsequent |
| 205 | +coroutines of the same size class reuse the cached frame instead of calling |
| 206 | +`malloc`, as long as the bucket still holds a free frame. |
| 207 | + |
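A size-class free-list with the parameters described in this section (256-byte granularity, 4096-byte limit, 32 cached frames per bucket) might look roughly like the sketch below. It illustrates the technique only: `SketchFrameAlloc`, `SketchFrameFree`, `BucketIndex`, and the bucket layout are assumptions, not the actual `CoroFrameAlloc()`/`CoroFrameFree()` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t kGranularity = 256;
constexpr std::size_t kMaxFrameSize = 4096;
constexpr std::size_t kNumBuckets = kMaxFrameSize / kGranularity;  // 16
constexpr std::size_t kMaxCachedPerBucket = 32;

struct FreeNode { FreeNode* next; };
struct Bucket {
  FreeNode* head = nullptr;
  std::size_t count = 0;
};

// One set of buckets per thread, so no locking is needed.
thread_local Bucket g_buckets[kNumBuckets];

// Maps 1..256 to bucket 0, 257..512 to bucket 1, and so on.
constexpr std::size_t BucketIndex(std::size_t size) {
  return (size + kGranularity - 1) / kGranularity - 1;
}

void* SketchFrameAlloc(std::size_t size) {
  if (size == 0 || size > kMaxFrameSize) return ::operator new(size);
  Bucket& bucket = g_buckets[BucketIndex(size)];
  if (bucket.head != nullptr) {  // free-list hit: no system allocation
    FreeNode* node = bucket.head;
    bucket.head = node->next;
    --bucket.count;
    return node;
  }
  // Miss: allocate the full bucket size so the block is reusable later.
  return ::operator new((BucketIndex(size) + 1) * kGranularity);
}

void SketchFrameFree(void* ptr, std::size_t size) {
  if (size == 0 || size > kMaxFrameSize) { ::operator delete(ptr); return; }
  Bucket& bucket = g_buckets[BucketIndex(size)];
  if (bucket.count >= kMaxCachedPerBucket) {  // bucket full: release
    ::operator delete(ptr);
    return;
  }
  bucket.head = new (ptr) FreeNode{bucket.head};  // reuse the block as a node
  ++bucket.count;
}
```

A `promise_type` would forward its `operator new(std::size_t)` and sized `operator delete(void*, std::size_t)` to functions of this shape. Frames of 300 and 500 bytes both land in the 512-byte bucket, so freeing one and allocating the other returns the same block.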
| 208 | +## Allocation comparison with ReqWrap |
| 209 | + |
| 210 | +For a single async operation (e.g., `fsPromises.access`): |
| 211 | + |
| 212 | +| | ReqWrap pattern | Coroutine (no hooks) | Coroutine (hooks active) | |
| 213 | +| -------------------- | --------------- | -------------------- | ------------------------ | |
| 214 | +| C++ heap allocations | 3 | 0 (free-list hit) | 0 (free-list hit) | |
| 215 | +| V8 heap objects | 7 | 2 (resolver+promise) | 3 (+ resource object) | |
| 216 | +| Total allocations | 10 | 2 | 3 | |
| 217 | + |
| 218 | +For a multi-step operation (open + stat + read + close): |
| 219 | + |
| 220 | +| | 4x ReqWrap | Single coroutine (no hooks) | Single coroutine (hooks active) | |
| 221 | +| ----------------------------- | ---------- | --------------------------- | ------------------------------- | |
| 222 | +| C++ heap allocations | 12 | 0 (free-list hit) | 0 (free-list hit) | |
| 223 | +| V8 heap objects | 28 | 2 | 3 | |
| 224 | +| Total allocations | 40 | 2 | 3 | |
| 225 | +| InternalCallbackScope entries | 4 | 5 (one per segment) | 5 | |
| 226 | + |
| 227 | +The coroutine frame embeds the `uv_fs_t` (\~440 bytes) directly. The compiler |
| 228 | +may overlay non-simultaneously-live awaitables in the frame, so a multi-step |
| 229 | +coroutine does not necessarily pay N times the `uv_fs_t` cost. |
| 230 | + |
| 231 | +## Known limitations |
| 232 | + |
| 233 | +* **Heap snapshot visibility**: The coroutine frame is not visible to V8 heap |
| 234 | + snapshots or `MemoryRetainer`. The thread-local free-list allocator reduces |
| 235 | + malloc pressure but does not provide V8 with per-frame memory accounting. |
| 236 | + The exact frame contents are not inspectable from heap snapshot tooling. |
| 237 | + |
| 238 | +* **Snapshot serialization**: `UvTrackedTask` holds `v8::Global` handles that |
| 239 | + cannot be serialized into a startup snapshot. There is currently no safety |
| 240 | + check to prevent snapshotting while coroutines are active. In practice this |
| 241 | + is not a problem because snapshots are taken at startup before I/O begins. |
| 242 | + |
| 243 | +* **Trace event names**: The existing `AsyncWrap` trace events use a |
| 244 | + `ProviderType` enum with a switch statement to select the trace event name. |
| 245 | + The coroutine pattern uses free-form string names. The `init` trace event |
| 246 | + uses the provided name; the `destroy` trace event currently uses a generic |
| 247 | + `"coroutine"` category name rather than the per-instance name. |
| 248 | + |
| 249 | +* **Free-list retention**: The thread-local free-list retains up to 32 frames |
| 250 | + per size class bucket after a burst of concurrent coroutines. These frames |
| 251 | + are held until reused or the thread exits. The bound is configurable via |
| 252 | + `kMaxCachedPerBucket`. |
| 253 | + |
| 254 | +* **Cached type name strings**: The type name `v8::Eternal<v8::String>` is |
| 255 | + cached in `IsolateData::static_str_map`, keyed by the `const char*` from |
| 256 | + the `ConstString` template parameter. This is per-Isolate and safe with |
| 257 | + Worker threads (each Worker has its own `IsolateData`). The Eternal handles |
| 258 | + are never freed, but there is at most one per unique type name string per |
| 259 | + Isolate. |