docs: Stdcoroutine-0: Boost.Coroutine to C++20 std::coroutine migration plan #6643

Open
pratikmankawde wants to merge 2 commits into develop from pratik/Swtich-to-std-coroutines

@pratikmankawde pratikmankawde commented Mar 25, 2026

GitHub's PR view doesn't render this doc very well, so feel free to read it on the GitHub branch page instead.

High Level Overview of Change

Adds a comprehensive migration plan document (BoostToStdCoroutineSwitchPlan.md) and task list (BoostToStdCoroutineTaskList.md) for switching rippled from Boost.Coroutine2 (stackful) to C++20 standard coroutines (stackless).

This is PR 0 in the StdCoroutineSwitch chain — it contains only the plan and task-list documents, no code changes.

PR Chain

The implementation PRs are still in draft mode: they are evolving and under review.

| PR | Branch | Description |
| --- | --- | --- |
| #6643 (this) | pratik/Swtich-to-std-coroutines → develop | Migration plan + task list |
| #6421 | pratik/std-coro/add-coroutine-primitives → develop | CoroTask, CoroTaskRunner, JobQueueAwaiter primitives |
| #6423 | pratik/std-coro/migrate-entry-points → add-coroutine-primitives | Migrate HTTP/WS/gRPC entry points to postCoroTask(); remove RPC::Context::coro |
| #6428 | pratik/std-coro/migrate-test-code → migrate-entry-points | Migrate coroutine tests to C++20 API |
| #6429 | pratik/std-coro/cleanup-boost-coroutine → migrate-test-code | Remove Boost::coroutine dependency and legacy Coro API; keep Boost::context for boost::asio::spawn |
| #6525 | pratik/std-coro/tsan-fixes → cleanup-boost-coroutine | TSAN data-race fixes for CoroTaskRunner |

Context of Change

The plan covers:

  • Research & viability analysis: why C++20 stackless coroutines work for rippled's shallow yield pattern
  • Current state audit: all coroutine touchpoints, JobQueue::Coro internals, entry points, handlers
  • Migration strategy: incremental 4-phase approach with coexistence period
  • Implementation design: CoroTask<T>, CoroTaskRunner (core lifecycle primitive), yieldAndPost() (compiler-robust inline awaiter), JobQueueAwaiter (deprecated single-shot awaiter), API mapping
  • CMake / Conan outcome: Boost::coroutine removed, Boost::context retained for boost::asio::spawn, sanitizer defines (BOOST_USE_ASAN / BOOST_USE_TSAN / BOOST_USE_UCONTEXT) added
  • RipplePathFind design: 30-second std::condition_variable bounded wait (trade-off documented; PathFindAwaiter listed as a follow-up)
  • Testing & validation: unit tests, sanitizer testing (ASAN/TSAN), benchmarks, regression methodology
  • Risks & mitigation: risk matrix, rollback strategy, stackful→stackless limitation analysis
  • Standards & guidelines: coroutine design rules, thread safety, naming conventions, code review checklist
  • Follow-ups: PathFindAwaiter, grpc::ServerContext::IsCancelled wiring, migrating boost::asio::spawn callers

API Impact

  • Public API: New feature (new methods and/or new fields)
  • Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • Peer protocol change (must be backward compatible or bump the peer protocol version)

@pratikmankawde pratikmankawde added the StdCoroutineSwitch Boost to Std Coroutine Switch label Mar 25, 2026
@pratikmankawde pratikmankawde changed the title from "Pratik/swtich to std coroutines" to "docs: Pratik/swtich to std coroutines" Mar 25, 2026
@pratikmankawde pratikmankawde changed the title from "docs: Pratik/swtich to std coroutines" to "docs: Stdcoroutine-0: Boost.Coroutine to C++20 std::coroutine migration plan" Mar 25, 2026
@pratikmankawde pratikmankawde marked this pull request as ready for review March 25, 2026 15:02

@xrplf-ai-reviewer xrplf-ai-reviewer Bot left a comment

Three issues flagged inline: a year typo in the document header, and two design gaps in the gRPC migration path — CallData lifetime analysis is missing (potential use-after-free), and ServerContext cancellation propagation is unaddressed for suspended coroutines.

Review by Claude Opus 4.6 · Prompt: V12

Comment thread BoostToStdCoroutineSwitchPlan.md

- A client (e.g., a wallet app) sends an RPC request to the rippled server.
- The server wraps the request in a coroutine and schedules it on a worker thread from the JobQueue.
- The handler processes the request. Most handlers finish immediately and return a response.

Plan gap: CallData ownership chain under C++20 not analyzed — potential use-after-free risk.

In the Boost model, shared_ptr<Coro> inside the lambda ensures CallData outlives the coroutine. With C++20, if the gRPC completion queue fires and destroys CallData while the coroutine frame still holds a reference (via RPC::Context), this is a use-after-free — the exact dangling reference risk from Concern 5, but unaddressed for the gRPC code path.

Suggested addition in Milestone 2, task 2.3: Explicitly audit CallData object lifetime relative to the CoroTaskRunner frame. Ensure CallData is kept alive (e.g., via shared_from_this() or explicit capture) for the full coroutine duration. Add a TSAN/ASAN test specifically for gRPC request lifetime.

See: gRPC
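
The lifetime concern above can be illustrated with a small sketch, under the assumption that the per-request object is held by `std::shared_ptr`. `SketchCallData`, `makeResumeStep`, and `runLifetimeSketch` are hypothetical stand-ins, not rippled's CallData; the deferred step plays the role of the suspended coroutine and captures a `shared_ptr` obtained from `shared_from_this()`, so the object outlives the reference dropped by the completion queue.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-request object such as CallData.
class SketchCallData : public std::enable_shared_from_this<SketchCallData>
{
public:
    explicit SketchCallData(std::string request) : request_(std::move(request)) {}

    // Returns the deferred "resume" step. The lambda captures a shared_ptr to
    // this object, so the object stays alive until the step is released.
    std::function<std::string()> makeResumeStep()
    {
        auto self = shared_from_this();
        return [self]() { return "handled:" + self->request_; };
    }

private:
    std::string request_;
};

std::string runLifetimeSketch()
{
    std::function<std::string()> step;
    std::weak_ptr<SketchCallData> observer;
    {
        auto call = std::make_shared<SketchCallData>("ping");
        observer = call;
        step = call->makeResumeStep();
    }  // the "completion queue" drops its reference here

    assert(!observer.expired());  // still alive: the pending step owns it
    std::string result = step();  // safe to touch the request data
    step = nullptr;               // releasing the step releases the last owner
    assert(observer.expired());
    return result;
}
```

Capturing a raw `this` or a reference in the deferred step instead would compile just as well, which is why an ASAN run over this path is worth having.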

e.g. doRipplePathFind`"]
YIELD["`**coro.yield()**
Suspends execution`"]
RESUME["`**coro.post()**

Plan gap: gRPC ServerContext cancellation propagation not addressed in the migration.

When CallData::process() is migrated to postCoroTask() in Phase 2 (task 2.3), there is no discussion of what happens if the gRPC client disconnects or times out while the coroutine is suspended (e.g., during pathfinding). The coroutine will resume on the JobQueue with no awareness of cancellation — wasting resources and potentially writing to a dead stream.

Suggested addition in Phase 2, task 2.3: Document whether grpc::ServerContext* is threaded through RPC::Context. If so, add a cancellation check in JobQueueAwaiter::await_suspend() or at the co_await resume point: if grpc_context->IsCancelled(), finish the call with grpc::StatusCode::CANCELLED rather than continuing into the handler body.

See: gRPC
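
A rough sketch of the resume-point check suggested above, written without a gRPC dependency. `SketchServerContext` and `SketchStatus` are hypothetical stand-ins for grpc::ServerContext and grpc::StatusCode, and `resumeRequest` models whatever code runs first when the suspended request is rescheduled; how the real context would reach that point is exactly the open question in the comment.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; the real types would be grpc::ServerContext
// and grpc::StatusCode.
enum class SketchStatus { Ok, Cancelled };

struct SketchServerContext
{
    std::atomic<bool> cancelled{false};
    bool IsCancelled() const { return cancelled.load(); }
};

// Models the code that runs right after a suspended request is resumed.
// handlerRuns counts how often the (expensive) handler body executes.
SketchStatus resumeRequest(SketchServerContext const& ctx, int& handlerRuns)
{
    if (ctx.IsCancelled())
        return SketchStatus::Cancelled;  // client is gone: skip the handler body
    ++handlerRuns;
    return SketchStatus::Ok;
}

int runCancellationSketch()
{
    SketchServerContext ctx;
    int handlerRuns = 0;
    SketchStatus first = resumeRequest(ctx, handlerRuns);
    ctx.cancelled.store(true);  // simulate a client disconnect while suspended
    SketchStatus second = resumeRequest(ctx, handlerRuns);
    assert(first == SketchStatus::Ok);
    assert(second == SketchStatus::Cancelled);
    return handlerRuns;  // the handler body ran only for the live request
}
```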

@pratikmankawde pratikmankawde force-pushed the pratik/Swtich-to-std-coroutines branch from 255ecc1 to 956c105 Compare March 25, 2026 15:09

@xrplf-ai-reviewer xrplf-ai-reviewer Bot left a comment

Two off-by-one bugs in BUILD.md bash loops and a likely year typo in the plan doc — see inline comments.

Review by Claude Opus 4.6 · Prompt: V12

Comment thread BUILD.md Outdated
Comment thread BUILD.md Outdated
Comment thread BoostToStdCoroutineSwitchPlan.md
@pratikmankawde pratikmankawde force-pushed the pratik/Swtich-to-std-coroutines branch from 956c105 to b78202a Compare March 25, 2026 15:48

@xrplf-ai-reviewer xrplf-ai-reviewer Bot left a comment

Five issues flagged inline: a year typo in the document header, and four architectural gaps in the gRPC migration coverage — missing streaming RPC audit, missing CompletionQueue lifecycle analysis, missing shutdown handling in task 2.3, and a high-severity regression where the FAQ documents a 30-second thread-blocking synchronous wait that contradicts the migration's core goals.

Review by Claude Opus 4.6 · Prompt: V12

Comment thread BoostToStdCoroutineSwitchPlan.md
| `coroutine<void>::push_type` | `JobQueue.h:53` | Yield function type |
| `boost::context::protected_fixedsize_stack(1536 * 1024)` | `Coro.ipp:14` | Stack size configuration |
| `#include <boost/coroutine2/all.hpp>` | `JobQueue.h:11` | Header inclusion |

Section 5.4 lists only the unary gRPC entry point — streaming RPC handlers are not audited. Add a paragraph to Section 5.4 confirming: (a) which rippled proto methods are unary vs streaming, (b) whether any streaming handler calls postCoro() or yield(), and (c) whether streaming handlers use a separate code path unaffected by this migration. Without this, a streaming RPC could silently retain the old Boost path after Phase 4 cleanup removes Coro.

See: context | gRPC

(parallel to postCoro)`"]
P1D["Unit tests for new primitives"]
P1A --> P1B --> P1C --> P1D
end

No design notes on CallData lifecycle with gRPC's CompletionQueue. The plan identifies GRPCServer.cpp:102 as an entry point but doesn't verify that CoroTaskRunner lifetime outlives all CompletionQueue callbacks that reference it, or that coroutine frame ownership is safe across tag firings. Add an analysis tracing: CQ tag posted → process() called → coroutine suspended → CQ tag fires again → coroutine resumed, and confirm no raw coroutine_handle<> is stored in CQ tags without RAII ownership.

See: gRPC

- Replace `m_jobQueue.postCoro(jtCLIENT_RPC, ...)` with `postCoroTask()`
- Update lambda to return `CoroTask<void>` (add `co_return`)
- Update `processSession` to accept new coroutine type

Task 2.3 is missing gRPC shutdown handling. The old Coro::post() returned false when the JobQueue was stopping, letting the CallData handler detect shutdown and call Finish() with an appropriate status. Add a sub-task: Verify that when addJob() returns false during shutdown, the awaiter causes the coroutine to terminate and the gRPC call is finished with grpc::StatusCode::UNAVAILABLE. Write a test that shuts down the JobQueue while a gRPC coroutine is suspended and confirms no RPC hangs indefinitely.

See: gRPC
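
The shutdown behaviour asked for above could look roughly like this. `SketchJobQueue`, `SketchRpcStatus`, and `scheduleOrFinish` are hypothetical and single-threaded; the only property taken from the comment is that `addJob()` returns false once the queue is stopping, in which case the call is finished with an "unavailable" status instead of waiting for a resume that will never happen.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

enum class SketchRpcStatus { Ok, Unavailable };

// Hypothetical stand-in for JobQueue: addJob() refuses work once stopping.
class SketchJobQueue
{
public:
    bool addJob(std::function<void()> job)
    {
        if (stopping_)
            return false;
        jobs_.push(std::move(job));
        return true;
    }
    void stop() { stopping_ = true; }
    void runAll()
    {
        while (!jobs_.empty())
        {
            auto job = std::move(jobs_.front());
            jobs_.pop();
            job();
        }
    }

private:
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
};

// Tries to schedule the resume step. If the queue refuses it, report
// Unavailable so the caller can finish the RPC now instead of hanging.
SketchRpcStatus scheduleOrFinish(SketchJobQueue& queue, bool& resumed)
{
    if (!queue.addJob([&resumed] { resumed = true; }))
        return SketchRpcStatus::Unavailable;
    return SketchRpcStatus::Ok;
}

bool runShutdownSketch()
{
    SketchJobQueue queue;
    bool resumedBefore = false;
    bool resumedAfter = false;

    SketchRpcStatus before = scheduleOrFinish(queue, resumedBefore);
    queue.runAll();
    queue.stop();  // shutdown begins
    SketchRpcStatus after = scheduleOrFinish(queue, resumedAfter);
    queue.runAll();

    return before == SketchRpcStatus::Ok && resumedBefore &&
        after == SketchRpcStatus::Unavailable && !resumedAfter;
}
```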


| # | File | Phase | Purpose |
| --- | ------------------------------------- | ----- | ---------------------------------------- |
| 1 | `include/xrpl/core/CoroTask.h` | 1 | `CoroTask<T>` return type + promise_type |

The FAQ admits blocking a worker thread for up to 30 seconds via std::condition_variable, directly contradicting the migration's goal of freeing threads during suspension and voiding the performance gains claimed in Section 4.4 for this code path. Either implement PathFindAwaiter (task 3.2) to properly suspend the coroutine, or at minimum document this as a known regression and ensure the pathfinding timeout is capped below the gRPC deadline so the thread is guaranteed to be released before the client times out.

See: std::condition_variable | gRPC
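
For reference, a bounded wait of the kind described above can be sketched as follows. `BoundedWait` is a hypothetical helper, not code from the PR; it only demonstrates `std::condition_variable::wait_for` with a predicate, which blocks the calling thread until it is signalled or the timeout elapses. In the real code the signal would arrive from another thread; here it is issued inline so the sketch stays single-threaded.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: blocks the calling thread until signal() is called
// or the timeout elapses. The thread is NOT released while waiting.
class BoundedWait
{
public:
    void signal()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
    }

    // Returns true if signalled, false if the timeout expired first.
    template <class Rep, class Period>
    bool waitFor(std::chrono::duration<Rep, Period> const& timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return done_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};

bool completesInTime()
{
    BoundedWait wait;
    wait.signal();  // the async callback has already fired
    return wait.waitFor(std::chrono::seconds(30));  // returns immediately
}

bool timesOut()
{
    BoundedWait wait;  // nobody ever signals
    return !wait.waitFor(std::chrono::milliseconds(20));
}
```

The second case is the one the comment worries about: for the whole timeout the worker thread is occupied and cannot run other jobs.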


a1q123456 commented Mar 27, 2026

Given that our coroutine use case isn't different from others at all, I think we can use the existing coroutine implementation in boost.asio instead of reinventing the wheel and implementing our own promise and future types.

This approach gives us some benefits:

  1. Properly tested coroutine implementation - boost.asio is used everywhere, and by outsourcing this part to Boost we don't need to scratch our heads over coroutine-related and concurrency-related bugs
  2. We get less code and it shortens development time
  3. Crystal-clear design - co_spawn schedules a coroutine on an executor; boost::asio::detached means fire and forget, boost::asio::use_future returns a std::future so that you can wait for it synchronously

To make it work, we'll need to implement an executor that meets asio's requirements and refactor JobQueue and Coro to use that executor. As the second phase, we refactor to use boost.asio coroutines, and then we can replace Workers with boost::asio::thread_pool.

I propose this plan:

Phase 1:

  1. Implement JobQueueExecutor — Custom Asio executor with execute() that calls addJob(), carrying a JobType for priority. Wraps the function to save/restore LocalValues pointer around invocation.
  2. Refactor JobQueue and Coro to use JobQueueExecutor
  3. Unit tests for JobQueueExecutor. The public API of JobQueue shouldn't change at this point; we only need to ensure the current JobQueue tests pass.

Phase 2: Replace Coro with C++20 coroutines

  1. Remove Coro class and replace postCoro: Delete Coro, Coro.ipp, Coro_create_t, nSuspend_. New method uses co_spawn + JobQueueExecutor
  2. Update RPC::Context: Replace the std::shared_ptr<JobQueue::Coro> coro member with a new mechanism (e.g. the executor, or a yield awaitable)
  3. Update ServerHandler (HTTP RPC): 3 call sites (onRequest(), processSession(Session), processRequest())
  4. Update ServerHandler (WebSocket): 1 call site (onWSMessage() + processSession(WSSession))
  5. Update GRPCServer: 1 call site (CallData::onDone() + CallData::process())
  6. Update RipplePathFind: the most complex change; the yield()/post()/resume() pattern with an async callback needs replacing with co_await on a promise/event
  7. Update existing tests: Coroutine_test.cpp, JobQueue_test.cpp
  8. We may want to rename Coro to something like CoroutineHandle or CoroutineFrame or whatever
  9. Unit tests for the new JobQueue and Coro

Phase 3: Replace Workers with boost::asio::thread_pool

  1. Replace Workers internals — Remove Workers class, replace with thread_pool. Remove Workers::Callback inheritance from JobQueue. Drop setNumberOfThreads.
  2. Clean up Workers tests
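
To make Phase 1, step 1 more concrete, here is a rough sketch of the executor idea with no Boost dependency. It does not satisfy Asio's executor requirements (which also cover things like equality comparison and property queries); `SketchExecutor`, `SketchQueue`, `SketchJobType`, and `currentLocalValues` are hypothetical names. It only shows an `execute()` that forwards to `addJob()` with a job type and saves/restores a thread-local pointer around the invocation.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

enum class SketchJobType { ClientRpc, Other };

// Hypothetical stand-in for the LocalValues pointer that is swapped in
// while a job runs.
thread_local int const* currentLocalValues = nullptr;

// Hypothetical stand-in for JobQueue.
class SketchQueue
{
public:
    void addJob(SketchJobType type, std::function<void()> job)
    {
        jobs_.emplace_back(type, std::move(job));
    }
    void runAll()
    {
        std::vector<Entry> jobs;
        jobs.swap(jobs_);  // jobs may enqueue more work while running
        for (auto& entry : jobs)
            entry.second();
    }

private:
    using Entry = std::pair<SketchJobType, std::function<void()>>;
    std::vector<Entry> jobs_;
};

class SketchExecutor
{
public:
    SketchExecutor(SketchQueue& queue, SketchJobType type, int const* localValues)
        : queue_(&queue), type_(type), localValues_(localValues) {}

    // Wraps f so the local-values pointer is installed before the call and
    // the previous value restored afterwards (not exception-safe; a scope
    // guard would be needed if f can throw).
    template <class F>
    void execute(F&& f) const
    {
        int const* locals = localValues_;
        queue_->addJob(type_, [locals, fn = std::forward<F>(f)]() mutable {
            int const* saved = std::exchange(currentLocalValues, locals);
            fn();
            currentLocalValues = saved;
        });
    }

private:
    SketchQueue* queue_;
    SketchJobType type_;
    int const* localValues_;
};

int runExecutorSketch()
{
    SketchQueue queue;
    int const locals = 42;
    SketchExecutor ex(queue, SketchJobType::ClientRpc, &locals);
    int seen = 0;
    ex.execute([&seen] { seen = *currentLocalValues; });
    queue.runAll();
    assert(currentLocalValues == nullptr);  // restored after the job ran
    return seen;
}
```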

Copilot AI left a comment

Pull request overview

Adds documentation to guide the (now-implemented) migration from Boost.Coroutine2 stackful coroutines to C++20 std::coroutine stackless coroutines in rippled, plus minor spellchecker dictionary updates to support the new docs.

Changes:

  • Add a comprehensive migration plan document with architecture analysis, phased rollout, testing strategy, and guidelines.
  • Add a milestone/task checklist companion document for tracking the migration work.
  • Update cspell dictionary with coroutine/migration terminology and proper nouns used in the docs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cspell.config.yaml | Adds new allowed words referenced by the added migration docs. |
| BoostToStdCoroutineTaskList.md | New milestone-by-milestone task checklist for the coroutine migration effort. |
| BoostToStdCoroutineSwitchPlan.md | New detailed migration plan, background, risk analysis, testing/validation strategy, and guidelines. |

Comment on lines +3 to +8
> **Status:** Implementation Complete
> **Author:** Pratik Mankawde
> **Created:** 2026-02-25
> **Project:** rippled (XRP Ledger node)
> **Branch:** `Switch-to-std-coroutines`
> **Dependencies:** C++20 compiler support (GCC 12+, Clang 16+, MSVC 19.28+)

Copilot AI Mar 27, 2026

The header says "Status: Implementation Complete", but the document still reads like a forward-looking migration plan (phases, future-tense tasks, milestones, rollback strategy). This is internally inconsistent and can mislead readers—either update the status to reflect that this is a plan/living doc, or revise sections (timeline/tasks wording) to reflect a completed migration with outcomes.

> **Author:** Pratik Mankawde
> **Created:** 2026-02-25
> **Project:** rippled (XRP Ledger node)
> **Branch:** `Switch-to-std-coroutines`

Copilot AI Mar 27, 2026

The document hard-codes a branch name ("Switch-to-std-coroutines") in the header. Since branch names can change and the PR metadata indicates a different spelling, consider removing the branch field or ensuring it matches the actual branch name used for this work to avoid stale/incorrect documentation.

Comment on lines +83 to +90
- [ ] **3.1** Migrate `doRipplePathFind()` (`RipplePathFind.cpp`)
- Replace `context.coro->yield()` with `co_await PathFindAwaiter{...}`
- Replace continuation lambda's `coro->post()` / `coro->resume()` with awaiter scheduling
- Handle shutdown case (post failure) in awaiter

- [ ] **3.2** Create `PathFindAwaiter` (or use generic `JobQueueAwaiter`)
- Encapsulate the continuation + yield pattern from `RipplePathFind.cpp` lines 108-132


Copilot AI Mar 27, 2026

This task list still describes the original planned PathFindAwaiter/co_await-based RipplePathFind migration, but the plan document/PR description notes the implementation diverged (condition_variable blocking wait, Context::coro removed, etc.). To keep this checklist useful, update the affected tasks to match the implemented approach (or clearly label this file as the pre-implementation plan checklist).

Comment thread cspell.config.yaml Outdated
Comment on lines +152 to +153
- MEMORYSTATUSEX
- Mankawde

Copilot AI Mar 27, 2026

The cspell words: list appears roughly alphabetized, but the new entry Mankawde is inserted after MEMORYSTATUSEX, which breaks ordering and can increase future merge conflicts. Please place the new word in the appropriate sorted position (or follow whatever ordering rule this list is intended to use).

Suggested change
Before:
- MEMORYSTATUSEX
- Mankawde
After:
- Mankawde
- MEMORYSTATUSEX

github-actions Bot commented Apr 8, 2026

This PR has conflicts, please resolve them in order for the PR to be reviewed.

pratikmankawde and others added 2 commits April 22, 2026 13:18
Comprehensive migration plan documenting the switch from
Boost.Coroutine2 to C++20 standard coroutines in rippled, including
research analysis, implementation phases, risk assessment, and
testing strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
@pratikmankawde pratikmankawde force-pushed the pratik/Swtich-to-std-coroutines branch from b78202a to c85cd77 Compare April 22, 2026 13:14
@github-actions

All conflicts have been resolved. Assigned reviewers can now start or resume their review.

@xrplf-ai-reviewer xrplf-ai-reviewer Bot left a comment

No issues.

Review by Claude Opus 4.6 · Prompt: V15
