[High Priority] Implement GraphQL WebSocket Handler CDC callback safety#4310
[High Priority] Implement GraphQL WebSocket Handler CDC callback safety#4310
Conversation
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
|
@copilot retry |
|
@makr-code retry |
|
@copilot retry |
…or CDC callback safety Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Done. All implementation items are now complete (commit
|
|
@copilot Review und Code Audit |
There was a problem hiding this comment.
Pull request overview
Implements the v1.8.0 roadmap item for GraphQL WebSocket Handler CDC callback lifetime safety, and adds a missing “step 2” validation pass for GraphQL subscription variables.
Changes:
- Adds an
alive_lifetime flag to guard CDC callback lambdas against dereferencing a destroyedGraphQLWsHandler. - Adds
validateVariables()and wires it intohandleSubscribe()to enforce basic schema-level variable presence/null/scalar/list checks. - Extends unit tests, adds a focused CI workflow, and marks the corresponding FUTURE_ENHANCEMENTS items as completed.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/api/graphql_ws_handler.cpp |
Adds alive_ guarding + variable validation logic inside handleSubscribe(). |
include/api/graphql_ws_handler.h |
Declares alive_ and the new validateVariables() helper. |
tests/test_graphql_ws_handler.cpp |
Adds tests intended to cover reset safety and variable validation branches. |
src/api/FUTURE_ENHANCEMENTS.md |
Marks the roadmap implementation notes as completed. |
.github/workflows/graphql-ws-handler-cdc-callback-safety-ci.yml |
Adds a dedicated CI workflow to build and run the focused test suite. |
Comments suppressed due to low confidence (2)
src/api/graphql_ws_handler.cpp:68
reset()clears subscriptions and marks the connection disconnected, but it does not clearpending_frames_. If any CDC callback queued frames before/during reset, those stale frames can be flushed on a laterhandleFrame()call (including a newconnection_init). Clearingpending_frames_under the same mutex inreset()would ensure reset truly returns to an empty state.
alive_->store(false, std::memory_order_release);
std::lock_guard<std::mutex> lock(mutex_);
subscriptions_.clear();
connected_.store(false, std::memory_order_relaxed);
}
src/api/graphql_ws_handler.cpp:68
alive_is set to false inreset()and never set back to true. SincehandleFrame()still allows a newconnection_initafterreset(), any subsequent CDC subscriptions created on the same handler instance will capture analive_that is permanently false and will never deliver events. Either make the handler explicitly unusable afterreset()(reject frames), or reinitialize the lifetime flag for a new session after teardown.
void GraphQLWsHandler::reset() {
// Signal any in-flight CDC callbacks to stop before the subscription
// handles (and their associated RAII teardown) are destroyed. This
// prevents a use-after-free should the CDC implementation fire the
// callback concurrently with the SubscriptionHandle destructor.
alive_->store(false, std::memory_order_release);
std::lock_guard<std::mutex> lock(mutex_);
subscriptions_.clear();
connected_.store(false, std::memory_order_relaxed);
}
You can also share your feedback on Copilot code review. Take the survey.
| // reset() must not crash even when subscriptions are active (alive-flag path). | ||
| TEST_F(GraphQLWsHandlerTest, ResetWithActiveSubscriptionDoesNotCrash) { | ||
| doHandshake(); | ||
| const std::string q = | ||
| "subscription { onChange(collection: \"orders\") { key } }"; | ||
| send({ | ||
| {"type", "subscribe"}, | ||
| {"id", "sub1"}, | ||
| {"payload", {{"query", q}}} | ||
| }); | ||
| EXPECT_EQ(handler->activeSubscriptionCount(), 1u); | ||
| // reset() sets the alive flag to false before clearing subscriptions. | ||
| // No crash or assertion should occur. | ||
| EXPECT_NO_THROW(handler->reset()); | ||
| EXPECT_EQ(handler->activeSubscriptionCount(), 0u); | ||
| EXPECT_FALSE(handler->isConnected()); | ||
| } |
| ci-scope-classifier: | ||
| permissions: | ||
| contents: read | ||
| uses: ./.github/workflows/ci-scope-classifier.yml |
| // ── 2. Schema-level variable type validation ────────────────────────────── | ||
| // (Placed here because it requires the parsed operation.) | ||
| // Verify that every variable value supplied in the payload matches the | ||
| // declared VariableDefinition type. This is the step that was originally | ||
| // planned but omitted from the sequence, leaving the gap between step 1 | ||
| // and the former step 3. |
| // planned but omitted from the sequence, leaving the gap between step 1 | ||
| // and the former step 3. | ||
| { | ||
| const json variables = payload.value("variables", json::object()); |
…oat tests Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Audit complete. 4 findings addressed in commit
|
|
@copilot fix it |
…ll, naming cleanup Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Fixed all 3 PR review findings in commit
|
Description
Implements the roadmap item GraphQL WebSocket Handler — CDC Callback Lifetime Safety (v1.8.0,
roadmap:52:api:v1.8.0:graphql-websocket-handler-cdc-callback-lifetime-safety).CDC Callback Use-After-Free Fix (
alive_lifetime flag)handleSubscribe()was capturing a rawGraphQLWsHandler*(self) in the CDC callback lambda. If the CDC system fired a callback afterSubscriptionHandledestruction, this was a use-after-free. Fix:std::shared_ptr<std::atomic<bool>> alive_member toGraphQLWsHandlerheader.alive_totrue.reset()storesfalsewithmemory_order_releasebeforesubscriptions_.clear(), ensuring any in-flight CDC callback on another thread observes the flag before handler memory is freed.handleSubscribe()capturesaliveby value (shared ownership) and loads it withmemory_order_acquirebefore dereferencingself; returns early iffalse.= delete) — the default move would leave the source'salive_null, causing a null-pointer dereference in the source's destructor (reset()dereferencesalive_).aliveForTesting()public accessor to allow white-box tests to verify the acquire/release memory-ordering contract directly.Step-5 Schema-Level Variable Type Validation
The
handleSubscribe()comment sequence labelled steps 1 and 3 with no step 2 — a planned intermediate validation was missing. Added:validateVariables(const graphql::Operation&, const nlohmann::json&)that checks: required non-null variables are present, non-null variables aren't JSON null, list-typed variables are JSON arrays, and built-in scalar types (String/ID/Int/Float/Boolean) match their declared GraphQL type."variables": nullis now treated as an empty object (same as omitting the field), consistent with standard GraphQL client behaviour.handleSubscribe(), after parse/operation-type validation and before subscription registration.Code Audit Fixes
GraphQLWsHandler(&&) = defaultintroduced a latent null-deref UB; replaced with= deleteon both move constructor and move assignment operator.handleSubscribe()steps renumbered 1–6 in code order (was 1, 3, 4, 5, 2, 6).SubscriptionEntry::activewas always set totruebut never read; removed.Floatbranch ofvalidateVariables()— valid literal float, integer coercion (GraphQL spec allows Int→Float), and wrong type (string rejected).Supporting Changes
tests/test_graphql_ws_handler.cppcovering: alive flag mechanism (3 direct tests viaaliveForTesting()), reset safety, 10 variable validation cases (String/ID/Int/Boolean/null/list), 3 Float type cases, andvariables: nullaccepted as empty..github/workflows/graphql-ws-handler-cdc-callback-safety-ci.yml— corrected reusable workflow path to./.github/workflows/01-core/ci-scope-classifier.yml; targets ubuntu-22.04 (gcc-12, clang-15) and ubuntu-24.04 (gcc-14).src/api/FUTURE_ENHANCEMENTS.md— both implementation notes marked[x].Type of Change
Testing
variables: nullaccepted as empty📚 Research & Knowledge (wenn applicable)
/docs/research/angelegt?/docs/research/implementation_influence/eingetragen?Relevante Quellen:
Checklist
FUTURE_ENHANCEMENTS.mdupdatedOriginal prompt
This section details on the original issue you should resolve
<issue_title>GraphQL WebSocket Handler — CDC Callback Lifetime Safety</issue_title>
<issue_description>### Context
This issue implements the roadmap item 'GraphQL WebSocket Handler — CDC Callback Lifetime Safety' for the api domain. It is sourced from the consolidated roadmap under 🟠 High Priority — Near-term (v1.5.0 – v1.8.0) and targets milestone v1.8.0.
Primary detail section: GraphQL WebSocket Handler — CDC Callback Lifetime Safety
Goal
Deliver the scoped changes for GraphQL WebSocket Handler — CDC Callback Lifetime Safety in src/api/ and complete the linked detail section in a release-ready state for v1.8.0.
Detailed Scope
GraphQL WebSocket Handler — CDC Callback Lifetime Safety
Priority: High
Target Version: v1.8.0
graphql_ws_handler.cpp::handleSubscribe()captures a rawGraphQLWsHandler*(self) pointer inside the CDC callback lambda that is passed toChangefeed::subscribe(). TheSubscriptionHandleRAII type should cancel the subscription on destruction, but the safety of this interaction depends on CDC correctly serialising the callback teardown before the handle destructor returns.Implementation Notes:
[ ]Rawselfpointer captured in CDC callback (graphql_ws_handler.cpp::handleSubscribe()): the lambda[self, sub_id](const themis::Changefeed::ChangeEvent& ev) { ... std::lock_guard<std::mutex> lk(self->mutex_); self->pending_frames_.push_back(frame); }is invoked by the CDC system on its own thread. If the CDC implementation allows callbacks to fire afterSubscriptionHandledestruction (even briefly), this is a use-after-free. Add astd::shared_ptr<std::atomic<bool>>"alive" flag shared between the handler and the lambda; the lambda checks it before dereferencingself, and the flag is set to false inGraphQLWsHandler::reset()before subscriptions are cleared.[ ]Missing step-2 inhandleSubscribe()comment sequence (graphql_ws_handler.cpp): the comment block labels "step 1" (reject duplicate IDs + enforce max_subscriptions) and "step 3" (parse payload), with no "step 2". This indicates a planned intermediate validation step (likely query variable type-checking against the schema) was omitted. Add schema-level argument type validation: verify thatvariablesprovided in the payload match the declaredVariableDefinitiontypes in the parsed operation before registering the subscription.Performance Targets:
Acceptance Criteria
selfpointer captured in CDC callback (graphql_ws_handler.cpp::handleSubscribe()): the lambda[self, sub_id](const themis::Changefeed::ChangeEvent& ev) { ... std::lock_guard<std::mutex> lk(self->mutex_); self->pending_frames_.push_back(frame); }is invoked by the CDC system on its own thread. If the CDC implementation allows callbacks to fire afterSubscriptionHandledestruction (even briefly), this is a use-after-free. Add astd::shared_ptr<std::atomic<bool>>"alive" flag shared between the handler and the lambda; the lambda checks it before dereferencingself, and the flag is set to false inGraphQLWsHandler::reset()before subscriptions are cleared.handleSubscribe()comment sequence (graphql_ws_handler.cpp): the comment block labels "step 1" (reject duplicate IDs + enforce max_subscriptions) and "step 3" (parse payload), with no "step 2". This indicates a planned intermediate validation step (likely query variable type-checking against the schema) was omitted. Add schema-level argument type validation: verify thatvariablesprovided in the payload match the declaredVariableDefinitiontypes in the parsed operation before registering the subscription.Relationships
References
Generated from the consolidated source roadmap. Keep the roadmap and issue in sync when scope changes.
</issue_description>
Comments on the Issue (you are @copilot in this section)
📍 Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.