Skip to content

[High Priority] Implement GraphQL WebSocket Handler CDC callback safety#4310

Merged
makr-code merged 5 commits intodevelopfrom
copilot/v1-8-0-graphql-websocket-handler
Mar 17, 2026
Merged

[High Priority] Implement GraphQL WebSocket Handler CDC callback safety#4310
makr-code merged 5 commits intodevelopfrom
copilot/v1-8-0-graphql-websocket-handler

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 17, 2026

Description

Implements the roadmap item GraphQL WebSocket Handler — CDC Callback Lifetime Safety (v1.8.0, roadmap:52:api:v1.8.0:graphql-websocket-handler-cdc-callback-lifetime-safety).

CDC Callback Use-After-Free Fix (alive_ lifetime flag)

handleSubscribe() was capturing a raw GraphQLWsHandler* (self) in the CDC callback lambda. If the CDC system fired a callback after SubscriptionHandle destruction, this was a use-after-free. Fix:

  • Added std::shared_ptr<std::atomic<bool>> alive_ member to GraphQLWsHandler header.
  • Constructor initialises alive_ to true.
  • reset() stores false with memory_order_release before subscriptions_.clear(), ensuring any in-flight CDC callback on another thread observes the flag before handler memory is freed.
  • CDC lambda in handleSubscribe() captures alive by value (shared ownership) and loads it with memory_order_acquire before dereferencing self; returns early if false.
  • Move constructor and move assignment operator explicitly deleted (= delete) — the default move would leave the source's alive_ null, causing a null-pointer dereference in the source's destructor (reset() dereferences alive_).
  • Added aliveForTesting() public accessor to allow white-box tests to verify the acquire/release memory-ordering contract directly.

Step-5 Schema-Level Variable Type Validation

The handleSubscribe() comment sequence labelled steps 1 and 3 with no step 2 — a planned intermediate validation was missing. Added:

  • Private static helper validateVariables(const graphql::Operation&, const nlohmann::json&) that checks: required non-null variables are present, non-null variables aren't JSON null, list-typed variables are JSON arrays, and built-in scalar types (String/ID/Int/Float/Boolean) match their declared GraphQL type.
  • Explicit "variables": null is now treated as an empty object (same as omitting the field), consistent with standard GraphQL client behaviour.
  • Validation inserted as step 5 in the now-sequential 1–6 step sequence of handleSubscribe(), after parse/operation-type validation and before subscription registration.

Code Audit Fixes

  • Move ctor/assign deleted: GraphQLWsHandler(&&) = default introduced a latent null-deref UB; replaced with = delete on both move constructor and move assignment operator.
  • Sequential step numbering: handleSubscribe() steps renumbered 1–6 in code order (was 1, 3, 4, 5, 2, 6).
  • Dead field removed: SubscriptionEntry::active was always set to true but never read; removed.
  • Float test coverage: Added 3 tests for the Float branch of validateVariables() — valid literal float, integer coercion (GraphQL spec allows Int→Float), and wrong type (string rejected).

Supporting Changes

  • 18 unit tests in tests/test_graphql_ws_handler.cpp covering: alive flag mechanism (3 direct tests via aliveForTesting()), reset safety, 10 variable validation cases (String/ID/Int/Boolean/null/list), 3 Float type cases, and variables: null accepted as empty.
  • CI workflow .github/workflows/graphql-ws-handler-cdc-callback-safety-ci.yml — corrected reusable workflow path to ./.github/workflows/01-core/ci-scope-classifier.yml; targets ubuntu-22.04 (gcc-12, clang-15) and ubuntu-24.04 (gcc-14).
  • src/api/FUTURE_ENHANCEMENTS.md — both implementation notes marked [x].

Type of Change

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Other:

Testing

  • Unit tests added/updated — 18 tests: alive flag mechanism (3 direct acquire/release ordering tests), reset safety, 10 variable validation cases (String/ID/Int/Boolean/null/list), 3 Float type cases, variables: null accepted as empty
  • Integration tests added/updated
  • Manual testing performed

📚 Research & Knowledge (wenn applicable)

  • Diese PR basiert auf wissenschaftlichen Paper(s) oder Best Practices?
    • Falls JA: Research-Dateien in /docs/research/ angelegt?
    • Falls JA: Im Modul-README unter "Wissenschaftliche Grundlagen" verlinkt?
    • Falls JA: In /docs/research/implementation_influence/ eingetragen?

Relevante Quellen:

  • Paper:
  • Best Practice:
  • Architecture Decision:

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed) — FUTURE_ENHANCEMENTS.md updated
  • No new warnings introduced
Original prompt

This section details on the original issue you should resolve

<issue_title>GraphQL WebSocket Handler — CDC Callback Lifetime Safety</issue_title>
<issue_description>### Context

This issue implements the roadmap item 'GraphQL WebSocket Handler — CDC Callback Lifetime Safety' for the api domain. It is sourced from the consolidated roadmap under 🟠 High Priority — Near-term (v1.5.0 – v1.8.0) and targets milestone v1.8.0.

Primary detail section: GraphQL WebSocket Handler — CDC Callback Lifetime Safety

Goal

Deliver the scoped changes for GraphQL WebSocket Handler — CDC Callback Lifetime Safety in src/api/ and complete the linked detail section in a release-ready state for v1.8.0.

Detailed Scope

GraphQL WebSocket Handler — CDC Callback Lifetime Safety

Priority: High
Target Version: v1.8.0

graphql_ws_handler.cpp::handleSubscribe() captures a raw GraphQLWsHandler* (self) pointer inside the CDC callback lambda that is passed to Changefeed::subscribe(). The SubscriptionHandle RAII type should cancel the subscription on destruction, but the safety of this interaction depends on CDC correctly serialising the callback teardown before the handle destructor returns.

Implementation Notes:

  • [ ] Raw self pointer captured in CDC callback (graphql_ws_handler.cpp::handleSubscribe()): the lambda [self, sub_id](const themis::Changefeed::ChangeEvent& ev) { ... std::lock_guard<std::mutex> lk(self->mutex_); self->pending_frames_.push_back(frame); } is invoked by the CDC system on its own thread. If the CDC implementation allows callbacks to fire after SubscriptionHandle destruction (even briefly), this is a use-after-free. Add a std::shared_ptr<std::atomic<bool>> "alive" flag shared between the handler and the lambda; the lambda checks it before dereferencing self, and the flag is set to false in GraphQLWsHandler::reset() before subscriptions are cleared.
  • [ ] Missing step-2 in handleSubscribe() comment sequence (graphql_ws_handler.cpp): the comment block labels "step 1" (reject duplicate IDs + enforce max_subscriptions) and "step 3" (parse payload), with no "step 2". This indicates a planned intermediate validation step (likely query variable type-checking against the schema) was omitted. Add schema-level argument type validation: verify that variables provided in the payload match the declared VariableDefinition types in the parsed operation before registering the subscription.

Performance Targets:

  • Zero use-after-free races under 10,000 concurrent subscription setup/teardown cycles.

Acceptance Criteria

  • Raw self pointer captured in CDC callback (graphql_ws_handler.cpp::handleSubscribe()): the lambda [self, sub_id](const themis::Changefeed::ChangeEvent& ev) { ... std::lock_guard<std::mutex> lk(self->mutex_); self->pending_frames_.push_back(frame); } is invoked by the CDC system on its own thread. If the CDC implementation allows callbacks to fire after SubscriptionHandle destruction (even briefly), this is a use-after-free. Add a std::shared_ptr<std::atomic<bool>> "alive" flag shared between the handler and the lambda; the lambda checks it before dereferencing self, and the flag is set to false in GraphQLWsHandler::reset() before subscriptions are cleared.
  • Missing step-2 in handleSubscribe() comment sequence (graphql_ws_handler.cpp): the comment block labels "step 1" (reject duplicate IDs + enforce max_subscriptions) and "step 3" (parse payload), with no "step 2". This indicates a planned intermediate validation step (likely query variable type-checking against the schema) was omitted. Add schema-level argument type validation: verify that variables provided in the payload match the declared VariableDefinition types in the parsed operation before registering the subscription.
  • Zero use-after-free races under 10,000 concurrent subscription setup/teardown cycles.

Relationships

References

  • src/ROADMAP.md
  • src/api/FUTURE_ENHANCEMENTS.md#graphql-websocket-handler--cdc-callback-lifetime-safety
  • Source key: roadmap:52:api:v1.8.0:graphql-websocket-handler-cdc-callback-lifetime-safety

Generated from the consolidated source roadmap. Keep the roadmap and issue in sync when scope changes.

</issue_description>

Comments on the Issue (you are @copilot in this section)


📍 Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
@makr-code
Copy link
Copy Markdown
Owner

@copilot retry

@makr-code
Copy link
Copy Markdown
Owner

@makr-code retry

@makr-code
Copy link
Copy Markdown
Owner

@copilot retry

…or CDC callback safety

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 17, 2026

@copilot retry

Done. All implementation items are now complete (commit 5799ea1):

  • alive_ lifetime flagshared_ptr<atomic<bool>> added to header; constructor initialises to true; reset() stores false (release) before subscriptions_.clear(); CDC lambda captures alive by value and loads (acquire) before dereferencing self.
  • Step-2 variable validationvalidateVariables() static helper checks required/non-null presence, null legality, list shape, and built-in scalar type matching (String/ID/Int/Float/Boolean); called in handleSubscribe() after parse/operation-type check and before registration.
  • 11 new tests — alive flag reset safety + 10 variable validation cases.
  • CI workflow.github/workflows/graphql-ws-handler-cdc-callback-safety-ci.yml (ubuntu-22.04 gcc-12/clang-15, ubuntu-24.04 gcc-14).
  • FUTURE_ENHANCEMENTS.md — both implementation notes marked [x].

Copilot AI changed the title [WIP] [High Priority] Implement GraphQL WebSocket Handler CDC callback safety [High Priority] Implement GraphQL WebSocket Handler CDC callback safety Mar 17, 2026
Copilot AI requested a review from makr-code March 17, 2026 10:31
@makr-code
Copy link
Copy Markdown
Owner

@copilot Review und Code Audit

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the v1.8.0 roadmap item for GraphQL WebSocket Handler CDC callback lifetime safety, and adds a missing “step 2” validation pass for GraphQL subscription variables.

Changes:

  • Adds an alive_ lifetime flag to guard CDC callback lambdas against dereferencing a destroyed GraphQLWsHandler.
  • Adds validateVariables() and wires it into handleSubscribe() to enforce basic schema-level variable presence/null/scalar/list checks.
  • Extends unit tests, adds a focused CI workflow, and marks the corresponding FUTURE_ENHANCEMENTS items as completed.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/api/graphql_ws_handler.cpp Adds alive_ guarding + variable validation logic inside handleSubscribe().
include/api/graphql_ws_handler.h Declares alive_ and the new validateVariables() helper.
tests/test_graphql_ws_handler.cpp Adds tests intended to cover reset safety and variable validation branches.
src/api/FUTURE_ENHANCEMENTS.md Marks the roadmap implementation notes as completed.
.github/workflows/graphql-ws-handler-cdc-callback-safety-ci.yml Adds a dedicated CI workflow to build and run the focused test suite.
Comments suppressed due to low confidence (2)

src/api/graphql_ws_handler.cpp:68

  • reset() clears subscriptions and marks the connection disconnected, but it does not clear pending_frames_. If any CDC callback queued frames before/during reset, those stale frames can be flushed on a later handleFrame() call (including a new connection_init). Clearing pending_frames_ under the same mutex in reset() would ensure reset truly returns to an empty state.
    alive_->store(false, std::memory_order_release);
    std::lock_guard<std::mutex> lock(mutex_);
    subscriptions_.clear();
    connected_.store(false, std::memory_order_relaxed);
}

src/api/graphql_ws_handler.cpp:68

  • alive_ is set to false in reset() and never set back to true. Since handleFrame() still allows a new connection_init after reset(), any subsequent CDC subscriptions created on the same handler instance will capture an alive_ that is permanently false and will never deliver events. Either make the handler explicitly unusable after reset() (reject frames), or reinitialize the lifetime flag for a new session after teardown.
void GraphQLWsHandler::reset() {
    // Signal any in-flight CDC callbacks to stop before the subscription
    // handles (and their associated RAII teardown) are destroyed.  This
    // prevents a use-after-free should the CDC implementation fire the
    // callback concurrently with the SubscriptionHandle destructor.
    alive_->store(false, std::memory_order_release);
    std::lock_guard<std::mutex> lock(mutex_);
    subscriptions_.clear();
    connected_.store(false, std::memory_order_relaxed);
}

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +318 to +334
// reset() must not crash even when subscriptions are active (alive-flag path).
TEST_F(GraphQLWsHandlerTest, ResetWithActiveSubscriptionDoesNotCrash) {
doHandshake();
const std::string q =
"subscription { onChange(collection: \"orders\") { key } }";
send({
{"type", "subscribe"},
{"id", "sub1"},
{"payload", {{"query", q}}}
});
EXPECT_EQ(handler->activeSubscriptionCount(), 1u);
// reset() sets the alive flag to false before clearing subscriptions.
// No crash or assertion should occur.
EXPECT_NO_THROW(handler->reset());
EXPECT_EQ(handler->activeSubscriptionCount(), 0u);
EXPECT_FALSE(handler->isConnected());
}
ci-scope-classifier:
permissions:
contents: read
uses: ./.github/workflows/ci-scope-classifier.yml
Comment thread src/api/graphql_ws_handler.cpp Outdated
Comment on lines +222 to +227
// ── 2. Schema-level variable type validation ──────────────────────────────
// (Placed here because it requires the parsed operation.)
// Verify that every variable value supplied in the payload matches the
// declared VariableDefinition type. This is the step that was originally
// planned but omitted from the sequence, leaving the gap between step 1
// and the former step 3.
Comment thread src/api/graphql_ws_handler.cpp Outdated
// planned but omitted from the sequence, leaving the gap between step 1
// and the former step 3.
{
const json variables = payload.value("variables", json::object());
…oat tests

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 17, 2026

@copilot Review und Code Audit

Audit complete. 4 findings addressed in commit 6fe3cb2:

  1. BUG (null-deref UB)GraphQLWsHandler(&&) = default was generating a broken move: the source's alive_ is left null after the move, yet its destructor calls reset() which dereferences alive_ → null-pointer UB. Fixed by marking both move ctor and move assignment = delete. The handler is always managed via unique_ptr; moves are unnecessary.

  2. Step numberinghandleSubscribe() steps were ordered 1, 3, 4, 5, 2, 6 in the code (step 2 was appended after 3–5 because it needs the parsed operation). Renumbered 1–6 sequentially with a note that step 5 (variable validation) depends on the parsed operation from step 3.

  3. Dead fieldSubscriptionEntry::active was always set to true but never read. Removed.

  4. Test gapvalidateVariables() has a Float branch that had zero test coverage. Added 3 tests: FloatType_Valid (literal float), FloatType_IntCoercion (integer accepted per GraphQL coercion rules), and FloatType_WrongType (string rejected).

@makr-code
Copy link
Copy Markdown
Owner

@copilot fix it

…ll, naming cleanup

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 17, 2026

@copilot fix it

Fixed all 3 PR review findings in commit e442ddf:

  1. CI path./.github/workflows/ci-scope-classifier.yml./.github/workflows/01-core/ci-scope-classifier.yml (the file only exists at the 01-core/ subdirectory).

  2. Alive-flag tests — Added aliveForTesting() public accessor to GraphQLWsHandler and 3 direct mechanism tests that simulate what a CDC callback lambda does (capture a shared copy of alive_, check it with acquire ordering after reset()): AliveFlag_InitiallyTrue, AliveFlag_FalseAfterReset, AliveFlag_CallbackAfterResetObservesFalse (background thread observes false with correct acquire/release memory ordering).

  3. variables: null accepted — Explicit "variables": null is now treated as an empty object (identical to omitting the field), consistent with how most GraphQL clients behave. Added test VariableValidation_NullVariablesField_AcceptedAsEmpty.

@makr-code makr-code merged commit f5b5c1a into develop Mar 17, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GraphQL WebSocket Handler — CDC Callback Lifetime Safety

3 participants