Skip to content

feat: add mcp_gateway static broker filter#353

Open
nerdalert wants to merge 2 commits into
praxis-proxy:mainfrom
nerdalert:brent-mcp-gateway-pipeline
Open

feat: add mcp_gateway static broker filter#353
nerdalert wants to merge 2 commits into
praxis-proxy:mainfrom
nerdalert:brent-mcp-gateway-pipeline

Conversation

@nerdalert
Copy link
Copy Markdown
Member

@nerdalert nerdalert commented May 15, 2026

Summary

Adds the new mcp_gateway filter for MCP gateway mode. This PR implements the static broker/catalog phase and the supporting pipeline infrastructure needed for mcp_gateway -> load_balancer pipelines without requiring a router filter. Adding a static catalog makes sense for this first gateway PR because it lets us validate mcp_gateway behavior without adding backend discovery, caching, or session initialization yet.

⚠️ (yellow alert for attn 😛) Adding a specific callout for brevity. This PR uses a static MCP tool catalog by design: Praxis is not discovering backend tools yet, so configured tools must be normalized into valid MCP tools/list responses. Along with enabling smol PRs, a static catalog lets us validate the gateway contract without depending on live MCP servers (IT should do that as well when the epic is complete).

  • static: configured catalog only
  • dynamic: discover from backend tools/list
  • hybrid: discover backend tools, then apply config overrides/filtering/prefixes

MCP requires every tools/list tool to include inputSchema. Since this PR uses a static configured catalog, omitted inputSchema values are treated as no-argument tools and emitted as {"type":"object","additionalProperties":false}. That keeps simple dev/test configs concise while still returning spec-shaped MCP tool objects; we can tighten validation later if we decide production configs should require explicit schemas.

This PR adds:

  • mcp_gateway filter registration and config parsing
  • Static configured MCP tool catalog support
  • Short-circuit handling for initialize, tools/list, ping, notifications/initialized, and DELETE
  • Controlled JSON-RPC -32601 responses for unsupported MCP methods and tools/call until backend routing lands
  • Config validation for public/backend paths, duplicate exposed tool names, invalid schemas, and malformed tool definitions
  • Pipeline validation for mcp_gateway as a cluster-selecting filter
  • Conflict detection for mcp_gateway and router in the same pipeline
  • Cluster cross-reference validation from mcp_gateway.servers[].cluster to load_balancer.clusters[].name
  • StreamBuffer writeback for body-phase ctx.cluster, ctx.rewritten_path, and durable metadata
  • Mutated request body forwarding support, including upstream Content-Length repair
  • Retry safety for StreamBuffer bodies using the larger of original and mutated body sizes
  • Reusable integration-test body mutator coverage for ReadWrite + StreamBuffer body/path/cluster handoff
  • Example config for static MCP gateway broker/catalog mode

Scope

This implements static MCP gateway broker behavior only. It does not forward tools/call to backend MCP servers yet.

Follow-Up

A follow-up PR will complete end-to-end MCP gateway routing:

  • tools/call catalog lookup by prefixed tool name
  • Backend cluster selection from the selected tool
  • Prefix stripping from params.name before forwarding
  • Backend path rewrite to each configured server path
  • Multi-backend routing proof with two echo backends
  • End-to-end body mutation proof that the backend receives the stripped tool name and repaired Content-Length
  • Full MCP flow coverage: initialize -> initialized notification -> tools/list -> tools/call -> delete
  • Later in the PR chain for the epic Epic: AI Agentic Protocol Support #24 (comment) you will see a dynamic discovery, where the gateway calls backend tools/list at startup or on notifications/tools/list_changed, merges the results into the live catalog, and refreshes when backends notify

Test Plan

  • MCP gateway unit tests for config validation and broker responses
  • MCP gateway integration tests for initialize, tools/list, ping, notifications, DELETE, wrong-path 404, unsupported methods, and PR-limited tools/call
    behavior
  • StreamBuffer adapter integration tests for mutated body forwarding, path rewrite writeback, and Content-Length repair
  • Pipeline validation tests for mcp_gateway -> load_balancer, router conflict detection, and cluster cross-reference validation

Refs #25, #197, #196

@nerdalert nerdalert requested a review from a team May 15, 2026 16:36
@praxis-bot
Copy link
Copy Markdown
Collaborator

PR too large

This PR adds 2449 lines (limit: 500).

Large PRs are difficult to review and more likely to introduce subtle bugs. Please break this contribution into smaller, focused PRs that each address a single concern.

See our coding conventions for guidance.

If this PR legitimately requires a large change, a maintainer can add the conventions/bypass label to skip this check.

@praxis-bot praxis-bot closed this May 15, 2026
@nerdalert nerdalert reopened this May 15, 2026
@nerdalert nerdalert marked this pull request as draft May 15, 2026 16:37
@nerdalert nerdalert force-pushed the brent-mcp-gateway-pipeline branch 2 times, most recently from f9c1a54 to d3abf95 Compare May 16, 2026 04:23
@nerdalert nerdalert marked this pull request as ready for review May 16, 2026 04:28
@praxis-bot praxis-bot closed this May 16, 2026
@nerdalert
Copy link
Copy Markdown
Member Author

nerdalert commented May 16, 2026

@shaneutt Not sure how to get this one coherent and much smaller but not sure how to reopen anymore. Can chunk it post review maybe. The last one for this section (and subsequent agent tasks after this are more chunkable) Ty!

@shaneutt shaneutt added the skip/pr-hygiene PR size and description bypass label May 18, 2026
@shaneutt shaneutt reopened this May 18, 2026
@shaneutt shaneutt moved this to Review in AI Gateway May 18, 2026
@praxis-bot praxis-bot marked this pull request as draft May 18, 2026 15:51
@shaneutt shaneutt marked this pull request as ready for review May 18, 2026 16:00
@praxis-proxy praxis-proxy deleted a comment from praxis-bot May 18, 2026
@shaneutt shaneutt force-pushed the brent-mcp-gateway-pipeline branch from 7a9268f to 8cfcfff Compare May 19, 2026 11:51
@shaneutt shaneutt self-assigned this May 19, 2026
@shaneutt shaneutt added this to the v0.4.0 milestone May 19, 2026
@praxis-bot
Copy link
Copy Markdown
Collaborator

Unsigned commits detected

The following commits are not signed: 8cfcfff

Please sign your commits. For instructions on how to set up GPG/SSH signing, see GitHub Documentation.

@praxis-bot praxis-bot marked this pull request as draft May 19, 2026 12:38
@praxis-bot
Copy link
Copy Markdown
Collaborator

Converted to draft: required checks failing.

@nerdalert nerdalert force-pushed the brent-mcp-gateway-pipeline branch 2 times, most recently from 7ee1fc6 to ae470da Compare May 19, 2026 16:21
@nerdalert nerdalert marked this pull request as ready for review May 19, 2026 16:21
@shaneutt shaneutt requested a review from twghu May 19, 2026 18:11
- Adds the new `mcp_gateway` filter for MCP gateway mode.

- This PR implements the static broker/catalog phase and
  the supporting pipeline infrastructure needed
  for `mcp_gateway -> load_balancer` pipelines without
   requiring a `router` filter.

Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
@nerdalert nerdalert force-pushed the brent-mcp-gateway-pipeline branch from ae470da to ce61ce5 Compare May 21, 2026 03:08
@nerdalert nerdalert requested a review from a team May 21, 2026 03:08
Comment thread filter/src/builtins/http/ai/agentic/mcp/broker/mod.rs
Comment thread tests/integration/tests/suite/mcp_broker.rs
@twghu
Copy link
Copy Markdown

twghu commented May 21, 2026

mcp_gateway is placed at filter/src/agentic/ rather than under filter/src/builtins/http/ai/agentic/ where the existing mcp and json_rpc modules live ?

It seems that the full path qualification of x::y::z would be sufficient for identifying module coordinates, is the there value in naming change or at least more qualified comment.

From the tree:

filter/src/agentic/ ← crate root level
filter/src/builtins/http/ai/agentic/ ← inside builtins

In Rust module paths:

  • crate::agentic
  • crate::builtins::http::ai::agentic

Both have nearly identical doc comments ("Agentic protocol filters: …") and both contain MCP-related code. A contributor grepping for agentic or navigating the tree hits two results with no obvious signal for which one to look in or add to.

Comment thread examples/README.md
Comment thread filter/src/agentic/mcp_gateway/config.rs Outdated
Comment thread filter/src/builtins/http/ai/agentic/mcp/broker/mod.rs
Comment thread filter/src/builtins/http/ai/agentic/mcp/broker/mod.rs
Copy link
Copy Markdown

@twghu twghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds the first phase of an MCP gateway filter: a static tool catalog broker that short-circuits initialize, tools/list, ping, notifications/initialized, and DELETE. tools/call returns a controlled -32601 stub.

The supporting infrastructure includes StreamBuffer body-mutation writeback (rewritten_path, Content-Length repair), pipeline validation for mcp_gateway → load_balancer without a router, and cluster cross-reference validation extended to cover mcp_gateway.servers[].cluster.

The scope is clearly declared and the PR is internally consistent. Issues below are organized by severity.

Some nits, some functional questions.

@nerdalert nerdalert force-pushed the brent-mcp-gateway-pipeline branch from b3ef1dd to 370d14b Compare May 21, 2026 17:58
@nerdalert nerdalert requested a review from twghu May 21, 2026 18:09
Comment thread filter/src/agentic/mcp_gateway/config.rs Outdated
Comment thread filter/src/builtins/http/ai/agentic/mcp/broker/config.rs
Comment thread filter/src/agentic/mcp_gateway/mod.rs Outdated
Comment thread filter/src/agentic/mcp_gateway/mod.rs Outdated
Comment thread filter/src/agentic/mcp_gateway/mod.rs Outdated
Comment thread filter/src/pipeline/checks.rs Outdated
Comment on lines +114 to +115
/// `router` cannot coexist with body-phase cluster selectors because it
/// runs later and unconditionally overwrites `ctx.cluster`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should work quite like this.

It may be reasonable in the future to have router run prior to mcp routing, to reduce the selectable clusters or something like that, in addition to other variations.

Trying to deepen how we validate pipelines in this way may lead to trouble later, perhaps we should consider moving the decision point into the filters themselves, and give them a way to pass messages to modify their results? Something I'm going to think on. 🤔

Copy link
Copy Markdown

@twghu twghu May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the MCP-idiomatic answer is that it should never need to set ctx.cluster in the first place.

Should a pipeline have a way for a filter to say "set cluster to X only if I handled this request."

The check doesn't verify ordering and ordering implies a code path we don't need to process.

The current design coordinates inter-filter behavior through shared mutable context (ctx.cluster), and then appears to catch broken combinations statically. But static name-based checks can't capture intent — they can ban or log configurations.

In short, we probably don't want to act on routing info inter filter for it to be possibly actionable. So does route make sense in this content?

Unless I've misunderstood this, a filter shouldn't be in the routing business and probably actions on routing based on context could lead to unwelcome complications.

Then if a filter isn't in the routing business why add the check.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks guys. Context on the thoughts. Did an e2e prototype where the MCP gateway was already forwarding tools/call to real backends. In that version, the MCP code picked the backend itself, then handed that choice to the load balancer. A router in the same pipeline could overwrite that backend.

This PR no longer does that forwarding. It only answers static catalog/broker methods locally and returns controlled unsupported for tools/call, so it does not currently choose a backend or set ctx.cluster.

Will take the feedback and rethink the e2e validation, for reference: #24 (comment)

Comment thread filter/src/pipeline/checks.rs Outdated

/// `router` cannot coexist with body-phase cluster selectors because it
/// runs later and unconditionally overwrites `ctx.cluster`.
pub(super) fn check_conflicting_cluster_selectors(names: &[&str], errors: &mut Vec<String>) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is broken in its current state, because it doesn't account for conditions which might actually result in no conflict.

Copy link
Copy Markdown

@twghu twghu May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backed out the change, it no longer does backend forwarding or sets ctx.cluster. Ty.

.execute_http_request_body(&mut filter_ctx, &mut body, end_of_stream)
.await;

ctx.request_body_bytes = filter_ctx.request_body_bytes;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes accumulation in pre-read, wont this break things?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ty! Checked the Pingora body path as well. Pingora still has a fixed retry buffer, but it also has a specific path for bodies that were already consumed before forwarding: it gives request_body_filter() one terminal callback so the proxy implementation can emit its buffered body upstream.

In Praxis StreamBuffer mode, Praxis pre-read can run the body filters while reading chunks, then run them again at EOS with one full combined body. If we copy filter_ctx.request_body_bytes back directly, we can accidentally count those bytes twice.

Changed this so pre-read tracks the original bytes from the client separately as they arrive. At the end, ctx.request_body_bytes still reflects the real original body size, while ctx.mutated_request_body_len separately tracks the final forwarded body size if a body-writing filter changed the payload. That keeps retry/body-size logic correct without double-counting.

Changes:

  • Added an explicit original_body_bytes counter in stream_buffer.rs.
  • Count original downstream chunks before creating the synthetic full-body EOS pass.
  • Seed/write ctx.request_body_bytes from that original counter during pre-read.
  • Keep ctx.mutated_request_body_len for the final forwarded body length when body-writing filters mutate the payload.
  • Did not restore the old direct ctx.request_body_bytes = filter_ctx.request_body_bytes line because that can double-count in this pre-read path.

Example Flow:

  1. Client sends request body in chunks.
  2. Praxis StreamBuffer pre-reads those chunks before forwarding upstream.
  3. While reading, Praxis counts the real client bytes into original_body_bytes.
  4. At EOS, Praxis combines the chunks into one full body.
  5. Praxis runs body filters on that full body so they can inspect or rewrite it.
  6. Pingora later calls request_body_filter() once because the body was already consumed before forwarding.
  7. Praxis uses that callback to forward the buffered body from ctx.pre_read_body.
  8. ctx.request_body_bytes stays as the original client body size.
  9. If filters changed the body, ctx.mutated_request_body_len stores the final forwarded size.

@shaneutt shaneutt requested a review from usize May 21, 2026 20:43
The json_rpc module moved from payload_processing/ to
builtins/http/ai/agentic/ on upstream main. Update import
paths in mcp_gateway and widen ai module visibility to
pub(crate).

Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
@nerdalert nerdalert force-pushed the brent-mcp-gateway-pipeline branch from 370d14b to 2de2b03 Compare May 23, 2026 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip/pr-hygiene PR size and description bypass

Projects

Status: Review

Development

Successfully merging this pull request may close these issues.

4 participants