[RHIDP-12470] Add API endpoints for dynamically registering and unregistering MCP Servers #1330

Open
maysunfaisal wants to merge 2 commits into lightspeed-core:main from maysunfaisal:RHIDP-12470-1

Conversation


@maysunfaisal maysunfaisal commented Mar 16, 2026

Description

Currently, if you want MCP servers configured in Lightspeed Stack, you have to do it statically via lightspeed-stack.yaml:

mcp_servers:
  - name: test-mcp-server
    provider_id: "model-context-protocol"
    url: "http://localhost:8888/mcp"
    authorization_headers:
      Authorization: "client"

However, Lightspeed Stack clients need a way to register and unregister MCP servers dynamically. For example, Developer Lightspeed has a UI (think of Cursor's MCP settings) where users can enter their MCP server information. This lets users add and remove MCP servers on the fly rather than configuring them statically in the lightspeed-stack.yaml file (see Feature RHDHPLAN-45).

Three new APIs

  1. POST http://localhost:8080/v1/mcp-servers to register an MCP server
  2. GET http://localhost:8080/v1/mcp-servers to list registered MCP servers
  3. DELETE http://localhost:8080/v1/mcp-servers/{name} to unregister an MCP server
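
The semantics behind these endpoints can be sketched in plain Python. This is an illustrative model only: the class and method names below are hypothetical, and the real logic lives in src/configuration.py and the mcp_servers endpoint handlers, with HTTP status codes raised via FastAPI.

```python
class McpRegistry:
    """Illustrative model of the static/dynamic MCP server registry."""

    def __init__(self, static_names: set[str]):
        self.static = set(static_names)   # servers from lightspeed-stack.yaml
        self.dynamic: set[str] = set()    # servers added via POST /v1/mcp-servers

    def register(self, name: str) -> None:
        # Duplicate names conflict with either static or dynamic servers (409).
        if name in self.static or name in self.dynamic:
            raise ValueError(f"409: MCP server '{name}' already exists")
        self.dynamic.add(name)

    def unregister(self, name: str) -> None:
        # Statically configured servers cannot be removed via the API (403).
        if name in self.static:
            raise PermissionError(f"403: '{name}' is statically configured")
        if name not in self.dynamic:
            raise KeyError(f"404: '{name}' does not exist")
        self.dynamic.remove(name)

    def list_all(self) -> list[tuple[str, str]]:
        # (name, source) pairs, mirroring the "source" field in the GET response.
        return [(n, "config") for n in sorted(self.static)] + [
            (n, "api") for n in sorted(self.dynamic)
        ]
```

The 409/403/404 outcomes here correspond to the error cases exercised in the Testing section below.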

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude Opus 4.6
  • Generated by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

POST http://localhost:8080/v1/mcp-servers

Body

{
    "name": "mcp-integration-tools",
    "url": "http://localhost:7008/api/mcp-actions/v1",
    "provider_id": "model-context-protocol",
    "authorization_headers": {
      "Authorization": "client"
    }
}

Response

{
    "name": "mcp-integration-tools",
    "url": "http://localhost:7008/api/mcp-actions/v1",
    "provider_id": "model-context-protocol",
    "message": "MCP server 'mcp-integration-tools' registered successfully"
}
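
For reference, the request above can be issued from Python with only the standard library. This is a hedged sketch: the host and port match this PR's local test run, and the actual network call is left commented out so the snippet is safe to run anywhere.

```python
import json
import urllib.request

# Payload taken verbatim from the test above.
payload = {
    "name": "mcp-integration-tools",
    "url": "http://localhost:7008/api/mcp-actions/v1",
    "provider_id": "model-context-protocol",
    "authorization_headers": {"Authorization": "client"},
}

req = urllib.request.Request(
    "http://localhost:8080/v1/mcp-servers",   # adjust for your deployment
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment against a running lightspeed-stack instance;
# per the logs below, a successful registration returns 201 Created.
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, json.load(resp))
```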

LCS Logs

[03/16/26 17:35:54] WARNING  No-op authentication dependency is being used. The service is running in insecure mode intended solely for development purposes                                               noop.py:47
[17:35:54] INFO     HTTP Request: POST http://localhost:8321/v1/toolgroups "HTTP/1.1 200 OK"                                                                                                          _client.py:1740
[03/16/26 17:35:54] INFO     Dynamically registered MCP server: mcp-integration-tools at http://localhost:7008/api/mcp-actions/v1                                                                  mcp_servers.py:110
INFO:     127.0.0.1:53143 - "POST /v1/mcp-servers HTTP/1.1" 201 Created

Verifying via Llama Stack toolgroups API

$ curl -s http://127.0.0.1:8321/v1/toolgroups | python3 -m json.tool
{
    "data": [
        {
            "identifier": "builtin::rag",
            "provider_resource_id": "builtin::rag",
            "provider_id": "rag-runtime",
            "type": "tool_group",
            "mcp_endpoint": null,
            "args": null
        },
        {
            "identifier": "test-mcp-server",
            "provider_resource_id": "test-mcp-server",
            "provider_id": "model-context-protocol",
            "type": "tool_group",
            "mcp_endpoint": {
                "uri": "http://localhost:8888/mcp"
            },
            "args": null
        },
        {
            "identifier": "mcp-integration-tools",
            "provider_resource_id": "mcp-integration-tools",
            "provider_id": "model-context-protocol",
            "type": "tool_group",
            "mcp_endpoint": {
                "uri": "http://localhost:7008/api/mcp-actions/v1"
            },
            "args": null
        }
    ]
}

GET http://localhost:8080/v1/mcp-servers

Response

{
    "servers": [
        {
            "name": "test-mcp-server",
            "url": "http://localhost:8888/mcp",
            "provider_id": "model-context-protocol",
            "source": "config"
        },
        {
            "name": "mcp-integration-tools",
            "url": "http://localhost:7008/api/mcp-actions/v1",
            "provider_id": "model-context-protocol",
            "source": "api"
        }
    ]
}

LCS Logs

[03/16/26 17:37:03] WARNING  No-op authentication dependency is being used. The service is running in insecure mode intended solely for development purposes                                               noop.py:47
INFO:     127.0.0.1:53452 - "GET /v1/mcp-servers HTTP/1.1" 200 OK

DELETE http://localhost:8080/v1/mcp-servers/mcp-integration-tools-not-present

Response

{
    "detail": {
        "response": "Mcp Server not found",
        "cause": "Mcp Server with ID mcp-integration-tools-not-present does not exist"
    }
}

LCS Logs

[03/16/26 17:38:18] WARNING  No-op authentication dependency is being used. The service is running in insecure mode intended solely for development purposes                                               noop.py:47
INFO:     127.0.0.1:53805 - "DELETE /v1/mcp-servers/mcp-integration-tools-not-present HTTP/1.1" 404 Not Found

DELETE http://localhost:8080/v1/mcp-servers/mcp-integration-tools

Response

{
    "name": "mcp-integration-tools",
    "message": "MCP server 'mcp-integration-tools' unregistered successfully"
}

LCS Logs

[03/16/26 17:38:52] WARNING  No-op authentication dependency is being used. The service is running in insecure mode intended solely for development purposes                                               noop.py:47
[17:38:52] INFO     HTTP Request: DELETE http://localhost:8321/v1/toolgroups/mcp-integration-tools "HTTP/1.1 204 No Content"                                                                          _client.py:1740
[03/16/26 17:38:52] INFO     Dynamically unregistered MCP server: mcp-integration-tools                                                                                                            mcp_servers.py:242
INFO:     127.0.0.1:54008 - "DELETE /v1/mcp-servers/mcp-integration-tools HTTP/1.1" 200 OK

Verifying via Llama Stack toolgroups API

$ curl -s http://127.0.0.1:8321/v1/toolgroups | python3 -m json.tool
{
    "data": [
        {
            "identifier": "builtin::rag",
            "provider_resource_id": "builtin::rag",
            "provider_id": "rag-runtime",
            "type": "tool_group",
            "mcp_endpoint": null,
            "args": null
        },
        {
            "identifier": "test-mcp-server",
            "provider_resource_id": "test-mcp-server",
            "provider_id": "model-context-protocol",
            "type": "tool_group",
            "mcp_endpoint": {
                "uri": "http://localhost:8888/mcp"
            },
            "args": null
        }
    ]
}

DELETE http://localhost:8080/v1/mcp-servers/test-mcp-server

Response

{
    "detail": {
        "response": "Cannot delete statically configured MCP server",
        "cause": "MCP server 'test-mcp-server' was configured in lightspeed-stack.yaml and cannot be removed via the API."
    }
}

LCS Logs

[03/16/26 17:40:23] WARNING  No-op authentication dependency is being used. The service is running in insecure mode intended solely for development purposes                                               noop.py:47
INFO:     127.0.0.1:54393 - "DELETE /v1/mcp-servers/test-mcp-server HTTP/1.1" 403 Forbidden

Summary by CodeRabbit

Release Notes

  • New Features

    • Added dynamic MCP server management via API endpoints: register new servers at runtime, list all available servers (both configured and dynamically registered), and unregister dynamic servers.
    • Integrated role-based access control for MCP server operations.
    • Enhanced error handling with conflict detection for duplicate server registrations.
  • Tests

    • Added comprehensive test coverage for MCP server lifecycle operations, including success paths, error scenarios, and input validation.

Assisted-by: Claude Opus 4.6

Generated-by: Cursor
Signed-off-by: Maysun J Faisal <maysunaneek@gmail.com>

coderabbitai bot commented Mar 16, 2026

Walkthrough

This change introduces dynamic MCP server management via three new FastAPI endpoints. It adds configuration support for tracking runtime-registered servers, new authorization actions, and comprehensive request/response models. The implementation integrates with existing authentication and Llama Stack client infrastructure.

Changes

Cohort / File(s) Summary
MCP Server Endpoints
src/app/endpoints/mcp_servers.py
New FastAPI router with POST, GET, and DELETE handlers for registering, listing, and unregistering dynamic MCP servers. Includes auth integration, validation, error handling with rollback on Llama Stack failures, and synchronization between runtime configuration and Llama Stack state.
Router Integration
src/app/routers.py
Imports and registers mcp_servers router with "/v1" prefix to expose new endpoints.
Configuration Management
src/configuration.py
Extends AppConfig to track dynamically registered servers via internal set, adds methods to register/unregister servers at runtime with duplicate prevention and dynamic/static server distinction.
Authorization & Actions
src/models/config.py
Adds three new Action enum members for MCP server operations: REGISTER_MCP_SERVER, LIST_MCP_SERVERS, DELETE_MCP_SERVER.
Request Models
src/models/requests.py
Introduces MCPServerRegistrationRequest with validation for URL scheme (http/https) and authorization header keywords (client, kubernetes, oauth). Includes note of duplicate class definition in file.
Response Models
src/models/responses.py
Adds MCPServerInfo, MCPServerRegistrationResponse, MCPServerListResponse, MCPServerDeleteResponse for operation responses. Introduces ConflictResponse for 409 conflicts and CONFLICT_DESCRIPTION constant. Expands NotFoundResponse examples.
Endpoint Tests
tests/unit/app/endpoints/test_mcp_servers.py
Comprehensive test module covering registration success/failure paths, rollback behavior, duplicate handling, listing with mixed static/dynamic servers, deletion restrictions, and input validation across ~437 lines.
Router Tests
tests/unit/app/test_routers.py, tests/unit/models/responses/test_error_responses.py
Updates router test assertions (router count 21→22, prefix checks) and error response examples to include new MCP server scenarios.
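
The validation rules the summary attributes to MCPServerRegistrationRequest (http/https URL schemes, and client/kubernetes/oauth authorization keywords) can be sketched with the standard library alone. The real model is Pydantic, in src/models/requests.py; the function name and the exact keyword semantics below are assumptions for illustration.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_AUTH_KEYWORDS = {"client", "kubernetes", "oauth"}

def validate_registration(url: str, authorization_headers: dict[str, str]) -> None:
    """Raise ValueError if the registration request violates the stated rules."""
    scheme = urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # Header values must be one of the recognized authorization keywords.
    for value in authorization_headers.values():
        if value not in ALLOWED_AUTH_KEYWORDS:
            raise ValueError(f"unsupported authorization keyword: {value!r}")
```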

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Endpoint as MCP Endpoint
    participant Config as AppConfig
    participant LlamaStack as Llama Stack Client
    
    Client->>Endpoint: POST /mcp-servers (registration request)
    Endpoint->>Endpoint: Validate auth & request
    Endpoint->>Config: add_mcp_server()
    activate Config
    Config->>Config: Check duplicate & load status
    Config->>Config: Update internal state
    deactivate Config
    Endpoint->>LlamaStack: Register toolgroup
    alt Success
        LlamaStack-->>Endpoint: OK
        Endpoint-->>Client: 201 MCPServerRegistrationResponse
    else Llama Stack Error
        LlamaStack-->>Endpoint: Error
        Endpoint->>Config: remove_mcp_server() (rollback)
        activate Config
        Config->>Config: Restore state
        deactivate Config
        Endpoint-->>Client: 503 Service Unavailable
    else Duplicate Name
        Endpoint-->>Client: 409 Conflict
    end
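
The alt branches above reduce to a simple rollback pattern: update local state first, then call the remote, and undo the local change if the remote call fails. A plain-Python sketch, where `config` is any set-like store and `register_remote` stands in for the Llama Stack toolgroups.register call (both names hypothetical):

```python
def register_with_rollback(config: set, register_remote, name: str) -> None:
    if name in config:
        raise ValueError("409: duplicate name")
    config.add(name)           # local registration first
    try:
        register_remote(name)  # then the remote toolgroup
    except Exception:
        config.discard(name)   # rollback keeps local and remote state in sync
        raise
```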

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: adding API endpoints for dynamic MCP server management.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_mcp_servers.py (1)

326-350: Assert rollback invariants after delete failure.

This test currently checks only the 503. It should also verify the dynamic server remains registered locally when toolgroups.unregister fails.

✅ Suggested test hardening
 `@pytest.mark.asyncio`
 async def test_delete_mcp_server_llama_stack_failure(
@@
-    _make_app_config(mocker, mock_configuration)
+    app_config = _make_app_config(mocker, mock_configuration)
@@
     with pytest.raises(HTTPException) as exc_info:
         await mcp_servers.delete_mcp_server_handler(
             request=mocker.Mock(), name="to-delete-fail", auth=MOCK_AUTH
         )
     assert exc_info.value.status_code == 503
+    assert app_config.is_dynamic_mcp_server("to-delete-fail")
+    assert any(s.name == "to-delete-fail" for s in app_config.mcp_servers)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/app/endpoints/test_mcp_servers.py` around lines 326 - 350, Update
the test_delete_mcp_server_llama_stack_failure to assert rollback invariants
after toolgroups.unregister raises APIConnectionError: after calling
mcp_servers.register_mcp_server_handler and triggering the failing delete via
mcp_servers.delete_mcp_server_handler (where
client.toolgroups.unregister.side_effect is set), in addition to asserting the
HTTPException status_code == 503, verify that the dynamic server entry is still
present in the local registration state (the same registry/storage that
register_mcp_server_handler updates) so the server remains registered locally
when client.toolgroups.unregister fails; reference the test function name,
mcp_servers.register_mcp_server_handler, mcp_servers.delete_mcp_server_handler,
and client.toolgroups.unregister to locate the code to assert the local
registration still exists.
src/models/requests.py (1)

837-842: Use Optional[PositiveInt] for timeout to follow model typing conventions.

The timeout field currently uses Optional[int] with a gt=0 constraint, but the codebase guideline requires PositiveInt for positive integer fields in Pydantic models. Using Optional[PositiveInt] makes the constraint explicit at the type level, matching the pattern already established in src/models/config.py (e.g., line 553).

♻️ Proposed refactor
-from pydantic import BaseModel, Field, field_validator, model_validator
+from pydantic import BaseModel, Field, PositiveInt, field_validator, model_validator
...
-    timeout: Optional[int] = Field(
+    timeout: Optional[PositiveInt] = Field(
         default=None,
         description="Request timeout in seconds for the MCP server",
-        gt=0,
         examples=[30],
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 837 - 842, The timeout Field in the
Request model is typed as Optional[int] with gt=0; change its annotation to
Optional[PositiveInt] (from pydantic) so the positivity constraint is enforced
at the type level, remove the redundant gt=0 argument from the Field, and add
the PositiveInt import if missing; update the timeout declaration (symbol:
timeout: Optional[int] = Field(...)) to timeout: Optional[PositiveInt] =
Field(...) to match the pattern used in src/models/config.py.
src/app/endpoints/mcp_servers.py (1)

57-68: Align the handler docstrings with the repo's Google-style format.

These docstrings include Returns/Raises, but they still omit Parameters, which makes the request/body/auth contract harder to scan in an API module. As per coding guidelines, "Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with Parameters, Returns, Raises, and Attributes sections."

Also applies to: 136-146, 186-198

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/mcp_servers.py` around lines 57 - 68, The handler
docstrings (the "Register an MCP server dynamically at runtime" registration
handler and the other handlers at the ranges you flagged) must be converted to
Google-style docstrings: add a Parameters section describing the request
body/expected fields and auth context, change Returns to a "Returns:" section
that clearly states the return type (MCPServerRegistrationResponse) and content,
and keep a "Raises:" section listing HTTPException cases (duplicate name,
connection error, registration failure) with brief conditions; update the
docstrings for the MCP server registration handler and the other two handlers at
the referenced blocks (lines around 136-146 and 186-198) to follow this exact
structure so consumers can easily scan parameters, returns, and errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/mcp_servers.py`:
- Around line 99-107: The code currently includes raw upstream exception text
(str(e)) in API responses (see ServiceUnavailableResponse and
InternalServerErrorResponse in the except blocks around
configuration.remove_mcp_server and HTTPException raising); change these to log
the full exception via logger.exception or logger.error with exception info, but
replace the response cause with a fixed opaque string (e.g. "upstream_error" or
a generated error_id) or a generic message like "internal server error" before
raising HTTPException or returning the response; apply the same change to the
other occurrence referenced (lines ~223-225) so no raw exception text is
serialized to clients while full details remain in logs for debugging.
- Around line 227-240: The code currently swallows all exceptions from the Llama
Stack toolgroup unregister and proceeds to call
configuration.remove_mcp_server(name); change this so that only a verified
remote "not found" outcome allows local removal. Specifically, replace the broad
except Exception around the remote unregister with logic that (1) detects the
remote-not-found case and continues to call
configuration.remove_mcp_server(name), and (2) for any other exception from the
Llama Stack toolgroup unregister, log the error and raise an HTTPException (do
not call configuration.remove_mcp_server). Reference the Llama Stack toolgroup
unregister call and configuration.remove_mcp_server(name) when making the change
so the local config is preserved unless the remote toolgroup is confirmed
missing.
- Around line 74-81: The handler currently constructs a
ModelContextProtocolServer (mcp_server) with authorization_headers, headers, and
timeout but only passes mcp_endpoint={"uri": mcp_server.url} into
toolgroups.register, which can create an unusable toolgroup; update the endpoint
handler to either forward these connection fields to Llama Stack when supported
or explicitly reject requests that supply them—specifically, before calling
toolgroups.register check ModelContextProtocolServer.authorization_headers,
.headers, and .timeout against defaults and if any non-empty/non-default values
are present raise a clear HTTP error (or ValidationError) explaining that
authorization_headers, headers, and custom timeout are not supported by the
current llama-stack-client; reference ModelContextProtocolServer,
toolgroups.register, and mcp_endpoint in the change so the rejection/forwarding
logic is colocated with where mcp_server is created and toolgroups.register is
called.

In `@src/configuration.py`:
- Line 67: The dynamic MCP registry is currently stored in the in-memory set
_dynamic_mcp_server_names (and other related code around lines 169-228), which
will diverge across workers; change the implementation so the registry is kept
in a shared, persistent store (e.g., Redis, database table, or central config
service) instead of the process-local set. Replace direct reads/writes of
_dynamic_mcp_server_names with calls to new helper methods (e.g.,
get_dynamic_mcp_servers(), add_dynamic_mcp_server(name),
remove_dynamic_mcp_server(name)) that operate on the shared backend and provide
atomic operations; update any code paths that reference
_dynamic_mcp_server_names to use these helpers and ensure worker-safe
concurrency (locks or atomic commands) and consistent
serialization/deserialization. Ensure migration/initialization code populates
the shared store from any existing local state and add tests verifying
consistency across simulated multiple-worker access.
- Line 67: init_from_dict() currently clears cached runtime state but doesn't
reset the _dynamic_mcp_server_names set, allowing stale server names to persist
across reinitialization; update init_from_dict() to clear or reinitialize
self._dynamic_mcp_server_names (e.g. set it to an empty set) as part of the
re-initialization path so dynamic MCP registry state is reset when configuration
is reloaded.

---

Nitpick comments:
In `@src/app/endpoints/mcp_servers.py`:
- Around line 57-68: The handler docstrings (the "Register an MCP server
dynamically at runtime" registration handler and the other handlers at the
ranges you flagged) must be converted to Google-style docstrings: add a
Parameters section describing the request body/expected fields and auth context,
change Returns to a "Returns:" section that clearly states the return type
(MCPServerRegistrationResponse) and content, and keep a "Raises:" section
listing HTTPException cases (duplicate name, connection error, registration
failure) with brief conditions; update the docstrings for the MCP server
registration handler and the other two handlers at the referenced blocks (lines
around 136-146 and 186-198) to follow this exact structure so consumers can
easily scan parameters, returns, and errors.

In `@src/models/requests.py`:
- Around line 837-842: The timeout Field in the Request model is typed as
Optional[int] with gt=0; change its annotation to Optional[PositiveInt] (from
pydantic) so the positivity constraint is enforced at the type level, remove the
redundant gt=0 argument from the Field, and add the PositiveInt import if
missing; update the timeout declaration (symbol: timeout: Optional[int] =
Field(...)) to timeout: Optional[PositiveInt] = Field(...) to match the pattern
used in src/models/config.py.

In `@tests/unit/app/endpoints/test_mcp_servers.py`:
- Around line 326-350: Update the test_delete_mcp_server_llama_stack_failure to
assert rollback invariants after toolgroups.unregister raises
APIConnectionError: after calling mcp_servers.register_mcp_server_handler and
triggering the failing delete via mcp_servers.delete_mcp_server_handler (where
client.toolgroups.unregister.side_effect is set), in addition to asserting the
HTTPException status_code == 503, verify that the dynamic server entry is still
present in the local registration state (the same registry/storage that
register_mcp_server_handler updates) so the server remains registered locally
when client.toolgroups.unregister fails; reference the test function name,
mcp_servers.register_mcp_server_handler, mcp_servers.delete_mcp_server_handler,
and client.toolgroups.unregister to locate the code to assert the local
registration still exists.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3ae38ff2-4149-475c-8e9d-ea70eb28845e

📥 Commits

Reviewing files that changed from the base of the PR and between b2f54cf and 8f2a6ec.

📒 Files selected for processing (9)
  • src/app/endpoints/mcp_servers.py
  • src/app/routers.py
  • src/configuration.py
  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
  • tests/unit/app/endpoints/test_mcp_servers.py
  • tests/unit/app/test_routers.py
  • tests/unit/models/responses/test_error_responses.py

Comment on lines +74 to +81
mcp_server = ModelContextProtocolServer(
    name=body.name,
    url=body.url,
    provider_id=body.provider_id,
    authorization_headers=body.authorization_headers or {},
    headers=body.headers or [],
    timeout=body.timeout,
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For llama-stack-client 0.5.2, what fields are supported inside AsyncLlamaStackClient.toolgroups.register(..., mcp_endpoint=...) for MCP endpoints? Does mcp_endpoint support headers, authorization headers, or timeout values in addition to the URI?

💡 Result:

In llama-stack-client 0.5.2, AsyncLlamaStackClient.toolgroups.register(..., mcp_endpoint=...) uses the URL schema, and the only supported field inside mcp_endpoint is:

  • uri: str (required)

This comes from the RegisterToolGroupRequest schema (mcp_endpoint: URL | null) and the URL schema definition ({ "uri": string }). [1][2]

So, no—in 0.5.2, mcp_endpoint itself does not support additional structured fields like headers, authorization/Authorization headers, or timeouts beyond the URI. [1][2]

If you need headers/auth, note that headers and authorization are supported elsewhere in the API (e.g., the OpenAI “Responses” MCP tool config includes headers and authorization), but that is not part of toolgroups.register(..., mcp_endpoint=...). [3]

Sources
[1] Llama Stack OpenAPI (v0.5.2): RegisterToolGroupRequest schema (shows mcp_endpoint is URL)
[2] Llama Stack OpenAPI (v0.5.2): URL schema (uri only)
[3] Llama Stack OpenAPI (v0.5.2): OpenAIResponseInputToolMCP schema (has headers, authorization)


Forward the request's connection settings to Llama Stack or reject them explicitly.

The handler accepts and stores authorization_headers, headers, and timeout (lines 74-81), but the call to toolgroups.register(...) (lines 91-95) only forwards the URI via mcp_endpoint={"uri": mcp_server.url}. In llama-stack-client 0.5.2, the mcp_endpoint parameter supports only the uri field; additional connection metadata is not propagated to the remote toolgroup. This means the endpoint returns 201 while potentially creating a toolgroup that cannot authenticate to or properly connect with the MCP server if those settings are required. Either pass these fields through to Llama Stack if a future version supports them, or reject the request with a clear error if authorization_headers, headers, or a non-default timeout are provided.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/mcp_servers.py` around lines 74 - 81, The handler currently
constructs a ModelContextProtocolServer (mcp_server) with authorization_headers,
headers, and timeout but only passes mcp_endpoint={"uri": mcp_server.url} into
toolgroups.register, which can create an unusable toolgroup; update the endpoint
handler to either forward these connection fields to Llama Stack when supported
or explicitly reject requests that supply them—specifically, before calling
toolgroups.register check ModelContextProtocolServer.authorization_headers,
.headers, and .timeout against defaults and if any non-empty/non-default values
are present raise a clear HTTP error (or ValidationError) explaining that
authorization_headers, headers, and custom timeout are not supported by the
current llama-stack-client; reference ModelContextProtocolServer,
toolgroups.register, and mcp_endpoint in the change so the rejection/forwarding
logic is colocated with where mcp_server is created and toolgroups.register is
called.
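
The "reject explicitly" option from this finding can be sketched as a small guard run before toolgroups.register. This is illustrative only: the guard function name is hypothetical, and the field names mirror the request model; the premise (that llama-stack-client 0.5.2 forwards only mcp_endpoint.uri) comes from the web query above.

```python
# Connection settings that toolgroups.register cannot forward in 0.5.2.
UNFORWARDABLE = ("authorization_headers", "headers", "timeout")

def reject_unforwardable(body: dict) -> None:
    """Refuse registrations whose settings would be silently dropped."""
    offending = [field for field in UNFORWARDABLE if body.get(field)]
    if offending:
        raise ValueError(
            "fields not forwarded to Llama Stack toolgroups.register: "
            + ", ".join(offending)
        )
```

In the endpoint this would translate to a 4xx response rather than a 201 that creates a toolgroup unable to authenticate.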

Comment on lines +99 to +107
        response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e))
        raise HTTPException(**response.model_dump()) from e
    except Exception as e:  # pylint: disable=broad-exception-caught
        configuration.remove_mcp_server(body.name)
        logger.error("Failed to register MCP toolgroup: %s", e)
        error_response = InternalServerErrorResponse(
            response="Failed to register MCP server",
            cause=str(e),
        )

⚠️ Potential issue | 🟠 Major

Do not return raw upstream exception text to API clients.

These branches serialize str(e) into the public error payload. That can leak internal Llama Stack details and makes this API contract depend on upstream exception wording. Keep the full exception in logs, but return a fixed cause string or opaque error id to callers.

Also applies to: 223-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/mcp_servers.py` around lines 99 - 107, The code currently
includes raw upstream exception text (str(e)) in API responses (see
ServiceUnavailableResponse and InternalServerErrorResponse in the except blocks
around configuration.remove_mcp_server and HTTPException raising); change these
to log the full exception via logger.exception or logger.error with exception
info, but replace the response cause with a fixed opaque string (e.g.
"upstream_error" or a generated error_id) or a generic message like "internal
server error" before raising HTTPException or returning the response; apply the
same change to the other occurrence referenced (lines ~223-225) so no raw
exception text is serialized to clients while full details remain in logs for
debugging.
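
The suggested pattern (full details in logs, opaque cause to clients) can be sketched as follows. The error_id field and helper name are illustrative additions, not part of the PR's response models.

```python
import logging
import uuid

logger = logging.getLogger("mcp_servers")

def opaque_error(e: Exception) -> dict:
    """Log the full exception server-side; return only an opaque cause."""
    error_id = uuid.uuid4().hex
    # exc_info=e records the traceback in the service logs for debugging.
    logger.error("Failed to register MCP toolgroup [%s]", error_id, exc_info=e)
    return {
        "response": "Failed to register MCP server",
        "cause": f"internal error (id: {error_id})",  # no upstream text leaked
    }
```

Callers can quote the error id when reporting problems, letting operators correlate it with the logged traceback.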

Comment on lines +227 to +240
    except Exception as e:  # pylint: disable=broad-exception-caught
        logger.warning(
            "Llama Stack toolgroup unregister failed for '%s', "
            "proceeding with local removal: %s",
            name,
            e,
        )

    try:
        configuration.remove_mcp_server(name)
    except ValueError as e:
        logger.error("Failed to remove MCP server from configuration: %s", e)
        response = NotFoundResponse(resource="MCP server", resource_id=name)
        raise HTTPException(**response.model_dump()) from e

⚠️ Potential issue | 🟠 Major

Do not report a successful delete after remote unregister fails.

The broad except Exception at Line 227 logs and continues, so the local config can be removed even though the Llama Stack toolgroup is still present. That leaves /v1/mcp-servers and the actual toolgroup state out of sync. Except for a verified remote "not found" case, this should fail the request and preserve the local registration so the delete can be retried safely.

💡 Example change
-    except Exception as e:  # pylint: disable=broad-exception-caught
-        logger.warning(
-            "Llama Stack toolgroup unregister failed for '%s', "
-            "proceeding with local removal: %s",
-            name,
-            e,
-        )
+    except Exception as e:  # pylint: disable=broad-exception-caught
+        logger.error("Failed to unregister MCP toolgroup from Llama Stack: %s", e)
+        error_response = InternalServerErrorResponse(
+            response="Failed to unregister MCP server",
+            cause="Toolgroup unregistration failed",
+        )
+        raise HTTPException(**error_response.model_dump()) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/mcp_servers.py` around lines 227 - 240, The code currently
swallows all exceptions from the Llama Stack toolgroup unregister and proceeds
to call configuration.remove_mcp_server(name); change this so that only a
verified remote "not found" outcome allows local removal. Specifically, replace
the broad except Exception around the remote unregister with logic that (1)
detects the remote-not-found case and continues to call
configuration.remove_mcp_server(name), and (2) for any other exception from the
Llama Stack toolgroup unregister, log the error and raise an HTTPException (do
not call configuration.remove_mcp_server). Reference the Llama Stack toolgroup
unregister call and configuration.remove_mcp_server(name) when making the change
so the local config is preserved unless the remote toolgroup is confirmed
missing.
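One way to structure the "only proceed on a verified remote not-found" rule is a small dispatcher. This is a sketch under assumptions: the `status_code` attribute and "not found" message check are placeholder heuristics, since the exact exception shape raised by the Llama Stack client is not shown in this diff:

```python
def handle_unregister_failure(exc: Exception, name: str, remove_local, fail_request) -> bool:
    """Decide whether a failed remote unregister still permits local removal.

    Returns True when local removal proceeded, False when the request was failed
    so the caller can retry the delete later.
    """
    message = str(exc).lower()
    # Heuristic: treat an explicit remote 404 / "not found" as already-gone,
    # so cleaning up the local registration keeps both sides consistent.
    if getattr(exc, "status_code", None) == 404 or "not found" in message:
        remove_local(name)
        return True
    # Any other failure: preserve the local registration and surface the error
    # so the delete can be retried safely once the remote side recovers.
    fail_request(name, exc)
    return False
```

`remove_local` would wrap `configuration.remove_mcp_server` and `fail_request` would raise the `HTTPException`, keeping the not-found detection testable in isolation.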

self._conversation_cache: Optional[Cache] = None
self._quota_limiters: list[QuotaLimiter] = []
self._token_usage_history: Optional[TokenUsageHistory] = None
self._dynamic_mcp_server_names: set[str] = set()
⚠️ Potential issue | 🟠 Major

Dynamic MCP registry is process-local and will diverge across workers.

Line 67 stores dynamic registration state only in local memory. In multi-worker deployments, requests can hit different workers and observe different MCP server sets (e.g., register succeeds but list/delete on another worker returns inconsistent results).

Also applies to: 169-228

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/configuration.py` at line 67, The dynamic MCP registry is currently
stored in the in-memory set _dynamic_mcp_server_names (and other related code
around lines 169-228), which will diverge across workers; change the
implementation so the registry is kept in a shared, persistent store (e.g.,
Redis, database table, or central config service) instead of the process-local
set. Replace direct reads/writes of _dynamic_mcp_server_names with calls to new
helper methods (e.g., get_dynamic_mcp_servers(), add_dynamic_mcp_server(name),
remove_dynamic_mcp_server(name)) that operate on the shared backend and provide
atomic operations; update any code paths that reference
_dynamic_mcp_server_names to use these helpers and ensure worker-safe
concurrency (locks or atomic commands) and consistent
serialization/deserialization. Ensure migration/initialization code populates
the shared store from any existing local state and add tests verifying
consistency across simulated multiple-worker access.
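The helper-method shape the prompt describes could look like the sketch below. SQLite stands in for the shared backend purely to show the atomic add/remove/list surface; a real deployment would more likely use Redis or the service's existing database, and the class and method names are assumptions, not the project's API:

```python
import sqlite3


class SharedMCPRegistry:
    """Sketch of a worker-shared dynamic MCP registry (SQLite as a stand-in backend)."""

    def __init__(self, path: str = "mcp_registry.db") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS dynamic_mcp_servers (name TEXT PRIMARY KEY)"
        )
        self._conn.commit()

    def add_dynamic_mcp_server(self, name: str) -> None:
        # INSERT OR IGNORE makes repeated registration idempotent across workers.
        self._conn.execute(
            "INSERT OR IGNORE INTO dynamic_mcp_servers (name) VALUES (?)", (name,)
        )
        self._conn.commit()

    def remove_dynamic_mcp_server(self, name: str) -> None:
        self._conn.execute("DELETE FROM dynamic_mcp_servers WHERE name = ?", (name,))
        self._conn.commit()

    def get_dynamic_mcp_servers(self) -> set[str]:
        rows = self._conn.execute("SELECT name FROM dynamic_mcp_servers").fetchall()
        return {row[0] for row in rows}
```

Because every worker reads and writes the same store, a POST handled by one worker is immediately visible to GET and DELETE on any other, which is the consistency property the in-memory set cannot provide.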

⚠️ Potential issue | 🟠 Major

Reset dynamic MCP registry on config re-initialization.

init_from_dict() clears other cached runtime state but not _dynamic_mcp_server_names. Stale names can survive reloads and incorrectly mark servers as API-managed.

🛠️ Proposed fix
     def init_from_dict(self, config_dict: dict[Any, Any]) -> None:
         """Initialize configuration from a dictionary.
@@
         # clear cached values when configuration changes
         self._conversation_cache = None
         self._quota_limiters = []
         self._token_usage_history = None
+        self._dynamic_mcp_server_names.clear()
         # now it is possible to re-read configuration
         self._configuration = Configuration(**config_dict)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/configuration.py` at line 67, init_from_dict() currently clears cached
runtime state but doesn't reset the _dynamic_mcp_server_names set, allowing
stale server names to persist across reinitialization; update init_from_dict()
to clear or reinitialize self._dynamic_mcp_server_names (e.g. set it to an empty
set) as part of the re-initialization path so dynamic MCP registry state is
reset when configuration is reloaded.
