
Commit 5e8ee5c

ochafik and claude committed
refactor(mcp): inline MCP bridge, use streaming HTTP proxy
Simplify MCP proxy architecture by removing custom transport layers and using the existing server_http_proxy for streaming responses.

Changes:
- Remove server-mcp-bridge.cpp/h (custom SSE+HTTP POST transport)
- Remove server-ws.cpp/h (WebSocket transport, unused)
- Remove mcp-transport-custom.ts (frontend custom transport)
- Use server_http_proxy to stream responses (fixes SSE forwarding)
- Fix endpoint mismatch: client now uses /mcp?server=... consistently
- Fix JSON injection in error responses (use json{}.dump())
- Add port validation with proper error handling
- Add CORS headers for browser MCP clients
- Fix memory leaks in frontend MCP store disconnect
- Simplify tests to match new HTTP proxy model
- Update README and example config

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 2724d3f commit 5e8ee5c

22 files changed: 549 additions & 2054 deletions

PR.md

Lines changed: 0 additions & 93 deletions
This file was deleted.

tools/server/CMakeLists.txt

Lines changed: 0 additions & 4 deletions
```diff
@@ -46,10 +46,6 @@ set(TARGET_SRCS
     server-common.h
     server-context.cpp
     server-context.h
-    server-ws.cpp
-    server-ws.h
-    server-mcp-bridge.cpp
-    server-mcp-bridge.h
 )
 set(PUBLIC_ASSETS
     index.html.gz
```

tools/server/README.md

Lines changed: 47 additions & 30 deletions
````diff
@@ -1681,26 +1681,25 @@ Apart from error types supported by OAI, we also have custom types that are spec
 
 ### MCP (Model Context Protocol) Support
 
-The server supports [MCP](https://modelcontextprotocol.io/) for integrating external tools via WebSocket. MCP enables models to interact with external services like file systems, databases, APIs, and more.
+The server supports [MCP](https://modelcontextprotocol.io/) for integrating external tools. MCP enables models to interact with external services like file systems, databases, APIs, and more.
+
+The server acts as an HTTP proxy for remote MCP servers, handling CORS for browser-based clients.
 
 #### MCP Configuration
 
-Create an MCP configuration file (JSON format):
+Create an MCP configuration file (JSON format) with remote MCP server URLs:
 
 ```json
 {
   "mcpServers": {
-    "filesystem": {
-      "command": "npx",
-      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"],
-      "env": {}
-    },
     "brave-search": {
-      "command": "npx",
-      "args": ["-y", "@anthropic/mcp-server-brave-search"],
-      "env": {
-        "BRAVE_API_KEY": "your-api-key"
+      "url": "http://127.0.0.1:38180/mcp",
+      "headers": {
+        "Authorization": "Bearer your-api-key"
       }
+    },
+    "filesystem": {
+      "url": "http://127.0.0.1:38181/mcp"
     }
   }
 }
@@ -1717,38 +1716,56 @@ The server looks for MCP configuration in the following order:
 #### MCP Usage
 
 ```bash
-# Use default config location (~/.llama.cpp/mcp.json)
-./llama-server -m model.gguf
+# Enable MCP with --webui-mcp flag
+./llama-server -m model.gguf --webui-mcp
 
-# Or specify config path
-./llama-server -m model.gguf --mcp-config /path/to/mcp.json
+# Specify config path
+./llama-server -m model.gguf --webui-mcp --mcp-config /path/to/mcp.json
 
 # Or use environment variable
-LLAMA_MCP_CONFIG=/path/to/mcp.json ./llama-server -m model.gguf
+LLAMA_MCP_CONFIG=/path/to/mcp.json ./llama-server -m model.gguf --webui-mcp
 ```
 
-#### MCP WebSocket Port
-
-MCP uses WebSocket on HTTP port + 1 (default: 8081 when HTTP is on 8080).
-
 #### MCP API Endpoints
 
-| Endpoint | Description |
-|----------|-------------|
-| `GET /mcp/servers` | List available MCP servers from configuration |
-| `WS /mcp?server=<name>` | WebSocket connection (on HTTP port + 1) |
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/mcp/servers` | GET | List available MCP server names from config |
+| `/mcp?server=<name>` | GET | Proxy GET requests to remote MCP server (SSE streams) |
+| `/mcp?server=<name>` | POST | Proxy POST requests to remote MCP server (JSON-RPC) |
 
 #### MCP Protocol
 
-The MCP bridge implements JSON-RPC 2.0 over WebSocket. Key methods:
-- `initialize` - Establish MCP session
-- `tools/list` - List available tools
-- `tools/call` - Execute a tool
-- `resources/list` - List available resources
-- `resources/read` - Read a resource
+The server proxies requests to remote MCP servers using the [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-11-25/basic/transports). The web UI uses the official `@modelcontextprotocol/sdk` client.
 
 For more information about MCP, see the [Model Context Protocol documentation](https://modelcontextprotocol.io/).
 
+#### Example MCP Servers
+
+Here's how to run some example MCP servers that work with the default config:
+
+**Brave Search** (requires `BRAVE_API_KEY` environment variable - get one at https://brave.com/search/api/):
+
+```bash
+BRAVE_API_KEY=your-key-here npx -y @anthropic-ai/mcp-server-brave-search --transport http --port 38180
+```
+
+**Python interpreter** (with common data science packages):
+
+```bash
+uvx mcp-run-python --deps numpy,pandas,pydantic,requests,httpx,sympy,aiohttp streamable-http --port 38181
+```
+
+**Run both together** using `concurrently`:
+
+```bash
+BRAVE_API_KEY=your-key-here npx -y concurrently \
+  "npx -y @anthropic-ai/mcp-server-brave-search --transport http --port 38180" \
+  "uvx mcp-run-python --deps numpy,pandas,pydantic,requests,httpx,sympy,aiohttp streamable-http --port 38181"
+```
+
+Then update `mcp_config.example.json` with your settings and start llama-server with `--webui-mcp`.
+
 ### Legacy completion web UI
 
 A new chat-based UI has replaced the old completion-based since [this PR](https://github.com/ggml-org/llama.cpp/pull/10175). If you want to use the old completion, start the server with `--path ./tools/server/public_legacy`
````
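The endpoint table above says GET requests are proxied to the remote MCP server as SSE streams, which is why the commit switches to a streaming proxy: SSE delivers a sequence of events, each terminated by a blank line, and the client expects each frame as soon as it completes rather than after the upstream closes. A rough illustrative sketch of that framing (not the server's actual code):

```cpp
#include <string>
#include <vector>

// Illustrative SSE framing: each event is terminated by a blank line ("\n\n").
// A buffering proxy would hold every frame until the upstream disconnects;
// a streaming proxy forwards each frame the moment it is complete.
static std::vector<std::string> split_sse_events(const std::string & buf) {
    std::vector<std::string> events;
    size_t start = 0;
    size_t pos;
    while ((pos = buf.find("\n\n", start)) != std::string::npos) {
        events.push_back(buf.substr(start, pos - start));
        start = pos + 2;
    }
    // Bytes after the last "\n\n" belong to a not-yet-complete event.
    return events;
}
```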

tools/server/mcp_config.example.json

Lines changed: 10 additions & 33 deletions
```diff
@@ -4,42 +4,19 @@
   "_comment_macos": "On macOS/Linux, place this file in ~/.llama.cpp/mcp.json",
   "_comment_env": "Or set the LLAMA_MCP_CONFIG environment variable to point to this file",
   "mcpServers": {
-    "filesystem": {
-      "command": "npx",
-      "args": [
-        "-y",
-        "@modelcontextprotocol/server-filesystem",
-        "/allowed/path"
-      ]
-    },
+    "_comment": "Remote MCP servers (proxied via C++ backend with CORS support)",
     "brave-search": {
-      "command": "npx",
-      "args": [
-        "-y",
-        "@modelcontextprotocol/server-brave-search"
-      ],
-      "env": {
-        "BRAVE_API_KEY": "your-api-key-here"
-      }
+      "_comment": "Run: BRAVE_API_KEY=... npx -y @anthropic-ai/mcp-server-brave-search --transport http --port 38180",
+      "_comment_key": "Get your API key at https://brave.com/search/api/",
+      "url": "http://127.0.0.1:38180/mcp"
     },
-    "github": {
-      "command": "npx",
-      "args": [
-        "-y",
-        "@modelcontextprotocol/server-github"
-      ],
-      "env": {
-        "GITHUB_TOKEN": "your-github-token-here"
-      }
+    "python": {
+      "_comment": "Run: uvx mcp-run-python --deps numpy,pandas,pydantic,requests,httpx,sympy,aiohttp streamable-http --port 38181",
+      "url": "http://127.0.0.1:38181/mcp"
     },
-    "_comment_cwd_example": "Example: Run a custom MCP server script from a specific directory",
-    "my-script": {
-      "command": "python",
-      "args": ["server.py"],
-      "cwd": "/path/to/working/directory",
-      "env": {
-        "PYTHONUNBUFFERED": "1"
-      }
+    "filesystem": {
+      "_comment": "Run: npx -y @anthropic-ai/mcp-server-filesystem --transport http --port 38182 /path/to/allowed/dir",
+      "url": "http://127.0.0.1:38182/mcp"
     }
   }
 }
```
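At request time, the server resolves the `?server=<name>` query value against the `mcpServers` entries in this config before proxying, and the commit message notes that it now validates upstream ports with proper error handling. A hypothetical sketch of those two checks (the function names and error handling here are assumptions, not the server's actual API):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: map a ?server=<name> query value to its configured
// upstream URL, rejecting unknown names instead of proxying blindly.
static std::string resolve_mcp_url(const std::map<std::string, std::string> & servers,
                                   const std::string & name) {
    auto it = servers.find(name);
    if (it == servers.end()) {
        throw std::invalid_argument("unknown MCP server: " + name);
    }
    return it->second;
}

// Valid TCP ports are 1..65535; anything else is a configuration error.
static bool is_valid_port(long port) {
    return port >= 1 && port <= 65535;
}
```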

tools/server/public/index.html.gz

7.92 KB (binary file not shown)

tools/server/server-common.cpp

Lines changed: 0 additions & 58 deletions
```diff
@@ -57,64 +57,6 @@ json format_error_response(const std::string & message, const enum error_type ty
     };
 }
 
-//
-// API key validation helpers
-//
-
-std::string extract_api_key_from_auth_header(const std::string & auth_header) {
-    std::string req_api_key = auth_header;
-
-    // Remove the "Bearer " prefix if needed
-    std::string prefix = "Bearer ";
-    if (req_api_key.length() >= prefix.length() && req_api_key.substr(0, prefix.length()) == prefix) {
-        req_api_key = req_api_key.substr(prefix.length());
-    }
-
-    // Trim leading whitespace
-    while (!req_api_key.empty() && req_api_key[0] == ' ') {
-        req_api_key.erase(0, 1);
-    }
-
-    return req_api_key;
-}
-
-// Constant-time string comparison to prevent timing attacks
-// Returns true if strings are equal, false otherwise
-static bool constant_time_compare(const std::string & a, const std::string & b) {
-    if (a.size() != b.size()) {
-        return false;
-    }
-
-    // Use XOR to compare all bytes without early exit
-    volatile unsigned char result = 0;
-    for (size_t i = 0; i < a.size(); i++) {
-        result |= (a[i] ^ b[i]);
-    }
-
-    return result == 0;
-}
-
-bool validate_auth_header(const std::string & auth_header, const std::vector<std::string> & api_keys) {
-    // If API key is not set, skip validation
-    if (api_keys.empty()) {
-        return true;
-    }
-
-    // Extract the API key from the Authorization header
-    std::string req_api_key = extract_api_key_from_auth_header(auth_header);
-
-    // Validate the API key using constant-time comparison
-    // This prevents timing attacks where an attacker could measure
-    // response times to guess valid API key characters
-    for (const auto & key : api_keys) {
-        if (constant_time_compare(req_api_key, key)) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
 //
 // random string / id
 //
```

tools/server/server-common.h

Lines changed: 0 additions & 15 deletions
```diff
@@ -86,21 +86,6 @@ struct server_grammar_trigger {
 
 json format_error_response(const std::string & message, const enum error_type type);
 
-//
-// API key validation helpers
-//
-
-// Validates an Authorization header value against a list of configured API keys.
-// Handles "Bearer " prefix and X-Api-Key header format.
-// Returns true if:
-//   - api_keys is empty (no authentication configured)
-//   - the provided auth_header matches one of the configured keys
-// Uses constant-time comparison to prevent timing attacks.
-bool validate_auth_header(const std::string & auth_header, const std::vector<std::string> & api_keys);
-
-// Extracts the API key from an Authorization header value (removes "Bearer " prefix if present)
-std::string extract_api_key_from_auth_header(const std::string & auth_header);
-
 //
 // random string / id
 //
```
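The commit also adds CORS headers so browser-based MCP clients can call the proxy endpoints cross-origin. The header names below are standard, but the exact values the server sends are an assumption; this is a sketch of the shape of the response headers, not the server's handler:

```cpp
#include <map>
#include <string>

// Sketch: response headers a browser needs before its MCP client may call
// /mcp?server=... from another origin. Values here are illustrative.
static std::map<std::string, std::string> mcp_cors_headers() {
    return {
        {"Access-Control-Allow-Origin",  "*"},
        {"Access-Control-Allow-Methods", "GET, POST, OPTIONS"},
        // Authorization carries bearer tokens; Mcp-Session-Id is the header
        // the MCP Streamable HTTP transport uses to correlate requests.
        {"Access-Control-Allow-Headers", "Content-Type, Authorization, Mcp-Session-Id"},
    };
}
```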
