9 changes: 9 additions & 0 deletions common/arg.cpp
@@ -2848,6 +2848,15 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
params.webui_mcp_proxy = value;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_WEBUI_MCP_PROXY"));
add_opt(common_arg(
{"--tools"}, "TOOL1,TOOL2,...",
"experimental: whether to enable built-in tools for AI agents - do not enable in untrusted environments (default: no tools)\n"
"specify \"all\" to enable all tools\n"
"available tools: read_file, file_glob_search, grep_search, exec_shell_command, write_file, edit_file, apply_diff",
[](common_params & params, const std::string & value) {
params.server_tools = parse_csv_row(value);
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_TOOLS"));
add_opt(common_arg(
{"--webui"},
{"--no-webui"},
3 changes: 3 additions & 0 deletions common/common.h
@@ -613,6 +613,9 @@ struct common_params {
bool endpoint_props = false; // only control POST requests, not GET
bool endpoint_metrics = false;

// enable built-in tools
std::vector<std::string> server_tools;

// router server configs
std::string models_dir = ""; // directory containing models for the router server
std::string models_preset = ""; // directory containing model presets for the router server
2 changes: 2 additions & 0 deletions tools/server/CMakeLists.txt
@@ -13,6 +13,8 @@ add_library(${TARGET} STATIC
server-common.h
server-context.cpp
server-context.h
server-tools.cpp
server-tools.h
)

if (BUILD_SHARED_LIBS)
55 changes: 55 additions & 0 deletions tools/server/README-dev.md
@@ -125,6 +125,61 @@ The framework automatically starts a `llama-server` instance, sends requests, an

For detailed instructions, see the [test documentation](./tests/README.md).

### API for tools

This endpoint is intended for internal use by the Web UI and is subject to change or removal in the future.

**GET /tools**

Get the list of available tools. Each tool has these fields:
- `tool` (string): the ID of the tool, to be used in the POST call. Example: `read_file`
- `display_name` (string): the name displayed in the UI. Example: `Read file`
- `type` (string): always `"builtin"` for now
- `permissions` (object): a mapping from string to boolean indicating the permissions required by this tool. This is useful for the UI to ask the user for confirmation before calling the tool. For now, the only supported permission is `"write"`
- `definition` (object): the OAI-compatible definition of this tool
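
For illustration, a response entry might look like the following (the values and the exact shape of the envelope are illustrative, and the `definition` object is truncated here):

```json
[
  {
    "tool": "read_file",
    "display_name": "Read file",
    "type": "builtin",
    "permissions": { "write": false },
    "definition": { "type": "function", "function": { "name": "read_file" } }
  }
]
```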

**POST /tools**

Invoke a tool call. The request body is a JSON object with:
- `tool` (string): the name of the tool
- `params` (object): a mapping from argument name (string) to argument value
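
For illustration, a request invoking `read_file` might look like this (the parameter name is hypothetical; the actual argument names come from each tool's `definition`):

```json
{
  "tool": "read_file",
  "params": { "path": "src/main.cpp" }
}
```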

Returns a JSON object. There are two response formats:

Format 1: Plain text. The text is placed in a field called `plain_text_response`, for example:

```json
{
"plain_text_response": "this is a text response"
}
```

The client should extract this value and place it inside the message content (note: the content is no longer JSON), for example:

```json
{
"role": "tool",
"content": "this is a text response"
}
```

Format 2: A normal JSON response, for example:

```json
{
"error": "cannot open this file"
}
```

This requires `JSON.stringify` when formatting it into the message content:

```json
{
"role": "tool",
"content": "{\"error\":\"cannot open this file\"}"
}
```
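
Putting the two formats together, a client can branch on the presence of `plain_text_response`. A minimal sketch (the helper name is hypothetical, not part of the actual Web UI code):

```javascript
// Turn a /tools response object into the string used as tool message content.
function toToolMessageContent(response) {
  // Format 1: a plain-text response is used as-is
  if (typeof response.plain_text_response === "string") {
    return response.plain_text_response;
  }
  // Format 2: any other JSON object must be stringified
  return JSON.stringify(response);
}

console.log(toToolMessageContent({ plain_text_response: "this is a text response" }));
// → this is a text response
console.log(toToolMessageContent({ error: "cannot open this file" }));
// → {"error":"cannot open this file"}
```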

### Notable Related PRs

- Initial server implementation: https://github.com/ggml-org/llama.cpp/pull/1443
15 changes: 15 additions & 0 deletions tools/server/README.md
@@ -194,6 +194,7 @@ For the full list of features, please refer to [server's changelog](https://gith
| `--webui-config JSON` | JSON that provides default WebUI settings (overrides WebUI defaults)<br/>(env: LLAMA_ARG_WEBUI_CONFIG) |
| `--webui-config-file PATH` | JSON file that provides default WebUI settings (overrides WebUI defaults)<br/>(env: LLAMA_ARG_WEBUI_CONFIG_FILE) |
| `--webui-mcp-proxy, --no-webui-mcp-proxy` | experimental: whether to enable MCP CORS proxy - do not enable in untrusted environments (default: disabled)<br/>(env: LLAMA_ARG_WEBUI_MCP_PROXY) |
| `--tools TOOL1,TOOL2,...` | experimental: whether to enable built-in tools for AI agents - do not enable in untrusted environments (default: no tools)<br/>specify "all" to enable all tools<br/>available tools: read_file, file_glob_search, grep_search, exec_shell_command, write_file, edit_file, apply_diff<br/>(env: LLAMA_ARG_TOOLS) |
| `--webui, --no-webui` | whether to enable the Web UI (default: enabled)<br/>(env: LLAMA_ARG_WEBUI) |
| `--embedding, --embeddings` | restrict to only support embedding use case; use only with dedicated embedding models (default: disabled)<br/>(env: LLAMA_ARG_EMBEDDINGS) |
| `--rerank, --reranking` | enable reranking endpoint on server (default: disabled)<br/>(env: LLAMA_ARG_RERANKING) |
@@ -293,6 +294,12 @@ It is currently available in the following endpoints:

For more details, please refer to [multimodal documentation](../../docs/multimodal.md)

### Built-in tools support

The server includes a set of built-in tools that enable the LLM to access the local file system directly from the Web UI.

To use this feature, start the server with `--tools all`. You can also enable only specific tools by passing a comma-separated list: `--tools name1,name2,...`. Run `llama-server --help` for the full list of available tool names.
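
For example (the model path is a placeholder):

```shell
# Enable all built-in tools (do not do this in untrusted environments)
llama-server -m model.gguf --tools all

# Enable only read-only tools
llama-server -m model.gguf --tools read_file,grep_search,file_glob_search

# The same via environment variable
LLAMA_ARG_TOOLS=all llama-server -m model.gguf
```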

## Build

`llama-server` is built alongside everything else from the root of the project
@@ -1438,6 +1445,14 @@ curl http://localhost:8080/v1/messages/count_tokens \
{"input_tokens": 10}
```

## Server built-in tools

The server exposes a REST API under `/tools` that allows the Web UI to call built-in tools. This endpoint is intended for internal use by the Web UI and is subject to change or removal in the future.

**Please do NOT use this endpoint in a downstream application.**

For further documentation about this endpoint, please refer to the [server internal documentation](./README-dev.md).

## Using multiple models

`llama-server` can be launched in a **router mode** that exposes an API for dynamically loading and unloading models. The main process (the "router") automatically forwards each request to the appropriate model instance.