- Make sure that nothing is listening on ports 8000 and 8080. Open 3 generously sized terminals on your screen.
- Download a sensible model. Qwen 3.5 4B is sensible.
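Before launching anything, it can help to confirm both ports really are free. A minimal stdlib sketch (the port numbers are the ones this guide uses; nothing here is specific to llama.cpp or MCP):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 when something accepts the connection,
        # i.e. the port is already taken.
        return s.connect_ex((host, port)) != 0

for port in (8000, 8080):
    print(f"port {port}: {'free' if port_is_free(port) else 'IN USE'}")
```

If a port is in use, find the culprit with `lsof -i :8080` and stop it before continuing.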
- Compile a fresh llama.cpp: `git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp && cmake -B build && cmake --build build --config Release -j 6`
- Launch the llama in terminal #1: `./build/bin/llama-server -m ~/Downloads/Qwen3.5-4B-Q8_0.gguf --ctx-size 4096 --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00 --verbose --webui-mcp-proxy`
- Clone this repository: `git clone https://github.com/behavioral-ds/mcp-example && cd mcp-example`
- Install deps: `poetry install && poetry shell`
- Launch the MCP server in terminal #2: `python mcp_serve.py`
- Execute the Agentic Call™ in terminal #3: `python call.py`
- Observe the dance between LLM <-> inference engine <-> MCP <-> client.
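That dance is a JSON-RPC 2.0 exchange: the client asks the MCP server which tools exist, the LLM picks one, the client relays the call, and the result is fed back to the LLM. A sketch of the message shapes (method names follow the MCP spec; the `get_weather` tool and its arguments are invented for illustration):

```python
import json

# Client -> MCP server: discover available tools (MCP method "tools/list").
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Client -> MCP server: the LLM picked a tool, relay the call.
# "get_weather" and its arguments are hypothetical examples.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Sydney"}},
}

# MCP server -> client: the tool result, which the client hands back
# to the LLM so it can compose its final answer.
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "22 °C, sunny"}]},
}

print(json.dumps(call_request, indent=2))
```

Watching the `--verbose` output of llama-server in terminal #1 while `call.py` runs in terminal #3 lets you see each of these hops happen.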
- Open the llama.cpp web UI at http://localhost:8080/, go to settings, and add a new MCP server.
- Then click "Use prompt" and rejoice.
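Under the hood, `call.py` presumably talks to llama-server's OpenAI-compatible endpoint on port 8080; the actual contents of `call.py` are the repo's own, so this is only a hedged sketch of what such a request could look like (the tool name and schema are invented):

```python
import json
import urllib.request

# llama-server exposes an OpenAI-compatible chat endpoint.
LLAMA_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str) -> dict:
    """Build a chat request advertising one hypothetical MCP-backed tool."""
    return {
        "model": "qwen",  # llama-server serves whatever model it loaded
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

def send(payload: dict) -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        LLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    reply = send(build_chat_request("What's the weather in Sydney?"))
    print(json.dumps(reply, indent=2))
```

If the model decides to use the tool, the response's first choice carries a `tool_calls` entry instead of plain text, and the client's job is to execute it via the MCP server and loop the result back.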