Low-latency IPC for persistent AI tool servers — LLM inference, TTS, STT, vector search, and more — all on one machine, no network stack required.
- Persistent servers — model weights and state stay loaded between calls; no per-request startup cost
- Kernel-speed IPC — named pipes route through kernel memory, not a network stack; lower latency than local HTTP
- Multi-client fanout — one server handles many concurrent clients; each gets its own downstream pipe
- Decorator API — register command handlers with a single `@ch.handler("CMD")` line
- `cpipe` CLI — send ad-hoc commands to any running server from the terminal, like `curl` for pipes
- Claude Code skill — an included skill teaches the assistant to discover and query live servers without leaving the session
- Ready-made servers — drop-in pipes for LLM chat, text-to-speech, and speech-to-text
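The decorator pattern in the feature list can be sketched with a plain dict-based registry. This is an illustrative stand-in, not the library's actual `ToolServer` implementation — the `MiniServer` class, `dispatch` method, and `echo` handler here are all hypothetical:

```python
# Minimal sketch of a decorator-based command registry, mirroring the
# @ch.handler("CMD") style described above. Names are illustrative only.
class MiniServer:
    def __init__(self):
        self._handlers = {}

    def handler(self, cmd):
        """Register the decorated function as the handler for `cmd`."""
        def register(fn):
            self._handlers[cmd] = fn
            return fn
        return register

    def dispatch(self, cmd, **payload):
        """Look up the handler for `cmd` and invoke it with the payload."""
        fn = self._handlers.get(cmd)
        if fn is None:
            return {"error": f"unknown command: {cmd}"}
        return fn(**payload)

ch = MiniServer()

@ch.handler("echo")
def echo(text=""):
    return {"result": text, "done": True}
```

A dict keyed by command name keeps dispatch O(1) and lets handlers be added without touching the server's receive loop — presumably the same reason the real decorator API exists.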
This library uses named pipes as the transport layer for agentic tool servers — persistent background processes that expose capabilities such as LLM inference, text-to-speech, vector search, or browser automation to a Python orchestrator running on the same machine.
Because named pipes route data through kernel memory rather than a network stack, they offer lower latency than local HTTP and far less complexity than shared memory — a practical sweet spot for real-time applications like voice agents.
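The mechanism is easy to demonstrate with the standard library alone — a FIFO created with `os.mkfifo` moves bytes through kernel buffers with no sockets involved. This is a toy illustration of the transport, not this library's protocol:

```python
import os
import tempfile
import threading

# Toy demonstration of named-pipe IPC: one thread writes, the main
# thread reads, and the bytes never touch a network stack.
fifo = os.path.join(tempfile.mkdtemp(), "demo-pipe")
os.mkfifo(fifo)

def producer():
    # open() for writing blocks until a reader connects; the write
    # then passes through kernel pipe buffers directly
    with open(fifo, "w") as w:
        w.write("hello from the pipe\n")

t = threading.Thread(target=producer)
t.start()
with open(fifo) as r:          # blocks until the writer opens its end
    message = r.readline().rstrip("\n")
t.join()
os.remove(fifo)
```

The blocking `open()` rendezvous on both ends is also why start order matters for the real servers: whoever creates and opens the FIFO first simply waits for the peer.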
The same servers can be driven directly from Claude Code. An included agent skill teaches the assistant how to discover running pipe servers with `cpipe --list`, inspect their capabilities, and send commands.
For a deeper look at the design decisions and API reference, see DOCS.md.
```bash
# Core library only
pip install -e .

# With LLM inference support
pip install -e ".[llm]"

# With TTS support (macOS: mlx-audio + sounddevice)
pip install -e ".[tts]"

# With STT support (sounddevice; Voxtral weights vendored)
pip install -e ".[stt]"
```

Requires Python 3.11+. See DOCS.md for platform-specific dependency details.
1. Start a server (Terminal 1):
```bash
conda activate named-pipes
cpipe --serve chat   # LLM server on /tmp/tool-chat
```

2. Query it from the CLI (Terminal 2):

```bash
cpipe /tmp/tool-chat chat --data '{"messages": [{"role":"user","content":"Hello!"}]}'
```

3. Or write a client in Python:
```python
from named_pipes.tool_client import ToolClient
import threading

done = threading.Event()

class _ChatClient(ToolClient):
    def on_message(self, msg):
        if msg.get("done") is not True:
            print(msg.get("result", ""), end="", flush=True)
        else:
            done.set()  # final message received — stop waiting

with _ChatClient("chat") as ch:
    ch.send_command("chat", messages=[{"role": "user", "content": "Hello!"}])
    done.wait(timeout=30)
```

Start order matters — server first, then client (the server creates the FIFOs).
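The `--data` flags above pass JSON, which suggests a newline-delimited JSON wire format underneath. Whether the library frames messages exactly this way is an assumption; the idea looks like:

```python
import json

# Hypothetical framing helpers: one JSON object per line, so a reader
# can recover message boundaries from a raw byte stream.
def encode(msg: dict) -> bytes:
    return (json.dumps(msg) + "\n").encode()

def decode_stream(data: bytes):
    for line in data.splitlines():
        if line.strip():
            yield json.loads(line)

frames = encode({"cmd": "chat", "id": 1}) + encode({"result": "Hi", "done": True})
messages = list(decode_stream(frames))
```

Line-delimited framing is a common choice for pipes because a FIFO delivers a byte stream, not discrete messages, and the `"done"` field seen in the client example marks the end of a streamed response.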
```bash
# LLM chat
cpipe --serve chat                    # Terminal 1
python src/examples/chat_client.py    # Terminal 2

# LLM → TTS pipeline (spoken output)
cpipe --serve chat                    # Terminal 1: LLM (/tmp/tool-chat)
cpipe --serve tts                     # Terminal 2: TTS (/tmp/tool-tts)
python src/examples/tts_client.py     # Terminal 3: pipeline client

# Speech-to-text
cpipe --serve stt                     # Terminal 1: STT (/tmp/tool-stt)
python src/examples/stt_client.py     # Terminal 2: subscriber
```
```bash
cpipe /tmp/tool-chat chat --data '{"messages": [{"role":"user","content":"Hello"}]}'

cpipe --version   # show installed version
cpipe --list      # discover running ToolServer instances (tool-* pipes)
cpipe --pid       # same, plus PIDs that have each pipe open
cpipe --clear     # delete orphaned tool pipes
```

See DOCS.md for all options and the full protocol reference.
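Discovery of `tool-*` pipes can be approximated with a directory scan plus a FIFO check. This is a sketch of the idea, not cpipe's actual implementation, and it uses an isolated temp directory rather than `/tmp` to stay self-contained:

```python
import os
import stat
import tempfile

def list_tool_pipes(directory: str) -> list[str]:
    """Return paths of tool-* FIFOs in `directory`, the way a
    --list-style command might discover running servers."""
    found = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith("tool-"):
            continue
        path = os.path.join(directory, name)
        if stat.S_ISFIFO(os.stat(path).st_mode):
            found.append(path)
    return found

# Demo: one real FIFO and one ordinary file with a tool- prefix.
d = tempfile.mkdtemp()
os.mkfifo(os.path.join(d, "tool-chat"))
open(os.path.join(d, "tool-notapipe.txt"), "w").close()
pipes = list_tool_pipes(d)   # only the FIFO survives the filter
```

The `S_ISFIFO` check is what separates live pipe endpoints from ordinary files that merely share the naming convention; something like `--pid` would additionally need to ask the OS which processes hold each pipe open.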
An included skill at `.claude/skills/cpipe/SKILL.md` teaches Claude Code how to use `cpipe` to discover, inspect, and interact with live servers — so the LLM can query a local inference server or trigger TTS playback without leaving the coding session.
- DOCS.md — architecture, API reference, protocol spec, and design rationale
- `named-pipe-tools.md` — `ToolServer` protocol specification
- `src/examples/chat_client.py` — LLM chat example
- `src/examples/tts_client.py` — TTS example
- `src/examples/stt_client.py` — STT example
