Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
A lightweight terminal chat interface for the llama.cpp server, written in C++, with many features and Windows/Linux support.
Lightweight proxy for LLMs
Local LLM proxy, DevOps-friendly
CLI wrapper for llama.cpp providing an Ollama-like experience
A robust, production-ready Python toolkit to automate the synchronization between a directory of .gguf model files and a llama-swap config.yaml
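A minimal sketch of what such a sync involves (not the toolkit's actual code): regenerate a llama-swap style config.yaml from a directory of .gguf files. The model directory, command template, and ${PORT} placeholder are assumptions.

```python
# Minimal sketch: rebuild a llama-swap style config.yaml from a .gguf directory.
# Directory path, command template, and the ${PORT} placeholder are assumptions.
from pathlib import Path

import yaml  # pip install pyyaml

MODEL_DIR = Path("/models")
CONFIG_PATH = Path("config.yaml")
CMD_TEMPLATE = "llama-server --port ${{PORT}} -m {model}"

def build_config(model_dir: Path) -> dict:
    """Map each .gguf file to a model entry with a launch command."""
    models = {
        gguf.stem: {"cmd": CMD_TEMPLATE.format(model=gguf)}
        for gguf in sorted(model_dir.glob("*.gguf"))
    }
    return {"models": models}

if __name__ == "__main__":
    CONFIG_PATH.write_text(yaml.safe_dump(build_config(MODEL_DIR), sort_keys=True))
```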
FastAPI proxy that strips volatile fields from OpenClaw requests to dramatically improve llama-server KV cache hit rates (~22× faster prompt eval)
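The idea behind that proxy can be sketched roughly as follows (a hypothetical illustration, not the project's code): forward OpenAI-style requests to llama-server after dropping per-call fields, so repeated prompts share an identical prefix and reuse the server's KV cache. The upstream address and the field names being stripped are assumptions.

```python
# Hypothetical sketch: strip volatile request fields before forwarding to llama-server,
# so identical prompt prefixes stay byte-identical and the KV cache is reused.
import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()
UPSTREAM = "http://127.0.0.1:8080"                    # assumed llama-server address
VOLATILE_KEYS = {"metadata", "user", "request_id"}    # assumed per-call fields

@app.post("/v1/chat/completions")
async def proxy(request: Request) -> Response:
    body = await request.json()
    # Drop top-level fields that differ between otherwise identical requests.
    cleaned = {k: v for k, v in body.items() if k not in VOLATILE_KEYS}
    async with httpx.AsyncClient(timeout=None) as client:
        upstream = await client.post(f"{UPSTREAM}/v1/chat/completions", json=cleaned)
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )
```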
One place to store and manage all your recipes for Llama Server
LlamaOrch is a simple Bash-based CLI orchestrator for the llama.cpp server.
A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.
Benchmark Gemma 4 E2B on Apple Silicon: MLX (mlx-lm) vs GGUF (llama-server), with TTFT, tokens/sec, and memory.
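For context, TTFT and tokens/sec against an OpenAI-compatible llama-server endpoint can be probed roughly like this (a sketch, not the benchmark's code; the base URL, model name, and the one-token-per-chunk approximation are assumptions):

```python
# Rough TTFT / throughput probe for an OpenAI-compatible endpoint.
# Base URL and model name are placeholders; tokens are approximated as one per chunk.
import time

from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

def measure(prompt: str, model: str = "local") -> tuple[float, float]:
    """Return (time to first token in seconds, approximate generated tokens/sec)."""
    start = time.perf_counter()
    first = None
    chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=128,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.perf_counter()
            chunks += 1
    end = time.perf_counter()
    ttft = (first or end) - start
    gen_time = end - (first or end)
    return ttft, (chunks / gen_time) if gen_time > 0 else 0.0

if __name__ == "__main__":
    print(measure("Explain KV caching in one sentence."))
```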
A production-grade Python SDK for llama-server that streamlines authentication, token rotation, observability, and PII masking—helping AI architects ship secure, traceable LLM systems with enterprise-ready guardrails.
Multi-instance llama.cpp orchestration — GPU pinning, heterogeneous pools, round-robin routing — one dashboard & API.
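The round-robin piece of such a setup can be illustrated with a tiny dispatcher (backend addresses are placeholders; this is not the project's code):

```python
# Hypothetical round-robin dispatcher over several llama-server backends.
from itertools import cycle

import httpx

BACKENDS = cycle([
    "http://127.0.0.1:8081",   # placeholder instance, e.g. pinned to GPU 0
    "http://127.0.0.1:8082",   # placeholder instance, e.g. pinned to GPU 1
])

def chat(payload: dict) -> dict:
    """Send one /v1/chat/completions request to the next backend in rotation."""
    backend = next(BACKENDS)
    resp = httpx.post(f"{backend}/v1/chat/completions", json=payload, timeout=None)
    resp.raise_for_status()
    return resp.json()
```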
A Bash script that automatically launches llama-server, detects available .gguf models, and selects GPU layers based on your free VRAM.
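That VRAM heuristic looks roughly like the following (shown in Python rather than Bash; the layer count and per-layer memory figures are illustrative, not measured):

```python
# Hypothetical sketch of picking --n-gpu-layers (-ngl) from free VRAM.
# The per-layer cost and total layer count below are illustrative, not measured.
import subprocess

def free_vram_mib() -> int:
    """Return free memory of GPU 0 in MiB via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.splitlines()[0].strip())

def pick_gpu_layers(total_layers: int, mib_per_layer: float, reserve_mib: int = 1024) -> int:
    """Offload as many layers as fit, keeping a safety reserve."""
    budget = free_vram_mib() - reserve_mib
    return max(0, min(total_layers, int(budget // mib_per_layer)))

if __name__ == "__main__":
    ngl = pick_gpu_layers(total_layers=32, mib_per_layer=350.0)
    print(f"llama-server -m model.gguf -ngl {ngl}")
```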
Claude-Code-style CLI for your own local LLM fleet. Multi-agent delegation, MCP, hooks, autopilot, 100+ tools (file/shell/web/security/free APIs) — all running on your GPU, no cloud calls.
Provides tested tools and configs to run Qwen 3.5 GGUF models efficiently on a single 16GB NVIDIA GPU using llama.cpp locally.
Terminal UI for launching llama-server/llama.cpp with auto-discovered local GGUF models