Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
A lightweight terminal chat interface for the llama.cpp server, written in C++, with many features and Windows/Linux support.
Lightweight proxy for LLMs
Local LLM proxy, DevOps-friendly
CLI wrapper for llama.cpp providing an Ollama-like experience
A robust, production-ready Python toolkit to automate the synchronization between a directory of .gguf model files and a llama-swap config.yaml
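A minimal sketch of what such a sync involves (not the toolkit's actual code): regenerate a llama-swap style config.yaml from a directory of .gguf files. The model directory, command template, and ${PORT} placeholder are assumptions.

```python
# Minimal sketch: rebuild a llama-swap style config.yaml from a .gguf directory.
# Directory path, command template, and the ${PORT} placeholder are assumptions.
from pathlib import Path

import yaml  # pip install pyyaml

MODEL_DIR = Path("/models")
CONFIG_PATH = Path("config.yaml")
CMD_TEMPLATE = "llama-server --port ${{PORT}} -m {model}"

def build_config(model_dir: Path) -> dict:
    """Map each .gguf file to a model entry with a launch command."""
    models = {
        gguf.stem: {"cmd": CMD_TEMPLATE.format(model=gguf)}
        for gguf in sorted(model_dir.glob("*.gguf"))
    }
    return {"models": models}

if __name__ == "__main__":
    CONFIG_PATH.write_text(yaml.safe_dump(build_config(MODEL_DIR), sort_keys=True))
```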
FastAPI proxy that strips volatile fields from OpenClaw requests to dramatically improve llama-server KV cache hit rates (~22× faster prompt eval)
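The idea behind that proxy can be sketched roughly as follows (a hypothetical illustration, not the project's code): forward OpenAI-style requests to llama-server after dropping per-call fields, so repeated prompts share an identical prefix and reuse the server's KV cache. The upstream address and the field names being stripped are assumptions.

```python
# Hypothetical sketch: strip volatile request fields before forwarding to llama-server,
# so identical prompt prefixes stay byte-identical and the KV cache is reused.
import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()
UPSTREAM = "http://127.0.0.1:8080"                    # assumed llama-server address
VOLATILE_KEYS = {"metadata", "user", "request_id"}    # assumed per-call fields

@app.post("/v1/chat/completions")
async def proxy(request: Request) -> Response:
    body = await request.json()
    # Drop top-level fields that differ between otherwise identical requests.
    cleaned = {k: v for k, v in body.items() if k not in VOLATILE_KEYS}
    async with httpx.AsyncClient(timeout=None) as client:
        upstream = await client.post(f"{UPSTREAM}/v1/chat/completions", json=cleaned)
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )
```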
One place to store and manage all your recipes for Llama Server
LlamaOrch is a simple Bash-based CLI orchestrator for the llama.cpp server.
A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.
Benchmark Gemma 4 E2B on Apple Silicon: MLX (mlx-lm) vs GGUF (llama-server), with TTFT, tokens/sec, and memory.
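For context, TTFT and tokens/sec against an OpenAI-compatible llama-server endpoint can be probed roughly like this (a sketch, not the benchmark's code; the base URL, model name, and the one-token-per-chunk approximation are assumptions):

```python
# Rough TTFT / throughput probe for an OpenAI-compatible endpoint.
# Base URL and model name are placeholders; tokens are approximated as one per chunk.
import time

from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

def measure(prompt: str, model: str = "local") -> tuple[float, float]:
    """Return (time to first token in seconds, approximate generated tokens/sec)."""
    start = time.perf_counter()
    first = None
    chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=128,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.perf_counter()
            chunks += 1
    end = time.perf_counter()
    ttft = (first or end) - start
    gen_time = end - (first or end)
    return ttft, (chunks / gen_time) if gen_time > 0 else 0.0

if __name__ == "__main__":
    print(measure("Explain KV caching in one sentence."))
```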
A production-grade Python SDK for llama-server that streamlines authentication, token rotation, observability, and PII masking—helping AI architects ship secure, traceable LLM systems with enterprise-ready guardrails.
Multi-instance llama.cpp orchestration — GPU pinning, heterogeneous pools, round-robin routing — one dashboard & API.
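The round-robin piece of such a setup can be illustrated with a tiny dispatcher (backend addresses are placeholders; this is not the project's code):

```python
# Hypothetical round-robin dispatcher over several llama-server backends.
from itertools import cycle

import httpx

BACKENDS = cycle([
    "http://127.0.0.1:8081",   # placeholder instance, e.g. pinned to GPU 0
    "http://127.0.0.1:8082",   # placeholder instance, e.g. pinned to GPU 1
])

def chat(payload: dict) -> dict:
    """Send one /v1/chat/completions request to the next backend in rotation."""
    backend = next(BACKENDS)
    resp = httpx.post(f"{backend}/v1/chat/completions", json=payload, timeout=None)
    resp.raise_for_status()
    return resp.json()
```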
A Bash script that automatically launches llama-server, detects available .gguf models, and selects GPU layers based on your free VRAM.
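That VRAM heuristic looks roughly like the following (shown in Python rather than Bash; the layer count and per-layer memory figures are illustrative, not measured):

```python
# Hypothetical sketch of picking --n-gpu-layers (-ngl) from free VRAM.
# The per-layer cost and total layer count below are illustrative, not measured.
import subprocess

def free_vram_mib() -> int:
    """Return free memory of GPU 0 in MiB via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.splitlines()[0].strip())

def pick_gpu_layers(total_layers: int, mib_per_layer: float, reserve_mib: int = 1024) -> int:
    """Offload as many layers as fit, keeping a safety reserve."""
    budget = free_vram_mib() - reserve_mib
    return max(0, min(total_layers, int(budget // mib_per_layer)))

if __name__ == "__main__":
    ngl = pick_gpu_layers(total_layers=32, mib_per_layer=350.0)
    print(f"llama-server -m model.gguf -ngl {ngl}")
```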
Claude-Code-style CLI for your own local LLM fleet. Multi-agent delegation, MCP, hooks, autopilot, 100+ tools (file/shell/web/security/free APIs) — all running on your GPU, no cloud calls.
Provides tested tools and configs to run Qwen 3.5 GGUF models efficiently on a single 16GB NVIDIA GPU using llama.cpp locally.
Terminal UI for launching llama-server/llama.cpp with auto-discovered local GGUF models