Tabby with Vulkan Acceleration (Experimental & Highly Restricted)

🖥️ Environment

CPU: AMD Ryzen 7 5800H (with AVX2 support)
GPU: AMD Radeon / Integrated Graphics (iGPU)
Memory: UMA Shared Memory (8GB allocated in BIOS)
Binary: tabby_x86_64-manylinux2014-vulkan (tabby version 0.10.0)
Model: Qwen2.5-Coder-7B-Instruct-q4_k_m.gguf
Host OS: Ubuntu 24.04 (minimal install)

⚠️ CRITICAL WARNING: Read Before Implementation

The Vulkan-supported binary for Tabby (tabby_x86_64-manylinux2014-vulkan) is EXTREMELY LIMITED. This setup is only recommended if you absolutely require GPU acceleration on non-NVIDIA hardware and are willing to sacrifice almost all core features of Tabby.

🚫 Known Limitations in v0.10.0 (Vulkan Build)

No config.toml Support
- The binary does not scan or recognize external configuration files.
- All settings must be passed strictly through CLI arguments (e.g., --model, --chat-model, --device).
- If a feature isn't available as a CLI flag, it cannot be configured.
No Embedding/RAG Functionality
- The embedding engine and indexing logic (Tantivy/Semantic Search) are completely stripped from this binary.
- Even if you have embedding models in your storage, the binary will ignore them.
- You cannot use the "Repository" or "Answer with Context" features.
Inference-Only Architecture
- This build is a "bare-bones" runner designed solely for text generation (Chat/Completion).
- It lacks the internal routing to connect to external embedding servers or sidecar containers.
Monitoring Confusion (UMA/iGPU)
- When using AMD APUs with UMA (Unified Memory Architecture), container monitoring tools (like Dozzle) may only report ~300MB of RAM usage.
- Actual VRAM usage (e.g., 4GB+ for a 7B model) is handled directly by the Vulkan driver and is invisible to standard Docker stats. Use amdgpu_top or radeontop on the host to verify GPU load.
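Since every setting has to arrive as a CLI argument, it can help to keep the settings in one place and generate the command line from them. The following is a minimal Python sketch, not part of Tabby itself: `build_tabby_command`, `SUPPORTED_FLAGS`, and the `my-model` name are hypothetical, the flags are only the ones named above, and the `serve` subcommand and the `vulkan` device value are assumptions about how this build is invoked.

```python
import shlex

# Flags named in this README; anything else cannot be configured in this build.
SUPPORTED_FLAGS = {
    "model": "--model",
    "chat_model": "--chat-model",
    "device": "--device",
}


def build_tabby_command(settings, binary="tabby"):
    """Turn a settings dict into an argument list for the Tabby binary.

    Unknown keys raise an error instead of being silently dropped, mirroring
    the fact that a setting without a CLI flag cannot be applied.
    """
    command = [binary, "serve"]  # assumption: "serve" is the subcommand in use
    for key, value in settings.items():
        if key not in SUPPORTED_FLAGS:
            raise ValueError(f"no CLI flag for setting: {key}")
        command += [SUPPORTED_FLAGS[key], str(value)]
    return command


if __name__ == "__main__":
    # Hypothetical values; replace them with your own model name.
    example = {"model": "my-model", "chat_model": "my-model", "device": "vulkan"}
    print(shlex.join(build_tabby_command(example)))
```

Failing loudly on an unknown key is deliberate: it surfaces a setting that would otherwise be lost, which is exactly what happens when this build ignores config.toml.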

📊 Monitoring Behavior (AMD APU/UMA Specifics)

Caution when using Docker monitoring tools

UMA Memory Allocation: In AMD APU setups, the model is loaded into the VRAM reserved in BIOS. This memory is managed by the Vulkan driver at the hardware level.
Dozzle / Docker Stats Limitation: These tools only track System RAM (RSS). Consequently, they may report only ~300MB of usage, failing to account for the GGUF model residing in the GPU's memory space.
Verification: This is not a bug. To see actual load, use host-level tools like amdgpu_top or radeontop.
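If amdgpu_top or radeontop is not installed, the amdgpu kernel driver also exposes VRAM counters through sysfs. The sketch below reads them from Python on the host. It assumes the iGPU is `card0` (it may be `card1` or higher on your machine), and the function names are my own, not part of any tool mentioned here.

```python
from pathlib import Path

# Assumption: the iGPU is card0; on some hosts it is card1 or higher.
DEFAULT_DEVICE_DIR = Path("/sys/class/drm/card0/device")


def read_bytes_counter(path):
    """Read a single integer (a byte count) from a sysfs-style text file."""
    return int(Path(path).read_text().strip())


def vram_usage_mib(device_dir=DEFAULT_DEVICE_DIR):
    """Return (used_mib, total_mib) as reported by the amdgpu driver."""
    device_dir = Path(device_dir)
    used = read_bytes_counter(device_dir / "mem_info_vram_used")
    total = read_bytes_counter(device_dir / "mem_info_vram_total")
    return used / 2**20, total / 2**20


if __name__ == "__main__":
    used, total = vram_usage_mib()
    print(f"VRAM used: {used:.0f} MiB of {total:.0f} MiB")
```

Run it on the host before and after the model loads. The difference should be roughly the size of the GGUF file, while Docker stats stays near ~300MB. If the number barely moves, part of the model may be sitting in GTT memory instead, which the driver reports separately in `mem_info_gtt_used`.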

🔍 What I Tried (The Journey of Hardship)

To save your time, here is what failed during the research:

Grep Analysis: Scanned the binary for embedding, config.toml, llama_cpp, and async_openai.
- Found async_openai..types..embedding, but it's only a remnant library without any functional entry point (CLI/Env).
- No strings related to config.toml or embedding_model flags were found in the Vulkan build.
Config Ingestion: Tried placing config.toml in various paths (/data, /root/.tabby, etc.), but the binary ignored them all.
Embedding Sidecar: Attempted to connect a separate embedding container, but this binary has no "client" logic to send requests to an external endpoint.
Custom Model Directory: The llama.cpp runtime inside the Tabby binary insists on reading tabby/models/${MODEL_NAME}/tabby.json and tabby/models/${MODEL_NAME}/ggml/q8_0.v2.gguf, and does not accept tabby/models/${MODEL_NAME}/model.gguf on its own. As a workaround, I renamed my model.gguf to ggml/q8_0.v2.gguf (not recommended) and wrote tabby.json by hand.
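The custom model directory workaround can be scripted. This is a rough Python sketch under stated assumptions: `install_model` and its parameters are hypothetical names, `models_root` stands for your tabby/models directory, and the contents of tabby.json are passed in by the caller because the fields this build requires are not documented here.

```python
import json
import shutil
from pathlib import Path


def install_model(models_root, model_name, gguf_source, metadata):
    """Copy a GGUF file into the layout this build looks for.

    Creates:
      <models_root>/<model_name>/tabby.json
      <models_root>/<model_name>/ggml/q8_0.v2.gguf
    """
    model_dir = Path(models_root) / model_name
    ggml_dir = model_dir / "ggml"
    ggml_dir.mkdir(parents=True, exist_ok=True)

    # The file name is fixed by the binary, whatever the real quantization is.
    target = ggml_dir / "q8_0.v2.gguf"
    shutil.copyfile(gguf_source, target)

    # metadata is supplied by the caller; its required fields are not listed here.
    (model_dir / "tabby.json").write_text(json.dumps(metadata, indent=2))
    return target
```

Copying instead of renaming keeps the original file intact, at the cost of disk space. Note that the q8_0 in the file name is misleading when the model is actually q4_k_m, which is one reason this workaround is not recommended.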

💡 Recommended Alternative

If you require RAG (Embedding) or Config File support on AMD hardware:

Option A: Use the official CPU-only Docker image. It supports all features, and modern AMD CPUs (AVX2) provide acceptable performance for 7B models.
Option B: Migrate to vLLM (ROCm-based) or llama.cpp (server), which offer superior AMD support and full feature parity. For an easier setup, consider Aphrodite or Harbor.
Option C: Switch to Claude (paid).

"Only the strong (and the extremely patient) should venture beyond this point."

PS: Thanks, Gemini, for suggesting this engine to me. I'll keep your "helpful" advice in mind: 'Tabby supports Vulkan, and you can use RAG!'

Last Updated: 2026-04-28
