Skip to content

modabada/tabby-vulkan

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Tabby with Vulkan Acceleration (Experimental & Highly Restricted)

πŸ–₯️ Environment

  • CPU: AMD R7 5800H(with AVX2 support)
  • GPU: AMD Radeon / Integrated Graphics (iGPU)
  • Memory: UMA Shared Memory (8GB allocated in BIOS)
  • Binary: tabby_x86_64-manylinux2014-vulkan (tabby version 0.10.0)
  • Model: Qwen2.5-Coder-7B-Instruct-q4_k_m.gguf
  • Host OS: Ubuntu 24.04 minimize

⚠️ CRITICAL WARNING: Read Before Implementation

The Vulkan-supported binary for Tabby (tabby_x86_64-manylinux2014-vulkan) is EXTREMELY LIMITED. This setup is only recommended if you absolutely require GPU acceleration on non-NVIDIA hardware and are willing to sacrifice almost all core features of Tabby.

🚫 Known Limitations in v0.10.0 (Vulkan Build)

  1. No config.toml Support

    • The binary does not scan or recognize external configuration files.
    • All settings must be passed strictly through CLI arguments (e.g., --model, --chat-model, --device).
    • If a feature isn't available as a CLI flag, it cannot be configured.
  2. No Embedding/RAG Functionality

    • The embedding engine and indexing logic (Tantivy/Semantic Search) are completely stripped from this binary.
    • Even if you have embedding models in your storage, the binary will ignore them.
    • You cannot use the "Repository" or "Answer with Context" features.
  3. Inference-Only Architecture

    • This build is a "bare-bones" runner designed solely for text generation (Chat/Completion).
    • It lacks the internal routing to connect to external embedding servers or sidecar containers.
  4. Monitoring Confusion (UMA/iGPU)

    • When using AMD APUs with UMA (Unified Memory Architecture), container monitoring tools (like Dozzle) may only report ~300MB of RAM usage.
    • Actual VRAM usage (e.g., 4GB+ for a 7B model) is handled directly by the Vulkan driver and is invisible to standard Docker stats. Use amdgpu_top or radeontop on the host to verify GPU load.

πŸ“Š Monitoring Behavior (AMD APU/UMA Specifics)

caution of using 'docker minitoring tool'

  • UMA Memory Allocation: In AMD APU setups, the model is loaded into the VRAM reserved in BIOS. This memory is managed by the Vulkan driver at the hardware level.
  • Dozzle / Docker Stats Limitation: These tools only track System RAM (RSS). Consequently, they may report only ~300MB of usage, failing to account for the GGUF model residing in the GPU's memory space.
  • Verification: This is not a bug. To see actual load, use host-level tools like amdgpu_top or radeontop.

πŸ” What I Tried (The Journey of Hardship)

To save your time, here is what failed during the research:

  • Grep Analysis: Scanned the binary for embedding, config.toml, llama_cpp, and async_openai.
    • Found async_openai..types..embedding, but it's only a remnant library without any functional entry point (CLI/Env).
    • No strings related to config.toml or embedding_model flags were found in the Vulkan build.
  • Config Ingestion: Tried placing config.toml in various paths (/data, /root/.tabby, etc.), but the binary ignored them all.
  • Embedding Sidecar: Attempted to connect a separate embedding container, but this binary has no "client" logic to send requests to an external endpoint.
  • Custom Model directory: Llama inside tabby binary is try to read tabby/models/${MODEL_NAME}/tabby.json and tabby/models/${MODEL_NAME}/ggml/q8_0.v2.gguf forcely, not only tabby/models/${MODEL_NAME}/model.gguf, so i changed my model model.gguf to ggml/q8_0.v2.gguf (not recommended) and write tabby.json manually

πŸ’‘ Recommended Alternative

If you require RAG (Embedding) or Config File support on AMD hardware:

  • Option A: Use the official CPU-only Docker image. It supports all features, and modern AMD CPUs (AVX2) provide acceptable performance for 7B models.
  • Option B: Migrate to vLLM (ROCm-based) or llama.cpp (server), which offer superior AMD support and full feature parity. or, easy way to setup, Aphrodite or Habor
  • Option C: Switch to claude (Paid)

"Only the strong (and the extremely patient) should venture beyond this point."
(κ°•ν•œ 자, 그리고 ꡉμž₯ν•œ 인내심을 κ°€μ§„ 자만이 이 λ„ˆλ¨Έλ₯Ό λ„μ „ν•˜μ‹­μ‹œμ˜€.)

PS: Thanks for gemini, but you suggested this engine to me. I'll keep your "helpful" suggestion in mind, 'tabby supports vulkan! and you can use RAG!'

Last Updated: 2026-04-28

About

πŸš€ Tabby server with Vulkan/AMD iGPU acceleration on Docker. Optimized for AMD Ryzen APU (5800H/UMA) with automated model/binary setup. Fixed "Read-only file system" & ICD issues.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors