The definitive Strix Halo LLM guide — 65 t/s on a $2,999 mini PC. Live benchmarks, tested optimizations, and everything that doesn't work.
vLLM + Qwen3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream, vision, tool calling, 256K context, OpenAI-compatible, Docker. Matches DGX Spark FP8+DFlash+MTP at a third of the cost. No CUDA.
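Servers like this one expose the standard OpenAI chat endpoint, so a smoke test is a few lines of Python. A minimal sketch, assuming vLLM's default port 8000 and a placeholder served-model name; adjust both to match the actual launch flags:

```python
# Minimal smoke test against an OpenAI-compatible vLLM server.
# Assumptions: default port 8000 and the served-model name below.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="Qwen3.6-27B-AWQ",  # hypothetical served-model name
    messages=[{"role": "user", "content": "One sentence on UMA memory."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```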
vLLM + Qwen3.6-27B (BF16) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). Vision input, 256K context, /v1/responses with separated reasoning, via TheRock ROCm.
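For the /v1/responses endpoint with separated reasoning, the OpenAI SDK's Responses client works against an OpenAI-compatible server. A sketch under assumed host/model values; the exact output item types depend on the server's Responses implementation:

```python
# Hitting /v1/responses; reasoning and final answer arrive as separate
# output items. Host and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.responses.create(
    model="Qwen3.6-27B",  # hypothetical served-model name
    input="Why does 256K context need so much KV-cache memory?",
)
for item in resp.output:
    print(item.type)          # e.g. "reasoning" vs "message"
print(resp.output_text)       # convenience accessor for the final text
```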
Local, ternary-weight LLM inference on AMD Strix Halo. Rust above the kernels, HIP below, zero Python at runtime. https://discord.gg/EhQgmNePg
llama.cpp + Qwen3.6-27B (Q8_0 GGUF) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). 256K context, ~7.5 t/s decode via TheRock ROCm Docker.
ComfyUI on AMD Strix Halo (RDNA 3.5 / gfx1151) via Docker. Ubuntu Rolling + UV-managed Python 3.12 + ROCm preview wheels. Solves the silent CPU fallback Debian/Python 3.13 images hit on gfx1151.
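The silent CPU fallback is easy to detect before launching ComfyUI: on a ROCm PyTorch build, torch.version.hip is set and the GPU is visible through the cuda API. A quick check:

```python
# Verify the ROCm PyTorch wheel actually sees gfx1151 instead of
# silently falling back to CPU.
import torch

print("HIP build:", torch.version.hip)            # None on a CPU/CUDA wheel
print("GPU visible:", torch.cuda.is_available())  # ROCm reuses the cuda API
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```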
Claude Code skill for AMD Strix Halo (Ryzen AI MAX+ 395) ML setup. Handles PyTorch installation (official wheels don't work with gfx1151), GTT memory config, and environment setup. Enables 30B parameter models.
Production-oriented Docker Compose stack serving openai/gpt-oss-20b via vLLM on AMD Strix Halo (gfx1151, ROCm 7.2). OpenAI Responses API, host-mounted weights, hard-capped KV cache. Verified, no source build.
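Hard-capping the KV cache starts with knowing how big it gets. A back-of-envelope sizing helper; the model dimensions below are placeholders, not gpt-oss-20b's real config, so substitute values from the model's config.json:

```python
# KV-cache sizing: 2x for K and V, per layer, per token.
# Dimensions below are placeholders for illustration.
def kv_cache_bytes(layers, kv_heads, head_dim, ctx_len, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * ctx_len * dtype_bytes

gib = kv_cache_bytes(layers=24, kv_heads=8, head_dim=64, ctx_len=131072) / 2**30
print(f"~{gib:.1f} GiB for one 128K-token sequence")
```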
Native ROCm C++ kernels for Strix Halo (gfx1151): ternary BitNet GEMV, RMSNorm, RoPE, split-KV Flash-Decoding attention. Zero hipBLAS, zero Python.
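For readers unfamiliar with ternary GEMV, a NumPy reference of the semantics the HIP kernel implements: weights constrained to {-1, 0, +1} with a per-row scale, so the matvec reduces to signed adds. Shapes and scaling granularity here are assumptions, not the repo's exact layout:

```python
# NumPy reference for a ternary BitNet-style GEMV.
import numpy as np

def ternary_gemv(w_ternary, scales, x):
    # w_ternary: (out, in) int8 in {-1, 0, +1}; scales: (out,); x: (in,)
    return scales * (w_ternary.astype(np.float32) @ x)

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8)).astype(np.int8)
s = rng.random(4).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)
print(ternary_gemv(W, s, x))
```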
Docker infrastructure for AMD Strix Halo (RDNA 3.5 / gfx1151): PyTorch + ROCm base container and a separate Ollama LLM service. Two folders, two Compose files, one Strix Halo box.
Drop-in recipe for running faster-whisper on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151) with Ubuntu 26.04 + ROCm 7.2.2 — no source build required
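Usage is the standard faster-whisper API; whether "cuda" maps onto the ROCm device depends on the CTranslate2 build the recipe ships, so treat the device and compute_type values as assumptions to verify against the repo's docs:

```python
# Standard faster-whisper transcription loop.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", beam_size=5)
print("Detected language:", info.language)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```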
Docker stack: Ollama v0.21.0 built from source against ROCm 7.2.2 with native gfx1151 (Strix Halo) — serves Gemma 4 at up to 256K context on AMD Ryzen AI MAX+ 395 / Radeon 8060S. Includes a 9-layer "make validate" ladder covering the host firmware, ROCm runtime, container, and long-context inference.
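Once the stack is up, Ollama answers on its native HTTP API at the default port 11434. A quick check; the model tag is an assumption, so use whatever "ollama list" reports:

```python
# Smoke test against Ollama's native API. Model tag is a placeholder.
import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma", "prompt": "Say hi in five words.", "stream": False},
    timeout=120,
)
print(r.json()["response"])
```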
Experimental local LLM API for AMD Strix Halo (gfx1151) on ROCm 7.10 (TheRock). Two-service split: vLLM inference engine + FastAPI gateway with OpenAI protocol normalization, auth, management. Docker Compose.
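The gateway half of that two-service split can be very small: FastAPI terminates auth, then forwards OpenAI-style requests to the vLLM backend. A minimal sketch, where the upstream URL, header check, and token are illustrative assumptions rather than the repo's actual wiring:

```python
# Sketch of an OpenAI-protocol gateway in front of vLLM.
import httpx
from fastapi import FastAPI, Header, HTTPException, Request

VLLM_UPSTREAM = "http://vllm:8000"  # hypothetical compose service name
app = FastAPI()

@app.post("/v1/chat/completions")
async def chat(request: Request, authorization: str = Header(default="")):
    if authorization != "Bearer local-dev-token":  # placeholder auth check
        raise HTTPException(status_code=401, detail="invalid token")
    body = await request.json()
    async with httpx.AsyncClient(timeout=300) as client:
        upstream = await client.post(
            f"{VLLM_UPSTREAM}/v1/chat/completions", json=body
        )
    return upstream.json()
```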
Fast, local LLM inference on AMD-powered mini PCs: 65-87 t/s for large models, with no cloud or subscription costs.
OpenAI-compatible /v1/embeddings server (BAAI/bge-m3, 1024 dims, 100+ langs) on AMD Strix Halo via ROCm. Drop-in replacement for OpenAI text-embedding-3, Docker, no API keys, ~47ms single-text latency.
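Being a drop-in replacement, it accepts the standard OpenAI embeddings call. A quick check, assuming port 8000 and the bge-m3 model id the server registers; the returned vectors should be 1024-dimensional:

```python
# Drop-in check for the /v1/embeddings endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.embeddings.create(model="BAAI/bge-m3", input=["hello", "bonjour"])
print(len(resp.data), "vectors of dim", len(resp.data[0].embedding))
```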