This project provides PowerShell scripts to automate the setup of the llama.cpp development environment on Windows. It installs the required prerequisites silently, selects an appropriate compute backend, and builds llama.cpp from source.
The repo is still Windows-only today. What changed is backend support: it is no longer implicitly NVIDIA-only.
- `auto` chooses `cuda` on NVIDIA GPUs, `vulkan` on AMD or Intel GPUs, and `cpu` if no supported GPU is detected.
- `cuda` keeps the existing NVIDIA-focused flow.
- `vulkan` is the recommended Windows path for AMD and Intel GPUs.
- `cpu` builds a plain CPU-only binary.
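The `auto` policy above amounts to a simple vendor-to-backend mapping. A minimal sketch in Python (illustrative only; the repo implements this in PowerShell, and the vendor-name substrings matched here are assumptions):

```python
def choose_backend(adapter_names):
    """Pick a llama.cpp backend from detected GPU adapter names.

    Mirrors the documented policy: NVIDIA -> cuda, AMD/Intel -> vulkan,
    anything else -> cpu. Substring matching is an illustrative assumption.
    """
    names = [n.lower() for n in adapter_names]
    if any("nvidia" in n for n in names):
        return "cuda"
    if any(("amd" in n) or ("intel" in n) for n in names):
        return "vulkan"
    return "cpu"
```

NVIDIA takes priority in mixed-GPU machines, matching the documented preference for the CUDA flow when an NVIDIA adapter is present.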
Temporary CUDA policy for this repo: CUDA 13.2 is excluded due to corrupt llama.cpp builds. The scripts currently cap automatic selection at CUDA 13.1 unless you explicitly pin another compatible version.
- Windows 10/11 x64
- PowerShell 7
- Recent GPU driver for your selected backend
- ~20 GB free disk space
- App Installer / winget available (to install dependencies)
- Administrator rights (elevated PowerShell)
The CUDA path still uses `nvml.dll` from the NVIDIA driver for SM auto-detect. If NVML isn't available, the script falls back to a WMI-based heuristic and then to `CMAKE_CUDA_ARCHITECTURES=native`.
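That fallback chain can be sketched as follows (illustrative Python; the actual script does this in PowerShell against `nvml.dll` and WMI, and the parameter names here are hypothetical):

```python
def cuda_architectures(nvml_sm=None, wmi_sm=None):
    """Resolve a CMAKE_CUDA_ARCHITECTURES value per the documented fallback order.

    nvml_sm: SM version reported by NVML (e.g. "89"), or None if NVML is unavailable.
    wmi_sm:  SM guessed from the WMI adapter name, or None if the heuristic fails.
    """
    if nvml_sm is not None:   # preferred: exact SM from the driver's NVML
        return nvml_sm
    if wmi_sm is not None:    # fallback: heuristic based on the adapter name
        return wmi_sm
    return "native"           # last resort: let nvcc probe the local GPU
```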
- Admin check (must be elevated).
- Installs prerequisites if missing:
  - Git
  - CMake
  - Visual Studio 2022 Build Tools with the C++ toolchain and Windows SDK
  - Ninja (and a portable fallback if needed)
- Chooses a backend:
  - `auto` picks `cuda`, `vulkan`, or `cpu` from the detected adapter vendor.
  - `cuda` detects your GPU's SM via NVML and installs or selects a compatible CUDA toolkit.
  - `vulkan` installs or uses the Vulkan SDK.
  - `cpu` skips GPU SDK installation entirely.
- Clones and builds `llama.cpp` under `vendor\llama.cpp`.
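The build step maps the chosen backend onto llama.cpp's CMake options. A sketch of that mapping (illustrative Python; `GGML_CUDA` and `GGML_VULKAN` are real llama.cpp CMake options, but the exact flag set these scripts pass may differ):

```python
def cmake_defines(backend):
    """Map a backend choice to llama.cpp CMake defines (sketch)."""
    base = ["-DCMAKE_BUILD_TYPE=Release"]
    if backend == "cuda":
        return base + ["-DGGML_CUDA=ON"]
    if backend == "vulkan":
        return base + ["-DGGML_VULKAN=ON"]
    if backend == "cpu":
        return base  # a plain CPU build needs no extra backend flag
    raise ValueError(f"unknown backend: {backend}")
```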
Run from an elevated PowerShell prompt:
```powershell
# Allow script execution for this session
Set-ExecutionPolicy Bypass -Scope Process

# Run the installer (auto-selects backend from the detected GPU)
./install_llama_cpp.ps1
```

Explicit backend controls:
```powershell
# AMD / Intel on Windows
./install_llama_cpp.ps1 -Backend vulkan

# Force a CPU-only build
./install_llama_cpp.ps1 -Backend cpu

# Force the NVIDIA path
./install_llama_cpp.ps1 -Backend cuda
```

Explicit CUDA version controls:
```powershell
# Keep the repo on CUDA 13.1 for now
./install_llama_cpp.ps1 -PinnedCudaVersion 13.1

# Or force CUDA 13.0 specifically
./install_llama_cpp.ps1 -PinnedCudaVersion 13.0

# Or allow any compatible version up to 13.0
./install_llama_cpp.ps1 -MaxCudaVersion 13.0
```

Optional: skip the build step (installs prerequisites + CUDA only):

```powershell
./install_llama_cpp.ps1 -SkipBuild
```

The built binaries will be in:

```
vendor\llama.cpp\build\bin
```
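The `-PinnedCudaVersion` / `-MaxCudaVersion` policy described above amounts to a small filter over candidate toolkit versions. A sketch under the repo's current 13.2 exclusion (illustrative Python, not the script's actual PowerShell):

```python
EXCLUDED = {(13, 2)}  # excluded due to corrupt llama.cpp builds (current repo policy)

def pick_cuda(installed, pinned=None, max_version=(13, 1)):
    """Select a CUDA toolkit from installed versions, as (major, minor) tuples.

    pinned: exact version to use if given; otherwise the highest version
    at or below max_version wins. Returns None when nothing acceptable exists.
    """
    if pinned is not None:
        return pinned if pinned in installed and pinned not in EXCLUDED else None
    ok = [v for v in installed if v <= max_version and v not in EXCLUDED]
    return max(ok) if ok else None
```

This is why side-by-side CUDA installs are harmless here: even with 13.2 present, selection lands on the pinned or capped toolkit.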
To verify which runtime devices the built binary can see:
```powershell
.\vendor\llama.cpp\build\bin\llama-server.exe --list-devices
```

On an AMD or Intel Vulkan build, you should see a Vulkan device in that list. On a CPU-only build, use `--device none` at runtime to force CPU execution.
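If you want to script that verification, a minimal check could scan the listing for a backend name (illustrative Python; this assumes each device line mentions its backend, e.g. a line starting with `Vulkan0:` — the exact `llama-server` output format is not guaranteed here):

```python
def has_backend_device(list_devices_output, backend):
    """Return True if any line of --list-devices output mentions the backend.

    Assumption: device lines name their backend (e.g. "Vulkan0: ...").
    """
    return any(backend.lower() in line.lower()
               for line in list_devices_output.splitlines())
```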
If you already have llama.cpp installed and just want to rebuild against a safe toolkit:
```powershell
./rebuild_llama_cpp.ps1

# Rebuild for an AMD or Intel machine
./rebuild_llama_cpp.ps1 -Backend vulkan

# Or pin the rebuild to a specific installed toolkit
./rebuild_llama_cpp.ps1 -Backend cuda -PinnedCudaVersion 13.1
./rebuild_llama_cpp.ps1 -Backend cuda -PinnedCudaVersion 13.0
```

Run from an elevated PowerShell prompt:
```powershell
Set-ExecutionPolicy Bypass -Scope Process
./uninstall_llama_cpp.ps1
```

This removes the winget-installed prerequisites and the `vendor` directory (and portable Ninja if it was created).
The `run_llama_cpp_server.ps1` script starts `llama-server.exe` in router mode with a single default model configured in the script.

- Downloads the model: automatically downloads the configured GGUF file into the `models` directory if it is not already present.
- Starts the server: launches `llama-server.exe` in router mode and lets `llama.cpp` auto-fit GPU layers and context.
- Opens Web UI: after starting the server, automatically opens `http://localhost:8080` in your default web browser.
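The download step is idempotent: the file is only fetched when it is missing. A sketch of that check (illustrative Python; `ensure_model` and the `download` callback are hypothetical names, not part of the script):

```python
from pathlib import Path

def ensure_model(models_dir, filename, download):
    """Download a GGUF file into models_dir only if it is not already present.

    `download` is a caller-supplied callback (hypothetical) that writes the
    file to the given path; it is skipped entirely when the file exists.
    """
    target = Path(models_dir) / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)
    return target
```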
The current default model is:
`gemma-4-e4b-it-Q4_K_M`
To run the server, use the following command in PowerShell:
```powershell
./run_llama_cpp_server.ps1
```

- winget not found: Install “App Installer” from the Microsoft Store, then re-run.
- Pending reboot: Some installs require a reboot (Windows Update/VS Installer). Reboot and re-run.
- CUDA side-by-side: Multiple CUDA toolkits can co-exist. You do not need to uninstall CUDA 13.2 if CUDA 13.1 or 13.0 is also installed; the scripts will select the pinned/capped version.
- AMD / Intel on Windows: Prefer `-Backend vulkan`. ROCm/HIP exists in `llama.cpp`, but it is not the primary path this repo automates on Windows.
- Vulkan SDK missing: The script installs the Vulkan SDK when `-Backend vulkan` is selected. If detection still fails, reinstall the LunarG Vulkan SDK and re-run.
- NVML missing: The script falls back to a heuristic and then `CMAKE_CUDA_ARCHITECTURES=native`.
- Locked files: Stop `llama-server`/`llama-cli` before uninstalling.
This project is licensed under the MIT License. See the LICENSE file for details.
This project is a simplified version of the local-qwen3-coder-env repository, focusing solely on the installation of llama.cpp.