Prerequisites
Feature Description
The GEMMA4 architecture implementation in llama.cpp (specifically the C++ model loader) does not yet map GGML_TYPE_NVFP4 weights to the architecture's internal tensor slots. As a result, even a valid NVFP4 GGUF fails to load with a tensor count mismatch: the loader only recognizes the FP32/BF16 tensors, so the number of tensors it maps does not match the number stored in the file.
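The failure mode can be illustrated with a toy model. This is not llama.cpp's loader code: the tensor names, the "NVFP4"/"F8_E4M3" type tags and the helper functions are all hypothetical. It only shows why a mapping that accepts FP32/BF16 alone ends up with fewer mapped tensors than the file contains.

```python
# Toy model of the failure described above; this is NOT llama.cpp's loader code.
# Tensor names and type tags are hypothetical, for illustration only.

# Types the (toy) GEMMA4 mapping currently accepts.
BASE_TYPES = {"F32", "BF16"}
# Hypothetical extra tags: NVFP4 weights plus their FP8 (E4M3) block-scale tensors.
NVFP4_TYPES = {"NVFP4", "F8_E4M3"}

# A hypothetical NVFP4 GGUF: norms/embeddings stay in F32/BF16, matmul weights
# are NVFP4 and each one is paired with a scale tensor.
TENSORS = [
    ("token_embd.weight", "BF16"),
    ("blk.0.attn_q.weight", "NVFP4"),
    ("blk.0.attn_q.scale", "F8_E4M3"),
    ("blk.0.ffn_up.weight", "NVFP4"),
    ("blk.0.ffn_up.scale", "F8_E4M3"),
    ("output_norm.weight", "F32"),
]


def count_mapped(tensors, accepted_types):
    """Count the tensors whose type the architecture mapping accepts."""
    return sum(1 for _name, ttype in tensors if ttype in accepted_types)


def load(tensors, accepted_types):
    """Mimic a loader that requires every tensor in the file to be mapped to a slot."""
    got = count_mapped(tensors, accepted_types)
    if got != len(tensors):
        raise ValueError(f"tensor count mismatch: file has {len(tensors)}, mapped {got}")
    return got


if __name__ == "__main__":
    try:
        load(TENSORS, BASE_TYPES)
    except ValueError as err:
        print(err)  # tensor count mismatch: file has 6, mapped 2
    print(load(TENSORS, BASE_TYPES | NVFP4_TYPES))  # 6
```

In this model, accepting the NVFP4 weights alone would still leave the scale tensors unmapped, which is why the request covers both.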
Motivation
NVFP4 is the 4-bit floating-point format natively supported by NVIDIA Blackwell GPUs. Supporting this mapping in the GEMMA4 architecture is needed to take advantage of that hardware acceleration on RTX 50-series and B200 GPUs.
Possible Implementation
Update the architectural registry for GEMMA4 to accept GGML_TYPE_NVFP4 for weights and their associated scale tensors.
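For context on why scale tensors travel with the weights: NVFP4 stores FP4 (E2M1) values in 16-element micro-blocks, each with one FP8 (E4M3) scale, plus a single FP32 per-tensor scale, so a value is reconstructed as `code * block_scale * tensor_scale`. The sketch below assumes a hypothetical layout (two codes per byte, low nibble first, block scales in a separate byte tensor). It is not ggml's actual NVFP4 layout or kernel, and the function names are illustrative only.

```python
# Sketch of NVFP4 block dequantization. The byte layout (two E2M1 codes per byte,
# low nibble first; one E4M3 scale byte per block) is an ASSUMPTION, not ggml's format.

BLOCK = 16  # elements per micro-block, each block sharing one FP8 (E4M3) scale

# Magnitudes representable by FP4 E2M1 for the 3 non-sign bits (codes 0..7).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def e2m1_to_float(code):
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0-2 index the magnitude table."""
    mag = E2M1[code & 0x7]
    return -mag if code & 0x8 else mag


def e4m3_to_float(byte):
    """Decode an FP8 E4M3 byte (exponent bias 7, no infinities, 0x7F/0xFF are NaN)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")
    if exp == 0:  # subnormal range
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def dequant_nvfp4(packed, block_scales, tensor_scale):
    """Expand packed NVFP4 codes to floats.

    packed: bytes holding two E2M1 codes each (low nibble first, assumed).
    block_scales: one E4M3 byte per BLOCK elements (the associated scale tensor).
    tensor_scale: the FP32 per-tensor scale.
    """
    out = []
    for byte in packed:
        for code in (byte & 0xF, byte >> 4):
            scale = e4m3_to_float(block_scales[len(out) // BLOCK]) * tensor_scale
            out.append(e2m1_to_float(code) * scale)
    return out


if __name__ == "__main__":
    # 16 codes alternating 0.5 and 1.0; block scale 2.0 (0x40), tensor scale 0.5.
    print(dequant_nvfp4(bytes([0x21] * 8), bytes([0x40]), 0.5)[:4])  # [0.5, 1.0, 0.5, 1.0]
```

Without the block-scale tensor the 4-bit codes alone cannot be turned back into weights, which is why the loader has to map the scale tensors alongside the NVFP4 weights rather than treat them as unknown extras.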