
Feature Request: Add NVFP4 tensor mapping for GEMMA4 architecture #21777

@rnett

Description

Prerequisites

  • I am running the latest code.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue.
  • I reviewed the Discussions.

Feature Description

The GEMMA4 architecture implementation in llama.cpp (specifically the C++ loader) does not yet map GGML_TYPE_NVFP4 weights to internal architectural slots. Even with a valid NVFP4 GGUF, the loader fails with a tensor count mismatch because it only recognizes the FP32/BF16 tensors.

Motivation

NVFP4 is the native 4-bit format for NVIDIA Blackwell GPUs. Supporting this mapping in the GEMMA4 architecture is essential for leveraging hardware acceleration on RTX 50-series and B200 hardware.

Possible Implementation

Update the architectural registry for GEMMA4 to accept GGML_TYPE_NVFP4 for weights and their associated scale tensors.
