Misc. bug: convert_hf_to_gguf.py produces tensor names exceeding 64-char limit for multimodal/NVFP4 models #21776

@rnett

Description

Name and Version

version: 8757 (a29e4c0)
built with Clang 19.1.5 for Windows x86_64

Operating systems

Windows, Linux, Mac

Which llama.cpp modules do you know to be affected?

Python/Bash scripts, libllama (core library)

Command line

python convert_hf_to_gguf.py path/to/gemma-4-multimodal --outfile gemma4-unified.gguf --verbose

Problem description & steps to reproduce

When converting multimodal models with deep internal paths (like Gemma 4), the generated GGUF tensor names can exceed the 64-character limit (GGML_MAX_NAME). This is aggravated by NVFP4 quantization, which appends a .weight.scale or .weight.input_scale suffix.

The convert_hf_to_gguf.py script preserves the deep Hugging Face paths for multimodal towers instead of mapping them to concise GGUF names, so the resulting file fails to load in llama.cpp.

Example failing tensor name (74 chars):
model.audio_tower.subsample_conv_projection.input_proj_linear.weight.scale
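To illustrate the arithmetic, here is a minimal sketch (not part of the conversion script) that checks the failing tensor name from this report against the 64-character GGML_MAX_NAME limit; the base path is already 61 characters, so even the plain .weight suffix overflows, matching the "68 >= 64" error in the log below:

```python
# GGML_MAX_NAME is 64 in ggml.h; names at or above this length are rejected.
GGML_MAX_NAME = 64

# The failing tensor path from this report, without its suffix.
base = "model.audio_tower.subsample_conv_projection.input_proj_linear"

# Lengths with the suffixes that convert_hf_to_gguf.py / NVFP4 append.
for suffix in (".weight", ".weight.scale", ".weight.input_scale"):
    name = base + suffix
    status = "ok" if len(name) < GGML_MAX_NAME else "too long"
    print(f"{len(name):3d}  {status}  {name}")
```

All three variants exceed the limit (68, 74, and 80 characters), which is why the fix has to shorten the mapped base name rather than just the quantization suffixes.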

Relevant log output

gguf_init_from_file_ptr: tensor name 266 is too long: 68 >= 64
llama_model_load: error loading model: llama_model_loader: failed to load model
