Name and Version
version: 8757 (a29e4c0)
built with Clang 19.1.5 for Windows x86_64
Operating systems
Windows, Linux, Mac
Which llama.cpp modules do you know to be affected?
Python/Bash scripts, libllama (core library)
Command line
python convert_hf_to_gguf.py path/to/gemma-4-multimodal --outfile gemma4-unified.gguf --verbose
Problem description & steps to reproduce
When converting multimodal models with deep internal module paths (such as Gemma 4), the generated GGUF tensor names can exceed the 64-character limit (GGML_MAX_NAME). This is aggravated by NVFP4 quantization, which appends .weight.scale or .weight.input_scale suffixes.
The convert_hf_to_gguf.py script preserves the deep Hugging Face paths for the multimodal towers instead of mapping them to concise GGUF names.
Example failing tensor name:
model.audio_tower.subsample_conv_projection.input_proj_linear.weight.scale (74 chars)
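The overflow can be reproduced with a quick check (a minimal sketch; GGML_MAX_NAME is hard-coded here for illustration rather than read from ggml headers):

```python
# Minimal check: GGUF tensor names must fit within GGML_MAX_NAME bytes.
GGML_MAX_NAME = 64  # value used by ggml; hard-coded here for illustration

name = "model.audio_tower.subsample_conv_projection.input_proj_linear.weight.scale"
print(len(name))                    # 74
print(len(name) >= GGML_MAX_NAME)   # True -> the loader rejects this tensor
```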
Relevant log output
gguf_init_from_file_ptr: tensor name 266 is too long: 68 >= 64
llama_model_load: error loading model: llama_model_loader: failed to load model
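One possible direction for a fix, sketched below, is to map the deep multimodal prefixes to short aliases before writing the GGUF. This is only an illustration: the alias table and the shorten() helper are hypothetical and not part of convert_hf_to_gguf.py.

```python
# Hypothetical sketch: shorten deep HF multimodal prefixes before writing GGUF.
# The alias table is invented for illustration, not an existing mapping.
PREFIX_ALIASES = {
    "model.audio_tower.subsample_conv_projection.": "a.ssconv.",
    "model.vision_tower.": "v.",
}

def shorten(name: str) -> str:
    """Replace a known long prefix with its short alias, if any."""
    for long_prefix, alias in PREFIX_ALIASES.items():
        if name.startswith(long_prefix):
            return alias + name[len(long_prefix):]
    return name

short = shorten("model.audio_tower.subsample_conv_projection.input_proj_linear.weight.scale")
print(short, len(short))  # 39 chars, well under the 64-byte limit
```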