[Mirror] New quantization type: Q3_HIFI by ngxson · Pull Request #65 · ngxson/llama.cpp

ngxson · 2025-12-22T23:38:16Z

Mirror from upstream PR: ggml-org#18246

Summary by CodeRabbit

New Features
- Adds a HIFI quantization family (Q3/Q5/Q6 HIFI variants) and new conversion/output option for HIFI formats.
Runtime / Backend
- Broad HIFI runtime support across CPU, CUDA, Metal, Vulkan, SYCL and GPU shader/kernels; file-format mappings updated.
Documentation
- Adds IMatrix guide, Q3_HIFI cross-model analysis, Q4_K_HIFI roadmap; removes CONTRIBUTING.md and AGENTS.md.
Testing & Tools
- New benchmarks, perplexity tests, dataset download/creation utilities, quantization and benchmarking scripts.
Chores
- Expanded .gitignore to exclude additional dataset and model artifacts.

- Refactor `.editorconfig` to apply settings for public tools and remove deprecated entries.
- Add `.gitattributes` to treat the WebUI build as binary for diff purposes.
- Update `.gitignore` to exclude gzipped index.html files.
- Introduce `AGENTS.md` and `CONTRIBUTING.md` to clarify AI usage policies and contribution guidelines.
- Modify `CMakeLists.txt` to include an option for building the embedded Web UI.
- Update `README.md` with recent API changes and hot topics.
- Adjust Dockerfiles and other configuration files for compatibility and dependency updates.
- Add new model support in `convert_hf_to_gguf_update.py` and related scripts.

These changes enhance project maintainability and clarify contribution expectations.

- Introduce fallback defaults for layer normalization and rope frequency base if not specified in hparams.
- Remove unnecessary addition of BOS token in Gemma4Model.
- Ensure compatibility with updated model parameter requirements.

These changes improve the robustness of model parameter handling and align with the latest specifications.
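The fallback-default idea in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's code: the helper `resolve_hparams`, the key names `rms_norm_eps` and `rope_theta`, and the default values are all assumptions chosen for the example, since the commit message does not show the actual keys or values.

```python
# Placeholder defaults (assumed for illustration; the values the PR
# actually uses are not shown in the commit message).
ASSUMED_DEFAULTS = {"rms_norm_eps": 1e-6, "rope_theta": 10000.0}


def resolve_hparams(hparams, defaults=ASSUMED_DEFAULTS):
    """Return a copy of the defaults overridden by every hparam that is set.

    Keys that are missing, or present but None, keep their default value.
    """
    resolved = dict(defaults)
    resolved.update({k: v for k, v in hparams.items() if v is not None})
    return resolved


# A config that sets the rope base but leaves the norm epsilon unspecified.
partial = {"rope_theta": 1000000.0, "rms_norm_eps": None}
resolved = resolve_hparams(partial)
```

Here `resolved` keeps the explicit `rope_theta` and falls back to the placeholder epsilon, so later code can read both keys without checking for their presence.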

The previous refactoring removed explicit parameter writes from Gemma3Model.set_gguf_parameters() and Gemma4Model.set_vocab(), assuming TextModel would handle them.

This broke Gemma4 inference because the GGUF file's key ordering matters: gguf_find_key() returns the *first* occurrence of a key, so TextModel writing head_count_kv as a scalar first caused all 60 layers to use n_head_kv=1 instead of the correct per-layer array (1 for SWA, 4 for global). This produced all-zero logits and PPL == n_vocab == 262144 for every imatrix chunk.

Restore Gemma3Model to explicitly write all parameters (matching upstream exactly), and restore add_add_bos_token(True) in Gemma4Model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
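The two effects described in this commit message can be reproduced in a small, self-contained sketch. It does not use the real GGUF reader: `kv_pairs` is a hypothetical stand-in for the file's ordered key/value section, `find_key` only mimics the first-match behaviour attributed to gguf_find_key() above, the key name is shortened, and the per-layer pattern is made up for illustration. The second function shows why all-zero logits give a perplexity equal to the vocabulary size: softmax becomes uniform, so every token has probability 1/n_vocab.

```python
import math

# Hypothetical stand-in for a GGUF key/value section in which the same key
# was written twice. The per-layer pattern below is illustrative only.
kv_pairs = [
    ("head_count_kv", 1),                   # scalar, written first
    ("head_count_kv", [1, 1, 4, 1, 1, 4]),  # per-layer array, written second
]


def find_key(pairs, key):
    """Return the index of the first pair whose key matches, or -1."""
    for i, (k, _) in enumerate(pairs):
        if k == key:
            return i
    return -1


def perplexity_of_zero_logits(n_vocab):
    """Perplexity when every logit is zero, i.e. softmax is uniform."""
    logits = [0.0] * n_vocab
    log_z = math.log(sum(math.exp(x) for x in logits))
    nll = -(logits[0] - log_z)  # identical for any target token
    return math.exp(nll)


idx = find_key(kv_pairs, "head_count_kv")
first_value = kv_pairs[idx][1]           # the scalar shadows the array
ppl = perplexity_of_zero_logits(262144)  # approximately 262144
```

Because the lookup stops at the first match, the later per-layer array is never seen by the reader, which is consistent with the commit's fix of having the model class write the key explicitly rather than relying on write order.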

GeoffApples and others added 30 commits December 13, 2025 21:26

All 3 metrics now exceed Q3_K_M

c5bf27f

Documentation updated

1cf26dc

Merge pull request #4 from GeoffApples/Q3_HIFI_1.7B_fast

9b58d82

All 3 metrics beat Q3_K_M

Q3_HIFI_A now the official version

0baa2c8

Merge pull request #5 from GeoffApples/Q3_HIFI_1.7B_fast

bc8ba8a

Q3_HIFI_A now the official version

Speed benchmark script added

2d4d0b3

Merge pull request #6 from GeoffApples/Q3_HIFI_1.7B_fast

a177f2c

Speed benchmark script added

Merge pull request #7 from ggml-org/master

bc3c5cf

Latest updates

Merge branch 'Q3_HIFI' into master

0e6f3aa

Merge pull request #8 from geoffmunn/master

9971857

Latest updates

Old files removed

42b6477

Cross-model documentation added

5792ab4

Validation errors fixed

8b72146

Whitespace fixed

daf0e20

Whitespace fixes

bf0d021

Whitespace fixes

f79424e

Whitespace fixes

abcb4cc

Whitespace changes

7724f7b

Whitespace fixes

a6bb077

Whitespace fixes

9bae334

Whitespace fixes

dce3e67

Whitespace fixes

3e3f931

Whitespace fixes

972d662

Whitespace fixes

20390e2

print statements changed to logging()

4851a00

Extra blank line removed

9be1c3d

Merge pull request #9 from geoffmunn/Q3_HIFI

c42d48f

Q3_HIFI model ready to go

Documentation moved

dbf9a9a

GGML_TYPE_Q3_HIFI now value 12

2c4049e

GGML_TYPE_Q3_HIFI moved to end, numbers re-ordered

e4fd98f

geoffmunn and others added 30 commits March 1, 2026 21:57

NaN errors fixed

3252ed2

Add support for Q5_K_HIFI_RES8 layout in mmq_get_q8_1_ds_layout function

c4861e9

Merge pull request #38 from geoffmunn/Q2_K_HIFI_v2

1a6936d

Q2_K_HIFI updates

Merge pull request #39 from geoffmunn/Q5_K_HIFI

d74f464

Q5_K_HIFI speed improvements

Whitespace fixed

48a3b7f

Whitespace fixed

a1a2687

Whitespace fixes for linter

74d62d2

Update mul_mat_vec_q_switch_type to include ids_stride parameter in Q2_K_HIFI case

184cacf

Fix whitespace issues in HIFI_BUILD_GUIDE.md and quantize.cpp

1de093b

Phase 1 of the TURBO plan completed

1ac3434

Phase 2 and 5 complete

4133d36

Phase 4 implemented

6d66b07

Phase 5 final bits

f6d04a9

TURBO redesign

6526b9c

CUDA TURBO build errors fixed

a924783

Add support for additional TURBO quantization types in CUDA backend

d2a256e

Refine TURBO quantization type mappings in CUDA backend for improved layout handling

356f7a8

CUDA performance improvements

57baca6

Update stride calculation in CUDA matrix multiplication for TURBO optimization

49b166d

Fix kernel mapping for Q3_K TURBO implementation in Metal backend and add whitespace for clarity in CPU quantization functions

51561e4

TURBO renamed to LITE

e6656fb

Merge pull request #41 from geoffmunn/TURBO

5c24e4c

LITE variant

Merge remote-tracking branch 'upstream/master'

c1db190

Refactor tensor type handling in llama_tensor_get_type_impl to simplify output weight matching and streamline default type returns for various quantization formats.

bb131cf

Merge branch 'ggml-org:master' into master

3e7b4cf

Replace 'bc' with 'awk' for arithmetic operations

70965fa

Files deleted

0deb61f

ngxson wants to merge 295 commits into ngxson:master from geoffmunn:master
