
Add EXAONE 4.5 implementations #21733

Open
nuxlear wants to merge 5 commits into ggml-org:master from nuxlear:add-exaone4_5

Conversation


nuxlear (Contributor) commented Apr 10, 2026

Overview

Add support for the EXAONE 4.5 architecture, used by the EXAONE 4.5 model released by LG AI Research.

Additional information

This PR adds the modeling code for EXAONE 4.5, which uses the same LLM architecture as EXAONE 4.
It also adds n_kv_heads to the CLIP model to make the ViT compatible with the GQA structure.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure:
    YES. The modeling code was implemented with the help of an AI assistant.

  • EXAONE 4.5 uses <vision> and </vision> for image boundaries; Qwen keeps <|vision_start|> and <|vision_end|>.
  • Route EXAONE 4.5 through the Qwen2.5-VL-style encode path (window attention pattern, optional mmproj input norm), and update the exaone4_5 projector weights and convert_hf_to_gguf for mmproj export.
  • Align EXAONE4 tensor registration with EXAONE_MOE for NextN/MTP slots, and avoid skip-flag propagation on duplicated rope_freqs so model loading succeeds for EXAONE 4.5 GGUF.
@nuxlear nuxlear requested review from a team and CISC as code owners April 10, 2026 16:02

ngxson (Contributor) commented Apr 10, 2026

please rebase before requesting a review

@ngxson ngxson marked this pull request as draft April 10, 2026 16:06
@github-actions github-actions bot added model Model specific examples python python script changes labels Apr 10, 2026
@nuxlear nuxlear marked this pull request as ready for review April 10, 2026 20:00
Comment on lines +122 to +123
Kcur = clip_repeat_kv_heads(ctx0, Kcur, d_head, n_kv_head, n_head, n_patches);
Vcur = clip_repeat_kv_heads(ctx0, Vcur, d_head, n_kv_head, n_head, n_patches);

this should be redundant; ggml supports broadcasting automatically
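The broadcasting point can be illustrated with a small pure-Python sketch (hypothetical shapes and values, not the real clip.cpp tensors): explicitly repeating each KV head for its group of query heads produces exactly the same attention scores as indexing the smaller K directly, so the repeat step materializes copies without changing the result.

```python
# GQA: 4 query heads share 2 KV heads (hypothetical sizes for illustration).
n_head, n_kv_head = 4, 2
group = n_head // n_kv_head  # 2 query heads per KV head

q = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # one query vector per head
k = [[1.0, 0.0], [0.0, 1.0]]                          # one key vector per KV head

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Explicit repeat (what a clip_repeat_kv_heads-style helper does):
# materialize a K entry for every query head.
k_repeated = [k[h // group] for h in range(n_head)]
scores_repeat = [dot(q[h], k_repeated[h]) for h in range(n_head)]

# Implicit broadcast: index the smaller K directly, no copies made.
scores_broadcast = [dot(q[h], k[h // group]) for h in range(n_head)]

assert scores_repeat == scores_broadcast  # identical scores either way
```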

Comment on lines +177 to +180
window_idx = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_pos / 4);
ggml_set_name(window_idx, "window_idx");
ggml_set_input(window_idx);


move this to the top; any inputs should be defined at the top of the cgraph

hparams.n_merge = 2;
get_u32(KEY_SPATIAL_MERGE_SIZE, hparams.n_merge, false);
get_u32(KEY_WIN_ATTN_PATTERN, hparams.n_wa_pattern, false);
hparams.set_limit_image_tokens(8, 4096);

set_limit_image_tokens is only used by older models; if you already know the min/max pixels supported, write them to GGUF metadata instead

Comment on lines +1473 to +1476
get_u32(KEY_N_HEAD_KV, hparams.n_kv_head, false);
if (hparams.n_kv_head <= 0) {
    hparams.n_kv_head = 8;
}

I see no reason for this being an optional param; the GGUF must have it, otherwise it's a faulty file

Suggested change
- get_u32(KEY_N_HEAD_KV, hparams.n_kv_head, false);
- if (hparams.n_kv_head <= 0) {
-     hparams.n_kv_head = 8;
- }
+ get_u32(KEY_N_HEAD_KV, hparams.n_kv_head);


please specify exactly how this differs from qwen2, so that we can merge the two models into one file in the future

Comment on lines +10481 to +10485
remapper = {
    "mtp.fc": "model.layers.{bid}.eh_proj",
    "mtp.pre_fc_norm_embedding": "model.layers.{bid}.enorm",
    "mtp.pre_fc_norm_hidden": "model.layers.{bid}.hnorm",
    "mtp.norm": "model.layers.{bid}.shared_head.norm",

please use proper tensor_mapping

Comment on lines +10530 to +10551
if name.startswith("visual.") and ".qkv." in name:
    assert self.hparams_vision is not None
    hv = self.hparams_vision
    n_heads = hv["num_heads"]
    n_kv = int(hv.get("num_key_value_heads", n_heads))
    hidden = hv["hidden_size"]
    head_dim = hidden // n_heads
    q_dim = n_heads * head_dim
    kv_dim = n_kv * head_dim
    total_out = q_dim + 2 * kv_dim
    out_dim = data_torch.shape[0]
    if out_dim != total_out:
        raise ValueError(f"EXAONE 4.5 vision qkv out dim mismatch: got {out_dim}, expected {total_out} ({name})")
    wq = data_torch[:q_dim]
    wk = data_torch[q_dim : q_dim + kv_dim]
    wv = data_torch[q_dim + kv_dim :]
    nq = name.replace("qkv", "q", 1)
    nk = name.replace("qkv", "k", 1)
    nv = name.replace("qkv", "v", 1)
    yield from ModelBase.modify_tensors(self, wq, nq, bid)
    yield from ModelBase.modify_tensors(self, wk, nk, bid)
    yield from ModelBase.modify_tensors(self, wv, nv, bid)

we should not split qkv; instead, use ggml_view to split the result (not the weight)

see build_vit in clip.cpp for an example
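The equivalence behind this suggestion can be sketched in numpy (hypothetical dimensions, not EXAONE 4.5's real ones): projecting once with the fused QKV weight and then taking views of the result gives the same Q, K, and V as splitting the weight into three matrices first. The ggml analogue would be one mul_mat followed by ggml_view-style offsets into the output, as the build_vit example referenced above does.

```python
import numpy as np

# Hypothetical sizes for illustration only.
n_heads, n_kv, head_dim, n_tokens = 4, 2, 8, 3
hidden = n_heads * head_dim
q_dim, kv_dim = n_heads * head_dim, n_kv * head_dim

rng = np.random.default_rng(0)
w_qkv = rng.standard_normal((q_dim + 2 * kv_dim, hidden))  # fused weight, as stored
x = rng.standard_normal((n_tokens, hidden))                # token activations

# Approach in the PR: split the weight, then do three matmuls.
wq, wk, wv = np.split(w_qkv, [q_dim, q_dim + kv_dim], axis=0)
q_a, k_a, v_a = x @ wq.T, x @ wk.T, x @ wv.T

# Suggested approach: one fused matmul, then views of the result.
qkv = x @ w_qkv.T
q_b = qkv[:, :q_dim]
k_b = qkv[:, q_dim : q_dim + kv_dim]
v_b = qkv[:, q_dim + kv_dim :]

assert np.allclose(q_a, q_b) and np.allclose(k_a, k_b) and np.allclose(v_a, v_b)
```

The views are zero-copy slices of the fused projection, so the conversion script can keep the checkpoint's fused tensor intact and leave the split to the compute graph.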
