EXAONE 4.5 uses <vision> and </vision> for image boundaries; Qwen keeps <|vision_start|> and <|vision_end|>. Route EXAONE 4.5 through the Qwen2.5-VL-style encode path (window attention pattern, optional mmproj input norm). Update exaone4_5 projector weights and convert_hf_to_gguf for mmproj export.
Align EXAONE4 tensor registration with EXAONE_MOE for NextN/MTP slots and avoid skip-flag propagation on duplicated rope_freqs so model loading succeeds for EXAONE 4.5 GGUF.
Please rebase before requesting a review.
```cpp
Kcur = clip_repeat_kv_heads(ctx0, Kcur, d_head, n_kv_head, n_head, n_patches);
Vcur = clip_repeat_kv_heads(ctx0, Vcur, d_head, n_kv_head, n_head, n_patches);
```
This should be redundant; ggml supports broadcasting automatically.
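As a pure-Python analogy of the broadcasting the reviewer refers to (this is not ggml code, and the group mapping is the standard GQA convention, assumed here): explicitly repeating each KV head once per query head produces exactly the head pairing that implicit broadcasting would give when K/V simply keep `n_kv_head` heads.

```python
def repeat_kv_heads(kv, n_head):
    """Explicit path: duplicate each KV head so there is one per query head."""
    group = n_head // len(kv)
    return [kv[h // group] for h in range(n_head)]

def broadcast_lookup(kv, h, n_head):
    """Implicit path: what broadcasting does, mapping query head h to its KV head."""
    group = n_head // len(kv)
    return kv[h // group]

kv = ["kv0", "kv1"]  # n_kv_head = 2
repeated = repeat_kv_heads(kv, n_head=8)
# Both paths pair every query head with the same KV head.
assert all(repeated[h] == broadcast_lookup(kv, h, 8) for h in range(8))
```

If the pairing is identical, the explicit `clip_repeat_kv_heads` calls only add extra tensor copies.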
```cpp
window_idx = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_pos / 4);
ggml_set_name(window_idx, "window_idx");
ggml_set_input(window_idx);
```
Move this to the top. Any inputs should be defined at the top of the cgraph.
```cpp
hparams.n_merge = 2;
get_u32(KEY_SPATIAL_MERGE_SIZE, hparams.n_merge, false);
get_u32(KEY_WIN_ATTN_PATTERN, hparams.n_wa_pattern, false);
hparams.set_limit_image_tokens(8, 4096);
```
set_limit_image_tokens is only used by old models. If you already know the min/max pixels supported, write them to GGUF metadata instead.
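A minimal sketch of the arithmetic behind converting a token budget into the pixel budget one would store in metadata. Assumptions not taken from this PR: a ViT patch size of 14 (as in Qwen2.5-VL); the spatial merge size of 2 does come from the snippet above, so one output token covers 14 * 14 * 2 * 2 = 784 pixels.

```python
PATCH_SIZE = 14  # assumed, Qwen2.5-VL-style ViT
MERGE = 2        # hparams.n_merge from the snippet above
PIXELS_PER_TOKEN = PATCH_SIZE * PATCH_SIZE * MERGE * MERGE  # 784

def token_limits_to_pixel_limits(min_tokens: int, max_tokens: int) -> tuple[int, int]:
    """Map an image-token budget to the equivalent pixel budget."""
    return min_tokens * PIXELS_PER_TOKEN, max_tokens * PIXELS_PER_TOKEN

print(token_limits_to_pixel_limits(8, 4096))  # (6272, 3211264)
```

Under these assumptions, the (8, 4096) token limits correspond to roughly 6.3K and 3.2M pixels.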
```cpp
get_u32(KEY_N_HEAD_KV, hparams.n_kv_head, false);
if (hparams.n_kv_head <= 0) {
    hparams.n_kv_head = 8;
}
```
I see no reason for this being an optional param. The GGUF must have it; otherwise it's a faulty file.
Suggested change:

```diff
-get_u32(KEY_N_HEAD_KV, hparams.n_kv_head, false);
-if (hparams.n_kv_head <= 0) {
-    hparams.n_kv_head = 8;
-}
+get_u32(KEY_N_HEAD_KV, hparams.n_kv_head);
```
Please specify exactly how this differs from qwen2, so that we can merge the two models into one file in the future.
```python
remapper = {
    "mtp.fc": "model.layers.{bid}.eh_proj",
    "mtp.pre_fc_norm_embedding": "model.layers.{bid}.enorm",
    "mtp.pre_fc_norm_hidden": "model.layers.{bid}.hnorm",
    "mtp.norm": "model.layers.{bid}.shared_head.norm",
```
Please use the proper tensor_mapping instead.
```python
if name.startswith("visual.") and ".qkv." in name:
    assert self.hparams_vision is not None
    hv = self.hparams_vision
    n_heads = hv["num_heads"]
    n_kv = int(hv.get("num_key_value_heads", n_heads))
    hidden = hv["hidden_size"]
    head_dim = hidden // n_heads
    q_dim = n_heads * head_dim
    kv_dim = n_kv * head_dim
    total_out = q_dim + 2 * kv_dim
    out_dim = data_torch.shape[0]
    if out_dim != total_out:
        raise ValueError(f"EXAONE 4.5 vision qkv out dim mismatch: got {out_dim}, expected {total_out} ({name})")
    wq = data_torch[:q_dim]
    wk = data_torch[q_dim : q_dim + kv_dim]
    wv = data_torch[q_dim + kv_dim :]
    nq = name.replace("qkv", "q", 1)
    nk = name.replace("qkv", "k", 1)
    nv = name.replace("qkv", "v", 1)
    yield from ModelBase.modify_tensors(self, wq, nq, bid)
    yield from ModelBase.modify_tensors(self, wk, nk, bid)
    yield from ModelBase.modify_tensors(self, wv, nv, bid)
```
We should not split qkv; instead, use ggml_view to split the result (not the weight).
See build_vit in clip.cpp for an example.
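A tiny pure-Python sketch of why the two approaches agree (dimensions are illustrative, not the model's real sizes, and the helper names are made up for this demo): because a linear layer's output rows correspond one-to-one to weight rows, slicing the fused qkv output with views, as the review suggests, gives exactly the same q/k/v as splitting the weight first.

```python
def matvec(w, x):
    """y = W @ x for a row-major weight matrix W."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

q_dim, kv_dim, in_dim = 4, 2, 3
w_qkv = [[(r + 1) * (c + 1) % 7 for c in range(in_dim)]
         for r in range(q_dim + 2 * kv_dim)]
x = [1.0, 2.0, 3.0]

# Path A (what the converter did): split the weight, then three matmuls.
wq = w_qkv[:q_dim]
wk = w_qkv[q_dim : q_dim + kv_dim]
wv = w_qkv[q_dim + kv_dim :]
qa, ka, va = matvec(wq, x), matvec(wk, x), matvec(wv, x)

# Path B (reviewer's suggestion): one fused matmul, then slice the result,
# analogous to taking ggml_view over the fused output as build_vit does.
y = matvec(w_qkv, x)
qb, kb, vb = y[:q_dim], y[q_dim : q_dim + kv_dim], y[q_dim + kv_dim :]

assert (qa, ka, va) == (qb, kb, vb)
```

Keeping the weight fused also means the GGUF stores one tensor instead of three, and the split costs only view metadata at graph-build time.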
Overview
Add support for the EXAONE 4.5 architecture, used by the EXAONE 4.5 model released by LG AI Research.
Additional information
This PR adds the modeling code for EXAONE 4.5, which uses the same LLM architecture as EXAONE 4.
It also adds `n_kv_heads` to the CLIP model to make the ViT compatible with the GQA structure.
Requirements
YES. The modeling code was implemented with the help of an AI assistant.