This repository was archived by the owner on Oct 14, 2025. It is now read-only.

Makes Llama checkpoint conversion compatible with fused up/gate projection #26

Open
evellasques wants to merge 1 commit into aws-neuron:main from evellasques:fix_llama_converters

@evellasques
Contributor

Issue #24

Description of changes:

The recent fusing of the gate and up projections in Llama requires the equivalent fusing in the HF-to-NeMo conversion scripts (and the corresponding split in the NeMo-to-HF script).

This change fixes that for the following converters:

  • convert_nemo_checkpoint_to_hf_llama.py
  • convert_hf_checkpoint_to_nemo_llama.py
  • convert_hf_checkpoint_to_nemo_llama_70b.py
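The fuse-then-split round trip described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: it uses NumPy arrays as stand-ins for the checkpoint tensors, assumes a single tensor-parallel rank, and assumes the fused weight stacks gate on top of up along dim 0. The helper names `fuse_gate_up` and `split_gate_up` are hypothetical.

```python
import numpy as np


def fuse_gate_up(gate, up):
    """HF -> NeMo direction: stack gate_proj on top of up_proj along dim 0."""
    return np.concatenate([gate, up], axis=0)


def split_gate_up(fused):
    """NeMo -> HF direction: undo the stacking by halving dim 0."""
    gate, up = np.split(fused, 2, axis=0)
    return gate, up


# Toy shapes (assumed for illustration): intermediate size 4, hidden size 3.
gate = np.arange(12, dtype=np.float32).reshape(4, 3)
up = gate + 100.0

fused = fuse_gate_up(gate, up)             # shape (8, 3)
gate_back, up_back = split_gate_up(fused)  # recovers the original pair
```

With more than one tensor-parallel rank, the same fuse and split would have to be applied to each rank's shard rather than to the full matrices.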

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@amithrm
Contributor

amithrm commented Apr 22, 2024

Thanks @evellasques for the PR. Going over it!

"self_attention.core_attention.rotary_emb.inv_freq": (0, "self_attn.rotary_emb.inv_freq", None, 0),
"mlp.dense_h_to_4h.weight": (1, "mlp.gate_proj.weight", 0, 0),
"mlp.dense_h_to_4h_2.weight": (1, "mlp.up_proj.weight", 0, 0),
"mlp.dense_h_to_4h.weight": (1, "mlp.gate_proj_up_proj.weight", 0, 0),

Why treat the "gate" and "up" projections as fused in the HF checkpoint? Shouldn't you split them out of the NeMo checkpoint instead, and then save them as separate "gate" and "up" params for HF?

hf_model[hf_key_q], hf_model[hf_key_k], hf_model[hf_key_v] = torch.split(hf_model[hf_key], size_per_seg, dim=0)
hf_model.pop(hf_key)

if "dense_h_to_4h" in k:

I think this is not accurate. The "gate" and "up" fusion is per TP rank in the NeMo checkpoint, so you can't first concatenate all TP ranks and then split into "gate" and "up". Instead, you should split them for each TP rank.
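A toy example may make the ordering issue concrete. It is a sketch under assumptions rather than the converter code: NumPy arrays stand in for the torch tensors, the tensor-parallel degree is 2, and each rank is assumed to hold its gate shard stacked on top of its up shard along dim 0. The function names are hypothetical.

```python
import numpy as np

TP = 2  # assumed tensor-parallel degree

# Toy HF weights: intermediate size 4, hidden size 3.
gate = np.arange(12, dtype=np.float32).reshape(4, 3)
up = gate + 100.0

# Build per-rank NeMo-style shards: each rank holds [gate_shard; up_shard].
gate_shards = np.split(gate, TP, axis=0)
up_shards = np.split(up, TP, axis=0)
rank_weights = [
    np.concatenate([g, u], axis=0) for g, u in zip(gate_shards, up_shards)
]


def split_after_concat(rank_weights):
    """Problematic order: merge all ranks first, then halve the result."""
    merged = np.concatenate(rank_weights, axis=0)
    return np.split(merged, 2, axis=0)


def split_per_rank(rank_weights):
    """Split each rank into gate/up halves, then merge across ranks."""
    halves = [np.split(w, 2, axis=0) for w in rank_weights]
    gate_full = np.concatenate([h[0] for h in halves], axis=0)
    up_full = np.concatenate([h[1] for h in halves], axis=0)
    return gate_full, up_full


wrong_gate, wrong_up = split_after_concat(rank_weights)
right_gate, right_up = split_per_rank(rank_weights)
```

In this layout the merged tensor is ordered gate0, up0, gate1, up1, so halving it yields (gate0, up0) and (gate1, up1) instead of the full gate and up matrices. Splitting per rank first recovers the originals.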

3 participants