
[WIP] Add HyperCLOVAX model#44956

Draft
bigshanedogg wants to merge 2 commits into huggingface:main from bigshanedogg:feat/hyperclovax

Conversation

@bigshanedogg

@bigshanedogg bigshanedogg commented Mar 23, 2026

Draft PR, awaiting issue approval. This PR is opened alongside the issue request and will be marked ready for review after a maintainer gives the go-ahead on the issue.

What does this PR do?

Adds native Transformers support for HyperCLOVA X SEED Think 14B,
a 14.74B-parameter Korean reasoning LLM developed by NAVER Cloud.

Architecture

LLaMA-style decoder-only transformer with two modifications:

  • Peri-Layer Normalization (use_post_norm): an extra RMSNorm is applied after each
    sub-layer output (both attention and MLP), in addition to the standard pre-norm.
  • Maximal Update Parametrization (μP): four per-config scaling factors replace fixed constants:
    • attention_multiplier — replaces 1/sqrt(head_dim) in attention
    • residual_multiplier — scales each sub-layer output before adding to the residual stream
    • embedding_multiplier — scales the token embedding output
    • logits_scaling — scales final logits before softmax / sampling

Implementation approach

Following the maintainer's guidance in #44957, this PR uses the modular system (modular_hyperclovax.py) to minimise lines of code and make the diff easy to review and iterate on. (Roughly 59% of the lines are generated rather than manually maintained.)

The maintainer suggested inheriting the decoder layer with post-norms from GLM4. After evaluation, Granite was chosen as the decoder layer base instead, for the following reasons:

  • use_post_norm is optional (False by default). GLM4's decoder layer has post-norms always on, so inheriting from it would require logic to conditionally disable post_self_attn_layernorm / post_mlp_layernorm, adding complexity rather than reducing it.
  • Granite's decoder layer already provides residual_multiplier (always-active μP). When use_post_norm=False, HyperCLOVAXDecoderLayer is identical to GraniteDecoderLayer, with zero extra code.
  • Using GLM4 would require both adding residual_multiplier and conditionally disabling its built-in norms: two changes in opposite directions for no net gain in code reuse.

All other modules (RMSNorm, MLP, Attention, etc.) are inherited from Granite unchanged. The modular file is a few hundred LOC as suggested.
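The decoder-layer behaviour described above can be sketched as follows. This is an illustrative, framework-free rendering, not the PR's code: sub-layers and norms are passed in as plain callables, and the class name and norm keys are invented for the example.

```python
class PeriLNDecoderLayerSketch:
    """Pre-norm layer with Granite-style residual_multiplier and
    optional post-norms (peri-LN), per the PR description.

    With use_post_norm=False the update is Granite's:
        x + residual_multiplier * sublayer(pre_norm(x))
    With use_post_norm=True an extra norm wraps each sub-layer output.
    """

    def __init__(self, attn, mlp, norms, residual_multiplier, use_post_norm=False):
        self.attn, self.mlp = attn, mlp
        self.norms = norms  # dict of norm callables (keys made up here)
        self.residual_multiplier = residual_multiplier
        self.use_post_norm = use_post_norm

    def _apply(self, x, sublayer, pre_norm, post_norm):
        out = sublayer(pre_norm(x))
        if self.use_post_norm:
            out = post_norm(out)  # the extra, optional post-norm
        return x + self.residual_multiplier * out

    def __call__(self, hidden):
        hidden = self._apply(hidden, self.attn,
                             self.norms["input"], self.norms["post_attn"])
        hidden = self._apply(hidden, self.mlp,
                             self.norms["pre_mlp"], self.norms["post_mlp"])
        return hidden
```

With identity sub-layers and use_post_norm=False, this reduces exactly to the Granite residual update, which is the "zero extra code" point made above.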

(WIP) Benchmark validation

| Task | Metric | vLLM | Hugging Face (this PR) |
|---|---|---|---|
| hellaswag (non-think) | acc_norm | 0.6521 | - |
| gsm8k (non-think) | flexible-extract | 0.9151 | - |

External support

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

A code agent was used for mechanical tasks such as aligning docstrings and comments. The core implementation was written directly by the submitter, who has reviewed every changed line and personally run the tests, including benchmark validation.

Before submitting

HanFa added a commit to HanFa/vllm that referenced this pull request Mar 29, 2026
Vendor the HyperCLOVAX Vision config into vLLM to fix transformers v5
compatibility. The upstream remote code config does not handle empty
initialization (text_config=None), which breaks v5's @strict config
validation added in huggingface/transformers#41250.

Fixes: vllm-project#38387

TODO: Remove vendored config once HyperCLOVAX is upstreamed to
transformers. Tracking PR: huggingface/transformers#44956

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
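The empty-initialization gap that commit describes amounts to a config class that breaks when constructed with text_config=None under strict validation. One hedged way to tolerate it looks like this; the class name, fallback, and fields are illustrative only, not the vendored code:

```python
class HyperCLOVAXVisionConfigSketch:
    """Illustration of tolerating text_config=None at init time, the
    gap described in the commit above; names/defaults are made up."""

    def __init__(self, text_config=None, **kwargs):
        # Fall back to an empty sub-config rather than passing None
        # through, which strict validation would reject.
        self.text_config = {} if text_config is None else dict(text_config)
        for key, value in kwargs.items():
            setattr(self, key, value)
```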
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, hyperclovax
