Add GLM-4-9B-0414 contrib model port#90

Open
dhwanw wants to merge 3 commits into `main` from `contrib/GLM-4-9B-0414`
Conversation

@dhwanw dhwanw commented Mar 18, 2026

Description

Adds a NeuronX port of GLM-4-9B-0414 (model_type=glm4) to the contrib models collection.

Model Information

| Field | Value |
| --- | --- |
| Model | zai-org/GLM-4-9B-0414 |
| Architecture | Glm4ForCausalLM (decoder-only, 4 RMSNorm layers per block) |
| Parameters | 9B |
| TP Degree | 2 |
| Precision | BF16 |
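The distinguishing architectural detail, per the table above, is four RMSNorm layers per decoder block rather than the usual two. A minimal NumPy sketch of that layout, assuming the extra `post_self_attn_layernorm` and `post_mlp_layernorm` re-normalize each sub-layer's output before the residual add (attention and MLP are stand-in identity functions here, and the weight-key names are illustrative):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: divide by the root-mean-square over the hidden dim,
    # then scale by a learned per-channel weight.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def decoder_block(x, w, attn_fn, mlp_fn):
    # Sub-layer 1: pre-attn norm -> attention -> post-attn norm -> residual
    h = rms_norm(x, w["input_layernorm"])
    h = attn_fn(h)
    h = rms_norm(h, w["post_self_attn_layernorm"])
    x = x + h
    # Sub-layer 2: pre-MLP norm -> MLP -> post-MLP norm -> residual
    h = rms_norm(x, w["post_attention_layernorm"])
    h = mlp_fn(h)
    h = rms_norm(h, w["post_mlp_layernorm"])
    return x + h

hidden = 8
weights = {k: np.ones(hidden) for k in (
    "input_layernorm", "post_self_attn_layernorm",
    "post_attention_layernorm", "post_mlp_layernorm")}
out = decoder_block(np.random.randn(2, hidden), weights,
                    attn_fn=lambda h: h, mlp_fn=lambda h: h)
print(out.shape)  # (2, 8)
```

This is a shape-level sketch only; the actual port implements these norms inside the NeuronX modeling code in `src/modeling_glm4.py`.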

Checklist

  - [x] Model compiles successfully on Neuron
  - [x] Token matching validated (69.38% greedy, 98.12% teacher-forced)
  - [x] Performance profiled (12.3 tok/s)
  - [x] README with architecture details, usage, and validation results
  - [x] Integration tests included

Folder Structure

```
contrib/models/GLM-4-9B-0414/
├── README.md
├── src/
│   ├── __init__.py
│   └── modeling_glm4.py
└── test/
    └── integration/
        └── test_model.py
```

Testing

  • Token Match (greedy): 69.38% (10 prompts, 32 tokens each)
  • Token Match (teacher-forced): 98.12%
  • Throughput: 12.3 tok/s (TP=2, BS=1, seq_len=128)
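The token-match figures above can be understood as a per-position comparison against a reference decode. A hedged sketch of that metric, assuming greedy outputs from a CPU/GPU reference are compared position-by-position with the Neuron port's outputs (function and variable names are illustrative, not taken from the PR's test code):

```python
def token_match_rate(reference_tokens, candidate_tokens):
    """Fraction of positions where the two equal-length token lists agree."""
    assert len(reference_tokens) == len(candidate_tokens)
    matches = sum(r == c for r, c in zip(reference_tokens, candidate_tokens))
    return matches / len(reference_tokens)

# Toy data standing in for 32-token greedy decodes of two prompts.
ref = [[1, 2, 3, 4], [5, 6, 7, 8]]      # reference greedy outputs
neuron = [[1, 2, 3, 9], [5, 6, 7, 8]]   # Neuron greedy outputs
rates = [token_match_rate(r, c) for r, c in zip(ref, neuron)]
print(sum(rates) / len(rates))  # 0.875
```

Teacher-forced matching works the same way, except the reference tokens are fed back as input at every step, so a single divergence cannot cascade; this is why the teacher-forced rate (98.12%) exceeds the free-running greedy rate (69.38%).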

Note: Uses a custom Glm4NeuronConfig (fused_qkv=True, attn_cls=NeuronGlm4Attention). This model uses a different model_type ("glm4") than glm-4-9b-chat-hf ("glm") and has 4 RMSNorm layers per block instead of 2.
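The note above can be illustrated with a minimal sketch of why a model-specific config subclass is needed: the checkpoint stores a fused q/k/v projection, so the config must pin `fused_qkv=True` (and the custom attention class) for weight sharding to look up the right keys. The class and field names below are simplified stand-ins, not the real NeuronX base-config API:

```python
from dataclasses import dataclass

@dataclass
class NeuronConfigSketch:
    # Simplified stand-in for the NeuronX base config.
    tp_degree: int = 2
    fused_qkv: bool = False
    attn_cls: str = "NeuronAttentionBase"

@dataclass
class Glm4NeuronConfigSketch(NeuronConfigSketch):
    # GLM-4 checkpoints ship one fused qkv projection; leaving
    # fused_qkv=False would make sharding look for separate
    # q/k/v weight keys and raise a KeyError.
    fused_qkv: bool = True
    attn_cls: str = "NeuronGlm4Attention"

cfg = Glm4NeuronConfigSketch()
print(cfg.fused_qkv, cfg.attn_cls)  # True NeuronGlm4Attention
```

In the actual port, the real Glm4NeuronConfig lives in `src/modeling_glm4.py` and carries the full set of NeuronX config fields.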

Compatibility

  • Neuron SDK: 2.22+
  • Instance: trn1.32xlarge

🤖 Generated with Claude Code

dhwanw and others added 3 commits March 18, 2026 23:18
GLM-4-9B-0414 is architecturally distinct from the existing glm-4-9b-chat-hf
(model_type=glm): it uses 4 RMSNorm layers per decoder block instead of 2,
adding post_self_attn_layernorm and post_mlp_layernorm.

Validation results (tp=2, bs=1, seq_len=2048, bf16):
- Teacher-forced accuracy: 98.44%
- Greedy accuracy: 67.81%
- Throughput: 2.4 tokens/sec

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
98.12% teacher-forced, 69.38% greedy (10 prompts, 32 tokens each).
Document requirement to use Glm4NeuronConfig (fused_qkv=True) instead
of base NeuronConfig to avoid KeyError during weight sharding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhwanw dhwanw force-pushed the contrib/GLM-4-9B-0414 branch from 2ab9bdd to bdaa3b9 on March 18, 2026 at 23:19
@dhwanw dhwanw marked this pull request as ready for review on March 19, 2026 at 19:51