Add GLM-4-9B-0414 contrib model port#90

Open
dhwanw wants to merge 3 commits into `main` from `contrib/GLM-4-9B-0414`
Conversation

@dhwanw dhwanw commented Mar 18, 2026

Description

Adds a NeuronX port of GLM-4-9B-0414 (model_type=glm4) to the contrib models collection.

Model Information

| Field | Value |
| --- | --- |
| Model | zai-org/GLM-4-9B-0414 |
| Architecture | Glm4ForCausalLM (decoder-only, 4 RMSNorm layers per block) |
| Parameters | 9B |
| TP Degree | 2 |
| Precision | BF16 |
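The distinguishing architectural detail, per the table above, is four RMSNorm layers per decoder block rather than the usual two. A minimal NumPy sketch of that layout, assuming the extra `post_self_attn_layernorm` and `post_mlp_layernorm` re-normalize each sub-layer's output before the residual add (attention and MLP are stand-in identity functions here, and the weight-key names are illustrative):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: divide by the root-mean-square over the hidden dim,
    # then scale by a learned per-channel weight.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def decoder_block(x, w, attn_fn, mlp_fn):
    # Sub-layer 1: pre-attn norm -> attention -> post-attn norm -> residual
    h = rms_norm(x, w["input_layernorm"])
    h = attn_fn(h)
    h = rms_norm(h, w["post_self_attn_layernorm"])
    x = x + h
    # Sub-layer 2: pre-MLP norm -> MLP -> post-MLP norm -> residual
    h = rms_norm(x, w["post_attention_layernorm"])
    h = mlp_fn(h)
    h = rms_norm(h, w["post_mlp_layernorm"])
    return x + h

hidden = 8
weights = {k: np.ones(hidden) for k in (
    "input_layernorm", "post_self_attn_layernorm",
    "post_attention_layernorm", "post_mlp_layernorm")}
out = decoder_block(np.random.randn(2, hidden), weights,
                    attn_fn=lambda h: h, mlp_fn=lambda h: h)
print(out.shape)  # (2, 8)
```

This is a shape-level sketch only; the actual port implements these norms inside the NeuronX modeling code in `src/modeling_glm4.py`.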

Checklist

  - [x] Model compiles successfully on Neuron
  - [x] Token matching validated (69.38% greedy, 98.12% teacher-forced)
  - [x] Performance profiled (12.3 tok/s)
  - [x] README with architecture details, usage, and validation results
  - [x] Integration tests included

Folder Structure

```
contrib/models/GLM-4-9B-0414/
├── README.md
├── src/
│   ├── __init__.py
│   └── modeling_glm4.py
└── test/
    └── integration/
        └── test_model.py
```

Testing

  • Token Match (greedy): 69.38% (10 prompts, 32 tokens each)
  • Token Match (teacher-forced): 98.12%
  • Throughput: 12.3 tok/s (TP=2, BS=1, seq_len=128)
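The token-match figures above can be understood as a per-position comparison against a reference decode. A hedged sketch of that metric, assuming greedy outputs from a CPU/GPU reference are compared position-by-position with the Neuron port's outputs (function and variable names are illustrative, not taken from the PR's test code):

```python
def token_match_rate(reference_tokens, candidate_tokens):
    """Fraction of positions where the two equal-length token lists agree."""
    assert len(reference_tokens) == len(candidate_tokens)
    matches = sum(r == c for r, c in zip(reference_tokens, candidate_tokens))
    return matches / len(reference_tokens)

# Toy data standing in for 32-token greedy decodes of two prompts.
ref = [[1, 2, 3, 4], [5, 6, 7, 8]]      # reference greedy outputs
neuron = [[1, 2, 3, 9], [5, 6, 7, 8]]   # Neuron greedy outputs
rates = [token_match_rate(r, c) for r, c in zip(ref, neuron)]
print(sum(rates) / len(rates))  # 0.875
```

Teacher-forced matching works the same way, except the reference tokens are fed back as input at every step, so a single divergence cannot cascade; this is why the teacher-forced rate (98.12%) exceeds the free-running greedy rate (69.38%).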

Note: Uses a custom Glm4NeuronConfig (fused_qkv=True, attn_cls=NeuronGlm4Attention). This model uses a different model_type ("glm4") than glm-4-9b-chat-hf ("glm") and has 4 RMSNorm layers per block instead of 2.
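The note above can be illustrated with a minimal sketch of why a model-specific config subclass is needed: the checkpoint stores a fused q/k/v projection, so the config must pin `fused_qkv=True` (and the custom attention class) for weight sharding to look up the right keys. The class and field names below are simplified stand-ins, not the real NeuronX base-config API:

```python
from dataclasses import dataclass

@dataclass
class NeuronConfigSketch:
    # Simplified stand-in for the NeuronX base config.
    tp_degree: int = 2
    fused_qkv: bool = False
    attn_cls: str = "NeuronAttentionBase"

@dataclass
class Glm4NeuronConfigSketch(NeuronConfigSketch):
    # GLM-4 checkpoints ship one fused qkv projection; leaving
    # fused_qkv=False would make sharding look for separate
    # q/k/v weight keys and raise a KeyError.
    fused_qkv: bool = True
    attn_cls: str = "NeuronGlm4Attention"

cfg = Glm4NeuronConfigSketch()
print(cfg.fused_qkv, cfg.attn_cls)  # True NeuronGlm4Attention
```

In the actual port, the real Glm4NeuronConfig lives in `src/modeling_glm4.py` and carries the full set of NeuronX config fields.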

Compatibility

  • Neuron SDK: 2.22+
  • Instance: trn1.32xlarge

🤖 Generated with Claude Code

dhwanw and others added 3 commits March 18, 2026 23:18
GLM-4-9B-0414 is architecturally distinct from the existing glm-4-9b-chat-hf
(model_type=glm): it uses 4 RMSNorm layers per decoder block instead of 2,
adding post_self_attn_layernorm and post_mlp_layernorm.

Validation results (tp=2, bs=1, seq_len=2048, bf16):
- Teacher-forced accuracy: 98.44%
- Greedy accuracy: 67.81%
- Throughput: 2.4 tokens/sec

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
98.12% teacher-forced, 69.38% greedy (10 prompts, 32 tokens each).
Document requirement to use Glm4NeuronConfig (fused_qkv=True) instead
of base NeuronConfig to avoid KeyError during weight sharding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhwanw dhwanw force-pushed the contrib/GLM-4-9B-0414 branch from 2ab9bdd to bdaa3b9 on March 18, 2026 at 23:19
@dhwanw dhwanw marked this pull request as ready for review on March 19, 2026 at 19:51