Aria is a LLaMA-based MoE model with 64 routed experts (top-6 routing) and 2 shared experts. The port uses NXDI's standard moe_v2 infrastructure, with initialize_moe_module setting up both the routed and shared experts. Validated on trn1.32xlarge: the inference test passed (3/3 prompts coherent) at 4.2 tokens/sec with tp=8, batch=1, seq_len=512, bf16. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
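For readers unfamiliar with the moe_v2 setup, here is a minimal sketch of the expert configuration described above; the config class, field names, and the helper function are illustrative assumptions, not the actual NXDI initialize_moe_module signature.

```python
# Illustrative sketch only: the dataclass and helper below are hypothetical,
# meant to show the expert counts being wired up, not NXDI's real moe_v2 API.
from dataclasses import dataclass


@dataclass
class AriaMoESketchConfig:
    num_experts: int = 64        # routed experts
    top_k: int = 6               # experts selected per token by the router
    num_shared_experts: int = 2  # shared experts applied to every token


def describe_moe_layer(cfg: AriaMoESketchConfig) -> dict:
    # In the port, a single initialize_moe_module call is expected to build
    # both the routed and shared experts from configuration like this.
    return {
        "router": {"experts": cfg.num_experts, "top_k": cfg.top_k},
        "shared_experts": cfg.num_shared_experts,
    }


print(describe_moe_layer(AriaMoESketchConfig()))
```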
The HF golden reference uses the full AriaForConditionalGeneration model with text-only input: 10 prompts, 32 generated tokens each. Results: 72.81% greedy token match, 98.12% teacher-forced. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
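A hedged sketch of how such a golden reference could be generated; the checkpoint id and prompts are placeholders, and the loading options (dtype, device placement) are assumptions rather than the script's actual arguments.

```python
# Sketch of text-only golden generation with the full HF model; greedy
# decoding, 32 new tokens per prompt. Checkpoint id and prompts are placeholders.
import torch
from transformers import AriaForConditionalGeneration, AutoTokenizer

MODEL_ID = "rhymes-ai/Aria"  # placeholder; use the checkpoint from the port
PROMPTS = [
    "The capital of France is",
    "In machine learning, overfitting means",
]  # the actual harness uses 10 prompts

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AriaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

golden = {}
for prompt in PROMPTS:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Text-only input: no pixel_values are passed to the multimodal model.
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    golden[prompt] = out[0].tolist()
```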
Description
Adds a NeuronX port of the Aria text decoder (MoE) to the contrib models collection.
Model Information
Checklist
Folder Structure
Testing
Note: This port implements only the text decoder. Greedy divergence is expected for MoE models in bf16 due to cascading expert routing differences.
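To make the distinction concrete, here is a small sketch of how the two match rates can be computed against the HF golden tokens; the function names are illustrative, not the contrib test utilities.

```python
# Illustrative metric computation (not the actual test harness utilities).
import torch


def greedy_match_rate(neuron_tokens: torch.Tensor, golden_tokens: torch.Tensor) -> float:
    # Compare autoregressively generated tokens position by position; a single
    # early expert-routing difference shifts the whole continuation, so this is
    # the stricter number for MoE models in bf16.
    n = min(neuron_tokens.numel(), golden_tokens.numel())
    return (neuron_tokens[:n] == golden_tokens[:n]).float().mean().item()


def teacher_forced_match_rate(neuron_logits: torch.Tensor, golden_tokens: torch.Tensor) -> float:
    # Run one forward pass over the golden sequence and check the per-position
    # argmax: position i predicts golden_tokens[i + 1], so errors do not compound.
    preds = neuron_logits[:-1].argmax(dim=-1)
    return (preds == golden_tokens[1:]).float().mean().item()
```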
Compatibility
🤖 Generated with Claude Code