
Add Aria text decoder (MoE) contrib model port #88

Open
dhwanw wants to merge 3 commits into main from contrib/Aria

dhwanw commented Mar 18, 2026

Description

Adds a NeuronX port of the Aria text decoder (MoE) to the contrib models collection.

Model Information

Field          Value
Model          rhymes-ai/Aria
Architecture   Decoder-only MoE transformer (LLaMA-based)
Parameters     ~3.9B activated per token (text decoder); 64 routed experts with top-6 routing, plus 2 shared experts
TP degree      8
Precision      BF16
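The routing in the table (64 routed experts, top-6, plus the 2 shared experts named in the first commit message) can be sketched in plain Python. This is a minimal illustration, not the NxDI moe_v2 code path the port actually uses: `moe_layer`, `top_k_indices`, and the toy experts are hypothetical, and taking a softmax over only the six selected router logits is an assumption about how Aria weights its routed experts.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]  # hypothetical: one expert MLP modeled as a plain function


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores, highest first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def moe_layer(x: Vector, router_logits: Sequence[float], routed: Sequence[Expert],
              shared: Sequence[Expert], k: int = 6) -> Vector:
    chosen = top_k_indices(router_logits, k)
    # Assumption: softmax over only the k selected logits, so routing weights sum to 1.
    m = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - m) for i in chosen}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for i in chosen:
        w = exps[i] / total
        out = [o + w * y for o, y in zip(out, routed[i](x))]
    # Shared experts process every token; their outputs are added without routing weights.
    for expert in shared:
        out = [o + y for o, y in zip(out, expert(x))]
    return out


# Toy usage: 64 routed experts that each output their own index, plus 2 shared experts.
routed = [lambda v, i=i: [float(i)] for i in range(64)]
shared = [lambda v: [1.0], lambda v: [1.0]]
logits = [10.0 if i < 6 else 0.0 for i in range(64)]
# Mean of experts 0..5 (2.5) plus the two shared experts (2.0), i.e. about 4.5.
print(moe_layer([0.0], logits, routed, shared))
```

Only 6 of the 64 routed experts run for each token, while the shared experts always run.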

Checklist

  • Model compiles successfully on Neuron
  • Token matching validated (72.81% greedy, 98.12% teacher-forced)
  • Performance profiled (4.2 tok/s)
  • README with architecture details, usage, validation results
  • Integration tests included

Folder Structure

contrib/models/Aria/
├── README.md
├── src/
│   ├── __init__.py
│   └── modeling_aria_text.py
└── test/
    └── integration/
        └── test_model.py

Testing

  • Token Match (greedy): 72.81% (10 prompts, 32 tokens each)
  • Token Match (teacher-forced): 98.12%
  • Throughput: 4.2 tok/s (TP=8, BS=1, seq_len=512)
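The gap between the greedy and teacher-forced numbers is easier to read with both metrics written out. A minimal sketch, assuming per-position exact match against the reference continuation; `greedy_match`, `teacher_forced_match`, and `toy_model` are hypothetical names, not functions from test_model.py, and the PR does not state how its percentages are aggregated across the 10 prompts.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps a token prefix to its argmax next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_match(model: NextToken, prompt: Sequence[int], reference: Sequence[int]) -> float:
    """Free-running decode: the model's own token is appended at every step."""
    seq: List[int] = list(prompt)
    hits = 0
    for ref_tok in reference:
        tok = model(seq)
        hits += int(tok == ref_tok)
        seq.append(tok)
    return hits / len(reference)


def teacher_forced_match(model: NextToken, prompt: Sequence[int], reference: Sequence[int]) -> float:
    """Teacher forcing: the reference token is appended at every step."""
    seq: List[int] = list(prompt)
    hits = 0
    for ref_tok in reference:
        hits += int(model(seq) == ref_tok)
        seq.append(ref_tok)
    return hits / len(reference)


def toy_model(seq: Sequence[int]) -> int:
    # Counts upward, but makes a single off-by-one error when the prefix has 3 tokens.
    return seq[-1] + (2 if len(seq) == 3 else 1)


prompt, reference = [0], [1, 2, 3, 4, 5, 6, 7, 8]
print(greedy_match(toy_model, prompt, reference))          # 0.25
print(teacher_forced_match(toy_model, prompt, reference))  # 0.875
```

One wrong step costs a single position under teacher forcing, but under greedy decoding every later token is built on the wrong prefix, so the greedy score falls much further.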

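Note: This port implements only the text decoder. Some greedy divergence is expected for MoE models in BF16: a small numerical difference can change which of the top-6 experts a token is routed to, and once one generated token differs from the reference, every later token is conditioned on a different prefix. Teacher-forced matching feeds the reference token at each step, so it measures per-step agreement without that cascade.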
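The routing sensitivity can be illustrated with rounding alone. A minimal sketch, assuming only the final router logits are rounded to bfloat16 (real kernels also accumulate in reduced precision, which this ignores); `to_bf16` and `top_k` are hypothetical helpers and the logit values are invented. bfloat16 keeps 8 significand bits, so values in [1, 2) are spaced 2^-7 = 0.0078125 apart, and two experts whose logits differ by 0.002 near 1.0 collapse to the same value, changing who wins the sixth slot.

```python
import struct
from typing import List, Sequence


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round half to even); NaN/overflow handling omitted."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    # bfloat16 keeps the top 16 bits of a float32; the bias makes truncation round to nearest even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits = (bits >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores; ties go to the lower index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


# Invented router logits for one token: five clear winners, then a near-tie for the sixth slot.
logits = [0.5] * 64
for i in range(10, 15):
    logits[i] = 2.0
logits[3] = 1.001
logits[7] = 1.003

full_choice = sorted(top_k(logits, 6))
bf16_choice = sorted(top_k([to_bf16(x) for x in logits], 6))
print(full_choice)  # [7, 10, 11, 12, 13, 14]
print(bf16_choice)  # [3, 10, 11, 12, 13, 14]
```

Ties here go to the lower index; real top-k kernels may break ties differently.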

Compatibility

  • Neuron SDK: 2.22+
  • Instance: trn1.32xlarge

🤖 Generated with Claude Code

dhwanw and others added 3 commits March 17, 2026 17:50
Aria is a LLaMA-based MoE model with 64 routed experts (top-6) and
2 shared experts. Uses NXDI's standard moe_v2 infrastructure with
initialize_moe_module for both routed and shared experts.

Validated on trn1.32xlarge: inference test passed (3/3 prompts coherent),
4.2 tokens/sec throughput at tp=8, batch=1, seq_len=512, bf16.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The HF golden reference uses the full AriaForConditionalGeneration model with text-only input.
10 prompts, 32 tokens each. Greedy 72.81%, teacher-forced 98.12%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dhwanw marked this pull request as ready for review March 19, 2026 20:32