[contrib] Add Bloom-1b7 NeuronX port #76

Open
dhwanw wants to merge 4 commits into main from contrib/bloom-1b7

Conversation


@dhwanw dhwanw commented Mar 17, 2026

Description

NeuronX Distributed Inference port of bigscience/bloom-1b7, a 1.7B-parameter decoder-only transformer. Bloom uses ALiBi positional encoding (no position embeddings), fused interleaved QKV weights, embedding LayerNorm, and weight-tied lm_head. ALiBi bias is computed and added in custom perform_prefill and compute_for_token_gen overrides.

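The ALiBi scheme described above can be sketched as follows. This is a minimal illustration of the per-head slope and position-bias computation, not the PR's actual `perform_prefill`/`compute_for_token_gen` override code:

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Closed form for power-of-two head counts (Bloom-1b7 has 16 MHA heads):
    # slope_i = 2^(-8i/n) for i = 1..n, a geometric sequence.
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** i for i in range(1, n_heads + 1)])

def build_alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Linear bias over key positions, shape [n_heads, 1, seq_len]; it is
    # broadcast across query positions and added to the attention scores,
    # so no position embeddings are needed. Biasing by absolute key
    # position is equivalent to relative distance here, because softmax is
    # invariant to a per-query-row constant shift.
    slopes = alibi_slopes(n_heads)
    positions = torch.arange(seq_len, dtype=torch.float32)
    return slopes[:, None, None] * positions[None, None, :]
```

For 16 heads the slopes run from 2^-0.5 down to 2^-8, so earlier heads attend over longer ranges than later ones.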
Model Information

Model Name: Bloom-1b7
Model Architecture: Decoder-only transformer with ALiBi attention, 16 MHA heads, 24 layers, LayerNorm (not RMSNorm), embedding LayerNorm, GELU (tanh approximation), tied embeddings
Purpose: Multilingual text generation

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
    • Validates model generation and coherence
    • Performance benchmarks (TTFT, throughput)
    • Test can compile and run the model on Neuron
  • README.md with the following sections:
    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types
    • Example Checkpoints: Links to compatible model checkpoints
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)
    • Modeling code following NxD Inference patterns

Optional Components

  • Unit Tests (CPU or Neuron-based)

Folder Structure

/contrib/models/bloom-1b7/
  README.md
  /src
    modeling_bloom.py
  /test
    /integration
      test_model.py

Testing

Model was compiled and tested with TP=1, batch_size=1, seq_len=128, bfloat16 on trn1.32xlarge.

Test Results:

Test                    Status   Result
Smoke Test              ✅ PASS  Model loads successfully
Greedy Token Matching   ✅ PASS  76.9% average (vs HF bfloat16 reference)
Throughput              ✅ PASS  63.3 tok/s

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.22
  • Instance Type(s): trn1.32xlarge
  • PyTorch Version: 2.9
  • Python Version: 3.10
  • Configuration: TP=1, batch_size=1, seq_len=128, bfloat16

Additional Information

  • ALiBi attention: No position embeddings. Per-head slopes are used to compute position bias added to attention scores. Implemented via custom perform_prefill and compute_for_token_gen overrides.
  • Fused QKV: HF checkpoint stores interleaved [num_heads, 3, head_dim, hidden_size] QKV weights, split into separate Q, K, V during weight conversion.
  • Embedding LayerNorm: Applied after token embeddings via get_model_output override.
  • Weight tying: LM head tied to token embeddings.
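The fused-QKV layout above can be illustrated with a small sketch (a hypothetical converter, assuming the [num_heads, 3, head_dim, hidden_size] interleaving described in this PR; names are illustrative, not the PR's actual code):

```python
import torch

def split_fused_qkv(fused_w: torch.Tensor, n_heads: int, head_dim: int):
    # The HF Bloom checkpoint stores query_key_value as one
    # [n_heads * 3 * head_dim, hidden] matrix, interleaved per head as
    # consecutive (q, k, v) blocks of head_dim rows each.
    hidden = fused_w.shape[-1]
    w = fused_w.view(n_heads, 3, head_dim, hidden)
    q, k, v = w.unbind(dim=1)          # each [n_heads, head_dim, hidden]
    return (q.reshape(-1, hidden),
            k.reshape(-1, hidden),
            v.reshape(-1, hidden))
```

The key point is that a plain 3-way row split would be wrong: the q, k, and v rows for each head are adjacent, so the tensor must be reshaped per head before splitting.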

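The embedding-LayerNorm and weight-tying points above can be sketched together. Dimensions below are placeholders for illustration, not the real bloom-1b7 config values:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; bloom-1b7 uses a much larger vocabulary and
# hidden size than these placeholders.
vocab, hidden = 1000, 64

embed = nn.Embedding(vocab, hidden)
ln_embed = nn.LayerNorm(hidden)              # embedding LayerNorm
lm_head = nn.Linear(hidden, vocab, bias=False)
lm_head.weight = embed.weight                # weight tying: one shared matrix

tokens = torch.tensor([[1, 2, 3]])
hidden_states = ln_embed(embed(tokens))      # normalize right after lookup
logits = lm_head(hidden_states)              # projects back through embed.weight
```

Because the head shares the embedding matrix, a checkpoint converter must not materialize a separate lm_head tensor, and the embedding LayerNorm must run before the first decoder layer.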
Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

dhwanw and others added 4 commits March 6, 2026 17:29
NeuronX port of BigScience Bloom-1b7 (1.7B params) with ALiBi attention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use consistent CE/TG column table format across all contrib models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…andardize README

- Add test/__init__.py, test/unit/__init__.py, test/integration/__init__.py
- Rewrite README to match standard contrib format (Model Information, Architecture
  Details table, Validation Results table, Usage, Compatibility Matrix, Testing, Maintainer)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhwanw dhwanw marked this pull request as ready for review March 19, 2026 21:58