Skip to content

[contrib] Add CodeGen-350M-mono NeuronX port#74

Open
dhwanw wants to merge 3 commits intomainfrom
contrib/codegen-350M-mono
Open

[contrib] Add CodeGen-350M-mono NeuronX port#74
dhwanw wants to merge 3 commits intomainfrom
contrib/codegen-350M-mono

Conversation

@dhwanw
Copy link

@dhwanw dhwanw commented Mar 17, 2026

Description

NeuronX Distributed Inference port of Salesforce/codegen-350M-mono, a 350M-parameter decoder-only transformer for code generation. CodeGen uses GPT-J-style architecture with partial RoPE (32/64 dims with interleaved rotation), parallel residual connections (attn + mlp + residual), and fused QKV with (Q, V, K) interleaved order requiring special weight decomposition.

Model Information

Model Name: CodeGen-350M-mono
Model Architecture: Decoder-only transformer (GPT-J variant) with partial RoPE, parallel residual connections, fused QKV, GELU-new activation, LayerNorm
Purpose: Code generation (monolingual Python)

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
    • Validates model generation and coherence
    • Performance benchmarks (TTFT, throughput)
    • Test can compile and run the model on Neuron
  • README.md with the following sections:
    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types
    • Example Checkpoints: Links to compatible model checkpoints
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)
    • Modeling code following NxD Inference patterns

Optional Components

  • Unit Tests (CPU or Neuron-based)

Folder Structure

/contrib/models/codegen-350M-mono/
  README.md
  /src
    modeling_codegen.py
  /test
    /integration
      test_model.py

Testing

Model was compiled and tested with TP=1, batch_size=1, seq_len=128, bfloat16 on trn1.32xlarge.

Test Results:

Test Status Result
Smoke Test ✅ PASS Model loads successfully
Greedy Token Matching ✅ PASS 100% match on 14/30 prompts (54.48% avg across all prompts)
Teacher-Forced Match ✅ PASS 97.03% average
Throughput ✅ PASS 187.4 tok/s

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.22
  • Instance Type(s): trn1.32xlarge
  • PyTorch Version: 2.9
  • Python Version: 3.10
  • Configuration: TP=1, batch_size=1, seq_len=128, bfloat16

Additional Information

  • Partial RoPE: Only 32 of 64 head dimensions use rotary embeddings, following GPT-J's interleaved rotation convention (not the standard LLaMA half-split).
  • Fused QKV: HuggingFace checkpoint uses fused qkv_proj with mp_num=4 and (Q, V, K) interleaved order -- the weight converter handles this decomposition.
  • Parallel residual: Attention and MLP both operate on the layer-normed input, and their outputs are summed with the residual.

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

dhwanw and others added 2 commits March 18, 2026 23:13
…upport

Salesforce CodeGen-350M-mono NeuronX port with GPT-J style partial rotary
embeddings (32/64 dims) and fused QKV weight decomposition (mp_num=4,
Q/V/K interleaved order). Validated at 100% greedy token match on 14/30
prompts (64 tokens each) and 97% teacher-forced match average.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhwanw dhwanw force-pushed the contrib/codegen-350M-mono branch from e930983 to 4f13617 Compare March 18, 2026 23:19
Use consistent CE/TG column table format across all contrib models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhwanw dhwanw marked this pull request as ready for review March 19, 2026 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant