[contrib] Add Qwen2-Audio-7B NeuronX port#99

Draft
lutfanm-aws wants to merge 1 commit into main from contrib/Qwen2-Audio-7B
Conversation

@lutfanm-aws
Summary

  • Adds NeuronX Distributed Inference implementation of Qwen/Qwen2-Audio-7B
  • Multimodal audio-to-text model (~8.2B params): audio encoder + Qwen2 7B language model
  • Both audio encoder and language model run entirely on Neuron hardware
  • Validated with speech transcription, audio captioning, and text-only generation

Model Details

  • Architecture: Multimodal encoder-decoder (Whisper-style audio encoder + Qwen2 decoder)
  • Parameters: ~8.2B (audio encoder ~600M + LM ~7.6B)
  • TP Degree: 2
  • Precision: BF16
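
To make the deployment settings above easier to reason about, here is a minimal sketch that bundles them and derives per-rank figures. This is a hypothetical helper for illustration only, not the actual NeuronX Distributed Inference config classes; the totals (~8.2B parameters, BF16 at 2 bytes per parameter) come from this PR's description.

```python
# Hypothetical config sketch (NOT the real NxDI config API): collects the
# settings reported in this PR and derives rough per-rank numbers.
from dataclasses import dataclass

@dataclass(frozen=True)
class PortConfig:
    tp_degree: int = 2        # tensor-parallel degree used for validation
    batch_size: int = 1
    seq_len: int = 1024
    bytes_per_param: int = 2  # BF16

    def params_per_rank(self, total_params: float = 8.2e9) -> float:
        """Approximate parameters sharded onto each NeuronCore rank."""
        return total_params / self.tp_degree

    def weight_bytes_per_rank(self, total_params: float = 8.2e9) -> float:
        """Approximate weight memory per rank in bytes."""
        return self.params_per_rank(total_params) * self.bytes_per_param

cfg = PortConfig()
print(cfg.params_per_rank())       # ~4.1e9 parameters per rank
print(cfg.weight_bytes_per_rank()) # ~8.2e9 bytes (~8.2 GB) of weights per rank
```

With TP=2 each rank holds roughly half the model, so BF16 weights alone occupy about 8.2 GB per rank before activations and KV cache.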

Validation

  • Speech transcription: exact match ✅
  • Audio captioning: correct caption ✅
  • Text-only generation: correct response ✅
  • Configuration: TP=2, batch_size=1, seq_len=1024

Performance

  • Token generation: 15-16 tok/s on trn1.32xlarge (TP=2)
  • Audio encoding: ~60ms for 3-4s audio
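
A back-of-envelope latency model from the numbers above: encode the audio once (~60 ms), then decode autoregressively at the measured token rate. The constants are this PR's reported measurements, not guaranteed figures, and the function itself is illustrative.

```python
# Rough end-to-end latency estimate from this PR's reported numbers
# (~15-16 tok/s decode, ~60 ms audio encode); illustrative only.
def estimate_latency_s(n_tokens: int, tok_per_s: float = 15.5,
                       audio_encode_s: float = 0.060) -> float:
    """One audio-encode pass plus autoregressive decoding of n_tokens."""
    return audio_encode_s + n_tokens / tok_per_s

print(round(estimate_latency_s(64), 2))  # 64 generated tokens: ~4.19 s
```

For short transcriptions the audio encoder is a negligible fraction of total latency; token generation dominates.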

Files

  • contrib/models/Qwen2-Audio-7B/src/modeling_qwen2_audio.py — Model implementation
  • contrib/models/Qwen2-Audio-7B/src/configuration_qwen2_audio.py — Config classes
  • contrib/models/Qwen2-Audio-7B/test/ — Integration tests
  • contrib/models/Qwen2-Audio-7B/README.md — Documentation

