This project implements a Transformer model architecture from scratch, targeting deployment directly on an FPGA.
Each essential block of the Transformer is modularized in Verilog, with a focus on matrix processing, parallelism, and hardware efficiency.
| Module Name | Description |
|---|---|
| PositionalEncoding.v | Adds positional information to input embeddings. |
| encoder_layer.v | Complete encoder layer: attention + feedforward. |
| feedforward.v | Feedforward neural network layer. |
| layer_normalisation.v | Normalizes activations for stable training. |
| masked_multi_attention_head.v | Implements masked multi-head self-attention. |
| multi-attention_layer.v | Implements standard multi-head self-attention. |
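As orientation for how these modules fit together, the skeleton below sketches what a parameterized encoder-layer interface might look like. The parameter and port names (`D_MODEL`, `N_HEADS`, `x_in`, `valid_in`) are illustrative assumptions, not the actual interface of `encoder_layer.v`:

```verilog
// Hypothetical interface sketch; the real encoder_layer.v ports may differ.
module encoder_layer #(
    parameter D_MODEL = 64,   // model (embedding) dimension
    parameter N_HEADS = 4,    // number of attention heads
    parameter WIDTH   = 16    // fixed-point element width
)(
    input  wire                            clk,
    input  wire                            rst,
    input  wire                            valid_in,
    input  wire signed [D_MODEL*WIDTH-1:0] x_in,   // one token's embedding, packed
    output wire                            valid_out,
    output wire signed [D_MODEL*WIDTH-1:0] x_out   // encoded output vector, packed
);
    // Internally the datapath would chain the modules listed above:
    //   multi-head attention -> add & layer norm -> feedforward -> add & layer norm
    // (submodule instantiations omitted in this sketch)
endmodule
```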
- Matrix Processing Acceleration: All major computations (dot products, matrix multiplications, additions) are optimized and parallelized for FPGA execution.
- Modular Layer Design: Each Transformer component (attention, feedforward, normalization, positional encoding) is implemented separately for flexibility and testing.
- Hardware Efficiency: The design focuses on pipelining, parallelism, and resource optimization for real-time inference applications.
- Scalability: The design allows easy scaling of the number of attention heads, the model dimension, and layer stacking.
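A minimal sketch of the kind of parallel, pipelined matrix processing described above: a dot-product unit with `N` multipliers operating in parallel, followed by a registered adder tree. Parameter names (`N`, `WIDTH`) are assumptions for illustration, not taken from the source files:

```verilog
// Illustrative parallel dot-product unit (not the project's actual code).
module dot_product #(
    parameter N     = 8,    // vector length = number of parallel multipliers
    parameter WIDTH = 16    // fixed-point operand width
)(
    input  wire                                clk,
    input  wire                                rst,
    input  wire signed [N*WIDTH-1:0]           a_flat, // packed vector A
    input  wire signed [N*WIDTH-1:0]           b_flat, // packed vector B
    output reg  signed [2*WIDTH+$clog2(N)-1:0] sum     // registered result
);
    integer i;
    reg signed [2*WIDTH-1:0] prod [0:N-1];

    // Pipeline stage 1: N multiplications in parallel (maps to DSP slices)
    always @(posedge clk) begin
        for (i = 0; i < N; i = i + 1)
            prod[i] <= $signed(a_flat[i*WIDTH +: WIDTH]) *
                       $signed(b_flat[i*WIDTH +: WIDTH]);
    end

    // Pipeline stage 2: combinational reduction, registered at the output
    reg signed [2*WIDTH+$clog2(N)-1:0] acc;
    always @(*) begin
        acc = 0;
        for (i = 0; i < N; i = i + 1)
            acc = acc + prod[i];
    end

    always @(posedge clk)
        sum <= rst ? 0 : acc;
endmodule
```

Tiling a matrix multiplication over an array of such units is what enables the dataflow pipelining mentioned below.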
- Input Data is passed with positional encoding.
- Multi-Head Attention computes context vectors.
- Layer Normalization ensures stability and faster convergence.
- Feedforward Networks add non-linearity and learning capacity.
- Stacked Encoder Layers enable deeper feature extraction.
- Matrix Computations are heavily optimized using FPGA-specific design patterns (e.g., parallel MAC units, dataflow pipelining).
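As one concrete step from the flow above, positional encoding on an FPGA is typically applied by adding precomputed sinusoidal values from a ROM to each embedding element. The sketch below assumes this scheme; the module name, ports, and the `pe_table.hex` init file are hypothetical, and `PositionalEncoding.v`'s real interface may differ:

```verilog
// Illustrative sketch: embedding element + precomputed PE(pos) from a ROM.
module pos_enc_add #(
    parameter WIDTH   = 16,  // fixed-point element width
    parameter MAX_POS = 64   // maximum sequence position stored in the ROM
)(
    input  wire                       clk,
    input  wire [$clog2(MAX_POS)-1:0] pos,     // token position index
    input  wire signed [WIDTH-1:0]    emb_in,  // input embedding element
    output reg  signed [WIDTH-1:0]    emb_out  // embedding + PE(pos)
);
    // Sinusoidal PE values computed offline and loaded at initialization
    reg signed [WIDTH-1:0] pe_rom [0:MAX_POS-1];
    initial $readmemh("pe_table.hex", pe_rom); // hypothetical init file

    always @(posedge clk)
        emb_out <= emb_in + pe_rom[pos];
endmodule
```

Precomputing the table avoids evaluating sine and cosine in hardware, trading a small amount of block RAM for logic.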
- Real-Time NLP Acceleration on Edge Devices
- Low-Latency Inference for Vision Transformers (ViTs)
- Energy-Efficient Deep Learning Processing
- Autonomous Systems, Medical Imaging, and Surveillance
- Extend to full Transformer encoder-decoder design.
- Implement dynamic quantization for further resource optimization.
- Integrate AXI interfaces for easy SoC integration.
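For the planned AXI integration, a common first step is a register slice on the streaming boundary. The sketch below follows standard AXI4-Stream signal naming (`tdata`/`tvalid`/`tready`); everything else is an assumption about the future design, not part of the current project:

```verilog
// Hypothetical AXI4-Stream register slice fronting the accelerator.
module axis_reg_slice #(
    parameter WIDTH = 32
)(
    input  wire             aclk,
    input  wire             aresetn,
    // slave side (from DMA / host)
    input  wire [WIDTH-1:0] s_axis_tdata,
    input  wire             s_axis_tvalid,
    output wire             s_axis_tready,
    // master side (into the Transformer pipeline)
    output reg  [WIDTH-1:0] m_axis_tdata,
    output reg              m_axis_tvalid,
    input  wire             m_axis_tready
);
    // Accept a new beat when the output register is empty or being drained
    assign s_axis_tready = ~m_axis_tvalid | m_axis_tready;

    always @(posedge aclk) begin
        if (!aresetn)
            m_axis_tvalid <= 1'b0;
        else if (s_axis_tready) begin
            m_axis_tdata  <= s_axis_tdata;
            m_axis_tvalid <= s_axis_tvalid;
        end
    end
endmodule
```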
This project is licensed under the MIT License.
Feel free to reach out for collaborations or just a friendly hello!