English → French neural machine translation built from scratch in CuPy.
- Encoder: 2-layer bidirectional GRU
- Decoder: 2-layer GRU with Bahdanau attention
- Tokenization: BPE (1000 merges)
- Optimizer: Adam with gradient clipping
- Implementation: Pure CuPy (no PyTorch/TensorFlow)
All layers, backpropagation, and attention mechanisms implemented manually.
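To make the "from scratch" claim concrete, here is a minimal sketch of a single GRU step. It uses NumPy for portability (CuPy is API-compatible, so `cupy` can be swapped in directly); the parameter names are illustrative and not taken from this repo's actual code.

```python
import numpy as np

def gru_cell(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU forward step. x: (batch, d_in), h_prev: (batch, d_hid).
    Weight/bias names here are illustrative, not the repo's API."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ Wz + h_prev @ Uz + bz)                # update gate
    r = sigmoid(x @ Wr + h_prev @ Ur + br)                # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h_prev) @ Uh + bh)    # candidate state
    # Interpolate between the old state and the candidate
    return (1 - z) * h_prev + z * h_tilde
```

The bidirectional encoder runs one such cell left-to-right and another right-to-left over the source sentence, concatenating the two hidden states per position.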
- Implementing GRU cells and recurrent backpropagation through time
- Building attention mechanisms from scratch (projection, alignment scoring, context vectors)
- Manual gradient computation for complex architectures
- BPE tokenization and its optimization (caching avoids the O(n²) pair recount on every merge)
- Training dynamics: gradient clipping, learning rate scheduling, checkpoint management
- GPU programming with CuPy for deep learning primitives
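The attention bullet above (projection, alignment scoring, context vectors) can be sketched as standard additive (Bahdanau) attention. This is a minimal NumPy version with illustrative weight names (`Wq`, `Wk`, `v`), not the repo's actual implementation:

```python
import numpy as np

def bahdanau_attention(dec_state, enc_states, Wq, Wk, v):
    """Additive attention sketch.
    dec_state: (d_dec,) decoder query; enc_states: (T, d_enc) encoder outputs.
    Wq, Wk project query and keys into a shared space; v scores alignment."""
    scores = np.tanh(dec_state @ Wq + enc_states @ Wk) @ v  # (T,) alignment scores
    weights = np.exp(scores - scores.max())                 # stable softmax
    weights /= weights.sum()
    context = weights @ enc_states                          # weighted sum: (d_enc,)
    return context, weights
```

The decoder concatenates the context vector with its input embedding at each step, so the backward pass must route gradients through the softmax and both projections.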
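For the BPE bullet, here is a textbook merge loop over a word-frequency table. The naive version shown recounts all pairs after every merge; the caching mentioned above would instead update counts incrementally, only for words containing the merged pair. This is a generic sketch, not the repo's code:

```python
from collections import Counter

def count_pairs(words):
    """Count adjacent symbol pairs, weighted by word frequency.
    words: dict mapping a tuple of symbols to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Apply one BPE merge: fuse every occurrence of `pair` into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged
```

Training repeats `count_pairs` / `merge_pair` for the configured number of merges (1000 here).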
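Gradient clipping in a setup like this is typically done by global norm: compute the norm over all parameter gradients and rescale if it exceeds a threshold. A minimal sketch (the threshold value and list-of-arrays layout are assumptions; CuPy arrays work identically since `cupy` mirrors the NumPy API):

```python
import numpy as np

def clip_gradients(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their global L2 norm
    does not exceed max_norm (threshold here is illustrative)."""
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads
```

Clipping the global norm (rather than each array separately) preserves the direction of the overall update, which matters when backpropagating through many recurrent steps.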
- `python -m src.data`: Downloads the fra-eng corpus, trains BPE, builds vocabularies, and preprocesses ~163k sentence pairs.
- `python -m src.train`: Trains for 5000 iterations (~20-30 min on GPU); saves checkpoints to `models/seq2seq.pkl`.
- `python -m src.eval`: Translates test phrases using the trained model.
- `pip install numpy cupy d2l torch`: Requires a CUDA-compatible GPU.