This repository contains a Sequence-to-Sequence (Seq2Seq) neural machine translation model for translating from French to English, along with inference utilities.
- seq2seq.ipynb - Main training notebook for the Seq2Seq model with attention mechanism
- inference.ipynb - Inference notebook for generating translations using trained models
- fra.csv - Test data for inference
- tatoeba/eng-fra.txt - Training data from Tatoeba with English and French sentence pairs
This project implements a neural machine translation system using:
- Encoder-decoder architecture with an attention mechanism
- Bidirectional GRU encoder
- GRU-based decoder with attention
The model is trained on French-English parallel text data and supports both training and inference workflows.
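The encoder described above can be sketched as follows. This is an illustrative PyTorch module, not the notebook's exact code; the class and argument names are assumptions.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Bidirectional GRU encoder sketch: embedding -> BiGRU -> linear projection."""
    def __init__(self, vocab_size, emb_dim, hid_dim, num_layers=2, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, num_layers=num_layers,
                          bidirectional=True, batch_first=True,
                          dropout=dropout if num_layers > 1 else 0.0)
        # Project concatenated forward/backward states back to hid_dim
        self.proj = nn.Linear(2 * hid_dim, hid_dim)

    def forward(self, src_ids):
        embedded = self.embedding(src_ids)    # (batch, seq_len, emb_dim)
        outputs, hidden = self.gru(embedded)  # outputs: (batch, seq_len, 2*hid_dim)
        outputs = self.proj(outputs)          # (batch, seq_len, hid_dim)
        return outputs, hidden
```

The projection keeps the encoder output dimension equal to the decoder's hidden size, which simplifies the attention computation.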
- Python 3.7+
- PyTorch
- pandas
- tqdm
Install dependencies:
pip install torch pandas tqdm
Prepare training data in TSV format (source \t target)
Configure hyperparameters:
- Embedding dimension
- Hidden dimension
- Number of layers
- Dropout rate
- Learning rate
- Batch size
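The hyperparameters above might be collected in a single configuration block before training. The values shown here are illustrative assumptions, not the repository's defaults; adjust them in the notebook.

```python
# Illustrative hyperparameter configuration (values are assumptions)
config = {
    "embedding_dim": 256,   # size of source/target token embeddings
    "hidden_dim": 512,      # GRU hidden state size
    "num_layers": 2,        # encoder/decoder GRU layers
    "dropout": 0.1,         # dropout between stacked GRU layers
    "learning_rate": 1e-3,  # optimizer step size
    "batch_size": 64,       # sentence pairs per batch
}
```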
Run the notebook cells to:
- Load and preprocess data
- Build vocabularies
- Initialize and train the model
- Save trained weights
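The training step can be sketched as a standard teacher-forced loop. This is a generic outline, not the notebook's code; `model(src, tgt_in)` returning per-token logits is an assumed interface.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer, pad_idx):
    """One epoch over (src, tgt) batches of token ids.
    Assumes model(src, tgt_in) returns logits of shape (batch, tgt_len, vocab)."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)  # ignore padding tokens
    model.train()
    total = 0.0
    for src, tgt in loader:
        optimizer.zero_grad()
        logits = model(src, tgt[:, :-1])  # teacher forcing: feed gold prefix
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))  # predict next token
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)
```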
- Load pre-trained model weights and vocabularies
- Provide input sentences for translation
- Generate translations using beam search or greedy decoding
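Greedy decoding, the simpler of the two strategies, can be sketched as the loop below. The `step_fn(token, state)` interface is an assumption made for illustration; the notebook's decoder API may differ.

```python
import torch

def greedy_decode(step_fn, init_state, sos_idx, eos_idx, max_len=50):
    """Greedy decoding sketch: step_fn(token, state) is assumed to return
    (logits over the target vocabulary, next decoder state)."""
    token = torch.tensor([sos_idx])
    state = init_state
    result = []
    for _ in range(max_len):
        logits, state = step_fn(token, state)
        token = logits.argmax(dim=-1)  # pick the highest-probability token
        if token.item() == eos_idx:
            break
        result.append(token.item())
    return result
```

Beam search follows the same loop but keeps the top-k partial hypotheses at each step instead of a single argmax.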
Example:
input_sentence = "Comment allez-vous?"
translation = translate(input_sentence)
print(translation)  # "How are you?"

Training data should be in TSV format:
French sentence English sentence
Bonjour Hello
Comment allez-vous? How are you?
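Tab-separated pairs like the ones above can be loaded with pandas. A minimal sketch, assuming two columns with English first (the Tatoeba export's usual order); the column names are illustrative.

```python
import pandas as pd

def load_pairs(path_or_buffer):
    """Load tab-separated sentence pairs into a DataFrame.
    Column names and order are assumptions, not the repository's schema."""
    return pd.read_csv(path_or_buffer, sep="\t", header=None,
                       names=["english", "french"],
                       quoting=3)  # csv.QUOTE_NONE: sentences may contain quotes
```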
Encoder:
- Embedding layer
- Bidirectional GRU (2 layers)
- Linear projection layer

Decoder:
- Embedding layer
- GRU with attention
- Output projection layer

Attention:
- Luong-style attention
- Scores computed over encoder outputs
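The Luong-style scoring over encoder outputs can be sketched as below, using the "general" variant (a bilinear score); whether the notebook uses the dot, general, or concat variant is not stated, so this is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LuongAttention(nn.Module):
    """Luong 'general' attention sketch: score(h_t, h_s) = h_t^T W h_s."""
    def __init__(self, hid_dim):
        super().__init__()
        self.W = nn.Linear(hid_dim, hid_dim, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hid); encoder_outputs: (batch, src_len, hid)
        scores = torch.bmm(self.W(encoder_outputs),
                           decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
        weights = F.softmax(scores, dim=1)                         # attention weights
        context = torch.bmm(weights.unsqueeze(1),
                            encoder_outputs).squeeze(1)            # (batch, hid)
        return context, weights
```

The context vector is then typically concatenated with the decoder state before the output projection.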