A deep learning model for predicting peptide-MHC class I binding using a two-tower CNN architecture.
This project implements a neural network model to predict the binding affinity between peptides and Major Histocompatibility Complex (MHC) class I molecules. The model uses a two-tower architecture that separately processes peptide sequences and MHC pseudo-sequences before combining them for prediction.
- Two-tower CNN architecture for peptide and MHC sequence processing
- Baseline MLP model for comparison
- 5-fold cross-validation support
- Evaluation with PR-AUC and precision-recall curves
- One-hot encoding for peptide sequences
- Embedding layers for MHC pseudo-sequences
.
├── two_tower_cnn.py # Main model implementation
├── allele_lookup_fixer.py # Utility for processing MHC allele data
├── mhci_eda_analysis.ipynb # Exploratory data analysis notebook
├── requirements.txt # Python dependencies
├── mhc_lookup.json # MHC allele to pseudo-sequence mapping
├── MHC_pseudo.dat # MHC pseudo-sequence data
├── netmhcpan41_data/ # Training and test data
│ ├── fold_0.csv
│ ├── fold_1.csv
│ ├── fold_2.csv
│ ├── fold_3.csv
│ ├── fold_4.csv
│ └── test.csv
└── pr_curves/ # Precision-recall curve visualizations
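The five fold files above suggest a rotating hold-out scheme for the 5-fold cross-validation. The sketch below shows one common way to build such splits. The helper name `cv_splits` is hypothetical, and the assumption that each fold is held out exactly once is not confirmed by the repository; the script may organize its folds differently.

```python
from pathlib import Path

# Directory and file names follow the project tree above.
DATA_DIR = Path("netmhcpan41_data")
N_FOLDS = 5


def cv_splits(data_dir=DATA_DIR, n_folds=N_FOLDS):
    """Yield (train_files, validation_file) pairs, holding out one fold at a time.

    Hypothetical helper: assumes a plain rotating hold-out over fold_0..fold_4.
    """
    folds = [data_dir / f"fold_{i}.csv" for i in range(n_folds)]
    for held_out in range(n_folds):
        train = [path for i, path in enumerate(folds) if i != held_out]
        yield train, folds[held_out]


if __name__ == "__main__":
    for train_files, val_file in cv_splits():
        print("validate on", val_file.name, "| train on", [p.name for p in train_files])
```

Under this scheme, test.csv stays out of every split and is only used for the final evaluation.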
- Clone the repository:
git clone https://github.com/rubentium/Peptide-MHC_Binding_Classifier.git
cd Peptide-MHC_Binding_Classifier
- Install dependencies:
pip install -r requirements.txt
- Run the two-tower CNN model:
python two_tower_cnn.py --model two_tower_cnn
- Run the baseline model:
python two_tower_cnn.py --model baseline
- --model: Choose between two_tower_cnn (default) or baseline
- Additional configuration can be modified within the script
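For reference, a flag with two allowed values and a default is typically declared with `argparse` as in the sketch below. This illustrates the documented `--model` behavior only. The function name `build_parser` and the help text are assumptions, and the actual parser in two_tower_cnn.py may differ.

```python
import argparse


def build_parser():
    """Build a parser with the documented --model flag (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Peptide-MHC class I binding classifier"
    )
    parser.add_argument(
        "--model",
        choices=["two_tower_cnn", "baseline"],  # the two documented options
        default="two_tower_cnn",                # documented default
        help="model to run",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("selected model:", args.model)
```

With `choices` set, any other value for `--model` is rejected by argparse with a usage error before the script does any work.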
- Peptide tower: 1D convolutional layers processing one-hot encoded peptide sequences
- MHC tower: Processes the 34-residue MHC pseudo-sequences through an embedding layer and an MLP
- Combined features passed through fully connected layers for binary classification
- Baseline model: a simple multi-layer perceptron for comparison
- Concatenates peptide and MHC features
- Fully connected linear layers with dropout
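The two-tower data flow (peptide tower, MHC tower, combined classification head) can be sketched without any deep learning framework, which avoids assuming which library the script uses. Everything below is illustrative: the function names, layer sizes, kernel size, random weights and the synthetic pseudo-sequence are assumptions, not values taken from two_tower_cnn.py.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as a list of 20-dimensional one-hot rows."""
    return [[1.0 if AA_INDEX[aa] == j else 0.0 for j in range(20)] for aa in seq]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear_relu(x, weights):
    """Fully connected layer (no bias) followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def peptide_tower(peptide, filters, kernel_size):
    """1D convolution over the one-hot peptide, ReLU, then global max pooling."""
    x = one_hot(peptide)
    features = []
    for filt in filters:
        best = 0.0  # ReLU + max pooling equals max(0, best activation)
        for start in range(len(x) - kernel_size + 1):
            window = [v for row in x[start:start + kernel_size] for v in row]
            best = max(best, sum(w * v for w, v in zip(filt, window)))
        features.append(best)
    return features


def mhc_tower(pseudo_seq, embedding, mlp_weights):
    """Embedding lookup per residue, flatten, then one MLP layer."""
    flat = [v for aa in pseudo_seq for v in embedding[AA_INDEX[aa]]]
    return linear_relu(flat, mlp_weights)


def init_params(seed=0, n_filters=8, kernel_size=3, emb_dim=4,
                mhc_out=8, hidden=16, pseudo_len=34):
    """Random weights with assumed sizes; a trained model would learn these."""
    rng = random.Random(seed)
    return {
        "kernel_size": kernel_size,
        "filters": rand_matrix(rng, n_filters, kernel_size * 20),
        "embedding": rand_matrix(rng, 20, emb_dim),
        "mhc_mlp": rand_matrix(rng, mhc_out, pseudo_len * emb_dim),
        "head_hidden": rand_matrix(rng, hidden, n_filters + mhc_out),
        "head_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
    }


def predict(peptide, pseudo_seq, params):
    """Concatenate both tower outputs and map them to a binding probability."""
    p = peptide_tower(peptide, params["filters"], params["kernel_size"])
    m = mhc_tower(pseudo_seq, params["embedding"], params["mhc_mlp"])
    hidden = linear_relu(p + m, params["head_hidden"])
    logit = sum(w * v for w, v in zip(params["head_out"], hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid for binary classification


if __name__ == "__main__":
    pseudo = (AMINO_ACIDS * 2)[:34]  # synthetic 34-residue placeholder, not a real allele
    print(predict("SIINFEKL", pseudo, init_params()))
```

Global max pooling lets the peptide tower accept peptides of different lengths, while the MHC tower can use a fixed-size MLP because every pseudo-sequence has 34 residues. The baseline differs mainly in skipping the convolution and feeding concatenated features straight into fully connected layers.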
The model expects CSV files with the following columns:
- peptide: Amino acid sequence of the peptide
- allele: MHC allele identifier
- label: Binary label (0 or 1) indicating binding
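A file in this layout can be read with the standard `csv` module, as in the minimal sketch below. The sample rows are invented for illustration, the allele naming format is an assumption, and `read_examples` is a hypothetical helper rather than a function from the repository.

```python
import csv
import io

REQUIRED_COLUMNS = {"peptide", "allele", "label"}

# Invented rows for illustration; real data lives in netmhcpan41_data/.
SAMPLE = """\
peptide,allele,label
SIINFEKL,HLA-A*02:01,0
GILGFVFTL,HLA-A*02:01,1
"""


def read_examples(handle):
    """Parse rows into (peptide, allele, label) tuples and validate the labels."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    examples = []
    for row in reader:
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        examples.append((row["peptide"], row["allele"], label))
    return examples


if __name__ == "__main__":
    for example in read_examples(io.StringIO(SAMPLE)):
        print(example)
```

To read a real fold instead of the in-memory sample, pass a file handle opened with `open("netmhcpan41_data/fold_0.csv", newline="")`.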
Model performance is evaluated using:
- Precision-Recall Area Under Curve (PR-AUC)
- Precision-Recall curves saved to the pr_curves/ directory
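PR-AUC is often estimated as average precision: the mean of the precision values at each rank where a true binder appears. The sketch below shows that calculation in plain Python. It is an assumption that this matches the script's estimator, which may instead use a library routine or trapezoidal integration. The function name `average_precision` is hypothetical.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = binder) and predicted scores.

    Examples are ranked by descending score; tied scores keep their input order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Toy values for illustration only, not model output.
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Unlike ROC-AUC, this metric depends on the fraction of positives in the data, so scores are only comparable across evaluations with similar class balance.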
This project is available for academic and research purposes.
Based on the NetMHCpan 4.1 dataset for MHC-peptide binding prediction.