rubentium/Peptide-MHC_Binding_Classifier

Peptide-MHC Binding Classifier

A deep learning model for predicting peptide-MHC class I binding using a two-tower CNN architecture.

Overview

This project implements a neural network that predicts whether a peptide binds a Major Histocompatibility Complex (MHC) class I molecule, framed as binary classification. The model uses a two-tower architecture: one tower processes the peptide sequence, the other processes the MHC pseudo-sequence, and their outputs are combined for the final prediction.

Features

  • Two-tower CNN architecture for peptide and MHC sequence processing
  • Baseline MLP model for comparison
  • 5-fold cross-validation support
  • Evaluation with PR-AUC and saved precision-recall curves
  • One-hot encoding for peptide sequences
  • Embedding layers for MHC pseudo-sequences
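
The two input encodings listed above can be sketched as follows. This is a minimal illustration, not the code in two_tower_cnn.py: the residue ordering, the maximum peptide length of 15, the all-zero padding rows, and the convention of reserving index 0 for unknown residues are all assumptions, as are the function names.

```python
# Hypothetical helpers; the alphabet order and max_len are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_peptide(peptide, max_len=15):
    """Return a max_len x 20 matrix as nested lists.

    Padding positions and unrecognized residues are left as all-zero rows.
    """
    if len(peptide) > max_len:
        raise ValueError(f"peptide longer than max_len={max_len}: {peptide}")
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(peptide.upper()):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1.0
    return matrix


def encode_pseudo_sequence(pseudo_seq):
    """Map residues to integer indices suitable for an embedding layer.

    Index 0 is reserved for unknown residues; known residues map to 1..20.
    """
    return [AA_INDEX.get(aa, -1) + 1 for aa in pseudo_seq.upper()]
```

For example, `one_hot_peptide("SIINFEKL")` yields 15 rows of which the first 8 each contain a single 1.0, and `encode_pseudo_sequence` turns a 34-residue pseudo-sequence into 34 integers.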

Project Structure

.
├── two_tower_cnn.py           # Main model implementation
├── allele_lookup_fixer.py     # Utility for processing MHC allele data
├── mhci_eda_analysis.ipynb    # Exploratory data analysis notebook
├── requirements.txt           # Python dependencies
├── mhc_lookup.json           # MHC allele to pseudo-sequence mapping
├── MHC_pseudo.dat            # MHC pseudo-sequence data
├── netmhcpan41_data/         # Training and test data
│   ├── fold_0.csv
│   ├── fold_1.csv
│   ├── fold_2.csv
│   ├── fold_3.csv
│   ├── fold_4.csv
│   └── test.csv
└── pr_curves/                # Precision-recall curve visualizations
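
Given the fold files listed above, 5-fold cross-validation could be organized as in the sketch below, where each round holds out one fold for validation and trains on the other four. Whether two_tower_cnn.py builds its splits this way is an assumption; `cv_splits` is a hypothetical helper.

```python
from pathlib import Path

DATA_DIR = Path("netmhcpan41_data")
N_FOLDS = 5


def cv_splits(data_dir=DATA_DIR, n_folds=N_FOLDS):
    """Yield (train_files, val_file) pairs, holding out one fold per round."""
    folds = [data_dir / f"fold_{i}.csv" for i in range(n_folds)]
    for held_out in range(n_folds):
        train_files = [p for i, p in enumerate(folds) if i != held_out]
        yield train_files, folds[held_out]


for train_files, val_file in cv_splits():
    # Only builds the paths; no files are read here.
    print(val_file.name, "<-", [p.name for p in train_files])
```

In this arrangement test.csv is never part of any split and stays reserved for the final evaluation.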

Installation

  1. Clone the repository:
git clone https://github.com/rubentium/Peptide-MHC_Binding_Classifier.git
cd Peptide-MHC_Binding_Classifier
  2. Install dependencies:
pip install -r requirements.txt

Usage

Training the Two-Tower CNN Model

python two_tower_cnn.py --model two_tower_cnn

Training the Baseline MLP Model

python two_tower_cnn.py --model baseline

Command Line Arguments

  • --model: Choose between two_tower_cnn (default) or baseline
  • Additional configuration can be modified within the script
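
A flag like --model is commonly parsed with argparse. The sketch below shows one way the documented behavior (two choices, two_tower_cnn as the default) could be expressed; it is an assumption about the script's internals, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser exposing the single documented flag, --model."""
    parser = argparse.ArgumentParser(
        description="Train a peptide-MHC binding classifier"
    )
    parser.add_argument(
        "--model",
        choices=["two_tower_cnn", "baseline"],
        default="two_tower_cnn",
        help="which model to train",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--model", "baseline"])
print(args.model)
```

Using `choices` makes argparse reject any other model name with a usage error before training starts.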

Model Architecture

Two-Tower CNN

  • Peptide tower: 1D convolutional layers processing one-hot encoded peptide sequences
  • MHC tower: Processes the 34-residue MHC pseudo-sequences through an embedding layer and an MLP
  • Combined features passed through fully connected layers for binary classification
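
The three components above could be wired together in PyTorch roughly as follows. This is a sketch under stated assumptions, not the model in two_tower_cnn.py: the use of PyTorch itself, the layer counts, kernel size, hidden width, embedding dimension, global max pooling, peptide length of 15, and the extra embedding index for unknown residues are all illustrative choices.

```python
import torch
import torch.nn as nn


class TwoTowerCNN(nn.Module):
    """Illustrative two-tower classifier; all hyperparameters are assumptions."""

    def __init__(self, n_aa=20, mhc_len=34, emb_dim=16, hidden=64):
        super().__init__()
        # Peptide tower: 1D convolutions over one-hot input (batch, n_aa, pep_len),
        # followed by global max pooling over the sequence axis.
        self.peptide_tower = nn.Sequential(
            nn.Conv1d(n_aa, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten(),
        )
        # MHC tower: embed each of the 34 pseudo-sequence residues, then an MLP.
        # n_aa + 1 entries so that one index can stand for unknown residues.
        self.mhc_embedding = nn.Embedding(n_aa + 1, emb_dim)
        self.mhc_mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(mhc_len * emb_dim, hidden),
            nn.ReLU(),
        )
        # Head: fully connected layers over the concatenated tower outputs.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, peptide_onehot, mhc_indices):
        pep = self.peptide_tower(peptide_onehot)             # (batch, hidden)
        mhc = self.mhc_mlp(self.mhc_embedding(mhc_indices))  # (batch, hidden)
        logits = self.head(torch.cat([pep, mhc], dim=1))     # (batch, 1)
        return logits.squeeze(-1)                            # (batch,)


model = TwoTowerCNN()
peptides = torch.zeros(4, 20, 15)               # batch of 4 one-hot peptides
alleles = torch.zeros(4, 34, dtype=torch.long)  # batch of 4 index sequences
logits = model(peptides, alleles)
```

The model returns raw logits, which pair with `nn.BCEWithLogitsLoss` during training and with `torch.sigmoid` when binding probabilities are needed.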

Baseline MLP

  • Simple multi-layer perceptron for comparison
  • Concatenates peptide and MHC features
  • Fully connected linear layers with dropout

Data Format

The model expects CSV files with the following columns:

  • peptide: Amino acid sequence of the peptide
  • allele: MHC allele identifier
  • label: Binary label (0 or 1) indicating binding
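
A file in this format can be read and validated with the standard library alone, as sketched below. `read_examples` is a hypothetical helper, and the two sample rows (including the allele spelling and the labels) are made up for illustration; they are not taken from the dataset.

```python
import csv
import io

REQUIRED_COLUMNS = ("peptide", "allele", "label")


def read_examples(handle):
    """Read (peptide, allele, label) tuples from an open CSV file handle."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    examples = []
    for row in reader:
        value = float(row["label"])  # accepts "1" as well as "1.0"
        if value not in (0.0, 1.0):
            raise ValueError(f"label must be 0 or 1, got {row['label']!r}")
        examples.append((row["peptide"], row["allele"], int(value)))
    return examples


# Illustrative in-memory sample; real data lives in netmhcpan41_data/.
sample = io.StringIO(
    "peptide,allele,label\n"
    "SIINFEKL,HLA-A*02:01,0\n"
    "GILGFVFTL,HLA-A*02:01,1\n"
)
rows = read_examples(sample)
```

With a real file, the same function can be called as `read_examples(open("netmhcpan41_data/fold_0.csv", newline=""))`.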

Results

Model performance is evaluated using:

  • Precision-Recall Area Under Curve (PR-AUC)
  • Precision-Recall curves saved to pr_curves/ directory
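
One common estimator of PR-AUC is average precision: rank the examples by predicted score and average the precision measured at the rank of each positive. The pure-Python sketch below shows that calculation. The repository may compute PR-AUC differently (for example with scikit-learn or trapezoidal integration), and unlike scikit-learn this version does not group tied scores, so treat it as an illustration of the metric rather than a reproduction of the reported numbers.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    labels are 0/1 integers and scores are the model's predicted scores.
    Tied scores keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / n_pos


# Positives land at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

PR-AUC is often preferred over ROC-AUC when binders are a small minority of the examples, because it is sensitive to false positives among the top-ranked predictions.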

License

This project is available for academic and research purposes.

Acknowledgments

Based on the NetMHCpan 4.1 dataset for MHC-peptide binding prediction.

About

Peptide-MHC Binding Classifier is a deep learning model that predicts whether a peptide will be presented by MHC class I proteins. It uses a two-tower CNN architecture intended to generalize across alleles, including alleles not seen during training (zero-shot), to support immunotherapy design.
