IPIAbDev is a flexible AI/ML framework for high-throughput prediction of antibody developability and biophysical properties. Its primary focus is polyreactivity (PSR) and SEC developability, and it can be extended to other properties such as SPR binding and HIC. It integrates several antibody-specific protein language models (AbLang2, AntiBERTy, AntiBERTa2, AntiBERTa2-CSSP) for embedding generation and supports a range of classifiers: XGBoost, Random Forest, a 1D-CNN with residual blocks, and Transformer architectures. Key features include automated embedding generation, HCDR3-cluster-stratified k-fold cross-validation to prevent data leakage, model training and prediction for binary classification tasks, and built-in interpretability via Integrated Gradients for residue-level attribution. The package also provides publication-ready visualizations of ROC curves, performance metrics (AUC, accuracy, F1, precision, recall), and attribution heatmaps.
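To illustrate how Integrated Gradients turns a model prediction into residue-level attributions, here is a minimal sketch. It is not IPIAbDev's implementation: it assumes a toy logistic model over a flattened one-hot matrix with a closed-form gradient, and the names `one_hot_flat`, `predict`, `integrated_gradients`, `residue_attributions`, the toy weights and the all-zero baseline are hypothetical. Per-feature attributions are summed over the amino-acid axis to give one score per residue.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def one_hot_flat(seq):
    """Hypothetical helper: flattened one-hot vector, len(seq) * 20 entries (position-major)."""
    return [1.0 if aa == a else 0.0 for aa in seq for a in ALPHABET]


def predict(x, weights, bias):
    """Toy model (assumption): sigmoid(w . x + b) on a flat feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, weights, bias):
    """Closed-form gradient of the sigmoid output with respect to each input feature."""
    p = predict(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=50):
    """Midpoint Riemann approximation of Integrated Gradients along the straight path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point, weights, bias)):
            total[i] += g
    # Scale the averaged gradient by the input difference
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def residue_attributions(attr, n_positions, alphabet_size=len(ALPHABET)):
    """Sum feature attributions over the alphabet axis: one score per residue."""
    return [sum(attr[p * alphabet_size:(p + 1) * alphabet_size])
            for p in range(n_positions)]


seq = "CARDY"
x = one_hot_flat(seq)
weights = [0.1 * ((i % 7) - 3) for i in range(len(x))]  # toy weights, not a trained model
baseline = [0.0] * len(x)  # "no residue" baseline
attr = integrated_gradients(x, baseline, weights, bias=0.0)
scores = residue_attributions(attr, len(seq))
print([round(s, 4) for s in scores])
```

By the completeness property of Integrated Gradients, the attributions sum (up to the Riemann approximation error) to `predict(x) - predict(baseline)`; more `steps` tighten the approximation. In a real CNN or Transformer the gradient would come from automatic differentiation rather than a closed-form expression.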
This open-source software was developed at The Antibody Platform, Institute for Protein Innovation, Boston, USA.
Machine Learning Architect and Designer: Hoan Nguyen, PhD
Authors and Contact: {Hoan.Nguyen, Andre.Teixeira}@proteininnovation.org
Supported inputs: full heavy- and light-chain sequences, heavy-chain sequences, or CDR3 sequences.
Supported features: protein language model embeddings, one-hot encoding, and sequence-derived biophysical properties (molecular weight, charge, isoelectric point, ...).
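A minimal sketch of the one-hot encoding and of one sequence-derived property follows. It assumes a 20-letter alphabet, zero-padding (or truncation) to a fixed length, and a deliberately crude charge estimate; `one_hot`, `approx_net_charge` and `max_len` are hypothetical names, not IPIAbDev's API.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq, max_len):
    """Encode seq as a max_len x 20 matrix of 0/1, zero-padded or truncated to max_len."""
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        idx = AA_INDEX.get(aa)
        if idx is not None:  # unknown symbols (e.g. X or gaps) stay all-zero
            matrix[pos][idx] = 1
    return matrix


def approx_net_charge(seq):
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E (ignores His and termini)."""
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


matrix = one_hot("CARDY", max_len=8)
print(len(matrix), sum(map(sum, matrix)))
print(approx_net_charge("CARDY"))
```

For real molecular weight, isoelectric point and charge values, a library such as Biopython's `ProteinAnalysis` (`molecular_weight()`, `isoelectric_point()`, `charge_at_pH()`) is a better choice than the approximation above; which library IPIAbDev itself uses is not stated here.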
The results in Figure (A, B) below were generated by IPIAbDev with the Transformer-VH+VL+HCDR3 model on one-hot encoding, using public dataset #1 from (HT Chen, 2024).
git clone https://github.com/proteininnovation/IPIAbdev.git
cd IPIAbdev

# Create a new environment with Python 3.11 or 3.12
conda create -n ml python=3.11 -y
conda activate ml

# Install Python packages
conda install -c bioconda anarci
pip install -r requirements.txt
Training file: your_trainset_name.xlsx
Required columns: BARCODE, CDR3, HSEQ, LSEQ, plus one or more biophysical-property label columns (e.g., sec_filter, psr_filter, spr_filter).
Each label column must be annotated as 1 (pass/positive) or 0 (fail/negative).
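Before training, it can help to check a file against these requirements. The sketch below is hypothetical (it is not part of IPIAbDev) and uses only the standard library: it takes the column names and the rows as dictionaries, which with pandas would be `list(df.columns)` and `df.to_dict("records")` after `df = pd.read_excel(path)`.

```python
REQUIRED_COLUMNS = ["BARCODE", "CDR3", "HSEQ", "LSEQ"]


def validate_training_table(columns, rows, label_columns):
    """Return a list of problems; an empty list means the table looks usable."""
    problems = []
    for col in REQUIRED_COLUMNS + list(label_columns):
        if col not in columns:
            problems.append(f"missing column: {col}")
    if problems:  # no point checking values if columns are missing
        return problems
    for n, row in enumerate(rows):
        for col in label_columns:
            # Labels must be 0 or 1; blanks read as NaN are flagged too
            if row[col] not in (0, 1):
                problems.append(f"row {n}: {col}={row[col]!r} is not 0 or 1")
    return problems


cols = ["BARCODE", "CDR3", "HSEQ", "LSEQ", "sec_filter"]
rows = [
    {"BARCODE": "A1", "CDR3": "ARDY", "HSEQ": "EVQ", "LSEQ": "DIQ", "sec_filter": 1},
    {"BARCODE": "A2", "CDR3": "ARGY", "HSEQ": "QVQ", "LSEQ": "EIV", "sec_filter": 2},
]
print(validate_training_table(cols, rows, ["sec_filter"]))
```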
# Build embeddings with all supported language models
python predict_developability.py --build-embedding data/test.xlsx --lm all
# or with a single language model
python predict_developability.py --build-embedding data/test.xlsx --lm ablang
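Protein language models return one vector per residue, whereas classifiers such as XGBoost and Random Forest need a fixed-length input. A common reduction, shown here as an assumption rather than a description of IPIAbDev's internals, is to mean-pool each chain and concatenate VH and VL; `mean_pool` and `antibody_features` are hypothetical names, and the 2-dimensional "embeddings" are toy values.

```python
def mean_pool(residue_embeddings):
    """Average an L x D list of per-residue vectors into one D-dimensional vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[d] for vec in residue_embeddings) / n for d in range(dim)]


def antibody_features(vh_residues, vl_residues):
    """Concatenate pooled VH and pooled VL embeddings into 2 * D values."""
    return mean_pool(vh_residues) + mean_pool(vl_residues)


vh = [[1.0, 2.0], [3.0, 4.0]]  # toy per-residue embeddings, 2 residues x 2 dims
vl = [[0.0, 0.0], [2.0, 2.0]]
print(antibody_features(vh, vl))  # [2.0, 3.0, 1.0, 1.0]
```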
# 10-fold cross-validation for different language model / classifier combinations
python predict_developability.py --kfold 10 --target sec_filter --lm antiberta2 --model xgboost --db data/ipi_antibody.xlsx
python predict_developability.py --kfold 10 --target sec_filter --lm antiberty --model rf --db data/ipi_antibody.xlsx
python predict_developability.py --kfold 10 --target sec_filter --lm antiberta2 --model cnn --db data/ipi_antibody.xlsx
python predict_developability.py --kfold 10 --target sec_filter --lm antiberty --model transformer_lm --db data/ipi_antibody.xlsx
python predict_developability.py --kfold 10 --target sec_filter --lm onehot --model transformer_onehot --db data/ipi_antibody.xlsx
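The point of HCDR3-cluster-based splitting is that antibodies with near-identical HCDR3s never land on both sides of a train/test split. The sketch below assumes a greedy clustering by same-length HCDR3 identity with a hypothetical 0.8 threshold and balances folds by size only; IPIAbDev's actual clustering method and threshold are not stated here, and it also stratifies by label, which this sketch does not. All function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions; 0.0 if the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_hcdr3(cdr3s, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed is >= threshold identical."""
    seeds, labels = [], []
    for seq in cdr3s:
        for cid, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                labels.append(cid)
                break
        else:  # no matching seed: this sequence starts a new cluster
            seeds.append(seq)
            labels.append(len(seeds) - 1)
    return labels


def cluster_kfold(labels, k):
    """Yield (train_idx, test_idx); every cluster lies entirely in one test fold."""
    members = {}
    for idx, cid in enumerate(labels):
        members.setdefault(cid, []).append(idx)
    folds = [[] for _ in range(k)]
    # Largest clusters first, each into the currently smallest fold
    for cid in sorted(members, key=lambda c: -len(members[c])):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)


hcdr3 = ["ARDYW", "ARDYF", "GGSSW", "GGSTW", "QQQQQ", "ARDY"]
labels = cluster_hcdr3(hcdr3)
for train, test in cluster_kfold(labels, k=3):
    print("train", train, "test", test)
```

To add label stratification on top of the grouping, scikit-learn's `StratifiedGroupKFold` accepts per-sample cluster IDs through the `groups` argument of `split(X, y, groups)`.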
# SEC training
python predict_developability.py --train --target sec_filter --lm antiberta2 --model xgboost --db data/ipi_antibody.xlsx
# PSR training
python predict_developability.py --train --target psr_filter --lm antiberta2 --model xgboost --db data/ipi_antibody.xlsx
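The metrics listed in the overview (AUC, accuracy, F1, precision, recall) can be computed from true 0/1 labels and predicted pass probabilities as sketched below. The 0.5 decision threshold and the function names are assumptions. AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), which equals the area under the ROC curve.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall and F1 at a threshold, plus threshold-free AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "auc": roc_auc(y_true, scores),
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise AUC loop is O(n_pos x n_neg); for large datasets `sklearn.metrics.roc_auc_score` gives the same value more efficiently.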
# Prediction on new sequences
python predict_developability.py --predict data/new_lib.xlsx --target sec_filter --lm antiberta2
python predict_developability.py --predict data/test.xlsx --target sec_filter --lm ablang --model cnn