Ngoni-M/TabPFN_project

TB Classification with TabPFN and Traditional ML Models

Project Overview

Title: Comparing traditional machine learning algorithms with a transformer-based model (TabPFN) for the prediction of health outcomes

This project compares TabPFN, a pre-trained transformer for tabular data, to traditional machine learning models (LightGBM, glmnet) in predicting tuberculosis (TB) status from host analytes.

We evaluate performance using:

  • ROC curves and AUC
  • Accuracy, Sensitivity, Specificity, Balanced Accuracy
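The metrics listed above can be illustrated with a minimal, standard-library Python sketch. It is not the project's own evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration, with 1 taken to mean TB positive. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which is equal to the area under the ROC curve.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics; assumes both classes occur in y_true (1 = TB positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not project data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated by the majority class the way plain accuracy can be on imbalanced data.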

The comparison is performed for:

  • All 22 analytes
  • Top 3 analytes selected using information gain
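As a rough illustration of ranking analytes by information gain, the sketch below uses only the Python standard library. The README does not say how information gain was computed for continuous concentrations, so the median split used here is an assumption, as are the function names and the toy analyte columns.

```python
import math
from collections import Counter
from statistics import median
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Entropy reduction from splitting one feature at its median (assumed binning)."""
    cut = median(values)
    low = [y for v, y in zip(values, labels) if v <= cut]
    high = [y for v, y in zip(values, labels) if v > cut]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - conditional


def top_k_features(columns: Dict[str, Sequence[float]], labels: Sequence[int], k: int = 3) -> List[str]:
    """Names of the k features with the highest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Toy example with hypothetical analyte names and values.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "analyte_a": [9, 8, 7, 6, 1, 2, 3, 4],  # separates the classes perfectly
    "analyte_b": [1, 9, 2, 8, 3, 7, 4, 6],  # carries no information at the median
    "analyte_c": [5, 6, 7, 1, 8, 2, 3, 4],  # partially informative
}
selected = top_k_features(columns, labels, k=2)
print(selected)
```

To avoid optimistic performance estimates, a ranking like this is normally computed on the training data only.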

Dataset details:

  • Concentrations of TB biomarkers measured using the Luminex assay
  • Clinical dataset (patient-level data)
  • Binary classification: patient is TB positive or TB negative
  • ML algorithms: Elastic Net Logistic Regression (glmnet), LightGBM, TabPFN (transformer-based)
  • Purpose: Compare performance of traditional ML vs transformer-based models
  • Notes: The clinical dataset is not shared publicly; the code can be run on synthetic or similarly structured datasets
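Since the clinical data are not public, a synthetic stand-in with the same shape (22 numeric analytes and a binary TB label) is one way to exercise the code. The generator below is a hypothetical sketch, not part of the repository: the function name, sample size, prevalence of 0.3, Gaussian distributions, and the choice of shifting the first three analytes in positive cases are all arbitrary assumptions and do not reflect the real biomarker data.

```python
import random
from typing import List, Tuple


def make_synthetic_tb_data(
    n_patients: int = 200,
    n_analytes: int = 22,
    n_informative: int = 3,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Generate a toy dataset: rows of analyte values and 0/1 labels (1 = TB positive)."""
    rng = random.Random(seed)
    X: List[List[float]] = []
    y: List[int] = []
    for _ in range(n_patients):
        label = 1 if rng.random() < 0.3 else 0  # assumed prevalence
        row = [
            # Informative analytes have a shifted mean in positive cases (arbitrary effect size).
            rng.gauss(1.0 if (label == 1 and j < n_informative) else 0.0, 1.0)
            for j in range(n_analytes)
        ]
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic_tb_data()
print(len(X), len(X[0]), sum(y))
```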

Aim

Compare TabPFN to traditional ML models in predicting TB status from host analytes, using ROC curves, AUC, and balanced accuracy as performance metrics.

Workflow

  1. Load TB dataset and analyte information
  2. Split data into training and test sets
  3. Preprocess data:
    • Scale features
    • Apply SMOTE to the training set to balance classes
  4. Select features for analysis:
    • All 22 analytes
    • Top 3 analytes (based on information gain)
  5. Train traditional ML models on training data:
    • LightGBM
    • glmnet
    • rpart (optional)
  6. Tune hyperparameters using nested cross-validation
  7. Prepare TabPFN inputs using training/test sets
  8. Fit the TabPFN classifier on the same training data
  9. Generate predictions and probabilities for all models
  10. Evaluate performance:
    • ROC curves and AUC
    • Accuracy, Sensitivity, Specificity, Balanced Accuracy
  11. Compare TabPFN to traditional ML models:
    • Plot ROC curves together
    • Summarize performance metrics
  12. Save processed datasets, model objects, and prediction outputs
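Steps 2 and 3 of the workflow (splitting, scaling, and class balancing) can be sketched in standard-library Python as follows. This is an illustrative sketch under stated assumptions, not the repository's pipeline: all function names and the toy data are hypothetical, `smote_like` is a simplified interpolation in the spirit of SMOTE rather than a reference implementation, and it is assumed here that scaling statistics are estimated on the training set and that only training rows are oversampled. The minority class must have at least two training samples.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Row = List[float]


def stratified_split(y: Sequence[int], test_fraction: float = 0.25, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), keeping class proportions similar in both sets."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def fit_scaler(X_train: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Column means and standard deviations, estimated on training rows only."""
    cols = list(zip(*X_train))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def apply_scaler(X: Sequence[Row], means: Sequence[float], sds: Sequence[float]) -> List[Row]:
    """Standardize rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def smote_like(X: Sequence[Row], y: Sequence[int], k: int = 5, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Add synthetic minority rows by interpolating towards a nearby minority row."""
    rng = random.Random(seed)
    labels = list(y)
    counts = {cls: labels.count(cls) for cls in set(labels)}
    minority = min(counts, key=counts.get)
    pool = [row for row, label in zip(X, labels) if label == minority]
    X_out, y_out = [list(r) for r in X], labels[:]
    for _ in range(max(counts.values()) - counts[minority]):
        base = rng.choice(pool)
        neighbours = sorted(
            (r for r in pool if r is not base), key=lambda r: math.dist(base, r)
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between base and neighbour
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Toy data: 20 patients, 2 features, 5 positives (made-up values).
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i < 5 else 0 for i in range(20)]

train_idx, test_idx = stratified_split(y)
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
means, sds = fit_scaler(X_train)
X_train_s = apply_scaler(X_train, means, sds)
X_test_s = apply_scaler([X[i] for i in test_idx], means, sds)  # test rows reuse training statistics
X_bal, y_bal = smote_like(X_train_s, y_train)
print(len(X_bal), y_bal.count(0), y_bal.count(1), len(X_test_s))
```

Fitting the scaler and the oversampler on training rows only keeps the test set untouched, so the reported metrics reflect performance on data the models have not seen.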

About

Reproducible pipelines for TB classification using TabPFN and traditional machine learning models.
