Title: Comparing traditional machine learning algorithms with a transformer-based model (TabPFN) for the prediction of health outcomes
This project compares TabPFN, a pre-trained transformer for tabular data, to traditional machine learning models (LightGBM, glmnet) in predicting tuberculosis (TB) status from host analytes.
We evaluate performance using:
- ROC curves and AUC
- Accuracy, Sensitivity, Specificity, Balanced Accuracy
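These metrics can be computed with scikit-learn; a minimal sketch on illustrative labels and predicted probabilities (the values below are made up, not project data):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, roc_auc_score)

# Hypothetical true labels and model probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.2, 0.4, 0.8, 0.7, 0.3, 0.1, 0.9, 0.6])
y_pred = (y_prob >= 0.5).astype(int)           # threshold at 0.5

auc = roc_auc_score(y_true, y_prob)            # → 0.875
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                   # recall for the TB-positive class
specificity = tn / (tn + fp)                   # recall for the TB-negative class
acc = accuracy_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (sensitivity + specificity) / 2 → 0.75
```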
The comparison is performed for:
- All 22 analytes
- Top 3 analytes selected using information gain
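A sketch of information-gain-based selection, using scikit-learn's `mutual_info_classif` on synthetic stand-in data (the real analyte matrix is not public, so shapes and signal structure here are assumptions):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 patients x 22 analyte columns, binary TB status
X = rng.normal(size=(200, 22))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

mi = mutual_info_classif(X, y, random_state=0)  # information-gain score per analyte
top3 = np.argsort(mi)[::-1][:3]                 # indices of the 3 most informative analytes
X_top3 = X[:, top3]                             # reduced feature matrix
```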
Dataset details:
- Concentrations of TB biomarkers measured using the Luminex assay
- Clinical dataset (patient-level data)
- Binary classification: patient is TB positive or TB negative
- ML algorithms: Elastic Net Logistic Regression (glmnet), LightGBM, TabPFN (transformer-based)
- Purpose: Compare performance of traditional ML vs transformer-based models
- Notes: Clinical dataset not shared publicly; code can run on synthetic or similar datasets
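Since the clinical dataset is not shared, a similar-shaped synthetic dataset can be fabricated to exercise the code; a minimal sketch (column names, sample size, and the lognormal concentration model are all placeholders, not the real panel):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 300
# Stand-in for the Luminex panel: 22 analyte concentration columns
analytes = [f"analyte_{i + 1:02d}" for i in range(22)]
X = pd.DataFrame(rng.lognormal(mean=2.0, sigma=0.8, size=(n, 22)), columns=analytes)

# Binary TB status loosely driven by two analytes, purely to test the pipeline
score = np.log(X["analyte_01"]) + 0.5 * np.log(X["analyte_02"])
y = (score + rng.normal(scale=0.5, size=n) > score.mean()).astype(int)
df = X.assign(tb_status=y)
```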
Objective: Compare TabPFN with traditional ML models in predicting TB status from host analytes, using ROC curves, AUC, and balanced accuracy as the primary performance metrics.
Workflow:
- Load the TB dataset and analyte information
- Split the data into training and test sets
- Preprocess the data:
  - Scale features
  - Apply SMOTE to balance classes
- Select features for analysis:
  - All 22 analytes
  - Top 3 analytes (selected by information gain)
- Train traditional ML models on the training data:
  - LightGBM
  - glmnet
  - rpart (optional)
- Tune hyperparameters using nested cross-validation
- Prepare TabPFN inputs from the training/test sets
- Train the TabPFN classifier on the same training data
- Generate predictions and class probabilities for all models
- Evaluate performance:
  - ROC curves and AUC
  - Accuracy, Sensitivity, Specificity, Balanced Accuracy
- Compare TabPFN with the traditional ML models:
  - Plot ROC curves together
  - Summarize performance metrics
- Save processed datasets, model objects, and prediction outputs
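The workflow above can be sketched end to end in Python on synthetic data. This is a minimal stand-in, not the project's implementation: scikit-learn's elastic-net logistic regression substitutes for glmnet, imports of imbalanced-learn's SMOTE and the `tabpfn` package are guarded in case they are not installed, and LightGBM plus nested cross-validation are omitted for brevity:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 22))                      # stand-in for the 22 analytes
y = (X[:, 0] + rng.normal(scale=0.7, size=300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Fit scaling on the training split only, to avoid leakage into the test set
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

# SMOTE oversampling via imbalanced-learn, if available; otherwise use the raw split
try:
    from imblearn.over_sampling import SMOTE
    X_bal, y_bal = SMOTE(random_state=1).fit_resample(X_tr_s, y_tr)
except ImportError:
    X_bal, y_bal = X_tr_s, y_tr

# Elastic-net logistic regression as a glmnet-style baseline
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000)
enet.fit(X_bal, y_bal)
enet_auc = roc_auc_score(y_te, enet.predict_proba(X_te_s)[:, 1])

# TabPFN, if installed: a pre-trained transformer that needs only fit/predict_proba
try:
    from tabpfn import TabPFNClassifier
    tab = TabPFNClassifier()
    tab.fit(X_tr, y_tr)
    tab_auc = roc_auc_score(y_te, tab.predict_proba(X_te)[:, 1])
    print(f"TabPFN AUC: {tab_auc:.3f}")
except ImportError:
    pass

print(f"glmnet-style AUC: {enet_auc:.3f}")
```

Note that TabPFN is fit on the unscaled, unresampled training data: the model handles tabular preprocessing internally, which is part of its appeal in this comparison.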