TrialPredictor is a machine learning system that estimates the probability of clinical trial success before a trial begins — combining compound properties, trial design choices, indication-specific history, and sponsor track records into a unified probabilistic model.
The system addresses a core challenge in pharmaceutical R&D: 90% of drug candidates that enter clinical trials fail, yet most portfolio decisions are still made with limited data and high subjectivity. By grounding go/no-go decisions in quantitative predictions, TrialPredictor enables:
- Portfolio prioritization: Allocate R&D capital toward trials with highest predicted probability of success
- Trial design optimization: Identify design parameters (enrollment size, endpoint type, duration) most predictive of success
- Risk stratification: Quantify uncertainty to differentiate high-confidence from speculative programs
- eNPV modeling: Integrate predictions into expected net present value calculations for asset valuation
The project covers the full pipeline: data collection from public sources (ClinicalTrials.gov, DrugBank, PubChem), feature engineering, model training, calibrated probability output, survival analysis for trial timelines, and a portfolio simulation engine that translates ML predictions into R&D value.
Drug development is one of the most capital-intensive processes in any industry:
| Stage | Average Cost | Duration | Success Rate |
|---|---|---|---|
| Preclinical | $1–5M | 3–6 years | — |
| Phase I | $10–30M | 1–2 years | ~60% → Phase II |
| Phase II | $30–100M | 2–3 years | ~35% → Phase III |
| Phase III | $100–500M | 3–5 years | ~55% → NDA/BLA |
| FDA Review | ~$10M | 1–2 years | ~85% approval |
Overall Phase I → Approval: ~10–14%, depending on the source (the stage rates above compound to 0.60 × 0.35 × 0.55 × 0.85 ≈ 0.10)
The Tufts Center for the Study of Drug Development estimates the fully loaded cost of bringing a new drug to market at $2.6 billion, a figure driven largely by the cost of failures. A predictive model that improves the Phase II → Phase III transition rate by even 5 percentage points could preserve hundreds of millions of dollars in R&D capital annually for a large pharma company.
Failure modes are not random. They cluster around:
- Safety signals not predicted by preclinical data (~30% of failures)
- Insufficient efficacy in broader patient populations (~55% of failures)
- Trial design flaws (underpowering, wrong endpoint, enrollment failure) (~15%)
- Commercial/strategic withdrawal (competitive landscape, changing priorities); reported shares vary by source and overlap with the categories above
TrialPredictor explicitly models each failure mode and generates interpretable features that clinical teams can act on.
```
ClinicalTrials.gov ─┐
DrugBank ───────────┤                      ┌──▶ Gradient Boosting ─┐
PubChem ────────────┼──▶ Feature Builder ──┼──▶ Neural (TabNet) ───┼──▶ Calibrated P(success)
FDA Drug Labels ────┘                      └──▶ Survival (CoxPH) ──┘             │
                                                                                 ▼
                                                Portfolio Simulator ──▶ eNPV / Decision Analysis
```
Models are trained on trials from 2000–2017 and evaluated on held-out trials from 2018–2023, with temporal validation to prevent look-ahead leakage (a sketch of the ECE metric follows the table):
| Model | AUROC | AUPRC | Brier Score | Calibration ECE |
|---|---|---|---|---|
| XGBoost (tuned) | 0.791 | 0.682 | 0.178 | 0.041 |
| LightGBM | 0.784 | 0.671 | 0.183 | 0.048 |
| CatBoost | 0.779 | 0.665 | 0.186 | 0.053 |
| TabNet | 0.763 | 0.651 | 0.195 | 0.062 |
| Logistic Regression | 0.711 | 0.598 | 0.214 | 0.087 |
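The Calibration ECE column above is the expected calibration error: the sample-weighted gap between predicted probability and observed frequency across probability bins. A minimal sketch with equal-width bins (illustrative; the exact binning in `trial_metrics.py` may differ):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins: int = 10) -> float:
    """ECE: sample-weighted gap between predicted probability and observed frequency."""
    y_true, y_prob = np.asarray(y_true, dtype=float), np.asarray(y_prob, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = bins[i], bins[i + 1]
        # Last bin is closed on the right so p = 1.0 is counted
        mask = (y_prob >= lo) & (y_prob < hi) if i < n_bins - 1 else (y_prob >= lo)
        if mask.any():
            gap = abs(y_prob[mask].mean() - y_true[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

print(expected_calibration_error([0, 1, 1, 0, 1], [0.2, 0.9, 0.7, 0.1, 0.8]))
```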
Performance by phase transition:

| Phase Transition | AUROC | N (test) |
|---|---|---|
| Phase I → II | 0.734 | 1,842 |
| Phase II → III | 0.812 | 2,105 |
| Phase III → Approval | 0.778 | 892 |
Trial timeline prediction (survival models; a Cox PH sketch follows the table):

| Model | C-Index | Mean Abs. Error (months) |
|---|---|---|
| DeepSurv | 0.714 | 8.3 |
| Cox PH | 0.698 | 9.7 |
| Kaplan-Meier (baseline) | 0.500 | 14.2 |
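As an illustration of the survival baseline, a minimal Cox PH fit using the `lifelines` library on synthetic timeline data (column names are hypothetical; `survival_model.py` may be structured differently):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Toy trial-timeline data: duration in months, event = 1 if the trial reached its endpoint
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "duration_months": rng.exponential(24, 200),
    "event": rng.integers(0, 2, 200),
    "enrollment_log": rng.normal(5.5, 1.0, 200),
    "n_sites": rng.integers(1, 60, 200),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration_months", event_col="event")
print(cph.concordance_index_)  # training-set C-index
```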
Using model predictions to guide a simulated 20-asset portfolio vs. random selection over 1,000 bootstrap runs (a simplified eNPV sketch follows the table):
| Strategy | Mean eNPV ($M) | 95% CI | Improvement vs. Random |
|---|---|---|---|
| Model-guided (top quartile) | $847M | [$612M, $1,091M] | +38% |
| Random selection | $614M | [$401M, $826M] | baseline |
| Industry benchmark (historical) | $721M | [$533M, $912M] | +17% |
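The eNPV arithmetic itself is simple once calibrated probabilities exist: risk-adjust each stage's cash flow by the model's probability of reaching that stage, then discount. A single-asset sketch with illustrative numbers (not the `portfolio_simulator.py` API):

```python
# Hypothetical single-asset eNPV. All cash flows and probabilities are illustrative;
# in the real pipeline the stage probabilities come from the calibrated model.
stages = [  # (years_from_now, cash_flow_$M, P(reaching this stage))
    (0, -50, 1.00),    # Phase II cost, already committed
    (3, -250, 0.42),   # Phase III cost, paid only if Phase II succeeds
    (7, 1800, 0.25),   # payoff NPV if approved
]
discount_rate = 0.10

enpv = sum(p * cf / (1 + discount_rate) ** t for t, cf, p in stages)
print(f"eNPV: ${enpv:.0f}M")  # ≈ $102M with these inputs
```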
Top 10 features by SHAP importance:

| Rank | Feature | SHAP Importance | Direction |
|---|---|---|---|
| 1 | Sponsor historical success rate (indication) | 0.142 | Positive |
| 2 | Lipinski violations | 0.118 | Negative |
| 3 | Phase II prior data available | 0.097 | Positive |
| 4 | Orphan drug designation | 0.089 | Positive |
| 5 | Enrollment size (log) | 0.076 | Positive (up to ~500) |
| 6 | Number of primary endpoints | 0.071 | Negative (>2 hurts) |
| 7 | Mechanism-of-action validation score | 0.068 | Positive |
| 8 | Indication competitive density | 0.064 | Negative |
| 9 | Trial duration (months) | 0.058 | Inverted-U |
| 10 | Molecular weight | 0.052 | Negative (>600 Da) |
Data sources (a minimal fetch sketch follows the table):

| Source | Access | Content | Update Frequency |
|---|---|---|---|
| ClinicalTrials.gov | Free, public API | Trial metadata, results, interventions | Daily |
| AACT Database | Free registration | Full relational DB of ClinicalTrials.gov | Monthly snapshots |
| DrugBank | Academic license | Drug properties, targets, mechanisms | Quarterly |
| PubChem | Free, public API | Molecular structures, physicochemical properties | Real-time |
| FDA Drug Approvals | Free, public | Approved drugs, indication, PDUFA dates | Continuous |
See docs/DATA_SOURCES.md for schema details and data quality notes.
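For orientation, a minimal pull from the ClinicalTrials.gov v2 API using `requests` (parameter and field names should be checked against the current API documentation; `fetch_data.py` wraps this logic):

```python
import requests

# ClinicalTrials.gov API v2; condition query and page size are illustrative
resp = requests.get(
    "https://clinicaltrials.gov/api/v2/studies",
    params={"query.cond": "non-small cell lung cancer", "pageSize": 50},
    timeout=30,
)
resp.raise_for_status()
for study in resp.json().get("studies", []):
    ident = study["protocolSection"]["identificationModule"]
    print(ident["nctId"], ident.get("briefTitle", ""))
```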
```bash
# Clone repository
git clone https://github.com/yourusername/trial-predictor.git
cd trial-predictor

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install package in development mode
pip install -e .
```

For full-scale training, download the AACT database:
```bash
# Download latest AACT snapshot (requires free registration)
# https://aact.ctti-clinicaltrials.org/snapshots
# AACT snapshots are Postgres custom-format dumps, so restore with pg_restore
pg_restore --no-owner -d aact /path/to/aact_snapshot.dmp
```

```bash
# 1. Fetch trial data (uses ClinicalTrials.gov API by default)
python scripts/fetch_data.py --source api --phases 2 3 --output data/raw/

# 2. Build features
python scripts/fetch_data.py --build-features --input data/raw/ --output data/processed/

# 3. Train models
python scripts/train.py --config configs/trial_config.yaml --model xgboost

# 4. Evaluate
python scripts/evaluate.py --model-path models/xgboost_best.pkl --test-data data/processed/test.parquet

# 5. Run portfolio simulation
python scripts/analyze.py --mode portfolio --model-path models/xgboost_best.pkl
```

```
trial-predictor/
├── src/
│ ├── data/
│ │ ├── clinicaltrials_fetcher.py # ClinicalTrials.gov API + AACT
│ │ ├── drugbank_loader.py # DrugBank drug property extraction
│ │ └── feature_builder.py # Feature engineering pipeline
│ ├── models/
│ │ ├── gradient_boosting.py # XGBoost / LightGBM / CatBoost
│ │ ├── neural_trial.py # TabNet with entity embeddings
│ │ └── survival_model.py # DeepSurv / Cox PH
│ ├── evaluation/
│ │ ├── trial_metrics.py # AUROC, calibration, decision analysis
│ │ └── portfolio_simulator.py # eNPV / portfolio optimization
│ └── analysis/
│ ├── failure_analyzer.py # Failure mode clustering
│ └── indication_profiler.py # Therapeutic area analysis
├── configs/
│ └── trial_config.yaml # Experiment configuration
├── scripts/
│ ├── fetch_data.py # Data collection entry point
│ ├── train.py # Model training entry point
│ ├── evaluate.py # Evaluation entry point
│ └── analyze.py # Analysis entry point
├── docs/
│ ├── DATA_SOURCES.md # Data source documentation
│ └── PHARMA_CONTEXT.md # Drug development pipeline context
├── tests/ # Unit and integration tests
├── notebooks/ # Exploratory analysis
├── requirements.txt
├── setup.py
└── README.md
```
This project is built with a pharmaceutical R&D mindset, not just an ML mindset. Key design decisions:
Temporal validation: All models are validated on future trials to prevent look-ahead leakage — mimicking real deployment where you predict trials before they complete.
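A sketch of the idea on a toy frame (the `start_year` column name is hypothetical):

```python
import pandas as pd

# Toy frame; the real pipeline reads from data/processed/
trials = pd.DataFrame({
    "start_year": [2012, 2016, 2019, 2021],
    "succeeded": [1, 0, 1, 0],
})

# Fit strictly on the past, evaluate on later trials: no look-ahead
train = trials[trials["start_year"] <= 2017]
test = trials[trials["start_year"] >= 2018]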
Calibration priority: In portfolio decisions, calibrated probabilities matter more than raw discrimination. A model that says "70% success" should be right 70% of the time. We enforce calibration via isotonic regression and Platt scaling.
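A minimal scikit-learn sketch of the calibration step on synthetic data (the project's actual wrapper may differ; method="sigmoid" gives Platt scaling):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier

# Synthetic stand-in for the trial feature matrix and success labels
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 8)), rng.integers(0, 2, size=500)

base = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
# Cross-validated isotonic calibration; swap method="sigmoid" for Platt scaling
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5)
calibrated.fit(X, y)
p_success = calibrated.predict_proba(X)[:, 1]  # calibrated P(success)
```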
Phase-specific models: Phase II and Phase III have fundamentally different failure modes. We train separate models per phase transition rather than forcing one model to generalize across phases.
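Concretely, this can be as simple as keying models by transition. A toy sketch with hypothetical column names:

```python
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

# Toy stand-in; real features come from feature_builder.py
rng = np.random.default_rng(0)
trials = pd.DataFrame({
    "transition": ["phase2_to_3"] * 100 + ["phase3_to_approval"] * 100,
    "enrollment_log": rng.normal(5.5, 1.0, 200),
    "n_endpoints": rng.integers(1, 4, 200),
    "succeeded": rng.integers(0, 2, 200),
})
FEATURES = ["enrollment_log", "n_endpoints"]

# One model per phase transition rather than a single pooled model
models = {}
for transition, subset in trials.groupby("transition"):
    model = XGBClassifier(n_estimators=100, max_depth=4)
    model.fit(subset[FEATURES], subset["succeeded"])
    models[transition] = model
```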
Interpretability: SHAP values are computed for every prediction. Clinical decision-makers need to understand why a trial is predicted to succeed or fail — not just the score.
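For tree models this is cheap via `shap.TreeExplainer`. A self-contained sketch on a toy model (feature names are hypothetical):

```python
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

# Toy model standing in for the trained XGBoost predictor
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["sponsor_rate", "mol_weight", "enroll_log"])
y = rng.integers(0, 2, 200)
model = XGBClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one contribution per feature per trial

# The first trial's explanation: which features push P(success) up or down
print(dict(zip(X.columns, shap_values[0])))
```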
Regulatory awareness: The system is explicitly framed as a decision-support tool, not a clinical decision tool. All documentation reflects FDA guidance on the appropriate use of AI/ML in drug development.
See docs/PHARMA_CONTEXT.md for a full drug development pipeline overview.
- Label noise: "Completed" trials may still fail to achieve approval; outcome labels are proxy measures
- Publication bias: Successful trials are more likely to publish results, biasing the training signal
- Indication shifts: Novel indications (e.g., first-in-class mechanisms) have limited historical comparators
- External validity: Models trained on publicly registered trials may not generalize to internal proprietary trials with different documentation standards
- Regulatory changes: FDA guidance evolves; models may need retraining after major policy shifts
```bibtex
@software{trialpredictor2024,
  title  = {TrialPredictor: ML-Driven Clinical Trial Outcome Prediction},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/trial-predictor}
}
```

MIT License — see LICENSE for details.
This project uses publicly available data from ClinicalTrials.gov (public domain) and DrugBank (academic license required for commercial use). DrugBank data must not be redistributed without a license from Wishart Lab.