CardioRisk AI
♥ 10-Year Coronary Heart Disease Risk Prediction ♥
A machine learning pipeline trained on the Framingham Heart Study — runs entirely in Google Colab
CardioRisk AI is a complete end-to-end machine learning project that predicts a patient's 10-year risk of developing coronary heart disease (CHD) using clinical and lifestyle measurements drawn from the landmark Framingham Heart Study.
The entire pipeline — from raw data through exploratory analysis, preprocessing, model training, evaluation, and an interactive web UI — runs in a single Jupyter notebook with no external configuration required. The dataset is embedded directly inside the notebook, meaning there are no download steps, no Kaggle tokens, and no broken URLs.
"Cardiovascular disease is the world's leading cause of death. Early risk stratification saves lives." — World Heart Federation
- Download framingham_heart_disease.ipynb from this repository
- Open colab.research.google.com and upload the file
- Select Runtime → Run All
- Scroll to the final cell and follow the Gradio link to open your interactive heart risk assessment tool
All dependencies install automatically in the first cell. No manual setup needed.
The notebook uses the Framingham Heart Study dataset — a landmark longitudinal cardiovascular cohort study run by the National Heart, Lung, and Blood Institute (NHLBI) since 1948. It remains one of the most cited datasets in cardiovascular medicine.
| Property | Value |
|---|---|
| Patients | 4,240 |
| Clinical features | 15 original + 1 engineered (pulse_pressure) |
| Target variable | TenYearCHD, binary (1 = developed CHD within 10 years, 0 = did not) |
| Positive class rate | ~15% positive; training split balanced with SMOTE |
| Missing data | Present in 6 columns — imputed with column median |
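The median imputation listed above can be illustrated with a small, dependency-free sketch. It assumes a hypothetical record layout (one dict per patient, `None` for a missing value) rather than the notebook's actual DataFrame. In pandas the same idea is usually written as `df.fillna(df.median(numeric_only=True))`.

```python
from statistics import median
from typing import Optional

# Hypothetical record layout: one dict per patient, None marks a missing value.
Record = dict[str, Optional[float]]


def median_impute(records: list[Record], columns: list[str]) -> list[Record]:
    """Return copies of `records` with None in `columns` replaced by the column median.

    Each median is computed from the observed (non-None) values only.
    """
    medians = {}
    for col in columns:
        observed = [r[col] for r in records if r.get(col) is not None]
        medians[col] = median(observed)  # raises StatisticsError if a column is all None

    imputed = []
    for r in records:
        row = dict(r)  # copy so the input records are left untouched
        for col in columns:
            if row.get(col) is None:
                row[col] = medians[col]
        imputed.append(row)
    return imputed


# Toy values for illustration only.
patients = [
    {"totChol": 200.0, "glucose": 80.0},
    {"totChol": None, "glucose": 90.0},
    {"totChol": 240.0, "glucose": None},
    {"totChol": 220.0, "glucose": 100.0},
]
filled = median_impute(patients, ["totChol", "glucose"])
print(filled[1]["totChol"], filled[2]["glucose"])  # 220.0 90.0
```

The medians here come from whatever records are passed in. If they are computed on the training split only and then reused for the test split, no test-set information leaks into preprocessing.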
| Feature | Type | Clinical Meaning |
|---|---|---|
| male | Binary | Patient sex (1 = Male, 0 = Female) |
| age | Integer | Age in years |
| education | Ordinal | 1 = No HS, 2 = HS, 3 = College, 4 = University+ |
| currentSmoker | Binary | Active smoker status |
| cigsPerDay | Integer | Average cigarettes per day |
| BPMeds | Binary | Currently on antihypertensive medication |
| prevalentStroke | Binary | Prior cerebrovascular event |
| prevalentHyp | Binary | Hypertension diagnosed |
| diabetes | Binary | Diabetes mellitus present |
| totChol | Float | Total serum cholesterol (mg/dL) |
| sysBP | Float | Systolic blood pressure (mmHg) |
| diaBP | Float | Diastolic blood pressure (mmHg) |
| BMI | Float | Body Mass Index (kg/m²) |
| heartRate | Integer | Resting heart rate (bpm) |
| glucose | Float | Fasting blood glucose (mg/dL) |
| pulse_pressure | Float | Engineered feature: sysBP - diaBP (mmHg) |
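The engineered pulse_pressure column is the systolic reading minus the diastolic reading. Here is a minimal sketch on a single patient record. The record is a hypothetical plain dict, not the notebook's data structure. In pandas this would typically be `df["pulse_pressure"] = df["sysBP"] - df["diaBP"]`.

```python
def add_pulse_pressure(record: dict) -> dict:
    """Return a copy of `record` with pulse_pressure = sysBP - diaBP (mmHg).

    Assumes a hypothetical dict-per-patient layout with both readings present.
    """
    row = dict(record)  # copy so the caller's record is not modified
    row["pulse_pressure"] = row["sysBP"] - row["diaBP"]
    return row


patient = {"sysBP": 132.0, "diaBP": 84.0}  # toy values
print(add_pulse_pressure(patient)["pulse_pressure"])  # 48.0
```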
+--------------------------+
| Embedded Patient Data    |  4,240 records · 15 features + target · no download needed
| (Framingham CSV)         |
+-----------+--------------+
            |
            v
+-----------+--------------+
| Exploratory Analysis     |  Class distribution · Feature histograms
|                          |  Correlation heatmap · Missing value audit
+-----------+--------------+
            |
            v
+-----------+--------------+
| Preprocessing            |  Median imputation · Pulse pressure engineering
|                          |  Stratified 80/20 split · SMOTE balancing
|                          |  StandardScaler normalization
+-----------+--------------+
            |
            v
+-----------+--------------+
| Model Training           |  Logistic Regression
| (3 classifiers)          |  Random Forest (200 estimators)
|                          |  XGBoost (200 estimators)
|                          |  5-Fold Stratified Cross-Validation
+-----------+--------------+
            |
            v
+-----------+--------------+
| Evaluation               |  Accuracy · Precision · Recall · F1 · ROC-AUC
|                          |  ROC curves · Confusion matrices
|                          |  Feature importance (best model)
+-----------+--------------+
            |
            v
+-----------+--------------+
| Gradio Web UI            |  Interactive sliders · Real-time prediction
|                          |  All 3 model outputs · Risk level classification
|                          |  Public shareable link via gradio.live
+--------------------------+
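SMOTE balances the classes by synthesising new minority-class (CHD-positive) rows instead of duplicating existing ones. Each synthetic row lies on the line segment between a minority sample and one of its nearest minority neighbours. The notebook presumably relies on imbalanced-learn's `SMOTE` (for example `SMOTE().fit_resample(X_train, y_train)`). The from-scratch sketch below only illustrates the interpolation idea. Its function name, parameters, and toy data are illustrative assumptions.

```python
import math
import random


def smote(minority: list[list[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """Generate `n_synthetic` SMOTE-style rows from minority-class feature vectors.

    Illustrative sketch only: each new row is interpolated between a random
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # interpolation weight in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic


train_minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 2.5], [2.5, 1.5]]  # toy 2-feature rows
new_points = smote(train_minority, n_synthetic=6, k=2, seed=42)
print(len(new_points))  # 6
```

The synthetic rows are built from training patients. Resampling therefore belongs after the stratified split and should touch only the training portion. The held-out test set keeps the natural ~15% positive rate.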
All three classifiers are evaluated on the same stratified held-out test set. The model with the highest ROC-AUC is automatically selected for the primary prediction in the Gradio UI.
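Selecting the model by ROC-AUC means scoring each classifier's predicted probabilities on the test set and keeping the maximum. The sketch below computes ROC-AUC in its pairwise form: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting half. The model names and scores are made-up toy values, not the results in the table below. In practice scikit-learn's `roc_auc_score(y_true, y_score)` performs the same computation efficiently.

```python
def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. This O(P*N) pairwise form is for illustration only.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(y_true: list[int],
                      scores_by_model: dict[str, list[float]]) -> tuple[str, float]:
    """Return (model name, AUC) for the model with the highest ROC-AUC."""
    aucs = {name: roc_auc(y_true, s) for name, s in scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Hypothetical test labels and predicted CHD probabilities for three models.
y_test = [0, 0, 1, 0, 1, 0]
scores = {
    "model_a": [0.2, 0.4, 0.35, 0.1, 0.8, 0.5],
    "model_b": [0.1, 0.3, 0.6, 0.2, 0.7, 0.4],
    "model_c": [0.3, 0.2, 0.55, 0.1, 0.9, 0.6],
}
print(select_best_model(y_test, scores))  # ('model_b', 1.0)
```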
| Model | Accuracy | F1 Score | Precision | Recall | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | ~0.70 | ~0.42 | ~0.38 | ~0.49 | ~0.74 |
| Random Forest | ~0.74 | ~0.44 | ~0.41 | ~0.47 | ~0.78 |
| XGBoost | ~0.73 | ~0.45 | ~0.40 | ~0.51 | ~0.79 |
Metrics vary slightly across runs because SMOTE resampling is random. Recall is the primary clinical metric: a false negative (an at-risk patient classified as healthy) carries a much higher real-world cost than a false positive (a false alarm).
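The confusion-matrix counts make the case for prioritising recall concrete. This minimal sketch uses a hypothetical function name and toy labels. On a set with 20% positives, a classifier that never flags anyone still reaches 80% accuracy, yet its recall is zero.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (CHD) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed at-risk patients
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # toy labels, 20% positives
never_flag = [0] * 10                      # predicts "healthy" for everyone
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]   # catches one of the two at-risk patients

print(classification_metrics(y_true, never_flag))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
print(classification_metrics(y_true, y_pred)["recall"])  # 0.5
```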
The interactive Gradio interface launches automatically at the end of the notebook and generates a public gradio.live link.
Left panel — Patient Profile
- Sex, age, and education level
- Smoking status and cigarettes per day
- Medical history: blood pressure medication, prior stroke, hypertension, diabetes
Right panel — Clinical Measurements
- Total cholesterol, systolic BP, diastolic BP
- BMI, resting heart rate, fasting glucose
- Normal reference ranges displayed inline for each measurement
Output — Risk Assessment
- Risk classification: Low / Moderate / High
- CHD probability percentage from the best-performing model
- Side-by-side comparison table across all three models
- Plain-language clinical insight and guidance
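This README does not state the cut-offs behind the Low / Moderate / High labels. The mapping below is therefore a sketch with placeholder thresholds of 0.10 and 0.20, chosen only for illustration and not taken from the notebook. The function name is likewise hypothetical.

```python
def risk_level(probability: float, low_cutoff: float = 0.10,
               high_cutoff: float = 0.20) -> str:
    """Map a predicted CHD probability to a Low / Moderate / High label.

    The default cut-offs are placeholder assumptions, not the notebook's values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability < low_cutoff:
        return "Low"
    if probability < high_cutoff:
        return "Moderate"
    return "High"


for p in (0.05, 0.15, 0.42):
    print(f"{p:.0%} -> {risk_level(p)}")
```

With imbalanced-learn's default setting, SMOTE oversamples the minority class until both classes have equal counts. The model then trains on a 50/50 class mix, so its raw probabilities tend to run higher than the true ~15% prevalence would suggest. Any cut-offs should be chosen with that in mind.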
CardioRisk-AI/
|
|-- framingham_heart_disease.ipynb # Complete pipeline — dataset embedded, Colab-ready
|-- requirements.txt # Python dependencies
|-- README.md # This file
gradio
xgboost
imbalanced-learn
scikit-learn
pandas
numpy
matplotlib
seaborn
All packages are installed automatically in the first notebook cell when running in Colab.
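Outside Colab, you can find out which of these packages are missing by looking each one up with `importlib.util.find_spec`. Two pip names differ from their import names: `imbalanced-learn` is imported as `imblearn`, and `scikit-learn` as `sklearn`. The helper below is a hypothetical convenience, not a script shipped in the repository.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIREMENTS = {
    "gradio": "gradio",
    "xgboost": "xgboost",
    "imbalanced-learn": "imblearn",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(requirements: dict[str, str] = REQUIREMENTS) -> list[str]:
    """Return the pip names of requirements whose top-level module cannot be found."""
    return [pip_name for pip_name, module in requirements.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```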
CardioRisk AI is an educational and research demonstration project. It is not a medical device and must not be used for clinical diagnosis, risk screening, or treatment decisions. All predictions are generated by statistical models trained on historical research data.
Always consult a qualified cardiologist or physician for any cardiovascular health concerns.
Built with scikit-learn · XGBoost · imbalanced-learn · Gradio
Trained on data from the Framingham Heart Study — NHLBI
predict · prevent · protect