GitHub - nimshafernando/CardioRisk-AI: Machine learning web app for 10-year coronary heart disease risk prediction. Built with Python, scikit-learn, XGBoost, and imbalanced-learn (SMOTE), with an interactive Gradio UI. Trained on the Framingham Heart Study dataset.

  ██████╗ █████╗ ██████╗ ██████╗ ██╗ ██████╗     ██████╗ ██╗███████╗██╗  ██╗
 ██╔════╝██╔══██╗██╔══██╗██╔══██╗██║██╔═══██╗    ██╔══██╗██║██╔════╝██║ ██╔╝
 ██║     ███████║██████╔╝██║  ██║██║██║   ██║    ██████╔╝██║███████╗█████╔╝ 
 ██║     ██╔══██║██╔══██╗██║  ██║██║██║   ██║    ██╔══██╗██║╚════██║██╔═██╗ 
 ╚██████╗██║  ██║██║  ██║██████╔╝██║╚██████╔╝    ██║  ██║██║███████║██║  ██╗
  ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═════╝ ╚═╝ ╚═════╝    ╚═╝  ╚═╝╚═╝╚══════╝╚═╝  ╚═╝
                                                                             AI

♥ 10-Year Coronary Heart Disease Risk Prediction ♥

A machine learning pipeline trained on the Framingham Heart Study — runs entirely in Google Colab

Open in Colab ♥ View Dataset ♥ Model Results ♥ Gradio UI

Overview

CardioRisk AI is a complete end-to-end machine learning project that predicts a patient's 10-year risk of developing coronary heart disease (CHD) using clinical and lifestyle measurements drawn from the landmark Framingham Heart Study.

The entire pipeline — from raw data through exploratory analysis, preprocessing, model training, evaluation, and an interactive web UI — runs in a single Jupyter notebook with no external configuration required. The dataset is embedded directly inside the notebook, meaning there are no download steps, no Kaggle tokens, and no broken URLs.

"Cardiovascular disease is the world's leading cause of death. Early risk stratification saves lives." — World Heart Federation

Quick Start

1. Download `framingham_heart_disease.ipynb` from this repository.
2. Open colab.research.google.com and upload the file.
3. Select Runtime → Run All.
4. Scroll to the final cell and follow the Gradio link; your interactive heart risk assessment tool is live.

All dependencies install automatically in the first cell. No manual setup needed.

Dataset

The notebook uses the Framingham Heart Study dataset — a landmark longitudinal cardiovascular cohort study run by the National Heart, Lung, and Blood Institute (NHLBI) since 1948. It remains one of the most cited datasets in cardiovascular medicine.

Property	Value
Patients	4,240
Clinical features	15 original + 1 engineered (`pulse_pressure`)
Target variable	`TenYearCHD`: binary (0 = no CHD recorded within 10 years, 1 = CHD within 10 years)
Positive class rate	~15% CHD risk — balanced with SMOTE
Missing data	Present in 6 columns — imputed with column median
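
The median imputation noted in the table can be sketched as follows. This is a minimal illustration on a toy frame, not the notebook's actual cell: the rows are made up, and the choice of `glucose` and `totChol` as the columns with gaps is an assumption for the example only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Framingham frame; values are illustrative only.
df = pd.DataFrame({
    "glucose": [77.0, np.nan, 85.0, 103.0, np.nan],
    "totChol": [195.0, 250.0, np.nan, 225.0, 285.0],
    "TenYearCHD": [0, 0, 1, 0, 1],
})

# Audit missing values per column before imputing.
missing_before = df.isna().sum()

# Replace each missing entry with the median of its own column.
df_imputed = df.fillna(df.median(numeric_only=True))

print(missing_before)
print(df_imputed)
```

The median is less sensitive to extreme readings than the mean, which is a common reason to prefer it for skewed clinical measurements.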

Clinical Feature Reference

Feature	Type	Clinical Meaning
`male`	Binary	Patient sex (1 = Male, 0 = Female)
`age`	Integer	Age in years
`education`	Ordinal	1 = No HS, 2 = HS, 3 = College, 4 = University+
`currentSmoker`	Binary	Active smoker status
`cigsPerDay`	Integer	Average cigarettes per day
`BPMeds`	Binary	Currently on antihypertensive medication
`prevalentStroke`	Binary	Prior cerebrovascular event
`prevalentHyp`	Binary	Hypertension diagnosed
`diabetes`	Binary	Diabetes mellitus present
`totChol`	Float	Total serum cholesterol (mg/dL)
`sysBP`	Float	Systolic blood pressure (mmHg)
`diaBP`	Float	Diastolic blood pressure (mmHg)
`BMI`	Float	Body Mass Index (kg/m²)
`heartRate`	Integer	Resting heart rate (bpm)
`glucose`	Float	Fasting blood glucose (mg/dL)
`pulse_pressure`	Float	Engineered feature: `sysBP - diaBP`
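
As a small sketch of the engineered feature in the last row, pulse pressure is the column-wise difference of the two blood pressure readings. The sample readings below are made up for illustration.

```python
import pandas as pd

# Illustrative blood pressure readings in mmHg (not real patient data).
df = pd.DataFrame({
    "sysBP": [106.0, 121.0, 150.0],
    "diaBP": [70.0, 81.0, 95.0],
})

# Engineered feature: systolic minus diastolic pressure.
df["pulse_pressure"] = df["sysBP"] - df["diaBP"]

print(df)
```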

Pipeline

  +--------------------------+
  |   Embedded Patient Data  |   4,240 records · 16 features · no download needed
  |      (Framingham CSV)    |
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |  Exploratory Analysis    |   Class distribution · Feature histograms
  |                          |   Correlation heatmap · Missing value audit
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |     Preprocessing        |   Median imputation · Pulse pressure engineering
  |                          |   Stratified 80/20 split · SMOTE balancing
  |                          |   StandardScaler normalization
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |    Model Training        |   Logistic Regression
  |    (3 classifiers)       |   Random Forest  (200 estimators)
  |                          |   XGBoost        (200 estimators)
  |                          |   5-Fold Stratified Cross-Validation
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |      Evaluation          |   Accuracy · Precision · Recall · F1 · ROC-AUC
  |                          |   ROC curves · Confusion matrices
  |                          |   Feature importance (best model)
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |      Gradio Web UI       |   Interactive sliders · Real-time prediction
  |                          |   All 3 model outputs · Risk level classification
  |                          |   Public shareable link via gradio.live
  +--------------------------+
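
The preprocessing and training stages in the diagram can be sketched roughly as below. Several things here are assumptions rather than the notebook's code: the data is synthetic (`make_classification`), `smote_like_oversample` is a hypothetical, simplified stand-in written in NumPy to show how SMOTE interpolates new minority samples (the project itself lists imbalanced-learn's SMOTE), and XGBoost and the 5-fold cross-validation are left out to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def smote_like_oversample(X, y, k=5, seed=42):
    """Simplified SMOTE-style oversampling (illustrative helper).

    Assumes class 1 is the minority and has more than k samples.
    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbors.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0:
        return X, y
    # Pairwise distances among minority samples; ignore self-distance.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    y_new = np.ones(n_new, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Synthetic, imbalanced stand-in data (about 15% positives, 16 features).
X, y = make_classification(n_samples=1000, n_features=16, n_informative=6,
                           weights=[0.85, 0.15], random_state=42)

# Stratified 80/20 split keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the training part only; the test set stays untouched.
X_bal, y_bal = smote_like_oversample(X_train, y_train)

# Fit the scaler on training data and reuse it for the test set.
scaler = StandardScaler().fit(X_bal)
X_bal_s = scaler.transform(X_bal)
X_test_s = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
auc = {}
for name, model in models.items():
    model.fit(X_bal_s, y_bal)
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test_s)[:, 1])

print(auc)
```

Oversampling after the split, and only on the training portion, keeps synthetic samples out of the held-out evaluation.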

Results

All three classifiers are evaluated on the same stratified held-out test set. The model with the highest ROC-AUC is automatically selected for the primary prediction in the Gradio UI.

Model	Accuracy	F1 Score	Precision	Recall	ROC-AUC
Logistic Regression	~0.70	~0.42	~0.38	~0.49	~0.74
Random Forest	~0.74	~0.44	~0.41	~0.47	~0.78
XGBoost	~0.73	~0.45	~0.40	~0.51	~0.79
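
The selection rule stated in this section (highest ROC-AUC wins) amounts to a one-line lookup. The sketch below uses the approximate ROC-AUC values from the table; the dictionary and variable names are illustrative, not taken from the notebook.

```python
# Approximate ROC-AUC values as reported in the results table.
roc_auc = {
    "Logistic Regression": 0.74,
    "Random Forest": 0.78,
    "XGBoost": 0.79,
}

# Pick the model name with the highest ROC-AUC.
best_model = max(roc_auc, key=roc_auc.get)

print(best_model)  # XGBoost
```

Because the reported values are close and vary between runs, the selected model may differ from one execution to the next.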

Metrics vary slightly across runs due to SMOTE stochasticity. Recall is the primary clinical metric — a missed true positive (an at-risk patient classified as healthy) carries significantly higher real-world cost than a false alarm.
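
To make the metrics concrete, here is a small worked sketch of how precision, recall, and F1 follow from confusion-matrix counts. The counts are hypothetical and chosen only for easy arithmetic; the last lines check that the approximate XGBoost precision and recall from the table are consistent with its reported F1.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model catches."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion-matrix counts (not from the notebook).
tp, fp, fn, tn = 50, 75, 50, 673

p = precision(tp, fp)   # 50 / 125 = 0.4
r = recall(tp, fn)      # 50 / 100 = 0.5
print(p, r, f1(p, r))

# Consistency check against the table's approximate XGBoost row.
table_f1_xgb = round(f1(0.40, 0.51), 2)
print(table_f1_xgb)  # 0.45
```

Each false negative (`fn`) lowers recall directly, which is why recall tracks the missed at-risk patients described above.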

Gradio UI

The interactive Gradio interface launches automatically at the end of the notebook and generates a public gradio.live link.

Left panel — Patient Profile

Sex, age, and education level
Smoking status and cigarettes per day
Medical history: blood pressure medication, prior stroke, hypertension, diabetes

Right panel — Clinical Measurements

Total cholesterol, systolic BP, diastolic BP
BMI, resting heart rate, fasting glucose
Normal reference ranges displayed inline for each measurement

Output — Risk Assessment

Risk classification: Low / Moderate / High
CHD probability percentage from the best-performing model
Side-by-side comparison table across all three models
Plain-language clinical insight and guidance
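
One way the Low / Moderate / High classification could be derived from a predicted probability is sketched below. The README does not state the cutoffs, so `low_cutoff` and `high_cutoff` are hypothetical placeholders, and `risk_level` is an illustrative name rather than a function from the notebook.

```python
def risk_level(probability: float,
               low_cutoff: float = 0.10,
               high_cutoff: float = 0.20) -> str:
    """Map a CHD probability to a risk label.

    The default cutoffs are assumptions for illustration only.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cutoff:
        return "Low"
    if probability < high_cutoff:
        return "Moderate"
    return "High"


for prob in (0.05, 0.15, 0.35):
    print(f"{prob:.0%} -> {risk_level(prob)}")
```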

Project Structure

CardioRisk-AI/
|
|-- framingham_heart_disease.ipynb   # Complete pipeline — dataset embedded, Colab-ready
|-- requirements.txt                 # Python dependencies
|-- README.md                        # This file

Requirements

gradio
xgboost
imbalanced-learn
scikit-learn
pandas
numpy
matplotlib
seaborn

All packages are installed automatically in the first notebook cell when running in Colab.

Medical Disclaimer

CardioRisk AI is an educational and research demonstration project. It is not a medical device and must not be used for clinical diagnosis, risk screening, or treatment decisions. All predictions are generated by statistical models trained on historical research data.

Always consult a qualified cardiologist or physician for any cardiovascular health concerns.

        ♥                       ♥
      ♥   ♥                   ♥   ♥
    ♥       ♥               ♥       ♥
      ♥   ♥    C A R D I O    ♥   ♥
        ♥      R I S K  A I      ♥

Built with scikit-learn · XGBoost · imbalanced-learn · Gradio

Trained on data from the Framingham Heart Study — NHLBI

predict · prevent · protect
