nimshafernando/CardioRisk-AI



  ██████╗ █████╗ ██████╗ ██████╗ ██╗ ██████╗     ██████╗ ██╗███████╗██╗  ██╗
 ██╔════╝██╔══██╗██╔══██╗██╔══██╗██║██╔═══██╗    ██╔══██╗██║██╔════╝██║ ██╔╝
 ██║     ███████║██████╔╝██║  ██║██║██║   ██║    ██████╔╝██║███████╗█████╔╝ 
 ██║     ██╔══██║██╔══██╗██║  ██║██║██║   ██║    ██╔══██╗██║╚════██║██╔═██╗ 
 ╚██████╗██║  ██║██║  ██║██████╔╝██║╚██████╔╝    ██║  ██║██║███████║██║  ██╗
  ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═════╝ ╚═╝ ╚═════╝    ╚═╝  ╚═╝╚═╝╚══════╝╚═╝  ╚═╝
                                                                             AI

♥   10-Year Coronary Heart Disease Risk Prediction   ♥

A machine learning pipeline trained on the Framingham Heart Study — runs entirely in Google Colab


Python Jupyter Gradio scikit-learn XGBoost License


Open in Colab   ♥   View Dataset   ♥   Model Results   ♥   Gradio UI


Overview

CardioRisk AI is an end-to-end machine learning project that predicts a patient's 10-year risk of developing coronary heart disease (CHD) from clinical and lifestyle measurements recorded in the landmark Framingham Heart Study.

The entire pipeline — from raw data through exploratory analysis, preprocessing, model training, evaluation, and an interactive web UI — runs in a single Jupyter notebook with no external configuration required. The dataset is embedded directly inside the notebook, meaning there are no download steps, no Kaggle tokens, and no broken URLs.
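The README does not show how the data is embedded. One common approach is to store the CSV text as a string in a notebook cell and parse it in memory. The sketch below uses only the standard library and assumes missing values are encoded as the string `NA`. The variable `EMBEDDED_CSV`, the helper `load_embedded`, and the three rows are hypothetical illustrations with made-up values, not the notebook's code or data.

```python
import csv
import io
import math

# Hypothetical excerpt with made-up values; the real notebook embeds all 4,240 rows.
EMBEDDED_CSV = """male,age,sysBP,diaBP,glucose,TenYearCHD
1,52,138,86,82,0
0,61,155,92,NA,1
1,44,118,76,74,0
"""

def load_embedded(text: str) -> list[dict]:
    """Parse CSV text held in memory; 'NA' cells become NaN for later imputation."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {col: (math.nan if val == "NA" else float(val)) for col, val in record.items()}
        for record in reader
    ]

rows = load_embedded(EMBEDDED_CSV)
print(len(rows), rows[1]["glucose"])  # 3 nan
```

With pandas, which is in the requirements, `pd.read_csv(io.StringIO(EMBEDDED_CSV))` does the same job and treats `NA` as missing by default.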

"Cardiovascular disease is the world's leading cause of death. Early risk stratification saves lives." — World Heart Federation



Quick Start

  1. Download framingham_heart_disease.ipynb from this repository
  2. Open colab.research.google.com and upload the file
  3. Select Runtime → Run All
  4. Scroll to the final cell and follow the Gradio link — your interactive heart risk assessment tool is live

All dependencies install automatically in the first cell. No manual setup needed.



Dataset

The notebook uses the Framingham Heart Study dataset — a landmark longitudinal cardiovascular cohort study run by the National Heart, Lung, and Blood Institute (NHLBI) since 1948. It remains one of the most cited datasets in cardiovascular medicine.

Property             Value
Patients             4,240
Clinical features    15 original + 1 engineered (pulse_pressure)
Target variable      TenYearCHD, binary (0 = no CHD within 10 years, 1 = CHD within 10 years)
Positive class rate  ~15%; training data rebalanced with SMOTE
Missing data         Present in 6 columns; imputed with the column median

Clinical Feature Reference

Feature          Type     Clinical Meaning
male             Binary   Patient sex (1 = Male, 0 = Female)
age              Integer  Age in years
education        Ordinal  1 = No HS, 2 = HS, 3 = College, 4 = University+
currentSmoker    Binary   Active smoker status
cigsPerDay       Integer  Average cigarettes per day
BPMeds           Binary   Currently on antihypertensive medication
prevalentStroke  Binary   Prior cerebrovascular event
prevalentHyp     Binary   Hypertension diagnosed
diabetes         Binary   Diabetes mellitus present
totChol          Float    Total serum cholesterol (mg/dL)
sysBP            Float    Systolic blood pressure (mmHg)
diaBP            Float    Diastolic blood pressure (mmHg)
BMI              Float    Body Mass Index (kg/m²)
heartRate        Integer  Resting heart rate (bpm)
glucose          Float    Fasting blood glucose (mg/dL)
pulse_pressure   Float    Engineered feature: sysBP - diaBP (mmHg)
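Median imputation and the engineered pulse_pressure column reduce to a few array operations. The following is a minimal NumPy sketch that assumes the features are a float array with NaN marking missing values and that the positions of the sysBP and diaBP columns are known. The helper `impute_and_engineer` and the toy rows are hypothetical, and the notebook itself may use pandas instead.

```python
import numpy as np

def impute_and_engineer(X: np.ndarray, sys_idx: int, dia_idx: int) -> np.ndarray:
    """Median-impute NaNs column by column, then append pulse_pressure = sysBP - diaBP."""
    X = X.astype(float).copy()
    # Per-column median that ignores missing values
    medians = np.nanmedian(X, axis=0)
    # Replace each NaN with the median of its own column
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    # Engineered feature: pulse pressure in mmHg
    pulse_pressure = X[:, sys_idx] - X[:, dia_idx]
    return np.column_stack([X, pulse_pressure])

# Toy rows (made-up values): [totChol, sysBP, diaBP]
X = np.array([
    [195.0, 106.0, 70.0],
    [np.nan, 121.0, 81.0],
    [245.0, np.nan, 80.0],
])
out = impute_and_engineer(X, sys_idx=1, dia_idx=2)
print(out)
```

If the medians are computed on the training split only and then reused for the test split, no test-set statistics leak into training. The README does not say which variant the notebook uses.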


Pipeline

  +--------------------------+
  |   Embedded Patient Data  |   4,240 records · 15 features + target · no download needed
  |      (Framingham CSV)    |
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |  Exploratory Analysis    |   Class distribution · Feature histograms
  |                          |   Correlation heatmap · Missing value audit
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |     Preprocessing        |   Median imputation · Pulse pressure engineering
  |                          |   Stratified 80/20 split · SMOTE balancing
  |                          |   StandardScaler normalization
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |    Model Training        |   Logistic Regression
  |    (3 classifiers)       |   Random Forest  (200 estimators)
  |                          |   XGBoost        (200 estimators)
  |                          |   5-Fold Stratified Cross-Validation
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |      Evaluation          |   Accuracy · Precision · Recall · F1 · ROC-AUC
  |                          |   ROC curves · Confusion matrices
  |                          |   Feature importance (best model)
  +-----------+--------------+
              |
              v
  +-----------+--------------+
  |      Gradio Web UI       |   Interactive sliders · Real-time prediction
  |                          |   All 3 model outputs · Risk level classification
  |                          |   Public shareable link via gradio.live
  +--------------------------+
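The order of the preprocessing steps matters. If the data is split first, SMOTE only ever sees training rows, the test set keeps the real ~15% positive rate, and no synthetic points leak into evaluation. The sketch below illustrates that order in plain NumPy, following the diagram: stratified 80/20 split, then SMOTE, then standardization. It is not the notebook's code, which presumably relies on scikit-learn's `train_test_split` and `StandardScaler` and imbalanced-learn's `SMOTE`. The helpers `stratified_split` and `smote` and the toy data are hypothetical.

```python
import numpy as np

def stratified_split(y, test_frac=0.2, seed=0):
    """Shuffle each class separately so train and test keep the original class ratio."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

def smote(X, y, minority=1, k=5, seed=0):
    """Add synthetic minority rows until both classes are the same size.

    Each synthetic row lies on the segment between a random minority row and
    one of its k nearest minority-class neighbours (Euclidean distance).
    Assumes at least two minority rows.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_needed = int((y != minority).sum()) - len(X_min)
    if n_needed <= 0:
        return X, y
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_needed)
    partner = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_needed, minority)])

# Toy data: 100 patients, 3 features, 15 positives
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([1] * 15 + [0] * 85)

# 1. Stratified 80/20 split before any resampling
train_idx, test_idx = stratified_split(y, test_frac=0.2, seed=1)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# 2. SMOTE on the training split only; the test split keeps the real ~15% rate
X_train, y_train = smote(X_train, y_train, seed=2)

# 3. Scaling statistics come from training data and are reused for the test split
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
print(len(y_train), int(y_train.sum()), len(y_test), int(y_test.sum()))  # 136 68 20 3
```

The 5-fold stratified cross-validation step uses the same per-class idea: each fold keeps the original class ratio. In scikit-learn this is `StratifiedKFold(n_splits=5)`.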


Results

All three classifiers are evaluated on the same stratified held-out test set. The model with the highest ROC-AUC is automatically selected for the primary prediction in the Gradio UI.

Model                 Accuracy  F1 Score  Precision  Recall  ROC-AUC
Logistic Regression   ~0.70     ~0.42     ~0.38      ~0.49   ~0.74
Random Forest         ~0.74     ~0.44     ~0.41      ~0.47   ~0.78
XGBoost               ~0.73     ~0.45     ~0.40      ~0.51   ~0.79

Metrics vary slightly across runs because SMOTE resampling is stochastic. Recall is the primary clinical metric: a false negative (an at-risk patient classified as healthy) carries a much higher real-world cost than a false alarm.
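The model-selection rule and the recall trade-off can both be sketched without an ML library:

- `roc_auc` uses the rank interpretation of ROC-AUC: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half.
- `select_best` picks the model with the highest AUC.
- The threshold loop shows that lowering the decision threshold below 0.5 raises recall at the cost of precision.

All names, labels, and scores are made-up illustrations, not the notebook's code or results. The README does not say whether the notebook tunes the threshold.

```python
import numpy as np

def roc_auc(y_true, scores):
    """P(random positive scores above random negative); ties count half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def select_best(y_true, model_scores):
    """Return the name with the highest ROC-AUC (first one wins ties) and all AUCs."""
    aucs = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
    return max(aucs, key=aucs.get), aucs

def precision_recall_f1(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())  # false alarms
    fn = int(((y_true == 1) & (y_pred == 0)).sum())  # missed at-risk patients
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy test set: 4 positives, 6 negatives, with two hypothetical models' probabilities
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.6, 0.4, 0.2, 0.7, 0.45, 0.3, 0.2, 0.1, 0.05],
    "model_b": [0.8, 0.7, 0.3, 0.6, 0.5, 0.2, 0.4, 0.1, 0.3, 0.2],
}
best, aucs = select_best(y_test, model_scores)

# Lowering the threshold trades precision for recall on the selected model
results = {}
for threshold in (0.5, 0.3):
    y_pred = [int(p >= threshold) for p in model_scores[best]]
    results[threshold] = precision_recall_f1(y_test, y_pred)
    p, r, f = results[threshold]
    print(f"{best} @ {threshold}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In scikit-learn, `roc_auc_score` and `precision_recall_fscore_support` compute the same quantities.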



Gradio UI

The interactive Gradio interface launches automatically at the end of the notebook and generates a public gradio.live link.

Left panel — Patient Profile

  • Sex, age, and education level
  • Smoking status and cigarettes per day
  • Medical history: blood pressure medication, prior stroke, hypertension, diabetes

Right panel — Clinical Measurements

  • Total cholesterol, systolic BP, diastolic BP
  • BMI, resting heart rate, fasting glucose
  • Normal reference ranges displayed inline for each measurement

Output — Risk Assessment

  • Risk classification: Low / Moderate / High
  • CHD probability percentage from the best-performing model
  • Side-by-side comparison table across all three models
  • Plain-language clinical insight and guidance
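Behind the UI, the 15 slider and checkbox values must become one model input row. That includes deriving pulse_pressure from the two blood-pressure inputs, since it is not entered directly. The returned probability is then mapped to a risk label. The following is a minimal sketch under stated assumptions:

- The column order is assumed to follow the feature reference table.
- `build_feature_row`, `risk_level`, and the example patient are hypothetical.
- The 10% and 20% cut-offs are placeholder thresholds, because the README does not give the notebook's actual ones.

```python
# Column order assumed to follow the feature reference table; the notebook's order may differ.
FEATURE_ORDER = [
    "male", "age", "education", "currentSmoker", "cigsPerDay", "BPMeds",
    "prevalentStroke", "prevalentHyp", "diabetes", "totChol", "sysBP",
    "diaBP", "BMI", "heartRate", "glucose", "pulse_pressure",
]

# Hypothetical probability cut-offs; the README does not state the notebook's thresholds.
LOW_MAX = 0.10
MODERATE_MAX = 0.20

def build_feature_row(ui_inputs: dict) -> list[float]:
    """Turn the 15 UI values into one model input row, deriving pulse_pressure."""
    values = dict(ui_inputs)
    values["pulse_pressure"] = values["sysBP"] - values["diaBP"]
    return [float(values[name]) for name in FEATURE_ORDER]

def risk_level(probability: float) -> str:
    """Map a CHD probability in [0, 1] to a Low / Moderate / High label."""
    if probability < LOW_MAX:
        return "Low"
    if probability < MODERATE_MAX:
        return "Moderate"
    return "High"

patient = {
    "male": 1, "age": 55, "education": 2, "currentSmoker": 1, "cigsPerDay": 20,
    "BPMeds": 0, "prevalentStroke": 0, "prevalentHyp": 1, "diabetes": 0,
    "totChol": 240, "sysBP": 150, "diaBP": 95, "BMI": 28.5, "heartRate": 80,
    "glucose": 90,
}
row = build_feature_row(patient)
print(row[-1], risk_level(0.27))  # 55.0 High
```

In a full pipeline the row would presumably be scaled with the training-set scaler and then passed to each fitted model. With scikit-learn-style classifiers, `predict_proba([row])[0, 1]` gives the CHD probability for the side-by-side comparison.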


Project Structure

CardioRisk-AI/
|
|-- framingham_heart_disease.ipynb   # Complete pipeline — dataset embedded, Colab-ready
|-- requirements.txt                 # Python dependencies
|-- README.md                        # This file


Requirements

gradio
xgboost
imbalanced-learn
scikit-learn
pandas
numpy
matplotlib
seaborn

All packages are installed automatically in the first notebook cell when running in Colab.



Medical Disclaimer

CardioRisk AI is an educational and research demonstration project. It is not a medical device and must not be used for clinical diagnosis, risk screening, or treatment decisions. All predictions are generated by statistical models trained on historical research data.

Always consult a qualified cardiologist or physician for any cardiovascular health concerns.




        ♥                       ♥
      ♥   ♥                   ♥   ♥
    ♥       ♥               ♥       ♥
      ♥   ♥    C A R D I O    ♥   ♥
        ♥      R I S K  A I      ♥

Built with scikit-learn · XGBoost · imbalanced-learn · Gradio

Trained on data from the Framingham Heart Study — NHLBI

predict · prevent · protect

