ashfaq3112/Ai_HealthCare_System


🩺 AI-Powered Healthcare System

An end-to-end ML/AI project that predicts stroke risk, clusters patients into meaningful groups, and generates treatment recommendations using association rule mining. It also includes a Streamlit app for interactive predictions.


📌 Features

  • Data Preprocessing

    • Cleans and encodes the stroke dataset
    • Handles missing values and categorical encoding
    • Saves processed data
  • Supervised Learning

    • Logistic Regression, Random Forest, XGBoost baselines
    • 5-fold Cross Validation
    • Evaluation with ROC AUC, Precision, Recall, F1
    • SHAP-based feature importance
  • Unsupervised Learning

    • KMeans & DBSCAN clustering
    • Cluster profiles with mean feature summaries
    • Risk-based cluster naming (High/Moderate/Low Risk groups)
  • Association Rules

    • Simulated patient transactions (symptoms → treatments)
    • Apriori + FP-Growth mining
    • Top-10 rules exported for recommendations
  • Streamlit App

    • Single patient risk prediction
    • Cluster assignment with profile interpretation
    • Recommended treatments from association rules
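To illustrate the Data Preprocessing feature listed above, here is a minimal sketch of missing-value handling and categorical encoding with pandas. It is not the project's actual `preprocess.py`. The column names (`age`, `bmi`, `gender`, `stroke`), the median imputation strategy, and the one-hot encoding choice are all assumptions made for the example.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values and one-hot encode categorical columns.

    Assumed strategy (not confirmed by the project): numeric gaps are
    filled with the column median, categorical gaps with "Unknown".
    """
    out = df.copy()

    # Fill missing numeric values with each column's median.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    # Fill missing categorical values with a placeholder label.
    categorical_cols = list(out.select_dtypes(exclude="number").columns)
    for col in categorical_cols:
        out[col] = out[col].fillna("Unknown")

    # One-hot encode the categorical columns as 0/1 integers.
    return pd.get_dummies(out, columns=categorical_cols, dtype=int)


# Toy rows with hypothetical column names.
raw = pd.DataFrame(
    {
        "age": [67, 45, 80, 33],
        "bmi": [36.6, None, 32.5, 24.0],
        "gender": ["Male", "Female", "Female", "Male"],
        "stroke": [1, 0, 1, 0],
    }
)
processed = preprocess(raw)
print(processed)
```

The processed frame could then be written out with `processed.to_csv(...)`, which corresponds to the "Saves processed data" step.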

🏗️ Project Structure

ai-healthcare-system/
│
├── data/
│   ├── raw/  # Original dataset
│   └── processed/  # Cleaned & preprocessed data
│       └── stroke_data_processed.csv
│
├── models/  # Trained & saved models
│   ├── model.pkl  # Best supervised model (LogReg / RF / XGB)
│   ├── kmeans.pkl  # Saved KMeans clustering model
│   └── scaler.pkl  # Scaler used for clustering
│
├── notebooks/  # Jupyter notebooks (experiments & reports)
│   ├── 01-eda.ipynb  # Exploratory Data Analysis
│   ├── 02-supervised-baseline.ipynb  # Baseline supervised models
│   ├── 03-clustering.ipynb  # Clustering experiments
│   └── 04-association.ipynb  # Association rule mining
│
├── src/  # Source code
│   ├── data/
│   │   ├── load.py  # Load raw/processed data
│   │   └── preprocess.py  # Data cleaning & feature engineering
│   │
│   ├── models/
│   │   ├── baseline.py  # Pipelines for baseline models
│   │   ├── trainer.py  # Training & cross-validation
│   │   └── evaluate.py  # Model evaluation & metrics
│   │
│   ├── unsupervised/
│   │   └── clustering.py  # KMeans & DBSCAN + cluster profiling
│   │
│   ├── association/
│   │   └── apriori_rules.py  # Association rules (Apriori/FP-Growth)
│   │
│   └── app/
│       └── streamlit_app.py  # Streamlit web app integration
│
├── cluster_profiles.md  # Cluster summaries (generated in Milestone 3)
├── association_rules.csv  # Top-10 rules (generated in Milestone 4)
├── requirements.txt  # Python dependencies
├── README.md  # Project documentation
└── .gitignore  # Files ignored by Git

⚙️ Installation & Setup

  1. Clone the repo:

    git clone https://github.com/yourusername/ai-healthcare-system.git
    cd ai-healthcare-system

  2. Create a virtual environment:

    conda create -n ai-healthcare python=3.10 -y
    conda activate ai-healthcare

  3. Install dependencies:

    pip install -r requirements.txt

⚡ Execution Workflow

Follow this order to execute the project end-to-end:


# Run data loading
python src/data/load.py

# Run preprocessing
python src/data/preprocess.py

# Train baseline models (LogReg, RF, XGBoost)
python src/models/trainer.py

# Evaluate saved model
python src/models/evaluate.py

# Run clustering (KMeans + DBSCAN)
python src/unsupervised/clustering.py

# Mine Apriori + FP-Growth rules
python src/association/apriori_rules.py

# Launch the interactive app
streamlit run src/app/streamlit_app.py

📈 Example Outputs

🧠 Model Performance

Logistic Regression (best CV ROC AUC ≈ 0.84)

XGBoost: tunable for higher recall/precision
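For readers who want to see how a cross-validated ROC AUC like the one above is typically computed, here is a minimal sketch using scikit-learn. It runs on synthetic, imbalanced data from `make_classification`, not the stroke dataset, so its scores will not match the figure reported here. The scaler, `class_weight="balanced"`, and all hyperparameters are assumptions, not the project's confirmed settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the processed data: roughly 10% positive class.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)

# Scaling lives inside the pipeline so each fold is scaled on its own
# training split, which avoids leaking validation statistics.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Stratified 5-fold CV keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print(f"ROC AUC per fold: {scores.round(3)}")
print(f"Mean ROC AUC: {scores.mean():.3f}")
```

Swapping `LogisticRegression` for another estimator, such as a random forest or an XGBoost classifier, leaves the rest of the loop unchanged, which makes baseline comparisons straightforward.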

🌀 Clustering

KMeans Silhouette Score ≈ 0.15

Clusters:

Cluster 0 → High-Risk Group

Cluster 1 → Moderate-Risk Group

Cluster 2 → Low-Risk Younger Group
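The silhouette score and the per-cluster profiles can be sketched as follows. This is a hedged illustration on synthetic blobs, not the project's `clustering.py`. The feature names (`age`, `avg_glucose_level`) and the choice of three clusters are assumptions, and because the blobs are well separated, the silhouette score here will be much higher than the one reported above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with hypothetical feature names.
features, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)
df = pd.DataFrame(features, columns=["age", "avg_glucose_level"])

# Scale first so no single feature dominates the distance metric.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
score = silhouette_score(X_scaled, labels)
print(f"Silhouette score: {score:.3f}")

# Cluster profile: mean of each original (unscaled) feature per cluster.
profiles = df.assign(cluster=labels).groupby("cluster").mean()
print(profiles)
```

A profile table like this is what a risk label would be read from, for example by comparing mean age or stroke rate across clusters. The fitted `scaler` and `kmeans` objects could be saved with `joblib.dump` for reuse at prediction time.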

📋 Example Rule

symptom:hypertension, symptom:obese → treatment:antihypertensive, treatment:lifestyle_change
(Lift: 18.51, Confidence: 1.00)
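To make the two metrics concrete: confidence is support(antecedent ∪ consequent) / support(antecedent), and lift is confidence / support(consequent). A lift of 18.51 at confidence 1.00 therefore implies that the consequent appears in roughly 1 / 18.51 ≈ 5.4% of transactions. The sketch below computes both by hand in plain Python on four toy transactions. It does not use mlxtend, the transactions are invented for the example, and `symptom:smoker` and `treatment:smoking_cessation` are hypothetical item names.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    supp_a = support(transactions, antecedent)
    supp_c = support(transactions, consequent)
    supp_ac = support(transactions, set(antecedent) | set(consequent))
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    return confidence, lift


# Toy transactions (invented for illustration only).
transactions = [
    {"symptom:hypertension", "symptom:obese",
     "treatment:antihypertensive", "treatment:lifestyle_change"},
    {"symptom:hypertension", "treatment:antihypertensive"},
    {"symptom:smoker", "treatment:smoking_cessation"},
    {"symptom:obese", "treatment:lifestyle_change"},
]

confidence, lift = rule_metrics(
    transactions,
    antecedent={"symptom:hypertension", "symptom:obese"},
    consequent={"treatment:antihypertensive", "treatment:lifestyle_change"},
)
print(f"Confidence: {confidence:.2f}, Lift: {lift:.2f}")
# prints "Confidence: 1.00, Lift: 4.00"
```

Here the consequent appears in 1 of 4 transactions, so a confidence of 1.00 gives a lift of 1.00 / 0.25 = 4.00.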

📸 Screenshots

🏠 Home Page

[Screenshot: Home Page]

🧑‍⚕️ Single Patient Prediction

[Screenshot: Single Patient Prediction]

🛠️ Tech Stack

Programming Language

  • Python 3.10+

Libraries & Frameworks

  • Data Handling: pandas, numpy

  • Visualization: matplotlib, seaborn

  • Machine Learning: scikit-learn, xgboost, imblearn

  • Clustering: scikit-learn (KMeans, DBSCAN)

  • Association Rules: mlxtend (Apriori, FP-Growth)

  • Explainability: shap

  • App Framework: streamlit

  • Serialization: joblib

🚀 Future Improvements

  • Deploy via Docker or Cloud: Package the app using Docker or deploy on platforms like Heroku, AWS, or GCP for wider accessibility.

  • Integrate Real Clinical Datasets: Incorporate real-world patient datasets with treatment + outcome mappings to improve the reliability of predictions.

  • Temporal Association Rules: Enhance the association rule mining by including temporal patient history (sequence of symptoms → treatments → outcomes).

  • Improved Interpretability: Add interactive LIME/SHAP dashboards within the app for doctors and researchers to better understand model decisions.
