
EudaLabs/HealthWithSevgi



HealthWithSevgi

An interactive, browser-based machine learning education tool for healthcare professionals.

SENG 430 - Software Quality Assurance | Cankaya University | Spring 2025-2026 | Instructor: Dr. Sevgi Koyuncu Tunç

HealthWithSevgi guides clinicians through a complete ML pipeline in 7 steps — from selecting a medical specialty to training a model, interpreting predictions with SHAP, and auditing fairness — all with zero coding required.

Live Demo  |  Jira Board  |  Figma Designs  |  Setup Guide



Overview

Healthcare professionals increasingly encounter AI/ML in clinical settings but rarely get hands-on experience with how these systems work. HealthWithSevgi bridges that gap by providing an intuitive, wizard-style interface that walks users through every stage of the machine learning lifecycle using real clinical datasets.

Key capabilities:

  • 20 medical specialties with real-world clinical datasets (Cardiology, Oncology, Nephrology, Neurology, ICU/Sepsis, Dermatology, and more)
  • 8 ML classifiers with interactive hyperparameter tuning via sliders
  • SHAP-based explainability — global feature importance and single-patient waterfall explanations
  • Fairness auditing — subgroup performance analysis across demographics with bias detection
  • EU AI Act compliance checklist with downloadable PDF certificate
  • No persistent server-side storage: session data lives only in bounded in-memory caches and is evicted automatically

The 7-Step Pipeline

Step | Name | What Happens
1 | Clinical Context | Introduces the medical problem the AI will address. Displays the clinical question, why it matters, and the 7-step roadmap.
2 | Data Exploration | Upload a CSV file (up to 50 MB) or load a built-in clinical dataset. Inspect column statistics, missing values, and class distribution. Confirm the target variable.
3 | Data Preparation | Configure preprocessing: train/test split ratio, missing value strategy (median/mode/drop), normalization (z-score/min-max), SMOTE for class imbalance, and outlier handling (IQR/z-score clipping).
4 | Model & Parameters | Choose from 8 ML models. Adjust hyperparameters with intuitive sliders. Optionally enable hyperparameter tuning (RandomizedSearchCV) and feature selection (VarianceThreshold + SelectKBest).
5 | Results & Evaluation | View accuracy, sensitivity, specificity, precision, F1, AUC-ROC, and MCC. Explore interactive ROC curves, precision-recall curves, and confusion matrices. Detect overfitting via cross-validation comparison.
6 | Explainability | Global feature importance ranking with clinical name mapping. Single-patient SHAP waterfall charts with plain-language summaries (e.g., "High glucose increases diabetes risk by 0.23").
7 | Ethics & Bias | Subgroup fairness audit (by age, gender, ethnicity). Bias warnings for performance gaps >10%. EU AI Act compliance checklist. Real-world case studies of AI bias in healthcare. Downloadable PDF compliance certificate.
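The Step 3 options can be illustrated with a minimal pure-Python sketch for a single numeric column. It is not taken from data_service.py: the helper names (ColumnStats, fit_preprocessor, transform), the order impute, then clip, then scale, and the 1.5 * IQR fences are assumptions, and only the median, IQR, and z-score choices are shown (no mode/drop, min-max, or SMOTE). What it demonstrates is that every statistic is learned from the training split and reused unchanged on the test split, so test rows never influence preprocessing.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev, quantiles
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ColumnStats:
    """Statistics learned from the training split only (hypothetical helper)."""
    fill: float   # training median, used to impute missing values
    low: float    # lower IQR fence
    high: float   # upper IQR fence
    mu: float     # training mean after imputation and clipping
    sigma: float  # training standard deviation after imputation and clipping


def fit_preprocessor(train: Sequence[Optional[float]], k: float = 1.5) -> ColumnStats:
    """Learn imputation, clipping, and z-score parameters from training values."""
    observed = [x for x in train if x is not None]
    fill = median(observed)
    q1, _, q3 = quantiles(observed, n=4)  # quartiles of the observed training values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    cleaned = [min(max(fill if x is None else x, low), high) for x in train]
    sigma = pstdev(cleaned) or 1.0  # avoid dividing by zero on a constant column
    return ColumnStats(fill, low, high, mean(cleaned), sigma)


def transform(values: Sequence[Optional[float]], stats: ColumnStats) -> List[float]:
    """Apply the training-derived parameters to any split (train or test)."""
    out: List[float] = []
    for x in values:
        x = stats.fill if x is None else x        # impute with the training median
        x = min(max(x, stats.low), stats.high)    # clip to the training IQR fences
        out.append((x - stats.mu) / stats.sigma)  # z-score with training mean/std
    return out


if __name__ == "__main__":
    train = [10.0, 12.0, None, 11.0, 13.0, 12.0, 95.0]
    test = [None, 11.5, 1000.0]
    stats = fit_preprocessor(train)
    print(stats)
    print(transform(test, stats))
```

A missing test value is filled with the training median, and an extreme test value is clipped to the training fences, never to fences recomputed on the test split.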

Supported Specialties

# | Specialty | Prediction Task | Dataset | Samples
1 | Cardiology | 30-day heart failure mortality | Heart Failure Clinical Records | ~300
2 | Radiology | Pneumonia detection (chest X-ray metadata) | NIH Chest X-ray | 100K+
3 | Nephrology | Chronic kidney disease detection | UCI CKD | 400
4 | Oncology - Breast | Malignant vs. benign biopsy | Wisconsin Breast Cancer | 569
5 | Neurology - Parkinson's | Parkinson's from voice biomarkers | UCI Parkinson's | 195
6 | Endocrinology - Diabetes | Diabetes onset within 5 years | Pima Indians | 768
7 | Hepatology - Liver | Liver disease detection | Indian Liver Patient | 583
8 | Cardiology - Stroke | Stroke risk prediction | Kaggle Stroke Prediction | 5,110
9 | Mental Health | Depression severity (PHQ-9) | Kaggle Depression | ~1,000
10 | Pulmonology - COPD | COPD exacerbation risk | PhysioNet + Kaggle | ~1,000
11 | Haematology - Anaemia | Anaemia type classification | Kaggle Anaemia | ~400
12 | Dermatology | Benign vs. malignant skin lesion | HAM10000 metadata | ~10K
13 | Ophthalmology | Diabetic retinopathy detection | UCI Diabetic Retinopathy | 1,151
14 | Orthopaedics - Spine | Disc herniation / spondylolisthesis | UCI Vertebral Column | 310
15 | ICU / Sepsis | Sepsis onset within 6 hours | PhysioNet Sepsis | ~40K
16 | Obstetrics - Fetal Health | Fetal health classification (CTG) | UCI Fetal Health | 2,126
17 | Cardiology - Arrhythmia | Arrhythmia detection (ECG) | UCI Arrhythmia | 452
18 | Oncology - Cervical | Cervical cancer risk | UCI Cervical Cancer | 858
19 | Thyroid / Endocrinology | Thyroid function classification | UCI Thyroid | 9,172
20 | Pharmacy - Readmission | Hospital readmission risk | UCI Diabetes 130-US | 101,766

ML Models

Model | Category | Key Hyperparameters
K-Nearest Neighbors | Instance-based | k (1-25), distance metric
Support Vector Machine | Boundary-based | C (0.01-100), kernel (linear/rbf/poly)
Decision Tree | Tree-based | max_depth (1-20), criterion (gini/entropy)
Random Forest | Ensemble | n_estimators (10-500), max_depth
Logistic Regression | Linear | C (0.001-100), solver (lbfgs/saga)
Naive Bayes | Probabilistic | var_smoothing (1e-12 to 1e-3)
XGBoost | Gradient Boosting | n_estimators, max_depth, learning_rate
LightGBM | Gradient Boosting | n_estimators, max_depth, learning_rate

All models are trained with balanced class weights where supported. Optional hyperparameter tuning uses RandomizedSearchCV (20 iterations, 3-fold CV). Feature selection combines VarianceThreshold with SelectKBest (mutual information).
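A minimal scikit-learn sketch of this setup follows, assuming a binary target. LogisticRegression stands in for any of the 8 models, and the search space (C values taken from the slider range in the table), the ROC-AUC scoring metric, the pipeline order, and the function name build_search are illustrative assumptions rather than the project's actual configuration. Keeping feature selection inside the Pipeline means it is refit on each CV training fold, so the held-out fold never influences which features are kept.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_search(random_state: int = 42) -> RandomizedSearchCV:
    """Feature selection + balanced classifier, tuned with 20 random draws and 3-fold CV."""
    pipe = Pipeline([
        ("variance", VarianceThreshold(threshold=0.0)),           # drop constant features
        ("select", SelectKBest(score_func=mutual_info_classif)),  # keep the k most informative
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(class_weight="balanced", max_iter=5000)),
    ])
    # Illustrative search space (assumed): 4 x 6 = 24 combinations, 20 sampled.
    param_distributions = {
        "select__k": [5, 10, 15, 20],
        "clf__C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0],
    }
    return RandomizedSearchCV(
        pipe,
        param_distributions=param_distributions,
        n_iter=20,
        cv=3,
        scoring="roc_auc",  # assumed metric for this sketch
        random_state=random_state,
    )


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    search = build_search().fit(X_train, y_train)
    print(search.best_params_, round(search.score(X_test, y_test), 3))
```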


Tech Stack

Layer | Technology | Purpose
Frontend | React 18, TypeScript, Vite | Single-page wizard application
UI Components | Recharts, Lucide Icons, react-dropzone | Charts, icons, file uploads
State Management | TanStack React Query | Server state caching and synchronization
Backend | FastAPI, Python 3.12 | REST API with auto-generated OpenAPI docs
ML Engine | scikit-learn, XGBoost, LightGBM | Model training, evaluation, cross-validation
Explainability | SHAP | TreeExplainer (tree models), KernelExplainer (linear), permutation importance
Data Processing | pandas, numpy, imbalanced-learn | Data cleaning, normalization, SMOTE
PDF Generation | ReportLab | Compliance certificate export
Containerization | Docker (multi-stage) | Production deployment
Hosting | HuggingFace Spaces | Live demo environment
Package Manager | pnpm (frontend), pip (backend) | Dependency management

Architecture

📐 Full Architecture Diagrams (Google Drive) — C4 model diagrams (System Context, Container, Component, Code levels), toolchain diagrams, and data flow sequences.

                          +---------------------+
                          |   Browser (React)   |
                          |   Wizard UI (SPA)   |
                          +----------+----------+
                                     |
                            HTTP/REST (JSON)
                                     |
                          +----------v----------+
                          |   FastAPI Backend   |
                          +----------+----------+
                                     |
              +----------------------+----------------------+
              |              |              |                |
     +--------v---+  +------v-----+  +-----v------+  +-----v--------+
     | DataService|  | MLService  |  |ExplainSvc  |  | EthicsService|
     |            |  |            |  |            |  |              |
     | - Explore  |  | - Train    |  | - SHAP     |  | - Subgroup   |
     | - Prepare  |  | - Evaluate |  | - Waterfall|  | - Bias detect|
     | - SMOTE    |  | - Compare  |  | - Clinical |  | - EU AI Act  |
     +-----+------+  +------+-----+  +------+-----+  +------+-------+
           |                |                |                |
           v                v                v                v
     +-----------+   +------------+   +------------+   +-----------+
     | In-Memory |   | In-Memory  |   |   SHAP     |   | ReportLab |
     | Sessions  |   | Models     |   |  Library   |   |  PDF Gen  |
     | (LRU 50)  |   | (LRU 100+) |   |            |   |           |
     +-----------+   +------------+   +------------+   +-----------+

Data flow: Upload CSV -> Explore columns -> Preprocess (split, normalize, SMOTE) -> Train model -> Evaluate metrics -> SHAP explanations -> Fairness audit -> PDF certificate
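The diagram labels the session and model stores as LRU caches (50 sessions, 100+ models). A bounded least-recently-used store of that kind can be sketched with collections.OrderedDict. The class name LRUStore and its methods are hypothetical, and the lock is an assumption motivated by FastAPI running plain def endpoints in a thread pool.

```python
from collections import OrderedDict
from threading import Lock
from typing import Generic, Optional, TypeVar

V = TypeVar("V")


class LRUStore(Generic[V]):
    """Bounded in-memory store that evicts the least recently used entry (hypothetical)."""

    def __init__(self, capacity: int = 50) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items: "OrderedDict[str, V]" = OrderedDict()
        self._lock = Lock()  # endpoints may run concurrently in a thread pool

    def put(self, key: str, value: V) -> None:
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key)         # newest entry goes to the end
            while len(self._items) > self._capacity:
                self._items.popitem(last=False)  # evict the oldest (front) entry

    def get(self, key: str) -> Optional[V]:
        with self._lock:
            if key not in self._items:
                return None                      # evicted or never created
            self._items.move_to_end(key)         # reading counts as a use
            return self._items[key]

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


if __name__ == "__main__":
    sessions: LRUStore[dict] = LRUStore(capacity=50)
    sessions.put("session-1", {"rows": 768})
    print(sessions.get("session-1"), len(sessions))
```

Because reads move an entry to the end, a session that is still being used survives while idle ones are evicted first.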


Project Structure

HealthWithSevgi/
|
+-- frontend/                         # React 18 + Vite + TypeScript
|   +-- src/
|   |   +-- pages/                    # Step 1-7 wizard pages
|   |   |   +-- Step1ClinicalContext.tsx
|   |   |   +-- Step2DataExploration.tsx
|   |   |   +-- Step3DataPreparation.tsx
|   |   |   +-- Step4ModelParameters.tsx
|   |   |   +-- Step5Results.tsx
|   |   |   +-- Step6Explainability.tsx
|   |   |   +-- Step7Ethics.tsx
|   |   +-- components/               # Reusable UI components
|   |   |   +-- NavBar.tsx            # Specialty switcher, glossary
|   |   |   +-- WizardProgress.tsx    # Step progress tracker
|   |   |   +-- SpecialtySelector.tsx # 20-specialty grid
|   |   |   +-- ColumnMapperModal.tsx # Target column confirmation
|   |   |   +-- ErrorModal.tsx       # Error display modal
|   |   |   +-- charts/              # Visualization components
|   |   |       +-- ConfusionMatrixChart.tsx  # 2x2 confusion matrix
|   |   |       +-- KNNScatterCanvas.tsx     # KNN decision boundary
|   |   |       +-- PRCurveChart.tsx         # Precision-Recall curve
|   |   |       +-- ROCCurveChart.tsx        # ROC curve with AUC badge
|   |   +-- api/                      # API client layer
|   |   |   +-- client.ts            # Axios instance + interceptors
|   |   |   +-- specialties.ts       # Specialty endpoints
|   |   |   +-- data.ts              # Explore + Prepare endpoints
|   |   |   +-- ml.ts                # Train + Compare endpoints
|   |   |   +-- explain.ts           # Explainability + Ethics + Certificate
|   |   +-- types/index.ts           # Shared TypeScript interfaces
|   |   +-- styles/globals.css        # Global CSS + theme variables
|   |   +-- App.tsx                   # Main wizard state manager
|   |   +-- main.tsx                  # Application entry point
|   +-- package.json
|   +-- vite.config.ts
|
+-- backend/                          # FastAPI REST API + ML engine
|   +-- app/
|   |   +-- main.py                   # FastAPI setup, CORS, routers
|   |   +-- routers/
|   |   |   +-- data_router.py        # /specialties, /explore, /prepare
|   |   |   +-- ml_router.py          # /train, /compare, /models
|   |   |   +-- explain_router.py     # /explain/*, /ethics, /certificate
|   |   +-- services/
|   |   |   +-- data_service.py       # Dataset loading, exploration, preprocessing
|   |   |   +-- ml_service.py         # Model building, training, evaluation
|   |   |   +-- explain_service.py    # SHAP explanations, clinical mapping
|   |   |   +-- ethics_service.py     # Fairness audit, bias detection
|   |   |   +-- certificate_service.py # PDF certificate generation
|   |   |   +-- specialty_registry.py # 20 specialty definitions + datasets
|   |   +-- models/
|   |   |   +-- schemas.py            # Data exploration/preparation DTOs
|   |   |   +-- ml_schemas.py         # Training/evaluation DTOs
|   |   |   +-- explain_schemas.py    # Explainability/ethics DTOs
|   |   +-- utils/                    # Utility modules
|   +-- data_cache/                   # Cached clinical CSV datasets
|   +-- datasets/                     # Additional dataset storage
|   +-- tests/                        # pytest test suite (178 tests)
|   |   +-- conftest.py              # Shared fixtures
|   |   +-- test_step1_clinical_context.py
|   |   +-- test_step2_data_exploration.py
|   |   +-- test_step3_data_preparation.py
|   |   +-- test_step6_explainability.py
|   |   +-- test_step7_ethics.py
|   |   +-- test_certificate.py
|   +-- pytest.ini
|   +-- requirements.txt
|
+-- hf-space/                         # HuggingFace Spaces deployment
|   +-- main_hf.py                    # Combined API + SPA entrypoint
|   +-- Dockerfile                    # HF-specific Docker build
|   +-- README.md                     # HF Space metadata
|
+-- docs/                             # Documentation & design specs
|   +-- ML_Tool_User_Guide.md         # Course user manual
|   +-- Sprint_1_Assignment.md        # Sprint 1 requirements
|   +-- Clinical_Specialties_Dataset_Collection.pdf
|   +-- diagrams/                     # C4 architecture + toolchain PDFs
|   +-- drawio/                       # Editable draw.io source files
|   +-- mermaid/                      # C4 architecture (Mermaid source)
|   +-- iso42001/                     # ISO 42001 AI governance report
|   +-- seng430-sprints/              # Sprint requirements from instructor
|   +-- qa/                           # QA test reports (PDF)
|   +-- reports/                      # Progress reports + screenshots
|
+-- jira/                             # Jira backlog documentation
|   +-- JIRA.md                       # Product backlog report
|   +-- SPRINT_1_TASK_BOARD.md        # Sprint 1 task breakdown
|
+-- local/                            # Local-only extensions
|   +-- model-arena/                  # Model Arena comparison feature
|       +-- arena/                    # Backend (router, service, schemas)
|       +-- frontend/                 # Frontend (ArenaPage, charts, hooks)
|
+-- .github/
|   +-- pull_request_template.md      # PR template linked to Jira
|   +-- workflows/deploy-hf.yml      # Auto-deploy to HuggingFace on release
|
+-- Dockerfile                        # Multi-stage build (Node + Python)
+-- docker-compose.yml                # Local development orchestration
+-- .dockerignore
+-- .gitignore
+-- CLAUDE.md                         # AI coding assistant context
+-- SETUP.md                          # Local development setup guide
+-- README.md

Live Demo & Docker

🌐 Live Demo

The application is deployed on HuggingFace Spaces — no installation required:

➡️ huggingface.co/spaces/0xBatuhan4/HealthWithSevgi

🐳 Docker (single command)

Pull and run the pre-built container image from GitHub Container Registry:

docker run -p 7860:7860 ghcr.io/eudalabs/healthwithsevgi:latest

Open http://localhost:7860 — that's it.

Alternatively, build from source:

git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker build -t healthwithsevgi .
docker run -p 7860:7860 healthwithsevgi

Docker Compose (local development)

git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker-compose up --build

This starts both the backend API and frontend dev server with hot-reload.


Quick Start

Prerequisites (for local development)

Tool | Version | Required For
Python | >= 3.10 | Backend
Node.js | >= 18 | Frontend
pnpm | - | Frontend dependencies (pnpm install, pnpm dev)
Git | latest | Version control

Local Development

Backend:

cd backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows

# Install dependencies
pip install -r requirements.txt

# Start the API server
uvicorn app.main:app --reload --port 8001

API docs available at: http://localhost:8001/docs (Swagger UI)

Frontend (in a separate terminal):

cd frontend

# Install dependencies
pnpm install

# Start the dev server
pnpm dev

App available at: http://localhost:5173 (proxies /api requests to port 8001)

Environment Variables

Create a .env file in the project root:

# Backend
BACKEND_PORT=8001
DEBUG=true

# Frontend (Vite exposes VITE_-prefixed variables to client code)
VITE_API_URL=http://localhost:8001
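On the backend side, reading these two variables with fallbacks could look like the sketch below. The Settings class, the load_settings function, and the set of strings treated as true for DEBUG are hypothetical, and the sketch assumes the variables are already in the process environment (for example via docker-compose or python-dotenv), since reading os.environ does not parse a .env file by itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Strings accepted as "true" for DEBUG (an assumption for this sketch).
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    backend_port: int = 8001
    debug: bool = False


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read BACKEND_PORT and DEBUG, falling back to port 8001 and debug off."""
    port = int(environ.get("BACKEND_PORT", "8001"))
    debug = environ.get("DEBUG", "false").strip().lower() in _TRUTHY
    return Settings(backend_port=port, debug=debug)


if __name__ == "__main__":
    print(load_settings())
```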

API Reference

Application endpoints are prefixed with /api; the health checks (/ and /health) are not. Full interactive documentation is available at /docs when the backend is running.

Specialties

Method | Endpoint | Description
GET | /api/specialties | List all 20 specialties
GET | /api/specialties/{id} | Get specialty details (description, features, clinical context)

Data

Method | Endpoint | Description
POST | /api/explore | Upload CSV or load built-in dataset; returns column stats + class distribution
POST | /api/prepare | Preprocess data (split, normalize, SMOTE); returns session_id

ML Training

Method | Endpoint | Description
POST | /api/train | Train a model; returns model_id + evaluation metrics
POST | /api/compare/{model_id} | Add model to comparison table
GET | /api/compare/{session_id} | Get all compared models for a session
DELETE | /api/compare/{session_id} | Clear comparison table
GET | /api/models/{model_id} | Get model metadata
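The threshold-based metrics reported in Step 5 all follow from the four cells of the binary confusion matrix. The sketch below computes them in plain Python; the function and class names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that mirrors scikit-learn's default behavior for precision, recall, and MCC. AUC-ROC is omitted because it needs predicted scores, not just a confusion matrix.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float
    mcc: float


def _safe_div(num: float, den: float) -> float:
    # Undefined ratios (empty denominators) are reported as 0.0.
    return num / den if den else 0.0


def metrics_from_confusion(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute threshold metrics from confusion-matrix counts."""
    sensitivity = _safe_div(tp, tp + fn)  # recall / true positive rate
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    precision = _safe_div(tp, tp + fp)    # positive predictive value
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    mcc = _safe_div(tp * tn - fp * fn,
                    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    return BinaryMetrics(accuracy, sensitivity, specificity, precision, f1, mcc)


if __name__ == "__main__":
    print(metrics_from_confusion(tp=40, fp=10, tn=45, fn=5))
```

For the counts in the usage example, accuracy is 0.85 and precision is 0.8; sensitivity and specificity differ (40/45 vs. 45/55), which is why a clinician usually wants both rather than accuracy alone.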

Explainability

Method | Endpoint | Description
GET | /api/explain/global/{model_id} | Global feature importance (top 10 features + clinical names)
GET | /api/explain/patient/{model_id}/{index} | Single-patient SHAP waterfall explanation
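Plain-language summaries such as "High glucose increases diabetes risk by 0.23" can be generated from per-feature SHAP values once an explainer has produced them. The sketch assumes the SHAP values are already available as a dict; the function name, the argument layout, and the High/Low rule (the patient's value compared with the training median) are hypothetical. The magnitude is in the explainer's output units (for example log-odds or probability, depending on the model and explainer), so "risk by 0.23" should be read in those units.

```python
from typing import Dict, List


def summarize_contributions(
    shap_values: Dict[str, float],
    patient_values: Dict[str, float],
    train_medians: Dict[str, float],
    clinical_names: Dict[str, str],
    outcome: str,
    top_n: int = 3,
) -> List[str]:
    """Turn the largest SHAP contributions into short clinical sentences."""
    # Rank features by absolute contribution, largest first.
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    sentences: List[str] = []
    for feature, contribution in ranked:
        if len(sentences) == top_n:
            break
        if contribution == 0:
            continue  # a zero contribution has no direction to describe
        level = "High" if patient_values[feature] >= train_medians[feature] else "Low"
        name = clinical_names.get(feature, feature)
        direction = "increases" if contribution > 0 else "decreases"
        sentences.append(f"{level} {name} {direction} {outcome} risk by {abs(contribution):.2f}")
    return sentences


if __name__ == "__main__":
    print(summarize_contributions(
        {"Glucose": 0.23, "BMI": -0.05},
        {"Glucose": 180, "BMI": 22},
        {"Glucose": 117, "BMI": 32},
        {"Glucose": "glucose", "BMI": "BMI"},
        "diabetes",
    ))
```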

Ethics & Certificate

Method | Endpoint | Description
GET | /api/ethics/{model_id} | Subgroup fairness audit + bias warnings + checklist
POST | /api/ethics/checklist | Update EU AI Act checklist item
POST | /api/certificate | Generate and download PDF compliance certificate
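Step 7 warns when a subgroup's performance gap exceeds 10%. The sketch below reads that as a gap of more than 10 percentage points between overall accuracy and a subgroup's accuracy, flagging only subgroups that do worse than overall; the metric, the direction, and all names are assumptions for illustration. Using fractions.Fraction keeps the comparison exact, so a gap of exactly 10 points is not flagged because of floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Hashable, List, Sequence


@dataclass(frozen=True)
class SubgroupReport:
    group: str
    n: int
    accuracy: float
    gap: float     # overall accuracy minus subgroup accuracy (positive = worse)
    warning: bool  # True when the gap exceeds the threshold


def audit_subgroups(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    groups: Sequence[str],
    threshold: Fraction = Fraction(1, 10),
) -> List[SubgroupReport]:
    """Compare each subgroup's accuracy with overall accuracy (hypothetical audit)."""
    if not (len(y_true) == len(y_pred) == len(groups)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        correct[g] += int(t == p)
    overall = Fraction(sum(correct.values()), len(y_true))
    reports = []
    for g in sorted(counts):
        acc = Fraction(correct[g], counts[g])
        gap = overall - acc  # exact rational arithmetic, no rounding error
        reports.append(SubgroupReport(g, counts[g], float(acc), float(gap), gap > threshold))
    return reports


if __name__ == "__main__":
    for r in audit_subgroups([1] * 6, [1, 1, 1, 1, 0, 0], ["F", "F", "F", "M", "M", "M"]):
        print(r)
```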

Health

Method | Endpoint | Description
GET | / | Status check ({"status": "ok"})
GET | /health | Health probe ({"status": "healthy"})
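A client-side probe against GET /health can be written with the standard library alone, for example to wait for the container before running smoke tests. The function name, the default base URL (the local uvicorn port from Quick Start), and the timeout are assumptions; any connection error, non-200 status, or unexpected body counts as unhealthy.

```python
import json
from urllib.request import urlopen


def check_health(base_url: str = "http://localhost:8001", timeout: float = 2.0) -> bool:
    """Return True only if GET {base_url}/health answers 200 with {"status": "healthy"}."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError and timeouts; ValueError covers bad JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


if __name__ == "__main__":
    print("healthy" if check_health() else "unreachable or unhealthy")
```

Against the Docker image, the same call would use http://localhost:7860 as the base URL.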

Testing

The project includes a pytest suite of 178 tests across 6 test files, covering Steps 1-3, 6, and 7 of the pipeline plus PDF certificate generation.

cd backend

# Run all tests
pytest -v

# Run a specific test file
pytest -v tests/test_step1_clinical_context.py

# Run only slow tests (domain context validation)
pytest -v -m slow

Test coverage:

Test File | Covers | Key Assertions
test_step1_clinical_context.py | Specialty registry | All 20 specialties present, required fields non-empty, clinical context > 50 chars, 404 handling
test_step2_data_exploration.py | Data exploration | CSV upload validation, missing value detection, class distribution, imbalance warnings
test_step3_data_preparation.py | Preprocessing | Missing strategies (median/mode/drop), normalization, train/test split, SMOTE, data leakage prevention
test_step6_explainability.py | SHAP explanations | Global importance, patient explanation, What-If analysis, sample patient selection
test_step7_ethics.py | Fairness audit | Ethics endpoint, case study severity, checklist toggle, bias detection thresholds
test_certificate.py | PDF generation | Certificate content type, PDF magic bytes, checklist state persistence

Total: 178 tests — all passing.


Deployment

HuggingFace Spaces

The production deployment runs on HuggingFace Spaces as a Docker container. The multi-stage Dockerfile builds the image in three stages:

  1. Stage 1 — Builds the React frontend with pnpm
  2. Stage 2 — Installs Python dependencies
  3. Stage 3 — Combines both into a slim Python 3.12 runtime serving the SPA + API on port 7860

hf-space/main_hf.py serves both the FastAPI backend and the static React build from a single process.

Live demo: huggingface.co/spaces/0xBatuhan4/HealthWithSevgi


Branch Strategy

Branch | Purpose
main | Production-ready, protected
develop | Integration branch for sprint work
feature/US-XXX | One branch per user story

Rules:

  • All changes go through Pull Requests (use the PR template)
  • PRs require at least 1 approval
  • main and develop are protected — no direct pushes
  • PR titles follow the format type(US-XXX): description, where type is feat, fix, or docs
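The title convention can be checked mechanically, for example in a CI step or a local hook. This is a hypothetical sketch: it assumes the US-XXX placeholder stands for a numeric story ID and that the description must be non-empty, and it is not a check the repository is known to run.

```python
import re

# type(US-<digits>): <non-empty description>; the numeric story ID is an assumption.
PR_TITLE = re.compile(r"(feat|fix|docs)\(US-\d+\): \S.*")


def is_valid_pr_title(title: str) -> bool:
    """Return True if the whole title matches the team's PR title convention."""
    return PR_TITLE.fullmatch(title) is not None


if __name__ == "__main__":
    for t in ["feat(US-012): add SHAP waterfall chart", "feature(US-12): wrong type"]:
        print(t, "->", is_valid_pr_title(t))
```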

Team

Role | Name | Student ID
Product Owner + Developer | Efe Çelik | 202128016
UX Designer | Burak Aydoğmuş | 202128028
Lead Developer + Scrum Master | Batuhan Bayazıt | 202228008
Developer | Berat Mert Gökkaya | 202228019
QA / Documentation Lead | Berfin Duru Alkan | 202228005

Links

  • Jira Board: Jira
  • Figma Designs: Figma
  • GitHub Wiki: Wiki
  • API Docs: http://localhost:8001/docs (when running locally)

License

This project is developed as part of the SENG 430 Software Quality Assurance course at Cankaya University. All rights reserved.
