An interactive, browser-based machine learning education tool for healthcare professionals.
SENG 430 - Software Quality Assurance | Cankaya University | Spring 2025-2026 | Instructor: Dr. Sevgi Koyuncu Tunç
HealthWithSevgi guides clinicians through a complete ML pipeline in 7 steps — from selecting a medical specialty to training a model, interpreting predictions with SHAP, and auditing fairness — all with zero coding required.
Live Demo | Jira Board | Figma Designs | Setup Guide
- Overview
- The 7-Step Pipeline
- Supported Specialties
- ML Models
- Tech Stack
- Architecture
- Project Structure
- Getting Started
- API Reference
- Testing
- Deployment
- Branch Strategy
- Team
- License
Healthcare professionals increasingly encounter AI/ML in clinical settings but rarely get hands-on experience with how these systems work. HealthWithSevgi bridges that gap by providing an intuitive, wizard-style interface that walks users through every stage of the machine learning lifecycle using real clinical datasets.
Key capabilities:
- 20 medical specialties with real-world clinical datasets (Cardiology, Oncology, Nephrology, Neurology, ICU/Sepsis, Dermatology, and more)
- 8 ML classifiers with interactive hyperparameter tuning via sliders
- SHAP-based explainability — global feature importance and single-patient waterfall explanations
- Fairness auditing — subgroup performance analysis across demographics with bias detection
- EU AI Act compliance checklist with downloadable PDF certificate
- No persistent server-side data storage: all session data is held in memory and evicted automatically
| Step | Name | What Happens |
|---|---|---|
| 1 | Clinical Context | Introduces the medical problem the AI will address. Displays the clinical question, why it matters, and the 7-step roadmap. |
| 2 | Data Exploration | Upload a CSV file (up to 50 MB) or load a built-in clinical dataset. Inspect column statistics, missing values, and class distribution. Confirm the target variable. |
| 3 | Data Preparation | Configure preprocessing: train/test split ratio, missing value strategy (median/mode/drop), normalization (z-score/min-max), SMOTE for class imbalance, and outlier handling (IQR/z-score clipping). |
| 4 | Model & Parameters | Choose from 8 ML models. Adjust hyperparameters with intuitive sliders. Optionally enable hyperparameter tuning (RandomizedSearchCV) and feature selection (VarianceThreshold + SelectKBest). |
| 5 | Results & Evaluation | View accuracy, sensitivity, specificity, precision, F1, AUC-ROC, and MCC. Explore interactive ROC curves, precision-recall curves, and confusion matrices. Detect overfitting via cross-validation comparison. |
| 6 | Explainability | Global feature importance ranking with clinical name mapping. Single-patient SHAP waterfall charts with plain-language summaries (e.g., "High glucose increases diabetes risk by 0.23"). |
| 7 | Ethics & Bias | Subgroup fairness audit (by age, gender, ethnicity). Bias warnings for performance gaps >10%. EU AI Act compliance checklist. Real-world case studies of AI bias in healthcare. Downloadable PDF compliance certificate. |
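The threshold-based metrics listed for Step 5 can all be derived from the four confusion-matrix counts. The following is a minimal sketch in plain Python, not the project's own implementation; the function name `binary_metrics` and the example counts are illustrative assumptions, and AUC-ROC is left out because it needs predicted scores rather than counts.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute common binary-classification metrics from confusion-matrix counts."""

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)  # recall, true-positive rate
    specificity = safe_div(tn, tn + fp)  # true-negative rate
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient: balanced even with skewed classes.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Illustrative counts only, not results from any dataset in this project.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are reported separately because, on imbalanced clinical data, accuracy alone can look high while most positive cases are missed.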
| # | Specialty | Prediction Task | Dataset | Samples |
|---|---|---|---|---|
| 1 | Cardiology | 30-day heart failure mortality | Heart Failure Clinical Records | ~300 |
| 2 | Radiology | Pneumonia detection (chest X-ray metadata) | NIH Chest X-ray | 100K+ |
| 3 | Nephrology | Chronic kidney disease detection | UCI CKD | 400 |
| 4 | Oncology - Breast | Malignant vs. benign biopsy | Wisconsin Breast Cancer | 569 |
| 5 | Neurology - Parkinson's | Parkinson's from voice biomarkers | UCI Parkinson's | 195 |
| 6 | Endocrinology - Diabetes | Diabetes onset within 5 years | Pima Indians | 768 |
| 7 | Hepatology - Liver | Liver disease detection | Indian Liver Patient | 583 |
| 8 | Cardiology - Stroke | Stroke risk prediction | Kaggle Stroke Prediction | 5,110 |
| 9 | Mental Health | Depression severity (PHQ-9) | Kaggle Depression | ~1,000 |
| 10 | Pulmonology - COPD | COPD exacerbation risk | PhysioNet + Kaggle | ~1,000 |
| 11 | Haematology - Anaemia | Anaemia type classification | Kaggle Anaemia | ~400 |
| 12 | Dermatology | Benign vs. malignant skin lesion | HAM10000 metadata | ~10K |
| 13 | Ophthalmology | Diabetic retinopathy detection | UCI Diabetic Retinopathy | 1,151 |
| 14 | Orthopaedics - Spine | Disc herniation / spondylolisthesis | UCI Vertebral Column | 310 |
| 15 | ICU / Sepsis | Sepsis onset within 6 hours | PhysioNet Sepsis | ~40K |
| 16 | Obstetrics - Fetal Health | Fetal health classification (CTG) | UCI Fetal Health | 2,126 |
| 17 | Cardiology - Arrhythmia | Arrhythmia detection (ECG) | UCI Arrhythmia | 452 |
| 18 | Oncology - Cervical | Cervical cancer risk | UCI Cervical Cancer | 858 |
| 19 | Thyroid / Endocrinology | Thyroid function classification | UCI Thyroid | 9,172 |
| 20 | Pharmacy - Readmission | Hospital readmission risk | UCI Diabetes 130-US | 101,766 |
| Model | Category | Key Hyperparameters |
|---|---|---|
| K-Nearest Neighbors | Instance-based | k (1-25), distance metric |
| Support Vector Machine | Boundary-based | C (0.01-100), kernel (linear/rbf/poly) |
| Decision Tree | Tree-based | max_depth (1-20), criterion (gini/entropy) |
| Random Forest | Ensemble | n_estimators (10-500), max_depth |
| Logistic Regression | Linear | C (0.001-100), solver (lbfgs/saga) |
| Naive Bayes | Probabilistic | var_smoothing (1e-12 to 1e-3) |
| XGBoost | Gradient Boosting | n_estimators, max_depth, learning_rate |
| LightGBM | Gradient Boosting | n_estimators, max_depth, learning_rate |
All models are trained with balanced class weights where supported. Optional hyperparameter tuning uses RandomizedSearchCV (20 iterations, 3-fold CV). Feature selection combines VarianceThreshold with SelectKBest (mutual information).
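As a rough illustration of how these pieces fit together in scikit-learn, the sketch below chains VarianceThreshold and SelectKBest (mutual information) in front of a class-weighted Decision Tree and tunes it with RandomizedSearchCV using 20 iterations and 3-fold CV. It is an assumption-laden example rather than the project's code: the synthetic data, `k=6`, and the step names are all made up for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_classif
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (assumption: not one of the clinical datasets).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, weights=[0.8, 0.2], random_state=0
)

pipeline = Pipeline(
    [
        ("variance", VarianceThreshold()),  # drop zero-variance features
        ("kbest", SelectKBest(mutual_info_classif, k=6)),  # k=6 is an arbitrary choice
        ("clf", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
    ]
)

# Search ranges mirror the Decision Tree row of the table above.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "clf__max_depth": list(range(1, 21)),
        "clf__criterion": ["gini", "entropy"],
    },
    n_iter=20,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping feature selection inside the pipeline means it is refit on each training fold, so the held-out fold never influences which features are kept.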
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18, TypeScript, Vite | Single-page wizard application |
| UI Components | Recharts, Lucide Icons, react-dropzone | Charts, icons, file uploads |
| State Management | TanStack React Query | Server state caching and synchronization |
| Backend | FastAPI, Python 3.12 | REST API with auto-generated OpenAPI docs |
| ML Engine | scikit-learn, XGBoost, LightGBM | Model training, evaluation, cross-validation |
| Explainability | SHAP | TreeExplainer (tree models), KernelExplainer (linear), permutation importance |
| Data Processing | pandas, numpy, imbalanced-learn | Data cleaning, normalization, SMOTE |
| PDF Generation | ReportLab | Compliance certificate export |
| Containerization | Docker (multi-stage) | Production deployment |
| Hosting | HuggingFace Spaces | Live demo environment |
| Package Manager | pnpm (frontend), pip (backend) | Dependency management |
📐 Full Architecture Diagrams (Google Drive) — C4 model diagrams (System Context, Container, Component, Code levels), toolchain diagrams, and data flow sequences.
                   +---------------------+
                   |   Browser (React)   |
                   |   Wizard UI (SPA)   |
                   +----------+----------+
                              |
                      HTTP/REST (JSON)
                              |
                   +----------v----------+
                   |   FastAPI Backend   |
                   +----------+----------+
                              |
         +------------+-------+-----+--------------+
         |            |             |              |
+--------v---+ +------v-----+ +-----v------+ +-----v--------+
| DataService| | MLService  | | ExplainSvc | | EthicsService|
|            | |            | |            | |              |
| - Explore  | | - Train    | | - SHAP     | | - Subgroup   |
| - Prepare  | | - Evaluate | | - Waterfall| | - Bias detect|
| - SMOTE    | | - Compare  | | - Clinical | | - EU AI Act  |
+-----+------+ +------+-----+ +------+-----+ +------+-------+
      |               |              |              |
      v               v              v              v
+-----------+  +------------+ +------------+  +-----------+
| In-Memory |  | In-Memory  | | SHAP       |  | ReportLab |
| Sessions  |  | Models     | | Library    |  | PDF Gen   |
| (LRU 50)  |  | (LRU 100+) | |            |  |           |
+-----------+  +------------+ +------------+  +-----------+
Data flow: Upload CSV -> Explore columns -> Preprocess (split, normalize, SMOTE) -> Train model -> Evaluate metrics -> SHAP explanations -> Fairness audit -> PDF certificate
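The diagram labels the in-memory session and model stores as LRU caches. One way such a store could work is sketched below using only the standard library; the class name `LRUSessionStore` and its methods are hypothetical and are not taken from the project's services.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUSessionStore:
    """Keep at most `capacity` sessions in memory, evicting the least recently used."""

    def __init__(self, capacity: int = 50) -> None:
        self.capacity = capacity
        self._items: OrderedDict[str, Any] = OrderedDict()

    def put(self, session_id: str, payload: Any) -> None:
        # Insert or update, then mark the session as most recently used.
        self._items[session_id] = payload
        self._items.move_to_end(session_id)
        # Evict from the oldest end until the store fits its capacity again.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def get(self, session_id: str) -> Optional[Any]:
        if session_id not in self._items:
            return None
        # A read also counts as a use, so refresh the session's position.
        self._items.move_to_end(session_id)
        return self._items[session_id]

    def __len__(self) -> int:
        return len(self._items)


# Tiny capacity so the eviction is visible.
store = LRUSessionStore(capacity=2)
store.put("a", {"rows": 300})
store.put("b", {"rows": 400})
store.get("a")                 # "a" becomes the most recently used
store.put("c", {"rows": 569})  # evicts "b", the least recently used
print(store.get("b"))          # None
```

Because nothing is written to disk, an evicted session is simply gone, which is consistent with the tool's "no data storage" stance but means long-idle users may need to re-run earlier steps.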
HealthWithSevgi/
|
+-- frontend/ # React 18 + Vite + TypeScript
| +-- src/
| | +-- pages/ # Step 1-7 wizard pages
| | | +-- Step1ClinicalContext.tsx
| | | +-- Step2DataExploration.tsx
| | | +-- Step3DataPreparation.tsx
| | | +-- Step4ModelParameters.tsx
| | | +-- Step5Results.tsx
| | | +-- Step6Explainability.tsx
| | | +-- Step7Ethics.tsx
| | +-- components/ # Reusable UI components
| | | +-- NavBar.tsx # Specialty switcher, glossary
| | | +-- WizardProgress.tsx # Step progress tracker
| | | +-- SpecialtySelector.tsx # 20-specialty grid
| | | +-- ColumnMapperModal.tsx # Target column confirmation
| | | +-- ErrorModal.tsx # Error display modal
| | | +-- charts/ # Visualization components
| | |   +-- ConfusionMatrixChart.tsx # 2x2 confusion matrix
| | |   +-- KNNScatterCanvas.tsx # KNN decision boundary
| | |   +-- PRCurveChart.tsx # Precision-Recall curve
| | |   +-- ROCCurveChart.tsx # ROC curve with AUC badge
| | +-- api/ # API client layer
| | | +-- client.ts # Axios instance + interceptors
| | | +-- specialties.ts # Specialty endpoints
| | | +-- data.ts # Explore + Prepare endpoints
| | | +-- ml.ts # Train + Compare endpoints
| | | +-- explain.ts # Explainability + Ethics + Certificate
| | +-- types/index.ts # Shared TypeScript interfaces
| | +-- styles/globals.css # Global CSS + theme variables
| | +-- App.tsx # Main wizard state manager
| | +-- main.tsx # Application entry point
| +-- package.json
| +-- vite.config.ts
|
+-- backend/ # FastAPI REST API + ML engine
| +-- app/
| | +-- main.py # FastAPI setup, CORS, routers
| | +-- routers/
| | | +-- data_router.py # /specialties, /explore, /prepare
| | | +-- ml_router.py # /train, /compare, /models
| | | +-- explain_router.py # /explain/*, /ethics, /certificate
| | +-- services/
| | | +-- data_service.py # Dataset loading, exploration, preprocessing
| | | +-- ml_service.py # Model building, training, evaluation
| | | +-- explain_service.py # SHAP explanations, clinical mapping
| | | +-- ethics_service.py # Fairness audit, bias detection
| | | +-- certificate_service.py # PDF certificate generation
| | | +-- specialty_registry.py # 20 specialty definitions + datasets
| | +-- models/
| | | +-- schemas.py # Data exploration/preparation DTOs
| | | +-- ml_schemas.py # Training/evaluation DTOs
| | | +-- explain_schemas.py # Explainability/ethics DTOs
| | +-- utils/ # Utility modules
| +-- data_cache/ # Cached clinical CSV datasets
| +-- datasets/ # Additional dataset storage
| +-- tests/ # pytest test suite (178 tests)
| | +-- conftest.py # Shared fixtures
| | +-- test_step1_clinical_context.py
| | +-- test_step2_data_exploration.py
| | +-- test_step3_data_preparation.py
| | +-- test_step6_explainability.py
| | +-- test_step7_ethics.py
| | +-- test_certificate.py
| +-- pytest.ini
| +-- requirements.txt
|
+-- hf-space/ # HuggingFace Spaces deployment
| +-- main_hf.py # Combined API + SPA entrypoint
| +-- Dockerfile # HF-specific Docker build
| +-- README.md # HF Space metadata
|
+-- docs/ # Documentation & design specs
| +-- ML_Tool_User_Guide.md # Course user manual
| +-- Sprint_1_Assignment.md # Sprint 1 requirements
| +-- Clinical_Specialties_Dataset_Collection.pdf
| +-- diagrams/ # C4 architecture + toolchain PDFs
| +-- drawio/ # Editable draw.io source files
| +-- mermaid/ # C4 architecture (Mermaid source)
| +-- iso42001/ # ISO 42001 AI governance report
| +-- seng430-sprints/ # Sprint requirements from instructor
| +-- qa/ # QA test reports (PDF)
| +-- reports/ # Progress reports + screenshots
|
+-- jira/ # Jira backlog documentation
| +-- JIRA.md # Product backlog report
| +-- SPRINT_1_TASK_BOARD.md # Sprint 1 task breakdown
|
+-- local/ # Local-only extensions
| +-- model-arena/ # Model Arena comparison feature
|   +-- arena/ # Backend (router, service, schemas)
|   +-- frontend/ # Frontend (ArenaPage, charts, hooks)
|
+-- .github/
| +-- pull_request_template.md # PR template linked to Jira
| +-- workflows/deploy-hf.yml # Auto-deploy to HuggingFace on release
|
+-- Dockerfile # Multi-stage build (Node + Python)
+-- docker-compose.yml # Local development orchestration
+-- .dockerignore
+-- .gitignore
+-- CLAUDE.md # AI coding assistant context
+-- SETUP.md # Local development setup guide
+-- README.md
The application is deployed on HuggingFace Spaces — no installation required:
➡️ huggingface.co/spaces/0xBatuhan4/HealthWithSevgi
Pull and run the pre-built container image from GitHub Container Registry:
docker run -p 7860:7860 ghcr.io/eudalabs/healthwithsevgi:latest

Open http://localhost:7860. That's it.
Alternatively, build from source:
git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker build -t healthwithsevgi .
docker run -p 7860:7860 healthwithsevgi

For local development with Docker Compose:

git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker-compose up --build

This starts both the backend API and frontend dev server with hot-reload.
| Tool | Version | Required For |
|---|---|---|
| Python | >= 3.10 | Backend |
| Node.js | >= 18 | Frontend |
| Git | latest | Version control |
Backend:
cd backend
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Start the API server
uvicorn app.main:app --reload --port 8001

API docs available at: http://localhost:8001/docs (Swagger UI)
Frontend (in a separate terminal):
cd frontend
# Install dependencies
pnpm install
# Start the dev server
pnpm dev

App available at: http://localhost:5173 (proxies /api requests to port 8001)
Create a .env file in the project root:
# Backend
BACKEND_PORT=8001
DEBUG=true
# Frontend (Vite uses VITE_ prefix)
VITE_API_URL=http://localhost:8001

All endpoints are prefixed with /api. Full interactive documentation is available at /docs when the backend is running.
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/specialties` | List all 20 specialties |
| `GET` | `/api/specialties/{id}` | Get specialty details (description, features, clinical context) |
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/explore` | Upload CSV or load built-in dataset; returns column stats + class distribution |
| `POST` | `/api/prepare` | Preprocess data (split, normalize, SMOTE); returns session_id |
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/train` | Train a model; returns model_id + evaluation metrics |
| `POST` | `/api/compare/{model_id}` | Add model to comparison table |
| `GET` | `/api/compare/{session_id}` | Get all compared models for a session |
| `DELETE` | `/api/compare/{session_id}` | Clear comparison table |
| `GET` | `/api/models/{model_id}` | Get model metadata |
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/explain/global/{model_id}` | Global feature importance (top 10 features + clinical names) |
| `GET` | `/api/explain/patient/{model_id}/{index}` | Single-patient SHAP waterfall explanation |
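The plain-language summaries that accompany a single-patient explanation (such as "High glucose increases diabetes risk by 0.23") could be produced from per-feature contribution values along the following lines. This is only a sketch: it assumes the SHAP values have already been computed and mapped to clinical names, and the function name and the example values other than the glucose one are invented for illustration.

```python
def summarize_contributions(
    contributions: dict[str, float], outcome: str, top_n: int = 3
) -> list[str]:
    """Turn signed per-feature contributions into short plain-language sentences."""
    # Rank by absolute impact so strong protective factors are not hidden.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    sentences = []
    for feature, value in ranked[:top_n]:
        if value == 0:
            continue  # a zero contribution has no direction to report
        direction = "increases" if value > 0 else "decreases"
        sentences.append(f"{feature} {direction} {outcome} risk by {abs(value):.2f}")
    return sentences


# Hypothetical contributions for one patient (clinical names already applied).
example = {"High glucose": 0.23, "Young age": -0.08, "BMI": 0.02}
summary = summarize_contributions(example, outcome="diabetes", top_n=2)
for sentence in summary:
    print(sentence)
```

Sorting by absolute value keeps the summary aligned with a waterfall chart, where the largest bars appear first regardless of sign.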
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/ethics/{model_id}` | Subgroup fairness audit + bias warnings + checklist |
| `POST` | `/api/ethics/checklist` | Update EU AI Act checklist item |
| `POST` | `/api/certificate` | Generate and download PDF compliance certificate |
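To make the subgroup audit behind the ethics endpoint more concrete, here is a small sketch of a per-group sensitivity comparison with a 10% gap threshold. The README does not say which baseline the gap is measured against, so comparing each group to the best-performing group is an assumption, as are the function names and the toy records.

```python
from typing import Iterable


def sensitivity_by_group(records: Iterable[tuple[str, int, int]]) -> dict[str, float]:
    """records holds (group, y_true, y_pred) triples with binary 0/1 labels."""
    counts: dict[str, list[int]] = {}
    for group, y_true, y_pred in records:
        if y_true != 1:
            continue  # sensitivity only looks at actual positives
        detected_total = counts.setdefault(group, [0, 0])
        detected_total[0] += int(y_pred == 1)
        detected_total[1] += 1
    return {group: hit / total for group, (hit, total) in counts.items()}


def bias_warnings(per_group: dict[str, float], threshold: float = 0.10) -> list[str]:
    """Flag groups trailing the best group by more than `threshold` (assumed baseline)."""
    best = max(per_group.values())
    return [group for group, score in per_group.items() if best - score > threshold]


# Toy predictions: 10 positive patients per group, invented for illustration.
records = (
    [("female", 1, 1)] * 9 + [("female", 1, 0)] * 1
    + [("male", 1, 1)] * 7 + [("male", 1, 0)] * 3
)
per_group = sensitivity_by_group(records)
print(per_group)                 # {'female': 0.9, 'male': 0.7}
print(bias_warnings(per_group))  # ['male']
```

With very small subgroups a single patient can move sensitivity by several points, so a real audit would likely also report the group sizes next to each warning.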
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/` | Status check (`{status: "ok"}`) |
| `GET` | `/health` | Health probe (`{status: "healthy"}`) |
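For scripted access to the `/api` endpoints listed above, a thin client built on the standard library might look like the sketch below. The helper names are hypothetical, the base URL assumes the local backend on port 8001, and the specialty identifier `cardiology` is a guess at the ID format, which the tables do not specify.

```python
import json
from typing import Any
from urllib.parse import quote, urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8001"  # assumption: backend started locally on port 8001


def api_url(path: str, base_url: str = BASE_URL) -> str:
    """Build a full URL for a path under the /api prefix."""
    return urljoin(base_url.rstrip("/") + "/", "api/" + path.lstrip("/"))


def specialty_url(specialty_id: str) -> str:
    """URL for GET /api/specialties/{id}; the id is escaped to stay a single segment."""
    return api_url("specialties/" + quote(specialty_id, safe=""))


def get_json(url: str, timeout: float = 10.0) -> Any:
    """Issue a GET request and decode the JSON body (requires a running backend)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(api_url("specialties"))       # http://localhost:8001/api/specialties
print(specialty_url("cardiology"))  # http://localhost:8001/api/specialties/cardiology
# With the backend running, the list could be fetched via:
# specialties = get_json(api_url("specialties"))
```

Request bodies for the `POST` endpoints are not shown here because their schemas are not documented in this README; the Swagger UI at `/docs` is the place to look them up.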
The backend ships with a pytest suite of 178 tests across 6 test files. The files cover Steps 1, 2, 3, 6, and 7 of the pipeline, plus PDF certificate generation.
cd backend
# Run all tests
pytest -v
# Run a specific test file
pytest -v tests/test_step1_clinical_context.py
# Run only slow tests (domain context validation)
pytest -v -m slow

Test coverage:
| Test File | Covers | Key Assertions |
|---|---|---|
| `test_step1_clinical_context.py` | Specialty registry | All 20 specialties present, required fields non-empty, clinical context > 50 chars, 404 handling |
| `test_step2_data_exploration.py` | Data exploration | CSV upload validation, missing value detection, class distribution, imbalance warnings |
| `test_step3_data_preparation.py` | Preprocessing | Missing strategies (median/mode/drop), normalization, train/test split, SMOTE, data leakage prevention |
| `test_step6_explainability.py` | SHAP explanations | Global importance, patient explanation, What-If analysis, sample patient selection |
| `test_step7_ethics.py` | Fairness audit | Ethics endpoint, case study severity, checklist toggle, bias detection thresholds |
| `test_certificate.py` | PDF generation | Certificate content type, PDF magic bytes, checklist state persistence |
Total: 178 tests — all passing.
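As an example of the kind of check the certificate tests describe, the sketch below shows a "PDF magic bytes" assertion in pytest style. It does not reproduce the project's actual tests: the helper `looks_like_pdf` and the stand-in payload are assumptions, and a real test would inspect the body returned by the certificate endpoint instead.

```python
PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def looks_like_pdf(payload: bytes) -> bool:
    """Return True when the payload begins with the PDF file signature."""
    return payload.startswith(PDF_MAGIC)


def test_certificate_has_pdf_magic_bytes() -> None:
    # Stand-in bytes; a real test would use the certificate response body.
    fake_certificate = b"%PDF-1.4\n% stand-in content, not a real certificate\n"
    assert looks_like_pdf(fake_certificate)
    # An HTML error page served by mistake must not pass as a PDF.
    assert not looks_like_pdf(b"<html>error</html>")


test_certificate_has_pdf_magic_bytes()
```

Checking the first bytes is cheap and catches the common failure where an endpoint returns a JSON or HTML error with a PDF content type.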
The production deployment runs on HuggingFace Spaces as a Docker container. The multi-stage Dockerfile:
- Stage 1 — Builds the React frontend with pnpm
- Stage 2 — Installs Python dependencies
- Stage 3 — Combines both into a slim Python 3.12 runtime serving the SPA + API on port 7860
hf-space/main_hf.py serves both the FastAPI backend and the static React build from a single process.
Live demo: huggingface.co/spaces/0xBatuhan4/HealthWithSevgi
| Branch | Purpose |
|---|---|
| `main` | Production-ready, protected |
| `develop` | Integration branch for sprint work |
| `feature/US-XXX` | One branch per user story |
Rules:
- All changes go through Pull Requests (use the PR template)
- PRs require at least 1 approval
- `main` and `develop` are protected; no direct pushes
- PR titles follow: `feat/fix/docs(US-XXX): description`
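The PR title convention above lends itself to an automated check. A possible validator is sketched below; it assumes that `XXX` stands for a numeric user-story ID and that only the three listed prefixes are allowed, neither of which is spelled out in the rules, and the function name is hypothetical.

```python
import re

# Assumptions: "XXX" is one or more digits; feat, fix, and docs are the only types.
PR_TITLE_PATTERN = re.compile(r"^(feat|fix|docs)\(US-\d+\): \S.*$")


def is_valid_pr_title(title: str) -> bool:
    """Check a PR title against the 'type(US-XXX): description' convention."""
    return PR_TITLE_PATTERN.match(title) is not None


for title in [
    "feat(US-012): add ROC curve chart",
    "docs(US-3): update setup guide",
    "update readme",
    "fix(US-7):",
]:
    print(f"{title!r}: {is_valid_pr_title(title)}")
```

A check like this could run in CI against the PR title, so reviewers do not have to enforce the format by hand.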
| Role | Name | Student ID |
|---|---|---|
| Product Owner + Developer | Efe Çelik | 202128016 |
| UX Designer | Burak Aydoğmuş | 202128028 |
| Lead Developer + Scrum Master | Batuhan Bayazıt | 202228008 |
| Developer | Berat Mert Gökkaya | 202228019 |
| QA / Documentation Lead | Berfin Duru Alkan | 202228005 |
- Jira Board: Jira
- Figma Designs: Figma
- GitHub Wiki: Wiki
- API Docs: `http://localhost:8001/docs` (when running locally)
This project is developed as part of the SENG 430 Software Quality Assurance course at Cankaya University. All rights reserved.