AI-powered risk assessment system for California healthcare facilities
CareEnforced AI is a dual-track machine learning platform that predicts regulatory enforcement risk for California healthcare facilities. It analyzes financial, operational, and staffing data to help facility administrators proactively identify compliance issues before penalties occur.
Click on the thumbnail below to play the full live demo recording of CareEnforced AI.
- Long-Term Care Facilities - Nursing homes, skilled nursing facilities
  - Best Model: Enhanced XGBoost (ROC-AUC: 0.8035)
  - Features: 248 operational, financial, and staffing metrics
  - Performance: 66.27% precision, 54.11% recall
- Hospitals - Acute care hospitals, medical centers
  - Best Model: LightGBM (ROC-AUC: 0.8021)
  - Features: 383 revenue, patient day, and utilization metrics
  - Performance: 52.94% precision, 30.00% recall
- Real-time Risk Assessment - Instant predictions via FastAPI backend
- Interactive Web Interface - React/Vite frontend with dynamic forms
- Actionable Recommendations - AI-generated compliance improvement suggestions
- Feature Attribution - Understand which factors drive risk scores
- Dual Facility Support - Separate models optimized for hospitals vs. long-term care
| Model | ROC-AUC | F1-Score | Precision | Recall | Status |
|---|---|---|---|---|---|
| Enhanced XGBoost | 0.8035 | 0.5957 | 0.6627 | 0.5411 | Recommended |
| Ensemble (3-model) | 0.8054 | 0.6108 | 0.6231 | 0.5990 | Available |
| Enhanced Logistic Regression | 0.7661 | 0.6034 | 0.5447 | 0.6763 | High Recall |
| Enhanced Random Forest | 0.7812 | 0.5514 | 0.6258 | 0.4928 | Available |
| Baseline Random Forest | 0.7861 | 0.5376 | 0.6691 | 0.4493 | Legacy |
Recommendation: Use Enhanced XGBoost for production (best balance of precision/recall, no geographic bias)
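The F1 column in the table above is the harmonic mean of the precision and recall columns. The following minimal sketch recomputes it from the reported values as a consistency check. The function name `f1_score` and the `reported` dictionary are illustrative, not code from this repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the long-term care table.
reported = {
    "Enhanced XGBoost": (0.6627, 0.5411, 0.5957),
    "Ensemble (3-model)": (0.6231, 0.5990, 0.6108),
    "Enhanced Logistic Regression": (0.5447, 0.6763, 0.6034),
    "Enhanced Random Forest": (0.6258, 0.4928, 0.5514),
    "Baseline Random Forest": (0.6691, 0.4493, 0.5376),
}

for name, (precision, recall, f1) in reported.items():
    # Small differences are expected because the inputs are already rounded.
    print(f"{name}: computed {f1_score(precision, recall):.4f}, reported {f1:.4f}")
```

Each recomputed value agrees with the table to within rounding of the four-decimal inputs.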
| Model | Accuracy | ROC-AUC | Precision | Recall |
|---|---|---|---|---|
| LightGBM | 0.8585 | 0.8021 | 0.5294 | 0.3000 |
| Random Forest | 0.8439 | 0.7950 | 0.4000 | 0.1333 |
| XGBoost | 0.8537 | 0.7644 | 0.5000 | 0.3000 |
| Logistic Regression | 0.8195 | 0.7036 | 0.3871 | 0.4000 |
- Geographic Bias Removed: All enhanced models exclude Los Angeles County features for fairness
- Recall Trade-offs: Enhanced models prioritized catching violations (recall) over minimizing false alarms (precision)
- Ensemble Benefits: 3-model ensemble offers highest overall performance but requires 3x storage/inference time
- Feature Engineering Impact: Derived staffing ratios and financial margins significantly improved predictions
- Class Imbalance Handling: Class weights and hyperparameter tuning improved minority class detection (+48% recall)
- Proactive Monitoring → Enhanced XGBoost (balanced precision/recall)
- Regulatory Enforcement → Enhanced Logistic Regression (high recall 67.63%)
- Mission-Critical Decisions → Ensemble (highest accuracy, confidence scoring)
- Research/Analysis → Enhanced Random Forest (interpretable feature importances)
Detailed Analysis: See docs/model_comparison.md and docs/random_forest_model_details.md
Frontend
- React 18 with Vite
- React Router for navigation
- Recharts for visualizations
- TailwindCSS for styling
Backend
- FastAPI (Python 3.9+)
- scikit-learn, XGBoost, LightGBM
- Pandas, NumPy for data processing
- Uvicorn ASGI server
Machine Learning
- Random Forest, XGBoost, LightGBM, Logistic Regression
- Ensemble voting classifier
- SHAP for model interpretability
- RandomizedSearchCV for hyperparameter tuning
    User Input (Web Form)
        ↓
    Frontend (React) → API Request
        ↓
    Backend (FastAPI) → /predict endpoint
        ↓
    Model Pipeline:
        1. Data Validation
        2. Feature Engineering
        3. Preprocessing (Imputation, Scaling, Encoding)
        4. Model Prediction (XGBoost/LightGBM)
        5. Feature Attribution
        6. Recommendation Generation
        ↓
    JSON Response → Frontend
        ↓
    Results Display (Risk Score, Drivers, Recommendations)
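Step 2 of the pipeline above (feature engineering) derives ratios such as staffing hours per patient day and financial margins. The sketch below illustrates the idea with two such ratios. The input keys (`rn_hours`, `patient_days`, `net_income`, `total_revenue`) and the output names are hypothetical placeholders, not the column names used in the actual dataset or scripts.

```python
from typing import Optional


def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None  # left missing so a later imputation step can fill it
    return numerator / denominator


def add_derived_features(record: dict) -> dict:
    """Return a copy of the record with two illustrative derived ratios added."""
    out = dict(record)
    # Hypothetical field names; the real columns may differ.
    out["rn_hours_per_patient_day"] = safe_ratio(
        record.get("rn_hours"), record.get("patient_days")
    )
    out["net_income_margin"] = safe_ratio(
        record.get("net_income"), record.get("total_revenue")
    )
    return out


example = add_derived_features(
    {"rn_hours": 850.0, "patient_days": 1000, "net_income": -30000.0, "total_revenue": 1_000_000.0}
)
print(example["rn_hours_per_patient_day"], example["net_income_margin"])
```

Returning `None` for an undefined ratio, rather than zero, keeps "no data" distinct from "zero staffing" and defers the decision to the imputation step.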
    FinalProject/
    │
    ├── backend/                              # FastAPI backend server
    │   ├── main.py                           # API endpoints (/predict, /predict-hospital)
    │   ├── model.py                          # Long-term care model wrapper
    │   ├── ensemble_model.py                 # Ensemble prediction logic
    │   ├── requirements.txt                  # Python dependencies
    │   ├── top_features.json                 # Feature importance metadata
    │   └── venv/                             # Python virtual environment
    │
    ├── frontend/                             # React/Vite frontend
    │   ├── src/
    │   │   ├── pages/
    │   │   │   ├── HospitalInput.jsx         # Hospital prediction form
    │   │   │   └── LongTermInput.jsx         # Long-term care form
    │   │   ├── components/
    │   │   │   ├── ResultsPage.jsx           # Prediction results display
    │   │   │   └── ...
    │   │   └── App.jsx                       # Main application router
    │   ├── package.json                      # Node.js dependencies
    │   └── vite.config.js                    # Vite build configuration
    │
    ├── models/                               # Trained ML models (PKL files)
    │   ├── xgboost_model.pkl                 # Long-term care (recommended)
    │   ├── risk_model_enhanced.pkl           # Symlink to XGBoost
    │   ├── random_forest_model.pkl           # Long-term care RF
    │   ├── logistic_regression_model.pkl     # Long-term care LR
    │   ├── ensemble_metadata.json            # Ensemble weights
    │   ├── hospital_lightgbm_model.pkl       # Hospital model (recommended)
    │   └── hospital_model_metadata.json      # Hospital feature mapping
    │
    ├── scripts/                              # Training & analysis scripts
    │   ├── train_model_enhanced.py           # Main training (long-term care)
    │   ├── train_hospital_model_for_app.py   # Hospital model training
    │   ├── train_hospital_models.py          # Hospital model comparison
    │   ├── compare_models.py                 # Model performance comparison
    │   ├── visualize_model_performance.py    # Performance visualizations
    │   ├── visualize_shap.py                 # SHAP interpretability
    │   ├── clean_data.py                     # Data preprocessing
    │   ├── merge_data.py                     # Merge 2022 + 2024 datasets
    │   └── ...                               # Additional utilities
    │
    ├── data/                                 # Datasets
    │   ├── raw/                              # Original Excel files
    │   │   ├── longterm_care_v2_2022.xlsx
    │   │   ├── longterm_care_v2_2024.xlsx
    │   │   ├── hospital_v2_2022.xlsx
    │   │   └── hospital_v2_2024.xlsx
    │   └── processed/                        # Cleaned CSV files
    │       ├── longterm_care_cleaned.csv     # 248 features, 4,831 facilities
    │       ├── hospital_cleaned.csv          # 383 features, 1,025 hospitals
    │       ├── longterm_care_22_24.csv       # Merged raw data
    │       └── hospital_22_24.csv            # Merged raw hospital data
    │
    ├── docs/                                 # Long-term care documentation
    │   ├── model_comparison.md               # Comprehensive model guide
    │   ├── random_forest_model_details.md    # RF architecture & performance
    │   ├── cleaned_data_dictionary.md        # Feature definitions
    │   ├── geographic_feature_removal_summary.md  # Bias mitigation
    │   ├── workflow_explanation.md           # Development timeline
    │   └── ...
    │
    ├── docs-hospitals/                       # Hospital documentation
    │   ├── hospital_model_comparison.md      # Hospital model performance
    │   └── hospital_data_dictionary.md       # Hospital feature definitions
    │
    ├── config/                               # Configuration files
    │   └── feature_mapping.json              # Feature name translations
    │
    ├── output/                               # Analysis outputs
    │   └── missing_data_columns.md           # Data quality report
    │
    ├── start_app.sh                          # One-command app launcher
    ├── README.md                             # This file
    └── DEPLOYMENT.md                         # Deployment instructions (Render, Vercel)
- Python 3.9+
- Node.js 18+
- Git (to clone the repository)
- Clone the repository:

      git clone https://github.com/christopheroueis/IAI-final-CAHealthEnforcement.git
      cd IAI-final-CAHealthEnforcement

- Make the start script executable (first time only):

      chmod +x start_app.sh

- Run the app:

      ./start_app.sh

- Open your browser to http://localhost:5173
Backend:

    cd backend
    ../.venv/bin/python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend (in a new terminal):

    cd frontend
    npm install  # First time only
    npm run dev -- --host

Note: The backend uses a shared virtual environment at .venv/ in the project root.
| Script | Purpose | Output |
|---|---|---|
| scripts/train_model_enhanced.py | Main training script for long-term care models | xgboost_model.pkl, random_forest_model.pkl, logistic_regression_model.pkl |
| scripts/train_hospital_model_for_app.py | Hospital model training | hospital_lightgbm_model.pkl |
| scripts/compare_models.py | Benchmarks all models | Performance comparison reports |
| scripts/visualize_model_performance.py | Generates performance charts | ROC curves, confusion matrices |
| scripts/extract_top_features.py | Extracts feature importances | top_features.json |
To retrain the long-term care models:

    cd scripts
    ../.venv/bin/python train_model_enhanced.py

To retrain the hospital model:

    cd scripts
    ../.venv/bin/python train_hospital_model_for_app.py

Note: Training requires the cleaned datasets in data/processed/. Models are saved to the models/ directory.
- Data Loading → data/processed/longterm_care_cleaned.csv or hospital_cleaned.csv
- Feature Engineering → Staffing ratios, financial margins, derived metrics
- Preprocessing → Imputation (mean/mode), StandardScaler, OneHotEncoder
- Hyperparameter Tuning → RandomizedSearchCV with 3-fold cross-validation
- Model Training → Fit on 80% train set with class weights
- Evaluation → Test on 20% holdout set
- Serialization → Save trained pipeline as .pkl file
Detailed Training Documentation: See docs/workflow_explanation.md
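The training step above fits with class weights to counter class imbalance. The source does not say which weighting scheme the scripts use, so the sketch below assumes the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, which is also the formula behind scikit-learn's `class_weight="balanced"` option. The label counts are made up for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency in the labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels only: 85 facilities without penalties, 15 with.
labels = [0] * 85 + [1] * 15
weights = balanced_class_weights(labels)
print(weights)
```

The rarer class gets the larger weight, so misclassifying a penalized facility costs more during training than misclassifying an unpenalized one.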
Source: California Department of Health Care Access and Information (HCAI)
Files: longterm_care_v2_2022.xlsx, longterm_care_v2_2024.xlsx
Records: 4,831 facilities (2022 + 2024 combined)
Features: 248 after cleaning (606 original, 358 dropped due to >50% missing data)
Key Feature Categories:
- Financial: Revenue, expenses, net income, margins
- Operational: Licensed beds, patient days, discharge patterns
- Staffing: RN/LVN/CNA hours per patient day
- Geographic: County, Health Service Area (HSA)
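The cleaning rule mentioned above, dropping columns with more than 50% missing data, can be sketched as follows. This stdlib-only version works on a list of dictionaries and treats `None` as missing. The project's own clean_data.py is not shown in this README and may work differently, and the function and column names here are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def drop_sparse_columns(
    rows: List[Dict[str, Any]], max_missing: float = 0.5
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Drop columns whose share of missing (None) values exceeds max_missing."""
    if not rows:
        return [], []
    columns = list(rows[0].keys())  # assumes every row has the same keys
    n_rows = len(rows)
    dropped = [
        col
        for col in columns
        if sum(1 for row in rows if row.get(col) is None) / n_rows > max_missing
    ]
    dropped_set = set(dropped)
    kept = [{k: v for k, v in row.items() if k not in dropped_set} for row in rows]
    return kept, dropped


# Toy data: "b" is 75% missing (dropped), "c" is exactly 50% missing (kept).
rows = [
    {"a": 1, "b": None, "c": None},
    {"a": 2, "b": None, "c": None},
    {"a": 3, "b": None, "c": 7},
    {"a": 4, "b": 5, "c": 8},
]
cleaned, dropped = drop_sparse_columns(rows)
print(dropped)
```

The comparison is strict, so a column that is exactly half missing survives, matching the ">50%" wording.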
Source: California HCAI Hospital Annual Financial Data
Files: hospital_v2_2022.xlsx, hospital_v2_2024.xlsx
Records: 1,025 hospitals
Features: 383 after cleaning (600 original, 217 dropped)
Key Feature Categories:
- Revenue: Gross patient revenue, net revenue by payer
- Utilization: Patient days, discharges, visits, length of stay
- Expenses: Daily, ancillary, administrative
- Staffing: Full-time equivalents (FTE)
Data Dictionaries:
- docs/cleaned_data_dictionary.md (Long-term care)
- docs-hospitals/hospital_data_dictionary.md (Hospitals)
The application can be deployed to cloud platforms like Render (recommended), Vercel, or Railway.
Full deployment instructions: See DEPLOYMENT.md
- Sign up at render.com
- Connect your GitHub repository
- Deploy Backend as a Web Service:
  - Root Directory: backend
  - Build Command: pip install -r requirements.txt
  - Start Command: uvicorn main:app --host 0.0.0.0 --port $PORT
- Deploy Frontend as a Static Site:
  - Root Directory: frontend
  - Build Command: npm install && npm run build
  - Publish Directory: dist
  - Environment Variable: VITE_API_URL=<your-backend-url>
- Model Comparison Guide - Comprehensive comparison of all models with use case recommendations
- Random Forest Model Details - Architecture, training, and performance
- Hospital Model Comparison - Hospital-specific model benchmarks
- Geographic Feature Removal - Bias mitigation approach
- Long-Term Care Data Dictionary - Feature definitions and meanings
- Hospital Data Dictionary - Hospital feature reference
- Column Dictionary - Abbreviated column name mappings
- Workflow Explanation - Development timeline and process
- Enhanced Model Performance - Results from hyperparameter tuning
- EDA Report - Exploratory data analysis findings
- Missing Data Analysis - Data quality assessment
Need the best overall accuracy?
→ Use Enhanced XGBoost (Long-term care) or LightGBM (Hospitals)
Need to catch as many violations as possible (high recall)?
→ Use Enhanced Logistic Regression (67.63% recall)
Need explainable predictions for regulators?
→ Use Logistic Regression (coefficient-based explanations)
Need maximum confidence in predictions?
→ Use 3-Model Ensemble (model agreement scoring)
Need fast inference and small model size?
→ Use Logistic Regression (~2MB, <10ms)
Detailed decision guide: docs/model_comparison.md
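The "model agreement scoring" idea behind the 3-model ensemble can be illustrated with weighted soft voting plus an agreement fraction. This is a sketch under assumptions: the actual logic in backend/ensemble_model.py and the weights in models/ensemble_metadata.json are not shown in this README, so the function name, equal weights, 0.5 threshold, and probabilities below are all hypothetical.

```python
from typing import Dict


def ensemble_predict(
    probabilities: Dict[str, float],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of model probabilities plus a model-agreement score."""
    total_weight = sum(weights[name] for name in probabilities)
    risk_score = sum(weights[name] * p for name, p in probabilities.items()) / total_weight

    # Each model casts a yes/no vote; agreement is the share voting with the majority.
    votes = [p >= threshold for p in probabilities.values()]
    majority = sum(votes) * 2 > len(votes)
    agreement = sum(1 for vote in votes if vote == majority) / len(votes)
    return {"risk_score": risk_score, "agreement": agreement}


# Hypothetical per-model probabilities and equal weights.
result = ensemble_predict(
    probabilities={"xgboost": 0.72, "random_forest": 0.61, "logistic_regression": 0.44},
    weights={"xgboost": 1.0, "random_forest": 1.0, "logistic_regression": 1.0},
)
print(result)
```

A low agreement value flags predictions where the models disagree, which is one way a "confidence" signal could be surfaced alongside the risk score.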
Base URL: http://localhost:8000
Endpoints:
- POST /predict - Long-term care facility risk prediction
- POST /predict-hospital - Hospital risk prediction
- GET /health - Health check endpoint
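A minimal client for these endpoints can be written with the Python standard library alone. The sketch below assumes the backend is running locally at the base URL above and accepts a flat JSON object of feature values. The helper names `build_predict_request` and `predict` are illustrative and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_predict_request(features: dict, path: str = "/predict") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a prediction endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, path: str = "/predict", timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(build_predict_request(features, path), timeout=timeout) as resp:
        return json.load(resp)


# Building the request has no side effects; call predict(...) to actually send it.
request = build_predict_request({"TOT_LIC_BEDS": 120, "PRDHR_RN_Per_Day": 0.85})
print(request.get_method(), request.full_url)
```

For hospitals, pass `path="/predict-hospital"` with the hospital feature set instead.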
Example Request (/predict):

    {
      "TOT_LIC_BEDS": 120,
      "PRDHR_RN_Per_Day": 0.85,
      "PRDHR_NA_Per_Day": 2.5,
      "Net_Income_Margin": -0.03,
      "COUNTY_x_Los Angeles": 0
    }

Example Response:

    {
      "risk_score": 0.68,
      "risk_level": "Medium",
      "top_risk_drivers": [
        {
          "feature": "RN Hours per Patient Day",
          "contribution": 14.2,
          "importance": 0.0132
        }
      ],
      "recommendations": [
        {
          "title": "Increase RN Staffing",
          "description": "Current RN hours are below recommended...",
          "impact": "high"
        }
      ]
    }

To demo on your phone:
- Ensure computer and phone are on the same Wi-Fi network
- Run ./start_app.sh
- Note the network URL displayed (e.g., http://192.168.1.x:5173)
- Open that URL on your phone's browser
Data Source: California Department of Health Care Access and Information (HCAI)
Course: 95891 - Introduction to Artificial Intelligence, Fall 2025
Institution: [Your Institution Name]
This project is for educational purposes as part of a university course assignment.
Last Updated: December 2025
Repository: https://github.com/christopheroueis/IAI-final-CAHealthEnforcement
Contact: Christopher Oueis