Healthcare Fraud Detection – CMS, Kaggle & Synthea Datasets

This project analyzes healthcare fraud patterns using three large-scale datasets:

  1. CMS Medicare Data – Public provider billing records with cost and service metrics.
  2. Kaggle Healthcare Fraud Dataset – Real-world claims data with fraud labels.
  3. Synthea Synthetic Data – Comprehensive synthetic EHR data including patients, conditions, and claims.

📊 Project Objectives:

  • Explore service volume and financial metrics across providers and states.
  • Detect patterns of excessive billing and service anomalies.
  • Prepare datasets for machine learning models focused on fraud detection.
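
As an illustration of the excessive-billing objective, here is a minimal sketch of one common approach: flagging providers whose average charge exceeds the upper interquartile-range fence within their state. The column names (`provider_id`, `state`, `avg_charge`), the helper `flag_excessive_billing`, and the toy values are all hypothetical and are not taken from the project's datasets or notebooks.

```python
import pandas as pd


def flag_excessive_billing(df, group_col="state", value_col="avg_charge", k=1.5):
    """Flag rows whose value exceeds Q3 + k * IQR within their group.

    Hypothetical helper: column names and the k=1.5 fence are assumptions.
    """
    grouped = df.groupby(group_col)[value_col]
    # transform broadcasts each group's quantile back onto its rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    upper_fence = q3 + k * (q3 - q1)
    out = df.copy()
    out["excessive"] = out[value_col] > upper_fence
    return out


# Toy data for illustration only (not real CMS records)
providers = pd.DataFrame({
    "provider_id": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "state": ["TX"] * 5 + ["CA"] * 3,
    "avg_charge": [100.0, 110.0, 105.0, 95.0, 500.0, 200.0, 210.0, 205.0],
})

flagged = flag_excessive_billing(providers)
print(flagged[flagged["excessive"]])
```

Computing the fence per state rather than globally keeps a provider in a high-cost state from being flagged merely for that state's price level; whether that grouping suits the actual data is a judgment call.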

🔧 Tools & Techniques:

  • Python, Pandas, Seaborn, Scikit-learn, Matplotlib
  • Data preprocessing, outlier handling, feature creation
  • Visualization and statistical summary
  • Datasets prepared for modeling with methods from the textbook An Introduction to Statistical Learning
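
The preprocessing steps listed above could be sketched roughly as follows: clip extreme values, derive a per-service feature, then fit a logistic regression (one of the classification methods covered in An Introduction to Statistical Learning). The data here is randomly generated, and the column names (`total_charge`, `n_services`, `charge_per_service`, `is_fraud`) and the labeling rule are assumptions made purely for illustration; they do not reflect the project's actual features or results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not drawn from CMS, Kaggle, or Synthea)
rng = np.random.default_rng(0)
n = 200
claims = pd.DataFrame({
    "total_charge": rng.lognormal(mean=8.0, sigma=1.0, size=n),
    "n_services": rng.integers(1, 50, size=n),
})

# Hypothetical label: top quartile of charge per service is marked as fraud
ratio = claims["total_charge"] / claims["n_services"]
claims["is_fraud"] = (ratio > ratio.quantile(0.75)).astype(int)

# Outlier handling: clip charges to the 1st and 99th percentiles
low, high = claims["total_charge"].quantile([0.01, 0.99])
claims["total_charge"] = claims["total_charge"].clip(lower=low, upper=high)

# Feature creation: average charge per service
claims["charge_per_service"] = claims["total_charge"] / claims["n_services"]

X = claims[["total_charge", "n_services", "charge_per_service"]]
y = claims["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and classifier in one pipeline means the scaling parameters are learned from the training split only, which avoids leaking test-set statistics into the model.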

Course Project – Statistical Learning (Spring 2025)
Team Members: Nhan, Tan, Andre