
gmbeddard/em255-intro_data_science-finalproject


EM212 Intro to Data Science Final Project: An Analysis on Endometriosis in South Asian Women

Project Overview

As part of our work for our women's health-focused startup, Saha, we developed a model investigating the factors influencing endometriosis in South Asian women. We focused on symptoms and related health measures such as infertility, menstrual irregularities, blood pressure, and depression. The study also examines socioeconomic impacts on healthcare access in the U.S., particularly on the diagnosis and treatment of endometriosis. Using the National Health and Nutrition Examination Survey (NHANES) dataset (2005-2006 cycle), the project applies data science and machine learning techniques to explore patterns, build predictive models, and derive actionable insights.
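NHANES publishes each survey cycle as separate component files (demographics, questionnaires, examinations) that share a respondent sequence number, `SEQN`. The sketch below shows one way such components could be combined with an inner join on `SEQN`. It is not the project's actual code: the column names (`age`, `income_ratio`, `endo_dx`, `systolic`) and the sample rows are hypothetical placeholders, not real NHANES variable names or data.

```python
from typing import Dict, List

# One respondent's record from a single component file (hypothetical columns).
Row = Dict[str, object]


def inner_join_on_seqn(*tables: List[Row]) -> List[Row]:
    """Keep only respondents (SEQN) present in every table and merge their columns."""
    # Index each table by SEQN so per-respondent lookups are constant time.
    indexed = [{row["SEQN"]: row for row in table} for table in tables]
    # Respondents that appear in all component tables.
    common = set(indexed[0])
    for index in indexed[1:]:
        common &= set(index)
    merged = []
    for seqn in sorted(common):
        combined: Row = {}
        for index in indexed:
            combined.update(index[seqn])  # later tables add their own columns
        merged.append(combined)
    return merged


# Hypothetical toy rows standing in for demographics, reproductive-health,
# and blood-pressure components (values are made up for illustration).
demographics = [
    {"SEQN": 1, "age": 34, "income_ratio": 2.1},
    {"SEQN": 2, "age": 41, "income_ratio": 4.8},
    {"SEQN": 3, "age": 29, "income_ratio": 0.9},
]
reproductive = [
    {"SEQN": 1, "endo_dx": 1},
    {"SEQN": 3, "endo_dx": 0},
]
blood_pressure = [
    {"SEQN": 1, "systolic": 118},
    {"SEQN": 2, "systolic": 126},
    {"SEQN": 3, "systolic": 110},
]

merged = inner_join_on_seqn(demographics, reproductive, blood_pressure)
for row in merged:
    print(row)
```

With pandas, the equivalent is chaining `DataFrame.merge(other, on="SEQN", how="inner")`. An inner join drops any respondent missing from one component (respondent 2 above), so a left join may be preferable when that missingness is itself informative.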

Features

  • Data Cleaning and Preparation: Integrated data from reproductive health, demographics, depression, and blood pressure datasets, focusing on critical variables such as diagnosis, age, and socioeconomic indicators.
  • Data Visualization: Created visualizations like box plots, pie charts, heatmaps, and point plots to analyze trends across variables such as race, income, and menstrual irregularity.
  • Predictive Modeling:
    • Implemented Logistic Regression, Decision Tree, and Naive Bayes classifiers.
    • Naive Bayes achieved the best performance, with 80% accuracy and a better balance between true positives and true negatives than the other two models.
  • Ethical Considerations: Addressed issues of algorithmic bias, data security, and potential misuse of findings, emphasizing inclusivity and fairness in healthcare applications.
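Accuracy alone can hide a class imbalance, which is why the model comparison above also weighs true positives against true negatives. A minimal sketch of how those quantities can be computed from binary labels follows, using sensitivity (true-positive rate), specificity (true-negative rate), and their average (balanced accuracy). The labels are made-up toy values, not the project's results, and balanced accuracy is our illustrative choice of summary metric rather than one the project reports.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = diagnosed, 0 = not diagnosed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical labels: 2 diagnosed (1) and 8 undiagnosed (0) respondents.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A model that always predicts "not diagnosed".
always_negative = [0] * 10
# A model that catches one diagnosis at the cost of one false alarm.
mixed_model = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

baseline = binary_metrics(y_true, always_negative)
mixed = binary_metrics(y_true, mixed_model)
print(baseline)  # accuracy 0.8, sensitivity 0.0, specificity 1.0, balanced_accuracy 0.5
print(mixed)     # accuracy 0.8, sensitivity 0.5, specificity 0.875, balanced_accuracy 0.6875
```

Both toy models reach 80% accuracy, but only the second identifies any diagnosed cases, which is the kind of difference a true-positive/true-negative comparison is meant to surface. In scikit-learn, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.balanced_accuracy_score` compute the same quantities.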

Results and Insights

  • Key Factors: Features such as age, days since the last period, and socioeconomic status significantly impact diagnosis likelihood.
  • Findings:
    • Higher income levels correlate with earlier and more consistent diagnoses.
    • Non-Hispanic white women are diagnosed more frequently, likely reflecting healthcare access disparities.
    • Irregular menstrual cycles are strongly associated with endometriosis.
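Group-level comparisons like the income and race findings above typically start from the share of respondents in each group who report a diagnosis. A minimal sketch of that calculation follows. The group labels and records are hypothetical toy data, not NHANES values, and the sketch deliberately ignores NHANES survey weights, which a nationally representative estimate would need to apply.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def diagnosis_rate_by_group(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the unweighted share of diagnosed respondents (label 1) in each group."""
    counts: Dict[str, int] = defaultdict(int)
    diagnosed: Dict[str, int] = defaultdict(int)
    for group, label in records:
        counts[group] += 1
        diagnosed[group] += label  # label is 1 if diagnosed, else 0
    return {group: diagnosed[group] / counts[group] for group in counts}


# Hypothetical (income bracket, diagnosis) pairs for illustration only.
records = [
    ("low income", 0), ("low income", 1), ("low income", 0), ("low income", 0),
    ("high income", 1), ("high income", 0), ("high income", 1), ("high income", 0),
]
rates = diagnosis_rate_by_group(records)
print(rates)  # {'low income': 0.25, 'high income': 0.5}
```

An unweighted rate describes the sample only; a higher rate in one group is consistent with, but does not by itself establish, differences in healthcare access.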

Future Directions

  • Incorporate more recent NHANES cycles and broaden demographic representation to include underserved groups.
  • Improve the predictive models by refining features and testing more advanced algorithms, since our initial models performed only modestly.
  • Expand focus to global populations, particularly South Asian women, for a more inclusive study.

Authors

Gabby Beddard, Camille Gimilaro

Thanks for reading :D

About

A data science project analyzing symptoms, socioeconomic impacts, and diagnosis trends of endometriosis using NHANES datasets. Features machine learning models and visualizations to enhance healthcare insights.
