📍 Richmond, Kentucky, USA | 🇺🇸 U.S. Permanent Resident — No Sponsorship Required
I design and ship production data science systems — ML pipelines, causal inference engines, AI platforms, and real-time analytics infrastructure — applied to healthcare and population health problems at scale.
My background combines hands-on engineering with deep quantitative methodology: I build the models and I understand the math behind them.
Core areas:
- 🤖 AI / LLM Systems — RAG pipelines, multi-agent architectures, healthcare Q&A platforms
- 🧠 Machine Learning & MLOps — end-to-end pipelines, model validation, SHAP explainability, CI/CD
- 📊 Healthcare Analytics — risk stratification, population health modeling, clinical decision intelligence
- 🔬 Causal Inference & RWE — PSM, DiD, ITS, TMLE, SuperLearner — production-grade, not just academic
Philosophy: Models that don't deploy don't matter. Data science should produce systems, not papers.
| What I Built | Result |
|---|---|
| ML predictive models for health outcomes | ~30% improvement in prediction accuracy |
| Automated data pipelines (ClickHouse + Python + dbt) | Reporting latency: 10–14 days → real-time |
| Causal inference & RWE studies | 25+ production studies informing program decisions |
| Medicare risk adjustment pipeline (U.S.) | Validated average treatment effect on the treated (ATT) of −$391/member, p < 0.0001 |
| Healthcare analytics platforms | Scale: 8.5M+ individuals across multiple health systems |
| Peer-reviewed publications | 30+ articles incl. The Lancet Global Health |
| Project | Description | Stack |
|---|---|---|
| AI-Powered Research Assistant | Production RAG platform for scientific paper intelligence with modular LangGraph workflows | Python · LangGraph · LangChain · ChromaDB · FastAPI |
| Automated Research & Report Generation | Multi-agent AI system for research retrieval, synthesis and structured reporting | Python · LangGraph · FastAPI |
| MultiAgent Research Graph | AI knowledge graph generator from natural language queries | Python · LangGraph · LLMs |
| Healthcare Q&A RAG Platform | Enterprise healthcare knowledge retrieval with vector search and RBAC | Python · FastAPI · ChromaDB |
| Project | Description | Stack |
|---|---|---|
| Medicare Risk Adjustment Pipeline | Validated U.S. Medicare RAF pipeline — ATT −$391/member, p<0.0001 | Python · R · SQL |
| Insurance Premium Prediction | End-to-end ML pipeline with CI/CD, MLflow tracking and SHAP explainability | Python · XGBoost · MLflow · SHAP |
| DHS RAG System | Semantic intelligence system for Demographic & Health Survey datasets | Python · RAG · Vector Search |
| Multimodal PDF RAG System | Document intelligence platform with OCR, table extraction and semantic search | Python · FastAPI · React |
| Project | Description | Stack |
|---|---|---|
| Medical Diagnosis AI | ML prototype for clinical diagnostic support | Python · scikit-learn |
| KDHS Memory Bot | Multimodal RAG chatbot for large public health survey datasets | Python · OCR · Vector DB |
| Kenya Community Health AI | AI analytics platform integrating national digital health systems | Python · Multi-Agent AI |
- Languages: Python · R · SQL
- Machine Learning: scikit-learn · XGBoost · PyTorch · TensorFlow · MLflow · SHAP · Survival models
- AI / LLM: LangChain · LangGraph · RAG · Vector Databases (ChromaDB, Pinecone) · Multi-Agent Systems · Prompt Engineering
- Data Infrastructure: AWS (Redshift · Glue · SageMaker · S3) · ClickHouse · PostgreSQL · dbt · FastAPI · Docker · Airflow
- Visualization & BI: Power BI · Tableau · Plotly · ggplot2
- Causal & Statistical Methods: PSM · Difference-in-Differences · Interrupted Time Series · TMLE · SuperLearner · Bayesian modeling · Mixed-effects models · Pharmacoepidemiology
Education:
- PhD, Epidemiology (Quantitative Methods, Causal Inference & Health Data Science): advanced training in study design, statistical theory, and evidence generation, applied directly to ML model validation, experiment design, and real-world evidence production.
- MSc, Health Systems Management
- BSc, Statistics
Certifications:
- Stanford University — Machine Learning in Medicine
- AWS Certified Data Science & Analytics
- Google Data Analytics Professional Certificate
- DataCamp Machine Learning Scientist Track
- Generative AI (multiple platforms)
A common assumption: PhD = academic researcher = not hands-on.
That's not my profile.
My PhD is in quantitative epidemiology — which means advanced statistics, causal modeling, experimental design, and evidence validation. These are the same foundations that make a data scientist rigorous: knowing why a model works, not just that it works.
In practice, I:
- Build and ship ML pipelines, not just analyze data
- Design causal inference studies that hold up to scrutiny
- Write production Python and SQL, not just R Markdown
- Lead analytics engineering alongside research
The PhD makes the data science better. It doesn't replace it.
Open to hands-on and leadership roles across data science, healthcare analytics, and AI:
- Senior / Principal Data Scientist
- Healthcare Data Scientist
- Clinical Data Scientist
- Population Health Analyst / Analytics Lead
- Real-World Evidence Scientist / Analyst
- HEOR Data Scientist
- Decision Science / Advanced Analytics
- Director / Associate Director, Data Science or Epidemiology
Target sectors: Pharma · Biotech · CRO · Health tech · Payers & Insurers · Clinical AI · Population health
Data Science · Healthcare Analytics · AI Systems · Causal Inference · Real-World Evidence
📩 keyegon@gmail.com | 🔗 LinkedIn | 🌐 Portfolio

