M.S. Data Science @ Johns Hopkins University (incoming) · B.S. Mathematics, UC Irvine
📧 Samir2000VIP@gmail.com · 💼 LinkedIn
4+ years building production data science systems in healthcare: causal inference, predictive modeling, and NLP across 60,000+ patients and 20+ facilities. Built computer vision models at UCI, then moved into healthcare outcomes research covering COPD cost-effectiveness, readmission modeling, and chronic disease analytics, with four ASHP conference presentations. Heart failure outcomes manuscript under peer review (n = 3,024, 11 primary care clinics).
GenomicsGPT — ML + LLM pipeline for clinical variant interpretation. XGBoost/LightGBM ensemble on 1.69M ClinVar variants (AUC = 0.9949, leakage-corrected 0.985) with SHAP explainability and Llama 3 / Claude clinical report generation. Manuscript targeting Bioinformatics Advances.
ClinicalRAG — RAG system for clinical question answering over 220 discharge summaries with hallucination guardrails, citation tracking, and systematic chunking evaluation. 97.6% condition recall, 95.2% abstention accuracy.
CausalCare — Causal inference analysis of ICU beta-blocker treatment effects using propensity matching, IPW, doubly robust estimation, Double ML, and Causal Forest on eICU data via EconML/DoWhy.
REIGN NBA Analytics — Cross-era player impact metric with 4 era-specific regression models, playoff opponent adjustments, and interactive visualizations. 29,969 player-seasons across 80 years. Research Paper
Diabetic Retinopathy Classification — CNN-based 5-class severity grading from retinal fundus images (F1 = 0.94) with GradCAM interpretability. Research Paper
Gene Expression Cancer Prediction — ML classification of AML vs. ALL leukemia subtypes from 7,000+ gene expression features (F1 = 0.95).
- 0.9949 AUC on 1.69M genetic variants (GenomicsGPT)
- $83.50 PMPM cost reduction in COPD intervention analysis (p = 0.0027, n = 997)
- 97.6% condition recall on clinical question answering (ClinicalRAG)
- 0.94 F1 on 5-class diabetic retinopathy classification
- 0.95 F1 on gene expression cancer subtype prediction
ML/AI: Python · scikit-learn · XGBoost · LightGBM · TensorFlow/Keras · SHAP · LangChain · EconML/DoWhy · pandas · NumPy · R
LLM/NLP: Llama 3 (Ollama) · Claude API · ChromaDB · RAG pipelines · prompt engineering
Engineering: React · TypeScript · Vite · Flask · FastAPI · PostgreSQL · SQL · Git
Domain: EHR/clinical data · genomics · causal inference · healthcare analytics · sports analytics
In my free time: chess (2500+ rated), basketball, piano, and gaming.