This project delivers an end-to-end data pipeline and interactive analytics dashboard for analyzing Canadian residency programs.
It transforms semi-structured CaRMS program descriptions into structured PostgreSQL tables and generates strategic insights through a cloud-deployed Dash application.
The project demonstrates applied data engineering, statistical analysis, and business intelligence in a production-ready environment.
Live App:
https://carms-dashboard.onrender.com/
Raw CSV
→ ETL Parsing Layer (Local)
→ PostgreSQL (Staging — Render Hosted)
→ Cleaning & Feature Engineering (Local)
→ Analytics Tables (PostgreSQL — Render Hosted)
→ Dash Dashboard (Render Deployment)
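The flow above can be sketched as a minimal staging load. This is illustrative only: the table and column names are assumptions, and an in-memory SQLite engine stands in for the Render-hosted PostgreSQL instance.

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in for the Render-hosted PostgreSQL instance; the real
# pipeline would use create_engine(os.environ["DATABASE_URL"]).
engine = create_engine("sqlite:///:memory:")

# Raw CSV -> DataFrame (hypothetical columns for illustration).
raw = pd.DataFrame({
    "program": ["Family Medicine - UBC", "Internal Medicine - UofT"],
    "province": ["BC", "ON"],
    "quota": [12, 30],
})

# ETL parsing layer: load the raw records into a staging table.
raw.to_sql("staging_programs", engine, index=False, if_exists="replace")

# The cleaning / feature-engineering layer would then read from
# staging, derive analytics columns, and write analytics tables back.
staged = pd.read_sql("SELECT * FROM staging_programs", engine)
```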
pip install -r requirements.txt
Windows
set DATABASE_URL=postgresql://user:password@host:port/database
Mac / Linux
export DATABASE_URL=postgresql://user:password@host:port/database
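The pipeline scripts can then pick up the connection string from the environment. One detail worth handling (a sketch, with placeholder credentials): Render sometimes hands out URLs with the legacy `postgres://` scheme, which SQLAlchemy 2.x rejects.

```python
import os

def normalize_db_url(url: str) -> str:
    """Rewrite a legacy 'postgres://' scheme to 'postgresql://',
    which SQLAlchemy 2.x requires."""
    if url.startswith("postgres://"):
        return "postgresql://" + url[len("postgres://"):]
    return url

# Placeholder fallback for illustration only; never commit real credentials.
db_url = normalize_db_url(
    os.environ.get("DATABASE_URL", "postgres://user:password@host:5432/db")
)
```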
python pipeline/etl_pipeline.py
python pipeline/cleaning_layer.py
python app.py
Access locally at http://127.0.0.1:8050/ (Dash's default development address).
- PostgreSQL hosted on Render
- Dash application deployed as a Web Service
- Gunicorn used as production WSGI server
- SSL-enforced database connection
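Enforcing SSL can be done by appending `sslmode=require` to the connection URL before the engine is created. A minimal sketch (parameter handling simplified; the helper name is an assumption, not the project's actual code):

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def require_ssl(url: str) -> str:
    """Append sslmode=require to a PostgreSQL URL unless an
    sslmode parameter is already present in the query string."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query.setdefault("sslmode", ["require"])
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# The app would then connect with, e.g.:
# engine = create_engine(require_ssl(os.environ["DATABASE_URL"]))
ssl_url = require_ssl("postgresql://user:password@host:5432/db")
```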
gunicorn app:app
The DATABASE_URL environment variable is supplied through the Render service configuration.
- Semi-structured text parsing into structured fields
- Schema normalization and type enforcement
- Feature engineering (quota per residency, time buckets, flags)
- Column-level data quality metrics table
- PostgreSQL cloud integration
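The feature-engineering step can be sketched with pandas. Column names such as `quota`, `residencies`, and `duration_years` are illustrative, not the project's actual schema.

```python
import pandas as pd

# Hypothetical cleaned program records.
df = pd.DataFrame({
    "program": ["A", "B", "C", "D"],
    "province": ["ON", "ON", "BC", "QC"],
    "quota": [30, 12, 8, 20],
    "residencies": [3, 2, 1, 4],
    "duration_years": [2, 5, 3, 2],
})

# Derived analytics feature: quota per residency.
df["quota_per_residency"] = df["quota"] / df["residencies"]

# Time buckets on program duration.
df["duration_bucket"] = pd.cut(
    df["duration_years"], bins=[0, 2, 4, 10],
    labels=["short", "medium", "long"],
)

# Example flag: high-capacity programs.
df["high_quota_flag"] = df["quota"] >= 20
```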
- Kruskal–Wallis test to compare quota-per-residency distributions across provinces
- Dunn post-hoc test for pairwise provincial comparisons
- Boxplot visualization of statistical findings
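The statistical step can be sketched as follows, on synthetic data; the real analysis runs on quota-per-residency values grouped by province, and the Dunn post-hoc step would use `scikit_posthocs.posthoc_dunn` on the same groups.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic quota-per-residency samples for three provinces.
groups = {
    "ON": rng.normal(8, 2, 40),
    "BC": rng.normal(7, 2, 40),
    "QC": rng.normal(10, 2, 40),
}

# Kruskal-Wallis: non-parametric test of whether at least one
# province's distribution differs from the others.
h_stat, p_value = stats.kruskal(*groups.values())

# Pairwise follow-up (not executed here) would look like:
# import scikit_posthocs as sp
# sp.posthoc_dunn(list(groups.values()), p_adjust="bonferroni")
```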
- National KPI summary
- Residency and quota distribution by province and city
- Specialty portfolio analysis
- Program duration funnel
- Accreditation trend over time
- Data quality transparency section
- Python (Pandas, NumPy)
- PostgreSQL
- SQLAlchemy
- Dash & Plotly
- SciPy / scikit-posthocs
- Gunicorn
- Render (Cloud Deployment)
This project demonstrates:
- Structured data engineering workflow
- Cloud database deployment
- Statistical reasoning beyond descriptive analytics
- End-to-end pipeline ownership
- Production deployment of an interactive analytics application
It was developed as part of a Junior Data Scientist job application and serves as a portfolio-ready example for the following roles:
- Data Scientist
- Data Analyst
- BI Developer
- Data Engineer