This project demonstrates the design and implementation of a modern Data Warehouse using the Medallion Architecture (Bronze, Silver, and Gold layers). The primary objective was to transform siloed raw CRM and ERP data into a high-performance, validated Star Schema optimized for advanced Business Intelligence (BI) reporting.
- End-to-End Data Lineage: Successfully maintained 100% record reconciliation across all architecture layers.
- Advanced Data Cleansing: Leveraged Window Functions (
ROW_NUMBER) and CTEs to resolve a ~7% data inflation caused by many-to-many join conditions. - Cross-System Integration: Engineered a Tiered Join Strategy to bridge a ~5% gap resulting from mismatched Product Key formats between CRM and ERP source systems.
- Bronze Layer (Landing): Ingested raw CSV data into a landing zone using SQL scripts to preserve original data structures.
- Silver Layer (Cleansing): Performed data standardization, handled
NULLvalues viaCOALESCE, and executed deduplication logic. - Gold Layer (Curated): Developed a dimensional model (Star Schema) utilizing Surrogate Keys and complex join logic to support business reporting.
- Data Architecture: Designing a Modern Data Warehouse Using Medallion Architecture Bronze, Silver, and Gold layers.
- ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse.
- Data Modeling: Developing fact and dimension tables optimized for analytical queries.
- Analytics & Reporting: Creating SQL-based reports and dashboards for actionable insights.
The data warehouse enables analysis of:
- Customer purchase patterns
- Repeat customers and revenue contribution
- Best and worst performing products
- Category-level sales analysis
- Sales over time
- Order volume and revenue trends
- Database: MySQL 8.0.
- SQL Concepts: Window Functions (
ROW_NUMBER), CTEs, Medallion Architecture, Star Schema Modeling, Surrogate Keys, Data Reconciliation.
Viraj Satpute
Data Analyst | Data Engineer