This project was a major part of the "Big Data Engineering" course I took at TAU’s Faculty of Engineering during the 3rd year of my studies (2025).
It was the course's first assignment. My two partners and I focused on building a complete ETL pipeline for the Olist online orders dataset (Brazil). The task included:
- Performing a detailed exploratory data analysis (EDA) procedure, specifically handling null values, duplicates, and logical inconsistencies.
- Designing and constructing a data warehouse from scratch using a star schema, applying denormalization and design pattern principles.
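The two steps above can be illustrated with a small sketch. This is not the project's actual implementation (that lives in the notebook). It is a hypothetical, stdlib-only example that uses Python's `sqlite3` module. The sample rows, the column layout, and the table names `dim_customer`, `dim_date`, and `fact_orders` are all invented for illustration, and the real Olist data is spread across several CSV files rather than one table. The cleaning step drops exact duplicates, drops "delivered" orders with a missing delivery date, and drops orders delivered before they were purchased. The load step then writes the remaining rows into a minimal star schema with one fact table and two dimensions.

```python
import sqlite3
from datetime import datetime

# Hypothetical toy rows loosely shaped like Olist orders:
# (order_id, customer_id, order_status, purchase_ts, delivered_ts).
# All values are invented for illustration.
RAW_ORDERS = [
    ("o1", "c1", "delivered", "2017-10-02 10:56:33", "2017-10-10 21:25:13"),
    ("o1", "c1", "delivered", "2017-10-02 10:56:33", "2017-10-10 21:25:13"),  # exact duplicate
    ("o2", "c2", "delivered", "2018-07-24 20:41:37", None),  # delivered, but delivery date is null
    ("o3", "c3", "delivered", "2018-08-08 08:38:49", "2018-08-01 10:00:00"),  # delivered before purchase
    ("o4", "c1", "shipped", "2018-02-13 21:18:39", None),  # not yet delivered, so null is valid
]


def clean_orders(rows):
    """Drop exact duplicates and rows that fail simple logical checks."""
    seen = set()
    cleaned = []
    for row in rows:
        if row in seen:
            continue  # exact duplicate of a row already kept
        seen.add(row)
        _, _, status, purchased, delivered = row
        if status == "delivered" and delivered is None:
            continue  # null where the status says a value must exist
        if delivered is not None and (
            datetime.fromisoformat(delivered) < datetime.fromisoformat(purchased)
        ):
            continue  # logical inconsistency: delivery precedes purchase
        cleaned.append(row)
    return cleaned


# Minimal star schema: one fact table referencing two dimension tables.
SCHEMA = """
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT UNIQUE);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE fact_orders (
    order_id TEXT PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    purchase_date_key INTEGER REFERENCES dim_date(date_key),
    order_status TEXT,
    delivery_days REAL  -- precomputed measure, so queries need no date arithmetic
);
"""


def load_star(conn, rows):
    """Create the schema and load cleaned rows into the dimensions and fact table."""
    conn.executescript(SCHEMA)
    for order_id, customer_id, status, purchased, delivered in rows:
        purchase_dt = datetime.fromisoformat(purchased)
        # Insert each customer once and look up its surrogate key.
        conn.execute("INSERT OR IGNORE INTO dim_customer (customer_id) VALUES (?)", (customer_id,))
        customer_key = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE customer_id = ?", (customer_id,)
        ).fetchone()[0]
        # Use a YYYYMMDD integer as the date dimension key.
        date_key = int(purchase_dt.strftime("%Y%m%d"))
        conn.execute(
            "INSERT OR IGNORE INTO dim_date VALUES (?, ?, ?, ?)",
            (date_key, purchase_dt.date().isoformat(), purchase_dt.year, purchase_dt.month),
        )
        delivery_days = None
        if delivered is not None:
            delta = datetime.fromisoformat(delivered) - purchase_dt
            delivery_days = delta.total_seconds() / 86400
        conn.execute(
            "INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)",
            (order_id, customer_key, date_key, status, delivery_days),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_star(conn, clean_orders(RAW_ORDERS))
    # Typical star-schema query: aggregate facts by a dimension attribute.
    for year, n in conn.execute(
        "SELECT d.year, COUNT(*) FROM fact_orders f "
        "JOIN dim_date d ON f.purchase_date_key = d.date_key GROUP BY d.year ORDER BY d.year"
    ):
        print(year, n)
```

Dropping invalid rows is only one possible policy. A real pipeline might instead impute values or flag rows for review, and the project may well have handled these cases differently.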
Full details are available in the repo, in the notebook file (IPYNB) and in the Final Report (PDF), written in Hebrew.
Link to ALL the CSV files used in the project
The project's final grade: 93.