Data Science Foundations — End-to-End Data Lifecycle Project

This repository presents a structured Data Science Foundations project that covers the complete data lifecycle, from database initialization and SQL data modeling to exploratory data analysis, feature engineering, and classical machine learning.

The project is intentionally organized by real-world data roles, reflecting how modern data systems are built in professional environments.

🎯 Project Philosophy

Rather than approaching exercises as isolated tasks, this repository models the natural progression of a data product:

Data infrastructure
Data warehousing
Analytical exploration
Feature preparation
Model training & evaluation

Each stage builds on the previous one.

📂 Project Structure

.
├── 00_data_engineer
├── 01_data_warehouse
├── 02_data_analyst
├── 03_data_scientist_01
├── 04_data_scientist_02
├── db_init
├── src
├── scripts
├── etl
├── Dockerfile
├── docker-compose.yml
├── Makefile
└── requirements.txt

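Each numbered directory (00 through 04) represents a stage in the professional data workflow; the remaining entries hold supporting code, database initialization, and tooling.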

📊 Visual Insights

Purchase Price Distribution (ex02)

This boxplot highlights the distribution of purchase prices, revealing:

A strong right-skewed distribution
Significant high-value outliers
Concentration of values within a lower price range
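The three observations above correspond to quantities a boxplot encodes directly. As a minimal sketch, assuming Tukey's 1.5 × IQR rule for outliers (the convention most plotting libraries use by default), the following computes them with the standard library only. The function name boxplot_summary and the sample prices are hypothetical and are not taken from the project's code or data.

```python
import statistics


def boxplot_summary(prices):
    """Return the quartiles, IQR, Tukey fences, and outliers for a list of prices."""
    # "inclusive" treats the data as the full population, min and max included.
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [p for p in prices if p < lower_fence or p > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
        # A mean well above the median is a simple indicator of right skew.
        "mean": statistics.mean(prices),
    }


# Synthetic, illustrative prices (NOT the project's data).
sample_prices = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 40.0, 120.0]
summary = boxplot_summary(sample_prices)
print(summary)
```

In this synthetic sample, the two largest values fall above the upper fence and the mean sits far above the median, which is the same pattern the list above describes for the real purchase prices.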

User Behavior Analysis (ex03)

These histograms focus on users whose total spending is below 225 A$, allowing clearer observation of general purchasing behavior.

Left: purchase frequency per user
Right: total spending per user

Key observations:

Most users make few purchases
Spending is concentrated in lower ranges
Clear long-tail behavior typical of transactional systems
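The aggregation behind these two histograms can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: it assumes purchases are available as (user_id, price) pairs, and the names per_user_stats and histogram, the bin widths, and the sample records are all hypothetical. Only the 225 A$ threshold comes from the text above.

```python
from collections import defaultdict


def per_user_stats(purchases, max_total=225.0):
    """Map each user to (purchase_count, total_spent), keeping totals below max_total."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for user_id, price in purchases:
        counts[user_id] += 1
        totals[user_id] += price
    return {u: (counts[u], totals[u]) for u in totals if totals[u] < max_total}


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    bins = defaultdict(int)
    for v in values:
        bins[int(v // bin_width)] += 1
    return dict(sorted(bins.items()))


# Synthetic, illustrative records (NOT the project's data).
purchases = [
    ("u1", 10.0), ("u1", 15.0),
    ("u2", 5.0),
    ("u3", 200.0), ("u3", 100.0),  # total 300.0, excluded by the threshold
    ("u4", 45.0),
]

stats = per_user_stats(purchases)
frequency_hist = histogram([count for count, _ in stats.values()], bin_width=1)
spending_hist = histogram([total for _, total in stats.values()], bin_width=50)
print(frequency_hist, spending_hist)
```

Filtering on the per-user total before binning keeps a handful of heavy spenders from stretching the x-axis, which is why the lower spending range becomes readable.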

🧠 Technical Design Decisions

Clear separation of concerns
Reproducibility over static artifacts
Role-based modular structure

🐍 Environment Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
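After running the commands above, a quick way to confirm that the virtual environment is actually active is to compare the interpreter's prefixes. This is a small optional check, not part of the repository; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints False after activation, the shell is most likely still resolving a system-wide python3 instead of the one in venv/bin.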

🐳 Docker (Optional)

docker compose up --build

⚠️ Disclaimer

This repository reflects a learning and consolidation process.

While the code is functional and structured for clarity, some edge cases may remain unhandled.
