Code for the post "3 Data Storage Techniques Every Data Engineer Should Know".
Prerequisites:
- docker & docker compose
- At least 4GB (preferably 8GB or more) of memory
Clone the repository and start the containers by running the commands below in your terminal.
Windows users: please set up WSL and a local Ubuntu virtual machine following the instructions here until you get an Ubuntu prompt.
git clone https://github.com/josephmachado/data_storage_pattern.git
cd data_storage_pattern
docker compose up --build -d
sleep 30
- Open data storage pattern code notebook at storage_patterns.ipynb
- Open Spark History Server at http://localhost:18080/
- Open Spark UI at http://localhost:4040/; additional SparkSessions use the following ports, up to 4049 (one SparkSession UI per port from 4040 to 4049)
- Open MinIO at http://localhost:9001 with admin and password as the username and password.
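The fixed sleep 30 above is only a rough wait, and on slower machines the services may need longer. A minimal sketch of an alternative is to poll each port until it accepts a TCP connection. The helper name wait_for_port and its timeout and interval defaults are assumptions made for illustration and are not part of this repository; the ports are the ones listed above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Return False if no connection succeeds before `timeout` seconds elapse.
    This is a hypothetical helper, not something shipped with the repo.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 18080) checks the Spark History Server and wait_for_port("localhost", 9001) checks MinIO. An open port only shows that the process is listening, not that the application has finished initializing. The Spark UI on port 4040 only appears while a SparkSession is running, so it is not a useful startup check.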
Once done, stop containers with
docker compose down -v
sudo rm -rf ./minio_data/*
sudo rm -rf ./notebooks/data/*
This course uses MinIO for object storage demonstrations. MinIO is open source software licensed under GNU AGPL v3.