Getting Started
This guide will walk you through setting up your environment and completing your first coding lab.
Before you begin, make sure you have:
- Docker or Podman installed
- 16GB RAM minimum (for full environment)
- 40GB disk space available
- Basic knowledge of SQL and data concepts
- Familiarity with command line interface
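The prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, not part of the repository: the function names are made up for illustration, the RAM check assumes Linux (it reads /proc/meminfo), and the 16 GB and 40 GB thresholds are taken from the list above.

```shell
# Succeeds if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Total RAM in GB, rounded to the nearest whole number (Linux only).
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
}

# Free disk space in whole GB on the filesystem that holds the given path.
disk_free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Prints OK if the first number is at least the second, LOW otherwise.
meets() {
    if [ "$1" -ge "$2" ]; then echo OK; else echo LOW; fi
}

# Runs all three checks and prints one line per prerequisite.
check_prereqs() {
    if have_cmd docker || have_cmd podman; then
        echo "container runtime: OK"
    else
        echo "container runtime: MISSING"
    fi
    echo "RAM (16 GB):  $(meets "$(mem_total_gb)" 16)"
    echo "disk (40 GB): $(meets "$(disk_free_gb .)" 40)"
}
```

Run check_prereqs from the directory where you plan to clone the repository, so that the disk check looks at the right filesystem.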
Get the repository using one of these methods:
git clone https://github.com/nellaivijay/iceberg-code-practice.git
cd iceberg-code-practice
- Go to https://github.com/nellaivijay/iceberg-code-practice
- Click the green "Code" button
- Select "Download ZIP"
- Extract the files to your computer
Pros:
- Production-like environment
- Better resource isolation
- Full feature set
- Suitable for advanced labs
Cons:
- More complex setup
- Higher resource requirements
- Requires k3s installation
curl -sfL https://get.k3s.io | sh -
# Verify installation
kubectl version --client
# Run setup script
./scripts/setup.sh
# Apply Kubernetes manifests
kubectl apply -f k8s/
# Wait for pods to be ready
kubectl get pods -w
# Check all services are running
kubectl get pods
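Watching the pod list with -w means reading the output by eye. As a rough sketch, the helper below counts pods whose STATUS column is neither Running nor Completed. The name count_not_running is made up for this example, and the parsing assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the number
# of pods whose STATUS (third column) is neither Running nor Completed.
count_not_running() {
    awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Example use (requires a working kubectl context):
#   kubectl get pods --no-headers | count_not_running
```

A result of 0 means every pod reports Running or Completed. Running is not the same as Ready, so for a stricter check kubectl wait --for=condition=Ready pods --all --timeout=300s blocks until each pod passes its readiness condition or the timeout expires.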
# Access Spark UI
kubectl port-forward svc/spark-master 8080:8080
# Access Trino UI
kubectl port-forward svc/trino 8081:8080
Pros:
- Quick to set up
- Lower resource requirements
- Easier to troubleshoot
- Good for initial learning
Cons:
- Limited resource isolation
- Some advanced features may not work
- Less production-like
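The commands in this guide call the standalone docker-compose binary. Newer Docker installations ship Compose as a plugin that is invoked as docker compose instead. The sketch below, which assumes at least one of the two is installed, picks whichever is available. pick_compose is a hypothetical helper name, not something the repository provides.

```shell
# Prints the Compose command available on this machine:
# "docker compose" (plugin) if it responds, otherwise "docker-compose".
# Returns non-zero if neither is found.
pick_compose() {
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    else
        return 1
    fi
}

# Example use (relies on word splitting in sh/bash, so $COMPOSE is unquoted):
#   COMPOSE=$(pick_compose) && $COMPOSE up -d
```

If you use the plugin form, substitute docker compose wherever this guide shows docker-compose.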
# Start all services
docker-compose up -d
# Check services are running
docker-compose ps
# View logs
docker-compose logs -f
# Access Spark UI
open http://localhost:8080
# Access Trino UI
open http://localhost:8081
# Check Iceberg catalog
docker-compose exec spark bash
The environment includes a comprehensive sample database for hands-on learning.
python3 scripts/generate_sample_data.py
This creates:
- 1,000 customer records
- 200 product records
- 5,000 order records
- 10,000 transaction records
- 20,000 web event records
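These counts are worth keeping on hand so that you can compare them with the row counts you see after the data is loaded. A minimal sketch follows. The keys (customers, products, and so on) are labels chosen for this example and are not necessarily the table names used in the repository.

```shell
# Expected record counts, copied from the list above.
expected_rows() {
    case "$1" in
        customers)    echo 1000 ;;
        products)     echo 200 ;;
        orders)       echo 5000 ;;
        transactions) echo 10000 ;;
        web_events)   echo 20000 ;;
        *)            return 1 ;;
    esac
}

# Compares an observed count ($2) with the expected count for a data set ($1).
check_rows() {
    want=$(expected_rows "$1") || { echo "$1: unknown data set"; return 2; }
    if [ "$2" -eq "$want" ]; then
        echo "$1: OK ($2)"
    else
        echo "$1: expected $want, got $2"
        return 1
    fi
}
```

For example, check_rows customers 1000 reports a match, while check_rows orders 4999 reports the expected and observed values and returns a non-zero status.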
./scripts/load_sample_data.sh
This loads the generated data into Iceberg tables using Spark.
# Access Spark shell
docker-compose exec spark bash
spark-sql
-- At the spark-sql prompt
SHOW DATABASES;
USE sample_db;
SHOW TABLES;
SELECT COUNT(*) FROM sample_customers;
Let's start with Lab 0: Sample Database Setup
Objective: Explore the sample database and practice basic queries
Prerequisites: Environment setup complete, sample data loaded
Estimated Time: 30-45 minutes
# Open the lab markdown file
cat labs/lab-00-sample-database.md
# Or open in your preferred editor
The lab will guide you through:
- Understanding the sample database schema
- Exploring table relationships
- Writing queries to answer business questions
- Understanding data distribution and patterns
# Start Jupyter (if using Docker Compose)
docker-compose exec spark jupyter notebook
# Navigate to notebooks/lab-00-sample-database.ipynb
Compare your results with the solution:
# View the solution notebook (cat prints the raw notebook JSON; open it in Jupyter for a rendered view)
cat solutions/lab-00-sample-database-solution.ipynb
After completing Lab 0, proceed to Lab 1: Environment Setup
Objective: Verify all components and perform your first Iceberg operation
Estimated Time: 30-45 minutes
This lab will:
- Verify catalog connectivity
- Test storage access
- Create your first Iceberg table
- Perform basic read/write operations
Solution:
# Check Docker is running
docker ps
# Check available disk space
df -h
# Check available memory
free -h
# Restart Docker
sudo systemctl restart docker
Solution:
# Check pod status
kubectl describe pod <pod-name>
# Check node resources
kubectl top nodes
# Check for resource limits
kubectl get nodes -o yaml | grep -A 5 resources
Solution:
- Reduce the number of running services
- Increase system RAM or swap space
- Use Docker Compose instead of k3s for lighter footprint
- Adjust memory limits in docker-compose.yaml
Solution:
# Check storage service is running
docker-compose ps objectscale
# Check storage logs
docker-compose logs objectscale
# Verify environment variables
env | grep STORAGE
If you're new to Docker or have limited resources, start with Docker Compose. It's easier to troubleshoot and requires fewer resources.
- Minimum 16GB RAM for full environment
- 8GB RAM may work with reduced services
- At least 40GB disk space for data and logs
If you get stuck, check the solution notebooks in the solutions/ folder. They provide complete working examples.
Document your setup steps and any issues you encounter. This will help you troubleshoot later and contribute improvements.
- Open GitHub Issues for problems
- Share your solutions and insights
- Contribute improvements to the labs
After completing Lab 0 and Lab 1:
- Lab 2: Basic Iceberg Operations - Learn core table operations
- Lab 3: Advanced Features - Partitioning, time travel, schema evolution
- Lab 4: Spark Optimizations - Performance tuning
- Learning Path: See the Learning Path page for the recommended lab order
Once setup is complete, you can access:
- Spark UI: http://localhost:8080
- Spark History Server: http://localhost:18080
- Trino UI: http://localhost:8081
- Grafana (if configured): http://localhost:3000
- MinIO Console (if using MinIO): http://localhost:9000
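To see at a glance which of these endpoints are answering, you can probe each one with curl. This is a sketch under the assumption that the ports listed above are the ones your setup exposes. The function names are invented for this example, and any HTTP response, including an error page, counts as "UP".

```shell
# Service names and URLs, copied from the list above (one "name|url" per line).
service_urls() {
    cat <<'EOF'
Spark UI|http://localhost:8080
Spark History Server|http://localhost:18080
Trino UI|http://localhost:8081
Grafana|http://localhost:3000
MinIO Console|http://localhost:9000
EOF
}

# Tries each URL with a 3-second timeout and prints UP or DOWN per service.
probe_services() {
    service_urls | while IFS='|' read -r name url; do
        if curl -s -o /dev/null --max-time 3 "$url"; then
            echo "UP    $name ($url)"
        else
            echo "DOWN  $name ($url)"
        fi
    done
}
```

Call probe_services once the environment is running. Optional services such as Grafana will show as DOWN if they are not configured.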
docker-compose down
# Or to remove volumes
docker-compose down -v
kubectl delete -f k8s/
# Or delete specific resources
kubectl delete deployment spark-master
To completely remove the environment:
# Docker Compose
docker-compose down -v
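# Warning: this removes ALL unused images, stopped containers, and unused networks on this host, not only this project's
docker system prune -a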
# Kubernetes
kubectl delete -f k8s/
k3s-uninstall.sh
- Check the Troubleshooting page
- Review Best Practices
- Open an issue on GitHub
- Start a discussion in GitHub Discussions
Ready to start learning? Begin with Lab 0 🚀