nellaivijay edited this page Apr 19, 2026 · 2 revisions

Apache Iceberg Code Practice Wiki

Welcome to the Apache Iceberg Code Practice Wiki! This educational resource provides comprehensive guides and additional learning materials to supplement the hands-on coding labs in the main repository.

📚 Wiki Contents

🎯 Educational Philosophy

This repository follows a hands-on, lab-based approach to learning Apache Iceberg:

  • Progressive complexity: Labs range from beginner to advanced
  • Real-world scenarios: Practical exercises based on actual use cases
  • Multi-engine learning: Experience with different query engines
  • Vendor independence: Learn concepts that apply across platforms
  • Production patterns: Best practices you can apply in production

🚀 Quick Start

  1. Set up your environment
  2. Start with Lab 0
  3. Follow the learning path
  4. Explore advanced topics

📊 Progress Tracking

Track your progress by:

  • Marking completed labs in your notebook
  • Keeping a checklist of finished exercises
  • Timing yourself to measure improvement
  • Revisiting labs after learning new concepts

🤝 Community

Join our community of learners:

  • Share your solutions and insights
  • Ask questions in GitHub Issues
  • Contribute new labs and exercises
  • Help improve existing content

🔗 Resources

🎓 Lab Overview

Beginner Labs (0-2)

  • Lab 0: Sample Database Setup
  • Lab 1: Environment Setup
  • Lab 2: Basic Iceberg Operations

Intermediate Labs (3-5)

  • Lab 3: Advanced Features
  • Lab 4: Spark Optimizations
  • Lab 5: Real-World Patterns

Advanced Labs (6-11)

  • Lab 6: Performance & UI
  • Lab 7: Table Maintenance
  • Lab 8: Kafka Integration
  • Lab 9: CDC with Debezium
  • Lab 10: Spring Boot with Iceberg
  • Lab 11: Multi-Engine Lakehouse

💡 Tips for Success

Start with Fundamentals

  • Complete Labs 0-2 before moving on to the intermediate and advanced labs
  • Understand Iceberg's core concepts (metadata, snapshots, manifests)
  • Practice basic table operations thoroughly
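The relationship between metadata, snapshots, and manifests can be sketched in a few lines of plain Python. This is a simplified teaching model, not Iceberg's actual on-disk format or API: every class and field name below is hypothetical, and the manifest list that sits between a snapshot and its manifests is flattened into a plain list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataFile:
    path: str
    record_count: int


@dataclass
class Manifest:
    # A manifest lists data files (real manifests also carry per-file stats).
    data_files: List[DataFile]


@dataclass
class Snapshot:
    snapshot_id: int
    # In Iceberg a snapshot points to a manifest list, which points to
    # manifests; that extra level is flattened here for brevity.
    manifests: List[Manifest]

    def record_count(self) -> int:
        return sum(f.record_count for m in self.manifests for f in m.data_files)


@dataclass
class TableMetadata:
    snapshots: List[Snapshot] = field(default_factory=list)
    current_snapshot_id: Optional[int] = None

    def commit(self, snapshot: Snapshot) -> None:
        # A commit records a new snapshot and moves the current pointer;
        # older snapshots remain readable, which is what enables time travel.
        self.snapshots.append(snapshot)
        self.current_snapshot_id = snapshot.snapshot_id

    def snapshot(self, snapshot_id: Optional[int] = None) -> Snapshot:
        wanted = self.current_snapshot_id if snapshot_id is None else snapshot_id
        for s in self.snapshots:
            if s.snapshot_id == wanted:
                return s
        raise KeyError(f"no snapshot with id {wanted}")


# Two commits: the second snapshot reuses the first manifest and adds one.
m1 = Manifest([DataFile("data/part-0.parquet", 100)])
m2 = Manifest([DataFile("data/part-1.parquet", 50)])
table = TableMetadata()
table.commit(Snapshot(1, [m1]))
table.commit(Snapshot(2, [m1, m2]))

current_rows = table.snapshot().record_count()
time_travel_rows = table.snapshot(1).record_count()
print(current_rows, time_travel_rows)
```

Reading snapshot 1 after the second commit still sees only the first data file, which mirrors how an append in Iceberg adds new metadata rather than rewriting existing files.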

Use Multiple Engines

  • Try the same operations in Spark, Trino, and DuckDB
  • Understand engine-specific optimizations
  • Learn which engine is best for which use case
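As a rough illustration of running the same operation across engines, the sketch below builds one row-count query per engine. All names are placeholders rather than values from the labs: the Spark catalog `demo`, the Trino catalog `iceberg`, the namespace `db`, the table `events`, and the warehouse path are assumptions. The DuckDB form assumes the `iceberg` extension is loaded so that `iceberg_scan` is available.

```python
def count_query(engine: str, namespace: str = "db", table: str = "events") -> str:
    """Return a row-count query for the same Iceberg table in a given engine.

    Catalog names and the warehouse path are hypothetical placeholders.
    """
    if engine == "spark":
        # Spark SQL addresses Iceberg tables as catalog.namespace.table.
        return f"SELECT COUNT(*) FROM demo.{namespace}.{table}"
    if engine == "trino":
        # Trino uses catalog.schema.table; the catalog name comes from its config.
        return f"SELECT COUNT(*) FROM iceberg.{namespace}.{table}"
    if engine == "duckdb":
        # DuckDB's iceberg extension scans a table by its storage location.
        return f"SELECT COUNT(*) FROM iceberg_scan('warehouse/{namespace}/{table}')"
    raise ValueError(f"unknown engine: {engine}")


queries = {engine: count_query(engine) for engine in ("spark", "trino", "duckdb")}
for engine, sql in queries.items():
    print(f"{engine}: {sql}")
```

Spark and Trino resolve the table through a catalog, while DuckDB here reads it directly from a path, so the three queries return the same count only when they all point at the same underlying table.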

Practice Regularly

  • Consistency beats intensity: 30 minutes daily is better than one 3-hour session a week
  • Revisit labs after breaks to reinforce learning
  • Try to complete labs without looking at solutions

Learn from Mistakes

  • Read error messages carefully
  • Understand why your solution didn't work
  • Try alternative approaches
  • Check the solution notebooks for patterns

🔧 Environment Options

Kubernetes with k3s (Recommended)

  • Full-featured environment
  • Production-like setup
  • Better resource isolation
  • Suitable for long-term learning

Docker Compose (Lightweight)

  • Quick to set up
  • Lower resource requirements
  • Good for initial learning
  • Easier to troubleshoot

📈 Skill Development

By completing all labs, you will develop skills in:

  • Apache Iceberg table operations and management
  • Data lakehouse architecture and design
  • Multi-engine query optimization
  • Streaming data pipelines with Kafka
  • Change data capture with Debezium
  • Performance tuning and monitoring
  • Production-ready data engineering patterns

🆘 Need Help?


Happy learning! 🎓🏔️