Home
nellaivijay edited this page Apr 19, 2026 · 2 revisions
Welcome to the Apache Iceberg Code Practice Wiki! This educational resource provides comprehensive guides and additional learning materials to supplement the hands-on coding labs in the main repository.
- Getting Started - Complete setup guide and first steps
- Iceberg Fundamentals - Deep dive into Iceberg concepts and architecture
- Lab Guides - Detailed walkthroughs for each lab
- Best Practices - Production-ready patterns and tips
- Troubleshooting - Common issues and solutions
- Learning Path - Recommended order for completing labs
- Multi-Engine Guide - Working with Spark, Trino, and DuckDB
- Streaming & CDC - Real-time data pipelines with Kafka and Debezium
This repository follows a hands-on, lab-based approach to learning Apache Iceberg:
- Progressive complexity: Labs range from beginner to advanced
- Real-world scenarios: Practical exercises based on actual use cases
- Multi-engine learning: Experience with different query engines
- Vendor independence: Learn concepts that apply across platforms
- Production patterns: Best practices you can apply in production
Track your progress by:
- Marking completed labs in your notebook
- Keeping a checklist of finished exercises
- Timing yourself to measure improvement
- Revisiting labs after learning new concepts
Join our community of learners:
- Share your solutions and insights
- Ask questions in GitHub Issues
- Contribute new labs and exercises
- Help improve existing content
- Main Repository
- Apache Iceberg Documentation
- Apache Spark Documentation
- Trino Documentation
- DuckDB Documentation
- Lab 0: Sample Database Setup
- Lab 1: Environment Setup
- Lab 2: Basic Iceberg Operations
- Lab 3: Advanced Features
- Lab 4: Spark Optimizations
- Lab 5: Real-World Patterns
- Lab 6: Performance & UI
- Lab 7: Table Maintenance
- Lab 8: Kafka Integration
- Lab 9: CDC with Debezium
- Lab 10: Spring Boot with Iceberg
- Lab 11: Multi-Engine Lakehouse
- Complete Labs 0-2 before moving to advanced topics
- Understand Iceberg's core concepts (metadata, snapshots, manifests)
- Practice basic table operations thoroughly
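To make the relationship between metadata, snapshots, and manifests more concrete, here is a minimal toy model in plain Python. It is only a sketch for building intuition: it does not use the Iceberg libraries, and the names `ToyTable`, `Snapshot`, `Manifest`, `append`, and `scan` are hypothetical, chosen for illustration. The idea it tries to capture is that each commit produces a new immutable snapshot pointing at a set of manifests, that manifests list data files, and that reading an older snapshot is what makes time travel possible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Manifest:
    """Toy manifest: an immutable list of data file paths."""
    data_files: Tuple[str, ...]


@dataclass(frozen=True)
class Snapshot:
    """Toy snapshot: the table state after one commit."""
    snapshot_id: int
    parent_id: Optional[int]
    manifests: Tuple[Manifest, ...]  # stands in for the manifest list


@dataclass
class ToyTable:
    """Toy table metadata: all snapshots plus a pointer to the current one."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def append(self, new_files: List[str]) -> int:
        """Commit new data files as a new snapshot and return its id."""
        parent = self.snapshots.get(self.current_snapshot_id)
        inherited = parent.manifests if parent else ()
        new_id = len(self.snapshots) + 1
        # Existing manifests are reused; only one new manifest is written.
        manifests = inherited + (Manifest(tuple(new_files)),)
        self.snapshots[new_id] = Snapshot(new_id, self.current_snapshot_id, manifests)
        self.current_snapshot_id = new_id
        return new_id

    def scan(self, snapshot_id: Optional[int] = None) -> List[str]:
        """List data files visible in a snapshot (defaults to the current one)."""
        sid = self.current_snapshot_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        return [f for m in self.snapshots[sid].manifests for f in m.data_files]


table = ToyTable()
first = table.append(["data/a.parquet"])
second = table.append(["data/b.parquet"])
print(table.scan())       # files visible in the current snapshot
print(table.scan(first))  # "time travel": files visible in the first snapshot
```

The real table format stores considerably more (schemas, partition specs, per-file statistics, delete files), so treat this as a mental model to carry into the labs rather than a description of the actual metadata layout.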
- Try the same operations in Spark, Trino, and DuckDB
- Understand engine-specific optimizations
- Learn which engine is best for which use case
- Consistency beats intensity - 30 minutes daily is better than 3 hours weekly
- Revisit labs after breaks to reinforce learning
- Try to complete labs without looking at solutions
- Read error messages carefully
- Understand why your solution didn't work
- Try alternative approaches
- Check the solution notebooks for patterns
- Full-featured environment
- Production-like setup
- Better resource isolation
- Suitable for long-term learning
- Quick to set up
- Lower resource requirements
- Good for initial learning
- Easier to troubleshoot
By completing all labs, you will develop skills in:
- Apache Iceberg table operations and management
- Data lakehouse architecture and design
- Multi-engine query optimization
- Streaming data pipelines with Kafka
- Change data capture with Debezium
- Performance tuning and monitoring
- Production-ready data engineering patterns
- Check the Troubleshooting page
- Review Best Practices
- Open an issue on GitHub
- Start a discussion in GitHub Discussions
- Check the Learning Path for guidance
Happy learning!