Seven day sprint for data engineering interviews. For candidates with an onsite next week, not next quarter.
The plan · Weekend only · Companion repos
This is the speed run. The full Data Engineering Interview Handbook is 200+ pages. If you have an onsite in seven days, you do not have time for that. You have time for this.
| Day | Focus | Time | Deliverable |
|---|---|---|---|
| Mon | SQL fundamentals refresher | 2h | Solve 5 problems on joins, aggregating, dates |
| Tue | Window functions | 2h | Write LAG, LEAD, ROW_NUMBER, SUM OVER, AVG OVER from memory |
| Wed | Python data wrangling | 2h | Solve 4 problems on chunking, dedup, interval merging |
| Thu | Schema design | 3h | Sketch 3 schemas on paper before reading the solution |
| Fri | Pipeline architecture | 3h | Design 3 pipelines using the eight beat framework |
| Sat | Mock interview | 2h | 60 min loop with a partner, or recorded solo |
| Sun | Behavioral story bank | 2h | Six STAR stories, three minutes each, rehearsed out loud |
Total: 16 hours over a week.
Lessons: joins intermediate, aggregating intermediate, filtering advanced.
Drill these 5:
- 10 Lowest Uptime Services. TOP N with ties. Trap: `LIMIT 10` drops tied rows.
- 2FA Confirmation Rate. Conditional aggregation. Trap: divide by zero.
- 30 Day Page View Counts. Date filtering. Trap: timezone boundaries.
- 2nd Most Common Content Type. Tie breaking. Trap: `LIMIT 1 OFFSET 1` ignores tied first place.
- Active Users by Month. Cohort logic. Trap: double counting users active in multiple months.
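Three of the traps above can be reproduced in a few lines. The following is a minimal sketch using Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The table names, column names, and rows are made up for illustration and are not the schemas of the linked problems. Note that SQLite returns NULL on division by zero while some engines, PostgreSQL among them, raise an error; `NULLIF` makes the intent explicit in either case.

```python
import sqlite3

# Hypothetical tables and data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service_uptime (service TEXT, uptime_pct REAL);
    INSERT INTO service_uptime VALUES
        ('auth', 97.0), ('billing', 98.5), ('search', 98.5), ('feed', 99.9);

    CREATE TABLE twofa_events (day TEXT, sent INTEGER, confirmed INTEGER);
    INSERT INTO twofa_events VALUES ('2024-01-01', 10, 8), ('2024-01-02', 0, 0);

    CREATE TABLE content (content_type TEXT);
    INSERT INTO content VALUES
        ('video'), ('video'), ('video'),
        ('image'), ('image'), ('image'),
        ('text'), ('text');
""")

# Trap 1: "2 lowest uptime" with LIMIT keeps only one of the two tied services.
naive = conn.execute(
    "SELECT service FROM service_uptime ORDER BY uptime_pct LIMIT 2"
).fetchall()

# Tie-aware version: rank first, then filter on the rank.
with_ties = conn.execute("""
    SELECT service FROM (
        SELECT service, RANK() OVER (ORDER BY uptime_pct) AS rnk
        FROM service_uptime
    )
    WHERE rnk <= 2
    ORDER BY service
""").fetchall()

# Trap 2: guard the denominator with NULLIF, and multiply by 1.0 so
# integer columns do not trigger integer division.
rates = conn.execute("""
    SELECT day, 1.0 * confirmed / NULLIF(sent, 0) AS rate
    FROM twofa_events
    ORDER BY day
""").fetchall()

# Trap 3: 'video' and 'image' tie for first place, so LIMIT 1 OFFSET 1
# would return one of them. DENSE_RANK = 2 returns the true runner-up.
second = conn.execute("""
    WITH counts AS (
        SELECT content_type, COUNT(*) AS n
        FROM content
        GROUP BY content_type
    ),
    ranked AS (
        SELECT content_type, DENSE_RANK() OVER (ORDER BY n DESC) AS rnk
        FROM counts
    )
    SELECT content_type FROM ranked WHERE rnk = 2
""").fetchall()

print(len(naive), with_ties, rates, second)
```

Whether ties should be kept (`RANK`, `DENSE_RANK`) or broken deterministically (`ROW_NUMBER` with a secondary sort key) depends on the problem statement, so it is worth asking the interviewer before writing the query.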
Window functions show up in most senior DE SQL screens. Watch all three lessons: beginner, intermediate, advanced.
Drill: 7 Check Rolling Average, 7 Day Onboarding Conversion, then run the window functions drill timed.
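As a warm-up for writing the five functions from memory, here is a minimal sketch that puts them in one query, again through Python's `sqlite3` module. The `daily_checks` table and its rows are hypothetical. The rolling average uses a three-row frame to keep the data small; a seven-day version would use `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW`, which assumes exactly one row per day with no gaps.

```python
import sqlite3

# Hypothetical table, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_checks (day TEXT, checks INTEGER);
    INSERT INTO daily_checks VALUES
        ('2024-01-01', 10), ('2024-01-02', 20),
        ('2024-01-03', 30), ('2024-01-04', 40);
""")

rows = conn.execute("""
    SELECT
        day,
        ROW_NUMBER() OVER (ORDER BY day) AS rn,             -- position in order
        LAG(checks)  OVER (ORDER BY day) AS prev_checks,    -- previous row, NULL on first
        LEAD(checks) OVER (ORDER BY day) AS next_checks,    -- next row, NULL on last
        SUM(checks)  OVER (ORDER BY day) AS running_total,  -- cumulative sum
        AVG(checks)  OVER (
            ORDER BY day
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS avg_3row                                       -- rolling average
    FROM daily_checks
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

The first row has no predecessor, so `prev_checks` comes back as `None`, and its rolling average is computed over a single row. Deciding how to treat those partial windows is a common follow-up question.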
Lessons: foundations intermediate, collections advanced.
Drill these 4:
- Batch Records. Chunking iterables.
- Activity Time Ledger. Interval merging.
- Batch Partitioner. Hash bucketing.
- Batch With Metadata. Stateful iteration.
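The first three patterns are short enough to write from scratch. The sketch below shows one common way to do each; the function names and signatures are hypothetical and the linked problems may specify different inputs and outputs. The bucketing helper uses `zlib.crc32` rather than the built-in `hash()`, because string hashes in Python are salted per process and so are not stable across runs.

```python
import zlib
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items; works on any iterable, including generators."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def merge_intervals(intervals: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching (start, end) intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def bucket_for(key: str, n_buckets: int) -> int:
    """Map a key to a bucket index that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % n_buckets


print(list(chunked(range(5), 2)))
print(merge_intervals([(1, 3), (2, 6), (8, 10), (10, 12)]))
print(bucket_for("user-1", 4))
```

In `merge_intervals`, the `<=` comparison treats intervals that merely touch, such as `(8, 10)` and `(10, 12)`, as one interval. Switch to `<` if the problem defines touching intervals as separate.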
Read first: keys, normalization, dimensional modeling, SCD.
Then sketch these 3 on paper for 20 minutes each before reading the solution:
Read first: data engineering system design.
Memorize the eight beat framework: clarify, estimate, freshness, batch vs stream, storage, topology, failure modes, cost.
Sketch these 3 end to end for 30 minutes each before reading the solution:
- Card Transaction Streaming Pipeline
- Cellular Connectivity and App Log Data Warehouse
- Database Replication and Schema Normalization Pipeline
60 minute loop:
- 15 min SQL, one medium problem
- 20 min Python, one medium problem
- 25 min schema or pipeline design
No partner? Record yourself solving each one out loud, then watch the recording. The talking is the training signal.
Six STAR stories, three minutes each. Themes:
- Owned an ambiguous problem
- Disagreed with a stakeholder
- Broke production and recovered
- Mentored someone
- Killed a project
- Shipped fast then cleaned up
Rehearse each one out loud twice. The first attempt will be terrible.
50 common DE behavioral questions: datadriven.io/behavioral-interview-questions.
If you have only two days, do days 2, 4, 5, and 7: window functions, schema design, pipeline architecture, behavioral. Those four cover the rounds where most senior candidates lose points.
If you know your target, spend 30 minutes on its guide:
- Netflix: companies/netflix/interview
- Uber: companies/uber/interview
- Amazon: companies/amazon/interview
- Google: companies/google/interview
- Meta: companies/meta/interview
- data-engineering-interview-handbook. The full handbook with chapter by chapter coverage and 4, 8, and 12 week plans.
- data-engineering-interview-questions. 1418 tagged practice problems.
- system-design-for-data-engineers. 120 long form pipeline case studies.
- data-engineering-cheatsheet. Single page recall reference.
- data-engineer-interview-prep. 8 week structured practice track.
- awesome-data-engineering-interview. Curated resource list.
CC BY-SA 4.0. Linked sandboxes and lessons hosted at datadriven.io.