Data Engineer Interview Handbook

Seven day sprint for data engineering interviews. For candidates with an onsite next week, not next quarter.

The plan · Weekend only · Companion repos


This is the speed run. The full Data Engineering Interview Handbook is 200+ pages. If you have an onsite in seven days, you do not have time for that. You have time for this.

The 7 day plan

| Day | Focus | Time | Deliverable |
| --- | --- | --- | --- |
| Mon | SQL fundamentals refresher | 2h | Solve 5 problems on joins, aggregating, dates |
| Tue | Window functions | 2h | Write LAG, LEAD, ROW_NUMBER, SUM OVER, AVG OVER from memory |
| Wed | Python data wrangling | 2h | Solve 4 problems on chunking, dedup, interval merging |
| Thu | Schema design | 3h | Sketch 3 schemas on paper before reading the solution |
| Fri | Pipeline architecture | 3h | Design 3 pipelines using the eight beat framework |
| Sat | Mock interview | 2h | 60 min loop with a partner, or recorded solo |
| Sun | Behavioral story bank | 2h | Six STAR stories, three minutes each, rehearsed out loud |

Total: 16 hours over a week.

Day 1: SQL fundamentals

Lessons: joins intermediate, aggregating intermediate, filtering advanced.

Drill these 5:

  1. 10 Lowest Uptime Services. TOP N with ties. Trap: LIMIT 10 drops tied rows.
  2. 2FA Confirmation Rate. Conditional aggregation. Trap: divide by zero.
  3. 30 Day Page View Counts. Date filtering. Trap: timezone boundaries.
  4. 2nd Most Common Content Type. Tie breaking. Trap: LIMIT 1 OFFSET 1 ignores tied first place.
  5. Active Users by Month. Cohort logic. Trap: double counting users active in multiple months.
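Two of the traps above are easy to reproduce in a scratch database. The sketch below uses Python's built-in sqlite3 module; the table names, columns, and rows are made up for illustration and are not the actual drill data. It uses N = 2 instead of 10 to keep the data small. SQLite returns NULL on division by zero instead of raising an error, but the NULLIF guard is what you would write on an engine that does raise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, invented for this sketch.
    CREATE TABLE service_uptime (service TEXT, uptime_pct REAL);
    INSERT INTO service_uptime VALUES
        ('auth', 97.0), ('billing', 98.5), ('search', 98.5), ('email', 99.9);

    CREATE TABLE logins (platform TEXT, used_2fa INTEGER, confirmed INTEGER);
    INSERT INTO logins VALUES ('web', 1, 1), ('web', 1, 0), ('ios', 0, 0);
""")

N = 2  # stand-in for the "10" in the drill

# Trap: LIMIT cuts through the tie at 98.5 and silently drops 'search'.
limit_rows = conn.execute(
    "SELECT service FROM service_uptime ORDER BY uptime_pct, service LIMIT ?",
    (N,),
).fetchall()

# Tie-safe: keep every row at or below the Nth lowest value.
# (If the table has fewer than N rows the subquery is NULL and nothing
# is returned; a real answer should handle that case.)
tie_safe_rows = conn.execute(
    """
    SELECT service FROM service_uptime
    WHERE uptime_pct <= (
        SELECT uptime_pct FROM service_uptime
        ORDER BY uptime_pct LIMIT 1 OFFSET ?
    )
    ORDER BY uptime_pct, service
    """,
    (N - 1,),
).fetchall()

# Conditional aggregation with a guarded denominator.
# 1.0 * forces float division; NULLIF turns a zero denominator into NULL.
rates = dict(conn.execute(
    """
    SELECT platform,
           1.0 * SUM(CASE WHEN used_2fa = 1 AND confirmed = 1 THEN 1 ELSE 0 END)
               / NULLIF(SUM(used_2fa), 0) AS confirmation_rate
    FROM logins
    GROUP BY platform
    """
).fetchall())

print(limit_rows)     # two rows: the tie is cut
print(tie_safe_rows)  # three rows: both tied services kept
print(rates)          # 'ios' has no 2FA attempts, so its rate is None
```

In an interview, say out loud which behavior you want for ties before writing the query. The right answer depends on the question, and asking is part of the signal.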

Day 2: Window functions

Window functions show up in most senior DE SQL screens. Watch all three lessons: beginner, intermediate, advanced.

Drill: 7 Check Rolling Average, 7 Day Onboarding Conversion, then run the window functions drill timed.
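For the from-memory practice, it helps to see all five functions against one tiny table. This is a minimal sketch using Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first version with window functions). The `daily_checks` table and its rows are hypothetical, and the rolling window is 3 rows instead of 7 so the numbers can be checked by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: one row per day, no gaps.
    CREATE TABLE daily_checks (day TEXT, checks INTEGER);
    INSERT INTO daily_checks VALUES
        ('2024-01-01', 10), ('2024-01-02', 20),
        ('2024-01-03', 30), ('2024-01-04', 40);
""")

rows = conn.execute("""
    SELECT
        day,
        ROW_NUMBER() OVER (ORDER BY day) AS rn,
        LAG(checks)  OVER (ORDER BY day) AS prev_checks,   -- NULL on first row
        LEAD(checks) OVER (ORDER BY day) AS next_checks,   -- NULL on last row
        SUM(checks)  OVER (ORDER BY day) AS running_total,
        AVG(checks)  OVER (
            ORDER BY day
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW       -- 3-row window
        ) AS rolling_avg_3
    FROM daily_checks
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
# First row:  ('2024-01-01', 1, None, 20, 10, 10.0)
# Last row:   ('2024-01-04', 4, 30, None, 100, 30.0)
```

Note that a ROWS frame counts rows, not calendar days. If the data can skip days, a "7 day" average built this way quietly covers more than 7 days, so either fill the gaps first or state the assumption.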

Day 3: Python wrangling

Lessons: foundations intermediate, collections advanced.

Drill these 4:

  1. Batch Records. Chunking iterables.
  2. Activity Time Ledger. Interval merging.
  3. Batch Partitioner. Hash bucketing.
  4. Batch With Metadata. Stateful iteration.
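The first three patterns fit in a few lines each. The functions below are a sketch of the general techniques, not the reference solutions to the drills above; the names `chunked`, `merge_intervals`, and `bucket_for` are chosen here for illustration.

```python
from itertools import islice
import zlib


def chunked(iterable, size):
    """Yield lists of up to `size` items; the last chunk may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) pairs into sorted spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def bucket_for(key, num_buckets):
    """Map a string key to a stable bucket index.

    Uses CRC32 instead of the built-in hash(), because hash() of a str
    is randomized per process and would route keys differently on each run.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


print(list(chunked(range(7), 3)))
# [[0, 1, 2], [3, 4, 5], [6]]
print(merge_intervals([(5, 8), (1, 3), (2, 4), (8, 10)]))
# [(1, 4), (5, 10)]
print(bucket_for("user-1", 4) == bucket_for("user-1", 4))
# True
```

Whether touching intervals such as (5, 8) and (8, 10) should merge is a question to ask the interviewer; this sketch merges them, and changing `<=` to `<` keeps them apart.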

Day 4: Schema design

Read first: keys, normalization, dimensional modeling, SCD.

Then sketch these 3 on paper for 20 minutes each before reading the solution:

  1. A/B Experiment Assignment Schema
  2. Customer Address History
  3. Insurance Claims Lifecycle
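As a warm-up for the second sketch, here is one common way to model address history: a Type 2 slowly changing dimension with a validity range per row. This is a minimal sketch under stated assumptions, not the handbook's solution. The table layout, column names, and the helpers `change_address` and `address_as_of` are all hypothetical, and dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_address_history (
        customer_id INTEGER NOT NULL,
        address     TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO date, inclusive
        valid_to    TEXT,              -- ISO date, exclusive; NULL = current row
        PRIMARY KEY (customer_id, valid_from)
    )
""")


def change_address(conn, customer_id, new_address, effective_date):
    """Close the current row (if any) and open a new one atomically."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE customer_address_history SET valid_to = ? "
            "WHERE customer_id = ? AND valid_to IS NULL",
            (effective_date, customer_id),
        )
        conn.execute(
            "INSERT INTO customer_address_history VALUES (?, ?, ?, NULL)",
            (customer_id, new_address, effective_date),
        )


def address_as_of(conn, customer_id, as_of_date):
    """Return the address valid on `as_of_date`, or None if none was."""
    row = conn.execute(
        "SELECT address FROM customer_address_history "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None


change_address(conn, 1, "12 Oak St", "2023-01-01")
change_address(conn, 1, "9 Elm Ave", "2024-06-01")

print(address_as_of(conn, 1, "2023-12-31"))  # 12 Oak St
print(address_as_of(conn, 1, "2024-06-01"))  # 9 Elm Ave
print(address_as_of(conn, 1, "2022-01-01"))  # None
```

The half-open range (inclusive start, exclusive end) means adjacent rows never overlap and never leave a gap. Be ready to discuss what this sketch leaves out: late-arriving changes, multiple address types per customer, and a surrogate key for fact tables to join on.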

Day 5: Pipeline architecture

Read first: data engineering system design.

Memorize the eight beat framework: clarify, estimate, freshness, batch vs stream, storage, topology, failure modes, cost.
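The estimate beat is mostly arithmetic you should be able to do on a whiteboard. The helper below is a rough sketch of that calculation. The function name `estimate_volume`, the input numbers, and the default replication factor of 3 are assumptions made for illustration, not figures from the handbook.

```python
def estimate_volume(events_per_second, bytes_per_event, retention_days, replication=3):
    """Back-of-envelope sizing: daily raw volume and total stored volume.

    Uses decimal units (1 GB = 1e9 bytes) and ignores compression,
    indexes, and metadata overhead.
    """
    seconds_per_day = 86_400
    events_per_day = events_per_second * seconds_per_day
    raw_gb_per_day = events_per_day * bytes_per_event / 1e9
    stored_tb = raw_gb_per_day * retention_days * replication / 1e3
    return {
        "events_per_day": events_per_day,
        "raw_gb_per_day": raw_gb_per_day,
        "stored_tb": stored_tb,
    }


# Hypothetical workload: 5,000 events/s at 500 bytes each, kept for 30 days.
estimate = estimate_volume(5_000, 500, 30)
print(estimate["events_per_day"])           # 432000000
print(round(estimate["raw_gb_per_day"], 2)) # 216.0
print(round(estimate["stored_tb"], 2))      # 19.44
```

The point in the interview is the order of magnitude, since that is what drives the later beats: a couple hundred GB per day fits on very different infrastructure than a couple hundred TB per day.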

Sketch these 3 end to end for 30 minutes each before reading the solution:

  1. Card Transaction Streaming Pipeline
  2. Cellular Connectivity and App Log Data Warehouse
  3. Database Replication and Schema Normalization Pipeline

Day 6: Mock interview

60 minute loop:

  • 15 min SQL, one medium problem
  • 20 min Python, one medium problem
  • 25 min schema or pipeline design

No partner? Record yourself solving each one out loud, then watch the recording. The talking is the training signal.

Day 7: Behavioral

Six STAR stories, three minutes each. Themes:

  1. Owned an ambiguous problem
  2. Disagreed with a stakeholder
  3. Broke production and recovered
  4. Mentored someone
  5. Killed a project
  6. Shipped fast then cleaned up

Rehearse each one out loud twice. The first attempt will be terrible.

50 common DE behavioral questions: datadriven.io/behavioral-interview-questions.

Weekend only version

If you have only two days, do days 2, 4, 5, and 7: window functions, schema design, pipeline architecture, behavioral. Those four cover the rounds where most senior candidates lose points.

Company specific prep (30 min)

If you know your target, spend 30 minutes on its guide:

Companion repos

License

CC BY-SA 4.0. Linked sandboxes and lessons hosted at datadriven.io.
