Time: 60 minutes (40 minutes coding + 20 min questions) Stack: dbt (Cloud) against BigQuery
You join a small e-commerce team. An upstream system exports append-only snapshots of order headers and line items. The exports are not idempotent: the same logical order appears on many rows, and fields can be partially null on newer rows. Finance needs a single current row per order and a reliable line-total to reconcile against the header.
Your job is to implement this in dbt with optimisation and quality controls.
| Column | Description |
|---|---|
order_id |
Business key of the order |
customer_id |
Customer identifier |
extracted_at |
When this snapshot row was produced (ISO-like timestamp string) |
order_status |
e.g. placed, paid, shipped, cancelled, refunded |
order_total_amount |
Money as decimal; may be null on some later rows even when still the same order |
| Column | Description |
|---|---|
order_id |
Foreign key to order |
sku |
Product sku |
qty |
Integer quantity |
unit_price |
Unit price (decimal) |
line_no |
Integer line sequence from source (not globally unique) |
-
From seeds
dbt seedhad already been run and data is available to use
-
(Optional) The intermetiates
- Implement one row per
order_idreflecting:- the latest status *.
- amount = rule above (latest non-null amount carried backward per order).
- Implement collapsed line items at grain
order_id,skuwithqtysummed.
- Implement one row per
-
Marts
- Model
fct_ordersat grain one row perorder_idwith:order_id,customer_id,order_statusheader_amount(from your deduped / filled header logic)lines_subtotal= sum over lines ofqty * unit_pricefor that order- total order value should match total order line value
- Model
- Orchestration, Airflow, infra.
- Real-time APIs; work from seeds only.
Good luck.