Merged
Commits
19 commits
ab5c6b8
feat: UV migration, integration test refactor, ClickBench/ELTBench su…
mwc360 Feb 24, 2026
7384b51
test: use scale_factor=0.1 for faster CI integration runs
mwc360 Feb 24, 2026
c0c3b21
docs: add integration test README with uv sync commands per engine
mwc360 Feb 24, 2026
88e5dc6
fix: DaftELTBench path handling and API compatibility
mwc360 Feb 24, 2026
2cdd2fa
feat: add ClickBench support for Polars and Daft engines
mwc360 Feb 24, 2026
9555ee2
feat: auto-generate per-engine benchmark reports via pytest_sessionfi…
mwc360 Feb 24, 2026
4c4d25f
refactor: rename docs/benchmarks -> reports/coverage
mwc360 Feb 24, 2026
20fbb48
fix local issues with spark tests
mwc360 Feb 25, 2026
ebb9e36
refactor: rename test_tpc_* -> test_* integration test files
mwc360 Feb 25, 2026
aeec74d
fix version
mwc360 Feb 25, 2026
d22c2d1
use DeltaTable merge buillder due to error on local spark: DELTA_MERG…
mwc360 Feb 25, 2026
9f41676
uv lock
mwc360 Feb 25, 2026
393d5ae
chore: bump version
mwc360 Feb 25, 2026
9ad3969
fix: Python 3.8 type hint compat + Daft URI and glob fixes
mwc360 Feb 25, 2026
e1ef3f2
fix: cast Decimal columns to Float64 in Polars load_parquet_to_delta
mwc360 Feb 25, 2026
e3e02bb
fix: add 'from __future__ import annotations' to duckdb, polars, sail…
mwc360 Feb 25, 2026
325d4fe
fix: revert to_file_uri on Daft engine working dir in test_daft.py
mwc360 Feb 25, 2026
1d7290c
fix: use strict=False on Polars Decimal->Float64 cast
mwc360 Feb 25, 2026
a33235a
ci: shorten integration test job names to just the engine
mwc360 Feb 25, 2026
180 changes: 180 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,180 @@
# LakeBench Codebase Reference

> Quick-reference for Copilot and contributors. Keep this in sync when adding major features.

---

## What is LakeBench?

LakeBench is a **Python-native, multi-modal benchmarking framework** for evaluating performance across multiple lakehouse compute engines and ELT scenarios. It supports industry-standard benchmarks (TPC-DS, TPC-H, ClickBench) and a novel ELT-focused benchmark (ELTBench), all installable via `pip`.

---

## Project Layout

```
src/lakebench/
├── __init__.py
├── benchmarks/
│   ├── base.py           # BaseBenchmark ABC — result schema, timing, post_results()
│   ├── elt_bench/        # ELTBench: load, transform, merge, maintain, query
│   ├── tpcds/            # TPC-DS: 99 queries, 24 tables
│   ├── tpch/             # TPC-H: 22 queries, 8 tables
│   └── clickbench/       # ClickBench: 43 queries on clickstream data
├── datagen/
│   ├── tpch.py           # TPCHDataGenerator (uses tpchgen-rs, ~10x faster than alternatives)
│   ├── tpcds.py          # TPCDSDataGenerator (wraps DuckDB TPC-DS extension)
│   └── clickbench.py     # Downloads dataset from ClickHouse host
├── engines/
│   ├── base.py           # BaseEngine ABC — fsspec, runtime detection, result writing
│   ├── spark.py          # Generic Spark engine
│   ├── fabric_spark.py   # Microsoft Fabric Spark (auto-authenticates via notebookutils)
│   ├── synapse_spark.py  # Azure Synapse Spark
│   ├── hdi_spark.py      # HDInsight Spark
│   ├── duckdb.py         # DuckDB
│   ├── polars.py         # Polars
│   ├── daft.py           # Daft
│   ├── sail.py           # Sail (PySpark-compatible engine)
│   └── delta_rs.py       # Shared DeltaRs write helper (used by non-Spark engines)
└── utils/
    ├── query_utils.py    # transpile_and_qualify_query(), get_table_name_from_ddl()
    ├── path_utils.py     # abfss_to_https(), to_unix_path()
    └── timer.py          # Context-manager timer; stores results for post_results()
```

---

## Core Abstractions

### `BaseEngine` (`engines/base.py`)
Abstract base for all compute engines.

| Attribute | Description |
|---|---|
| `SQLGLOT_DIALECT` | SQLGlot dialect string for auto-transpilation (e.g. `"duckdb"`) |
| `SUPPORTS_SCHEMA_PREP` | Whether the engine can create an empty schema-defined table before data load |
| `SUPPORTS_MOUNT_PATH` | Whether the engine can use mount-style URIs (`/mnt/...`) |
| `TABLE_FORMAT` | Always `'delta'` |
| `schema_or_working_directory_uri` | Base path where Delta tables are stored |
| `storage_options` | Dict passed through to DeltaRs / fsspec for cloud auth |
| `extended_engine_metadata` | Dict of key/value pairs appended to benchmark results |

Key methods: `get_total_cores()`, `get_compute_size()`, `get_job_cost(duration_ms)`, `create_schema_if_not_exists()`, `_append_results_to_delta()`.

Runtime is auto-detected at init via `_detect_runtime()` — returns `"fabric"`, `"synapse"`, `"databricks"`, `"colab"`, or `"local_unknown"`.
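Detection amounts to checking runtime-specific environment markers. A minimal sketch of the pattern — the variable names checked here are illustrative assumptions for demonstration, not the actual signals `_detect_runtime()` inspects:

```python
import os

def detect_runtime(env=None) -> str:
    """Sketch of runtime detection. The markers below are illustrative
    assumptions, not the exact signals LakeBench uses."""
    env = os.environ if env is None else env
    if "MSFABRIC_RUNTIME" in env:            # hypothetical Fabric marker
        return "fabric"
    if "SYNAPSE_RUNTIME" in env:             # hypothetical Synapse marker
        return "synapse"
    if "DATABRICKS_RUNTIME_VERSION" in env:  # present on Databricks clusters
        return "databricks"
    if "COLAB_RELEASE_TAG" in env:           # present in Google Colab
        return "colab"
    return "local_unknown"
```

The first matching marker wins, and `"local_unknown"` is the fallback when nothing matches.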

### `BaseBenchmark` (`benchmarks/base.py`)
Abstract base for all benchmarks.

| Attribute | Description |
|---|---|
| `BENCHMARK_IMPL_REGISTRY` | `Dict[EngineClass → ImplClass]` — maps engines to optional engine-specific implementations |
| `RESULT_SCHEMA` | Canonical 22-column result schema (see below) |
| `VERSION` | Benchmark version string |

The result schema includes: `run_id`, `run_datetime`, `lakebench_version`, `engine`, `engine_version`, `benchmark`, `benchmark_version`, `mode`, `scale_factor`, `scenario`, `total_cores`, `compute_size`, `phase`, `test_item`, `start_datetime`, `duration_ms`, `estimated_retail_job_cost`, `iteration`, `success`, `error_message`, `engine_properties` (MAP), `execution_telemetry` (MAP).

`post_results()` collects timer results → builds result rows → optionally appends to a Delta table via `engine._append_results_to_delta()`.

---

## Engine & Benchmark Registration

Benchmarks declare engine support via `BENCHMARK_IMPL_REGISTRY`. If an engine uses only shared `BaseEngine` methods, the value is `None`; otherwise it maps to a specialized implementation class.

```python
# Register a custom engine with an existing benchmark
from lakebench.benchmarks import TPCDS
TPCDS.register_engine(MyNewEngine, None) # use shared methods
TPCDS.register_engine(MyNewEngine, MyTPCDSImpl) # use custom impl class
```

To add a new engine, subclass an existing one:
```python
from lakebench.engines import BaseEngine

class MyEngine(BaseEngine):
    SQLGLOT_DIALECT = "duckdb"  # or whichever dialect applies
    ...

from lakebench.benchmarks.elt_bench import ELTBench
ELTBench.register_engine(MyEngine, None)
benchmark = ELTBench(engine=MyEngine(...), ...)
benchmark.run()
```

---

## Query Resolution Strategy (3-Tier Fallback)

For each query, LakeBench resolves in this order:

1. **Engine-specific override** — `resources/queries/<engine_name>/q14.sql` (rare; e.g. Daft decimal casting)
2. **Parent engine class override** — `resources/queries/<parent_class>/q14.sql` (rare; e.g. Spark family)
3. **Canonical + auto-transpilation** — `resources/queries/canonical/q14.sql` transpiled via SQLGlot using the engine's `SQLGLOT_DIALECT`

Tables are automatically qualified with catalog and schema when applicable. To inspect the resolved query:

```python
benchmark = TPCH(engine=MyEngine(...))
print(benchmark._return_query_definition('q14'))
```

---

## Optional Dependency Groups

Install only what you need:

| Extra | Installs |
|---|---|
| `duckdb` | `duckdb`, `deltalake`, `pyarrow` |
| `polars` | `polars`, `deltalake`, `pyarrow` |
| `daft` | `daft`, `deltalake`, `pyarrow` |
| `tpcds_datagen` | `duckdb`, `pyarrow` |
| `tpch_datagen` | `tpchgen-cli` |
| `sparkmeasure` | `sparkmeasure` |
| `sail` | `pysail`, `pyspark[connect]`, `deltalake`, `pyarrow` |

```bash
pip install lakebench[duckdb,polars,tpch_datagen]
```
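Because engines live behind extras, imports inside engine modules are typically guarded so a missing extra fails with an actionable message. A sketch of that lazy-import pattern (the helper name and error text are illustrative, not LakeBench's actual code):

```python
import importlib

def require_extra(module_name: str, extra: str):
    """Import an optional dependency, pointing at the right extra on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this engine. "
            f"Install it with: pip install lakebench[{extra}]"
        ) from exc
```

For example, a DuckDB engine would call `require_extra("duckdb", "duckdb")` at init rather than importing at module load time.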

---

## Supported Runtimes & Storage

**Runtimes**: Local (Windows), Microsoft Fabric, Azure Synapse, HDInsight, Google Colab (experimental)

**Storage**: Local filesystem, OneLake, ADLS Gen2 (Fabric/Synapse/HDInsight), S3 (experimental), GCS (experimental)

**Table format**: Delta Lake only (via `delta-rs` for non-Spark engines)

---

## Timer (`utils/timer.py`)

`timer` is a context-manager function with a `.results` list attached. Use it inside benchmark `run()` implementations to time each phase/test item:

```python
with self.timer(phase="load", test_item="q1", engine=self.engine) as t:
    t.execution_telemetry = {"rows": 1000}  # optional metadata
    do_work()

self.post_results()  # flush timer.results → self.results → optionally Delta
```
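The shape of that helper can be sketched as a generator-based context manager with a results list attached to the function object — illustrative only; the real implementation in `utils/timer.py` records the full result schema:

```python
import time
from contextlib import contextmanager

class TimerResult:
    """Minimal stand-in for a timer record; the real schema has more fields."""
    def __init__(self, phase: str, test_item: str):
        self.phase = phase
        self.test_item = test_item
        self.duration_ms = 0
        self.success = True
        self.execution_telemetry = {}

@contextmanager
def timer(phase: str, test_item: str, **_ignored):
    result = TimerResult(phase, test_item)
    start = time.perf_counter()
    try:
        yield result
    except Exception:
        result.success = False
        raise
    finally:  # record duration and stash the result even on failure
        result.duration_ms = int((time.perf_counter() - start) * 1000)
        timer.results.append(result)  # collected later by post_results()

timer.results = []  # the `.results` list attached to the function itself
```

`post_results()` then drains this list into result rows, which is why each benchmark phase only needs the `with` block and nothing else.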

---

## Key Conventions

- **All Delta writes for non-Spark engines** go through `engines/delta_rs.py` (`DeltaRs().write_deltalake(...)`).
- **SQLGlot transpilation** is the default path; engine-specific SQL files are exceptions, not the rule.
- **`storage_options`** on `BaseEngine` is the single place for cloud auth credentials (bearer token, SAS, etc.).
- **`extended_engine_metadata`** on `BaseEngine` is the right place to attach runtime-specific metadata that ends up in the `engine_properties` MAP column of results.
- **TPC-DS / TPC-H spec compliance**: LakeBench intentionally diverges from `spark-sql-perf` to follow the official specs (see `customer.c_last_review_date_sk` and `store.s_tax_percentage` fixes in README).
- **New benchmarks** should subclass `BaseBenchmark`, define `RESULT_SCHEMA`, `BENCHMARK_IMPL_REGISTRY`, `VERSION`, and implement `run()`.
12 changes: 4 additions & 8 deletions .github/workflows/publish_to_pypi.yml
@@ -8,15 +8,11 @@ jobs:
   build-and-publish:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
-      - name: Set up Python
-        uses: actions/setup-python@v2
-        with:
-          python-version: '3.x'
-      - name: Install build dependencies
-        run: python -m pip install build twine
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
       - name: Build package
-        run: python -m build
+        run: uv build
       - name: Publish package to PyPI
         uses: pypa/gh-action-pypi-publish@v1.4.2
         with:
74 changes: 74 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,74 @@
name: Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: uv sync --group dev

      - name: Run unit tests
        run: uv run pytest tests/ --ignore=tests/integration -v --tb=short

  integration-tests:
    name: integration (${{ matrix.engine }})
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          - engine: duckdb
            extras_flags: "--extra duckdb --extra tpcds_datagen --extra tpch_datagen"
            test_file: "tests/integration/test_duckdb.py"
          - engine: daft
            extras_flags: "--extra daft --extra tpcds_datagen --extra tpch_datagen"
            test_file: "tests/integration/test_daft.py"
          - engine: polars
            extras_flags: "--extra polars --extra tpcds_datagen --extra tpch_datagen"
            test_file: "tests/integration/test_polars.py"
          - engine: spark
            extras_flags: "--extra spark --extra tpcds_datagen --extra tpch_datagen"
            test_file: "tests/integration/test_spark.py"
            java: "17"
          - engine: sail
            extras_flags: "--extra sail --extra tpcds_datagen --extra tpch_datagen"
            test_file: "tests/integration/test_sail.py"

    steps:
      - uses: actions/checkout@v4

      - name: Set up Java ${{ matrix.java }}
        if: matrix.java != ''
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: ${{ matrix.java }}

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          python-version: "3.11"

      - name: Install dependencies (${{ matrix.engine }})
        run: uv sync --group dev ${{ matrix.extras_flags }}

      - name: Run integration tests (${{ matrix.engine }})
        run: uv run pytest ${{ matrix.test_file }} -v -s --tb=short -W always
7 changes: 7 additions & 0 deletions .gitignore
@@ -5,6 +5,9 @@ __pycache__/
*.pyd
*.so

# Development artifacts
dev/

# Virtual environment
.venv/
env/
@@ -34,6 +37,10 @@ build/
.DS_Store
Thumbs.db

# Spark metastore (Derby embedded DB)
metastore_db/
derby.log

# Logs
*.log

1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.11
21 changes: 18 additions & 3 deletions README.md
@@ -70,15 +70,30 @@ LakeBench supports multiple lakehouse compute engines. Each benchmark scenario d
| Synapse Spark | ✅ | ✅ | ✅ | ✅ |
| HDInsight Spark | ✅ | ✅ | ✅ | ✅ |
| DuckDB | ✅ | ✅ | ✅ | ✅ |
| Polars | ✅ | ⚠️ | ⚠️ | 🔜 |
| Daft | ✅ | ⚠️ | ⚠️ | 🔜 |
| Polars | ✅ | ⚠️ | ⚠️ | ⚠️ |
| Daft | ✅ | ⚠️ | ⚠️ | ⚠️ |
| Sail | ✅ | ✅ | ✅ | ✅ |

> **Legend:**
> ✅ = Supported
> ⚠️ = Some queries fail due to syntax issues (e.g. Polars doesn't support SQL non-equi joins; Daft is missing many standard SQL constructs such as DATE_ADD, CROSS JOIN, subqueries, non-equi joins, and CASE with operand).
> 🔜 = Coming Soon
> (Blank) = Not currently supported

For detailed pass rates and per-query failure analysis, see the [coverage reports](reports/coverage/).

## 📊 Engine Coverage Reports

Per-engine coverage reports are auto-generated by the integration test suite and show pass rates with individual query failure details.
To refresh: run the integration tests for your engine of choice (see [`tests/integration/README.md`](tests/integration/README.md)).

| Engine | Report |
|--------|--------|
| DuckDB | [reports/coverage/duckdb.md](reports/coverage/duckdb.md) |
| Polars | [reports/coverage/polars.md](reports/coverage/polars.md) |
| Daft | [reports/coverage/daft.md](reports/coverage/daft.md) |
| Spark | [reports/coverage/spark.md](reports/coverage/spark.md) |
| Sail | [reports/coverage/sail.md](reports/coverage/sail.md) |

## Where Can I Run LakeBench?
Multi-modality doesn't stop at benchmarks and engines: LakeBench also supports multiple runtimes and storage backends: