feat(aggregate): cost-aware partial-aggregation skip (opt-in) by zhuqi-lucas · Pull Request #22518 · apache/datafusion

zhuqi-lucas · 2026-05-26T05:59:44Z

Which issue does this PR close?

Phase 1 prototype for #22405.

Rationale for this change

The current skip_partial_aggregation_probe_ratio_threshold (default 0.8) is a single fixed knob: when measured num_groups / input_rows ≥ 0.8, partial aggregation skips. This catches the no-reduction case but misses the medium-ratio band (~0.5–0.7) where partial aggregation is still net-negative because per-row cost is high — heavy variable-length keys, complex aggregates, etc.

ClickBench Q18 is the motivating example. Measured ratio is 0.565, well below 0.8, so partial aggregation keeps running and burns ~17 s of compute across 12 partitions for only ~40 % reduction. Lowering the global threshold to 0.6 fixes Q18 but is likely to regress lower-cost queries that benefit from partial agg at that ratio.

This PR replaces the static threshold guess with measured A/B sampling: probe the per-row cost of both partial agg and passthrough, then pick the cheaper path via a closed-form cost comparison. No magic constants.

How it works

After the existing partial probe window (probe_rows_threshold, default 100k rows) closes:

1. Partial probe: measure partial_ns_per_row from the partial-agg path so far, and compute ratio = num_groups / input_rows.
2. Rule 1 short-circuit: ratio ≥ probe_ratio_threshold (0.8) skips immediately (existing behaviour, preserved for compatibility and to save the A/B window when the answer is obvious).
3. A/B sampling: when Rule 1 doesn't fire, route the next ab_sampling_rows (default 10k) through the passthrough (transform_to_states) path. The hash table is preserved; the passthrough output is sent downstream and merges naturally in Final agg.
4. Cost decision: at the end of the A/B window, measure passthrough_ns_per_row and apply:
   ```
   skip ⇔ ratio > passthrough_ns_per_row / partial_ns_per_row
   ```
   Derived from cost_keep_partial = partial × N + final × N × ratio vs cost_skip = passthrough × N + final × N, assuming final ≈ partial (same hash-table mechanics).
5. If the decision is skip, emit the partial hash table and continue via SkippingAggregation. If keep, return to ReadingInput and the hash table continues accumulating.

The crossover is set entirely by the two measured numbers — no magic threshold, automatically adapts to hardware and query shape.
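The cost comparison above can be sketched as two small pure functions. This is a minimal illustration rather than the PR's code: the names `modeled_costs` and `should_skip_partial` are hypothetical, the zero-cost guard is an added assumption, and the sketch bakes in the same final ≈ partial simplification as the derivation.

```rust
/// Modeled per-row costs (in ns) of keeping vs. skipping partial aggregation.
/// Assumption: Final-agg per-row cost equals the measured partial-agg cost.
fn modeled_costs(ratio: f64, partial_ns: f64, passthrough_ns: f64) -> (f64, f64) {
    let final_ns = partial_ns; // final ≈ partial
    let keep = partial_ns + final_ns * ratio; // Final sees only `ratio` of the rows
    let skip = passthrough_ns + final_ns; // Final sees every row
    (keep, skip)
}

/// Closed-form rule: skip ⇔ ratio > passthrough_ns / partial_ns.
fn should_skip_partial(ratio: f64, partial_ns: f64, passthrough_ns: f64) -> bool {
    if partial_ns <= 0.0 {
        // Hypothetical guard: without a usable measurement, keep partial agg.
        return false;
    }
    ratio > passthrough_ns / partial_ns
}
```

With a ratio of 0.565 and illustrative (not measured) timings of 100 ns per row for partial agg and 40 ns per row for passthrough, the break-even ratio is 0.4, so the rule picks skip; at 80 ns per row for passthrough the break-even rises to 0.8 and the rule keeps partial aggregation.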

Benchmark (ClickBench partitioned, ARM Neoverse-V2 12 vCPU)

Metric	HEAD	This PR	Δ
Total	20027 ms	19726 ms	−1.5%
Q18	1291 ms	1156 ms	+1.12×
Q19	39 ms	27 ms	+1.43×
Q39	153 ms	118 ms	+1.30×
Q29	51 ms	41 ms	+1.23×
Q36	73 ms	63 ms	+1.16×
Q35	323 ms	297 ms	+1.09×
...
Queries faster	—	10
Queries slower	—	1 (Q42, ~15 ms, noise)

What changes are included in this PR?

SkipAggregationProbe extended with a phased state machine:

Partial → AbSampling → Active { should_skip }

(ExecutionState::AbSampling mirrors SkippingAggregation — input goes through transform_to_states — but keeps the partial hash table.)
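A rough model of the Partial → AbSampling → Active progression, for readers who want to trace the transitions. Everything here is a hypothetical sketch: `Phase`, `ProbeSketch`, and `observe` are illustrative names, not the types in the PR, and timings are passed in by the caller rather than read from operator metrics.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    /// Measuring the partial-agg path.
    Partial,
    /// Rows go through passthrough; the hash table is kept.
    AbSampling,
    /// Decision made for the rest of the partition.
    Active { should_skip: bool },
}

struct ProbeSketch {
    phase: Phase,
    probe_rows_threshold: u64,
    probe_ratio_threshold: f64,
    ab_sampling_rows: u64,
    partial_rows: u64,
    partial_ns: u64,
    ab_rows: u64,
    ab_ns: u64,
    ratio: f64,
}

impl ProbeSketch {
    fn new(probe_rows_threshold: u64, probe_ratio_threshold: f64, ab_sampling_rows: u64) -> Self {
        Self {
            phase: Phase::Partial,
            probe_rows_threshold,
            probe_ratio_threshold,
            ab_sampling_rows,
            partial_rows: 0,
            partial_ns: 0,
            ab_rows: 0,
            ab_ns: 0,
            ratio: 0.0,
        }
    }

    /// Feed one batch: `rows` input rows processed in `elapsed_ns`.
    /// `num_groups` is the current hash-table size (read only in `Partial`).
    fn observe(&mut self, rows: u64, elapsed_ns: u64, num_groups: u64) {
        match self.phase {
            Phase::Partial => {
                self.partial_rows += rows;
                self.partial_ns += elapsed_ns;
                if self.partial_rows >= self.probe_rows_threshold {
                    self.ratio = num_groups as f64 / self.partial_rows as f64;
                    self.phase = if self.ratio >= self.probe_ratio_threshold {
                        Phase::Active { should_skip: true } // Rule 1 short-circuit
                    } else {
                        Phase::AbSampling
                    };
                }
            }
            Phase::AbSampling => {
                // The sample window may span several input batches.
                self.ab_rows += rows;
                self.ab_ns += elapsed_ns;
                if self.ab_rows >= self.ab_sampling_rows {
                    let partial = self.partial_ns as f64 / self.partial_rows as f64;
                    let passthrough = self.ab_ns as f64 / self.ab_rows as f64;
                    let should_skip = self.ratio > passthrough / partial;
                    self.phase = Phase::Active { should_skip };
                }
            }
            Phase::Active { .. } => {} // one decision per partition
        }
    }
}
```

The sketch makes a single decision per partition, matching the behaviour described in this PR; segment-level re-probing is not modeled.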

New datafusion.execution.* config:

skip_partial_aggregation_use_cost_model (bool, default true) — turns the A/B path on. Set false to fall back to the bare ratio check.
skip_partial_aggregation_ab_sampling_rows (usize, default 10000) — size of the passthrough sample window.

New EXPLAIN ANALYZE diagnostic gauges so users (and follow-up tuning work) can see what the probe is doing per-partition:

partial_agg_probe_partial_ns_per_row
partial_agg_probe_passthrough_ns_per_row
partial_agg_probe_ratio_per_mille (ratio × 1000, integer storage)
partial_agg_probe_cost_decision_skip (1 = cost said skip, 0 = cost said keep)
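Because the ratio gauge is stored as an integer, readers of the metric need to divide by 1000 to recover the ratio. A minimal sketch of that encoding follows; the helper names are hypothetical, and rounding to nearest (rather than truncating) is an assumption, since the description only says "ratio × 1000, integer storage".

```rust
/// Encode a ratio in [0, 1] as a per-mille integer (ratio × 1000).
/// Assumption: round to nearest; the PR may truncate instead.
fn ratio_to_per_mille(ratio: f64) -> usize {
    (ratio * 1000.0).round() as usize
}

/// Decode a per-mille gauge value back into a ratio.
fn per_mille_to_ratio(per_mille: usize) -> f64 {
    per_mille as f64 / 1000.0
}
```

Under this encoding a measured ratio of 0.565 is reported as 565, and the gauge resolves ratios to three decimal places.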

Are these changes tested?

Seven SkipAggregationProbe unit tests:

skip_probe_cost_model_off_matches_legacy_ratio_check — bare ratio check unchanged when cost model is off.
skip_probe_cost_model_short_circuits_on_high_ratio — Rule 1 still wins over A/B.
skip_probe_enters_ab_sampling_when_partial_window_closes — A/B transition.
skip_probe_cost_decision_chooses_skip_when_partial_is_expensive — cost crossover (skip).
skip_probe_cost_decision_chooses_keep_when_passthrough_not_much_cheaper — cost crossover (keep).
skip_probe_ab_window_accumulates_across_batches — sampling spans multiple input batches.
skip_probe_records_diagnostic_gauges — diagnostic metrics fire as expected.

Existing 100 aggregate tests + 10 aggregate SLT files still pass; cargo clippy -p datafusion-physical-plan --all-targets -- -D warnings clean.

Are there any user-facing changes?

Two additive datafusion.execution.* config options. The cost-aware path is on by default, based on the benchmark above; opt out with SET datafusion.execution.skip_partial_aggregation_use_cost_model = false.

Followups

Segment-level re-probing (was attempted in this PR but reverted — see commits 44f815a87, c506a81fb). The current implementation makes one A/B decision per partition. Re-probing every N rows would let a single partition switch direction as the data distribution shifts. Implementation hit a pre-existing GroupValues issue: emit(EmitTo::All) clears the per-column arrays but the hash→index map appears to retain stale entries, panicking on subsequent partial-agg inserts at multi_group_by/primitive.rs:156. Should be tackled as a follow-up after that reset semantic is sorted out.
The simplifying assumption final_ns ≈ partial_ns in the cost formula is reasonable but not exact. A more refined model could track Final-agg per-row cost separately. Possible follow-up if measured data (via the diagnostic gauges above) shows the assumption costs us in some workload.

The fixed `skip_partial_aggregation_probe_ratio_threshold` (default 0.8) catches "the partial agg barely reduces anything" cases, but it misses the band where the ratio is moderate (say 0.5-0.6) and partial aggregation is *still* net-negative because per-row cost is high (heavy variable-length keys, complex aggregates, etc.). ClickBench Q18 is the motivating example (issue apache#22405): ratio 0.565, but partial agg burns 17s of compute across 12 partitions while reducing input only ~40%; turning the threshold down enough to catch it would regress lower-cost queries.

Add a second, opt-in skip rule that augments the fixed-ratio check with the measured per-row wall time of the operator. Disabled by default, so existing behaviour is preserved.

New config (all under `datafusion.execution`):

- `skip_partial_aggregation_use_cost_model` (bool, default false): turns the cost-aware rule on.
- `skip_partial_aggregation_cost_ns_per_row` (u64, default 1000): the per-row wall-time floor above which the cost-aware rule fires.
- `skip_partial_aggregation_cost_min_ratio` (f64, default 0.3): below this ratio partial agg is kept regardless of per-row cost (it's reducing too much to be worth skipping).

How it works: `SkipAggregationProbe` already runs at probe-window boundaries and already has `baseline_metrics.elapsed_compute` ticking through every timed block. The probe now snapshots that counter at construction; once `probe_rows_threshold` is reached, it computes `ns_per_row = (elapsed_compute - snapshot) / input_rows` and, if both the per-row cost is above the floor and the ratio sits in the medium band, switches to skip mode. The existing high-ratio rule still fires first, so this is purely additive.

Five unit tests on `SkipAggregationProbe` cover the new branches: cost-model-off matches the legacy ratio check, medium-ratio + high cost skips, below-min-ratio doesn't, cheap-per-row doesn't, and the high-ratio rule is honoured even with the cost model on.

Refs: apache#22405
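For comparison with the A/B design in the PR description, the fixed-floor rule from this commit message can be sketched as below. The struct and function names are hypothetical, the field names simply mirror the config keys quoted above, and the exact comparison operators (strict `>` on the floor, `>=` on the minimum ratio) are assumptions read off the wording "above which" and "below this ratio".

```rust
/// Hypothetical container for the thresholds named in the commit message.
struct CostRuleConfig {
    use_cost_model: bool,
    probe_ratio_threshold: f64, // default 0.8
    cost_ns_per_row: u64,       // default 1000
    cost_min_ratio: f64,        // default 0.3
}

/// Decide whether to skip once the probe window has closed.
fn fixed_floor_should_skip(
    cfg: &CostRuleConfig,
    elapsed_compute_ns: u64,
    snapshot_ns: u64,
    input_rows: u64,
    num_groups: u64,
) -> bool {
    if input_rows == 0 {
        return false; // nothing measured yet
    }
    let ratio = num_groups as f64 / input_rows as f64;
    // The existing high-ratio rule fires first.
    if ratio >= cfg.probe_ratio_threshold {
        return true;
    }
    if !cfg.use_cost_model {
        return false;
    }
    // ns_per_row = (elapsed_compute - snapshot) / input_rows
    let ns_per_row = elapsed_compute_ns.saturating_sub(snapshot_ns) / input_rows;
    ns_per_row > cfg.cost_ns_per_row && ratio >= cfg.cost_min_ratio
}
```

Unlike the A/B rule, this version depends on a hand-picked per-row floor, which is the kind of constant the measured cost comparison is meant to avoid.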

zhuqi-lucas · 2026-05-26T06:04:26Z

run benchmarks

adriangbot · 2026-05-26T06:07:38Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4540857943-318-wlrsf 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/adaptive-partial-agg-cost (88f6d4c) to a87bdc9 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-26T06:07:39Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4540857943-319-hmhf7 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (88f6d4c) to a87bdc9 (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-05-26T06:07:41Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4540857943-320-ks6ks 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (88f6d4c) to a87bdc9 (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-05-26T06:24:05Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_adaptive-partial-agg-cost ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 38.56 / 39.95 ±1.11 / 41.85 ms │ 38.27 / 38.98 ±1.04 / 41.03 ms │ no change │
│ QQuery 2  │ 19.49 / 19.68 ±0.26 / 20.17 ms │ 19.33 / 19.99 ±0.54 / 20.88 ms │ no change │
│ QQuery 3  │ 32.22 / 34.40 ±1.72 / 37.20 ms │ 32.79 / 33.61 ±0.81 / 34.77 ms │ no change │
│ QQuery 4  │ 17.09 / 17.43 ±0.23 / 17.70 ms │ 17.16 / 17.27 ±0.09 / 17.38 ms │ no change │
│ QQuery 5  │ 40.94 / 41.27 ±0.22 / 41.56 ms │ 38.76 / 40.92 ±1.11 / 41.79 ms │ no change │
│ QQuery 6  │ 15.90 / 16.11 ±0.14 / 16.33 ms │ 15.88 / 16.40 ±0.41 / 17.11 ms │ no change │
│ QQuery 7  │ 46.25 / 47.86 ±0.97 / 48.88 ms │ 45.78 / 49.22 ±1.88 / 50.71 ms │ no change │
│ QQuery 8  │ 43.99 / 44.20 ±0.13 / 44.34 ms │ 44.00 / 44.18 ±0.11 / 44.31 ms │ no change │
│ QQuery 9  │ 48.74 / 49.77 ±1.14 / 51.56 ms │ 49.23 / 49.77 ±0.52 / 50.59 ms │ no change │
│ QQuery 10 │ 62.97 / 63.23 ±0.18 / 63.44 ms │ 62.66 / 63.10 ±0.53 / 64.13 ms │ no change │
│ QQuery 11 │ 13.15 / 13.26 ±0.11 / 13.47 ms │ 12.95 / 13.19 ±0.13 / 13.32 ms │ no change │
│ QQuery 12 │ 24.10 / 25.00 ±1.35 / 27.66 ms │ 24.05 / 25.27 ±1.95 / 29.15 ms │ no change │
│ QQuery 13 │ 33.60 / 35.12 ±2.21 / 39.40 ms │ 33.73 / 35.01 ±1.19 / 37.26 ms │ no change │
│ QQuery 14 │ 25.34 / 25.44 ±0.12 / 25.67 ms │ 25.21 / 25.43 ±0.11 / 25.49 ms │ no change │
│ QQuery 15 │ 31.01 / 31.28 ±0.20 / 31.63 ms │ 31.00 / 31.57 ±0.97 / 33.52 ms │ no change │
│ QQuery 16 │ 14.47 / 14.63 ±0.17 / 14.95 ms │ 14.72 / 14.86 ±0.09 / 14.96 ms │ no change │
│ QQuery 17 │ 74.23 / 75.60 ±1.15 / 77.69 ms │ 73.92 / 77.81 ±3.74 / 84.66 ms │ no change │
│ QQuery 18 │ 61.36 / 64.94 ±4.20 / 72.99 ms │ 60.99 / 62.63 ±1.40 / 64.68 ms │ no change │
│ QQuery 19 │ 33.59 / 34.80 ±1.47 / 37.66 ms │ 33.16 / 34.06 ±1.20 / 36.43 ms │ no change │
│ QQuery 20 │ 36.96 / 37.27 ±0.17 / 37.44 ms │ 37.04 / 37.20 ±0.14 / 37.36 ms │ no change │
│ QQuery 21 │ 56.48 / 57.19 ±0.38 / 57.57 ms │ 54.49 / 55.94 ±0.94 / 57.15 ms │ no change │
│ QQuery 22 │ 23.22 / 24.79 ±2.43 / 29.60 ms │ 23.26 / 24.71 ±2.01 / 28.63 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                             ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 813.22ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 811.13ms │
│ Average Time (HEAD)                           │  36.96ms │
│ Average Time (feat_adaptive-partial-agg-cost) │  36.87ms │
│ Queries Faster                                │        0 │
│ Queries Slower                                │        0 │
│ Queries with No Change                        │       22 │
│ Queries with Failure                          │        0 │
└───────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	5.6 GiB
Avg memory	5.1 GiB
CPU user	29.3s
CPU sys	2.3s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	5.6 GiB
Avg memory	5.1 GiB
CPU user	29.4s
CPU sys	2.1s
Peak spill	0 B

adriangbot · 2026-05-26T06:24:58Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           5.69 / 6.21 ±0.85 / 7.90 ms │           5.57 / 6.05 ±0.88 / 7.81 ms │     no change │
│ QQuery 2  │        80.83 / 81.20 ±0.26 / 81.63 ms │        79.78 / 80.00 ±0.15 / 80.20 ms │     no change │
│ QQuery 3  │        28.69 / 29.07 ±0.28 / 29.45 ms │        28.49 / 28.68 ±0.27 / 29.22 ms │     no change │
│ QQuery 4  │    500.59 / 516.71 ±11.53 / 531.32 ms │     494.30 / 497.16 ±2.51 / 500.82 ms │     no change │
│ QQuery 5  │        50.79 / 51.39 ±0.44 / 52.03 ms │        50.15 / 50.75 ±0.38 / 51.33 ms │     no change │
│ QQuery 6  │        35.04 / 36.05 ±0.75 / 37.02 ms │        34.55 / 34.98 ±0.29 / 35.38 ms │     no change │
│ QQuery 7  │     107.85 / 109.47 ±2.22 / 113.82 ms │     106.61 / 108.60 ±2.14 / 112.72 ms │     no change │
│ QQuery 8  │        36.27 / 36.79 ±0.30 / 37.05 ms │        35.69 / 36.49 ±0.66 / 37.31 ms │     no change │
│ QQuery 9  │        53.42 / 54.41 ±1.19 / 56.62 ms │        51.69 / 53.99 ±2.77 / 59.20 ms │     no change │
│ QQuery 10 │        80.36 / 82.58 ±3.00 / 88.53 ms │        80.57 / 81.02 ±0.44 / 81.81 ms │     no change │
│ QQuery 11 │     310.13 / 314.92 ±4.68 / 322.90 ms │     307.98 / 312.56 ±4.72 / 320.99 ms │     no change │
│ QQuery 12 │        28.12 / 29.33 ±0.92 / 30.86 ms │        28.20 / 28.54 ±0.25 / 28.83 ms │     no change │
│ QQuery 13 │     125.22 / 126.30 ±1.45 / 129.07 ms │     124.74 / 128.12 ±3.97 / 135.91 ms │     no change │
│ QQuery 14 │     501.82 / 503.02 ±1.25 / 505.41 ms │     501.24 / 505.68 ±5.26 / 515.52 ms │     no change │
│ QQuery 15 │        60.44 / 62.26 ±2.12 / 66.09 ms │        59.84 / 61.25 ±1.43 / 63.59 ms │     no change │
│ QQuery 16 │           6.57 / 6.73 ±0.19 / 7.09 ms │           6.63 / 6.80 ±0.18 / 7.14 ms │     no change │
│ QQuery 17 │        80.24 / 81.18 ±0.64 / 82.06 ms │        80.14 / 81.14 ±0.94 / 82.91 ms │     no change │
│ QQuery 18 │     151.25 / 151.82 ±0.33 / 152.13 ms │     151.20 / 152.60 ±1.52 / 155.19 ms │     no change │
│ QQuery 19 │        40.88 / 41.12 ±0.34 / 41.78 ms │        40.45 / 40.80 ±0.29 / 41.29 ms │     no change │
│ QQuery 20 │        34.84 / 35.44 ±0.35 / 35.94 ms │        35.19 / 35.54 ±0.30 / 36.05 ms │     no change │
│ QQuery 21 │        16.66 / 16.87 ±0.20 / 17.19 ms │        16.65 / 16.83 ±0.14 / 17.08 ms │     no change │
│ QQuery 22 │        61.83 / 62.45 ±0.48 / 62.99 ms │        61.87 / 62.21 ±0.30 / 62.67 ms │     no change │
│ QQuery 23 │     476.85 / 483.02 ±5.46 / 492.18 ms │    468.45 / 483.79 ±11.59 / 499.59 ms │     no change │
│ QQuery 24 │     233.29 / 239.28 ±6.10 / 250.85 ms │     230.38 / 237.13 ±8.71 / 254.14 ms │     no change │
│ QQuery 25 │     113.86 / 117.88 ±2.65 / 121.07 ms │     112.53 / 115.81 ±3.53 / 121.57 ms │     no change │
│ QQuery 26 │        69.37 / 70.21 ±0.47 / 70.67 ms │        69.84 / 70.71 ±0.68 / 71.84 ms │     no change │
│ QQuery 27 │           6.43 / 6.53 ±0.11 / 6.72 ms │           6.55 / 6.92 ±0.51 / 7.92 ms │  1.06x slower │
│ QQuery 28 │        59.87 / 62.29 ±2.36 / 66.83 ms │        60.45 / 61.09 ±0.66 / 62.10 ms │     no change │
│ QQuery 29 │      99.04 / 101.42 ±1.84 / 103.60 ms │      98.79 / 102.37 ±5.30 / 112.86 ms │     no change │
│ QQuery 30 │        29.97 / 30.43 ±0.41 / 31.10 ms │        30.42 / 30.66 ±0.17 / 30.95 ms │     no change │
│ QQuery 31 │     110.20 / 114.78 ±4.57 / 122.92 ms │     110.46 / 113.36 ±4.43 / 122.07 ms │     no change │
│ QQuery 32 │        19.78 / 20.33 ±0.33 / 20.79 ms │        19.90 / 20.30 ±0.40 / 20.98 ms │     no change │
│ QQuery 33 │        37.80 / 38.19 ±0.26 / 38.52 ms │        38.04 / 38.63 ±0.44 / 39.13 ms │     no change │
│ QQuery 34 │           9.33 / 9.57 ±0.19 / 9.76 ms │          9.48 / 9.91 ±0.27 / 10.31 ms │     no change │
│ QQuery 35 │        79.90 / 80.37 ±0.38 / 80.89 ms │        80.43 / 81.43 ±0.64 / 82.16 ms │     no change │
│ QQuery 36 │          5.87 / 6.98 ±2.07 / 11.12 ms │           5.89 / 5.96 ±0.10 / 6.15 ms │ +1.17x faster │
│ QQuery 37 │           6.73 / 6.94 ±0.16 / 7.12 ms │           6.65 / 6.78 ±0.10 / 6.92 ms │     no change │
│ QQuery 38 │        68.52 / 70.33 ±1.89 / 73.35 ms │        68.20 / 69.72 ±1.75 / 73.14 ms │     no change │
│ QQuery 39 │        97.38 / 98.07 ±0.76 / 99.48 ms │        97.17 / 97.63 ±0.44 / 98.40 ms │     no change │
│ QQuery 40 │        22.46 / 23.73 ±2.12 / 27.96 ms │        22.20 / 22.60 ±0.30 / 22.89 ms │     no change │
│ QQuery 41 │        11.07 / 11.38 ±0.36 / 12.09 ms │        10.99 / 11.17 ±0.25 / 11.67 ms │     no change │
│ QQuery 42 │        23.99 / 24.26 ±0.20 / 24.50 ms │        23.72 / 24.51 ±0.86 / 26.16 ms │     no change │
│ QQuery 43 │           4.76 / 4.84 ±0.10 / 5.03 ms │           4.77 / 4.85 ±0.07 / 4.98 ms │     no change │
│ QQuery 44 │        10.43 / 10.56 ±0.12 / 10.72 ms │        10.43 / 10.71 ±0.25 / 11.13 ms │     no change │
│ QQuery 45 │        39.94 / 40.36 ±0.49 / 41.17 ms │        39.61 / 39.94 ±0.33 / 40.51 ms │     no change │
│ QQuery 46 │        12.77 / 13.15 ±0.21 / 13.34 ms │        12.52 / 13.36 ±0.42 / 13.61 ms │     no change │
│ QQuery 47 │     226.74 / 233.13 ±8.89 / 250.49 ms │     227.31 / 230.95 ±3.46 / 236.55 ms │     no change │
│ QQuery 48 │     102.05 / 106.28 ±4.17 / 111.49 ms │     101.92 / 103.36 ±2.02 / 107.35 ms │     no change │
│ QQuery 49 │        78.64 / 80.89 ±1.87 / 83.37 ms │        78.37 / 80.00 ±1.38 / 82.44 ms │     no change │
│ QQuery 50 │        58.89 / 59.76 ±0.56 / 60.58 ms │        59.32 / 59.72 ±0.32 / 60.17 ms │     no change │
│ QQuery 51 │        92.04 / 95.48 ±2.48 / 99.06 ms │       91.91 / 96.36 ±5.65 / 107.11 ms │     no change │
│ QQuery 52 │        23.81 / 24.20 ±0.33 / 24.60 ms │        23.61 / 24.09 ±0.39 / 24.61 ms │     no change │
│ QQuery 53 │        29.13 / 29.31 ±0.15 / 29.51 ms │        29.01 / 29.51 ±0.29 / 29.89 ms │     no change │
│ QQuery 54 │        53.40 / 53.77 ±0.21 / 54.05 ms │        53.66 / 53.94 ±0.17 / 54.13 ms │     no change │
│ QQuery 55 │        23.30 / 25.62 ±4.08 / 33.77 ms │        23.22 / 23.39 ±0.19 / 23.75 ms │ +1.10x faster │
│ QQuery 56 │        38.49 / 39.01 ±0.32 / 39.37 ms │        38.07 / 40.09 ±3.32 / 46.69 ms │     no change │
│ QQuery 57 │     174.13 / 177.35 ±3.45 / 184.01 ms │     175.44 / 177.54 ±1.52 / 179.63 ms │     no change │
│ QQuery 58 │     115.57 / 118.43 ±2.56 / 123.24 ms │     116.05 / 117.99 ±1.88 / 121.55 ms │     no change │
│ QQuery 59 │     117.34 / 119.86 ±2.19 / 123.46 ms │     116.99 / 118.34 ±1.58 / 120.71 ms │     no change │
│ QQuery 60 │        38.82 / 39.93 ±0.91 / 41.09 ms │        38.69 / 39.77 ±0.90 / 41.30 ms │     no change │
│ QQuery 61 │        12.47 / 12.73 ±0.24 / 13.18 ms │        12.52 / 12.67 ±0.23 / 13.12 ms │     no change │
│ QQuery 62 │        45.45 / 45.82 ±0.38 / 46.36 ms │        45.77 / 46.04 ±0.31 / 46.62 ms │     no change │
│ QQuery 63 │        29.03 / 29.19 ±0.14 / 29.38 ms │        30.00 / 30.57 ±0.64 / 31.75 ms │     no change │
│ QQuery 64 │     461.26 / 468.96 ±7.18 / 480.17 ms │     460.87 / 469.03 ±8.22 / 481.33 ms │     no change │
│ QQuery 65 │     148.14 / 151.92 ±2.80 / 156.85 ms │     146.66 / 149.70 ±2.17 / 153.07 ms │     no change │
│ QQuery 66 │        78.72 / 81.80 ±3.93 / 89.56 ms │        78.22 / 81.63 ±4.04 / 88.21 ms │     no change │
│ QQuery 67 │     244.59 / 250.31 ±4.30 / 255.38 ms │     242.26 / 250.24 ±4.60 / 256.67 ms │     no change │
│ QQuery 68 │        12.86 / 13.06 ±0.28 / 13.62 ms │        12.93 / 13.08 ±0.16 / 13.39 ms │     no change │
│ QQuery 69 │        76.48 / 78.95 ±4.31 / 87.55 ms │        76.63 / 76.93 ±0.26 / 77.31 ms │     no change │
│ QQuery 70 │     107.69 / 112.59 ±7.99 / 128.43 ms │     103.84 / 111.14 ±9.06 / 128.66 ms │     no change │
│ QQuery 71 │        35.18 / 35.49 ±0.27 / 35.96 ms │        35.60 / 35.89 ±0.30 / 36.31 ms │     no change │
│ QQuery 72 │ 2084.93 / 2122.22 ±31.46 / 2166.29 ms │ 2099.10 / 2158.79 ±62.51 / 2267.70 ms │     no change │
│ QQuery 73 │           9.06 / 9.31 ±0.27 / 9.73 ms │           8.86 / 9.20 ±0.31 / 9.71 ms │     no change │
│ QQuery 74 │     176.37 / 178.86 ±3.62 / 185.74 ms │     177.09 / 183.19 ±5.07 / 191.91 ms │     no change │
│ QQuery 75 │     145.59 / 150.84 ±8.40 / 167.59 ms │     145.68 / 147.81 ±1.53 / 150.16 ms │     no change │
│ QQuery 76 │        34.94 / 35.84 ±0.73 / 36.89 ms │        35.60 / 36.31 ±0.57 / 37.35 ms │     no change │
│ QQuery 77 │        59.94 / 60.85 ±0.71 / 61.88 ms │        59.97 / 60.32 ±0.30 / 60.68 ms │     no change │
│ QQuery 78 │     188.52 / 192.03 ±2.07 / 194.08 ms │     189.65 / 192.23 ±2.24 / 194.87 ms │     no change │
│ QQuery 79 │        66.57 / 66.98 ±0.38 / 67.62 ms │        66.70 / 67.16 ±0.27 / 67.47 ms │     no change │
│ QQuery 80 │     100.15 / 105.26 ±4.47 / 112.23 ms │      99.03 / 102.37 ±3.96 / 110.06 ms │     no change │
│ QQuery 81 │        23.85 / 24.21 ±0.44 / 25.04 ms │        23.98 / 24.26 ±0.17 / 24.51 ms │     no change │
│ QQuery 82 │        16.18 / 16.65 ±0.59 / 17.68 ms │        16.00 / 16.17 ±0.15 / 16.41 ms │     no change │
│ QQuery 83 │        35.63 / 36.22 ±0.31 / 36.49 ms │        36.26 / 36.52 ±0.17 / 36.70 ms │     no change │
│ QQuery 84 │        42.65 / 44.71 ±3.11 / 50.86 ms │        42.69 / 44.74 ±3.38 / 51.48 ms │     no change │
│ QQuery 85 │     135.95 / 138.40 ±3.25 / 144.82 ms │     135.61 / 138.68 ±4.15 / 146.73 ms │     no change │
│ QQuery 86 │        24.42 / 24.79 ±0.40 / 25.39 ms │        24.74 / 25.06 ±0.19 / 25.30 ms │     no change │
│ QQuery 87 │        69.63 / 71.96 ±2.23 / 74.70 ms │        69.15 / 70.57 ±1.36 / 73.02 ms │     no change │
│ QQuery 88 │        60.78 / 61.37 ±0.42 / 62.02 ms │        60.70 / 61.54 ±0.64 / 62.63 ms │     no change │
│ QQuery 89 │        35.53 / 35.70 ±0.19 / 35.95 ms │        35.20 / 35.66 ±0.36 / 36.22 ms │     no change │
│ QQuery 90 │        16.66 / 16.91 ±0.24 / 17.24 ms │        16.77 / 16.90 ±0.13 / 17.14 ms │     no change │
│ QQuery 91 │        51.28 / 52.54 ±2.18 / 56.89 ms │        51.03 / 52.40 ±1.74 / 55.83 ms │     no change │
│ QQuery 92 │        28.90 / 31.42 ±2.28 / 35.53 ms │        29.22 / 31.20 ±1.95 / 34.94 ms │     no change │
│ QQuery 93 │        49.98 / 51.85 ±1.00 / 53.00 ms │        49.70 / 51.34 ±1.77 / 53.84 ms │     no change │
│ QQuery 94 │        37.04 / 37.56 ±0.51 / 38.42 ms │        37.76 / 38.59 ±0.84 / 40.04 ms │     no change │
│ QQuery 95 │        85.56 / 86.58 ±1.40 / 89.29 ms │        83.75 / 86.84 ±3.71 / 94.11 ms │     no change │
│ QQuery 96 │        23.69 / 24.66 ±1.36 / 27.31 ms │        23.95 / 24.31 ±0.38 / 25.01 ms │     no change │
│ QQuery 97 │        46.00 / 46.87 ±0.63 / 47.74 ms │        46.16 / 46.63 ±0.50 / 47.56 ms │     no change │
│ QQuery 98 │        42.14 / 42.77 ±0.39 / 43.13 ms │        42.27 / 43.00 ±0.60 / 44.06 ms │     no change │
│ QQuery 99 │        69.28 / 70.49 ±1.76 / 73.96 ms │        69.51 / 71.14 ±1.98 / 75.02 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 10441.92ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 10434.02ms │
│ Average Time (HEAD)                           │   105.47ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   105.39ms │
│ Queries Faster                                │          2 │
│ Queries Slower                                │          1 │
│ Queries with No Change                        │         96 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	55.0s
Peak memory	7.0 GiB
Avg memory	6.2 GiB
CPU user	233.0s
CPU sys	6.7s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	55.0s
Peak memory	7.0 GiB
Avg memory	6.2 GiB
CPU user	231.2s
CPU sys	6.7s
Peak spill	0 B

File an issue against this benchmark runner

adriangbot · 2026-05-26T06:28:41Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.80 ±7.08 / 18.95 ms │          1.16 / 4.80 ±7.15 / 19.10 ms │     no change │
│ QQuery 1  │        12.56 / 13.03 ±0.25 / 13.34 ms │        13.00 / 13.18 ±0.18 / 13.45 ms │     no change │
│ QQuery 2  │        35.73 / 36.00 ±0.26 / 36.31 ms │        35.67 / 35.99 ±0.34 / 36.66 ms │     no change │
│ QQuery 3  │        31.29 / 31.88 ±0.58 / 32.63 ms │        30.53 / 31.05 ±0.61 / 32.17 ms │     no change │
│ QQuery 4  │     230.72 / 233.14 ±3.95 / 240.96 ms │     227.56 / 229.84 ±2.71 / 234.91 ms │     no change │
│ QQuery 5  │     274.49 / 277.53 ±3.17 / 283.12 ms │     270.34 / 275.39 ±4.50 / 281.58 ms │     no change │
│ QQuery 6  │           1.19 / 1.34 ±0.21 / 1.75 ms │           1.20 / 1.35 ±0.23 / 1.80 ms │     no change │
│ QQuery 7  │        13.87 / 13.99 ±0.09 / 14.13 ms │        14.20 / 14.35 ±0.08 / 14.43 ms │     no change │
│ QQuery 8  │     320.92 / 332.10 ±7.63 / 343.31 ms │     327.08 / 331.10 ±2.65 / 334.46 ms │     no change │
│ QQuery 9  │     463.36 / 471.01 ±5.64 / 478.48 ms │     458.43 / 469.85 ±9.48 / 481.75 ms │     no change │
│ QQuery 10 │        70.52 / 71.91 ±0.96 / 73.39 ms │        70.25 / 70.60 ±0.39 / 71.30 ms │     no change │
│ QQuery 11 │        82.74 / 84.22 ±1.05 / 85.93 ms │       81.89 / 87.09 ±8.18 / 103.36 ms │     no change │
│ QQuery 12 │     271.63 / 280.45 ±7.02 / 290.99 ms │     275.32 / 281.85 ±8.95 / 298.86 ms │     no change │
│ QQuery 13 │     368.93 / 373.66 ±4.15 / 380.24 ms │    365.86 / 377.16 ±12.70 / 400.92 ms │     no change │
│ QQuery 14 │     285.55 / 289.29 ±3.88 / 294.91 ms │     286.46 / 295.28 ±9.43 / 309.74 ms │     no change │
│ QQuery 15 │     270.25 / 279.15 ±8.64 / 293.85 ms │     274.36 / 281.75 ±5.05 / 289.34 ms │     no change │
│ QQuery 16 │    624.97 / 635.44 ±10.36 / 654.00 ms │    621.12 / 636.06 ±12.79 / 654.49 ms │     no change │
│ QQuery 17 │     623.22 / 633.20 ±8.74 / 648.05 ms │     630.76 / 640.15 ±6.09 / 649.66 ms │     no change │
│ QQuery 18 │ 1279.79 / 1295.53 ±13.61 / 1319.20 ms │ 1276.73 / 1304.89 ±23.58 / 1344.53 ms │     no change │
│ QQuery 19 │        27.84 / 28.12 ±0.17 / 28.35 ms │        27.75 / 29.55 ±2.93 / 35.40 ms │  1.05x slower │
│ QQuery 20 │     519.78 / 530.47 ±8.35 / 540.19 ms │    521.10 / 539.50 ±22.27 / 582.40 ms │     no change │
│ QQuery 21 │     597.22 / 606.87 ±8.82 / 623.45 ms │     591.32 / 596.23 ±3.36 / 600.06 ms │     no change │
│ QQuery 22 │ 1069.17 / 1085.88 ±13.18 / 1107.23 ms │  1057.23 / 1070.97 ±8.94 / 1081.82 ms │     no change │
│ QQuery 23 │ 3208.18 / 3242.68 ±34.91 / 3306.50 ms │ 3223.54 / 3255.70 ±32.83 / 3308.97 ms │     no change │
│ QQuery 24 │        41.54 / 44.74 ±3.94 / 52.37 ms │        42.27 / 42.83 ±0.58 / 43.60 ms │     no change │
│ QQuery 25 │     111.91 / 114.57 ±3.43 / 121.07 ms │     112.35 / 120.26 ±9.19 / 136.45 ms │     no change │
│ QQuery 26 │        41.78 / 44.03 ±2.47 / 47.12 ms │        42.18 / 43.54 ±1.87 / 47.13 ms │     no change │
│ QQuery 27 │     665.69 / 677.49 ±6.71 / 685.92 ms │     676.64 / 680.14 ±1.79 / 681.51 ms │     no change │
│ QQuery 28 │ 3043.86 / 3055.40 ±12.53 / 3075.89 ms │ 3037.03 / 3097.03 ±31.69 / 3123.80 ms │     no change │
│ QQuery 29 │        40.18 / 43.57 ±6.42 / 56.40 ms │        40.62 / 51.11 ±7.24 / 61.69 ms │  1.17x slower │
│ QQuery 30 │     305.42 / 313.51 ±8.10 / 328.13 ms │    304.81 / 314.78 ±10.29 / 333.04 ms │     no change │
│ QQuery 31 │     283.03 / 296.49 ±8.33 / 303.47 ms │     295.36 / 303.01 ±9.26 / 320.79 ms │     no change │
│ QQuery 32 │    947.47 / 976.58 ±17.85 / 994.29 ms │  993.88 / 1012.02 ±20.18 / 1050.88 ms │     no change │
│ QQuery 33 │ 1444.17 / 1489.81 ±25.95 / 1516.39 ms │ 1476.61 / 1527.79 ±73.16 / 1672.16 ms │     no change │
│ QQuery 34 │ 1465.36 / 1524.20 ±37.69 / 1576.52 ms │ 1472.71 / 1515.88 ±37.73 / 1577.76 ms │     no change │
│ QQuery 35 │    286.76 / 326.19 ±50.34 / 411.58 ms │   275.63 / 342.34 ±119.71 / 581.36 ms │     no change │
│ QQuery 36 │        66.67 / 67.93 ±0.74 / 68.98 ms │      66.66 / 78.10 ±17.09 / 112.06 ms │  1.15x slower │
│ QQuery 37 │        35.86 / 38.56 ±3.06 / 43.92 ms │        35.41 / 38.65 ±5.04 / 48.68 ms │     no change │
│ QQuery 38 │        43.76 / 49.33 ±5.17 / 55.49 ms │        40.76 / 44.04 ±3.23 / 50.07 ms │ +1.12x faster │
│ QQuery 39 │     143.61 / 151.16 ±6.41 / 161.30 ms │    131.50 / 151.92 ±17.75 / 184.89 ms │     no change │
│ QQuery 40 │        13.89 / 16.16 ±3.80 / 23.69 ms │        13.94 / 16.60 ±3.45 / 23.31 ms │     no change │
│ QQuery 41 │        13.84 / 17.03 ±3.84 / 23.57 ms │        13.48 / 13.58 ±0.10 / 13.78 ms │ +1.25x faster │
│ QQuery 42 │        12.92 / 13.36 ±0.25 / 13.69 ms │        13.13 / 14.39 ±2.31 / 19.01 ms │  1.08x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20111.80ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 20281.69ms │
│ Average Time (HEAD)                           │   467.72ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   471.67ms │
│ Queries Faster                                │          2 │
│ Queries Slower                                │          4 │
│ Queries with No Change                        │         37 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	30.3 GiB
Avg memory	23.0 GiB
CPU user	1034.6s
CPU sys	75.1s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	105.0s
Peak memory	29.6 GiB
Avg memory	22.8 GiB
CPU user	1035.2s
CPU sys	77.3s
Peak spill	0 B

Temporarily set `skip_partial_aggregation_use_cost_model` default = true so the benchmark bot actually exercises the new code path. **Revert this commit before merge**: the final default should remain false (opt-in) until ClickBench-wide validation tunes the constants.

Regenerated:
- docs/source/user-guide/configs.md
- datafusion/sqllogictest/test_files/information_schema.slt (SHOW ALL added the 3 new config rows; CI was failing on the stale expectation).

zhuqi-lucas · 2026-05-26T06:42:52Z

run benchmark clickbench_partitioned

adriangbot · 2026-05-26T06:45:29Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541260123-323-xxlv8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (9940d8a) to a87bdc9 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-26T07:05:52Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.66 ±6.87 / 18.39 ms │          1.17 / 4.71 ±6.98 / 18.67 ms │     no change │
│ QQuery 1  │        12.11 / 12.66 ±0.40 / 13.34 ms │        12.37 / 12.59 ±0.27 / 13.10 ms │     no change │
│ QQuery 2  │        35.62 / 36.24 ±0.65 / 37.45 ms │        35.49 / 35.78 ±0.16 / 35.98 ms │     no change │
│ QQuery 3  │        30.64 / 31.62 ±1.14 / 33.84 ms │        30.32 / 30.77 ±0.29 / 31.21 ms │     no change │
│ QQuery 4  │     227.20 / 230.38 ±2.41 / 234.37 ms │     222.94 / 226.83 ±3.57 / 233.43 ms │     no change │
│ QQuery 5  │     271.83 / 275.47 ±2.94 / 280.03 ms │     269.37 / 273.46 ±2.57 / 276.56 ms │     no change │
│ QQuery 6  │           1.19 / 1.34 ±0.22 / 1.77 ms │           1.19 / 1.32 ±0.22 / 1.76 ms │     no change │
│ QQuery 7  │        13.71 / 13.88 ±0.18 / 14.13 ms │        13.30 / 13.41 ±0.08 / 13.51 ms │     no change │
│ QQuery 8  │     329.76 / 331.68 ±2.28 / 335.63 ms │     320.58 / 323.23 ±1.84 / 325.82 ms │     no change │
│ QQuery 9  │     457.29 / 467.02 ±8.02 / 476.93 ms │     459.86 / 462.41 ±4.36 / 471.10 ms │     no change │
│ QQuery 10 │        68.41 / 69.25 ±0.75 / 70.20 ms │        68.88 / 73.17 ±7.10 / 87.32 ms │  1.06x slower │
│ QQuery 11 │        80.82 / 84.18 ±4.08 / 92.00 ms │        80.40 / 82.50 ±1.40 / 83.92 ms │     no change │
│ QQuery 12 │     263.92 / 269.27 ±3.83 / 274.31 ms │     266.76 / 271.92 ±5.82 / 280.78 ms │     no change │
│ QQuery 13 │    365.37 / 382.02 ±15.90 / 408.59 ms │     365.56 / 379.48 ±8.66 / 390.42 ms │     no change │
│ QQuery 14 │     283.30 / 286.17 ±2.28 / 288.71 ms │     279.47 / 285.81 ±7.17 / 299.60 ms │     no change │
│ QQuery 15 │     262.42 / 271.00 ±5.10 / 277.78 ms │     268.65 / 279.08 ±7.92 / 290.13 ms │     no change │
│ QQuery 16 │     615.17 / 625.07 ±9.48 / 641.80 ms │    611.76 / 631.22 ±16.98 / 661.78 ms │     no change │
│ QQuery 17 │     619.71 / 625.56 ±8.74 / 642.97 ms │     625.45 / 631.47 ±4.29 / 636.68 ms │     no change │
│ QQuery 18 │ 1254.71 / 1271.82 ±14.17 / 1289.43 ms │ 1260.04 / 1278.14 ±13.83 / 1299.57 ms │     no change │
│ QQuery 19 │       27.17 / 34.12 ±13.44 / 61.00 ms │       27.18 / 34.39 ±11.31 / 56.68 ms │     no change │
│ QQuery 20 │     515.73 / 523.76 ±7.69 / 537.00 ms │    517.16 / 534.85 ±20.78 / 573.46 ms │     no change │
│ QQuery 21 │     593.22 / 597.98 ±6.11 / 609.80 ms │     590.21 / 599.83 ±8.18 / 612.11 ms │     no change │
│ QQuery 22 │ 1049.47 / 1074.97 ±18.58 / 1105.08 ms │ 1060.70 / 1073.81 ±10.96 / 1093.47 ms │     no change │
│ QQuery 23 │ 3185.14 / 3212.31 ±29.43 / 3266.66 ms │ 3159.45 / 3220.88 ±34.99 / 3259.77 ms │     no change │
│ QQuery 24 │        41.50 / 47.78 ±7.57 / 58.18 ms │        41.66 / 44.28 ±3.70 / 51.55 ms │ +1.08x faster │
│ QQuery 25 │     111.04 / 116.84 ±5.41 / 126.36 ms │     111.56 / 113.12 ±2.04 / 117.15 ms │     no change │
│ QQuery 26 │        41.22 / 42.34 ±0.93 / 43.91 ms │        42.26 / 42.71 ±0.35 / 43.28 ms │     no change │
│ QQuery 27 │     667.50 / 675.99 ±7.09 / 685.02 ms │     662.14 / 669.14 ±6.00 / 677.25 ms │     no change │
│ QQuery 28 │ 3019.99 / 3044.77 ±20.05 / 3070.30 ms │ 2997.70 / 3043.17 ±32.76 / 3081.45 ms │     no change │
│ QQuery 29 │        40.18 / 48.94 ±7.40 / 59.79 ms │        39.80 / 41.62 ±2.80 / 47.11 ms │ +1.18x faster │
│ QQuery 30 │     295.45 / 299.74 ±3.70 / 305.38 ms │    295.92 / 310.48 ±17.60 / 345.04 ms │     no change │
│ QQuery 31 │     284.34 / 288.35 ±2.13 / 290.15 ms │     284.38 / 294.03 ±6.71 / 305.24 ms │     no change │
│ QQuery 32 │    926.32 / 944.81 ±19.90 / 983.52 ms │    943.96 / 964.63 ±15.25 / 983.83 ms │     no change │
│ QQuery 33 │  1464.77 / 1473.13 ±8.13 / 1484.35 ms │ 1411.87 / 1492.71 ±65.30 / 1604.09 ms │     no change │
│ QQuery 34 │ 1464.68 / 1502.56 ±29.24 / 1550.43 ms │ 1465.93 / 1504.05 ±25.18 / 1531.27 ms │     no change │
│ QQuery 35 │    278.82 / 298.27 ±22.23 / 331.14 ms │    277.81 / 290.38 ±14.92 / 314.50 ms │     no change │
│ QQuery 36 │        65.58 / 69.21 ±2.74 / 73.80 ms │        64.14 / 67.96 ±2.90 / 71.62 ms │     no change │
│ QQuery 37 │        35.42 / 40.44 ±5.75 / 49.47 ms │        35.35 / 35.88 ±0.50 / 36.82 ms │ +1.13x faster │
│ QQuery 38 │        43.94 / 45.82 ±2.49 / 50.37 ms │        40.51 / 47.10 ±4.69 / 52.57 ms │     no change │
│ QQuery 39 │    132.11 / 149.66 ±13.98 / 172.02 ms │    135.15 / 146.62 ±11.14 / 166.19 ms │     no change │
│ QQuery 40 │        13.47 / 15.21 ±2.28 / 19.63 ms │        13.44 / 13.91 ±0.31 / 14.41 ms │ +1.09x faster │
│ QQuery 41 │        13.16 / 14.63 ±2.46 / 19.50 ms │        13.18 / 14.60 ±2.54 / 19.67 ms │     no change │
│ QQuery 42 │        12.54 / 12.86 ±0.47 / 13.79 ms │        12.62 / 13.90 ±2.37 / 18.64 ms │  1.08x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 19863.78ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19911.35ms │
│ Average Time (HEAD)                           │   461.95ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   463.05ms │
│ Queries Faster                                │          4 │
│ Queries Slower                                │          2 │
│ Queries with No Change                        │         37 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	100.0s
Peak memory	30.5 GiB
Avg memory	23.2 GiB
CPU user	1022.1s
CPU sys	74.0s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	29.4 GiB
Avg memory	22.9 GiB
CPU user	1021.1s
CPU sys	75.3s
Peak spill	0 B

The 1000 ns/row threshold was wishful thinking. Re-deriving per-row cost on the benchmark hardware (Neoverse-V2 ARM) shows that for ClickBench Q18, partial agg costs ~100-200 ns/row, well below 1000. The M-series MBP from the issue report is similar: ~170 ns/row, reading back from the 17 s / 100 M figure.

100 ns/row is roughly the floor of a hash-table probe + insert on modern CPUs, so anything above that is in the "meaningful per-row overhead" band where partial agg can plausibly be net-negative.

The 0.3 cost_min_ratio guard keeps low-cardinality / high-reduction queries (like ClickBench Q35) safe: they sit below 0.3 and never enter this branch regardless of per-row cost.
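
The guard described in this commit message can be sketched roughly as below. This is an illustration only, not the PR's code: `SkipKnobs`, `assumed_knobs`, `should_skip_partial`, and `ns_per_row` are hypothetical names, and treating 100 ns/row as the new threshold value (and `>=` as the comparison) is an assumption read from the message rather than confirmed by the diff.

```rust
use std::time::Duration;

/// Per-row cost in nanoseconds for `rows` rows processed in `elapsed`.
fn ns_per_row(elapsed: Duration, rows: u64) -> f64 {
    elapsed.as_nanos() as f64 / rows as f64
}

/// Hypothetical container for the knobs mentioned in the thread.
struct SkipKnobs {
    /// Existing rule: skip when num_groups / input_rows reaches this ratio.
    probe_ratio_threshold: f64,
    /// Below this ratio the cost branch is never entered.
    cost_min_ratio: f64,
    /// Per-row cost above which partial agg is treated as expensive (assumed value).
    cost_ns_per_row_threshold: f64,
}

/// Values taken from the thread: 0.8 and 0.3 are stated, 100.0 is an assumption.
fn assumed_knobs() -> SkipKnobs {
    SkipKnobs {
        probe_ratio_threshold: 0.8,
        cost_min_ratio: 0.3,
        cost_ns_per_row_threshold: 100.0,
    }
}

/// Returns true when partial aggregation would be skipped under this sketch.
fn should_skip_partial(ratio: f64, partial_ns_per_row: f64, k: &SkipKnobs) -> bool {
    // Existing behaviour: almost no reduction, skip regardless of cost.
    if ratio >= k.probe_ratio_threshold {
        return true;
    }
    // High-reduction queries stay on the partial path regardless of cost.
    if ratio < k.cost_min_ratio {
        return false;
    }
    // Medium-ratio band: skip only when the measured per-row cost is high.
    partial_ns_per_row >= k.cost_ns_per_row_threshold
}
```

Under these assumed values, a Q18-like input (ratio 0.565, with `ns_per_row(Duration::from_secs(17), 100_000_000)` giving about 170 ns/row) takes the skip branch, while any query whose ratio is below 0.3 keeps partial aggregation no matter how expensive each row is.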

zhuqi-lucas · 2026-05-26T07:17:26Z

run benchmark clickbench

zhuqi-lucas · 2026-05-26T07:17:36Z

run benchmark clickbench_partitioned

adriangbot · 2026-05-26T07:19:53Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541539131-325-p7crj 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (5c10375) to bdf8a6d (merge-base) diff
BENCH_NAME=clickbench
BENCH_COMMAND=cargo bench --features=parquet --bench clickbench
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-05-26T07:19:58Z

Benchmark for this request failed.

Last 20 lines of output:

    substr
    substr_index
    substring
    sum
    to_char
    to_hex
    to_local_time
    to_time
    to_timestamp
    topk_aggregate
    topk_repartition
    translate
    trim
    trunc
    unhex
    unions_to_filter
    upper
    uuid
    window_query_sql
    with_hashes

adriangbot · 2026-05-26T07:20:04Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541540268-326-4gjvk 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (5c10375) to bdf8a6d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-26T07:41:16Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.17 / 4.64 ±6.87 / 18.37 ms │          1.14 / 4.69 ±7.01 / 18.72 ms │     no change │
│ QQuery 1  │        12.57 / 12.85 ±0.16 / 13.00 ms │        12.09 / 12.54 ±0.26 / 12.78 ms │     no change │
│ QQuery 2  │        37.03 / 37.20 ±0.18 / 37.53 ms │        35.75 / 36.03 ±0.23 / 36.41 ms │     no change │
│ QQuery 3  │        30.85 / 31.60 ±0.93 / 33.35 ms │        30.74 / 31.05 ±0.30 / 31.60 ms │     no change │
│ QQuery 4  │     227.65 / 230.17 ±2.08 / 233.95 ms │     225.91 / 227.86 ±2.17 / 231.70 ms │     no change │
│ QQuery 5  │     273.28 / 279.27 ±4.45 / 285.20 ms │     269.83 / 273.50 ±3.72 / 280.59 ms │     no change │
│ QQuery 6  │           1.22 / 1.47 ±0.26 / 1.80 ms │           1.22 / 1.37 ±0.23 / 1.83 ms │ +1.07x faster │
│ QQuery 7  │        13.93 / 14.08 ±0.17 / 14.41 ms │        13.70 / 13.89 ±0.14 / 14.05 ms │     no change │
│ QQuery 8  │     323.04 / 325.35 ±2.45 / 330.10 ms │     320.40 / 325.93 ±4.77 / 334.71 ms │     no change │
│ QQuery 9  │    450.31 / 466.46 ±10.47 / 478.59 ms │    450.46 / 461.66 ±17.07 / 495.63 ms │     no change │
│ QQuery 10 │        69.27 / 72.20 ±4.02 / 80.10 ms │        70.69 / 71.70 ±1.03 / 73.57 ms │     no change │
│ QQuery 11 │        80.56 / 81.57 ±0.67 / 82.46 ms │        80.07 / 84.42 ±5.43 / 94.91 ms │     no change │
│ QQuery 12 │     267.57 / 270.51 ±2.38 / 274.48 ms │     264.54 / 268.83 ±4.54 / 276.93 ms │     no change │
│ QQuery 13 │     371.41 / 380.29 ±9.89 / 397.58 ms │    370.16 / 398.00 ±18.35 / 426.17 ms │     no change │
│ QQuery 14 │     282.14 / 287.37 ±3.30 / 291.93 ms │    288.25 / 310.29 ±14.60 / 327.27 ms │  1.08x slower │
│ QQuery 15 │     270.70 / 278.36 ±6.38 / 288.45 ms │    266.58 / 281.82 ±12.26 / 299.07 ms │     no change │
│ QQuery 16 │     613.02 / 623.15 ±5.36 / 629.05 ms │     615.78 / 630.35 ±7.83 / 638.39 ms │     no change │
│ QQuery 17 │     646.19 / 661.53 ±9.41 / 675.22 ms │    620.25 / 631.96 ±13.46 / 657.52 ms │     no change │
│ QQuery 18 │ 1310.08 / 1349.11 ±24.28 / 1384.90 ms │ 1271.26 / 1293.91 ±17.43 / 1322.39 ms │     no change │
│ QQuery 19 │        28.52 / 32.05 ±6.24 / 44.50 ms │        27.39 / 27.82 ±0.43 / 28.60 ms │ +1.15x faster │
│ QQuery 20 │    530.38 / 540.05 ±11.43 / 554.22 ms │     519.27 / 530.21 ±8.23 / 541.81 ms │     no change │
│ QQuery 21 │     604.04 / 609.78 ±5.84 / 620.41 ms │     597.20 / 599.85 ±2.71 / 605.06 ms │     no change │
│ QQuery 22 │ 1103.33 / 1116.63 ±11.59 / 1137.61 ms │ 1091.96 / 1108.66 ±13.79 / 1127.50 ms │     no change │
│ QQuery 23 │ 3200.29 / 3271.15 ±71.17 / 3407.07 ms │ 3310.05 / 3367.10 ±36.94 / 3412.51 ms │     no change │
│ QQuery 24 │        41.05 / 41.55 ±0.63 / 42.76 ms │       41.18 / 51.62 ±14.40 / 78.70 ms │  1.24x slower │
│ QQuery 25 │     111.61 / 116.72 ±9.73 / 136.18 ms │     110.50 / 113.22 ±2.96 / 118.92 ms │     no change │
│ QQuery 26 │        41.53 / 42.56 ±0.72 / 43.54 ms │        41.65 / 41.95 ±0.46 / 42.86 ms │     no change │
│ QQuery 27 │    672.74 / 697.87 ±14.09 / 715.08 ms │    679.94 / 697.75 ±10.25 / 706.93 ms │     no change │
│ QQuery 28 │ 3023.10 / 3038.62 ±16.21 / 3062.56 ms │ 3023.12 / 3071.95 ±35.89 / 3127.00 ms │     no change │
│ QQuery 29 │        40.30 / 43.55 ±4.63 / 52.76 ms │       40.11 / 48.64 ±11.33 / 69.02 ms │  1.12x slower │
│ QQuery 30 │    304.68 / 312.92 ±10.30 / 330.27 ms │    297.11 / 324.34 ±35.45 / 393.61 ms │     no change │
│ QQuery 31 │     286.68 / 293.91 ±5.04 / 302.16 ms │     280.26 / 288.72 ±5.95 / 294.24 ms │     no change │
│ QQuery 32 │  983.32 / 1024.54 ±22.57 / 1048.63 ms │   937.33 / 969.48 ±24.40 / 1010.79 ms │ +1.06x faster │
│ QQuery 33 │ 1565.29 / 1596.64 ±27.90 / 1639.47 ms │ 1431.27 / 1476.56 ±25.80 / 1504.78 ms │ +1.08x faster │
│ QQuery 34 │ 1525.52 / 1630.70 ±62.17 / 1692.56 ms │ 1544.37 / 1603.14 ±45.23 / 1664.96 ms │     no change │
│ QQuery 35 │    278.85 / 322.55 ±31.11 / 368.30 ms │    308.06 / 339.70 ±40.72 / 417.13 ms │  1.05x slower │
│ QQuery 36 │        65.33 / 73.74 ±9.53 / 91.89 ms │      76.87 / 87.31 ±14.52 / 115.14 ms │  1.18x slower │
│ QQuery 37 │        35.73 / 39.83 ±3.83 / 46.46 ms │        36.80 / 42.37 ±4.10 / 47.24 ms │  1.06x slower │
│ QQuery 38 │        39.54 / 44.81 ±5.26 / 54.67 ms │        41.68 / 43.22 ±1.33 / 44.89 ms │     no change │
│ QQuery 39 │     154.67 / 164.56 ±7.95 / 173.78 ms │     126.41 / 129.86 ±4.88 / 139.52 ms │ +1.27x faster │
│ QQuery 40 │        15.05 / 18.34 ±4.65 / 27.47 ms │        14.72 / 17.67 ±3.21 / 22.81 ms │     no change │
│ QQuery 41 │        14.85 / 14.96 ±0.13 / 15.20 ms │        13.90 / 14.54 ±0.49 / 15.28 ms │     no change │
│ QQuery 42 │        13.99 / 14.15 ±0.14 / 14.40 ms │        13.58 / 13.77 ±0.13 / 13.98 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20509.34ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 20369.26ms │
│ Average Time (HEAD)                           │   476.96ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   473.70ms │
│ Queries Faster                                │          5 │
│ Queries Slower                                │          6 │
│ Queries with No Change                        │         32 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	30.8 GiB
Avg memory	23.0 GiB
CPU user	1049.2s
CPU sys	81.0s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	105.0s
Peak memory	30.0 GiB
Avg memory	23.1 GiB
CPU user	1040.6s
CPU sys	81.4s
Peak spill	0 B

Round 1 (cost_ns_per_row > 1000) didn't fire on Q18 because partial agg per-row cost at probe close (first 100k rows, hash table still small) is ~100-200 ns on the ARM bot, not 1000+. Lowering to 100 didn't help either: the measured cost at probe time underestimates the eventual asymptotic cost, so a single-shot probe-time threshold is fundamentally fragile.

Pivot:

- Drop `skip_partial_aggregation_cost_ns_per_row` entirely from the decision. Rule 2 is now a pure ratio check: skip when `ratio >= cost_min_ratio` (default 0.5).
- This matches the empirical finding in the issue body: ratio_threshold = 0.6 makes Q18 1.73× faster on M-series. 0.5 is conservative around that; the 0.3 cost_min_ratio guard from before is gone.
- Add two diagnostic gauges (always recorded, regardless of which rule fires):
  * `partial_agg_probe_ns_per_row`: measured per-row wall time
  * `partial_agg_probe_ratio_per_mille`: ratio × 1000

  EXPLAIN ANALYZE shows these so we can revisit a real cost-aware rule later with actual numbers instead of guessing thresholds.

Why keep `use_cost_model` as the flag name even though it isn't cost-aware anymore: the gauges (the basis for a future cost-aware rule) ride alongside, and we want a single opt-in surface that graduates from "lower ratio threshold" to "cost-aware" without churning configs.

Unit tests rewritten to match: 5 tests covering off/on, fires/doesn't fire, fixed-rule precedence, and gauge recording.
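For readers following along, the interim rule in this commit message can be sketched roughly as below. This is an illustration only, not the PR's code: the function names `ratio_per_mille` and `should_skip_partial` and their parameter lists are hypothetical, and the only values taken from the thread are the 0.8 and 0.5 thresholds and Q18's measured ratio of 0.565.

```rust
/// Group-to-row ratio scaled by 1000, in the spirit of the
/// `partial_agg_probe_ratio_per_mille` gauge. Hypothetical helper.
pub fn ratio_per_mille(num_groups: usize, input_rows: usize) -> u64 {
    if input_rows == 0 {
        return 0; // nothing probed yet
    }
    (num_groups as u64).saturating_mul(1000) / input_rows as u64
}

/// Interim two-rule decision (hypothetical signature):
/// Rule 1 is the existing fixed threshold; Rule 2 is the lower,
/// opt-in ratio check gated by `use_cost_model`.
pub fn should_skip_partial(
    num_groups: usize,
    input_rows: usize,
    probe_ratio_threshold: f64, // 0.8 by default in the thread
    cost_min_ratio: f64,        // 0.5 by default in this iteration
    use_cost_model: bool,
) -> bool {
    if input_rows == 0 {
        return false; // no data, keep partial aggregation
    }
    let ratio = num_groups as f64 / input_rows as f64;
    // Rule 1: always active, takes precedence.
    if ratio >= probe_ratio_threshold {
        return true;
    }
    // Rule 2: only when the opt-in flag is set.
    use_cost_model && ratio >= cost_min_ratio
}
```

With Q18's ratio of 0.565, Rule 1 does not fire, so the outcome depends entirely on whether the opt-in flag is set.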

zhuqi-lucas · 2026-05-26T09:36:34Z

run benchmark clickbench_partitioned

adriangbot · 2026-05-26T09:39:50Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4542914270-327-27lk2 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (e6a98fe) to bdf8a6d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-26T10:01:01Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.16 / 4.57 ±6.78 / 18.14 ms │          1.18 / 4.83 ±7.14 / 19.10 ms │  1.06x slower │
│ QQuery 1  │        12.21 / 12.49 ±0.16 / 12.66 ms │        12.75 / 13.20 ±0.47 / 14.09 ms │  1.06x slower │
│ QQuery 2  │        35.25 / 35.59 ±0.27 / 35.97 ms │        35.83 / 36.22 ±0.36 / 36.65 ms │     no change │
│ QQuery 3  │        30.31 / 30.87 ±0.59 / 32.01 ms │        31.42 / 31.95 ±0.53 / 32.95 ms │     no change │
│ QQuery 4  │     219.39 / 225.19 ±3.24 / 228.79 ms │     240.95 / 243.12 ±1.69 / 245.59 ms │  1.08x slower │
│ QQuery 5  │     268.06 / 272.04 ±2.41 / 275.41 ms │     279.27 / 283.91 ±2.34 / 285.52 ms │     no change │
│ QQuery 6  │           1.20 / 1.35 ±0.23 / 1.80 ms │           1.24 / 1.40 ±0.23 / 1.84 ms │     no change │
│ QQuery 7  │        13.54 / 13.95 ±0.28 / 14.36 ms │        13.78 / 15.16 ±2.09 / 19.32 ms │  1.09x slower │
│ QQuery 8  │     320.27 / 323.82 ±2.18 / 327.01 ms │     327.00 / 333.26 ±7.23 / 346.93 ms │     no change │
│ QQuery 9  │    455.85 / 490.87 ±28.34 / 528.55 ms │    448.66 / 505.40 ±35.71 / 557.69 ms │     no change │
│ QQuery 10 │        74.19 / 75.97 ±1.30 / 77.27 ms │      72.78 / 80.30 ±12.72 / 105.64 ms │  1.06x slower │
│ QQuery 11 │        87.51 / 87.84 ±0.29 / 88.30 ms │       85.61 / 89.89 ±6.64 / 103.04 ms │     no change │
│ QQuery 12 │    266.72 / 276.26 ±11.91 / 299.36 ms │    238.28 / 255.37 ±14.18 / 274.95 ms │ +1.08x faster │
│ QQuery 13 │    365.79 / 378.86 ±11.86 / 394.87 ms │     357.01 / 367.08 ±9.25 / 384.62 ms │     no change │
│ QQuery 14 │     285.98 / 289.99 ±3.39 / 294.76 ms │     251.36 / 258.40 ±5.34 / 265.23 ms │ +1.12x faster │
│ QQuery 15 │     272.40 / 277.74 ±4.46 / 283.63 ms │     267.56 / 268.95 ±1.36 / 271.36 ms │     no change │
│ QQuery 16 │     609.47 / 625.38 ±9.58 / 639.53 ms │    617.01 / 635.93 ±14.36 / 659.85 ms │     no change │
│ QQuery 17 │     614.82 / 627.93 ±8.23 / 640.38 ms │     628.89 / 641.52 ±7.56 / 652.57 ms │     no change │
│ QQuery 18 │ 1230.87 / 1274.09 ±30.24 / 1307.89 ms │  977.51 / 1007.37 ±22.96 / 1034.98 ms │ +1.26x faster │
│ QQuery 19 │       27.46 / 41.97 ±17.09 / 63.90 ms │       27.24 / 36.81 ±17.95 / 72.70 ms │ +1.14x faster │
│ QQuery 20 │     521.25 / 529.50 ±7.85 / 542.79 ms │     515.49 / 520.95 ±4.52 / 529.07 ms │     no change │
│ QQuery 21 │     604.44 / 610.33 ±3.75 / 615.01 ms │     595.09 / 599.02 ±3.18 / 604.29 ms │     no change │
│ QQuery 22 │  1088.08 / 1094.56 ±8.79 / 1111.69 ms │ 1054.27 / 1081.70 ±18.59 / 1108.66 ms │     no change │
│ QQuery 23 │ 3232.06 / 3315.37 ±70.31 / 3424.33 ms │ 3214.09 / 3306.99 ±65.15 / 3393.77 ms │     no change │
│ QQuery 24 │       42.01 / 49.96 ±14.57 / 79.06 ms │        42.35 / 46.30 ±5.99 / 58.17 ms │ +1.08x faster │
│ QQuery 25 │     111.44 / 113.72 ±2.92 / 119.38 ms │     114.20 / 116.05 ±1.48 / 118.40 ms │     no change │
│ QQuery 26 │        41.17 / 41.69 ±0.43 / 42.20 ms │        42.29 / 44.31 ±2.54 / 49.20 ms │  1.06x slower │
│ QQuery 27 │     671.11 / 677.84 ±6.58 / 687.67 ms │    664.65 / 683.54 ±10.80 / 697.73 ms │     no change │
│ QQuery 28 │ 3020.49 / 3054.32 ±38.05 / 3124.71 ms │ 3040.59 / 3090.00 ±30.44 / 3131.64 ms │     no change │
│ QQuery 29 │       39.92 / 51.22 ±10.16 / 66.62 ms │        39.69 / 40.92 ±0.72 / 41.82 ms │ +1.25x faster │
│ QQuery 30 │     295.09 / 303.53 ±9.72 / 322.49 ms │     265.14 / 276.42 ±8.52 / 290.79 ms │ +1.10x faster │
│ QQuery 31 │     281.68 / 289.25 ±5.94 / 297.88 ms │    275.66 / 292.52 ±10.91 / 308.29 ms │     no change │
│ QQuery 32 │    907.69 / 923.56 ±17.36 / 956.28 ms │    911.69 / 959.23 ±31.78 / 999.13 ms │     no change │
│ QQuery 33 │ 1437.38 / 1479.83 ±37.75 / 1546.04 ms │ 1475.27 / 1525.30 ±52.51 / 1610.48 ms │     no change │
│ QQuery 34 │ 1541.17 / 1574.38 ±20.14 / 1596.35 ms │ 1436.55 / 1482.51 ±36.79 / 1526.15 ms │ +1.06x faster │
│ QQuery 35 │    310.04 / 329.26 ±21.84 / 357.70 ms │    270.82 / 313.47 ±71.76 / 456.80 ms │     no change │
│ QQuery 36 │        67.63 / 73.90 ±5.47 / 83.88 ms │        66.38 / 67.07 ±0.89 / 68.74 ms │ +1.10x faster │
│ QQuery 37 │        36.17 / 43.30 ±5.63 / 51.12 ms │        34.48 / 36.11 ±2.10 / 40.19 ms │ +1.20x faster │
│ QQuery 38 │        42.04 / 46.10 ±2.47 / 48.63 ms │        46.46 / 50.11 ±3.81 / 57.49 ms │  1.09x slower │
│ QQuery 39 │     143.84 / 158.70 ±7.92 / 167.55 ms │     103.96 / 108.60 ±3.70 / 112.37 ms │ +1.46x faster │
│ QQuery 40 │        13.61 / 13.99 ±0.29 / 14.46 ms │        13.14 / 13.59 ±0.25 / 13.83 ms │     no change │
│ QQuery 41 │        13.66 / 15.21 ±2.31 / 19.68 ms │        12.98 / 13.18 ±0.16 / 13.38 ms │ +1.15x faster │
│ QQuery 42 │        13.09 / 14.84 ±3.12 / 21.05 ms │        12.49 / 12.83 ±0.22 / 13.19 ms │ +1.16x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20171.13ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19790.79ms │
│ Average Time (HEAD)                           │   469.10ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   460.25ms │
│ Queries Faster                                │         13 │
│ Queries Slower                                │          7 │
│ Queries with No Change                        │         23 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	29.5 GiB
Avg memory	22.5 GiB
CPU user	1037.9s
CPU sys	75.3s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	31.9 GiB
Avg memory	23.2 GiB
CPU user	1014.6s
CPU sys	73.0s
Peak spill	0 B

The default-true was a temporary flip so the benchmarking bot would exercise the new code path. Revert before merge — opt-in stays the contract until the cost-aware variant lands.

Replace the fixed lower-ratio rule with measurement-driven A/B sampling. After the initial partial-probe window closes (100k rows by default), the operator routes the next 10k rows through the passthrough path to measure `passthrough_ns/row`, then compares it against the previously measured `partial_ns/row` via a closed-form cost crossover:

    cost_keep_partial = partial_ns × N + final_ns × N × ratio
    cost_skip         = passthrough_ns × N + final_ns × N

Assuming final_ns ≈ partial_ns (similar hash-table mechanics):

    skip wins ⇔ ratio > passthrough_ns / partial_ns

The crossover is set entirely by the two measured numbers: no magic constant, no hardcoded ratio. Rule 1 (ratio >= 0.8) still short-circuits before A/B, preserving the legacy cheap path.

State machine extensions:

- New `ExecutionState::AbSampling` mirrors `SkippingAggregation` (input → `transform_to_states` → output) but *keeps the partial hash table*: if A/B decides to keep partial, the stream reverts to `ReadingInput` and the hash table continues accumulating.
- `ProbePhase` enum (Partial / AbSampling / Locked) inside `SkipAggregationProbe` drives the transitions.

Diagnostic gauges exposed via EXPLAIN ANALYZE:

- `partial_agg_probe_partial_ns_per_row`: measured at probe close
- `partial_agg_probe_passthrough_ns_per_row`: measured at A/B close
- `partial_agg_probe_ratio_per_mille`: ratio × 1000
- `partial_agg_probe_cost_decision_skip`: 1 if cost said skip, 0 if keep

Config:

- `skip_partial_aggregation_use_cost_model` (bool, default false): opt-in switch. With it off, behaviour is exactly the legacy bare ratio check.
- `skip_partial_aggregation_ab_sampling_rows` (usize, default 10_000): size of the A/B sampling window.
- Drops `skip_partial_aggregation_cost_min_ratio` and `skip_partial_aggregation_cost_ns_per_row` from the previous iteration of this PR; they were magic-constant gates that the cost-aware formula obsoletes.

7 `SkipAggregationProbe` unit tests cover:

- cost-model-off matches legacy ratio check
- cost-model-on short-circuits on Rule 1 (no A/B needed)
- A/B sampling entry transition
- cost decision chooses skip when partial expensive
- cost decision chooses keep when passthrough not much cheaper
- A/B window accumulates across multiple batches
- diagnostic gauges record at every transition

Existing 100 aggregate tests + 10 aggregate SLT files still pass.
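The phase transitions and the crossover test described in this commit message might look roughly like the sketch below. It is a simplified stand-in, not the PR's `SkipAggregationProbe`: the `ProbeSketch` struct, its fields, and the `record_partial` / `record_passthrough` methods are hypothetical names, timing is passed in by the caller rather than measured, and the comparison uses the multiplied form `ratio × partial > passthrough` to avoid dividing by a zero cost. Only the `ProbePhase` variant names, the defaults (100k, 0.8, 10k), and the crossover formula come from the thread.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbePhase { Partial, AbSampling, Locked }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision { KeepPartial, SkipPartial }

/// Hypothetical, simplified probe: accumulates rows and nanoseconds per phase.
pub struct ProbeSketch {
    pub phase: ProbePhase,
    pub decision: Option<Decision>,
    probe_rows_threshold: usize,
    probe_ratio_threshold: f64,
    ab_sampling_rows: usize,
    input_rows: usize,
    num_groups: usize,
    partial_ns: u64,
    ab_rows: usize,
    ab_ns: u64,
}

impl ProbeSketch {
    pub fn new(probe_rows_threshold: usize, probe_ratio_threshold: f64, ab_sampling_rows: usize) -> Self {
        Self {
            phase: ProbePhase::Partial,
            decision: None,
            probe_rows_threshold,
            probe_ratio_threshold,
            ab_sampling_rows,
            input_rows: 0,
            num_groups: 0,
            partial_ns: 0,
            ab_rows: 0,
            ab_ns: 0,
        }
    }

    fn ratio(&self) -> f64 {
        if self.input_rows == 0 { 0.0 } else { self.num_groups as f64 / self.input_rows as f64 }
    }

    fn lock(&mut self, decision: Decision) {
        self.phase = ProbePhase::Locked;
        self.decision = Some(decision);
    }

    /// One batch went through the partial-aggregation path.
    pub fn record_partial(&mut self, rows: usize, total_groups: usize, elapsed_ns: u64) {
        if self.phase != ProbePhase::Partial { return; }
        self.input_rows += rows;
        self.num_groups = total_groups;
        self.partial_ns += elapsed_ns;
        if self.input_rows < self.probe_rows_threshold { return; }
        if self.ratio() >= self.probe_ratio_threshold {
            self.lock(Decision::SkipPartial); // Rule 1: no A/B window needed
        } else {
            self.phase = ProbePhase::AbSampling;
        }
    }

    /// One batch went through the passthrough path during the A/B window.
    pub fn record_passthrough(&mut self, rows: usize, elapsed_ns: u64) {
        if self.phase != ProbePhase::AbSampling { return; }
        self.ab_rows += rows;
        self.ab_ns += elapsed_ns;
        if self.ab_rows < self.ab_sampling_rows { return; } // window still open
        let partial = self.partial_ns as f64 / self.input_rows as f64;
        let passthrough = self.ab_ns as f64 / self.ab_rows as f64;
        // skip wins iff ratio > passthrough / partial
        let skip = self.ratio() * partial > passthrough;
        self.lock(if skip { Decision::SkipPartial } else { Decision::KeepPartial });
    }
}
```

Feeding it Q18's ratio of 0.565 with a partial cost of 150 ns/row (inside the ~100-200 ns range reported earlier in the thread) and an invented passthrough cost of 30 ns/row locks the decision to skip, since 0.565 × 150 > 30. The passthrough figure is illustrative only; the thread does not report one.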

Same temporary flip as before — benchmarking bot uses default config, so cost-aware A/B sampling would never run otherwise. Revert this commit before merge; the contract stays opt-in until we have data on which to base a default change.

zhuqi-lucas · 2026-05-26T12:41:04Z

run benchmark clickbench_partitioned

adriangbot · 2026-05-26T12:43:49Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4544098758-328-k67ld 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (df6e264) to bdf8a6d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-26T13:04:37Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.15 / 4.68 ±6.91 / 18.49 ms │          1.15 / 4.69 ±6.97 / 18.62 ms │     no change │
│ QQuery 1  │        12.48 / 12.89 ±0.25 / 13.17 ms │        12.61 / 13.06 ±0.27 / 13.46 ms │     no change │
│ QQuery 2  │        35.58 / 35.90 ±0.28 / 36.38 ms │        35.60 / 35.91 ±0.24 / 36.33 ms │     no change │
│ QQuery 3  │        30.58 / 31.24 ±0.64 / 32.28 ms │        30.71 / 31.17 ±0.26 / 31.50 ms │     no change │
│ QQuery 4  │     224.83 / 229.33 ±4.11 / 234.47 ms │     229.57 / 236.04 ±4.36 / 241.88 ms │     no change │
│ QQuery 5  │     272.66 / 276.42 ±3.40 / 282.80 ms │     273.84 / 276.31 ±1.73 / 278.65 ms │     no change │
│ QQuery 6  │           1.16 / 1.31 ±0.23 / 1.76 ms │           1.18 / 1.32 ±0.23 / 1.77 ms │     no change │
│ QQuery 7  │        13.81 / 13.92 ±0.06 / 13.98 ms │        14.31 / 14.39 ±0.09 / 14.55 ms │     no change │
│ QQuery 8  │     323.26 / 326.81 ±2.75 / 329.99 ms │     333.76 / 339.68 ±4.91 / 346.89 ms │     no change │
│ QQuery 9  │     461.80 / 465.11 ±3.95 / 472.77 ms │     454.00 / 466.95 ±8.85 / 476.86 ms │     no change │
│ QQuery 10 │        69.42 / 72.07 ±2.96 / 77.61 ms │        70.21 / 73.16 ±4.87 / 82.82 ms │     no change │
│ QQuery 11 │        81.62 / 82.62 ±0.82 / 83.78 ms │        79.43 / 82.46 ±2.30 / 86.38 ms │     no change │
│ QQuery 12 │     274.50 / 276.58 ±1.72 / 278.73 ms │     253.77 / 257.56 ±2.48 / 260.96 ms │ +1.07x faster │
│ QQuery 13 │    376.44 / 388.93 ±13.21 / 406.23 ms │    370.52 / 393.82 ±11.91 / 403.20 ms │     no change │
│ QQuery 14 │     286.35 / 293.18 ±7.91 / 307.87 ms │     268.76 / 275.91 ±7.77 / 290.09 ms │ +1.06x faster │
│ QQuery 15 │     275.06 / 284.05 ±6.42 / 293.92 ms │     278.11 / 287.50 ±6.32 / 294.09 ms │     no change │
│ QQuery 16 │    621.48 / 637.72 ±14.24 / 656.31 ms │    630.27 / 641.57 ±11.64 / 663.51 ms │     no change │
│ QQuery 17 │     626.48 / 635.40 ±7.46 / 645.95 ms │     637.12 / 641.28 ±5.74 / 652.25 ms │     no change │
│ QQuery 18 │ 1278.88 / 1302.21 ±17.06 / 1325.58 ms │ 1139.46 / 1162.80 ±18.97 / 1193.86 ms │ +1.12x faster │
│ QQuery 19 │        27.71 / 27.81 ±0.07 / 27.91 ms │        27.48 / 29.22 ±3.03 / 35.26 ms │  1.05x slower │
│ QQuery 20 │    525.71 / 549.99 ±36.32 / 621.87 ms │     523.20 / 529.02 ±6.30 / 540.91 ms │     no change │
│ QQuery 21 │     595.21 / 599.63 ±5.39 / 609.97 ms │     593.94 / 601.36 ±5.56 / 608.33 ms │     no change │
│ QQuery 22 │  1064.22 / 1073.93 ±8.85 / 1084.80 ms │  1059.57 / 1066.73 ±5.09 / 1072.83 ms │     no change │
│ QQuery 23 │ 3219.08 / 3242.42 ±13.61 / 3261.41 ms │ 3188.16 / 3227.53 ±24.74 / 3262.72 ms │     no change │
│ QQuery 24 │        41.40 / 44.49 ±4.50 / 53.34 ms │        41.32 / 42.99 ±2.11 / 47.06 ms │     no change │
│ QQuery 25 │     111.07 / 116.95 ±5.52 / 125.83 ms │     112.84 / 114.53 ±1.55 / 117.20 ms │     no change │
│ QQuery 26 │        42.29 / 42.63 ±0.24 / 42.98 ms │        41.56 / 45.03 ±5.60 / 56.13 ms │  1.06x slower │
│ QQuery 27 │     670.74 / 680.11 ±9.89 / 695.73 ms │     665.23 / 673.68 ±4.72 / 678.90 ms │     no change │
│ QQuery 28 │ 3040.39 / 3066.12 ±23.61 / 3100.57 ms │ 3054.40 / 3088.22 ±21.67 / 3119.29 ms │     no change │
│ QQuery 29 │        40.36 / 47.86 ±8.59 / 60.60 ms │        40.11 / 44.48 ±5.31 / 54.30 ms │ +1.08x faster │
│ QQuery 30 │     304.80 / 311.17 ±6.52 / 323.11 ms │     274.25 / 287.75 ±7.29 / 294.55 ms │ +1.08x faster │
│ QQuery 31 │     292.41 / 300.35 ±7.20 / 313.18 ms │     286.06 / 296.85 ±7.24 / 306.23 ms │     no change │
│ QQuery 32 │    953.14 / 972.74 ±16.98 / 999.09 ms │    957.92 / 969.77 ±13.33 / 995.46 ms │     no change │
│ QQuery 33 │ 1491.21 / 1504.26 ±19.19 / 1542.42 ms │ 1472.27 / 1508.58 ±26.68 / 1543.56 ms │     no change │
│ QQuery 34 │ 1488.48 / 1517.55 ±25.60 / 1559.77 ms │  1511.60 / 1519.80 ±4.71 / 1524.50 ms │     no change │
│ QQuery 35 │    287.43 / 303.36 ±13.82 / 322.31 ms │    300.26 / 328.80 ±52.33 / 433.31 ms │  1.08x slower │
│ QQuery 36 │        66.45 / 72.71 ±6.12 / 82.66 ms │        56.05 / 68.95 ±6.85 / 76.62 ms │ +1.05x faster │
│ QQuery 37 │        37.57 / 44.07 ±5.05 / 52.59 ms │       36.44 / 42.73 ±11.58 / 65.84 ms │     no change │
│ QQuery 38 │        41.70 / 43.39 ±1.73 / 45.65 ms │        41.78 / 45.08 ±2.87 / 48.98 ms │     no change │
│ QQuery 39 │     136.21 / 148.50 ±7.02 / 157.49 ms │     112.58 / 116.45 ±3.08 / 120.82 ms │ +1.28x faster │
│ QQuery 40 │        13.97 / 19.01 ±5.97 / 27.58 ms │        14.34 / 19.91 ±5.86 / 29.62 ms │     no change │
│ QQuery 41 │        13.66 / 13.85 ±0.11 / 14.00 ms │        13.90 / 14.17 ±0.33 / 14.80 ms │     no change │
│ QQuery 42 │        13.31 / 13.52 ±0.22 / 13.85 ms │        13.11 / 13.80 ±0.72 / 15.09 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20126.77ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19930.96ms │
│ Average Time (HEAD)                           │   468.06ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   463.51ms │
│ Queries Faster                                │          7 │
│ Queries Slower                                │          3 │
│ Queries with No Change                        │         33 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘
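
For reading the Change column: the labels in the table above appear consistent with comparing the mean times (the middle value in each `min / mean ±stddev / max` cell) and treating anything within roughly 5% as noise. This is inferred from the numbers in the table, not taken from the benchmark runner's source, so the band and the function name `classify_change` below are assumptions. A minimal sketch:

```rust
/// Classify a benchmark change from the mean times (in ms) of two runs.
/// ASSUMPTION: the 5% noise band is inferred from the table, not from the runner.
fn classify_change(base_mean_ms: f64, branch_mean_ms: f64) -> String {
    const NOISE: f64 = 0.05;
    // Ratio below 1.0 means the branch is faster than the base.
    let change = branch_mean_ms / base_mean_ms;
    if change >= 1.0 - NOISE && change <= 1.0 + NOISE {
        "no change".to_string()
    } else if change < 1.0 {
        // Report speedups as base / branch, e.g. "+1.12x faster".
        format!("+{:.2}x faster", 1.0 / change)
    } else {
        format!("{:.2}x slower", change)
    }
}
```

With the Q18 means from the table (1302.21 ms vs 1162.80 ms) this yields `+1.12x faster`, and with the Q4 means (229.33 ms vs 236.04 ms) it yields `no change`, matching the rows above.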

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	30.0 GiB
Avg memory	23.0 GiB
CPU user	1034.3s
CPU sys	75.1s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	30.4 GiB
Avg memory	23.1 GiB
CPU user	1023.1s
CPU sys	74.6s
Peak spill	0 B

Phase 2 of cost-aware partial-agg skip. Instead of one final decision per partition, the probe rewinds back to the partial-probe phase after `re_probe_interval_rows` rows (default 1M) and re-runs the partial+A/B sampling cycle on the next segment. Lets a single partition oscillate between partial and skip as the data distribution shifts (dense burst of repeated keys followed by high-cardinality stretch, etc.).

State machine: `ProbePhase::Locked { should_skip }` becomes `ProbePhase::Active { should_skip, rows_since_decision }`.

Per-batch:
- In keep-partial: `observe_partial_batch` increments `rows_since_decision`. At the threshold, `start_reprobe` resets the probe (phase = Partial, counters cleared, `elapsed_compute_at_probe_start` re-snapshotted, `is_locked` = false).
- In skip: `tick_skip_batch` does the same from the `SkippingAggregation` exec-state arm.

When re-probe fires, the main loop transitions back to `ReadingInput` so the partial-agg path runs on the next batch (fresh hash table, since the previous one was emitted on entry to skip).

Final-agg correctness is unaffected: each segment's output (be it emitted partial state or per-row passthrough state) is associative-commutative and merges naturally downstream.

New config:
- `skip_partial_aggregation_re_probe_interval_rows` (usize, default 1_000_000). Set to 0 to disable re-probing entirely (one-shot decision, the Phase 1 behaviour).

New diagnostic counter:
- `partial_agg_probe_segment_count`: number of completed segments in the current partition. 0 means the probe ran once and never re-probed; a large value on a fast query suggests the interval is too small.

Three new `SkipAggregationProbe` unit tests cover:
- re-probe after a committed skip decision rewinds to `Partial`
- re-probe after a committed keep decision rewinds to `Partial`
- `re_probe_interval_rows = 0` disables re-probing

Existing test `test_skip_aggregation_probe_not_locked_until_skip` explicitly disables the cost-aware path (it exercises a legacy Rule 1 corner case the cost model would intercept differently).
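
The re-probe cycle described in this commit message can be pictured roughly as follows. This is a simplified sketch, not the code from the commit: `SkipProbe`, `commit`, and `observe_rows` are hypothetical names (the message's `observe_partial_batch` and `tick_skip_batch` are folded into one method here), and timing, the hash table, and the exec-state transitions are left out.

```rust
/// Simplified stand-in for the probe phases named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProbePhase {
    /// Measuring the partial-agg path for the current segment.
    Partial,
    /// A keep/skip decision is in force; rows are counted until the next re-probe.
    Active { should_skip: bool, rows_since_decision: usize },
}

/// Hypothetical probe holding only the state needed to show re-probing.
struct SkipProbe {
    phase: ProbePhase,
    /// 0 disables re-probing (one-shot decision).
    re_probe_interval_rows: usize,
    /// Number of completed segments, i.e. how many times a re-probe fired.
    segment_count: usize,
}

impl SkipProbe {
    fn new(re_probe_interval_rows: usize) -> Self {
        Self { phase: ProbePhase::Partial, re_probe_interval_rows, segment_count: 0 }
    }

    /// Record the decision reached at the end of a probe cycle (hypothetical helper).
    fn commit(&mut self, should_skip: bool) {
        self.phase = ProbePhase::Active { should_skip, rows_since_decision: 0 };
    }

    /// Count rows processed under the current decision.
    /// Returns true when the interval is reached and the probe rewinds to `Partial`.
    fn observe_rows(&mut self, rows: usize) -> bool {
        let ProbePhase::Active { should_skip, rows_since_decision } = self.phase else {
            return false; // still probing, nothing to count
        };
        let seen = rows_since_decision + rows;
        if self.re_probe_interval_rows > 0 && seen >= self.re_probe_interval_rows {
            self.start_reprobe();
            true
        } else {
            self.phase = ProbePhase::Active { should_skip, rows_since_decision: seen };
            false
        }
    }

    /// Rewind to the partial-probe phase and count the finished segment.
    fn start_reprobe(&mut self) {
        self.phase = ProbePhase::Partial;
        self.segment_count += 1;
    }
}
```

In this sketch the same counter drives both the keep-partial and the skip branch, which mirrors the message's statement that `tick_skip_batch` "does the same" as `observe_partial_batch`; the real implementation may track more state per branch.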

zhuqi-lucas · 2026-05-26T13:37:31Z

run benchmark clickbench_partitioned

adriangbot · 2026-05-26T13:39:41Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4544553176-329-zpdd7 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (c716f33) to 2453bec (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-26T14:00:37Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.18 / 4.72 ±6.98 / 18.68 ms │          1.21 / 4.81 ±7.08 / 18.96 ms │     no change │
│ QQuery 1  │        12.45 / 12.89 ±0.29 / 13.28 ms │        12.52 / 12.87 ±0.24 / 13.24 ms │     no change │
│ QQuery 2  │        35.55 / 35.86 ±0.29 / 36.24 ms │        36.32 / 36.47 ±0.22 / 36.89 ms │     no change │
│ QQuery 3  │        30.41 / 30.90 ±0.68 / 32.23 ms │        31.15 / 31.31 ±0.16 / 31.61 ms │     no change │
│ QQuery 4  │     221.87 / 224.54 ±2.56 / 229.42 ms │     204.41 / 207.97 ±3.29 / 212.95 ms │ +1.08x faster │
│ QQuery 5  │     269.97 / 274.75 ±3.86 / 279.90 ms │     293.59 / 300.68 ±6.25 / 308.65 ms │  1.09x slower │
│ QQuery 6  │           1.19 / 1.35 ±0.23 / 1.81 ms │           1.21 / 1.36 ±0.22 / 1.78 ms │     no change │
│ QQuery 7  │        13.73 / 13.99 ±0.14 / 14.13 ms │        13.75 / 13.87 ±0.09 / 14.00 ms │     no change │
│ QQuery 8  │     320.78 / 328.84 ±4.63 / 334.55 ms │                                  FAIL │  incomparable │
│ QQuery 9  │     454.63 / 463.19 ±4.56 / 467.49 ms │    460.89 / 472.99 ±12.76 / 497.41 ms │     no change │
│ QQuery 10 │        69.29 / 70.27 ±1.11 / 72.42 ms │        71.10 / 73.32 ±4.22 / 81.76 ms │     no change │
│ QQuery 11 │        79.76 / 81.65 ±1.75 / 84.95 ms │        80.64 / 83.74 ±3.38 / 89.89 ms │     no change │
│ QQuery 12 │     266.03 / 271.93 ±6.02 / 281.43 ms │     261.01 / 266.00 ±3.05 / 269.85 ms │     no change │
│ QQuery 13 │    370.50 / 383.80 ±15.84 / 411.68 ms │     382.98 / 396.03 ±8.97 / 407.49 ms │     no change │
│ QQuery 14 │     281.09 / 286.83 ±5.89 / 296.46 ms │                                  FAIL │  incomparable │
│ QQuery 15 │     269.00 / 276.19 ±5.54 / 282.94 ms │    253.65 / 274.17 ±18.94 / 303.25 ms │     no change │
│ QQuery 16 │     622.91 / 627.43 ±4.90 / 635.98 ms │                                  FAIL │  incomparable │
│ QQuery 17 │     630.86 / 647.04 ±8.29 / 653.39 ms │                                  FAIL │  incomparable │
│ QQuery 18 │ 1276.97 / 1304.93 ±18.66 / 1331.74 ms │                                  FAIL │  incomparable │
│ QQuery 19 │        27.48 / 28.70 ±2.16 / 33.01 ms │        27.49 / 32.13 ±6.65 / 45.21 ms │  1.12x slower │
│ QQuery 20 │    516.47 / 528.35 ±11.87 / 548.41 ms │    519.56 / 526.99 ±10.22 / 546.94 ms │     no change │
│ QQuery 21 │     590.44 / 596.24 ±5.21 / 604.91 ms │     595.52 / 597.31 ±1.51 / 599.44 ms │     no change │
│ QQuery 22 │  1063.07 / 1071.01 ±5.75 / 1078.89 ms │ 1068.62 / 1076.76 ±10.29 / 1096.25 ms │     no change │
│ QQuery 23 │ 3187.30 / 3266.59 ±74.85 / 3405.01 ms │ 3186.55 / 3216.99 ±17.77 / 3241.70 ms │     no change │
│ QQuery 24 │       42.95 / 51.86 ±12.32 / 75.38 ms │       41.38 / 49.33 ±15.06 / 79.44 ms │     no change │
│ QQuery 25 │     112.17 / 117.99 ±6.60 / 129.27 ms │    112.30 / 121.06 ±15.56 / 152.12 ms │     no change │
│ QQuery 26 │        42.00 / 42.54 ±0.46 / 43.33 ms │        42.17 / 47.03 ±8.04 / 63.03 ms │  1.11x slower │
│ QQuery 27 │    677.12 / 687.54 ±14.79 / 716.90 ms │    669.83 / 683.37 ±12.27 / 705.25 ms │     no change │
│ QQuery 28 │ 3052.39 / 3099.10 ±30.01 / 3141.78 ms │ 3124.12 / 3178.31 ±58.85 / 3288.20 ms │     no change │
│ QQuery 29 │        40.11 / 43.05 ±4.27 / 51.46 ms │       40.11 / 47.93 ±15.29 / 78.51 ms │  1.11x slower │
│ QQuery 30 │     302.53 / 312.23 ±7.03 / 323.81 ms │                                  FAIL │  incomparable │
│ QQuery 31 │     280.81 / 291.58 ±7.32 / 303.49 ms │    290.32 / 306.04 ±13.04 / 329.66 ms │     no change │
│ QQuery 32 │  968.09 / 1019.84 ±31.94 / 1053.04 ms │    938.93 / 964.53 ±15.80 / 988.36 ms │ +1.06x faster │
│ QQuery 33 │ 1481.45 / 1511.26 ±21.39 / 1540.16 ms │ 1330.36 / 1357.00 ±17.56 / 1378.17 ms │ +1.11x faster │
│ QQuery 34 │ 1527.47 / 1538.89 ±12.25 / 1557.28 ms │ 1375.21 / 1395.77 ±13.17 / 1407.51 ms │ +1.10x faster │
│ QQuery 35 │    283.13 / 300.78 ±17.46 / 324.34 ms │    253.59 / 279.95 ±26.70 / 314.98 ms │ +1.07x faster │
│ QQuery 36 │        65.79 / 68.94 ±2.38 / 73.16 ms │        55.65 / 67.21 ±8.33 / 81.36 ms │     no change │
│ QQuery 37 │        34.78 / 37.17 ±2.16 / 40.24 ms │        36.24 / 38.78 ±3.75 / 46.19 ms │     no change │
│ QQuery 38 │        40.48 / 44.40 ±3.74 / 49.30 ms │        40.07 / 42.92 ±2.42 / 46.90 ms │     no change │
│ QQuery 39 │    135.73 / 157.71 ±13.57 / 177.88 ms │     113.52 / 118.43 ±3.14 / 122.91 ms │ +1.33x faster │
│ QQuery 40 │        14.15 / 14.68 ±0.50 / 15.59 ms │        14.09 / 17.02 ±4.95 / 26.90 ms │  1.16x slower │
│ QQuery 41 │        13.77 / 16.04 ±2.64 / 20.66 ms │        13.51 / 15.44 ±3.16 / 21.72 ms │     no change │
│ QQuery 42 │        12.98 / 13.12 ±0.11 / 13.28 ms │        13.06 / 13.33 ±0.19 / 13.64 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 16693.39ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 16369.25ms │
│ Average Time (HEAD)                           │   451.17ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   442.41ms │
│ Queries Faster                                │          6 │
│ Queries Slower                                │          5 │
│ Queries with No Change                        │         26 │
│ Queries with Failure                          │          6 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	29.6 GiB
Avg memory	22.7 GiB
CPU user	1036.0s
CPU sys	78.1s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	85.0s
Peak memory	31.7 GiB
Avg memory	21.9 GiB
CPU user	844.1s
CPU sys	54.4s
Peak spill	0 B

This reverts commit 44f815a.

zhuqi-lucas · 2026-05-26T14:25:18Z

run benchmark clickbench_partitioned

adriangbot · 2026-05-26T14:28:28Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4545016043-330-bxrlv 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (a258afe) to 2453bec (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-26T14:48:09Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.16 / 4.66 ±6.90 / 18.46 ms │          1.19 / 4.60 ±6.78 / 18.15 ms │     no change │
│ QQuery 1  │        12.97 / 13.41 ±0.23 / 13.61 ms │        12.57 / 12.77 ±0.15 / 12.99 ms │     no change │
│ QQuery 2  │        35.83 / 36.27 ±0.33 / 36.82 ms │        36.67 / 36.85 ±0.19 / 37.19 ms │     no change │
│ QQuery 3  │        30.31 / 30.85 ±0.57 / 31.95 ms │        30.74 / 30.79 ±0.04 / 30.85 ms │     no change │
│ QQuery 4  │     220.91 / 228.38 ±5.44 / 234.32 ms │     227.91 / 232.52 ±3.60 / 236.76 ms │     no change │
│ QQuery 5  │     272.55 / 276.08 ±3.13 / 280.95 ms │     268.52 / 272.27 ±2.80 / 275.70 ms │     no change │
│ QQuery 6  │           1.18 / 1.34 ±0.22 / 1.77 ms │           1.22 / 1.37 ±0.23 / 1.81 ms │     no change │
│ QQuery 7  │        14.21 / 14.30 ±0.09 / 14.45 ms │        13.70 / 13.95 ±0.19 / 14.28 ms │     no change │
│ QQuery 8  │     319.76 / 326.02 ±5.15 / 331.92 ms │     330.08 / 334.96 ±3.96 / 341.27 ms │     no change │
│ QQuery 9  │     455.77 / 465.40 ±7.72 / 473.34 ms │     456.61 / 462.46 ±5.25 / 468.93 ms │     no change │
│ QQuery 10 │        70.04 / 70.76 ±0.64 / 71.72 ms │        68.33 / 69.91 ±1.06 / 71.12 ms │     no change │
│ QQuery 11 │        82.10 / 83.51 ±1.33 / 85.67 ms │        80.59 / 84.02 ±2.29 / 86.73 ms │     no change │
│ QQuery 12 │     264.33 / 269.35 ±3.86 / 275.82 ms │     252.81 / 257.68 ±3.70 / 262.55 ms │     no change │
│ QQuery 13 │     370.81 / 384.58 ±9.74 / 398.44 ms │    355.66 / 384.18 ±18.14 / 411.88 ms │     no change │
│ QQuery 14 │     282.93 / 287.15 ±3.16 / 290.90 ms │     269.73 / 271.96 ±1.79 / 273.97 ms │ +1.06x faster │
│ QQuery 15 │     270.54 / 275.63 ±3.90 / 282.05 ms │    271.27 / 282.68 ±12.06 / 304.97 ms │     no change │
│ QQuery 16 │     617.43 / 624.46 ±5.24 / 633.45 ms │    621.33 / 634.86 ±17.66 / 669.40 ms │     no change │
│ QQuery 17 │     635.46 / 641.10 ±4.77 / 648.65 ms │    630.64 / 656.27 ±36.08 / 727.88 ms │     no change │
│ QQuery 18 │ 1273.80 / 1291.02 ±10.31 / 1304.14 ms │ 1121.56 / 1156.68 ±29.14 / 1194.68 ms │ +1.12x faster │
│ QQuery 19 │       27.75 / 39.55 ±11.84 / 56.35 ms │        27.38 / 27.56 ±0.15 / 27.78 ms │ +1.43x faster │
│ QQuery 20 │     520.27 / 531.05 ±6.78 / 540.04 ms │     520.53 / 525.28 ±4.63 / 532.84 ms │     no change │
│ QQuery 21 │     588.29 / 598.94 ±6.38 / 606.65 ms │     594.11 / 598.98 ±4.09 / 604.63 ms │     no change │
│ QQuery 22 │  1063.88 / 1067.93 ±2.69 / 1071.62 ms │  1060.27 / 1070.46 ±9.27 / 1087.69 ms │     no change │
│ QQuery 23 │ 3188.25 / 3210.76 ±23.26 / 3255.67 ms │ 3181.35 / 3204.56 ±14.15 / 3221.14 ms │     no change │
│ QQuery 24 │        41.01 / 41.93 ±1.02 / 43.70 ms │        41.03 / 41.83 ±0.68 / 42.96 ms │     no change │
│ QQuery 25 │     112.35 / 116.34 ±3.14 / 120.48 ms │     111.29 / 116.31 ±7.83 / 131.91 ms │     no change │
│ QQuery 26 │        41.37 / 44.83 ±5.40 / 55.55 ms │        41.34 / 42.01 ±0.66 / 43.16 ms │ +1.07x faster │
│ QQuery 27 │     678.44 / 683.40 ±5.65 / 693.16 ms │     667.10 / 679.45 ±9.09 / 694.29 ms │     no change │
│ QQuery 28 │ 3053.89 / 3080.63 ±30.79 / 3136.18 ms │ 3029.03 / 3063.54 ±21.61 / 3092.94 ms │     no change │
│ QQuery 29 │       40.42 / 51.42 ±14.51 / 76.58 ms │        40.11 / 41.86 ±3.27 / 48.39 ms │ +1.23x faster │
│ QQuery 30 │     301.01 / 306.85 ±6.01 / 317.82 ms │     283.40 / 287.41 ±4.36 / 295.29 ms │ +1.07x faster │
│ QQuery 31 │    275.46 / 294.98 ±16.78 / 325.40 ms │     288.01 / 295.07 ±4.31 / 301.62 ms │     no change │
│ QQuery 32 │   957.02 / 974.29 ±22.56 / 1018.31 ms │   921.30 / 975.61 ±32.55 / 1008.55 ms │     no change │
│ QQuery 33 │ 1460.79 / 1497.04 ±48.78 / 1592.69 ms │ 1446.79 / 1476.40 ±17.14 / 1493.64 ms │     no change │
│ QQuery 34 │ 1454.11 / 1486.08 ±18.38 / 1510.16 ms │ 1462.51 / 1476.27 ±12.80 / 1494.54 ms │     no change │
│ QQuery 35 │    275.67 / 323.30 ±90.15 / 503.57 ms │     289.49 / 297.13 ±9.80 / 315.35 ms │ +1.09x faster │
│ QQuery 36 │        66.79 / 73.58 ±7.05 / 86.12 ms │        57.62 / 63.70 ±6.86 / 76.51 ms │ +1.16x faster │
│ QQuery 37 │        35.41 / 38.69 ±5.40 / 49.38 ms │        35.16 / 36.78 ±1.82 / 39.30 ms │     no change │
│ QQuery 38 │        40.58 / 44.33 ±2.62 / 47.86 ms │        41.03 / 43.24 ±1.18 / 44.51 ms │     no change │
│ QQuery 39 │     145.54 / 153.51 ±6.08 / 163.59 ms │     112.20 / 118.12 ±4.66 / 125.66 ms │ +1.30x faster │
│ QQuery 40 │        13.61 / 14.96 ±2.22 / 19.37 ms │        13.46 / 15.06 ±2.39 / 19.78 ms │     no change │
│ QQuery 41 │        13.58 / 15.63 ±3.95 / 23.53 ms │        13.43 / 13.70 ±0.35 / 14.36 ms │ +1.14x faster │
│ QQuery 42 │        12.86 / 13.04 ±0.13 / 13.22 ms │        13.08 / 14.96 ±2.24 / 18.62 ms │  1.15x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20027.35ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19726.11ms │
│ Average Time (HEAD)                           │   465.75ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   458.75ms │
│ Queries Faster                                │         10 │
│ Queries Slower                                │          1 │
│ Queries with No Change                        │         32 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	30.3 GiB
Avg memory	22.8 GiB
CPU user	1027.4s
CPU sys	74.2s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	30.9 GiB
Avg memory	23.2 GiB
CPU user	1012.4s
CPU sys	73.4s
Peak spill	0 B

The Phase 2 revert took out the previous config-disable line. Test exercises the original Rule 1 short-circuit behaviour, so it needs to opt out of the default-on A/B sampling explicitly.

github-actions Bot added the documentation (Improvements or additions to documentation), common (Related to common crate), and physical-plan (Changes to the physical-plan crate) labels May 26, 2026

github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label May 26, 2026

zhuqi-lucas added 2 commits May 26, 2026 15:11

Merge branch 'main' into feat/adaptive-partial-agg-cost

5c10375

zhuqi-lucas added 2 commits May 26, 2026 19:43

revert benchmark-only default flip; default back to false

c087cb4

The default-true was a temporary flip so the benchmarking bot would exercise the new code path. Revert before merge — opt-in stays the contract until the cost-aware variant lands.

ci: re-trigger after GitHub Actions infra outage

df6e264

zhuqi-lucas added 2 commits May 26, 2026 21:36

Merge branch 'main' into feat/adaptive-partial-agg-cost

c716f33

Revert "feat: segment-level re-probing for dynamic distribution shifts"

a258afe

This reverts commit 44f815a.

test: explicitly disable cost model in legacy not-locked-until-skip test

a2baa6c

The Phase 2 revert took out the previous config-disable line. Test exercises the original Rule 1 short-circuit behaviour, so it needs to opt out of the default-on A/B sampling explicitly.

zhuqi-lucas mentioned this pull request May 26, 2026

Adaptive (cost-based) decision for skip_partial_aggregation instead of fixed ratio threshold #22405

Open

zhuqi-lucas commented May 26, 2026 (edited)