feat(aggregate): cost-aware partial-aggregation skip (opt-in)#22518

Draft: zhuqi-lucas wants to merge 13 commits into apache:main from zhuqi-lucas:feat/adaptive-partial-agg-cost

@zhuqi-lucas (Contributor) commented May 26, 2026

Which issue does this PR close?

Phase 1 prototype for #22405.

Rationale for this change

The current skip_partial_aggregation_probe_ratio_threshold (default 0.8) is a single fixed knob: when measured num_groups / input_rows ≥ 0.8, partial aggregation skips. This catches the no-reduction case but misses the medium-ratio band (~0.5–0.7) where partial aggregation is still net-negative because per-row cost is high — heavy variable-length keys, complex aggregates, etc.

ClickBench Q18 is the motivating example. Measured ratio is 0.565, well below 0.8, so partial aggregation keeps running and burns ~17 s of compute across 12 partitions for only a ~40% reduction. Lowering the global threshold to 0.6 fixes Q18 but is likely to regress lower-cost queries that benefit from partial agg at that ratio.

This PR supplements the static threshold guess with measured A/B sampling: probe the per-row cost of both partial agg and passthrough, then pick the cheaper path via a closed-form cost comparison. No additional hand-tuned threshold is introduced.

How it works

After the existing partial probe window (probe_rows_threshold, default 100k rows) closes:

  1. Partial probe — measure partial_ns_per_row from the partial-agg path so far, and compute ratio = num_groups / input_rows.

  2. Rule 1 short-circuit: ratio ≥ probe_ratio_threshold (0.8) skips immediately (existing behaviour, preserved for compatibility and to save the A/B window when the answer is obvious).

  3. A/B sampling — when Rule 1 doesn't fire, route the next ab_sampling_rows (default 10k) through the passthrough (transform_to_states) path. The hash table is preserved; the passthrough output is sent downstream and merges naturally in Final agg.

  4. Cost decision — at the end of the A/B window, measure passthrough_ns_per_row and apply:

    skip ⇔ ratio > passthrough_ns_per_row / partial_ns_per_row
    

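    Derived from cost_keep_partial = partial × N + final × N × ratio vs cost_skip = passthrough × N + final × N, where N is the number of input rows. Skipping is cheaper when passthrough + final < partial + final × ratio; assuming final ≈ partial (same hash-table mechanics), this reduces to passthrough < partial × ratio, which is the rule above.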

  5. If the decision is skip, emit the partial hash table and continue via SkippingAggregation. If keep, return to ReadingInput and the hash table continues accumulating.

The crossover is set entirely by the two measured numbers: there is no extra threshold to tune, and the decision adapts automatically to hardware and query shape.
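A minimal Rust sketch of the cost comparison above, assuming the per-row costs are already available as f64 nanoseconds. The function names and the explicit `final_ns` parameter are hypothetical, not the PR's API; passing `final_ns == partial_ns` to the two cost functions gives the same answer as the closed-form rule.

```rust
/// Estimated cost of keeping partial aggregation over `n` input rows: every
/// row goes through the partial hash table, and only `ratio * n` group rows
/// reach the final aggregate.
fn cost_keep_partial(n: f64, ratio: f64, partial_ns: f64, final_ns: f64) -> f64 {
    partial_ns * n + final_ns * n * ratio
}

/// Estimated cost of skipping: every row is converted to a state row, and
/// all `n` rows reach the final aggregate.
fn cost_skip(n: f64, passthrough_ns: f64, final_ns: f64) -> f64 {
    passthrough_ns * n + final_ns * n
}

/// Closed-form rule under the assumption final ≈ partial:
/// skip <=> ratio > passthrough / partial.
fn should_skip(ratio: f64, partial_ns: f64, passthrough_ns: f64) -> bool {
    // Assumed guard: without a usable partial-cost measurement, keep partial agg.
    if partial_ns <= 0.0 {
        return false;
    }
    ratio > passthrough_ns / partial_ns
}
```

For example, with an illustrative ratio of 0.565 and a partial cost of 100 ns/row, a passthrough cost of 40 ns/row yields skip (0.565 > 0.4; 140 vs 156.5 ns per input row), while 70 ns/row keeps partial aggregation (0.565 < 0.7). These numbers are made up for illustration, not measurements from the PR.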

Benchmark (ClickBench partitioned, ARM Neoverse-V2 12 vCPU)

| Metric | HEAD | This PR | Δ |
|---|---|---|---|
| Total | 20027 ms | 19726 ms | −1.5% |
| Q18 | 1291 ms | 1156 ms | +1.12× |
| Q19 | 39 ms | 27 ms | +1.43× |
| Q39 | 153 ms | 118 ms | +1.30× |
| Q29 | 51 ms | 41 ms | +1.23× |
| Q36 | 73 ms | 63 ms | +1.16× |
| Q35 | 323 ms | 297 ms | +1.09× |
| ... | | | |

Queries faster: 10
Queries slower: 1 (Q42, ~15 ms, noise)

What changes are included in this PR?

SkipAggregationProbe extended with a phased state machine:

Partial → AbSampling → Active { should_skip }

(ExecutionState::AbSampling mirrors SkippingAggregation — input goes through transform_to_states — but keeps the partial hash table.)
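The phase transitions can be sketched as a small Rust state machine. This is a hypothetical model, not the PR's `SkipAggregationProbe`: the `Probe` type, its fields, and the `record_*` methods are illustrative, and timing is passed in as plain nanosecond counts rather than read from `elapsed_compute`.

```rust
/// Hypothetical model of the Partial -> AbSampling -> Active phases; the
/// type, field, and method names are illustrative, not the PR's code.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Partial,
    AbSampling,
    Active { should_skip: bool },
}

struct Probe {
    phase: Phase,
    probe_rows_threshold: usize, // e.g. 100_000
    ab_sampling_rows: usize,     // e.g. 10_000
    ratio_threshold: f64,        // e.g. 0.8 (Rule 1)
    rows: usize,                 // rows seen in the current phase
    ns: u64,                     // time spent in the current phase
    ratio: f64,                  // num_groups / input_rows after Partial
    partial_ns_per_row: f64,
}

impl Probe {
    fn new(probe_rows_threshold: usize, ab_sampling_rows: usize, ratio_threshold: f64) -> Self {
        Probe {
            phase: Phase::Partial,
            probe_rows_threshold,
            ab_sampling_rows,
            ratio_threshold,
            rows: 0,
            ns: 0,
            ratio: 0.0,
            partial_ns_per_row: 0.0,
        }
    }

    /// Record a batch aggregated into the hash table. `num_groups` is the
    /// current total number of groups in the table, not a per-batch count.
    fn record_partial(&mut self, rows: usize, num_groups: usize, ns: u64) {
        if self.phase != Phase::Partial {
            return;
        }
        self.rows += rows;
        self.ns += ns;
        if self.rows < self.probe_rows_threshold {
            return;
        }
        self.ratio = num_groups as f64 / self.rows as f64;
        self.partial_ns_per_row = self.ns as f64 / self.rows as f64;
        self.phase = if self.ratio >= self.ratio_threshold {
            // Rule 1: the answer is obvious, skip without an A/B window.
            Phase::Active { should_skip: true }
        } else {
            Phase::AbSampling
        };
        // Reset counters so the A/B window measures passthrough time only.
        self.rows = 0;
        self.ns = 0;
    }

    /// Record a batch routed through the passthrough (state-conversion) path.
    fn record_passthrough(&mut self, rows: usize, ns: u64) {
        if self.phase != Phase::AbSampling {
            return;
        }
        self.rows += rows;
        self.ns += ns;
        if self.rows < self.ab_sampling_rows {
            return;
        }
        let passthrough_ns_per_row = self.ns as f64 / self.rows as f64;
        let should_skip = self.partial_ns_per_row > 0.0
            && self.ratio > passthrough_ns_per_row / self.partial_ns_per_row;
        self.phase = Phase::Active { should_skip };
    }
}
```

Because the counters accumulate across calls, both windows can span several input batches, which is the behaviour the `skip_probe_ab_window_accumulates_across_batches` test name suggests.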

New datafusion.execution.* config:

  • skip_partial_aggregation_use_cost_model (bool, default true) — turns the A/B path on. Set false to fall back to the bare ratio check.
  • skip_partial_aggregation_ab_sampling_rows (usize, default 10000) — size of the passthrough sample window.

New EXPLAIN ANALYZE diagnostic gauges so users (and follow-up tuning work) can see what the probe is doing per-partition:

  • partial_agg_probe_partial_ns_per_row
  • partial_agg_probe_passthrough_ns_per_row
  • partial_agg_probe_ratio_per_mille (ratio × 1000, integer storage)
  • partial_agg_probe_cost_decision_skip (1 = cost said skip, 0 = cost said keep)
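When reading EXPLAIN ANALYZE output, the four gauges can be cross-checked against the decision rule. A minimal Rust sketch, assuming the gauge values have already been extracted as integers and that they are only meaningful for partitions that went through A/B sampling. The `ProbeGauges` struct is hypothetical. Because the ns-per-row gauges are integers, near-ties may disagree with the recorded decision.

```rust
/// Hypothetical snapshot of the four per-partition gauges listed above.
struct ProbeGauges {
    partial_ns_per_row: u64,
    passthrough_ns_per_row: u64,
    ratio_per_mille: u64,   // ratio × 1000, stored as an integer
    cost_decision_skip: u64, // 1 = skip, 0 = keep
}

impl ProbeGauges {
    /// Recover the ratio from its per-mille integer encoding.
    fn ratio(&self) -> f64 {
        self.ratio_per_mille as f64 / 1000.0
    }

    /// Re-apply skip <=> ratio > passthrough / partial to the gauge values.
    fn recomputed_skip(&self) -> bool {
        self.partial_ns_per_row > 0
            && self.ratio()
                > self.passthrough_ns_per_row as f64 / self.partial_ns_per_row as f64
    }

    /// True when the recorded decision matches the recomputed one.
    fn consistent(&self) -> bool {
        self.recomputed_skip() == (self.cost_decision_skip == 1)
    }
}
```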

Are these changes tested?

Seven SkipAggregationProbe unit tests:

  • skip_probe_cost_model_off_matches_legacy_ratio_check — bare ratio check unchanged when cost model is off.
  • skip_probe_cost_model_short_circuits_on_high_ratio — Rule 1 still wins over A/B.
  • skip_probe_enters_ab_sampling_when_partial_window_closes — A/B transition.
  • skip_probe_cost_decision_chooses_skip_when_partial_is_expensive — cost crossover (skip).
  • skip_probe_cost_decision_chooses_keep_when_passthrough_not_much_cheaper — cost crossover (keep).
  • skip_probe_ab_window_accumulates_across_batches — sampling spans multiple input batches.
  • skip_probe_records_diagnostic_gauges — diagnostic metrics fire as expected.

Existing 100 aggregate tests + 10 aggregate SLT files still pass; cargo clippy -p datafusion-physical-plan --all-targets -- -D warnings clean.

Are there any user-facing changes?

Two additive datafusion.execution.* config options. Based on the benchmark above, the cost-aware path is on by default; it can be disabled with SET datafusion.execution.skip_partial_aggregation_use_cost_model = false.

Followups

  • Segment-level re-probing (was attempted in this PR but reverted — see commits 44f815a87, c506a81fb). The current implementation makes one A/B decision per partition. Re-probing every N rows would let a single partition switch direction as the data distribution shifts. Implementation hit a pre-existing GroupValues issue: emit(EmitTo::All) clears the per-column arrays but the hash→index map appears to retain stale entries, panicking on subsequent partial-agg inserts at multi_group_by/primitive.rs:156. Should be tackled as a follow-up after that reset semantic is sorted out.

  • The simplifying assumption final_ns ≈ partial_ns in the cost formula is reasonable but not exact. A more refined model could track Final-agg per-row cost separately. Possible follow-up if measured data (via the diagnostic gauges above) shows the assumption costs us in some workload.

The fixed `skip_partial_aggregation_probe_ratio_threshold` (default 0.8)
catches "the partial agg barely reduces anything" cases, but it misses
the band where the ratio is moderate (say 0.5-0.6) and partial aggregation
is *still* net-negative because per-row cost is high — heavy variable-
length keys, complex aggregates, etc. ClickBench Q18 is the motivating
example (issue apache#22405): ratio 0.565, but partial agg burns 17s of compute
across 12 partitions while reducing input only ~40%; turning the threshold
down enough to catch it would regress lower-cost queries.

Add a second, opt-in skip rule that augments the fixed-ratio check with
the measured per-row wall time of the operator. Disabled by default, so
existing behaviour is preserved.

New config (all under `datafusion.execution`):

- `skip_partial_aggregation_use_cost_model` (bool, default false) —
  turns the cost-aware rule on.
- `skip_partial_aggregation_cost_ns_per_row` (u64, default 1000) — the
  per-row wall-time floor above which the cost-aware rule fires.
- `skip_partial_aggregation_cost_min_ratio` (f64, default 0.3) — below
  this ratio partial agg is kept regardless of per-row cost (it's
  reducing too much to be worth skipping).

How it works: `SkipAggregationProbe` already runs at probe-window
boundaries and already has `baseline_metrics.elapsed_compute` ticking
through every timed block. The probe now snapshots that counter at
construction; once `probe_rows_threshold` is reached, it computes
`ns_per_row = (elapsed_compute - snapshot) / input_rows` and, if
both the per-row cost is above the floor and the ratio sits in the
medium band, switches to skip mode. The existing high-ratio rule still
fires first, so this is purely additive.
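The single-shot rule described in this commit message can be sketched as below. `CostRuleConfig` and `cost_floor_should_skip` are hypothetical names standing in for the three config options and the existing ratio threshold; the probe-window bookkeeping is omitted.

```rust
/// Hypothetical stand-in for the relevant config values.
#[derive(Clone, Copy)]
struct CostRuleConfig {
    probe_ratio_threshold: f64, // skip_partial_aggregation_probe_ratio_threshold
    use_cost_model: bool,       // skip_partial_aggregation_use_cost_model
    cost_ns_per_row: u64,       // skip_partial_aggregation_cost_ns_per_row
    cost_min_ratio: f64,        // skip_partial_aggregation_cost_min_ratio
}

/// Decide once the probe window closes. `elapsed_ns` is the current
/// elapsed_compute value; `snapshot_ns` was captured at probe construction.
fn cost_floor_should_skip(
    cfg: &CostRuleConfig,
    input_rows: u64,
    num_groups: u64,
    elapsed_ns: u64,
    snapshot_ns: u64,
) -> bool {
    if input_rows == 0 {
        return false;
    }
    let ratio = num_groups as f64 / input_rows as f64;
    // The existing high-ratio rule still fires first.
    if ratio >= cfg.probe_ratio_threshold {
        return true;
    }
    if !cfg.use_cost_model {
        return false;
    }
    let ns_per_row = elapsed_ns.saturating_sub(snapshot_ns) / input_rows;
    // Skip only in the medium band and only when rows are expensive.
    ns_per_row > cfg.cost_ns_per_row && ratio >= cfg.cost_min_ratio
}
```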

Five unit tests on `SkipAggregationProbe` cover the new branches —
cost-model-off matches the legacy ratio check, medium-ratio + high cost
skips, below-min-ratio doesn't, cheap-per-row doesn't, and the high-ratio
rule is honoured even with the cost model on.

Refs: apache#22405
@github-actions bot added the documentation (Improvements or additions to documentation), common (Related to common crate), and physical-plan (Changes to the physical-plan crate) labels on May 26, 2026
@zhuqi-lucas (Contributor, Author)

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4540857943-318-wlrsf 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/adaptive-partial-agg-cost (88f6d4c) to a87bdc9 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4540857943-319-hmhf7 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (88f6d4c) to a87bdc9 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4540857943-320-ks6ks 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (88f6d4c) to a87bdc9 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_adaptive-partial-agg-cost ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 38.56 / 39.95 ±1.11 / 41.85 ms │ 38.27 / 38.98 ±1.04 / 41.03 ms │ no change │
│ QQuery 2  │ 19.49 / 19.68 ±0.26 / 20.17 ms │ 19.33 / 19.99 ±0.54 / 20.88 ms │ no change │
│ QQuery 3  │ 32.22 / 34.40 ±1.72 / 37.20 ms │ 32.79 / 33.61 ±0.81 / 34.77 ms │ no change │
│ QQuery 4  │ 17.09 / 17.43 ±0.23 / 17.70 ms │ 17.16 / 17.27 ±0.09 / 17.38 ms │ no change │
│ QQuery 5  │ 40.94 / 41.27 ±0.22 / 41.56 ms │ 38.76 / 40.92 ±1.11 / 41.79 ms │ no change │
│ QQuery 6  │ 15.90 / 16.11 ±0.14 / 16.33 ms │ 15.88 / 16.40 ±0.41 / 17.11 ms │ no change │
│ QQuery 7  │ 46.25 / 47.86 ±0.97 / 48.88 ms │ 45.78 / 49.22 ±1.88 / 50.71 ms │ no change │
│ QQuery 8  │ 43.99 / 44.20 ±0.13 / 44.34 ms │ 44.00 / 44.18 ±0.11 / 44.31 ms │ no change │
│ QQuery 9  │ 48.74 / 49.77 ±1.14 / 51.56 ms │ 49.23 / 49.77 ±0.52 / 50.59 ms │ no change │
│ QQuery 10 │ 62.97 / 63.23 ±0.18 / 63.44 ms │ 62.66 / 63.10 ±0.53 / 64.13 ms │ no change │
│ QQuery 11 │ 13.15 / 13.26 ±0.11 / 13.47 ms │ 12.95 / 13.19 ±0.13 / 13.32 ms │ no change │
│ QQuery 12 │ 24.10 / 25.00 ±1.35 / 27.66 ms │ 24.05 / 25.27 ±1.95 / 29.15 ms │ no change │
│ QQuery 13 │ 33.60 / 35.12 ±2.21 / 39.40 ms │ 33.73 / 35.01 ±1.19 / 37.26 ms │ no change │
│ QQuery 14 │ 25.34 / 25.44 ±0.12 / 25.67 ms │ 25.21 / 25.43 ±0.11 / 25.49 ms │ no change │
│ QQuery 15 │ 31.01 / 31.28 ±0.20 / 31.63 ms │ 31.00 / 31.57 ±0.97 / 33.52 ms │ no change │
│ QQuery 16 │ 14.47 / 14.63 ±0.17 / 14.95 ms │ 14.72 / 14.86 ±0.09 / 14.96 ms │ no change │
│ QQuery 17 │ 74.23 / 75.60 ±1.15 / 77.69 ms │ 73.92 / 77.81 ±3.74 / 84.66 ms │ no change │
│ QQuery 18 │ 61.36 / 64.94 ±4.20 / 72.99 ms │ 60.99 / 62.63 ±1.40 / 64.68 ms │ no change │
│ QQuery 19 │ 33.59 / 34.80 ±1.47 / 37.66 ms │ 33.16 / 34.06 ±1.20 / 36.43 ms │ no change │
│ QQuery 20 │ 36.96 / 37.27 ±0.17 / 37.44 ms │ 37.04 / 37.20 ±0.14 / 37.36 ms │ no change │
│ QQuery 21 │ 56.48 / 57.19 ±0.38 / 57.57 ms │ 54.49 / 55.94 ±0.94 / 57.15 ms │ no change │
│ QQuery 22 │ 23.22 / 24.79 ±2.43 / 29.60 ms │ 23.26 / 24.71 ±2.01 / 28.63 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                             ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 813.22ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 811.13ms │
│ Average Time (HEAD)                           │  36.96ms │
│ Average Time (feat_adaptive-partial-agg-cost) │  36.87ms │
│ Queries Faster                                │        0 │
│ Queries Slower                                │        0 │
│ Queries with No Change                        │       22 │
│ Queries with Failure                          │        0 │
└───────────────────────────────────────────────┴──────────┘

Resource Usage

| Metric | tpch base (merge-base) | tpch branch |
|---|---|---|
| Wall time | 5.0s | 5.0s |
| Peak memory | 5.6 GiB | 5.6 GiB |
| Avg memory | 5.1 GiB | 5.1 GiB |
| CPU user | 29.3s | 29.4s |
| CPU sys | 2.3s | 2.1s |
| Peak spill | 0 B | 0 B |

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           5.69 / 6.21 ±0.85 / 7.90 ms │           5.57 / 6.05 ±0.88 / 7.81 ms │     no change │
│ QQuery 2  │        80.83 / 81.20 ±0.26 / 81.63 ms │        79.78 / 80.00 ±0.15 / 80.20 ms │     no change │
│ QQuery 3  │        28.69 / 29.07 ±0.28 / 29.45 ms │        28.49 / 28.68 ±0.27 / 29.22 ms │     no change │
│ QQuery 4  │    500.59 / 516.71 ±11.53 / 531.32 ms │     494.30 / 497.16 ±2.51 / 500.82 ms │     no change │
│ QQuery 5  │        50.79 / 51.39 ±0.44 / 52.03 ms │        50.15 / 50.75 ±0.38 / 51.33 ms │     no change │
│ QQuery 6  │        35.04 / 36.05 ±0.75 / 37.02 ms │        34.55 / 34.98 ±0.29 / 35.38 ms │     no change │
│ QQuery 7  │     107.85 / 109.47 ±2.22 / 113.82 ms │     106.61 / 108.60 ±2.14 / 112.72 ms │     no change │
│ QQuery 8  │        36.27 / 36.79 ±0.30 / 37.05 ms │        35.69 / 36.49 ±0.66 / 37.31 ms │     no change │
│ QQuery 9  │        53.42 / 54.41 ±1.19 / 56.62 ms │        51.69 / 53.99 ±2.77 / 59.20 ms │     no change │
│ QQuery 10 │        80.36 / 82.58 ±3.00 / 88.53 ms │        80.57 / 81.02 ±0.44 / 81.81 ms │     no change │
│ QQuery 11 │     310.13 / 314.92 ±4.68 / 322.90 ms │     307.98 / 312.56 ±4.72 / 320.99 ms │     no change │
│ QQuery 12 │        28.12 / 29.33 ±0.92 / 30.86 ms │        28.20 / 28.54 ±0.25 / 28.83 ms │     no change │
│ QQuery 13 │     125.22 / 126.30 ±1.45 / 129.07 ms │     124.74 / 128.12 ±3.97 / 135.91 ms │     no change │
│ QQuery 14 │     501.82 / 503.02 ±1.25 / 505.41 ms │     501.24 / 505.68 ±5.26 / 515.52 ms │     no change │
│ QQuery 15 │        60.44 / 62.26 ±2.12 / 66.09 ms │        59.84 / 61.25 ±1.43 / 63.59 ms │     no change │
│ QQuery 16 │           6.57 / 6.73 ±0.19 / 7.09 ms │           6.63 / 6.80 ±0.18 / 7.14 ms │     no change │
│ QQuery 17 │        80.24 / 81.18 ±0.64 / 82.06 ms │        80.14 / 81.14 ±0.94 / 82.91 ms │     no change │
│ QQuery 18 │     151.25 / 151.82 ±0.33 / 152.13 ms │     151.20 / 152.60 ±1.52 / 155.19 ms │     no change │
│ QQuery 19 │        40.88 / 41.12 ±0.34 / 41.78 ms │        40.45 / 40.80 ±0.29 / 41.29 ms │     no change │
│ QQuery 20 │        34.84 / 35.44 ±0.35 / 35.94 ms │        35.19 / 35.54 ±0.30 / 36.05 ms │     no change │
│ QQuery 21 │        16.66 / 16.87 ±0.20 / 17.19 ms │        16.65 / 16.83 ±0.14 / 17.08 ms │     no change │
│ QQuery 22 │        61.83 / 62.45 ±0.48 / 62.99 ms │        61.87 / 62.21 ±0.30 / 62.67 ms │     no change │
│ QQuery 23 │     476.85 / 483.02 ±5.46 / 492.18 ms │    468.45 / 483.79 ±11.59 / 499.59 ms │     no change │
│ QQuery 24 │     233.29 / 239.28 ±6.10 / 250.85 ms │     230.38 / 237.13 ±8.71 / 254.14 ms │     no change │
│ QQuery 25 │     113.86 / 117.88 ±2.65 / 121.07 ms │     112.53 / 115.81 ±3.53 / 121.57 ms │     no change │
│ QQuery 26 │        69.37 / 70.21 ±0.47 / 70.67 ms │        69.84 / 70.71 ±0.68 / 71.84 ms │     no change │
│ QQuery 27 │           6.43 / 6.53 ±0.11 / 6.72 ms │           6.55 / 6.92 ±0.51 / 7.92 ms │  1.06x slower │
│ QQuery 28 │        59.87 / 62.29 ±2.36 / 66.83 ms │        60.45 / 61.09 ±0.66 / 62.10 ms │     no change │
│ QQuery 29 │      99.04 / 101.42 ±1.84 / 103.60 ms │      98.79 / 102.37 ±5.30 / 112.86 ms │     no change │
│ QQuery 30 │        29.97 / 30.43 ±0.41 / 31.10 ms │        30.42 / 30.66 ±0.17 / 30.95 ms │     no change │
│ QQuery 31 │     110.20 / 114.78 ±4.57 / 122.92 ms │     110.46 / 113.36 ±4.43 / 122.07 ms │     no change │
│ QQuery 32 │        19.78 / 20.33 ±0.33 / 20.79 ms │        19.90 / 20.30 ±0.40 / 20.98 ms │     no change │
│ QQuery 33 │        37.80 / 38.19 ±0.26 / 38.52 ms │        38.04 / 38.63 ±0.44 / 39.13 ms │     no change │
│ QQuery 34 │           9.33 / 9.57 ±0.19 / 9.76 ms │          9.48 / 9.91 ±0.27 / 10.31 ms │     no change │
│ QQuery 35 │        79.90 / 80.37 ±0.38 / 80.89 ms │        80.43 / 81.43 ±0.64 / 82.16 ms │     no change │
│ QQuery 36 │          5.87 / 6.98 ±2.07 / 11.12 ms │           5.89 / 5.96 ±0.10 / 6.15 ms │ +1.17x faster │
│ QQuery 37 │           6.73 / 6.94 ±0.16 / 7.12 ms │           6.65 / 6.78 ±0.10 / 6.92 ms │     no change │
│ QQuery 38 │        68.52 / 70.33 ±1.89 / 73.35 ms │        68.20 / 69.72 ±1.75 / 73.14 ms │     no change │
│ QQuery 39 │        97.38 / 98.07 ±0.76 / 99.48 ms │        97.17 / 97.63 ±0.44 / 98.40 ms │     no change │
│ QQuery 40 │        22.46 / 23.73 ±2.12 / 27.96 ms │        22.20 / 22.60 ±0.30 / 22.89 ms │     no change │
│ QQuery 41 │        11.07 / 11.38 ±0.36 / 12.09 ms │        10.99 / 11.17 ±0.25 / 11.67 ms │     no change │
│ QQuery 42 │        23.99 / 24.26 ±0.20 / 24.50 ms │        23.72 / 24.51 ±0.86 / 26.16 ms │     no change │
│ QQuery 43 │           4.76 / 4.84 ±0.10 / 5.03 ms │           4.77 / 4.85 ±0.07 / 4.98 ms │     no change │
│ QQuery 44 │        10.43 / 10.56 ±0.12 / 10.72 ms │        10.43 / 10.71 ±0.25 / 11.13 ms │     no change │
│ QQuery 45 │        39.94 / 40.36 ±0.49 / 41.17 ms │        39.61 / 39.94 ±0.33 / 40.51 ms │     no change │
│ QQuery 46 │        12.77 / 13.15 ±0.21 / 13.34 ms │        12.52 / 13.36 ±0.42 / 13.61 ms │     no change │
│ QQuery 47 │     226.74 / 233.13 ±8.89 / 250.49 ms │     227.31 / 230.95 ±3.46 / 236.55 ms │     no change │
│ QQuery 48 │     102.05 / 106.28 ±4.17 / 111.49 ms │     101.92 / 103.36 ±2.02 / 107.35 ms │     no change │
│ QQuery 49 │        78.64 / 80.89 ±1.87 / 83.37 ms │        78.37 / 80.00 ±1.38 / 82.44 ms │     no change │
│ QQuery 50 │        58.89 / 59.76 ±0.56 / 60.58 ms │        59.32 / 59.72 ±0.32 / 60.17 ms │     no change │
│ QQuery 51 │        92.04 / 95.48 ±2.48 / 99.06 ms │       91.91 / 96.36 ±5.65 / 107.11 ms │     no change │
│ QQuery 52 │        23.81 / 24.20 ±0.33 / 24.60 ms │        23.61 / 24.09 ±0.39 / 24.61 ms │     no change │
│ QQuery 53 │        29.13 / 29.31 ±0.15 / 29.51 ms │        29.01 / 29.51 ±0.29 / 29.89 ms │     no change │
│ QQuery 54 │        53.40 / 53.77 ±0.21 / 54.05 ms │        53.66 / 53.94 ±0.17 / 54.13 ms │     no change │
│ QQuery 55 │        23.30 / 25.62 ±4.08 / 33.77 ms │        23.22 / 23.39 ±0.19 / 23.75 ms │ +1.10x faster │
│ QQuery 56 │        38.49 / 39.01 ±0.32 / 39.37 ms │        38.07 / 40.09 ±3.32 / 46.69 ms │     no change │
│ QQuery 57 │     174.13 / 177.35 ±3.45 / 184.01 ms │     175.44 / 177.54 ±1.52 / 179.63 ms │     no change │
│ QQuery 58 │     115.57 / 118.43 ±2.56 / 123.24 ms │     116.05 / 117.99 ±1.88 / 121.55 ms │     no change │
│ QQuery 59 │     117.34 / 119.86 ±2.19 / 123.46 ms │     116.99 / 118.34 ±1.58 / 120.71 ms │     no change │
│ QQuery 60 │        38.82 / 39.93 ±0.91 / 41.09 ms │        38.69 / 39.77 ±0.90 / 41.30 ms │     no change │
│ QQuery 61 │        12.47 / 12.73 ±0.24 / 13.18 ms │        12.52 / 12.67 ±0.23 / 13.12 ms │     no change │
│ QQuery 62 │        45.45 / 45.82 ±0.38 / 46.36 ms │        45.77 / 46.04 ±0.31 / 46.62 ms │     no change │
│ QQuery 63 │        29.03 / 29.19 ±0.14 / 29.38 ms │        30.00 / 30.57 ±0.64 / 31.75 ms │     no change │
│ QQuery 64 │     461.26 / 468.96 ±7.18 / 480.17 ms │     460.87 / 469.03 ±8.22 / 481.33 ms │     no change │
│ QQuery 65 │     148.14 / 151.92 ±2.80 / 156.85 ms │     146.66 / 149.70 ±2.17 / 153.07 ms │     no change │
│ QQuery 66 │        78.72 / 81.80 ±3.93 / 89.56 ms │        78.22 / 81.63 ±4.04 / 88.21 ms │     no change │
│ QQuery 67 │     244.59 / 250.31 ±4.30 / 255.38 ms │     242.26 / 250.24 ±4.60 / 256.67 ms │     no change │
│ QQuery 68 │        12.86 / 13.06 ±0.28 / 13.62 ms │        12.93 / 13.08 ±0.16 / 13.39 ms │     no change │
│ QQuery 69 │        76.48 / 78.95 ±4.31 / 87.55 ms │        76.63 / 76.93 ±0.26 / 77.31 ms │     no change │
│ QQuery 70 │     107.69 / 112.59 ±7.99 / 128.43 ms │     103.84 / 111.14 ±9.06 / 128.66 ms │     no change │
│ QQuery 71 │        35.18 / 35.49 ±0.27 / 35.96 ms │        35.60 / 35.89 ±0.30 / 36.31 ms │     no change │
│ QQuery 72 │ 2084.93 / 2122.22 ±31.46 / 2166.29 ms │ 2099.10 / 2158.79 ±62.51 / 2267.70 ms │     no change │
│ QQuery 73 │           9.06 / 9.31 ±0.27 / 9.73 ms │           8.86 / 9.20 ±0.31 / 9.71 ms │     no change │
│ QQuery 74 │     176.37 / 178.86 ±3.62 / 185.74 ms │     177.09 / 183.19 ±5.07 / 191.91 ms │     no change │
│ QQuery 75 │     145.59 / 150.84 ±8.40 / 167.59 ms │     145.68 / 147.81 ±1.53 / 150.16 ms │     no change │
│ QQuery 76 │        34.94 / 35.84 ±0.73 / 36.89 ms │        35.60 / 36.31 ±0.57 / 37.35 ms │     no change │
│ QQuery 77 │        59.94 / 60.85 ±0.71 / 61.88 ms │        59.97 / 60.32 ±0.30 / 60.68 ms │     no change │
│ QQuery 78 │     188.52 / 192.03 ±2.07 / 194.08 ms │     189.65 / 192.23 ±2.24 / 194.87 ms │     no change │
│ QQuery 79 │        66.57 / 66.98 ±0.38 / 67.62 ms │        66.70 / 67.16 ±0.27 / 67.47 ms │     no change │
│ QQuery 80 │     100.15 / 105.26 ±4.47 / 112.23 ms │      99.03 / 102.37 ±3.96 / 110.06 ms │     no change │
│ QQuery 81 │        23.85 / 24.21 ±0.44 / 25.04 ms │        23.98 / 24.26 ±0.17 / 24.51 ms │     no change │
│ QQuery 82 │        16.18 / 16.65 ±0.59 / 17.68 ms │        16.00 / 16.17 ±0.15 / 16.41 ms │     no change │
│ QQuery 83 │        35.63 / 36.22 ±0.31 / 36.49 ms │        36.26 / 36.52 ±0.17 / 36.70 ms │     no change │
│ QQuery 84 │        42.65 / 44.71 ±3.11 / 50.86 ms │        42.69 / 44.74 ±3.38 / 51.48 ms │     no change │
│ QQuery 85 │     135.95 / 138.40 ±3.25 / 144.82 ms │     135.61 / 138.68 ±4.15 / 146.73 ms │     no change │
│ QQuery 86 │        24.42 / 24.79 ±0.40 / 25.39 ms │        24.74 / 25.06 ±0.19 / 25.30 ms │     no change │
│ QQuery 87 │        69.63 / 71.96 ±2.23 / 74.70 ms │        69.15 / 70.57 ±1.36 / 73.02 ms │     no change │
│ QQuery 88 │        60.78 / 61.37 ±0.42 / 62.02 ms │        60.70 / 61.54 ±0.64 / 62.63 ms │     no change │
│ QQuery 89 │        35.53 / 35.70 ±0.19 / 35.95 ms │        35.20 / 35.66 ±0.36 / 36.22 ms │     no change │
│ QQuery 90 │        16.66 / 16.91 ±0.24 / 17.24 ms │        16.77 / 16.90 ±0.13 / 17.14 ms │     no change │
│ QQuery 91 │        51.28 / 52.54 ±2.18 / 56.89 ms │        51.03 / 52.40 ±1.74 / 55.83 ms │     no change │
│ QQuery 92 │        28.90 / 31.42 ±2.28 / 35.53 ms │        29.22 / 31.20 ±1.95 / 34.94 ms │     no change │
│ QQuery 93 │        49.98 / 51.85 ±1.00 / 53.00 ms │        49.70 / 51.34 ±1.77 / 53.84 ms │     no change │
│ QQuery 94 │        37.04 / 37.56 ±0.51 / 38.42 ms │        37.76 / 38.59 ±0.84 / 40.04 ms │     no change │
│ QQuery 95 │        85.56 / 86.58 ±1.40 / 89.29 ms │        83.75 / 86.84 ±3.71 / 94.11 ms │     no change │
│ QQuery 96 │        23.69 / 24.66 ±1.36 / 27.31 ms │        23.95 / 24.31 ±0.38 / 25.01 ms │     no change │
│ QQuery 97 │        46.00 / 46.87 ±0.63 / 47.74 ms │        46.16 / 46.63 ±0.50 / 47.56 ms │     no change │
│ QQuery 98 │        42.14 / 42.77 ±0.39 / 43.13 ms │        42.27 / 43.00 ±0.60 / 44.06 ms │     no change │
│ QQuery 99 │        69.28 / 70.49 ±1.76 / 73.96 ms │        69.51 / 71.14 ±1.98 / 75.02 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 10441.92ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 10434.02ms │
│ Average Time (HEAD)                           │   105.47ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   105.39ms │
│ Queries Faster                                │          2 │
│ Queries Slower                                │          1 │
│ Queries with No Change                        │         96 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 55.0s |
| Peak memory | 7.0 GiB |
| Avg memory | 6.2 GiB |
| CPU user | 233.0s |
| CPU sys | 6.7s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 55.0s |
| Peak memory | 7.0 GiB |
| Avg memory | 6.2 GiB |
| CPU user | 231.2s |
| CPU sys | 6.7s |
| Peak spill | 0 B |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.80 ±7.08 / 18.95 ms │          1.16 / 4.80 ±7.15 / 19.10 ms │     no change │
│ QQuery 1  │        12.56 / 13.03 ±0.25 / 13.34 ms │        13.00 / 13.18 ±0.18 / 13.45 ms │     no change │
│ QQuery 2  │        35.73 / 36.00 ±0.26 / 36.31 ms │        35.67 / 35.99 ±0.34 / 36.66 ms │     no change │
│ QQuery 3  │        31.29 / 31.88 ±0.58 / 32.63 ms │        30.53 / 31.05 ±0.61 / 32.17 ms │     no change │
│ QQuery 4  │     230.72 / 233.14 ±3.95 / 240.96 ms │     227.56 / 229.84 ±2.71 / 234.91 ms │     no change │
│ QQuery 5  │     274.49 / 277.53 ±3.17 / 283.12 ms │     270.34 / 275.39 ±4.50 / 281.58 ms │     no change │
│ QQuery 6  │           1.19 / 1.34 ±0.21 / 1.75 ms │           1.20 / 1.35 ±0.23 / 1.80 ms │     no change │
│ QQuery 7  │        13.87 / 13.99 ±0.09 / 14.13 ms │        14.20 / 14.35 ±0.08 / 14.43 ms │     no change │
│ QQuery 8  │     320.92 / 332.10 ±7.63 / 343.31 ms │     327.08 / 331.10 ±2.65 / 334.46 ms │     no change │
│ QQuery 9  │     463.36 / 471.01 ±5.64 / 478.48 ms │     458.43 / 469.85 ±9.48 / 481.75 ms │     no change │
│ QQuery 10 │        70.52 / 71.91 ±0.96 / 73.39 ms │        70.25 / 70.60 ±0.39 / 71.30 ms │     no change │
│ QQuery 11 │        82.74 / 84.22 ±1.05 / 85.93 ms │       81.89 / 87.09 ±8.18 / 103.36 ms │     no change │
│ QQuery 12 │     271.63 / 280.45 ±7.02 / 290.99 ms │     275.32 / 281.85 ±8.95 / 298.86 ms │     no change │
│ QQuery 13 │     368.93 / 373.66 ±4.15 / 380.24 ms │    365.86 / 377.16 ±12.70 / 400.92 ms │     no change │
│ QQuery 14 │     285.55 / 289.29 ±3.88 / 294.91 ms │     286.46 / 295.28 ±9.43 / 309.74 ms │     no change │
│ QQuery 15 │     270.25 / 279.15 ±8.64 / 293.85 ms │     274.36 / 281.75 ±5.05 / 289.34 ms │     no change │
│ QQuery 16 │    624.97 / 635.44 ±10.36 / 654.00 ms │    621.12 / 636.06 ±12.79 / 654.49 ms │     no change │
│ QQuery 17 │     623.22 / 633.20 ±8.74 / 648.05 ms │     630.76 / 640.15 ±6.09 / 649.66 ms │     no change │
│ QQuery 18 │ 1279.79 / 1295.53 ±13.61 / 1319.20 ms │ 1276.73 / 1304.89 ±23.58 / 1344.53 ms │     no change │
│ QQuery 19 │        27.84 / 28.12 ±0.17 / 28.35 ms │        27.75 / 29.55 ±2.93 / 35.40 ms │  1.05x slower │
│ QQuery 20 │     519.78 / 530.47 ±8.35 / 540.19 ms │    521.10 / 539.50 ±22.27 / 582.40 ms │     no change │
│ QQuery 21 │     597.22 / 606.87 ±8.82 / 623.45 ms │     591.32 / 596.23 ±3.36 / 600.06 ms │     no change │
│ QQuery 22 │ 1069.17 / 1085.88 ±13.18 / 1107.23 ms │  1057.23 / 1070.97 ±8.94 / 1081.82 ms │     no change │
│ QQuery 23 │ 3208.18 / 3242.68 ±34.91 / 3306.50 ms │ 3223.54 / 3255.70 ±32.83 / 3308.97 ms │     no change │
│ QQuery 24 │        41.54 / 44.74 ±3.94 / 52.37 ms │        42.27 / 42.83 ±0.58 / 43.60 ms │     no change │
│ QQuery 25 │     111.91 / 114.57 ±3.43 / 121.07 ms │     112.35 / 120.26 ±9.19 / 136.45 ms │     no change │
│ QQuery 26 │        41.78 / 44.03 ±2.47 / 47.12 ms │        42.18 / 43.54 ±1.87 / 47.13 ms │     no change │
│ QQuery 27 │     665.69 / 677.49 ±6.71 / 685.92 ms │     676.64 / 680.14 ±1.79 / 681.51 ms │     no change │
│ QQuery 28 │ 3043.86 / 3055.40 ±12.53 / 3075.89 ms │ 3037.03 / 3097.03 ±31.69 / 3123.80 ms │     no change │
│ QQuery 29 │        40.18 / 43.57 ±6.42 / 56.40 ms │        40.62 / 51.11 ±7.24 / 61.69 ms │  1.17x slower │
│ QQuery 30 │     305.42 / 313.51 ±8.10 / 328.13 ms │    304.81 / 314.78 ±10.29 / 333.04 ms │     no change │
│ QQuery 31 │     283.03 / 296.49 ±8.33 / 303.47 ms │     295.36 / 303.01 ±9.26 / 320.79 ms │     no change │
│ QQuery 32 │    947.47 / 976.58 ±17.85 / 994.29 ms │  993.88 / 1012.02 ±20.18 / 1050.88 ms │     no change │
│ QQuery 33 │ 1444.17 / 1489.81 ±25.95 / 1516.39 ms │ 1476.61 / 1527.79 ±73.16 / 1672.16 ms │     no change │
│ QQuery 34 │ 1465.36 / 1524.20 ±37.69 / 1576.52 ms │ 1472.71 / 1515.88 ±37.73 / 1577.76 ms │     no change │
│ QQuery 35 │    286.76 / 326.19 ±50.34 / 411.58 ms │   275.63 / 342.34 ±119.71 / 581.36 ms │     no change │
│ QQuery 36 │        66.67 / 67.93 ±0.74 / 68.98 ms │      66.66 / 78.10 ±17.09 / 112.06 ms │  1.15x slower │
│ QQuery 37 │        35.86 / 38.56 ±3.06 / 43.92 ms │        35.41 / 38.65 ±5.04 / 48.68 ms │     no change │
│ QQuery 38 │        43.76 / 49.33 ±5.17 / 55.49 ms │        40.76 / 44.04 ±3.23 / 50.07 ms │ +1.12x faster │
│ QQuery 39 │     143.61 / 151.16 ±6.41 / 161.30 ms │    131.50 / 151.92 ±17.75 / 184.89 ms │     no change │
│ QQuery 40 │        13.89 / 16.16 ±3.80 / 23.69 ms │        13.94 / 16.60 ±3.45 / 23.31 ms │     no change │
│ QQuery 41 │        13.84 / 17.03 ±3.84 / 23.57 ms │        13.48 / 13.58 ±0.10 / 13.78 ms │ +1.25x faster │
│ QQuery 42 │        12.92 / 13.36 ±0.25 / 13.69 ms │        13.13 / 14.39 ±2.31 / 19.01 ms │  1.08x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20111.80ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 20281.69ms │
│ Average Time (HEAD)                           │   467.72ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   471.67ms │
│ Queries Faster                                │          2 │
│ Queries Slower                                │          4 │
│ Queries with No Change                        │         37 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 105.0s |
| Peak memory | 30.3 GiB |
| Avg memory | 23.0 GiB |
| CPU user | 1034.6s |
| CPU sys | 75.1s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 105.0s |
| Peak memory | 29.6 GiB |
| Avg memory | 22.8 GiB |
| CPU user | 1035.2s |
| CPU sys | 77.3s |
| Peak spill | 0 B |


Temporarily set the `skip_partial_aggregation_use_cost_model` default to true
so that the benchmark bot actually exercises the new code path.

**Revert this commit before merge**: the final default should remain false
(opt-in) until ClickBench-wide validation has tuned the constants.

Regenerated:
- docs/source/user-guide/configs.md
- datafusion/sqllogictest/test_files/information_schema.slt (SHOW ALL
  now lists the 3 new config rows; CI was failing on the stale expectation).
@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label May 26, 2026
@zhuqi-lucas
Contributor Author

run benchmark clickbench_partitioned

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541260123-323-xxlv8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing feat/adaptive-partial-agg-cost (9940d8a) to a87bdc9 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.66 ±6.87 / 18.39 ms │          1.17 / 4.71 ±6.98 / 18.67 ms │     no change │
│ QQuery 1  │        12.11 / 12.66 ±0.40 / 13.34 ms │        12.37 / 12.59 ±0.27 / 13.10 ms │     no change │
│ QQuery 2  │        35.62 / 36.24 ±0.65 / 37.45 ms │        35.49 / 35.78 ±0.16 / 35.98 ms │     no change │
│ QQuery 3  │        30.64 / 31.62 ±1.14 / 33.84 ms │        30.32 / 30.77 ±0.29 / 31.21 ms │     no change │
│ QQuery 4  │     227.20 / 230.38 ±2.41 / 234.37 ms │     222.94 / 226.83 ±3.57 / 233.43 ms │     no change │
│ QQuery 5  │     271.83 / 275.47 ±2.94 / 280.03 ms │     269.37 / 273.46 ±2.57 / 276.56 ms │     no change │
│ QQuery 6  │           1.19 / 1.34 ±0.22 / 1.77 ms │           1.19 / 1.32 ±0.22 / 1.76 ms │     no change │
│ QQuery 7  │        13.71 / 13.88 ±0.18 / 14.13 ms │        13.30 / 13.41 ±0.08 / 13.51 ms │     no change │
│ QQuery 8  │     329.76 / 331.68 ±2.28 / 335.63 ms │     320.58 / 323.23 ±1.84 / 325.82 ms │     no change │
│ QQuery 9  │     457.29 / 467.02 ±8.02 / 476.93 ms │     459.86 / 462.41 ±4.36 / 471.10 ms │     no change │
│ QQuery 10 │        68.41 / 69.25 ±0.75 / 70.20 ms │        68.88 / 73.17 ±7.10 / 87.32 ms │  1.06x slower │
│ QQuery 11 │        80.82 / 84.18 ±4.08 / 92.00 ms │        80.40 / 82.50 ±1.40 / 83.92 ms │     no change │
│ QQuery 12 │     263.92 / 269.27 ±3.83 / 274.31 ms │     266.76 / 271.92 ±5.82 / 280.78 ms │     no change │
│ QQuery 13 │    365.37 / 382.02 ±15.90 / 408.59 ms │     365.56 / 379.48 ±8.66 / 390.42 ms │     no change │
│ QQuery 14 │     283.30 / 286.17 ±2.28 / 288.71 ms │     279.47 / 285.81 ±7.17 / 299.60 ms │     no change │
│ QQuery 15 │     262.42 / 271.00 ±5.10 / 277.78 ms │     268.65 / 279.08 ±7.92 / 290.13 ms │     no change │
│ QQuery 16 │     615.17 / 625.07 ±9.48 / 641.80 ms │    611.76 / 631.22 ±16.98 / 661.78 ms │     no change │
│ QQuery 17 │     619.71 / 625.56 ±8.74 / 642.97 ms │     625.45 / 631.47 ±4.29 / 636.68 ms │     no change │
│ QQuery 18 │ 1254.71 / 1271.82 ±14.17 / 1289.43 ms │ 1260.04 / 1278.14 ±13.83 / 1299.57 ms │     no change │
│ QQuery 19 │       27.17 / 34.12 ±13.44 / 61.00 ms │       27.18 / 34.39 ±11.31 / 56.68 ms │     no change │
│ QQuery 20 │     515.73 / 523.76 ±7.69 / 537.00 ms │    517.16 / 534.85 ±20.78 / 573.46 ms │     no change │
│ QQuery 21 │     593.22 / 597.98 ±6.11 / 609.80 ms │     590.21 / 599.83 ±8.18 / 612.11 ms │     no change │
│ QQuery 22 │ 1049.47 / 1074.97 ±18.58 / 1105.08 ms │ 1060.70 / 1073.81 ±10.96 / 1093.47 ms │     no change │
│ QQuery 23 │ 3185.14 / 3212.31 ±29.43 / 3266.66 ms │ 3159.45 / 3220.88 ±34.99 / 3259.77 ms │     no change │
│ QQuery 24 │        41.50 / 47.78 ±7.57 / 58.18 ms │        41.66 / 44.28 ±3.70 / 51.55 ms │ +1.08x faster │
│ QQuery 25 │     111.04 / 116.84 ±5.41 / 126.36 ms │     111.56 / 113.12 ±2.04 / 117.15 ms │     no change │
│ QQuery 26 │        41.22 / 42.34 ±0.93 / 43.91 ms │        42.26 / 42.71 ±0.35 / 43.28 ms │     no change │
│ QQuery 27 │     667.50 / 675.99 ±7.09 / 685.02 ms │     662.14 / 669.14 ±6.00 / 677.25 ms │     no change │
│ QQuery 28 │ 3019.99 / 3044.77 ±20.05 / 3070.30 ms │ 2997.70 / 3043.17 ±32.76 / 3081.45 ms │     no change │
│ QQuery 29 │        40.18 / 48.94 ±7.40 / 59.79 ms │        39.80 / 41.62 ±2.80 / 47.11 ms │ +1.18x faster │
│ QQuery 30 │     295.45 / 299.74 ±3.70 / 305.38 ms │    295.92 / 310.48 ±17.60 / 345.04 ms │     no change │
│ QQuery 31 │     284.34 / 288.35 ±2.13 / 290.15 ms │     284.38 / 294.03 ±6.71 / 305.24 ms │     no change │
│ QQuery 32 │    926.32 / 944.81 ±19.90 / 983.52 ms │    943.96 / 964.63 ±15.25 / 983.83 ms │     no change │
│ QQuery 33 │  1464.77 / 1473.13 ±8.13 / 1484.35 ms │ 1411.87 / 1492.71 ±65.30 / 1604.09 ms │     no change │
│ QQuery 34 │ 1464.68 / 1502.56 ±29.24 / 1550.43 ms │ 1465.93 / 1504.05 ±25.18 / 1531.27 ms │     no change │
│ QQuery 35 │    278.82 / 298.27 ±22.23 / 331.14 ms │    277.81 / 290.38 ±14.92 / 314.50 ms │     no change │
│ QQuery 36 │        65.58 / 69.21 ±2.74 / 73.80 ms │        64.14 / 67.96 ±2.90 / 71.62 ms │     no change │
│ QQuery 37 │        35.42 / 40.44 ±5.75 / 49.47 ms │        35.35 / 35.88 ±0.50 / 36.82 ms │ +1.13x faster │
│ QQuery 38 │        43.94 / 45.82 ±2.49 / 50.37 ms │        40.51 / 47.10 ±4.69 / 52.57 ms │     no change │
│ QQuery 39 │    132.11 / 149.66 ±13.98 / 172.02 ms │    135.15 / 146.62 ±11.14 / 166.19 ms │     no change │
│ QQuery 40 │        13.47 / 15.21 ±2.28 / 19.63 ms │        13.44 / 13.91 ±0.31 / 14.41 ms │ +1.09x faster │
│ QQuery 41 │        13.16 / 14.63 ±2.46 / 19.50 ms │        13.18 / 14.60 ±2.54 / 19.67 ms │     no change │
│ QQuery 42 │        12.54 / 12.86 ±0.47 / 13.79 ms │        12.62 / 13.90 ±2.37 / 18.64 ms │  1.08x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 19863.78ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19911.35ms │
│ Average Time (HEAD)                           │   461.95ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   463.05ms │
│ Queries Faster                                │          4 │
│ Queries Slower                                │          2 │
│ Queries with No Change                        │         37 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘
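The derived rows of the two `clickbench_partitioned` summaries can be recomputed from their totals. The small Rust check below is not part of the benchmark runner. The `Summary` struct and its methods are hypothetical helpers. The query count of 43 comes from the tables (2 faster + 4 slower + 37 unchanged in the first run, 4 + 2 + 37 in the second). The check shows that the "Average" rows are simply total / 43, and that the branch changes the total by under 1% in both runs.

```rust
/// Hypothetical helper: recomputes the derived rows of a benchmark summary.
struct Summary {
    total_head_ms: f64,
    total_branch_ms: f64,
    queries: u32,
}

impl Summary {
    /// Average per-query time on HEAD (Total Time / number of queries).
    fn avg_head_ms(&self) -> f64 {
        self.total_head_ms / self.queries as f64
    }

    /// Average per-query time on the feature branch.
    fn avg_branch_ms(&self) -> f64 {
        self.total_branch_ms / self.queries as f64
    }

    /// Relative change of the branch total vs HEAD, in percent (positive = slower).
    fn total_change_pct(&self) -> f64 {
        (self.total_branch_ms / self.total_head_ms - 1.0) * 100.0
    }
}

fn main() {
    // Totals copied from the two clickbench_partitioned summaries (QQuery 0..=42).
    let runs = [
        ("run 1", Summary { total_head_ms: 20111.80, total_branch_ms: 20281.69, queries: 43 }),
        ("run 2", Summary { total_head_ms: 19863.78, total_branch_ms: 19911.35, queries: 43 }),
    ];
    for (name, s) in &runs {
        println!(
            "{name}: avg {:.2} -> {:.2} ms, total {:+.2}%",
            s.avg_head_ms(),
            s.avg_branch_ms(),
            s.total_change_pct()
        );
    }
}
```

Run 1 shows a total change of roughly +0.84% and run 2 roughly +0.24%. Both runs also flag different individual queries as faster or slower (Q29 is 1.17x slower in one run and 1.18x faster in the other). Taken together, this suggests that the per-query flags in these runs are dominated by noise.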

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 100.0s |
| Peak memory | 30.5 GiB |
| Avg memory | 23.2 GiB |
| CPU user | 1022.1s |
| CPU sys | 74.0s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 100.0s |
| Peak memory | 29.4 GiB |
| Avg memory | 22.9 GiB |
| CPU user | 1021.1s |
| CPU sys | 75.3s |
| Peak spill | 0 B |


The 1000 ns/row threshold was wishful thinking. Re-deriving the per-row cost
on the benchmark hardware (Neoverse-V2 ARM) shows that partial agg for
ClickBench Q18 costs ~100-200 ns/row, well below 1000. The M-series MBP from
the issue report is similar (~170 ns/row, derived from the 17 s / 100 M rows figure).

100 ns/row is roughly the floor of a hash-table probe + insert on
modern CPUs. Anything above that falls in the "meaningful per-row
overhead" band, where partial agg can plausibly be net-negative.

The 0.3 cost_min_ratio guard keeps low-cardinality / high-reduction
queries (like ClickBench Q35) safe: their ratio sits below 0.3, so they
never enter this branch regardless of per-row cost.
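The commit message above describes three bands of the group/row ratio. The Rust sketch below is one hypothetical reading of that rule, not the PR's code. The constant names, the `decide` function, and the assumption that the cost branch compares a measured per-row cost against a fixed 100 ns/row threshold are all illustrative. The sketch models only this threshold rule, not the A/B sampling described in the PR description. It also reproduces the ~170 ns/row figure (17 s over 100 M rows).

```rust
// Hypothetical constants mirroring the numbers in the commit message.
const PROBE_RATIO_THRESHOLD: f64 = 0.8; // Rule 1: existing skip threshold
const COST_MIN_RATIO: f64 = 0.3; // below this ratio, never consider skipping
const COST_NS_PER_ROW_THRESHOLD: f64 = 100.0; // assumed cost cutoff (was 1000)

#[derive(Debug, PartialEq)]
enum Decision {
    KeepPartial,
    SkipPartial,
}

/// Per-row cost in nanoseconds from total elapsed time and row count.
fn ns_per_row(total_ns: f64, rows: u64) -> f64 {
    total_ns / rows as f64
}

/// Hypothetical decision: ratio bands first, per-row cost only in the middle band.
fn decide(num_groups: u64, input_rows: u64, partial_ns_per_row: f64) -> Decision {
    if input_rows == 0 {
        return Decision::KeepPartial; // nothing measured yet
    }
    let ratio = num_groups as f64 / input_rows as f64;
    if ratio >= PROBE_RATIO_THRESHOLD {
        // Almost no reduction: skip regardless of cost.
        Decision::SkipPartial
    } else if ratio < COST_MIN_RATIO {
        // High reduction (e.g. ClickBench Q35): keep regardless of cost.
        Decision::KeepPartial
    } else if partial_ns_per_row > COST_NS_PER_ROW_THRESHOLD {
        // Middle band with meaningful per-row overhead.
        Decision::SkipPartial
    } else {
        Decision::KeepPartial
    }
}

fn main() {
    // ~17 s of partial-agg compute over ~100 M rows (ClickBench Q18).
    let q18_cost = ns_per_row(17.0e9, 100_000_000);
    println!("Q18 partial agg: {q18_cost:.0} ns/row");
    // Q18's measured ratio is 0.565, which falls in the cost band.
    println!("Q18 decision: {:?}", decide(565, 1_000, q18_cost));
}
```

With Q18's ratio of 0.565 and ~170 ns/row, the sketch lands in the middle band and skips partial aggregation. A Q35-style query with a ratio under 0.3 keeps partial aggregation no matter how expensive each row is.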
@zhuqi-lucas
Contributor Author

run benchmark clickbench

@zhuqi-lucas
Contributor Author

run benchmark clickbench_partitioned

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541539131-325-p7crj 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing feat/adaptive-partial-agg-cost (5c10375) to bdf8a6d (merge-base) diff
BENCH_NAME=clickbench
BENCH_COMMAND=cargo bench --features=parquet --bench clickbench
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

    substr
    substr_index
    substring
    sum
    to_char
    to_hex
    to_local_time
    to_time
    to_timestamp
    topk_aggregate
    topk_repartition
    translate
    trim
    trunc
    unhex
    unions_to_filter
    upper
    uuid
    window_query_sql
    with_hashes


@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541540268-326-4gjvk 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing feat/adaptive-partial-agg-cost (5c10375) to bdf8a6d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.17 / 4.64 ±6.87 / 18.37 ms │          1.14 / 4.69 ±7.01 / 18.72 ms │     no change │
│ QQuery 1  │        12.57 / 12.85 ±0.16 / 13.00 ms │        12.09 / 12.54 ±0.26 / 12.78 ms │     no change │
│ QQuery 2  │        37.03 / 37.20 ±0.18 / 37.53 ms │        35.75 / 36.03 ±0.23 / 36.41 ms │     no change │
│ QQuery 3  │        30.85 / 31.60 ±0.93 / 33.35 ms │        30.74 / 31.05 ±0.30 / 31.60 ms │     no change │
│ QQuery 4  │     227.65 / 230.17 ±2.08 / 233.95 ms │     225.91 / 227.86 ±2.17 / 231.70 ms │     no change │
│ QQuery 5  │     273.28 / 279.27 ±4.45 / 285.20 ms │     269.83 / 273.50 ±3.72 / 280.59 ms │     no change │
│ QQuery 6  │           1.22 / 1.47 ±0.26 / 1.80 ms │           1.22 / 1.37 ±0.23 / 1.83 ms │ +1.07x faster │
│ QQuery 7  │        13.93 / 14.08 ±0.17 / 14.41 ms │        13.70 / 13.89 ±0.14 / 14.05 ms │     no change │
│ QQuery 8  │     323.04 / 325.35 ±2.45 / 330.10 ms │     320.40 / 325.93 ±4.77 / 334.71 ms │     no change │
│ QQuery 9  │    450.31 / 466.46 ±10.47 / 478.59 ms │    450.46 / 461.66 ±17.07 / 495.63 ms │     no change │
│ QQuery 10 │        69.27 / 72.20 ±4.02 / 80.10 ms │        70.69 / 71.70 ±1.03 / 73.57 ms │     no change │
│ QQuery 11 │        80.56 / 81.57 ±0.67 / 82.46 ms │        80.07 / 84.42 ±5.43 / 94.91 ms │     no change │
│ QQuery 12 │     267.57 / 270.51 ±2.38 / 274.48 ms │     264.54 / 268.83 ±4.54 / 276.93 ms │     no change │
│ QQuery 13 │     371.41 / 380.29 ±9.89 / 397.58 ms │    370.16 / 398.00 ±18.35 / 426.17 ms │     no change │
│ QQuery 14 │     282.14 / 287.37 ±3.30 / 291.93 ms │    288.25 / 310.29 ±14.60 / 327.27 ms │  1.08x slower │
│ QQuery 15 │     270.70 / 278.36 ±6.38 / 288.45 ms │    266.58 / 281.82 ±12.26 / 299.07 ms │     no change │
│ QQuery 16 │     613.02 / 623.15 ±5.36 / 629.05 ms │     615.78 / 630.35 ±7.83 / 638.39 ms │     no change │
│ QQuery 17 │     646.19 / 661.53 ±9.41 / 675.22 ms │    620.25 / 631.96 ±13.46 / 657.52 ms │     no change │
│ QQuery 18 │ 1310.08 / 1349.11 ±24.28 / 1384.90 ms │ 1271.26 / 1293.91 ±17.43 / 1322.39 ms │     no change │
│ QQuery 19 │        28.52 / 32.05 ±6.24 / 44.50 ms │        27.39 / 27.82 ±0.43 / 28.60 ms │ +1.15x faster │
│ QQuery 20 │    530.38 / 540.05 ±11.43 / 554.22 ms │     519.27 / 530.21 ±8.23 / 541.81 ms │     no change │
│ QQuery 21 │     604.04 / 609.78 ±5.84 / 620.41 ms │     597.20 / 599.85 ±2.71 / 605.06 ms │     no change │
│ QQuery 22 │ 1103.33 / 1116.63 ±11.59 / 1137.61 ms │ 1091.96 / 1108.66 ±13.79 / 1127.50 ms │     no change │
│ QQuery 23 │ 3200.29 / 3271.15 ±71.17 / 3407.07 ms │ 3310.05 / 3367.10 ±36.94 / 3412.51 ms │     no change │
│ QQuery 24 │        41.05 / 41.55 ±0.63 / 42.76 ms │       41.18 / 51.62 ±14.40 / 78.70 ms │  1.24x slower │
│ QQuery 25 │     111.61 / 116.72 ±9.73 / 136.18 ms │     110.50 / 113.22 ±2.96 / 118.92 ms │     no change │
│ QQuery 26 │        41.53 / 42.56 ±0.72 / 43.54 ms │        41.65 / 41.95 ±0.46 / 42.86 ms │     no change │
│ QQuery 27 │    672.74 / 697.87 ±14.09 / 715.08 ms │    679.94 / 697.75 ±10.25 / 706.93 ms │     no change │
│ QQuery 28 │ 3023.10 / 3038.62 ±16.21 / 3062.56 ms │ 3023.12 / 3071.95 ±35.89 / 3127.00 ms │     no change │
│ QQuery 29 │        40.30 / 43.55 ±4.63 / 52.76 ms │       40.11 / 48.64 ±11.33 / 69.02 ms │  1.12x slower │
│ QQuery 30 │    304.68 / 312.92 ±10.30 / 330.27 ms │    297.11 / 324.34 ±35.45 / 393.61 ms │     no change │
│ QQuery 31 │     286.68 / 293.91 ±5.04 / 302.16 ms │     280.26 / 288.72 ±5.95 / 294.24 ms │     no change │
│ QQuery 32 │  983.32 / 1024.54 ±22.57 / 1048.63 ms │   937.33 / 969.48 ±24.40 / 1010.79 ms │ +1.06x faster │
│ QQuery 33 │ 1565.29 / 1596.64 ±27.90 / 1639.47 ms │ 1431.27 / 1476.56 ±25.80 / 1504.78 ms │ +1.08x faster │
│ QQuery 34 │ 1525.52 / 1630.70 ±62.17 / 1692.56 ms │ 1544.37 / 1603.14 ±45.23 / 1664.96 ms │     no change │
│ QQuery 35 │    278.85 / 322.55 ±31.11 / 368.30 ms │    308.06 / 339.70 ±40.72 / 417.13 ms │  1.05x slower │
│ QQuery 36 │        65.33 / 73.74 ±9.53 / 91.89 ms │      76.87 / 87.31 ±14.52 / 115.14 ms │  1.18x slower │
│ QQuery 37 │        35.73 / 39.83 ±3.83 / 46.46 ms │        36.80 / 42.37 ±4.10 / 47.24 ms │  1.06x slower │
│ QQuery 38 │        39.54 / 44.81 ±5.26 / 54.67 ms │        41.68 / 43.22 ±1.33 / 44.89 ms │     no change │
│ QQuery 39 │     154.67 / 164.56 ±7.95 / 173.78 ms │     126.41 / 129.86 ±4.88 / 139.52 ms │ +1.27x faster │
│ QQuery 40 │        15.05 / 18.34 ±4.65 / 27.47 ms │        14.72 / 17.67 ±3.21 / 22.81 ms │     no change │
│ QQuery 41 │        14.85 / 14.96 ±0.13 / 15.20 ms │        13.90 / 14.54 ±0.49 / 15.28 ms │     no change │
│ QQuery 42 │        13.99 / 14.15 ±0.14 / 14.40 ms │        13.58 / 13.77 ±0.13 / 13.98 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20509.34ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 20369.26ms │
│ Average Time (HEAD)                           │   476.96ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   473.70ms │
│ Queries Faster                                │          5 │
│ Queries Slower                                │          6 │
│ Queries with No Change                        │         32 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric          Value
Wall time       105.0s
Peak memory     30.8 GiB
Avg memory      23.0 GiB
CPU user        1049.2s
CPU sys         81.0s
Peak spill      0 B

clickbench_partitioned — branch

Metric          Value
Wall time       105.0s
Peak memory     30.0 GiB
Avg memory      23.1 GiB
CPU user        1040.6s
CPU sys         81.4s
Peak spill      0 B

Round 1 (`cost_ns_per_row > 1000`) didn't fire on Q18 because the partial-agg
per-row cost at probe close (first 100k rows, hash table still small)
is only ~100-200 ns on the ARM bot, not 1000+. Lowering the threshold to 100
didn't help either: the cost measured at probe time underestimates the
eventual asymptotic cost, so a single-shot, probe-time threshold is
fundamentally fragile.

Pivot:

- Drop `skip_partial_aggregation_cost_ns_per_row` from the decision
  entirely. Rule 2 is now a pure ratio check: skip when
  `ratio >= cost_min_ratio` (default 0.5).
- This matches the empirical finding in the issue body: setting
  `ratio_threshold = 0.6` makes Q18 1.73× faster on M-series. 0.5 is set
  somewhat below that value to leave margin. The earlier 0.3
  `cost_min_ratio` guard is gone.
- Add two diagnostic gauges (always recorded, regardless of which rule
  fires):
    * `partial_agg_probe_ns_per_row` — measured per-row wall time
    * `partial_agg_probe_ratio_per_mille` — ratio × 1000
  EXPLAIN ANALYZE shows these so we can revisit a real cost-aware rule
  later with actual numbers instead of guessing thresholds.
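A minimal Rust sketch of the decision in this iteration, to make the rule precedence and the gauge arithmetic concrete. `ProbeStats`, `ProbeGauges`, `record_gauges`, and `should_skip` are hypothetical names, not the PR's code; the sketch assumes the probe has already accumulated input rows, group count, and elapsed nanoseconds when its window closes.

```rust
/// Hypothetical snapshot of what the partial probe has seen when its window closes.
pub struct ProbeStats {
    pub input_rows: usize,
    pub num_groups: usize,
    /// Wall time spent in partial aggregation over the probe window.
    pub elapsed_ns: u64,
}

/// Values for the two diagnostic gauges described above.
#[derive(Debug, PartialEq)]
pub struct ProbeGauges {
    pub ns_per_row: u64,
    pub ratio_per_mille: u64,
}

/// Gauges are computed regardless of which rule fires.
pub fn record_gauges(s: &ProbeStats) -> ProbeGauges {
    let rows = s.input_rows.max(1) as u64; // avoid division by zero on empty input
    ProbeGauges {
        ns_per_row: s.elapsed_ns / rows,
        ratio_per_mille: s.num_groups as u64 * 1000 / rows,
    }
}

/// Rule 1 (legacy fixed threshold) first, then the opt-in Rule 2 (pure ratio check).
pub fn should_skip(
    s: &ProbeStats,
    probe_ratio_threshold: f64, // legacy default 0.8
    use_cost_model: bool,       // opt-in flag
    cost_min_ratio: f64,        // Rule 2 threshold, default 0.5 in this iteration
) -> bool {
    if s.input_rows == 0 {
        return false;
    }
    let ratio = s.num_groups as f64 / s.input_rows as f64;
    if ratio >= probe_ratio_threshold {
        return true; // Rule 1 takes precedence
    }
    use_cost_model && ratio >= cost_min_ratio
}
```

With the flag off, a ratio such as Q18's 0.565 keeps partial aggregation; with it on, Rule 2 fires because 0.565 is above 0.5.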

Why keep `use_cost_model` as the flag name even though the rule is no
longer cost-aware? The gauges (the basis for a future cost-aware rule)
ride alongside it, and we want a single opt-in surface that can graduate
from "lower ratio threshold" to "cost-aware" without churning configs.

Unit tests rewritten to match: 5 tests covering off/on, fires/doesn't
fire, fixed-rule precedence, and gauge recording.
@zhuqi-lucas
Contributor Author

run benchmark clickbench_partitioned

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4542914270-327-27lk2 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (e6a98fe) to bdf8a6d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.16 / 4.57 ±6.78 / 18.14 ms │          1.18 / 4.83 ±7.14 / 19.10 ms │  1.06x slower │
│ QQuery 1  │        12.21 / 12.49 ±0.16 / 12.66 ms │        12.75 / 13.20 ±0.47 / 14.09 ms │  1.06x slower │
│ QQuery 2  │        35.25 / 35.59 ±0.27 / 35.97 ms │        35.83 / 36.22 ±0.36 / 36.65 ms │     no change │
│ QQuery 3  │        30.31 / 30.87 ±0.59 / 32.01 ms │        31.42 / 31.95 ±0.53 / 32.95 ms │     no change │
│ QQuery 4  │     219.39 / 225.19 ±3.24 / 228.79 ms │     240.95 / 243.12 ±1.69 / 245.59 ms │  1.08x slower │
│ QQuery 5  │     268.06 / 272.04 ±2.41 / 275.41 ms │     279.27 / 283.91 ±2.34 / 285.52 ms │     no change │
│ QQuery 6  │           1.20 / 1.35 ±0.23 / 1.80 ms │           1.24 / 1.40 ±0.23 / 1.84 ms │     no change │
│ QQuery 7  │        13.54 / 13.95 ±0.28 / 14.36 ms │        13.78 / 15.16 ±2.09 / 19.32 ms │  1.09x slower │
│ QQuery 8  │     320.27 / 323.82 ±2.18 / 327.01 ms │     327.00 / 333.26 ±7.23 / 346.93 ms │     no change │
│ QQuery 9  │    455.85 / 490.87 ±28.34 / 528.55 ms │    448.66 / 505.40 ±35.71 / 557.69 ms │     no change │
│ QQuery 10 │        74.19 / 75.97 ±1.30 / 77.27 ms │      72.78 / 80.30 ±12.72 / 105.64 ms │  1.06x slower │
│ QQuery 11 │        87.51 / 87.84 ±0.29 / 88.30 ms │       85.61 / 89.89 ±6.64 / 103.04 ms │     no change │
│ QQuery 12 │    266.72 / 276.26 ±11.91 / 299.36 ms │    238.28 / 255.37 ±14.18 / 274.95 ms │ +1.08x faster │
│ QQuery 13 │    365.79 / 378.86 ±11.86 / 394.87 ms │     357.01 / 367.08 ±9.25 / 384.62 ms │     no change │
│ QQuery 14 │     285.98 / 289.99 ±3.39 / 294.76 ms │     251.36 / 258.40 ±5.34 / 265.23 ms │ +1.12x faster │
│ QQuery 15 │     272.40 / 277.74 ±4.46 / 283.63 ms │     267.56 / 268.95 ±1.36 / 271.36 ms │     no change │
│ QQuery 16 │     609.47 / 625.38 ±9.58 / 639.53 ms │    617.01 / 635.93 ±14.36 / 659.85 ms │     no change │
│ QQuery 17 │     614.82 / 627.93 ±8.23 / 640.38 ms │     628.89 / 641.52 ±7.56 / 652.57 ms │     no change │
│ QQuery 18 │ 1230.87 / 1274.09 ±30.24 / 1307.89 ms │  977.51 / 1007.37 ±22.96 / 1034.98 ms │ +1.26x faster │
│ QQuery 19 │       27.46 / 41.97 ±17.09 / 63.90 ms │       27.24 / 36.81 ±17.95 / 72.70 ms │ +1.14x faster │
│ QQuery 20 │     521.25 / 529.50 ±7.85 / 542.79 ms │     515.49 / 520.95 ±4.52 / 529.07 ms │     no change │
│ QQuery 21 │     604.44 / 610.33 ±3.75 / 615.01 ms │     595.09 / 599.02 ±3.18 / 604.29 ms │     no change │
│ QQuery 22 │  1088.08 / 1094.56 ±8.79 / 1111.69 ms │ 1054.27 / 1081.70 ±18.59 / 1108.66 ms │     no change │
│ QQuery 23 │ 3232.06 / 3315.37 ±70.31 / 3424.33 ms │ 3214.09 / 3306.99 ±65.15 / 3393.77 ms │     no change │
│ QQuery 24 │       42.01 / 49.96 ±14.57 / 79.06 ms │        42.35 / 46.30 ±5.99 / 58.17 ms │ +1.08x faster │
│ QQuery 25 │     111.44 / 113.72 ±2.92 / 119.38 ms │     114.20 / 116.05 ±1.48 / 118.40 ms │     no change │
│ QQuery 26 │        41.17 / 41.69 ±0.43 / 42.20 ms │        42.29 / 44.31 ±2.54 / 49.20 ms │  1.06x slower │
│ QQuery 27 │     671.11 / 677.84 ±6.58 / 687.67 ms │    664.65 / 683.54 ±10.80 / 697.73 ms │     no change │
│ QQuery 28 │ 3020.49 / 3054.32 ±38.05 / 3124.71 ms │ 3040.59 / 3090.00 ±30.44 / 3131.64 ms │     no change │
│ QQuery 29 │       39.92 / 51.22 ±10.16 / 66.62 ms │        39.69 / 40.92 ±0.72 / 41.82 ms │ +1.25x faster │
│ QQuery 30 │     295.09 / 303.53 ±9.72 / 322.49 ms │     265.14 / 276.42 ±8.52 / 290.79 ms │ +1.10x faster │
│ QQuery 31 │     281.68 / 289.25 ±5.94 / 297.88 ms │    275.66 / 292.52 ±10.91 / 308.29 ms │     no change │
│ QQuery 32 │    907.69 / 923.56 ±17.36 / 956.28 ms │    911.69 / 959.23 ±31.78 / 999.13 ms │     no change │
│ QQuery 33 │ 1437.38 / 1479.83 ±37.75 / 1546.04 ms │ 1475.27 / 1525.30 ±52.51 / 1610.48 ms │     no change │
│ QQuery 34 │ 1541.17 / 1574.38 ±20.14 / 1596.35 ms │ 1436.55 / 1482.51 ±36.79 / 1526.15 ms │ +1.06x faster │
│ QQuery 35 │    310.04 / 329.26 ±21.84 / 357.70 ms │    270.82 / 313.47 ±71.76 / 456.80 ms │     no change │
│ QQuery 36 │        67.63 / 73.90 ±5.47 / 83.88 ms │        66.38 / 67.07 ±0.89 / 68.74 ms │ +1.10x faster │
│ QQuery 37 │        36.17 / 43.30 ±5.63 / 51.12 ms │        34.48 / 36.11 ±2.10 / 40.19 ms │ +1.20x faster │
│ QQuery 38 │        42.04 / 46.10 ±2.47 / 48.63 ms │        46.46 / 50.11 ±3.81 / 57.49 ms │  1.09x slower │
│ QQuery 39 │     143.84 / 158.70 ±7.92 / 167.55 ms │     103.96 / 108.60 ±3.70 / 112.37 ms │ +1.46x faster │
│ QQuery 40 │        13.61 / 13.99 ±0.29 / 14.46 ms │        13.14 / 13.59 ±0.25 / 13.83 ms │     no change │
│ QQuery 41 │        13.66 / 15.21 ±2.31 / 19.68 ms │        12.98 / 13.18 ±0.16 / 13.38 ms │ +1.15x faster │
│ QQuery 42 │        13.09 / 14.84 ±3.12 / 21.05 ms │        12.49 / 12.83 ±0.22 / 13.19 ms │ +1.16x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20171.13ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19790.79ms │
│ Average Time (HEAD)                           │   469.10ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   460.25ms │
│ Queries Faster                                │         13 │
│ Queries Slower                                │          7 │
│ Queries with No Change                        │         23 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric          Value
Wall time       105.0s
Peak memory     29.5 GiB
Avg memory      22.5 GiB
CPU user        1037.9s
CPU sys         75.3s
Peak spill      0 B

clickbench_partitioned — branch

Metric          Value
Wall time       100.0s
Peak memory     31.9 GiB
Avg memory      23.2 GiB
CPU user        1014.6s
CPU sys         73.0s
Peak spill      0 B

Defaulting the flag to true was a temporary flip so that the benchmarking
bot would exercise the new code path. Revert before merge: opt-in remains
the contract until the cost-aware variant lands.

Replace the fixed lower-ratio rule with measurement-driven A/B sampling.
After the initial partial-probe window closes (100k rows by default),
the operator routes the next 10k rows through the passthrough path to
measure `passthrough_ns/row`, then compares it against the previously
measured `partial_ns/row` via a closed-form cost crossover:

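  cost_keep_partial = partial_ns × N + final_ns × N × ratio
  cost_skip         = passthrough_ns × N + final_ns × N

  where N is the number of input rows and ratio = num_groups / input_rows,
  so Final agg sees N × ratio rows when partial is kept and N rows when it
  is skipped. Then:

      skip wins  ⇔  cost_skip < cost_keep_partial
                 ⇔  passthrough_ns + final_ns < partial_ns + final_ns × ratio

  and, assuming final_ns ≈ partial_ns (similar hash-table mechanics):

      skip wins  ⇔  ratio > passthrough_ns / partial_ns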

The crossover is set entirely by the two measured numbers — no magic
constant, no hardcoded ratio. Rule 1 (ratio >= 0.8) still short-circuits
before A/B, preserving the legacy cheap path.
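The crossover can be checked numerically with a small Rust sketch. The function names are hypothetical; `cost_keep_partial` and `cost_skip` transcribe the two cost formulas, and `cost_says_skip` is the simplified rule that holds only under the stated assumption `final_ns ≈ partial_ns`. Treating a missing or zero partial measurement as "keep partial" is an assumption of this sketch.

```rust
/// Cost of keeping partial aggregation over `n` rows: every row pays the
/// partial cost, and Final agg only sees `n * ratio` rows.
pub fn cost_keep_partial(partial_ns: f64, final_ns: f64, ratio: f64, n: f64) -> f64 {
    partial_ns * n + final_ns * n * ratio
}

/// Cost of skipping: every row pays the passthrough cost, and Final agg
/// sees all `n` rows.
pub fn cost_skip(passthrough_ns: f64, final_ns: f64, n: f64) -> f64 {
    passthrough_ns * n + final_ns * n
}

/// Simplified crossover, valid when final_ns ≈ partial_ns:
/// skip wins iff ratio > passthrough_ns / partial_ns.
pub fn cost_says_skip(partial_ns: f64, passthrough_ns: f64, ratio: f64) -> bool {
    if partial_ns.is_nan() || partial_ns <= 0.0 {
        return false; // no usable partial measurement: keep partial aggregation
    }
    ratio > passthrough_ns / partial_ns
}
```

For a ratio of 0.565, skipping wins whenever passthrough costs less than 56.5 % of the partial per-row cost; e.g. 60 ns vs 200 ns skips, 80 ns vs 100 ns keeps.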

State machine extensions:

- New `ExecutionState::AbSampling` mirrors `SkippingAggregation`
  (input → `transform_to_states` → output) but *keeps the partial
  hash table* — if A/B decides to keep partial, the stream reverts
  to `ReadingInput` and the hash table continues accumulating.
- `ProbePhase` enum (Partial / AbSampling / Locked) inside
  `SkipAggregationProbe` drives the transitions.
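The phase transitions can be sketched as a small Rust state machine. This is a simplified, hypothetical `Probe`, not the PR's `SkipAggregationProbe`: it assumes the caller reports per-batch row counts, the running group count, and elapsed nanoseconds, and asks `route()` which path the next batch takes. Rule 1 locks straight into "skip"; otherwise the A/B window accumulates across batches and the closed-form crossover picks the path.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProbePhase {
    Partial,
    AbSampling,
    Locked,
}

/// Which path the next input batch should take.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Route {
    PartialAgg,  // accumulate into the hash table (`ReadingInput`)
    Passthrough, // `transform_to_states` straight to output
}

/// Hypothetical, simplified probe for illustration only.
pub struct Probe {
    phase: ProbePhase,
    probe_rows_threshold: usize,
    ab_sampling_rows: usize,
    probe_ratio_threshold: f64,
    input_rows: usize,
    num_groups: usize,
    partial_ns: u64,
    ab_rows: usize,
    ab_ns: u64,
    skip: bool,
}

impl Probe {
    pub fn new(probe_rows_threshold: usize, ab_sampling_rows: usize, probe_ratio_threshold: f64) -> Self {
        Self {
            phase: ProbePhase::Partial,
            probe_rows_threshold,
            ab_sampling_rows,
            probe_ratio_threshold,
            input_rows: 0, num_groups: 0, partial_ns: 0,
            ab_rows: 0, ab_ns: 0, skip: false,
        }
    }

    pub fn phase(&self) -> ProbePhase {
        self.phase
    }

    pub fn route(&self) -> Route {
        match self.phase {
            ProbePhase::AbSampling => Route::Passthrough,
            ProbePhase::Locked if self.skip => Route::Passthrough,
            // Partial, or Locked on "keep": the hash table keeps accumulating.
            _ => Route::PartialAgg,
        }
    }

    /// Called after a batch went through partial aggregation.
    pub fn on_partial_batch(&mut self, rows: usize, total_groups: usize, ns: u64) {
        if self.phase != ProbePhase::Partial {
            return;
        }
        self.input_rows += rows;
        self.num_groups = total_groups;
        self.partial_ns += ns;
        if self.input_rows < self.probe_rows_threshold {
            return;
        }
        let ratio = self.num_groups as f64 / self.input_rows as f64;
        if ratio >= self.probe_ratio_threshold {
            self.skip = true; // Rule 1 short-circuit, no A/B window
            self.phase = ProbePhase::Locked;
        } else {
            self.phase = ProbePhase::AbSampling;
        }
    }

    /// Called after a batch went through the passthrough path during A/B sampling.
    pub fn on_passthrough_batch(&mut self, rows: usize, ns: u64) {
        if self.phase != ProbePhase::AbSampling {
            return;
        }
        self.ab_rows += rows;
        self.ab_ns += ns;
        if self.ab_rows < self.ab_sampling_rows {
            return; // window keeps accumulating across batches
        }
        let partial = self.partial_ns as f64 / self.input_rows as f64;
        let passthrough = self.ab_ns as f64 / self.ab_rows as f64;
        let ratio = self.num_groups as f64 / self.input_rows as f64;
        self.skip = ratio > passthrough / partial; // closed-form crossover
        self.phase = ProbePhase::Locked;
    }
}
```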

Diagnostic gauges exposed via EXPLAIN ANALYZE:

- `partial_agg_probe_partial_ns_per_row`     — measured at probe close
- `partial_agg_probe_passthrough_ns_per_row` — measured at A/B close
- `partial_agg_probe_ratio_per_mille`        — ratio × 1000
- `partial_agg_probe_cost_decision_skip`     — 1 if cost said skip, 0 if keep

Config:

- `skip_partial_aggregation_use_cost_model` (bool, default false) —
  opt-in switch. With it off, behaviour is exactly the legacy bare
  ratio check.
- `skip_partial_aggregation_ab_sampling_rows` (usize, default 10_000) —
  size of the A/B sampling window.
- Drops `skip_partial_aggregation_cost_min_ratio` and
  `skip_partial_aggregation_cost_ns_per_row` from the previous
  iteration of this PR — they were magic-constant gates that the
  cost-aware formula obsoletes.
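To show how the opt-in flag gates the new behaviour, here is a hypothetical Rust mirror of the config knobs (field names shortened from the `skip_partial_aggregation_*` keys; the struct, enum, and `on_probe_close` are illustrative, not the PR's API). With the flag off, probe close reduces to the legacy ratio check; with it on, anything below Rule 1 enters the A/B window.

```rust
/// Hypothetical mirror of the config knobs listed above.
#[derive(Debug, Clone)]
pub struct SkipPartialAggConfig {
    pub probe_rows_threshold: usize, // existing knob, default 100_000
    pub probe_ratio_threshold: f64,  // existing knob, default 0.8
    pub use_cost_model: bool,        // new, default false (opt-in)
    pub ab_sampling_rows: usize,     // new, default 10_000
}

impl Default for SkipPartialAggConfig {
    fn default() -> Self {
        Self {
            probe_rows_threshold: 100_000,
            probe_ratio_threshold: 0.8,
            use_cost_model: false,
            ab_sampling_rows: 10_000,
        }
    }
}

/// What happens when the partial probe window closes.
#[derive(Debug, PartialEq)]
pub enum ProbeCloseAction {
    Skip,
    KeepPartial,
    StartAbSampling { window_rows: usize },
}

pub fn on_probe_close(cfg: &SkipPartialAggConfig, ratio: f64) -> ProbeCloseAction {
    if ratio >= cfg.probe_ratio_threshold {
        ProbeCloseAction::Skip // Rule 1, identical with the flag on or off
    } else if cfg.use_cost_model {
        ProbeCloseAction::StartAbSampling { window_rows: cfg.ab_sampling_rows }
    } else {
        ProbeCloseAction::KeepPartial // flag off: legacy bare ratio check
    }
}
```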

7 `SkipAggregationProbe` unit tests cover:
- cost-model-off matches legacy ratio check
- cost-model-on short-circuits on Rule 1 (no A/B needed)
- A/B sampling entry transition
- cost decision chooses skip when partial expensive
- cost decision chooses keep when passthrough not much cheaper
- A/B window accumulates across multiple batches
- diagnostic gauges record at every transition

Existing 100 aggregate tests and 10 aggregate SLT files still pass.

Same temporary flip as before: the benchmarking bot uses the default
config, so cost-aware A/B sampling would never run otherwise. Revert this
commit before merge; the contract stays opt-in until we have data on
which to base a default change.
@zhuqi-lucas
Contributor Author

run benchmark clickbench_partitioned

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4544098758-328-k67ld 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (df6e264) to bdf8a6d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.15 / 4.68 ±6.91 / 18.49 ms │          1.15 / 4.69 ±6.97 / 18.62 ms │     no change │
│ QQuery 1  │        12.48 / 12.89 ±0.25 / 13.17 ms │        12.61 / 13.06 ±0.27 / 13.46 ms │     no change │
│ QQuery 2  │        35.58 / 35.90 ±0.28 / 36.38 ms │        35.60 / 35.91 ±0.24 / 36.33 ms │     no change │
│ QQuery 3  │        30.58 / 31.24 ±0.64 / 32.28 ms │        30.71 / 31.17 ±0.26 / 31.50 ms │     no change │
│ QQuery 4  │     224.83 / 229.33 ±4.11 / 234.47 ms │     229.57 / 236.04 ±4.36 / 241.88 ms │     no change │
│ QQuery 5  │     272.66 / 276.42 ±3.40 / 282.80 ms │     273.84 / 276.31 ±1.73 / 278.65 ms │     no change │
│ QQuery 6  │           1.16 / 1.31 ±0.23 / 1.76 ms │           1.18 / 1.32 ±0.23 / 1.77 ms │     no change │
│ QQuery 7  │        13.81 / 13.92 ±0.06 / 13.98 ms │        14.31 / 14.39 ±0.09 / 14.55 ms │     no change │
│ QQuery 8  │     323.26 / 326.81 ±2.75 / 329.99 ms │     333.76 / 339.68 ±4.91 / 346.89 ms │     no change │
│ QQuery 9  │     461.80 / 465.11 ±3.95 / 472.77 ms │     454.00 / 466.95 ±8.85 / 476.86 ms │     no change │
│ QQuery 10 │        69.42 / 72.07 ±2.96 / 77.61 ms │        70.21 / 73.16 ±4.87 / 82.82 ms │     no change │
│ QQuery 11 │        81.62 / 82.62 ±0.82 / 83.78 ms │        79.43 / 82.46 ±2.30 / 86.38 ms │     no change │
│ QQuery 12 │     274.50 / 276.58 ±1.72 / 278.73 ms │     253.77 / 257.56 ±2.48 / 260.96 ms │ +1.07x faster │
│ QQuery 13 │    376.44 / 388.93 ±13.21 / 406.23 ms │    370.52 / 393.82 ±11.91 / 403.20 ms │     no change │
│ QQuery 14 │     286.35 / 293.18 ±7.91 / 307.87 ms │     268.76 / 275.91 ±7.77 / 290.09 ms │ +1.06x faster │
│ QQuery 15 │     275.06 / 284.05 ±6.42 / 293.92 ms │     278.11 / 287.50 ±6.32 / 294.09 ms │     no change │
│ QQuery 16 │    621.48 / 637.72 ±14.24 / 656.31 ms │    630.27 / 641.57 ±11.64 / 663.51 ms │     no change │
│ QQuery 17 │     626.48 / 635.40 ±7.46 / 645.95 ms │     637.12 / 641.28 ±5.74 / 652.25 ms │     no change │
│ QQuery 18 │ 1278.88 / 1302.21 ±17.06 / 1325.58 ms │ 1139.46 / 1162.80 ±18.97 / 1193.86 ms │ +1.12x faster │
│ QQuery 19 │        27.71 / 27.81 ±0.07 / 27.91 ms │        27.48 / 29.22 ±3.03 / 35.26 ms │  1.05x slower │
│ QQuery 20 │    525.71 / 549.99 ±36.32 / 621.87 ms │     523.20 / 529.02 ±6.30 / 540.91 ms │     no change │
│ QQuery 21 │     595.21 / 599.63 ±5.39 / 609.97 ms │     593.94 / 601.36 ±5.56 / 608.33 ms │     no change │
│ QQuery 22 │  1064.22 / 1073.93 ±8.85 / 1084.80 ms │  1059.57 / 1066.73 ±5.09 / 1072.83 ms │     no change │
│ QQuery 23 │ 3219.08 / 3242.42 ±13.61 / 3261.41 ms │ 3188.16 / 3227.53 ±24.74 / 3262.72 ms │     no change │
│ QQuery 24 │        41.40 / 44.49 ±4.50 / 53.34 ms │        41.32 / 42.99 ±2.11 / 47.06 ms │     no change │
│ QQuery 25 │     111.07 / 116.95 ±5.52 / 125.83 ms │     112.84 / 114.53 ±1.55 / 117.20 ms │     no change │
│ QQuery 26 │        42.29 / 42.63 ±0.24 / 42.98 ms │        41.56 / 45.03 ±5.60 / 56.13 ms │  1.06x slower │
│ QQuery 27 │     670.74 / 680.11 ±9.89 / 695.73 ms │     665.23 / 673.68 ±4.72 / 678.90 ms │     no change │
│ QQuery 28 │ 3040.39 / 3066.12 ±23.61 / 3100.57 ms │ 3054.40 / 3088.22 ±21.67 / 3119.29 ms │     no change │
│ QQuery 29 │        40.36 / 47.86 ±8.59 / 60.60 ms │        40.11 / 44.48 ±5.31 / 54.30 ms │ +1.08x faster │
│ QQuery 30 │     304.80 / 311.17 ±6.52 / 323.11 ms │     274.25 / 287.75 ±7.29 / 294.55 ms │ +1.08x faster │
│ QQuery 31 │     292.41 / 300.35 ±7.20 / 313.18 ms │     286.06 / 296.85 ±7.24 / 306.23 ms │     no change │
│ QQuery 32 │    953.14 / 972.74 ±16.98 / 999.09 ms │    957.92 / 969.77 ±13.33 / 995.46 ms │     no change │
│ QQuery 33 │ 1491.21 / 1504.26 ±19.19 / 1542.42 ms │ 1472.27 / 1508.58 ±26.68 / 1543.56 ms │     no change │
│ QQuery 34 │ 1488.48 / 1517.55 ±25.60 / 1559.77 ms │  1511.60 / 1519.80 ±4.71 / 1524.50 ms │     no change │
│ QQuery 35 │    287.43 / 303.36 ±13.82 / 322.31 ms │    300.26 / 328.80 ±52.33 / 433.31 ms │  1.08x slower │
│ QQuery 36 │        66.45 / 72.71 ±6.12 / 82.66 ms │        56.05 / 68.95 ±6.85 / 76.62 ms │ +1.05x faster │
│ QQuery 37 │        37.57 / 44.07 ±5.05 / 52.59 ms │       36.44 / 42.73 ±11.58 / 65.84 ms │     no change │
│ QQuery 38 │        41.70 / 43.39 ±1.73 / 45.65 ms │        41.78 / 45.08 ±2.87 / 48.98 ms │     no change │
│ QQuery 39 │     136.21 / 148.50 ±7.02 / 157.49 ms │     112.58 / 116.45 ±3.08 / 120.82 ms │ +1.28x faster │
│ QQuery 40 │        13.97 / 19.01 ±5.97 / 27.58 ms │        14.34 / 19.91 ±5.86 / 29.62 ms │     no change │
│ QQuery 41 │        13.66 / 13.85 ±0.11 / 14.00 ms │        13.90 / 14.17 ±0.33 / 14.80 ms │     no change │
│ QQuery 42 │        13.31 / 13.52 ±0.22 / 13.85 ms │        13.11 / 13.80 ±0.72 / 15.09 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20126.77ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19930.96ms │
│ Average Time (HEAD)                           │   468.06ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   463.51ms │
│ Queries Faster                                │          7 │
│ Queries Slower                                │          3 │
│ Queries with No Change                        │         33 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘
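How to read the "Change" column: the labels above are consistent with comparing the average (middle) values as `branch / HEAD` inside a ±5% noise band. The sketch below reproduces that rule. The 5% threshold is an assumption inferred from these rows (Q19 at a ratio of 1.0507 is "1.05x slower", while Q4 at 1.029 is "no change"). It is not taken from the benchmark runner's code, and `classify_change` is a hypothetical name.

```rust
/// Hypothetical reconstruction of the "Change" label: compares average
/// times as branch / HEAD with an assumed +/-5% noise band.
fn classify_change(head_avg_ms: f64, branch_avg_ms: f64) -> String {
    const NOISE: f64 = 0.05; // assumed, inferred from the rows above
    let ratio = branch_avg_ms / head_avg_ms;
    if (1.0 - NOISE..=1.0 + NOISE).contains(&ratio) {
        "no change".to_string()
    } else if ratio < 1.0 {
        // Faster: report the speedup as HEAD / branch.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        format!("{:.2}x slower", ratio)
    }
}

fn main() {
    // Average (middle) values for a few rows of the table above.
    println!("Q18: {}", classify_change(1302.21, 1162.80)); // +1.12x faster
    println!("Q19: {}", classify_change(27.81, 29.22)); // 1.05x slower
    println!("Q4:  {}", classify_change(229.33, 236.04)); // no change
}
```

With a 5% band, small regressions such as Q19 and Q26 above sit just past the threshold, so they may reflect run-to-run noise as much as a real slowdown.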

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric      | Value    |
|-------------|----------|
| Wall time   | 105.0s   |
| Peak memory | 30.0 GiB |
| Avg memory  | 23.0 GiB |
| CPU user    | 1034.3s  |
| CPU sys     | 75.1s    |
| Peak spill  | 0 B      |

clickbench_partitioned — branch

| Metric      | Value    |
|-------------|----------|
| Wall time   | 100.0s   |
| Peak memory | 30.4 GiB |
| Avg memory  | 23.1 GiB |
| CPU user    | 1023.1s  |
| CPU sys     | 74.6s    |
| Peak spill  | 0 B      |

Phase 2 of the cost-aware partial-agg skip. Instead of making one final
decision per partition, the probe rewinds to the partial-probe phase
once `re_probe_interval_rows` rows (default 1M) have passed since the
last decision, and re-runs the partial + A/B sampling cycle on the next
segment. This lets a single partition switch between partial
aggregation and skipping as the data distribution shifts (for example,
a dense burst of repeated keys followed by a high-cardinality stretch).

State machine: `ProbePhase::Locked { should_skip }` is replaced by
`ProbePhase::Active { should_skip, rows_since_decision }`. On each batch:

- In the keep-partial state, `observe_partial_batch` increments
  `rows_since_decision`. Once it reaches `re_probe_interval_rows`,
  `start_reprobe` resets the probe (phase = `Partial`, counters cleared,
  `elapsed_compute_at_probe_start` re-snapshotted, `is_locked` = false).
- In the skip state, `tick_skip_batch` does the same from the
  `SkippingAggregation` exec-state arm. When the re-probe fires, the
  main loop transitions back to `ReadingInput`, so the partial-agg
  path runs on the next batch with a fresh hash table (the previous
  one was emitted on entry to skip).
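A minimal Rust model of that state machine follows. It is a hypothetical sketch, not the PR's `SkipAggregationProbe`: `ProbePhase` and the re-probe rule follow the description above, while `Probe`, `decide`, `tick` and `should_skip` are invented for illustration. The cost-model decision, the timing snapshots and the `is_locked` flag are left out.

```rust
/// Simplified, hypothetical model of the probe phases described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProbePhase {
    /// Measuring partial aggregation (and A/B sampling) for this segment.
    Partial,
    /// Decision committed; counting rows until the next re-probe.
    Active { should_skip: bool, rows_since_decision: usize },
}

struct Probe {
    phase: ProbePhase,
    re_probe_interval_rows: usize, // 0 disables re-probing
    segment_count: usize,          // completed segments so far
}

impl Probe {
    fn new(re_probe_interval_rows: usize) -> Self {
        Self { phase: ProbePhase::Partial, re_probe_interval_rows, segment_count: 0 }
    }

    /// Stand-in for the end of the probe window: commit a decision.
    fn decide(&mut self, should_skip: bool) {
        self.phase = ProbePhase::Active { should_skip, rows_since_decision: 0 };
    }

    fn should_skip(&self) -> bool {
        matches!(self.phase, ProbePhase::Active { should_skip: true, .. })
    }

    /// Plays the role of both `observe_partial_batch` (keep-partial) and
    /// `tick_skip_batch` (skip). Returns true when a re-probe fired, i.e.
    /// the caller should go back to reading input through partial agg.
    fn tick(&mut self, batch_rows: usize) -> bool {
        let ProbePhase::Active { rows_since_decision, .. } = &mut self.phase else {
            return false; // still probing: nothing to count yet
        };
        *rows_since_decision += batch_rows;
        let due = self.re_probe_interval_rows > 0
            && *rows_since_decision >= self.re_probe_interval_rows;
        if due {
            self.start_reprobe();
        }
        due
    }

    /// Rewind to `Partial` for the next segment.
    fn start_reprobe(&mut self) {
        self.phase = ProbePhase::Partial;
        self.segment_count += 1;
    }
}

fn main() {
    let mut probe = Probe::new(1_000_000);
    probe.decide(true); // committed to skip for this segment
    let mut batches = 0;
    while !probe.tick(8_192) {
        batches += 1;
    }
    // Back in Partial after the re-probe, so no skip is committed.
    println!("re-probe after {} batches; skip={}", batches + 1, probe.should_skip());
    // re-probe after 123 batches; skip=false
}
```

Sharing one counter between both arms keeps the re-probe cadence independent of which path the segment took, which matches the description of `tick_skip_batch` doing "the same" as `observe_partial_batch`.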

Final-agg correctness is unaffected: each segment's output (whether
emitted partial state or per-row passthrough state) is intermediate
aggregate state, and the Final aggregation merges such states
associatively and commutatively, so segments combine correctly
downstream.
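A small sketch of why mixing the two output kinds is safe, using an AVG-style `(sum, count)` state. The helper names are hypothetical and this is not DataFusion's accumulator API; it only assumes that merging two states is element-wise addition, which is associative and commutative.

```rust
use std::collections::HashMap;

/// Intermediate state for an AVG-style aggregate: (sum, count).
type State = (i64, u64);

/// Partial aggregation: fold a segment's rows into one state per key.
fn partial_agg(rows: &[(u32, i64)]) -> Vec<(u32, State)> {
    let mut groups: HashMap<u32, State> = HashMap::new();
    for &(key, v) in rows {
        let s = groups.entry(key).or_insert((0, 0));
        s.0 += v;
        s.1 += 1;
    }
    groups.into_iter().collect()
}

/// Passthrough (skip): emit one single-row state per input row.
fn passthrough(rows: &[(u32, i64)]) -> Vec<(u32, State)> {
    rows.iter().map(|&(key, v)| (key, (v, 1))).collect()
}

/// Final aggregation: merge states by key. Addition is associative and
/// commutative, so the order and grouping of incoming states do not matter.
fn final_merge(states: impl IntoIterator<Item = (u32, State)>) -> HashMap<u32, State> {
    let mut out: HashMap<u32, State> = HashMap::new();
    for (key, (sum, count)) in states {
        let s = out.entry(key).or_insert((0, 0));
        s.0 += sum;
        s.1 += count;
    }
    out
}

fn main() {
    let rows: [(u32, i64); 6] = [(1, 10), (2, 5), (1, 7), (3, 1), (2, 2), (1, 3)];
    // Segment 1 keeps partial aggregation; segment 2 is skipped (passthrough).
    let (seg1, seg2) = rows.split_at(3);
    let mixed = final_merge(partial_agg(seg1).into_iter().chain(passthrough(seg2)));
    // Reference: treat every row as its own state and merge once.
    let direct = final_merge(passthrough(&rows));
    assert_eq!(mixed, direct);
    println!("key 1 -> {:?}", mixed[&1u32]); // key 1 -> (20, 3)
}
```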

New config:

- `skip_partial_aggregation_re_probe_interval_rows` (usize,
  default 1_000_000). Set to 0 to disable re-probing entirely
  (one-shot decision, the Phase 1 behaviour).

New diagnostic counter:

- `partial_agg_probe_segment_count` — number of completed segments
  in the current partition. 0 means the probe ran once and never
  re-probed; a large value on a fast query suggests the interval is
  too small.
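A rough way to read the new counter: assume every segment spans one partial probe window (`probe_rows_threshold`, default 100k), one A/B window (`ab_sampling_rows`, default 10k), and then `re_probe_interval_rows`. Under that simplification, the expected count for a partition is its row count divided by that span. The function below is a hypothetical estimate only. Real segments end on batch boundaries, and Rule 1 can skip the A/B window.

```rust
/// Hypothetical estimate of `partial_agg_probe_segment_count` for one
/// partition, assuming each segment = probe window + A/B window + interval.
fn estimated_segment_count(
    partition_rows: usize,
    probe_rows: usize,             // probe_rows_threshold (default 100k)
    ab_sampling_rows: usize,       // default 10k
    re_probe_interval_rows: usize, // default 1M; 0 disables re-probing
) -> usize {
    if re_probe_interval_rows == 0 {
        return 0; // one-shot decision: no segment ever completes
    }
    // Only fully completed segments count; a trailing partial one does not.
    partition_rows / (probe_rows + ab_sampling_rows + re_probe_interval_rows)
}

fn main() {
    let n = 10_000_000; // rows in one partition
    println!("{}", estimated_segment_count(n, 100_000, 10_000, 1_000_000)); // 9
    println!("{}", estimated_segment_count(n, 100_000, 10_000, 10_000)); // 83
    println!("{}", estimated_segment_count(n, 100_000, 10_000, 0)); // 0
}
```

With the default 1M interval, a 10M-row partition sees about 9 completed segments. Shrinking the interval to 10k pushes that to about 83, and under these assumptions 110k of every 120k rows would then be spent in probe or A/B windows. That is the situation the "interval is too small" hint points at.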

Three new `SkipAggregationProbe` unit tests cover:
- re-probe after a committed skip decision rewinds to `Partial`
- re-probe after a committed keep decision rewinds to `Partial`
- `re_probe_interval_rows = 0` disables re-probing

Existing test `test_skip_aggregation_probe_not_locked_until_skip`
explicitly disables the cost-aware path (it exercises a legacy
Rule 1 corner case the cost model would intercept differently).
@zhuqi-lucas
Contributor Author

run benchmark clickbench_partitioned

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4544553176-329-zpdd7 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (c716f33) to 2453bec (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.18 / 4.72 ±6.98 / 18.68 ms │          1.21 / 4.81 ±7.08 / 18.96 ms │     no change │
│ QQuery 1  │        12.45 / 12.89 ±0.29 / 13.28 ms │        12.52 / 12.87 ±0.24 / 13.24 ms │     no change │
│ QQuery 2  │        35.55 / 35.86 ±0.29 / 36.24 ms │        36.32 / 36.47 ±0.22 / 36.89 ms │     no change │
│ QQuery 3  │        30.41 / 30.90 ±0.68 / 32.23 ms │        31.15 / 31.31 ±0.16 / 31.61 ms │     no change │
│ QQuery 4  │     221.87 / 224.54 ±2.56 / 229.42 ms │     204.41 / 207.97 ±3.29 / 212.95 ms │ +1.08x faster │
│ QQuery 5  │     269.97 / 274.75 ±3.86 / 279.90 ms │     293.59 / 300.68 ±6.25 / 308.65 ms │  1.09x slower │
│ QQuery 6  │           1.19 / 1.35 ±0.23 / 1.81 ms │           1.21 / 1.36 ±0.22 / 1.78 ms │     no change │
│ QQuery 7  │        13.73 / 13.99 ±0.14 / 14.13 ms │        13.75 / 13.87 ±0.09 / 14.00 ms │     no change │
│ QQuery 8  │     320.78 / 328.84 ±4.63 / 334.55 ms │                                  FAIL │  incomparable │
│ QQuery 9  │     454.63 / 463.19 ±4.56 / 467.49 ms │    460.89 / 472.99 ±12.76 / 497.41 ms │     no change │
│ QQuery 10 │        69.29 / 70.27 ±1.11 / 72.42 ms │        71.10 / 73.32 ±4.22 / 81.76 ms │     no change │
│ QQuery 11 │        79.76 / 81.65 ±1.75 / 84.95 ms │        80.64 / 83.74 ±3.38 / 89.89 ms │     no change │
│ QQuery 12 │     266.03 / 271.93 ±6.02 / 281.43 ms │     261.01 / 266.00 ±3.05 / 269.85 ms │     no change │
│ QQuery 13 │    370.50 / 383.80 ±15.84 / 411.68 ms │     382.98 / 396.03 ±8.97 / 407.49 ms │     no change │
│ QQuery 14 │     281.09 / 286.83 ±5.89 / 296.46 ms │                                  FAIL │  incomparable │
│ QQuery 15 │     269.00 / 276.19 ±5.54 / 282.94 ms │    253.65 / 274.17 ±18.94 / 303.25 ms │     no change │
│ QQuery 16 │     622.91 / 627.43 ±4.90 / 635.98 ms │                                  FAIL │  incomparable │
│ QQuery 17 │     630.86 / 647.04 ±8.29 / 653.39 ms │                                  FAIL │  incomparable │
│ QQuery 18 │ 1276.97 / 1304.93 ±18.66 / 1331.74 ms │                                  FAIL │  incomparable │
│ QQuery 19 │        27.48 / 28.70 ±2.16 / 33.01 ms │        27.49 / 32.13 ±6.65 / 45.21 ms │  1.12x slower │
│ QQuery 20 │    516.47 / 528.35 ±11.87 / 548.41 ms │    519.56 / 526.99 ±10.22 / 546.94 ms │     no change │
│ QQuery 21 │     590.44 / 596.24 ±5.21 / 604.91 ms │     595.52 / 597.31 ±1.51 / 599.44 ms │     no change │
│ QQuery 22 │  1063.07 / 1071.01 ±5.75 / 1078.89 ms │ 1068.62 / 1076.76 ±10.29 / 1096.25 ms │     no change │
│ QQuery 23 │ 3187.30 / 3266.59 ±74.85 / 3405.01 ms │ 3186.55 / 3216.99 ±17.77 / 3241.70 ms │     no change │
│ QQuery 24 │       42.95 / 51.86 ±12.32 / 75.38 ms │       41.38 / 49.33 ±15.06 / 79.44 ms │     no change │
│ QQuery 25 │     112.17 / 117.99 ±6.60 / 129.27 ms │    112.30 / 121.06 ±15.56 / 152.12 ms │     no change │
│ QQuery 26 │        42.00 / 42.54 ±0.46 / 43.33 ms │        42.17 / 47.03 ±8.04 / 63.03 ms │  1.11x slower │
│ QQuery 27 │    677.12 / 687.54 ±14.79 / 716.90 ms │    669.83 / 683.37 ±12.27 / 705.25 ms │     no change │
│ QQuery 28 │ 3052.39 / 3099.10 ±30.01 / 3141.78 ms │ 3124.12 / 3178.31 ±58.85 / 3288.20 ms │     no change │
│ QQuery 29 │        40.11 / 43.05 ±4.27 / 51.46 ms │       40.11 / 47.93 ±15.29 / 78.51 ms │  1.11x slower │
│ QQuery 30 │     302.53 / 312.23 ±7.03 / 323.81 ms │                                  FAIL │  incomparable │
│ QQuery 31 │     280.81 / 291.58 ±7.32 / 303.49 ms │    290.32 / 306.04 ±13.04 / 329.66 ms │     no change │
│ QQuery 32 │  968.09 / 1019.84 ±31.94 / 1053.04 ms │    938.93 / 964.53 ±15.80 / 988.36 ms │ +1.06x faster │
│ QQuery 33 │ 1481.45 / 1511.26 ±21.39 / 1540.16 ms │ 1330.36 / 1357.00 ±17.56 / 1378.17 ms │ +1.11x faster │
│ QQuery 34 │ 1527.47 / 1538.89 ±12.25 / 1557.28 ms │ 1375.21 / 1395.77 ±13.17 / 1407.51 ms │ +1.10x faster │
│ QQuery 35 │    283.13 / 300.78 ±17.46 / 324.34 ms │    253.59 / 279.95 ±26.70 / 314.98 ms │ +1.07x faster │
│ QQuery 36 │        65.79 / 68.94 ±2.38 / 73.16 ms │        55.65 / 67.21 ±8.33 / 81.36 ms │     no change │
│ QQuery 37 │        34.78 / 37.17 ±2.16 / 40.24 ms │        36.24 / 38.78 ±3.75 / 46.19 ms │     no change │
│ QQuery 38 │        40.48 / 44.40 ±3.74 / 49.30 ms │        40.07 / 42.92 ±2.42 / 46.90 ms │     no change │
│ QQuery 39 │    135.73 / 157.71 ±13.57 / 177.88 ms │     113.52 / 118.43 ±3.14 / 122.91 ms │ +1.33x faster │
│ QQuery 40 │        14.15 / 14.68 ±0.50 / 15.59 ms │        14.09 / 17.02 ±4.95 / 26.90 ms │  1.16x slower │
│ QQuery 41 │        13.77 / 16.04 ±2.64 / 20.66 ms │        13.51 / 15.44 ±3.16 / 21.72 ms │     no change │
│ QQuery 42 │        12.98 / 13.12 ±0.11 / 13.28 ms │        13.06 / 13.33 ±0.19 / 13.64 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 16693.39ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 16369.25ms │
│ Average Time (HEAD)                           │   451.17ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   442.41ms │
│ Queries Faster                                │          6 │
│ Queries Slower                                │          5 │
│ Queries with No Change                        │         26 │
│ Queries with Failure                          │          6 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric      | Value    |
|-------------|----------|
| Wall time   | 105.0s   |
| Peak memory | 29.6 GiB |
| Avg memory  | 22.7 GiB |
| CPU user    | 1036.0s  |
| CPU sys     | 78.1s    |
| Peak spill  | 0 B      |

clickbench_partitioned — branch

| Metric      | Value    |
|-------------|----------|
| Wall time   | 85.0s    |
| Peak memory | 31.7 GiB |
| Avg memory  | 21.9 GiB |
| CPU user    | 844.1s   |
| CPU sys     | 54.4s    |
| Peak spill  | 0 B      |

@zhuqi-lucas
Contributor Author

run benchmark clickbench_partitioned

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4545016043-330-bxrlv 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/adaptive-partial-agg-cost (a258afe) to 2453bec (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_adaptive-partial-agg-cost
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃        feat_adaptive-partial-agg-cost ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.16 / 4.66 ±6.90 / 18.46 ms │          1.19 / 4.60 ±6.78 / 18.15 ms │     no change │
│ QQuery 1  │        12.97 / 13.41 ±0.23 / 13.61 ms │        12.57 / 12.77 ±0.15 / 12.99 ms │     no change │
│ QQuery 2  │        35.83 / 36.27 ±0.33 / 36.82 ms │        36.67 / 36.85 ±0.19 / 37.19 ms │     no change │
│ QQuery 3  │        30.31 / 30.85 ±0.57 / 31.95 ms │        30.74 / 30.79 ±0.04 / 30.85 ms │     no change │
│ QQuery 4  │     220.91 / 228.38 ±5.44 / 234.32 ms │     227.91 / 232.52 ±3.60 / 236.76 ms │     no change │
│ QQuery 5  │     272.55 / 276.08 ±3.13 / 280.95 ms │     268.52 / 272.27 ±2.80 / 275.70 ms │     no change │
│ QQuery 6  │           1.18 / 1.34 ±0.22 / 1.77 ms │           1.22 / 1.37 ±0.23 / 1.81 ms │     no change │
│ QQuery 7  │        14.21 / 14.30 ±0.09 / 14.45 ms │        13.70 / 13.95 ±0.19 / 14.28 ms │     no change │
│ QQuery 8  │     319.76 / 326.02 ±5.15 / 331.92 ms │     330.08 / 334.96 ±3.96 / 341.27 ms │     no change │
│ QQuery 9  │     455.77 / 465.40 ±7.72 / 473.34 ms │     456.61 / 462.46 ±5.25 / 468.93 ms │     no change │
│ QQuery 10 │        70.04 / 70.76 ±0.64 / 71.72 ms │        68.33 / 69.91 ±1.06 / 71.12 ms │     no change │
│ QQuery 11 │        82.10 / 83.51 ±1.33 / 85.67 ms │        80.59 / 84.02 ±2.29 / 86.73 ms │     no change │
│ QQuery 12 │     264.33 / 269.35 ±3.86 / 275.82 ms │     252.81 / 257.68 ±3.70 / 262.55 ms │     no change │
│ QQuery 13 │     370.81 / 384.58 ±9.74 / 398.44 ms │    355.66 / 384.18 ±18.14 / 411.88 ms │     no change │
│ QQuery 14 │     282.93 / 287.15 ±3.16 / 290.90 ms │     269.73 / 271.96 ±1.79 / 273.97 ms │ +1.06x faster │
│ QQuery 15 │     270.54 / 275.63 ±3.90 / 282.05 ms │    271.27 / 282.68 ±12.06 / 304.97 ms │     no change │
│ QQuery 16 │     617.43 / 624.46 ±5.24 / 633.45 ms │    621.33 / 634.86 ±17.66 / 669.40 ms │     no change │
│ QQuery 17 │     635.46 / 641.10 ±4.77 / 648.65 ms │    630.64 / 656.27 ±36.08 / 727.88 ms │     no change │
│ QQuery 18 │ 1273.80 / 1291.02 ±10.31 / 1304.14 ms │ 1121.56 / 1156.68 ±29.14 / 1194.68 ms │ +1.12x faster │
│ QQuery 19 │       27.75 / 39.55 ±11.84 / 56.35 ms │        27.38 / 27.56 ±0.15 / 27.78 ms │ +1.43x faster │
│ QQuery 20 │     520.27 / 531.05 ±6.78 / 540.04 ms │     520.53 / 525.28 ±4.63 / 532.84 ms │     no change │
│ QQuery 21 │     588.29 / 598.94 ±6.38 / 606.65 ms │     594.11 / 598.98 ±4.09 / 604.63 ms │     no change │
│ QQuery 22 │  1063.88 / 1067.93 ±2.69 / 1071.62 ms │  1060.27 / 1070.46 ±9.27 / 1087.69 ms │     no change │
│ QQuery 23 │ 3188.25 / 3210.76 ±23.26 / 3255.67 ms │ 3181.35 / 3204.56 ±14.15 / 3221.14 ms │     no change │
│ QQuery 24 │        41.01 / 41.93 ±1.02 / 43.70 ms │        41.03 / 41.83 ±0.68 / 42.96 ms │     no change │
│ QQuery 25 │     112.35 / 116.34 ±3.14 / 120.48 ms │     111.29 / 116.31 ±7.83 / 131.91 ms │     no change │
│ QQuery 26 │        41.37 / 44.83 ±5.40 / 55.55 ms │        41.34 / 42.01 ±0.66 / 43.16 ms │ +1.07x faster │
│ QQuery 27 │     678.44 / 683.40 ±5.65 / 693.16 ms │     667.10 / 679.45 ±9.09 / 694.29 ms │     no change │
│ QQuery 28 │ 3053.89 / 3080.63 ±30.79 / 3136.18 ms │ 3029.03 / 3063.54 ±21.61 / 3092.94 ms │     no change │
│ QQuery 29 │       40.42 / 51.42 ±14.51 / 76.58 ms │        40.11 / 41.86 ±3.27 / 48.39 ms │ +1.23x faster │
│ QQuery 30 │     301.01 / 306.85 ±6.01 / 317.82 ms │     283.40 / 287.41 ±4.36 / 295.29 ms │ +1.07x faster │
│ QQuery 31 │    275.46 / 294.98 ±16.78 / 325.40 ms │     288.01 / 295.07 ±4.31 / 301.62 ms │     no change │
│ QQuery 32 │   957.02 / 974.29 ±22.56 / 1018.31 ms │   921.30 / 975.61 ±32.55 / 1008.55 ms │     no change │
│ QQuery 33 │ 1460.79 / 1497.04 ±48.78 / 1592.69 ms │ 1446.79 / 1476.40 ±17.14 / 1493.64 ms │     no change │
│ QQuery 34 │ 1454.11 / 1486.08 ±18.38 / 1510.16 ms │ 1462.51 / 1476.27 ±12.80 / 1494.54 ms │     no change │
│ QQuery 35 │    275.67 / 323.30 ±90.15 / 503.57 ms │     289.49 / 297.13 ±9.80 / 315.35 ms │ +1.09x faster │
│ QQuery 36 │        66.79 / 73.58 ±7.05 / 86.12 ms │        57.62 / 63.70 ±6.86 / 76.51 ms │ +1.16x faster │
│ QQuery 37 │        35.41 / 38.69 ±5.40 / 49.38 ms │        35.16 / 36.78 ±1.82 / 39.30 ms │     no change │
│ QQuery 38 │        40.58 / 44.33 ±2.62 / 47.86 ms │        41.03 / 43.24 ±1.18 / 44.51 ms │     no change │
│ QQuery 39 │     145.54 / 153.51 ±6.08 / 163.59 ms │     112.20 / 118.12 ±4.66 / 125.66 ms │ +1.30x faster │
│ QQuery 40 │        13.61 / 14.96 ±2.22 / 19.37 ms │        13.46 / 15.06 ±2.39 / 19.78 ms │     no change │
│ QQuery 41 │        13.58 / 15.63 ±3.95 / 23.53 ms │        13.43 / 13.70 ±0.35 / 14.36 ms │ +1.14x faster │
│ QQuery 42 │        12.86 / 13.04 ±0.13 / 13.22 ms │        13.08 / 14.96 ±2.24 / 18.62 ms │  1.15x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                             │ 20027.35ms │
│ Total Time (feat_adaptive-partial-agg-cost)   │ 19726.11ms │
│ Average Time (HEAD)                           │   465.75ms │
│ Average Time (feat_adaptive-partial-agg-cost) │   458.75ms │
│ Queries Faster                                │         10 │
│ Queries Slower                                │          1 │
│ Queries with No Change                        │         32 │
│ Queries with Failure                          │          0 │
└───────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric      | Value    |
|-------------|----------|
| Wall time   | 105.0s   |
| Peak memory | 30.3 GiB |
| Avg memory  | 22.8 GiB |
| CPU user    | 1027.4s  |
| CPU sys     | 74.2s    |
| Peak spill  | 0 B      |

clickbench_partitioned — branch

| Metric      | Value    |
|-------------|----------|
| Wall time   | 100.0s   |
| Peak memory | 30.9 GiB |
| Avg memory  | 23.2 GiB |
| CPU user    | 1012.4s  |
| CPU sys     | 73.4s    |
| Peak spill  | 0 B      |

The Phase 2 revert removed the config line that previously disabled
the cost-aware path in this test. The test exercises the original
Rule 1 short-circuit behaviour, so it must explicitly opt out of the
default-on A/B sampling.

Labels

- common (Related to common crate)
- documentation (Improvements or additions to documentation)
- physical-plan (Changes to the physical-plan crate)
- sqllogictest (SQL Logic Tests (.slt))
