Skip to content

perf(physical-optimizer): skip ensure_distribution rebuild when children are unchanged#22521

Open
zhuqi-lucas wants to merge 2 commits into
apache:mainfrom
zhuqi-lucas:qizhu/enforce-distribution-skip-rebuild
Open

perf(physical-optimizer): skip ensure_distribution rebuild when children are unchanged#22521
zhuqi-lucas wants to merge 2 commits into
apache:mainfrom
zhuqi-lucas:qizhu/enforce-distribution-skip-rebuild

Conversation

@zhuqi-lucas
Copy link
Copy Markdown
Contributor

@zhuqi-lucas zhuqi-lucas commented May 26, 2026

Which issue does this PR close?

Closes #22520.

Rationale for this change

ensure_distribution in datafusion/physical-optimizer/src/ensure_requirements/enforce_distribution.rs unconditionally calls plan.with_new_children(children_plans) after collecting the (possibly redistributed) children, even when none of those children were actually replaced. For nodes like ProjectionExec, that path runs through try_new and recomputes the schema, equivalence properties, output ordering, and output partitioning, then allocates a new Arc<dyn ExecutionPlan>. When every child Arc is pointer-identical to the input, that work produces a logically identical node — pure overhead.

The cost is amplified by two factors:

  1. Plan depth. Workloads dominated by point queries (no join / aggregate / unmet ordering — i.e. nothing for ensure_distribution to inject a RepartitionExec or SortExec for) hit this wasted rebuild at every node in the plan. A 5–30 deep ProjectionExec stack pays the cost N times.
  2. Schema width. Most steps inside ProjectionExec::try_new are O(num_columns): per-column data_type / nullable lookup to build the new schema, per-column remapping of equivalence classes through the projection mapping, and per-column lookup when rewriting PhysicalSortExprs into the output ordering. Wide schemas (tens of columns) make every wasted rebuild proportionally heavier.

Profiling a production point-query workload (wide schemas, deep ProjectionExec stacks) showed ProjectionExec::with_new_children as the single largest cost inside ensure_distribution:

  • ensure_distribution total: 2.87s of a 60s CPU sample
  • ProjectionExec::with_new_children: 1.94s (56% of the rule)
  • SortExec::with_new_children: 0.11s
  • Other ExecutionPlan nodes: 0.82s

What changes are included in this PR?

After collecting children_plans, compare each new child Arc with the original via Arc::ptr_eq. When every child is unchanged, reuse the existing plan Arc and skip with_new_children. The UnionExec to InterleaveExec special case still runs first because it intentionally produces a new node even when child Arcs are unchanged.

This relies on the fact that ensure_distribution already produces pointer-identical Arcs for children that need no redistribution (it threads the original Arc through unchanged), so Arc::ptr_eq precisely distinguishes "rewritten" from "untouched" children at O(1) per child.

Are these changes tested?

Yes. The existing enforce_distribution suite passes unchanged (66/66):

cargo test --release -p datafusion --test core_integration -- physical_optimizer::enforce_distribution

The behavior is observable only as a CPU reduction; correctness is preserved because ExecutionPlan nodes are immutable, so reusing the original Arc produces the same plan tree as with_new_children(unchanged_children) would have, just without the schema / ordering / equivalence / partitioning recomputation.

Are there any user-facing changes?

No. Same plans, lower planning time.

Micro-benchmark

Plan shape: 30-deep ProjectionExec stack over a sorted parquet scan, 5000 iterations.

  • Without fix: 852.74 ms total, 170.55 us/call
  • With fix: 296.81 ms total, 59.36 us/call
  • ~2.87x speedup, -65% CPU per call

Wider schemas (more projection expressions per node) widen the gap further because each skipped with_new_children avoids more O(num_columns) work.

…ren are unchanged

ensure_distribution was unconditionally calling plan.with_new_children after collecting the (possibly redistributed) children, even when none of the children were actually replaced. For nodes like ProjectionExec, that path runs through try_new and recomputes the schema, equivalence properties, and output ordering each time, which is pure overhead when the input Arcs are identical.

Compare each new child Arc with the original via Arc::ptr_eq and reuse the existing plan Arc when nothing changed. The UnionExec to InterleaveExec special case still runs first because it intentionally produces a new node.

On a representative deep ProjectionExec stack (30 layers over a sorted parquet scan, no join / aggregate / unmet ordering, 5000 iterations) this brings ensure_distribution from 170.55 us/call to 59.36 us/call, a ~2.87x speedup. Profiling on a real workload dominated by point queries showed ProjectionExec::with_new_children taking 1.94s out of a 2.87s ensure_distribution slice in a 60s sample, so this is the bulk of the rule's cost on that shape.

Closes apache#22520
Copilot AI review requested due to automatic review settings May 26, 2026 06:34
@github-actions github-actions Bot added the optimizer Optimizer rules label May 26, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the physical optimizer’s distribution enforcement by avoiding unnecessary plan-node rebuilds when the computed child plans are identical to the existing children, reducing planning CPU overhead (notably for deep ProjectionExec stacks) without changing correctness.

Changes:

  • Detect when all post-processing child plan Arcs are pointer-identical to the original children via Arc::ptr_eq.
  • Reuse the existing plan Arc and skip with_new_children when children are unchanged.
  • Preserve the existing UnionExecInterleaveExec special-case behavior by applying it before the “children unchanged” fast-path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: ensure_distribution rebuilds plan nodes unnecessarily when children are unchanged

2 participants