perf: Handle intermediate Projection nodes in EliminateOuterJoin #22534
neilconway wants to merge 2 commits into
EliminateOuterJoin previously only matched the literal Filter -> Join pattern. When a Projection sits between the Filter and the Join, the rule no-ops and the outer join stays in place even when the predicate above the projection would justify converting it.

A common shape that hits this comes from projection pruning after filter pushdown. In TPC-DS q49, PushDownFilter moves the returns-side predicate above the sales/returns LEFT JOIN, then OptimizeProjections inserts a pruning Projection between that Filter and the LEFT JOIN. The returns-side predicate still filters out the outer rows, but the projection hides the join from the old rule.

Extend the rule to walk down through Projection nodes between Filter and Join, rewriting a working copy of the predicate into the join's coordinate space for analysis. The rewritten predicate is used only for analysis; the original predicate and surrounding plan structure are preserved on success.

Tests cover passthrough projection, aliased projection, negative cases, a non-transparent Limit guard, and SQL-level q49-shaped cases where OptimizeProjections places a pruning Projection between a returns-side Filter and the sales/returns LEFT JOIN.
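The walk described above can be illustrated with a small, self-contained sketch. It does not use DataFusion's `LogicalPlan` or `Expr` types: the `Plan` and `Expr` enums and the functions `rewrite_through` and `predicate_at_join` are hypothetical names invented for this illustration, and the real rule handles many more expression and plan variants.

```rust
/// Toy expression type (hypothetical; not DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// Toy logical plan (hypothetical; not DataFusion's `LogicalPlan`).
#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan(String),
    /// Each output column name is paired with the expression that computes it.
    Projection { exprs: Vec<(String, Expr)>, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
    Join { outer: bool, left: Box<Plan>, right: Box<Plan> },
}

/// Replace every column reference in `pred` with the projection expression
/// that defines it. Returns `None` if a column is not produced by `exprs`.
fn rewrite_through(pred: &Expr, exprs: &[(String, Expr)]) -> Option<Expr> {
    match pred {
        Expr::Column(name) => exprs
            .iter()
            .find(|(out, _)| out == name)
            .map(|(_, e)| e.clone()),
        Expr::Literal(v) => Some(Expr::Literal(*v)),
        Expr::Gt(l, r) => Some(Expr::Gt(
            Box::new(rewrite_through(l, exprs)?),
            Box::new(rewrite_through(r, exprs)?),
        )),
    }
}

/// Starting at the filter's input, descend through Projection nodes only.
/// On reaching a Join, return a working copy of the predicate expressed over
/// the join's output columns. Any other node (for example Limit) stops the walk.
fn predicate_at_join(pred: &Expr, mut plan: &Plan) -> Option<Expr> {
    let mut working = pred.clone();
    loop {
        match plan {
            Plan::Projection { exprs, input } => {
                working = rewrite_through(&working, exprs)?;
                plan = &**input;
            }
            Plan::Join { .. } => return Some(working),
            _ => return None,
        }
    }
}
```

In this sketch, a predicate `y > 150` above a projection that defines `y` as `t2.y` comes back as `t2.y > 150`, while a `Limit` between the filter and the join yields `None`, mirroring the non-transparent Limit guard mentioned in the tests. The caller's original predicate is never modified, only the working copy.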
# A CTE can introduce a query boundary between the outer filter and the
# LEFT JOIN.
query TT
explain
with s as (
  select t1.a, t2.y
  from t1 left join t2 on t1.a = t2.x
)
select s.a from s where s.y > 150;
----
logical_plan
01)SubqueryAlias: s
02)--Projection: t1.a
03)----Inner Join: t1.a = t2.x
04)------TableScan: t1 projection=[a]
05)------Projection: t2.x
06)--------Filter: t2.y > Int32(150)
07)----------TableScan: t2 projection=[x, y]

query I rowsort
with s as (
  select t1.a, t2.y
  from t1 left join t2 on t1.a = t2.x
)
select s.a from s where s.y > 150;
----
2
The "Filter -> Projection -> Join" pattern does not actually appear in the final plan, because of subsequent optimizer passes; I verified that the plan contains an outer join if we revert the changes in this PR.
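The conversion in the test above depends on the filter `s.y > 150` rejecting the NULL-padded rows that a LEFT JOIN produces for unmatched left rows. The following is a simplified sketch of that reasoning, not DataFusion's implementation: the `Expr` and `JoinType` enums and the functions `is_null_on_padded_rows` and `simplify_join` are hypothetical names, and the check is deliberately conservative.

```rust
/// Toy expression type (hypothetical; not DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    IsNull(Box<Expr>),
}

/// Toy join type (hypothetical; only the two variants needed here).
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
}

/// Returns true only when `expr` is guaranteed to evaluate to NULL on rows
/// where every column in `null_side` is NULL. A filter drops rows whose
/// predicate is NULL, so such a predicate removes all NULL-padded rows.
fn is_null_on_padded_rows(expr: &Expr, null_side: &[&str]) -> bool {
    match expr {
        Expr::Column(name) => null_side.contains(&name.as_str()),
        Expr::Literal(_) => false,
        // A comparison is NULL if either operand is NULL.
        Expr::Gt(l, r) => {
            is_null_on_padded_rows(l, null_side) || is_null_on_padded_rows(r, null_side)
        }
        // IS NULL never returns NULL, so it may keep padded rows.
        Expr::IsNull(_) => false,
    }
}

/// Convert a LEFT JOIN to an INNER JOIN when the filter above it rejects
/// every row that was NULL-padded on the right side.
fn simplify_join(join_type: JoinType, pred: &Expr, right_cols: &[&str]) -> JoinType {
    match join_type {
        JoinType::Left if is_null_on_padded_rows(pred, right_cols) => JoinType::Inner,
        other => other,
    }
}
```

With the right side exposing `t2.x` and `t2.y`, the predicate `t2.y > 150` turns the LEFT JOIN into an INNER JOIN in this sketch, whereas `t2.y IS NULL` or a predicate that only touches `t1.a` leaves the join type unchanged.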
run benchmarks
🤖 Benchmark running (GKE). Comparing neilc/eliminate-outer-join-through-projection (fc7f165) to 2453bec (merge-base) using: tpcds
🤖 Benchmark running (GKE). Comparing neilc/eliminate-outer-join-through-projection (fc7f165) to 2453bec (merge-base) using: clickbench_partitioned
🤖 Benchmark running (GKE). Comparing neilc/eliminate-outer-join-through-projection (fc7f165) to 2453bec (merge-base) using: tpch
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Which issue does this PR close?
EliminateOuterJoin should descend through Projection #22531.

Rationale for this change
EliminateOuterJoin looks for plans with a Filter directly above a Join. For most queries, that is the right plan shape (because PushDownFilter will typically place the filters that are useful for outer join elimination directly on top of the relevant Join). However, some plans don't follow this shape, for at least two reasons:
- OptimizeProjections might insert a Projection between the Filter and Join

Notably, we run into case (2) in TPC-DS Q49; we currently fail to convert three outer joins to inner joins for that reason.

We can handle this by teaching EliminateOuterJoin to descend through one or more intermediate Projection nodes, rewriting the filter predicate as it goes to account for the effect of the projection.

What changes are included in this PR?
- Teach EliminateOuterJoin to descend through one or more Projection nodes
- eliminate_outer_joins.rs, improve comments

Are these changes tested?
Yes, new tests added. Manually verified that we fail to eliminate the outer joins in TPC-DS Q49 without this change and succeed in doing so with this change.

Are there any user-facing changes?
More effective outer join query optimization.