[fix](be) Preserve null probe rows in mark anti join#63767
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
I found one blocking correctness issue in the null-aware mark anti join fix. The change preserves NULL probe rows when there are no build-side NULL keys, but it regresses the same path when the build hash table keeps NULL keys, which this join mode explicitly does for null-aware joins with other conjuncts.
Critical checkpoint conclusions:
- Goal/test: The PR targets incorrect results for nullable probe keys in mark null-aware left anti joins under disjunction. The added regression covers a nullable probe key, but only with
i.a IS NOT NULL, so it does not cover the build-side NULL path that remains active in this join mode. - Scope: The code change is small and focused, but incomplete because it does not clear the resumed build index when stopping on a NULL probe row.
- Concurrency/lifecycle/config/compatibility/persistence: Not applicable; this is local hash join probe state processing with no new config, persistence, or protocol/storage format change.
- Parallel paths: The reviewed path is specifically
NULL_AWARE_LEFT_ANTI_JOIN/NULL_AWARE_LEFT_SEMI_JOINwith other conjuncts and mark processing; non-other-conjunct paths are separate. - Special conditional checks: The changed NULL-probe check needs to maintain the downstream invariant that
probe_idxstill points to the NULL row andbuild_idxdoes not trigger continuation rewind. - Test coverage/results: Existing added results are deterministic, but missing a case where the subquery/build side contains a NULL mark key, which is exactly where
keep_null_key = truemakesbuild_idx_map[probe_idx]nonzero for NULL probes. - Observability/performance: No additional observability needed; no material performance concern found.
- Data correctness: Blocking issue found; affected queries can skip or reprocess rows when build-side NULL keys exist.
User focus: No additional user-provided review focus was specified.
| if constexpr (has_null_map) { | ||
| if (null_map[probe_idx]) { | ||
| probe_idx++; | ||
| break; |
There was a problem hiding this comment.
This still does not preserve the NULL probe row correctly when the build side contains NULL keys. For null-aware joins with other conjuncts, HashJoinBuild::run() sets keep_null_key = true, so build_idx_map[probe_idx] can be first[bucket_size] for a NULL probe. After this change the loop breaks with that nonzero build_idx, then line 492 applies probe_idx -= (build_idx != 0), returning the previous probe row instead of the NULL row. The caller therefore misses _process_probe_null_key(probe_idx) for this row, or resumes probing the previous row against the NULL bucket. A repro is the new nullable test shape without filtering NULLs out of the subquery build side, e.g. inserting an inner (1, NULL) and removing i.a IS NOT NULL. Please clear build_idx before breaking on a NULL probe, or otherwise skip the continuation rewind for this case, and add coverage with build-side NULL keys.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.
### Release note
Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.
### Check List (For Author)
- Test:
- Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut`
- Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction`
- Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
- Build: `./build.sh --be`
- Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
- Does this need documentation: No
519e75b to
81394a2
Compare
|
/review |
There was a problem hiding this comment.
Code review completed for PR 63767.
Summary opinion: no additional blocking issue found in this review pass. The existing inline thread about NULL probe continuation was treated as already-known context and not duplicated; the current diff now clears build_idx and picking_null_keys before breaking on a NULL probe row, which addresses the continuation-rewind shape described there.
Critical checkpoint conclusions:
- Goal and tests: The PR aims to preserve NULL probe rows for null-aware mark joins with other conjuncts. The code change matches that goal, and the added/reenabled regression cases cover nullable probe rows, disjunction filtering, and build-side NULL keys.
- Scope: The change is small and focused on the hash-table probing state transition plus targeted regression coverage.
- Concurrency: No new shared state, thread entry, dependency, lock, or atomic behavior is introduced.
- Lifecycle: No new static/global lifecycle or ownership concerns are introduced.
- Configuration: No configuration items are added.
- Compatibility: No protocol, storage format, function symbol, or rolling-upgrade compatibility concern is introduced.
- Parallel paths: The change applies to the specialized null-aware-with-other-conjuncts path that handles this state machine; other hash join probe paths do not share this exact NULL-probe handoff.
- Conditional checks: The new NULL-probe branch has a concrete reason: stop normal hash-chain probing and hand the same probe row to
_process_probe_null_keywithout rewinding to the previous row. - Test coverage: Regression coverage is relevant and deterministic with
order_qt/ORDER BY; no issue found in the new expected results. - Observability: No new observability appears necessary for this localized correctness fix.
- Transaction/persistence/data writes: Not applicable.
- FE/BE variable passing: Not applicable.
- Performance: The added branch assignments are constant-time and not a material hot-path regression.
User focus response: No additional user-provided review focus was specified.
|
run buildall |
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Correlated
NOT INsubqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key isNULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on theNULLrow so the null-aware join path can emit the correct mark value.Release note
Fix incorrect results for correlated
NOT INsubqueries combined with disjunctions.Check List (For Author)
doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOutdoris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunctionNOT IN+ORreproducer before and after the fix on a local FE/BE cluster./build.sh --be