Skip to content

[fix](be) Preserve null probe rows in mark anti join#63767

Open
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix/subquery-not-in-or
Open

[fix](be) Preserve null probe rows in mark anti join#63767
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix/subquery-not-in-or

Conversation

@mrhhsg
Copy link
Copy Markdown
Member

@mrhhsg mrhhsg commented May 27, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.

Release note

Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.

Check List (For Author)

  • Test:
    • Regression test: doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut
    • Regression test: doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction
    • Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
    • Build: ./build.sh --be
  • Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
  • Does this need documentation: No

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 27, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness issue in the null-aware mark anti join fix. The change preserves NULL probe rows when there are no build-side NULL keys, but it regresses the same path when the build hash table keeps NULL keys, which this join mode explicitly does for null-aware joins with other conjuncts.

Critical checkpoint conclusions:

  • Goal/test: The PR targets incorrect results for nullable probe keys in mark null-aware left anti joins under disjunction. The added regression covers a nullable probe key, but only with i.a IS NOT NULL, so it does not cover the build-side NULL path that remains active in this join mode.
  • Scope: The code change is small and focused, but incomplete because it does not clear the resumed build index when stopping on a NULL probe row.
  • Concurrency/lifecycle/config/compatibility/persistence: Not applicable; this is local hash join probe state processing with no new config, persistence, or protocol/storage format change.
  • Parallel paths: The reviewed path is specifically NULL_AWARE_LEFT_ANTI_JOIN/NULL_AWARE_LEFT_SEMI_JOIN with other conjuncts and mark processing; non-other-conjunct paths are separate.
  • Special conditional checks: The changed NULL-probe check needs to maintain the downstream invariant that probe_idx still points to the NULL row and build_idx does not trigger continuation rewind.
  • Test coverage/results: Existing added results are deterministic, but missing a case where the subquery/build side contains a NULL mark key, which is exactly where keep_null_key = true makes build_idx_map[probe_idx] nonzero for NULL probes.
  • Observability/performance: No additional observability needed; no material performance concern found.
  • Data correctness: Blocking issue found; affected queries can skip or reprocess rows when build-side NULL keys exist.

User focus: No additional user-provided review focus was specified.

if constexpr (has_null_map) {
if (null_map[probe_idx]) {
probe_idx++;
break;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still does not preserve the NULL probe row correctly when the build side contains NULL keys. For null-aware joins with other conjuncts, HashJoinBuild::run() sets keep_null_key = true, so build_idx_map[probe_idx] can be first[bucket_size] for a NULL probe. After this change the loop breaks with that nonzero build_idx, then line 492 applies probe_idx -= (build_idx != 0), returning the previous probe row instead of the NULL row. The caller therefore misses _process_probe_null_key(probe_idx) for this row, or resumes probing the previous row against the NULL bucket. A repro is the new nullable test shape without filtering NULLs out of the subquery build side, e.g. inserting an inner (1, NULL) and removing i.a IS NOT NULL. Please clear build_idx before breaking on a NULL probe, or otherwise skip the continuation rewind for this case, and add coverage with build-side NULL keys.

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.

### Release note

Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.

### Check List (For Author)

- Test:
    - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut`
    - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction`
    - Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
    - Build: `./build.sh --be`
- Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
- Does this need documentation: No
@mrhhsg mrhhsg force-pushed the fix/subquery-not-in-or branch from 519e75b to 81394a2 Compare May 27, 2026 22:51
@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 28, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review completed for PR 63767.

Summary opinion: no additional blocking issue found in this review pass. The existing inline thread about NULL probe continuation was treated as already-known context and not duplicated; the current diff now clears build_idx and picking_null_keys before breaking on a NULL probe row, which addresses the continuation-rewind shape described there.

Critical checkpoint conclusions:

  • Goal and tests: The PR aims to preserve NULL probe rows for null-aware mark joins with other conjuncts. The code change matches that goal, and the added/reenabled regression cases cover nullable probe rows, disjunction filtering, and build-side NULL keys.
  • Scope: The change is small and focused on the hash-table probing state transition plus targeted regression coverage.
  • Concurrency: No new shared state, thread entry, dependency, lock, or atomic behavior is introduced.
  • Lifecycle: No new static/global lifecycle or ownership concerns are introduced.
  • Configuration: No configuration items are added.
  • Compatibility: No protocol, storage format, function symbol, or rolling-upgrade compatibility concern is introduced.
  • Parallel paths: The change applies to the specialized null-aware-with-other-conjuncts path that handles this state machine; other hash join probe paths do not share this exact NULL-probe handoff.
  • Conditional checks: The new NULL-probe branch has a concrete reason: stop normal hash-chain probing and hand the same probe row to _process_probe_null_key without rewinding to the previous row.
  • Test coverage: Regression coverage is relevant and deterministic with order_qt/ORDER BY; no issue found in the new expected results.
  • Observability: No new observability appears necessary for this localized correctness fix.
  • Transaction/persistence/data writes: Not applicable.
  • FE/BE variable passing: Not applicable.
  • Performance: The added branch assignments are constant-time and not a material hot-path regression.

User focus response: No additional user-provided review focus was specified.

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 28, 2026

run buildall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants