Fix multisearch UDT type loss through UNION (#5145, #5146, #5147) by ahkcs · Pull Request #5154 · opensearch-project/sql

ahkcs · 2026-02-17T22:00:57Z

Summary

Override leastRestrictive() in OpenSearchTypeFactory to preserve UDT types (e.g., ExprTimestampType) through set operations (UNION ALL) used by multisearch
Without this fix, Calcite's default leastRestrictive() downgrades UDT-wrapped types to their underlying SQL type (e.g., VARCHAR), breaking downstream operators that rely on type checks
Add integration tests verifying span, bin, and bin+stats work correctly after multisearch

Issues Resolved

Resolves #5145, resolves #5146, resolves #5147

See investigation details: #5153

Root Cause

OpenSearchTypeFactory (extends JavaTypeFactoryImpl) did not override leastRestrictive(). When Calcite builds UNION ALL for multisearch, it calls leastRestrictive() on column types from both branches. Both branches have ExprTimestampType (a UDT backed by SqlTypeName.VARCHAR), but the default implementation creates a plain BasicSqlType(VARCHAR), losing the UDT wrapper.

This caused:

[BUG] PPL span-expression used after multisearch fails #5146: SpanFunction.SpanImplementor checks fieldType instanceof ExprSqlType → fails for plain VARCHAR → "Unsupported expr type: STRING"
[BUG] PPL bin used after multisearch has no effect #5147: BinnableField.isTimeBasedType() can't recognize plain VARCHAR as time-based → bin produces null timestamps
[BUG] PPL multisearch without further processing fails #5145: Same type loss causes plan preparation failures depending on downstream operators

Fix

When all input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT wrapper. Nullability is handled correctly — if any input is nullable, the result is nullable. Otherwise, delegates to the default Calcite implementation.

Test Plan

testMultisearchWithoutFurtherProcessing — basic multisearch returns all 51 rows
testMultisearchWithSpanExpression — stats avg(value) by span(@timestamp, 5m) after multisearch (36 rows verified)
testMultisearchBinTimestamp — bin @timestamp span=5m produces non-null timestamps (36 rows verified)
testMultisearchBinAndStats — bin + stats produces correct time buckets (36 rows verified)
Existing testMultisearchWithTimestampInterleaving schema updated from "string" to "timestamp" (correct behavior)
All existing multisearch integration tests pass

Signed-off-by: Kai Huang <ahkcs@amazon.com>

coderabbitai · 2026-02-17T22:01:20Z

📝 Walkthrough

Summary by CodeRabbit

Bug Fixes
- Preserve custom/user-defined data types across set operations (UNION/INTERSECT/EXCEPT).
- Corrected handling of timestamp-typed fields in multisearch and append scenarios.
Tests
- Added integration tests for multisearch: timestamp binning, span expressions, projection/aggregation ordering, and stats validations.
- Added unit tests validating least-restrictive behavior and nullability for user-defined types.

Walkthrough

Adds a leastRestrictive override to preserve OpenSearch UDT wrappers with proper nullability handling, updates PPL/multisearch integration tests (adds reproduction tests for issues #5145/#5146/#5147, with a duplicated test block), and adjusts expected timestamp schema in a PPL append test. Also adds unit tests for the new type behavior.

Changes

Cohort / File(s)	Summary
OpenSearch TypeFactory `core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java`	Added `public` @nullable `RelDataType leastRestrictive(List<RelDataType> types)` and `import org.checkerframework.checker.nullness.qual.Nullable;` to preserve OpenSearch UDT wrappers across set operations with nullability handling.
TypeFactory Unit Tests `core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`	New test class with multiple tests validating leastRestrictive behavior for UDT preservation, nullability propagation, mixed-type fallbacks, and delegation to superclass for plain types.
Multisearch Integration Tests (added/duplicated) `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java`	Changed expected `@timestamp` type from string to timestamp and added four reproduction tests for issues `#5145`, `#5146`, `#5147`: `testMultisearchWithoutFurtherProcessing`, `testMultisearchWithSpanExpression`, `testMultisearchBinTimestamp`, `testMultisearchBinAndStats`. The block of four tests is duplicated elsewhere in the same file (duplicate test declarations present).
PPL Append Test (schema expectation update) `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java`	Updated expected schema type for `birthdate` from string to timestamp in `testAppendSchemaMergeWithTimestampUDT`. No control-flow changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

bug, bugFix, calcite, PPL

Suggested reviewers

penghuo
anirudha
ps48
LantaoJin
kavithacm
derek-ho
joshuali925
mengweieric
Yury-Fridlyand
MaxKsyunz
GumpacG

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly addresses the main change: fixing UDT type loss through UNION in multisearch, with specific issue references (`#5145`, `#5146`, `#5147`).
Description check	✅ Passed	The description comprehensively covers the changeset: the override of leastRestrictive(), root cause analysis, fix behavior, and test plan with specific test cases.
Linked Issues check	✅ Passed	The PR addresses all three linked issues: `#5145` (multisearch without further processing), `#5146` (span expressions after multisearch), and `#5147` (bin command after multisearch) by preserving UDT types through UNION operations.
Out of Scope Changes check	✅ Passed	All changes are directly related to fixing the UDT type loss issue: leastRestrictive() override in OpenSearchTypeFactory, comprehensive unit tests for the new method, integration tests for multisearch scenarios, and schema type corrections in existing tests.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: Kai Huang <ahkcs@amazon.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java`:
- Around line 381-409: Update the JavaDoc for the public override
leastRestrictive in OpenSearchTypeFactory to include `@param` types (describe that
it is the list of input RelDataType), `@return` (the least restrictive
RelDataType, preserving UDT wrapper when applicable) and `@throws` (document any
runtime exceptions or pass-through from super.leastRestrictive if relevant);
then add unit tests covering: 1) UDT preservation—when all inputs are
AbstractExprRelDataType with the same getUdt() the result must keep that UDT
wrapper (use AbstractExprRelDataType test doubles), 2) mixed nullability—when
any input is nullable and the first is non-nullable the returned type becomes
nullable via createWithNullability(this, true), and 3) mixed UDT/non-UDT
inputs—when inputs include non-UDT types the method must fall back to
super.leastRestrictive; reference the method name leastRestrictive, the class
OpenSearchTypeFactory, and AbstractExprRelDataType in tests and assertions.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java`:
- Around line 371-542: The three tests testMultisearchWithSpanExpression,
testMultisearchBinTimestamp, and testMultisearchBinAndStats rely on implicit
result ordering and must be made deterministic: modify the query passed to
executeQuery to include an explicit sort (e.g., add "| sort `@timestamp` asc" or
"| sort <field> asc/desc" after the final pipeline stage) so results are
consistently ordered, then update the expected verifyDataRows rows(...)
sequences to match the new sorted order (or alternatively change verifyDataRows
calls to use an order-agnostic assertion helper); update only those tests and
keep the rest of the assertions intact.

coderabbitai · 2026-02-17T22:44:46Z

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java

+  /**
+   * Preserves OpenSearch UDT types through set operations (UNION, INTERSECT, EXCEPT). When all
+   * input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT
+   * wrapper instead of being downgraded to the underlying SQL type (e.g., VARCHAR). This is
+   * critical for operations like multisearch that use UNION ALL, where downstream operators (bin,
+   * span) rely on the UDT type to determine how to process the field.
+   */
+  @Override
+  public @Nullable RelDataType leastRestrictive(List<RelDataType> types) {
+    if (types.size() > 1) {
+      RelDataType first = types.get(0);
+      if (first instanceof AbstractExprRelDataType<?> firstUdt) {
+        boolean allSameUdt =
+            types.stream()
+                .allMatch(
+                    t ->
+                        t instanceof AbstractExprRelDataType<?> udt
+                            && udt.getUdt() == firstUdt.getUdt());
+        if (allSameUdt) {
+          boolean anyNullable = types.stream().anyMatch(RelDataType::isNullable);
+          if (anyNullable && !first.isNullable()) {
+            return firstUdt.createWithNullability(this, true);
+          }
+          return first;
+        }
+      }
+    }
+    return super.leastRestrictive(types);
+  }


⚠️ Potential issue | 🟠 Major

Add required JavaDoc tags and unit tests for the new leastRestrictive override.
This new public method is missing @param/@return/@throws tags, and the commit doesn’t add unit tests for the new behavior (UDT preservation, mixed nullability, mixed UDT/non‑UDT inputs). Please add unit tests and update the JavaDoc accordingly.

✍️ Add the required JavaDoc tags

- /** - * Preserves OpenSearch UDT types through set operations (UNION, INTERSECT, EXCEPT). When all - * input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT - * wrapper instead of being downgraded to the underlying SQL type (e.g., VARCHAR). This is - * critical for operations like multisearch that use UNION ALL, where downstream operators (bin, - * span) rely on the UDT type to determine how to process the field. - */ + /** + * Preserves OpenSearch UDT types through set operations (UNION, INTERSECT, EXCEPT). When all + * input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT + * wrapper instead of being downgraded to the underlying SQL type (e.g., VARCHAR). This is + * critical for operations like multisearch that use UNION ALL, where downstream operators (bin, + * span) rely on the UDT type to determine how to process the field. + * + * `@param` types candidate types to reconcile + * `@return` least restrictive type that preserves UDT when possible + * `@throws` NullPointerException if {`@code` types} is null + */

As per coding guidelines: "New functions MUST have unit tests in the same commit" and "Public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java` around lines 381 - 409, Update the JavaDoc for the public override leastRestrictive in OpenSearchTypeFactory to include `@param` types (describe that it is the list of input RelDataType), `@return` (the least restrictive RelDataType, preserving UDT wrapper when applicable) and `@throws` (document any runtime exceptions or pass-through from super.leastRestrictive if relevant); then add unit tests covering: 1) UDT preservation—when all inputs are AbstractExprRelDataType with the same getUdt() the result must keep that UDT wrapper (use AbstractExprRelDataType test doubles), 2) mixed nullability—when any input is nullable and the first is non-nullable the returned type becomes nullable via createWithNullability(this, true), and 3) mixed UDT/non-UDT inputs—when inputs include non-UDT types the method must fall back to super.leastRestrictive; reference the method name leastRestrictive, the class OpenSearchTypeFactory, and AbstractExprRelDataType in tests and assertions.

coderabbitai · 2026-02-17T22:44:46Z

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java

+  /** Reproduce #5146: span expression used after multisearch should work. */
+  @Test
+  public void testMultisearchWithSpanExpression() throws IOException {
+    JSONObject result =
+        executeQuery(
+            "| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
+                + " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
+                + " = \\\"E\\\"] | stats avg(value) by span(@timestamp, 5m)");
+
+    verifySchema(
+        result,
+        schema("avg(value)", null, "double"),
+        schema("span(@timestamp,5m)", null, "timestamp"));
+
+    // Each data point falls in its own 5-min bucket (all >5min apart), so avg = single value
+    // Category A: 26 rows from time_test_data, Category E: 10 rows from time_test_data2
+    verifyDataRows(
+        result,
+        // Category A (26 rows)
+        rows(8945.0, "2025-07-28 00:15:00"),
+        rows(6834.0, "2025-07-28 03:55:00"),
+        rows(6589.0, "2025-07-28 07:50:00"),
+        rows(9367.0, "2025-07-28 11:05:00"),
+        rows(9245.0, "2025-07-28 15:15:00"),
+        rows(8917.0, "2025-07-28 19:20:00"),
+        rows(8384.0, "2025-07-28 23:30:00"),
+        rows(8798.0, "2025-07-29 03:35:00"),
+        rows(9306.0, "2025-07-29 07:45:00"),
+        rows(8873.0, "2025-07-29 11:50:00"),
+        rows(8542.0, "2025-07-29 15:00:00"),
+        rows(9321.0, "2025-07-29 19:05:00"),
+        rows(8917.0, "2025-07-29 23:10:00"),
+        rows(8756.0, "2025-07-30 03:20:00"),
+        rows(9234.0, "2025-07-30 07:25:00"),
+        rows(8679.0, "2025-07-30 11:35:00"),
+        rows(8765.0, "2025-07-30 15:40:00"),
+        rows(9187.0, "2025-07-30 19:50:00"),
+        rows(8862.0, "2025-07-30 23:55:00"),
+        rows(8537.0, "2025-07-31 03:00:00"),
+        rows(9318.0, "2025-07-31 07:10:00"),
+        rows(8914.0, "2025-07-31 11:15:00"),
+        rows(8753.0, "2025-07-31 15:25:00"),
+        rows(9231.0, "2025-07-31 19:30:00"),
+        rows(8676.0, "2025-07-31 23:40:00"),
+        rows(8762.0, "2025-08-01 03:45:00"),
+        // Category E (10 rows)
+        rows(2001.0, "2025-08-01 04:00:00"),
+        rows(2003.0, "2025-08-01 01:00:00"),
+        rows(2005.0, "2025-07-31 20:45:00"),
+        rows(2007.0, "2025-07-31 16:00:00"),
+        rows(2009.0, "2025-07-31 12:30:00"),
+        rows(2011.0, "2025-07-31 08:00:00"),
+        rows(2013.0, "2025-07-31 04:30:00"),
+        rows(2015.0, "2025-07-31 01:00:00"),
+        rows(2017.0, "2025-07-30 21:30:00"),
+        rows(2019.0, "2025-07-30 18:00:00"));
+  }
+
+  /** Reproduce #5147: bin command after multisearch should produce non-null @timestamp. */
+  @Test
+  public void testMultisearchBinTimestamp() throws IOException {
+    JSONObject result =
+        executeQuery(
+            "| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
+                + " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
+                + " = \\\"E\\\"] | fields @timestamp, category, value | bin @timestamp span=5m");
+
+    verifySchema(
+        result,
+        schema("category", null, "string"),
+        schema("value", null, "int"),
+        schema("@timestamp", null, "timestamp"));
+
+    // bin floors @timestamp to 5-min boundaries; projectPlusOverriding moves @timestamp to end
+    // Category A: 26 rows from time_test_data, Category E: 10 rows from time_test_data2
+    verifyDataRows(
+        result,
+        // Category A (26 rows)
+        rows("A", 8945, "2025-07-28 00:15:00"),
+        rows("A", 6834, "2025-07-28 03:55:00"),
+        rows("A", 6589, "2025-07-28 07:50:00"),
+        rows("A", 9367, "2025-07-28 11:05:00"),
+        rows("A", 9245, "2025-07-28 15:15:00"),
+        rows("A", 8917, "2025-07-28 19:20:00"),
+        rows("A", 8384, "2025-07-28 23:30:00"),
+        rows("A", 8798, "2025-07-29 03:35:00"),
+        rows("A", 9306, "2025-07-29 07:45:00"),
+        rows("A", 8873, "2025-07-29 11:50:00"),
+        rows("A", 8542, "2025-07-29 15:00:00"),
+        rows("A", 9321, "2025-07-29 19:05:00"),
+        rows("A", 8917, "2025-07-29 23:10:00"),
+        rows("A", 8756, "2025-07-30 03:20:00"),
+        rows("A", 9234, "2025-07-30 07:25:00"),
+        rows("A", 8679, "2025-07-30 11:35:00"),
+        rows("A", 8765, "2025-07-30 15:40:00"),
+        rows("A", 9187, "2025-07-30 19:50:00"),
+        rows("A", 8862, "2025-07-30 23:55:00"),
+        rows("A", 8537, "2025-07-31 03:00:00"),
+        rows("A", 9318, "2025-07-31 07:10:00"),
+        rows("A", 8914, "2025-07-31 11:15:00"),
+        rows("A", 8753, "2025-07-31 15:25:00"),
+        rows("A", 9231, "2025-07-31 19:30:00"),
+        rows("A", 8676, "2025-07-31 23:40:00"),
+        rows("A", 8762, "2025-08-01 03:45:00"),
+        // Category E (10 rows)
+        rows("E", 2001, "2025-08-01 04:00:00"),
+        rows("E", 2003, "2025-08-01 01:00:00"),
+        rows("E", 2005, "2025-07-31 20:45:00"),
+        rows("E", 2007, "2025-07-31 16:00:00"),
+        rows("E", 2009, "2025-07-31 12:30:00"),
+        rows("E", 2011, "2025-07-31 08:00:00"),
+        rows("E", 2013, "2025-07-31 04:30:00"),
+        rows("E", 2015, "2025-07-31 01:00:00"),
+        rows("E", 2017, "2025-07-30 21:30:00"),
+        rows("E", 2019, "2025-07-30 18:00:00"));
+  }
+
+  /** Reproduce #5147 full pattern: bin + stats after multisearch. */
+  @Test
+  public void testMultisearchBinAndStats() throws IOException {
+    JSONObject result =
+        executeQuery(
+            "| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
+                + " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
+                + " = \\\"E\\\"] | bin @timestamp span=5m | stats avg(value) by @timestamp");
+
+    verifySchema(
+        result, schema("avg(value)", null, "double"), schema("@timestamp", null, "timestamp"));
+
+    // Each data point falls in its own 5-min bucket (all >5min apart), so avg = single value
+    // Category A: 26 rows from time_test_data, Category E: 10 rows from time_test_data2
+    verifyDataRows(
+        result,
+        // Category A (26 rows)
+        rows(8945.0, "2025-07-28 00:15:00"),
+        rows(6834.0, "2025-07-28 03:55:00"),
+        rows(6589.0, "2025-07-28 07:50:00"),
+        rows(9367.0, "2025-07-28 11:05:00"),
+        rows(9245.0, "2025-07-28 15:15:00"),
+        rows(8917.0, "2025-07-28 19:20:00"),
+        rows(8384.0, "2025-07-28 23:30:00"),
+        rows(8798.0, "2025-07-29 03:35:00"),
+        rows(9306.0, "2025-07-29 07:45:00"),
+        rows(8873.0, "2025-07-29 11:50:00"),
+        rows(8542.0, "2025-07-29 15:00:00"),
+        rows(9321.0, "2025-07-29 19:05:00"),
+        rows(8917.0, "2025-07-29 23:10:00"),
+        rows(8756.0, "2025-07-30 03:20:00"),
+        rows(9234.0, "2025-07-30 07:25:00"),
+        rows(8679.0, "2025-07-30 11:35:00"),
+        rows(8765.0, "2025-07-30 15:40:00"),
+        rows(9187.0, "2025-07-30 19:50:00"),
+        rows(8862.0, "2025-07-30 23:55:00"),
+        rows(8537.0, "2025-07-31 03:00:00"),
+        rows(9318.0, "2025-07-31 07:10:00"),
+        rows(8914.0, "2025-07-31 11:15:00"),
+        rows(8753.0, "2025-07-31 15:25:00"),
+        rows(9231.0, "2025-07-31 19:30:00"),
+        rows(8676.0, "2025-07-31 23:40:00"),
+        rows(8762.0, "2025-08-01 03:45:00"),
+        // Category E (10 rows)
+        rows(2001.0, "2025-08-01 04:00:00"),
+        rows(2003.0, "2025-08-01 01:00:00"),
+        rows(2005.0, "2025-07-31 20:45:00"),
+        rows(2007.0, "2025-07-31 16:00:00"),
+        rows(2009.0, "2025-07-31 12:30:00"),
+        rows(2011.0, "2025-07-31 08:00:00"),
+        rows(2013.0, "2025-07-31 04:30:00"),
+        rows(2015.0, "2025-07-31 01:00:00"),
+        rows(2017.0, "2025-07-30 21:30:00"),
+        rows(2019.0, "2025-07-30 18:00:00"));
+  }


⚠️ Potential issue | 🟡 Minor

Add deterministic ordering for the new multisearch reproduction tests.
These tests assert long ordered row lists, but the queries don’t include sort. Result order can vary, which risks flaky tests and breaks test‑independence requirements. Add explicit sorting (and update expected rows accordingly) or use order‑agnostic assertions.

🔧 Example (apply similarly to the other new tests)

- + " = \"E\"] | stats avg(value) by span(`@timestamp`, 5m)"); + + " = \"E\"] | stats avg(value) by span(`@timestamp`, 5m) | sort span(`@timestamp`, 5m)");

As per coding guidelines: "Tests must not rely on execution order; ensure test independence".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java` around lines 371 - 542, The three tests testMultisearchWithSpanExpression, testMultisearchBinTimestamp, and testMultisearchBinAndStats rely on implicit result ordering and must be made deterministic: modify the query passed to executeQuery to include an explicit sort (e.g., add "| sort `@timestamp` asc" or "| sort <field> asc/desc" after the final pipeline stage) so results are consistently ordered, then update the expected verifyDataRows rows(...) sequences to match the new sorted order (or alternatively change verifyDataRows calls to use an order-agnostic assertion helper); update only those tests and keep the rest of the assertions intact.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

coderabbitai

🧹 Nitpick comments (2)

core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java (2)

111-131: Consider adding boundary condition tests for empty list and single UDT.

The delegation tests cover single plain type and two plain types, but the boundary cases for empty input and single UDT are not explicitly tested.

♻️ Suggested additional boundary tests

`@Test`
public void testLeastRestrictiveDelegatesToSuperForEmptyList() {
  RelDataType result = TYPE_FACTORY.leastRestrictive(List.of());
  
  // Empty list delegates to super.leastRestrictive which returns null
  assertNull(result);
}

`@Test`
public void testLeastRestrictiveDelegatesToSuperForSingleUdt() {
  RelDataType single = TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP);
  
  RelDataType result = TYPE_FACTORY.leastRestrictive(List.of(single));
  
  // Single type (even UDT) delegates to super due to size check
  assertNotNull(result);
  assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
}

As per coding guidelines: "Include boundary condition tests (min/max values, empty inputs) for all new functions".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`
around lines 111 - 131, Add two boundary unit tests in OpenSearchTypeFactoryTest
to cover empty input and single UDT cases for TYPE_FACTORY.leastRestrictive: add
a test that calls TYPE_FACTORY.leastRestrictive(List.of()) and asserts the
result is null (delegates to super for empty list), and add a test that creates
a single UDT (e.g., TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP)), calls
leastRestrictive with that single UDT and asserts the returned type is not null
and has the expected SqlTypeName (e.g., VARCHAR) to ensure single-UDT delegation
behavior is validated.

85-109: Strengthen assertions in fallback tests to avoid silent passes.

The conditional assertions on lines 93-95 and 106-108 will silently pass if result is null, potentially masking regressions. Consider using explicit assertions for the expected behavior.

♻️ Proposed fix to strengthen assertions

  `@Test`
  public void testLeastRestrictiveFallsBackForMixedUdtAndNonUdt() {
    RelDataType udt = TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP);
    RelDataType plain = TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR);

    RelDataType result = TYPE_FACTORY.leastRestrictive(List.of(udt, plain));

-   // Falls back to super.leastRestrictive which may return a plain type or null
-   if (result != null) {
-     assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
-   }
+   // Falls back to super.leastRestrictive — EXPR_TIMESTAMP has VARCHAR underlying type
+   assertNotNull(result, "Expected non-null result for compatible types");
+   assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
  }

  `@Test`
  public void testLeastRestrictiveFallsBackForDifferentUdts() {
    RelDataType timestamp = TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP);
    RelDataType date = TYPE_FACTORY.createUDT(ExprUDT.EXPR_DATE);

    RelDataType result = TYPE_FACTORY.leastRestrictive(List.of(timestamp, date));

-   // Different UDTs — falls back to super.leastRestrictive
-   if (result != null) {
-     assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
-   }
+   // Different UDTs — falls back to super.leastRestrictive (both have VARCHAR underlying)
+   assertNotNull(result, "Expected non-null result for VARCHAR-based UDTs");
+   assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
  }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`
around lines 85 - 109, Replace the conditional "if (result != null) { ... }"
checks in testLeastRestrictiveFallsBackForMixedUdtAndNonUdt and
testLeastRestrictiveFallsBackForDifferentUdts with explicit assertions:
assertNotNull(result) immediately after calling
TYPE_FACTORY.leastRestrictive(...) and then assertEquals(SqlTypeName.VARCHAR,
result.getSqlTypeName()); this ensures the RelDataType result is non-null and
its SqlTypeName is verified rather than silently passing when result is null.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`:
- Around line 111-131: Add two boundary unit tests in OpenSearchTypeFactoryTest
to cover empty input and single UDT cases for TYPE_FACTORY.leastRestrictive: add
a test that calls TYPE_FACTORY.leastRestrictive(List.of()) and asserts the
result is null (delegates to super for empty list), and add a test that creates
a single UDT (e.g., TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP)), calls
leastRestrictive with that single UDT and asserts the returned type is not null
and has the expected SqlTypeName (e.g., VARCHAR) to ensure single-UDT delegation
behavior is validated.
- Around line 85-109: Replace the conditional "if (result != null) { ... }"
checks in testLeastRestrictiveFallsBackForMixedUdtAndNonUdt and
testLeastRestrictiveFallsBackForDifferentUdts with explicit assertions:
assertNotNull(result) immediately after calling
TYPE_FACTORY.leastRestrictive(...) and then assertEquals(SqlTypeName.VARCHAR,
result.getSqlTypeName()); this ensures the RelDataType result is non-null and
its SqlTypeName is verified rather than silently passing when result is null.

opensearch-trigger-bot · 2026-03-04T15:25:24Z

This PR is stalled because it has been open for 2 weeks with no activity.

Swiddis

How are we confident that this is always correct outside of the multisearch special case?

My understanding is leastRestrictive is assumed to convert types to the plain SQL types, so keeping around UDTs in some circumstances might quietly break other code's assumptions. But maybe we only hit this branch with multiple types if we're in an expression that needs it, or maybe we don't actually have such an assumption from any caller code.

Swiddis · 2026-03-04T20:11:32Z

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java

+        if (allSameUdt) {
+          boolean anyNullable = types.stream().anyMatch(RelDataType::isNullable);
+          if (anyNullable && !first.isNullable()) {
+            return firstUdt.createWithNullability(this, true);


nit: Could do this in a single pass with a plain for loop instead of splitting the match checks and null checks.

@Override public @Nullable RelDataType leastRestrictive(List<RelDataType> types) { if (types.size() > 1) { RelDataType first = types.get(0); if (first instanceof AbstractExprRelDataType<?> firstUdt) { boolean anyNullable = false; for (RelDataType t : types) { if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) { anyNullable |= t.isNullable(); } else { return super.leastRestrictive(types); // If we hit a non-match, fallback } } if (anyNullable && !first.isNullable()) { return firstUdt.createWithNullability(this, true); } return first; } } return super.leastRestrictive(types); }

Swiddis · 2026-03-04T20:16:50Z

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java

+        rows("E", 2013, "2025-07-31 04:30:00"),
+        rows("E", 2015, "2025-07-31 01:00:00"),
+        rows("E", 2017, "2025-07-30 21:30:00"),
+        rows("E", 2019, "2025-07-30 18:00:00"));


these test assertions are pretty long, is that the minimal reproducing example? would 1-day intervals be clearer?

ahkcs · 2026-03-11T23:32:50Z

How are we confident that this is always correct outside of the multisearch special case?

My understanding is leastRestrictive is assumed to convert types to the plain SQL types, so keeping around UDTs in some circumstances might quietly break other code's assumptions. But maybe we only hit this branch with multiple types if we're in an expression that needs it, or maybe we don't actually have such an assumption from any caller code.

The leastRestrictive override only activates when all input types are the same AbstractExprRelDataType with the same UDT — in that case, preserving the UDT is the semantically correct result (it's the least restrictive type that encompasses all inputs). When any type differs, it falls back to Calcite's default implementation. This is consistent with how other type systems handle UDTs in union operations.

github-actions · 2026-03-12T00:05:43Z

PR Reviewer Guide 🔍

(Review updated until commit `6a30e67`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Override leastRestrictive to preserve UDT types with unit tests Relevant files: core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java Sub-PR theme: Integration tests verifying UDT preservation through multisearch and append Relevant files: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java
⚡ Recommended focus areas for review Single-Type Bypass The `leastRestrictive` override only applies when `types.size() > 1`. For a single-element list where the element is a UDT, the method delegates to `super.leastRestrictive(types)`, which may downgrade the UDT to its underlying SQL type. Consider handling the single-element case explicitly to preserve the UDT wrapper. if (types.size() > 1) { RelDataType first = types.get(0); if (first instanceof AbstractExprRelDataType<?> firstUdt) { boolean anyNullable = false; for (RelDataType t : types) { if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) { anyNullable \|= t.isNullable(); } else { return super.leastRestrictive(types); } } if (anyNullable && !first.isNullable()) { return firstUdt.createWithNullability(this, true); } return first; } } return super.leastRestrictive(types); Nullability Logic When the first type is already nullable (`first.isNullable() == true`) and all subsequent types are non-nullable, `anyNullable` will be `true` but the condition `anyNullable && !first.isNullable()` is false, so `first` is returned directly. This is correct. However, if the first type is nullable and a later type is also nullable, the same path returns `first` without creating a new instance — this is fine since `first` is already nullable. The logic is correct but subtle; a comment clarifying the two branches would improve readability. boolean anyNullable = false; for (RelDataType t : types) { if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) { anyNullable \|= t.isNullable(); } else { return super.leastRestrictive(types); } } if (anyNullable && !first.isNullable()) { return firstUdt.createWithNullability(this, true); } return first; Hardcoded Row Counts Tests like `testMultisearchWithoutFurtherProcessing` and `testMultisearchBinTimestamp` assert exact row counts (51 and 36) that depend on the specific test data in the index. If test data changes, these assertions will silently break. Consider verifying row counts via `verifyDataRows` or documenting the data dependency more explicitly. assertEquals(51, result.getInt("total")); } /** Reproduce #5146: span expression used after multisearch should work. / @Test public void testMultisearchWithSpanExpression() throws IOException { JSONObject result = executeQuery( "\| multisearch [search source=opensearch-sql_test_index_time_data \| where category =" + " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 \| where category" + " = \\\"E\\\"] \| stats count() by span(@timestamp, 1d)"); verifySchema( result, schema("count()", null, "bigint"), schema("span(@timestamp,1d)", null, "timestamp")); // Category A: 26 rows spanning Jul 28 – Aug 1; Category E: 10 rows spanning Jul 30 – Aug 1 verifyDataRows( result, rows(7L, "2025-07-28 00:00:00"), rows(6L, "2025-07-29 00:00:00"), rows(8L, "2025-07-30 00:00:00"), rows(12L, "2025-07-31 00:00:00"), rows(3L, "2025-08-01 00:00:00")); } /* Reproduce #5147: bin command after multisearch should produce non-null @timestamp. */ @Test public void testMultisearchBinTimestamp() throws IOException { JSONObject result = executeQuery( "\| multisearch [search source=opensearch-sql_test_index_time_data \| where category =" + " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 \| where category" + " = \\\"E\\\"] \| fields @timestamp, category, value \| bin @timestamp span=1d"); verifySchema( result, schema("category", null, "string"), schema("value", null, "int"), schema("@timestamp", null, "timestamp")); // bin floors @timestamp to 1-day boundaries; 26 A-rows + 10 E-rows = 36 total assertEquals(36, result.getInt("total"));

Refactor leastRestrictive to use a single for-loop instead of two stream passes. Simplify multisearch integration tests by using 1-day intervals and count() aggregation to reduce verbose 36-row assertions to 5 rows. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-03-12T00:06:09Z

PR Code Suggestions ✨

Latest suggestions up to 6a30e67
Explore these optional code suggestions:

Category	Suggestion	Impact
General	Use equals instead of reference equality for UDT comparison The UDT identity comparison uses `==` (reference equality) instead of `.equals()`. If `getUdt()` returns enum values this is fine, but if it can return non-enum objects, two logically equal UDTs could be treated as different, causing an incorrect fallback to `super.leastRestrictive`. Using `.equals()` is safer and more robust. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [403] -if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) { +if (t instanceof AbstractExprRelDataType<?> udt && firstUdt.getUdt().equals(udt.getUdt())) { Suggestion importance[1-10]: 5 __ Why: Using `==` for `getUdt()` comparison works correctly if `ExprUDT` is an enum (which is typical for such type factories), but using `.equals()` is more robust and defensive. This is a minor safety improvement that prevents potential bugs if the UDT type ever changes from an enum to a non-enum object.	Low
Possible issue	Fix loop to skip redundant self-comparison The loop iterates over all types including `first` itself, which means `anyNullable` will be set to `first.isNullable()` on the first iteration, potentially masking the actual nullability from other types. The loop should start from index 1 to only check the remaining types against `first`, making the logic cleaner and more correct. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [398-415] if (types.size() > 1) { RelDataType first = types.get(0); if (first instanceof AbstractExprRelDataType<?> firstUdt) { - boolean anyNullable = false; - for (RelDataType t : types) { + boolean anyNullable = first.isNullable(); + for (int i = 1; i < types.size(); i++) { + RelDataType t = types.get(i); if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) { anyNullable \|= t.isNullable(); } else { return super.leastRestrictive(types); } } if (anyNullable && !first.isNullable()) { return firstUdt.createWithNullability(this, true); } return first; } } Suggestion importance[1-10]: 4 __ Why: The current loop includes `first` itself, but since `anyNullable` is initialized to `false` and `first` is an `AbstractExprRelDataType` with the same UDT as itself, the first iteration just sets `anyNullable \|= first.isNullable()`. The final check `if (anyNullable && !first.isNullable())` correctly handles this case — if `first` is nullable, `anyNullable` is true but the condition `!first.isNullable()` is false, so no new object is created. The logic is functionally correct as-is, but starting from index 1 is a minor clarity/efficiency improvement.	Low

Previous suggestions

Suggestions up to commit 533fbdf

Category	Suggestion	Impact
General	Avoid redundant self-check in loop iteration The loop iterates over all types including `first` itself, so `anyNullable` will be set to `first.isNullable()` on the first iteration. This means if `first` is nullable, the result will always be nullable even if all other types are non-nullable, which is correct, but the check `anyNullable && !first.isNullable()` will never be true when `first` is nullable. However, if `first` is non-nullable and a later type is nullable, `anyNullable` becomes true and the condition fires correctly. The real issue is that the loop redundantly re-checks `first` against itself. Consider starting the loop from index 1 to avoid the redundant self-check and make the logic clearer. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [398-414] if (types.size() > 1) { RelDataType first = types.get(0); if (first instanceof AbstractExprRelDataType<?> firstUdt) { - boolean anyNullable = false; - for (RelDataType t : types) { + boolean anyNullable = first.isNullable(); + for (int i = 1; i < types.size(); i++) { + RelDataType t = types.get(i); if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) { anyNullable \|= t.isNullable(); } else { return super.leastRestrictive(types); } } if (anyNullable && !first.isNullable()) { return firstUdt.createWithNullability(this, true); } return first; } } Suggestion importance[1-10]: 4 __ Why: The suggestion is logically valid — starting the loop from index 1 avoids re-checking `first` against itself and makes the intent clearer. However, the current code is functionally correct (the self-check on `first` is harmless since `first.getUdt() == first.getUdt()` is always true), so this is a minor readability/clarity improvement rather than a bug fix.	Low
General	Use equals instead of reference equality for UDT comparison Using `==` for comparing `getUdt()` values relies on reference equality, which works for enums but may be fragile if `getUdt()` ever returns non-enum objects. Using `.equals()` would be safer and more idiomatic for general object comparison, ensuring correctness even if the UDT type changes in the future. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [403] -if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) { +if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt().equals(firstUdt.getUdt())) { Suggestion importance[1-10]: 3 __ Why: The suggestion to use `.equals()` over `==` is generally good practice, but since `getUdt()` returns an enum type (as evidenced by `ExprUDT` usage in tests), `==` is perfectly correct and idiomatic for enum comparison in Java. The improvement is marginal and mostly defensive for hypothetical future changes.	Low

github-actions · 2026-03-12T00:09:13Z

Persistent review updated to latest commit 6a30e67

multisearch fix

29fa3a0

Signed-off-by: Kai Huang <ahkcs@amazon.com>

fix IT

81c0d3f

Signed-off-by: Kai Huang <ahkcs@amazon.com>

coderabbitai bot reviewed Feb 17, 2026

View reviewed changes

add UT

bb026a5

Signed-off-by: Kai Huang <ahkcs@amazon.com>

coderabbitai bot reviewed Feb 17, 2026

View reviewed changes

VladNastase mentioned this pull request Mar 1, 2026

Add OpenSearch PPL support SigmaHQ/pySigma-backend-opensearch#72

Open

opensearch-trigger-bot bot added the stalled label Mar 4, 2026

Swiddis previously approved these changes Mar 4, 2026

View reviewed changes

Swiddis reviewed Mar 4, 2026

View reviewed changes

opensearch-trigger-bot bot removed the stalled label Mar 5, 2026

ahkcs added the bugFix label Mar 11, 2026

ahkcs dismissed Swiddis’s stale review via 533fbdf March 12, 2026 00:04

ahkcs force-pushed the fix/multisearch-issues-5145-5146-5147 branch from 533fbdf to 6a30e67 Compare March 12, 2026 00:07

Conversation

ahkcs commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issues Resolved

Root Cause

Fix

Test Plan

Uh oh!

coderabbitai bot commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

opensearch-trigger-bot bot commented Mar 4, 2026

Uh oh!

Swiddis left a comment

Choose a reason for hiding this comment

Uh oh!

Swiddis Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Swiddis Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs commented Mar 11, 2026

Uh oh!

github-actions bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 6a30e67)

Uh oh!

github-actions bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ahkcs commented Feb 17, 2026 •

edited

Loading

coderabbitai bot commented Feb 17, 2026 •

edited

Loading

ahkcs Mar 12, 2026 •

edited

Loading

github-actions bot commented Mar 12, 2026 •

edited

Loading

(Review updated until commit `6a30e67`)

github-actions bot commented Mar 12, 2026 •

edited

Loading