Skip to content

Fix multisearch UDT type loss through UNION (#5145, #5146, #5147)#5154

Open
ahkcs wants to merge 4 commits intoopensearch-project:mainfrom
ahkcs:fix/multisearch-issues-5145-5146-5147
Open

Fix multisearch UDT type loss through UNION (#5145, #5146, #5147)#5154
ahkcs wants to merge 4 commits intoopensearch-project:mainfrom
ahkcs:fix/multisearch-issues-5145-5146-5147

Conversation

@ahkcs
Copy link
Collaborator

@ahkcs ahkcs commented Feb 17, 2026

Summary

  • Override leastRestrictive() in OpenSearchTypeFactory to preserve UDT types (e.g., ExprTimestampType) through set operations (UNION ALL) used by multisearch
  • Without this fix, Calcite's default leastRestrictive() downgrades UDT-wrapped types to their underlying SQL type (e.g., VARCHAR), breaking downstream operators that rely on type checks
  • Add integration tests verifying span, bin, and bin+stats work correctly after multisearch

Issues Resolved

Resolves #5145, resolves #5146, resolves #5147

See investigation details: #5153

Root Cause

OpenSearchTypeFactory (extends JavaTypeFactoryImpl) did not override leastRestrictive(). When Calcite builds UNION ALL for multisearch, it calls leastRestrictive() on column types from both branches. Both branches have ExprTimestampType (a UDT backed by SqlTypeName.VARCHAR), but the default implementation creates a plain BasicSqlType(VARCHAR), losing the UDT wrapper.

This caused:

Fix

When all input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT wrapper. Nullability is handled correctly — if any input is nullable, the result is nullable. Otherwise, delegates to the default Calcite implementation.

Test Plan

  • testMultisearchWithoutFurtherProcessing — basic multisearch returns all 51 rows
  • testMultisearchWithSpanExpressionstats avg(value) by span(@timestamp, 5m) after multisearch (36 rows verified)
  • testMultisearchBinTimestampbin @timestamp span=5m produces non-null timestamps (36 rows verified)
  • testMultisearchBinAndStatsbin + stats produces correct time buckets (36 rows verified)
  • Existing testMultisearchWithTimestampInterleaving schema updated from "string" to "timestamp" (correct behavior)
  • All existing multisearch integration tests pass

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Preserve custom/user-defined data types across set operations (UNION/INTERSECT/EXCEPT).
    • Corrected handling of timestamp-typed fields in multisearch and append scenarios.
  • Tests

    • Added integration tests for multisearch: timestamp binning, span expressions, projection/aggregation ordering, and stats validations.
    • Added unit tests validating least-restrictive behavior and nullability for user-defined types.

Walkthrough

Adds a leastRestrictive override to preserve OpenSearch UDT wrappers with proper nullability handling, updates PPL/multisearch integration tests (adds reproduction tests for issues #5145/#5146/#5147, with a duplicated test block), and adjusts expected timestamp schema in a PPL append test. Also adds unit tests for the new type behavior.

Changes

Cohort / File(s) Summary
OpenSearch TypeFactory
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java
Added public @nullable RelDataType leastRestrictive(List<RelDataType> types) and import org.checkerframework.checker.nullness.qual.Nullable; to preserve OpenSearch UDT wrappers across set operations with nullability handling.
TypeFactory Unit Tests
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java
New test class with multiple tests validating leastRestrictive behavior for UDT preservation, nullability propagation, mixed-type fallbacks, and delegation to superclass for plain types.
Multisearch Integration Tests (added/duplicated)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java
Changed expected @timestamp type from string to timestamp and added four reproduction tests for issues #5145, #5146, #5147: testMultisearchWithoutFurtherProcessing, testMultisearchWithSpanExpression, testMultisearchBinTimestamp, testMultisearchBinAndStats. The block of four tests is duplicated elsewhere in the same file (duplicate test declarations present).
PPL Append Test (schema expectation update)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java
Updated expected schema type for birthdate from string to timestamp in testAppendSchemaMergeWithTimestampUDT. No control-flow changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

bug, bugFix, calcite, PPL

Suggested reviewers

  • penghuo
  • anirudha
  • ps48
  • LantaoJin
  • kavithacm
  • derek-ho
  • joshuali925
  • mengweieric
  • Yury-Fridlyand
  • MaxKsyunz
  • GumpacG
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the main change: fixing UDT type loss through UNION in multisearch, with specific issue references (#5145, #5146, #5147).
Description check ✅ Passed The description comprehensively covers the changeset: the override of leastRestrictive(), root cause analysis, fix behavior, and test plan with specific test cases.
Linked Issues check ✅ Passed The PR addresses all three linked issues: #5145 (multisearch without further processing), #5146 (span expressions after multisearch), and #5147 (bin command after multisearch) by preserving UDT types through UNION operations.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the UDT type loss issue: leastRestrictive() override in OpenSearchTypeFactory, comprehensive unit tests for the new method, integration tests for multisearch scenarios, and schema type corrections in existing tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java`:
- Around line 381-409: Update the JavaDoc for the public override
leastRestrictive in OpenSearchTypeFactory to include `@param` types (describe that
it is the list of input RelDataType), `@return` (the least restrictive
RelDataType, preserving UDT wrapper when applicable) and `@throws` (document any
runtime exceptions or pass-through from super.leastRestrictive if relevant);
then add unit tests covering: 1) UDT preservation—when all inputs are
AbstractExprRelDataType with the same getUdt() the result must keep that UDT
wrapper (use AbstractExprRelDataType test doubles), 2) mixed nullability—when
any input is nullable and the first is non-nullable the returned type becomes
nullable via createWithNullability(this, true), and 3) mixed UDT/non-UDT
inputs—when inputs include non-UDT types the method must fall back to
super.leastRestrictive; reference the method name leastRestrictive, the class
OpenSearchTypeFactory, and AbstractExprRelDataType in tests and assertions.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java`:
- Around line 371-542: The three tests testMultisearchWithSpanExpression,
testMultisearchBinTimestamp, and testMultisearchBinAndStats rely on implicit
result ordering and must be made deterministic: modify the query passed to
executeQuery to include an explicit sort (e.g., add "| sort `@timestamp` asc" or
"| sort <field> asc/desc" after the final pipeline stage) so results are
consistently ordered, then update the expected verifyDataRows rows(...)
sequences to match the new sorted order (or alternatively change verifyDataRows
calls to use an order-agnostic assertion helper); update only those tests and
keep the rest of the assertions intact.

Comment on lines +381 to +409
/**
* Preserves OpenSearch UDT types through set operations (UNION, INTERSECT, EXCEPT). When all
* input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT
* wrapper instead of being downgraded to the underlying SQL type (e.g., VARCHAR). This is
* critical for operations like multisearch that use UNION ALL, where downstream operators (bin,
* span) rely on the UDT type to determine how to process the field.
*/
@Override
public @Nullable RelDataType leastRestrictive(List<RelDataType> types) {
if (types.size() > 1) {
RelDataType first = types.get(0);
if (first instanceof AbstractExprRelDataType<?> firstUdt) {
boolean allSameUdt =
types.stream()
.allMatch(
t ->
t instanceof AbstractExprRelDataType<?> udt
&& udt.getUdt() == firstUdt.getUdt());
if (allSameUdt) {
boolean anyNullable = types.stream().anyMatch(RelDataType::isNullable);
if (anyNullable && !first.isNullable()) {
return firstUdt.createWithNullability(this, true);
}
return first;
}
}
}
return super.leastRestrictive(types);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add required JavaDoc tags and unit tests for the new leastRestrictive override.
This new public method is missing @param/@return/@throws tags, and the commit doesn’t add unit tests for the new behavior (UDT preservation, mixed nullability, mixed UDT/non‑UDT inputs). Please add unit tests and update the JavaDoc accordingly.

✍️ Add the required JavaDoc tags
-  /**
-   * Preserves OpenSearch UDT types through set operations (UNION, INTERSECT, EXCEPT). When all
-   * input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT
-   * wrapper instead of being downgraded to the underlying SQL type (e.g., VARCHAR). This is
-   * critical for operations like multisearch that use UNION ALL, where downstream operators (bin,
-   * span) rely on the UDT type to determine how to process the field.
-   */
+  /**
+   * Preserves OpenSearch UDT types through set operations (UNION, INTERSECT, EXCEPT). When all
+   * input types share the same UDT (e.g., all are EXPR_TIMESTAMP), the result retains the UDT
+   * wrapper instead of being downgraded to the underlying SQL type (e.g., VARCHAR). This is
+   * critical for operations like multisearch that use UNION ALL, where downstream operators (bin,
+   * span) rely on the UDT type to determine how to process the field.
+   *
+   * `@param` types candidate types to reconcile
+   * `@return` least restrictive type that preserves UDT when possible
+   * `@throws` NullPointerException if {`@code` types} is null
+   */
As per coding guidelines: "New functions MUST have unit tests in the same commit" and "Public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java`
around lines 381 - 409, Update the JavaDoc for the public override
leastRestrictive in OpenSearchTypeFactory to include `@param` types (describe that
it is the list of input RelDataType), `@return` (the least restrictive
RelDataType, preserving UDT wrapper when applicable) and `@throws` (document any
runtime exceptions or pass-through from super.leastRestrictive if relevant);
then add unit tests covering: 1) UDT preservation—when all inputs are
AbstractExprRelDataType with the same getUdt() the result must keep that UDT
wrapper (use AbstractExprRelDataType test doubles), 2) mixed nullability—when
any input is nullable and the first is non-nullable the returned type becomes
nullable via createWithNullability(this, true), and 3) mixed UDT/non-UDT
inputs—when inputs include non-UDT types the method must fall back to
super.leastRestrictive; reference the method name leastRestrictive, the class
OpenSearchTypeFactory, and AbstractExprRelDataType in tests and assertions.

Comment on lines +371 to +542
/** Reproduce #5146: span expression used after multisearch should work. */
@Test
public void testMultisearchWithSpanExpression() throws IOException {
JSONObject result =
executeQuery(
"| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
+ " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
+ " = \\\"E\\\"] | stats avg(value) by span(@timestamp, 5m)");

verifySchema(
result,
schema("avg(value)", null, "double"),
schema("span(@timestamp,5m)", null, "timestamp"));

// Each data point falls in its own 5-min bucket (all >5min apart), so avg = single value
// Category A: 26 rows from time_test_data, Category E: 10 rows from time_test_data2
verifyDataRows(
result,
// Category A (26 rows)
rows(8945.0, "2025-07-28 00:15:00"),
rows(6834.0, "2025-07-28 03:55:00"),
rows(6589.0, "2025-07-28 07:50:00"),
rows(9367.0, "2025-07-28 11:05:00"),
rows(9245.0, "2025-07-28 15:15:00"),
rows(8917.0, "2025-07-28 19:20:00"),
rows(8384.0, "2025-07-28 23:30:00"),
rows(8798.0, "2025-07-29 03:35:00"),
rows(9306.0, "2025-07-29 07:45:00"),
rows(8873.0, "2025-07-29 11:50:00"),
rows(8542.0, "2025-07-29 15:00:00"),
rows(9321.0, "2025-07-29 19:05:00"),
rows(8917.0, "2025-07-29 23:10:00"),
rows(8756.0, "2025-07-30 03:20:00"),
rows(9234.0, "2025-07-30 07:25:00"),
rows(8679.0, "2025-07-30 11:35:00"),
rows(8765.0, "2025-07-30 15:40:00"),
rows(9187.0, "2025-07-30 19:50:00"),
rows(8862.0, "2025-07-30 23:55:00"),
rows(8537.0, "2025-07-31 03:00:00"),
rows(9318.0, "2025-07-31 07:10:00"),
rows(8914.0, "2025-07-31 11:15:00"),
rows(8753.0, "2025-07-31 15:25:00"),
rows(9231.0, "2025-07-31 19:30:00"),
rows(8676.0, "2025-07-31 23:40:00"),
rows(8762.0, "2025-08-01 03:45:00"),
// Category E (10 rows)
rows(2001.0, "2025-08-01 04:00:00"),
rows(2003.0, "2025-08-01 01:00:00"),
rows(2005.0, "2025-07-31 20:45:00"),
rows(2007.0, "2025-07-31 16:00:00"),
rows(2009.0, "2025-07-31 12:30:00"),
rows(2011.0, "2025-07-31 08:00:00"),
rows(2013.0, "2025-07-31 04:30:00"),
rows(2015.0, "2025-07-31 01:00:00"),
rows(2017.0, "2025-07-30 21:30:00"),
rows(2019.0, "2025-07-30 18:00:00"));
}

/** Reproduce #5147: bin command after multisearch should produce non-null @timestamp. */
@Test
public void testMultisearchBinTimestamp() throws IOException {
JSONObject result =
executeQuery(
"| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
+ " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
+ " = \\\"E\\\"] | fields @timestamp, category, value | bin @timestamp span=5m");

verifySchema(
result,
schema("category", null, "string"),
schema("value", null, "int"),
schema("@timestamp", null, "timestamp"));

// bin floors @timestamp to 5-min boundaries; projectPlusOverriding moves @timestamp to end
// Category A: 26 rows from time_test_data, Category E: 10 rows from time_test_data2
verifyDataRows(
result,
// Category A (26 rows)
rows("A", 8945, "2025-07-28 00:15:00"),
rows("A", 6834, "2025-07-28 03:55:00"),
rows("A", 6589, "2025-07-28 07:50:00"),
rows("A", 9367, "2025-07-28 11:05:00"),
rows("A", 9245, "2025-07-28 15:15:00"),
rows("A", 8917, "2025-07-28 19:20:00"),
rows("A", 8384, "2025-07-28 23:30:00"),
rows("A", 8798, "2025-07-29 03:35:00"),
rows("A", 9306, "2025-07-29 07:45:00"),
rows("A", 8873, "2025-07-29 11:50:00"),
rows("A", 8542, "2025-07-29 15:00:00"),
rows("A", 9321, "2025-07-29 19:05:00"),
rows("A", 8917, "2025-07-29 23:10:00"),
rows("A", 8756, "2025-07-30 03:20:00"),
rows("A", 9234, "2025-07-30 07:25:00"),
rows("A", 8679, "2025-07-30 11:35:00"),
rows("A", 8765, "2025-07-30 15:40:00"),
rows("A", 9187, "2025-07-30 19:50:00"),
rows("A", 8862, "2025-07-30 23:55:00"),
rows("A", 8537, "2025-07-31 03:00:00"),
rows("A", 9318, "2025-07-31 07:10:00"),
rows("A", 8914, "2025-07-31 11:15:00"),
rows("A", 8753, "2025-07-31 15:25:00"),
rows("A", 9231, "2025-07-31 19:30:00"),
rows("A", 8676, "2025-07-31 23:40:00"),
rows("A", 8762, "2025-08-01 03:45:00"),
// Category E (10 rows)
rows("E", 2001, "2025-08-01 04:00:00"),
rows("E", 2003, "2025-08-01 01:00:00"),
rows("E", 2005, "2025-07-31 20:45:00"),
rows("E", 2007, "2025-07-31 16:00:00"),
rows("E", 2009, "2025-07-31 12:30:00"),
rows("E", 2011, "2025-07-31 08:00:00"),
rows("E", 2013, "2025-07-31 04:30:00"),
rows("E", 2015, "2025-07-31 01:00:00"),
rows("E", 2017, "2025-07-30 21:30:00"),
rows("E", 2019, "2025-07-30 18:00:00"));
}

/** Reproduce #5147 full pattern: bin + stats after multisearch. */
@Test
public void testMultisearchBinAndStats() throws IOException {
JSONObject result =
executeQuery(
"| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
+ " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
+ " = \\\"E\\\"] | bin @timestamp span=5m | stats avg(value) by @timestamp");

verifySchema(
result, schema("avg(value)", null, "double"), schema("@timestamp", null, "timestamp"));

// Each data point falls in its own 5-min bucket (all >5min apart), so avg = single value
// Category A: 26 rows from time_test_data, Category E: 10 rows from time_test_data2
verifyDataRows(
result,
// Category A (26 rows)
rows(8945.0, "2025-07-28 00:15:00"),
rows(6834.0, "2025-07-28 03:55:00"),
rows(6589.0, "2025-07-28 07:50:00"),
rows(9367.0, "2025-07-28 11:05:00"),
rows(9245.0, "2025-07-28 15:15:00"),
rows(8917.0, "2025-07-28 19:20:00"),
rows(8384.0, "2025-07-28 23:30:00"),
rows(8798.0, "2025-07-29 03:35:00"),
rows(9306.0, "2025-07-29 07:45:00"),
rows(8873.0, "2025-07-29 11:50:00"),
rows(8542.0, "2025-07-29 15:00:00"),
rows(9321.0, "2025-07-29 19:05:00"),
rows(8917.0, "2025-07-29 23:10:00"),
rows(8756.0, "2025-07-30 03:20:00"),
rows(9234.0, "2025-07-30 07:25:00"),
rows(8679.0, "2025-07-30 11:35:00"),
rows(8765.0, "2025-07-30 15:40:00"),
rows(9187.0, "2025-07-30 19:50:00"),
rows(8862.0, "2025-07-30 23:55:00"),
rows(8537.0, "2025-07-31 03:00:00"),
rows(9318.0, "2025-07-31 07:10:00"),
rows(8914.0, "2025-07-31 11:15:00"),
rows(8753.0, "2025-07-31 15:25:00"),
rows(9231.0, "2025-07-31 19:30:00"),
rows(8676.0, "2025-07-31 23:40:00"),
rows(8762.0, "2025-08-01 03:45:00"),
// Category E (10 rows)
rows(2001.0, "2025-08-01 04:00:00"),
rows(2003.0, "2025-08-01 01:00:00"),
rows(2005.0, "2025-07-31 20:45:00"),
rows(2007.0, "2025-07-31 16:00:00"),
rows(2009.0, "2025-07-31 12:30:00"),
rows(2011.0, "2025-07-31 08:00:00"),
rows(2013.0, "2025-07-31 04:30:00"),
rows(2015.0, "2025-07-31 01:00:00"),
rows(2017.0, "2025-07-30 21:30:00"),
rows(2019.0, "2025-07-30 18:00:00"));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add deterministic ordering for the new multisearch reproduction tests.
These tests assert long ordered row lists, but the queries don’t include sort. Result order can vary, which risks flaky tests and breaks test‑independence requirements. Add explicit sorting (and update expected rows accordingly) or use order‑agnostic assertions.

🔧 Example (apply similarly to the other new tests)
-                + " = \"E\"] | stats avg(value) by span(`@timestamp`, 5m)");
+                + " = \"E\"] | stats avg(value) by span(`@timestamp`, 5m) | sort span(`@timestamp`, 5m)");
As per coding guidelines: "Tests must not rely on execution order; ensure test independence".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java`
around lines 371 - 542, The three tests testMultisearchWithSpanExpression,
testMultisearchBinTimestamp, and testMultisearchBinAndStats rely on implicit
result ordering and must be made deterministic: modify the query passed to
executeQuery to include an explicit sort (e.g., add "| sort `@timestamp` asc" or
"| sort <field> asc/desc" after the final pipeline stage) so results are
consistently ordered, then update the expected verifyDataRows rows(...)
sequences to match the new sorted order (or alternatively change verifyDataRows
calls to use an order-agnostic assertion helper); update only those tests and
keep the rest of the assertions intact.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java (2)

111-131: Consider adding boundary condition tests for empty list and single UDT.

The delegation tests cover single plain type and two plain types, but the boundary cases for empty input and single UDT are not explicitly tested.

♻️ Suggested additional boundary tests
`@Test`
public void testLeastRestrictiveDelegatesToSuperForEmptyList() {
  RelDataType result = TYPE_FACTORY.leastRestrictive(List.of());
  
  // Empty list delegates to super.leastRestrictive which returns null
  assertNull(result);
}

`@Test`
public void testLeastRestrictiveDelegatesToSuperForSingleUdt() {
  RelDataType single = TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP);
  
  RelDataType result = TYPE_FACTORY.leastRestrictive(List.of(single));
  
  // Single type (even UDT) delegates to super due to size check
  assertNotNull(result);
  assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
}

As per coding guidelines: "Include boundary condition tests (min/max values, empty inputs) for all new functions".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`
around lines 111 - 131, Add two boundary unit tests in OpenSearchTypeFactoryTest
to cover empty input and single UDT cases for TYPE_FACTORY.leastRestrictive: add
a test that calls TYPE_FACTORY.leastRestrictive(List.of()) and asserts the
result is null (delegates to super for empty list), and add a test that creates
a single UDT (e.g., TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP)), calls
leastRestrictive with that single UDT and asserts the returned type is not null
and has the expected SqlTypeName (e.g., VARCHAR) to ensure single-UDT delegation
behavior is validated.

85-109: Strengthen assertions in fallback tests to avoid silent passes.

The conditional assertions on lines 93-95 and 106-108 will silently pass if result is null, potentially masking regressions. Consider using explicit assertions for the expected behavior.

♻️ Proposed fix to strengthen assertions
  `@Test`
  public void testLeastRestrictiveFallsBackForMixedUdtAndNonUdt() {
    RelDataType udt = TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP);
    RelDataType plain = TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR);

    RelDataType result = TYPE_FACTORY.leastRestrictive(List.of(udt, plain));

-   // Falls back to super.leastRestrictive which may return a plain type or null
-   if (result != null) {
-     assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
-   }
+   // Falls back to super.leastRestrictive — EXPR_TIMESTAMP has VARCHAR underlying type
+   assertNotNull(result, "Expected non-null result for compatible types");
+   assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
  }

  `@Test`
  public void testLeastRestrictiveFallsBackForDifferentUdts() {
    RelDataType timestamp = TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP);
    RelDataType date = TYPE_FACTORY.createUDT(ExprUDT.EXPR_DATE);

    RelDataType result = TYPE_FACTORY.leastRestrictive(List.of(timestamp, date));

-   // Different UDTs — falls back to super.leastRestrictive
-   if (result != null) {
-     assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
-   }
+   // Different UDTs — falls back to super.leastRestrictive (both have VARCHAR underlying)
+   assertNotNull(result, "Expected non-null result for VARCHAR-based UDTs");
+   assertEquals(SqlTypeName.VARCHAR, result.getSqlTypeName());
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`
around lines 85 - 109, Replace the conditional "if (result != null) { ... }"
checks in testLeastRestrictiveFallsBackForMixedUdtAndNonUdt and
testLeastRestrictiveFallsBackForDifferentUdts with explicit assertions:
assertNotNull(result) immediately after calling
TYPE_FACTORY.leastRestrictive(...) and then assertEquals(SqlTypeName.VARCHAR,
result.getSqlTypeName()); this ensures the RelDataType result is non-null and
its SqlTypeName is verified rather than silently passing when result is null.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`:
- Around line 111-131: Add two boundary unit tests in OpenSearchTypeFactoryTest
to cover empty input and single UDT cases for TYPE_FACTORY.leastRestrictive: add
a test that calls TYPE_FACTORY.leastRestrictive(List.of()) and asserts the
result is null (delegates to super for empty list), and add a test that creates
a single UDT (e.g., TYPE_FACTORY.createUDT(ExprUDT.EXPR_TIMESTAMP)), calls
leastRestrictive with that single UDT and asserts the returned type is not null
and has the expected SqlTypeName (e.g., VARCHAR) to ensure single-UDT delegation
behavior is validated.
- Around line 85-109: Replace the conditional "if (result != null) { ... }"
checks in testLeastRestrictiveFallsBackForMixedUdtAndNonUdt and
testLeastRestrictiveFallsBackForDifferentUdts with explicit assertions:
assertNotNull(result) immediately after calling
TYPE_FACTORY.leastRestrictive(...) and then assertEquals(SqlTypeName.VARCHAR,
result.getSqlTypeName()); this ensures the RelDataType result is non-null and
its SqlTypeName is verified rather than silently passing when result is null.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

Swiddis
Swiddis previously approved these changes Mar 4, 2026
Copy link
Collaborator

@Swiddis Swiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we confident that this is always correct outside of the multisearch special case?

My understanding is leastRestrictive is assumed to convert types to the plain SQL types, so keeping around UDTs in some circumstances might quietly break other code's assumptions. But maybe we only hit this branch with multiple types if we're in an expression that needs it, or maybe we don't actually have such an assumption from any caller code.

if (allSameUdt) {
boolean anyNullable = types.stream().anyMatch(RelDataType::isNullable);
if (anyNullable && !first.isNullable()) {
return firstUdt.createWithNullability(this, true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could do this in a single pass with a plain for loop instead of splitting the match checks and null checks.

@Override
public @Nullable RelDataType leastRestrictive(List<RelDataType> types) {
  if (types.size() > 1) {
    RelDataType first = types.get(0);
    if (first instanceof AbstractExprRelDataType<?> firstUdt) {
      boolean anyNullable = false;
      for (RelDataType t : types) {
        if (t instanceof AbstractExprRelDataType<?> udt
            && udt.getUdt() == firstUdt.getUdt()) {
          anyNullable |= t.isNullable();
        } else {
          return super.leastRestrictive(types); // If we hit a non-match, fallback
        }
      }
      if (anyNullable && !first.isNullable()) {
        return firstUdt.createWithNullability(this, true);
      }
      return first;
    }
  }
  return super.leastRestrictive(types);
}

Copy link
Collaborator Author

@ahkcs ahkcs Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

rows("E", 2013, "2025-07-31 04:30:00"),
rows("E", 2015, "2025-07-31 01:00:00"),
rows("E", 2017, "2025-07-30 21:30:00"),
rows("E", 2019, "2025-07-30 18:00:00"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these test assertions are pretty long, is that the minimal reproducing example? would 1-day intervals be clearer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@ahkcs
Copy link
Collaborator Author

ahkcs commented Mar 11, 2026

How are we confident that this is always correct outside of the multisearch special case?

My understanding is leastRestrictive is assumed to convert types to the plain SQL types, so keeping around UDTs in some circumstances might quietly break other code's assumptions. But maybe we only hit this branch with multiple types if we're in an expression that needs it, or maybe we don't actually have such an assumption from any caller code.

The leastRestrictive override only activates when all input types are the same AbstractExprRelDataType with the same UDT — in that case, preserving the UDT is the semantically correct result (it's the least restrictive type that encompasses all inputs). When any type differs, it falls back to Calcite's default implementation. This is consistent with how other type systems handle UDTs in union operations.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

PR Reviewer Guide 🔍

(Review updated until commit 6a30e67)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Override leastRestrictive to preserve UDT types with unit tests

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java
  • core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java

Sub-PR theme: Integration tests verifying UDT preservation through multisearch and append

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java

⚡ Recommended focus areas for review

Single-Type Bypass

The leastRestrictive override only applies when types.size() > 1. For a single-element list where the element is a UDT, the method delegates to super.leastRestrictive(types), which may downgrade the UDT to its underlying SQL type. Consider handling the single-element case explicitly to preserve the UDT wrapper.

if (types.size() > 1) {
  RelDataType first = types.get(0);
  if (first instanceof AbstractExprRelDataType<?> firstUdt) {
    boolean anyNullable = false;
    for (RelDataType t : types) {
      if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) {
        anyNullable |= t.isNullable();
      } else {
        return super.leastRestrictive(types);
      }
    }
    if (anyNullable && !first.isNullable()) {
      return firstUdt.createWithNullability(this, true);
    }
    return first;
  }
}
return super.leastRestrictive(types);
Nullability Logic

When the first type is already nullable (first.isNullable() == true) and all subsequent types are non-nullable, anyNullable will be true but the condition anyNullable && !first.isNullable() is false, so first is returned directly. This is correct. However, if the first type is nullable and a later type is also nullable, the same path returns first without creating a new instance — this is fine since first is already nullable. The logic is correct but subtle; a comment clarifying the two branches would improve readability.

boolean anyNullable = false;
for (RelDataType t : types) {
  if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) {
    anyNullable |= t.isNullable();
  } else {
    return super.leastRestrictive(types);
  }
}
if (anyNullable && !first.isNullable()) {
  return firstUdt.createWithNullability(this, true);
}
return first;
Hardcoded Row Counts

Tests like testMultisearchWithoutFurtherProcessing and testMultisearchBinTimestamp assert exact row counts (51 and 36) that depend on the specific test data in the index. If test data changes, these assertions will silently break. Consider verifying row counts via verifyDataRows or documenting the data dependency more explicitly.

  assertEquals(51, result.getInt("total"));
}

/** Reproduce #5146: span expression used after multisearch should work. */
@Test
public void testMultisearchWithSpanExpression() throws IOException {
  JSONObject result =
      executeQuery(
          "| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
              + " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
              + " = \\\"E\\\"] | stats count() by span(@timestamp, 1d)");

  verifySchema(
      result,
      schema("count()", null, "bigint"),
      schema("span(@timestamp,1d)", null, "timestamp"));

  // Category A: 26 rows spanning Jul 28 – Aug 1; Category E: 10 rows spanning Jul 30 – Aug 1
  verifyDataRows(
      result,
      rows(7L, "2025-07-28 00:00:00"),
      rows(6L, "2025-07-29 00:00:00"),
      rows(8L, "2025-07-30 00:00:00"),
      rows(12L, "2025-07-31 00:00:00"),
      rows(3L, "2025-08-01 00:00:00"));
}

/** Reproduce #5147: bin command after multisearch should produce non-null @timestamp. */
@Test
public void testMultisearchBinTimestamp() throws IOException {
  JSONObject result =
      executeQuery(
          "| multisearch [search source=opensearch-sql_test_index_time_data | where category ="
              + " \\\"A\\\"] [search source=opensearch-sql_test_index_time_data2 | where category"
              + " = \\\"E\\\"] | fields @timestamp, category, value | bin @timestamp span=1d");

  verifySchema(
      result,
      schema("category", null, "string"),
      schema("value", null, "int"),
      schema("@timestamp", null, "timestamp"));

  // bin floors @timestamp to 1-day boundaries; 26 A-rows + 10 E-rows = 36 total
  assertEquals(36, result.getInt("total"));

Refactor leastRestrictive to use a single for-loop instead of two stream
passes. Simplify multisearch integration tests by using 1-day intervals
and count() aggregation to reduce verbose 36-row assertions to 5 rows.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

PR Code Suggestions ✨

Latest suggestions up to 6a30e67
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Use equals instead of reference equality for UDT comparison

The UDT identity comparison uses == (reference equality) instead of .equals(). If
getUdt() returns enum values this is fine, but if it can return non-enum objects,
two logically equal UDTs could be treated as different, causing an incorrect
fallback to super.leastRestrictive. Using .equals() is safer and more robust.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [403]

-if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) {
+if (t instanceof AbstractExprRelDataType<?> udt && firstUdt.getUdt().equals(udt.getUdt())) {
Suggestion importance[1-10]: 5

__

Why: Using == for getUdt() comparison works correctly if ExprUDT is an enum (which is typical for such type factories), but using .equals() is more robust and defensive. This is a minor safety improvement that prevents potential bugs if the UDT type ever changes from an enum to a non-enum object.

Low
Possible issue
Fix loop to skip redundant self-comparison

The loop iterates over all types including first itself, which means anyNullable
will be set to first.isNullable() on the first iteration, potentially masking the
actual nullability from other types. The loop should start from index 1 to only
check the remaining types against first, making the logic cleaner and more correct.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [398-415]

 if (types.size() > 1) {
   RelDataType first = types.get(0);
   if (first instanceof AbstractExprRelDataType<?> firstUdt) {
-    boolean anyNullable = false;
-    for (RelDataType t : types) {
+    boolean anyNullable = first.isNullable();
+    for (int i = 1; i < types.size(); i++) {
+      RelDataType t = types.get(i);
       if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) {
         anyNullable |= t.isNullable();
       } else {
         return super.leastRestrictive(types);
       }
     }
     if (anyNullable && !first.isNullable()) {
       return firstUdt.createWithNullability(this, true);
     }
     return first;
   }
 }
Suggestion importance[1-10]: 4

__

Why: The current loop includes first itself, but since anyNullable is initialized to false and first is an AbstractExprRelDataType with the same UDT as itself, the first iteration just sets anyNullable |= first.isNullable(). The final check if (anyNullable && !first.isNullable()) correctly handles this case — if first is nullable, anyNullable is true but the condition !first.isNullable() is false, so no new object is created. The logic is functionally correct as-is, but starting from index 1 is a minor clarity/efficiency improvement.

Low

Previous suggestions

Suggestions up to commit 533fbdf
CategorySuggestion                                                                                                                                    Impact
General
Avoid redundant self-check in loop iteration

The loop iterates over all types including first itself, so anyNullable will be set
to first.isNullable() on the first iteration. This means if first is nullable, the
result will always be nullable even if all other types are non-nullable, which is
correct, but the check anyNullable && !first.isNullable() will never be true when
first is nullable. However, if first is non-nullable and a later type is nullable,
anyNullable becomes true and the condition fires correctly. The real issue is that
the loop redundantly re-checks first against itself. Consider starting the loop from
index 1 to avoid the redundant self-check and make the logic clearer.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [398-414]

 if (types.size() > 1) {
   RelDataType first = types.get(0);
   if (first instanceof AbstractExprRelDataType<?> firstUdt) {
-    boolean anyNullable = false;
-    for (RelDataType t : types) {
+    boolean anyNullable = first.isNullable();
+    for (int i = 1; i < types.size(); i++) {
+      RelDataType t = types.get(i);
       if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) {
         anyNullable |= t.isNullable();
       } else {
         return super.leastRestrictive(types);
       }
     }
     if (anyNullable && !first.isNullable()) {
       return firstUdt.createWithNullability(this, true);
     }
     return first;
   }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion is logically valid — starting the loop from index 1 avoids re-checking first against itself and makes the intent clearer. However, the current code is functionally correct (the self-check on first is harmless since first.getUdt() == first.getUdt() is always true), so this is a minor readability/clarity improvement rather than a bug fix.

Low
Use equals instead of reference equality for UDT comparison

Using == for comparing getUdt() values relies on reference equality, which works for
enums but may be fragile if getUdt() ever returns non-enum objects. Using .equals()
would be safer and more idiomatic for general object comparison, ensuring
correctness even if the UDT type changes in the future.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [403]

-if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt() == firstUdt.getUdt()) {
+if (t instanceof AbstractExprRelDataType<?> udt && udt.getUdt().equals(firstUdt.getUdt())) {
Suggestion importance[1-10]: 3

__

Why: The suggestion to use .equals() over == is generally good practice, but since getUdt() returns an enum type (as evidenced by ExprUDT usage in tests), == is perfectly correct and idiomatic for enum comparison in Java. The improvement is marginal and mostly defensive for hypothetical future changes.

Low

@ahkcs ahkcs force-pushed the fix/multisearch-issues-5145-5146-5147 branch from 533fbdf to 6a30e67 Compare March 12, 2026 00:07
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 6a30e67

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

2 participants