Skip to content

Use Calcite's validation system for type checking & coercion#4892

Open
yuancu wants to merge 112 commits intoopensearch-project:mainfrom
yuancu:issues/4636
Open

Use Calcite's validation system for type checking & coercion#4892
yuancu wants to merge 112 commits intoopensearch-project:mainfrom
yuancu:issues/4636

Conversation

@yuancu
Copy link
Collaborator

@yuancu yuancu commented Dec 2, 2025

Description

Please review the sub-PRs instead: #4892 (comment)

This PR migrates from our custom type checking mechanism to Apache Calcite's native validation system by introducing a SqlNode validation layer. This addresses the lack of a proper SQL validation phase and resolves compatibility issues with user-defined types (UDTs).

This implements Approach 1 described in #3865.

How It Works

The PR introduces a validation layer in the query execution pipeline:

RelNode → SqlNode → [Validation & Type Coercion] → SqlNode → RelNode

This approach leverages Calcite's robust type system while preserving OpenSearch's custom type semantics through careful type mapping and restoration.

Benefits

  • Comprehensive type checking: Access to Calcite's full type checker suite (variable-length operands, same-operand constraints, OR-relationship type checkers) vs. our limited family-based checking
  • Function optimization: Ability to rewrite functions at the SqlNode level (e.g., sqrt(x)pow(x, 0.5)) before generating physical plans
  • Automatic type coercion: Implicit casting where type-safe and appropriate, improving query compatibility
  • Reduced maintenance: Leverage Calcite's battle-tested validation logic instead of maintaining custom implementations

Work Items / Implementation Details

Core Infrastructure

  • Introduced SqlNode validation layer with round-trip conversion (RelNode ↔ SqlNode)
  • Extended Calcite validator to support OpenSearch UDTs via type substitution
  • Implemented custom deriveType, coerceOperandType, and type inference methods
  • Made validator thread-safe with proper synchronization

Type System Enhancements

  • UDT Support: To support OpenSearch-specific UDTs (which Calcite doesn't natively support), we bridge Calcite's SqlTypeName enum with OpenSearch UDTs through dynamic type mapping:
    1. Convert UDTs to standard SQL types (e.g., EXPR_TIMESqlTypeName.TIME) before validation
    2. Apply Calcite's validation and type coercion
    3. Restore UDTs after validation to maintain plan correctness
  • Cross-type Comparisons: Override commonTypeForBinaryComparison to enable datetime cross-type comparisons (DATE vs TIME, etc.)
  • IP Type Handling: Emulated IP type using standard SQL types for validation, with special handling in operators
  • Composite Types: Extended SQL-UDT conversion to handle nested/composite types
  • Type Coercion: Use SAFE_CAST for string-to-number conversions to tolerate malformatted data; use CAST for literal numbers

Function & Operator Handling

  • Defined operand type checkers for all built-in functions:
    • Array functions (array_slice, reduce, mvappend, etc.) with proper type inference
    • JSON functions (json_extract, json_set, etc.)
    • Mathematical functions (corrected atan overloading, percentile approximations)
    • Aggregation functions (DISTINCT_COUNT_APPROX, COUNT(*) rewriting, etc.)
  • Function overloading support:
    • ADD operator: String concatenation vs numeric addition
    • ATAN: Single-operand vs two-operand versions
    • GEOIP: String overrides due to UDT erasure
  • Arithmetic operations between strings and numerics (implicit coercion)
  • Define SqlKind for DIVIDE and MOD UDFs

Query Construct Support

  • Preserve null ordering and collation through SqlNode round-trips
  • Support SEMI and ANTI joins in SQL conversion
  • Preserve sort orders in subqueries
  • Pass RelHint through conversions (added SqlHint for LogicalAggregate)
  • Handle windowed aggregates and bucket_nullable flags
  • Support IN/NOT IN with tuple inputs via row constructor rewriting

Dialect & Compatibility

  • Extended OpenSearch Spark SQL dialect for custom syntax
  • Fixed interval semantics mismatch between SQL and PPL
  • Fixed quarter interval bug in Calcite
  • Handle identifier expansion in aggregate functions
  • Properly handle LogicalValues for empty row generation

Edge Cases & Fixes

  • Skip validation for unsupported patterns:
    • Bin-on-timestamp operations (not yet implemented)
    • Group-by window functions (return original plan gracefully)
    • Specific aggregation patterns with LogicalValues
  • Fixed nullability attribute preservation through SAFE_CAST
  • Trim unused fields after SQL→Rel conversion
  • Ensure RelToSqlConverter is instantiated per-use (stateful component)
  • Fixed float literal handling with explicit casts
  • Remove JSON_TYPE operator insertion where inappropriate

Test Fixes

  • Fixed integration tests across multiple suites:
    • CalcitePPLDateTimeBuiltinFunctionIT: Interval semantics
    • CalcitePPLBuiltinFunctionIT: LOG function, sarg deserialization
    • CalciteArrayFunctionIT: Type checkers, reduce function inference
    • CalciteMathematicalFunctionIT, CalcitePPLAggregationIT
    • CalciteBinCommandIT: Timestamp operations, windowed aggregates in GROUP BY
    • CalciteStreamstatsCommandIT: Sort columns, bucket_nullable
    • CalcitePPLJsonBuiltinFunctionIT: String conversion
  • Updated all explain integration tests for plan changes
  • Fixed YAML tests, doctests, and unit tests
  • Updated ClickBench query plans

Code Cleanup

  • Removed legacy type checking from PPLFuncImpTable
  • Deprecated UDFOperandMetadata.wrapUDT interface
  • Removed unused classes: CalciteFuncSignature, PPLTypeChecker
  • Consolidated EnhancedCoalesce into built-in Coalesce
  • Removed type checkers from operator registration (now handled by Calcite)

Optimizations

  • Eliminated SAFE_CAST on non-string literal numbers (use CAST for better performance)
  • Investigated and addressed dedup optimization issues

Performance Impact

Dataset Scenario Analyze (ms) Optimize (ms) Execute (ms) Total Avg (ms)
clickbench Without Validation 7.93 8.29 0.95 17.53
With Validation 9.59 8.72 1.00 19.69
Difference +1.66 (+20.9%) +0.43 (+5.2%) +0.05 (+5.3%) +2.16 (+12.3%)
big5 Without Validation 3.52 4.27 0.80 8.99
With Validation 3.37 3.80 0.72 8.22
Difference -0.15 (-4.3%) -0.47 (-11.0%) -0.08 (-10.0%) -0.77 (-8.6%)
tpch Without Validation 8.65 692.00 183.15 1028.80
With Validation 7.91 689.16 175.63 1008.77
Difference -0.74 (-8.6%) -2.84 (-0.4%) -7.52 (-4.1%) -20.03 (-1.9%)

Profiled on personal laptop, each test runs twice, then averaged.

Conclusion: No significant performance degradation. ClickBench shows slight overhead (+12.3%) during analyze phase due to validation, but big5 and TPCH show improvements, likely from better query optimization enabled by proper type information.

Related Issues

Resolves #4636, resolves #3865

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-project opensearch-project deleted a comment from coderabbitai bot Dec 2, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Calcite-based validation with implicit type coercion and LEFT SEMI/ANTI JOIN support; expanded IP and datetime casting; stronger UDF operand validation and broader variadic support.
  • Bug Fixes
    • More reliable window/ORDER BY behavior, float/interval literal handling, mixed-type IN/BETWEEN predicates, datetime binning, and safer dynamic-field handling in spath.
  • Documentation
    • Explain outputs moved to YAML; several examples marked non-executable (ignore) pending fixes.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Introduces extensive Calcite-based PPL validation and type-coercion plumbing, new PPL-specific validator/convertlets/coercion rules, UDT/type utilities, many UDF operand-metadata updates, Rex/Rel shuttles and Rel↔Sql converters, removal of legacy coercion/type-checker code, and large test/expected-output updates across integ and unit tests.

Changes

Cohort / File(s) Summary
Calcite validation core & providers
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java, core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java, core/src/main/java/org/opensearch/sql/calcite/validate/SqlOperatorTableProvider.java, core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java, core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java, core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java
Add provider interface, new PplValidator, PPL-specific TypeCoercion/Rule and convertlet table; expose validator creation and operator-table injection.
Rel/Sql conversion & shuttles
core/src/main/java/org/opensearch/sql/calcite/validate/converters/OpenSearchRelToSqlConverter.java, .../OpenSearchSqlToRelConverter.java, core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java, .../SqlRewriteShuttle.java, .../SkipRelValidationShuttle.java
New Rel↔Sql converters and shuttles to translate RelNode→SqlNode→validate→RelNode, handle joins/hints, literal fixes, and selective validation skipping.
QueryService validation flow
core/src/main/java/org/opensearch/sql/executor/QueryService.java
Integrates new validation step that converts RelNode→SqlNode, validates via Calcite, converts back; supports tolerant fallback and configurable skip.
Type utilities & factory changes
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java, .../OpenSearchTypeFactory.java, .../OpenSearchTypeFactory.leastRestrictive, core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java
Add OpenSearchTypeUtil helpers, move UDT handling there, implement leastRestrictive override and utilities for syncing attributes/UDT creation.
Operand/type coercion & operator table refactor
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java, core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java, core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
Replace composite/family wrappers with direct OperandTypes usage, introduce SCALAR checkers, migrate many operators from SqlOperator to SqlFunction, and simplify function/agg registries to single-implementation maps.
Rex/Rel visitors & builders
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java, .../CalciteRexNodeVisitor.java, .../ExtendedRexBuilder.java, core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java, core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java
Embed plan collations into RexOver windows, simplify window construction, adjust casting/type-derivation for mixed numeric/char, and small plan utility replacements to use OpenSearchTypeUtil.
UDF operand metadata & nullability annotations
many files under core/src/main/java/org/opensearch/sql/expression/function/... (e.g., CollectionUDF/*, jsonUDF/*, udf/*, udf/datetime/*, udf/math/*, udf/ip/*, UserDefinedFunctionBuilder.java, UDFOperandMetadata.java, UserDefinedFunctionBuilder.java)
Make numerous getOperandMetadata() returns non-null, replace nulls with UDFOperandMetadata.wrap(...), migrate many operand checks to Calcite SqlOperandTypeChecker/OperandTypes, add @NonNull annotations, and introduce new custom checkers (e.g., IP, MAP repeats, transform/variadic semantics).
Removed legacy type-coercion framework
core/src/main/java/org/opensearch/sql/expression/function/CoercionUtils.java, core/src/main/java/org/opensearch/sql/expression/function/PPLTypeChecker.java, core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
Remove legacy CoercionUtils and PPLTypeChecker/CalciteFuncSignature artifacts; their responsibilities moved into Calcite validation/coercion paths.
New/changed validation helpers & convertlets
core/src/main/java/org/opensearch/sql/calcite/validate/PplRelToSqlRelShuttle.java, core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java
Add convertlet table and rel→sql shuttle for literal normalization and operator remapping (e.g., IP operators).
Tests & expected outputs
test additions/changes under core/src/test/..., integ-test/src/test/..., integ-test/src/test/resources/expectedOutput/calcite/*, api/src/test/...
Large test surface updates: many new/updated unit tests for coercion/validation/shuttles/type utils; numerous integration expected-plan YAML changes (many JSON→YAML resources replaced/added); update transpiler test import path.
Docs & examples
docs/user/ppl/...
Minor docs/example updates (make code blocks ignore execution or formatting tweaks).

Sequence Diagram(s)

sequenceDiagram
    participant Rel as RelNode (PPL plan)
    participant Shuttle as PplRelToSqlRelShuttle
    participant RelToSql as OpenSearchRelToSqlConverter
    participant Validator as SqlValidator / PplValidator
    participant SqlToRel as OpenSearchSqlToRelConverter
    participant Planner as QueryService (convertToCalcitePlan)

    Rel->>Shuttle: traverse & normalize RexLiterals
    Shuttle->>RelToSql: convert RelNode -> SqlNode
    RelToSql->>Validator: validate SqlNode (type coercion, implicit casts)
    Validator-->>SqlToRel: produce validated SqlNode
    SqlToRel->>Planner: convert validated SqlNode -> RelNode (validated RelNode)
    Planner->>Planner: continue planning / optimize using validated RelNode
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

backport 2.19-dev

Suggested reviewers

  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • dai-chen
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 19.72% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title concisely and accurately describes the primary change: adding Calcite-based validation for type checking and coercion.
Description check ✅ Passed Description clearly documents the SqlNode validation round-trip, UDT mapping, and type-coercion work and matches the changeset.
Linked Issues check ✅ Passed Implements SqlNode round-trip, PplValidator, PplTypeCoercion/Rule, UDT mapping and implicit coercion per linked objectives [#4636][#3865].
Out of Scope Changes check ✅ Passed Changes align with the migration to Calcite validation/coercion; refactors, removals, and test updates appear directly related and in-scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@yuancu yuancu added the enhancement New feature or request label Dec 3, 2025
@yuancu yuancu force-pushed the issues/4636 branch 5 times, most recently from e085f81 to fc6dd27 Compare December 15, 2025 13:40
…checking

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… logics

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- 2 more ITs passed in PPLBuiltinFunctionIT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- this fix testRand, where desrialization of sarg does not restore its type
- todo: update the toRex in ExtendedRelJson to the align with the latest version

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…estamp; (time, timestamp) -> timestamp (1240/1599)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…2/1872)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- allow type cast
- rewrite call to sql compare to custom ip comapre

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…1356/1599 | 1476/1915)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…d in mvindex's implementation (1580/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…iting (1579/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…pe hint

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…e inference (1701/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In
`@core/src/test/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialectTest.java`:
- Around line 30-41: Add two unit tests in OpenSearchSparkSqlDialectTest that
call OpenSearchSparkSqlDialect.DEFAULT.getCastSpec: one that passes a non‑IP
RelDataType (e.g., TYPE_FACTORY.createSqlType(SqlTypeName.INTEGER)) and asserts
the method returns null (or not a SqlDataTypeSpec) for unsupported types, and
another that calls getCastSpec(null) and asserts it throws a
NullPointerException (using assertThrows). Reference the existing test pattern
that inspects SqlDataTypeSpec (castSpec/SqlDataTypeSpec) so the new tests mirror
the IP test but cover non‑IP and null/boundary cases.

In
`@core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRuleTest.java`:
- Around line 40-50: The test testVarcharToOtherCoercion's comment is incorrect:
it states "coercible from OTHER" while the code checks
mapping.get(SqlTypeName.OTHER) contains VARCHAR/CHAR (i.e., VARCHAR can be
coerced to OTHER); update the inline comment above the assertion to correctly
read that VARCHAR is coercible to OTHER (or reword to "OTHER should allow
coercion to VARCHAR/CHAR") so the comment matches the logic in
PplTypeCoercionRule.instance() and mapping.get(SqlTypeName.OTHER).

In
`@core/src/test/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttleTest.java`:
- Around line 33-36: Add unit tests to SkipRelValidationShuttleTest to cover
NULL, empty, and invalid cases for the skip predicate logic: add tests that pass
null operands to the shuttle (e.g., simulate a RexCall or RexNode with a null
operand), tests that pass an empty operand list, and negative tests for invalid
aggregate-input shapes (e.g., a single CASE expression or a non-project input)
to assert the shuttle rejects or throws as expected; create clearly named test
methods (e.g., testSkipPredicateWithNullOperand,
testSkipPredicateWithEmptyOperands, testSkipPredicateWithInvalidAggregateInput)
and exercise the same validation paths used by the existing happy-path tests so
you get NULL/boundary/error coverage for SkipRelValidationShuttle and relevant
methods in that class.

In
`@core/src/test/java/org/opensearch/sql/calcite/validate/ValidationUtilsTest.java`:
- Around line 209-232: Add a unit test in ValidationUtilsTest to cover a
null-exception input to ValidationUtils.tolerantValidationException: create a
test method (e.g., testTolerantValidationExceptionNullException) that passes
null as the exception argument and asserts the expected boolean result (likely
false based on other tests); place it alongside the existing tests in the same
class so the null-exception case is covered per the NULL input testing
guideline.
- Around line 98-207: Add null-input unit tests for
ValidationUtils.createUDTWithAttributes: verify that passing sourceType == null
and typeName == null to both overloads (the one taking ExprUDT and the one
taking SqlTypeName) throws a NullPointerException (use
assertThrows(NullPointerException.class, ...)). Add four tests in
ValidationUtilsTest such as testCreateUDTWithAttributes_NullSourceType_ExprUDT,
testCreateUDTWithAttributes_NullTypeName_ExprUDT,
testCreateUDTWithAttributes_NullSourceType_SqlTypeName, and
testCreateUDTWithAttributes_NullTypeName_SqlTypeName that call
ValidationUtils.createUDTWithAttributes(TYPE_FACTORY or null,
null/ExprUDT.EXPR_DATE) and the SqlTypeName variant accordingly and assert the
expected exception.
🧹 Nitpick comments (3)
core/src/test/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttleTest.java (2)

36-94: Consider adding NULL input test for identifier visit.

Per coding guidelines, NULL input tests should be included for all new functions. Consider adding a test for how shuttle.visit() handles a null SqlIdentifier input, or document that null inputs are not expected at this layer.

`@Test`
public void testVisitIdentifierWithNull() {
    SqlNode result = shuttle.visit((SqlIdentifier) null);
    // Assert expected behavior (null return, exception, etc.)
}

174-189: Consider adding symmetric test for NOT IN with non-SqlNodeList.

For completeness and symmetry with the IN tests, consider adding a test that verifies NOT IN with a non-SqlNodeList first operand also remains unchanged.

`@Test`
public void testVisitCallNotInWithNonSqlNodeListUnchanged() {
    SqlIdentifier column = new SqlIdentifier("my_column", SqlParserPos.ZERO);
    SqlIdentifier subquery = new SqlIdentifier("subquery", SqlParserPos.ZERO);
    SqlBasicCall notInCall =
        new SqlBasicCall(SqlStdOperatorTable.NOT_IN, List.of(column, subquery), SqlParserPos.ZERO);

    SqlNode result = shuttle.visit(notInCall);

    assertInstanceOf(SqlBasicCall.class, result);
    SqlBasicCall resultCall = (SqlBasicCall) result;
    assertEquals(SqlKind.NOT_IN, resultCall.getKind());
    assertInstanceOf(SqlIdentifier.class, resultCall.getOperandList().get(0));
}
core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRuleTest.java (1)

17-67: Consider adding tests for lenientInstance() and assignmentInstance().

The PplTypeCoercionRule class exposes three public factory methods: instance(), lenientInstance(), and assignmentInstance(). Currently, only instance() is tested. Adding similar tests for the other two would ensure the PPL-specific mappings are correctly merged with both SqlTypeCoercionRule.lenientInstance() and SqlTypeAssignmentRule.instance() base mappings.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`:
- Around line 173-184: The test method
testLeastRestrictive_ipUdtAndOther_returnsIpUdt is using SqlTypeName.NULL for
the second type which duplicates the IP+NULL scenario; change the second type to
SqlTypeName.OTHER by creating RelDataType otherType =
TYPE_FACTORY.createSqlType(SqlTypeName.OTHER) and pass List.of(ipUdt, otherType)
into TYPE_FACTORY.leastRestrictive so the code exercises the IP+OTHER branch;
update assertions remain the same (use OpenSearchTypeUtil.isIp and
result.isNullable()) and ensure the variable name (otherType) replaces nullType
in that call.
- Around line 20-22: Add an integration-style test method (e.g.,
testLeastRestrictiveViaSqlPipeline) to OpenSearchTypeFactoryTest that drives
leastRestrictive through Calcite's SQL->validation->Rel conversion pipeline:
construct a test schema with the same UDT types used by the existing unit tests,
parse a simple SQL SELECT that uses expressions/columns with those UDTs, run the
SqlValidator (using the test's SqlValidator and
OpenSearchSqlOperatorTable/SqlStdOperatorTable), convert the validated SqlNode
to a RelNode via SqlToRelConverter, and assert that the resulting RelNode types
(or the validator/typeFactory outputs) resolved the common type using
leastRestrictive as expected; reference the class OpenSearchTypeFactoryTest, the
type factory instance used in existing tests, and the Calcite pipeline classes
SqlParser/SqlValidator/SqlToRelConverter to locate where to integrate this new
test.
- Around line 24-278: Add three unit tests to OpenSearchTypeFactoryTest that
cover null/empty inputs for TYPE_FACTORY.leastRestrictive: (1)
testLeastRestrictive_nullInput_returnsNull — call
TYPE_FACTORY.leastRestrictive(null) and assertNull(result); (2)
testLeastRestrictive_listWithNullElement_returnsNull — call
TYPE_FACTORY.leastRestrictive(List.of(TYPE_FACTORY.createUDT(ExprUDT.EXPR_DATE),
null)) and assertNull(result); (3) testLeastRestrictive_emptyList_returnsNull —
call TYPE_FACTORY.leastRestrictive(List.of()) and assertNull(result). Name each
test accordingly and use the existing TYPE_FACTORY and ExprUDT symbols so they
integrate with the current test suite.

In
`@core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionTest.java`:
- Around line 101-113: The test testImplicitCast_preservesNullability currently
checks nullableResult but misses asserting nonNullableResult's nullability;
update the test in PplTypeCoercionTest (method
testImplicitCast_preservesNullability) to assert that nonNullableResult is
non-nullable by adding an assertion like
assertFalse(nonNullableResult.isNullable()) after the existing checks on
nonNullableResult, using the same typeCoercion.implicitCast invocation to verify
non-nullable input yields non-nullable output.
🧹 Nitpick comments (5)
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtilTest.java (2)

19-45: Consider adding NULL input tests for type-checking methods.

Only isScalar (line 342) includes a NULL input test. Per the coding guidelines, NULL input tests should be included for all new functions. Consider adding NULL input tests for isUserDefinedType, isNumericOrCharacter, isDatetime, isDate, isTimestamp, isTime, isCharacter, isIp, and isBinary to verify behavior when passed null.

Example NULL input tests to add
`@Test`
public void testIsUserDefinedType_withNull_returnsFalse() {
    assertFalse(OpenSearchTypeUtil.isUserDefinedType(null));
}

`@Test`
public void testIsNumericOrCharacter_withNull_returnsFalse() {
    assertFalse(OpenSearchTypeUtil.isNumericOrCharacter(null));
}

`@Test`
public void testIsDatetime_withNull_returnsFalse() {
    assertFalse(OpenSearchTypeUtil.isDatetime(null));
}

// Similar tests for isDate, isTimestamp, isTime, isCharacter, isIp, isBinary

As per coding guidelines: "NULL input tests must be included for all new functions".


8-10: Add missing import for java.util.List.

java.util.List.of is used inline at lines 369 and 372. Consider adding an import for consistency with the other imports.

Suggested import
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 import static org.opensearch.sql.calcite.utils.OpenSearchTypeFactory.TYPE_FACTORY;

+import java.util.List;
 import org.apache.calcite.rel.type.RelDataType;

Then update lines 369-372:

-            java.util.List.of(
+            List.of(
                 TYPE_FACTORY.createSqlType(SqlTypeName.INTEGER),
                 TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR)),
-            java.util.List.of("id", "name"));
+            List.of("id", "name"));
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java (1)

24-25: Align test method names with the required pattern.

Current names use underscores; the guideline expects test<Functionality><Condition> (camel-cased) naming. Please rename consistently (e.g., testLeastRestrictiveDateUdtsOnlyReturnsDateUdt).

As per coding guidelines, "Test naming should follow pattern test".

core/src/test/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttleTest.java (2)

94-201: Consider adding boundary condition tests for interval literals.

Per testing guidelines, boundary condition tests (min/max values, zero, negative inputs) should be included for new functions. The interval tests use specific positive values but don't cover edge cases:

  • Zero interval value (e.g., INTERVAL 0 DAY)
  • Negative interval value (if supported)
  • Very large interval values near overflow boundaries

These edge cases could reveal issues with arithmetic operations in the transformation logic.


259-294: Test helper duplicates logic rather than testing actual class.

The createRexShuttle helper reimplements the exact transformation logic from PplRelToSqlRelShuttle's internal visitLiteral method rather than delegating to the actual class. While this approach isolates and tests the algorithm correctly, it doesn't verify that PplRelToSqlRelShuttle itself functions as intended—if the production implementation diverges from this test copy, tests would still pass.

Consider refactoring to instantiate PplRelToSqlRelShuttle and invoke its behavior directly, rather than reimplementing its logic inline in tests.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@yuancu yuancu requested a review from LantaoJin January 30, 2026 05:19
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@LantaoJin
Copy link
Member

LantaoJin commented Feb 6, 2026

This PR is too large to review, is it possible to separate into multiple smaller ones?
In future, if we want to merge such a large change, I think we'd better to create a feature branch and review PRs targeted to feature branch, then we can merge the feature branch back to main without review.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5723ced.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 3fd9714.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5723ced.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 2c831f0.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

yuancu added 4 commits March 6, 2026 15:02
…path PRs

Merge the validation branch with origin/main, resolving conflicts caused
by the revert of dynamic column support (opensearch-project#5139). Spath-related files
(DynamicFieldsHelper, AppendFunctionImpl, spath explain YAMLs) are
deleted to align with the revert. All other conflicts resolved to
preserve the validation round-trip changes from the branch.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- Add missing import for OpenSearchSparkSqlDialect in CalcitePPLNoMvTest
  (class moved from ppl to core/validate package)
- Update CalcitePPLSpathTest.testSpathAutoExtractModeWithEval expected plan
  to match validation-deferred behavior (no eager SAFE_CAST)
- Update CalcitePPLNoMvTest.testNoMvNonExistentField to accept
  "inferred array element type" error message

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
The validation round-trip (RelNode → SQL → validate → RelNode) introduces
plan changes such as extra LogicalProject nodes, type annotations on MAP
keys, and reordered aggregate operands. Update YAML expected outputs and
test assertions to match.

- Catch Throwable (not just Exception) in validate() for AssertionError
  from unsupported RelNodes like LogicalGraphLookup
- Update 17 YAML expected output files for calcite and calcite_no_pushdown
- Add 4 new alternative YAML files for non-deterministic plan variants
- Add separate YAML for boolean string literal filter test
- Accept "inferred array element type" error in NoMv missing field test

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 84f53df.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@yuancu
Copy link
Collaborator Author

yuancu commented Mar 9, 2026

PR Split Plan

As suggested by @LantaoJin, this PR has been split into 4 sub-PRs targeting the feature/validation branch:

Sub-PR Title Files Dependency
#5213 Add validation infrastructure and type system 32 Independent
#5214 Define operand type checkers for all PPL built-in functions 86 Independent
#5215 Enable validation pipeline and update all test expected outputs 583 After #5213 + #5214
#5216 Remove legacy type checking code and update documentation 13 After #5215

Merge order: #5213 and #5214 can be reviewed/merged in parallel → #5215#5216 → merge feature/validationmain

Reviewer guide:

@yuancu
Copy link
Collaborator Author

yuancu commented Mar 9, 2026

Note: This PR is superseded by 4 smaller sub-PRs targeting the feature/validation branch, as suggested in #4892 (comment).

Please review the sub-PRs instead:

Order PR Title Files Status
1 #5213 Add validation infrastructure and type system 32 Ready for review
2 #5214 Define operand type checkers for all PPL built-in functions 86 Ready for review
3 #5215 Enable validation pipeline and update all test expected outputs 583 Merge after #5213 + #5214
4 #5216 Remove legacy type checking code and update documentation 13 Merge after #5215

This PR will be closed once all sub-PRs are merged and feature/validation is merged back to main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Type Conversion - Leverage RelDataType from Calcite in V3 engine. [RFC] Implicit Type Coercion with Calcite

4 participants