Add validation infrastructure and type system by yuancu · Pull Request #5213 · opensearch-project/sql

yuancu · 2026-03-09T02:44:46Z

Summary

Sub-PR 1/4 for #4892 — splitting the large PR into reviewable pieces targeting feature/validation.

This PR adds all validation infrastructure and type system code without enabling validation in the execution pipeline. All existing tests pass unchanged.

What's included

Core Validation Pipeline (new calcite/validate/ package):

PplValidator — Custom SQL validator for PPL with UDT-aware deriveType
ValidationUtils — Type sync, UDT creation helpers
OpenSearchSparkSqlDialect — Extended Spark SQL dialect for OpenSearch
PplConvertletTable, SqlOperatorTableProvider

Converters (RelNode ↔ SqlNode round-trip):

OpenSearchRelToSqlConverter — Handles SEMI/ANTI joins, IN/NOT IN tuple rewriting, sort preservation
OpenSearchSqlToRelConverter — Hint strategy, field trimming, JSON_TYPE disabling

Shuttles:

PplRelToSqlRelShuttle — Interval literal fixing, collation embedding, bucket_nullable hints
SkipRelValidationShuttle — Detects patterns that should skip validation (bin-on-timestamp, group-by-CASE, LogicalValues)
SqlRewriteShuttle — Removes database qualifiers before validation

Type System:

OpenSearchTypeFactory / OpenSearchTypeUtil — UDT ↔ SqlTypeName mapping, composite types
PplTypeCoercion / PplTypeCoercionRule — Custom coercion blacklist/whitelist
CoercionUtils — SAFE_CAST vs CAST decision logic

Integration Point:

QueryService — Adds validate() and doValidate() methods (NOT yet wired into execute/explain flow — that happens in Sub-PR C)

Unit Tests

All new code has unit tests under calcite/validate/ and calcite/utils/.

Dependency

Independent — can be reviewed and merged in parallel with Sub-PR B (#5214).

graph LR
    A["#5213 Infrastructure + Type System"] --> C["#5215 Enable Validation + Tests"]
    B["#5214 Function Operand Types"] --> C
    C --> D["#5216 Cleanup + Docs"]

    style A fill:#4da6ff,color:#fff
    style B fill:#ccc,color:#333
    style C fill:#ccc,color:#333
    style D fill:#ccc,color:#333

Signed-off-by: Yuanchun Shen yuanchu@amazon.com

github-actions · 2026-03-09T02:46:34Z

PR Reviewer Guide 🔍

(Review updated until commit `306e276`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Type system refactoring (OpenSearchTypeFactory + OpenSearchTypeUtil) Relevant files: core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtilTest.java Sub-PR theme: Core validation pipeline (PplValidator, type coercion, dialect) Relevant files: core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java core/src/main/java/org/opensearch/sql/calcite/validate/SqlOperatorTableProvider.java Sub-PR theme: Converters and shuttles (RelNode ↔ SqlNode round-trip) Relevant files: core/src/main/java/org/opensearch/sql/calcite/validate/converters/OpenSearchRelToSqlConverter.java core/src/main/java/org/opensearch/sql/calcite/validate/converters/OpenSearchSqlToRelConverter.java core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java
⚡ Recommended focus areas for review Logic Gap in leastRestrictive In the `leastRestrictive` override, the condition `types.stream().anyMatch(OpenSearchTypeUtil::isUserDefinedType) && types.stream().allMatch(Objects::nonNull)` is checked, but the null-check guard (`allMatch(Objects::nonNull)`) is evaluated after the `anyMatch` short-circuit. If a list contains a null element alongside a UDT, the `anyMatch` may return true before the `allMatch` is evaluated, but the subsequent `for` loop will still encounter the null and throw an NPE. This is partially tested in `OpenSearchTypeFactoryTest.testLeastRestrictive_listWithNullElement_throwsNPE`, but the behavior may be inconsistent depending on element ordering. public @Nullable RelDataType leastRestrictive(List<RelDataType> types) { // Handle UDTs separately, otherwise the least restrictive type will become VARCHAR if (types.stream().anyMatch(OpenSearchTypeUtil::isUserDefinedType) && types.stream().allMatch(Objects::nonNull)) { int nullCount = 0; int anyCount = 0; int nullableCount = 0; int dateCount = 0; int timeCount = 0; int ipCount = 0; int binaryCount = 0; int otherCount = 0; for (RelDataType t : types) { if (t.isNullable()) { nullableCount++; } if (t.getSqlTypeName() == SqlTypeName.NULL) { nullCount++; } else if (t.getSqlTypeName() == SqlTypeName.ANY) { anyCount++; } if (t.getSqlTypeName() == SqlTypeName.OTHER) { otherCount++; } if (OpenSearchTypeUtil.isDate(t)) { dateCount++; } else if (OpenSearchTypeUtil.isTime(t)) { timeCount++; } else if (OpenSearchTypeUtil.isIp(t)) { ipCount++; } else if (OpenSearchTypeUtil.isBinary(t)) { binaryCount++; } } // When there is ANY, fall through to standard leastRestrictive if (anyCount == 0) { RelDataType udt; boolean nullable = nullableCount > 0 \|\| nullCount > 0; if (dateCount + nullCount == types.size()) { udt = createUDT(ExprUDT.EXPR_DATE, nullable); } else if (timeCount + nullCount == types.size()) { udt = createUDT(ExprUDT.EXPR_TIME, nullable); } // There are cases where UDT IP interleaves with its intermediate SQL type for validation // OTHER, we check otherCount to patch such cases else if (ipCount + nullCount == types.size() \|\| otherCount + nullCount == types.size()) { udt = createUDT(ExprUDT.EXPR_IP, nullable); } else if (binaryCount + nullCount == types.size()) { udt = createUDT(ExprUDT.EXPR_BINARY, nullable); } // There exists a mix of time, date, and timestamp (and optionally null) else if (binaryCount == 0 && ipCount == 0) { udt = createUDT(ExprUDT.EXPR_TIMESTAMP, nullable); } else { udt = createSqlType(SqlTypeName.VARCHAR, nullable); } return udt; } } RelDataType type = leastRestrictive(types, PplTypeCoercionRule.assignmentInstance()); // Convert CHAR(precision) to VARCHAR so that results won't be padded if (type != null && SqlTypeName.CHAR.equals(type.getSqlTypeName())) { return createSqlType(SqlTypeName.VARCHAR, type.isNullable()); } return type; } Ambiguous IP Fallback In `leastRestrictive`, when `otherCount + nullCount == types.size()` (i.e., all types are OTHER or NULL with no UDT IP present), the method returns an IP UDT. This could incorrectly classify non-IP OTHER types as IP when no actual IP UDT is in the list. The comment says "UDT IP interleaves with its intermediate SQL type for validation OTHER", but this condition fires even when no IP UDT is present in the list. else if (ipCount + nullCount == types.size() \|\| otherCount + nullCount == types.size()) { udt = createUDT(ExprUDT.EXPR_IP, nullable); } else if (binaryCount + nullCount == types.size()) { Unchecked Cast Risk In `castTo`, when `OpenSearchTypeUtil.isDatetime(targetType) \|\| OpenSearchTypeUtil.isIp(targetType)` is true, `OpenSearchTypeFactory.convertRelDataTypeToExprType(targetType)` is called and the result is used in a switch. If the targetType is a standard SQL datetime type (not a UDT), `convertRelDataTypeToExprType` may not return an `ExprCoreType` matching the switch cases, potentially hitting the `default` branch and throwing `UnsupportedOperationException` at runtime. private static SqlNode castTo(SqlNode node, RelDataType sourceType, RelDataType targetType) { if (OpenSearchTypeUtil.isDatetime(targetType) \|\| OpenSearchTypeUtil.isIp(targetType)) { ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(targetType); return switch (exprType) { case ExprCoreType.DATE -> PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); case ExprCoreType.TIMESTAMP -> PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); case ExprCoreType.TIME -> PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); default -> throw new UnsupportedOperationException("Unsupported type: " + exprType); }; } Visibility Change The `getPreparingStmt` method visibility was changed from `protected` to `public`. This widens the API surface of an internal class and may expose implementation details. Verify this change is intentional and necessary. public CalcitePrepareImpl.CalcitePreparingStmt getPreparingStmt( CalcitePrepare.Context context, Type elementType, CalciteCatalogReader catalogReader, Duplicated Logic The `createRexShuttle` helper in `PplRelToSqlRelShuttleTest` duplicates the transformation logic from `PplRelToSqlRelShuttle` rather than testing the actual class. If the production implementation diverges, these tests will not catch regressions. Consider testing the actual shuttle class directly. private RexShuttle createRexShuttle(boolean forward) { return new RexShuttle() { @Override public RexNode visitLiteral(RexLiteral literal) { // 1. Fix float literal SqlTypeName literalType = literal.getType().getSqlTypeName(); if (SqlTypeName.REAL.equals(literalType) \|\| SqlTypeName.FLOAT.equals(literalType)) { return rexBuilder.makeCall( literal.getType(), SqlLibraryOperators.SAFE_CAST, List.of(literal)); } // 2. Fix interval literal SqlIntervalQualifier qualifier = literal.getType().getIntervalQualifier(); if (qualifier == null) { return literal; } BigDecimal value = literal.getValueAs(BigDecimal.class); if (value == null) { return literal; } TimeUnit unit = qualifier.getUnit(); BigDecimal forwardMultiplier = TimeUnit.QUARTER.equals(unit) ? BigDecimal.valueOf(1) : unit.multiplier; BigDecimal newValue = forward ? value.multiply(forwardMultiplier) : value.divideToIntegralValue(unit.multiplier); return rexBuilder.makeIntervalLiteral(newValue, qualifier); } }; }

github-actions · 2026-03-09T02:47:22Z

PR Code Suggestions ✨

Latest suggestions up to 306e276

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Use validated SqlNode result for conversion `validator.validate()` returns the validated `SqlNode` (potentially rewritten with coercions), but the result is discarded and the original `rewritten` node is passed to `sql2rel.convertQuery()`. This means any type coercions or rewrites performed during validation are lost. The validated node returned by `validate()` should be used for the subsequent conversion. core/src/main/java/org/opensearch/sql/executor/QueryService.java [320-342] -validator.validate(Objects.requireNonNull(rewritten)); +SqlNode validated = validator.validate(Objects.requireNonNull(rewritten)); ... -RelNode validatedRel = sql2rel.convertQuery(rewritten, false, true).project(); +RelNode validatedRel = sql2rel.convertQuery(validated, false, true).project(); Suggestion importance[1-10]: 9 __ Why: `validator.validate()` returns a potentially rewritten `SqlNode` with type coercions applied, but the result is discarded and the original `rewritten` node is passed to `sql2rel.convertQuery()`. This is a correctness bug — coercions inserted during validation are lost, defeating the purpose of the validation step.	High
	Fix incorrect IP type inference for OTHER-only lists The condition `otherCount + nullCount == types.size()` will match any list that contains only NULL and OTHER types, even when no IP UDT is present. This could incorrectly classify non-IP types as IP. The condition should require at least one IP UDT to be present before falling back to accepting OTHER as IP. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [379-381] -else if (ipCount + nullCount == types.size() \|\| otherCount + nullCount == types.size()) { +else if (ipCount + nullCount == types.size() \|\| (ipCount > 0 && otherCount + ipCount + nullCount == types.size())) { udt = createUDT(ExprUDT.EXPR_IP, nullable); } Suggestion importance[1-10]: 7 __ Why: The condition `otherCount + nullCount == types.size()` would incorrectly classify a list containing only `NULL` and `OTHER` types (with no IP UDT) as an IP type. The fix requires at least one IP UDT to be present, which is more semantically correct.	Medium
	Avoid catching Throwable in validation fallback Catching `Throwable` is overly broad and will silently swallow serious errors like `OutOfMemoryError` or `StackOverflowError`. Consider catching only `Exception` (or a more specific subset like `RuntimeException` and `Error` subclasses that are truly recoverable), and re-throwing anything that is not a known recoverable condition. core/src/main/java/org/opensearch/sql/executor/QueryService.java [299-305] -} catch (Throwable e) { +} catch (Exception e) { // Gracefully skip validation when the plan contains operators that cannot be converted - // to SQL (e.g., LogicalGraphLookup throws AssertionError) or validated (e.g., unregistered - // UDFs like ARRAY_COMPACT). Return the original plan without validation. + // to SQL (e.g., LogicalGraphLookup throws UnsupportedOperationException) or validated + // (e.g., unregistered UDFs like ARRAY_COMPACT). Return the original plan without validation. log.debug("Skipping validation due to unsupported operation: {}", e.getMessage()); return relNode; } Suggestion importance[1-10]: 6 __ Why: Catching `Throwable` is overly broad and can silently swallow serious JVM errors like `OutOfMemoryError`. The comment mentions `AssertionError` as a known case, but catching `Exception` would be safer while still handling the intended recoverable cases.	Low
	Fix invalid nullable parameter in createSqlType call `createSqlType(SqlTypeName.VARCHAR, nullable)` is not a standard Calcite API — `createSqlType` does not accept a boolean nullable parameter. This will likely cause a compilation error or unexpected behavior. Use `createTypeWithNullability(createSqlType(SqlTypeName.VARCHAR), nullable)` instead. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [387-390] } else { - udt = createSqlType(SqlTypeName.VARCHAR, nullable); + udt = createTypeWithNullability(createSqlType(SqlTypeName.VARCHAR), nullable); } return udt; Suggestion importance[1-10]: 3 __ Why: The suggestion claims `createSqlType(SqlTypeName.VARCHAR, nullable)` is invalid, but `createSqlType` in Calcite's `RelDataTypeFactory` does accept a boolean `nullable` parameter as an overload, so this may not be a real issue. The score is low because the premise may be incorrect.	Low
General	Avoid shared mutable static state in context class Using a `static` mutable field for `operatorTableProvider` makes `CalcitePlanContext` implicitly stateful at the class level, which can cause issues in tests or environments with multiple class loaders. Consider using an instance field or a proper dependency injection mechanism to avoid shared mutable static state. core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java [88] -@Setter private static SqlOperatorTableProvider operatorTableProvider; +@Setter private SqlOperatorTableProvider operatorTableProvider; Suggestion importance[1-10]: 4 __ Why: The static `operatorTableProvider` field introduces shared mutable state at the class level, which can cause issues in tests or multi-classloader environments. However, the PR comment acknowledges this design is intentional for cross-module dependency injection, so the impact is limited.	Low
General	Declare singleton instance field as final The `INSTANCE` field is not declared `final`, which means it could be reassigned accidentally. Since this is a singleton, it should be declared `final` to prevent unintended modification. core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java [25] -public static PplConvertletTable INSTANCE = new PplConvertletTable(); +public static final PplConvertletTable INSTANCE = new PplConvertletTable(); Suggestion importance[1-10]: 4 __ Why: The `INSTANCE` field is a singleton but not declared `final`, allowing accidental reassignment. Adding `final` is a simple defensive improvement with no functional impact.	Low

Previous suggestions

Suggestions up to commit aca9781

Category	Suggestion	Impact
Possible issue	Use validated SqlNode for subsequent conversion `validator.validate()` returns the validated (and potentially rewritten) `SqlNode`, but the result is discarded. The original `rewritten` node is then passed to `convertQuery`, which may miss type coercions or rewrites applied during validation. The return value of `validate()` should be used for the subsequent conversion. core/src/main/java/org/opensearch/sql/executor/QueryService.java [319-342] SqlValidator validator = context.getValidator(); -validator.validate(Objects.requireNonNull(rewritten)); +SqlNode validated = validator.validate(Objects.requireNonNull(rewritten)); ... -RelNode validatedRel = sql2rel.convertQuery(rewritten, false, true).project(); +RelNode validatedRel = sql2rel.convertQuery(validated, false, true).project(); Suggestion importance[1-10]: 8 __ Why: `validator.validate()` returns a potentially rewritten `SqlNode` with type coercions applied, but the result is discarded and the original `rewritten` node is passed to `convertQuery`. This means type coercions inserted during validation are lost, which is a correctness issue.	Medium
	Fix incorrect IP type inference for OTHER-only lists The condition `otherCount + nullCount == types.size()` will match any list that contains only NULL and OTHER types, even when no IP UDT is present. This could incorrectly classify non-IP OTHER types as IP. The condition should require at least one IP UDT to be present before falling back to accepting OTHER as IP. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [379-381] -else if (ipCount + nullCount == types.size() \|\| otherCount + nullCount == types.size()) { +else if (ipCount + nullCount == types.size() \|\| (ipCount > 0 && ipCount + otherCount + nullCount == types.size())) { udt = createUDT(ExprUDT.EXPR_IP, nullable); } Suggestion importance[1-10]: 7 __ Why: The condition `otherCount + nullCount == types.size()` would incorrectly classify a list containing only `OTHER` and `NULL` types (with no IP UDT) as an IP type. The fix requires at least one IP UDT to be present when using `otherCount` as a fallback, which is a valid correctness concern.	Medium
	Fix inconsistent multiplier in reverse interval transformation When `forward` is `false` (reverse direction), the code divides by `unit.multiplier` but should divide by `forwardMultiplier` (which accounts for the QUARTER special case). This inconsistency means the reverse transformation for QUARTER intervals will use the wrong multiplier, causing incorrect interval values. core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java [69-72] BigDecimal newValue = forward ? value.multiply(forwardMultiplier) - : value.divideToIntegralValue(unit.multiplier); + : value.divideToIntegralValue(forwardMultiplier); Suggestion importance[1-10]: 7 __ Why: The reverse transformation uses `unit.multiplier` instead of `forwardMultiplier`, which was specifically defined to handle the QUARTER special case. This inconsistency would cause incorrect interval values when reversing QUARTER intervals, as QUARTER's `forwardMultiplier` is `1` but `unit.multiplier` is `3`.	Medium
	Avoid catching Throwable for non-fatal errors Catching `Throwable` is overly broad and will silently swallow serious errors like `OutOfMemoryError` or `StackOverflowError`. Consider catching only `Exception` (or a more specific subset) to avoid masking critical JVM-level failures. core/src/main/java/org/opensearch/sql/executor/QueryService.java [299-305] -} catch (Throwable e) { +} catch (Exception e) { // Gracefully skip validation when the plan contains operators that cannot be converted // to SQL (e.g., LogicalGraphLookup throws AssertionError) or validated (e.g., unregistered // UDFs like ARRAY_COMPACT). Return the original plan without validation. log.debug("Skipping validation due to unsupported operation: {}", e.getMessage()); return relNode; } Suggestion importance[1-10]: 6 __ Why: Catching `Throwable` is overly broad and can mask critical JVM errors like `OutOfMemoryError`. The comment mentions `AssertionError` as a specific case, but using `Exception` would be safer while still handling the documented use cases.	Low
	Fix switch case labels for enum constants The switch uses `SqlTypeName.DATE` etc. as case labels, but in Java switch expressions over enum values, the case labels should be the enum constants without the class prefix (e.g., `DATE` not `SqlTypeName.DATE`). Using the fully-qualified form may cause a compilation error depending on the Java version and switch syntax used. core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java [74-78] return switch (casted.getSqlTypeName()) { - case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY -> + case DATE, TIME, TIMESTAMP, BINARY -> createUDTWithAttributes(factory, in, casted.getSqlTypeName()); default -> casted; }; Suggestion importance[1-10]: 2 __ Why: In modern Java (14+), switch expressions over enums support fully-qualified case labels like `SqlTypeName.DATE`. This is valid Java syntax and would compile correctly, making this suggestion a minor style preference rather than a real bug.	Low
General	Ensure thread-safe static field initialization Using a `static` mutable field for `operatorTableProvider` makes `CalcitePlanContext` implicitly stateful at the class level, which can cause issues in concurrent or multi-tenant environments. Consider using instance-level injection or a thread-safe initialization pattern (e.g., `volatile` with double-checked locking) to prevent race conditions. core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java [88] -@Setter private static SqlOperatorTableProvider operatorTableProvider; +@Setter private static volatile SqlOperatorTableProvider operatorTableProvider; Suggestion importance[1-10]: 4 __ Why: The static `operatorTableProvider` field is not `volatile`, which could cause visibility issues in multi-threaded environments. Adding `volatile` is a minor but valid improvement for thread safety.	Low
	Fix asymmetric QUARTER multiplier in backward direction The backward direction always uses `unit.multiplier` for division, but for QUARTER the forward direction uses a special multiplier of 1. This asymmetry means a round-trip (forward then backward) for QUARTER won't restore the original value. If the intent is to test the actual shuttle logic, the backward path for QUARTER should also use the special multiplier of 1 to be consistent. core/src/test/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttleTest.java [284-291] BigDecimal forwardMultiplier = TimeUnit.QUARTER.equals(unit) ? BigDecimal.valueOf(1) : unit.multiplier; BigDecimal newValue = forward ? value.multiply(forwardMultiplier) - : value.divideToIntegralValue(unit.multiplier); + : value.divideToIntegralValue(forwardMultiplier); Suggestion importance[1-10]: 3 __ Why: The test's `createRexShuttle` helper is a local reimplementation of the shuttle logic for testing purposes. The asymmetry in QUARTER handling between forward and backward directions may be intentional (the test at line 183-200 verifies the backward case uses the normal multiplier of 3). Changing this would break the existing backward QUARTER test.	Low
	Simplify redundant conditional return pattern The method can be simplified by removing the redundant `if (condition) return true;` `return false;` pattern. The last two statements can be collapsed into a single return expression. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java [41-55] public static boolean isNumericOrCharacter(RelDataType fieldType) { // Check for OpenSearch UDT numeric types if (isUserDefinedType(fieldType)) { AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType; ExprType udtType = exprType.getExprType(); return ExprCoreType.numberTypes().contains(udtType); } // Check standard SQL numeric types & string types (VARCHAR, CHAR) -if (SqlTypeUtil.isNumeric(fieldType) \|\| SqlTypeUtil.isCharacter(fieldType)) { - return true; +return SqlTypeUtil.isNumeric(fieldType) \|\| SqlTypeUtil.isCharacter(fieldType); } -return false; -} - Suggestion importance[1-10]: 3 __ Why: This is a valid minor code style improvement that removes a redundant `if (condition) return true; return false;` pattern, making the code slightly more concise without changing behavior.	Low

Suggestions up to commit 9eb9c4f

Category	Suggestion	Impact
Possible issue	Ensure flag is restored on exception The `top` flag is not thread-safe and is not restored if `super.deriveType` throws an exception. This could leave `top` in an incorrect state for subsequent calls. Use a try-finally block to ensure `top` is always restored. core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java [106-109] boolean original = top; top = false; -RelDataType type = super.deriveType(scope, expr); -top = original; +RelDataType type; +try { + type = super.deriveType(scope, expr); +} finally { + top = original; +} Suggestion importance[1-10]: 7 __ Why: The `top` flag is not restored if `super.deriveType` throws an exception, which could leave the validator in an inconsistent state for subsequent calls. Using a try-finally block is a correctness improvement that prevents subtle bugs in error scenarios.	Medium
	Fix asymmetric multiplier in interval reverse conversion When `forward` is `false` (reverse direction), the code divides by `unit.multiplier` but should use `forwardMultiplier` for consistency with the forward path, especially for `QUARTER` intervals where `forwardMultiplier` differs from `unit.multiplier`. This asymmetry could cause incorrect interval values when reversing QUARTER intervals. core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java [69-72] BigDecimal newValue = forward ? value.multiply(forwardMultiplier) - : value.divideToIntegralValue(unit.multiplier); + : value.divideToIntegralValue(forwardMultiplier); Suggestion importance[1-10]: 7 __ Why: The asymmetry between forward (`forwardMultiplier`) and reverse (`unit.multiplier`) paths for QUARTER intervals is a genuine bug. For QUARTER, `forwardMultiplier` is 1 while `unit.multiplier` is 3, so the reverse path would incorrectly divide by 3 instead of 1, producing wrong interval values.	Medium
	Fix incorrect IP type inference for OTHER types The condition `otherCount + nullCount == types.size()` will match any list containing only NULL and OTHER types, even when no IP UDT is present. This could incorrectly classify non-IP OTHER types as IP. The condition should require at least one IP UDT to be present when falling back to the OTHER type check. core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [379-381] -else if (ipCount + nullCount == types.size() \|\| otherCount + nullCount == types.size()) { +else if (ipCount + nullCount == types.size() \|\| (ipCount > 0 && otherCount + ipCount + nullCount == types.size())) { udt = createUDT(ExprUDT.EXPR_IP, nullable); } Suggestion importance[1-10]: 6 __ Why: The condition `otherCount + nullCount == types.size()` could incorrectly classify a list of only NULL and OTHER types (with no IP UDT) as IP type. The fix requires at least one IP UDT to be present when using the OTHER fallback, which is more semantically correct. However, the comment in the code suggests this is an intentional workaround for IP/OTHER interleaving during validation, so the impact may be limited.	Low
	Avoid catching overly broad Throwable Catching `Throwable` is overly broad and will silently swallow serious JVM errors like `OutOfMemoryError` or `StackOverflowError`. Consider catching `Exception` instead, or at minimum re-throw `Error` subclasses that are not expected validation failures. core/src/main/java/org/opensearch/sql/executor/QueryService.java [299-305] -} catch (Throwable e) { +} catch (Exception e) { // Gracefully skip validation when the plan contains operators that cannot be converted // to SQL (e.g., LogicalGraphLookup throws AssertionError) or validated (e.g., unregistered // UDFs like ARRAY_COMPACT). Return the original plan without validation. log.debug("Skipping validation due to unsupported operation: {}", e.getMessage()); return relNode; } Suggestion importance[1-10]: 6 __ Why: Catching `Throwable` is indeed overly broad and can swallow serious JVM errors. However, the comment explicitly mentions `AssertionError` (which is an `Error`, not `Exception`) as a known case to handle, so changing to `Exception` would break the intended behavior for `LogicalGraphLookup`. The suggestion is partially valid but conflicts with the documented intent.	Low
	Make static mutable field thread-safe Using a `static` mutable field for `operatorTableProvider` is not thread-safe and can cause race conditions in concurrent environments. The field should be declared `volatile` to ensure visibility across threads, or use a thread-safe initialization mechanism. core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java [88] -@Setter private static SqlOperatorTableProvider operatorTableProvider; +@Setter private static volatile SqlOperatorTableProvider operatorTableProvider; Suggestion importance[1-10]: 5 __ Why: The `operatorTableProvider` static field is set once during initialization via a `static` block in `OpenSearchExecutionEngine`, so thread-safety is a valid concern. Adding `volatile` is a low-cost improvement for visibility guarantees in concurrent environments.	Low
General	Fix incorrect attribute source in UDT creation When wrapping the casted type in a UDT, the method passes `in` (the source type) to `createUDTWithAttributes` for attribute syncing, but should pass `casted` to preserve the target type's attributes (e.g., precision, nullability from the cast result). Using `in` may incorrectly sync source attributes onto the UDT instead of the intended target attributes. core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java [75-79] return switch (casted.getSqlTypeName()) { case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY -> - createUDTWithAttributes(factory, in, casted.getSqlTypeName()); + createUDTWithAttributes(factory, casted, casted.getSqlTypeName()); default -> casted; }; Suggestion importance[1-10]: 4 __ Why: The suggestion to pass `casted` instead of `in` to `createUDTWithAttributes` has merit for preserving target type attributes, but the purpose of `syncAttributes` is specifically to sync source (`in`) attributes like charset/collation onto the target UDT. Passing `casted` instead may not be clearly more correct without deeper context of `createUDTWithAttributes` behavior.	Low
General	Handle intermediate nodes in join type modification The `modifyJoinType` method only handles `LogicalProject` and `LogicalJoin` nodes. If the RelNode tree contains other intermediate nodes (e.g., `LogicalFilter`) between the project and the join, the method will silently return the original root without modifying the join type, leading to incorrect join semantics. The method should handle or log unexpected intermediate node types. core/src/main/java/org/opensearch/sql/calcite/validate/converters/OpenSearchSqlToRelConverter.java [89-109] private RelNode modifyJoinType(RelNode root, JoinRelType correctJoinType) { if (root instanceof LogicalProject project) { RelNode input = project.getInput(); RelNode fixedInput = modifyJoinType(input, correctJoinType); if (fixedInput != input) { return project.copy( project.getTraitSet(), fixedInput, project.getProjects(), project.getRowType()); } } else if (root instanceof LogicalJoin join) { if (join.getJoinType() == JoinRelType.LEFT) { return join.copy( join.getTraitSet(), join.getCondition(), join.getLeft(), join.getRight(), correctJoinType, join.isSemiJoinDone()); } + } else { + // For other node types, attempt to fix join type in inputs recursively + List<RelNode> inputs = root.getInputs(); + for (int i = 0; i < inputs.size(); i++) { + RelNode fixedInput = modifyJoinType(inputs.get(i), correctJoinType); + if (fixedInput != inputs.get(i)) { + return root.copy(root.getTraitSet(), java.util.Collections.singletonList(fixedInput)); + } + } } return root; } Suggestion importance[1-10]: 4 __ Why: The concern about intermediate nodes is valid in theory, but the method is specifically designed to handle the output of `super.convertFrom()` for LEFT SEMI/ANTI joins, which typically produces `LogicalProject` wrapping `LogicalJoin`. The improved code uses `Collections.singletonList` which would fail for nodes with multiple inputs, making it potentially worse than the original.	Low

github-actions · 2026-03-09T04:04:27Z

Persistent review updated to latest commit aca9781

This PR adds the foundation for PPL operand type validation: - Add OperandTypeChecker interface and implementations (fixed-arity, variadic, composite, etc.) - Add TypeFamily enum for categorizing SQL/PPL types - Add ValidationRule and ValidationContext for the validation pipeline - Add ValidatingRelNodeVisitor for walking Calcite rel trees - Wire validation infrastructure into CalcitePPLAbstractModule - Update expected output files for explain tests - None of the validation logic is enabled yet - it will be turned on in a subsequent PR Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

github-actions · 2026-03-10T05:59:44Z

Persistent review updated to latest commit 306e276

This was referenced Mar 9, 2026

Define operand type checkers for all PPL built-in functions #5214

Open

Enable validation pipeline and update all test expected outputs #5215

Draft

yuancu mentioned this pull request Mar 9, 2026

Use Calcite's validation system for type checking & coercion #4892

Open

49 tasks

yuancu added calcite calcite migration releated enhancement New feature or request labels Mar 9, 2026

yuancu force-pushed the subpr-a branch from aca9781 to 306e276 Compare March 10, 2026 05:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add validation infrastructure and type system#5213

Add validation infrastructure and type system#5213
yuancu wants to merge 1 commit intoopensearch-project:feature/validationfrom
yuancu:subpr-a

yuancu commented Mar 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 9, 2026

Uh oh!

github-actions bot commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yuancu commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

Unit Tests

Dependency

Uh oh!

github-actions bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 306e276)

Uh oh!

github-actions bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 9, 2026

Uh oh!

github-actions bot commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yuancu commented Mar 9, 2026 •

edited

Loading

github-actions bot commented Mar 9, 2026 •

edited

Loading

(Review updated until commit `306e276`)

github-actions bot commented Mar 9, 2026 •

edited

Loading