Skip to content

Define operand type checkers for all PPL built-in functions#5214

Open
yuancu wants to merge 1 commit intoopensearch-project:feature/validationfrom
yuancu:subpr-b
Open

Define operand type checkers for all PPL built-in functions#5214
yuancu wants to merge 1 commit intoopensearch-project:feature/validationfrom
yuancu:subpr-b

Conversation

@yuancu
Copy link
Collaborator

@yuancu yuancu commented Mar 9, 2026

Summary

Sub-PR 2/4 for #4892 — splitting the large PR into reviewable pieces targeting feature/validation.

This PR adds operand type definitions for all PPL built-in functions, simplifies the function registry, and migrates operator declarations from SqlOperator to SqlFunction. Validation is not yet enabled, so existing tests pass unchanged.

What's included

Operand Type Registry:

  • PPLOperandTypes — Shared operand type checker constants (NUMERIC, STRING, DATETIME_OPTIONAL_INTEGER, etc.)

Operator Table Changes (PPLBuiltinOperators):

  • SqlOperatorSqlFunction for all UDFs (enables Calcite's type validation)
  • ADD operator override for string concatenation vs numeric addition
  • ATAN function overloading (1-arg vs 2-arg)
  • COUNT()COUNT(*) rewriting for type inference
  • DIVIDE and MOD SqlKind definitions for coercion

Function Registry Simplification (PPLFuncImpTable):

  • Removed CalciteFuncSignature from function registry (type checking now deferred to validation phase)
  • Simplified Pair<CalciteFuncSignature, FunctionImp>FunctionImp

UDF Operand Type Annotations (65+ files):

  • All datetime, math, IP, collection/array, JSON, condition, and binning UDFs now override getOperandMetadata() with proper type checkers
  • UDFOperandMetadata — New wrap() factory method
  • UserDefinedFunctionBuilder@NonNull annotation for getOperandMetadata()

Notable function changes:

  • IP functions: Custom type checkers for IP/STRING polymorphism
  • TransformFunctionImpl / ReduceFunctionImpl: Lambda return type inference
  • JsonSetFunctionImpl: Thread-safe validation improvements
  • GeoIpFunction: String override for UDT erasure during validation

Tests

  • CalcitePPLFunctionTypeTest, CalcitePPLAggregateFunctionTypeTest, CalcitePPLEventstatsTypeTest — Validate operand types for all function categories

Dependency

Signed-off-by: Yuanchun Shen yuanchu@amazon.com

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

PR Reviewer Guide 🔍

(Review updated until commit 7ba97fb)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Simplify function registry - remove CalciteFuncSignature and type-checker coupling

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java

Sub-PR theme: Migrate SqlOperator declarations to SqlFunction in PPLBuiltinOperators

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java

⚡ Recommended focus areas for review

Silent Override

In registerExternalOperator and registerExternalAggOperator, when a function is already registered, only a warning is logged but the new registration silently overwrites the existing one. This could lead to unexpected behavior if the same function is registered multiple times from different sources, as the last registration wins without any error.

  if (externalFunctionRegistry.containsKey(functionName)) {
    logger.warn(
        String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
  }
  externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args));
}

/**
 * Register an external aggregate operator dynamically.
 *
 * @param functionName the name of the function, has to be defined in BuiltinFunctionName
 * @param aggFunction a SqlUserDefinedAggFunction representing the aggregate function
 *     implementation
 */
public void registerExternalAggOperator(
    BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) {
  if (aggExternalFunctionRegistry.containsKey(functionName)) {
    logger.warn(
        String.format(
            Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
  }
  AggHandler handler =
      (distinct, field, argList, ctx) ->
          UserDefinedFunctionUtils.makeAggregateCall(
              aggFunction, List.of(field), argList, ctx.relBuilder);
  aggExternalFunctionRegistry.put(functionName, handler);
Coercion Logic

The coerceArgsToNumeric method skips coercion when datetime args are present, with the comment that "string args may actually be datetime-derived values like TIMESTAMP - TIMESTAMP which produces VARCHAR". This heuristic could incorrectly skip coercion in legitimate cases where both string and datetime args are present in the same function call, potentially causing type errors at runtime.

// Skip coercion if no string args, or if datetime args are present (the "string" args may
// actually be datetime-derived values like TIMESTAMP - TIMESTAMP which produces VARCHAR).
if (!hasString || hasDatetime) {
  return null;
}
ADD Coercion

The ADD function implementation creates a new doubleType per argument inside the loop using builder.getTypeFactory(), which is slightly inefficient. More importantly, the mixed-type path (neither all-string nor all-numeric) always widens to DOUBLE, which may lose precision for integer-only arithmetic that happens to include a VARCHAR field. Consider whether this is the intended behavior.

RexNode[] plusArgs = new RexNode[args.length];
for (int i = 0; i < args.length; i++) {
  RelDataType t = args[i].getType();
  RelDataType doubleType =
      builder
          .getTypeFactory()
          .createTypeWithNullability(
              builder.getTypeFactory().createSqlType(SqlTypeName.DOUBLE),
              t.isNullable());
  if (SqlTypeUtil.isNumeric(t)) {
    plusArgs[i] = builder.makeCast(doubleType, args[i]);
  } else {
    plusArgs[i] = builder.makeCast(doubleType, args[i], true, true);
  }
}
return builder.makeCall(SqlStdOperatorTable.PLUS, plusArgs);
Breaking Change

The paramTypes and paramNames methods now throw UnsupportedOperationException instead of returning empty lists. If any Calcite internal code path calls these methods (e.g., during validation or metadata introspection), this will cause a runtime exception rather than a graceful fallback. The comment says "not used in the current context" but this assumption may not hold for all Calcite versions or future upgrades.

  // This function is not used in the current context
  throw new UnsupportedOperationException(
      "paramTypes of UDFOperandMetadata is not implemented and should not be called");
}

@Override
public List<String> paramNames() {
  // This function is not used in the current context
  throw new UnsupportedOperationException(
      "paramNames of UDFOperandMetadata is not implemented and should not be called");
ATAN Rewrite

The ATAN SqlFunction overrides rewriteCall to mutate the SqlBasicCall operator via setOperator. Mutating the AST node in-place during rewriteCall can cause issues if the same call node is reused or visited multiple times (e.g., in query caching or re-validation scenarios). Consider returning a new SqlCall instead of mutating the existing one.

public static final SqlFunction ATAN =
    new SqlFunction(
        "ATAN",
        SqlKind.OTHER_FUNCTION,
        ReturnTypes.DOUBLE_NULLABLE,
        null,
        OperandTypes.NUMERIC_OPTIONAL_NUMERIC,
        SqlFunctionCategory.NUMERIC) {
      @Override
      public SqlNode rewriteCall(SqlValidator validator, SqlCall call) {
        if (call instanceof SqlBasicCall) {
          SqlOperator op =
              call.getOperandList().size() == 2
                  ? SqlStdOperatorTable.ATAN2
                  : SqlStdOperatorTable.ATAN;
          ((SqlBasicCall) call).setOperator(op);
        }
        return call;
      }
    };

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

PR Code Suggestions ✨

Latest suggestions up to 7ba97fb

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Restore delegating interface methods in wrapped type checker

The checkOperandTypes method was removed from the wrap(SqlOperandTypeChecker)
implementation, but the interface UDFOperandMetadata extends SqlOperandMetadata
still requires it. Without this override, the default implementation from
SqlOperandTypeChecker will be used, which may not delegate to the inner typeChecker
correctly. The method should be preserved to ensure proper type checking delegation.

core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java [32-45]

 @Override
 public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
-  return typeChecker.checkOperandTypesWithoutTypeCoercion(callBinding, throwOnFailure);
+  return typeChecker.checkOperandTypes(callBinding, throwOnFailure);
 }
 
+@Override
+public SqlOperandCountRange getOperandCountRange() {
+  return typeChecker.getOperandCountRange();
+}
+
+@Override
+public String getAllowedSignatures(SqlOperator op, String opName) {
+  return typeChecker.getAllowedSignatures(op, opName);
+}
+
Suggestion importance[1-10]: 7

__

Why: The PR removed checkOperandTypes, getOperandCountRange, and getAllowedSignatures overrides from the wrap(SqlOperandTypeChecker) implementation. Looking at the diff, these methods were present in the old wrap(FamilyOperandTypeChecker) but are missing in the new unified wrap(SqlOperandTypeChecker). Without these overrides, the default interface implementations won't delegate to the inner typeChecker, potentially breaking type checking. However, SqlOperandTypeChecker itself has these as abstract methods, so the anonymous class would fail to compile without them — suggesting they may exist elsewhere in the file not shown in the diff.

Medium
Add missing two-argument variant to operand type checker

The NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL operand type checker does not include a
2-argument variant (NUMERIC, NUMERIC), meaning calls with only two numeric arguments
will fail validation. The original NUMERIC_NUMERIC_OPTIONAL_NUMERIC included
OperandTypes.NUMERIC_NUMERIC as the base case. This should be added back.

core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java [81-89]

 public static final UDFOperandMetadata NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL =
     UDFOperandMetadata.wrap(
-        OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.NUMERIC, SqlTypeFamily.ANY)
+        OperandTypes.NUMERIC_NUMERIC
+            .or(OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.NUMERIC, SqlTypeFamily.ANY))
             .or(
                 OperandTypes.family(
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.ANY)));
Suggestion importance[1-10]: 6

__

Why: The new NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL only covers 3-argument and 4-argument forms, but the old NUMERIC_NUMERIC_OPTIONAL_NUMERIC included a 2-argument base case (OperandTypes.NUMERIC_NUMERIC). If any function using this operand type is called with only 2 numeric arguments, validation will fail. Adding the 2-argument variant back would maintain backward compatibility.

Low
Fix minimum operand count for relevance query function

The comment says "Parameters 2-25 are optional" but OperandTypes.repeat with
SqlOperandCountRanges.between(1, 25) means the minimum is 1 operand, not 2. The
original code required at least 2 MAP parameters (the query parameter and at least
one more). If the function requires at least 2 operands, the range should be
between(2, 25).

core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java [42-45]

 return UDFOperandMetadata.wrap(
     OperandTypes.repeat(
-        SqlOperandCountRanges.between(1, 25),
-        OperandTypes.MAP)); // Parameters 2-25 are optional
+        SqlOperandCountRanges.between(2, 25),
+        OperandTypes.MAP)); // First 2 parameters are required, parameters 3-25 are optional
Suggestion importance[1-10]: 6

__

Why: The comment says "Parameters 2-25 are optional" but between(1, 25) allows only 1 operand. If the function requires at least a query parameter (MAP), the minimum should be at least 1, but the original code had 2 required parameters. This is a potentially meaningful semantic change worth reviewing.

Low
Fix IS_EMPTY to handle non-string types correctly

The IS_EMPTY implementation now uses EQUALS with an empty string literal, but this
will fail for non-character types (e.g., arrays, maps, numeric types) that the
previous IS_EMPTY operator handled. The original SqlStdOperatorTable.IS_EMPTY
supported collection types; replacing it with a string equality check silently
breaks IS_EMPTY for non-string fields.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1212-1219]

 register(
     IS_EMPTY,
     (FunctionImp1)
-        (builder, arg) ->
-            builder.makeCall(
+        (builder, arg) -> {
+          ExprType type = OpenSearchTypeFactory.convertRelDataTypeToExprType(arg.getType());
+          if (type == ExprCoreType.STRING) {
+            return builder.makeCall(
                 SqlStdOperatorTable.OR,
                 builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
-                builder.makeCall(SqlStdOperatorTable.EQUALS, arg, builder.makeLiteral(""))));
+                builder.makeCall(SqlStdOperatorTable.EQUALS, arg, builder.makeLiteral("")));
+          }
+          return builder.makeCall(
+              SqlStdOperatorTable.OR,
+              builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
+              builder.makeCall(SqlStdOperatorTable.IS_EMPTY, arg));
+        });
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about IS_EMPTY behavior for non-string types. However, the PR appears to be intentionally changing the semantics of IS_EMPTY to use string equality, and the improved_code correctly demonstrates the fix. The impact depends on whether IS_EMPTY is used with non-string types in practice.

Low
Guard string parsing with type check

The value.stringValue() call may throw an exception if the ExprValue is not a string
type (e.g., it's a date or timestamp type that doesn't support stringValue()). You
should guard against this by checking the value type before calling stringValue(),
or catch a more specific exception to avoid masking unrelated errors.

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/StrftimeFunction.java [150-155]

 // Try to parse string values as numeric (handles string literals like '1521467703')
 try {
-  return Double.parseDouble(value.stringValue());
-} catch (Exception e) {
+  if (value.type() == ExprCoreType.STRING) {
+    return Double.parseDouble(value.stringValue());
+  }
+  return null;
+} catch (NumberFormatException e) {
   // Not a parseable numeric string
   return null;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion to check value.type() before calling stringValue() is reasonable, but the PR already wraps the call in a broad catch (Exception e) which would catch any exception from stringValue(). The improvement of using NumberFormatException is valid for clarity, but the type check adds more safety. Overall a minor improvement.

Low
General
Avoid double map lookup for function resolution

The external registry is queried twice when it contains the key: once for the null
check and once to retrieve the value. Use getOrDefault or a local variable to avoid
the double lookup, which is both inefficient and slightly inconsistent (e.g., if the
map were to change between calls in a concurrent context).

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [565-568]

-FunctionImp implementation =
-    externalFunctionRegistry.get(functionName) != null
-        ? externalFunctionRegistry.get(functionName)
-        : functionRegistry.get(functionName);
+FunctionImp implementation = externalFunctionRegistry.getOrDefault(
+    functionName, functionRegistry.get(functionName));
Suggestion importance[1-10]: 6

__

Why: The suggestion is valid and correct - externalFunctionRegistry.get(functionName) is called twice when the key exists. Using getOrDefault eliminates the redundant lookup and is cleaner. The improved_code accurately reflects the change.

Low
Validate odd operand count for JSON set function

The JsonSet function requires at least 3 operands (json_string, path1, value1), but
the count must also be odd (1 json + pairs of path/value). The checkOperandTypes
method iterates with i += 2 starting from index 1, but doesn't validate that the
total count is odd. An even count (e.g., 4 operands) would leave an unpaired path
without a value, which should be rejected.

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java [57-69]

 @Override
-public SqlOperandCountRange getOperandCountRange() {
-  return SqlOperandCountRanges.from(3);
+public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
+  final int count = callBinding.getOperandCount();
+  if (count < 3 || count % 2 == 0) {
+    if (throwOnFailure) {
+      throw callBinding.newValidationSignatureError();
+    }
+    return false;
+  }
+  for (int i = 1; i < count; i += 2) {
+    RelDataType nameType = callBinding.getOperandType(i);
+    if (!OpenSearchTypeUtil.isCharacter(nameType)) {
+      if (throwOnFailure) {
+        throw callBinding.newError(RESOURCE.expectedCharacter());
+      }
+      return false;
+    }
+  }
+  return true;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that JsonSet requires an odd number of operands (1 json + N path/value pairs), and the current checkOperandTypes doesn't validate this. However, the improved_code modifies checkOperandTypes rather than getOperandCountRange, and the fix is reasonable but not critical since malformed calls would still fail at execution.

Low
Avoid treating character types as numeric in binning

Changing isNumeric to include character types via isNumericOrCharacter may cause
character/string fields to be treated as numeric in binning operations, potentially
leading to incorrect behavior in numeric binning calculations (e.g.,
SpanBucketFunction, WidthBucketFunction). If string fields are only valid for
time-based binning (after parsing), this broadening of the isNumeric flag could
bypass the unsupported-type check for truly non-numeric string fields.

core/src/main/java/org/opensearch/sql/calcite/utils/binning/BinnableField.java [37]

-this.isNumeric = OpenSearchTypeUtil.isNumericOrCharacter(fieldType);
+this.isNumeric = OpenSearchTypeUtil.isNumeric(fieldType);
Suggestion importance[1-10]: 5

__

Why: The concern about isNumericOrCharacter broadening isNumeric to include string types is valid — it could allow string fields to pass the unsupported-type check and enter numeric binning paths incorrectly. However, the PR may intentionally support string numeric values (as seen in StrftimeFunction changes), so this needs careful review.

Low
Handle null array element type explicitly in reduce

When elementType is null (e.g., the array component type cannot be determined), the
second operand's type is silently skipped. This can lead to incorrect type inference
for the lambda body. A null check with a fallback or an explicit error would be
safer to prevent silent incorrect behavior.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java [63-65]

 if (!operands.isEmpty()) validator.setValidatedNodeType(operands.get(0), baseType);
-if (operands.size() > 1 && elementType != null)
+if (operands.size() > 1) {
+  if (elementType == null) {
+    throw new IllegalArgumentException("Cannot determine element type of the array argument for reduce function");
+  }
   validator.setValidatedNodeType(operands.get(1), elementType);
+}
Suggestion importance[1-10]: 4

__

Why: When elementType is null, the second lambda parameter type is silently skipped, which could lead to incorrect type inference. However, the existing null check (elementType != null) is a reasonable defensive guard, and throwing an exception might break valid use cases where the array type is not yet resolved during validation.

Low
Clarify dual-dispatch coercion behavior for ATAN/ATAN2

ATAN is registered in NUMERIC_COERCE_FUNCTIONS but dispatches to ATAN2 for 2-arg
calls. Since ATAN2 is also separately registered and in NUMERIC_COERCE_FUNCTIONS,
coercion will be applied twice for a direct ATAN2 call path through ATAN. Ensure the
2-arg ATAN path and the direct ATAN2 path are not both subject to independent
coercion that could conflict.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [860-866]

+// ATAN with 1 arg -> SqlStdOperatorTable.ATAN, with 2 args -> ATAN2
+// Note: ATAN2 is also registered separately; string coercion applies to both paths via NUMERIC_COERCE_FUNCTIONS
 register(
     ATAN,
     (RexBuilder b, RexNode... a) ->
         a.length == 2
             ? b.makeCall(SqlStdOperatorTable.ATAN2, a)
             : b.makeCall(SqlStdOperatorTable.ATAN, a));
 registerOperator(ATAN2, SqlStdOperatorTable.ATAN2);
Suggestion importance[1-10]: 2

__

Why: The suggestion only adds a comment to clarify behavior without changing any code - the existing_code and improved_code are functionally identical. This is a documentation-only suggestion with minimal impact.

Low
Guard against null field in numeric aggregation coercion

The resolveAgg method coerces VARCHAR fields to DOUBLE for numeric aggregation
functions, but does not handle the case where field is null (e.g., COUNT()). While
the null check is present, COUNT(
) is not in NUMERIC_AGG_FUNCTIONS, so this is safe
for now. However, SUM and AVG with a null field would skip coercion silently.
Consider adding a guard or assertion to make the intent explicit and prevent future
regressions if NUMERIC_AGG_FUNCTIONS is extended.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [440-448]

 if (field != null && NUMERIC_AGG_FUNCTIONS.contains(functionName)) {
   ExprType fieldType = OpenSearchTypeFactory.convertRelDataTypeToExprType(field.getType());
   if (fieldType == ExprCoreType.STRING) {
     RelDataType doubleType =
         OpenSearchTypeFactory.convertExprTypeToRelDataType(ExprCoreType.DOUBLE);
     field = context.rexBuilder.makeCast(doubleType, field, true, true);
   }
+} else if (field == null && NUMERIC_AGG_FUNCTIONS.contains(functionName)) {
+  logger.warn(
+      String.format(Locale.ROOT, "Numeric aggregation function %s called with null field", functionName));
 }
 return implementation.apply(distinct, field, argList, context);
Suggestion importance[1-10]: 2

__

Why: The suggestion itself acknowledges that the null check is already present and the scenario is safe for now. Adding a warning log for a case that shouldn't occur in practice adds noise without meaningful benefit, making this a low-impact suggestion.

Low

Previous suggestions

Suggestions up to commit 63b01b3
CategorySuggestion                                                                                                                                    Impact
General
Avoid double lookup in external registry

The external registry is queried twice when a function is found there, which is
inefficient and could cause issues if the registry is modified concurrently. Use a
local variable to store the result of the first lookup.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [565-568]

-FunctionImp implementation =
-    externalFunctionRegistry.get(functionName) != null
-        ? externalFunctionRegistry.get(functionName)
-        : functionRegistry.get(functionName);
+FunctionImp implementation = externalFunctionRegistry.get(functionName);
+if (implementation == null) {
+  implementation = functionRegistry.get(functionName);
+}
Suggestion importance[1-10]: 7

__

Why: The externalFunctionRegistry.get(functionName) is called twice when the function is found, which is inefficient. The improved code correctly uses a single lookup with a null check, which is cleaner and avoids the double map access.

Medium
Avoid accepting string types as numeric for binning fields

The original code used OpenSearchTypeFactory.isNumericType(fieldType) which checked
only numeric types, but the new code uses
OpenSearchTypeUtil.isNumericOrCharacter(fieldType) which also accepts
character/string types as "numeric". This broadens the accepted types for binning
operations to include strings, which may allow invalid binning operations on string
fields to pass the type check silently instead of throwing a SemanticCheckException.

core/src/main/java/org/opensearch/sql/calcite/utils/binning/BinnableField.java [37]

-this.isNumeric = OpenSearchTypeUtil.isNumericOrCharacter(fieldType);
+this.isNumeric = OpenSearchTypeUtil.isNumeric(fieldType);
Suggestion importance[1-10]: 6

__

Why: The change from isNumericType to isNumericOrCharacter broadens accepted types for binning to include strings, which could allow semantically invalid binning operations on string fields to pass silently. However, the improved_code references OpenSearchTypeUtil.isNumeric which may not exist in the codebase, making the suggestion potentially incorrect.

Low
Enforce odd operand count for JSON set path/value pairs

The getOperandCountRange() returns a minimum of 3 operands, but the
checkOperandTypes loop starts at index 1 and checks only odd-indexed operands (path
strings). For a valid call with 3 operands (json, path1, value1), the count is
correct. However, the count should also enforce an odd total count (1 + pairs of
path/value), meaning valid counts are 3, 5, 7, etc. The current implementation
allows even counts like 4 which would be invalid (json + path1 + value1 + path2
without value2).

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java [72-74]

 @Override
-public SqlOperandCountRange getOperandCountRange() {
-  return SqlOperandCountRanges.from(3);
+public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
+  final int count = callBinding.getOperandCount();
+  if (!getOperandCountRange().isValidCount(count) || count % 2 == 0) {
+    if (throwOnFailure) {
+      throw callBinding.newError(RESOURCE.invalidArgCount(callBinding.getOperator().getName()));
+    }
+    return false;
+  }
+  for (int i = 1; i < count; i += 2) {
+    RelDataType nameType = callBinding.getOperandType(i);
+    if (!OpenSearchTypeUtil.isCharacter(nameType)) {
+      if (throwOnFailure) {
+        throw callBinding.newError(RESOURCE.expectedCharacter());
+      }
+      return false;
+    }
+  }
+  return true;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that JsonSet requires an odd number of operands (json + pairs of path/value), and the current implementation allows invalid even counts. However, the improved_code modifies checkOperandTypes rather than just getOperandCountRange, making it a larger change than suggested.

Low
Clarify ADD/ADDFUNCTION coercion logic consistency

ADDFUNCTION is included in ARITHMETIC_COERCE_FUNCTIONS, so the early return for
ADDFUNCTION in the if block is redundant with the final return statement. More
importantly, ADD is also in ARITHMETIC_COERCE_FUNCTIONS, meaning the special
mixed-type logic for ADD would be bypassed if the early if check weren't there. The
ARITHMETIC_COERCE_FUNCTIONS set should not contain ADD or ADDFUNCTION to avoid
confusion, or the logic should be restructured to be consistent.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [526-538]

 if (functionName == ADD || functionName == ADDFUNCTION) {
-  // For ADD: only coerce if there's a mix of string and numeric args
+  // For ADD/ADDFUNCTION: only coerce if there's a mix of string and numeric args
   boolean hasString = false;
   boolean hasNumeric = false;
   for (RexNode arg : args) {
     ExprType type = OpenSearchTypeFactory.convertRelDataTypeToExprType(arg.getType());
     if (type == ExprCoreType.STRING) hasString = true;
     else if (ExprCoreType.numberTypes().contains(type)) hasNumeric = true;
   }
   return hasString && hasNumeric;
 }
+// ARITHMETIC_COERCE_FUNCTIONS should not contain ADD/ADDFUNCTION (handled above)
 return ARITHMETIC_COERCE_FUNCTIONS.contains(functionName)
     || NUMERIC_COERCE_FUNCTIONS.contains(functionName);
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that ADD and ADDFUNCTION are in ARITHMETIC_COERCE_FUNCTIONS but also handled by the early if block, creating potential confusion. However, the improved_code is nearly identical to the existing_code with only a comment change, making the actual fix minimal.

Low
Handle null element type in reduce lambda inference

If elementType is null (e.g., when the array component type cannot be determined),
the second lambda parameter type is silently skipped. This could lead to incorrect
type inference for the lambda body. A null check with an appropriate error or
fallback should be added to handle this case explicitly.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java [60-66]

 if (function1 instanceof SqlCall call) {
   List<SqlNode> operands = call.getOperandList();
   // The first argument is base (accumulator), while the second is from the array
   if (!operands.isEmpty()) validator.setValidatedNodeType(operands.get(0), baseType);
-  if (operands.size() > 1 && elementType != null)
+  if (operands.size() > 1) {
+    if (elementType == null) {
+      throw new IllegalArgumentException("Cannot determine element type of the array argument in reduce function");
+    }
     validator.setValidatedNodeType(operands.get(1), elementType);
+  }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid - silently skipping the second lambda parameter type when elementType is null could lead to incorrect type inference. However, throwing an exception might be too strict since some array types may legitimately have null component types, and the existing null check was intentional.

Low
Guard string-to-numeric parsing against non-string value types

The value.stringValue() call may throw an exception if the ExprValue is not a string
type (e.g., it's a date or timestamp type that doesn't support stringValue()). You
should guard against this by checking the value type before calling stringValue(),
or catch a more specific exception path. Additionally, calling stringValue() on a
non-string ExprValue could produce unexpected results like a formatted date string
being parsed as a number.

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/StrftimeFunction.java [150-155]

 // Try to parse string values as numeric (handles string literals like '1521467703')
 try {
-  return Double.parseDouble(value.stringValue());
+  String strVal = value.stringValue();
+  return Double.parseDouble(strVal);
+} catch (NumberFormatException e) {
+  // Not a parseable numeric string
+  return null;
 } catch (Exception e) {
-  // Not a parseable numeric string
+  // Not a string-compatible value
   return null;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion to use NumberFormatException instead of a generic Exception is a minor improvement, but the existing code already catches all exceptions. The concern about stringValue() throwing on non-string types is valid but low risk since the method is only reached for string-type values based on the operand type checker.

Low
Remove unreachable multi-arg fallback in TOSTRING

The TOSTRING registration no longer registers PPLBuiltinOperators.TOSTRING as a
separate entry (the old registerOperator(TOSTRING, PPLBuiltinOperators.TOSTRING) was
removed), but the fallback to builder.makeCall(PPLBuiltinOperators.TOSTRING, args)
for args.length != 1 is still present. If TOSTRING is only ever called with 1
argument in PPL, the multi-arg branch is dead code; if it can be called with
multiple args, the behavior change (from always registering the operator to only
using it as fallback) should be documented.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1016-1024]

 register(
     TOSTRING,
-    (builder, args) -> {
-      if (args.length == 1) {
-        return builder.makeCast(
-            TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR, true), args[0]);
-      }
-      return builder.makeCall(PPLBuiltinOperators.TOSTRING, args);
-    });
+    (builder, args) ->
+        builder.makeCast(
+            TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR, true), args[0]));
Suggestion importance[1-10]: 3

__

Why: The suggestion to simplify TOSTRING to only handle single-arg case may be valid if multi-arg is never used, but removing the fallback could break edge cases. The improved code is more aggressive than necessary and may not be safe without verifying all call sites.

Low
Possible issue
Preserve two-argument case in operand type checker

The rename from NUMERIC_NUMERIC_OPTIONAL_NUMERIC to
NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL changes the semantics: the original allowed
2 or 3 numeric arguments, but the new version requires exactly 3 or 4 arguments
(always including an ANY symbol). This means calls with only 2 numeric arguments
(e.g., percentile(field, 50)) will no longer match and may fail at runtime. The
2-argument case should be preserved.

core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java [81-89]

 public static final UDFOperandMetadata NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL =
     UDFOperandMetadata.wrap(
-        OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.NUMERIC, SqlTypeFamily.ANY)
+        OperandTypes.NUMERIC_NUMERIC
+            .or(OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.NUMERIC, SqlTypeFamily.ANY))
             .or(
                 OperandTypes.family(
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.ANY)));
Suggestion importance[1-10]: 6

__

Why: The rename from NUMERIC_NUMERIC_OPTIONAL_NUMERIC to NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL changes the semantics by requiring a minimum of 3 arguments (always with an ANY symbol), potentially breaking 2-argument calls like percentile(field, 50). Adding the 2-argument case back would preserve backward compatibility.

Low
Fix minimum operand count for relevance query function

The comment says "Parameters 2-25 are optional" but OperandTypes.repeat with
SqlOperandCountRanges.between(1, 25) means the minimum is 1 operand, not 2. The
original code required at least 2 MAP parameters (the query parameter plus at least
one more). If the function requires at least 2 parameters, the range should be
between(2, 25).

core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java [42-45]

 return UDFOperandMetadata.wrap(
     OperandTypes.repeat(
-        SqlOperandCountRanges.between(1, 25),
-        OperandTypes.MAP)); // Parameters 2-25 are optional
+        SqlOperandCountRanges.between(2, 25),
+        OperandTypes.MAP)); // First parameter is required, parameters 2-25 are optional
Suggestion importance[1-10]: 6

__

Why: The comment says "Parameters 2-25 are optional" but SqlOperandCountRanges.between(1, 25) allows only 1 operand. If the function semantically requires at least 2 MAP parameters (query + at least one field), this could be a functional regression from the original code that required at least 2 parameters.

Low
Fix IS_EMPTY to handle non-string types

The IS_EMPTY implementation now uses EQUALS with an empty string literal, but this
will fail for non-string types (e.g., arrays, maps) that were previously supported
via IS_EMPTY. The original SqlStdOperatorTable.IS_EMPTY operator handled collection
types; the new implementation only handles strings and will throw a type error for
other types.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1208-1215]

 register(
     IS_EMPTY,
     (FunctionImp1)
-        (builder, arg) ->
-            builder.makeCall(
+        (builder, arg) -> {
+          RelDataType type = arg.getType();
+          if (SqlTypeUtil.isCharacter(type)) {
+            return builder.makeCall(
                 SqlStdOperatorTable.OR,
                 builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
-                builder.makeCall(SqlStdOperatorTable.EQUALS, arg, builder.makeLiteral(""))));
+                builder.makeCall(SqlStdOperatorTable.EQUALS, arg, builder.makeLiteral("")));
+          }
+          return builder.makeCall(
+              SqlStdOperatorTable.OR,
+              builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
+              builder.makeCall(SqlStdOperatorTable.IS_EMPTY, arg));
+        });
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about IS_EMPTY behavior for non-string types. However, the PR intentionally changed IS_EMPTY to use EQUALS with empty string, and the improved code reintroduces SqlStdOperatorTable.IS_EMPTY which may have been removed intentionally. The impact depends on whether non-string types need to be supported.

Low
Restore missing interface method delegations

The checkOperandTypes method was removed from the wrap(SqlOperandTypeChecker)
implementation, but the interface UDFOperandMetadata extends SqlOperandMetadata
requires it. Without this override, the default implementation from
SqlOperandTypeChecker will be used, which may not delegate to the inner typeChecker
correctly. The method should be added back to delegate to
typeChecker.checkOperandTypes(callBinding, throwOnFailure).

core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java [32-45]

 @Override
 public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
-  return typeChecker.checkOperandTypesWithoutTypeCoercion(callBinding, throwOnFailure);
+  return typeChecker.checkOperandTypes(callBinding, throwOnFailure);
 }
 
+@Override
+public SqlOperandCountRange getOperandCountRange() {
+  return typeChecker.getOperandCountRange();
+}
+
+@Override
+public String getAllowedSignatures(SqlOperator op, String opName) {
+  return typeChecker.getAllowedSignatures(op, opName);
+}
+
Suggestion importance[1-10]: 2

__

Why: Looking at the PR diff more carefully, the checkOperandTypes, getOperandCountRange, and getAllowedSignatures methods are still present in the new hunk (they appear after line 45 in the __new hunk__ section). The suggestion incorrectly assumes these methods were removed. The existing_code snippet doesn't match the actual new hunk content.

Low
Suggestions up to commit bc59f3a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Preserve two-argument variant in operand type checker

The rename from NUMERIC_NUMERIC_OPTIONAL_NUMERIC to
NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL changes the semantics: the original accepted
(NUMERIC, NUMERIC) or (NUMERIC, NUMERIC, NUMERIC), but the new version requires a
mandatory third ANY argument and adds an optional fourth. This means a two-argument
call like percentile(field, 50) would no longer match the first alternative,
potentially breaking existing queries.

core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java [81-89]

 public static final UDFOperandMetadata NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL =
     UDFOperandMetadata.wrap(
-        OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.NUMERIC, SqlTypeFamily.ANY)
+        OperandTypes.NUMERIC_NUMERIC
+            .or(OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.NUMERIC, SqlTypeFamily.ANY))
             .or(
                 OperandTypes.family(
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.NUMERIC,
                     SqlTypeFamily.ANY)));
Suggestion importance[1-10]: 7

__

Why: The new NUMERIC_NUMERIC_OPTIONAL_NUMERIC_SYMBOL requires at least 3 arguments (NUMERIC, NUMERIC, ANY), dropping the original 2-argument (NUMERIC, NUMERIC) variant. This could break percentile(field, 50) calls that only pass two arguments, which is a meaningful behavioral regression.

Medium
Preserve non-string type support in IS_EMPTY

The IS_EMPTY implementation now uses EQUALS with an empty string literal, but this
will fail for non-string types (e.g., arrays, maps) that the original IS_EMPTY
operator supported. The original code used SqlStdOperatorTable.IS_EMPTY which
handles collection types. This change narrows the behavior to strings only and may
cause runtime errors for other types.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1190-1197]

 register(
     IS_EMPTY,
     (FunctionImp1)
-        (builder, arg) ->
-            builder.makeCall(
+        (builder, arg) -> {
+          RelDataType type = arg.getType();
+          if (SqlTypeUtil.isCharacter(type)) {
+            return builder.makeCall(
                 SqlStdOperatorTable.OR,
                 builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
-                builder.makeCall(SqlStdOperatorTable.EQUALS, arg, builder.makeLiteral(""))));
+                builder.makeCall(SqlStdOperatorTable.EQUALS, arg, builder.makeLiteral("")));
+          }
+          return builder.makeCall(
+              SqlStdOperatorTable.OR,
+              builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
+              builder.makeCall(SqlStdOperatorTable.IS_EMPTY, arg));
+        });
Suggestion importance[1-10]: 5

__

Why: The change from SqlStdOperatorTable.IS_EMPTY to EQUALS with an empty string literal may break non-string types. However, the PR appears to be intentionally changing this behavior, and the suggestion's improved_code introduces a type-conditional branch that may be the correct fix for maintaining backward compatibility.

Low
Fix inconsistent UDT character type handling

The method name isNumericOrCharacter is misleading because for UDT types it only
checks numeric types (not character types). A UDT string type would return false
even though the method name implies character types are included. Either rename the
method or add character type checking for UDTs as well.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java [41-55]

 public static boolean isNumericOrCharacter(RelDataType fieldType) {
   // Check for OpenSearch UDT numeric types
   if (isUserDefinedType(fieldType)) {
     AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType;
     ExprType udtType = exprType.getExprType();
-    return ExprCoreType.numberTypes().contains(udtType);
+    return ExprCoreType.numberTypes().contains(udtType)
+        || udtType == ExprCoreType.STRING;
   }
 
   // Check standard SQL numeric types & string types (VARCHAR, CHAR)
-  if (SqlTypeUtil.isNumeric(fieldType) || SqlTypeUtil.isCharacter(fieldType)) {
-    return true;
-  }
-
-  return false;
+  return SqlTypeUtil.isNumeric(fieldType) || SqlTypeUtil.isCharacter(fieldType);
 }
Suggestion importance[1-10]: 5

__

Why: The method isNumericOrCharacter only checks numeric UDT types but not string/character UDT types, creating an inconsistency with the method name. However, this may be intentional design depending on how the method is used in the codebase.

Low
Preserve explicit operand type checking delegation

The old wrap(FamilyOperandTypeChecker) and wrap(CompositeOperandTypeChecker)
overloads both called checkOperandTypesWithoutTypeCoercion, but the new unified
wrap(SqlOperandTypeChecker) no longer overrides checkOperandTypes. The
SqlOperandMetadata default implementation delegates to the inner checker's
checkOperandTypes, which may apply type coercion. This is a behavioral change that
could allow previously-rejected type combinations to pass silently.

core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java [32-45]

 @Override
 public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
-  return typeChecker.checkOperandTypesWithoutTypeCoercion(callBinding, throwOnFailure);
+  return typeChecker.checkOperandTypes(callBinding, throwOnFailure);
 }
 
+@Override
+public SqlOperandCountRange getOperandCountRange() {
+  return typeChecker.getOperandCountRange();
+}
+
+@Override
+public String getAllowedSignatures(SqlOperator op, String opName) {
+  return typeChecker.getAllowedSignatures(op, opName);
+}
+
Suggestion importance[1-10]: 3

__

Why: The suggestion points out that the new wrap(SqlOperandTypeChecker) no longer overrides checkOperandTypes, but looking at the diff, the old code did override it with checkOperandTypesWithoutTypeCoercion. However, the PR intentionally removes strict type checking in favor of Calcite's type coercion, as evidenced by the test changes. The existing_code in the suggestion doesn't match the new hunk (the override was removed), making this suggestion partially inaccurate.

Low
General
Avoid double map lookup for external registry

The external registry is queried twice when it contains the key, which is
inefficient and could be a problem if the map is modified concurrently. Use a local
variable to store the result of the first lookup and fall back to the internal
registry only when it's null.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [552-555]

-FunctionImp implementation =
-    externalFunctionRegistry.get(functionName) != null
-        ? externalFunctionRegistry.get(functionName)
-        : functionRegistry.get(functionName);
+FunctionImp implementation = externalFunctionRegistry.get(functionName);
+if (implementation == null) {
+  implementation = functionRegistry.get(functionName);
+}
Suggestion importance[1-10]: 6

__

Why: The externalFunctionRegistry.get(functionName) is called twice when the key exists, which is inefficient. The improved code stores the result in a local variable and falls back to the internal registry only when null, which is cleaner and more efficient.

Low
Throw validation error when throwOnFailure is true

When throwOnFailure is true and the operand types are invalid, the method silently
returns false instead of throwing a validation error. This is inconsistent with the
Calcite SqlOperandTypeChecker contract, which expects an exception to be thrown when
throwOnFailure is true. Add error throwing when throwOnFailure is true and
validation fails.

opensearch/src/main/java/org/opensearch/sql/opensearch/functions/GeoIpFunction.java [70-81]

 @Override
 public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
   if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) {
+    if (throwOnFailure) {
+      throw callBinding.newValidationSignatureError();
+    }
     return false;
   }
   boolean valid =
       OpenSearchTypeUtil.isCharacter(callBinding.getOperandType(0))
           && OpenSearchTypeUtil.isIp(callBinding.getOperandType(1), true);
   if (callBinding.getOperandCount() == 3) {
     valid = valid && OpenSearchTypeUtil.isCharacter(callBinding.getOperandType(2));
   }
+  if (!valid && throwOnFailure) {
+    throw callBinding.newValidationSignatureError();
+  }
   return valid;
 }
Suggestion importance[1-10]: 6

__

Why: The checkOperandTypes implementation does not throw an exception when throwOnFailure is true and validation fails, which violates the Calcite SqlOperandTypeChecker contract and could lead to silent validation failures or confusing error messages.

Low
Validate first JSON operand type in setter

The checkOperandTypes method validates that odd-indexed operands (path arguments)
are character types, but it does not validate that the first operand (the JSON
string, index 0) is also a character/string type. This means a non-string first
argument would pass type checking silently. Add a check for the first operand to
ensure it is a character type.

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java [57-69]

 @Override
 public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
   final int count = callBinding.getOperandCount();
+  if (!OpenSearchTypeUtil.isCharacter(callBinding.getOperandType(0))) {
+    if (throwOnFailure) {
+      throw callBinding.newError(RESOURCE.expectedCharacter());
+    }
+    return false;
+  }
   for (int i = 1; i < count; i += 2) {
     RelDataType nameType = callBinding.getOperandType(i);
     if (!OpenSearchTypeUtil.isCharacter(nameType)) {
       if (throwOnFailure) {
         throw callBinding.newError(RESOURCE.expectedCharacter());
       }
       return false;
     }
   }
   return true;
 }
Suggestion importance[1-10]: 5

__

Why: The checkOperandTypes method does not validate that the first operand (the JSON string at index 0) is a character type, which could allow invalid inputs to pass type checking silently. This is a valid correctness concern, though it may be handled elsewhere in the pipeline.

Low
Handle null array element type explicitly

If elementType is null (e.g., when the array component type cannot be determined),
the second lambda parameter type is silently skipped. This could lead to incorrect
type inference for the lambda body. A null elementType should be handled explicitly,
either by throwing an informative error or by using a fallback type.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java [60-66]

 if (function1 instanceof SqlCall call) {
   List<SqlNode> operands = call.getOperandList();
   // The first argument is base (accumulator), while the second is from the array
   if (!operands.isEmpty()) validator.setValidatedNodeType(operands.get(0), baseType);
-  if (operands.size() > 1 && elementType != null)
+  if (operands.size() > 1) {
+    if (elementType == null) {
+      throw new IllegalArgumentException(
+          "Cannot determine element type of the array argument in reduce function");
+    }
     validator.setValidatedNodeType(operands.get(1), elementType);
+  }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that a null elementType is silently skipped, which could lead to incorrect type inference. Adding explicit null handling would improve robustness, though the impact depends on whether null element types are a realistic scenario.

Low
Narrow exception catch to NumberFormatException

Catching the broad Exception type here could mask unexpected errors. The
Double.parseDouble method only throws NumberFormatException, so the catch should be
narrowed to NumberFormatException to avoid silently swallowing other unexpected
runtime exceptions.

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/StrftimeFunction.java [150-155]

 // Try to parse string values as numeric (handles string literals like '1521467703')
 try {
   return Double.parseDouble(value.stringValue());
-} catch (Exception e) {
+} catch (NumberFormatException e) {
   // Not a parseable numeric string
   return null;
 }
Suggestion importance[1-10]: 4

__

Why: Catching broad Exception instead of NumberFormatException could mask unexpected runtime errors. Narrowing the catch type is a good practice, though the risk here is low since Double.parseDouble only throws NumberFormatException.

Low
Validate argument count for TOSTRING function

The TOSTRING function is registered only once now (the previous duplicate
registerOperator(TOSTRING, ...) was removed), but the fallback to
PPLBuiltinOperators.TOSTRING for args.length != 1 is unclear and may cause issues if
TOSTRING only accepts one argument. If TOSTRING is always a single-argument
function, the else branch should throw an IllegalArgumentException instead of
silently delegating to an operator that may not handle multiple args.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1000-1006]

 register(
     TOSTRING,
     (builder, args) -> {
       if (args.length == 1) {
         return builder.makeCast(
             TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR, true), args[0]);
       }
-      return builder.makeCall(PPLBuiltinOperators.TOSTRING, args);
+      throw new IllegalArgumentException(
+          String.format(Locale.ROOT, "TOSTRING expects 1 argument, but got %d", args.length));
     });
Suggestion importance[1-10]: 3

__

Why: The suggestion to throw an exception for non-single-argument TOSTRING calls is reasonable, but the fallback to PPLBuiltinOperators.TOSTRING may be intentional for multi-arg cases. The impact is low as this is a design choice rather than a clear bug.

Low
Coerce string arguments in argList for numeric aggregations

The resolveAgg method only coerces the field argument for numeric agg functions, but
argList may also contain string arguments that need coercion (e.g., for
PERCENTILE_APPROX or other functions in NUMERIC_AGG_FUNCTIONS). This could lead to
type errors at runtime when additional arguments are strings.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [440-448]

 if (field != null && NUMERIC_AGG_FUNCTIONS.contains(functionName)) {
   ExprType fieldType = OpenSearchTypeFactory.convertRelDataTypeToExprType(field.getType());
   if (fieldType == ExprCoreType.STRING) {
     RelDataType doubleType =
         OpenSearchTypeFactory.convertExprTypeToRelDataType(ExprCoreType.DOUBLE);
     field = context.rexBuilder.makeCast(doubleType, field, true, true);
   }
+  // Also coerce string args in argList if needed
+  argList = argList.stream().map(arg -> {
+    ExprType argType = OpenSearchTypeFactory.convertRelDataTypeToExprType(arg.getType());
+    if (argType == ExprCoreType.STRING) {
+      RelDataType doubleType =
+          OpenSearchTypeFactory.convertExprTypeToRelDataType(ExprCoreType.DOUBLE);
+      return context.rexBuilder.makeCast(doubleType, arg, true, true);
+    }
+    return arg;
+  }).collect(java.util.stream.Collectors.toList());
 }
 return implementation.apply(distinct, field, argList, context);
Suggestion importance[1-10]: 3

__

Why: The suggestion to also coerce argList string arguments is potentially valid, but PERCENTILE_APPROX and similar functions in NUMERIC_AGG_FUNCTIONS have specific argument handling. The argList for these functions typically contains non-field arguments (like percentile values), and blindly coercing all of them to DOUBLE could cause issues.

Low
Fix misleading comment on operand count range

The old code had two overloads accepting 14 and 25 MAP parameters respectively, with
the first parameter always required. The new OperandTypes.repeat with between(1, 25)
allows as few as 1 MAP operand, but the comment says "Parameters 2-25 are optional",
implying at least 1 is required. However, the original code required at least 1 MAP
(the query parameter), so the minimum should be 1, which is correct. But the comment
is misleading — it should say "Parameter 1 is required, parameters 2-25 are
optional". More critically, verify that OperandTypes.repeat with MAP correctly
matches the actual MAP family type used by relevance query arguments, as the
original code used SqlTypeFamily.MAP explicitly.

core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java [42-45]

 return UDFOperandMetadata.wrap(
     OperandTypes.repeat(
         SqlOperandCountRanges.between(1, 25),
-        OperandTypes.MAP)); // Parameters 2-25 are optional
+        OperandTypes.MAP)); // Parameter 1 is required, parameters 2-25 are optional
Suggestion importance[1-10]: 1

__

Why: The suggestion only changes a comment from "Parameters 2-25 are optional" to "Parameter 1 is required, parameters 2-25 are optional". This is a trivial comment improvement with minimal impact on correctness or functionality.

Low
Suggestions up to commit bc59f3a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect return type inference for reduce function

When sqlOperatorBinding is a RexCallBinding, the return type is inferred as the last
operand's type. However, for reduce(array, init, mergeFn) the last operand is a
lambda/function type, not the actual return type. This would return an incorrect
type (e.g., FUNCTION type) instead of the actual accumulated result type,
potentially causing downstream type errors.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java [49-51]

-if (sqlOperatorBinding instanceof RexCallBinding) {
-  return sqlOperatorBinding.getOperandType(sqlOperatorBinding.getOperandCount() - 1);
+if (sqlOperatorBinding instanceof RexCallBinding rexCallBinding) {
+  // The return type is the type of the initial accumulator (second operand)
+  // or the result of the merge lambda, which matches the accumulator type
+  return rexCallBinding.getOperandType(1);
 } else if (sqlOperatorBinding instanceof SqlCallBinding callBinding) {
Suggestion importance[1-10]: 7

__

Why: The RexCallBinding branch returns the last operand's type, which for reduce(array, init, mergeFn) would be a function/lambda type rather than the actual accumulated result type. Returning the second operand's type (the initial accumulator) would be more semantically correct.

Medium
Avoid breaking change in interface method implementations

**Changing paramTypes and paramNames from returning empty lists to throwing
Unsupporte...

@yuancu yuancu added calcite calcite migration releated enhancement New feature or request labels Mar 9, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Persistent review updated to latest commit 3f5398a

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 081f077.

PathLineSeverityDescription
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java475mediumLarge-scale removal of explicit type checking and ExpressionEvaluationException throwing for invalid function argument types. Previously, calling functions with wrong argument types threw descriptive exceptions; now requests pass through to Calcite's permissive defaults. Multiple tests that previously verified type-safety guardrails have been changed to verify no exception is thrown. While this appears to be a legitimate architectural refactoring (delegating to Calcite), the systematic removal of input validation across dozens of functions warrants review to confirm no bypass of security-relevant type checks.
opensearch/src/main/java/org/opensearch/sql/opensearch/functions/GeoIpFunction.java129lowThe `fetchIpEnrichment` method has been made public and a new overload accepting a raw String (instead of validated ExprIpValue) has been added. This expands the public API surface for IP geolocation lookups and bypasses the ExprIpValue wrapper that may have provided validation. The change has a plausible legitimate purpose (internal refactoring) but increases exposure of IP lookup functionality.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java1038lowIS_EMPTY and IS_BLANK function semantics changed from using SqlStdOperatorTable.IS_EMPTY (SQL collection emptiness check) to EQUALS with empty string literal. While functionally similar for strings, this semantic change could affect logic that relies on the previous behavior (e.g., null handling differences) if these functions are used in security-relevant filter conditions. The test resource files confirm this behavioral change in generated query plans.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 1 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 081f077

@yuancu yuancu force-pushed the subpr-b branch 2 times, most recently from e4efe64 to bc59f3a Compare March 11, 2026 11:20
@github-actions
Copy link
Contributor

Persistent review updated to latest commit bc59f3a

1 similar comment
@github-actions
Copy link
Contributor

Persistent review updated to latest commit bc59f3a

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 63b01b3

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 7ba97fb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant