Skip to content

Add time conversion functions for convert command#5210

Open
ritvibhatt wants to merge 22 commits intoopensearch-project:mainfrom
ritvibhatt:convert-time-functions
Open

Add time conversion functions for convert command#5210
ritvibhatt wants to merge 22 commits intoopensearch-project:mainfrom
ritvibhatt:convert-time-functions

Conversation

@ritvibhatt
Copy link
Contributor

@ritvibhatt ritvibhatt commented Mar 6, 2026

Description

Implement ctime, mktime, mstime, and dur2sec conversion functions for the PPL convert command, along with timeformat parameter support.

Changes

  • ctime(field) — converts UNIX epoch timestamps to human-readable time strings
  • mktime(field) — converts time strings to UNIX epoch timestamps
  • mstime(field) — converts [MM:]SS.SSS time strings to total seconds
  • dur2sec(field) — converts HH:MM:SS duration strings to total seconds
  • timeformat parameter — allows ctime and mktime to use custom strftime format specifiers (default: %m/%d/%Y %H:%M:%S)
  • Register all four functions in PPLFuncImpTable
  • Preserve timeformat parameter in PPLQueryDataAnonymizer output
  • Add ANY_OPTIONAL_STRING operand type for functions accepting 1 or 2 arguments
  • Add toJavaPattern() utility to convert strftime specifiers to Java DateTimeFormatter patterns

Testing

  • Unit tests for all four conversion functions
  • PPL parser tests for convert with timeformat
  • Anonymizer tests for convert command variants
  • Integration tests for all functions with default and custom timeformat

Related Issues

Related to #5031

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Reviewer Guide 🔍

(Review updated until commit 7394650)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add ctime() and mktime() with timeformat support

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/CTimeConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/MkTimeConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/datetime/StrftimeFormatterUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
  • core/src/main/java/org/opensearch/sql/ast/tree/Convert.java
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java

Sub-PR theme: Add mstime() and dur2sec() conversion functions

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/MsTimeConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/Dur2SecConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

⚡ Recommended focus areas for review

Logic Order Bug

In convertWithFormat, CTimeConvertFunction.toEpochSeconds(value) is called before checking value == null. The toEpochSeconds method handles null internally, but the subsequent null check on value is unreachable if toEpochSeconds already returned null. More critically, if the input is a valid numeric string like "1066473433", it will be returned as-is as a Double without attempting date parsing — this may be intentional but could silently skip parsing for numeric-looking date strings.

public static Object convertWithFormat(Object value, Object timeFormatObj) {
  Double numeric = CTimeConvertFunction.toEpochSeconds(value);
  if (numeric != null) {
    return numeric;
  }
  if (value == null) {
    return null;
  }
  String str = value instanceof String ? ((String) value).trim() : value.toString().trim();
  if (str.isEmpty()) {
    return null;
  }

  String strftimeFormat =
      (timeFormatObj != null) ? timeFormatObj.toString().trim() : DEFAULT_FORMAT;
  if (strftimeFormat.isEmpty()) {
    return null;
  }
  return parseWithFormat(str, strftimeFormat);
Missing Minutes Validation

The applyConversion method validates that seconds >= 60 returns null, but there is no upper bound validation on the minutes field. For mstime, minutes can be arbitrarily large (e.g., "999:59.999" would be accepted), which may not match the expected [MM:]SS.SSS semantics. Consider whether minutes should be bounded (e.g., < 60).

int minutes = matcher.group(1) != null ? Integer.parseInt(matcher.group(1)) : 0;
int seconds = Integer.parseInt(matcher.group(2));

if (seconds >= 60) {
  return null;
}

double millis = 0.0;
if (matcher.group(3) != null) {
  String milliStr = matcher.group(3);
  // Pad to 3 digits
  while (milliStr.length() < 3) {
    milliStr += "0";
  }
  millis = Double.parseDouble(milliStr.substring(0, 3)) / 1000.0;
}

return (double) (minutes * 60 + seconds) + millis;
Incomplete Strftime Mapping

The STRFTIME_TO_JAVA_PARSE map is missing several common strftime specifiers such as %j (day of year), %Z (timezone name), %z (UTC offset), %e (day with space padding), %n (newline), and %t (tab). Unsupported specifiers are passed through unchanged (e.g., %j stays as %j), which will cause DateTimeFormatter to throw an IllegalArgumentException at runtime — this is silently caught and returns null, but users may not understand why their format string fails.

public static String toJavaPattern(String strftimeFormat) {
  Matcher m = Pattern.compile("%[A-Za-z%]").matcher(strftimeFormat);
  StringBuilder sb = new StringBuilder();
  while (m.find()) {
    String replacement = STRFTIME_TO_JAVA_PARSE.getOrDefault(m.group(), m.group());
    m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
  }
  m.appendTail(sb);
  return sb.toString();
}
Timeformat Applied Globally

The timeFormat is passed to all function conversions uniformly. For non-time functions like auto(), num(), etc., the resolveConvertFunction method falls through to the single-argument resolve path. However, if a user specifies timeformat alongside non-time functions, there is no warning or error — the parameter is silently ignored for those functions. This could be confusing behavior worth documenting or validating.

private RexNode resolveConvertFunction(
    String functionName, RexNode sourceField, String timeFormat, CalcitePlanContext context) {

  // Time functions that support timeformat parameter
  Set<String> timeFunctions = Set.of("ctime", "mktime");

  if (timeFunctions.contains(functionName.toLowerCase()) && timeFormat != null) {
    // For time functions with custom timeformat, pass the format as a second parameter
    RexNode timeFormatLiteral = context.rexBuilder.makeLiteral(timeFormat);
    return PPLFuncImpTable.INSTANCE.resolve(
        context.rexBuilder, functionName, sourceField, timeFormatLiteral);
  } else {
    // Regular conversion functions or time functions without custom format
    return PPLFuncImpTable.INSTANCE.resolve(context.rexBuilder, functionName, sourceField);
  }
}

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Code Suggestions ✨

Latest suggestions up to 7394650

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix null check ordering and numeric passthrough logic

The null check for value comes after toEpochSeconds(value) is called, but
toEpochSeconds already handles null by returning null. However, the logic is
inverted: if value is a numeric string like "1066473433", toEpochSeconds will
successfully parse it and return it as a double, bypassing the date-string parsing
entirely. This means a numeric string will be returned as-is rather than being
treated as a date string to parse. The null check should be moved before the
toEpochSeconds call, and numeric passthrough should be a deliberate, documented
behavior.

core/src/main/java/org/opensearch/sql/expression/function/udf/MkTimeConvertFunction.java [75-82]

 public static Object convertWithFormat(Object value, Object timeFormatObj) {
-  Double numeric = CTimeConvertFunction.toEpochSeconds(value);
-  if (numeric != null) {
-    return numeric;
-  }
   if (value == null) {
     return null;
   }
+  // If value is already numeric, return it directly as epoch seconds
+  Double numeric = CTimeConvertFunction.toEpochSeconds(value);
+  if (numeric != null && !(value instanceof String)) {
+    return numeric;
+  }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about the ordering of the null check relative to toEpochSeconds, but the existing test assertEquals(1066473433.0, MkTimeConvertFunction.convert(1066473433)) and assertEquals(1066473433.0, MkTimeConvertFunction.convert("1066473433")) show that numeric passthrough is intentional behavior. The improved_code changes the semantics by only bypassing date parsing for non-String numerics, which would break the numeric string passthrough that tests explicitly verify.

Low
General
Cache compiled regex pattern as static constant

The Pattern.compile call inside toJavaPattern is executed every time the method is
called. Since this pattern is constant, it should be compiled once as a static field
to improve performance, especially when processing many rows.

core/src/main/java/org/opensearch/sql/expression/datetime/StrftimeFormatterUtil.java [278-287]

+private static final Pattern STRFTIME_SPECIFIER_PATTERN = Pattern.compile("%[A-Za-z%]");
+
 public static String toJavaPattern(String strftimeFormat) {
-  Matcher m = Pattern.compile("%[A-Za-z%]").matcher(strftimeFormat);
+  Matcher m = STRFTIME_SPECIFIER_PATTERN.matcher(strftimeFormat);
   StringBuilder sb = new StringBuilder();
   while (m.find()) {
     String replacement = STRFTIME_TO_JAVA_PARSE.getOrDefault(m.group(), m.group());
     m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
   }
   m.appendTail(sb);
   return sb.toString();
 }
Suggestion importance[1-10]: 5

__

Why: Compiling a Pattern inside a method that could be called per-row is a genuine performance concern. Moving it to a static constant is a standard Java best practice, and the improved_code accurately reflects the change.

Low
Extract repeated set creation to a static constant

The timeFunctions set is recreated on every call to resolveConvertFunction. This set
should be a static constant to avoid repeated object allocation, especially since
it's called in a loop over all conversions.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1088-1090]

-Set<String> timeFunctions = Set.of("ctime", "mktime");
+private static final Set<String> TIME_FUNCTIONS = Set.of("ctime", "mktime");
 
-if (timeFunctions.contains(functionName.toLowerCase()) && timeFormat != null) {
+private RexNode resolveConvertFunction(
+    String functionName, RexNode sourceField, String timeFormat, CalcitePlanContext context) {
 
+  if (TIME_FUNCTIONS.contains(functionName.toLowerCase()) && timeFormat != null) {
+
Suggestion importance[1-10]: 4

__

Why: The timeFunctions set is indeed recreated on every call to resolveConvertFunction, and extracting it to a static constant is a valid performance improvement. The improved_code accurately reflects the change needed, though the impact is minor since this is not a hot path.

Low

Previous suggestions

Suggestions up to commit 39e95ec
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix null check ordering in conversion method

The null check for value comes after toEpochSeconds(value) is called, but
toEpochSeconds already handles null internally. However, the order of operations is
logically incorrect: if value is null, toEpochSeconds returns null, and then the
code falls through to the null check below — which is fine but redundant. The real
bug is that a numeric string like "1066473433" will be parsed as a number by
toEpochSeconds and returned directly, bypassing the date-string parsing path
entirely. This means mktime cannot distinguish between a numeric epoch value and a
date string that happens to look like a number, which may be intentional, but it
should be clearly documented. More critically, if value is a non-null non-numeric
non-string type, value.toString() is called without a null check after the numeric
path fails.

core/src/main/java/org/opensearch/sql/expression/function/udf/MkTimeConvertFunction.java [75-82]

 public static Object convertWithFormat(Object value, Object timeFormatObj) {
+  if (value == null) {
+    return null;
+  }
   Double numeric = CTimeConvertFunction.toEpochSeconds(value);
   if (numeric != null) {
     return numeric;
   }
-  if (value == null) {
+  String str = value instanceof String ? ((String) value).trim() : value.toString().trim();
+  if (str.isEmpty()) {
     return null;
   }
Suggestion importance[1-10]: 4

__

Why: The null check for value after calling toEpochSeconds(value) is redundant since toEpochSeconds already handles null internally. Moving the null check before the toEpochSeconds call is a minor code clarity improvement, but there's no actual bug since toEpochSeconds(null) returns null and the subsequent null check handles it correctly.

Low
General
Extract repeated set creation to static constant

The timeFunctions set is recreated on every call to resolveConvertFunction. This set
should be a static constant to avoid repeated object allocation, especially since
this method may be called for every field in every row during query planning.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1088-1090]

-Set<String> timeFunctions = Set.of("ctime", "mktime");
+private static final Set<String> TIME_FUNCTIONS = Set.of("ctime", "mktime");
 
-if (timeFunctions.contains(functionName.toLowerCase()) && timeFormat != null) {
+private RexNode resolveConvertFunction(
+    String functionName, RexNode sourceField, String timeFormat, CalcitePlanContext context) {
 
+  if (TIME_FUNCTIONS.contains(functionName.toLowerCase()) && timeFormat != null) {
+
Suggestion importance[1-10]: 4

__

Why: The timeFunctions set is recreated on every call to resolveConvertFunction. Extracting it to a private static final constant is a valid performance and code quality improvement, though the impact is minimal since this runs at query planning time, not per-row.

Low
Add missing minutes range validation

The mstime function validates that seconds >= 60 is invalid, but does not validate
that minutes has a reasonable upper bound. According to the test
assertEquals(3661.0, MsTimeConvertFunction.convert("61:01")), minutes of 61 is
accepted, which may be intentional. However, the pattern allows up to 2 digits for
minutes (\d{1,2}), so values like 99:59 would be accepted. If the intent is to
represent a time duration where minutes should be bounded (e.g., < 60), the
validation is missing.

core/src/main/java/org/opensearch/sql/expression/function/udf/MsTimeConvertFunction.java [47-49]

-if (seconds >= 60) {
+if (minutes >= 60 || seconds >= 60) {
   return null;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion contradicts the existing test assertEquals(3661.0, MsTimeConvertFunction.convert("61:01")) which explicitly validates that minutes >= 60 is accepted. Adding minutes >= 60 validation would break this test and the intended behavior of mstime as a duration format where minutes can exceed 59.

Low
Suggestions up to commit 8d3b734
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix null check ordering before numeric conversion

The null check for value comes after toEpochSeconds(value) is called, but
toEpochSeconds already handles null by returning null. However, the logic is
inverted: if the input is already a numeric value (e.g., an integer epoch), it
returns it directly without attempting string parsing, which is correct. But if
value is null, toEpochSeconds returns null and then the code falls through to the
null check — this is fine but redundant. The real bug is that a numeric string like
"1066473433" will be parsed as a double by toEpochSeconds and returned directly,
bypassing the format parsing entirely, which may be intentional but should be
documented clearly. More critically, if value is a non-null non-numeric non-string
object, value.toString() is called without null check after the null guard — this is
safe, but the ordering of null check after toEpochSeconds call is confusing and
could mask intent.

core/src/main/java/org/opensearch/sql/expression/function/udf/MkTimeConvertFunction.java [75-82]

 public static Object convertWithFormat(Object value, Object timeFormatObj) {
+  if (value == null) {
+    return null;
+  }
   Double numeric = CTimeConvertFunction.toEpochSeconds(value);
   if (numeric != null) {
     return numeric;
   }
-  if (value == null) {
-    return null;
-  }
Suggestion importance[1-10]: 5

__

Why: Moving the null check before toEpochSeconds(value) improves code clarity and avoids a redundant null check after the call. The current code is functionally correct since toEpochSeconds handles null, but the ordering is confusing and the improved version is cleaner.

Low
General
Cache compiled regex pattern as static constant

The Pattern.compile call inside toJavaPattern is executed on every invocation, which
is inefficient. The regex pattern %[A-Za-z%] is constant and should be compiled once
as a static field. This is especially important since this method is called during
query execution for each row conversion.

core/src/main/java/org/opensearch/sql/expression/datetime/StrftimeFormatterUtil.java [278-287]

+private static final Pattern STRFTIME_SPECIFIER_PATTERN = Pattern.compile("%[A-Za-z%]");
+
 public static String toJavaPattern(String strftimeFormat) {
-  Matcher m = Pattern.compile("%[A-Za-z%]").matcher(strftimeFormat);
+  Matcher m = STRFTIME_SPECIFIER_PATTERN.matcher(strftimeFormat);
   StringBuilder sb = new StringBuilder();
   while (m.find()) {
     String replacement = STRFTIME_TO_JAVA_PARSE.getOrDefault(m.group(), m.group());
     m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
   }
   m.appendTail(sb);
   return sb.toString();
 }
Suggestion importance[1-10]: 5

__

Why: Compiling the regex pattern on every call to toJavaPattern is inefficient. Extracting it to a static constant is a standard Java best practice and avoids repeated compilation overhead, especially relevant if called frequently during query execution.

Low
Add missing minutes range validation

The validation only checks that seconds >= 60 but does not validate that minutes is
within a reasonable range. Unlike Dur2SecConvertFunction which validates hours < 24,
MsTimeConvertFunction allows arbitrarily large minute values (e.g., 999:59 would be
accepted). If the intent is to represent sub-minute timestamps, minutes should also
be bounded (e.g., < 60). This inconsistency could lead to unexpected behavior.

core/src/main/java/org/opensearch/sql/expression/function/udf/MsTimeConvertFunction.java [47-49]

-if (seconds >= 60) {
+if (minutes >= 60 || seconds >= 60) {
   return null;
 }
Suggestion importance[1-10]: 4

__

Why: The test at line 381 explicitly asserts assertEquals(3661.0, MsTimeConvertFunction.convert("61:01")), meaning large minute values are intentionally accepted. Adding minutes >= 60 validation would break this existing test case, making this suggestion incorrect for the intended behavior.

Low
Extract repeated set creation to static constant

The timeFunctions set is recreated on every call to resolveConvertFunction. This set
should be a static constant to avoid repeated object allocation, especially since
it's called in a loop over all conversions.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1088-1090]

-Set<String> timeFunctions = Set.of("ctime", "mktime");
+private static final Set<String> TIME_FUNCTIONS = Set.of("ctime", "mktime");
 
-if (timeFunctions.contains(functionName.toLowerCase()) && timeFormat != null) {
+private RexNode resolveConvertFunction(
+    String functionName, RexNode sourceField, String timeFormat, CalcitePlanContext context) {
 
+  if (TIME_FUNCTIONS.contains(functionName.toLowerCase()) && timeFormat != null) {
+
Suggestion importance[1-10]: 4

__

Why: The timeFunctions set is recreated on every call to resolveConvertFunction. Extracting it to a static constant is a valid performance improvement, though the impact is minor since this is called per-query rather than per-row.

Low
Suggestions up to commit 4364539
CategorySuggestion                                                                                                                                    Impact
General
Fix redundant null check ordering in method

The null check for value comes after toEpochSeconds(value) is already called, which
handles null internally. However, the logic is misleading: if value is a numeric
string like "1066473433", toEpochSeconds will parse it and return it as epoch
seconds, bypassing the time-string parsing entirely. This means mktime("1066473433")
returns the numeric value directly instead of treating it as a date string, which
may be intentional but should be clearly documented. More critically, the null check
after toEpochSeconds is redundant since toEpochSeconds already returns null for null
input — the second null check is dead code and should be removed to avoid confusion.

core/src/main/java/org/opensearch/sql/expression/function/udf/MkTimeConvertFunction.java [75-82]

 public static Object convertWithFormat(Object value, Object timeFormatObj) {
+  if (value == null) {
+    return null;
+  }
   Double numeric = CTimeConvertFunction.toEpochSeconds(value);
   if (numeric != null) {
     return numeric;
   }
-  if (value == null) {
+  String str = value instanceof String ? ((String) value).trim() : value.toString().trim();
+  if (str.isEmpty()) {
     return null;
   }
 
+  String strftimeFormat =
+      (timeFormatObj != null) ? timeFormatObj.toString().trim() : DEFAULT_FORMAT;
+  if (strftimeFormat.isEmpty()) {
+    return null;
+  }
+  return parseWithFormat(str, strftimeFormat);
+}
+
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that the null check after toEpochSeconds is dead code since toEpochSeconds already handles null internally. Moving the null check before the toEpochSeconds call improves code clarity and removes the redundant check. The improved_code also removes the duplicate str variable declaration that exists in the original code.

Low
Extract repeated set creation to static constant

The timeFunctions set is recreated on every call to resolveConvertFunction. This set
should be a static constant to avoid repeated object allocation, especially since it
is called in a loop over all conversions.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1088-1090]

-// Time functions that support timeformat parameter
-Set<String> timeFunctions = Set.of("ctime", "mktime");
+private static final Set<String> TIME_FUNCTIONS = Set.of("ctime", "mktime");
 
-if (timeFunctions.contains(functionName.toLowerCase()) && timeFormat != null) {
+private RexNode resolveConvertFunction(
+    String functionName, RexNode sourceField, String timeFormat, CalcitePlanContext context) {
 
+  if (TIME_FUNCTIONS.contains(functionName.toLowerCase()) && timeFormat != null) {
+    RexNode timeFormatLiteral = context.rexBuilder.makeLiteral(timeFormat);
+    return PPLFuncImpTable.INSTANCE.resolve(
+        context.rexBuilder, functionName, sourceField, timeFormatLiteral);
+  } else {
+    return PPLFuncImpTable.INSTANCE.resolve(context.rexBuilder, functionName, sourceField);
+  }
+}
+
Suggestion importance[1-10]: 5

__

Why: The timeFunctions set is indeed recreated on every call to resolveConvertFunction, which is called in a loop. Extracting it as a static constant is a valid performance and code quality improvement, avoiding repeated object allocation.

Low
Cache compiled regex pattern as static constant

The regex pattern %[A-Za-z%] is compiled on every call to toJavaPattern. Since this
method may be called frequently during query execution, the compiled Pattern should
be extracted as a static constant to avoid repeated compilation overhead.

core/src/main/java/org/opensearch/sql/expression/datetime/StrftimeFormatterUtil.java [278-287]

+private static final Pattern STRFTIME_SPECIFIER_PATTERN = Pattern.compile("%[A-Za-z%]");
+
 public static String toJavaPattern(String strftimeFormat) {
-  Matcher m = Pattern.compile("%[A-Za-z%]").matcher(strftimeFormat);
+  Matcher m = STRFTIME_SPECIFIER_PATTERN.matcher(strftimeFormat);
   StringBuilder sb = new StringBuilder();
   while (m.find()) {
     String replacement = STRFTIME_TO_JAVA_PARSE.getOrDefault(m.group(), m.group());
     m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
   }
   m.appendTail(sb);
   return sb.toString();
 }
Suggestion importance[1-10]: 5

__

Why: Compiling a Pattern on every call is a known performance anti-pattern in Java. Extracting the compiled pattern as a static constant is a straightforward and valid improvement, especially if this method is called frequently during query execution.

Low
Add missing minutes range validation

The mstime() function validates that seconds are less than 60, but does not validate
that minutes are within a reasonable range. According to the documentation, the
format is [MM:]SS.SSS, but there is no upper bound check on the minutes value. The
test even asserts assertEquals(3661.0, MsTimeConvertFunction.convert("61:01")),
which suggests 61 minutes is accepted. This should be explicitly documented or a
validation should be added for consistency with dur2sec().

core/src/main/java/org/opensearch/sql/expression/function/udf/MsTimeConvertFunction.java [47-49]

-if (seconds >= 60) {
+if (minutes >= 60 || seconds >= 60) {
   return null;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes adding a minutes validation, but the existing test explicitly asserts assertEquals(3661.0, MsTimeConvertFunction.convert("61:01")), meaning 61 minutes is intentionally accepted per the design. The improved_code would break this existing test, making the suggestion incorrect in this context.

Low
Suggestions up to commit afa3915
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix sequential string replacement causing double-substitution bugs

The current implementation replaces specifiers sequentially, which can cause
double-replacement issues. For example, %Y contains % which could be re-processed if
%% is also in the map, or a replacement value like yyyy could accidentally match
another specifier. Use a regex-based single-pass replacement to avoid this issue.

core/src/main/java/org/opensearch/sql/expression/datetime/StrftimeFormatterUtil.java [278-287]

 public static String toJavaPattern(String strftimeFormat) {
-  String result = strftimeFormat;
-  for (Map.Entry<String, String> entry : STRFTIME_TO_JAVA_PARSE.entrySet()) {
-    String specifier = entry.getKey();
-    if (result.contains(specifier)) {
-      result = result.replace(specifier, entry.getValue());
+  StringBuilder result = new StringBuilder();
+  int i = 0;
+  while (i < strftimeFormat.length()) {
+    if (strftimeFormat.charAt(i) == '%' && i + 1 < strftimeFormat.length()) {
+      String specifier = strftimeFormat.substring(i, i + 2);
+      String replacement = STRFTIME_TO_JAVA_PARSE.get(specifier);
+      if (replacement != null) {
+        result.append(replacement);
+        i += 2;
+      } else {
+        result.append(strftimeFormat.charAt(i));
+        i++;
+      }
+    } else {
+      result.append(strftimeFormat.charAt(i));
+      i++;
     }
   }
-  return result;
+  return result.toString();
 }
Suggestion importance[1-10]: 7

__

Why: The sequential replace approach can cause double-substitution issues (e.g., %% being processed after %Y has already been replaced). The character-by-character approach in the improved code is more correct and avoids this class of bugs.

Medium
General
Add missing minutes range validation in mstime conversion

The mstime function validates that seconds are less than 60, but does not validate
that minutes are within a reasonable range. According to the documentation, the
format is [MM:]SS.SSS, but there is no upper bound check on minutes, allowing values
like 99:59 to be accepted silently. A validation for minutes (e.g., >= 60) should be
added for consistency.

core/src/main/java/org/opensearch/sql/expression/function/udf/MsTimeConvertFunction.java [47-49]

-if (seconds >= 60) {
+if (minutes >= 60 || seconds >= 60) {
   return null;
 }
Suggestion importance[1-10]: 5

__

Why: The test testMstimeConvertInvalid shows "25:70" returns null (due to seconds >= 60), but "61:01" is expected to return 3661.0 (line 381 in tests), meaning the PR intentionally allows minutes >= 60. Adding a minutes validation would break existing test cases and contradict the documented behavior.

Low
Guard against NaN and infinite values in epoch conversion

The toEpochSeconds method accepts any Double value including negative numbers, which
would represent timestamps before the Unix epoch (pre-1970). While this may be
intentional, Instant.ofEpochSecond with very large Double values (e.g.,
Double.MAX_VALUE) will throw an ArithmeticException that is not caught here but is
caught in convertWithFormat. However, precision loss when casting a large double to
long via timestamp.longValue() can silently produce incorrect results. Consider
adding a range check before conversion.

core/src/main/java/org/opensearch/sql/expression/function/udf/CTimeConvertFunction.java [86-102]

 static Double toEpochSeconds(Object value) {
   if (value == null) {
     return null;
   }
   if (value instanceof Number) {
-    return ((Number) value).doubleValue();
+    double d = ((Number) value).doubleValue();
+    if (Double.isNaN(d) || Double.isInfinite(d)) {
+      return null;
+    }
+    return d;
   }
   String str = value.toString().trim();
   if (str.isEmpty()) {
     return null;
   }
   try {
-    return Double.parseDouble(str);
+    double d = Double.parseDouble(str);
+    if (Double.isNaN(d) || Double.isInfinite(d)) {
+      return null;
+    }
+    return d;
   } catch (NumberFormatException e) {
     return null;
   }
 }
Suggestion importance[1-10]: 5

__

Why: Adding NaN and infinite checks prevents potential silent precision loss or unexpected behavior when Instant.ofEpochSecond receives extreme values. The existing try/catch in convertWithFormat handles ArithmeticException, but the NaN/infinite guard is a cleaner defensive approach.

Low
Extract repeated set creation to a static constant

The timeFunctions set is recreated on every call to resolveConvertFunction. This set
should be a static constant to avoid repeated object allocation, especially when
processing many rows or conversions.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1088-1090]

-Set<String> timeFunctions = Set.of("ctime", "mktime");
+private static final Set<String> TIME_FUNCTIONS = Set.of("ctime", "mktime");
 
-if (timeFunctions.contains(functionName.toLowerCase()) && timeFormat != null) {
+private RexNode resolveConvertFunction(
+    String functionName, RexNode sourceField, String timeFormat, CalcitePlanContext context) {
 
+  if (TIME_FUNCTIONS.contains(functionName.toLowerCase()) && timeFormat != null) {
+
Suggestion importance[1-10]: 4

__

Why: The timeFunctions set is recreated on every call to resolveConvertFunction, which is a minor inefficiency. Extracting it to a static constant is a straightforward improvement for maintainability and performance.

Low
Suggestions up to commit 6f6162e
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix double-replacement bug in format conversion

The current implementation iterates over map entries and replaces specifiers one by
one, which can cause double-replacement issues. For example, %Y contains % which
could interact with %% replacement, and %T expands to HH:mm:ss which contains mm
that could be re-processed. Use a single-pass regex-based replacement to avoid this
problem.

core/src/main/java/org/opensearch/sql/expression/datetime/StrftimeFormatterUtil.java [278-287]

 public static String toJavaPattern(String strftimeFormat) {
-  String result = strftimeFormat;
-  for (Map.Entry<String, String> entry : STRFTIME_TO_JAVA_PARSE.entrySet()) {
-    String specifier = entry.getKey();
-    if (result.contains(specifier)) {
-      result = result.replace(specifier, entry.getValue());
+  StringBuilder result = new StringBuilder();
+  int i = 0;
+  while (i < strftimeFormat.length()) {
+    if (strftimeFormat.charAt(i) == '%' && i + 1 < strftimeFormat.length()) {
+      String specifier = "%" + strftimeFormat.charAt(i + 1);
+      String replacement = STRFTIME_TO_JAVA_PARSE.get(specifier);
+      if (replacement != null) {
+        result.append(replacement);
+      } else {
+        result.append(specifier);
+      }
+      i += 2;
+    } else {
+      result.append(strftimeFormat.charAt(i));
+      i++;
     }
   }
-  return result;
+  return result.toString();
 }
Suggestion importance[1-10]: 7

__

Why: The current replace-based approach can cause double-replacement issues (e.g., %%'%' then %Y within a partially-replaced string could interact). The suggested single-pass character-by-character approach correctly handles all specifiers without risk of re-processing already-replaced content. This is a real correctness concern for edge cases.

Medium
General
Extract repeated set creation to static constant

The timeFunctions set is recreated on every call to resolveConvertFunction. This set
should be a static constant to avoid repeated object allocation, especially since
it's called in a loop for each conversion.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1088-1090]

-// Time functions that support timeformat parameter
-Set<String> timeFunctions = Set.of("ctime", "mktime");
+private static final Set<String> TIME_FUNCTIONS = Set.of("ctime", "mktime");
 
-if (timeFunctions.contains(functionName.toLowerCase()) && timeFormat != null) {
+private RexNode resolveConvertFunction(
+    String functionName, RexNode sourceField, String timeFormat, CalcitePlanContext context) {
 
+  if (TIME_FUNCTIONS.contains(functionName.toLowerCase()) && timeFormat != null) {
+    RexNode timeFormatLiteral = context.rexBuilder.makeLiteral(timeFormat);
+    return PPLFuncImpTable.INSTANCE.resolve(
+        context.rexBuilder, functionName, sourceField, timeFormatLiteral);
+  } else {
+    return PPLFuncImpTable.INSTANCE.resolve(context.rexBuilder, functionName, sourceField);
+  }
+}
+
Suggestion importance[1-10]: 4

__

Why: The timeFunctions set is indeed recreated on every call to resolveConvertFunction. Moving it to a static constant is a valid minor optimization and code quality improvement, though the performance impact is negligible in practice.

Low
Add minutes bounds validation in mstime conversion

The pattern ^(?:(\d{1,2}):)?(\d{1,2})(?:\.(\d{1,3}))?$ is ambiguous: when given
input like "225" (a plain number), it matches with minutes=null and seconds=225, but
the tryParseDouble check above handles this case. However, the pattern also matches
"1:2:3" partially — actually it won't match "1:2:3" due to the ^...$ anchors, but it
will match "61:01" (minutes=61, seconds=01) which the test expects to return 3661.0.
The issue is that there's no validation on the minutes value (minutes >= 60 is not
checked), so "99:59" would return a large value instead of null. Add a minutes
bounds check consistent with the seconds check.

core/src/main/java/org/opensearch/sql/expression/function/udf/MsTimeConvertFunction.java [20-22]

 // Matches optional MM: prefix, required SS, optional .SSS
 private static final Pattern MSTIME_PATTERN =
     Pattern.compile("^(?:(\\d{1,2}):)?(\\d{1,2})(?:\\.(\\d{1,3}))?$");
 
+// In applyConversion, after parsing:
+// int minutes = matcher.group(1) != null ? Integer.parseInt(matcher.group(1)) : 0;
+// int seconds = Integer.parseInt(matcher.group(2));
+// if (minutes >= 60 || seconds >= 60) {
+//   return null;
+// }
+
Suggestion importance[1-10]: 3

__

Why: The suggestion identifies a missing minutes bounds check, but the test at line 381 (assertEquals(3661.0, MsTimeConvertFunction.convert("61:01"))) explicitly expects 61:01 to return 3661.0, meaning minutes >= 60 is intentionally allowed. The improved_code only adds a comment rather than actual code changes, making it a low-impact suggestion.

Low
Document intentional truncation of fractional epoch seconds

The convertWithFormat method uses timestamp.longValue() to convert the epoch
seconds, which truncates fractional seconds silently. If a caller passes a
fractional epoch (e.g., 1066507633.9), the .9 is silently dropped. This is likely
intentional for ctime, but the truncation should be explicit (e.g., using Math.round
or documenting the behavior) to avoid confusion.

core/src/main/java/org/opensearch/sql/expression/function/udf/CTimeConvertFunction.java [68-84]

-static Double toEpochSeconds(Object value) {
-  if (value == null) {
+public static String convertWithFormat(Object value, Object timeFormatObj) {
+  Double timestamp = toEpochSeconds(value);
+  if (timestamp == null) {
     return null;
   }
-  if (value instanceof Number) {
-    return ((Number) value).doubleValue();
-  }
-  String str = value.toString().trim();
-  if (str.isEmpty()) {
+  String format = (timeFormatObj != null) ? timeFormatObj.toString().trim() : DEFAULT_FORMAT;
+  if (format.isEmpty()) {
     return null;
   }
   try {
-    return Double.parseDouble(str);
-  } catch (NumberFormatException e) {
+    // Truncate fractional seconds intentionally for display
+    long epochSeconds = timestamp.longValue();
+    Instant instant = Instant.ofEpochSecond(epochSeconds);
+    ZonedDateTime zdt = ZonedDateTime.ofInstant(instant, ZoneId.of("UTC"));
+    return StrftimeFormatterUtil.formatZonedDateTime(zdt, format).stringValue();
+  } catch (Exception e) {
     return null;
   }
 }
Suggestion importance[1-10]: 2

__

Why: The improved_code is functionally identical to the existing_code (both use timestamp.longValue()), only adding a comment. This is a documentation-only suggestion with no actual code change, warranting a low score.

Low

@ritvibhatt ritvibhatt changed the title Convert time functions Add time conversion functions for convert command Mar 6, 2026
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit eae36db

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 73f4088

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 82fd69f.

PathLineSeverityDescription
core/src/main/java/org/opensearch/sql/expression/function/udf/MkTimeConvertFunction.java100lowUser-controlled timeformat string is passed without length or complexity bounds directly to StrftimeFormatterUtil.toJavaPattern() and then to DateTimeFormatter.ofPattern(). Complex or deeply nested Java DateTimeFormatter patterns can be slow to compile, creating a minor resource-exhaustion vector if users can supply arbitrarily long format strings through PPL queries. Exceptions are caught and return null, so there is no crash risk, but no input length guard exists.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 82fd69f

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit b45ff6d

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit becd73e.

PathLineSeverityDescription
core/src/main/java/org/opensearch/sql/expression/datetime/StrftimeFormatterUtil.java290lowThe toJavaPattern() method does not escape single-quote characters in user-supplied strftime format strings before passing them to DateTimeFormatter.ofPattern(). A literal single quote in the input would be passed through unescaped, potentially producing malformed Java formatter patterns (e.g., unclosed literal sequences). Exceptions are caught and return null, so there is no exploitable impact, but it is an unvalidated user-controlled input path that warrants attention.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit becd73e

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 93f7768.

PathLineSeverityDescription
.github/workflows/sql-cli-integration-test.yml71lowThe `./gradlew clean` step runs immediately after `./gradlew publishToMavenLocal`, deleting compiled build artifacts after they have been published. While this is a common disk-space-saving practice, the sequencing (clean after publish rather than before) means local build artifacts are removed post-publication, which slightly reduces auditability of what was published. No clear malicious intent, and the pattern has a plausible legitimate explanation.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 93f7768

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 80b16a9

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit fdc0b67

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Persistent review updated to latest commit da6cfeb

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Persistent review updated to latest commit 5ac49be

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Persistent review updated to latest commit 215b59b

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 4364539

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 8d3b734

+----------+
| duration |
|----------|
| 5025.0 |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is double?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if there is a numeric value passed to the function like "5025.5", it should return 5025.5


## Example 11: Using timeformat with ctime() and mktime()

The `timeformat` parameter specifies a strftime format string for `ctime()` and `mktime()`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np: mktime is not demonstrated below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that! Updated to include an example with mktime

+----------+
| time_str |
|----------|
| 225.5 |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes fractional seconds are allowed so should return a double

Comment on lines +1002 to +1005
registerOperator(CTIME, PPLBuiltinOperators.CTIME);
registerOperator(MKTIME, PPLBuiltinOperators.MKTIME);
registerOperator(MSTIME, PPLBuiltinOperators.MSTIME);
registerOperator(DUR2SEC, PPLBuiltinOperators.DUR2SEC);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these have to a UDF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I couldn't find any Calcite operators that handled parsing time the formats that was needed for these functions

calcite:
logical: |
LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
LogicalProject(ts=[CTIME(1066507633)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double check if it's necessary to add 2 yaml files for new PPL function. Also this ymal is almost the same as no pushdown version, are these new function not pushed down anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions aren't pushed down, pushdown implementation will be in a follow up PR but I think the explain tests look for the files in both folders so both files are necessary.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@ritvibhatt ritvibhatt requested a review from ahkcs as a code owner March 12, 2026 16:15
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 39e95ec

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 7394650

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants