Skip to content

added cloudwatch style contains operator#5219

Open
ThyTran1402 wants to merge 2 commits intoopensearch-project:mainfrom
ThyTran1402:feat/cloudwatch_contains_operator
Open

added cloudwatch style contains operator#5219
ThyTran1402 wants to merge 2 commits intoopensearch-project:mainfrom
ThyTran1402:feat/cloudwatch_contains_operator

Conversation

@ThyTran1402
Copy link

@ThyTran1402 ThyTran1402 commented Mar 9, 2026

Description

  • Adds a contains operator to PPL where clauses as a more readable compare to LIKE(field, '%keyword%') or field like '%keyword%'.
  • | where message contains 'error is equivalent to | where message like '%error%' .

Behavior:

  • Case-insensitive substring match (backed by ilike)
  • RHS must be a string literal; a field reference on the RHS raises a SemanticCheckException - % and _ in the keyword are treated as LIKE wildcards (consistent with the underlying operator)
  • contains can still be used as a field/identifier name in other positions

Related Issues

Resolves #5181

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Grammar changes for CONTAINS token

Relevant files:

  • language-grammar/src/main/antlr4/OpenSearchPPLLexer.g4
  • language-grammar/src/main/antlr4/OpenSearchPPLParser.g4
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4

Sub-PR theme: AST builder logic and tests for contains operator

Relevant files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstExpressionBuilderTest.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/WhereCommandIT.java

⚡ Recommended focus areas for review

Wildcard Injection

The contains operator wraps the user-provided string literal with % to form %value%. However, if the input string already contains % or _ characters, they will be treated as LIKE wildcards. This behavior is documented in the PR description as intentional, but it may surprise users expecting a literal substring match. Consider escaping % and _ in the keyword before wrapping, or at minimum ensure this is clearly documented.

String wrapped = "%" + ((Literal) right).getValue() + "%";
return new Compare(ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));
Calcite-only behavior

The contains operator is translated to ilike, but ilike appears to be gated behind Calcite being enabled (based on the surrounding code for LIKE). It is unclear whether contains will work correctly when Calcite is disabled, potentially causing silent failures or unexpected behavior in non-Calcite mode.

} else if ("contains".equalsIgnoreCase(operator)) {
  UnresolvedExpression left = visit(ctx.left);
  UnresolvedExpression right = visit(ctx.right);
  if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
    throw new SemanticCheckException(
        "The right-hand side of 'contains' must be a string literal");
  }
  String wrapped = "%" + ((Literal) right).getValue() + "%";
  return new Compare(ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));
Missing left field validation

There is no validation that the left-hand side of contains is a field reference. While the right-hand side is validated to be a string literal, the left-hand side could be any expression (e.g., another literal or a function call), which may lead to unexpected behavior or confusing error messages downstream.

UnresolvedExpression left = visit(ctx.left);
UnresolvedExpression right = visit(ctx.right);
if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
  throw new SemanticCheckException(
      "The right-hand side of 'contains' must be a string literal");
}
String wrapped = "%" + ((Literal) right).getValue() + "%";
return new Compare(ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Escape wildcard characters in contains operand

The contains operator wraps the user-provided string with % wildcards, but if the
input string itself contains % or _ wildcard characters, they will be interpreted as
ILIKE pattern metacharacters, leading to unintended matches. These special
characters should be escaped before wrapping with %.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [223]

-String wrapped = "%" + ((Literal) right).getValue() + "%";
+String rawValue = ((Literal) right).getValue().toString()
+    .replace("\\", "\\\\")
+    .replace("%", "\\%")
+    .replace("_", "\\_");
+String wrapped = "%" + rawValue + "%";
Suggestion importance[1-10]: 7

__

Why: This is a valid concern - if the user's input contains % or _ characters, they would be interpreted as ILIKE wildcards, causing unintended matches. The fix correctly escapes these metacharacters before wrapping with %.

Medium
General
Respect Calcite settings for contains operator

The contains operator only validates that the right-hand side is a string literal,
but does not validate that the left-hand side is a field or string expression.
Adding a check or at least a comment clarifying this assumption would improve
robustness. More critically, the early return here bypasses the rest of the method,
which may handle additional settings-based logic (e.g., Calcite enablement checks)
that should also apply to contains.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [216-225]

 } else if ("contains".equalsIgnoreCase(operator)) {
   UnresolvedExpression left = visit(ctx.left);
   UnresolvedExpression right = visit(ctx.right);
   if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
     throw new SemanticCheckException(
         "The right-hand side of 'contains' must be a string literal");
   }
   String wrapped = "%" + ((Literal) right).getValue() + "%";
-  return new Compare(ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));
+  String ilikeFn = UnresolvedPlanHelper.isCalciteEnabled(astBuilder.getSettings())
+      ? (UnresolvedPlanHelper.legacyPreferred(astBuilder.getSettings()) ? ILIKE.getName().getFunctionName() : "ilike")
+      : ILIKE.getName().getFunctionName();
+  return new Compare(ilikeFn, left, new Literal(wrapped, DataType.STRING));
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion about bypassing Calcite settings is questionable - the contains operator always maps to ilike semantics by design, and the improved code's logic for determining the function name is unclear and potentially incorrect. The ILIKE function name is used in both branches of the ternary, making the suggestion's value minimal.

Low

JSONObject result =
executeQuery(
String.format(
"source=%s | where firstname contains 'mbe' | fields firstname",
Copy link
Member

@LantaoJin LantaoJin Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the contains case-sensitive or insensitive? Need more tests for this.
Can you link the cloudwatch doc link in description?
And this PR should include user document updates. maybe in condition.md

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think contains is case-insensitive since it's implemented with ILIKE with %value% wrapping.
If a user writes firstname contains '%', the wrapped pattern becomes %%% which would match everything, which is likely unintended. I'll fix this and add more test cases.
Here is the doc provided: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax-Filter.html

@LantaoJin LantaoJin added the enhancement New feature or request label Mar 10, 2026
Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Cloudwatch-style like operator

2 participants