[Feature] PPL Highlight Command by RyanL1997 · Pull Request #5234 · opensearch-project/sql

RyanL1997 · 2026-03-14T17:01:23Z

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-03-14T17:02:22Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5891541.

Path	Line	Severity	Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	516	low	User-supplied highlight terms are interpolated directly into a Lucene query_string expression with only double-quote wrapping and no further sanitization. A crafted term containing backslash escapes or unbalanced quotes could malform or expand the generated query. Impact is limited to highlighting behavior (not document access control), so this is likely an oversight rather than malicious intent.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-14T17:03:27Z

PR Reviewer Guide 🔍

(Review updated until commit `f0dacc0`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 Security concerns Term Injection: In `applyHighlightPushDown` (AbstractCalciteIndexScan.java), user-supplied highlight terms are embedded into an OpenSearch query string using simple double-quote wrapping: `"\"" + term + "\""`. A term containing special characters (e.g., `"`, `AND`, `OR`, `NOT`, wildcards) could manipulate the query string logic. For example, a term like `foo" OR password:*` could expand the highlight query beyond the intended scope. Input terms should be escaped before being used in a query string query.
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: PPL Highlight AST and Parser Relevant files: ppl/src/main/antlr/OpenSearchPPLLexer.g4 ppl/src/main/antlr/OpenSearchPPLParser.g4 core/src/main/java/org/opensearch/sql/ast/tree/Highlight.java core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java core/src/main/java/org/opensearch/sql/analysis/Analyzer.java ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java Sub-PR theme: Highlight Calcite Planner and OpenSearch Pushdown Relevant files: core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalHighlight.java core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/HighlightIndexScanRule.java opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java core/src/main/java/org/opensearch/sql/expression/HighlightExpression.java Sub-PR theme: Highlight Response Format and Documentation Relevant files: protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java docs/user/ppl/cmd/highlight.md docs/category.json
⚡ Recommended focus areas for review Term Injection Risk In `applyHighlightPushDown`, highlight terms are interpolated directly into a query string using string concatenation with only basic double-quote wrapping. A term containing a double-quote character could break the query string syntax or produce unexpected behavior. Input sanitization or escaping should be applied before building the query string. String queryStr = highlightArgs.stream() .map(term -> "\"" + term + "\"") .collect(java.util.stream.Collectors.joining(" OR ")); org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field field = new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field("") .highlightQuery( org.opensearch.index.query.QueryBuilders.queryStringQuery(queryStr) .defaultField("")); hb.field(field); Duplicate _highlight Column In `pushDownHighlight`, the `_highlight` field is unconditionally added to the schema without checking if it already exists (unlike `LogicalHighlight.create` which does check). If `pushDownHighlight` is called more than once, or if the field already exists, this could result in a duplicate column in the row type. public RelNode pushDownHighlight(List<String> highlightArgs) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); schemaBuilder.add( HighlightExpression.HIGHLIGHT_FIELD, getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); newScan.getPushDownContext().setHighlightArgs(highlightArgs); return newScan; } Unsafe Cast In the `highlights()` method, there is an unchecked cast `(Map<String, Object>) hlMap` which is unnecessary since `hlMap` is already declared as `Map<String, Object>`. While harmless here, it suppresses compiler warnings and should be removed for clarity. return (Map<String, Object>) hlMap; Iterator Change The `iterator()` method was refactored to filter out `_highlight` and use `e.getValue().value()` instead of the previous `ExprValue::value`. This changes behavior for all queries, not just highlight queries. The previous implementation used `convertExprValuesToValues` which called `ExprValue::value`. The new implementation filters by key and calls `.value()` — this should be validated to ensure no regression for non-highlight queries, especially for complex/nested ExprValue types. public Iterator<Object[]> iterator() { return exprValues.stream() .map(ExprValueUtils::getTupleValue) .map( tuple -> tuple.entrySet().stream() .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey())) .map(e -> e.getValue().value()) .toArray(Object[]::new)) .iterator(); Implicit _highlight Projection In `visitProject`, the `_highlight` field is automatically appended to all projections whenever it exists in the schema. This means any query that uses `highlight` followed by a `fields` command will silently include `_highlight` in the projection, even if the user did not request it. This could cause unexpected behavior if the `_highlight` column is not properly handled downstream in all code paths. // Include _highlight in projections when the highlight column is present in the schema int hlIndex = currentFields.indexOf(HighlightExpression.HIGHLIGHT_FIELD); if (hlIndex >= 0) { expandedFields.add(context.relBuilder.field(hlIndex)); }

github-actions · 2026-03-14T17:04:15Z

PR Code Suggestions ✨

Latest suggestions up to f0dacc0
Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Prevent duplicate highlight column in schema If `_highlight` is already present in the row type (e.g., when `pushDownHighlight` is called more than once), the method unconditionally adds a duplicate `_highlight` column, which will cause a schema conflict. Add a guard to only add the column when it is not already present, mirroring the check done in `LogicalHighlight.create`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [88-97] public RelNode pushDownHighlight(List<String> highlightArgs) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); newScan.getPushDownContext().setHighlightArgs(highlightArgs); return newScan; } Suggestion importance[1-10]: 7 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema without checking if it already exists, which could cause a schema conflict if called multiple times. The fix mirrors the guard already present in `LogicalHighlight.create`.	Medium
Possible issue	Preserve schema column order in row iterator The new `iterator()` implementation filters out `_highlight` but also changes the column ordering guarantee: it now iterates over the map's entry set order rather than the schema column order. This can cause column values to be misaligned with the schema columns returned by `columnNameTypes()`. The original code used `Map::values` which relied on insertion order; the new code should explicitly follow the schema column order to be safe. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [89-97] return exprValues.stream() .map(ExprValueUtils::getTupleValue) .map( tuple -> - tuple.entrySet().stream() - .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey())) - .map(e -> e.getValue().value()) + schema.getColumns().stream() + .filter(col -> !HIGHLIGHT_FIELD.equals(getColumnName(col))) + .map(col -> tuple.getOrDefault(getColumnName(col), + org.opensearch.sql.data.model.ExprMissingValue.of()).value()) .toArray(Object[]::new)) .iterator(); Suggestion importance[1-10]: 7 __ Why: The new `iterator()` implementation iterates over the map's entry set order rather than the schema column order, which could cause column values to be misaligned with the schema. Iterating over schema columns explicitly ensures correct alignment between `datarows` and `schema`.	Medium
Security	Escape special characters in highlight query terms The terms are interpolated directly into a query string using `""" + term + """` without any escaping. A term containing a double-quote character (e.g., `he said` `"hello"`) will break the query string syntax and may cause an OpenSearch request failure or unexpected behavior. The terms should be escaped before being embedded in the query string. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [511-522] } else { // Highlight specific terms across all fields String queryStr = highlightArgs.stream() - .map(term -> "\"" + term + "\"") + .map(term -> "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"") .collect(java.util.stream.Collectors.joining(" OR ")); org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field field = new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field("") .highlightQuery( org.opensearch.index.query.QueryBuilders.queryStringQuery(queryStr) .defaultField("")); hb.field(field); } Suggestion importance[1-10]: 6 __ Why: Terms containing double-quote characters are interpolated directly into the query string without escaping, which could break the query string syntax. However, since the grammar restricts `highlightArg` to string literals parsed by the PPL parser, the risk of unescaped quotes reaching this code is limited in practice.	Low

Previous suggestions

Suggestions up to commit 5891541

Category	Suggestion	Impact
Possible issue	Prevent duplicate highlight column in schema The `_highlight` column is unconditionally added to the schema even if it already exists (e.g., if `pushDownHighlight` is called twice). This can cause duplicate column errors. Add a guard to only add the column when it is not already present, similar to how `LogicalHighlight.create` handles it. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [90-99] public RelNode pushDownHighlight(List<String> highlightArgs) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); newScan.getPushDownContext().setHighlightArgs(highlightArgs); return newScan; } Suggestion importance[1-10]: 6 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema without checking if it already exists, which could cause duplicate column errors if called multiple times. The fix mirrors the guard already present in `LogicalHighlight.create`.	Low
Security	Escape highlight terms to prevent query injection The term values are interpolated directly into a query string using `""" + term +` `"""` without any escaping. If a term contains special query string characters (e.g., `"`, `</code>, :, ), this could produce a malformed or injected query. Terms should be escaped before being embedded in the query string.` opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [511-522] } else { // Highlight specific terms across all fields String queryStr = highlightArgs.stream() - .map(term -> "\"" + term + "\"") + .map(term -> "\"" + org.apache.lucene.queryparser.classic.QueryParser.escape(term) + "\"") .collect(java.util.stream.Collectors.joining(" OR ")); org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field field = new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field("") .highlightQuery( org.opensearch.index.query.QueryBuilders.queryStringQuery(queryStr) .defaultField("*")); hb.field(field); } Suggestion importance[1-10]: 6 __ Why: User-supplied terms are interpolated directly into a Lucene query string without escaping, which could produce malformed queries or allow injection of special characters. Using QueryParser.escape would mitigate this risk.	Low
General	Guard against null highlight field values Calling `entry.getValue().collectionValue()` will throw a `NullPointerException` or `ExpressionEvaluationException` if the highlight field value for a particular key is not a collection type. Add a null/type check before calling `collectionValue()` to handle unexpected value shapes gracefully. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [110-124] map( tuple -> { ExprValue hl = tuple.get(HIGHLIGHT_FIELD); if (hl == null \|\| hl.isMissing() \|\| hl.isNull()) { return null; } Map<String, Object> hlMap = new LinkedHashMap<>(); for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { + ExprValue val = entry.getValue(); + if (val == null \|\| val.isMissing() \|\| val.isNull()) continue; hlMap.put( entry.getKey(), - entry.getValue().collectionValue().stream() + val.collectionValue().stream() .map(ExprValue::stringValue) .collect(Collectors.toList())); } return (Map<String, Object>) hlMap; }) Suggestion importance[1-10]: 4 __ Why: Calling `collectionValue()` on a highlight entry without checking if the value is null or missing could throw an exception for unexpected value shapes. Adding a null/missing check improves robustness.	Low
General	Verify child plan attachment for highlight command The `visitHighlightCommand` returns a `new Highlight(highlightArgs)` without attaching it to the child plan. All other command visitors (e.g., `visitTrendlineCommand`) return the node and rely on the caller to call `.attach(child)`, but `AstDSL.highlight` calls `.attach(input)`. Since `visitHighlightCommand` is called by the AST builder framework which chains nodes, the returned `Highlight` node will have a null child, causing a `NullPointerException` when the plan is traversed. The child should be attached here or the framework must handle it — verify the attachment mechanism is consistent with other commands. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1290-1305] public UnresolvedPlan visitHighlightCommand(OpenSearchPPLParser.HighlightCommandContext ctx) { List<String> highlightArgs = ctx.highlightArg().stream() .map( arg -> { if (arg.STAR() != null) { return "*"; } else if (arg.stringLiteral() != null) { return StringUtils.unquoteText(arg.stringLiteral().getText()); } else { return arg.fieldExpression().getText(); } }) .collect(Collectors.toList()); + // Return without attach; the framework (visitPipeCommands) attaches the child. + // Ensure this is consistent with how other commands are handled. return new Highlight(highlightArgs); } Suggestion importance[1-10]: 2 __ Why: The suggestion asks to verify that the framework attaches the child, but the `improved_code` is essentially identical to the `existing_code` with only a comment added. Other command visitors in the same file follow the same pattern of returning without attaching, so this is consistent and not a real bug.	Low

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-03-14T17:29:29Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0dacc0.

Path	Line	Severity	Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	519	low	User-supplied highlight terms are wrapped in double quotes but not escaped before being passed to queryStringQuery(). A term containing a literal '"' character could partially break out of the phrase context and alter the Lucene query structure. Because PPL users are already authenticated and can issue arbitrary search queries, the practical exploitability is limited, but proper escaping of the term string (e.g., via QueryParser.escape) would be safer.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-14T17:30:33Z

Persistent review updated to latest commit f0dacc0

RyanL1997 added 5 commits March 13, 2026 15:08

initial commit for highlight cmd

38ad5e8

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix the response format

2e901a3

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

pushdown fix (similar to v2)

087c5f0

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix - filter out highlight col in result schema

15e32cb

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

add docs for highlight

5891541

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

RyanL1997 added PPL Piped processing language v3.6.0 Issues targeting release v3.6.0 feature labels Mar 14, 2026

dead code cleanup

f0dacc0

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] PPL Highlight Command#5234

[Feature] PPL Highlight Command#5234
RyanL1997 wants to merge 6 commits intoopensearch-project:mainfrom
RyanL1997:ppl-highlight-cmd

RyanL1997 commented Mar 14, 2026

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

github-actions bot commented Mar 14, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 14, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

RyanL1997 commented Mar 14, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 14, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit f0dacc0)

Uh oh!

github-actions bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 14, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Mar 14, 2026 •

edited

Loading

(Review updated until commit `f0dacc0`)

github-actions bot commented Mar 14, 2026 •

edited

Loading