[Feature] Calcite PPL search result highlighting #5141

Open
RyanL1997 wants to merge 18 commits into opensearch-project:main from RyanL1997:syntax-highlighting

Conversation

@RyanL1997
Collaborator

@RyanL1997 RyanL1997 commented Feb 12, 2026

Description

For ease of review, please reference the design doc in issue #5156.

  • Add request-level highlight API for PPL queries. Callers (OSD, API, CLI) can include an optional highlight JSON object in the PPL request body, which the backend forwards as-is to OpenSearch.
  • Highlight config is carried across the thread boundary (REST handler → sql-worker) via AbstractPlan, then set as a ThreadLocal for Calcite planning and execution.
  • Highlight metadata from OpenSearch hits is collected via a side-channel ThreadLocal in OpenSearchIndexEnumerator and merged back into the JDBC/SimpleJson response as a highlights array parallel to datarows.
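The thread-boundary hand-off described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `HighlightContextSketch`, `PlanSketch`, and `ThreadHopDemo` are hypothetical stand-ins for CalcitePlanContext, AbstractPlan, and the REST handler → sql-worker submission, and the config is modelled as a plain String.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the plan context; not the plugin's CalcitePlanContext. */
final class HighlightContextSketch {
  private static final ThreadLocal<String> HIGHLIGHT = new ThreadLocal<>();

  static void set(String config) { HIGHLIGHT.set(config); }
  static String get() { return HIGHLIGHT.get(); }
  static void clear() { HIGHLIGHT.remove(); }
}

/** Hypothetical plan: the config travels as an ordinary field so it survives the thread hop. */
final class PlanSketch {
  private final String highlightConfig;

  PlanSketch(String highlightConfig) { this.highlightConfig = highlightConfig; }

  /** Runs on the worker thread: publish the config, do the work, always clean up. */
  String execute() {
    HighlightContextSketch.set(highlightConfig);
    try {
      // Code deeper in the stack reads the ThreadLocal instead of taking a parameter.
      return "scan with highlight=" + HighlightContextSketch.get();
    } finally {
      // Worker threads are pooled, so a stale value would leak into the next query.
      HighlightContextSketch.clear();
    }
  }
}

final class ThreadHopDemo {
  /** Returns {query result, value still visible on the worker after the query}. */
  static String[] run(String config) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      Callable<String> query = () -> new PlanSketch(config).execute();
      Callable<String> probe = () -> String.valueOf(HighlightContextSketch.get());
      String result = worker.submit(query).get();
      String leftover = worker.submit(probe).get();
      return new String[] {result, leftover};
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      worker.shutdown();
    }
  }
}
```

The `finally` block is the important part of the sketch: because the same worker thread serves later requests, clearing the ThreadLocal keeps one query's highlight config from bleeding into the next.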

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds per-request PPL highlighting end-to-end: request parsing, plan-level propagation via ThreadLocal, conditional hidden _highlight column in Calcite schema, attaching HighlightBuilder to OpenSearch requests, extracting highlights from responses, and exposing per-row highlights through QueryResult/formatters with unit and integration tests.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Plan context & propagation | core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java, core/src/main/java/org/opensearch/sql/executor/execution/AbstractPlan.java | Add ThreadLocal highlightConfig and plan-level highlightConfig field with accessors to carry highlight settings across thread boundary. |
| Plan execution flow | core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java, core/src/main/java/org/opensearch/sql/executor/execution/ExplainPlan.java | Set/clear CalcitePlanContext ThreadLocal around execute/explain to ensure worker thread sees plan highlight config. |
| PPL request surface | ppl/src/main/java/org/opensearch/sql/ppl/PPLService.java, ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java | Parse highlight from PPL request JSON and attach it to created plan before submission. |
| Calcite schema / planning | opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java, core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java | When highlight config exists, add hidden _highlight column (ANY) to row type and preserve it through projection/visitor logic. |
| OpenSearch request building | opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java | Add applyHighlightConfig to build/attach OpenSearch HighlightBuilder from CalcitePlanContext and apply it before executing requests; include in explain output. |
| Response handling & enumerator | opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java, opensearch/src/main/java/org/opensearch/sql/opensearch/response/OpenSearchResponse.java | Carry _highlight as an opaque per-row value (excluded from schema), special-case resolution in enumerator, and centralize highlight field constant usage. |
| Protocol & formatting | protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java, protocol/src/main/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatter.java, protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java | Add QueryResult.highlights() and include highlights in JDBC/Simple JSON outputs when present. |
| Expressions & constants | core/src/main/java/org/opensearch/sql/expression/HighlightExpression.java | Introduce HIGHLIGHT_FIELD constant and use it instead of hard-coded string. |
| Tests (unit & integration) | core/src/test/.../ExplainPlanTest.java, core/src/test/.../QueryPlanTest.java, opensearch/src/test/.../AbstractCalciteIndexScanHighlightTest.java, protocol/src/test/.../*, ppl/src/test/.../PPLQueryRequestTest.java, integ-test/src/test/java/.../CalcitePPLHighlightIT.java, integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml | Add unit and integration tests for propagation, request-building, highlighter config application, enumerator extraction, QueryResult highlights, and formatted outputs. |
| Docs | docs/user/ppl/interfaces/endpoint.md | Add Highlight section documenting highlight request object, parameters, examples, response format, and limitations. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PPLService
    participant Planner as CalcitePlanner
    participant Worker as ExecThread
    participant CalciteCtx as CalcitePlanContext
    participant OpenSearch as OpenSearchNode
    participant Enumerator as OpenSearchIndexEnumerator
    participant Protocol as QueryResult/Formatter

    Client->>PPLService: POST /_plugins/_ppl (with optional "highlight")
    PPLService->>Planner: build execution plan
    PPLService->>Planner: set plan.highlightConfig
    Planner->>Worker: submit plan (worker thread)
    Worker->>CalciteCtx: setHighlightConfig(plan.highlightConfig)
    Worker->>OpenSearch: execute search (HighlightBuilder attached)
    OpenSearch-->>Enumerator: return hits with highlight fragments
    Enumerator->>Worker: produce rows including hidden _highlight value
    Worker->>Protocol: build QueryResult (extract highlights())
    Worker->>CalciteCtx: clearHighlightConfig()
    Protocol-->>Client: JSON response (optional "highlights" array)
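
The last two steps of the diagram (rows carrying a hidden _highlight value, then QueryResult extracting highlights) could look roughly like the sketch below. It is an illustration under assumptions: `HiddenHighlightColumnSketch` and `Split` are hypothetical names, rows are modelled as plain lists, and the column name "_highlight" is taken from the walkthrough rather than from the actual HIGHLIGHT_FIELD constant.

```java
import java.util.ArrayList;
import java.util.List;

final class HiddenHighlightColumnSketch {
  // Assumed value; the walkthrough refers to a hidden "_highlight" column.
  static final String HIGHLIGHT_FIELD = "_highlight";

  /** Result holder: visible rows plus a highlights list of the same length. */
  static final class Split {
    final List<List<Object>> datarows = new ArrayList<>();
    final List<Object> highlights = new ArrayList<>();
  }

  /** Removes the hidden column from every row and collects its values in parallel order. */
  static Split split(List<String> columns, List<List<Object>> rows) {
    int hidden = columns.indexOf(HIGHLIGHT_FIELD); // -1 when no highlight was requested
    Split out = new Split();
    for (List<Object> row : rows) {
      List<Object> visible = new ArrayList<>();
      for (int i = 0; i < row.size(); i++) {
        if (i != hidden) {
          visible.add(row.get(i));
        }
      }
      out.datarows.add(visible);
      out.highlights.add(hidden >= 0 ? row.get(hidden) : null);
    }
    return out;
  }
}
```

Keeping the two lists index-aligned is what lets the formatter emit highlights as an array parallel to datarows without touching the schema.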

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

calcite

Suggested reviewers

  • LantaoJin
  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • anirudha
  • Swiddis
  • yuancu
  • GumpacG
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[Feature] Calcite PPL search result highlighting' directly and clearly describes the main feature being added: PPL result highlighting via Calcite. |
| Description check | ✅ Passed | The description is related to the changeset, explaining the highlighting feature, thread-local propagation, response formatting, and referencing design docs and related issues. |
| Linked Issues check | ✅ Passed | The PR successfully implements all key objectives from issues #5156 and #5059: request-level highlight API, thread-local config propagation, conditional _highlight column, HighlightBuilder integration, highlight metadata collection, and response formatting with highlights array. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to highlighting feature implementation: config propagation (AbstractPlan, ThreadLocal), Calcite integration (_highlight column, projection handling), OpenSearch request building, response extraction/formatting, and comprehensive tests. No unrelated modifications detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@RyanL1997 RyanL1997 added the enhancement (New feature or request), PPL (Piped processing language), and feature labels Feb 12, 2026
@RyanL1997 RyanL1997 marked this pull request as ready for review February 19, 2026 00:10
@RyanL1997 RyanL1997 changed the title from "[WIP][Feature] Calcite PPL search result highlighting" to "eiifcbncngeijlvvcecdekikvhrgrfneuccvrhjrbkju[Feature] Calcite PPL search result highlighting" Feb 19, 2026
@RyanL1997 RyanL1997 changed the title from "eiifcbncngeijlvvcecdekikvhrgrfneuccvrhjrbkju[Feature] Calcite PPL search result highlighting" to "[Feature] Calcite PPL search result highlighting" Feb 19, 2026
@github-actions
Contributor

Persistent review updated to latest commit 5bda888

@RyanL1997
Collaborator Author

Although we only expose highlight via the PPL API today, can we make it generic internally?

  1. Make the highlight part of the PPL query as early as possible behind the API, as discussed in #5156 (comment) (highlight function/command or search command parameter?).
  2. If that is not possible, can we accept arbitrary DSL and merge it with the SearchSourceBuilder already constructed in the index scan operator? That way future extensions don't each need their own end-to-end pipeline.

Hi @dai-chen. Thanks for the suggestion. After checking, I went with option 2 — accept arbitrary DSL and merge it with the SearchSourceBuilder at the index scan level.

Why not option 1 (make highlight part of PPL syntax)?

The main issue is the response format. The V2 highlight() function returns highlights inline as columns in datarows:

  {
    "schema": [
      { "name": "Tags", "type": "text" },
      { "name": "highlight('Tags')", "type": "nested" }
    ],
    "datarows": [
      ["yeast home-brew", ["<em>yeast</em> home-brew"]]
    ]
  }

OSD Explore already knows how to consume highlights from DSL, where they come back as a separate metadata array parallel to the hits. Our current approach matches that shape:

  {
    "schema": [{ "name": "Tags", "type": "text" }],
    "datarows": [["yeast home-brew"]],
    "highlights": [{ "Tags": ["<em>yeast</em> home-brew"] }]
  }

If we went with the highlight() function, OSD would need to handle a fundamentally different response shape — highlights mixed into datarows as columns instead of a separate highlights array it already consumes today. The request-body approach keeps the schema/datarows untouched and gives the caller (OSD, API users, CLI) full control over highlight config without changing the PPL query text.
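
To make the parallel-array shape concrete, here is a rough sketch of how a consumer might pair each datarows entry with the highlights entry at the same index. `HighlightPairingSketch` is a hypothetical name, the response is assumed to be already parsed into lists and maps, and treating a missing highlights array or a null entry as "no highlight for that row" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HighlightPairingSketch {
  /** For each cell, prefer the first highlight fragment for that column; else keep the raw value. */
  static List<List<String>> render(
      List<String> columns,
      List<List<String>> datarows,
      List<Map<String, List<String>>> highlights) {
    List<List<String>> rendered = new ArrayList<>();
    for (int r = 0; r < datarows.size(); r++) {
      // highlights[r] belongs to datarows[r]; the array may be absent entirely.
      Map<String, List<String>> rowHighlights =
          (highlights != null && r < highlights.size()) ? highlights.get(r) : null;
      List<String> cells = new ArrayList<>();
      for (int c = 0; c < columns.size(); c++) {
        List<String> fragments =
            rowHighlights == null ? null : rowHighlights.get(columns.get(c));
        boolean hasFragment = fragments != null && !fragments.isEmpty();
        cells.add(hasFragment ? fragments.get(0) : datarows.get(r).get(c));
      }
      rendered.add(cells);
    }
    return rendered;
  }
}
```

With the response above, the single Tags cell would render as the highlighted fragment, while schema and datarows stay exactly as they are for callers that ignore highlights.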

Current design:

The plumbing is now generic. Instead of a highlight-specific pipeline (Map<String, Object> → manual HighlightBuilder construction), the flow is:

  1. PPLQueryRequest.getExtraSearchSource() wraps caller-provided fields into a search-source-compatible JSON string ({"highlight": {...}}).
  2. This String flows through AbstractPlan → ThreadLocal → index scan as an opaque blob.
  3. At the index scan level, applyExtraSearchSource() parses it via SearchSourceBuilder.fromXContent() and selectively merges recognized clauses (currently: if (extra.highlighter() != null) target.highlighter(extra.highlighter())).
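
The wrap-then-selectively-merge idea in the steps above can be sketched with plain maps. This is a simplified stand-in, not the PR's implementation: the real code passes a JSON string and parses it with SearchSourceBuilder.fromXContent(), whereas `ExtraSearchSourceSketch`, `wrap`, and `merge` here are hypothetical and operate on `Map<String, Object>` so the example runs without OpenSearch on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExtraSearchSourceSketch {
  /** Clauses the scan is willing to copy; supporting another clause means adding its key here. */
  static final List<String> RECOGNIZED = Arrays.asList("highlight");

  /** Wraps the caller-provided highlight object into a search-source-shaped map. */
  static Map<String, Object> wrap(Map<String, Object> highlight) {
    Map<String, Object> extra = new LinkedHashMap<>();
    if (highlight != null) {
      extra.put("highlight", highlight);
    }
    return extra;
  }

  /** Copies only recognized clauses from extra into a copy of target; everything else is ignored. */
  static Map<String, Object> merge(Map<String, Object> target, Map<String, Object> extra) {
    Map<String, Object> merged = new LinkedHashMap<>(target);
    for (String key : RECOGNIZED) {
      Object clause = extra.get(key);
      if (clause != null) {
        merged.put(key, clause);
      }
    }
    return merged;
  }
}
```

The point of the selective merge is that unrecognized keys in the extra source never reach the request that the index scan already built.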

To add a future extension (e.g. suggest, rescore, post_filter), you would:

  • Add the new key to the wrapper in getExtraSearchSource()
  • Add one merge line in applyExtraSearchSource()

No new end-to-end plumbing is needed. Please take another look, thanks!

@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit b290e06.

| Path | Line | Severity | Description |
|---|---|---|---|
| opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java | 137 | medium | User-controlled JSON from request body is parsed using SearchSourceBuilder.fromXContent() without schema validation. While currently only highlighter is extracted, the full SearchSourceBuilder is parsed which could process additional OpenSearch DSL clauses if code is extended. Recommend explicit schema validation before parsing. |
| opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java | 142 | low | Highlight configuration from user input is applied directly to OpenSearch requests without validation. Depending on OpenSearch script security settings, custom highlighters could potentially execute scripts. Recommend validating highlight configuration structure and disabling script-based highlighters if not required. |

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1
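
One possible shape for the "explicit schema validation before parsing" recommendation is an allowlist check on the parsed highlight object, sketched below. This is not part of the PR: `HighlightConfigValidatorSketch` is a hypothetical name, and the set of permitted keys is only an illustrative assumption about which highlight options a deployment might choose to accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HighlightConfigValidatorSketch {
  // Illustrative allowlist only; which options to permit is a policy decision.
  static final Set<String> ALLOWED_TOP_LEVEL =
      new HashSet<>(
          Arrays.asList("fields", "pre_tags", "post_tags", "fragment_size", "number_of_fragments"));

  /** Returns a list of problems; an empty list means the config passed the check. */
  static List<String> validate(Map<String, Object> highlight) {
    List<String> problems = new ArrayList<>();
    for (String key : highlight.keySet()) {
      if (!ALLOWED_TOP_LEVEL.contains(key)) {
        problems.add("unsupported highlight option: " + key);
      }
    }
    // The sketch also requires "fields" to be present and to be a JSON object.
    if (!(highlight.get("fields") instanceof Map)) {
      problems.add("highlight.fields must be an object");
    }
    return problems;
  }
}
```

Rejecting a request with a non-empty problem list before it reaches the parser would keep the accepted surface explicit even as more clauses are added later.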


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

Persistent review updated to latest commit b290e06

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

@github-actions
Contributor

Persistent review updated to latest commit ff9eb4f

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@RyanL1997 RyanL1997 force-pushed the syntax-highlighting branch from ff9eb4f to 6abfc1a Compare March 13, 2026 22:09
@github-actions
Contributor

Persistent review updated to latest commit 6abfc1a


Labels

enhancement (New feature or request), feature, PPL (Piped processing language)

Development

Successfully merging this pull request may close these issues.

  • [FEATURE RFC] PPL Search Result Highlighting
  • [FEATURE] Supporting Query Highlight Feature into PPL API

3 participants