Merged
22 commits
a7e0f6f
initial commit for grammar API
mengweieric Feb 19, 2026
63916b2
remove unused etag
mengweieric Feb 19, 2026
2f862f3
cleanup on unused code
mengweieric Feb 19, 2026
98321e2
cleanup on comments and debug logging
mengweieric Feb 19, 2026
dd16569
modify tests
mengweieric Feb 19, 2026
55df662
a few mode cleanup
mengweieric Feb 19, 2026
7161842
Read ANTLR version from runtime
mengweieric Feb 20, 2026
7c42d67
add antlr version
mengweieric Feb 20, 2026
53fdd91
spotless fix
mengweieric Feb 20, 2026
d4b2c53
Address PR review: fix hash truncation, antlrVersion API, immutabilit…
mengweieric Feb 23, 2026
7cce850
Mark grammar API as experimental
mengweieric Feb 23, 2026
c07d992
Address review: ATN v4 guard, startRuleIndex by name, test hardening
mengweieric Feb 23, 2026
cf9ed36
Reduce invalidateCache() visibility from public to protected
mengweieric Feb 23, 2026
c3f3c02
add more necessary fields
mengweieric Mar 2, 2026
e070024
adjusting ignore token set to be lexical/internal only
mengweieric Mar 3, 2026
5854bf5
addressed fix-now comments
mengweieric Mar 3, 2026
24d6f01
fix test duplication
mengweieric Mar 3, 2026
877669c
Polish grammar bundle builder and stabilize grammar endpoint doctest
mengweieric Mar 3, 2026
8f0ae44
address issue: transport action wrapper
mengweieric Mar 3, 2026
f1226b0
Refactor PPL grammar bundle loading to static holder singleton
mengweieric Mar 3, 2026
1b467d2
Revert grammar endpoint doc example to clean JSON format
mengweieric Mar 19, 2026
97de071
Fix typo renameClasue in grammar bundle builder
mengweieric Mar 20, 2026
6 changes: 3 additions & 3 deletions THIRD-PARTY
@@ -467,15 +467,15 @@ DAMAGE.

------

** ANTLR; version 4.7.1 -- https://github.com/antlr/antlr4
** ANTLR; version 4.13.2 -- https://github.com/antlr/antlr4
/*
* Copyright (c) 2012-2017 The ANTLR Project. All rights reserved.
* Copyright (c) 2012-2024 The ANTLR Project. All rights reserved.
* Use of this file is governed by the BSD 3-clause license that
* can be found in the LICENSE.txt file in the project root.
*/

[The "BSD 3-clause license"]
Copyright (c) 2012-2017 The ANTLR Project. All rights reserved.
Copyright (c) 2012-2024 The ANTLR Project. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
32 changes: 32 additions & 0 deletions docs/user/ppl/interfaces/endpoint.md
@@ -293,3 +293,35 @@ Exceeding these limits returns an error.
- In the simple array format, `["*"]` highlights all fields. Specific field names like `["firstname", "lastname"]` scope highlighting to those fields only.
- In the object format, each key in the `fields` object is a field name or wildcard. Each value is an object of per-field highlight options. Supported per-field options: `fragment_size`, `number_of_fragments`, `type` (`plain`, `unified`, `fvh`), `pre_tags`, `post_tags`, `require_field_match`, `no_match_size`, `order`. Use `{}` for defaults. Example: `{"title": {"fragment_size": 200}, "body": {"type": "plain"}}`.
- Highlights may include fields that are not explicitly projected in the other columns. For example, using `{"*": {}}` highlights all fields that matched the search query, including fields not selected by `| fields`. In the example above, the `address` field appears in `_highlight` because it contains a match ("880 Holmes Lane") even though only `account_number`, `firstname`, and `lastname` are projected as separate columns.

## Grammar (Experimental)

### Description

You can send an HTTP GET request to the **/_plugins/_ppl/_grammar** endpoint to fetch serialized PPL grammar metadata used by autocomplete clients.

### Example

```bash ppl ignore
curl -sS -X GET localhost:9200/_plugins/_ppl/_grammar
```

Expected output (trimmed):

```json
{
"bundleVersion": "1.0",
"antlrVersion": "4.13.2",
"grammarHash": "sha256:...",
"startRuleIndex": 0,
"lexerSerializedATN": [4, ...],
"parserSerializedATN": [4, ...],
"lexerRuleNames": ["SEARCH", "..."],
"parserRuleNames": ["root", "..."],
"literalNames": [null, "'SEARCH'", "..."],
"symbolicNames": [null, "SEARCH", "..."],
"tokenDictionary": {"PIPE": 196, "...": 0},
"ignoredTokens": [472, 473, "..."],
"rulesToVisit": [200, 201, "..."]
}
```
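As a sketch of how an autocomplete client might use this payload (a hypothetical helper, not part of the plugin; the sample bundle below is trimmed and its token ids mirror the example response above):

```python
# Hypothetical client-side helper: invert the bundle's tokenDictionary so
# candidate token ids (e.g. produced by an ANTLR-based completion engine)
# can be displayed by name, while skipping lexical/internal ignored tokens.

def completion_names(bundle: dict, candidate_ids: list[int]) -> list[str]:
    ignored = set(bundle.get("ignoredTokens", []))
    id_to_name = {v: k for k, v in bundle["tokenDictionary"].items()}
    return [
        id_to_name[i]
        for i in candidate_ids
        if i not in ignored and i in id_to_name
    ]

# Trimmed sample mirroring the response shape shown above.
sample = {
    "tokenDictionary": {"PIPE": 196, "SEARCH": 1, "WHERE": 2},
    "ignoredTokens": [472, 473],
}

print(completion_names(sample, [196, 1, 472]))  # -> ['PIPE', 'SEARCH']
```

A real client would fetch the bundle once, cache it keyed by `grammarHash`, and refetch only when the hash changes.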
@@ -339,6 +339,23 @@ private JSONObject executeQueryAsUser(String query, String username) throws IOEx
return new JSONObject(org.opensearch.sql.legacy.TestUtils.getResponseBody(response, true));
}

/** Executes a grammar metadata request as a specific user with basic authentication. */
private JSONObject executeGrammarAsUser(String username) throws IOException {
Request request = new Request("GET", "/_plugins/_ppl/_grammar");

RequestOptions.Builder restOptionsBuilder = RequestOptions.DEFAULT.toBuilder();
restOptionsBuilder.addHeader(
"Authorization",
"Basic "
+ java.util.Base64.getEncoder()
.encodeToString((username + ":" + STRONG_PASSWORD).getBytes()));
request.setOptions(restOptionsBuilder);

Response response = client().performRequest(request);
assertEquals(200, response.getStatusLine().getStatusCode());
return new JSONObject(org.opensearch.sql.legacy.TestUtils.getResponseBody(response, true));
}

@Test
public void testUserWithBankPermissionCanAccessBankIndex() throws IOException {
// Test that bank_user can access bank index - this should work with the fix
@@ -512,6 +529,32 @@ public void testBankUserWithEvalCommand() throws IOException {
verifyColumn(result, columnName("full_name"));
}

@Test
public void testUserWithPPLPermissionCanAccessGrammarEndpoint() throws IOException {
JSONObject result = executeGrammarAsUser(BANK_USER);
assertTrue(result.has("bundleVersion"));
assertTrue(result.has("antlrVersion"));
assertTrue(result.has("grammarHash"));
assertTrue(result.has("tokenDictionary"));
}

@Test
public void testUserWithoutPPLPermissionCannotAccessGrammarEndpoint() throws IOException {
try {
executeGrammarAsUser(NO_PPL_USER);
fail("Expected security exception for user without PPL permission");
} catch (ResponseException e) {
assertEquals(403, e.getResponse().getStatusLine().getStatusCode());
String responseBody =
org.opensearch.sql.legacy.TestUtils.getResponseBody(e.getResponse(), false);
assertTrue(
"Response should contain permission error message",
responseBody.contains("no permissions")
|| responseBody.contains("Forbidden")
|| responseBody.contains("cluster:admin/opensearch/ppl"));
}
}

// Negative test cases for missing permissions

@Test
@@ -0,0 +1,18 @@
{
"ppl.grammar": {
"documentation": {
"url": "https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/interfaces/endpoint.md",
"description": "PPL Grammar Endpoint for Autocomplete"
},
"stability": "experimental",
"url": {
"paths": [
{
"path": "/_plugins/_ppl/_grammar",
"methods": ["GET"]
}
]
},
"params": {}
}
}
@@ -0,0 +1,18 @@
---
"PPL grammar endpoint returns expected response shape":
- do:
ppl.grammar: {}
- is_true: bundleVersion
- is_true: antlrVersion
- is_true: grammarHash
- match: {startRuleIndex: 0}
- gt: {lexerSerializedATN.0: 0}
- is_true: lexerRuleNames.0
- is_true: channelNames.0
- is_true: modeNames.0
- gt: {parserSerializedATN.0: 0}
- is_true: parserRuleNames.0
- is_true: symbolicNames.1
- is_true: tokenDictionary.PIPE
- gt: {ignoredTokens.0: 0}
- gt: {rulesToVisit.0: 0}
@@ -94,6 +94,7 @@
import org.opensearch.sql.opensearch.storage.OpenSearchDataSourceFactory;
import org.opensearch.sql.opensearch.storage.script.CompoundedScriptEngine;
import org.opensearch.sql.plugin.config.OpenSearchPluginModule;
import org.opensearch.sql.plugin.rest.RestPPLGrammarAction;
import org.opensearch.sql.plugin.rest.RestPPLQueryAction;
import org.opensearch.sql.plugin.rest.RestPPLStatsAction;
import org.opensearch.sql.plugin.rest.RestQuerySettingsAction;
@@ -163,6 +164,7 @@ public List<RestHandler> getRestHandlers(

return Arrays.asList(
new RestPPLQueryAction(),
new RestPPLGrammarAction(),
new RestSqlAction(settings, injector),
new RestSqlStatsAction(settings, restController),
new RestPPLStatsAction(settings, restController),
@@ -0,0 +1,137 @@
Collaborator: No integration test covers this new endpoint. Without an integ-test or yamlRestTest, there's no verification that the endpoint works when the plugin is loaded in a real OpenSearch node (route registration, serialization over the wire, security plugin interaction). Add at least a basic integ-test that hits the endpoint and validates the response shape.

Collaborator (Author): Fixed. Added YAML REST API spec + integration test for /_plugins/_ppl/_grammar response shape.

/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.sql.plugin.rest;

import static org.opensearch.rest.RestRequest.Method.GET;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.collect.ImmutableList;
import java.io.IOException;
import java.util.List;
import lombok.extern.log4j.Log4j2;
import org.opensearch.common.annotation.ExperimentalApi;
import org.opensearch.core.action.ActionListener;
import org.opensearch.core.rest.RestStatus;
import org.opensearch.core.xcontent.XContentBuilder;
import org.opensearch.rest.BaseRestHandler;
import org.opensearch.rest.BytesRestResponse;
import org.opensearch.rest.RestChannel;
import org.opensearch.rest.RestRequest;
import org.opensearch.sql.plugin.transport.PPLQueryAction;
import org.opensearch.sql.plugin.transport.TransportPPLQueryRequest;
import org.opensearch.sql.plugin.transport.TransportPPLQueryResponse;
import org.opensearch.sql.ppl.autocomplete.GrammarBundle;
import org.opensearch.sql.ppl.autocomplete.PPLGrammarBundleBuilder;
import org.opensearch.transport.client.node.NodeClient;

/**
* REST handler for {@code GET /_plugins/_ppl/_grammar}.
*
* @opensearch.experimental
*/
@ExperimentalApi
@Log4j2
public class RestPPLGrammarAction extends BaseRestHandler {

private static final String ENDPOINT_PATH = "/_plugins/_ppl/_grammar";
Collaborator: We need to update endpoint documentation?

Collaborator (Author): Fixed. Added endpoint docs and stabilized doctest example output.


@Override
public String getName() {
return "ppl_grammar_action";
}

@Override
public List<Route> routes() {
return ImmutableList.of(new Route(GET, ENDPOINT_PATH));
}

@Override
protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client)
throws IOException {

return channel -> {
try {
Collaborator: This endpoint bypasses the transport action layer that other PPL handlers use (e.g., RestPPLQueryAction → PPLQueryAction). While the /_plugins/_ppl/ route prefix is likely covered by the security plugin's route-level rules, the inconsistency means this endpoint's access control depends entirely on that assumption rather than explicit action-level permissions. Worth confirming that the security plugin's default config protects this path, or adding a transport action wrapper for consistency.

Collaborator (Author): Good point. Fixed in this PR: /_plugins/_ppl/_grammar now performs an explicit transport-action authorization check via PPLQueryAction before serving the bundle, so access control is no longer only route-prefix-based. I also added security tests for grammar endpoint access with and without PPL permission.
authorizeRequest(
client,
new ActionListener<>() {
@Override
public void onResponse(TransportPPLQueryResponse ignored) {
try {
GrammarBundle bundle = getBundle();
XContentBuilder builder = channel.newBuilder();
serializeBundle(builder, bundle);
channel.sendResponse(new BytesRestResponse(RestStatus.OK, builder));
} catch (Exception e) {
log.error("Error building or serializing PPL grammar", e);
sendErrorResponse(channel, e);
}
}

@Override
public void onFailure(Exception e) {
log.error("PPL grammar authorization failed", e);
sendErrorResponse(channel, e);
}
});
} catch (Exception e) {
log.error("Error authorizing PPL grammar request", e);
sendErrorResponse(channel, e);
}
};
}

@VisibleForTesting
protected void authorizeRequest(
NodeClient client, ActionListener<TransportPPLQueryResponse> listener) {
client.execute(
PPLQueryAction.INSTANCE, new TransportPPLQueryRequest("", null, ENDPOINT_PATH), listener);
}

private void sendErrorResponse(RestChannel channel, Exception e) {
try {
channel.sendResponse(new BytesRestResponse(channel, e));
} catch (IOException ioException) {
log.error("Failed to send PPL grammar error response", ioException);
}
}

/** Gets the grammar bundle. Override in tests to inject a custom or failing bundle provider. */
@VisibleForTesting
protected GrammarBundle getBundle() {
return PPLGrammarBundleBuilder.getBundle();
}

private void serializeBundle(XContentBuilder builder, GrammarBundle bundle) throws IOException {
builder.startObject();

// Identity & versioning
builder.field("bundleVersion", bundle.getBundleVersion());
builder.field("antlrVersion", bundle.getAntlrVersion());
builder.field("grammarHash", bundle.getGrammarHash());
builder.field("startRuleIndex", bundle.getStartRuleIndex());

// Lexer ATN & metadata
builder.field("lexerSerializedATN", bundle.getLexerSerializedATN());
builder.field("lexerRuleNames", bundle.getLexerRuleNames());
builder.field("channelNames", bundle.getChannelNames());
builder.field("modeNames", bundle.getModeNames());

// Parser ATN & metadata
builder.field("parserSerializedATN", bundle.getParserSerializedATN());
builder.field("parserRuleNames", bundle.getParserRuleNames());

// Vocabulary
builder.field("literalNames", bundle.getLiteralNames());
builder.field("symbolicNames", bundle.getSymbolicNames());

// Autocomplete configuration
builder.field("tokenDictionary", bundle.getTokenDictionary());
builder.field("ignoredTokens", bundle.getIgnoredTokens());
builder.field("rulesToVisit", bundle.getRulesToVisit());

builder.endObject();
Comment on lines +110 to +135

@Swiddis Feb 24, 2026

Collaborator: thought: I dislike that we're going to be directly exposing ANTLR implementation details like this.

It means any frontends wanting to build their own autocomplete (CLIs, CWL Console, anything else other than OSD) will need to introduce ANTLR dependencies. There's also generally a lot of validation and conversion happening between AST/RelNode that lives outside of ANTLR directly, so frontends still need to be pretty highly aware of how the backend works and per-command semantics. Did we already research & eliminate the option of making a custom format that encodes more useful information?

Guessing there isn't a better way or format without some heavy lifting :/
}
}
@@ -100,13 +100,21 @@ protected void doExecute(
+ " false"));
return;
}

TransportPPLQueryRequest transportRequest = TransportPPLQueryRequest.fromActionRequest(request);
if (transportRequest.isGrammarRequest()) {
      // Authorization is enforced by this transport action before grammar metadata is returned
      // by the REST layer.
listener.onResponse(new TransportPPLQueryResponse("{}"));
return;
}

Metrics.getInstance().getNumericalMetric(MetricName.PPL_REQ_TOTAL).increment();
Metrics.getInstance().getNumericalMetric(MetricName.PPL_REQ_COUNT_TOTAL).increment();

QueryContext.addRequestId();

PPLService pplService = injector.getInstance(PPLService.class);
TransportPPLQueryRequest transportRequest = TransportPPLQueryRequest.fromActionRequest(request);
// in order to use PPL service, we need to convert TransportPPLQueryRequest to PPLQueryRequest
PPLQueryRequest transformedRequest = transportRequest.toPPLQueryRequest();
QueryContext.setProfile(transformedRequest.profile());
@@ -119,7 +119,16 @@ public String getRequest() {
* @return true if it is an explain request
*/
public boolean isExplainRequest() {
return path.endsWith("/_explain");
return path != null && path.endsWith("/_explain");
}

/**
* Check if request is for grammar metadata endpoint.
*
* @return true if it is a grammar metadata request
*/
public boolean isGrammarRequest() {
return path != null && path.endsWith("/_grammar");
Collaborator: path.endsWith("/_grammar") would match a hypothetical future endpoint like /_plugins/_sql/_grammar. Should we consider matching against the full path constant, like /_ppl/_grammar?

Collaborator: We've done the same thing historically. It's a bit sloppy, but it works for the most part lol. Should maybe revisit using proper path routing some time; I'd like that, as it'd make it easier to navigate how our code responds to requests.
}

/** Decide on the formatter by the requested format. */