Skip to content

Fix memory leak: ExecutionEngine recreated per query appending to global function registry#5222

Open
ahkcs wants to merge 2 commits intoopensearch-project:mainfrom
ahkcs:fix/memory-leak-execution-engine-singleton
Open

Fix memory leak: ExecutionEngine recreated per query appending to global function registry#5222
ahkcs wants to merge 2 commits intoopensearch-project:mainfrom
ahkcs:fix/memory-leak-execution-engine-singleton

Conversation

@ahkcs
Copy link
Collaborator

@ahkcs ahkcs commented Mar 10, 2026

Summary

  • Root cause: OpenSearchPluginModule.executionEngine() was missing @Singleton, so Guice created a new OpenSearchExecutionEngine on every SQL/PPL request. Each construction called registerOpenSearchFunctions(), which appended to PPLFuncImpTable.INSTANCE.externalFunctionRegistry via compute() — growing the list by 1 entry per query, forever.
  • After ~730K queries, the GEOIP function list contained ~730K entries, each holding 14 objects (GeoIpFunction, SqlIdentifier, CalciteFuncSignature, PPLTypeChecker, etc.), retaining ~2.12GB of heap.
  • The bug existed pre-3.3 but was invisible because each leaked entry was only ~3 lightweight objects. PR Implement type checking for aggregation functions with Calcite #4024 changed registration from lazy (lambda) to eager (full object tree), making each entry ~14 objects and pushing the leak into OOM territory.

…bal function registry

The OpenSearchExecutionEngine was not scoped as @singleton in the Guice
module, causing a new instance to be created for every SQL/PPL request
via injector.getInstance(SQLService.class). Each construction called
registerOpenSearchFunctions(), which appended entries to the static
PPLFuncImpTable.INSTANCE.externalFunctionRegistry via compute(). After
N queries, the GEOIP function list contained N entries (each holding
GeoIpFunction, SqlIdentifier, CalciteFuncSignature, PPLTypeChecker,
etc.), leading to ~2GB heap retention after ~730K queries.

Fix:
1. Add @singleton to the ExecutionEngine provider in OpenSearchPluginModule
   so only one instance is created per Guice injector.
2. Change registerExternalOperator() from compute()/append to put()/replace
   so repeated registrations overwrite instead of accumulating.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit a77e23b.

PathLineSeverityDescription
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java426lowChanging from compute() (which accumulates entries per key) to put() with List.of() (immutable single-entry list) silently discards all prior registrations for the same function name. While this appears to be a simplification refactor, it subtly changes semantics so that repeated calls to registerExternalOperator for the same function overwrite rather than accumulate, which could be abused to replace legitimately registered functions. No clear malicious intent, but the behavioral change is non-obvious and warrants review.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

PR Reviewer Guide 🔍

(Review updated until commit 1d661ac)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add @singleton to ExecutionEngine provider

Relevant files:

  • plugin/src/main/java/org/opensearch/sql/plugin/config/OpenSearchPluginModule.java

Sub-PR theme: Replace compute() with put() in registerExternalOperator

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

⚡ Recommended focus areas for review

Overwrite Risk

The change from compute() (which appended to an existing list) to put() (which replaces the entire entry) means that if registerExternalOperator is called multiple times for the same functionName, only the last registration will be retained. While the @singleton fix prevents repeated calls per query, this behavioral change could silently drop registrations if multiple operators are legitimately registered for the same function name in other code paths.

externalFunctionRegistry.put(
    functionName,
    List.of(
        Pair.of(signature, (FunctionImp) (builder, args) -> builder.makeCall(operator, args))));
Thread Safety

With @Singleton, the OpenSearchExecutionEngine instance is now shared across all concurrent requests. It should be verified that OpenSearchExecutionEngine is thread-safe (stateless or properly synchronized), as concurrent SQL/PPL queries will now share the same instance.

@Singleton
public ExecutionEngine executionEngine(
    OpenSearchClient client, ExecutionProtector protector, PlanSerializer planSerializer) {
  return new OpenSearchExecutionEngine(client, protector, planSerializer);

@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

PR Code Suggestions ✨

Latest suggestions up to 1d661ac
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Preserve existing function overloads on re-registration

Using List.of() creates an immutable list, which replaces any existing registrations
for the same functionName with a single-entry list. If the same function name is
registered multiple times (e.g., with different signatures/overloads), all previous
registrations will be lost. Consider preserving existing entries while still
avoiding unbounded growth.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [426-429]

-externalFunctionRegistry.put(
+externalFunctionRegistry.merge(
     functionName,
-    List.of(
-        Pair.of(signature, (FunctionImp) (builder, args) -> builder.makeCall(operator, args))));
+    new ArrayList<>(List.of(
+        Pair.of(signature, (FunctionImp) (builder, args) -> builder.makeCall(operator, args)))),
+    (existingList, newList) -> {
+      existingList.addAll(newList);
+      return existingList;
+    });
Suggestion importance[1-10]: 6

__

Why: The PR replaced compute() (which preserved existing entries) with put() using List.of(), which overwrites any existing registrations for the same functionName. If multiple overloads need to be registered under the same name, this change would silently discard previous entries. The suggestion to use merge() is valid and addresses a real behavioral regression, though it depends on whether multiple registrations per name are actually expected.

Low

Previous suggestions

Suggestions up to commit a77e23b
CategorySuggestion                                                                                                                                    Impact
Possible issue
Preserve existing registrations when adding new entries

Using List.of() creates an immutable list, which will replace any previously
registered functions for the same functionName key. If multiple operators can be
registered under the same functionName, this change will silently discard all prior
registrations for that key. Consider whether the original intent of accumulating
multiple entries per key should be preserved, or if overwriting is intentional.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [426-428]

-externalFunctionRegistry.put(
+externalFunctionRegistry.merge(
     functionName,
-    List.of(Pair.of(signature, (FunctionImp) (builder, args) -> builder.makeCall(operator, args))));
+    new ArrayList<>(List.of(Pair.of(signature, (FunctionImp) (builder, args) -> builder.makeCall(operator, args)))),
+    (existingList, newList) -> {
+      existingList.addAll(newList);
+      return existingList;
+    });
Suggestion importance[1-10]: 7

__

Why: The PR replaces compute() (which accumulated entries) with put() using List.of(), which silently overwrites any existing registrations for the same functionName. If multiple operators can be registered under the same key, this is a behavioral regression. The suggestion to use merge() correctly preserves the original accumulation semantics.

Medium

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 1d661ac

@ahkcs ahkcs added the bugFix label Mar 11, 2026
Copy link
Collaborator

@RyanL1997 RyanL1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that registerExternalOperator is never called more than once for the same BuiltinFunctionName? The old compute() logic accumulated overloads, while the new put() replaces them. If single-registration is guaranteed, this simplification LGTM. Also, please verify that no downstream code mutates the list retrieved from externalFunctionRegistry, since List.of() is immutable.

@ahkcs
Copy link
Collaborator Author

ahkcs commented Mar 12, 2026

Can you confirm that registerExternalOperator is never called more than once for the same BuiltinFunctionName? The old compute() logic accumulated overloads, while the new put() replaces them. If single-registration is guaranteed, this simplification LGTM. Also, please verify that no downstream code mutates the list retrieved from externalFunctionRegistry, since List.of() is immutable.

registerExternalOperator is only called once per function name (GEOIP and DISTINCT_COUNT_APPROX), and the new @Singleton on the engine ensures it only runs once anyway. So put() replacing compute() is safe — there's nothing to accumulate.
Verified that there's no downstream mutation. All code that reads from the registry just iterates the list; nothing tries to .add() or modify it. List.of() being immutable is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants