Skip to content

Comments

Fix Process resource leak in system metrics collection#17212

Open
PDGGK wants to merge 1 commit intoapache:masterfrom
PDGGK:fix-process-resource-leak-metrics
Open

Fix Process resource leak in system metrics collection#17212
PDGGK wants to merge 1 commit intoapache:masterfrom
PDGGK:fix-process-resource-leak-metrics

Conversation

@PDGGK
Copy link

@PDGGK PDGGK commented Feb 22, 2026

Summary

  • Add process.waitFor() and process.destroyForcibly() in finally block to prevent zombie processes in periodic metrics collection
  • Handle InterruptedException properly by restoring the interrupt flag
  • Fixes both SystemMetrics (memory) and LinuxNetMetricManager (socket count)

Fixes #17211

Test plan

  • Code compiles
  • CI checks

Add process.waitFor() and process.destroyForcibly() in finally block
to prevent zombie processes when collecting system memory and network
metrics via Runtime.exec(). Also handle InterruptedException properly
by restoring the interrupt flag.

Affected files:
- SystemMetrics: memory info collection via 'free' command
- LinuxNetMetricManager: socket count collection via 'ss' command
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes process resource leaks in Linux system metrics collection by adding proper process lifecycle management. The changes ensure that child processes spawned via Runtime.exec() are properly waited for and cleaned up, preventing zombie processes and file descriptor leaks that could accumulate over time in periodic metrics collection.

Changes:

  • Added process.waitFor() calls to reap child processes after reading their output
  • Added finally blocks with process.destroyForcibly() to ensure cleanup on exceptions
  • Added proper InterruptedException handling with interrupt flag restoration

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
SystemMetrics.java Fixed process resource leak in Linux memory metrics collection by adding waitFor(), destroyForcibly(), and InterruptedException handling
LinuxNetMetricManager.java Fixed process resource leak in socket count metrics collection by adding waitFor(), destroyForcibly(), and InterruptedException handling

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

logger.debug("Interrupted while waiting for memory command", e);
Thread.currentThread().interrupt();
} finally {
if (process != null) {
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling destroyForcibly() unconditionally in the finally block can forcibly terminate a process that has already completed successfully via waitFor(). This is problematic because:

  1. After waitFor() returns successfully, the process has already exited normally, and destroyForcibly() should not be needed
  2. destroyForcibly() may interfere with proper cleanup of process resources on some platforms
  3. This pattern goes against Java best practices for process management

The fix should only call destroyForcibly() when the process hasn't completed normally (i.e., when an exception occurs before or during waitFor()). Consider adding a boolean flag to track whether waitFor() completed successfully, or only destroy the process in the catch blocks.

Suggested change
if (process != null) {
if (process != null && process.isAlive()) {

Copilot uses AI. Check for mistakes.
Comment on lines +239 to +242
} finally {
if (process != null) {
process.destroyForcibly();
}
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling destroyForcibly() unconditionally in the finally block can forcibly terminate a process that has already completed successfully via waitFor(). This is problematic because:

  1. After waitFor() returns successfully, the process has already exited normally, and destroyForcibly() should not be needed
  2. destroyForcibly() may interfere with proper cleanup of process resources on some platforms
  3. This pattern goes against Java best practices for process management

The fix should only call destroyForcibly() when the process hasn't completed normally (i.e., when an exception occurs before or during waitFor()). Consider adding a boolean flag to track whether waitFor() completed successfully, or only destroy the process in the catch blocks.

Copilot uses AI. Check for mistakes.
process.waitFor();
this.connectionNum = Integer.parseInt(result.toString().trim());
} catch (IOException e) {
LOGGER.error("Failed to get socket num", e);
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Integer.parseInt() call can throw NumberFormatException if the command output is not a valid integer, but this exception is not caught. If the external command returns unexpected output, this will cause an unhandled exception that could disrupt metrics collection. Consider adding a catch block for NumberFormatException to handle malformed output gracefully, similar to how IOException is handled.

Suggested change
LOGGER.error("Failed to get socket num", e);
LOGGER.error("Failed to get socket num", e);
} catch (NumberFormatException e) {
LOGGER.error(
"Failed to parse socket num from command output: '{}'",
e.getMessage());

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Process resource leak in system metrics collection (SystemMetrics and LinuxNetMetricManager)

1 participant