Skip to content

feat: Add supervisor:get_tree/1 for full supervision tree introspection#10972

Open
km-git007 wants to merge 2 commits intoerlang:masterfrom
km-git007:feature/supervisor-tree-introspection
Open

feat: Add supervisor:get_tree/1 for full supervision tree introspection#10972
km-git007 wants to merge 2 commits intoerlang:masterfrom
km-git007:feature/supervisor-tree-introspection

Conversation

@km-git007
Copy link
Copy Markdown

Operators debugging distributed Erlang applications often need to understand the complete supervision structure at runtime. Previously, this required manually walking the supervision tree by repeatedly calling supervisor:which_children/1 on each supervisor found.

This commit introduces supervisor:get_tree/1, which returns a complete hierarchical representation of the supervision tree rooted at a given supervisor. The function recursively builds a nested structure containing:

  • Child ID, PID, type (worker/supervisor), and modules
  • Full subtrees for child supervisors
  • Empty subtrees for worker processes

The feature enables:

  • Better debugging of supervision hierarchies
  • Improved application visualization tools
  • Runtime introspection for operational monitoring
  • Automation of cluster topology discovery

The implementation:

  • Uses existing supervisor:which_children/1 as its foundation
  • Handles all supervisor reference types (pid, atom, global, via)
  • Gracefully handles errors and nonexistent supervisors
  • Returns structured data suitable for JSON serialization or display

Added comprehensive tests for:

  • Basic tree building with multiple children
  • Supervisor and worker process types
  • Terminated and undefined child processes
  • Error handling for invalid references

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 4, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

CT Test Results

    2 files    100 suites   1h 8m 54s ⏱️
2 295 tests 2 241 ✅ 52 💤 2 ❌
2 714 runs  2 656 ✅ 56 💤 2 ❌

For more details on these failures, see this check.

Results for commit 3388e9a.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@rickard-green rickard-green added the team:PS Assigned to OTP team PS label Apr 7, 2026
Comment thread lib/stdlib/src/supervisor.erl Outdated
Comment on lines +884 to +886
SupPid = get_sup_pid(Supervisor),
{SupPid, [build_tree_node(Id, Child, Type, Modules) ||
{Id, Child, Type, Modules} <- Children]}
Copy link
Copy Markdown
Contributor

@josevalim josevalim Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems get_sup_pid is a bit unrelated to this function. If you need the supervisor pid, then perhaps you could call a separate function and expand it yourself, and keep it outside of this new functionality?

Furthermore, there is a subtle race condition in this code. The Supervisor can actually change between which_children and get_sup_pid, so you would return the children of one supervisor and the Pid of another.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching both issues. I've fixed the race condition by resolving the supervisor reference first via resolve_supervisor_reference/1 and then passing the stable PID to which_children/1. I've also extracted the helper into a standalone resolve_supervisor_reference/1 function with a spec and a comment explaining the race-avoidance rationale, keeping it separate from the tree-building logic.

%% Add a worker child
{ok, Child1} = supervisor:start_child(SupPid, #{id => child1,
start => {supervisor_1, start_child, []}}),
{SupPid, [{child1, Child1, worker, [supervisor_1], []}]} = supervisor:get_tree(SupPid),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not testing many of the scenarios in the code above, such as different names, undefined children, etc.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I've added 5 new tests covering: local atom name reference, global name reference, undefined (terminated) children, nested supervisor hierarchies, and mixed running/undefined child states. These are now grouped under a get_tree test group in the suite.

Comment thread lib/stdlib/src/supervisor.erl Outdated
%% Helper function to build a tree node for a single child
build_tree_node(Id, Child, Type, Modules) ->
Subtree = case Type of
supervisor when is_pid(Child), Child =/= undefined ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Child is a pid, then it is certainly =/= from undefined:

Suggested change
supervisor when is_pid(Child), Child =/= undefined ->
supervisor when is_pid(Child) ->

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, is_pid/1 already excludes undefined. Applied the suggestion.

Operators debugging distributed Erlang applications often need to understand
the complete supervision structure at runtime. Previously, this required
manually walking the supervision tree by repeatedly calling
supervisor:which_children/1 on each supervisor found.

This commit introduces supervisor:get_tree/1, which returns a complete
hierarchical representation of the supervision tree rooted at a given
supervisor. The function recursively builds a nested structure containing:
- Child ID, PID, type (worker/supervisor), and modules
- Full subtrees for child supervisors
- Empty subtrees for worker processes

The feature enables:
- Better debugging of supervision hierarchies
- Improved application visualization tools
- Runtime introspection for operational monitoring
- Automation of cluster topology discovery

The implementation:
- Uses existing supervisor:which_children/1 as its foundation
- Handles all supervisor reference types (pid, atom, global, via)
- Gracefully handles errors and nonexistent supervisors
- Returns structured data suitable for JSON serialization or display

Added comprehensive tests for:
- Basic tree building with multiple children
- Supervisor and worker process types
- Terminated and undefined child processes
- Error handling for invalid references
- Fix race condition in get_tree/1: resolve supervisor reference once
  via resolve_supervisor_reference/1 and reuse the stable PID for
  which_children/1, preventing children of one supervisor being paired
  with the PID of another when a registered name rebinds between calls
- Rename get_sup_pid/1 to resolve_supervisor_reference/1 with -spec and
  comment explaining the race-avoidance intent; separates reference
  resolution from tree-building logic
- Simplify build_tree_node/4 guard: is_pid(Child) already excludes
  undefined, so the redundant Child =/= undefined clause is removed
- Add 5 tests covering previously untested scenarios: local atom name
  reference, global name reference, undefined children, nested
  supervisor hierarchies, and mixed running/undefined child states
@km-git007 km-git007 force-pushed the feature/supervisor-tree-introspection branch from feff915 to 3388e9a Compare April 9, 2026 17:34
Comment on lines +448 to +450
-type supervision_tree() :: {Pid :: pid(),
[supervision_tree_node()]} |
{error, no_supervision_info}.
Copy link
Copy Markdown
Contributor

@juhlig juhlig Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error response does not belong in this type...

Suggested change
-type supervision_tree() :: {Pid :: pid(),
[supervision_tree_node()]} |
{error, no_supervision_info}.
-type supervision_tree() :: {Pid :: pid(),
[supervision_tree_node()]}.

Instead it should be...

```
""".
-doc(#{since => <<"OTP 28.0">>}).
-spec get_tree(SupRef) -> supervision_tree() when
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... here.

Suggested change
-spec get_tree(SupRef) -> supervision_tree() when
-spec get_tree(SupRef) -> supervision_tree() | {error, no_supervision_info} when

Copy link
Copy Markdown
Contributor

@Maria-12648430 Maria-12648430 Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree.

But {error, no_supervision_info} is in itself not all that great. It glosses over, like, everything. SupRef could not be a process identifier (of sorts). Even if, the process could not be alive (any more). Even if, it could be that the process is not a supervisor. Or something else could have gone wrong when building the tree. All that is conflated into this one error tuple, and even that I would object to: it should rather be a badarg exception, depending on what went wrong.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Maria-12648430 you're not putting too fine a point on it, but I agree.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry... 🙍‍♀️ Just when the jetlag from going to Singapore started wearing off, I had to return to Germany, and there it is back again... 😩 So, sorry @km-git007 for being blunt. Nevertheless, my point stands.

@IngelaAndin
Copy link
Copy Markdown
Contributor

IngelaAndin commented Apr 14, 2026

@km-git007 We think the idea is interesting. It will take some polishing, and it is not in time for OTP-29.0 so we will come back to you with OTP-team feedback after the the release, as we need to prioritize the release right now. @josevalim , @Maria-12648430 and@juhlig usually have good feedback, so that you can consider in the mean time. It could end up in later version of 29 or in 30 depending on a lot of things that we can not address right now, just saying hang in there.

@IngelaAndin IngelaAndin added the stalled waiting for input by the Erlang/OTP team label Apr 14, 2026
@@ -2537,3 +2596,43 @@ dyn_init(Child,State) when ?is_temporary(Child) ->
State#state{dynamics={mapsets,maps:new()}};
dyn_init(_Child,State) ->
State#state{dynamics={maps,maps:new()}}.

%% Resolve a supervisor reference (pid | atom | {global,Name} | {via,Mod,Name})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A sup_ref can also be {Name, Node}, which your implementation does not consider.

Aside from that, I think a function like this would be better placed in the gen module (with another name, of course), since it equally applies to all other gen_* behaviors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The other thing is that this function makes many assumptions (for example, it will catch child failures, it will return the PID, etc), which I am not sure are general enough for inclusion either here or in gen. It feels this is app specific code or, if the goal is debugging, add a version the prints the tree to the c module, without exposing the implementation details seen here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josevalim are you referring to the build_tree_node function below? As far as that goes, I agree 👍

But in this comment, I was referring to only resolve_supervisor_reference, which determines the pid for a given sup_ref.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it equally applies to all other gen_* behaviors.

Oh, that would indeed be handy, I'd love that 🥰

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stalled waiting for input by the Erlang/OTP team team:PS Assigned to OTP team PS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants