Collect stage-level metrics #461

Open
gabotechs wants to merge 1 commit into gabrielmusat/factor-out-distributed-recursion-fixed from gabrielmusat/stage-metrics

Conversation

Collaborator

@gabotechs gabotechs commented May 23, 2026

This is one PR from the following stack of PRs:

This PR introduces stage-level metrics collection to provide better observability into the execution of distributed query plans. By collecting metrics at the stage level, we can understand performance characteristics of individual stages and optimize task distribution and scheduling decisions based on actual runtime data.
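
To make the idea concrete, here is a minimal sketch of rolling per-task measurements up into one record per stage. It is only an illustration under stated assumptions: `TaskMetrics`, `StageMetrics`, `collect_stage_metrics`, and the chosen fields are hypothetical names, not the types or API introduced by this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-task measurements reported by a worker.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct TaskMetrics {
    pub output_rows: u64,
    pub elapsed_nanos: u64,
}

/// Hypothetical stage-level roll-up of all tasks in one stage.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct StageMetrics {
    /// Number of tasks that reported for this stage.
    pub tasks: usize,
    /// Sum of rows produced by every task in the stage.
    pub output_rows: u64,
    /// Slowest task; a stage finishes only when its slowest task does.
    pub max_elapsed_nanos: u64,
}

/// Groups `(stage_id, task_metrics)` reports by stage and aggregates them.
pub fn collect_stage_metrics(
    reports: &[(usize, TaskMetrics)],
) -> BTreeMap<usize, StageMetrics> {
    let mut stages = BTreeMap::new();
    for (stage_id, task) in reports {
        // Create an all-zero entry the first time a stage is seen.
        let entry: &mut StageMetrics = stages.entry(*stage_id).or_default();
        entry.tasks += 1;
        entry.output_rows += task.output_rows;
        entry.max_elapsed_nanos = entry.max_elapsed_nanos.max(task.elapsed_nanos);
    }
    stages
}
```

Tracking the slowest task rather than the sum of elapsed times is a design choice in this sketch: tasks of a stage run in parallel, so the maximum is closer to the stage's wall-clock cost and is the kind of signal a scheduler could use when deciding how many tasks to assign.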

@gabotechs gabotechs changed the base branch from main to gabrielmusat/local-worker-connections May 23, 2026 12:49
@gabotechs gabotechs marked this pull request as ready for review May 23, 2026 12:55
@gabotechs gabotechs mentioned this pull request May 24, 2026
@gabotechs gabotechs force-pushed the gabrielmusat/stage-metrics branch from cba86e8 to aedad16 Compare May 26, 2026 13:21
@gabotechs gabotechs changed the base branch from gabrielmusat/local-worker-connections to gabrielmusat/factor-out-distributed-recursion May 26, 2026 13:26
@gabotechs gabotechs changed the base branch from gabrielmusat/factor-out-distributed-recursion to gabrielmusat/factor-out-distributed-recursion-fixed May 26, 2026 13:29
@gabotechs gabotechs force-pushed the gabrielmusat/factor-out-distributed-recursion-fixed branch from e28d614 to 12a761b Compare May 26, 2026 13:53
@gabotechs gabotechs force-pushed the gabrielmusat/stage-metrics branch from aedad16 to af69055 Compare May 26, 2026 13:53
gabotechs added a commit that referenced this pull request May 26, 2026
… be the same (#427)

This is one PR from the following stack of PRs:
- #427 <- you are here
- #469
- #461
- #462
- #463
- #464
- #432

This is a preparatory step towards:
- #377

This is an optimization that lets a worker communicate in memory,
avoiding network calls and serialization, when it needs to communicate
with itself.

This optimization shows good improvements when using a small number of
workers, and the gains fade as more workers are used.

Even if this shows improvements today, it will become much more meaningful
in adaptive query execution: there will be times when two consecutive
stages are assigned 1 task each and, because that was decided dynamically,
we cannot eagerly apply the optimization that collapses those two stages
into one. Instead, we assign both stages to the same worker, and the
optimization in this PR kicks in, executing the plan fully locally.
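
The in-memory shortcut described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the code from #427: `WorkerId`, `Route`, and `send_batch` are hypothetical names, a standard-library channel stands in for the in-memory hand-off, and a plain `Vec` stands in for the network client that would otherwise serialize and send the data.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical worker identifier (an assumption made for this sketch).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct WorkerId(pub String);

/// Which path a batch took to reach the consuming task.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Producer and consumer are the same worker: data is handed over in memory.
    Local,
    /// Producer and consumer differ: data goes through the (simulated) network.
    Remote,
}

/// Sends `batch` from `this_worker` to `target` and reports the route taken.
///
/// `local_tx` stands in for an in-memory connection to this same worker.
/// `network_outbox` stands in for a real network client; it only records
/// the payloads that would have been serialized and sent.
pub fn send_batch(
    this_worker: &WorkerId,
    target: &WorkerId,
    batch: Vec<u8>,
    local_tx: &Sender<Vec<u8>>,
    network_outbox: &mut Vec<Vec<u8>>,
) -> Route {
    if this_worker == target {
        // Same worker: skip serialization and the network round trip.
        local_tx.send(batch).expect("local receiver was dropped");
        Route::Local
    } else {
        // Different worker: fall back to the network path.
        network_outbox.push(batch);
        Route::Remote
    }
}
```

With one task per stage and both stages placed on the same worker, every call would take the `Local` branch, which is the fully local execution the message describes.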
@gabotechs gabotechs force-pushed the gabrielmusat/factor-out-distributed-recursion-fixed branch from 12a761b to c5e321e Compare May 26, 2026 18:03
@gabotechs gabotechs force-pushed the gabrielmusat/stage-metrics branch from af69055 to 72c7a9e Compare May 26, 2026 18:04
1 participant