feat: runtime stage's metrics rendering#1778

Open
sandugood wants to merge 9 commits into apache:main from sandugood:feat/rest-show-running-jobs

Conversation

@sandugood
Contributor

@sandugood sandugood commented May 26, 2026

Which issue does this PR close?

Closes #1743.

Rationale for this change

Right now the Scheduler's REST API does not expose stage-level metrics. Only task-level metrics are available; they are essential and informative on their own, but they do not give the full picture.

What changes are included in this PR?

Introduced a new plan_format query parameter backed by a PlanFormat enum, which can be Default (plain indented rendering without metrics, kept for backward compatibility), Tree (tree-style rendering, no metrics) or Metrics (metrics for each stage of the execution).
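
For readers who want to see the shape of such a parameter, here is a minimal sketch using only the standard library. The variant names and the plan_format values come from this PR; the FromStr implementation and the plan_format_from_query helper are assumptions made for illustration and are not the code in handlers.rs, which may well rely on a web framework's query extractor instead.

```rust
use std::str::FromStr;

/// Rendering mode selected by the `plan_format` query parameter.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum PlanFormat {
    /// `?plan_format=default` (or no parameter): plain indent, no metrics.
    #[default]
    Default,
    /// `?plan_format=tree`: tree render, no metrics.
    Tree,
    /// `?plan_format=metrics`: metrics of each stage are included.
    Metrics,
}

impl FromStr for PlanFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(PlanFormat::Default),
            "tree" => Ok(PlanFormat::Tree),
            "metrics" => Ok(PlanFormat::Metrics),
            other => Err(format!("unknown plan_format: {other}")),
        }
    }
}

/// Hypothetical helper (not part of the PR): extracts `plan_format` from a
/// raw query string and falls back to the default when it is absent.
pub fn plan_format_from_query(query: &str) -> Result<PlanFormat, String> {
    for pair in query.split('&') {
        if let Some(value) = pair.strip_prefix("plan_format=") {
            return value.parse();
        }
    }
    Ok(PlanFormat::default())
}

fn main() {
    println!("{:?}", plan_format_from_query(""));
    println!("{:?}", plan_format_from_query("plan_format=metrics"));
}
```

Falling back to PlanFormat::default() when the parameter is missing is what keeps existing callers of the endpoint unaffected.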

format_node is responsible for rendering the metrics in the right order.
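
A rough sketch of what an indent-based renderer with metrics could look like follows. The PlanNode struct, the render and sample_plan helpers, and the alphabetical metric ordering are all assumptions for illustration; the actual format_node in this PR works on real execution plans and may order metrics differently.

```rust
/// Hypothetical stand-in for an execution plan node (not a Ballista type).
struct PlanNode {
    description: String,
    /// (metric name, already formatted value) pairs.
    metrics: Vec<(String, String)>,
    children: Vec<PlanNode>,
}

/// Renders `node` and its children, indenting two spaces per depth level.
/// Metrics are sorted by name here, which is an assumed ordering.
fn format_node(node: &PlanNode, depth: usize, with_metrics: bool, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&node.description);
    if with_metrics {
        let mut metrics = node.metrics.clone();
        metrics.sort();
        let rendered: Vec<String> = metrics
            .iter()
            .map(|(name, value)| format!("{name}={value}"))
            .collect();
        out.push_str(&format!(", metrics=[{}]", rendered.join(", ")));
    }
    out.push('\n');
    for child in &node.children {
        format_node(child, depth + 1, with_metrics, out);
    }
}

/// Convenience wrapper that renders a whole plan into a new string.
fn render(root: &PlanNode, with_metrics: bool) -> String {
    let mut out = String::new();
    format_node(root, 0, with_metrics, &mut out);
    out
}

/// Small two-node plan used for demonstration only.
fn sample_plan() -> PlanNode {
    PlanNode {
        description: "SortShuffleWriterExec: partitioning=Hash([c_custkey@0], 4)".to_string(),
        metrics: vec![
            ("write_time".to_string(), "158.15ms".to_string()),
            ("output_rows".to_string(), "150.0 K".to_string()),
        ],
        children: vec![PlanNode {
            description: "AggregateExec: mode=Partial".to_string(),
            metrics: vec![("output_batches".to_string(), "20".to_string())],
            children: vec![],
        }],
    }
}

fn main() {
    print!("{}", render(&sample_plan(), true));
}
```

Sorting before rendering makes the output stable between requests, which helps when comparing the same stage across runs.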

Results with metrics rendering for a dataset generated with tpchgen-cli -s 1 --format parquet --output-dir testdata:

curl -s "localhost:50050/api/job/cjaZtGS/stages?plan_format=metrics" | jq -r ".stages[1].stage_plan"
SortShuffleWriterExec: partitioning=Hash([c_custkey@0], 4), metrics=[write_time=158.15ms, output_rows=150.0 K, spill_count=0, repart_time=11.19ms, spill_time=4ns, input_rows=150.0 K, spill_bytes=0]
  AggregateExec: mode=Partial, gby=[c_custkey@0 as c_custkey], aggr=[sum(test_parquet.c_acctbal)], metrics=[elapsed_compute=94.16ms, aggregation_time=5.38ms, output_rows=150.0 K, peak_mem_used=10.16 M, output_bytes=45.0 MB, start_timestamp=2026-05-26 20:59:43.356634094 UTC, skipped_aggregation_rows=0, emitting_time=62.03µs, output_batches=20, aggregate_arguments_time=147.62µs, end_timestamp=2026-05-26 20:59:43.546108325 UTC, spilled_bytes=0.0 B, spill_count=0, reduction_factor=100% (150.0 K/150.0 K), spilled_rows=0, time_calculating_group_ids=87.23ms]
    DataSourceExec: ...

Are there any user-facing changes?

No, because users can continue to use /api/job/{job_id}/stages without a query parameter. Using plan_format=tree or plan_format=metrics is optional.
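
From a client's point of view, the optional parameter could be handled as in the sketch below. The stages_url helper is hypothetical and not part of this PR or of ballista-cli; only the endpoint path and the plan_format values are taken from the description above.

```rust
/// Hypothetical client-side helper: builds the stages endpoint URL and only
/// appends `plan_format` when the caller asks for a specific rendering.
fn stages_url(base: &str, job_id: &str, plan_format: Option<&str>) -> String {
    let mut url = format!("{}/api/job/{}/stages", base.trim_end_matches('/'), job_id);
    if let Some(format) = plan_format {
        url.push_str("?plan_format=");
        url.push_str(format);
    }
    url
}

fn main() {
    // Existing callers keep using the endpoint without a query parameter.
    println!("{}", stages_url("http://localhost:50050", "cjaZtGS", None));
    // New callers opt in to metrics rendering.
    println!("{}", stages_url("http://localhost:50050", "cjaZtGS", Some("metrics")));
}
```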

@sandugood sandugood force-pushed the feat/rest-show-running-jobs branch from 157b9d8 to ba87335 Compare May 26, 2026 21:38
@sandugood sandugood changed the title feat: runtime stages metrics rendering feat: runtime stage's metrics rendering May 26, 2026
Contributor

@milenkovicm milenkovicm left a comment

Thanks @sandugood.

Suggestion: can we have three options for this?

  • no metrics (default), same behaviour as before
  • tree render
  • with metrics, when the user requests them

Comment thread ballista/scheduler/src/api/handlers.rs Outdated
Comment thread ballista/scheduler/src/api/handlers.rs
@sandugood
Contributor Author

Introduced a new enum (which has a default value). I am going to add info about the API change to the PR's overview.

/// ?plan_format=default => plain indent, no metrics
#[default]
Default,
/// ?plan_format=tree => tree render, no metrics
Member

Please also update the parameter name at https://github.com/apache/datafusion-ballista/blob/main/ballista-cli/src/tui/http_client.rs#L132
We can add support for metrics in a follow-up.

Contributor Author

Fixed. After this one gets merged I can take care of the TUI part, if you don't mind.

Contributor

Contributor Author

Updated that script with a new -m param.

Contributor

@milenkovicm milenkovicm left a comment

A few minor comments. I will try it tomorrow and may have a bit more.

Comment thread ballista/scheduler/src/api/handlers.rs Outdated
Development

Successfully merging this pull request may close these issues.

REST API to expose job runtime metrics

3 participants