Add payload size metrics by jeremy-cook · Pull Request #1289 · temporalio/sdk-rust

jeremy-cook · 2026-05-21T19:29:11Z

What was changed

This adds payload-size metrics for three paths:

workflow_payload_size for workflow activation inputs and successful completion results
activity_payload_size for activity inputs and successful results
rpc_message_size for client gRPC request and response bodies

The metrics use byte units, shared payload-size histogram buckets that cover the 4 MiB boundary, and low-cardinality message_direction labels for request and response series. Payload metric names and message_direction label values live in common so client and core emitters use the same names and cardinality values.
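As a rough illustration of shared names and buckets that "cover the 4 MiB boundary", here is a minimal sketch. Only the three metric names and the `message_direction` label key come from the description above; the label values, the bucket bounds, and the `bucket_index` helper are assumptions made for this example and are not the values used in the PR.

```rust
/// Metric names as given in the PR description.
pub const WORKFLOW_PAYLOAD_SIZE: &str = "workflow_payload_size";
pub const ACTIVITY_PAYLOAD_SIZE: &str = "activity_payload_size";
pub const RPC_MESSAGE_SIZE: &str = "rpc_message_size";

/// Label key from the PR description; the two values are assumed here.
pub const MESSAGE_DIRECTION: &str = "message_direction";
pub const DIRECTION_REQUEST: &str = "request";
pub const DIRECTION_RESPONSE: &str = "response";

const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Hypothetical histogram upper bounds in bytes. 4 MiB is itself a bound,
/// so sizes at or below the limit and sizes above it land in different buckets.
pub const PAYLOAD_SIZE_BUCKETS: [u64; 9] = [
    KIB,
    16 * KIB,
    64 * KIB,
    256 * KIB,
    MIB,
    2 * MIB,
    4 * MIB,
    8 * MIB,
    16 * MIB,
];

/// Index of the first bucket whose upper bound is >= `size`, or the number
/// of buckets when `size` exceeds every bound (the overflow bucket).
pub fn bucket_index(size: u64) -> usize {
    PAYLOAD_SIZE_BUCKETS
        .iter()
        .position(|&bound| size <= bound)
        .unwrap_or(PAYLOAD_SIZE_BUCKETS.len())
}

fn main() {
    // A payload exactly at 4 MiB and one byte over fall into different buckets.
    assert_ne!(bucket_index(4 * MIB), bucket_index(4 * MIB + 1));
    println!(
        "{}{{{}=\"{}\"}} -> bucket {}",
        RPC_MESSAGE_SIZE,
        MESSAGE_DIRECTION,
        DIRECTION_REQUEST,
        bucket_index(4 * MIB)
    );
}
```

Keeping only two label values per metric is what keeps the series count low: each metric gains at most one extra series per direction.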

The client transport wrapper records streamed DATA frame bytes without changing the body, while workflow and activity metrics count payload data plus metadata key/value bytes.
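The "payload data plus metadata key/value bytes" accounting can be sketched as follows. The `Payload` struct is a local stand-in with a data field and a string-to-bytes metadata map, and `payload_size` / `payloads_size` are hypothetical helper names; none of this is copied from the PR.

```rust
use std::collections::HashMap;

/// Local stand-in for a payload: raw data plus a metadata map.
/// (Assumed shape for illustration, not the SDK's generated type.)
#[derive(Default)]
struct Payload {
    metadata: HashMap<String, Vec<u8>>,
    data: Vec<u8>,
}

/// Logical size in bytes: data length plus the length of every
/// metadata key and every metadata value.
fn payload_size(payload: &Payload) -> usize {
    let metadata_bytes: usize = payload
        .metadata
        .iter()
        .map(|(key, value)| key.len() + value.len())
        .sum();
    payload.data.len() + metadata_bytes
}

/// Total logical size of several payloads, e.g. all inputs of one call.
fn payloads_size(payloads: &[Payload]) -> usize {
    payloads.iter().map(payload_size).sum()
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("encoding".to_string(), b"json/plain".to_vec());
    let payload = Payload {
        metadata,
        data: br#"{"id":42}"#.to_vec(),
    };
    // 8 bytes ("encoding") + 10 bytes ("json/plain") + 9 bytes of data.
    assert_eq!(payload_size(&payload), 27);
    assert_eq!(payloads_size(&[payload, Payload::default()]), 27);
}
```

This counts logical bytes only. The protobuf wire encoding adds field tags and length prefixes on top, which is one reason a logical payload size and an RPC body size are worth tracking separately.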

Why?

These metrics make payload growth visible before it hits transport limits, and separate logical payload sizing from client-side RPC body sizing.

Testing

cargo +nightly fmt --all --check
cargo test -p temporalio-client body_size_recorder_counts_data_frames
cargo test -p temporalio-sdk-core payload_size_counts_data_and_metadata_bytes
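For readers unfamiliar with the pattern behind the `body_size_recorder_counts_data_frames` test, the idea of counting streamed bytes without altering the body can be sketched with the standard library alone. The real wrapper presumably sits on an HTTP body type; here a plain iterator of byte chunks stands in for the stream of DATA frames, and `CountingFrames` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Pass-through wrapper that adds each frame's length to a shared counter.
/// Frames are forwarded unchanged.
struct CountingFrames<I> {
    inner: I,
    total: Arc<AtomicU64>,
}

impl<I> CountingFrames<I> {
    /// Wraps `inner` and returns a handle for reading the running total.
    fn new(inner: I) -> (Self, Arc<AtomicU64>) {
        let total = Arc::new(AtomicU64::new(0));
        let wrapper = Self {
            inner,
            total: Arc::clone(&total),
        };
        (wrapper, total)
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for CountingFrames<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        let frame = self.inner.next()?;
        self.total.fetch_add(frame.len() as u64, Ordering::Relaxed);
        Some(frame)
    }
}

fn main() {
    let frames = vec![b"hello".to_vec(), b", ".to_vec(), b"world".to_vec()];
    let (counted, total) = CountingFrames::new(frames.clone().into_iter());

    // The consumer sees exactly the frames that were sent.
    let forwarded: Vec<Vec<u8>> = counted.collect();
    assert_eq!(forwarded, frames);

    // 5 + 2 + 5 bytes were observed on the way through.
    assert_eq!(total.load(Ordering::Relaxed), 12);
}
```

Once the stream ends, the accumulated total is the value a recorder would report to a histogram such as `rpc_message_size`.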

Record logical workflow and activity payload sizes alongside client-side gRPC message body sizes so users can detect payload growth before hitting transport limits.

This adds workflow_payload_size metrics for workflow activation inputs and successful completion results, activity_payload_size metrics for activity inputs and successful results, and rpc_message_size metrics for client gRPC request and response bodies.

The metrics use byte units, shared payload-size histogram buckets that cover the 4 MiB boundary, and low-cardinality message_direction labels for request and response series. Payload metric names and message_direction labels live in common so client and core emitters use the same names and cardinality values.

The client transport wrapper records streamed DATA frame bytes without changing the body, while workflow and activity metrics count payload data plus metadata key/value bytes.

Tests cover payload byte accounting, Prometheus series labels/counts for activity and workflow payload metrics, request and response RPC message-size series, and non-zero activity payload sums.

Sushisource · 2026-05-21T22:38:02Z

Let me bring this up with the rest of the team. We already record many, many metrics and emitting new ones can impose costs on users. We can potentially add these behind a flag. Often though our answer is to add these yourself via interceptors if you find you have a need for them.

However, these particular metrics have come up once or twice before, and are in line with some of our work around storing payloads externally.

I'll get back to you.

jeremy-cook · 2026-05-21T23:28:31Z

Definitely, and thanks! I did assume this one might be a bit more controversial, but thought I would throw it up as a proposal in any case. It was the source of motivation for implementing the activity interceptors first :)

My take on why this might benefit everyone: hitting the 4 MB payload limit is an immediate, unrecoverable workflow failure, and payload creep in production easily goes unobserved until that failure happens.

Something to consider with storing the payloads externally: I've discussed this with some other folks on the Temporal team, but there is a catch to the current implementation, at least in the Go SDK. The commands are accumulated in a list before getting sent out, so while the individual payloads might not exceed the 4 MB limit or trigger the external storage threshold, the accumulation of all the commands before a yield can result in exceeding 4 MB, e.g. 500 workflows triggered in a loop before the first future.Get is called.
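A rough sketch of that arithmetic, with made-up numbers: the 10 KiB per-command size is purely illustrative, the limit is taken as exactly 4 MiB, and `first_overflow` is a hypothetical helper rather than anything from an SDK.

```rust
/// Assumed limit for this illustration: exactly 4 MiB.
const LIMIT_BYTES: usize = 4 * 1024 * 1024;

/// Returns the index of the first command whose addition pushes the
/// running total above `limit`, or `None` if the whole batch fits.
fn first_overflow(command_sizes: &[usize], limit: usize) -> Option<usize> {
    let mut total = 0usize;
    for (index, size) in command_sizes.iter().enumerate() {
        total += size;
        if total > limit {
            return Some(index);
        }
    }
    None
}

fn main() {
    // 500 commands of 10 KiB each, all queued before the first yield.
    let sizes = vec![10 * 1024; 500];

    // Every individual command is far below the limit...
    assert!(sizes.iter().all(|&size| size < LIMIT_BYTES));

    // ...but the accumulated batch is not (5,120,000 > 4,194,304 bytes).
    let total: usize = sizes.iter().sum();
    assert!(total > LIMIT_BYTES);

    // The 410th command (index 409) is the one that crosses the limit.
    assert_eq!(first_overflow(&sizes, LIMIT_BYTES), Some(409));
}
```

A per-payload size check never fires in this scenario, which is why a metric on the assembled message size is useful in addition to per-payload metrics.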

As you mentioned though, with the interceptors this can be done client side now, so either solution works for me.

Thanks for the consideration!

Sushisource · 2026-05-21T23:47:19Z

> the accumulation of all the commands before a yield can result in exceeding 4 MB, e.g. 500 workflows triggered in a loop before the first future.Get is called.

We have a fix coming for this too, where we'll paginate.

jeremy-cook requested a review from a team as a code owner May 21, 2026 19:29

jeremy-cook force-pushed the jecook-native-payload-metrics branch from f0fbbf8 to d503332 Compare May 21, 2026 21:43

jeremy-cook force-pushed the jecook-native-payload-metrics branch from d503332 to 131d8a2 Compare May 21, 2026 21:48

Merge branch 'main' into jecook-native-payload-metrics

506b1b7

jeremy-cook wants to merge 2 commits into temporalio:main from jeremy-cook:jecook-native-payload-metrics

2 participants