
Reduce allocations + copies due to rebuffering in LocalDataResponse row serialization #4707

Open
cscotta wants to merge 2 commits into apache:trunk from cscotta:CASSANDRA-21285

Conversation


@cscotta cscotta commented Apr 5, 2026

Contributor

netudima commented Apr 5, 2026

the only potential concern with the approach itself: if we have frequent enough heavy selects from one table and light selects from other tables (or even the same table via different queries), then we allocate more for every request as a tradeoff. But that is probably unavoidable... maybe we can cap the value to limit the impact in such cases.

Author

cscotta commented Apr 5, 2026

Thanks, I was thinking about this a bit.

The decay is pretty quick (it's based on the last 1000 LocalDataResponses generated), which probably helps. But a workload that's 99.9% 128-byte responses and 0.1% 10 MB responses would still be degenerate. If the table name/ID were available in this scope, that would make it easier to vary the estimate per table - but still not perfect.

One simple approach might be to track a histogram of generated response sizes and calculate whether the old or new behavior would be preferable. But I don't want to go overboard and introduce stats tracking/comparison in every response generation.

I may look at deploying this for a variety of workloads with additional instrumentation that measures "better or worse" as well. Could also feature-flag it.

Interested in your + others' thoughts on this.
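To make the degenerate-workload concern concrete, here is a minimal sketch of an average over a sliding window of the last 1000 response sizes. The class name and mechanics are illustrative only, not the actual metric implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: an average over a sliding window of the last N
// response sizes, standing in for the real decaying metric.
public final class SlidingWindowAverage
{
    private final long[] window;
    private final AtomicLong count = new AtomicLong();

    public SlidingWindowAverage(int size)
    {
        this.window = new long[size];
    }

    // Record a response size, overwriting the oldest slot once full.
    public void record(long responseSize)
    {
        int slot = (int) (count.getAndIncrement() % window.length);
        window[slot] = responseSize;
    }

    // Average over the samples recorded so far (NaN if none).
    public double average()
    {
        long n = Math.min(count.get(), window.length);
        if (n == 0)
            return Double.NaN;
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += window[i];
        return (double) sum / n;
    }
}
```

With 999 samples of 128 bytes and a single 10 MB sample in the window, the average lands around 10 KiB - roughly 80x the typical response size, which is the degenerate case described above.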

Contributor

netudima commented Apr 5, 2026

actually, table name/ID could be tricky here because you would have to implement cleanup (or forget) logic to avoid leaking memory when a table is dropped.
I see the following options:

  1. the simplest option is to put an upper limit on the value - min(limit, metricValue) - to reduce the impact if the metric value is too high. If the configurable limit is non-positive, we can disable the metric usage and fall back to the old behaviour, just in case.
  2. a slightly different way is to apply the limit to the input of the metric, not its output - estimatedResponseBytes.update(min(limit, currentResponseSize)) - to reduce the impact of very large responses on the average calculation.
  3. use a median rather than an average to avoid the impact of size spikes, by introducing a histogram here; but histogram value retrieval is more expensive, and we would have to cache the values as we do for the speculative retry threshold.
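A minimal sketch of options 1 and 2 above, using a simple exponential moving average to stand in for the real decaying metric. All names here (ResponseSizeEstimator, the limit fields) are illustrative assumptions, not actual Cassandra identifiers:

```java
// Illustrative sketch of options 1 and 2; not actual Cassandra code.
public final class ResponseSizeEstimator
{
    private final long maxTrackedResponseSize; // input clamp (option 2)
    private final long maxInitialBufferSize;   // output clamp (option 1)

    // Simple EWMA standing in for the real decaying metric.
    private volatile double average = Double.NaN;

    public ResponseSizeEstimator(long maxTrackedResponseSize, long maxInitialBufferSize)
    {
        this.maxTrackedResponseSize = maxTrackedResponseSize;
        this.maxInitialBufferSize = maxInitialBufferSize;
    }

    // Option 2: clamp the input before it reaches the average.
    public void update(long responseSize)
    {
        long clamped = Math.min(maxTrackedResponseSize, responseSize);
        average = Double.isNaN(average) ? clamped : 0.99 * average + 0.01 * clamped;
    }

    // Option 1: clamp the output when sizing the initial buffer.
    // A non-positive limit disables the metric and keeps the old behaviour.
    public long initialBufferSize(long fallback)
    {
        double estimate = average;
        if (Double.isNaN(estimate) || maxInitialBufferSize <= 0)
            return fallback;
        return Math.min((long) estimate, maxInitialBufferSize);
    }
}
```

The two clamps are independent: the input clamp keeps a single huge response from skewing the average, while the output clamp bounds the worst-case per-request allocation regardless of what the average says.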

Contributor

@maedhroz maedhroz left a comment


Mostly LGTM

In terms of safety, perhaps a good initial path forward here would just be to have a system property that controls the maximum initial size. Then, this whole optimization could be trivially disabled by setting it to the current minimum, which, actually, could also be a system property. They could default to 128 KiB / 1 MiB, and be loaded at static initialization time in LocalDataResponse from CassandraRelevantProperties. You end up with something like...

double bufferSizeEstimate = Double.isNaN(estimatedResponseSize) ? <min> : Math.min(estimatedResponseSize, <max>);

(I think the default max is the one thing to bike-shed, but ultimately that depends on the workload.)
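The suggestion above might look roughly like the following; the property names and the 128 KiB / 1 MiB defaults are assumptions for illustration, not actual CassandraRelevantProperties entries:

```java
// Illustrative sketch of the system-property clamp suggested above.
// Property names and defaults are assumptions, not real Cassandra properties.
public final class LocalDataResponseSketch
{
    // Loaded once at static initialization time, as suggested.
    static final long MIN_INITIAL_BUFFER_SIZE =
        Long.getLong("cassandra.local_data_response.min_initial_buffer_size", 128 * 1024);
    static final long MAX_INITIAL_BUFFER_SIZE =
        Long.getLong("cassandra.local_data_response.max_initial_buffer_size", 1024 * 1024);

    // NaN means no estimate yet, so fall back to the minimum; setting
    // max == min effectively disables the whole optimization.
    static long bufferSizeEstimate(double estimatedResponseSize)
    {
        return Double.isNaN(estimatedResponseSize)
             ? MIN_INITIAL_BUFFER_SIZE
             : Math.min((long) estimatedResponseSize, MAX_INITIAL_BUFFER_SIZE);
    }
}
```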
