ResultStream.result_size reports inconsistent units for bytes vs str writes #53

@nficano

src/arcp/_runtime/result_stream.py:63 accumulates self._total_size += len(payload.encode("utf-8")), where payload is whatever the writer is about to send on the wire. When the caller passes bytes, payload is the base64-encoded ASCII string built at src/arcp/_runtime/result_stream.py:49, so the counter records roughly 4/3 of the original byte count (base64 emits 4 characters for every 3 input bytes). When the caller passes str with encoding="utf8", payload is the original string and the counter records its UTF-8 byte length. The terminal job.result envelope echoes that counter as result_size at src/arcp/_runtime/result_stream.py:83, so a client that calls handle.collect_chunks() and compares len(out) against result.result_size sees a ~33% mismatch on the bytes path only. Clients that build hash or quota checks on top of result_size get silently wrong answers for binary streams.
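A minimal repro sketch of the accounting described above, using only the standard library. The helper wire_size is hypothetical and does not exist in arcp; it only mimics how the counter is described as measuring the wire payload, under the assumption that bytes are base64-encoded and str is sent as-is.

```python
import base64
from typing import Union


def wire_size(data: Union[bytes, str]) -> int:
    """Hypothetical helper: size as the current counter is described to record it."""
    if isinstance(data, bytes):
        # bytes path: the payload is the base64 text, not the raw bytes
        payload = base64.b64encode(data).decode("ascii")
    else:
        # str path: the payload is the original string
        payload = data
    return len(payload.encode("utf-8"))


raw = b"\x00" * 3000   # 3000 bytes of binary data
text = "a" * 3000      # 3000 ASCII characters, 3000 bytes in UTF-8

print(wire_size(raw))   # 4000: inflated by the base64 4/3 ratio
print(wire_size(text))  # 3000: matches the application data size
```

Both inputs carry 3000 bytes of application data, yet only the str path reports 3000.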

Fix prompt: Track the size of the input data, not the encoded wire payload. At src/arcp/_runtime/result_stream.py:40, record data_len = len(data) if isinstance(data, bytes) else len(data.encode("utf-8")) before the base64 branch, and update self._total_size from that. Document in the docstring that result_size is "bytes of decoded application data" so the contract is unambiguous. Add a parametrized test that writes the same logical payload via bytes and via str and asserts both runs report identical result_size to the client. If the spec requires a different definition, encode that explicitly in the docstring and the test.
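A rough sketch of the proposed fix, for discussion only. ResultStreamSketch and reported_size are hypothetical stand-ins and not the real ResultStream; only the size accounting is modelled, and the base64 branch is assumed from the description above. The real test would presumably use pytest.mark.parametrize and go through the client; a plain loop is used here to keep the snippet dependency-free.

```python
import base64
from typing import Union


class ResultStreamSketch:
    """Hypothetical stand-in for ResultStream; models size accounting only."""

    def __init__(self) -> None:
        self._total_size = 0

    def write(self, data: Union[bytes, str]) -> str:
        # Measure the caller's data before any wire encoding is applied.
        data_len = len(data) if isinstance(data, bytes) else len(data.encode("utf-8"))
        if isinstance(data, bytes):
            payload = base64.b64encode(data).decode("ascii")
        else:
            payload = data
        self._total_size += data_len
        return payload  # what would be sent on the wire

    @property
    def result_size(self) -> int:
        """Bytes of decoded application data written so far."""
        return self._total_size


def reported_size(data: Union[bytes, str]) -> int:
    """Write one chunk to a fresh stream and return the size it reports."""
    stream = ResultStreamSketch()
    stream.write(data)
    return stream.result_size


# Same logical payload written via str and via bytes must report the same size.
for logical in ["hello", "héllo", ""]:
    assert reported_size(logical) == reported_size(logical.encode("utf-8"))
```

With this accounting, a non-ASCII payload such as "héllo" reports 6 on both paths, which is its UTF-8 byte length.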

Labels: bug, severity:low
