
quicperf ed25519 tls #23

Merged
endel merged 8 commits into endel:main from victorstewart:quicperf-ed25519-tls on May 15, 2026

endel commented on May 15, 2026

No description provided.

quicperf exposed stalls in multistream upload and bidirectional loopback workloads when packets or control frames were lost or delayed. The sender could rewind send_offset for retransmission overflow, causing already-sent bytes to be counted against MAX_DATA again, and receiver-credit control frames could be suppressed while congestion-limited. DATA_BLOCKED and STREAM_DATA_BLOCKED frames were also effectively one-shot, so a lost control packet could leave peers stuck waiting for credit.

Fix this by tracking stream ACK ranges contiguously, coalescing retransmit ranges without rewinding send_offset, repeating blocked signals while still blocked, immediately re-advertising flow-control credit on peer blocked frames, coalescing duplicate pending control frames, and allowing receiver-credit MAX_* frames through congestion-limited packet assembly.
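The retransmit-range part of this fix can be illustrated with a minimal Python sketch. It is not the quic-zig code: `SendStream`, `coalesce`, `flow_control_used`, and the method names are hypothetical, and only the idea (lost ranges are queued and merged while send_offset stays where it is) comes from the commit message.

```python
def coalesce(ranges):
    """Merge overlapping or adjacent half-open [start, end) byte ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


class SendStream:
    """Hypothetical sender-side stream state (illustrative names only)."""

    def __init__(self):
        self.send_offset = 0        # highest offset ever sent; never rewound
        self.retransmit = []        # coalesced lost ranges awaiting resend
        self.flow_control_used = 0  # bytes counted against the peer's MAX_DATA

    def send_new(self, length):
        start = self.send_offset
        self.send_offset += length
        self.flow_control_used += length  # new bytes consume credit exactly once
        return (start, self.send_offset)

    def on_lost(self, start, end):
        # Queue the range for retransmission. send_offset is untouched, so the
        # bytes are not charged against MAX_DATA a second time.
        self.retransmit = coalesce(self.retransmit + [(start, end)])

    def next_retransmit(self):
        return self.retransmit.pop(0) if self.retransmit else None
```

In this sketch, three overlapping loss reports for bytes 100-400 collapse into a single retransmit range, and the flow-control accounting is unchanged by the loss.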

Verification:

- /root/quicperf built quiczigperf successfully

- quiczigperf multistream_upload syscall t2 passed 80/80

- multistream_upload iouring t1/t2 passed 40/40

- bidi syscall t1 clean proof passed 40/40

- adjacent multistream_upload, multistream_download, and bidi smoke passed 60/60 across syscall and iouring

- In this fork checkout, Zig tests passed with /root/.local/zig/zig-x86_64-linux-0.16.0/zig build test -Dtarget=x86_64-linux-musl

The default glibc-targeted Zig test is blocked on this host by Zig linker handling of GCC 16 .sframe relocations in crt1.o, before changed code is compiled.

quicperf repeatedly failed quiczigperf bidi rows with thread_check_failed / missing_server_complete even after the client completed 128 MiB bidirectional transfers. A live stuck-server backtrace showed the server was not blocked in the C++ NetworkHub; it was inside quic-zig ACK processing, walking a compressed ACK range with largest_ack=256835 and first_ack_range=187349 one packet number at a time.

Make ACK processing scan the packets still in flight and test each one against the (possibly large) ACK range, instead of iterating over every packet number in the encoded range. Also keep ACK-only sends from force-generating sub-threshold delayed ACKs, which can produce congestion-limited ACK-only loops, and keep the fork aligned with the TLS time-source behavior used by the quicperf build.
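A rough Python sketch of the ACK-processing change, using the values from the backtrace. The function names and the in-flight set are made up for illustration; the only protocol fact relied on is that the first ACK range covers packet numbers from largest_ack - first_ack_range through largest_ack inclusive.

```python
def ack_ranges_contain(ranges, pn):
    """ranges: list of inclusive (smallest, largest) packet-number pairs."""
    return any(lo <= pn <= hi for lo, hi in ranges)


def newly_acked(in_flight, ranges):
    """Return the in-flight packet numbers covered by the ACK, sorted.

    Cost is O(len(in_flight) * len(ranges)) and does not depend on how wide
    each range is, unlike a loop over every packet number in the range.
    """
    return sorted(pn for pn in in_flight if ack_ranges_contain(ranges, pn))


largest_ack = 256835
first_ack_range = 187349
# The first range spans 187,350 packet numbers, but only a handful of
# packets are assumed to still be in flight in this example.
ranges = [(largest_ack - first_ack_range, largest_ack)]
in_flight = {69485, 69486, 200000, 256835, 256836}
acked = newly_acked(in_flight, ranges)
```

Here the work is five range checks rather than 187,350 loop iterations, which is the difference the commit is after.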

Verification: zig build test -Dtarget=x86_64-linux-musl passes.

quicperf bidi syscall t1 exposed 0.06 Gbps outliers after the ACK-range fix. Tracing showed slow runs emitted about 275k-309k packets for one 64 MiB bidi transfer while fast runs emitted about 48k-54k packets. The slow path was dominated by repeated DATA_BLOCKED/STREAM_DATA_BLOCKED frames while the sender was waiting for the same flow-control limit to advance.

BLOCKED frames are advisory, so emit them once per blocked limit and re-arm when MAX_DATA/MAX_STREAM_DATA raises credit. This removes the control-frame packet storm without disabling the row or changing the quicperf C++ I/O path.
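One way to express "once per blocked limit, re-arm on new credit" is to remember the limit for which a blocked frame was last sent. The Python sketch below assumes that approach; `BlockedSignal` and its methods are hypothetical names, not quic-zig API.

```python
class BlockedSignal:
    """Tracks whether a *_BLOCKED frame was already sent for the current limit."""

    def __init__(self, limit):
        self.limit = limit           # current MAX_DATA / MAX_STREAM_DATA value
        self.signaled_limit = None   # limit for which a blocked frame was sent

    def should_send_blocked(self, want_offset):
        """want_offset: offset the sender would reach if it sent pending data."""
        blocked = want_offset > self.limit
        if not blocked or self.signaled_limit == self.limit:
            return False             # not blocked, or already signaled this limit
        self.signaled_limit = self.limit
        return True

    def on_max_data(self, new_limit):
        # Limits only grow; a stale or reordered frame is ignored. Raising the
        # limit re-arms the signal because signaled_limit no longer matches.
        if new_limit > self.limit:
            self.limit = new_limit
```

With this shape, repeated send attempts at the same limit produce a single blocked frame, and a new one is only possible after the peer raises credit.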

Verification: zig build test -Doptimize=ReleaseFast passed 518 tests.

quicperf bidi syscall t1 still measured only about 0.33 Gbps after the blocked-frame storm fix. perf record on the exact row showed 43.69% self CPU in std AutoArrayHashMap.orderedRemove from FrameSorter.pop -> ReceiveStream.read -> qzf_stream_recv.

The frame sorter is keyed by stream offset and does not require insertion order. Add an exact read_pos fetchSwapRemove fast path for the common in-order case and use swapRemove for fallback removals. This avoids O(n) ordered removal on every packet-sized stream chunk during bulk receive.
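To show why the removal strategy matters, here is a toy Python model of an array-backed hash map with both removal styles. It only mirrors the general idea behind ordered versus swap removal; `ArrayMap` and `pop_in_order` are illustrative names and this is not the Zig standard library implementation.

```python
class ArrayMap:
    """Toy insertion-ordered map: dense entry list plus a key -> index dict."""

    def __init__(self):
        self.entries = []  # list of (key, value) pairs
        self.index = {}    # key -> position in self.entries

    def put(self, key, value):
        if key in self.index:
            self.entries[self.index[key]] = (key, value)
        else:
            self.index[key] = len(self.entries)
            self.entries.append((key, value))

    def ordered_remove(self, key):
        """O(n): every later entry shifts down by one to preserve order."""
        i = self.index.pop(key)
        _, value = self.entries.pop(i)
        for j in range(i, len(self.entries)):
            self.index[self.entries[j][0]] = j
        return value

    def swap_remove(self, key):
        """O(1): the last entry moves into the hole; order is not preserved."""
        i = self.index.pop(key)
        _, value = self.entries[i]
        last = self.entries.pop()
        if i < len(self.entries):
            self.entries[i] = last
            self.index[last[0]] = i
        return value


def pop_in_order(chunks, read_pos):
    """Fast path: remove the chunk that starts exactly at read_pos, if any."""
    if read_pos in chunks.index:
        return chunks.swap_remove(read_pos)
    return None
```

During an in-order bulk receive the chunk at read_pos is typically the oldest entry, which is the worst case for ordered removal because every remaining entry has to shift.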

Verification: zig build test -Doptimize=ReleaseFast passed 518 tests.

In quicperf loss_recovery/iouring, the server could complete while the client remained stuck waiting for terminal bytes. Live trace showed send_offset below ack_offset after loss recovery, which underflowed app_unacked and left tens of MB reported as unsent after ACK progress.

Advance send_offset to the contiguous ACK point and trim retransmit ranges below ack_offset when ACKs advance. This prevents already ACKed data from re-entering the new-data send path after retransmission.
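The bookkeeping described here can be sketched as a small Python function. The function name and the tuple return are assumptions made for illustration; the clamp and trim steps follow the commit message.

```python
def on_ack_advance(send_offset, ack_offset, retransmit):
    """Apply a new contiguous ACK point to sender state.

    retransmit is a list of half-open [start, end) byte ranges.
    Returns (send_offset, retransmit, app_unacked).
    """
    # Never leave send_offset behind the contiguous ACK point.
    send_offset = max(send_offset, ack_offset)
    # Drop ranges that are fully acknowledged and clip ones that straddle it.
    trimmed = [(max(start, ack_offset), end)
               for start, end in retransmit
               if end > ack_offset]
    # With the clamp above this difference cannot go negative, so it cannot
    # wrap around in an unsigned integer type.
    app_unacked = send_offset - ack_offset
    return send_offset, trimmed, app_unacked
```

For example, with send_offset at 500, an ACK point of 800, and pending ranges (100, 300), (700, 900), and (1000, 1100), the first range is dropped, the second is clipped to start at 800, and app_unacked comes out as 0 instead of a wrapped value.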

Verification:

- zig build test -Dtarget=x86_64-linux-musl

- quiczigperf loss_recovery/iouring t1: 2 warmups + 40 measured samples passed, p50 3.505395 Gbps

quicperf upload/syscall t2 exposed a quic-zig receive-side bottleneck. Two-client upload completed, but throughput collapsed to 0.338-1.035 Gbps with 16-51s samples. A traced repro still only reached 1.382920 Gbps, and perf showed about 92% of server cycles in quic.stream.FrameSorter.push.

FrameSorter.push scanned the full chunk map on every incoming STREAM frame even when new data appended at or beyond the highest buffered offset. Add a sequential-append fast path that skips overlap scanning when overlap is impossible, preserving the existing scan for out-of-order and overlap cases.
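A simplified Python model of the fast path follows. It shares only the class name with quic.stream.FrameSorter; the fields (`chunks`, `highest_end`, `slow_path_calls`) and the overlap-trimming logic are assumptions made for the sketch, and it ignores data that has already been read.

```python
class FrameSorter:
    """Toy reassembly buffer keyed by stream offset (illustrative only)."""

    def __init__(self):
        self.chunks = {}          # offset -> bytes, never overlapping
        self.highest_end = 0      # one past the highest byte ever buffered
        self.slow_path_calls = 0  # instrumentation for this example

    def push(self, offset, data):
        if not data:
            return
        end = offset + len(data)
        if offset >= self.highest_end:
            # Sequential append: nothing buffered can overlap, so skip the scan.
            self.chunks[offset] = data
            self.highest_end = end
            return
        self._push_with_overlap_scan(offset, data)
        self.highest_end = max(self.highest_end, end)

    def _push_with_overlap_scan(self, offset, data):
        """Slow path: walk all chunks and store only bytes not yet buffered."""
        self.slow_path_calls += 1
        end = offset + len(data)
        cursor = offset
        for start in sorted(self.chunks):  # snapshot of keys, full scan
            stop = start + len(self.chunks[start])
            if stop <= cursor or start >= end:
                continue  # no overlap with the not-yet-placed part
            if start > cursor:
                # Fill the gap in front of this existing chunk.
                self.chunks[cursor] = data[cursor - offset:start - offset]
            cursor = max(cursor, stop)
        if cursor < end:
            self.chunks[cursor] = data[cursor - offset:]
```

In a bulk upload nearly every STREAM frame lands at or beyond `highest_end`, so in this model the scan only runs for retransmitted, reordered, or overlapping data.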

Verification:

- zig build test -Dtarget=x86_64-linux-musl

- quiczigperf upload/syscall t2: 40 measured samples passed, p50 9.791096 Gbps

endel merged commit 5d9c626 into endel:main on May 15, 2026
2 checks passed