Describe the bug
Title: hf_xet downloads stall due to massive out-of-order packet delivery from Xet CDN
Environment
OS: Fedora 43 KDE Plasma
Python: 3.12.10
huggingface_hub: 1.9.0
hf_xet: 1.4.3
Hello. I was struggling with broken downloads, so I asked Sonnet and Gemini to do some digging, and this is what they found. Thanks.
Behaviour
Xet downloads start at 80-400 MB/s (local buffer reads), then stall completely after 16-200 MB transferred. Plain HTTP (HF_HUB_DISABLE_XET=1) downloads at stable line speed throughout.
Diagnosis
ss -tinp captured during a stall shows:
rcv_ooopack in the thousands on every Xet connection (e.g. 11,671 / 17,548 / 10,697)
TCP congestion window (cwnd) stuck at 10 — never expands
One connection with rtt:794ms, jitter 598ms, rto:3185ms, apparently gone into timeout
All connections to AWS endpoints (18.244.140.x, 50.17.153.x)
TCP is spending all resources reordering packets. Congestion window never opens. Eventually one connection times out and the transfer stalls permanently.
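For anyone who wants to compare their own connections against these numbers, here is a minimal sketch that pulls the relevant fields out of one `ss -ti` info line. The sample line is illustrative: it is assembled from the values reported in this issue, not a verbatim capture, and the helper names (`parse_ss_info`, `looks_unhealthy`) and thresholds are hypothetical choices, not anything from `hf_xet` or `iproute2`.

```python
import re

# Field patterns as they appear in `ss -ti` output, e.g. "cwnd:10".
# The leading \b keeps "rtt:" from matching inside "minrtt:" or "rcv_rtt:".
_FIELDS = {
    "rto_ms": re.compile(r"\brto:(\d+(?:\.\d+)?)"),
    "rtt_ms": re.compile(r"\brtt:(\d+(?:\.\d+)?)/"),
    "rttvar_ms": re.compile(r"\brtt:\d+(?:\.\d+)?/(\d+(?:\.\d+)?)"),
    "cwnd": re.compile(r"\bcwnd:(\d+)"),
    "rcv_ooopack": re.compile(r"\brcv_ooopack:(\d+)"),
}


def parse_ss_info(info_line: str) -> dict:
    """Extract the fields discussed in this issue from one ss info line."""
    metrics = {}
    for name, pattern in _FIELDS.items():
        match = pattern.search(info_line)
        if match:
            metrics[name] = float(match.group(1))
    return metrics


def looks_unhealthy(metrics: dict, ooo_threshold: int = 1000,
                    rtt_threshold_ms: float = 500.0) -> bool:
    """Flag a connection using arbitrary, illustrative thresholds."""
    return (metrics.get("rcv_ooopack", 0) >= ooo_threshold
            or metrics.get("rtt_ms", 0.0) >= rtt_threshold_ms)


# Illustrative line built from the values reported above (not a real capture).
sample = ("cubic wscale:9,7 rto:3185 rtt:793.869/597.661 "
          "mss:1448 cwnd:10 rcv_ooopack:11671")
metrics = parse_ss_info(sample)
print(metrics)
print("unhealthy:", looks_unhealthy(metrics))
```

Feeding it each info line from `ss -tinp` during a stall should make it easier to see which connections show the pattern described here.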
Workaround
HF_HUB_DISABLE_XET=1 hf download
Suspected cause
Xet CDN delivering chunks heavily out of order at scale, causing TCP congestion collapse.
Reproduction
No code required — this is a network/protocol issue reproducible with the standard CLI:
Ensure hf_xet is active
hf env | grep -i xet
Start download of any large Xet-enabled repo (example used in diagnosis):
hf download unsloth/gemma-4-26B-A4B-it-GGUF gemma-4-26B-A4B-it-MXFP4_MOE.gguf --local-dir ~/Downloads
In a second terminal, monitor TCP connections while download is active:
watch -n 1 'ss -tinp | grep hf'
What to observe:
Download progress stalls after 16-200 MB transferred
ss output shows rcv_ooopack climbing into thousands on all connections
cwnd remains stuck at 10 throughout
Eventually one connection shows rtt in the hundreds of ms with high jitter
Confirm Xet is active by checking the download shows an initial burst speed
(80-400 MB/s reported) before stalling — this distinguishes Xet buffer reads
from plain HTTP which delivers steady line speed throughout.
Confirm issue is Xet-specific:
Stalls:
hf download unsloth/gemma-4-26B-A4B-it-GGUF gemma-4-26B-A4B-it-MXFP4_MOE.gguf --local-dir ~/Downloads
Works fine at line speed:
HF_HUB_DISABLE_XET=1 hf download unsloth/gemma-4-26B-A4B-it-GGUF gemma-4-26B-A4B-it-MXFP4_MOE.gguf --local-dir ~/Downloads
The smoking gun appears to be:
rtt:793.869/597.661 rto:3185
One connection to 18.244.140.27 has an RTT of 794ms with huge variance (598ms jitter). That's catastrophic. Normal connections to the same host are showing 105ms. That one connection has gone sick and is dragging everything down.
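As a sanity check on those three numbers, the reported `rto` is consistent with the standard retransmission timeout formula from RFC 6298, RTO = SRTT + 4 × RTTVAR. The sketch below assumes the two values in `rtt:` are the smoothed RTT and its variance estimate in milliseconds, and ignores clock granularity and Linux-specific clamping, so treat it as an approximation rather than the kernel's exact computation.

```python
# Values copied from the ss line quoted above (milliseconds).
srtt_ms = 793.869
rttvar_ms = 597.661

# RFC 6298: RTO = SRTT + max(G, K * RTTVAR) with K = 4.
# Clock granularity G is ignored because 4 * RTTVAR is far larger here.
K = 4
rto_ms = srtt_ms + K * rttvar_ms

print(f"estimated RTO: {rto_ms:.1f} ms")  # estimated RTO: 3184.5 ms
```

The estimate lands within 1 ms of the observed `rto:3185`, which suggests the large timeout follows directly from the inflated RTT and variance on that connection rather than being a separate symptom.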
Also notable:
rcv_ooopack (out-of-order packets) is enormous on every connection — thousands of them. 18.244.140.27 fd=11 shows 11,671 out-of-order packets received.
cwnd is stuck at 10 on every connection — TCP congestion window never grows, meaning TCP thinks the network is congested and won't open up.
Conclusion: The Xet CDN is sending chunks out of order at scale, TCP is spending all its time reordering, the congestion window never expands, and one connection eventually goes into timeout hell dragging the whole transfer to a halt.
This looks like a Xet/CDN-side problem rather than a problem with my system. The local TCP stack appears to be behaving correctly; the server side seems to be flooding it with out-of-order packets.
Way above my pay grade. I hope this helps. Thanks.
Reproduction
No response
Logs
System info
Traceback (most recent call last):
File "/usr/bin/huggingface-cli", line 5, in <module>
from huggingface_hub.commands.huggingface_cli import main
ModuleNotFoundError: No module named 'huggingface_hub'