fix: zero-initialize KV cache to prevent NaN in fallback attention #1628
Open
Gunther-Schulz wants to merge 1 commit into deepbeepmeep:main from
Conversation
torch.empty() leaves uninitialized GPU memory in cache slots that are
never written during a sequence (e.g. the tail of the last block).
_flash_attention_fallback_decode reads all allocated blocks and masks
out invalid positions with -inf, but NaN + (-inf) = NaN rather than
-inf, causing NaN to propagate through softmax and corrupt the output.
Switching to torch.zeros() ensures unwritten slots contain 0, so the
masking works correctly (0 + -inf = -inf → softmax weight = 0).
This fixes garbage output ("A!!!...") from Qwen TI mode image
captioning via the vLLM embedded prefill path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
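The failure mode above can be reproduced in a few lines. This is a minimal standalone sketch (not the repository's actual decode code) showing why a -inf mask cannot neutralize a NaN-filled slot, while a zero-filled slot is masked correctly:

```python
import torch

# Scores against 4 cache slots; pretend the last slot was never written
# and torch.empty() left NaN there.
scores_bad = torch.tensor([1.0, 2.0, 3.0, float("nan")])

# Mask invalid positions with -inf, as the fallback decode does.
mask = torch.tensor([0.0, 0.0, 0.0, float("-inf")])

# NaN + (-inf) = NaN, so the mask fails to neutralize the bad slot,
# and softmax then propagates NaN to every position.
weights_bad = torch.softmax(scores_bad + mask, dim=-1)
print(weights_bad)  # all NaN: one bad slot poisons every weight

# With a zero-initialized slot, masking works: 0 + (-inf) = -inf.
scores_ok = torch.tensor([1.0, 2.0, 3.0, 0.0])
weights_ok = torch.softmax(scores_ok + mask, dim=-1)
print(weights_ok)  # finite, with weight exactly 0 at the masked slot
```

The key point is that masking by addition assumes the masked-out entries are finite; any initialization that can produce NaN or Inf breaks that assumption.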
Summary
- torch.empty() leaves uninitialized GPU memory in KV cache slots that are never written (e.g. the tail of the last block when a sequence doesn't fill it completely)
- _flash_attention_fallback_decode reads all allocated blocks and masks invalid positions with -inf in the attention bias
- NaN + (-inf) = NaN, not -inf: when uninitialized slots contain NaN/Inf, the masking fails and NaN propagates through softmax, corrupting the entire attention output
- torch.zeros() ensures unwritten slots contain 0: 0 + (-inf) = -inf, so the softmax weight is 0 and there is no contamination

This fixes garbage output (e.g. "A!!!...") from Qwen TI mode image captioning via the vLLM embedded prefill/decode path. The bug was non-deterministic: it only manifested when torch.empty() happened to place NaN/Inf in the uninitialized tail of a specific layer's cache (in practice, always model layer 23 / attention layer_idx 5 in the 9B hybrid model).

Test plan
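As a hedged illustration of the fix (hypothetical shapes and variable names, not the project's actual cache layout): with a zero-initialized cache, the fallback decode can read every allocated slot safely, because the -inf bias drives the unwritten tail's attention weight to exactly zero:

```python
import torch

# A block-allocated KV cache where the last block's tail is never written.
block_size, seq_len, dim = 4, 6, 8     # 2 blocks allocated, 2 tail slots unused
cache_len = 2 * block_size             # fallback reads all allocated slots

k_cache = torch.zeros(cache_len, dim)  # the fix: zeros, not empty
v_cache = torch.zeros(cache_len, dim)
k_cache[:seq_len] = torch.randn(seq_len, dim)  # only real positions written
v_cache[:seq_len] = torch.randn(seq_len, dim)

q = torch.randn(1, dim)                # single decode-step query
bias = torch.full((cache_len,), float("-inf"))
bias[:seq_len] = 0.0                   # valid positions unmasked

scores = (q @ k_cache.T) / dim**0.5 + bias  # zero-filled tail + (-inf) = -inf
weights = torch.softmax(scores, dim=-1)     # tail weights are exactly 0
out = weights @ v_cache                     # finite output, no contamination
```

Had k_cache come from torch.empty() and contained NaN in the tail, the same bias addition would have yielded NaN scores and a NaN output, matching the bug described above.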
🤖 Generated with Claude Code