Skip to content

fix: FP16 accumulation overflow on sub-Ampere CUDA (Volta/Turing)

fa33a09
Select commit
Loading
Failed to load commit list.
Sign in for the full log view
Merged

vae: drop BF16 activation casts, decode in F32 throughout #7

fix: FP16 accumulation overflow on sub-Ampere CUDA (Volta/Turing)
fa33a09
Select commit
Loading
Failed to load commit list.
build (macos-latest)
succeeded Mar 2, 2026 in 45s