Fix x86 bf16 GEMM packing on i386 #6708
Correct the i386 AVX512BF16 B-tile packing order so dpbf16 consumes k/k+1 pairs instead of adjacent columns. This fixes intermittent x86 32-bit bf16s GEMM failures on AVX512BF16 hosts without disabling AVX512 paths.
Also make the MultiHeadAttention int8 test inputs follow the same dynamic-quantization-safe randomization used by GEMM int8 tests, avoiding values near 0.5 rounding boundaries while keeping the tensors random and reproducible.
@codex review
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff, master vs. #6708:
| Metric | master | #6708 | +/- |
|---|---|---|---|
| Coverage | 93.59% | 93.57% | -0.02% |
| Files | 933 | 933 | |
| Lines | 299557 | 299807 | +250 |
| Hits | 280360 | 280558 | +198 |
| Misses | 19197 | 19249 | +52 |
Codex Review: Didn't find any major issues. Chef's kiss.
Pull request overview
This PR addresses an intermittent correctness issue in x86 32-bit (i386) bf16 GEMM on AVX512BF16-capable hosts by fixing the B-tile packing order to match how dpbf16 consumes k/k+1 pairs. It also adjusts MultiHeadAttention int8 test input generation to avoid values near rounding boundaries (consistent with existing GEMM int8 test randomization), improving test stability while keeping randomness reproducible.
Changes:
- Fix i386 AVX512BF16 B-tile packing to interleave k/k+1 pairs (instead of adjacent columns) for dpbf16 consumption.
- Update MultiHeadAttention int8 tests to use dynamic-quantization-safe randomized inputs (avoiding ~0.5 rounding boundaries).
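To illustrate the second change, here is a minimal sketch of what "dynamic-quantization-safe" randomization could look like. It is not the helper added in this PR: the names `random_quant_safe` and `min_distance_to_tie`, the `step` and `jitter` parameters, and the per-tensor scale `127 / absmax` are all assumptions made for illustration. The idea is to generate values that land close to integers after scaling, so a rounding difference between two implementations cannot flip the quantized result.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not the one added in this PR.
// Produces n floats whose quantized values (x * 127 / absmax, an assumed
// per-tensor dynamic scale) stay within `jitter` of an integer, i.e. away
// from the .5 rounding ties.
std::vector<float> random_quant_safe(std::size_t n, std::uint32_t seed,
                                     float step = 0.05f, float jitter = 0.3f)
{
    std::mt19937 rng(seed); // fixed seed keeps runs reproducible
    std::uniform_int_distribution<int> level(-126, 126);
    std::uniform_real_distribution<float> offset(-jitter, jitter);

    std::vector<float> v(n);
    for (std::size_t i = 0; i < n; i++)
    {
        // Draw in two statements so the order of RNG calls is well defined.
        const float l = static_cast<float>(level(rng));
        const float o = offset(rng);
        v[i] = (l + o) * step;
    }

    // Pin the absolute maximum so the dynamic scale is 1 / step (up to rounding).
    if (n > 0)
        v[0] = 127.f * step;
    return v;
}

// Smallest distance from any x * scale to a .5 tie, with scale = 127 / absmax.
float min_distance_to_tie(const std::vector<float>& v)
{
    float absmax = 0.f;
    for (float x : v)
        absmax = std::max(absmax, std::fabs(x));
    const float scale = 127.f / absmax;

    float best = 0.5f;
    for (float x : v)
    {
        const float q = x * scale;
        const float frac = q - std::floor(q);          // in [0, 1)
        best = std::min(best, std::fabs(frac - 0.5f)); // 0 means exactly on a tie
    }
    return best;
}
```

With the default `jitter` of 0.3, every scaled value should sit roughly 0.2 or more away from a tie, and the same seed yields the same tensor on a given standard library.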
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_multiheadattention_1.cpp | Adds dynamic-quantization-safe randomization helper and uses it for int8 MultiHeadAttention test inputs. |
| src/layer/x86/gemm_bf16s.h | Corrects i386 AVX512BF16 pack_B_tile_bf16 interleave order so dpbf16 reads k/k+1 pairs correctly. |
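The packing fix can be pictured with a scalar model. The sketch below is not the code in `gemm_bf16s.h`: it uses `float` instead of bf16, plain loops instead of intrinsics, and the helper names `pack_b_k_pairs`, `dot_pairs_accumulate`, and `row_times_packed_b` are hypothetical. It only shows why a pairwise dot-product instruction needs the k and k+1 values of the same column in adjacent slots, rather than the values of two adjacent columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pack a row-major K x N matrix B so that each column n
// gets its (k, k+1) values in adjacent slots.
// Layout per k-pair: [B[k][0], B[k+1][0], B[k][1], B[k+1][1], ...].
std::vector<float> pack_b_k_pairs(const std::vector<float>& B, std::size_t K, std::size_t N)
{
    assert(K % 2 == 0); // sketch only handles an even K
    std::vector<float> packed(K * N);
    std::size_t out = 0;
    for (std::size_t k = 0; k + 1 < K; k += 2)
    {
        for (std::size_t n = 0; n < N; n++)
        {
            packed[out++] = B[k * N + n];
            packed[out++] = B[(k + 1) * N + n];
        }
    }
    return packed;
}

// Scalar model of a pairwise dot-product accumulate: lane n adds
// a[0] * b[2n] + a[1] * b[2n+1], where a holds the (k, k+1) pair from A.
void dot_pairs_accumulate(float* acc, const float a[2], const float* b, std::size_t N)
{
    for (std::size_t n = 0; n < N; n++)
        acc[n] += a[0] * b[2 * n] + a[1] * b[2 * n + 1];
}

// Multiply a 1 x K row of A with the packed B, giving a 1 x N result.
std::vector<float> row_times_packed_b(const std::vector<float>& A,
                                      const std::vector<float>& packed,
                                      std::size_t K, std::size_t N)
{
    std::vector<float> acc(N, 0.f);
    for (std::size_t k = 0; k + 1 < K; k += 2)
    {
        const float a[2] = {A[k], A[k + 1]};
        // Each k-pair occupies 2 * N slots, so pair k / 2 starts at offset k * N.
        dot_pairs_accumulate(acc.data(), a, packed.data() + k * N, N);
    }
    return acc;
}
```

If the packer instead stored `B[k][n], B[k][n+1]` in adjacent slots, each lane would multiply the A pair against two different columns of the same k row, which is the kind of mismatch the PR description attributes to the old i386 packing order.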