Fix x86 bf16 GEMM packing on i386 #6708

Merged
nihui merged 1 commit into Tencent:master from nihui:x86-bf16s-fix on May 9, 2026

@nihui nihui commented May 9, 2026

Correct the i386 AVX512BF16 B-tile packing order so dpbf16 consumes k/k+1 pairs instead of adjacent columns. This fixes intermittent x86 32-bit bf16s GEMM failures on AVX512BF16 hosts without disabling AVX512 paths.
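A scalar sketch of the layout described above, not the code in src/layer/x86/gemm_bf16s.h: each bf16 dot-product lane multiplies two adjacent 16-bit elements and accumulates into one 32-bit float, so the packed B tile has to place B[k][j] next to B[k+1][j] for each output column j. The helper names `dp_pair`, `pack_b_k_pairs` and `row_times_packed` are hypothetical, and values are held as float rather than real bf16 to keep the example readable.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of one bf16 dot-product lane: acc += a0*b0 + a1*b1.
// Values are kept as float for readability; real code stores 16-bit bf16.
static float dp_pair(float acc, const float a[2], const float b[2])
{
    return acc + a[0] * b[0] + a[1] * b[1];
}

// Hypothetical helper: pack a K x N row-major B tile so that each output
// column j holds the consecutive pair (B[k][j], B[k+1][j]). K must be even.
static std::vector<float> pack_b_k_pairs(const std::vector<float>& B, size_t K, size_t N)
{
    std::vector<float> out;
    out.reserve(K * N);
    for (size_t k = 0; k + 1 < K; k += 2)
    {
        for (size_t j = 0; j < N; j++)
        {
            out.push_back(B[k * N + j]);       // element k of column j
            out.push_back(B[(k + 1) * N + j]); // element k+1 of column j
        }
    }
    return out;
}

// Multiply one row of A (length K) by the packed tile, giving N outputs.
static std::vector<float> row_times_packed(const std::vector<float>& a,
                                           const std::vector<float>& packed,
                                           size_t K, size_t N)
{
    std::vector<float> c(N, 0.f);
    for (size_t k = 0; k + 1 < K; k += 2)
    {
        for (size_t j = 0; j < N; j++)
        {
            // pair block k/2 spans 2*N floats; column j starts at 2*j
            c[j] = dp_pair(c[j], &a[k], &packed[(k / 2) * 2 * N + 2 * j]);
        }
    }
    return c;
}
```

If the packer instead emitted (B[k][j], B[k][j+1]), each lane would multiply the A pair (a[k], a[k+1]) against two different columns of the same k row, which is the kind of mismatch this change is meant to rule out.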

Also make the MultiHeadAttention int8 test inputs follow the same dynamic-quantization-safe randomization used by GEMM int8 tests, avoiding values near .5 rounding boundaries while keeping the tensors random and reproducible.
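As a rough illustration of the idea (not the helper added in tests/test_multiheadattention_1.cpp), the sketch below draws seeded random values and resamples any whose fractional part lies close to .5. It assumes the quantization scale is 1, so the value being rounded is the input itself; `near_half`, `random_quant_safe`, and the `margin` parameter are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not ncnn's test utility: true when v sits close to a
// half-integer, where round-to-nearest results can differ between code paths.
static bool near_half(float v, float margin)
{
    float frac = std::fabs(v - std::trunc(v)); // fractional part in [0, 1)
    return std::fabs(frac - 0.5f) < margin;
}

// Fill n values in [-absmax, absmax] from a fixed seed, resampling any value
// whose fractional part falls within `margin` of .5.
// Assumption: the quantization scale is 1, so v itself is what gets rounded.
static std::vector<float> random_quant_safe(size_t n, float absmax, float margin, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-absmax, absmax);
    std::vector<float> out(n);
    for (float& v : out)
    {
        do
        {
            v = dist(rng);
        } while (near_half(v, margin));
    }
    return out;
}
```

With a fixed seed the sequence repeats from run to run on a given standard library, so the inputs stay random but reproducible while avoiding the rounding boundary.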

@nihui nihui requested a review from Copilot May 9, 2026 03:35
nihui commented May 9, 2026

@codex review

codecov-commenter commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.57%. Comparing base (8c18666) to head (347ba43).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6708      +/-   ##
==========================================
- Coverage   93.59%   93.57%   -0.02%     
==========================================
  Files         933      933              
  Lines      299557   299807     +250     
==========================================
+ Hits       280360   280558     +198     
- Misses      19197    19249      +52     

@chatgpt-codex-connector
Codex Review: Didn't find any major issues. Chef's kiss.

Copilot AI left a comment

Pull request overview

This PR addresses an intermittent correctness issue in x86 32-bit (i386) bf16 GEMM on AVX512BF16-capable hosts by fixing the B-tile packing order to match how dpbf16 consumes k/k+1 pairs. It also adjusts MultiHeadAttention int8 test input generation to avoid values near rounding boundaries (consistent with existing GEMM int8 test randomization), improving test stability while keeping randomness reproducible.

Changes:

  • Fix i386 AVX512BF16 B-tile packing to interleave k/k+1 pairs (instead of adjacent columns) for dpbf16 consumption.
  • Update MultiHeadAttention int8 tests to use dynamic-quantization-safe randomized inputs (avoiding ~0.5 rounding boundaries).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_multiheadattention_1.cpp | Adds dynamic-quantization-safe randomization helper and uses it for int8 MultiHeadAttention test inputs. |
| src/layer/x86/gemm_bf16s.h | Corrects i386 AVX512BF16 pack_B_tile_bf16 interleave order so dpbf16 reads k/k+1 pairs correctly. |

@nihui nihui merged commit 10cee2a into Tencent:master May 9, 2026
110 of 113 checks passed