Fix x86 bf16 GEMM packing on i386 by nihui · Pull Request #6708 · Tencent/ncnn

nihui · 2026-05-09T03:33:47Z

Correct the i386 AVX512BF16 B-tile packing order so dpbf16 consumes k/k+1 pairs instead of adjacent columns. This fixes intermittent x86 32-bit bf16s GEMM failures on AVX512BF16 hosts without disabling AVX512 paths.
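
To illustrate the k/k+1 pairing described above, here is a minimal scalar sketch, not the code from this PR. It assumes a row-major K x N matrix B with even K and models one dpbf16 step as "each 32-bit lane multiplies two adjacent bf16 values and accumulates into one float". The names `pack_b_k_pairs`, `dpbf16_step`, and `row_times_b` are hypothetical and do not come from `gemm_bf16s.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bf16 is the upper 16 bits of an IEEE-754 float (truncating conversion here).
static uint16_t float_to_bf16(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return static_cast<uint16_t>(u >> 16);
}

static float bf16_to_float(uint16_t h)
{
    uint32_t u = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Hypothetical helper: pack row-major B (K x N, K even) so that each pair of
// adjacent bf16 values is (B[k][j], B[k+1][j]) for a single output column j.
// The incorrect layout would instead store (B[k][j], B[k][j+1]) side by side.
static std::vector<uint16_t> pack_b_k_pairs(const std::vector<uint16_t>& B, int K, int N)
{
    assert(K % 2 == 0);
    std::vector<uint16_t> packed(static_cast<size_t>(K) * N);
    size_t out = 0;
    for (int k = 0; k < K; k += 2)
    {
        for (int j = 0; j < N; j++)
        {
            packed[out++] = B[static_cast<size_t>(k) * N + j];
            packed[out++] = B[static_cast<size_t>(k + 1) * N + j];
        }
    }
    return packed;
}

// Scalar model of one dpbf16 step: lane j accumulates a0*b[2j] + a1*b[2j+1].
static void dpbf16_step(std::vector<float>& acc, uint16_t a0, uint16_t a1, const uint16_t* b_lanes)
{
    for (size_t j = 0; j < acc.size(); j++)
    {
        acc[j] += bf16_to_float(a0) * bf16_to_float(b_lanes[2 * j])
                  + bf16_to_float(a1) * bf16_to_float(b_lanes[2 * j + 1]);
    }
}

// One row of A (length K) times B (K x N), consuming the packed layout.
static std::vector<float> row_times_b(const std::vector<uint16_t>& a_row, const std::vector<uint16_t>& B, int K, int N)
{
    const std::vector<uint16_t> packed = pack_b_k_pairs(B, K, N);
    std::vector<float> acc(N, 0.f);
    for (int k = 0; k < K; k += 2)
    {
        // Each k pair occupies 2 * N packed values, so pair k/2 starts at k * N.
        dpbf16_step(acc, a_row[k], a_row[k + 1], &packed[static_cast<size_t>(k) * N]);
    }
    return acc;
}
```

With this layout, lane j only ever sees values from column j, so the accumulated result matches an ordinary dot product over k. If adjacent columns were packed into one lane instead, the lane would mix B[k][j] with B[k][j+1] and produce wrong sums.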

Also make the MultiHeadAttention int8 test inputs follow the same dynamic-quantization-safe randomization used by GEMM int8 tests, avoiding values near .5 rounding boundaries while keeping the tensors random and reproducible.
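
One way to picture "dynamic-quantization-safe" randomization is rejection sampling, sketched below. This is an illustration under stated assumptions, not the helper added to `tests/test_multiheadattention_1.cpp`: it assumes the int8 scale is derived as 127 / absmax, treats a scaled fractional part in (0.45, 0.55) as "near .5", and uses a fixed seed for reproducibility. The names `near_half` and `random_quant_safe` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// True when |v * scale| has a fractional part inside the assumed (0.45, 0.55)
// band, where a tiny float difference could flip the rounded int8 value.
static bool near_half(float v, float scale)
{
    const float s = std::fabs(v * scale);
    const float frac = s - std::floor(s);
    return frac > 0.45f && frac < 0.55f;
}

// Hypothetical helper: n random floats in [-absmax, absmax] whose quantized
// values stay away from rounding boundaries. Same seed gives the same data.
static std::vector<float> random_quant_safe(int n, float absmax, unsigned seed)
{
    assert(n >= 1 && absmax > 0.f);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-absmax, absmax);

    std::vector<float> v(n);
    for (float& x : v)
        x = dist(rng);

    // Pin one element to absmax so the dynamic scale is known in advance.
    v[0] = absmax;
    const float scale = 127.f / absmax;

    // Resample any value that lands too close to a .5 boundary after scaling.
    for (int i = 1; i < n; i++)
    {
        while (near_half(v[i], scale))
            v[i] = dist(rng);
    }
    return v;
}
```

Resampling keeps the data random while removing only the roughly 10% of values that are most sensitive to small numerical differences between the reference and optimized paths.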

tencent-adm · 2026-05-09T03:34:04Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

nihui · 2026-05-09T03:35:59Z

@codex review

codecov-commenter · 2026-05-09T03:37:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.57%. Comparing base (8c18666) to head (347ba43).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6708      +/-   ##
==========================================
- Coverage   93.59%   93.57%   -0.02%     
==========================================
  Files         933      933              
  Lines      299557   299807     +250     
==========================================
+ Hits       280360   280558     +198     
- Misses      19197    19249      +52

chatgpt-codex-connector · 2026-05-09T03:38:58Z

Codex Review: Didn't find any major issues. Chef's kiss.

Copilot

Pull request overview

This PR addresses an intermittent correctness issue in x86 32-bit (i386) bf16 GEMM on AVX512BF16-capable hosts by fixing the B-tile packing order to match how dpbf16 consumes k/k+1 pairs. It also adjusts MultiHeadAttention int8 test input generation to avoid values near rounding boundaries (consistent with existing GEMM int8 test randomization), improving test stability while keeping randomness reproducible.

Changes:

Fix i386 AVX512BF16 B-tile packing to interleave k/k+1 pairs (instead of adjacent columns) for dpbf16 consumption.
Update MultiHeadAttention int8 tests to use dynamic-quantization-safe randomized inputs (avoiding ~0.5 rounding boundaries).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
tests/test_multiheadattention_1.cpp	Adds dynamic-quantization-safe randomization helper and uses it for int8 MultiHeadAttention test inputs.
src/layer/x86/gemm_bf16s.h	Corrects i386 AVX512BF16 `pack_B_tile_bf16` interleave order so dpbf16 reads k/k+1 pairs correctly.

github-actions Bot added test x86 labels May 9, 2026

nihui requested a review from Copilot May 9, 2026 03:35

Copilot started reviewing on behalf of nihui May 9, 2026 03:36

Copilot AI reviewed May 9, 2026

nihui merged commit 10cee2a into Tencent:master May 9, 2026
110 of 113 checks passed

nihui merged 1 commit into Tencent:master from nihui:x86-bf16s-fix
