unroll 16 for convolution x86 bf16s avx512 #6680
nihui wants to merge 10 commits into Tencent:master
Codecov Report ❌ Patch coverage is
Additional details and impacted files: Coverage Diff

| | master | #6680 | +/- |
|---|---|---|---|
| Coverage | 93.92% | 93.93% | +0.01% |
| Files | 933 | 933 | |
| Lines | 300879 | 303335 | +2456 |
| Hits | 282599 | 284942 | +2343 |
| Misses | 18280 | 18393 | +113 |
@codex review
Pull request overview
This PR updates the x86 AVX512 convolution GEMM kernels to unroll the N dimension by 16 instead of 12, and adjusts the tiling and packing paths to match. A 512-bit register holds 16 fp32 lanes, so the wider unroll lines up with the AVX512 register width and is intended to improve throughput in the im2col+GEMM and Winograd code paths.
Changes:
- Update AVX512F GEMM microkernels to compute/store 16 output columns per inner loop (new sum registers, updated pointer increments, transpose/store paths).
- Add AVX512F-specific jj+=16 packing/compute loops in several remaining kernels, keeping 12/8/4 fallbacks where applicable.
- Update optimal tile selection to prefer TILE_N = 16 (AVX512F) and TILE_N = 12 (AVX).
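The tiered unroll described in the list above can be sketched in portable scalar C++. This is a minimal illustration, not the code from the PR: `kernel_block` and `gemm_row` are hypothetical names, the input is left unpacked (row-major with stride `N`), and plain `float` arrays stand in for the AVX512 sum registers. The 16-wide tier would be compiled only for AVX512F in the real kernels; here it is always enabled so the example runs anywhere.

```cpp
#include <cassert>
#include <vector>

// Hypothetical microkernel: accumulate W adjacent output columns for one
// output row. a holds K weights; b points at the first of the W columns in a
// row-major K x N input with row stride ldb.
template<int W>
static void kernel_block(const float* a, const float* b, int ldb, float* out, int K)
{
    float sum[W] = {0.f}; // stands in for the vector accumulators
    for (int k = 0; k < K; k++)
    {
        for (int j = 0; j < W; j++)
            sum[j] += a[k] * b[k * ldb + j];
    }
    for (int j = 0; j < W; j++)
        out[j] = sum[j];
}

// Walk the N dimension with the widest block first, then fall back to
// 12, 8, 4 and finally single columns for the tail.
// Returns the sequence of block widths used, purely for illustration.
static std::vector<int> gemm_row(const float* a, const float* b, float* out, int K, int N)
{
    assert(K > 0 && N > 0);
    std::vector<int> widths;
    int jj = 0;
    for (; jj + 15 < N; jj += 16)
    {
        kernel_block<16>(a, b + jj, N, out + jj, K);
        widths.push_back(16);
    }
    for (; jj + 11 < N; jj += 12)
    {
        kernel_block<12>(a, b + jj, N, out + jj, K);
        widths.push_back(12);
    }
    for (; jj + 7 < N; jj += 8)
    {
        kernel_block<8>(a, b + jj, N, out + jj, K);
        widths.push_back(8);
    }
    for (; jj + 3 < N; jj += 4)
    {
        kernel_block<4>(a, b + jj, N, out + jj, K);
        widths.push_back(4);
    }
    for (; jj < N; jj++)
    {
        kernel_block<1>(a, b + jj, N, out + jj, K);
        widths.push_back(1);
    }
    return widths;
}
```

Because every tier starts where the previous one stopped, each column is computed exactly once, and the narrower tiers only ever see the remainder left by the 16-wide loop.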
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/layer/x86/convolution_im2col_gemm.h | Expands AVX512F im2col+GEMM N-tile unroll to 16 and updates tiling heuristics accordingly. |
| src/layer/x86/convolution_3x3_winograd.h | Updates Winograd BT packing and GEMM kernels to use 16-wide AVX512F N-tiles and adjusts related tiling. |
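One way to picture the tiling adjustment is to snap a candidate N tile size to a multiple of the preferred unroll width, so that most of a tile is covered by the widest kernel. The sketch below is an assumption-laden illustration and not the heuristic in these files: `Isa`, `preferred_unroll_n` and `align_tile_n` are hypothetical names, the fallback width of 4 for other targets is an assumption, and only the 16 (AVX512F) and 12 (AVX) preferences come from the PR overview.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical ISA tag used only for this sketch.
enum class Isa { Avx512f, Avx, Other };

// Preferred N unroll width: 16 and 12 follow the PR overview,
// 4 for other targets is an assumption made for this example.
static int preferred_unroll_n(Isa isa)
{
    switch (isa)
    {
    case Isa::Avx512f:
        return 16;
    case Isa::Avx:
        return 12;
    default:
        return 4;
    }
}

// Round a candidate tile size down to a multiple of the unroll width
// (but at least one full unroll), then clamp it to the problem size N.
static int align_tile_n(int candidate, int N, Isa isa)
{
    assert(candidate > 0 && N > 0);
    const int u = preferred_unroll_n(isa);
    const int tile = std::max(u, candidate / u * u);
    return std::min(tile, N);
}
```

For example, under these assumptions a candidate of 40 columns becomes 32 on AVX512F and 36 on AVX, while a problem with only 20 columns is simply handled as one 20-column tile.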
Codex Review: Didn't find any major issues. Hooray!
… into convolution-x86-bf16s-2
@codex review
Codex Review: Didn't find any major issues. Chef's kiss.
No description provided.