conv1d gemm with cm#6587
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6587 +/- ##
==========================================
- Coverage 93.59% 93.52% -0.07%
==========================================
Files 933 933
Lines 299557 299971 +414
==========================================
+ Hits 280360 280554 +194
- Misses 19197 19417 +220 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@nihui run ci please ^_^ |
|
|
|
infact, a lot of advice is unnecessary even error |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92eb255d10
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (bias_term == 1) | ||
| { | ||
| sum0 = buffer_ld4(bias_data, gy2.x); | ||
| sum2 = buffer_ld4(bias_data, gy2.y); |
There was a problem hiding this comment.
Guard second output-pack reads in 1x1 shader
When gy points to the last output-pack row (gy + 1 >= psc(outh)), this path still reads bias_data[gy2.y] and later weight_data via wbase1 for a non-existent second pack. That causes out-of-bounds buffer reads for models where ceil(num_output / 4) is odd (for example num_output=4 or 12), which can produce undefined results or GPU faults.
Useful? React with 👍 / 👎.
| #if !NCNN_shader_local_memory | ||
| if (gx >= outsize || gy >= psc(outh)) | ||
| return; | ||
| #endif |
There was a problem hiding this comment.
Add bounds guard before local-memory GEMM loads
In the NCNN_shader_local_memory build, the early return is disabled, but the shader still immediately reads bias_data[gy] and then input/weight tiles before any bounds check. Because dispatch uses ceil(grouping), extra invocations with gy >= psc(outh) or gx >= outsize are expected whenever output sizes are not multiples of the local size, so these threads perform out-of-bounds reads.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR adds new Vulkan Convolution1D fast paths based on packed GEMM and cooperative-matrix GEMM, and wires in a new perf benchmark to measure Convolution1D performance.
Changes:
- Added a Convolution1D perf benchmark and registered it in the perf test CMake.
- Implemented/added Vulkan compute shaders for Convolution1D packed GEMM and packed 1x1, plus cooperative-matrix (CM) GEMM and CM 1x1 variants.
- Updated
Convolution1D_vulkanpipeline creation, weight packing, and forward dispatch logic to select between standard, GEMM, 1x1, and CM variants based on runtime options/device support.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/perf/perf_convolution1d.cpp | New perf driver that benchmarks a set of Convolution1D shapes/params via perf_layer. |
| tests/perf/CMakeLists.txt | Registers the Convolution1D perf executable behind WITH_LAYER_convolution1d. |
| src/layer/vulkan/convolution1d_vulkan.cpp | Adds GEMM + CM path selection, weight packing for the new layouts, and dispatch logic for new pipelines. |
| src/layer/vulkan/convolution1d_vulkan.h | Adds new pipeline members and cooperative-matrix tuning/state fields used by the Vulkan implementation. |
| src/layer/vulkan/shader/convolution1d_packed_gemm.comp | New packed GEMM kernel for general Conv1D (non-1x1), including optional local-memory path. |
| src/layer/vulkan/shader/convolution1d_packed_1x1s1d1.comp | New packed 1x1s1d1 kernel specialized for Conv1D pointwise cases. |
| src/layer/vulkan/shader/convolution1d_gemm_cm.comp | New cooperative-matrix GEMM kernel for Conv1D. |
| src/layer/vulkan/shader/convolution1d_1x1s1d1_cm.comp | New cooperative-matrix 1x1s1d1 kernel for Conv1D pointwise cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e1fedca83
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| v0 = buffer_ld4(bottom_blob_data_4, gx2.x + y * psc(w)); | ||
| v1 = buffer_ld4(bottom_blob_data_4, gx2.y + y * psc(w)); |
There was a problem hiding this comment.
Guard second x-lane loads in 1x1 packed shader
When outw is odd, the last invocation runs with gx == outw - 1, so gx2.y becomes outw. This branch still unconditionally reads bottom_blob_data_4 at gx2.y (and the pack1 branch does the same pattern), which reads one element past the valid width before the later guarded stores. This can trigger out-of-bounds reads on tail tiles (for example kernel=1,stride=1,dilation=1 where outw == w).
Useful? React with 👍 / 👎.
| v1 = buffer_ld4(bottom_blob_data_4, v_offset.g); | ||
| v2 = buffer_ld4(bottom_blob_data_4, v_offset.b); | ||
| v3 = buffer_ld4(bottom_blob_data_4, v_offset.a); |
There was a problem hiding this comment.
Avoid tail-lane input reads beyond output width in GEMM shader
This kernel computes four x positions per invocation, but only checks gx < outsize before entering the loop. On the tail tile (outw % 4 != 0), v_offset.g/b/a correspond to invalid output positions and are still read unconditionally, so input reads can go out of bounds before store-side guards run. A concrete case is w=5,kernel_w=3,stride=1,dilation=1 (outw=3), where the lane for gx+3 dereferences past the row end.
Useful? React with 👍 / 👎.
|
Thanks for your contribution ! |
No description provided.