Merged
@byeonguk-jeong could you also rebase this one against current develop? We will tackle each PR separately. Thanks.
Fix critical bugs found in the SuperVector SIMD abstraction layer.
SuperVector operator!() — x86 (SSE/AVX2/AVX512), ppc64el:
- Was XOR-ing with self (always returned Zeroes instead of the
bitwise NOT).
- Note that some other operators depend on operator!().
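
For readers who want the failure mode spelled out, here is a minimal scalar sketch of the operator!() bug described above. `Vec16`, `vec_xor`, `ones`, `not_buggy` and `not_fixed` are hypothetical names for illustration only; they are not the SuperVector API, and the real code uses architecture intrinsics rather than loops.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 16-byte SIMD register (not the real SuperVector).
using Vec16 = std::array<std::uint8_t, 16>;

// Lane-wise XOR, the scalar analogue of a vector XOR instruction.
inline Vec16 vec_xor(const Vec16 &a, const Vec16 &b) {
    Vec16 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return r;
}

// Vector with every bit set.
inline Vec16 ones() {
    Vec16 r{};
    r.fill(0xFF);
    return r;
}

// Buggy pattern: x ^ x is zero for every input.
inline Vec16 not_buggy(const Vec16 &x) { return vec_xor(x, x); }

// Intended pattern: x ^ all-ones flips every bit, which is bitwise NOT.
inline Vec16 not_fixed(const Vec16 &x) { return vec_xor(x, ones()); }
```

Any operator built on top of the buggy form inherits the all-zero result, which is why the note about dependent operators matters.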
SuperVector<16> Ones_vshl() — x86:
- Called vshr_128() instead of vshl_128().
Element-wise shift boundary and Unroller range — x86:
- vshl_32/vshr_32 on SuperVector<16>: zero-boundary was N==16,
instead of N>=32; Unroller range was <1,16> not <1,32>.
- vshl_64/vshr_64 on SuperVector<16>: same issue.
- vshl_64/vshr_64 on SuperVector<32>: same issue.
- vshr_64 on SuperVector<64>: same issue.
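
The boundary issue above can be sketched in scalar form, under the assumption that the shift count N is a bit count per lane. `Lanes32`, `shl_32` and `shr_32` are hypothetical illustration names, not the real SuperVector members. The point is that the zero boundary follows the lane width in bits (32 here), not the vector width in bytes (16), so counts 16 through 31 are still valid shifts.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model: a 16-byte vector viewed as four 32-bit lanes.
using Lanes32 = std::array<std::uint32_t, 4>;

// Shift every 32-bit lane left by n bits.
inline Lanes32 shl_32(const Lanes32 &v, unsigned n) {
    Lanes32 r{};            // all lanes start at zero
    if (n >= 32) return r;  // a shift by the lane width or more clears the lane
    for (std::size_t i = 0; i < v.size(); ++i) r[i] = v[i] << n;
    return r;
}

// Shift every 32-bit lane right (logically) by n bits.
inline Lanes32 shr_32(const Lanes32 &v, unsigned n) {
    Lanes32 r{};
    if (n >= 32) return r;  // same boundary as the left shift
    for (std::size_t i = 0; i < v.size(); ++i) r[i] = v[i] >> n;
    return r;
}
```

The explicit `n >= 32` check also keeps the scalar model well defined, since shifting a 32-bit value by 32 or more is undefined behaviour in C++.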
SuperVector<32> vshr_256_imm — x86:
- Was a copy-paste of vshl_256_imm.
SuperVector<64> vshl_256_imm / vshl_512_imm — x86:
- Were unimplemented stubs returning empty SuperVector.
SuperVector<64> vshr_256_imm — x86:
- Operated on v256[0] only with broken SuperVector<32> logic.
SuperVector<64> vsh{l,r}_* — x86, ppc64el, arm:
- Were incorrectly delegating to vshl_128/vshr_128. (x86)
- Did not have boundary checks. PPC wraps the shift count when it
exceeds the bit length. (ppc64el)
- Had signed rshifts. (arm)
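
To illustrate why a signed right shift is a problem here, the sketch below contrasts a logical shift (zero fill) with an arithmetic shift (sign fill) on one 32-bit lane. The function names are hypothetical, and the lane width of 32 bits is an assumption made for the example; the arithmetic shift is built by hand so the result does not depend on how the compiler treats `>>` on negative values.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift: vacated high bits are filled with zeros.
inline std::uint32_t shr_logical(std::uint32_t x, unsigned n) {
    assert(n < 32);  // the sketch only covers in-range counts
    return x >> n;
}

// Arithmetic right shift: vacated high bits copy the sign bit.
inline std::uint32_t shr_arithmetic(std::uint32_t x, unsigned n) {
    assert(n < 32);
    const std::uint32_t logical = x >> n;
    const bool negative = (x & 0x80000000u) != 0;
    // Mask covering the n vacated high bits (empty when n == 0).
    const std::uint32_t fill = negative ? ~(0xFFFFFFFFu >> n) : 0u;
    return logical | fill;
}
```

The two only differ for lanes with the top bit set, which makes this kind of bug easy to miss with small test values.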
comparison operators - arm:
- operator>=: used vcgeq_u8 (unsigned) instead of vcgeq_s8 (signed).
- operator<=: used vcgeq_s8 (>=) instead of vcleq_s8 (<=).
Fixes: 1af82e3
Fixes: f0e6b84
Fixes: 2f55e5b
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
These tests cover the NOT operator and many of the element-wise shifts, especially for AVX2 and AVX512.
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
The match result vector (c_lo & c_hi) was compared using operator> against Zeroes to detect non-zero (matching) bytes. On ARM, operator> delegates to signed comparison (vcgtq_s8), which treats byte values with the high bit set (0x80–0xFF) as negative, making them compare as less than zero and falsely reporting no match.
Fixes: 92e0b9a ("simplify shufti and provide arch-specific block functions")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
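
The shufti failure mode can be reproduced with a small scalar model. This is only a sketch: `Vec16`, `as_signed`, `match_mask_signed_gt` and `match_mask_nonzero` are hypothetical names, and the non-zero test shown is one possible way to avoid the problem, not necessarily what the commit does.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 16-byte match result vector.
using Vec16 = std::array<std::uint8_t, 16>;

// Reinterpret a byte as a signed 8-bit value without relying on cast behaviour.
inline int as_signed(std::uint8_t b) {
    return b < 0x80 ? static_cast<int>(b) : static_cast<int>(b) - 256;
}

// Models a signed "greater than zero" compare per byte:
// bytes 0x80..0xFF are negative and therefore never reported.
inline std::uint16_t match_mask_signed_gt(const Vec16 &v) {
    std::uint16_t m = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (as_signed(v[i]) > 0) m = static_cast<std::uint16_t>(m | (1u << i));
    }
    return m;
}

// Direct non-zero test per byte: reports every matching byte.
inline std::uint16_t match_mask_nonzero(const Vec16 &v) {
    std::uint16_t m = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] != 0) m = static_cast<std::uint16_t>(m | (1u << i));
    }
    return m;
}
```

With bytes 0x01, 0x80 and 0xFF set, the signed compare reports only the first one, while the non-zero test reports all three.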
Force-pushed from 2cde1df to 4b0fd5b.
Author
Done. Thanks.
markos
approved these changes
Apr 3, 2026
markos
left a comment
Some good catches, and some things that were left unimplemented for too long! Thanks!
ARM shufti blockSingleMask:
- Detected matches with operator> against Zeroes, which went through the broken signed comparison and missed matches with high-bit-set bytes.
Also adds comprehensive unit tests covering all fixed bugs.
@AhnLab-OSS @AhnLab-OSSG