Add Optimized and HOL Light verified AVX2 Keccak x4 #3020
manastasova merged 11 commits into aws:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main    #3020      +/-   ##
==========================================
- Coverage   78.22%   78.21%   -0.01%
==========================================
  Files         689      689
  Lines      122073   122089      +16
  Branches    17030    17031       +1
==========================================
+ Hits        95491    95498       +7
- Misses      25677    25685       +8
- Partials      905      906       +1
Force-pushed from 9430212 to ee4fc2f.
@manastasova is this ready for review or waiting on the s2n-bignum merge?
Force-pushed from ee4fc2f to f99d8a1.
    generated_fips_shared_support.c
    ${AWSLC_SOURCE_DIR}/crypto/fipsmodule/cpucap/cpucap.c
    )
    target_compile_definitions(generated_fipsmodule PRIVATE BORINGSSL_IMPLEMENTATION)
Just fixed it. It was a leftover from an earlier attempt to fix the ARM Windows FIPS build, as in https://github.com/aws/aws-lc/pull/3013/changes#diff-c2f6f9fb79082c57d43c2890110e27c6399436eccebdc5b2d07a252d55a8b873R678. Thanks!
Why are we importing mlkem and mldsa implementations?
The s2n-bignum import.sh script imports all s2n-bignum files into aws-lc. I can manually remove the mlkem and mldsa files; however, other imported files and directories would still be unused, including generic/, secp256k1/, sm2/ and others (even the tutorial directory https://github.com/aws/aws-lc/tree/main/third_party/s2n-bignum/s2n-bignum-imported/arm/tutorial).
…512AVX flag condition
Issues:
Import AVX2 Optimized and HOL Light verified 4x Keccak permutation awslabs/s2n-bignum#354
NOTE: Once awslabs/s2n-bignum#354 is merged, the assembly files will be imported directly with the importer script.
Description of changes:
This PR introduces an optimized AVX2 implementation of the Keccak-f[1600] x4 permutation, formally verified using HOL Light. This batched Keccak implementation processes four independent Keccak permutations in parallel using AVX2 SIMD instructions, significantly accelerating the core hash operations underlying ML-KEM (FIPS 203) and ML-DSA (FIPS 204).
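To make the batched semantics concrete, here is a minimal scalar sketch in portable C of what a "x4" permutation has to compute: Keccak-f[1600] applied independently to four 25-lane states. This is not the AVX2 assembly from this PR, and the names `keccak_f1600_ref`, `keccak_f1600_x4_ref` and `keccak_x4_selfcheck` are hypothetical, chosen only for illustration. An AVX2 version would presumably keep the same lane of all four states in one 256-bit register so that each step runs once for four states; that layout is an assumption here, not something taken from the PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round constants, rotation offsets and lane permutation for Keccak-f[1600]
 * (standard values from FIPS 202). */
static const uint64_t RC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
    0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
    0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};
static const int ROTC[24] = {1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
                             27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44};
static const int PILN[24] = {10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
                             15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1};

/* Rotate left; n is always in 1..63 here, so the shifts are well defined. */
static uint64_t rotl64(uint64_t x, int n) { return (x << n) | (x >> (64 - n)); }

/* Hypothetical scalar reference: one Keccak-f[1600] permutation in place. */
void keccak_f1600_ref(uint64_t st[25]) {
  uint64_t bc[5], t;
  for (int r = 0; r < 24; r++) {
    /* Theta: mix column parities into every lane. */
    for (int i = 0; i < 5; i++)
      bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
    for (int i = 0; i < 5; i++) {
      t = bc[(i + 4) % 5] ^ rotl64(bc[(i + 1) % 5], 1);
      for (int j = 0; j < 25; j += 5) st[j + i] ^= t;
    }
    /* Rho and Pi: rotate each lane and move it to its new position. */
    t = st[1];
    for (int i = 0; i < 24; i++) {
      int j = PILN[i];
      uint64_t tmp = st[j];
      st[j] = rotl64(t, ROTC[i]);
      t = tmp;
    }
    /* Chi: non-linear step applied row by row. */
    for (int j = 0; j < 25; j += 5) {
      for (int i = 0; i < 5; i++) bc[i] = st[j + i];
      for (int i = 0; i < 5; i++)
        st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
    }
    /* Iota: inject the round constant. */
    st[0] ^= RC[r];
  }
}

/* Hypothetical x4 reference: permute four independent states.
 * A SIMD implementation must produce exactly these results. */
void keccak_f1600_x4_ref(uint64_t states[4][25]) {
  assert(states != NULL);
  for (int k = 0; k < 4; k++) keccak_f1600_ref(states[k]);
}

/* Returns 1 if the batched call matches four separate single calls. */
int keccak_x4_selfcheck(void) {
  uint64_t batch[4][25], single[4][25];
  memset(batch, 0, sizeof batch);
  for (int k = 0; k < 4; k++) batch[k][0] = (uint64_t)k; /* distinct inputs */
  memcpy(single, batch, sizeof batch);
  keccak_f1600_x4_ref(batch);
  for (int k = 0; k < 4; k++) keccak_f1600_ref(single[k]);
  return memcmp(batch, single, sizeof batch) == 0;
}
```

A scalar reference of this shape is the usual baseline for differential testing: feed the same four states to the optimized routine and to the reference, then compare all 100 lanes.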
The 4-way parallel Keccak permutation is a critical building block for lattice-based cryptographic schemes, as it is heavily used in:
Performance Results
The optimization delivers substantial throughput improvements across all tested EC2 instance types:
Average Speedups by Algorithm Family:
Notable highlights:
Call-outs:
Testing:
./crypto/crypto_test
tool/bssl speed -filter "ML-KEM"
tool/bssl speed -filter "MLDSA"
More Performance Data
EC2 c7i
EC2 c7a
EC2 c6i
EC2 c6a
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.