Performance optimizations and fix Poseidon's compact mode #477

Open

srinathsetty wants to merge 6 commits into main from
Conversation
- Classify R1CS matrix coefficients at setup into zero/one/minus-one/other buckets for branch-free SpMV hot loops
- Add multiply_vec_pair_into for buffer-reuse dual-vector SpMV
- Fused dual-vector sparse matrix multiply and multi_evaluate
- Precomputed SpMV in multiply_vec and commit_T with clone elimination
- Sequential path for small sparse matrix multiply (<4097 rows)
- Sequential multiply_vec, commit_T, commit_T_relaxed for small circuits
- Add sequential MSM path for small inputs (<4096 points)
- Add parallelism threshold to bind_poly_var_top for small arrays
- Raise sequential thresholds for MSM, SpMV, and fold to avoid nested rayon contention when many small operations run concurrently
- Add WitnessCS::clear() and with_capacity() for reusing constraint system buffers across IVC steps
- Remove unused msm_large_xyzz function
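The "sequential path below a size threshold" idea in these commits follows a common pattern: below the cutoff, a plain loop beats paying thread-pool dispatch and work-stealing overhead. A minimal sketch of that dispatch, with a hypothetical `sum_squares` workload and `std::thread` standing in for rayon (the `4096` cutoff mirrors the MSM threshold mentioned above, but the real crate tunes each threshold separately):

```rust
use std::thread;

// Hypothetical cutoff, echoing the <4096-point sequential MSM path.
const PAR_THRESHOLD: usize = 4096;

/// Toy workload: sum of squares. Sequential below the threshold,
/// two-way parallel split above it.
fn sum_squares(xs: &[u64]) -> u64 {
    if xs.len() < PAR_THRESHOLD {
        // Small inputs: a plain loop avoids all threading overhead.
        xs.iter().map(|&x| x * x).sum()
    } else {
        let (lo, hi) = xs.split_at(xs.len() / 2);
        thread::scope(|s| {
            let h = s.spawn(|| hi.iter().map(|&x| x * x).sum::<u64>());
            let lo_sum: u64 = lo.iter().map(|&x| x * x).sum();
            lo_sum + h.join().unwrap()
        })
    }
}

fn main() {
    let small: Vec<u64> = (0..100).collect();
    let large: Vec<u64> = (0..10_000).collect();
    println!("{} {}", sum_squares(&small), sum_squares(&large));
}
```

The point of raising the thresholds is the "nested rayon contention" noted in the commit message: when many small operations run concurrently, each spawning parallel tasks multiplies scheduler pressure without adding useful parallelism.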
- Replace decimal string conversion in nat_to_f with direct byte-based conversion using to_repr/from_repr
- Optimize fits_in_bits and decompose with a single to_repr call
- Add nat_to_limbs fast path for direct byte access
- Remove unused BitAccess trait
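The byte-based conversion replaces a round trip through a decimal string (format the BigInt, reparse it into the field) with a direct little-endian byte copy. A toy sketch of the direction of the change, with `u128` standing in for the scalar field and a hypothetical `nat_from_bytes_le` in place of the real `to_repr`/`from_repr` path (assumes at most 16 bytes of input):

```rust
// Toy stand-in for the nat_to_f fast path: assemble an integer from
// little-endian bytes directly. The real code copies bytes into a
// Scalar repr and calls from_repr; u128 stands in for the field here.
fn nat_from_bytes_le(bytes: &[u8]) -> u128 {
    assert!(bytes.len() <= 16, "toy sketch only handles <= 16 bytes");
    bytes
        .iter()
        .rev()
        .fold(0u128, |acc, &b| (acc << 8) | u128::from(b))
}

fn main() {
    // 0x010203 in little-endian byte order is [0x03, 0x02, 0x01].
    println!("{:#x}", nat_from_bytes_le(&[0x03, 0x02, 0x01]));
}
```

The win is avoiding per-call allocation and base-10 formatting/parsing, which are far more expensive than a fixed-size byte copy.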
In compact mode, ensure_allocated was a no-op for Elt::Allocated but allocated a new variable for Elt::Num. Whether an element is Num vs Allocated depends on the MDS matrix multiplication results, which vary by input data. This caused different variable counts for different inputs, breaking IVC schemes that require a fixed R1CS shape. Fix: always allocate a fresh variable and enforce equality, regardless of the Elt variant. This guarantees consistent R1CS structure.
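A minimal model of why the fix stabilizes the shape (toy `Elt` and counter types here; the real code works with bellpepper's `Elt` variants and a full constraint system, and the fixed version also enforces equality between the fresh variable and the original element):

```rust
// Toy model: an element is either a symbolic combination (Num) or
// already backed by a concrete R1CS variable (Allocated).
enum Elt {
    Num(u64),
    Allocated(usize),
}

// Toy constraint system that only counts allocated variables.
struct Cs {
    num_vars: usize,
}

impl Cs {
    fn alloc(&mut self) -> usize {
        let v = self.num_vars;
        self.num_vars += 1;
        v
    }
}

// Before the fix: allocation depended on the variant, so the final
// variable count varied with the input data.
fn ensure_allocated_buggy(cs: &mut Cs, e: &Elt) -> usize {
    match e {
        Elt::Allocated(v) => *v,   // no-op: no new variable
        Elt::Num(_) => cs.alloc(), // allocates one variable
    }
}

// After the fix: always allocate a fresh variable (and, in the real
// circuit, enforce equality with the element), so the R1CS shape is
// identical for every input.
fn ensure_allocated_fixed(cs: &mut Cs, _e: &Elt) -> usize {
    cs.alloc()
}

fn main() {
    // Buggy version: variable count depends on which variant showed up.
    let (mut b1, mut b2) = (Cs { num_vars: 0 }, Cs { num_vars: 0 });
    ensure_allocated_buggy(&mut b1, &Elt::Num(7));
    ensure_allocated_buggy(&mut b2, &Elt::Allocated(0));
    assert_ne!(b1.num_vars, b2.num_vars); // shape diverges

    // Fixed version: same count regardless of the variant.
    let (mut f1, mut f2) = (Cs { num_vars: 0 }, Cs { num_vars: 0 });
    ensure_allocated_fixed(&mut f1, &Elt::Num(7));
    ensure_allocated_fixed(&mut f2, &Elt::Allocated(0));
    assert_eq!(f1.num_vars, f2.num_vars); // shape is stable
}
```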
Contributor
Pull request overview
This PR introduces several performance-oriented optimizations across Spartan/R1CS evaluation, MSM, and nonnative gadget utilities, largely by reducing Rayon overhead for small workloads and by adding a precomputed sparse-matrix SpMV accelerator intended to reduce field multiplication cost during R1CS matrix-vector products.
Changes:
- Add `PrecomputedSparseMatrix` and integrate lazy/eager precomputation into `R1CSShape`, plus new dual-vector SpMV APIs.
- Optimize multilinear polynomial multi-evaluation by fusing reductions, and add sequential fallbacks for small sizes to avoid Rayon overhead.
- Reduce repeated `to_repr()` calls in nonnative bit decomposition and add witness CS capacity/clearing helpers; adjust Poseidon `Elt::ensure_allocated` to stabilize R1CS variable counts.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| src/spartan/polys/multilinear.rs | Adds small-size sequential path and fuses multi-evaluation reductions for cache efficiency. |
| src/r1cs/sparse.rs | Introduces PrecomputedSparseMatrix SpMV accelerator and adds dual-vector SpMV helpers + sequential thresholds. |
| src/r1cs/mod.rs | Exposes PrecomputedSparseMatrix, adds lazy/eager precompute storage in R1CSShape, and new pairwise multiply APIs. |
| src/provider/msm.rs | Adds sequential MSM path for medium-sized inputs to reduce Rayon contention. |
| src/gadgets/nonnative/util.rs | Optimizes bit extraction by caching to_repr() bytes; changes nat_to_f conversion implementation. |
| src/gadgets/nonnative/mod.rs | Removes the now-unused BitAccess trait. |
| src/gadgets/nonnative/bignat.rs | Adds a byte-based fast path for limb extraction. |
| src/frontend/util_cs/witness_cs.rs | Adds with_capacity and clear to reuse witness allocations. |
| src/frontend/gadgets/poseidon/circuit2.rs | Changes ensure_allocated to always allocate a fresh variable for shape stability. |
| Cargo.toml | Bumps crate version. |
src/r1cs/sparse.rs (Outdated)

Comment on lines +24 to +27

```rust
row_unit_pos: Vec<(u32, u16)>,
row_unit_neg: Vec<(u32, u16)>,
row_small: Vec<(u32, u16)>,
row_general: Vec<(u32, u16)>,
```
Comment on lines +69 to +80

```rust
if val == one {
  unit_pos_cols.push(col as u32);
} else if val == neg_one {
  unit_neg_cols.push(col as u32);
} else if let Some(k) = small_pos.iter().position(|&v| v == val) {
  small_cols.push(col as u32);
  small_coeffs.push((k as i8) + 2);
} else if let Some(k) = small_neg.iter().position(|&v| v == val) {
  small_cols.push(col as u32);
  small_coeffs.push(-((k as i8) + 2));
} else {
  general_cols.push(col as u32);
```
src/r1cs/sparse.rs (Outdated)

Comment on lines +85 to +88

```rust
row_unit_pos.push((up_start, (unit_pos_cols.len() as u32 - up_start) as u16));
row_unit_neg.push((un_start, (unit_neg_cols.len() as u32 - un_start) as u16));
row_small.push((sm_start, (small_cols.len() as u32 - sm_start) as u16));
row_general.push((g_start, (general_cols.len() as u32 - g_start) as u16));
```
Comment on lines +340 to +359

```rust
use rayon::prelude::*;
// Split output into chunks that can be written independently
let results: Vec<(F, F)> = self
  .indptr
  .par_windows(2)
  .map(|ptrs| {
    let mut s1 = F::ZERO;
    let mut s2 = F::ZERO;
    for (val, col_idx) in self.get_row_unchecked(ptrs.try_into().unwrap()) {
      s1 += *val * v1[*col_idx];
      s2 += *val * v2[*col_idx];
    }
    (s1, s2)
  })
  .collect();
for (row_idx, (s1, s2)) in results.into_iter().enumerate() {
  out1[row_idx] = s1;
  out2[row_idx] = s2;
}
```
Comment on lines +469 to +484

```rust
if z1.len() != self.num_io + self.num_vars + 1
  || z2.len() != self.num_io + self.num_vars + 1
{
  return Err(NovaError::InvalidWitnessLength);
}

rayon::join(
  || self.A.multiply_vec_pair_into(z1, z2, az1, az2),
  || {
    rayon::join(
      || self.B.multiply_vec_pair_into(z1, z2, bz1, bz2),
      || self.C.multiply_vec_pair_into(z1, z2, cz1, cz2),
    )
  },
);
```
Comment on lines +105 to +120

```rust
// Pre-compute all bit values from the field element's byte representation
// to avoid calling to_repr() per bit (which does Montgomery reduction each time).
let bit_values: Option<Vec<bool>> = v.map(|val| {
  let repr = val.to_repr();
  let bytes = repr.as_ref();
  (0..n_bits)
    .map(|i| {
      let (byte_pos, bit_pos) = (i / 8, i % 8);
      if byte_pos < bytes.len() {
        (bytes[byte_pos] >> bit_pos) & 1 == 1
      } else {
        false
      }
    })
    .collect()
});
```
Comment on lines +43 to +52

```rust
if nat.bits() as usize > n_limbs * limb_width {
  return Err(SynthesisError::Unsatisfiable(format!(
    "nat {nat} does not fit in {n_limbs} limbs of width {limb_width}"
  )));
}
let bytes_per_limb = limb_width / 8;
if limb_width % 8 == 0 && bytes_per_limb <= 8 {
  // Fast path: extract limbs directly from bytes
  let (_, bytes) = nat.to_bytes_le();
  Ok(
```
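The excerpt above shows the guard and setup for the limb fast path; the extraction itself amounts to slicing a little-endian byte string into fixed-width limbs. A toy sketch of that step, with a hypothetical `limbs_from_bytes_le` and `u64` limbs standing in for the crate's BigInt/Scalar types:

```rust
// Toy sketch of the byte-based limb fast path: split little-endian
// bytes into n_limbs limbs of limb_width bits each, with no per-bit
// arithmetic. Mirrors the real guard: limb_width must be a multiple
// of 8 and limbs must fit in a u64.
fn limbs_from_bytes_le(bytes: &[u8], n_limbs: usize, limb_width: usize) -> Vec<u64> {
    assert!(limb_width % 8 == 0 && limb_width <= 64);
    let bytes_per_limb = limb_width / 8;
    (0..n_limbs)
        .map(|i| {
            let start = i * bytes_per_limb;
            let mut limb = 0u64;
            for j in 0..bytes_per_limb {
                // Bytes past the end of the input are implicitly zero.
                if let Some(&b) = bytes.get(start + j) {
                    limb |= u64::from(b) << (8 * j);
                }
            }
            limb
        })
        .collect()
}

fn main() {
    // 0x04030201 split into two 16-bit limbs: low limb first.
    let limbs = limbs_from_bytes_le(&[0x01, 0x02, 0x03, 0x04], 2, 16);
    println!("{limbs:#06x?}");
}
```

Note the follow-up commit below also rejects negative BigInt inputs before taking this path, since a sign-magnitude byte dump would otherwise be misread as a positive value.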
src/r1cs/sparse.rs (Outdated)

Comment on lines +322 to +326

```rust
debug_assert_eq!(self.cols, v1.len());
debug_assert_eq!(self.cols, v2.len());
let n = self.indptr.len() - 1;
debug_assert!(out1.len() >= n);
debug_assert!(out2.len() >= n);
```
Comment on lines +40 to +105

```rust
impl<F: PrimeField> PrecomputedSparseMatrix<F> {
  /// Build from a CSR SparseMatrix by classifying entries.
  pub fn from_sparse(m: &SparseMatrix<F>) -> Self {
    let num_rows = m.indptr.len() - 1;
    let one = F::ONE;
    let neg_one = -F::ONE;

    // Precompute small positive/negative field elements for comparison
    let small_pos: Vec<F> = (2u64..=7).map(F::from).collect();
    let small_neg: Vec<F> = (2u64..=7).map(|k| -F::from(k)).collect();

    let mut row_unit_pos = Vec::with_capacity(num_rows);
    let mut row_unit_neg = Vec::with_capacity(num_rows);
    let mut row_small = Vec::with_capacity(num_rows);
    let mut row_general = Vec::with_capacity(num_rows);
    let mut unit_pos_cols = Vec::new();
    let mut unit_neg_cols = Vec::new();
    let mut small_cols = Vec::new();
    let mut small_coeffs: Vec<i8> = Vec::new();
    let mut general_cols = Vec::new();
    let mut general_vals = Vec::new();

    for ptrs in m.indptr.windows(2) {
      let up_start = unit_pos_cols.len() as u32;
      let un_start = unit_neg_cols.len() as u32;
      let sm_start = small_cols.len() as u32;
      let g_start = general_cols.len() as u32;

      for (&val, &col) in m.data[ptrs[0]..ptrs[1]].iter().zip(&m.indices[ptrs[0]..ptrs[1]]) {
        if val == one {
          unit_pos_cols.push(col as u32);
        } else if val == neg_one {
          unit_neg_cols.push(col as u32);
        } else if let Some(k) = small_pos.iter().position(|&v| v == val) {
          small_cols.push(col as u32);
          small_coeffs.push((k as i8) + 2);
        } else if let Some(k) = small_neg.iter().position(|&v| v == val) {
          small_cols.push(col as u32);
          small_coeffs.push(-((k as i8) + 2));
        } else {
          general_cols.push(col as u32);
          general_vals.push(val);
        }
      }

      row_unit_pos.push((up_start, (unit_pos_cols.len() as u32 - up_start) as u16));
      row_unit_neg.push((un_start, (unit_neg_cols.len() as u32 - un_start) as u16));
      row_small.push((sm_start, (small_cols.len() as u32 - sm_start) as u16));
      row_general.push((g_start, (general_cols.len() as u32 - g_start) as u16));
    }

    Self {
      num_rows,
      row_unit_pos,
      row_unit_neg,
      row_small,
      row_general,
      unit_pos_cols,
      unit_neg_cols,
      small_cols,
      small_coeffs,
      general_cols,
      general_vals,
    }
  }
```
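The payoff of this classification is on the multiply side, which the diff excerpt does not show: once coefficients are bucketed, a row can be evaluated with four tight loops in which only the general bucket pays a full field multiplication. A toy sketch of that evaluation over `i64` (a hypothetical `RowBuckets`/`row_dot`, not the crate's actual SpMV code; real field elements make the `+=`/`-=` savings much larger):

```rust
// Toy sketch of bucketized SpMV for one row, over i64 instead of a
// prime field. Coefficient 1 becomes a pure add, -1 a pure subtract,
// small coefficients (±2..=±7) a cheap scaled add, and only the
// general bucket does an arbitrary multiply.
struct RowBuckets {
    unit_pos_cols: Vec<usize>,  // columns with coefficient 1
    unit_neg_cols: Vec<usize>,  // columns with coefficient -1
    small: Vec<(usize, i8)>,    // (column, coefficient in ±2..=±7)
    general: Vec<(usize, i64)>, // (column, arbitrary coefficient)
}

fn row_dot(row: &RowBuckets, v: &[i64]) -> i64 {
    let mut s = 0i64;
    for &c in &row.unit_pos_cols {
        s += v[c]; // no multiplication at all
    }
    for &c in &row.unit_neg_cols {
        s -= v[c]; // no multiplication at all
    }
    for &(c, k) in &row.small {
        s += i64::from(k) * v[c]; // small-scalar multiple
    }
    for &(c, val) in &row.general {
        s += val * v[c]; // full multiply only here
    }
    s
}

fn main() {
    // Row with coefficients [1, -1, 3, 42] against v = [10, 20, 30, 40].
    let row = RowBuckets {
        unit_pos_cols: vec![0],
        unit_neg_cols: vec![1],
        small: vec![(2, 3)],
        general: vec![(3, 42)],
    };
    println!("{}", row_dot(&row, &[10, 20, 30, 40]));
}
```

Because R1CS matrices from circuit compilers are dominated by ±1 coefficients, most of the inner loop reduces to additions, which is where the "branch-free SpMV hot loops" speedup comes from.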
Comment on lines +586 to +597

```rust
// Build Z = Z1 + Z2 in one pass without allocating Z1/Z2 separately
let n_w = W1.W.len();
let n_x = U1.X.len();
let z_len = n_w + 1 + n_x;
let mut Z = Vec::with_capacity(z_len);
for i in 0..n_w {
  Z.push(W1.W[i] + W2.W[i]);
}
Z.push(U1.u + E::Scalar::ONE);
for i in 0..n_x {
  Z.push(U1.X[i] + U2.X[i]);
}
```
- Restore par_iter with size threshold (>65536) in RelaxedR1CSWitness::fold and fold_relaxed to avoid regression on large witnesses (918K elements)
- Widen PrecomputedSparseMatrix row span counts from u16 to u32
- Upgrade debug_assert to assert in multiply_vec_pair_into
- Eliminate intermediate Vec allocation in parallel multiply_vec_pair_into by writing directly into output slices via par_iter_mut
- Add PrecomputedSparseMatrix::multiply_vec_pair_into for _into API
- Route multiply_vec and multiply_vec_pair_into through precomputed tables lazily (get_or_init) instead of requiring explicit precompute_sparse_matrices
- Add length validation in commit_T for mismatched W1/W2 and U1/U2
- Reject negative BigInt in nat_to_limbs fast path
- Add n_bits bounds check in fits_in_bits against Scalar::NUM_BITS
- Use flat buffer in multi_evaluate_with to avoid per-row Vec allocations
- Document ensure_allocated requires enforce=true for soundness