large int factorization #173

Draft
s-celles wants to merge 9 commits into JuliaMath:main from s-celles:002-large-int-factorization

s-celles commented Mar 11, 2026

Adds efficient large integer factorization via a polyalgorithm combining:

  • Perfect power detection — checks if n = k^d before expensive methods
  • ECM (Elliptic Curve Method) — Montgomery curves with Suyama parametrization and batched GCD; effective when one factor is much smaller
  • MPQS (Multiple Polynomial Quadratic Sieve) — Self-Initializing QS (SIQS) with Gray code polynomial switching;
    handles balanced semiprimes
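As a rough illustration of the first stage, perfect power detection can be sketched as below. This is not the PR's implementation; `iroot_floor` and `perfect_power` are hypothetical names, and the integer root is found by plain binary search rather than anything GMP-specific.

```julia
# Hypothetical helper: floor of the d-th root of n, by binary search.
function iroot_floor(n::BigInt, d::Int)
    lo = big(1)
    # hi^d = 2^(d * ceil(bits / d)) >= 2^bits > n, so the root is <= hi.
    hi = big(1) << cld(ndigits(n, base=2), d)
    while lo < hi
        mid = (lo + hi + 1) >> 1
        if mid^d <= n
            lo = mid
        else
            hi = mid - 1
        end
    end
    return lo
end

# Hypothetical helper: return (k, d) with k^d == n and d >= 2 as large as
# possible, or `nothing` if n is not a perfect power.
function perfect_power(n::Integer)
    n < 4 && return nothing
    N = big(n)
    # Any exponent d must satisfy 2^d <= n, so d <= floor(log2(n)).
    for d in (ndigits(N, base=2) - 1):-1:2
        k = iroot_floor(N, d)
        k^d == N && return (k, d)
    end
    return nothing
end
```

Running this check first is cheap compared with ECM or a sieve, and it removes inputs such as `p^2` that the later stages handle poorly.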

Performance optimizations

  • In-place GMP arithmetic to minimize BigInt allocations
  • unsafe_store!/unsafe_load in sieve inner loops to bypass bounds checking
  • Interleaved two-root sieve writes for memory-level parallelism
  • Double Large Prime (DLP) variation with Pollard rho splitting for composite remainders
  • Factored-form a mod p computation avoiding GMP calls in SIQS polynomial setup
  • Guided trial factoring using sieve positions to skip non-dividing primes
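To make the sieve-related items concrete, here is a minimal sketch of an interleaved two-root sieve update using `unsafe_load`/`unsafe_store!`. The function name, argument layout, and 0-based root offsets are assumptions for illustration, not the PR's actual code.

```julia
# Hypothetical sketch: add `logp` at every 0-based offset congruent to
# r1 or r2 modulo p. Assumes 0 <= r1, r2 < p and r1 != r2.
function sieve_two_roots!(sieve::Vector{UInt8}, p::Int, r1::Int, r2::Int, logp::UInt8)
    n = length(sieve)
    # Keep `sieve` rooted while raw pointers into it are in use.
    GC.@preserve sieve begin
        ptr = pointer(sieve)
        i, j = r1, r2
        # Interleaved loop: both progressions advance together, so the two
        # independent read-modify-write chains can overlap in flight.
        while i < n && j < n
            unsafe_store!(ptr + i, unsafe_load(ptr + i) + logp)
            unsafe_store!(ptr + j, unsafe_load(ptr + j) + logp)
            i += p
            j += p
        end
        # Since |i - j| < p, at most one offset is still inside the interval.
        i < n && unsafe_store!(ptr + i, unsafe_load(ptr + i) + logp)
        j < n && unsafe_store!(ptr + j, unsafe_load(ptr + j) + logp)
    end
    return sieve
end
```

The raw pointer accesses skip bounds checks entirely, so the loop conditions are the only thing keeping the writes inside the array; a bounds-checked version with `@inbounds` would be a safer starting point when comparing performance.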

Benchmarks (Apple Silicon, Julia 1.12)

  • 30-digit semiprime: ~0.01s
  • 50-digit semiprime: ~0.5s
  • 60-digit balanced semiprime: ~10s

Closes #159

Test plan

  • Existing test suite passes
  • New tests for perfect power check, ECM, MPQS, polyalgorithm dispatch
  • 60-digit balanced semiprime factorization test (the target of #159, Factorization of "large" numbers)
  • Review parameter table tuning for digit ranges 30–76

Tools being used: GitHub Spec Kit + Claude Opus 4.6
Methodology: Spec Driven Development with AI assistance

…ation

Add efficient large integer factorization using a polyalgorithm that
combines perfect power detection, ECM (Elliptic Curve Method), and
MPQS (Multiple Polynomial Quadratic Sieve with Self-Initialization).

Key features:
- ECM with Montgomery curves, Suyama parametrization, and batched GCD
- SIQS with Gray code polynomial switching and incremental root updates
- Double Large Prime variation with Pollard rho splitting
- In-place GMP arithmetic to minimize BigInt allocations
- Allocation-free sieve using unsafe_store!/unsafe_load
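
The Gray code switching mentioned above can be illustrated with a small sketch. It assumes the usual SIQS setup where each polynomial's `b` coefficient is a signed sum of per-prime terms `B[j]`; the names `gray` and `gray_b_values` are hypothetical and the values of `B` are placeholders, not anything computed by the PR.

```julia
# Gray code of i: consecutive codes differ in exactly one bit.
gray(i::Integer) = i ⊻ (i >> 1)

# Hypothetical sketch: enumerate all sign combinations b = sum(±B[j]),
# where bit j-1 of the Gray code being set means "B[j] has a minus sign".
function gray_b_values(B::Vector{<:Integer})
    s = length(B)
    b = sum(B)                      # code 0: all signs positive
    bs = [b]
    for i in 1:(2^s - 1)
        j = trailing_zeros(i) + 1   # the single bit that flips at step i
        if (gray(i) >> (j - 1)) & 1 == 1
            b -= 2 * B[j]           # sign of B[j] went from + to -
        else
            b += 2 * B[j]           # sign of B[j] went from - to +
        end
        push!(bs, b)
    end
    return bs
end
```

Each step changes `b` by a single `±2 * B[j]`, which is what makes incremental updates of the sieve roots possible instead of recomputing them for every polynomial.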

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 86.63172% with 102 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.54%. Comparing base (20a92a0) to head (f86b0b6).
⚠️ Report is 1 commit behind head on main.

Files with missing lines    Patch %    Lines
src/mpqs.jl                 84.57%     97 Missing ⚠️
src/ecm.jl                  96.39%     4 Missing ⚠️
src/Primes.jl               95.65%     1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #173      +/-   ##
==========================================
- Coverage   93.08%   89.54%   -3.55%     
==========================================
  Files           2        4       +2     
  Lines         463     1224     +761     
==========================================
+ Hits          431     1096     +665     
- Misses         32      128      +96     

@oscardssmith
Member

Can you separate the ECM from the quadratic sieve? The ECM code is a lot smaller and looks like it's in better shape, so I would like to review/merge that first.

s-celles marked this pull request as draft March 11, 2026 13:52
Comment on lines +8 to +9
"""
struct MontgomeryCurvePoint
Member

The fact that this is GMP only is somewhat unfortunate. Ideally this code would work for BitIntegers.jl also... I'm willing to accept it though since BigInt is probably what most users are using in practice.

should_widen = T <: BigInt || widemul(n - 1, n - 1) ≤ typemax(n)
p = should_widen ? pollardfactor(n) : pollardfactor(widen(n))
# For large cofactors, use polyalgorithm dispatch (ECM → MPQS)
if n > big"100000000000000000000" # > 10^20
Member

Is ECM slower than Pollard rho for smaller numbers? That seems unexpected. Also, can you delete the polyalgorithm.jl file and move that code here?

Member

Oh this is likely related to the ECM impl being BigInt only.

@s-celles
Author

I don't think I can do better for code coverage.

Here is what my AI agent wrote:

⏺ The remaining uncovered lines in mpqs.jl are mostly:
  - Edge cases in trial factoring (L351-370) — the s1 == 0 brute force path
  - DLP chaining (L881-912) — deep nesting in double large prime combination
  - Error paths (L1131, L1146) — failure modes
  - Rare branches (L694, L725, L1042, L1104) — sentinel values, CRT sign flip
  - Extract factor fallbacks (L585-596) — x + y path in factor extraction

  Many of these are inherently hard to cover deterministically (they depend on random polynomial selection hitting specific
  number-theoretic edge cases). The important uncovered code is the _extract_factor fallback path and the _gf2_eliminate → x+y path.

  The remaining ~47 uncovered lines in mpqs.jl are mostly stochastic paths (DLP chaining, rare CRT sign flips) and error
  paths that require specific number-theoretic conditions difficult to trigger deterministically.

Any opinion?

@oscardssmith
Member

IMO the coverage is less important than code complexity. As such, I would prefer if this was split into separate PRs for ECM vs MPQS.

I would also like for these methods (at least ECM) to not force BigInt since for smaller numbers, Int128 or Int256 (from BitIntegers) can be a lot faster.
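
One way to read this request is that the modular arithmetic should be written generically over the integer type. A minimal sketch, assuming a hypothetical helper named `mulmod` (not part of the PR or of Primes.jl), could look like this:

```julia
# Hypothetical helper: (a * b) mod n without overflow for an integer type T.
# widemul promotes to the next wider type (Int64 -> Int128, Int128 -> BigInt);
# for BigInt it is an ordinary multiplication.
mulmod(a::T, b::T, n::T) where {T<:Integer} = T(mod(widemul(a, b), n))

# The same code path serves fixed-width and arbitrary-precision inputs.
m64 = typemax(Int64)
r64 = mulmod(m64 - 1, m64 - 1, m64)        # (-1)^2 mod m64

m128 = typemax(Int128)
r128 = mulmod(m128 - 1, m128 - 1, m128)    # (-1)^2 mod m128

nbig = big(10)^30 + 1
rbig = mulmod(big(10)^30, big(10)^30, nbig)
```

An ECM built on helpers like this would keep `Int128` inputs out of GMP until a product actually needs the wider type. Whether that also covers BitIntegers types depends on those types supporting `widemul`, which is assumed here rather than checked.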

@s-celles
Author

Ok I will try to tackle that tomorrow.

Development

Successfully merging this pull request may close these issues.

Factorization of "large" numbers

2 participants