perf: v0.12.21 — tagged start states, zero-alloc API #155

Merged
kolkov merged 1 commit into main from release/v0.12.21
Mar 27, 2026

kolkov commented Mar 27, 2026

Summary

Tagged start states (Rust LazyStateID), zero-alloc public API, DFA correctness fixes.
17 files, +749 -361 lines.

DFA Engine

  • Tagged start states — prefilter skip-ahead only at start state, eliminates O(n²)
  • DFA multiline $ fix — EndLine look-ahead re-computation
  • Dead-state prefilter restart in searchEarliestMatch (IsMatch path)
  • Tiny NFA → UseDFA routing (was UseNFA, 7x faster)
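
To illustrate the tagged start state idea, here is a minimal sketch, not the code from this PR. It assumes a toy hand-written DFA for the literal `ab`, with hypothetical names (`stateID`, `tagStart`, `tagMatch`, `findEnd`) and `bytes.IndexByte` standing in for the prefilter. The point it shows: tag bits in the state ID let the hot loop detect special states with one mask test, and the prefilter runs only when the automaton is back in the start state rather than at every position.

```go
package main

import (
	"bytes"
	"fmt"
)

// stateID packs a state index together with tag bits in the high bits,
// so the hot loop can detect "special" states with a single mask test.
type stateID uint32

const (
	tagStart stateID = 1 << 31 // hypothetical tag: unanchored start state
	tagMatch stateID = 1 << 30 // hypothetical tag: match state
	tagMask          = tagStart | tagMatch

	sStart         = 0 | tagStart
	sSeenA stateID = 1
	sMatch         = 2 | tagMatch
)

// next is a hand-written transition function for the literal pattern "ab".
func next(s stateID, b byte) stateID {
	switch {
	case b == 'a':
		return sSeenA
	case b == 'b' && s == sSeenA:
		return sMatch
	default:
		return sStart
	}
}

// findEnd returns the end offset of the earliest match of "ab" in h (or -1)
// and the number of times the prefilter was invoked.
func findEnd(h []byte) (end int, prefilterCalls int) {
	s := sStart
	for i := 0; i < len(h); {
		if s&tagMask != 0 { // slow path: only tagged states get here
			if s&tagMatch != 0 {
				return i, prefilterCalls
			}
			// Start state: let the prefilter skip to the next candidate byte.
			prefilterCalls++
			j := bytes.IndexByte(h[i:], 'a')
			if j < 0 {
				return -1, prefilterCalls
			}
			i += j
		}
		s = next(s, h[i])
		i++
	}
	if s&tagMatch != 0 {
		return len(h), prefilterCalls
	}
	return -1, prefilterCalls
}

func main() {
	for _, in := range []string{"xxxab", "axaxab", "xyz"} {
		end, calls := findEnd([]byte(in))
		fmt.Printf("%q: end=%d prefilter calls=%d\n", in, end, calls)
	}
}
```

In this toy, untagged states never touch the prefilter, so a long run inside a partial match costs one table step per byte and nothing more.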

Allocation

  • 1100x fewer mallocs — flat buffer for FindAllIndex (203K → 182 per iter)
  • Local SearchState cache — atomic.Pointer, survives GC
  • Pool round-trip elimination in FindAll/Count
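
A rough sketch of the flat-buffer technique, under assumptions: the matcher below (`digitRuns`) is a hypothetical stand-in for the regex engine, and `sliceFlat` is an illustrative helper, not a function from this PR. It shows how a `[][]int` result in the `FindAllIndex` shape can be carved out of one flat `[]int`, so the number of allocations no longer scales with the number of matches.

```go
package main

import "fmt"

// digitRuns is a stand-in matcher: it appends the [start, end) offsets of
// every run of ASCII digits in s to flat and returns the extended buffer.
func digitRuns(s string, flat []int) []int {
	start := -1
	for i := 0; i <= len(s); i++ {
		isDigit := i < len(s) && s[i] >= '0' && s[i] <= '9'
		switch {
		case isDigit && start < 0:
			start = i
		case !isDigit && start >= 0:
			flat = append(flat, start, i)
			start = -1
		}
	}
	return flat
}

// sliceFlat turns a flat [s0, e0, s1, e1, ...] buffer into [][]int.
// Every inner slice aliases the same backing array, so this step costs one
// allocation for the outer slice instead of one per match.
func sliceFlat(flat []int) [][]int {
	out := make([][]int, len(flat)/2)
	for i := range out {
		// Full slice expression caps each pair at 2, so an append by the
		// caller cannot overwrite the neighboring match.
		out[i] = flat[2*i : 2*i+2 : 2*i+2]
	}
	return out
}

func main() {
	flat := digitRuns("a12b345", nil)
	fmt.Println(sliceFlat(flat)) // [[1 3] [4 7]]
}
```

The trade-off is that all inner slices share one backing array, which stays alive as long as any single match slice is retained.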

New Public API (zero-alloc)

  • AllIndex / AllStringIndex: iter.Seq[[2]int] iterator (Go proposal #61902)
  • All / AllString — match content iterator
  • AppendAllIndex / AppendAllStringIndex — buffer-reuse (strconv.Append* pattern)

Benchmarks (LangArena, i7-1255U, same data as Rust)

| Method | errors (33K matches), time / memory | Alloc | vs Rust |
|---|---|---|---|
| FindAllStringIndex | 8.2ms / 3890 KB | 19 mallocs | 2.6x |
| AllIndex | 5.9ms / 0 KB | 0 mallocs | 1.7x |
| AppendAllIndex | 5.5ms / 0 KB | 0 mallocs | 1.7x |

emails: AppendAllIndex 2.0ms vs Rust 2.6ms — faster than Rust!

Verification

  • go test ./... — all 9 packages pass
  • gofmt -l / golangci-lint — clean
  • DFA correctness: 22 patterns verified vs Rust regex-automata find_fwd
  • No regression on wins vs Rust (ip 18.5x, multiline_php 2.0x, char_class 1.3x)
  • regex-bench CI (pending)

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 54.81728% with 136 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dfa/lazy/lazy.go | 53.39% | 44 Missing and 4 partials ⚠️ |
| regex.go | 35.08% | 37 Missing ⚠️ |
| meta/find_indices.go | 57.33% | 23 Missing and 9 partials ⚠️ |
| meta/strategy.go | 47.36% | 6 Missing and 4 partials ⚠️ |
| meta/findall.go | 72.22% | 2 Missing and 3 partials ⚠️ |
| dfa/lazy/builder.go | 69.23% | 2 Missing and 2 partials ⚠️ |

github-actions bot commented Mar 27, 2026

Benchmark Comparison

Comparing main → PR #155

Summary: geomean 80.59n → 80.47n (-0.15%)

⚠️ Potential regressions detected:

LazyDFAAlternation-4                  46.96n ± ∞ ¹    49.48n ± ∞ ¹    +5.37% (p=0.008 n=5)
geomean                                          ³                    +0.00%               ³
LangArenaLogParser/bots-4             1.735m ± ∞ ¹    2.557m ± ∞ ¹   +47.36% (p=0.008 n=5)
LangArenaLogParser/suspicious-4       952.8µ ± ∞ ¹   1027.5µ ± ∞ ¹    +7.85% (p=0.008 n=5)
LangArenaLogParser/post_requests-4    783.3µ ± ∞ ¹    918.3µ ± ∞ ¹   +17.24% (p=0.008 n=5)
LangArenaLogParser/auth_attempts-4    777.4µ ± ∞ ¹    805.2µ ± ∞ ¹    +3.58% (p=0.008 n=5)
LangArenaLogParser/emails-4           395.2µ ± ∞ ¹    405.8µ ± ∞ ¹    +2.67% (p=0.008 n=5)

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare

DFA Engine:
- Tagged start states (Rust LazyStateID) — prefilter only at start state, no O(n²)
- DFA multiline $ EndLine look-ahead fix (Rust determinize mod.rs:131-212)
- Dead-state prefilter restart in searchEarliestMatch
- isMatchWithPrefilter pfSkip off-by-one fix
- Tiny NFA (< 20 states) → UseDFA routing (was UseNFA/UseBoth)

Allocation:
- Flat buffer FindAllIndex: 1100x fewer mallocs (203K → 182 per iter)
- Local SearchState cache: atomic.Pointer, survives GC
- Redundant pool round-trips eliminated in FindAll/Count
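
As a sketch of why an `atomic.Pointer` slot survives GC where `sync.Pool` entries may not: the slot is an ordinary field, so whatever it points to stays reachable until someone takes it. The types and method names below (`searchState`, `engine`, `acquire`, `release`) are hypothetical, and how the PR combines this with its existing pool is not shown in the text above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// searchState is a hypothetical stand-in for per-search scratch memory.
type searchState struct {
	scratch []int
}

// engine keeps at most one idle searchState in a single atomic slot.
type engine struct {
	cached atomic.Pointer[searchState]
}

// acquire takes the cached state if one is idle, else allocates a new one.
// Swap(nil) guarantees that two goroutines never receive the same state.
func (e *engine) acquire() *searchState {
	if s := e.cached.Swap(nil); s != nil {
		return s
	}
	return &searchState{scratch: make([]int, 0, 64)}
}

// release resets the state and parks it in the slot if the slot is empty.
// If another state is already parked, this one is simply dropped.
func (e *engine) release(s *searchState) {
	s.scratch = s.scratch[:0]
	e.cached.CompareAndSwap(nil, s)
}

func main() {
	e := &engine{}
	a := e.acquire()
	e.release(a)
	b := e.acquire()
	fmt.Println(a == b) // true: the same state was reused
}
```

A single slot only helps the uncontended case; under concurrency the extra goroutines fall through to the allocation path (or, in a fuller design, to a pool).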

New Public API (zero-alloc, Go stdlib conventions):
- AllIndex/AllStringIndex: iter.Seq[[2]int] iterator (Go proposal #61902)
- All/AllString: iter.Seq match content iterator
- AppendAllIndex/AppendAllStringIndex: buffer-reuse (strconv.Append* pattern)

Benchmarks (LangArena, 7.2 MB, 13 patterns):
- Total: 163ms → 107ms (-34%)
- errors: 23ms → 5.5ms with AllIndex (-76%)
- vs Rust gap: 3.9x → 1.7x with AllIndex (-56%)
- emails AppendAllIndex: 2.0ms vs Rust 2.6ms (faster than Rust!)
- Mallocs: 203K → 182 per iter (-99.9%)
- AllIndex/AppendAllIndex: 0 KB / 0 mallocs (same as Rust find_iter)
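
One way to check a "0 mallocs" claim like the ones above on your own machine is `testing.AllocsPerRun` from the standard library. The sketch below uses a hypothetical `appendEvens` function as a stand-in for an `Append*`-style API; it is not the benchmark harness used for the LangArena numbers.

```go
package main

import (
	"fmt"
	"testing"
)

// appendEvens is a hypothetical Append*-style function: it appends results
// to dst and returns the extended slice, as strconv.AppendInt does.
func appendEvens(dst []int, n int) []int {
	for i := 0; i < n; i += 2 {
		dst = append(dst, i)
	}
	return dst
}

// sink keeps the fresh result alive so the call cannot be optimized away.
var sink []int

// allocsReused reports allocations per call when one buffer is recycled.
func allocsReused() float64 {
	buf := make([]int, 0, 1024) // large enough for the 500 results below
	return testing.AllocsPerRun(100, func() {
		buf = appendEvens(buf[:0], 1000)
	})
}

// allocsFresh reports allocations per call when each call starts from nil.
func allocsFresh() float64 {
	return testing.AllocsPerRun(100, func() {
		sink = appendEvens(nil, 1000)
	})
}

func main() {
	fmt.Println("reused buffer:", allocsReused())
	fmt.Println("fresh buffer: ", allocsFresh())
}
```

With a pre-sized, reused buffer the first figure should be 0, while the second reflects the slice growth steps of `append`.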
kolkov force-pushed the release/v0.12.21 branch from 423e71f to e147568 on March 27, 2026 16:44
kolkov merged commit 87d600b into main on Mar 27, 2026
8 of 9 checks passed