32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- ARM NEON SIMD support (Go 1.26 `simd/archsimd` intrinsics — [#120](https://github.com/coregx/coregex/issues/120))
- SIMD prefilter for CompositeSequenceDFA (#83)

## [0.12.20] - 2026-03-25

### Performance
- **Premultiplied State IDs** — StateID stores byte offset into flat transition table,
eliminating multiply from DFA hot loop. Single `flatTrans[sid+classIdx]` lookup.
Inspired by Rust `LazyStateID` (hybrid/id.rs).

- **Tagged State IDs** — match/dead/invalid/start flags encoded in StateID high bits.
Single `IsTagged()` branch replaces 3 separate comparisons in DFA hot loop.
4x loop unrolling breaks to slow path only on tagged states.

- **1-byte match delay** (Rust determinize approach) — match reporting delayed by 1 byte,
enabling correct look-around assertion resolution (^, $, \b) at match boundaries.
Reference: Rust `determinize` mod.rs:254-286.

- **Rust-aligned DFA determinize: break-at-match** — replaced `filterStatesAfterMatch`
with Rust's `determinize::next` break-at-match semantics (mod.rs:284). Epsilon closure
uses add-on-pop DFS with reverse Split push, matching Rust sparse set insertion order.
Incremental per-target epsilon closure preserves correct state ordering for leftmost-first.
**Eliminates Phase 3** anchored re-scan: bidirectional DFA reduced from 3-pass to 2-pass.
Verified against Rust regex-automata `find_fwd` — identical results on all test patterns.

- **Memmem: Memchr(rareByte) + verify** (Rust approach) — replaced `MemchrPair`-based
paired search in `simd.Memmem` with single rare byte Memchr scan + `bytes.Equal`
verify, matching Rust `memchr::memmem` architecture.

### Benchmarks (LangArena LogParser, 7.2 MB, 13 patterns)

| vs stdlib | vs Rust | Wins vs Rust |
|-----------|---------|-------------|
| **30x faster** total | 2-5x gap (local i7) | ip 18.5x, multiline_php 2.0x, char_class 1.3x |

## [0.12.19] - 2026-03-24

### Performance
22 changes: 11 additions & 11 deletions README.md
@@ -64,19 +64,19 @@ Cross-language benchmarks on 6MB input, AMD EPYC ([source](https://github.com/ko

| Pattern | Go stdlib | coregex | Rust regex | vs stdlib | vs Rust |
|---------|-----------|---------|------------|-----------|---------|
-| Literal alternation | 475 ms | 4.4 ms | 0.7 ms | **109x** | 6.3x slower |
-| Multi-literal | 1391 ms | 12.6 ms | 4.7 ms | **110x** | 2.6x slower |
-| Inner `.*keyword.*` | 231 ms | 0.29 ms | 0.29 ms | **797x** | **~parity** |
-| Suffix `.*\.txt` | 234 ms | 1.83 ms | 1.07 ms | **128x** | 1.7x slower |
-| Multiline `(?m)^/.*\.php` | 103 ms | 0.66 ms | 0.66 ms | **156x** | **~parity** |
-| Email validation | 261 ms | 0.54 ms | 0.31 ms | **482x** | 1.7x slower |
-| URL extraction | 262 ms | 0.84 ms | 0.35 ms | **311x** | 2.4x slower |
-| IP address | 498 ms | 2.1 ms | 12.0 ms | **237x** | **5.6x faster** |
-| Char class `[\w]+` | 554 ms | 48.0 ms | 50.1 ms | **11x** | **1.0x faster** |
-| Word repeat `(\w{2,8})+` | 641 ms | 185 ms | 48.7 ms | **3x** | 3.7x slower |
+| Literal alternation | 466 ms | 4.2 ms | 0.65 ms | **110x** | 6.4x slower |
+| Multi-literal | 1391 ms | 12.4 ms | 5.3 ms | **112x** | 2.3x slower |
+| Inner `.*keyword.*` | 227 ms | 0.34 ms | 0.32 ms | **668x** | **~parity** |
+| Suffix `.*\.txt` | 228 ms | 2.9 ms | 1.3 ms | **78x** | 2.3x slower |
+| Multiline `(?m)^/.*\.php` | 101 ms | 0.35 ms | 0.72 ms | **288x** | **2.0x faster** |
+| Email validation | 258 ms | 0.51 ms | 0.27 ms | **506x** | 1.8x slower |
+| URL extraction | 259 ms | 0.71 ms | 0.35 ms | **364x** | 2.0x slower |
+| IP address | 493 ms | 0.73 ms | 13.5 ms | **675x** | **18.5x faster** |
+| Char class `[\w]+` | 483 ms | 40.9 ms | 56.0 ms | **11x** | **1.3x faster** |
+| Word repeat `(\w{2,8})+` | 628 ms | 167 ms | 54.8 ms | **3x** | 3.0x slower |

**Where coregex excels:**
-- Multiline patterns (`(?m)^/.*\.php`) — near Rust parity, 100x+ vs stdlib
+- Multiline patterns (`(?m)^/.*\.php`) — **2x faster than Rust**, 288x vs stdlib
- IP/phone patterns (`\d+\.\d+\.\d+\.\d+`) — SIMD digit prefilter skips non-digit regions
- Suffix patterns (`.*\.log`, `.*\.txt`) — reverse search optimization (1000x+)
- Inner literals (`.*error.*`, `.*@example\.com`) — bidirectional DFA (900x+)
11 changes: 8 additions & 3 deletions ROADMAP.md
@@ -2,7 +2,7 @@

> **Strategic Focus**: Production-grade regex engine with RE2/rust-regex level optimizations

-**Last Updated**: 2026-03-24 | **Current Version**: v0.12.18 | **Target**: v1.0.0 stable
+**Last Updated**: 2026-03-25 | **Current Version**: v0.12.19 | **Target**: v1.0.0 stable

---

@@ -12,7 +12,7 @@ Build a **production-ready, high-performance regex engine** for Go that matches

### Current State vs Target

-| Metric | Current (v0.12.15) | Target (v1.0.0) |
+| Metric | Current (v0.12.19) | Target (v1.0.0) |
|--------|-------------------|-----------------|
| Inner literal speedup | **280-3154x** | ✅ Achieved |
| Case-insensitive speedup | **263x** | ✅ Achieved |
@@ -93,7 +93,12 @@ v0.12.16 ✅ → WrapLineAnchor for (?m)^ patterns
v0.12.17 ✅ → Fix LogParser ARM64 regression, restore DFA/Teddy for (?m)^
-v0.12.18 (Current) ✅ → Flat DFA transition table, integrated prefilter, PikeVM skip-ahead
+v0.12.18 ✅ → Flat DFA transition table, integrated prefilter, PikeVM skip-ahead
+v0.12.19 ✅ → Zero-alloc FindSubmatch, byte-based DFA cache, Rust-aligned visited limits
+v0.12.20 (Current) → Premultiplied/tagged StateIDs, break-at-match DFA determinize,
+Phase 3 elimination (2-pass bidirectional DFA)
v1.0.0-rc → Feature freeze, API locked
14 changes: 8 additions & 6 deletions dfa/lazy/accel_test.go
@@ -98,18 +98,20 @@ func TestDetectAccelerationFromCached(t *testing.T) {

func TestDetectAccelerationFromFlat(t *testing.T) {
// Test acceleration detection via flat transition table
+// Using premultiplied state IDs: sid = stateIndex * stride
stride := 256
-sid := StateID(1)
-flatTrans := make([]StateID, 2*stride) // 2 states
+sid := StateID(1 * stride) // premultiplied: state 1 at offset 256
+state2 := StateID(2 * stride)
+flatTrans := make([]StateID, 3*stride) // 3 states (0, 1, 2)

// State 1: 250 self-loops, 3 exits to state 2, 3 dead
-base := int(sid) * stride
+base := sid.Offset()
for i := 0; i < 250; i++ {
flatTrans[base+i] = sid // Self-loop
}
-flatTrans[base+250] = StateID(2)
-flatTrans[base+251] = StateID(2)
-flatTrans[base+252] = StateID(2)
+flatTrans[base+250] = state2
+flatTrans[base+251] = state2
+flatTrans[base+252] = state2
flatTrans[base+253] = DeadState
flatTrans[base+254] = DeadState
flatTrans[base+255] = DeadState
2 changes: 1 addition & 1 deletion dfa/lazy/anchored_search_prefilter_test.go
@@ -525,7 +525,7 @@ func TestFindWithPrefilterAtWordBoundary(t *testing.T) {
// TestFindWithPrefilterAtCacheClear tests the cache-clear recovery path
// in findWithPrefilterAt using a very small cache.
func TestFindWithPrefilterAtCacheClear(t *testing.T) {
-config := DefaultConfig().WithMaxStates(3).WithMaxCacheClears(10)
+config := DefaultConfig().WithMaxStates(6).WithMaxCacheClears(20)
compiler := nfa.NewDefaultCompiler()
nfaObj, err := compiler.Compile("[a-zA-Z]+[0-9]+")
if err != nil {