diffx is a Go diffing library implementing the Myers O(ND) algorithm with heuristics for better output quality. It's designed for use in go-dwdiff.
Goal: Produce well-grouped diff output that avoids fragmenting changes around common short words.
# Run all tests
go test -v ./...
# Run benchmarks
go test -bench=. -benchmem ./...diffx/
├── diffx.go # Public API: Diff(), DiffElements(), Options
├── element.go # Element interface, StringElement
├── context.go # diffContext (algorithm state), partition struct
├── snake.go # findMiddleSnake() - bidirectional Myers search
├── compare.go # compareSeq() - divide-and-conquer
├── filter.go # filterConfusingElements() - preprocessing
├── shift.go # shiftBoundaries() - postprocessing
├── histogram.go # Histogram-style diff algorithm
├── anchor.go # Anchor elimination post-processing
├── *_test.go # Unit tests per module
└── example_test.go # Runnable examples for godoc
Input → filterConfusingElements() → compareSeq() → mapOps() → shiftBoundaries() → Output
(preprocessing) (core Myers) (restore) (postprocessing)
- Myers 1986 paper: "An O(ND) Difference Algorithm and Its Variations"
- Neil Fraser's "Diff Strategies" writeup
- imara-diff (Apache-2.0): https://github.com/pascalkuthe/imara-diff
- Filters high-frequency elements that cause spurious matches
- Elements classified as: keep, discard, provisional
- After diff on filtered sequences,
mapOps()expands results back to original indices - Critical: Equal operations must be expanded element-by-element to interleave filtered elements
significantMatchLen = 16- threshold for significant diagonal runs- Cost limit:
sqrt(n)*sqrt(m)/4 - "Too expensive" threshold:
sqrt(n) + sqrt(m) - Tracks best diagonal match for fallback when cost limit exceeded
- Scores boundary positions (blank lines, punctuation, edges)
- Shifts change regions to align with logical boundaries
- Merges adjacent operations
- Empty sequences, equal sequences, all different
- Property test: applying diff to A produces B
- Fox example: verifies "fox" preserved as anchor
// The motivating example - should keep "fox" as anchor
old := []string{"The", "quick", "brown", "fox", "jumps"}
new := []string{"A", "slow", "red", "fox", "leaps"}
// Expected: 2 change regions (before/after fox), not fragmented- Cause: ops have original indices but sequences were filtered
- Fix: Pass original sequences to shiftBoundaries, not filtered ones
- Cause: mapOps wasn't expanding Equal operations
- Fix: Iterate element-by-element and fill gaps with delete/insert
- Cause: Partition doesn't make progress
- Fix: Proper bounds checking, greedyFallback for edge cases
Based on comparison testing:
- Small (5 elements): ~1µs
- Medium (100 elements): ~20µs
- Large (1000 elements): ~400µs
- Should be 10x+ faster than go-diff on large inputs
- Should produce fewer, better-grouped change regions
- Word-level diffing convenience function for dwdiff use case
- More sophisticated boundary shifting (semantic awareness)
- Fuzz testing for robustness
- Real-world file comparison tests