perf: add benchmarks and speed up variant generation #11
Open
andrew wants to merge 3 commits into
Per-check and full-harness timings against the top 200 crates.io
packages by downloads, across four input lengths (3-22 chars).
Run with:

    cargo bench --bench checks

util::rebuild_name went through format!() for every generated
variant, and each check allocated a fresh String per candidate.
Add rebuild_name_into which writes into a caller-provided buffer,
and convert the hot checks (Omitted, Typos, SwappedWords) to reuse
one buffer across the loop, only cloning when a match is found.
SwappedWords additionally allocated a delimiter String via
format!() and a joined String per permutation; both are now
written into the shared buffer.
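
The SwappedWords change described above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `join_into`, `permute`, `swapped_word_matches`, the `HashSet` lookup and the example names are all hypothetical stand-ins. It only illustrates the idea of pushing the delimiter and each word into one reused buffer instead of building a delimiter String and a joined String per permutation.

```rust
use std::collections::HashSet;

/// Hypothetical helper: write `words` in the order given by `order`,
/// separated by `delim`, into `buf`, reusing its existing allocation.
fn join_into(buf: &mut String, words: &[&str], order: &[usize], delim: char) {
    buf.clear();
    for (n, &idx) in order.iter().enumerate() {
        if n > 0 {
            buf.push(delim); // no delimiter String is built
        }
        buf.push_str(words[idx]);
    }
}

/// Visit every ordering of `order` by swapping elements in place.
fn permute<F: FnMut(&[usize])>(order: &mut [usize], k: usize, visit: &mut F) {
    if k == order.len() {
        visit(order);
        return;
    }
    for i in k..order.len() {
        order.swap(k, i);
        permute(order, k + 1, visit);
        order.swap(k, i); // restore before trying the next choice
    }
}

/// Hypothetical check: return known names that are a reordering of `name`'s words.
fn swapped_word_matches(name: &str, delim: char, known: &HashSet<String>) -> Vec<String> {
    let words: Vec<&str> = name.split(delim).collect();
    let mut order: Vec<usize> = (0..words.len()).collect();
    let mut buf = String::with_capacity(name.len()); // one buffer for all permutations
    let mut matches = Vec::new();
    permute(&mut order, 0, &mut |perm: &[usize]| {
        join_into(&mut buf, &words, perm, delim);
        // Clone only when a candidate other than the input actually matches.
        if buf != name && known.contains(buf.as_str()) {
            matches.push(buf.clone());
        }
    });
    matches
}

fn main() {
    // Illustrative names only.
    let known: HashSet<String> = ["client-http"].iter().map(|s| s.to_string()).collect();
    let hits = swapped_word_matches("http-client", '-', &known);
    println!("{:?}", hits);
}
```
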
Against the saved baseline on aarch64-apple-darwin:

    omitted             -58% to -66%
    swapped_words       -37% to -57%
    typos               -43% to -64%
    swapped_characters  -16% to -42%  (from rebuild_name change only)
    harness             -58% to -67%

For the longest input (22 chars) the full harness drops from
~104us to ~36us.
Adds a criterion benchmark suite and uses it to find and fix the main bottleneck.
The benchmarks run each check (and the full harness) against the 200 most-downloaded crates.io packages, with input names of 3, 10, 16 and 22 characters. Run with `cargo bench --bench checks`.

The slow part turned out to be `util::rebuild_name`, which builds every candidate name with `format!()`. That allocates a new `String` for every candidate. `Omitted` alone generates ~870 candidates for a 22-character name, so that's ~870 allocations per call.

The fix adds `rebuild_name_into`, which writes into a buffer the caller owns. `Omitted`, `Typos` and `SwappedWords` now reuse one buffer for the whole loop and only clone it when a candidate actually matches.

Results on aarch64 macOS, full harness with all checks:
Roughly 3x faster. No API changes.
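
For readers who have not opened the diff, the buffer-reuse pattern looks roughly like this. It is a simplified sketch under stated assumptions rather than the actual change: the two-argument signatures of `rebuild_name` and `rebuild_name_into`, the `omitted_matches` function, the `HashSet` lookup and the example names are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the old helper: allocates a new String per call.
fn rebuild_name(head: &str, tail: &str) -> String {
    format!("{}{}", head, tail)
}

/// Hypothetical buffer-writing counterpart: overwrites `buf` in place,
/// so the caller's allocation is reused across candidates.
fn rebuild_name_into(buf: &mut String, head: &str, tail: &str) {
    buf.clear();
    buf.push_str(head);
    buf.push_str(tail);
}

/// Omitted-character check (simplified): drop each character in turn
/// and look the resulting candidate up in the set of known names.
fn omitted_matches(name: &str, known: &HashSet<String>) -> Vec<String> {
    let mut buf = String::with_capacity(name.len()); // one buffer for the whole loop
    let mut matches = Vec::new();
    for (i, c) in name.char_indices() {
        rebuild_name_into(&mut buf, &name[..i], &name[i + c.len_utf8()..]);
        // Clone only when the candidate actually matches.
        if known.contains(buf.as_str()) {
            matches.push(buf.clone());
        }
    }
    matches
}

fn main() {
    // Both helpers produce the same text; only the allocation behaviour differs.
    let mut buf = String::new();
    rebuild_name_into(&mut buf, "ser", "de");
    assert_eq!(rebuild_name("ser", "de"), buf);

    let known: HashSet<String> = ["serde", "tokio"].iter().map(|s| s.to_string()).collect();
    let hits = omitted_matches("serdxe", &known);
    println!("{:?}", hits);
}
```

The loop allocates once up front and once per match, instead of once per candidate.
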

This will conflict with #9 in `omitted.rs`, `swapped.rs`, `typos.rs` and with #10 in `util.rs`. Happy to rebase once those land.

Related to the discussion in #2, though it takes a different approach.