This repository was archived by the owner on Feb 15, 2025. It is now read-only.
Replace shuffled vector with an LCG-based shuffled access pattern.#162
Open
This should reduce or remove measurement artifacts caused by the shuffle vector. The shuffle vector has one entry per byte of the buffer being tested for cache, so it at least doubles the working set, which can lower the measured value by a factor of 2. In fact, it can be worse than that: the buffer is of char while the shuffle vector is of int32 or even int64, so the combined working set is 5x or 9x the size of the buffer. That means it could throw the measurement off by nearly an order of magnitude!
mmdriley reviewed Dec 18, 2020
```cpp
// A permutation function for 1 byte.
inline unsigned char PermuteChar(unsigned char x) {
  // LCG
  // x * 113 + 100 is computed as int; converting it back to unsigned char
  // on return reduces it mod 256.
  return x * 113 + 100;
}
```
This isn't a linear congruential generator.
In an LCG the output is iteratively fed back into the input to create a pseudorandom sequence. Here, though, we're just passing the numbers 0..n-1 into this function expecting to get a permutation of 0..n-1 back.
This works because we're generating the finite cyclic group of integers under addition modulo m. That cyclic group is generated by any element coprime to the modulus, so multiplying 0, 1, ..., m-1 by such an element hits every residue exactly once. Here m is 2^8 = 256, so any odd number, including 113, is coprime. ref.
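We added the 100 to avoid mapping 0 to 0. Unlike in an LCG, the additive constant needs no special property to ensure we cover the range: adding a constant modulo m just shifts every residue, so the map stays a permutation.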
Running some concrete tests on my machine, with the new shuffle the latency increase really ramps up at around 1 MB (the size of the L2 cache on my laptop).
But with the old algorithm, the increase starts substantially earlier: the median stabilizes at a new, higher value at around 100 KB, a size corresponding to nothing in particular as far as I'm aware.