This repository was archived by the owner on Feb 15, 2025. It is now read-only.

Replace shuffled vector with an LCG-based shuffled access pattern. #162

Open
ssbr wants to merge 3 commits into main from o1-shuffle

@ssbr ssbr commented Dec 17, 2020

This should reduce or remove measurement artifacts caused by the shuffle vector. Because the shuffle vector has as many elements as the buffer being tested, it competes with that buffer for cache and can lower the measured cache size by a factor of 2. In fact, it can lower it by more than 2, because the buffer holds char while the shuffle vector holds int32 or even int64, meaning it could cut the measured size by a whole order of magnitude!
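To make the arithmetic concrete, here is a rough sketch of the combined footprint. It assumes the shuffle vector holds exactly one index per buffer byte; `ShuffleVectorBytes` and `WorkingSetBytes` are hypothetical names used only for illustration and are not part of this change.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes occupied by a shuffle vector that stores one
// index of type Index per element of the buffer under test.
template <typename Index>
constexpr std::size_t ShuffleVectorBytes(std::size_t buffer_len) {
  return buffer_len * sizeof(Index);
}

// Hypothetical helper: total bytes touched when walking a char buffer
// through such a shuffle vector (buffer plus index vector).
template <typename Index>
constexpr std::size_t WorkingSetBytes(std::size_t buffer_len) {
  return buffer_len * sizeof(char) + ShuffleVectorBytes<Index>(buffer_len);
}

// A 1 MiB char buffer with int32 indices touches 5 MiB in total,
// and 9 MiB with int64 indices.
constexpr std::size_t kOneMiB = std::size_t{1} << 20;
static_assert(WorkingSetBytes<std::int32_t>(kOneMiB) == 5 * kOneMiB,
              "int32 indices: 1 + 4 bytes per element");
static_assert(WorkingSetBytes<std::int64_t>(kOneMiB) == 9 * kOneMiB,
              "int64 indices: 1 + 8 bytes per element");
```

Under that assumption the working set is 5x to 9x the buffer itself, which is where the "order of magnitude" estimate comes from.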

Doing some concrete tests on my machine, with the new shuffle we can see a latency increase that really ramps up around 1MB (size of L2 on my laptop):

new boxplot

But with the old algorithm, it starts substantially earlier: the median stabilizes at a new, higher value at around 100KB, a size that corresponds to nothing in particular as far as I'm aware:

old boxplot

@ssbr ssbr requested a review from sivachandra December 17, 2020 22:27
@google-cla google-cla bot added the cla: yes authors signed CLA label Dec 17, 2020
// A permutation function for 1 byte.
inline unsigned char PermuteChar(unsigned char x) {
  // LCG
  return x * 113 + 100;
@mmdriley mmdriley Dec 18, 2020

This isn't a linear congruential generator.

In an LCG the output is iteratively fed back into the input to create a pseudorandom sequence. Here, though, we're just passing the numbers 0..n-1 into this function expecting to get a permutation of 0..n-1 back.

This works because we're generating the finite cyclic group of integers under addition modulo m. That cyclic group is generated by any element coprime to the modulus. Here m is 2^8 (256), so any odd number is coprime.

We added the 100 to avoid mapping 0 to 0. Unlike an LCG, it's not necessary to ensure we cover the range.
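A quick way to check both points is sketched below. `IsPermutationOfBytes`, `EvenMultiplier`, and `CycleLengthFrom` are hypothetical helpers written for this comment, not code from the PR. The first confirms that the mapping hits every byte value exactly once, the second shows an even multiplier failing that check, and the third iterates the same formula the way an LCG would.

```cpp
#include <array>

// Same mapping as the PermuteChar under review: x -> 113 * x + 100 (mod 256).
inline unsigned char PermuteChar(unsigned char x) {
  return static_cast<unsigned char>(x * 113 + 100);
}

// Hypothetical counter-example: 112 is even, so it shares a factor with 256.
inline unsigned char EvenMultiplier(unsigned char x) {
  return static_cast<unsigned char>(x * 112 + 100);
}

// Hypothetical helper: true if f maps 0..255 onto 0..255 with no repeats.
template <typename F>
bool IsPermutationOfBytes(F f) {
  std::array<bool, 256> seen{};
  for (int x = 0; x < 256; ++x) {
    unsigned char y = f(static_cast<unsigned char>(x));
    if (seen[y]) return false;  // two inputs collided on the same output
    seen[y] = true;
  }
  return true;
}

// Hypothetical helper: feeds the output back into the input, LCG style, and
// counts the steps until the start value recurs. This terminates because
// PermuteChar is a bijection, so every value lies on a cycle.
inline int CycleLengthFrom(unsigned char start) {
  int steps = 0;
  unsigned char x = start;
  do {
    x = PermuteChar(x);
    ++steps;
  } while (x != start);
  return steps;
}
```

`IsPermutationOfBytes(PermuteChar)` holds while `IsPermutationOfBytes(EvenMultiplier)` does not (inputs 0 and 16 both map to 100). Iterated as an LCG, the same constants would not cover the range: starting from 0, every value stays even because the increment 100 is even, so `CycleLengthFrom(0)` is at most 128. That is fine here, since the function is applied once to each index rather than iterated.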
