-
-
Notifications
You must be signed in to change notification settings - Fork 483
Description
I was just fiddling around with some RNG stuff when I noticed something that looks pretty odd:
These are time-per-call for calls to rng.gen_range(0..n) (you can see the value of n at the end of each name written on the vertical axis), sampled using Criterion. You'd think that power-of-2 range sizes would be the fastest to generate, since they can theoretically be done as just a single bitmask, with no rejections. But instead, they are the slowest!
I'm not deeply familiar with RNG algorithms, but I took a quick look through the code, and I wonder if this line from sample_single_inclusive is the culprit. Maybe it was intended to use (range-1).leading_zeros() rather than range.leading_zeros()? I haven't grokked what the line is doing, but that would be something that affects exactly power-of-2 sizes differently than their neighbors.
To test, this, I confirmed that Uniform::new(0, n).sample(&mut rng) does not have this pattern, and in fact, is faster than rng.gen_range(0..n) on my computer in almost every case I tested (but especially much faster on power-of-2 sizes):
Is it possible that the sample_single_inclusive code is out of date compared to optimizations that have been made more recently in sample_inclusive?

