Fix random element in skewed sparse hash table #2085
zuiderkwast merged 1 commit into valkey-io:unstable
Conversation
Signed-off-by: Viktor Söderqvist <[email protected]>
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@             Coverage Diff              @@
##           unstable    #2085      +/-   ##
============================================
+ Coverage     71.08%   71.25%    +0.17%
============================================
  Files           122      122
  Lines         66210    66022      -188
============================================
- Hits          47065    47044       -21
+ Misses        19145    18978      -167
-    size_t cursor = randomSizeT();
     while (samples.seen < count) {
-        cursor = hashtableScan(ht, cursor, sampleEntriesScanFn, &samples);
+        hashtableScan(ht, randomSizeT(), sampleEntriesScanFn, &samples);
Do you think this will have a performance impact on random commands?
No, not really. Memory access is the slow operation here. PRNG is fast compared to that.
IIUC, using a random cursor breaks the SCAN method’s guarantee of uniqueness, risking duplicate key processing. For functions like evict, this could degrade performance or cause logical errors.
This is the code to get some random keys, hashtableSampleEntries, similar to dictGetSomeKeys. It uses the scan implementation at a random cursor because the scan implementation can already handle rehashing and skip indexes that have already been rehashed.
In SCAN we don't use a random cursor.
Eviction does use kvstoreHashtableSampleEntries so you're right, it can now return duplicate entries. It's the same in dictGetSomeKeys. This PR makes hashtable sampling more similar to dictGetSomeKeys.
By default maxmemory_samples is 5, so it basically only samples one or two buckets.
Eviction does use kvstoreHashtableSampleEntries so you're right, it can now return duplicate entries. It's the same in dictGetSomeKeys. This PR makes hashtable sampling more similar to dictGetSomeKeys.
That seems like a more than acceptable trade-off, considering we're basically improving the overall performance here (at the very least for sparse hash tables). A high keyspace churn rate (due to repeated memory evictions) can evidently also quickly cause the keyspace hash table to become skewed, so this is IMHO a net-positive tradeoff we're making.
Running a test with `--maxmemory 100mb` + `--maxmemory-policy allkeys-random` and filling the primary kv hashtable with random string entries:
Before this change, the primary kv hash table chain length can easily extend to over 40. After this change, the chain length stays under 10 for most of the time.
Red line is 8.1.1, blue line is 8.1.1 with the PR patch applied:
I've already tried only adjusting FAIR_RANDOM_SAMPLE_SIZE downwards (20, 10, 5, and 1 times ENTRIES_PER_BUCKET) in my very early investigation when I opened the issue, and it only made matters significantly worse. I was easily able to reproduce the issue when adjusting that multiplier downwards in my testing (the chain length would skyrocket very quickly), but didn't think it was worth mentioning since it wouldn't show the actual issue on an unmodified code base.
I think the only reason we're adjusting the FAIR_RANDOM_SAMPLE_SIZE multiplier down in here is that it's not necessary to do that many samples anymore after we're now randomizing the cursor during sampling rather than iterating on the same cursor and we therefore won't as easily run into the same randomness bias issue.
In fact, I would even make the argument that with this entire patch applied it may be worth considering completely getting rid of FAIR_RANDOM_SAMPLE_SIZE and only using WEAK_RANDOM_SAMPLE_SIZE (and just rename it to RANDOM_SAMPLE_SIZE) since that also improved spop/srandmember performance by a good amount in my benchmarks without sacrificing too much of the randomness quality.
I suspect the primary reason for the improvement might be adjusting FAIR_RANDOM_SAMPLE_SIZE. What would be the outcome if we only modified FAIR_RANDOM_SAMPLE_SIZE while keeping the previous hashtableScan implementation unchanged?
@soloestoy Not at all. Only lowering the sample size makes the randomness worse without solving the issue.
I need to explain why this problem arises from the scan-order sampling:
- The fair random algorithm samples some elements by picking a random cursor and then scanning from that point until it finds a number of elements and then picking a random one of them.
- Now, assume there is an empty sequence (in scan order) of buckets.
- When we pick a random cursor and it points into this empty sequence, we continue scanning until we find some elements and then pick one of these. Therefore, the buckets just after an empty sequence are more likely to be picked than other buckets.
- If we repeat SPOP many times, this area just after the empty section will be emptied too and the empty section grows; the problem gets worse and worse.
x..xx.x.xx.x.........x.xx..x.x..x.xx..x.x
               ^     ^ ^^  ^ ^
               |     | ||  | |
            random    elements
            cursor    sampled
Making sampling not follow any scan order solves the issue. The bad algorithm is my mistake. I wrote it when I implemented the hashtable. The dictGetFairRandomKey does just what this PR does: it picks a random bucket every time. I should have followed the dict's design here but I thought it was smart to sample in scan order. But it wasn't.
We can keep the same sample size, but we have a unit test where we actually do some statistical measurement of the fairness. Without this change, that fairness test would fail with a smaller sample size. This change allows us to lower the sample size.
We could probably lower it more, but I didn't want to risk the fairness of it in cases like ongoing rehashing, very sparse tables, etc. Maybe I'm too conservative, but to lower it more, we should tune this more carefully. The unit test code is useful for tuning it.
It's better to add comments to explain the history.
I will improve the PR top comment.
I just added an update to that issue. It appears that this patch solves the issue we've been running into.
In that comment, @Fusl was referring to another patch which was wrong and didn't fix the problem, this one: #2019 (comment)
I have not yet carefully reviewed the
Yes, it skips indexes in the source table (table 0) if they have already been rehashed.
No, but it does fewer samples when the table is sparse, so it is potentially less fair. This makes it faster, but it loops until it finds an entry. This is similar to how dictGetFairRandomKey has a fallback to dictGetRandomKey when dictGetSomeKeys didn't return any elements: dictGetRandomKey is less fair, but it loops until it finds an entry.
Fusl left a comment
Not an authoritative PR approval, but this LGTM overall; just one note:
-    void *samples[FAIR_RANDOM_SAMPLE_SIZE];
-    unsigned count = hashtableSampleEntries(ht, &samples[0], FAIR_RANDOM_SAMPLE_SIZE);
+    /* Sample less if it's very sparse. */
+    size_t num_samples = hashtableSize(ht) >= hashtableBuckets(ht) ? FAIR_RANDOM_SAMPLE_SIZE : WEAK_RANDOM_SAMPLE_SIZE;
This hashtableSize(ht) >= hashtableBuckets(ht) expression in my testing barely ever evaluates to false and when it does, we're typically already (or about to be) in the process of rehashing the hashtable to a smaller size. It doesn't hurt to reduce the sampling rate during this time, just figured it was worth a note in case this logic was intended to be different.
Yeah, I was conservative when I picked this threshold. I didn't want to sample too few elements to get unfair randomness in normal circumstances. This case will kick in at fill factor 14%. If resize policy is set to AVOID, the table can be very sparse: MIN_FILL_PERCENT_HARD is just 3%.
In cases when half of the elements in a hashtable have been deleted in
scan order, i.e. using scan and deleting the returned elements, the
random selection algorithm becomes very slow. Also, if a random element
is deleted repeatedly, this can make the hash table become skewed and
make random selection very slow. Behavior without this fix:
* The fair random algorithm samples some elements by picking a random
cursor and then scanning from that point until it finds a number of
elements and then picking a random one of them.
* Now, assume there is an empty sequence of buckets (in scan order).
* When we pick a random cursor and it points into this empty sequence,
we continue scanning until we find some elements and then pick one of
these. Therefore, the buckets just after an empty sequence are more
likely to be picked than other buckets.
* If we repeat SPOP many times, this area just after the empty section
  will be emptied too and the empty section grows; the problem gets
  worse and worse.
Bucket distribution (x = non-empty bucket, . = empty bucket)
x..xx.x.xx.x.........x.xx..x.x..x.xx..x.x
               ^     ^ ^^  ^ ^
               |     | ||  | |
            random    elements
            cursor    sampled
This PR changes the sampling to pick a new random index to scan for each
new bucket chain that has been sampled. This is more similar to how the
fair random works in dict.c.
Additionally, this PR includes two small optimizations/tuning:
1. The change allows us to lower the sample sizes for fair random
element selection. There are unit tests to verify the fairness of the
algorithm.
2. The sampling size is decreased even more in very sparse tables to
   prevent it from sampling empty buckets many times, which makes it
   slow.
One existing unit test is changed from a stack-allocated to a
heap-allocated array, because it was observed to use too much stack
memory when running locally with the `--large-memory` option.
Fixes valkey-io#2019.
Signed-off-by: Viktor Söderqvist <[email protected]>
