Use Reservoir Sampling for random sampling of dict, and fix hang during fork#12276
Use Reservoir Sampling for random sampling of dict, and fix hang during fork#12276oranagra merged 10 commits intoredis:unstablefrom
Conversation
oranagra
left a comment
There was a problem hiding this comment.
neat! LGTM.
calling @inbaryuval to comment about the algorithm fairness.
@sundb don't we also wanna fix the rehash condition that we discussed here #12200 (comment)
i'd imagine that needs to be backported to anywhere were the issue was taken to, but considering the fix here is rather safe, maybe we can backport that as well.
|
btw, maybe it would be nice to create a test (one that takes the dict to the extreme using SCAN+DEL) and checks the random fairness. |
Because adding this test requires a fix for #12200 (comment) first, but I think this fix for #issuecomment-1563003032 should require a separate PR for backport, anyway I'll add the test in this PR first. |
|
@oranagra Added the test, please wait for me to run it for several hours. |
oranagra
left a comment
There was a problem hiding this comment.
i've added more details to the top comment, feel free to fix / extend.
Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
|
@sundb this is ready right? i can remove the |
|
the top comment is ok and it's ready. |
|
For the reference, there are more efficient sampling algorithms than reservoir sampling - see here. I'm not sure if it worth the trouble here. |
|
@LiorKogan Thank you for figue out. |
This was introduced by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork. ## Solution 1. Fix the rehashing code, so that the same as it allows rehashing for up-sizing during fork when the ratio is extreme, it will allow it for down-sizing as well. Co-authored-by: Oran Agra <[email protected]> This is a partial cherry pick of: (cherry picked from commit b00a235) (cherry picked from commit d4c37320382edb342292a3e30250d46896a12016)
This was introduced by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork. ## Solution 1. Fix the rehashing code, so that the same as it allows rehashing for up-sizing during fork when the ratio is extreme, it will allow it for down-sizing as well. Co-authored-by: Oran Agra <[email protected]> This is a partial cherry pick of: (cherry picked from commit b00a235) (cherry picked from commit d4c37320382edb342292a3e30250d46896a12016)
…ng fork (#12276) ## Issue: When a dict has a long chain or the length of the chain is longer than the number of samples, we will never be able to sample the elements at the end of the chain using dictGetSomeKeys(). This could mean that SRANDMEMBER can be hang in and endless loop. The most severe case, is the pathological case of when someone uses SCAN+DEL or SSCAN+SREM creating an unevenly distributed dict. This was amplified by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork. ## Solution 1. Before, we will stop sampling when we reach the maximum number of samples, even if there is more data after the current chain. Now when we reach the maximum we use the Reservoir Sampling algorithm to fairly sample the end of the chain that cannot be sampled 2. Fix the rehashing code, so that the same as it allows rehashing for up-sizing during fork when the ratio is extreme, it will allow it for down-sizing as well. Issue was introduced (or became more severe) by #11692 Co-authored-by: Oran Agra <[email protected]> (cherry picked from commit b00a235)
Issue
When a dict has a long chain or the length of the chain is longer than the number of samples, we will never be able to sample the elements at the end of the chain using dictGetSomeKeys().
This means that there's an unfair random sampling affecting: RANDOMKEY, SRANDMEMBER, SPOP, ZRANDMEMBER, HRANDFIELD, and the eviction mechanism.
Severe issue
This could mean that SRANDMEMBER, ZRANDMEMBER and HRANDFIELD (when used with
<count>) can be hang in and endless loop.The most severe case, is the pathological case of when someone uses SCAN+DEL or SSCAN+SREM creating an unevenly distributed dict.
Recent regression
The above was amplified by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork.
So in a pathological case of SSCAN+SREM causing an uneven hash table, followed by down-sizing of the hash table, if the rehashing would be suspended during a fork, and then calling SRANDMEMBER with
<count>can lead to a hang.Solution
Now when we reach the maximum we use the Reservoir Sampling algorithm to fairly sample the end of the chain that cannot be sampled
Fix #12200