Optimize HRANDFIELD and ZRANDMEMBER case 3 when listpack encoded#12205
oranagra merged 4 commits into redis:unstable
Conversation
In redis#11290, we added listpack encoding to set, and added CASE 2.5 to SRANDMEMBER. This algorithm provides better performance, but doesn't return members in random order. The order in which the members are returned is important; HRANDFIELD and ZRANDMEMBER provide this random order in case 3, so SRANDMEMBER needs to do the same. Note that this change will degrade CASE 3 performance by a factor of 3. Unrelated changes: remove the useless setTypeRandomElements and fix a typo.
simple random order test:
unstable:
this branch:
simple benchmark:
unstable:
this branch:
zuiderkwast
left a comment
According to the docs, "The order of elements in the reply is not truly random, so it is up to the client to shuffle them if needed".
The algorithm was optimized for the documented behaviour.
I don't really like the idea "looking like random but not really random". It sounds a bit like security by obscurity.
Why is this behaviour important?
i may be missing something (have a blackout), but what i see is this:
so the way i see it, like the docs say: if the count is positive, we return unique members, but in non-random order.
ohh, thanks for the details. ok, does this mean we can also use #8444 to optimize CASE 3? It can also improve the performance of CASE 3. a simple benchmark (listpack with 500 entries):
unstable case3:
case 3 (use the same CASE 4 way):
Yeah. Seems like a good idea
oranagra
left a comment
ok, so to sum up:
in t_set.c, we have case 2.5 using lpNextRandom (without any need to allocate a buffer to hold the results).
but in t_hash.c and t_zset.c we can't use it because we need pairs, so instead we pre-allocate a buffer and use lpRandomPairsUnique.
p.s. i suppose we can adjust lpNextRandom to handle pairs as well, if we want.
this new code (in case 2.5 in hash and zset) is the same code that used to be in case 4 only; we moved it to case 2.5 (effectively applying it to case 3 as well).
so the only impact of this PR on efficiency is for what used to be case 3.
I've tried using lpNextRandom before, the code is like this (maybe i wrote it wrong):

```c
if (zsetobj->encoding == OBJ_ENCODING_LISTPACK) {
    unsigned char *lp = zsetobj->ptr;
    unsigned char *p = lpFirst(lp);
    unsigned int index = 0;
    unsigned char *str = NULL;
    unsigned int len = 0;
    long long llele = 0;
    while (count) {
        if (withscores && c->resp > 2)
            addReplyArrayLen(c,2);
        p = lpNextRandom(lp, p, &index, count--, 1);
        str = lpGetValue(p, &len, &llele);
        if (str == NULL) {
            addReplyBulkLongLong(c, llele);
        } else {
            addReplyBulkCBuffer(c, str, len);
        }
        p = lpNext(lp, p);
        if (withscores) {
            str = lpGetValue(p, &len, &llele);
            if (str == NULL) {
                addReplyDouble(c, llele);
            } else {
                addReplyDouble(c, zzlStrtod(str, len));
            }
        }
        index++;
    }
    return;
}
```

But its performance doesn't seem to be that good. like this branch: using lpNextRandom with even_flag:
@enjoy-binbin Those are surprising results. I would have guessed the algorithm without allocations would be faster than the one with allocations. Maybe when the benchmark is run with the same size command every time (50 elements) and only one connection, the same allocation (the same memory in the allocator) is used every time. Have you tried the benchmark with --threads 2? Maybe it behaves differently in a real deployment with many mixed commands.
yeah, maybe. or maybe the former has a better cache hit ratio? using --threads 2 (i am using a 1c2g machine): src/redis-benchmark -P 100 -n 400000 --threads 2 zrandmember zset 50 withscores
aren't you missing another call to
ohh right, missing a

```c
if (zsetobj->encoding == OBJ_ENCODING_LISTPACK) {
    unsigned char *lp = zsetobj->ptr;
    unsigned char *p = lpFirst(lp);
    unsigned int index = 0;
    unsigned char *str = NULL;
    unsigned int len = 0;
    long long llele = 0;
    while (count) {
        if (withscores && c->resp > 2)
            addReplyArrayLen(c,2);
        p = lpNextRandom(lp, p, &index, count--, 1);
        str = lpGetValue(p, &len, &llele);
        if (str == NULL) {
            addReplyBulkLongLong(c, llele);
        } else {
            addReplyBulkCBuffer(c, str, len);
        }
        p = lpNext(lp, p);
        index++;
        if (withscores) {
            str = lpGetValue(p, &len, &llele);
            if (str == NULL) {
                addReplyDouble(c, llele);
            } else {
                addReplyDouble(c, zzlStrtod(str, len));
            }
        }
        p = lpNext(lp, p);
        index++;
    }
    return;
}
```

new change benchmark:
Yeah, I noticed the missing lpNext and index++, but I think the algorithm is correct even without them. It does an extra iteration in the beginning if index is odd.
well, so i guess the malloc is relatively cheap when requesting 50 picks, compared to the cache locality.
yes. I voted for the current version
Optimized HRANDFIELD and ZRANDMEMBER commands as in #8444:
CASE 3 under listpack encoding. Brought the optimization into CASE 2.5.
CASE 2.5 is listpack only: sampling unique elements, in non-random order.
Listpack encoded hashes / zsets are meant to be relatively small, so
HRANDFIELD_SUB_STRATEGY_MUL / ZRANDMEMBER_SUB_STRATEGY_MUL
isn't necessary and we'd rather not make copies of the entries. Instead, we
emit them directly to the output buffer.
Simple benchmarks show some 400% improvement in both HRANDFIELD
and ZRANDMEMBER in CASE 3.
Unrelated changes: remove the useless setTypeRandomElements and fix a typo.
hash: a simple benchmark (listpack with 500 entries):
unstable:
this branch:
zset: a simple benchmark (listpack with 121 entries):
unstable:
this branch: