add hrandmember command by dewxin · Pull Request #8219 · redis/redis

dewxin · 2020-12-21T08:04:20Z

Last week, I ran into the same problem as this question posted on Stack Overflow. It seems to be a commonly requested feature that has been needed for a long time.

I know the code may look bad, but as a Redis user, I hope the feature will be added before long.

Also, in case it concerns you: the new hrandmember command currently doesn't behave like srandmember when the count is negative.

oranagra · 2020-12-21T15:54:14Z

@dewxin thanks for this PR.
We would indeed like to add HRANDMEMBER and also ZRANDMEMBER (see #6323) to the upcoming Redis 6.2 if someone is willing to make the effort and code them (we have our hands full with other tasks).

I've skimmed over the code and there are a few concerns that need to be resolved before a final review.

i see you copied some logic from SRANDMEMBER, but it looks like you stripped the comments that explain the code, its reasoning and steps.
as you mentioned in your opening post, this implementation currently seems to be lacking support for unique random members in the ziplist case; we must resolve that.
also, i'm not sure about the code you did implement for ziplist, where did you take it from? what's the reasoning for that algorithm? at the very least, it's missing some comments that explain it.
i don't think this command needs to reply with both the field names and their values, at least not by default. i guess we can add a WITHVALUES option, like ZRANGE has WITHSCORES. (we'll probably want to add a similar thing for ZRANDMEMBER too). This also means that we don't add the values (which may be large) to the temporary dict, so it can make this command more efficient.
maybe instead of copying this case we can somehow share it between Hashes, Sets and Sorted sets. (maybe not).

Some lower importance notes:

i see zsets have a special case that handles an empty response (shared.emptyset[c->resp]); i guess it would be a good idea to add one here too.
i'd rather the call to addReplyMapLen be in the response part and not so far above.
i'd rather create and destroy the hash type iterator next to the loop that iterates it (so it won't be forgotten).
it looks like your editor also reformatted some lines, i see some missing spaces, spare empty lines, and also line comments (redis uses only block comments)
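
The WITHVALUES suggestion above could be parsed roughly as sketched below. This assumes the syntax proposed in this thread, HRANDMEMBER key [count [WITHVALUES]], works on plain C strings instead of Redis client arguments, and uses a hypothetical helper name (parse_hrand_args); it is an illustration, not the code that was eventually merged.

```c
#include <errno.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical helper: parses "HRANDMEMBER key [count [WITHVALUES]]".
 * argv[0] is the command name, argv[1] the key.
 * Returns 0 on success, -1 on a syntax error. */
static int parse_hrand_args(int argc, const char **argv,
                            long *count, int *withvalues) {
    *count = 1;       /* default: a single random field */
    *withvalues = 0;  /* default: reply with field names only */

    if (argc < 2 || argc > 4) return -1;

    if (argc >= 3) {
        char *end;
        errno = 0;
        long v = strtol(argv[2], &end, 10);
        /* Reject overflow, empty input and trailing garbage. */
        if (errno != 0 || end == argv[2] || *end != '\0') return -1;
        *count = v;
    }
    if (argc == 4) {
        /* The option is only accepted after an explicit count,
         * similar to WITHSCORES following the range in ZRANGE. */
        if (strcasecmp(argv[3], "WITHVALUES") != 0) return -1;
        *withvalues = 1;
    }
    return 0;
}
```

With values off by default, the temporary structure only needs field names unless the client opts in, which is the efficiency point made above.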

dewxin · 2020-12-22T08:57:48Z

Hi @oranagra, thanks for your suggestions. I didn't explain things clearly, and your confusion has left me a bit confused too. To clear things up, here are my answers.

i see you copied some logic from SRANDMEMBER, but it looks like you stripped the comments that explain the code, its reasoning and steps.

I'm just used to keeping functions small. I will add those comments back, along with extra comments that make the modifications easier to understand.

as you mentioned in your opening post, this implementation currently seems to be lacking support for unique random members in the ziplist case; we must resolve that.

emmm, what I meant is: when the count is negative, srandmember returns exactly the requested number of members, even if the set itself doesn't have that many, and these members are of course not unique. However, hrandmember currently still returns unique members for a negative count when the encoding is ziplist, so the number returned is min(count_required, hash_size). Users who are used to srandmember may be surprised that the count shrinks.
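
To make the difference concrete, here is a minimal sketch of the SRANDMEMBER-style count semantics described above, over a plain int array rather than a Redis set. rand_members is a hypothetical helper, and rand() with % is used for brevity even though it has a slight modulo bias.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of SRANDMEMBER-style count semantics:
 *   count >= 0: distinct members, min(count, size) of them;
 *   count <  0: exactly -count members, repetitions allowed.
 * Assumes count > LONG_MIN and that out has room for the result.
 * Returns the number of elements written to out. */
static size_t rand_members(const int *arr, size_t size, long count, int *out) {
    if (size == 0 || count == 0) return 0;

    if (count < 0) {
        /* With repetition: every pick is independent. */
        size_t want = (size_t)(-count);
        for (size_t i = 0; i < want; i++)
            out[i] = arr[(size_t)rand() % size];
        return want;
    }

    /* Unique: partial Fisher-Yates shuffle of a copy; the first k
     * slots of the copy become a random selection without repeats. */
    size_t k = (size_t)count < size ? (size_t)count : size;
    int *tmp = malloc(size * sizeof *tmp);
    if (tmp == NULL) return 0;
    for (size_t i = 0; i < size; i++) tmp[i] = arr[i];
    for (size_t i = 0; i < k; i++) {
        size_t j = i + (size_t)rand() % (size - i);  /* j in [i, size) */
        int t = tmp[i];
        tmp[i] = tmp[j];
        tmp[j] = t;
        out[i] = tmp[i];
    }
    free(tmp);
    return k;
}
```

Under these semantics a negative count of -10 on a 3-member collection yields 10 (repeating) members, while a positive 10 yields only 3, which is the shrinking behavior the comment warns users may not expect.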

also, i'm not sure about the code you did implement for ziplist, where did you take it from? what's the reasoning for that algorithm? at the very least, it's missing some comments that explain it.

It's a problem to get random members when we cannot access members randomly. The intuitive solution is to pick a random index, scan to it, and read the value, but that costs a lot when the count is close to the ziplist size, since every pick may need another scan.

Why can't we just do a single-pass scan?

Let's say the length of the ziplist is m and we want to pick n entries from it. Let m_left be the number of entries we haven't visited yet (counting the current one), and n_left the number of entries we still need to pick.

The only thing left to do is to find the probability expression that makes every member equally likely to be chosen.

The first expression that came to my mind was n/m; I believe you have tried this one too. It doesn't work. Consider the edge case where we are visiting the last entry and haven't picked any entry yet, because all the earlier ones were unlucky. Since nothing has been picked, we must pick at least this last one, which contradicts a fixed probability of n/m.

So the probability should increase as m_left decreases, and decrease as n_left decreases.

So let's try n_left/m_left: every time we visit an entry, we pick it with probability n_left/m_left. This leads to the result we want: every entry ends up being picked with probability n/m. It also handles the edge case above: once n_left == m_left, every remaining entry is picked with probability 1, and once n_left reaches 0 nothing more is picked, so exactly n entries are returned.

We can prove it by induction.

Suppose entries entry₁, entry₂, …, entry_{i-1} are each picked with probability n/m. When we reach entry_i, m_left = m − (i − 1) is fixed while n_left is random, and by linearity of expectation the expected number of already picked entries is (i − 1)·n/m. So the probability of picking entry_i is

$$P_i = \frac{\mathbb{E}[n_{\text{left}}]}{m_{\text{left}}} = \frac{n - \frac{n}{m}(i-1)}{m-(i-1)} = \frac{\frac{n}{m}\,(m-i+1)}{m-i+1} = \frac{n}{m}.$$

In short: if every previous entry is picked with probability n/m, the n_left/m_left rule picks the current entry with the same probability.

For the base case i = 1, nothing has been picked yet, so $P_1 = n_{\text{left}}/m_{\text{left}} = n/m$ directly, which gives $P_2 = n/m$, and so on. This completes the proof.
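
The n_left/m_left rule is the classic selection-sampling technique (Knuth's Algorithm S). Below is a minimal sketch, assuming a plain int array stands in for the ziplist (the loop only ever moves forward, as a ziplist walk would). select_sample is a hypothetical name, and rand() % m_left carries a small modulo bias that a production version would avoid.

```c
#include <stddef.h>
#include <stdlib.h>

/* One-pass selection of n out of m entries (selection sampling).
 * An array stands in for the ziplist, but it is only read forward.
 * Picked entries keep their original relative order.
 * Assumes n <= m. Returns the number of entries written (always n). */
static size_t select_sample(const int *entries, size_t m, size_t n, int *out) {
    size_t n_left = n;  /* entries we still need to pick */
    size_t picked = 0;

    for (size_t i = 0; i < m && n_left > 0; i++) {
        size_t m_left = m - i;  /* unvisited entries, including this one */
        /* Pick this entry with probability n_left / m_left. When
         * n_left == m_left the test always succeeds, so we never run
         * out of entries before picking n of them. */
        if ((size_t)rand() % m_left < n_left) {
            out[picked++] = entries[i];
            n_left--;
        }
    }
    return picked;
}
```

Note that the picked entries come out in their original order, which is exactly the property questioned later in the thread when random reply order comes up.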

i don't think this command needs to reply with both the field names and their values, at least not by default. i guess we can add a WITHVALUES option, like ZRANGE has WITHSCORES. (we'll probably want to add a similar thing for ZRANDMEMBER too). This also means that we don't add the values (which may be large) to the temporary dict, so it can make this command more efficient.

copy that

maybe instead of copying this case we can somehow share it between Hashes, Sets and Sorted sets. (maybe not).

I've seen many small tricks in the Redis source code to improve efficiency, so I added one of my own: for the dbDictType dict, I just copy the pointers and create a new dict type to store them, so Redis doesn't need to allocate memory to duplicate these strings. Because of this, I'm afraid the code can't be shared, as far as I can tell.

oranagra · 2020-12-22T10:59:45Z

@dewxin thanks a lot for the response (and PR).
I only skimmed through the code, and didn't analyze it deeply since it seemed that it still has some distance to go before being ready.

your responses are great, i'll try to quickly respond to each big topic that i think needs to be handled:

This complicated mechanism needs to be clearly documented, even if that makes it span many lines. but maybe we can break it into smaller chunks of code (functions), each with a clear purpose, which will make it easier to read.
maybe we can also break it into re-usable bits so that we can share code between SRANDMEMBER, HRANDMEMBER and ZRANDMEMBER. maybe we need specific ziplistGetRandElements, dictGetRandElements, intsetGetRandElements, so that the commands can call these, and the ziplist code is reused between zset and hash, and the dict code is reused between all 3.
I suppose the new compromise you made (which i misunderstood) about not returning the exact count could be ok for a new command (if we document it), but maybe we can also write extra code to extract more members (in a loop) until the count is satisfied.
i'm sorry i didn't analyze the ziplist algorithm, i completely skipped it since it was not commented and clearly too complicated to understand without your explanation. i'll try to validate that later using your response, but either way it should have some big comment that explains the algorithm in the code.
i missed the fact that you didn't copy the keys and values to the temporary dict. that's great (although i guess there's now a possible memory leak in case the ziplist created a string from an integer encoded record). in any case, we also don't want to always respond with the values, let's add an optional WITHVALUES argument.
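
On the shared-helper idea: hash and sorted-set ziplists both store entries as consecutive pairs (field then value, or member then score), so one helper could work on pair indexes and let the caller decide whether to emit the second element of each pair. A rough sketch follows, assuming the ziplist is modeled as a flat array of strings (real ziplists can store values and scores as integers) and using a hypothetical name, pairs_get_rand; it reuses the single-pass n_left/m_left selection from this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared helper. flat holds 2 * npairs strings laid out
 * as key0, val0, key1, val1, ... (field/value for a hash,
 * member/score for a sorted set). Picks n distinct pairs
 * (n <= npairs) in one forward pass, writing keys to keys[] and,
 * only if vals is non-NULL, the matching values to vals[].
 * Returns the number of pairs written (always n). */
static size_t pairs_get_rand(const char *const *flat, size_t npairs, size_t n,
                             const char **keys, const char **vals) {
    size_t picked = 0;
    for (size_t i = 0; i < npairs && picked < n; i++) {
        size_t m_left = npairs - i;  /* unvisited pairs, including this one */
        size_t n_left = n - picked;  /* pairs still to pick */
        if ((size_t)rand() % m_left < n_left) {
            keys[picked] = flat[2 * i];                /* field or member */
            if (vals) vals[picked] = flat[2 * i + 1];  /* value or score */
            picked++;
        }
    }
    return picked;
}
```

Passing NULL for vals corresponds to the default reply without values; a hash caller with WITHVALUES and a zset caller with WITHSCORES would both pass an array.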

thanks a lot for making this effort.

oranagra · 2020-12-22T15:07:18Z

@dewxin after consulting with people about the random algorithm, we realized that the order at which the members are returned might be important.
i.e. someone issuing an HRANDMEMBER or ZRANDMEMBER asking for 10 random members, would expect a random order too (maybe he's always asking for 10 members and usually using the first two).
@itamarhaber what do you think?

a different algorithm for this may be:

create a pool of random indexes (by using the known size of the ziplist), either unique or with repetition.
sort the indexes
fetch the elements from the ziplist into a temporary array. (efficient since they're sorted)
un-sort them and reply.
this is probably less efficient than your current implementation, and maybe less elegant, but the actual O complexity is probably as good, and anyway ziplists are usually small.
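
The sort / fetch / un-sort steps above could look roughly like this. Assumptions: a singly linked list stands in for the ziplist (so there is no random access), the caller has already generated the random indexes (with repetition, for example idx[i] = rand() % m), and struct pick, cmp_pick and fetch_in_request_order are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* A forward-only list stands in for the ziplist. */
struct node {
    int val;
    struct node *next;
};

/* One requested index plus the reply slot it belongs to. */
struct pick {
    size_t idx;   /* position in the list */
    size_t slot;  /* position in the reply (the random order) */
};

static int cmp_pick(const void *a, const void *b) {
    const struct pick *x = a, *y = b;
    return (x->idx > y->idx) - (x->idx < y->idx);
}

/* Fetch the elements at idx[0..k-1] (each smaller than the list
 * length, repeats allowed) so that out[i] is the element at idx[i].
 * Sorting lets one forward walk serve all k requests; writing through
 * slot restores ("un-sorts") the original random order.
 * Returns 0 on success, -1 on allocation failure. */
static int fetch_in_request_order(const struct node *head, const size_t *idx,
                                  size_t k, int *out) {
    if (k == 0) return 0;
    struct pick *p = malloc(k * sizeof *p);
    if (p == NULL) return -1;
    for (size_t i = 0; i < k; i++) {
        p[i].idx = idx[i];
        p[i].slot = i;
    }
    qsort(p, k, sizeof *p, cmp_pick);

    const struct node *cur = head;
    size_t pos = 0;
    for (size_t i = 0; i < k; i++) {
        while (pos < p[i].idx) {  /* only ever advance forward */
            cur = cur->next;
            pos++;
        }
        out[p[i].slot] = cur->val;
    }
    free(p);
    return 0;
}
```

In this sketch sorting costs O(k log k) and the walk O(m + k), so the overall complexity stays close to the single-pass approach while the reply keeps the random order in which the indexes were generated.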

dewxin · 2020-12-23T10:57:27Z

@oranagra , I have to say you guys are really awesome and redis is really really an amazing project.
My implementation now works pretty well (for my case) on the machine, and I can't wait to explore other parts of redis( replication , sentinel, cluster..) good luck !

oranagra · 2020-12-23T12:57:06Z

@dewxin thanks for the compliments.
Does "good luck" mean that you won't be working on this PR to implement my suggestions?
We (the core team) have our hands full working on bigger / more complicated things at the moment, we rely on community contributors to help with things such as this one.
Would appreciate if you would like to take this to completion.

dewxin · 2020-12-24T07:20:57Z

@oranagra I am willing to, but without a deep understanding of Redis, I am afraid I don't have the ability to complete the implementation to a level you'd be satisfied with. And there will be a lot of details. For example, when breaking it into reusable bits, ziplistGetRandElements will work well if we call it on a zset, but what if I am using a hash? How can I tell whether an entry is a key or a value, and which format should I return: an array of zipEntry or sds?

well, considering it's out of my league, I just give up..

oranagra · 2020-12-24T07:56:40Z

@dewxin i don't imagine it's out of your league, but it does take time to get it just right (the PR won't be merged before it's perfect, and it may take many rounds of back and forth reviews until we reach there).
anyway, i respect your decision, and appreciate the contribution you made.
i suppose it's just a matter of time until someone else steps up and resumes that work.
hope to keep seeing you in github. 8-)

itamarhaber · 2020-12-25T16:48:47Z

i.e. someone issuing an HRANDMEMBER or ZRANDMEMBER asking for 10 random members, would expect a random order too

Agreed, this would also match SRANDMEMBER's behavior.

bionicles · 2020-12-31T11:27:21Z

just want to throw in i would absolutely 100% use this and zrandmember ASAP for a project which keeps running OOM on redis because we have to keep a sorted set, a hash, and a set, since they each do different things. feature parity across these different data types would help a lot (perhaps there's a way to implement such functions abstractly to save work?)

sundb · 2021-01-02T11:53:44Z

src/t_hash.c

+                addHashIteratorCursorToReply(c, hi, OBJ_HASH_VALUE);
+            }
+
+            index ++;


Suggested change

-            index ++;
+            index++;

Should the space be deleted?

sundb · 2021-01-02T11:54:28Z

src/t_hash.c

+            dictReleaseIterator(di);
+            dictRelease(d);
+        }
+


Should this blank line be deleted?

sundb · 2021-01-02T11:58:34Z

src/t_hash.c

+    }
+
+    hrandmemberWithCountCommand(c, 1);
+}


Suggested change

-}
+void hrandmemberCommand(client *c) {
+    long l = 1;
+    if (c->argc == 3) {
+        if (getLongFromObjectOrReply(c,c->argv[2],&l,NULL) != C_OK) return;
+    } else if (c->argc > 3) {
+        addReply(c,shared.syntaxerr);
+        return;
+    }
+    hrandmemberWithCountCommand(c, l);
+}

Would this be better?

sundb · 2021-01-02T12:06:41Z

src/t_hash.c

+    long l;
+
+    if (c->argc == 3) {
+        if (getLongFromObjectOrReply(c,c->argv[2],&l,NULL) != C_OK) return;


Would it be more appropriate to change getLongFromObjectOrReply to getPositiveLongFromObjectOrReply?

sundb · 2021-01-04T08:33:38Z

src/t_hash.c

+                addHashIteratorCursorToReply(c, hi, OBJ_HASH_KEY);
+                addHashIteratorCursorToReply(c, hi, OBJ_HASH_VALUE);
+            }
+            return;


Did you forget to call hashTypeReleaseIterator?

sundb · 2021-01-04T08:34:23Z

src/t_hash.c

+            NULL,                       /* val destructor */
+            NULL                        /* allow to expand */
+        };
+        d = dictCreate(&dt,NULL);    


Suggested change

-        d = dictCreate(&dt,NULL);    
+        d = dictCreate(&dt,NULL);

There are 4 extra trailing spaces at the end of this line.

sundb · 2021-01-04T08:36:07Z

src/t_hash.c

+                value = hashTypeCurrentFromHashTable(hi,OBJ_HASH_VALUE);
+                ret = dictAdd(d, key, value);
+
+                serverAssert(ret == DICT_OK);


Suggested change

serverAssert(ret == DICT_OK);

serverAssert(ret == DICT_OK);

Since d is a new dictionary, this is not necessary.

sundb · 2021-01-04T09:08:57Z

@dewxin Don't give up. I have also wanted to give up many times, and participating in such a great project is a great experience.

oranagra · 2021-01-04T10:56:30Z

@sundb This PR in its current form is still very far from being ready, which is why i avoided commenting on specific code bits, asking for style changes and other minor suggestions.
You probably already know that I also try my best to encourage people not to give up, and contribute more, but the flip side of that is asking for small changes which consume the author's time, and then the PR not being merged because of some bigger issue.

I think i summed up what it would take to bring this to completion in these posts:
#8219 (comment)
#8219 (comment)

the bigger parts are to share code between the various *RANDMEMBER commands, and change the ziplist algorithm to return random order.

if @dewxin wants to pursue it, that's great news for me.
if not, maybe someone else would like to pick it up, let's just avoid collisions, so we don't waste anyone's time.

yangbodong22011 · 2021-01-04T11:45:10Z

@oranagra hi, I am trying to understand and complete this work. As you said, the core points that need to be done are as follows:

share code between the various *RANDMEMBER commands.
change the ziplist algorithm to return members in random order.

oranagra · 2021-01-27T11:34:20Z

closing this in favor of #8297. thank you for pushing this forward.

add hrandmember command

906714c

oranagra added this to the Next minor backlog milestone Dec 21, 2020

oranagra added the labels release-notes (this issue needs to be mentioned in the release notes), state:major-decision (requires core team consensus) and state:needs-doc-pr (requires a PR to the redis-doc repository) Dec 22, 2020

oranagra added the state:help-wanted label (no member is currently implementing this change) Dec 24, 2020

sundb reviewed Jan 2, 2021

dewxin added 2 commits January 4, 2021 16:17

apply review suggestions

1e95713

run tests..

24eee37

sundb reviewed Jan 4, 2021

yangbodong22011 mentioned this pull request Jan 7, 2021

Add HRANDFIELD and ZRANDMEMBER. improvements to SRANDMEMBER #8297

Merged

oranagra closed this Jan 27, 2021

oranagra mentioned this pull request Feb 3, 2021

Optimize HRANDFIELD and ZRANDMEMBER case 4 when ziplist encoded #8444

Merged

enjoy-binbin mentioned this pull request May 20, 2023

Optimize HRANDFIELD and ZRANDMEMBER case 3 when listpack encoded #12205

Merged


6 participants
