Skip to content

add hrandmember command#8219

Closed
dewxin wants to merge 3 commits intoredis:unstablefrom
dewxin:dewxin-unstable
Closed

add hrandmember command#8219
dewxin wants to merge 3 commits intoredis:unstablefrom
dewxin:dewxin-unstable

Conversation

@dewxin
Copy link
Contributor

@dewxin dewxin commented Dec 21, 2020

Last week, I met the same problem as this question posted on stackoverflow . It seems that it's a common feature which is needed since a long time ago.

I know the code may looks like a shit, but as a user of redis, I hope the feature will be added before long.

And the new command hrandmember for now doesn't behave like srandmember when it comes to negative count, as you may concern

@oranagra oranagra added this to the Next minor backlog milestone Dec 21, 2020
@oranagra
Copy link
Member

@dewxin thanks for this PR.
We would indeed like to add HRANDMEMBER and also ZRANDMEMBER (see #6323) to the nearby Redis 6.2 if there's someone willing to make the effort and code them (we got our hands full with other tasks).

I've skimmed over the code and there are a few concerns that need to be resolved before a final review.

  • i see you copied some logic from SRANDMEMBER but it looks like you stripped the comments that explain the code, it's reasoning and steps.
  • as you mentioned in your opening post, this implementation currently seems to be lucking support for unique random members for the ziplist case, we must resolve that.
  • also, i'm not sure about the code you did implement for ziplist, where did you take it from? what's the reasoning for that algorithm? at the very least, it's missing some comments that explain it.
  • i don't think this command needs to reply with both the field names and their values, at least not by default. i guess we can add a WITHVALUES option, like ZRANGE has WITHSCORES. (we'll probably want to add a similar thing for ZRANDMEMBER too). This also means that we don't add the values (which may be large) to the temporary dict, so it can make this command more efficient.
  • maybe instead of copying this case we can somehow share it between Hashes, Sets and Sorted sets. (maybe not).

Some lower importance notes:

  • i see zsets have a special case that handles the edge case of an empty response shared.emptyset[c->resp] i guess it would be good idea to add one here too.
  • i rather the call to addReplyMapLen be in the response part and not so far above
  • i rather create and destroy the hash type iterator next to the loop that iterates it (so it won't be forgotten)
  • it looks like your editor also reformatted some lines, i see some missing spaces, spare empty lines, and also line comments (redis uses only block comments)

@dewxin
Copy link
Contributor Author

dewxin commented Dec 22, 2020

Hi, @oranagra ,thanks for your suggestions. I didn't make it clear, thanks to your confusion, now I am kinda confused. Try to make it clear, here are my answers.

i see you copied some logic from SRANDMEMBER but it looks like you stripped the comments that explain the code, it's reasoning and steps.

I just got used to making the functions look tiny. I will paste those comments back ,and add extra comments making it easier to understand the modifications.

as you mentioned in your opening post, this implementation currently seems to be lucking support for unique random members for the ziplist case, we must resolve that.

emmm, what I said is when the count is negative, srandmember tends to return the exact count required, even if the set itself doesn't have so many members, and these members are of course not unique. However, hrandmember for now will still return unique members when your count parameter is negative if the encoding is ziplist, and the count will be min(count_required, hash_size). The users may be surprised when they use hrandmember if they get used to srandmember cause the count is shrinked.

also, i'm not sure about the code you did implement for ziplist, where did you take it from? what's the reasoning for that algorithm? at the very least, it's missing some comments that explain it.

It's a problem to get a random member when we cannot random access members. The intuitive way solving this problem is to get a random index, and scan, and get your value, but it may cost a lot when the count is near ziplist size.

Why can't we just do one-round scan?

Let's say the length of ziplist is m , we want to pick n entries from the ziplist. And the count of members we haven't visited is mleft, the count of entries we haven't picked is nleft.

The only thing left to do is to find the possibility expression by which every member will be chosen equally.

The first expression came into my mind is n/m, I believe you guys have tried this one. It didn't work. Let's consider the edge case,now we are picking the last entry, and we haven't picked any entry before ,cause they are so unlucky. Because we haven't picked any entry, we should pick the last one at least. It is of course a disagreement to n/m.

The possibility seems should increase when the mleft is reducing , and decrease when the nleft is reducing.

So Let's give nleft/mleft a try. Every time we try to pick an entry, the possibility to pick it is nleft/mleft, which leading to the result we want, the possibility being picked for every entry is n/m.

We could prove it by Mathematical Induction.

Say we already have the entries(entry1 entry2.. entryi-1)whose possibilities being picked is n/m, when it comes to entryi, the mathematical expectation of the number of already picked entries is n/m*(i-1), then the possibility of entryi being picked is nleft/mleft=(n-n/m*(i-1)) / (m-i+1) = n/m.

In short words, if the possibility of previous entries being picked is n/m, using nleft/mleft expression, we can make the current entry being picked equally as previous ones.

And when i-1 = 1, P1= nleft/mleft=n/m leading to P2=n/m.
and so on. Now is proved.

i don't think this command needs to reply with both the field names and their values, at least not by default. i guess we can add a WITHVALUES option, like ZRANGE has WITHSCORES. (we'll probably want to add a similar thing for ZRANDMEMBER too). This also means that we don't add the values (which may be large) to the temporary dict, so it can make this command more efficient.

copy that

maybe instead of copying this case we can somehow share it between Hashes, Sets and Sorted sets. (maybe not).

I see many little plays in redis source code to improve efficiency, so I did mine, for dbDictType dict, I just copy the ptr, and create a new dict type to store these ptrs, so redis don't need to allocate memories to duplicate these strings. In this case, I am afraid we cannot share as far as I am concerned.

@oranagra
Copy link
Member

@dewxin thanks a lot for the response (and PR).
I only skimmed though the code, and didn't analyze it deeply since it seemed that it still has some distance before being ready.

your responses are great, i'll try to quickly respond to each big topic that i think needs to be handled:

  • This complicated mechanism needs to be clearly documented, even if it make it span over many lines. but maybe we can break it to smaller chunks of code (functions) each with a clear purpose, which will make it easier to read.
  • maybe we can also break it into re-usable bits so that we can share code between SRANDMEMBER, HRANDMEMBER and ZRANDMEMBER. maybe we need specific ziplistGetRandElements, dictGetRandElements, intsetGetRandElements, so that the commands can call these, and the ziplist code is reused between zset and hash, and the dict code is reused between all 3.
  • I suppose the new compromise you made (which i misunderstood) about not returning exact count could be ok for a new command (if we document it), but maybe we can also write extra code to extract more members (in a loop) until the count is satisfied).
  • i'm sorry i didn't analyze the ziplist algorithm, i completely skipped it since it was not commented and clearly too complicated to understand without your explanation. i'll try to validate that later using your response, but either way it should have some big comment that explains the algorithm in the code.
  • i missed the fact that you didn't copy the keys and values to the temporary dict. that's great (although i guess there's now a possible memory leak in case the ziplist created a string from an integer encoded record). in any case, we also don't want to always respond with the values, let's add an optional WITHVALUES argument.

thanks a lot for making this effort.

@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository labels Dec 22, 2020
@oranagra
Copy link
Member

@dewxin after consulting with people about the random algorithm, we realized that the order at which the members are returned might be important.
i.e. someone issuing an HRANDMEMBER or ZRANDMEMBER asking for 10 random members, would expect a random order too (maybe he's always asking for 10 members and usually using the first two).
@itamarhaber what do you think?

a different algorithm for this may be:

  1. create a pull of random indexes (by using the known size of the ziplist), either unique of with repetition.
  2. sort the indexes
  3. fetch the elements form the ziplist into a temporary array. (efficient since they're sorted)
  4. un-sort them and reply.
    this is probably less efficient than your current implementation, and maybe less elegant, but the actual O complexity is probably as good, and anyway ziplists are usually small.

@dewxin
Copy link
Contributor Author

dewxin commented Dec 23, 2020

@oranagra , I have to say you guys are really awesome and redis is really really an amazing project.
My implementation now works pretty well (for my case) on the machine, and I can't wait to explore other parts of redis( replication , sentinel, cluster..) good luck !

@oranagra
Copy link
Member

@dewxin thanks for the complements.
Does "good luck" mean that you won't be working on this PR to implement my suggestions?
We (the core team) have our hands full working on bigger / more complicated things at the moment, we rely on community contributors to help with things such as this one.
Would appreciate if you would like to take this to completion.

@dewxin
Copy link
Contributor Author

dewxin commented Dec 24, 2020

@oranagra I am willing to , but without a deep understanding of redis, I am afraid I don't have the ability to complete the implementation to the level you are satisfied. And there will be a lot of details , for example , breaking into re-usable bits, ziplistGetRandElements will work well if we call it using a zset, what if I am using a hash, how can I tell it's a key or value, and which format should I return, should I return an array of zipEntry or sds.

well, considering it's out of my league, I just give up..

@oranagra
Copy link
Member

@dewxin i don't imagine it's out of your league, but i does consume time to get just right (the PR won't be merged before it's perfect, and it may take many rounds of back and forth reviews until we reach there).
anyway, i respect your decision, and appreciate the contribution you made.
i suppose it's just a matter of time until someone else steps up and resumes that work.
hope to keep seeing you in github. 8-)

@oranagra oranagra added the state:help-wanted No member is currently implementing this change label Dec 24, 2020
@itamarhaber
Copy link
Member

i.e. someone issuing an HRANDMEMBER or ZRANDMEMBER asking for 10 random members, would expect a random order too

Agreed, this would also match SRANDMEMBER's behavior.

@bionicles
Copy link

just want to throw in i would absolutely 100% use this and zrandmember ASAP for a project which keeps running OOM on redis because we have to keep a sorted set, a hash, and a set, since they each do different things. feature parity across these different data types would help a lot (perhaps there's a way to implement such functions abstractly to save work?)

src/t_hash.c Outdated
addHashIteratorCursorToReply(c, hi, OBJ_HASH_VALUE);
}

index ++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
index ++;
index++;

Should the space be deleted?

src/t_hash.c Outdated
dictReleaseIterator(di);
dictRelease(d);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the space line be deleted?

}

hrandmemberWithCountCommand(c, 1);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
void hrandmemberCommand(client *c) {
long l = 1;
if (c->argc == 3) {
if (getLongFromObjectOrReply(c,c->argv[2],&l,NULL) != C_OK) return;
} else if (c->argc > 3) {
addReply(c,shared.syntaxerr);
return;
}
hrandmemberWithCountCommand(c, l);
}

Would this be better?

src/t_hash.c Outdated
long l;

if (c->argc == 3) {
if (getLongFromObjectOrReply(c,c->argv[2],&l,NULL) != C_OK) return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more appropriate to change getLongFromObjectOrReply to getPositiveLongFromObjectOrReply?

addHashIteratorCursorToReply(c, hi, OBJ_HASH_KEY);
addHashIteratorCursorToReply(c, hi, OBJ_HASH_VALUE);
}
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget the hashTypeReleaseIterator?

NULL, /* val destructor */
NULL /* allow to expand */
};
d = dictCreate(&dt,NULL);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d = dictCreate(&dt,NULL);
d = dictCreate(&dt,NULL);

4 extra spaces at the end.

value = hashTypeCurrentFromHashTable(hi,OBJ_HASH_VALUE);
ret = dictAdd(d, key, value);

serverAssert(ret == DICT_OK);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serverAssert(ret == DICT_OK);
serverAssert(ret == DICT_OK);

Since d is a new dictionary, this is not necessary.

@sundb
Copy link
Collaborator

sundb commented Jan 4, 2021

@dewxin Don't give up, I too have tried to give up many times, and it is a great experience to participating in such a great Project.

@oranagra
Copy link
Member

oranagra commented Jan 4, 2021

@sundb This PR in it's current form is still very far from being ready, which is why i avoided commenting on specific code bits, asking for style changes and other minor suggestions.
You probably already know that I also try my best to encourage people not to give up, and contribute more, but the flip side of that is asking for small changes which consume the author's time, and then the PR not being merged because of some bigger issue.

I think i summed up what it would take to bring this to completion in these posts:
#8219 (comment)
#8219 (comment)

the bigger parts are to share code between the various *RANDMEMBER commands, and change the ziplist algorithm to return random order.

if @dewxin wants to pursue it, that's great news for me.
if not, maybe someone else would like to pick it up, let's just avoid collisions, so we don't waste anyone's time.

@yangbodong22011
Copy link
Contributor

@oranagra hi, I am trying to understand and complete this work. As you said, the core points that need to be done are as follows:

  • share code between the various *RANDMEMBER commands.
  • change-the ziplist algorithm to return random order.

@oranagra
Copy link
Member

closing this in favor of #8297 thank you for pushing this forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes state:help-wanted No member is currently implementing this change state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants