Conversation
|
@dewxin thanks for this PR. I've skimmed over the code and there are a few concerns that need to be resolved before a final review.
Some lower importance notes:
|
|
Hi, @oranagra ,thanks for your suggestions. I didn't make it clear, thanks to your confusion, now I am kinda confused. Try to make it clear, here are my answers.
I just got used to making the functions look tiny. I will paste those comments back ,and add extra comments making it easier to understand the modifications.
emmm, what I said is when the count is negative,
It's a problem to get a random member when we cannot random access members. The intuitive way solving this problem is to get a random index, and scan, and get your value, but it may cost a lot when the count is near ziplist size. Why can't we just do one-round scan? Let's say the length of ziplist is m , we want to pick n entries from the ziplist. And the count of members we haven't visited is mleft, the count of entries we haven't picked is nleft. The only thing left to do is to find the possibility expression by which every member will be chosen equally. The first expression came into my mind is n/m, I believe you guys have tried this one. It didn't work. Let's consider the edge case,now we are picking the last entry, and we haven't picked any entry before ,cause they are so unlucky. Because we haven't picked any entry, we should pick the last one at least. It is of course a disagreement to n/m. The possibility seems should increase when the mleft is reducing , and decrease when the nleft is reducing. So Let's give nleft/mleft a try. Every time we try to pick an entry, the possibility to pick it is nleft/mleft, which leading to the result we want, the possibility being picked for every entry is n/m. We could prove it by Mathematical Induction. Say we already have the entries(entry1 entry2.. entryi-1)whose possibilities being picked is n/m, when it comes to entryi, the mathematical expectation of the number of already picked entries is n/m*(i-1), then the possibility of entryi being picked is nleft/mleft=(n-n/m*(i-1)) / (m-i+1) = n/m. In short words, if the possibility of previous entries being picked is n/m, using nleft/mleft expression, we can make the current entry being picked equally as previous ones. And when i-1 = 1, P1= nleft/mleft=n/m leading to P2=n/m.
copy that
I see many little plays in redis source code to improve efficiency, so I did mine, for dbDictType dict, I just copy the ptr, and create a new dict type to store these ptrs, so redis don't need to allocate memories to duplicate these strings. In this case, I am afraid we cannot share as far as I am concerned. |
|
@dewxin thanks a lot for the response (and PR). your responses are great, i'll try to quickly respond to each big topic that i think needs to be handled:
thanks a lot for making this effort. |
|
@dewxin after consulting with people about the random algorithm, we realized that the order at which the members are returned might be important. a different algorithm for this may be:
|
|
@oranagra , I have to say you guys are really awesome and redis is really really an amazing project. |
|
@dewxin thanks for the complements. |
|
@oranagra I am willing to , but without a deep understanding of redis, I am afraid I don't have the ability to complete the implementation to the level you are satisfied. And there will be a lot of details , for example , breaking into re-usable bits, ziplistGetRandElements will work well if we call it using a zset, what if I am using a hash, how can I tell it's a key or value, and which format should I return, should I return an array of zipEntry or sds. well, considering it's out of my league, I just give up.. |
|
@dewxin i don't imagine it's out of your league, but i does consume time to get just right (the PR won't be merged before it's perfect, and it may take many rounds of back and forth reviews until we reach there). |
Agreed, this would also match |
|
just want to throw in i would absolutely 100% use this and zrandmember ASAP for a project which keeps running OOM on redis because we have to keep a sorted set, a hash, and a set, since they each do different things. feature parity across these different data types would help a lot (perhaps there's a way to implement such functions abstractly to save work?) |
src/t_hash.c
Outdated
| addHashIteratorCursorToReply(c, hi, OBJ_HASH_VALUE); | ||
| } | ||
|
|
||
| index ++; |
There was a problem hiding this comment.
| index ++; | |
| index++; |
Should the space be deleted?
src/t_hash.c
Outdated
| dictReleaseIterator(di); | ||
| dictRelease(d); | ||
| } | ||
|
|
There was a problem hiding this comment.
Should the space line be deleted?
| } | ||
|
|
||
| hrandmemberWithCountCommand(c, 1); | ||
| } |
There was a problem hiding this comment.
| } | |
| void hrandmemberCommand(client *c) { | |
| long l = 1; | |
| if (c->argc == 3) { | |
| if (getLongFromObjectOrReply(c,c->argv[2],&l,NULL) != C_OK) return; | |
| } else if (c->argc > 3) { | |
| addReply(c,shared.syntaxerr); | |
| return; | |
| } | |
| hrandmemberWithCountCommand(c, l); | |
| } |
Would this be better?
src/t_hash.c
Outdated
| long l; | ||
|
|
||
| if (c->argc == 3) { | ||
| if (getLongFromObjectOrReply(c,c->argv[2],&l,NULL) != C_OK) return; |
There was a problem hiding this comment.
Would it be more appropriate to change getLongFromObjectOrReply to getPositiveLongFromObjectOrReply?
| addHashIteratorCursorToReply(c, hi, OBJ_HASH_KEY); | ||
| addHashIteratorCursorToReply(c, hi, OBJ_HASH_VALUE); | ||
| } | ||
| return; |
There was a problem hiding this comment.
Forget the hashTypeReleaseIterator?
| NULL, /* val destructor */ | ||
| NULL /* allow to expand */ | ||
| }; | ||
| d = dictCreate(&dt,NULL); |
There was a problem hiding this comment.
| d = dictCreate(&dt,NULL); | |
| d = dictCreate(&dt,NULL); |
4 extra spaces at the end.
| value = hashTypeCurrentFromHashTable(hi,OBJ_HASH_VALUE); | ||
| ret = dictAdd(d, key, value); | ||
|
|
||
| serverAssert(ret == DICT_OK); |
There was a problem hiding this comment.
| serverAssert(ret == DICT_OK); | |
| serverAssert(ret == DICT_OK); |
Since d is a new dictionary, this is not necessary.
|
@dewxin Don't give up, I too have tried to give up many times, and it is a great experience to participating in such a great Project. |
|
@sundb This PR in it's current form is still very far from being ready, which is why i avoided commenting on specific code bits, asking for style changes and other minor suggestions. I think i summed up what it would take to bring this to completion in these posts: the bigger parts are to share code between the various *RANDMEMBER commands, and change the ziplist algorithm to return random order. if @dewxin wants to pursue it, that's great news for me. |
|
@oranagra hi, I am trying to understand and complete this work. As you said, the core points that need to be done are as follows:
|
|
closing this in favor of #8297 thank you for pushing this forward. |
Last week, I met the same problem as this question posted on stackoverflow . It seems that it's a common feature which is needed since a long time ago.
I know the code may looks like a shit, but as a user of redis, I hope the feature will be added before long.
And the new command
hrandmemberfor now doesn't behave likesrandmemberwhen it comes to negative count, as you may concern