Replace all usage of ziplist with listpack for t_zset#9366
Replace all usage of ziplist with listpack for t_zset#9366oranagra merged 16 commits intoredis:unstablefrom
Conversation
d33a9c8 to
c77f56e
Compare
c77f56e to
91213f5
Compare
oranagra
left a comment
There was a problem hiding this comment.
few minor comments and suggestions..
src/listpack.c
Outdated
| while (i--) { | ||
| tail = lpNext(lp, tail); | ||
| assert(p != NULL); | ||
| } |
There was a problem hiding this comment.
despite the fact we have a previous check if num is too big, i think we still need to handle a case we reached to the end of the listpack in the middle of that loop (lpLength may be unreliable).
in which case we'll also need to update num since it's used below.
also, why change long to int ?
There was a problem hiding this comment.
Ohh, I forgot to corrupt data.
long to int was my mistake.
There was a problem hiding this comment.
Done, This piece of code is obsolete.
src/rdb.c
Outdated
| if (deep_integrity_validation) server.stat_dump_payload_sanitizations++; | ||
| if (!zsetZiplistValidateIntegrity(encoded, encoded_len, deep_integrity_validation)) { | ||
| rdbReportCorruptRDB("Zset ziplist integrity check failed."); | ||
| if (!listpackValidateIntegrity(encoded, encoded_len, deep_integrity_validation)) { |
There was a problem hiding this comment.
we shouldn't use the plain listpackValidateIntegrity, that one doesn't check for duplicate records.
There was a problem hiding this comment.
I think my naming caused your misunderstanding(lpValidateIntegrity is in listpack.c), perhaps it should be changed to hashAndZsetZiplistValidateIntegrity.
There was a problem hiding this comment.
yes, i've been mixing lpValidateIntegrity and listpackValidateIntegrity. you must admit that it's confusing 8-)
so you renamed and moved hashListpackValidateIntegrity to serve as common code.
in the past, with ziplists, i've let hash and ziplist have separate integrity validation, just to let them be independent.
our options:
- rename your new
listpackValidateIntegrity, either tolpPaisValidateIntegrityAndDupsor something alike (it's not a generic listpack validation) - move it back to hash and zset to be independently (like before).
i'm ok with option 1.
There was a problem hiding this comment.
I prefer 1, because I hate repetitive code.
There was a problem hiding this comment.
Done, change to lpPairsValidateIntegrityAndDups.
src/t_zset.c
Outdated
| /* TODO: add function to ziplist API to delete N elements from offset. */ | ||
| zl = ziplistDelete(zl,&p); | ||
| zl = ziplistDelete(zl,&p); | ||
| zl = lpDelete(zl,p,&p); | ||
| zl = lpDelete(zl,p,&p); |
There was a problem hiding this comment.
you can now use the new method you created to implement that TODO
There was a problem hiding this comment.
I thought about it, but lpDeleteRange uses index for deletion, and I actually wanted to write lpDeleteEntryRange, but I didn't like the name, need help.
There was a problem hiding this comment.
Done, Change all similar code for deleting 2 elements.
| verify_log_message 0 "*skipping empty key: zset_ziplist*" 0 | ||
| verify_log_message 0 "*empty keys skipped: 8*" 0 | ||
| verify_log_message 0 "*skipping empty key: zset_listpack*" 0 |
There was a problem hiding this comment.
are there really both types of encoded zsets in the same rdb file? how did you generate it?
There was a problem hiding this comment.
I used debug populate to generate it, and modified the saved code.
// zset ziplist
robj *o = createObject(OBJ_ZSET,ziplistNew());
o->encoding = OBJ_ENCODING_LISTPACK;
dbAdd(c->db,createStringObject("zset_ziplist",12),o);
// zset ziplist
dbAdd(c->db,createStringObject("zset_listpack",13),createZsetListpackObject());3184d20 to
34a9508
Compare
34a9508 to
6a45efa
Compare
|
Following are the benchmarking results for loading rdb under different scenarios.
|
|
IIRC for hashes the factor was about 1.5x, and here the factor seems like 3x. |
|
|
@oranagra I was sceptical about the test results, but I checked the code several times and found no problems, the improvement in the
zadd: zpopmin:
zadd: zpopmin: There is indeed a bug, back to my words. |
|
The reason for the error in the previous benchmark (#9366 (comment)) is that when the listpack length exceeds UINT16_MAX, zset benchmark test. Note: The reason for setting
zadd: zpopmin:
zadd: zpopmin: |
|
@oranagra In #9366 (comment). |
089ce2d to
05d6358
Compare
05d6358 to
6bac662
Compare
src/listpack.c
Outdated
| if (numele == LP_HDR_NUMELE_UNKNOWN) { | ||
| /* If the listpack length cannot be obtained in constant time, | ||
| * using lpDeleteRangeWithEntry will be much faster. */ | ||
| lp = lpDeleteRangeWithEntry(lp, &p, num); |
There was a problem hiding this comment.
i don't understand this. it looks like the alternative is doing the same (also using lpDeleteRangeWithEntry)
There was a problem hiding this comment.
Yes, but if we do that, we can't use lpGetNumElements, we need to use lpLength, which will traverse the entire listpack when listpack length > UINT16_MAX. Instead, we use lpDeleteRangeWithEntry to delete the length, so it only traverses to the index.
There was a problem hiding this comment.
ok, i tihnk i see what you mean.
the code below has an "optimization" to just move the EOF marker, and that optimization can't be used without knowing the real length (which in this case would require calling lpLength).
we rather not use lpLength, so instead in this case we skip the EOF "optimization" and just fall back to the normal path.
the current code and comment is confusing because it mentions we can do something faster, but the thing it's faster from, isn't at all there..
i suggest to refactor that code and improve the comment.
it can be something like:
/* If we know we're gonna delete beyond the end of the listpack, we can just move
* the EOF marker, and there's no need to iterate though the entries.
* but if we can't be sure how many entries there are, we rather avoid calling lpLength
* since that means an additional iteration on all elements. */
if (we know the real length, and we know we're deleting beyond the range) {
do the optimal thing and move the EOF
} else {
lpDeleteRangeWithEntry
}
|
@sundb so you're saying that now a loading that used to take 6 seconds takes 14 (and before that fix it used to take 18s)? did that fix also affect (improve) the conversion benchmark we did for hashes? besides that, i see you also showed an improved performance of zsets that use listpack vs ziplists (not related to any of the commits of recent days, but rather just that listpack is better than ziplist), right? anything else left before we can merge it? please go over the recent comment that are not yet marked as resolve and make sure they're handled, and please also update the top comment to list all the changes of this PR. thank you! |
|
ohh, sorry, i see i missed one commit (1c72ad5), so the improvement are a result of a recent change. |
467ac3a to
ea0ed69
Compare
|
ee2956a to
8873fd8
Compare
1f391b9 to
b22020d
Compare
oranagra
left a comment
There was a problem hiding this comment.
ok, so are we done?
anything else left before we can merge it?
please go over the recent comments that are not yet marked as resolve and make sure they're handled, and please also update the top comment to list all the changes of this PR.
src/listpack.c
Outdated
| /* Note that index could overflow, but we use the value | ||
| * after seek, so when we use it no overflow happens. */ |
There was a problem hiding this comment.
you may still wanna keep this comment if you think it's useful (personally i didn't dive into it)
There was a problem hiding this comment.
I thought no one would care about this comment, I think it's necessary (I often confusing).
There was a problem hiding this comment.
I still need to do the benchmark.
There was a problem hiding this comment.
I also need to run "corrupt-dump-fuzzer " for several hours.
|
Second Rdb loading zset benchmark test after #9366 (comment).
|
|
6bac662 also optimises the loading of the listpack with sanitation. Following is a benchmark test for loading hash listpack.
|
|
@redis/core-team technically, this is a major decision, but since it follows the footsteps of the same thing we did for hashes, I'll merge this one normally. |
|
@oranagra It's ready to go. |
Part three of implementing #8702, following #8887 and #9366 . ## Description of the feature 1. Replace the ziplist container of quicklist with listpack. 2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation. ## Interface changes 1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`. 2. Replace `debug ziplist` command with `debug listpack`. ## Internal changes 1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`) 2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`) 3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ). It represent that a quicklistNode is a packed node, as opposed to a plain node. 4. Remove `createZiplistObject` method, which is never used. 5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`. We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k. ## Improvements 1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366. 2. Optimize `quicklistAppendPlainNode` to avoid memcpy data. ## Bugfix 1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366. ## Test 1. Add unittest for `lpMerge`. 2. Modify the old quicklist ziplist corrupt dump test. Co-authored-by: Oran Agra <[email protected]>
Part three of implementing redis#8702, following redis#8887 and redis#9366 . ## Description of the feature 1. Replace the ziplist container of quicklist with listpack. 2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation. ## Interface changes 1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`. 2. Replace `debug ziplist` command with `debug listpack`. ## Internal changes 1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`) 2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`) 3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following redis#9357 ). It represent that a quicklistNode is a packed node, as opposed to a plain node. 4. Remove `createZiplistObject` method, which is never used. 5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`. We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k. ## Improvements 1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at redis#9366. 2. Optimize `quicklistAppendPlainNode` to avoid memcpy data. ## Bugfix 1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from redis#9366. ## Test 1. Add unittest for `lpMerge`. 2. Modify the old quicklist ziplist corrupt dump test. Co-authored-by: Oran Agra <[email protected]>
Remove some dead code in object.c, ziplist is no longer used in 7.0 Some backgrounds: zipmap - hash: replaced by ziplist in redis#285 ziplist - hash: replaced by listpack in redis#8887 ziplist - zset: replaced by listpack in redis#9366 ziplist - list: replaced by quicklist (listpack) in redis#2143 / redis#9740
Remove some dead code in object.c, ziplist is no longer used in 7.0 Some backgrounds: zipmap - hash: replaced by ziplist in #285 ziplist - hash: replaced by listpack in #8887 ziplist - zset: replaced by listpack in #9366 ziplist - list: replaced by quicklist (listpack) in #2143 / #9740 Moved the location of ziplist.h in the server.c
Remove some dead code in object.c, ziplist is no longer used in 7.0 Some backgrounds: zipmap - hash: replaced by ziplist in redis#285 ziplist - hash: replaced by listpack in redis#8887 ziplist - zset: replaced by listpack in redis#9366 ziplist - list: replaced by quicklist (listpack) in redis#2143 / redis#9740 Moved the location of ziplist.h in the server.c
Part two of implementing #8702 (zset), after #8887.
Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance.
Rdb format changes
New
RDB_TYPE_ZSET_LISTPACKrdb type.Rdb loading improvements:
Interface changes
zset-max-listpack-entriesconfig is an alias forzset-max-ziplist-entries(same withzset-max-listpack-value).Listpack improvements:
lpDeleteRangeandlpDeleteRangeWithEntryfunctions to delete a range of entries from listpack.lpCompare, converting from string to integer is faster than converting from integer to string.snprintfwithll2stringto improve performance in converting numbers to strings inlpGet().Zset improvements:
zzlFindmethod, uselpFindinstead oflpComparein a loop.lpDeleteRangeWithEntryinstead oflpDeletetwice to delete a element of zset.Tests
lpDeleteRangeandlpDeleteRangeWithEntryfunction.lpCompareandziplsitCompare.