Add support for list type to store elements larger than 4GB#9357
Add support for list type to store elements larger than 4GB#9357oranagra merged 59 commits intoredis:unstablefrom
Conversation
…ist node rather then ziplist
|
in this pr I refactored the internal implementation of Redis list in order to add elements larger then 4gb. curretnly redis list is built on quicklist data structure, every node in the quicklist is a ziplist, after my refactor if the the entry is bigger then 4gb it will be added to the quicklist node as is. |
|
@perryitay I took a cursory look at the code.
|
|
@sundb you mean add i think we do want to support large nodes in a list, same as we support large ones in hash. |
|
@oranagra Yes, I think QUICKLIST_NODE_CONTAINER_SDS would be more appropriate. |
|
@sundb i don't follow you. |
|
@oranagra Yes, unrelated to naming, I left out my ultimate intention. |
|
@sundb i'm not sure i follow you again. are you referring to a common interface (struct with pointers) to reduce the I don't think that's really that important, there aren't that many |
This is what I mean. I tried it during the qucilist ziplist->listpack migration, maybe we can wait for the finished version of @perryitay. @perryitay, This is a version of my previous implementation. |
|
yeah, that's what i imagined ( ironically, this also gives us a good hint that |
oranagra
left a comment
There was a problem hiding this comment.
initial review.
the two important aspects are maybe:
- terminology: maybe if we'll call the non-ziplist nodes "plain" instead of "none" it'll be easier to refer to them anywhere in the code.
- i feel that we didn't handle compressed nodes correctly, not sure how well this is covered by the tests.
oranagra
left a comment
There was a problem hiding this comment.
i saw a commit so i reviewed it. i now realize it's half baked, but since i already wrote down some comments i'll post them.
Co-authored-by: sundb <[email protected]>
|
@sundb @perryitay what's the status now? i see all comments are addressed.. let me know when this is ready to be merged. |
|
@oranagra LGTM. I have no more comments. |
|
@perryitay there are some walgrind warnings for the unit tests. please fix them and we can finally merge this |
Redis supports inserting data over 4GB into string (and recently for lists too, see #9357), But LZF compression used in RDB files (see `rdbcompression` config), and in quicklist (see `list-compress-depth` config) does not support compress/decompress data over UINT32_MAX, which will result in corrupting the rdb after compression. Internal changes: 1. Modify the `unsigned int` parameter of `lzf_compress/lzf_decompress` to `size_t`. 2. Modify the variable types in `lzf_compress` involving offsets and lengths to `size_t`. 3. Set LZF_USE_OFFSETS to 0. When LZF_USE_OFFSETS is 1, lzf store offset into `LZF_HSLOT`(32bit). Even in 64-bit, `LZF_USE_OFFSETS` defaults to 1, because lzf assumes that it only compresses and decompresses data smaller than UINT32_MAX. But now we need to make lzf support 64-bit, turning on `LZF_USE_OFFSETS` will make it impossible to store 64-bit offsets or pointers. BTW, disable LZF_USE_OFFSETS also brings a few performance improvements. Tests: 1. Add test for compress/decompress string large than UINT32_MAX. 2. Add unittest for compress/decompress quicklistNode.
Part three of implementing #8702, following #8887 and #9366 . ## Description of the feature 1. Replace the ziplist container of quicklist with listpack. 2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation. ## Interface changes 1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`. 2. Replace `debug ziplist` command with `debug listpack`. ## Internal changes 1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`) 2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`) 3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ). It represent that a quicklistNode is a packed node, as opposed to a plain node. 4. Remove `createZiplistObject` method, which is never used. 5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`. We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k. ## Improvements 1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366. 2. Optimize `quicklistAppendPlainNode` to avoid memcpy data. ## Bugfix 1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366. ## Test 1. Add unittest for `lpMerge`. 2. Modify the old quicklist ziplist corrupt dump test. Co-authored-by: Oran Agra <[email protected]>
Part three of implementing redis#8702, following redis#8887 and redis#9366 . ## Description of the feature 1. Replace the ziplist container of quicklist with listpack. 2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation. ## Interface changes 1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`. 2. Replace `debug ziplist` command with `debug listpack`. ## Internal changes 1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`) 2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`) 3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following redis#9357 ). It represent that a quicklistNode is a packed node, as opposed to a plain node. 4. Remove `createZiplistObject` method, which is never used. 5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`. We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k. ## Improvements 1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at redis#9366. 2. Optimize `quicklistAppendPlainNode` to avoid memcpy data. ## Bugfix 1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from redis#9366. ## Test 1. Add unittest for `lpMerge`. 2. Modify the old quicklist ziplist corrupt dump test. Co-authored-by: Oran Agra <[email protected]>
Change the sentinel config file to a directory in SENTINEL SET test. So it will now fail on the `rename` in `rewriteConfigOverwriteFile`. The test used to set the sentinel config file permissions to `000` to simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in redis#10151) Other changes: 1. More error messages after the config rewrite failure. 2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in redis#9304) 3. Fix a typo in debug quicklist-packed-threshold, then -> than. (redis#9357)
Change the sentinel config file to a directory in SENTINEL SET test. So it will now fail on the `rename` in `rewriteConfigOverwriteFile`. The test used to set the sentinel config file permissions to `000` to simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in #10151) Other changes: 1. More error messages after the config rewrite failure. 2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in #9304) 3. Fix a typo in debug quicklist-packed-threshold, then -> than. (#9357)
… split quicklistNode (#11242) This PR mainly deals with 2 crashes introduced in #9357, and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode. 1. Fix crash due to deleting an entry from a compress quicklistNode When inserting a large element, we need to create a new quicklistNode first, and then delete its previous element, if the node where the deleted element is located is compressed, it will cause a crash. Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode after some operation, we can use this flag like following: ```c node->dont_compress = 1; /* Prevent to be compressed */ some_operation(node); /* This operation might try to compress this node */ some_other_operation(node); /* We can use this node without decompress it */ node->dont_compress = 0; /* Re-able compression */ quicklistCompressNode(node); ``` Perhaps in the future, we could just disable the current entry from being compressed during the iterator loop, but that would require more work. 2. Fix crash due to wrongly split quicklist before #9357, the offset param of _quicklistSplitNode() will not negative. For now, when offset is negative, the split extent will be wrong. following example: ```c int orig_start = after ? offset + 1 : 0; int orig_extent = after ? -1 : offset; int new_start = after ? 0 : offset; int new_extent = after ? offset + 1 : -1; # offset: -2, after: 1, node->count: 2 # current wrong range: [-1,-1] [0,-1] # correct range: [1,-1] [0, 1] ``` Because only `_quicklistInsert()` splits the quicklistNode and only `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash. But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is always positive), so it is not affected. The final conclusion is this crash only occur when we insert a large element with negative index into a list, that affects `LSET` command and `RM_ListSet` module api. 3. In external test mode, we need to restore quicklist packed threshold after when the end of test. 4. Show `node->count` in quicklistRepr(). 5. Add new tcl proc `config_get_set` to support restoring config in tests.
… split quicklistNode (#11242) This PR mainly deals with 2 crashes introduced in #9357, and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode. 1. Fix crash due to deleting an entry from a compress quicklistNode When inserting a large element, we need to create a new quicklistNode first, and then delete its previous element, if the node where the deleted element is located is compressed, it will cause a crash. Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode after some operation, we can use this flag like following: ```c node->dont_compress = 1; /* Prevent to be compressed */ some_operation(node); /* This operation might try to compress this node */ some_other_operation(node); /* We can use this node without decompress it */ node->dont_compress = 0; /* Re-able compression */ quicklistCompressNode(node); ``` Perhaps in the future, we could just disable the current entry from being compressed during the iterator loop, but that would require more work. 2. Fix crash due to wrongly split quicklist before #9357, the offset param of _quicklistSplitNode() will not negative. For now, when offset is negative, the split extent will be wrong. following example: ```c int orig_start = after ? offset + 1 : 0; int orig_extent = after ? -1 : offset; int new_start = after ? 0 : offset; int new_extent = after ? offset + 1 : -1; # offset: -2, after: 1, node->count: 2 # current wrong range: [-1,-1] [0,-1] # correct range: [1,-1] [0, 1] ``` Because only `_quicklistInsert()` splits the quicklistNode and only `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash. But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is always positive), so it is not affected. The final conclusion is this crash only occur when we insert a large element with negative index into a list, that affects `LSET` command and `RM_ListSet` module api. 3. In external test mode, we need to restore quicklist packed threshold after when the end of test. 4. Show `node->count` in quicklistRepr(). 5. Add new tcl proc `config_get_set` to support restoring config in tests. (cherry picked from commit 13d25dd)
… split quicklistNode (redis#11242) This PR mainly deals with 2 crashes introduced in redis#9357, and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode. 1. Fix crash due to deleting an entry from a compress quicklistNode When inserting a large element, we need to create a new quicklistNode first, and then delete its previous element, if the node where the deleted element is located is compressed, it will cause a crash. Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode after some operation, we can use this flag like following: ```c node->dont_compress = 1; /* Prevent to be compressed */ some_operation(node); /* This operation might try to compress this node */ some_other_operation(node); /* We can use this node without decompress it */ node->dont_compress = 0; /* Re-able compression */ quicklistCompressNode(node); ``` Perhaps in the future, we could just disable the current entry from being compressed during the iterator loop, but that would require more work. 2. Fix crash due to wrongly split quicklist before redis#9357, the offset param of _quicklistSplitNode() will not negative. For now, when offset is negative, the split extent will be wrong. following example: ```c int orig_start = after ? offset + 1 : 0; int orig_extent = after ? -1 : offset; int new_start = after ? 0 : offset; int new_extent = after ? offset + 1 : -1; # offset: -2, after: 1, node->count: 2 # current wrong range: [-1,-1] [0,-1] # correct range: [1,-1] [0, 1] ``` Because only `_quicklistInsert()` splits the quicklistNode and only `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash. But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is always positive), so it is not affected. The final conclusion is this crash only occur when we insert a large element with negative index into a list, that affects `LSET` command and `RM_ListSet` module api. 3. In external test mode, we need to restore quicklist packed threshold after when the end of test. 4. Show `node->count` in quicklistRepr(). 5. Add new tcl proc `config_get_set` to support restoring config in tests.
… split quicklistNode (redis#11242) This PR mainly deals with 2 crashes introduced in redis#9357, and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode. 1. Fix crash due to deleting an entry from a compress quicklistNode When inserting a large element, we need to create a new quicklistNode first, and then delete its previous element, if the node where the deleted element is located is compressed, it will cause a crash. Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode after some operation, we can use this flag like following: ```c node->dont_compress = 1; /* Prevent to be compressed */ some_operation(node); /* This operation might try to compress this node */ some_other_operation(node); /* We can use this node without decompress it */ node->dont_compress = 0; /* Re-able compression */ quicklistCompressNode(node); ``` Perhaps in the future, we could just disable the current entry from being compressed during the iterator loop, but that would require more work. 2. Fix crash due to wrongly split quicklist before redis#9357, the offset param of _quicklistSplitNode() will not negative. For now, when offset is negative, the split extent will be wrong. following example: ```c int orig_start = after ? offset + 1 : 0; int orig_extent = after ? -1 : offset; int new_start = after ? 0 : offset; int new_extent = after ? offset + 1 : -1; # offset: -2, after: 1, node->count: 2 # current wrong range: [-1,-1] [0,-1] # correct range: [1,-1] [0, 1] ``` Because only `_quicklistInsert()` splits the quicklistNode and only `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash. But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is always positive), so it is not affected. The final conclusion is this crash only occur when we insert a large element with negative index into a list, that affects `LSET` command and `RM_ListSet` module api. 3. In external test mode, we need to restore quicklist packed threshold after when the end of test. 4. Show `node->count` in quicklistRepr(). 5. Add new tcl proc `config_get_set` to support restoring config in tests.
Following #12568 In issue #9357, when inserting an element larger than 1GB, we currently store it in a plain node instead of a listpack. Presently, when we insert an element that exceeds the maximum size of a packed node, it cannot be accommodated in any other nodes, thus ending up isolated like a large element. I.e. it's a node with only one element, but it's listpack encoded rather than a plain buffer. This PR lowers the threshold for considering an element as 'large' from 1GB to the maximum size of a node. While this change doesn't completely resolve the bug mentioned in the previous PR, it does mitigate its potential impact. As a result of this change, we can now only use LSET to replace an element with another element that falls below the maximum size threshold. In the worst-case scenario, with a fill of -5, the largest packed node we can create is 2GB (32k * 64k): * 32k: The smallest element in a listpack is 2 bytes, which allows us to store up to 32k elements. * 64k: This is the maximum size for a single quicklist node. ## Others To fully fix #9357, we need more work, as discussed in #12568, when we insert an element into a quicklistNode, it may be created in a new node, put into another node, or merged, and we can't correctly delete the node that was supposed to be deleted. I'm not sure it's worth it, since it involves a lot of modifications.
…12659) Following redis#12568 In issue redis#9357, when inserting an element larger than 1GB, we currently store it in a plain node instead of a listpack. Presently, when we insert an element that exceeds the maximum size of a packed node, it cannot be accommodated in any other nodes, thus ending up isolated like a large element. I.e. it's a node with only one element, but it's listpack encoded rather than a plain buffer. This PR lowers the threshold for considering an element as 'large' from 1GB to the maximum size of a node. While this change doesn't completely resolve the bug mentioned in the previous PR, it does mitigate its potential impact. As a result of this change, we can now only use LSET to replace an element with another element that falls below the maximum size threshold. In the worst-case scenario, with a fill of -5, the largest packed node we can create is 2GB (32k * 64k): * 32k: The smallest element in a listpack is 2 bytes, which allows us to store up to 32k elements. * 64k: This is the maximum size for a single quicklist node. ## Others To fully fix redis#9357, we need more work, as discussed in redis#12568, when we insert an element into a quicklistNode, it may be created in a new node, put into another node, or merged, and we can't correctly delete the node that was supposed to be deleted. I'm not sure it's worth it, since it involves a lot of modifications.
Redis lists are stored in quicklist, which is currently a linked list of ziplists.
Ziplists are limited to storing elements no larger than 4GB, so when bigger items are added they're getting truncated.
This PR changes quicklists so that they're capable of storing large items in quicklist nodes that are plain string buffers rather than ziplist.
As part of the PR there were few other changes in redis: