Skip to content

Add support for list type to store elements larger than 4GB#9357

Merged
oranagra merged 59 commits intoredis:unstablefrom
perryitay:RED-59009-add-elements-larger-then-4gb
Nov 3, 2021
Merged

Add support for list type to store elements larger than 4GB#9357
oranagra merged 59 commits intoredis:unstablefrom
perryitay:RED-59009-add-elements-larger-then-4gb

Conversation

@perryitay
Copy link
Contributor

@perryitay perryitay commented Aug 11, 2021

Redis lists are stored in quicklist, which is currently a linked list of ziplists.
Ziplists are limited to storing elements no larger than 4GB, so when bigger items are added they're getting truncated.
This PR changes quicklists so that they're capable of storing large items in quicklist nodes that are plain string buffers rather than ziplist.

As part of the PR there were few other changes in redis:

  1. new DEBUG sub-commands:
    • QUICKLIST-PACKED-THRESHOLD - set the threshold of for the node type to be plan or ziplist. default (1GB)
    • QUICKLIST - Shows low level info about the quicklist encoding of
  2. rdb format change:
    • A new type was added - RDB_TYPE_LIST_QUICKLIST_2 .
    • container type (packed / plain) was added to the beginning of the rdb object (before the actual node list).
  3. testing:
    • Tests that requires over 100MB will be by default skipped. a new flag was added to 'runtest' to run the large memory tests (not used by default)

@perryitay perryitay closed this Aug 11, 2021
@perryitay perryitay reopened this Aug 11, 2021
@perryitay perryitay changed the title Red 59009 add elements larger then 4gb refactor internal implementation of Redis list in order to add elements larger then 4gb Aug 11, 2021
@perryitay
Copy link
Contributor Author

in this pr I refactored the internal implementation of Redis list in order to add elements larger then 4gb. curretnly redis list is built on quicklist data structure, every node in the quicklist is a ziplist, after my refactor if the the entry is bigger then 4gb it will be added to the quicklist node as is.

@oranagra oranagra marked this pull request as draft August 11, 2021 13:12
@sundb
Copy link
Collaborator

sundb commented Aug 11, 2021

@perryitay I took a cursory look at the code.

  1. I'm not sure if we really need to support 4g nodes, such large data should be more suitable for files.
  2. In order to support 4g, I think it would be more appropriate to add a new container type for quicklsit, rather than just using QUICKLIST_NODE_CONTAINER_NONE, the way you implemented it makes the code confusing.

@oranagra
Copy link
Member

@sundb you mean add QUICKLIST_NODE_CONTAINER_SDS rather than use QUICKLIST_NODE_CONTAINER_NONE?
so what does NONE stands for?
they way i looked at it, sds is not a container of multiple records (unlike ziplist / listpack)

i think we do want to support large nodes in a list, same as we support large ones in hash.

@sundb
Copy link
Collaborator

sundb commented Aug 11, 2021

@oranagra Yes, I think QUICKLIST_NODE_CONTAINER_SDS would be more appropriate.
QUICKLIST_NODE_CONTAINER_NONE should only be for future (for now) multi-container support (as you said to me once, and I removed it), just like having quicklist support both ziplist and listpack.
We can dynamically convert its container type in a node, converting the container to SDS when it is larger than a certain threshold.

@oranagra
Copy link
Member

oranagra commented Aug 11, 2021

@sundb i don't follow you.
IIRC you wanted to completely drop the container field (i.e. unsigned int container : 2; /* NONE==1 or ZIPLIST==2 */), and i said that we better keep is so we can represent a case where one quicklist node is zliplist and another one is listpack, in that sense sds is the 3rd. and note that either one of these can be RAW, or LZF (the int encoding field).
But i don't see how this discussion applies on whatever SDS is considered an SDS "container" or a NONE "container".
anyway, it's just an enum.. we can change it at any point. (unless i'm completely missing your point)

@sundb
Copy link
Collaborator

sundb commented Aug 12, 2021

@oranagra Yes, unrelated to naming, I left out my ultimate intention.
Because if we modify the code directly with QUICKLIST_NODE_CONTAINER_NONE (or QUICKLIST_NODE_CONTAINER_SDS), we might end up with a lot of ifs in the quicklist, which would make the quicklist unmaintainable.
I would like to add a layer of adapters, such as QuicklistContainerSds or QuciklistContainerZiplist, to reduce the number of ifs.

@oranagra
Copy link
Member

@sundb i'm not sure i follow you again. are you referring to a common interface (struct with pointers) to reduce the ifs?
maybe you wanna draft a quick pseudo code example?

I don't think that's really that important, there aren't that many ifs here.
i think the main complexity of quicklist is about splitting, joining and iterating, inserting and deleting elements inside its containers. but in this case the node is not a container, ti's a plain one element in one node, so the modifications and "early exit" ifs are not reaching the complex parts.

@sundb
Copy link
Collaborator

sundb commented Aug 12, 2021

@sundb i'm not sure i follow you again. are you referring to a common interface (struct with pointers) to reduce the ifs?
maybe you wanna draft a quick pseudo code example?

This is what I mean. I tried it during the qucilist ziplist->listpack migration, maybe we can wait for the finished version of @perryitay.

@perryitay, This is a version of my previous implementation.
https://github.com/sundb/redis/blob/listpack-migration-quicklist/src/quicklist.h
https://github.com/sundb/redis/blob/listpack-migration-quicklist/src/quicklist.c

@oranagra
Copy link
Member

yeah, that's what i imagined (quicklistContainerType), i think that case is very different because the work with the container is a lot more complicated (insertBefore, insertAfter, merge, etc).

ironically, this also gives us a good hint that QUICKLIST_NODE_CONTAINER_NONE is not a bad name after all...

@oranagra oranagra changed the title refactor internal implementation of Redis list in order to add elements larger then 4gb Add support for list type to store elements larger than 4GB Aug 15, 2021
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial review.
the two important aspects are maybe:

  1. terminology: maybe if we'll call the non-ziplist nodes "plain" instead of "none" it'll be easier to refer to them anywhere in the code.
  2. i feel that we didn't handle compressed nodes correctly, not sure how well this is covered by the tests.

@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Aug 16, 2021
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i saw a commit so i reviewed it. i now realize it's half baked, but since i already wrote down some comments i'll post them.

@oranagra
Copy link
Member

oranagra commented Nov 3, 2021

@sundb @perryitay what's the status now? i see all comments are addressed.. let me know when this is ready to be merged.

@oranagra
Copy link
Member

oranagra commented Nov 3, 2021

@sundb
Copy link
Collaborator

sundb commented Nov 3, 2021

@oranagra LGTM. I have no more comments.

@oranagra
Copy link
Member

oranagra commented Nov 3, 2021

@perryitay there are some walgrind warnings for the unit tests. please fix them and we can finally merge this

@oranagra oranagra merged commit f27083a into redis:unstable Nov 3, 2021
oranagra pushed a commit that referenced this pull request Nov 16, 2021
Redis supports inserting data over 4GB into string (and recently for lists too, see #9357),
But LZF compression used in RDB files (see `rdbcompression` config), and in quicklist
(see `list-compress-depth` config) does not support compress/decompress data over
UINT32_MAX, which will result in corrupting the rdb after compression.

Internal changes:
1. Modify the `unsigned int` parameter of `lzf_compress/lzf_decompress` to `size_t`.
2. Modify the variable types in `lzf_compress` involving offsets and lengths to `size_t`.
3. Set LZF_USE_OFFSETS to 0.
    When LZF_USE_OFFSETS is 1, lzf store offset into `LZF_HSLOT`(32bit). 
    Even in 64-bit, `LZF_USE_OFFSETS` defaults to 1, because lzf assumes that it only
    compresses and decompresses data smaller than UINT32_MAX.
    But now we need to make lzf support 64-bit, turning on `LZF_USE_OFFSETS` will make
    it impossible to store 64-bit offsets or pointers.
    BTW, disable LZF_USE_OFFSETS also brings a few performance improvements.

Tests:
1. Add test for compress/decompress string large than UINT32_MAX.
2. Add unittest for compress/decompress quicklistNode.
oranagra added a commit that referenced this pull request Nov 24, 2021
Part three of implementing #8702, following #8887 and #9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`)
2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ).
    It represent that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove `createZiplistObject` method, which is never used.
5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`.
    We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy data.

## Bugfix
1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366.

## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <[email protected]>
hwware pushed a commit to hwware/redis that referenced this pull request Dec 20, 2021
Part three of implementing redis#8702, following redis#8887 and redis#9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`)
2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following redis#9357 ).
    It represent that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove `createZiplistObject` method, which is never used.
5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`.
    We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at redis#9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy data.

## Bugfix
1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from redis#9366.

## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <[email protected]>
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Feb 4, 2022
Change the sentinel config file to a directory in SENTINEL SET test.
So it will now fail on the `rename` in `rewriteConfigOverwriteFile`.

The test used to set the sentinel config file permissions to `000` to
simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in redis#10151)

Other changes:
1. More error messages after the config rewrite failure.
2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in redis#9304)
3. Fix a typo in debug quicklist-packed-threshold, then -> than. (redis#9357)
oranagra pushed a commit that referenced this pull request Feb 4, 2022
Change the sentinel config file to a directory in SENTINEL SET test.
So it will now fail on the `rename` in `rewriteConfigOverwriteFile`.

The test used to set the sentinel config file permissions to `000` to
simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in #10151)

Other changes:
1. More error messages after the config rewrite failure.
2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in #9304)
3. Fix a typo in debug quicklist-packed-threshold, then -> than. (#9357)
oranagra pushed a commit that referenced this pull request Sep 19, 2022
… split quicklistNode (#11242)

This PR mainly deals with 2 crashes introduced in #9357,
and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode.

1. Fix crash due to deleting an entry from a compress quicklistNode
   When inserting a large element, we need to create a new quicklistNode first,
   and then delete its previous element, if the node where the deleted element is
   located is compressed, it will cause a crash.
   Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode
   after some operation, we can use this flag like following:

    ```c
    node->dont_compress = 1; /* Prevent to be compressed */
    some_operation(node); /* This operation might try to compress this node */
    some_other_operation(node); /* We can use this node without decompress it */
    node->dont_compress = 0; /* Re-able compression */
    quicklistCompressNode(node);
    ```

   Perhaps in the future, we could just disable the current entry from being
   compressed during the iterator loop, but that would require more work.

2. Fix crash due to wrongly split quicklist
   before #9357, the offset param of _quicklistSplitNode() will not negative.
   For now, when offset is negative, the split extent will be wrong.
   following example:
    ```c
    int orig_start = after ? offset + 1 : 0;
    int orig_extent = after ? -1 : offset;
    int new_start = after ? 0 : offset;
    int new_extent = after ? offset + 1 : -1;
    # offset: -2, after: 1, node->count: 2
    # current wrong range: [-1,-1] [0,-1]
    # correct range: [1,-1] [0, 1]
    ```

   Because only `_quicklistInsert()` splits the quicklistNode and only
   `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), 
   so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash.
   But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is
   always positive), so it is not affected.
   The final conclusion is this crash only occur when we insert a large element
   with negative index into a list, that affects `LSET` command and `RM_ListSet`
   module api.
     
3. In external test mode, we need to restore quicklist packed threshold after
   when the end of test.
4. Show `node->count` in quicklistRepr().
5. Add new tcl proc `config_get_set` to support restoring config in tests.
oranagra pushed a commit that referenced this pull request Sep 21, 2022
… split quicklistNode (#11242)

This PR mainly deals with 2 crashes introduced in #9357,
and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode.

1. Fix crash due to deleting an entry from a compress quicklistNode
   When inserting a large element, we need to create a new quicklistNode first,
   and then delete its previous element, if the node where the deleted element is
   located is compressed, it will cause a crash.
   Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode
   after some operation, we can use this flag like following:

    ```c
    node->dont_compress = 1; /* Prevent to be compressed */
    some_operation(node); /* This operation might try to compress this node */
    some_other_operation(node); /* We can use this node without decompress it */
    node->dont_compress = 0; /* Re-able compression */
    quicklistCompressNode(node);
    ```

   Perhaps in the future, we could just disable the current entry from being
   compressed during the iterator loop, but that would require more work.

2. Fix crash due to wrongly split quicklist
   before #9357, the offset param of _quicklistSplitNode() will not negative.
   For now, when offset is negative, the split extent will be wrong.
   following example:
    ```c
    int orig_start = after ? offset + 1 : 0;
    int orig_extent = after ? -1 : offset;
    int new_start = after ? 0 : offset;
    int new_extent = after ? offset + 1 : -1;
    # offset: -2, after: 1, node->count: 2
    # current wrong range: [-1,-1] [0,-1]
    # correct range: [1,-1] [0, 1]
    ```

   Because only `_quicklistInsert()` splits the quicklistNode and only
   `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), 
   so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash.
   But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is
   always positive), so it is not affected.
   The final conclusion is this crash only occur when we insert a large element
   with negative index into a list, that affects `LSET` command and `RM_ListSet`
   module api.
     
3. In external test mode, we need to restore quicklist packed threshold after
   when the end of test.
4. Show `node->count` in quicklistRepr().
5. Add new tcl proc `config_get_set` to support restoring config in tests.

(cherry picked from commit 13d25dd)
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
… split quicklistNode (redis#11242)

This PR mainly deals with 2 crashes introduced in redis#9357,
and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode.

1. Fix crash due to deleting an entry from a compress quicklistNode
   When inserting a large element, we need to create a new quicklistNode first,
   and then delete its previous element, if the node where the deleted element is
   located is compressed, it will cause a crash.
   Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode
   after some operation, we can use this flag like following:

    ```c
    node->dont_compress = 1; /* Prevent to be compressed */
    some_operation(node); /* This operation might try to compress this node */
    some_other_operation(node); /* We can use this node without decompress it */
    node->dont_compress = 0; /* Re-able compression */
    quicklistCompressNode(node);
    ```

   Perhaps in the future, we could just disable the current entry from being
   compressed during the iterator loop, but that would require more work.

2. Fix crash due to wrongly split quicklist
   before redis#9357, the offset param of _quicklistSplitNode() will not negative.
   For now, when offset is negative, the split extent will be wrong.
   following example:
    ```c
    int orig_start = after ? offset + 1 : 0;
    int orig_extent = after ? -1 : offset;
    int new_start = after ? 0 : offset;
    int new_extent = after ? offset + 1 : -1;
    # offset: -2, after: 1, node->count: 2
    # current wrong range: [-1,-1] [0,-1]
    # correct range: [1,-1] [0, 1]
    ```

   Because only `_quicklistInsert()` splits the quicklistNode and only
   `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), 
   so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash.
   But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is
   always positive), so it is not affected.
   The final conclusion is this crash only occur when we insert a large element
   with negative index into a list, that affects `LSET` command and `RM_ListSet`
   module api.
     
3. In external test mode, we need to restore quicklist packed threshold after
   when the end of test.
4. Show `node->count` in quicklistRepr().
5. Add new tcl proc `config_get_set` to support restoring config in tests.
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
… split quicklistNode (redis#11242)

This PR mainly deals with 2 crashes introduced in redis#9357,
and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode.

1. Fix crash due to deleting an entry from a compress quicklistNode
   When inserting a large element, we need to create a new quicklistNode first,
   and then delete its previous element, if the node where the deleted element is
   located is compressed, it will cause a crash.
   Now add `dont_compress` to quicklistNode, if we want to use a quicklistNode
   after some operation, we can use this flag like following:

    ```c
    node->dont_compress = 1; /* Prevent to be compressed */
    some_operation(node); /* This operation might try to compress this node */
    some_other_operation(node); /* We can use this node without decompress it */
    node->dont_compress = 0; /* Re-able compression */
    quicklistCompressNode(node);
    ```

   Perhaps in the future, we could just disable the current entry from being
   compressed during the iterator loop, but that would require more work.

2. Fix crash due to wrongly split quicklist
   before redis#9357, the offset param of _quicklistSplitNode() will not negative.
   For now, when offset is negative, the split extent will be wrong.
   following example:
    ```c
    int orig_start = after ? offset + 1 : 0;
    int orig_extent = after ? -1 : offset;
    int new_start = after ? 0 : offset;
    int new_extent = after ? offset + 1 : -1;
    # offset: -2, after: 1, node->count: 2
    # current wrong range: [-1,-1] [0,-1]
    # correct range: [1,-1] [0, 1]
    ```

   Because only `_quicklistInsert()` splits the quicklistNode and only
   `quicklistInsertAfter()`, `quicklistInsertBefore()` call _quicklistInsert(), 
   so `quicklistReplaceEntry()` and `listTypeInsert()` might occur this crash.
   But the iterator of `listTypeInsert()` is alway from head to tail(iter->offset is
   always positive), so it is not affected.
   The final conclusion is this crash only occur when we insert a large element
   with negative index into a list, that affects `LSET` command and `RM_ListSet`
   module api.
     
3. In external test mode, we need to restore quicklist packed threshold after
   when the end of test.
4. Show `node->count` in quicklistRepr().
5. Add new tcl proc `config_get_set` to support restoring config in tests.
oranagra pushed a commit that referenced this pull request Feb 22, 2024
Following #12568

In issue #9357, when inserting an element larger than 1GB, we currently
store it in a plain node instead of a listpack.
Presently, when we insert an element that exceeds the maximum size of a
packed node, it cannot be accommodated in any other nodes, thus ending
up isolated like a large element.
I.e. it's a node with only one element, but it's listpack encoded rather
than a plain buffer.

This PR lowers the threshold for considering an element as 'large' from
1GB to the maximum size of a node.
While this change doesn't completely resolve the bug mentioned in the
previous PR, it does mitigate its potential impact.

As a result of this change, we can now only use LSET to replace an
element with another element that falls below the maximum size
threshold.
In the worst-case scenario, with a fill of -5, the largest packed node
we can create is 2GB (32k * 64k):
* 32k: The smallest element in a listpack is 2 bytes, which allows us to
store up to 32k elements.
* 64k: This is the maximum size for a single quicklist node.

## Others
To fully fix #9357, we need more work, as discussed in #12568, when we
insert an element into a quicklistNode, it may be created in a new node,
put into another node, or merged, and we can't correctly delete the node
that was supposed to be deleted.
I'm not sure it's worth it, since it involves a lot of modifications.
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
…12659)

Following redis#12568

In issue redis#9357, when inserting an element larger than 1GB, we currently
store it in a plain node instead of a listpack.
Presently, when we insert an element that exceeds the maximum size of a
packed node, it cannot be accommodated in any other nodes, thus ending
up isolated like a large element.
I.e. it's a node with only one element, but it's listpack encoded rather
than a plain buffer.

This PR lowers the threshold for considering an element as 'large' from
1GB to the maximum size of a node.
While this change doesn't completely resolve the bug mentioned in the
previous PR, it does mitigate its potential impact.

As a result of this change, we can now only use LSET to replace an
element with another element that falls below the maximum size
threshold.
In the worst-case scenario, with a fill of -5, the largest packed node
we can create is 2GB (32k * 64k):
* 32k: The smallest element in a listpack is 2 bytes, which allows us to
store up to 32k elements.
* 64k: This is the maximum size for a single quicklist node.

## Others
To fully fix redis#9357, we need more work, as discussed in redis#12568, when we
insert an element into a quicklistNode, it may be created in a new node,
put into another node, or merged, and we can't correctly delete the node
that was supposed to be deleted.
I'm not sure it's worth it, since it involves a lot of modifications.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

8 participants