Quicklist (linked list + ziplist)#2143
Conversation
4cda935 to
99bab4a
Compare
src/quicklist.c
Outdated
There was a problem hiding this comment.
The compiler warns that this variable may be used unintialized. Of course it isn't, as target will always be either a or b, but we could surpress this warning by just initalizing where anyway.
There was a problem hiding this comment.
The compiler
Which compiler/version? It didn't complain under my menagerie of compilers yet.
this variable
Maybe I don't know how to use GitHub, but I see a 26 line diff attached to this comment (and all the comments actually), so it's difficult to tell what "this variable" is. :-\ (obviously next I see you mean where, but the github code reply doesn't make it clear. boo github. Though, on second look, it seems the last line in the diff is the comment target? I'll assume that going forward. still, boo github.)
surpress this warning by just initalizing where anyway.
Good point! I'll add it.
There was a problem hiding this comment.
Ah yes. Comments are always made below the line I meant. GitHub could really show this better.
There was a problem hiding this comment.
[~]% gcc --version
gcc (GCC) 4.9.2
There was a problem hiding this comment.
Ah, that's the difference. My Linux testing box was 4.9.0. I've got a 4.9.2 on OS X and I see your extra warnings now. Fixing.
|
Impressive work. I read the code all in once and it was quite clear and concise. I saw no immediate bugs. Also great you have extensive test code in there (nearly half the new code is tests). I'm sure we can get this into unstable soon. |
38f2a2b to
c36c9d5
Compare
|
Thanks for the review! I incorporated your fixes (plus a few more cleanups) into the existing commit. The diff is below since it's difficult to see changes between rebased forced pushes. I don't see any warnings against clang or gcc-4.9.2 anymore, but feel free to try more compilers and tests. The more problems we find now the fewer cleanup commits we have to apply later. :) matt@ununoctium:~/repos/redis/src% git diff
diff --git a/src/quicklist.c b/src/quicklist.c
index bb29121..081296d 100644
--- a/src/quicklist.c
+++ b/src/quicklist.c
@@ -28,12 +28,15 @@
* POSSIBILITY OF SUCH DAMAGE.
*/
-#include <stdlib.h>
-#include <stdio.h> /* for snprintf */
#include <string.h> /* for memcpy */
#include "quicklist.h"
#include "zmalloc.h"
#include "ziplist.h"
+#include "util.h" /* for ll2string */
+
+#if defined(REDIS_TEST) || defined(REDIS_TEST_VERBOSE)
+#include <stdio.h> /* for printf (debug printing), snprintf (genstr) */
+#endif
/* If not verbose testing, remove all debug printing. */
#ifndef REDIS_TEST_VERBOSE
@@ -323,15 +326,14 @@ static quicklistNode *_quicklistZiplistMerge(quicklist *quicklist,
target->count);
int where;
-
unsigned char *p = NULL;
if (target == a) {
/* If target is node a, we append node b to node a, in-order */
where = ZIPLIST_TAIL;
p = ziplistIndex(b->zl, 0);
D("WILL TRAVERSE B WITH LENGTH: %u, %u", b->count, ziplistLen(b->zl));
- } else if (target == b) {
- /* If target b, we pre-pend node a to node b, in reverse order of a */
+ } else {
+ /* If target b, we prepend node a to node b, in reverse order of a */
where = ZIPLIST_HEAD;
p = ziplistIndex(a->zl, -1);
D("WILL TRAVERSE A WITH LENGTH: %u, %u", a->count, ziplistLen(a->zl));
@@ -347,7 +349,7 @@ static quicklistNode *_quicklistZiplistMerge(quicklist *quicklist,
* complex than using the existing ziplist API to read/push as below. */
while (ziplistGet(p, &val, &sz, &longval)) {
if (!val) {
- sz = snprintf(lv, sizeof(lv), "%lld", longval);
+ sz = ll2string(lv, sizeof(lv), longval);
val = (unsigned char *)lv;
}
target->zl = ziplistPush(target->zl, val, sz, where);
@@ -747,7 +749,7 @@ int quicklistNext(quicklistIter *iter, quicklistEntry *entry) {
return 0;
}
- unsigned char *(*nextFn)(unsigned char *, unsigned char*) = NULL;
+ unsigned char *(*nextFn)(unsigned char *, unsigned char *) = NULL;
int offset_update = 0;
if (!iter->zi) {
@@ -842,7 +844,7 @@ int quicklistIndex(const quicklist *quicklist, const long long idx,
quicklistNode *n;
unsigned long long accum = 0;
unsigned long long index;
- int forward = idx < 0 ? 0 : 1; /* < 0 -> reverse, positive -> forward */
+ int forward = idx < 0 ? 0 : 1; /* < 0 -> reverse, 0+ -> forward */
initEntry(entry);
entry->quicklist = quicklist;
@@ -862,7 +864,8 @@ int quicklistIndex(const quicklist *quicklist, const long long idx,
if ((accum + n->count) > index) {
break;
} else {
- D("Skipping over (%p) %u at accum %lld", n, n->count, accum);
+ D("Skipping over (%p) %u at accum %lld", (void *)n, n->count,
+ accum);
accum += n->count;
n = forward ? n->next : n->prev;
}
@@ -871,8 +874,8 @@ int quicklistIndex(const quicklist *quicklist, const long long idx,
if (!n)
return 0;
- D("Found node: %p at accum %llu, idx %llu, sub+ %llu, sub- %llu", n, accum,
- index, index - accum, (-index) - 1 + accum);
+ D("Found node: %p at accum %llu, idx %llu, sub+ %llu, sub- %llu", (void *)n,
+ accum, index, index - accum, (-index) - 1 + accum);
entry->node = n;
if (forward) {
@@ -906,7 +909,7 @@ void quicklistRotate(quicklist *quicklist, const size_t fill) {
/* If value found is NULL, then ziplistGet populated longval instead */
if (!value) {
/* Write the longval as a string so we can re-add it */
- int wrote = snprintf(longstr, sizeof(longstr), "%lld", longval);
+ int wrote = ll2string(longstr, sizeof(longstr), longval);
value = (unsigned char *)longval;
sz = wrote;
}
@@ -1017,15 +1020,16 @@ void quicklistPush(quicklist *quicklist, const size_t fill, void *value,
/* The rest of this file is test cases and test helpers. */
#ifdef REDIS_TEST
-#include <stdio.h>
#include <stdint.h>
#define assert(_e) \
- ((_e) ? (void)0 : (_assert(#_e, __FILE__, __LINE__), exit(1)))
-static void _assert(char *estr, char *file, int line) {
- printf("\n\n=== ASSERTION FAILED ===\n");
- printf("==> %s:%d '%s' is not true\n", file, line, estr);
-}
+ do { \
+ if (!(_e)) { \
+ printf("\n\n=== ASSERTION FAILED ===\n"); \
+ printf("==> %s:%d '%s' is not true\n", __FILE__, __LINE__, #_e); \
+ err++; \
+ } \
+ } while (0)
#define yell(str, ...) printf("ERROR! " str "\n\n", __VA_ARGS__)
@@ -1718,7 +1722,7 @@ int quicklistTest(int argc, char *argv[]) {
long long nums[5000];
for (int i = 0; i < 5000; i++) {
nums[i] = -5157318210846258176 + i;
- int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+ int sz = ll2string(num, sizeof(num), nums[i]);
quicklistPushTail(ql, F, num, sz);
}
quicklistPushTail(ql, F, "xxxxxxxxxxxxxxxxxxxx", 20);
@@ -1876,7 +1880,7 @@ int quicklistTest(int argc, char *argv[]) {
long long nums[5000];
for (int i = 0; i < 760; i++) {
nums[i] = -5157318210846258176 + i;
- int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+ int sz = ll2string(num, sizeof(num), nums[i]);
quicklistPushTail(ql, f, num, sz);
}
@@ -1901,7 +1905,7 @@ int quicklistTest(int argc, char *argv[]) {
long long nums[5000];
for (int i = 0; i < 32; i++) {
nums[i] = -5157318210846258176 + i;
- int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+ int sz = ll2string(num, sizeof(num), nums[i]);
quicklistPushTail(ql, f, num, sz);
}
if (f == 32)
@@ -1929,7 +1933,7 @@ int quicklistTest(int argc, char *argv[]) {
long long nums[5000];
for (int i = 0; i < 33; i++) {
nums[i] = i;
- int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+ int sz = ll2string(num, sizeof(num), nums[i]);
quicklistPushTail(ql, f, num, sz);
}
if (f == 32)
@@ -1973,7 +1977,7 @@ int quicklistTest(int argc, char *argv[]) {
long long nums[5000];
for (int i = 0; i < 33; i++) {
nums[i] = -5157318210846258176 + i;
- int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+ int sz = ll2string(num, sizeof(num), nums[i]);
quicklistPushTail(ql, f, num, sz);
}
if (f == 32)
@@ -1999,7 +2003,7 @@ int quicklistTest(int argc, char *argv[]) {
long long nums[5000];
for (int i = 0; i < 33; i++) {
nums[i] = -5157318210846258176 + i;
- int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+ int sz = ll2string(num, sizeof(num), nums[i]);
quicklistPushTail(ql, f, num, sz);
}
if (f == 32) |
src/quicklist.c
Outdated
There was a problem hiding this comment.
I tried to create the new operation with __ziplistCascadeUpdate().
It may be useful to rewrite this part. :)
commit
src/quicklist.c
Outdated
There was a problem hiding this comment.
If center and center->prev doesn't merge, center and center->next will definitely not merge?
There was a problem hiding this comment.
Let's draw it! The numbers at the top boxes are the count for that node.
+------------+ +-----------+ +------------+ +-------------+ +------------+
| 1 | | 2 | | 3 | | 4 | | 5 |
| | | | | | | | | |
| prev->prev | | prev | | center | | next | | next->next |
| | | | | | | | | |
| | | | | | | | | |
+------------+ +-----------+ +------------+ +-------------+ +------------+
+------------------+ +--------------------+
| | | |
| | | |
| | | |
|(prev->prev)+prev | | next+(next->next) |
| | | |
| (1 + 2) = 3 | | (4 + 5) = 9 |
| | | |
+------------------+ +--------------------+
(this is new 'prev') (this is new 'next')
+-------------------+
| |
| |
| prev+center |
| |
| (3 + 3) = 6 |
| |
+-------------------+
(this now 'target',
since it potentially
invalidated 'center'
during the merge)
+------------------------+
| |
| |
| target+next |
| |
| (6 + 9) = 15 |
| |
| |
+------------------------+
now all five original nodes
are merged into one node of
length 15So, the case you noted is:
If center and center->prev doesn't merge, center and center->next will definitely not merge?
That's a good point since target is only created if prev is merged. We need to set target = center if the (center+center->prev) doesn't happen. I'll add it.
Thanks!
|
Updated with below diff thanks to sunheehnus at https://github.com/antirez/redis/pull/2143#discussion_r20757598! The fix also caught an invalid check in a test case. Now that we're merging quicklist nodes more properly, we use fewer nodes in a test (26 -> 25) and that's a good thing. diff --git a/src/quicklist.c b/src/quicklist.c
index dbb2dae..c27827b 100644
--- a/src/quicklist.c
+++ b/src/quicklist.c
@@ -380,6 +380,9 @@ static void _quicklistMergeNodes(quicklist *quicklist, const size_t fill,
if (center->prev && (center->count + center->prev->count) <= fill) {
target = _quicklistZiplistMerge(quicklist, center->prev, center);
center = NULL; /* center could have been deleted, invalidate it. */
+ } else {
+ /* If can't merge here, target still needs to be valid for below. */
+ target = center;
}
/* Use result of center merge to try and merge result with next node. */
@@ -1467,7 +1470,7 @@ int quicklistTest(int argc, char *argv[]) {
quicklistInsertBefore(ql, f, &entry, genstr("abc", i), 32);
}
if (f == 32)
- ql_verify(ql, 26, 750, 32, 20);
+ ql_verify(ql, 25, 750, 32, 20);
quicklistRelease(ql);
}
} |
|
Love the feature and that there are even reviews here. Thank you. Tomorrow I'll be back in Sicily from Bracelona, and Monday I'll review the code as well. |
src/quicklist.c
Outdated
There was a problem hiding this comment.
Forget to free new_node :)
|
Updated: diff --git a/src/quicklist.c b/src/quicklist.c
index 664a6c6..7a2490c 100644
--- a/src/quicklist.c
+++ b/src/quicklist.c
@@ -393,10 +393,22 @@ static void _quicklistMergeNodes(quicklist *quicklist, const size_t fill,
}
}
-/* Split 'node' at 'offset' into two parts.
+/* Split 'node' into two parts, parameterized by 'offset' and 'after'.
+ *
+ * The 'after' argument controls which quicklistNode gets returned.
+ * If 'after'==1, returned node has elements after 'offset'.
+ * input node keeps elements up to 'offset', including 'offset'.
+ * If 'after'==0, returned node has elements up to 'offset', including 'offset'.
+ * input node keeps elements after 'offset'.
+ *
+ * If 'after'==1, returned node will have elements _after_ 'offset'.
+ * The returned node will have elements [OFFSET+1, END].
+ * The input node keeps elements [0, OFFSET].
+ *
+ * If 'after'==0, returned node will keep elements up to and including 'offset'.
+ * The returned node will have elements [0, OFFSET].
+ * The input node keeps elements [OFFSET+1, END].
*
- * If after==1, then the returned node has elements [OFFSET, END].
- * Otherwise if after==0, then the new node has [0, OFFSET)
* The input node keeps all elements not taken by the returned node.
*
* Returns newly created node or NULL if split not possible. */
@@ -409,8 +421,10 @@ static quicklistNode *_quicklistSplitNode(quicklistNode *node, int offset,
return NULL;
new_node->zl = zmalloc(zl_sz);
- if (!new_node->zl)
+ if (!new_node->zl) {
+ zfree(new_node);
return NULL;
+ }
/* Copy original ziplist so we can split it */
memcpy(new_node->zl, node->zl, zl_sz);Thanks for the ongoing reviews! |
src/quicklist.c
Outdated
There was a problem hiding this comment.
Without fill limitation?
There was a problem hiding this comment.
That's a great observation!
We need to rename quicklistPushTailZiplist(quicklist, ziplist) to something like quicklistAppendValuesFromZiplist(quicklist, fill, ziplist).
Then we just:
- iterate over the existing ziplist
- insert each element from the ziplist into tail of the quicklist (
quicklistPushTail())
- insert each element from the ziplist into tail of the quicklist (
- then free the original ziplist
Adding each value from the old ziplist individually will create nodes with the proper fill for this instance. (this is only used when users restore an old RDB with a native ziplist encoding. it'll help them rebuild their single (maybe large 512+) entry ziplist into a quicklist with a smaller fill factor.)
I'll add it to the todo list.
Freeing our test lists helps keep valgrind output clean
Fill factor now has two options:
- negative (1-5) for size-based ziplist filling
- positive for length-based ziplist filling with implicit size cap.
Negative offsets define ziplist size limits of:
-1: 4k
-2: 8k
-3: 16k
-4: 32k
-5: 64k
Positive offsets now automatically limit their max size to 8k. Any
elements larger than 8k will be in individual nodes.
Positive ziplist fill factors will keep adding elements
to a ziplist until one of:
- ziplist has FILL number of elements
- or -
- ziplist grows above our ziplist max size (currently 8k)
When using positive fill factors, if you insert a large
element (over 8k), that element will automatically allocate
an individual quicklist node with one element and no other elements will be
in the same ziplist inside that quicklist node.
When using negative fill factors, elements up to the size
limit can be added to one quicklist node. If an element
is added larger than the max ziplist size, that element
will be allocated an individual ziplist in a new quicklist node.
Tests also updated to start testing at fill factor -5.
Use the existing memory space for an SDS to convert it to a regular character buffer so we don't need to allocate duplicate space just to extract a usable buffer for native operations.
This saves us an unnecessary zmalloc, memcpy, and two frees.
Turns out it's a huge improvement during save/reload/migrate/restore because, with compression enabled, we're compressing 4k or 8k chunks of data consisting of multiple elements in one ziplist instead of compressing series of smaller individual elements.
Previously, the old test ran 5,000 loops and used about 500k. With quicklist, storing those same 5,000 loops takes up 24k, so the "large value check" failed! This increases the test to 20,000 loops which makes the object dump 96k.
We trust zmalloc to kill the whole process on memory failure
Added field 'ql_nodes' and 'ql_avg_per_node'. ql_nodes is the number of quicklist nodes in the quicklist. ql_avg_node is the average fill level in each quicklist node. (LLEN / QL_NODES) Sample output: 127.0.0.1:6379> DEBUG object b Value at:0x7fa42bf2fed0 refcount:1 encoding:quicklist serializedlength:18489 lru:8983768 lru_seconds_idle:3 ql_nodes:430 ql_avg_per_node:511.73 127.0.0.1:6379> llen b (integer) 220044
Let user set how many nodes to *not* compress.
We can specify a compression "depth" of how many nodes
to leave uncompressed on each end of the quicklist.
Depth 0 = disable compression.
Depth 1 = only leave head/tail uncompressed.
- (read as: "skip 1 node on each end of the list before compressing")
Depth 2 = leave head, head->next, tail->prev, tail uncompressed.
- ("skip 2 nodes on each end of the list before compressing")
Depth 3 = Depth 2 + head->next->next + tail->prev->prev
- ("skip 3 nodes...")
etc.
This also:
- updates RDB storage to use native quicklist compression (if node is
already compressed) instead of uncompressing, generating the RDB string,
then re-compressing the quicklist node.
- internalizes the "fill" parameter for the quicklist so we don't
need to pass it to _every_ function. Now it's just a property of
the list.
- allows a runtime-configurable compression option, so we can
expose a compresion parameter in the configuration file if people
want to trade slight request-per-second performance for up to 90%+
memory savings in some situations.
- updates the quicklist tests to do multiple passes: 200k+ tests now.
Small fixes due to a new version of clang-format (it's less crazy than the older version).
Actually makes a noticeable difference. Branch hints were selected based on profiler hotspots.
This removes: - list-max-ziplist-entries - list-max-ziplist-value This adds: - list-max-ziplist-size - list-compress-depth Also updates config file with new sections and updates tests to use quicklist settings instead of old list settings.
Adds: ql_compressed (boolean, 1 if compression enabled for list, 0 otherwise) Adds: ql_uncompressed_size (actual uncompressed size of all quicklistNodes) Adds: ql_ziplist_max (quicklist max ziplist fill factor) Compression ratio of the list is then ql_uncompressed_size / serializedlength We report ql_uncompressed_size for all quicklists because serializedlength is a _compressed_ representation anyway. Sample output from a large list: 127.0.0.1:6379> llen abc (integer) 38370061 127.0.0.1:6379> debug object abc Value at:0x7ff97b51d140 refcount:1 encoding:quicklist serializedlength:19878335 lru:9718164 lru_seconds_idle:5 ql_nodes:21945 ql_avg_node:1748.46 ql_ziplist_max:-2 ql_compressed:0 ql_uncompressed_size:1643187761 (1.36s) The 1.36s result time is because rdbSavedObjectLen() is serializing the object, not because of any new stats reporting. If we run DEBUG OBJECT on a compressed list, DEBUG OBJECT takes almost *zero* time because rdbSavedObjectLen() reuses already-compressed ziplists: 127.0.0.1:6379> debug object abc Value at:0x7fe5c5800040 refcount:1 encoding:quicklist serializedlength:19878335 lru:9718109 lru_seconds_idle:5 ql_nodes:21945 ql_avg_node:1748.46 ql_ziplist_max:-2 ql_compressed:1 ql_uncompressed_size:1643187761
This also defines REDIS_STATIC='' for building everything inside src/ and everything inside deps/lua/.
|
Hello Matt, in order to merge I'm doing my changes to RDB here: I see that we diverged a bit, you mostly changed spacing and did a force update: trivial to rebase my changes upon yours, but please for future change add commits instead of force-updating so that it will be simpler to merge. Thx! |
Quicklist (linked list + ziplist)
|
Merged! Applying my commits from the other branch to master as well, and RDB v7 is done... |
Sorry for the confusion! Thanks for getting this added (and thanks for the new K/V RDB format)!
That's the problem with distributed revision control (and github PRs)... the code lives in the author's repo and can be updated at any time. If I did only add new commits after this PR was first created, it would have 100 cleanup commits sitting here. :) The real problem is git has a bad default interface. The default should be So, if Or, worst case,
|
|
👍 |
|
It's currently in the |
This replaces individual ziplist vs. linkedlist representations for Redis list operations. Big thanks for all the reviews and feedback from everybody in https://github.com/antirez/redis/pull/2143
Remove some dead code in object.c, ziplist is no longer used in 7.0 Some backgrounds: zipmap - hash: replaced by ziplist in redis#285 ziplist - hash: replaced by listpack in redis#8887 ziplist - zset: replaced by listpack in redis#9366 ziplist - list: replaced by quicklist (listpack) in redis#2143 / redis#9740
Remove some dead code in object.c, ziplist is no longer used in 7.0 Some backgrounds: zipmap - hash: replaced by ziplist in #285 ziplist - hash: replaced by listpack in #8887 ziplist - zset: replaced by listpack in #9366 ziplist - list: replaced by quicklist (listpack) in #2143 / #9740 Moved the location of ziplist.h in the server.c
Remove some dead code in object.c, ziplist is no longer used in 7.0 Some backgrounds: zipmap - hash: replaced by ziplist in redis#285 ziplist - hash: replaced by listpack in redis#8887 ziplist - zset: replaced by listpack in redis#9366 ziplist - list: replaced by quicklist (listpack) in redis#2143 / redis#9740 Moved the location of ziplist.h in the server.c

What
My implementation of what Twitter has previously described as ziplists in a linked list for better memory efficiency.
Longer writeup with improvement graphs at https://matt.sh/redis-quicklist
How
Redis currently uses two list encodings. This replace both. The code in this PR/branch removes the two current list encodings of
REDIS_ENCODING_ZIPLISTandREDIS_ENCODING_LINKEDLISTand replaces them withREDIS_ENCODING_QUICKLIST.The only parameter for the new list type is the existing
list-max-ziplist-entriesfield which tells Redis how many elements to allow per ziplist node before creating a new node. The other param oflist-max-ziplist-valueis now deleted.The RDB format remains the same. A quicklist is written to the RDB as a linked list, so the RDB is still usable by prior Redis versions. The existing RDB format for saved ziplists and linkedlists both get converted to quicklists when the RDB is loaded. Quicklists are saved as RDB linkedlist types so on reload, Redis will re-create a new compact quicklist with full interior ziplists even if the saved quicklist had not-full interior ziplists.
Tests
All test pass for me. You should try to break it too. There is one spurious timing error in Redis tests (waiting too long for RDB loading), but I don't think it's new. (?)
The quicklist code also includes ~1,000 lines of internal tests.
I added a new compile parameter of "REDIS_TEST" — if you compile with
-DREDIS_TESTyou can run Redis code tests withredis-server test <module>. So, to test quicklist with code-level tests, compile with that define and runredis-server test quicklist. Also supported: ziplist, util, intset, and more.Extras
Also includes
DEBUG JEMALLOC INFOto get internal jemalloc stats.Figuring out what any of that means is left as an exercise for the reader.
Future Work
This approach could be applied to some other areas too. Redis currently converts hashes from zipmaps to full hash tables when they get too big, but we could make a hash table of zipmaps to save a lot of space too.