Skip to content

Optimize __ziplistCascadeUpdate algorithm#6886

Merged
oranagra merged 13 commits intoredis:unstablefrom
neal-zhu:unstable
Aug 28, 2020
Merged

Optimize __ziplistCascadeUpdate algorithm#6886
oranagra merged 13 commits intoredis:unstablefrom
neal-zhu:unstable

Conversation

@neal-zhu
Copy link
Contributor

@neal-zhu neal-zhu commented Feb 13, 2020

Dear sir:
Current implementation of __ziplistCascadeUpdate is of O(n^2) time complexity. I try to optimize it with an algorithm of O(n) time complexity.
Here is the basic idea:

  1. loop ziplist to find how many extra bytes need to update the whole list
  2. resize ziplist once and do update job from back to front

This PR has passed make test.

@neal-zhu
Copy link
Contributor Author

Looking forward to your reply!

@antirez
Copy link
Contributor

antirez commented Feb 20, 2020

@neal-zhu thank you, this is interesting. However while the worst case of the algorithm is O(N^2), what is the normal time it is spending there? In other words, are you able to measure a real improvement here if you test Redis itself in a common workload? Thanks.

@neal-zhu
Copy link
Contributor Author

Sorry, I have not done any tests under the common workload yet. I just ran make bench
and of course, there was no big difference compared with the original implementation because normally the worst case won't happen.
I know this optimization will not make a difference in most cases, but I think it won't be a bad idea to replace it with an O(n) algorithm. So I just tried to optimize it while I was reading the source code.
BTW: I'm really excited to receive your reply!

@antirez
Copy link
Contributor

antirez commented Feb 27, 2020

@neal-zhu you are totally right that the O(n) case is not good! The only reason I want to think a bit more about this, is that ziplist.c is going to be totally replaced in the future by listpack.c, so not sure if it is worth to change anything and potentially expose ourselves to some new bug. I'll check it more in depth. Thanks!

@neal-zhu
Copy link
Contributor Author

Glad to hear that

@neal-zhu
Copy link
Contributor Author

Looking forward to your reply!

add log

set to 100000
@neal-zhu
Copy link
Contributor Author

neal-zhu commented Apr 1, 2020

Dear sir:
I add a basic benchmark test for optimized __ziplistCascadeUpdate. The result of the un-optimized version:

benchmark __ziplistCascadeUpdat elapsed 68(seconds)

while after the optimization result is

benchmark __ziplistCascadeUpdat elapsed 0

Please take a look at it when you are available @antirez

@neal-zhu
Copy link
Contributor Author

neal-zhu commented Apr 8, 2020

Performance has been improved a lot under specific situations. I think it can help.

@neal-zhu
Copy link
Contributor Author

neal-zhu commented Aug 3, 2020

Would there be anyone to take a look at this pr please

@oranagra oranagra self-assigned this Aug 3, 2020
@neal-zhu
Copy link
Contributor Author

neal-zhu commented Aug 7, 2020

@oranagra hi, do we have any update?

@neal-zhu
Copy link
Contributor Author

@oranagra kindly ping 😁

@oranagra
Copy link
Member

@neal-zhu sorry for not responding to your earlier messages. i intend to review this soon since i'm working on other areas in ziplist at the moment which may de-stabilize it anyway, so i hope to get both shipped into the same redis version.

@oranagra oranagra added this to the Redis 6.2 milestone Aug 13, 2020
@neal-zhu
Copy link
Contributor Author

Great! Thanks for your responding!

@oranagra
Copy link
Member

@neal-zhu I reviewed the code and i'd like to ask for a few changes:

  1. Please add some comments. before each block of a few lines, add a short line that describes what the block does. specifically before any loop or if scope. this will make the code much easier to review and edit.
  2. Other places in this file clearly make an effort to declare all the variables at the top. although redis does violate this rule in a few places, but this function stands out when compared with others in that file.
  3. Correct me if i'm wrong, but the building of that list and all the heap allocations that come with it are not really necessary, the ziplist can be "walked" tail to head, and if we keep track of some delta to skip after each memmove, i think we can do all of that without the additional list. doing so may improve efficiency further since it'll reduce all the heap allocations and improve cache locality.

@neal-zhu
Copy link
Contributor Author

neal-zhu commented Aug 17, 2020

@oranagra hi, about the third question, do you mean entries could be eliminated? I used it mainly for avoiding decode zipentry twice(for the same offset). And yes it can be eliminated.

@oranagra
Copy link
Member

@neal-zhu i'm predicting that the extra heap allocations (2 per items), and reads from them (2 indirections and possible cache misses), make more damage to performance than decoding the header again (considering that memory is being accessed anyway, so it should be in the cache)

@neal-zhu
Copy link
Contributor Author

@oranagra Hi, please take a look at the newest commit.

@oranagra
Copy link
Member

@neal-zhu thank you. i feel this is better. how do you feel?
did you happen to compare the performance to the previous implementation?
can you please run both the redis test suite, and the ziplist test (the one you modified) with valgrind.
for the redis test suite: make distclean; make valgrind; ./runtest --valgrind

@neal-zhu
Copy link
Contributor Author

@oranagra Sorry but it seems there is something wrong when I run ./runtest --valgrind both for the previous and the current implementation so I assume it has nothing to do with the change.
Here's the error output.

Testing unit/type/stream-cgroups
[err]: Can't start the Redis server
CONFIGURATION:always-show-logo yes
notify-keyspace-events KEA
daemonize no
pidfile /var/run/redis.pid
port 27611
timeout 0
bind 127.0.0.1
loglevel verbose
logfile ''
databases 16
latency-monitor-threshold 1
save 60 10000
rdbcompression yes
dbfilename dump.rdb
dir ./tests/tmp/server.90913.1
slave-serve-stale-data yes
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
activerehashing yes
unixsocket /Users/lovecc/SourceCode/remote-redis/tests/tmp/server.90913.1/socket
set-max-intset-entries 512
ERROR:90937:C 19 Aug 2020 07:45:40.782 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
90937:C 19 Aug 2020 07:45:41.042 # Redis version=999.999.999, bits=64, commit=0f741a9e, modified=1, pid=90937, just started
90937:C 19 Aug 2020 07:45:41.044 # Configuration loaded
90937:M 19 Aug 2020 07:45:41.833 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
90937:M 19 Aug 2020 07:45:41.835 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
90937:M 19 Aug 2020 07:45:41.836 # Current maximum open files is 4864. maxclients has been redu
ced to 4832 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 999.999.999 (0f741a9e/1) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 27611
 |    `-._   `._    /     _.-'    |     PID: 90937
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

90937:M 19 Aug 2020 07:45:44.754 # Server initialized
90937:M 19 Aug 2020 07:45:45.014 * Ready to accept connections
==90937== Memcheck, a memory error detector
==90937== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==90937== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==90937== Command: src/redis-server ./tests/tmp/redis.conf.90913.2
==90937==
--90937-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--90937-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--90937-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
--90937-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)
==90937== Thread 3:
==90937== Invalid read of size 4
==90937==    at 0x10082F5BA: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==90937==    by 0x10082F50C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==90937==    by 0x10082EBF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==90937==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==90937==
==90937== Invalid read of size 8
==90937==    at 0x10082D992: pthread_getspecific (in /usr/lib/system/libsystem_pthread.dylib)
==90937==    by 0x10000BF05: serverLogRaw (server.c:1049)
==90937==    by 0x100071F85: bugReportStart (debug.c:923)
==90937==    by 0x100073217: sigsegvHandler (debug.c:1634)
==90937==    by 0x258057DA5: ???
==90937==    by 0x10082F50C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==90937==    by 0x10082EBF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==90937==  Address 0x50 is not stack'd, malloc'd or (recently) free'd
==90937==
==90937==
==90937== Process terminating with default action of signal 11 (SIGSEGV)
==90937==  Access not within mapped region at address 0x50
==90937==    at 0x10082D992: pthread_getspecific (in /usr/lib/system/libsystem_pthread.dylib)
==90937==    by 0x10000BF05: serverLogRaw (server.c:1049)
==90937==    by 0x100071F85: bugReportStart (debug.c:923)
==90937==    by 0x100073217: sigsegvHandler (debug.c:1634)
==90937==    by 0x258057DA5: ???
==90937==    by 0x10082F50C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==90937==    by 0x10082EBF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==90937==  If you believe this happened as a result of a stack
==90937==  overflow in your program's main thread (unlikely but
==90937==  possible), you can try to increase the size of the
==90937==  main thread stack using the --main-stacksize= flag.
==90937==  The main thread stack size used in this run was 8388608.
--90937:0:schedule VG_(sema_down): read returned -4
==90937==
==90937== HEAP SUMMARY:
==90937==     in use at exit: 767,574 bytes in 12,724 blocks
==90937==   total heap usage: 17,259 allocs, 4,535 frees, 1,179,058 bytes allocated
==90937==
==90937== LEAK SUMMARY:
==90937==    definitely lost: 0 bytes in 0 blocks
==90937==    indirectly lost: 0 bytes in 0 blocks
==90937==      possibly lost: 10,679 bytes in 555 blocks
==90937==    still reachable: 738,552 bytes in 12,014 blocks
==90937==         suppressed: 18,343 bytes in 155 blocks
==90937== Reachable blocks (those to which a pointer was found) are not shown.
==90937== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==90937==
==90937== For lists of detected and suppressed errors, rerun with: -s
==90937== ERROR SUMMARY: 57 errors from 57 contexts (suppressed: 12 from 12)
[1/57 done]: unit/type/set (142 seconds)

And here's the output of the ziplist test(stress test and benchmark of cascadeupdate):

# new
benchmark __ziplistCascadeUpdate elapsed 0(s)
Stress with variable ziplist size:
List size:        0, bytes:       11, 100000x push+pop (HEAD):  66425 usec
List size:      256, bytes:     1547, 100000x push+pop (HEAD):  42937 usec
List size:      512, bytes:     3083, 100000x push+pop (HEAD):  53906 usec
List size:      768, bytes:     4619, 100000x push+pop (HEAD):  52631 usec
List size:     1024, bytes:     6155, 100000x push+pop (HEAD):  66011 usec
List size:     1280, bytes:     7691, 100000x push+pop (HEAD):  86290 usec
List size:     1536, bytes:     9227, 100000x push+pop (HEAD):  78539 usec
List size:     1792, bytes:    10763, 100000x push+pop (HEAD):  74843 usec
List size:     2048, bytes:    12299, 100000x push+pop (HEAD):  72499 usec
List size:     2304, bytes:    13835, 100000x push+pop (HEAD):  81760 usec
List size:     2560, bytes:    15371, 100000x push+pop (HEAD):  93621 usec
List size:     2816, bytes:    16907, 100000x push+pop (HEAD): 113514 usec
List size:     3072, bytes:    18443, 100000x push+pop (HEAD): 130362 usec
List size:     3328, bytes:    19979, 100000x push+pop (HEAD): 139929 usec
List size:     3584, bytes:    21515, 100000x push+pop (HEAD): 138629 usec
List size:     3840, bytes:    23051, 100000x push+pop (HEAD): 149879 usec
List size:     4096, bytes:    24587, 100000x push+pop (HEAD): 138361 usec
List size:     4352, bytes:    26123, 100000x push+pop (HEAD): 140045 usec
List size:     4608, bytes:    27659, 100000x push+pop (HEAD): 205197 usec
List size:     4864, bytes:    29195, 100000x push+pop (HEAD): 178800 usec
List size:     5120, bytes:    30731, 100000x push+pop (HEAD): 182027 usec
List size:     5376, bytes:    32267, 100000x push+pop (HEAD): 182718 usec
List size:     5632, bytes:    33803, 100000x push+pop (HEAD): 190930 usec
List size:     5888, bytes:    35339, 100000x push+pop (HEAD): 204771 usec
List size:     6144, bytes:    36875, 100000x push+pop (HEAD): 204249 usec
List size:     6400, bytes:    38411, 100000x push+pop (HEAD): 218825 usec
List size:     6656, bytes:    39947, 100000x push+pop (HEAD): 231680 usec
List size:     6912, bytes:    41483, 100000x push+pop (HEAD): 225043 usec
List size:     7168, bytes:    43019, 100000x push+pop (HEAD): 236982 usec
List size:     7424, bytes:    44555, 100000x push+pop (HEAD): 234464 usec
List size:     7680, bytes:    46091, 100000x push+pop (HEAD): 236968 usec
List size:     7936, bytes:    47627, 100000x push+pop (HEAD): 266071 usec
List size:     8192, bytes:    49163, 100000x push+pop (HEAD): 261158 usec
List size:     8448, bytes:    50699, 100000x push+pop (HEAD): 295019 usec
List size:     8704, bytes:    52235, 100000x push+pop (HEAD): 292790 usec
List size:     8960, bytes:    53771, 100000x push+pop (HEAD): 278995 usec
List size:     9216, bytes:    55307, 100000x push+pop (HEAD): 322599 usec
List size:     9472, bytes:    56843, 100000x push+pop (HEAD): 325445 usec
List size:     9728, bytes:    58379, 100000x push+pop (HEAD): 320267 usec
List size:     9984, bytes:    59915, 100000x push+pop (HEAD): 337838 usec
List size:    10240, bytes:    61451, 100000x push+pop (HEAD): 342899 usec
List size:    10496, bytes:    62987, 100000x push+pop (HEAD): 350127 usec
List size:    10752, bytes:    64523, 100000x push+pop (HEAD): 370496 usec
List size:    11008, bytes:    66059, 100000x push+pop (HEAD): 377070 usec
List size:    11264, bytes:    67595, 100000x push+pop (HEAD): 380798 usec
List size:    11520, bytes:    69131, 100000x push+pop (HEAD): 391745 usec
List size:    11776, bytes:    70667, 100000x push+pop (HEAD): 368513 usec
List size:    12032, bytes:    72203, 100000x push+pop (HEAD): 376149 usec
List size:    12288, bytes:    73739, 100000x push+pop (HEAD): 435755 usec
List size:    12544, bytes:    75275, 100000x push+pop (HEAD): 427156 usec
List size:    12800, bytes:    76811, 100000x push+pop (HEAD): 418141 usec
List size:    13056, bytes:    78347, 100000x push+pop (HEAD): 445604 usec
List size:    13312, bytes:    79883, 100000x push+pop (HEAD): 423353 usec
List size:    13568, bytes:    81419, 100000x push+pop (HEAD): 455115 usec
List size:    13824, bytes:    82955, 100000x push+pop (HEAD): 461052 usec
List size:    14080, bytes:    84491, 100000x push+pop (HEAD): 458316 usec
List size:    14336, bytes:    86027, 100000x push+pop (HEAD): 446117 usec
List size:    14592, bytes:    87563, 100000x push+pop (HEAD): 521245 usec
List size:    14848, bytes:    89099, 100000x push+pop (HEAD): 497121 usec
List size:    15104, bytes:    90635, 100000x push+pop (HEAD): 514916 usec
List size:    15360, bytes:    92171, 100000x push+pop (HEAD): 526222 usec
List size:    15616, bytes:    93707, 100000x push+pop (HEAD): 548707 usec
List size:    15872, bytes:    95243, 100000x push+pop (HEAD): 565255 usec
List size:    16128, bytes:    96779, 100000x push+pop (HEAD): 546814 usec
List size:        0, bytes:       11, 100000x push+pop (TAIL):  81725 usec
List size:      256, bytes:     1547, 100000x push+pop (TAIL):  47419 usec
List size:      512, bytes:     3083, 100000x push+pop (TAIL):  45277 usec
List size:      768, bytes:     4619, 100000x push+pop (TAIL):  49850 usec
List size:     1024, bytes:     6155, 100000x push+pop (TAIL):  52053 usec
List size:     1280, bytes:     7691, 100000x push+pop (TAIL):  48119 usec
List size:     1536, bytes:     9227, 100000x push+pop (TAIL):  48766 usec
List size:     1792, bytes:    10763, 100000x push+pop (TAIL):  56170 usec
List size:     2048, bytes:    12299, 100000x push+pop (TAIL):  59973 usec
List size:     2304, bytes:    13835, 100000x push+pop (TAIL):  58729 usec
List size:     2560, bytes:    15371, 100000x push+pop (TAIL):  60901 usec
List size:     2816, bytes:    16907, 100000x push+pop (TAIL):  78637 usec
List size:     3072, bytes:    18443, 100000x push+pop (TAIL):  83046 usec
List size:     3328, bytes:    19979, 100000x push+pop (TAIL):  90431 usec
List size:     3584, bytes:    21515, 100000x push+pop (TAIL):  97437 usec
List size:     3840, bytes:    23051, 100000x push+pop (TAIL):  98961 usec
List size:     4096, bytes:    24587, 100000x push+pop (TAIL):  99948 usec
List size:     4352, bytes:    26123, 100000x push+pop (TAIL): 107399 usec
List size:     4608, bytes:    27659, 100000x push+pop (TAIL): 111884 usec
List size:     4864, bytes:    29195, 100000x push+pop (TAIL): 111836 usec
List size:     5120, bytes:    30731, 100000x push+pop (TAIL): 118533 usec
List size:     5376, bytes:    32267, 100000x push+pop (TAIL): 118970 usec
List size:     5632, bytes:    33803, 100000x push+pop (TAIL): 129452 usec
List size:     5888, bytes:    35339, 100000x push+pop (TAIL): 136264 usec
List size:     6144, bytes:    36875, 100000x push+pop (TAIL): 143802 usec
List size:     6400, bytes:    38411, 100000x push+pop (TAIL): 159126 usec
List size:     6656, bytes:    39947, 100000x push+pop (TAIL): 166625 usec
List size:     6912, bytes:    41483, 100000x push+pop (TAIL): 168556 usec
List size:     7168, bytes:    43019, 100000x push+pop (TAIL): 168186 usec
List size:     7424, bytes:    44555, 100000x push+pop (TAIL): 157647 usec
List size:     7680, bytes:    46091, 100000x push+pop (TAIL): 188975 usec
List size:     7936, bytes:    47627, 100000x push+pop (TAIL): 187957 usec
List size:     8192, bytes:    49163, 100000x push+pop (TAIL): 184847 usec
List size:     8448, bytes:    50699, 100000x push+pop (TAIL): 199119 usec
List size:     8704, bytes:    52235, 100000x push+pop (TAIL): 193598 usec
List size:     8960, bytes:    53771, 100000x push+pop (TAIL): 201206 usec
List size:     9216, bytes:    55307, 100000x push+pop (TAIL): 206412 usec
List size:     9472, bytes:    56843, 100000x push+pop (TAIL): 213090 usec
List size:     9728, bytes:    58379, 100000x push+pop (TAIL): 209633 usec
List size:     9984, bytes:    59915, 100000x push+pop (TAIL): 219765 usec
List size:    10240, bytes:    61451, 100000x push+pop (TAIL): 231658 usec
List size:    10496, bytes:    62987, 100000x push+pop (TAIL): 240560 usec
List size:    10752, bytes:    64523, 100000x push+pop (TAIL): 249407 usec
List size:    11008, bytes:    66059, 100000x push+pop (TAIL): 226956 usec
List size:    11264, bytes:    67595, 100000x push+pop (TAIL): 246359 usec
List size:    11520, bytes:    69131, 100000x push+pop (TAIL): 252138 usec
List size:    11776, bytes:    70667, 100000x push+pop (TAIL): 252453 usec
List size:    12032, bytes:    72203, 100000x push+pop (TAIL): 260630 usec
List size:    12288, bytes:    73739, 100000x push+pop (TAIL): 265188 usec
List size:    12544, bytes:    75275, 100000x push+pop (TAIL): 257830 usec
List size:    12800, bytes:    76811, 100000x push+pop (TAIL): 276850 usec
List size:    13056, bytes:    78347, 100000x push+pop (TAIL): 270987 usec
List size:    13312, bytes:    79883, 100000x push+pop (TAIL): 301213 usec
List size:    13568, bytes:    81419, 100000x push+pop (TAIL): 283456 usec
List size:    13824, bytes:    82955, 100000x push+pop (TAIL): 284872 usec
List size:    14080, bytes:    84491, 100000x push+pop (TAIL): 371349 usec
List size:    14336, bytes:    86027, 100000x push+pop (TAIL): 318616 usec
List size:    14592, bytes:    87563, 100000x push+pop (TAIL): 348357 usec
List size:    14848, bytes:    89099, 100000x push+pop (TAIL): 329863 usec
List size:    15104, bytes:    90635, 100000x push+pop (TAIL): 325542 usec
List size:    15360, bytes:    92171, 100000x push+pop (TAIL): 344338 usec
List size:    15616, bytes:    93707, 100000x push+pop (TAIL): 364104 usec
List size:    15872, bytes:    95243, 100000x push+pop (TAIL): 324645 usec
List size:    16128, bytes:    96779, 100000x push+pop (TAIL): 325645 usec
# old
benchmark __ziplistCascadeUpdate elapsed 110(s)
Stress with variable ziplist size:
List size:        0, bytes:       11, 100000x push+pop (HEAD):  69590 usec
List size:      256, bytes:     1547, 100000x push+pop (HEAD):  53067 usec
List size:      512, bytes:     3083, 100000x push+pop (HEAD):  59400 usec
List size:      768, bytes:     4619, 100000x push+pop (HEAD):  60519 usec
List size:     1024, bytes:     6155, 100000x push+pop (HEAD):  71448 usec
List size:     1280, bytes:     7691, 100000x push+pop (HEAD):  67849 usec
List size:     1536, bytes:     9227, 100000x push+pop (HEAD):  69832 usec
List size:     1792, bytes:    10763, 100000x push+pop (HEAD): 116685 usec
List size:     2048, bytes:    12299, 100000x push+pop (HEAD):  99763 usec
List size:     2304, bytes:    13835, 100000x push+pop (HEAD):  98430 usec
List size:     2560, bytes:    15371, 100000x push+pop (HEAD):  96804 usec
List size:     2816, bytes:    16907, 100000x push+pop (HEAD): 115557 usec
List size:     3072, bytes:    18443, 100000x push+pop (HEAD): 116645 usec
List size:     3328, bytes:    19979, 100000x push+pop (HEAD): 127602 usec
List size:     3584, bytes:    21515, 100000x push+pop (HEAD): 131082 usec
List size:     3840, bytes:    23051, 100000x push+pop (HEAD): 140100 usec
List size:     4096, bytes:    24587, 100000x push+pop (HEAD): 150294 usec
List size:     4352, bytes:    26123, 100000x push+pop (HEAD): 152073 usec
List size:     4608, bytes:    27659, 100000x push+pop (HEAD): 157152 usec
List size:     4864, bytes:    29195, 100000x push+pop (HEAD): 156953 usec
List size:     5120, bytes:    30731, 100000x push+pop (HEAD): 172935 usec
List size:     5376, bytes:    32267, 100000x push+pop (HEAD): 177044 usec
List size:     5632, bytes:    33803, 100000x push+pop (HEAD): 203372 usec
List size:     5888, bytes:    35339, 100000x push+pop (HEAD): 201301 usec
List size:     6144, bytes:    36875, 100000x push+pop (HEAD): 290670 usec
List size:     6400, bytes:    38411, 100000x push+pop (HEAD): 344198 usec
List size:     6656, bytes:    39947, 100000x push+pop (HEAD): 249557 usec
List size:     6912, bytes:    41483, 100000x push+pop (HEAD): 229199 usec
List size:     7168, bytes:    43019, 100000x push+pop (HEAD): 264791 usec
List size:     7424, bytes:    44555, 100000x push+pop (HEAD): 246285 usec
List size:     7680, bytes:    46091, 100000x push+pop (HEAD): 248869 usec
List size:     7936, bytes:    47627, 100000x push+pop (HEAD): 244107 usec
List size:     8192, bytes:    49163, 100000x push+pop (HEAD): 281472 usec
List size:     8448, bytes:    50699, 100000x push+pop (HEAD): 299974 usec
List size:     8704, bytes:    52235, 100000x push+pop (HEAD): 288898 usec
List size:     8960, bytes:    53771, 100000x push+pop (HEAD): 293631 usec
List size:     9216, bytes:    55307, 100000x push+pop (HEAD): 334021 usec
List size:     9472, bytes:    56843, 100000x push+pop (HEAD): 324113 usec
List size:     9728, bytes:    58379, 100000x push+pop (HEAD): 340302 usec
List size:     9984, bytes:    59915, 100000x push+pop (HEAD): 388277 usec
List size:    10240, bytes:    61451, 100000x push+pop (HEAD): 331109 usec
List size:    10496, bytes:    62987, 100000x push+pop (HEAD): 377906 usec
List size:    10752, bytes:    64523, 100000x push+pop (HEAD): 355446 usec
List size:    11008, bytes:    66059, 100000x push+pop (HEAD): 373285 usec
List size:    11264, bytes:    67595, 100000x push+pop (HEAD): 416693 usec
List size:    11520, bytes:    69131, 100000x push+pop (HEAD): 395189 usec
List size:    11776, bytes:    70667, 100000x push+pop (HEAD): 397131 usec
List size:    12032, bytes:    72203, 100000x push+pop (HEAD): 408967 usec
List size:    12288, bytes:    73739, 100000x push+pop (HEAD): 435046 usec
List size:    12544, bytes:    75275, 100000x push+pop (HEAD): 415978 usec
List size:    12800, bytes:    76811, 100000x push+pop (HEAD): 464672 usec
List size:    13056, bytes:    78347, 100000x push+pop (HEAD): 465276 usec
List size:    13312, bytes:    79883, 100000x push+pop (HEAD): 472333 usec
List size:    13568, bytes:    81419, 100000x push+pop (HEAD): 485695 usec
List size:    13824, bytes:    82955, 100000x push+pop (HEAD): 469131 usec
List size:    14080, bytes:    84491, 100000x push+pop (HEAD): 503441 usec
List size:    14336, bytes:    86027, 100000x push+pop (HEAD): 613167 usec
List size:    14592, bytes:    87563, 100000x push+pop (HEAD): 491637 usec
List size:    14848, bytes:    89099, 100000x push+pop (HEAD): 561232 usec
List size:    15104, bytes:    90635, 100000x push+pop (HEAD): 522780 usec
List size:    15360, bytes:    92171, 100000x push+pop (HEAD): 550180 usec
List size:    15616, bytes:    93707, 100000x push+pop (HEAD): 562919 usec
List size:    15872, bytes:    95243, 100000x push+pop (HEAD): 546814 usec
List size:    16128, bytes:    96779, 100000x push+pop (HEAD): 550483 usec
List size:        0, bytes:       11, 100000x push+pop (TAIL):  61551 usec
List size:      256, bytes:     1547, 100000x push+pop (TAIL):  37145 usec
List size:      512, bytes:     3083, 100000x push+pop (TAIL):  40994 usec
List size:      768, bytes:     4619, 100000x push+pop (TAIL):  44730 usec
List size:     1024, bytes:     6155, 100000x push+pop (TAIL):  51579 usec
List size:     1280, bytes:     7691, 100000x push+pop (TAIL):  53598 usec
List size:     1536, bytes:     9227, 100000x push+pop (TAIL):  52917 usec
List size:     1792, bytes:    10763, 100000x push+pop (TAIL):  49397 usec
List size:     2048, bytes:    12299, 100000x push+pop (TAIL):  62312 usec
List size:     2304, bytes:    13835, 100000x push+pop (TAIL):  67265 usec
List size:     2560, bytes:    15371, 100000x push+pop (TAIL):  72416 usec
List size:     2816, bytes:    16907, 100000x push+pop (TAIL):  87863 usec
List size:     3072, bytes:    18443, 100000x push+pop (TAIL):  89073 usec
List size:     3328, bytes:    19979, 100000x push+pop (TAIL):  94778 usec
List size:     3584, bytes:    21515, 100000x push+pop (TAIL):  96733 usec
List size:     3840, bytes:    23051, 100000x push+pop (TAIL):  84265 usec
List size:     4096, bytes:    24587, 100000x push+pop (TAIL):  88190 usec
List size:     4352, bytes:    26123, 100000x push+pop (TAIL): 108942 usec
List size:     4608, bytes:    27659, 100000x push+pop (TAIL): 133935 usec
List size:     4864, bytes:    29195, 100000x push+pop (TAIL): 129772 usec
List size:     5120, bytes:    30731, 100000x push+pop (TAIL): 120794 usec
List size:     5376, bytes:    32267, 100000x push+pop (TAIL): 140537 usec
List size:     5632, bytes:    33803, 100000x push+pop (TAIL): 128874 usec
List size:     5888, bytes:    35339, 100000x push+pop (TAIL): 152881 usec
List size:     6144, bytes:    36875, 100000x push+pop (TAIL): 148444 usec
List size:     6400, bytes:    38411, 100000x push+pop (TAIL): 156484 usec
List size:     6656, bytes:    39947, 100000x push+pop (TAIL): 155503 usec
List size:     6912, bytes:    41483, 100000x push+pop (TAIL): 175219 usec
List size:     7168, bytes:    43019, 100000x push+pop (TAIL): 171304 usec
List size:     7424, bytes:    44555, 100000x push+pop (TAIL): 169119 usec
List size:     7680, bytes:    46091, 100000x push+pop (TAIL): 177559 usec
List size:     7936, bytes:    47627, 100000x push+pop (TAIL): 189411 usec
List size:     8192, bytes:    49163, 100000x push+pop (TAIL): 183241 usec
List size:     8448, bytes:    50699, 100000x push+pop (TAIL): 187471 usec
List size:     8704, bytes:    52235, 100000x push+pop (TAIL): 214781 usec
List size:     8960, bytes:    53771, 100000x push+pop (TAIL): 214886 usec
List size:     9216, bytes:    55307, 100000x push+pop (TAIL): 218149 usec
List size:     9472, bytes:    56843, 100000x push+pop (TAIL): 212692 usec
List size:     9728, bytes:    58379, 100000x push+pop (TAIL): 212899 usec
List size:     9984, bytes:    59915, 100000x push+pop (TAIL): 223848 usec
List size:    10240, bytes:    61451, 100000x push+pop (TAIL): 228226 usec
List size:    10496, bytes:    62987, 100000x push+pop (TAIL): 219939 usec
List size:    10752, bytes:    64523, 100000x push+pop (TAIL): 237202 usec
List size:    11008, bytes:    66059, 100000x push+pop (TAIL): 226524 usec
List size:    11264, bytes:    67595, 100000x push+pop (TAIL): 248473 usec
List size:    11520, bytes:    69131, 100000x push+pop (TAIL): 295931 usec
List size:    11776, bytes:    70667, 100000x push+pop (TAIL): 265339 usec
List size:    12032, bytes:    72203, 100000x push+pop (TAIL): 262084 usec
List size:    12288, bytes:    73739, 100000x push+pop (TAIL): 285727 usec
List size:    12544, bytes:    75275, 100000x push+pop (TAIL): 283947 usec
List size:    12800, bytes:    76811, 100000x push+pop (TAIL): 288975 usec
List size:    13056, bytes:    78347, 100000x push+pop (TAIL): 287040 usec
List size:    13312, bytes:    79883, 100000x push+pop (TAIL): 291584 usec
List size:    13568, bytes:    81419, 100000x push+pop (TAIL): 297374 usec
List size:    13824, bytes:    82955, 100000x push+pop (TAIL): 297695 usec
List size:    14080, bytes:    84491, 100000x push+pop (TAIL): 299169 usec
List size:    14336, bytes:    86027, 100000x push+pop (TAIL): 293931 usec
List size:    14592, bytes:    87563, 100000x push+pop (TAIL): 310399 usec
List size:    14848, bytes:    89099, 100000x push+pop (TAIL): 331845 usec
List size:    15104, bytes:    90635, 100000x push+pop (TAIL): 319536 usec
List size:    15360, bytes:    92171, 100000x push+pop (TAIL): 336435 usec
List size:    15616, bytes:    93707, 100000x push+pop (TAIL): 327747 usec
List size:    15872, bytes:    95243, 100000x push+pop (TAIL): 327862 usec
List size:    16128, bytes:    96779, 100000x push+pop (TAIL): 341477 usec

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what's the problem with your valgrind.
seems to work for me (still waiting for it to finish, no errors so far)

regarding the benchmark, do i understand correctly that it indicates marginal improvement for most checks (and a small degradation for a few)? could very well be just noise, but at least we know it didn't get worse.

@neal-zhu
Copy link
Contributor Author

@oranagra I think it won't make redis slower, and for the very special case(when we really need to update a lot of entries), it can be much more faster.

@oranagra
Copy link
Member

@neal-zhu i was referring to the last change we did (remove the use of the linked list). that's what your last benchmark result compared, right (old = list based, new = decoding again).

valgrind passed with no errors on both the ziplist test and the test suite.

oranagra
oranagra previously approved these changes Aug 19, 2020
@oranagra oranagra added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Aug 19, 2020
@oranagra
Copy link
Member

@redis/core-team anyone else wanna review or comment before i merge this?
it's basically an optimization, instead of multiple reallocs and multiple memmove of the entire tail of the list, it does one realloc and multiple single-entry memmoves.

@neal-zhu
Copy link
Contributor Author

@oranagra Great thanks!

moving it to the end.
measuring just the cascade part, using usec measurement.
@oranagra
Copy link
Member

realized the last benchmark you posted was of the other "Stress" test.
updated the benchmark to be more accurate, and measured the two approaches:

linked list approach:
Done. usec=16613

new approach:
Done. usec=6828

hwware
hwware previously approved these changes Aug 27, 2020
@oranagra oranagra changed the title Optimize __ziplistCascadeUpdate Optimize __ziplistCascadeUpdate algorithm Aug 28, 2020
@oranagra oranagra merged commit ee4a15a into redis:unstable Aug 28, 2020
@oranagra
Copy link
Member

@neal-zhu thank you. merged!
I run the new tests with Valgrind, and it passed.

@neal-zhu
Copy link
Contributor Author

@oranagra my pleasure

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 2, 2020
The previous algorithm is of O(n^2) time complexity.
It would have run through the ziplist entries one by one, each time doing a `realloc` and a
`memmove` (moving the entire tail of the ziplist).

The new algorithm is O(n), it runs over all the records once, computing the size of the `realloc`
needed, then does one `realloc`, and run thought the records again doing many smaller `memmove`s,
each time moving just one record.

So this change reduces many reallocs, and moves each record just once.

Co-authored-by: zhumaohua <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
@oranagra oranagra removed this from the Next minor backlog milestone Oct 19, 2020
@oranagra
Copy link
Member

@neal-zhu i wanna mention your name in the 6.2 release notes. i noticed that your name in github is maohuazhu and there's another github account with that name. maybe you wanna provide a different string?
p.s. the email address with which you signed off your commits is unreachable.

@neal-zhu
Copy link
Contributor Author

neal-zhu commented Dec 14, 2020

Please just use maohuazhu which is my real name. Tanks a lot, it's my honor to contribute to a project like redis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cpu efficiency state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants