-
Notifications
You must be signed in to change notification settings - Fork 24.5k
Description
65976:M 15 Mar 2020 22:05:11.079 # Lua slow script detected: still in execution after 5003 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 5cc492ff8069ef954920c191887905108182c43b
=== REDIS BUG REPORT START: Cut & paste starting from here ===
65976:M 15 Mar 2020 22:05:48.563 # === ASSERTION FAILED ===
65976:M 15 Mar 2020 22:05:48.563 # ==> networking.c:1024 'ln != NULL' is not true
65976:M 15 Mar 2020 22:05:48.563 # (forcing SIGSEGV to print the bug report.)
65976:M 15 Mar 2020 22:05:48.563 # Redis 999.999.999 crashed by signal: 11
65976:M 15 Mar 2020 22:05:48.563 # Crashed running the instruction at: 0x47126a
65976:M 15 Mar 2020 22:05:48.563 # Accessing address: 0xffffffffffffffff
65976:M 15 Mar 2020 22:05:48.563 # Failed assertion: ln != NULL (networking.c:1024)
------ STACK TRACE ------
EIP:
./redis-server *:6379(_serverAssert+0x6a)[0x47126a]
Backtrace:
./redis-server *:6379(logStackTrace+0x29)[0x473739]
./redis-server *:6379(sigsegvHandler+0xa6)[0x473dd6]
/lib64/libpthread.so.0(+0xf620)[0x7f26b5388620]
./redis-server *:6379(_serverAssert+0x6a)[0x47126a]
./redis-server *:6379(unlinkClient+0x229)[0x4412d9]
./redis-server *:6379(freeClient+0xf5)[0x441465]
./redis-server *:6379(freeClientsInAsyncFreeQueue+0x27)[0x441b57]
./redis-server *:6379(beforeSleep+0x8c)[0x431bac]
./redis-server *:6379(aeMain+0x1e)[0x42ee8e]
./redis-server *:6379(main+0x4c4)[0x42b974]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f26b4fcd445]
./redis-server *:6379[0x42bbfd]
We can see that redis crashes because assertion failed, the assertion is in freeClient:
/* Remove from the list of pending reads if needed. */
if (c->flags & CLIENT_PENDING_READ) {
ln = listSearchKey(server.clients_pending_read,c);
serverAssert(ln != NULL);
listDelNode(server.clients_pending_read,ln);
c->flags &= ~CLIENT_PENDING_READ;
}It means the client we try to unlink is in server.clients_pending_read, it's easy to reproduce:
-
Start redis with config open
io-threads-do-reads./redis-server --io-threads 4 --io-threads-do-reads yes
-
Process some commands using redis-benchmark
./redis-benchmark -t get,set -n 10000000 -l --threads 6
-
Send a slow lua script
127.0.0.1:6379> eval "for i=1,100000000,1 do redis.call('get','a') end" 0
-
Kill redis-benchmark after Lua slow script detected and before the script done.
now redis crashes
The root cause is we use function processEventsWhileBlocked to handle events if lua timeout, and the read events are delayed and just put client into server.clients_pending_read's head, but aeProcessEvents is called without flag AE_CALL_AFTER_SLEEP in processEventsWhileBlocked, so the new fired read event cannot be processed, since we call handleClientsWithPendingReadsUsingThreads in afterSleep. And client close connection before lua script done.
Then after lua script done, function handleClientsWithPendingReadsUsingThreads empty the list server.clients_pending_read but does not remove the new fired client's CLIENT_PENDING_READ flag.
Finally freeClient try to remove the client from server.clients_pending_read because it's still with CLIENT_PENDING_READ flag but cannot find it, so the assertion failed.
To fix it, I don't call beforeSleep and handleClientsWithPendingReadsUsingThreads in processEventsWhileBlocked, because I think in this scenario we do not need the io-threads mode, just handle all events in main thread is OK.
So I add a new variable process_while_blocked to check, if we are in processEventsWhileBlocked do not postpone client read.
See commit 606a01d in PR #6989
There are also other scenarios too, like DEBUG RELOAD|LOADAOF or reboot or slave loading data.