-
Notifications
You must be signed in to change notification settings - Fork 24.5k
Description
The master server starts streaming the RDB into the slave. Within a few mins, the default "client-output-buffer-limit" is reached and slave client connection (psync) getting closed. Even though the master didn't stop the streaming RDB. So, assuming the client buffer keeps on increasing. Which led to the OOM in master. Attaching the logs below.
client-output-buffer-limit "slave 268435456 67108864 60"
Crash Log:
[39075742.256857] Kernel panic - not syncing: Out of memory: compulsory panic_on_oom is enabled
[39075742.257168] CPU: 3 PID: 19170 Comm: redis-server Not tainted 3.10.0-693.21.1.el7.x86_64
Master Log:
19170:S 03 Feb 2020 08:06:07.621 * Replica XXX.XX.XXX.XX:6379 asks for synchronization
19170:S 03 Feb 2020 08:06:07.621 * Unable to partial resync with replica XXX.XX.XXX.XX:6379 for lack of backlog (Replica request was: 76512498416396).
19170:S 03 Feb 2020 08:06:07.621 * Delay next BGSAVE for diskless SYNC
19170:S 03 Feb 2020 08:06:13.009 * Starting BGSAVE for SYNC with target: replicas sockets
19170:S 03 Feb 2020 08:06:13.297 * Background RDB transfer started by pid 13574
19170:S 03 Feb 2020 08:09:49.102 # Client id=750 addr=XXX.XX.XXX.XX:36454 fd=9 name= age=222 idle=222 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16359 oll=4750 omem=97394000 events=r cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
19170:S 03 Feb 2020 08:09:49.199 # Connection with replica XXX.XX.XXX.XX:6379 lost.
13574:C 03 Feb 2020 08:45:28.335 * RDB: 6980 MB of memory used by copy-on-write
19170:S 03 Feb 2020 08:45:29.418 * Background RDB transfer terminated with success
19170:S 03 Feb 2020 08:48:38.513 * Replica XXX.XX.XXX.XX:6379 asks for synchronization
19170:S 03 Feb 2020 08:48:38.513 * Unable to partial resync with replica XXX.XX.XXX.XX:6379 for lack of backlog (Replica request was: 76513089252891).
Slave Log:
7814:S 03 Feb 2020 08:06:07.462 * Connecting to MASTER XXX.XX.XXX.XXX:6379
7814:S 03 Feb 2020 08:06:07.462 * MASTER <-> REPLICA sync started
7814:S 03 Feb 2020 08:06:07.501 * Non blocking connect for SYNC fired the event.
7814:S 03 Feb 2020 08:06:07.539 * Master replied to PING, replication can continue...
7814:S 03 Feb 2020 08:06:07.616 * Trying a partial resynchronization (request 57b6910baf3bec5fdc88673ccafe3e06a815959b:76512498416396).
7814:S 03 Feb 2020 08:06:13.043 * Full resync from master: 57b6910baf3bec5fdc88673ccafe3e06a815959b:76513089252890
7814:S 03 Feb 2020 08:06:13.043 * Discarding previously cached master state.
7814:S 03 Feb 2020 08:06:13.332 * MASTER <-> REPLICA sync: receiving streamed RDB from master
7814:S 03 Feb 2020 08:45:29.078 * MASTER <-> REPLICA sync: Flushing old data
7814:S 03 Feb 2020 08:46:45.679 * MASTER <-> REPLICA sync: Loading DB in memory
7814:S 03 Feb 2020 08:48:38.153 * MASTER <-> REPLICA sync: Finished with success
7814:S 03 Feb 2020 08:48:38.153 # Connection with master lost.