
[BUG] Disk Based Replication Causing OOM in Master (v6.0.5) #7717

@ganeshkumarganesan

Description


The Redis master starts a BGSAVE with a disk target. Within a few minutes, the default "client-output-buffer-limit" for replicas is reached and the replica's connection (psync) is closed. However, the master does not kill the RDB-saving (child) process, so copy-on-write memory keeps accumulating, which eventually leads to an OOM on the master. Logs are attached below.
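To show why the replica was dropped, here is a simplified Python sketch of the hard/soft output buffer limit check. It assumes the stock redis.conf default for the replica class, `client-output-buffer-limit replica 256mb 64mb 60` (the report only says "the default", so these exact values are an assumption). This is not the Redis implementation, which is C code inside the server. The function name `exceeds_limits` and its `seconds_over_soft` parameter are hypothetical. The omem values are copied from the two "scheduled to be closed ASAP" log lines in this report.

```python
# Simplified sketch of the replica-class output buffer limit check.
# Assumption: default redis.conf setting
#   client-output-buffer-limit replica 256mb 64mb 60
# In Redis config units, "mb" means 1024*1024 bytes.
MB = 1024 * 1024

HARD_LIMIT = 256 * MB   # disconnect as soon as usage reaches this
SOFT_LIMIT = 64 * MB    # disconnect if usage stays at or above this ...
SOFT_SECONDS = 60       # ... for longer than this many seconds


def exceeds_limits(omem, seconds_over_soft):
    """Return (hard_hit, soft_hit) for an output buffer of `omem` bytes.

    `seconds_over_soft` (hypothetical input) is how long the buffer has
    stayed at or above the soft limit. The log only shows the client's
    total age, which is an upper bound on that duration.
    """
    hard_hit = omem >= HARD_LIMIT
    soft_hit = omem >= SOFT_LIMIT and seconds_over_soft > SOFT_SECONDS
    return hard_hit, soft_hit


# omem values from the psync disconnect log lines above.
disk_omem = 268438368      # disk-based run, age=69
diskless_omem = 172008056  # diskless run, age=96

print(disk_omem - HARD_LIMIT)         # → 2912 (bytes over the hard limit)
print(exceeds_limits(disk_omem, 0))   # → (True, False)
print(round(diskless_omem / MB, 1))   # → 164.0 (MiB, below the hard limit)
```

Under these assumptions, the disk-based run's omem is just 2912 bytes over the 256 MB hard limit. The diskless run's omem (about 164 MiB) is below the hard limit, so that disconnect was presumably triggered by the soft limit (over 64 MB for more than 60 seconds).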

Master Log (disk-based replication)

11292:M 26 Aug 2020 22:07:58.489 * Starting BGSAVE for SYNC with target: disk
11292:M 26 Aug 2020 22:07:58.543 * Background saving started by pid 15513
11292:M 26 Aug 2020 22:09:07.780 # Client id=107 addr=x.x.x.x:55004 fd=28 name= age=69 idle=69 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16304 oll=13092 omem=268438368 events=r cmd=psync user=default scheduled to be closed ASAP for overcoming of output buffer limits.
11292:M 26 Aug 2020 22:09:07.787 # Connection with replica x.x.x.x:6379 lost.
11292:M 26 Aug 2020 22:14:08.749 * Replica x.x.x.x:6379 asks for synchronization
11292:M 26 Aug 2020 22:14:08.757 * Full resync requested by replica x.x.x.x:6379
11292:M 26 Aug 2020 22:14:08.757 * Can't attach the replica to the current BGSAVE. Waiting for next BGSAVE for SYNC

By contrast, diskless replication works as expected in 6.0.5 (see #6866):

19538:M 26 Aug 2020 22:27:58.351 * Starting BGSAVE for SYNC with target: replicas sockets
19538:M 26 Aug 2020 22:27:58.407 * Background RDB transfer started by pid 21474
19538:M 26 Aug 2020 22:29:28.268 # Client id=53 addr=x.x.x.x:55408 fd=28 name= age=96 idle=96 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16362 oll=8389 omem=172008056 events=r cmd=psync user=default scheduled to be closed ASAP for overcoming of output buffer limits.
19538:M 26 Aug 2020 22:29:28.281 # Connection with replica x.x.x.x:6379 lost.
19538:M 26 Aug 2020 22:29:29.076 # Diskless rdb transfer, last replica dropped, killing fork child.
21474:signal-handler (1598461169) Received SIGUSR1 in child, exiting now.
19538:M 26 Aug 2020 22:29:29.481 # Background transfer terminated by signal 10
