Kill disk-based fork child when all replicas drop and 'save' is not enabled#7819

Merged
oranagra merged 2 commits into redis:unstable from ShooterIT:kill-child
Sep 22, 2020

Conversation

@ShooterIT
Member

#7717

If no other slave is waiting for the RDB dump to finish, the current child process does not need to continue dumping the RDB, so we kill it. That way the child process won't keep consuming memory, and we can also fork a new child asap to dump an RDB for the next full synchronization or bgsave. But we also need to check whether the user has enabled 'save': if so, we should not kill the child, since the RDB file is important for keeping the user's data safe.

Btw, rdbRemoveTempFile in killRDBChild no longer blocks the server, so we can call killRDBChild safely.

@oranagra oranagra linked an issue Sep 20, 2020 that may be closed by this pull request
@oranagra oranagra changed the title from "Kill child process if its dumping RDB isn't useful when full sync" to "Kill disk-based fork child when all replicas drop and 'save' is not enabled" Sep 22, 2020
@oranagra oranagra merged commit 1bb5794 into redis:unstable Sep 22, 2020
@yossigo
Collaborator

yossigo commented Oct 26, 2020

Hi @ShooterIT, it looks like the test produces some false positives. Looking at the test code, it seems very timing dependent. Would you consider replacing the fixed delays with a more accurate check that probes the server state instead?

@oranagra
Member

for the record, failed on CI for MacOS:

    *** [err]: Kill rdb child process if its dumping RDB is not useful in tests/integration/replication.tcl
    Expected [s 0 rdb_bgsave_in_progress] == 1 (context: type eval line 27 cmd {assert {[s 0 rdb_bgsave_in_progress] == 1}} proc ::start_server)

@oranagra
Member

it looks like the one that fails is this one:

                # Slave1 disconnect with master
                $slave1 slaveof no one
                # Shouldn't kill child since another slave wait for rdb
                after 100
                assert {[s 0 rdb_bgsave_in_progress] == 1}

which is odd considering we already know it's in progress from the previous wait_for_condition. Maybe there was a race, and only one slave connected?
Maybe increasing repl-diskless-sync-delay to more than 5 seconds is needed?

Meanwhile, just to be on the safe side, I'm taking this commit out of 6.0.9.

@oranagra oranagra mentioned this pull request Oct 26, 2020
@ShooterIT
Member Author

Copy that @oranagra @yossigo
My bad, I found it. It fails when only one slave is connected to the master and the other slave is slow to connect. I will make a PR asap.

I could reproduce it on the oranagra:6.0.9 branch but not on unstable (I also didn't find any failed daily runs of this test in the last month). Did we improve the sync implementation in later commits?

@oranagra
Member

Ohh, now it all makes sense.
6.0 doesn't have #6271.

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Nov 4, 2020
…nabled (redis#7819)

When all replicas waiting for a bgsave get disconnected (possibly due to output buffer limits),
it may be good to kill the bgsave child. In diskless replication this already happens, but in
disk-based replication the child may still serve some purpose (for persistence).

By killing the child, we prevent it from eating COW memory in vain, and we also allow a new child fork sooner for the next full synchronization or bgsave.
We do that only if rdb persistence wasn't enabled in the configuration.

Btw, rdbRemoveTempFile in killRDBChild no longer blocks the server, so we can call killRDBChild safely.
@ShooterIT ShooterIT deleted the kill-child branch January 11, 2021 12:58
@oranagra oranagra mentioned this pull request Jan 13, 2021
Development

Successfully merging this pull request may close these issues.

[BUG] Disk Based Replication Causing OOM in Master (v6.0.5)