Skip to content

if diskless repl child is killed, make sure to reap the pid#7742

Merged
oranagra merged 2 commits intoredis:unstablefrom
oranagra:reap-killed-diskless-fork
Sep 6, 2020
Merged

if diskless repl child is killed, make sure to reap the pid#7742
oranagra merged 2 commits intoredis:unstablefrom
oranagra:reap-killed-diskless-fork

Conversation

@oranagra
Copy link
Member

@oranagra oranagra commented Sep 2, 2020

Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.

I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).

It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.

The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.

@oranagra oranagra requested a review from yossigo September 2, 2020 10:53
@oranagra oranagra changed the title if Diskless repl child killed, make sure to reaping the pid if Diskless repl child killed, make sure to reap the pid Sep 2, 2020
@oranagra oranagra changed the title if Diskless repl child killed, make sure to reap the pid if diskless repl child is killed, make sure to reap the pid Sep 2, 2020
yossigo
yossigo previously approved these changes Sep 6, 2020
Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.

I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).

It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.

The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.
@oranagra
Copy link
Member Author

oranagra commented Sep 6, 2020

@yossigo i had to rebase due to a conflict (two adjacent changes).
while doing that i noticed a compilation warning and did some additional adjustment please review.

this adjustment was part of the original #6271 and got removed before merging (#6271 (review)) because it was no longer relevant after the introduction of the TLS fork/pipe changes (5a47794). but with this change it becomes relevant again.

p.s. testing the diskless loading short read still manages to run 100 tests in about 3 seconds regardless of this adjustment, but that's probably only because it sets the cron hz to 500.
p.p.s it looks like this test is even slightly faster with this PR than before this PR.

i'll update the commit comment to reflect the above when squashing.

@oranagra oranagra merged commit 573246f into redis:unstable Sep 6, 2020
@oranagra oranagra deleted the reap-killed-diskless-fork branch September 6, 2020 13:44
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Nov 4, 2020
Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.

I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).

It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.

The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.

Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was
part of redis#6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (5a47794).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has chance to detect that the child exited, it should be the one to call
it so that we don't have to wait for cron (server.hz) to do that.
@oranagra oranagra mentioned this pull request Jan 13, 2021
@oranagra oranagra mentioned this pull request Jul 21, 2021
oranagra added a commit that referenced this pull request Jul 21, 2021
Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.

I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).

It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.

The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.

Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was
part of #6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (5a47794).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has chance to detect that the child exited, it should be the one to call
it so that we don't have to wait for cron (server.hz) to do that.

(cherry picked from commit 573246f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants