Fix crash during SLAVEOF when clients are blocked on lazyfree#13853
Fix crash during SLAVEOF when clients are blocked on lazyfree#13853sundb merged 7 commits intoredis:unstablefrom
Conversation
8d641af to
8d32fe8
Compare
59c8a73 to
481e9cc
Compare
oranagra
left a comment
There was a problem hiding this comment.
LGTM, but i'll take this opportunity to note something else i recently noticed which might be related.
in unblockClient we're calling resetClient, but for some flows (notably flushallSyncBgDone, we're calling both unblockClient and commandProcessed (which calls resetClient too).
currently, there's no harm for calling resetClient twice, but there could be (in my POC, there was).
i ended up adding an exception in unblockClient (like we have for BLOCKED_SHUTDOWN), but this is probably wrong, and we should better define who's responsible of clearing both of these (resetClient, and updateStatsOnUnblock in each flow, avoid calling them twice, or not calling them at all).
|
changes LGTM, from my perspective, unless you want to proceed with the cleanup around |
@oranagra since this PR needs to backport to 7.4, let's put some other PR to handle it? |
Fix #13853 (review) This PR ensures that the client's current command is not reset by unblockClient(), while still needing to be handled after `unblockclient()`. The FLUSH command still requires reprocessing (update the replication offset) after unblockClient(). Therefore, we mark such blocked clients with the CLIENT_PENDING_COMMAND flag to prevent the command from being reset during unblockClient().
This test was introduced by #13853 We determine if the client is in blocked status, but if async flushdb is completed before checking the blocked status, the test will fail. So modify the test to only determine if `lazyfree_pending_objects` is correct to ensure that flushdb is async, that is, the client must be blocked.
…13853) After redis#13167, when a client calls `FLUSHDB` command, we still async empty database, and the client was blocked until the lazyfree completes. 1) If another client calls `SLAVEOF` command during this time, the server will unblock all blocked clients, including those blocked by the lazyfree. However, when unblocking a lazyfree blocked client, we forgot to call `updateStatsOnUnblock()`, which ultimately triggered the following assertion. 2) If a client blocked by Lazyfree is unblocked midway, and at this point the `bio_comp_list` has already received the completion notification for the bio, we might end up processing a client that has already been unblocked in `flushallSyncBgDone()`. Therefore, we need to filter it out. --------- Co-authored-by: oranagra <[email protected]>
…13853) After redis#13167, when a client calls `FLUSHDB` command, we still async empty database, and the client was blocked until the lazyfree completes. 1) If another client calls `SLAVEOF` command during this time, the server will unblock all blocked clients, including those blocked by the lazyfree. However, when unblocking a lazyfree blocked client, we forgot to call `updateStatsOnUnblock()`, which ultimately triggered the following assertion. 2) If a client blocked by Lazyfree is unblocked midway, and at this point the `bio_comp_list` has already received the completion notification for the bio, we might end up processing a client that has already been unblocked in `flushallSyncBgDone()`. Therefore, we need to filter it out. --------- Co-authored-by: oranagra <[email protected]>
Fix redis#13853 (review) This PR ensures that the client's current command is not reset by unblockClient(), while still needing to be handled after `unblockclient()`. The FLUSH command still requires reprocessing (update the replication offset) after unblockClient(). Therefore, we mark such blocked clients with the CLIENT_PENDING_COMMAND flag to prevent the command from being reset during unblockClient().
This test was introduced by redis#13853 We determine if the client is in blocked status, but if async flushdb is completed before checking the blocked status, the test will fail. So modify the test to only determine if `lazyfree_pending_objects` is correct to ensure that flushdb is async, that is, the client must be blocked.
After #13167, when a client calls
FLUSHDBcomand, we still async empty database, and the client was blocked until the lazyfree completes.SLAVEOFcommand during this time, the server will unblock all blocked clients, including those blocked by the lazyfree. However, when unblocking a lazyfree blocked client, we forgot to callupdateStatsOnUnblock(), which ultimately triggered the following assertion.bio_comp_listhas already received the completion notification for the bio, we might end up processing a client that has already been unblocked influshallSyncBgDone(). Therefore, we need to filter it out.failed CI: https://github.com/sundb/redis/actions/runs/13802290567/job/38606761833