Skip to content

Revert async loading check#9770

Closed
soloestoy wants to merge 2 commits intoredis:unstablefrom
soloestoy:revert-async-loading
Closed

Revert async loading check#9770
soloestoy wants to merge 2 commits intoredis:unstablefrom
soloestoy:revert-async-loading

Conversation

@soloestoy
Copy link
Contributor

@soloestoy soloestoy commented Nov 11, 2021

PR #9323 introduces some bugs, since now we allow almost all commands during async loading, but execute some commands like config eval is very dangerous, for example disable diskless-load during async loading can lead to data inconsistency(if replication succeeds) and memory leak(if replication fails) and crash(free master client after full resync), because the global config is modified, you can use the new test case in this PR to reproduce it.

a simple way to fix it is revert the async_loading check in processCommand(), or we need more work to improve this feature, maybe refactor the rdb load mechanism to make it independent.

ping @oranagra @eduardobr

@eduardobr
Copy link
Contributor

@soloestoy Ideally only changes of configuration should be denied during async_loading.
It's true there's an issue and missing test regarding CONFIG SET being allowed when it shouldn't.
But denying all commands the same way as when loading kinda defeats the purpose (which is also to allow serving data while doing the full sync).
Can you think about any other case that should be denied other than CONFIG SET?
What is the risk of EVAL?

Thanks

@soloestoy
Copy link
Contributor Author

@eduardobr about eval, you can look deep into function processEventsWhileBlocked(), when rdb loading the rdbLoadProgressCallback callback would call it, and set the global variable ProcessingEventsWhileBlocked to 1, if we allow eval during rdb loading, and reach the lua_timeout, then the lua_timeout hook will step into processEventsWhileBlocked(), and then after lua timeout back to rdbLoadProgressCallback()'s processEventsWhileBlocked(), the global variable ProcessingEventsWhileBlocked is already set to 0.

I'm not sure if it has actual damage, but it's very dangerous, and maybe more commands are dangerous like module.

@oranagra
Copy link
Member

@soloestoy good catch about processEventsWhileBlocked collision, i'm ashamed that i missed it 8-).
i suppose we can solve it with a moderate effort.

Regarding config, i never really liked the fact it was blocked during loading (in the quire recent past, CONFIG GET was blocked too). there are only a handful configs that can be abused during loading, and i would argue that this abuse is unlikely to be a real problem for users.
however, since we identified that, let's solve that.

  1. we should cache the config on the stack when entering the loading state, and avoid accessing the config for a second time.
  2. let's see if there are any other problematic configs, and we can add an explicit check for async_loading in these to block them (or add a flag if there are many)

the configs that could have some possible issue are:

  • appendonly
  • repl-diskless-load
  • sanitize-dump-payload (i actually don't care)
  • loading-process-events-interval-bytes (don't see a real issue, maybe even an advantage)

apart from that, maybe we need to block some commands, specifically:

  • REPLICAOF

@oranagra
Copy link
Member

@eduardobr @soloestoy can one of you take this issue and come up with a PR to solve these issues?
if not, maybe i'll try to find time to do that...

@soloestoy
Copy link
Contributor Author

a easy fix is don't allow config set and eval/evalsha when async_loading, wdyt @oranagra

@oranagra
Copy link
Member

I don't mind blocking config set, but that one is actually easy to solve (forbid a few configs, and cache others on the stack).
Regarding EVAL, I think it would limit the usability of that feature, I rather solve it properly without blocking it, and I don't think it should be very hard.

@oranagra
Copy link
Member

oranagra commented Dec 1, 2021

ok, i understand no one else is looking into this, so i'll try to handle it myself.

@oranagra oranagra self-assigned this Dec 1, 2021
@oranagra
Copy link
Member

oranagra commented Dec 1, 2021

Note that i just realized the case of nested calls to ProcessingEventsWhileBlocked already existed anyway, see: #8284 i.e. when a slow script is executed when loading AOF file.

And the fix for that seems as simple as incrementing / decrementing ProcessingEventsWhileBlocked instead of just setting it to 0 and 1.

@oranagra
Copy link
Member

oranagra commented Dec 1, 2021

handled in #9878

@oranagra oranagra added state:to-be-closed requesting the core team to close the issue and removed 7.0-must-have labels Dec 2, 2021
@oranagra
Copy link
Member

Closing in favor of #9878

@oranagra oranagra closed this Dec 15, 2021
oranagra added a commit that referenced this pull request Dec 22, 2021
…ading (#9878)

## background
Till now CONFIG SET was blocked during loading.
(In the not so distant past, GET was disallowed too)

We recently (not released yet) added an async-loading mode, see #9323,
and during that time it'll serve CONFIG SET and any other command.
And now we realized (#9770) that some configs, and commands are dangerous
during async-loading.

## changes
* Allow most CONFIG SET during loading (both on async-loading and normal loading)
* Allow CONFIG REWRITE and CONFIG RESETSTAT during loading
* Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`)
* Block a few commands during loading (list below)

## the blocked commands:
* SAVE - obviously we don't wanna start a foregreound save during loading 8-)
* BGSAVE - we don't mind to schedule one, but we don't wanna fork now
* BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now
* MODULE - we obviously don't wanna unload a module during replication / rdb loading
  (MODULE HELP and MODULE LIST are not blocked)
* SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync
  requests now.
* REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let
  the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state.
* CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT,
  GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo

## other fixes
* processEventsWhileBlocked had an issue when being nested, this could happen with a busy script
  during async loading (new), but also in a busy script during AOF loading (old). this lead to a crash in
  the scenario described in #6988
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

state:to-be-closed requesting the core team to close the issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants