Fix crash in script timeout during AOF loading #7870
oranagra merged 1 commit into redis:unstable from oranagra:aof_script_timeout
Closes #7869
@oranagra Perhaps it's time to step up the deprecation of script replication and emit a warning message if it's enabled.
tests/unit/scripting.tcl
I'm a bit concerned about the flakiness of this test. Did you consider other options?
(I don't have a better suggestion that doesn't involve creating new synthetic delay commands that are script-accessible)
I can't think of any other option.
Scripts are, by definition, immune to external factors.
DEBUG can't be used in scripts.
And SCRIPT KILL can't be used, since this script has to contain a write command.
The only thing I can think of is finding some command with deterministic timing.
But blocking commands with a timeout are obviously not allowed in scripts either.
Maybe we can settle for the fact that I was able to reproduce it when I fixed it, and drop the test now.
Or maybe keep the test, but remove the assertion that checks it managed to trigger the BUSY and LOADING states (so that it won't be flaky).
I guess I was missing the rd flush and after 100 to make it work.
I added some timing output to see what's going on, and made the script run longer, so that on my machine each EVAL took 900ms.
On GitHub Actions it took:
script took 20125 milliseconds
loading took 19310 milliseconds
I guess we can cut it in half?
With that change, it takes about 400ms on my machine and about 10s on GitHub Actions:
script took 10532 milliseconds
loading took 11399 milliseconds
@yossigo Perhaps... I'm not sure whether I want to blame that bug on the connection abstraction and script timeout, or on the
@oranagra Probably both, but how can it reproduce without
Maybe by loading an AOF file from v3.2?
This is a regression in 6.0 (connection abstraction). It seems it can be triggered only when script command replication is set to no (or when loading old AOF files).
@redis/core-team please let me know if you think this fix should be released as 6.0.9 ASAP.
(cherry picked from commit dc803d2)