Skip to content

Fixed cluster test where node was waiting to disable AOF#9703

Closed
madolson wants to merge 2 commits intoredis:unstablefrom
madolson:unstable-cluster-aof-fix
Closed

Fixed cluster test where node was waiting to disable AOF#9703
madolson wants to merge 2 commits intoredis:unstablefrom
madolson:unstable-cluster-aof-fix

Conversation

@madolson
Copy link
Contributor

Two tests were getting delayed since there were waiting for the command to disable AOF to go through while it was being killed. However, the node was paused since the next part of the test relied on the node being unreachable from the replica, so the waiting to execute the AOF command broke the test timing.

https://github.com/redis/redis/actions/runs/1403850622

@madolson madolson requested a review from oranagra October 31, 2021 04:20
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come it passed when I run it locally? (multiple times).

Btw, it could be that there's a node that's both doing the initial rewrite, and also paused.
Maybe we should change the order, and unpause first?

Or maybe for these tests we wanna pass an argument so that it uses SIGKILL like it used to do in 6.0?

@oranagra
Copy link
Member

ohh, i see it didn't pass on my machine, but sine there's nothing colored in red on the last page, i didn't notice the failure.

@oranagra
Copy link
Member

@madolson please see #9707 as an alternative to this one. let me know what you think.

@madolson madolson closed this Oct 31, 2021
oranagra added a commit that referenced this pull request Oct 31, 2021
Fix failures introduced by #9695 which was an attempt to solve failures introduced by #9679.
And alternative to #9703 (i didn't like the extra argument to kill_instance).

Reverting #9695.
Instead of stopping AOF on all terminations, stop it only on the two which need it.
Do it as part of the test rather than the infra (it was add that kill_instance used `R`
to communicate to the instance)

Note that the original purpose of these tests was to trigger a crash, but that upsets
valgrind so in redis 6.2 i changed it to use SIGTERM, so i now rename the tests
(remove "kill" and "crash").

Also add some colors to failures, and the word "FAILED" so that it's searchable.

And solve a semi-related race condition in 14-consistency-check.tcl
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants