CURATOR-653: fix potential double leader for LeaderLatch #398

woaishixiaoxiao · 2021-10-29T09:30:09Z

When I use LeaderLatch to elect a leader, I see a double-leader phenomenon: two clients are leader at the same time.
The timeline is as follows:

1. The ZooKeeper cluster switches its leader node because of a zxid overflow. During this time the cluster is unavailable to clients.
2. Client A (not the leader before the zxid overflow) and client B (the leader before the zxid overflow) enter the SUSPENDED state. Client B sets its leadership to false.
3. The ZooKeeper cluster completes its leader election and returns to normal.
4. Client A enters the RECONNECTED state and calls reset(), which sets its leadership to false.
5. Client B enters the RECONNECTED state and calls reset(), which sets its leadership to false and deletes its old node.
6. Client A receives the deletion event for its predecessor node, calls getChildren on the ZooKeeper server, finds that its node has the smallest sequence number, and sets itself as leader.
7. Client B creates a new ephemeral sequential node, calls getChildren, finds that its node does not have the smallest sequence number, and watches its predecessor node for deletion.
8. Client A deletes its old node.
9. Client B receives the deletion event for its predecessor node, calls getChildren, finds that its node now has the smallest sequence number, and sets itself as leader.
10. Client A creates a new ephemeral sequential node, calls getChildren, finds that its node does not have the smallest sequence number, and watches its predecessor node for deletion. However, it does not set itself back to non-leader, so because of step 6, client A is still in the leader state.

Now client A and client B are both leader at the same time.
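To make the race concrete, here is a toy, single-threaded Java model of steps 6 to 10. LatchModel, checkLeadership, replay, and the node names are hypothetical simplifications (no ZooKeeper, no watches); applyFix toggles whether a latch that is not first in the sorted children also clears its leadership, which is the change discussed in this PR.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class DoubleLeaderModel {

    // Hypothetical stand-in for one LeaderLatch client (not Curator's class).
    static final class LatchModel {
        final boolean applyFix;
        final AtomicBoolean hasLeadership = new AtomicBoolean(false);
        String ourNode;

        LatchModel(boolean applyFix) {
            this.applyFix = applyFix;
        }

        // sortedChildren: the latch's children ordered by sequence number.
        void checkLeadership(List<String> sortedChildren) {
            if (sortedChildren.indexOf(ourNode) == 0) {
                hasLeadership.set(true);
            } else if (applyFix) {
                hasLeadership.set(false); // not first => not leader
            }
            // Without the fix, a stale "true" from an earlier check survives.
        }
    }

    // Replays steps 6 to 10; returns true if both clients end up as leader.
    static boolean replay(boolean applyFix) {
        LatchModel a = new LatchModel(applyFix);
        LatchModel b = new LatchModel(applyFix);

        // Step 6: B's old node n-0001 is gone; A's old node is now the smallest.
        a.ourNode = "n-0002";
        a.checkLeadership(List.of("n-0002"));

        // Step 7: B creates a new node, which sorts after A's old node.
        b.ourNode = "n-0003";
        b.checkLeadership(List.of("n-0002", "n-0003"));

        // Steps 8 and 9: A deletes its old node; B is now the smallest.
        b.checkLeadership(List.of("n-0003"));

        // Step 10: A creates a new node, which sorts after B's node.
        a.ourNode = "n-0004";
        a.checkLeadership(List.of("n-0003", "n-0004"));

        return a.hasLeadership.get() && b.hasLeadership.get();
    }

    public static void main(String[] args) {
        System.out.println("double leader without fix: " + replay(false)); // true
        System.out.println("double leader with fix:    " + replay(true));  // false
    }
}
```

In this model, the only difference between the two runs is whether step 10 clears A's stale leadership flag.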

eolivelli · 2021-10-29T10:02:18Z

@woaishixiaoxiao thanks for sharing your fix.
Do you think we can add a test case to cover this change?

woaishixiaoxiao · 2021-10-29T10:52:30Z

OK. I will try it.
I also found another issue related to leader election. When the ZooKeeper server switches leaders and then returns to normal, all clients go through the state transitions CONNECTED -> SUSPENDED -> RECONNECTED.
Because LeaderLatch resets when it handles the RECONNECTED state, it first sets its leadership to false, then deletes its old ephemeral sequential node and creates a new one. This makes the application go through several leader switches, and some applications, such as message queues, don't want such frequent switchovers. It also means that although the ZooKeeper server pushes the NodeDeleted event only once, the client runs the node-deletion callback several times for the same path, because the client has registered several watches locally (creating the new node calls getChildren and sets a watch, and the predecessor-deletion event also calls getChildren and sets a watch).
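The duplicated callbacks can be illustrated with a toy model of client-side watch bookkeeping. It assumes, roughly as the ZooKeeper Java client does, that watchers are kept in a per-path set, so re-registering the same watcher object is deduplicated, while each distinct watcher object fires once when the single NodeDeleted event arrives. Whether LeaderLatch actually creates a new watcher object per registration is an assumption here; WatchFanOutModel, Watcher, and callbacksFor are hypothetical names, not ZooKeeper or Curator API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WatchFanOutModel {

    // Hypothetical watcher interface (not org.apache.zookeeper.Watcher).
    interface Watcher {
        void process(String path);
    }

    // path -> watchers registered for it; the Set drops duplicate instances.
    private final Map<String, Set<Watcher>> dataWatches = new HashMap<>();

    void register(String path, Watcher watcher) {
        dataWatches.computeIfAbsent(path, p -> new LinkedHashSet<>()).add(watcher);
    }

    // One NodeDeleted event from the server: each registered watcher fires once,
    // then the watches are gone (standard ZooKeeper watches are one-shot).
    void deliverNodeDeleted(String path) {
        Set<Watcher> watchers = dataWatches.remove(path);
        if (watchers != null) {
            for (Watcher w : watchers) {
                w.process(path);
            }
        }
    }

    // Registers a watch on the same predecessor path twice (once after creating
    // the new node, once after the predecessor-deletion event), then delivers
    // a single NodeDeleted event and counts the callbacks.
    static int callbacksFor(boolean freshWatcherPerRegistration) {
        WatchFanOutModel client = new WatchFanOutModel();
        List<String> calls = new ArrayList<>();
        Watcher shared = path -> calls.add(path);
        for (int i = 0; i < 2; i++) {
            Watcher w;
            if (freshWatcherPerRegistration) {
                w = new Watcher() { // a new object on every registration
                    @Override
                    public void process(String path) {
                        calls.add(path);
                    }
                };
            } else {
                w = shared;
            }
            client.register("/latch/n-0001", w);
        }
        client.deliverNodeDeleted("/latch/n-0001");
        return calls.size();
    }

    public static void main(String[] args) {
        System.out.println("shared watcher: " + callbacksFor(false)); // 1
        System.out.println("fresh watchers: " + callbacksFor(true));  // 2
    }
}
```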
Why don't we replace StandardConnectionStateErrorPolicy with SessionConnectionStateErrorPolicy? That would avoid the behavior described above.
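The difference between the two policies can be shown with a stdlib-only model. The enum and interface below are local stand-ins for Curator's ConnectionState and ConnectionStateErrorPolicy, and the two lambdas reflect our understanding that the standard policy treats both SUSPENDED and LOST as error states while the session policy treats only LOST as one. stepsDown is a hypothetical helper that assumes a leader gives up leadership on any error state.

```java
public class ErrorPolicyModel {

    // Local stand-ins; not the Curator types themselves.
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY }

    interface ConnectionStateErrorPolicy {
        boolean isErrorState(ConnectionState state);
    }

    // Standard: SUSPENDED and LOST are both treated as errors.
    static final ConnectionStateErrorPolicy STANDARD =
            state -> state == ConnectionState.SUSPENDED || state == ConnectionState.LOST;

    // Session-based: only LOST (session expiry) is treated as an error.
    static final ConnectionStateErrorPolicy SESSION =
            state -> state == ConnectionState.LOST;

    // The transitions seen when the ZooKeeper server switches leaders
    // without the client's session expiring.
    static final ConnectionState[] SERVER_LEADER_SWITCH = {
            ConnectionState.CONNECTED, ConnectionState.SUSPENDED, ConnectionState.RECONNECTED
    };

    // True if a leader following this policy would step down at some point
    // during the given sequence of connection states.
    static boolean stepsDown(ConnectionStateErrorPolicy policy, ConnectionState... states) {
        for (ConnectionState s : states) {
            if (policy.isErrorState(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("standard steps down: " + stepsDown(STANDARD, SERVER_LEADER_SWITCH)); // true
        System.out.println("session steps down:  " + stepsDown(SESSION, SERVER_LEADER_SWITCH));  // false
    }
}
```

In Curator itself the policy is chosen when the client is built, via the builder's connectionStateErrorPolicy(...) method, e.g. passing new SessionConnectionStateErrorPolicy(). The trade-off is that a leader keeps its leadership while SUSPENDED, so during a partition it may keep acting as leader until the session is actually reported as LOST.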

woaishixiaoxiao · 2021-11-02T15:09:07Z

Hi, I have added a unit test. Please review and approve, thanks.

tisonkun · 2022-09-25T14:53:01Z

LGTM. I was also considering this change a few days ago. cc @eolivelli @Randgalt could you also review it?

tisonkun · 2022-09-25T14:54:01Z

@woaishixiaoxiao can you create a JIRA ticket on https://issues.apache.org/jira/projects/CURATOR for this patch?

tisonkun

I think that after #430 was merged, RECONNECTED no longer causes a reset. But it's still reasonable to set leadership to false if getChildren finds that the latch is not the leader.

cc @eolivelli @Randgalt

tisonkun · 2022-09-27T03:52:07Z

Well, after #430 was merged, the test added in this patch fails. It needs a closer look.

Signed-off-by: tison

tisonkun · 2022-09-29T13:24:18Z

I adjusted the test to inject forced resets instead of depending on connection loss. Although this means the test no longer reproduces a real-world sequence, I still agree with calling setLeadership(false) when checkLeadership finds that the latch isn't the leader. setLeadership(false) is idempotent, so the extra call is harmless when the latch already isn't the leader.
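The idempotence argument can be sketched as follows, assuming the latch compares old and new values (for example with AtomicBoolean.getAndSet) and notifies listeners only on an actual transition. LeadershipState and scenario are hypothetical names, not LeaderLatch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class IdempotentLeadership {

    static final class LeadershipState {
        private final AtomicBoolean hasLeadership = new AtomicBoolean(false);
        // Stand-in for listener callbacks: records each notification.
        final List<String> events = new ArrayList<>();

        synchronized void setLeadership(boolean newValue) {
            boolean oldValue = hasLeadership.getAndSet(newValue);
            if (oldValue && !newValue) {
                events.add("notLeader"); // true -> false
            } else if (!oldValue && newValue) {
                events.add("isLeader");  // false -> true
            }
            // Same value as before: no listener is notified.
            notifyAll(); // wake threads blocked waiting on this object
        }
    }

    static List<String> scenario() {
        LeadershipState state = new LeadershipState();
        state.setLeadership(true);
        state.setLeadership(false);
        state.setLeadership(false); // redundant, e.g. from checkLeadership
        state.setLeadership(false); // redundant again
        return state.events;
    }

    public static void main(String[] args) {
        System.out.println(scenario()); // [isLeader, notLeader]
    }
}
```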

eolivelli

+1

eolivelli · 2022-09-29T13:34:25Z

@tisonkun thanks for fixing the test
@woaishixiaoxiao do you agree with @tisonkun's fix?

XComp

The change looks good. I went over the test and it does what it should do. Only two things require an actual change: the typo in the variable name and the catch clause in the test implementation that hides any Exception.

About the other nit-picky comments: I'm aware that I'm nitpicking here, so please see them as proposals rather than requests for changes. :-) Nice proposal 👍

Additionally, besides reasoning through the code, I verified that the test fails with the fix reverted and passes with the fix included.

XComp · 2022-10-07T11:55:17Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java

    volatile CountDownLatch debugResetWaitLatch = null;

+    @VisibleForTesting
+    volatile CountDownLatch debugRestWaitBeforeNodeDelete = null;


Suggested change:

-    volatile CountDownLatch debugRestWaitBeforeNodeDelete = null;
+    volatile CountDownLatch debugResetWaitBeforeNodeDeleteLatch = null;

There's a typo in the name ("Rest" instead of "Reset"). Additionally, we might want to add Latch at the end of the name to reflect the member's purpose, analogous to the other latches.
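The debug hook under review follows a common testing pattern: a volatile CountDownLatch that is null in production and, when a test sets it, makes the code under test wait at a chosen point so that another step can be injected. The sketch below reuses the suggested name debugResetWaitBeforeNodeDeleteLatch, but reset(), the log entries, and runScenario are hypothetical, not the actual LeaderLatch test. It also avoids a catch-all clause, restoring the interrupt flag instead of hiding the exception.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DebugLatchHook {

    // Test-only hook: null in production, so reset() never blocks there.
    volatile CountDownLatch debugResetWaitBeforeNodeDeleteLatch = null;

    // Records the order in which steps happen.
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    void reset() throws InterruptedException {
        log.add("setLeadership(false)");
        // Read the volatile field once so the check and the await use the same latch.
        CountDownLatch hook = debugResetWaitBeforeNodeDeleteLatch;
        if (hook != null) {
            hook.await(); // block until the test releases us
        }
        log.add("delete old node");
    }

    // Pauses reset() before the delete, injects another step, then releases it.
    static List<String> runScenario() throws InterruptedException {
        DebugLatchHook latch = new DebugLatchHook();
        CountDownLatch hook = new CountDownLatch(1);
        latch.debugResetWaitBeforeNodeDeleteLatch = hook;

        Thread resetter = new Thread(() -> {
            try {
                latch.reset();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // don't swallow the interrupt
            }
        });
        resetter.start();

        // Wait until reset() has started, then interleave the injected step.
        while (latch.log.isEmpty()) {
            Thread.sleep(1);
        }
        latch.log.add("injected step");
        hook.countDown();
        resetter.join();

        return new ArrayList<>(latch.log);
    }

    public static void main(String[] args) throws InterruptedException {
        // prints [setLeadership(false), injected step, delete old node]
        System.out.println(runScenario());
    }
}
```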

XComp · 2022-10-07T13:00:10Z