Bug Description:
Version: Redis version=7.0.11
Hi, I observed an issue with Redis cluster nodes where the following sequence happened under our automated Redis management control plane:
- Node N1 was issued a CLUSTER REPLICATE command
- The control plane checked whether the node was unassigned (replication had not happened yet, so it was reported as unassigned)
- Node N1 was then reassigned to the primary role and issued CLUSTER ADDSLOTSRANGE for a different slot range
- In the final state, the node was both a replica and a primary:
CLUSTER NODES output for the node on port 7001:
3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. myself,slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698913926000 3 connected 8192-16383
c23aa8a150e0291eaad4b88cc505b88b8b2479b0 172.18.133.131:7000@17000,padgupta-lappy. master,nofailover - 0 1698914350449 3 connected 0-8191
CLUSTER NODES output for the node on port 7000:
c23aa8a150e0291eaad4b88cc505b88b8b2479b0 172.18.133.131:7000@17000,padgupta-lappy. myself,master,nofailover - 0 0 3 connected 0-8191
3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698914345168 3 connected
After this, resending the CLUSTER REPLICATE command to node N1 crashes Redis.
I'm not sure if this dual role is expected in the first place, but it does cause a crash later.
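A control plane could detect this dual-role state from a single CLUSTER NODES line. The following is only a minimal sketch: it assumes the standard field order of CLUSTER NODES output (ID, address, flags, master ID, ping-sent, pong-recv, config epoch, link state, then slots), and the names `nodeView`, `parseNodeLine`, and `isReplicaWithSlots` are hypothetical helpers, not part of Redis or any client library.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeView holds the fields of one CLUSTER NODES line that matter here.
// (Hypothetical type for illustration.)
type nodeView struct {
	ID       string
	Flags    []string
	MasterID string
	Slots    []string
}

// parseNodeLine splits one CLUSTER NODES line into its fields.
// Fields 0-7 are fixed; anything from field 8 onward is a slot entry.
func parseNodeLine(line string) (nodeView, error) {
	f := strings.Fields(line)
	if len(f) < 8 {
		return nodeView{}, fmt.Errorf("expected at least 8 fields, got %d", len(f))
	}
	return nodeView{
		ID:       f[0],
		Flags:    strings.Split(f[2], ","),
		MasterID: f[3],
		Slots:    f[8:],
	}, nil
}

// isReplicaWithSlots reports whether a node is flagged "slave" yet owns slots.
func isReplicaWithSlots(n nodeView) bool {
	for _, flag := range n.Flags {
		if flag == "slave" {
			return len(n.Slots) > 0
		}
	}
	return false
}

func main() {
	// Lines for node 3dd7... copied from the two outputs above.
	selfView := "3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. myself,slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698913926000 3 connected 8192-16383"
	peerView := "3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698914345168 3 connected"

	for _, line := range []string{selfView, peerView} {
		n, err := parseNodeLine(line)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(isReplicaWithSlots(n))
	}
	// prints:
	// true
	// false
}
```

Only the node's own view (port 7001) shows the replica owning slots; the view from port 7000 does not, which matches the divergence described above.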
BUG REPORT:
------ STACK TRACE ------
Backtrace:
/usr/bin/redis-server *:7001 [cluster] (clusterSetMaster+0xdf)[0x563c35e2a22f]
/usr/bin/redis-server *:7001 [cluster] (clusterCommand+0x1c13)[0x563c35e31823]
/usr/bin/redis-server *:7001 [cluster] (call+0xee)[0x563c35da510e]
/usr/bin/redis-server *:7001 [cluster] (processCommand+0x6fd)[0x563c35da617d]
/usr/bin/redis-server *:7001 [cluster] (processInputBuffer+0x107)[0x563c35dc2517]
/usr/bin/redis-server *:7001 [cluster] (readQueryFromClient+0x318)[0x563c35dc2a58]
/usr/bin/redis-server *:7001 [cluster] (+0x17323c)[0x563c35e9823c]
/usr/bin/redis-server *:7001 [cluster] (aeProcessEvents+0x1e2)[0x563c35d9c182]
/usr/bin/redis-server *:7001 [cluster] (aeMain+0x1d)[0x563c35d9c4bd]
/usr/bin/redis-server *:7001 [cluster] (main+0x354)[0x563c35d93df4]
/lib/x86_64-linux-gnu/libc.so.6 (+0x29d90)[0x7f5f788ccd90]
/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0x80)[0x7f5f788cce40]
/usr/bin/redis-server *:7001 [cluster] (_start+0x25)[0x563c35d944c5]
Repro Steps:
Minimal pseudo-code for repro:
firstShard.ClusterAddSlotsRange(context.TODO(), 0, 8191)
firstShard.ClusterBumpEpoch(context.TODO())
secondShard.ClusterReplicate(context.TODO(), NodeIdOfFirstShard)
secondShard.ClusterAddSlotsRange(context.TODO(), 8192, 16383)
secondShard.ClusterBumpEpoch(context.TODO())
// For the crash, issue another replicate:
secondShard.ClusterReplicate(context.TODO(), NodeIdOfFirstShard)
Expectation:
If we send CLUSTER REPLICATE to a master node, it rejects the replication command. I would expect the same when we send CLUSTER ADDSLOTSRANGE to a replica node (until its state is reset). At the very least, the node's state should eventually become consistent with the view from other nodes through gossip propagation. Currently, the CLUSTER NODES outputs of the two shards do not converge (before the crash).
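The non-convergence can be checked mechanically by comparing slot ownership per node ID across two CLUSTER NODES outputs. This is a rough sketch under the same field-order assumption as the outputs shown earlier; `slotsByNode` and `disagreements` are hypothetical helper names, and the sketch does not treat the bracketed migrating/importing slot entries specially.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// slotsByNode maps node ID to its space-joined slot entries, as reported
// in one CLUSTER NODES output. Lines with fewer than 8 fields are skipped.
func slotsByNode(output string) map[string]string {
	views := make(map[string]string)
	for _, line := range strings.Split(output, "\n") {
		f := strings.Fields(line)
		if len(f) < 8 {
			continue
		}
		views[f[0]] = strings.Join(f[8:], " ")
	}
	return views
}

// disagreements returns the sorted IDs of nodes present in both views
// whose reported slot ownership differs.
func disagreements(a, b map[string]string) []string {
	var ids []string
	for id, slots := range a {
		if other, ok := b[id]; ok && other != slots {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Outputs copied from the report (ports 7001 and 7000).
	from7001 := `3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. myself,slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698913926000 3 connected 8192-16383
c23aa8a150e0291eaad4b88cc505b88b8b2479b0 172.18.133.131:7000@17000,padgupta-lappy. master,nofailover - 0 1698914350449 3 connected 0-8191`
	from7000 := `c23aa8a150e0291eaad4b88cc505b88b8b2479b0 172.18.133.131:7000@17000,padgupta-lappy. myself,master,nofailover - 0 0 3 connected 0-8191
3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698914345168 3 connected`

	for _, id := range disagreements(slotsByNode(from7001), slotsByNode(from7000)) {
		fmt.Println("views disagree on slots for node", id)
	}
}
```

With the outputs from this report, the only node the two views disagree on is 3dd70494e109dfdfa44fd2ff69d1a12be1f3642b, which owns 8192-16383 in its own view and no slots in the view from port 7000.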