[BUG] Redis Cluster node after becoming Replica allows adding of primary slot range

Bug Description:
Version: Redis version=7.0.11

Hi, I observed an issue with Redis Cluster nodes. The following sequence happened with our automated Redis management control plane:
1. Node N1 was issued a **_CLUSTER REPLICATE_** command.
2. The control plane checked whether the node was unassigned. Replication had not happened yet, so the check reported it as unassigned.
3. Node N1 was then reassigned to the primary role and issued **_CLUSTER ADDSLOTSRANGE_** for a different slot range.
4. In the final state, the node was both a replica and a primary:


CLUSTER NODES output for the node on port 7001:

 3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. myself,**_slave_**,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698913926000 3 connected **_8192-16383_**

c23aa8a150e0291eaad4b88cc505b88b8b2479b0 172.18.133.131:7000@17000,padgupta-lappy. master,nofailover - 0 1698914350449 3 connected 0-8191

CLUSTER NODES output for the node on port 7000:

c23aa8a150e0291eaad4b88cc505b88b8b2479b0 172.18.133.131:7000@17000,padgupta-lappy. myself,master,nofailover - 0 0 3 connected 0-8191

3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698914345168 3 connected


After this, resending the CLUSTER REPLICATE command to node N1 crashes Redis.
 
I'm not sure if this dual role is expected in the first place, but it does cause a crash later.
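The dual-role state in the CLUSTER NODES output above can be detected mechanically. The following is a minimal sketch in Go, assuming the standard space-separated CLUSTER NODES line layout; the type `nodeLine` and the helpers `parseNodeLine`, `hasFlag`, and `isReplicaOwningSlots` are hypothetical names introduced for illustration and are not part of Redis or any client library.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeLine holds the CLUSTER NODES fields that matter for this check.
// (Hypothetical helper type, not part of any Redis client.)
type nodeLine struct {
	ID       string
	Flags    []string // e.g. myself, slave, nofailover
	MasterID string   // "-" when the node reports no master
	Slots    []string // raw slot tokens, e.g. "8192-16383"
}

// parseNodeLine splits one CLUSTER NODES line.
// Assumed layout:
// <id> <addr> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> [<slot> ...]
func parseNodeLine(line string) (nodeLine, error) {
	f := strings.Fields(line)
	if len(f) < 8 {
		return nodeLine{}, fmt.Errorf("expected at least 8 fields, got %d", len(f))
	}
	return nodeLine{
		ID:       f[0],
		Flags:    strings.Split(f[2], ","),
		MasterID: f[3],
		Slots:    f[8:], // empty when the node lists no slots
	}, nil
}

// hasFlag reports whether the node line carries the given flag.
func (n nodeLine) hasFlag(flag string) bool {
	for _, fl := range n.Flags {
		if fl == flag {
			return true
		}
	}
	return false
}

// isReplicaOwningSlots is true for the inconsistent state in this report:
// the node is flagged slave yet still lists slot ranges.
func (n nodeLine) isReplicaOwningSlots() bool {
	return n.hasFlag("slave") && len(n.Slots) > 0
}

func main() {
	// The "myself" line reported by the node on port 7001, without the markdown emphasis.
	line := "3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. " +
		"myself,slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698913926000 3 connected 8192-16383"
	n, err := parseNodeLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("replica owning slots:", n.isReplicaOwningSlots())
}
```

Running the same check against the view from port 7000 would not flag N1, since that node lists N1 as a replica with no slots, which is the divergence described here.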

BUG REPORT:


------ STACK TRACE ------

Backtrace:
/usr/bin/redis-server *:7001 [cluster] (clusterSetMaster+0xdf)[0x563c35e2a22f]
/usr/bin/redis-server *:7001 [cluster] (clusterCommand+0x1c13)[0x563c35e31823]
/usr/bin/redis-server *:7001 [cluster] (call+0xee)[0x563c35da510e]
/usr/bin/redis-server *:7001 [cluster] (processCommand+0x6fd)[0x563c35da617d]
/usr/bin/redis-server *:7001 [cluster] (processInputBuffer+0x107)[0x563c35dc2517]
/usr/bin/redis-server *:7001 [cluster] (readQueryFromClient+0x318)[0x563c35dc2a58]
/usr/bin/redis-server *:7001 [cluster] (+0x17323c)[0x563c35e9823c]
/usr/bin/redis-server *:7001 [cluster] (aeProcessEvents+0x1e2)[0x563c35d9c182]
/usr/bin/redis-server *:7001 [cluster] (aeMain+0x1d)[0x563c35d9c4bd]
/usr/bin/redis-server *:7001 [cluster] (main+0x354)[0x563c35d93df4]
/lib/x86_64-linux-gnu/libc.so.6 (+0x29d90)[0x7f5f788ccd90]
/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0x80)[0x7f5f788cce40]
/usr/bin/redis-server *:7001 [cluster] (_start+0x25)[0x563c35d944c5]

Repro Steps:

Minimal pseudo-code for repro:

```
// First shard: take slots 0-8191 as a primary.
firstShard.ClusterAddSlots(context.TODO(), 0, 8191)
firstShard.ClusterBumpEpoch(context.TODO())
// Second shard: become a replica of the first shard, then add slots anyway.
secondShard.ClusterReplicate(context.TODO(), NodeIdOfFirstShard)
secondShard.ClusterAddSlots(context.TODO(), 8192, 16383)
secondShard.ClusterBumpEpoch(context.TODO())

// For the crash, issue another replicate:
secondShard.ClusterReplicate(context.TODO(), NodeIdOfFirstShard)
```

Expectation:

If we send CLUSTER REPLICATE to a master node, it rejects the command. I would expect the same when CLUSTER ADDSLOTSRANGE is sent to a replica node (until its state is reset). At the very least, the node's view should eventually become consistent with the view of the other nodes through gossip propagation. Currently, the CLUSTER NODES output of the two shards does not converge (before the crash).
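Until the server rejects this itself, a control plane could apply the same rule on the client side before sending CLUSTER ADDSLOTSRANGE. The sketch below is one possible guard, not a fix for the crash. It assumes the target node's own CLUSTER NODES reply already reflects the replica role at the time of the check, which this report does not confirm. `guardAddSlots`, `contains`, and `errNoMyself` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNoMyself is returned when the reply has no line flagged "myself".
var errNoMyself = errors.New("no line flagged myself in CLUSTER NODES output")

// contains reports whether list includes want.
func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

// guardAddSlots inspects the raw CLUSTER NODES reply taken from the target
// node and returns an error if that node reports itself as a replica.
// (Hypothetical client-side check; Redis 7.0.11 does not enforce this.)
func guardAddSlots(clusterNodes string) error {
	for _, line := range strings.Split(clusterNodes, "\n") {
		f := strings.Fields(line)
		if len(f) < 8 {
			continue // blank or malformed line
		}
		flags := strings.Split(f[2], ",")
		if !contains(flags, "myself") {
			continue
		}
		// A replica carries the slave flag and a master ID other than "-".
		if contains(flags, "slave") || f[3] != "-" {
			return fmt.Errorf("node %s is a replica of %s; refusing to add slots", f[0], f[3])
		}
		return nil
	}
	return errNoMyself
}

func main() {
	// Reply from the node on port 7001 after CLUSTER REPLICATE (slots omitted).
	reply := "3dd70494e109dfdfa44fd2ff69d1a12be1f3642b 172.18.133.131:7001@17001,padgupta-lappy. " +
		"myself,slave,nofailover c23aa8a150e0291eaad4b88cc505b88b8b2479b0 0 1698913926000 3 connected\n" +
		"c23aa8a150e0291eaad4b88cc505b88b8b2479b0 172.18.133.131:7000@17000,padgupta-lappy. " +
		"master,nofailover - 0 1698914350449 3 connected 0-8191\n"
	if err := guardAddSlots(reply); err != nil {
		fmt.Println("skip ADDSLOTSRANGE:", err)
		return
	}
	fmt.Println("safe to send ADDSLOTSRANGE")
}
```

The check reads only the line flagged `myself`, so it relies on the node's own view rather than on gossip from other nodes, which is the view that failed to converge in this report.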


[BUG] Redis Cluster node after becoming Replica allows adding of primary slot range #12717
