[Cluster inconsistency] slave migrates when all slots moved to new master but doesn't migrate back

A very peculiar bug, albeit easy to reproduce, found in some in-house tests for production.

I have checked it on 3.0.5 and on the latest build from the 3.0 branch on GitHub.
- Steps are as below:
### build a cluster with 6 nodes and replication turned on <ips to be replaced appropriately>

./redis-trib.rb create --replicas 1 192.168.10.25:8000 192.168.10.25:8001 192.168.10.25:8002 192.168.10.25:8003 192.168.10.25:8004 192.168.10.25:8005
### the cluster will have 3 masters and 3 slaves replicating each of the masters

192.168.10.25:8004> cluster nodes
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072496887 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072495885 1 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 6d367efab8a48baf7d1c0e924049e86099dbb272 0 1454072497888 4 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072497088 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072497389 2 connected 5461-10922
- slots info

192.168.10.25:8004> cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "192.168.10.25"
      2) (integer) 8000
   4) 1) "192.168.10.25"
      2) (integer) 8003
3) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
### migrate all slots of a particular master to another master (from node on port 8000 to node on port 8001 in this case)

./redis-trib.rb reshard --from 6d367efab8a48baf7d1c0e924049e86099dbb272 --to 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --slots 5461 --yes 192.168.10.25:8001
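
The `--slots 5461` value is the number of slots owned by the node on port 8000. A minimal sketch of that arithmetic, using the ranges from the `cluster slots` output above (the constant names are only for illustration):

```ruby
# Slot ranges copied from the CLUSTER SLOTS output above (inclusive bounds),
# keyed by master port.
RANGES = {
  8000 => (0..5460),
  8001 => (5461..10922),
  8002 => (10923..16383),
}

# Number of hash slots owned by each master.
SLOT_COUNTS = RANGES.transform_values(&:size)

SLOT_COUNTS.each { |port, count| puts "#{port}: #{count} slots" }
puts "total: #{SLOT_COUNTS.values.sum}"
```

Moving 5461 slots away from 8000 therefore leaves it with no slots at all, which is the precondition for the behaviour described next.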
### the slave of the original slot holder also migrates
- slot info

1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
   5) 1) "192.168.10.25"
      2) (integer) 8003
- node info (node on port 8001 has 2 slaves now and that on 8000 is left without slaves)

74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072791540 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072792041 1 connected
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072792542 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072789534 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072790538 7 connected 0-10922
### migrate all the slots back

./redis-trib.rb reshard --from 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --to 6d367efab8a48baf7d1c0e924049e86099dbb272 --slots 5461 --yes 192.168.10.25:8001
### the slave doesn't migrate back
- slot info

1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "192.168.10.25"
      2) (integer) 8000
3) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
   5) 1) "192.168.10.25"
      2) (integer) 8003
- node info

74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072917769 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072917769 8 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072918772 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072916766 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072916766 7 connected 5461-10922
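
The inconsistency in this final state can be checked mechanically. The following is a rough sketch, not part of redis-trib.rb: it parses the `cluster nodes` text above (assuming the field layout shown in this report) and lists masters that own slots but have no slave. `Node`, `parse_nodes` and `uncovered_masters` are names made up for this example.

```ruby
# Final CLUSTER NODES output from this report, copied verbatim.
NODES_TEXT = <<~EOS
  74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072917769 3 connected 10923-16383
  6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072917769 8 connected 0-5460
  478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072918772 7 connected
  bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072916766 6 connected
  532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
  07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072916766 7 connected 5461-10922
EOS

# Hypothetical record type for this sketch.
Node = Struct.new(:id, :addr, :flags, :master_id, :slots)

# Parse lines laid out as:
# id addr flags master ping-sent pong-recv config-epoch link-state [slots...]
def parse_nodes(text)
  text.each_line.map(&:split).reject(&:empty?).map do |f|
    Node.new(f[0], f[1], f[2].split(","), f[3] == "-" ? nil : f[3], f.drop(8))
  end
end

# Masters that own at least one slot but have no slave pointing at them.
def uncovered_masters(nodes)
  slave_counts = Hash.new(0)
  nodes.each { |n| slave_counts[n.master_id] += 1 if n.flags.include?("slave") }
  nodes.select do |n|
    n.flags.include?("master") && !n.slots.empty? && slave_counts[n.id].zero?
  end
end

UNCOVERED = uncovered_masters(parse_nodes(NODES_TEXT))
UNCOVERED.each { |n| puts "#{n.addr} owns #{n.slots.join(' ')} but has no slave" }
```

On the output above this reports only the node on port 8000, while the node on port 8001 holds two slaves.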

If I then migrate all slots away from the remaining master and migrate them all back, the cluster ends up with one master holding all 3 slaves and the other 2 masters with no slaves, even though the slots are evenly distributed.

I can't say definitively what the expected behaviour should be, but logically it should be one of the following:
- either there is no mechanism that detects "oh, all slots migrated, so let's migrate the slave",
- or, if such a mechanism is in place, then when the original master gets some or all of its slots back it should also get a slave (not necessarily the original one), provided there are enough slaves.
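
The end state described above can be reproduced with a toy model. This is only a sketch of the observed behaviour, not Redis code: it assumes a slave re-points to the receiving master when its own master is left with zero slots, that nothing ever moves it back, and that the second round of migration also targets the node on port 8001. `ToyCluster` and `run_toy_model` are hypothetical names.

```ruby
# Toy model (an assumption, not Redis code): slaves follow the slots when
# their master is emptied, and never return.
class ToyCluster
  attr_reader :slots, :slave_of

  def initialize
    # Slot counts and replication links from the initial cluster, by port.
    @slots = { "8000" => 5461, "8001" => 5462, "8002" => 5461 }
    @slave_of = { "8003" => "8000", "8004" => "8001", "8005" => "8002" }
  end

  # Move every slot owned by `from` to `to`.
  def move_all(from, to)
    move(from, to, @slots[from])
  end

  # Move `count` slots, then re-point slaves if `from` is now empty.
  def move(from, to, count)
    @slots[from] -= count
    @slots[to] += count
    return unless @slots[from].zero?
    @slave_of.transform_values! { |master| master == from ? to : master }
  end
end

def run_toy_model
  c = ToyCluster.new
  c.move_all("8000", "8001")    # 8003 follows the slots to 8001
  c.move("8001", "8000", 5461)  # slots return, 8003 stays
  c.move_all("8002", "8001")    # 8005 follows the slots to 8001
  c.move("8001", "8002", 5461)  # slots return, 8005 stays
  c
end

TOY = run_toy_model
puts TOY.slots.inspect
puts TOY.slave_of.inspect
```

In this model the slot counts end where they started, yet every slave ends up attached to 8001.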
### NOW THE PECULIAR BITS.. :)

I found these after spending some more time trying to understand the issue; they may be useful for a correct and clear analysis.

The issue does not exist on 3.0.5 (with redis-trib.rb from 3.0.5). The slave does not migrate in the first place (even if the master loses all its slots), even though this code exists:

```
/* in function clusterUpdateSlotsConfigWith() */

    /* If at least one slot was reassigned from a node to another node
     * with a greater configEpoch, it is possible that:
     * 1) We are a master left without slots. This means that we were
     *    failed over and we should turn into a replica of the new
     *    master.
     * 2) We are a slave and our master is left without slots. We need
     *    to replicate to the new slots owner. */
    if (newmaster && curmaster->numslots == 0) {
        redisLog(REDIS_WARNING,
            "Configuration change detected. Reconfiguring myself "
            "as a replica of %.40s", sender->name);
        clusterSetMaster(sender);
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|
                             CLUSTER_TODO_UPDATE_STATE|
                             CLUSTER_TODO_FSYNC_CONFIG);

```

The issue happens on the latest build from the 3.2 branch (with redis-trib.rb from the latest 3.2 branch). The slave migrates when all slots are moved, but does not migrate back when the slots are moved back, even though this code exists:

```
/* in function clusterCron() */

        /* Orphaned master check, useful only if the current instance
         * is a slave that may migrate to another master. */
        if (nodeIsSlave(myself) && nodeIsMaster(node) && !nodeFailed(node)) {
            int okslaves = clusterCountNonFailingSlaves(node);

            /* A master is orphaned if it is serving a non-zero number of
             * slots, have no working slaves, but used to have at least one
             * slave, or failed over a master that used to have slaves. */
            if (okslaves == 0 && node->numslots > 0 &&
                node->flags & REDIS_NODE_MIGRATE_TO)
            {
                orphaned_masters++;
            }
            if (okslaves > max_slaves) max_slaves = okslaves;
            if (nodeIsSlave(myself) && myself->slaveof == node)
                this_slaves = okslaves;
        }

```
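
To make the inputs of that check explicit, here is a minimal transliteration of the quoted condition. `MasterView` is a hypothetical stand-in for the fields the C code reads, and the flag value for the node on port 8000 is an assumption in both directions, since `cluster nodes` does not show it.

```ruby
# Hypothetical stand-in for the per-master fields the quoted check reads.
MasterView = Struct.new(:name, :numslots, :ok_slaves, :migrate_to_flag, :failed)

# Mirrors the quoted condition: a non-failed master that serves slots, has
# no working slave, and carries the MIGRATE_TO flag counts as orphaned.
def orphaned?(m)
  !m.failed && m.ok_slaves.zero? && m.numslots > 0 && m.migrate_to_flag
end

# Node on port 8000 after the slots were moved back: 5461 slots, 0 slaves.
# Whether it still carries the flag is not visible in CLUSTER NODES, so both
# cases are evaluated.
WITH_FLAG = MasterView.new("8000", 5461, 0, true, false)
WITHOUT_FLAG = MasterView.new("8000", 5461, 0, false, false)

puts "flag set:   orphaned = #{orphaned?(WITH_FLAG)}"
puts "flag unset: orphaned = #{orphaned?(WITHOUT_FLAG)}"
```

This does not show which input is wrong in the reported case; it only narrows the question to what the slaves believe about the slot count, slave count and flag of the node on port 8000.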

The most peculiar bit: the difference in behaviour is caused by an update to redis-trib.rb.

The move-slot flow is:
- set the slot to receiving (importing) on the destination
- set the slot to migrating on the source
- run the actual setslot on all nodes (or only on master nodes) <<-- this is the difference

If "cluster setslot <slot> node <target>" is run only on master nodes, the first behaviour is observed (the slave migrates, but does not migrate back).
If "cluster setslot <slot> node <target>" is run on all nodes, the second behaviour is observed (the slave does not migrate in the first place).

The above is consistent from 3.0.5 onward.. :-)

Executing the final setslot only on masters was introduced somewhere after 3.0.6:

```
    # in redis-trib.rb
    # move_slot...

        # Set the new node as the owner of the slot in all the known nodes.
        if !o[:cold]
            @nodes.each{|n|
                next if n.has_flag?("slave")
                n.r.cluster("setslot",slot,"node",target.info[:name])
            }
        end
```
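
The two variants of the flow can be compared without a live cluster. The sketch below only builds the list of commands that would be sent for one slot; it is not the redis-trib.rb implementation, the key-migration step between "migrating" and the final setslot is left out, and `PlanNode`, `move_slot_plan` and the port-based node names are hypothetical (real calls use node IDs).

```ruby
# Hypothetical node record: a short name and a role ("master" or "slave").
PlanNode = Struct.new(:name, :role)

# Build [node name, command] pairs for moving one slot from source to target.
# masters_only selects whether the final SETSLOT NODE skips slaves.
def move_slot_plan(slot, source, target, nodes, masters_only:)
  plan = []
  plan << [target.name, ["cluster", "setslot", slot, "importing", source.name]]
  plan << [source.name, ["cluster", "setslot", slot, "migrating", target.name]]
  nodes.each do |n|
    next if masters_only && n.role == "slave"
    plan << [n.name, ["cluster", "setslot", slot, "node", target.name]]
  end
  plan
end

PLAN_NODES = [
  PlanNode.new("8000", "master"), PlanNode.new("8001", "master"),
  PlanNode.new("8002", "master"), PlanNode.new("8003", "slave"),
  PlanNode.new("8004", "slave"),  PlanNode.new("8005", "slave"),
]

MASTERS_ONLY_PLAN = move_slot_plan(0, PLAN_NODES[0], PLAN_NODES[1], PLAN_NODES, masters_only: true)
ALL_NODES_PLAN = move_slot_plan(0, PLAN_NODES[0], PLAN_NODES[1], PLAN_NODES, masters_only: false)

puts "masters only: #{MASTERS_ONLY_PLAN.size} commands"
puts "all nodes:    #{ALL_NODES_PLAN.size} commands"
```

With the masters-only variant the slaves never receive the final setslot directly, so they can only learn the new slot owner through the cluster bus, which is where the two behaviours diverge.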

My guess is that, as per the server code, both the migration and the migration back should happen, but the information either does not eventually percolate across the cluster or gets overridden by older information. But I am just guessing!

I hope the above is of some use in resolving this.

I believe the system info is not very relevant. All the config details are below (the port changes per node):

port 8000
dir ./
bind 192.168.10.25
dbfilename redis-2-0.rdb
pidfile ./rdbredis-2-0.pid
logfile ./rdbredis-2-0.log
syslog-ident test-db1
daemonize yes
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 7000
tcp-backlog 511
timeout 0
tcp-keepalive 0
slave-serve-stale-data yes
slave-read-only no
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

