Description
A very peculiar bug, albeit easy to replicate, found in some in-house tests for production.
I have checked it on 3.0.5 and on the latest build from the 3.0 branch on GitHub.
- Steps are as below:
build a cluster with 6 nodes and replication turned on
./redis-trib.rb create --replicas 1 192.168.10.25:8000 192.168.10.25:8001 192.168.10.25:8002 192.168.10.25:8003 192.168.10.25:8004 192.168.10.25:8005
the cluster will have 3 masters and 3 slaves replicating each of the masters
192.168.10.25:8004> cluster nodes
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072496887 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072495885 1 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 6d367efab8a48baf7d1c0e924049e86099dbb272 0 1454072497888 4 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072497088 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072497389 2 connected 5461-10922
- slots info
192.168.10.25:8004> cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "192.168.10.25"
      2) (integer) 8000
   4) 1) "192.168.10.25"
      2) (integer) 8003
3) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
migrate all slots of a particular master to another master (from node on port 8000 to node on port 8001 in this case)
./redis-trib.rb reshard --from 6d367efab8a48baf7d1c0e924049e86099dbb272 --to 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --slots 5461 --yes 192.168.10.25:8001
the slave of the original slot holder also migrates
- slot info
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
   5) 1) "192.168.10.25"
      2) (integer) 8003
- node info (node on port 8001 has 2 slaves now and that on 8000 is left without slaves)
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072791540 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072792041 1 connected
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072792542 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072789534 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072790538 7 connected 0-10922
migrate all the slots back
./redis-trib.rb reshard --from 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 --to 6d367efab8a48baf7d1c0e924049e86099dbb272 --slots 5461 --yes 192.168.10.25:8001
the slave doesn't migrate back
- slot info
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.10.25"
      2) (integer) 8002
   4) 1) "192.168.10.25"
      2) (integer) 8005
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "192.168.10.25"
      2) (integer) 8000
3) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "192.168.10.25"
      2) (integer) 8001
   4) 1) "192.168.10.25"
      2) (integer) 8004
   5) 1) "192.168.10.25"
      2) (integer) 8003
- node info
74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072917769 3 connected 10923-16383
6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072917769 8 connected 0-5460
478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072918772 7 connected
bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072916766 6 connected
532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072916766 7 connected 5461-10922
If I then migrate all slots away from the remaining master and migrate them back, the cluster
ends up with one master holding all 3 slaves and the other 2 masters with no slaves, even though
the slots are evenly distributed.
I can't say for certain what the expected behaviour should be, but logically it should be one of
the following:
- either there is no mechanism that detects "all slots have migrated, so let's migrate the slave",
- or, if such a mechanism is in place, then when the original master gets some or all of its
slots back it should also get a slave (not necessarily the original one), provided there are
enough slaves.
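To make the imbalance easy to spot, the node info can be checked mechanically. The following is only a minimal sketch: the helper names (parse_cluster_nodes, replica_report) are made up for illustration, and the field layout is assumed from the cluster nodes output quoted above. It counts every slave that points at a master, without looking at link or failure state.

```ruby
# Parse CLUSTER NODES text. Field layout assumed from the output in this report:
# <id> <ip:port> <flags> <master-id or -> <ping> <pong> <epoch> <link> [slots...]
def parse_cluster_nodes(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    f = line.split(" ")
    {
      id: f[0],
      addr: f[1],
      flags: f[2].split(","),
      master_id: (f[3] == "-" ? nil : f[3]),
      slots: f[8..-1] || []
    }
  end
end

# For every master, count the slaves that name it as their master and flag
# masters that own slots but have no slave at all.
def replica_report(nodes)
  masters = nodes.select { |n| n[:flags].include?("master") }
  masters.map do |m|
    replicas = nodes.count { |n| n[:master_id] == m[:id] }
    { addr: m[:addr], slots: m[:slots], replicas: replicas,
      uncovered: !m[:slots].empty? && replicas.zero? }
  end
end

# Node info taken from the "migrate all the slots back" step of this report.
NODES_AFTER_MIGRATE_BACK = <<~EOS
  74efdfbbacd99745a27d43aabce947d80d3a9051 192.168.10.25:8002 master - 0 1454072917769 3 connected 10923-16383
  6d367efab8a48baf7d1c0e924049e86099dbb272 192.168.10.25:8000 master - 0 1454072917769 8 connected 0-5460
  478e1f5a49363ff8a78dd4192cee0389c9344763 192.168.10.25:8003 slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 1454072918772 7 connected
  bcb8b8d4b4349860fc2d7fea2a0d99e07d12ab7a 192.168.10.25:8005 slave 74efdfbbacd99745a27d43aabce947d80d3a9051 0 1454072916766 6 connected
  532b72609d8264caeab21fc39bb380b98b81cc34 192.168.10.25:8004 myself,slave 07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 0 0 5 connected
  07598e66b97494e18dd55ce3c8cd44d6ace0a2c0 192.168.10.25:8001 master - 0 1454072916766 7 connected 5461-10922
EOS

REPORT = replica_report(parse_cluster_nodes(NODES_AFTER_MIGRATE_BACK))
REPORT.each do |r|
  note = r[:uncovered] ? " <- owns slots but has no slave" : ""
  puts "#{r[:addr]} slots=#{r[:slots].join(',')} slaves=#{r[:replicas]}#{note}"
end
```

With the node info from the last step, this flags the node on port 8000 as owning slots without a slave, while the node on port 8001 is reported with two slaves.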
NOW THE PECULIAR BITS.. :)
I found these after spending some more time trying to understand the issue; they may be useful
for a correct and clear analysis.
The issue does not exist on 3.0.5 (with redis-trib.rb from 3.0.5). The slave does not migrate
in the first place (even if the master loses all its slots), even though this code exists:
in function clusterUpdateSlotsConfigWith()
    /* If at least one slot was reassigned from a node to another node
     * with a greater configEpoch, it is possible that:
     * 1) We are a master left without slots. This means that we were
     *    failed over and we should turn into a replica of the new
     *    master.
     * 2) We are a slave and our master is left without slots. We need
     *    to replicate to the new slots owner. */
    if (newmaster && curmaster->numslots == 0) {
        redisLog(REDIS_WARNING,
            "Configuration change detected. Reconfiguring myself "
            "as a replica of %.40s", sender->name);
        clusterSetMaster(sender);
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|
                             CLUSTER_TODO_UPDATE_STATE|
                             CLUSTER_TODO_FSYNC_CONFIG);
    }
The issue happens on the latest build from the 3.2 branch (with redis-trib.rb from the latest
3.2 branch). The slave migrates when all slots are moved away but does not migrate back when the
slots are moved back, even though this code exists:
in function clusterCron()
    /* Orphaned master check, useful only if the current instance
     * is a slave that may migrate to another master. */
    if (nodeIsSlave(myself) && nodeIsMaster(node) && !nodeFailed(node)) {
        int okslaves = clusterCountNonFailingSlaves(node);
        /* A master is orphaned if it is serving a non-zero number of
         * slots, have no working slaves, but used to have at least one
         * slave, or failed over a master that used to have slaves. */
        if (okslaves == 0 && node->numslots > 0 &&
            node->flags & REDIS_NODE_MIGRATE_TO)
        {
            orphaned_masters++;
        }
        if (okslaves > max_slaves) max_slaves = okslaves;
        if (nodeIsSlave(myself) && myself->slaveof == node)
            this_slaves = okslaves;
    }
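The orphaned-master condition quoted above can be evaluated by hand for the node on port 8000 at each step of the reproduction. This is only a sketch of that single condition, not of the whole slave migration logic: the orphaned? helper is hypothetical, the MIGRATE_TO bit value is a placeholder, and whether the flag is actually set on 8000 after the slots come back is an assumption that this report does not establish.

```ruby
# Placeholder bit for REDIS_NODE_MIGRATE_TO; the real value in the server
# source may differ, only the "flag set / not set" distinction matters here.
MIGRATE_TO = 1 << 8

# Mirrors the quoted check:
#   okslaves == 0 && node->numslots > 0 && node->flags & REDIS_NODE_MIGRATE_TO
def orphaned?(okslaves:, numslots:, flags:)
  okslaves == 0 && numslots > 0 && (flags & MIGRATE_TO) != 0
end

# Node on port 8000 at each step (5461 slots correspond to the range 0-5460).
STAGES = {
  initial:                 orphaned?(okslaves: 1, numslots: 5461, flags: MIGRATE_TO),
  slots_moved_away:        orphaned?(okslaves: 0, numslots: 0,    flags: MIGRATE_TO),
  slots_back_flag_set:     orphaned?(okslaves: 0, numslots: 5461, flags: MIGRATE_TO),
  slots_back_flag_cleared: orphaned?(okslaves: 0, numslots: 5461, flags: 0)
}

STAGES.each { |stage, result| puts "#{stage}: orphaned=#{result}" }
```

Under these assumptions, the node on port 8000 only counts as orphaned after it gets its slots back if the MIGRATE_TO flag is still set on it in the view of the slave running the check, so the flag state on each node may be worth inspecting when analysing this.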
The most peculiar bit.
The difference in behaviour is caused by an update in redis-trib.rb.
The move slot flow is:
- set slot to receiving in destination
- set slot to migrating in source
- actual setslot on all nodes (or only on master nodes) <<-- this is the difference
If "cluster setslot <slot> node <node-id>" is done only on master nodes, the first behaviour is
observed (slave migrates, but does not migrate back).
If "cluster setslot <slot> node <node-id>" is done on all nodes, the second behaviour is observed
(slave does not migrate in the first place).
The above is consistent from 3.0.5 onward.. :-)
Executing the final setslot only on masters was introduced somewhere after 3.0.6
in redis-trib.rb:
move_slot...
    # Set the new node as the owner of the slot in all the known nodes.
    if !o[:cold]
        @nodes.each{|n|
            next if n.has_flag?("slave")
            n.r.cluster("setslot",slot,"node",target.info[:name])
        }
    end
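The two variants of the final step can be compared with stand-in nodes that only record the commands they are sent. This is a rough sketch and not the redis-trib.rb code: FakeNode, the masters_only parameter and the node names are hypothetical, key migration is left out, and the first step uses the IMPORTING subcommand of CLUSTER SETSLOT for what is called "receiving" above.

```ruby
# Stand-in for a cluster node: it only records the CLUSTER commands it is sent.
# FakeNode is hypothetical; no real Redis connection is involved.
class FakeNode
  attr_reader :name, :sent

  def initialize(name, flags)
    @name = name
    @flags = flags
    @sent = []
  end

  def has_flag?(flag)
    @flags.include?(flag)
  end

  def cluster(*args)
    @sent << args
  end
end

# Simplified move-slot flow; masters_only selects between the two behaviours.
def move_slot(nodes, source, target, slot, masters_only:)
  target.cluster("setslot", slot, "importing", source.name)
  source.cluster("setslot", slot, "migrating", target.name)
  # (migration of the keys in the slot is omitted in this sketch)
  nodes.each do |n|
    next if masters_only && n.has_flag?("slave")
    n.cluster("setslot", slot, "node", target.name)
  end
end

# Names of the nodes that were told about the new slot owner.
def notified(nodes)
  nodes.select { |n| n.sent.any? { |cmd| cmd[2] == "node" } }.map(&:name)
end

def build_nodes
  [FakeNode.new("master-8000", ["master"]),
   FakeNode.new("master-8001", ["master"]),
   FakeNode.new("slave-8003", ["slave"])]
end

nodes = build_nodes
move_slot(nodes, nodes[0], nodes[1], 0, masters_only: true)
MASTERS_ONLY = notified(nodes)

nodes = build_nodes
move_slot(nodes, nodes[0], nodes[1], 0, masters_only: false)
ALL_NODES = notified(nodes)

puts "masters only: #{MASTERS_ONLY.inspect}"
puts "all nodes:    #{ALL_NODES.inspect}"
```

In the masters-only variant the slave is never told directly who the new slot owner is, so in a real cluster it would presumably have to learn it from the cluster bus instead, which may be where the two behaviours diverge.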
I am guessing that, according to the server code, both the migration and the migration back
should happen, but the information either does not eventually propagate across the cluster or
gets overridden by older information; but I am just guessing..!
Hope the above is of some use in resolving this.
I believe the system info is not very relevant; all the config details are below
(the port changes per node).
port 8000
dir ./
bind 192.168.10.25
dbfilename redis-2-0.rdb
pidfile ./rdbredis-2-0.pid
logfile ./rdbredis-2-0.log
syslog-ident test-db1
daemonize yes
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 7000
tcp-backlog 511
timeout 0
tcp-keepalive 0
slave-serve-stale-data yes
slave-read-only no
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes