Description
Describe the bug
Redis 7.2 fails with a "corrupted cluster config file" error when running a Redis cluster with mixed 7.0 and 7.2 nodes.
To reproduce
- Create a 3-node cluster with Redis 7.0 (I used 7.0.14)
- Add 3 slave nodes running Redis 7.2 (I used 7.2.3)
- Stop any of the 7.2 nodes and try to start it again. It fails with the following error:
70878:M 14 Nov 2023 09:47:27.115 # Unrecoverable error: corrupted cluster config file "c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 127.0.0.1:6381@16381,,tls-port=0,shard-id=f5de5cc8bc87f66210d34f1017149ebd89e03ad7 master - 0 1699951301788 3 connected 10923-16383
".
Expected behavior
7.2 nodes can be restarted and can rejoin the cluster.
Additional information
I came across this problem when trying to update one of my 7.0 clusters to 7.2. I updated several slaves, as usual. Everything seemed to work fine until I had to restart one of them and it failed. So it seems that a rolling update from 7.0 to 7.2 is currently impossible, but maybe I'm doing something wrong?
Corrupted config file generated by Redis 7.2 (the nodes on ports 6379-6381 run 7.0, the nodes on ports 36379-36381 run 7.2):
8f542d6e23d29a9106e4d5db7c028dafd67d4649 127.0.0.1:6380@16380,,tls-port=0,shard-id=502af3db7f3f11593bb9754e9fd4f58e309cd719 master - 0 1699951299000 2 connected 5461-10922
f83290999eab469341f81dea9a783eaaec18b96a 127.0.0.1:6379@16379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 master - 0 1699951300780 1 connected 0-5460
9662b7ddd38a17cd7a294e1c2bac692c5fc1cbcc 127.0.0.1:36381@46381,,tls-port=0,shard-id=a6af4619eec99aabd7c4b9e4ccc5070f43f642b8 slave,fail c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 1699951288673 1699951282616 3 disconnected
c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 127.0.0.1:6381@16381,,tls-port=0,shard-id=f5de5cc8bc87f66210d34f1017149ebd89e03ad7 master - 0 1699951301788 3 connected 10923-16383
0eabd0c4889de11a25ac373c66251e5021d157a0 127.0.0.1:36380@46380,,tls-port=0,shard-id=98461c28bba1a38fdb7100cba6e98eacbb95e97b slave 8f542d6e23d29a9106e4d5db7c028dafd67d4649 0 1699951300000 2 connected
bd1667d21001886891be7def40c298d6873967ce 127.0.0.1:36379@46379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 myself,slave f83290999eab469341f81dea9a783eaaec18b96a 0 1699951301000 1 connected
vars currentEpoch 3 lastVoteEpoch 0
I also used gdb to check why it fails, and it seems to fail at this point, while parsing the shard-id field:
Line 502 in 7f4bae8:
    goto fmterr;
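One thing that stands out in the config file above: two of the three 7.2 slaves carry a shard-id different from their master's, and the line that fails to load is the master line for one of them. Whether this disagreement is what trips the parser is only an assumption, not something confirmed here. The sketch below is a hypothetical diagnostic, not part of Redis. The helper names `parse_nodes` and `shard_id_mismatches` are made up for illustration, and the field layout is inferred purely from the lines pasted in this report.

```python
# Hypothetical diagnostic, not part of Redis: list slaves whose shard-id
# differs from their master's shard-id in a nodes.conf-style listing.
# The field layout is inferred from the config file pasted in this report.

SAMPLE = """\
8f542d6e23d29a9106e4d5db7c028dafd67d4649 127.0.0.1:6380@16380,,tls-port=0,shard-id=502af3db7f3f11593bb9754e9fd4f58e309cd719 master - 0 1699951299000 2 connected 5461-10922
f83290999eab469341f81dea9a783eaaec18b96a 127.0.0.1:6379@16379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 master - 0 1699951300780 1 connected 0-5460
9662b7ddd38a17cd7a294e1c2bac692c5fc1cbcc 127.0.0.1:36381@46381,,tls-port=0,shard-id=a6af4619eec99aabd7c4b9e4ccc5070f43f642b8 slave,fail c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 1699951288673 1699951282616 3 disconnected
c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 127.0.0.1:6381@16381,,tls-port=0,shard-id=f5de5cc8bc87f66210d34f1017149ebd89e03ad7 master - 0 1699951301788 3 connected 10923-16383
0eabd0c4889de11a25ac373c66251e5021d157a0 127.0.0.1:36380@46380,,tls-port=0,shard-id=98461c28bba1a38fdb7100cba6e98eacbb95e97b slave 8f542d6e23d29a9106e4d5db7c028dafd67d4649 0 1699951300000 2 connected
bd1667d21001886891be7def40c298d6873967ce 127.0.0.1:36379@46379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 myself,slave f83290999eab469341f81dea9a783eaaec18b96a 0 1699951301000 1 connected
vars currentEpoch 3 lastVoteEpoch 0
"""


def parse_nodes(text):
    """Return {node_id: {"shard_id", "flags", "master"}} for each node line."""
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the trailing "vars ..." line.
        if len(parts) < 8 or parts[0] == "vars":
            continue
        node_id, addr, flags, master = parts[0], parts[1], parts[2], parts[3]
        # Auxiliary fields follow the address as comma-separated key=value pairs.
        aux = dict(f.split("=", 1) for f in addr.split(",") if "=" in f)
        nodes[node_id] = {
            "shard_id": aux.get("shard-id"),
            "flags": flags.split(","),
            "master": None if master == "-" else master,
        }
    return nodes


def shard_id_mismatches(nodes):
    """Return (slave_id, master_id) pairs whose shard-ids differ."""
    pairs = []
    for node_id, node in nodes.items():
        master = nodes.get(node["master"]) if node["master"] else None
        if master is not None and node["shard_id"] != master["shard_id"]:
            pairs.append((node_id, node["master"]))
    return pairs


if __name__ == "__main__":
    for slave_id, master_id in shard_id_mismatches(parse_nodes(SAMPLE)):
        print(f"slave {slave_id[:8]} disagrees with master {master_id[:8]}")
```

On the file above this reports the slaves on ports 36381 and 36380, while the slave on port 36379 shares its master's shard-id. Running the same check against the nodes.conf of the other 7.2 nodes could show whether the failing line always involves such a pair.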