
[BUG] "corrupted cluster config file" error on Redis 7.2.3 when running a Redis cluster with mixed 7.0 and 7.2 nodes #12761

@jdziemidowicz


Describe the bug

A Redis 7.2 node fails to restart with a "corrupted cluster config file" error when it is part of a cluster with mixed 7.0 and 7.2 nodes.

To reproduce

  1. Create a 3-node cluster with Redis 7.0 (I used 7.0.14).
  2. Add 3 slave nodes running Redis 7.2 (I used 7.2.3).
  3. Stop any of the 7.2 nodes and try to start it again. It fails with the following error:

70878:M 14 Nov 2023 09:47:27.115 # Unrecoverable error: corrupted cluster config file "c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 127.0.0.1:6381@16381,,tls-port=0,shard-id=f5de5cc8bc87f66210d34f1017149ebd89e03ad7 master - 0 1699951301788 3 connected 10923-16383
".

Expected behavior

7.2 nodes can be restarted and can rejoin the cluster.

Additional information

I came across this problem when trying to upgrade one of my 7.0 clusters to 7.2. I upgraded several slaves first, as usual. Everything seemed to work fine until I had to restart one of them and it failed. So it seems that a rolling upgrade from 7.0 to 7.2 is currently impossible, but maybe I'm doing something wrong?

Corrupted config file generated by Redis 7.2 (the nodes on ports 6379-6381 run 7.0, the nodes on ports 36379-36381 run 7.2):

8f542d6e23d29a9106e4d5db7c028dafd67d4649 127.0.0.1:6380@16380,,tls-port=0,shard-id=502af3db7f3f11593bb9754e9fd4f58e309cd719 master - 0 1699951299000 2 connected 5461-10922
f83290999eab469341f81dea9a783eaaec18b96a 127.0.0.1:6379@16379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 master - 0 1699951300780 1 connected 0-5460
9662b7ddd38a17cd7a294e1c2bac692c5fc1cbcc 127.0.0.1:36381@46381,,tls-port=0,shard-id=a6af4619eec99aabd7c4b9e4ccc5070f43f642b8 slave,fail c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 1699951288673 1699951282616 3 disconnected
c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 127.0.0.1:6381@16381,,tls-port=0,shard-id=f5de5cc8bc87f66210d34f1017149ebd89e03ad7 master - 0 1699951301788 3 connected 10923-16383
0eabd0c4889de11a25ac373c66251e5021d157a0 127.0.0.1:36380@46380,,tls-port=0,shard-id=98461c28bba1a38fdb7100cba6e98eacbb95e97b slave 8f542d6e23d29a9106e4d5db7c028dafd67d4649 0 1699951300000 2 connected
bd1667d21001886891be7def40c298d6873967ce 127.0.0.1:36379@46379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 myself,slave f83290999eab469341f81dea9a783eaaec18b96a 0 1699951301000 1 connected
vars currentEpoch 3 lastVoteEpoch 0
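
One thing visible in the file above is that some slaves carry a shard-id different from their master's, including the slave of c5cb6e21..., which is the node named in the error. Whether that mismatch is what triggers the failure is an assumption, not something confirmed here. A minimal Python sketch to list such mismatches is below. It is a diagnostic helper, not part of Redis. The function names are hypothetical, and the field layout (`<id> <ip:port@cport,hostname,key=value,...> <flags> <master-id or ->`) is inferred from the lines above.

```python
# Diagnostic sketch (not part of Redis): list slaves in a nodes.conf dump
# whose shard-id aux field differs from their master's shard-id.
# The field layout is inferred from the config file quoted in this report.

SAMPLE = """\
f83290999eab469341f81dea9a783eaaec18b96a 127.0.0.1:6379@16379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 master - 0 1699951300780 1 connected 0-5460
9662b7ddd38a17cd7a294e1c2bac692c5fc1cbcc 127.0.0.1:36381@46381,,tls-port=0,shard-id=a6af4619eec99aabd7c4b9e4ccc5070f43f642b8 slave,fail c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 1699951288673 1699951282616 3 disconnected
c5cb6e214d955fe19cb2eb2d5d3e8a35a165f7d2 127.0.0.1:6381@16381,,tls-port=0,shard-id=f5de5cc8bc87f66210d34f1017149ebd89e03ad7 master - 0 1699951301788 3 connected 10923-16383
bd1667d21001886891be7def40c298d6873967ce 127.0.0.1:36379@46379,,tls-port=0,shard-id=653bfbe1b8b4f698f81e93c173a04fb886641e51 myself,slave f83290999eab469341f81dea9a783eaaec18b96a 0 1699951301000 1 connected
vars currentEpoch 3 lastVoteEpoch 0
"""


def parse_nodes_conf(text):
    """Return {node_id: {"flags", "master", "shard_id"}} for each node line."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        # Node lines have at least 8 fields; this also skips the "vars" line.
        if len(fields) < 8 or fields[0] == "vars":
            continue
        node_id, addr, flags, master = fields[:4]
        # addr is "ip:port@cport,hostname,key=value,..."; keep the key=value part.
        aux = dict(p.split("=", 1) for p in addr.split(",")[2:] if "=" in p)
        nodes[node_id] = {
            "flags": flags.split(","),
            "master": None if master == "-" else master,
            "shard_id": aux.get("shard-id"),
        }
    return nodes


def shard_id_mismatches(nodes):
    """Return (slave_id, master_id) pairs whose shard-ids differ."""
    mismatches = []
    for node_id, node in nodes.items():
        master = nodes.get(node["master"])
        if master is None:
            continue  # node is a master, or its master is not in the file
        if node["shard_id"] and master["shard_id"] and node["shard_id"] != master["shard_id"]:
            mismatches.append((node_id, node["master"]))
    return mismatches


if __name__ == "__main__":
    for slave_id, master_id in shard_id_mismatches(parse_nodes_conf(SAMPLE)):
        print(f"slave {slave_id[:8]} and master {master_id[:8]} have different shard-ids")
```

On the four node lines in `SAMPLE` this reports one pair: slave 9662b7dd with master c5cb6e21. In the full file above, slave 0eabd0c4 and its master 8f542d6e also differ, while the `myself` node bd1667d2 matches its master.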

I also used gdb to check why it fails, and it seems to fail at this point while parsing the shard-id field:

goto fmterr;
