
Don't update node ip when peer ip is unreachable #10696

Merged
madolson merged 3 commits into redis:unstable from uvletter:fix_cluster_update_ip
Jul 20, 2022

Conversation

@uvletter (Contributor) commented May 9, 2022

As discussed in #9351, @bjosv mentioned that a cluster node's IP may be updated to ? when getpeername() fails for some reason, e.g. because the fd has somehow become invalid, and the log looks like:

15:S 27 Apr 2022 00:38:49.999 # Address updated for node f55db6b9314afec8816cbb258cf8a8358ad1718c, now ?:6380
15:S 27 Apr 2022 00:38:49.999 # Address updated for node c16a99900ec94d8bfd4dc4e1b0c76340f7e853a6, now ?:6380
15:S 27 Apr 2022 00:38:49.999 # Address updated for node 530b194579251e73473b6c9f94b2ce79ab1ad7eb, now ?:6380

To solve this issue, the patch primarily does two things:

  1. Propagate the error code to the caller when getting the peer name fails, and skip the current IP update in nodeUpdateAddressIfNeeded. This doesn't affect the normal IP update process, since the update will be retried on the next round of PING.
  2. Log the getpeername() failure, for further debugging and so on.
release notes:
Fix a bug where a cluster enabled replica node may permanently set its master's hostname to '?'.
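The first point can be sketched in isolation like this (hypothetical names, not the actual Redis code; `get_peer_ip` stands in for the patched helper): the getpeername() failure is returned to the caller so the cached address is kept instead of being overwritten with `?`.

```c
/* Sketch of the fix's idea, not Redis source: return an error instead of
 * writing "?" into the node's address, so the caller can skip the update
 * and let the next PING round retry over a healthy connection.
 * get_peer_ip is a hypothetical name. */
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 0 and fill `ip` on success; return -1 and log when getpeername()
 * fails (e.g. ENOTCONN on a dead connection), leaving `ip` untouched. */
int get_peer_ip(int fd, char *ip, size_t len) {
    struct sockaddr_in sa;
    socklen_t salen = sizeof(sa);
    if (getpeername(fd, (struct sockaddr *)&sa, &salen) == -1) {
        fprintf(stderr, "getpeername() failed: %s\n", strerror(errno));
        return -1;
    }
    inet_ntop(AF_INET, &sa.sin_addr, ip, len);
    return 0;
}
```

On failure the caller keeps the node's last known address as-is rather than storing `?`, matching point 1 above.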

@bjosv (Contributor) left a comment
It would be nice if we had a testcase to trigger this. The current testcases all seem to have a valid ip.

@bjosv (Contributor) commented May 10, 2022

Just a thought: I think the reviewers in the core team would appreciate some details in the PR description above.
It would probably help to add some text about the reason for this change and what users can expect (new behavior, new logs, ...).

@uvletter (Contributor, Author)

@bjosv thanks a lot for your suggestions! I will deal with them one by one.

It would be nice if we had a testcase to trigger this. Current testcases seem all have a valid ip.

I will try to write a testcase for it, though I feel it's not easy...

@pieturin
I left a comment on the issue explaining the root cause.

In order to reproduce the issue, you could try to write a test that does the following:

  1. Create a cluster with 2 primaries and 2 replicas.
  2. Once the replicas are fully in sync with their primaries, block a replica Redis process for an extended period of time (in my case 10min worked). This should allow the OS to close the cluster bus connection between the blocked replica and its primary.
  3. After the wait time, block the primary of the blocked replica for a short period of time so that the replica gets the IP of the primary from another node in the cluster through gossiping.
  4. Once the primary is blocked, unblock the replica. After unblocking, syscalls on the master<->replica connection socket will return an ENOTCONN (Transport endpoint is not connected) error.
  5. Wait a few seconds (10sec should be enough), then unblock the primary.
  6. Now the replica should have ? as the IP for its primary.
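The key claim in step 4 can be checked in isolation with a small C sketch (not a Redis test): reset a loopback TCP connection from the server side and observe that getpeername() on the client socket then fails. `peername_errno_after_reset` is a made-up helper, and the exact errno (ENOTCONN) assumes Linux's handling of a reset connection.

```c
/* Sketch: simulate the abrupt teardown a blocked replica's connection
 * suffers, then check getpeername() on the surviving socket. */
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the client's getpeername() errno after the peer resets the
 * connection, or 0 if the call unexpectedly succeeds. */
int peername_errno_after_reset(void) {
    /* Throwaway listener on an ephemeral loopback port. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = {0};
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0; /* let the kernel pick a port */
    bind(lfd, (struct sockaddr *)&sa, sizeof(sa));
    listen(lfd, 1);
    socklen_t salen = sizeof(sa);
    getsockname(lfd, (struct sockaddr *)&sa, &salen);

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    connect(cfd, (struct sockaddr *)&sa, sizeof(sa));
    int sfd = accept(lfd, NULL, NULL);

    /* SO_LINGER with a zero timeout makes close() send an RST rather
     * than a FIN, i.e. an abrupt connection teardown. */
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    setsockopt(sfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    close(sfd);
    usleep(100000); /* give the client side time to process the RST */

    struct sockaddr_in peer;
    socklen_t plen = sizeof(peer);
    int err = 0;
    if (getpeername(cfd, (struct sockaddr *)&peer, &plen) == -1) err = errno;
    close(cfd);
    close(lfd);
    return err;
}
```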

@pieturin
@uvletter - Another potential way of fixing this issue would be to test if the returned IP is ? in nodeUpdateAddressIfNeeded after:

redis/src/cluster.c, lines 1780 to 1782 in 6f7c1a8:

    nodeIp2String(ip,link,hdr->myip);
    if (node->port == port && node->cport == cport && node->pport == pport &&
        strcmp(ip,node->ip) == 0) return 0;

But I like your solution better since you are logging a warning in case we cannot get the socket info.
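For reference, that alternative would look roughly like this (a sketch with a hypothetical helper name, not the actual Redis code; the real check also compares cport and pport): the `?` sentinel that nodeIp2String() writes on getpeername() failure is treated as "address unknown" and the update is skipped.

```c
/* Sketch of the alternative fix: skip the address update when the
 * resolved IP is the "?" sentinel. addr_needs_update is a made-up name. */
#include <assert.h>
#include <string.h>

int addr_needs_update(const char *new_ip, const char *cur_ip,
                      int new_port, int cur_port) {
    if (strcmp(new_ip, "?") == 0) return 0;  /* peer address unknown */
    if (new_port == cur_port && strcmp(new_ip, cur_ip) == 0)
        return 0;                            /* nothing changed */
    return 1;
}
```

Compared to the merged fix, this silently ignores the failure, whereas propagating the error also allows it to be logged.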

@uvletter (Contributor, Author)
@pieturin Yeah, I read the comment you left on the issue; the analysis is quite reasonable. I tried to reproduce this problem by calling getpeername() on a socket that had been closed by the peer, but it failed, and your comment solved my puzzle. Luckily my solution seems to work well even though I cannot reproduce the problem.

@uvletter (Contributor, Author) commented Jul 4, 2022

@madolson @oranagra hey guys, can you give this some attention? Any advice is welcome!

@madolson (Contributor) left a comment
If there isn't a good way to test this change, I'm okay merging it as is.

@madolson madolson linked an issue Jul 8, 2022 that may be closed by this pull request
@uvletter (Contributor, Author)

If there isn't a good way to test this change, I'm okay merging it as is.

A little late to get back. It's somewhat hard to simulate this situation, since it needs something to block the process for quite a long time exactly just before handling the PING message. But at least the change has no negative effect, and if pieturin's inference is correct, it should solve the problem in theory.

I'm not sure this should be a warning, since it's being handled gracefully. It's an interesting failure, warranting a message, but it's not really an "error" which is what we use WARNING for.

Yep, in terms of the result it's not a problem, since it's being handled gracefully; a NOTICE to log the event is sufficient.

@madolson madolson added the release-notes indication that this issue needs to be mentioned in the release notes label Jul 20, 2022
@madolson madolson merged commit d00b8af into redis:unstable Jul 20, 2022
@madolson madolson removed their assignment Aug 8, 2022
@oranagra oranagra mentioned this pull request Sep 21, 2022
oranagra pushed a commit that referenced this pull request Sep 21, 2022
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
wangweilica pushed a commit to wangweilica/redis that referenced this pull request Apr 7, 2024
wangweilica pushed a commit to wangweilica/redis that referenced this pull request Apr 17, 2024


Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Cluster state changed: fail -----connecting to MASTER ?:7200
