Skip to content

Fixed Sentinel support for hostnames and dynamic IP addresses#10146

Merged
yossigo merged 1 commit intoredis:unstablefrom
moticless:fix-sentinel-resolve-hostname
Jan 29, 2022
Merged

Fixed Sentinel support for hostnames and dynamic IP addresses#10146
yossigo merged 1 commit intoredis:unstablefrom
moticless:fix-sentinel-resolve-hostname

Conversation

@moticless
Copy link
Collaborator

@moticless moticless commented Jan 19, 2022

Sentinel tries to resolve instances hostname to IP only during registration.
It might be that the instance is unavailable during that time, such as
leader crashed and failover took place. Yet, promoted replica must support:

  • Register leader, even if it fails to resolve its hostname during failover
  • Try later to resolve it, if instance is disconnected. Note that
    this condition also support ip-change of an instance.

Resolves #8540, Resolves #9103
Test plan available here

Sentinel only tries to resolve instances hostname to ip only during
registration. It might be that the instance is unavailable during
that time, such as leader crashed and failover took place. Yet,
promoted replica must support:
 - Register the leader, even if fail to resolve instance hostname
 - Try later to resolve it, if instance is disconnected. Note that
   this condition also support ip-change of an instance.
@moticless moticless force-pushed the fix-sentinel-resolve-hostname branch from a1ce654 to d5c2f04 Compare January 20, 2022 08:55
Copy link
Collaborator

@yossigo yossigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moticless LGTM, do you think we need to get some confirmation that this indeed solves the reported issues?

Also, ideally we should have tests that follow this test plan - but I don't see a simple, effective and portable way to actually do that other than mocking anetResolve() which might be a bit of an overkill. So I'm OK with leaving it as it is, unless another simple ideas comes up.

@moticless
Copy link
Collaborator Author

moticless commented Jan 20, 2022

@yossigo ,

  • maybe either start considering seriously about adding unit testing to this repo.
  • Or more specific option is to use something like this in order to manipulate getaddrinfo() syscall.
  • Another option is to consider having a set of tests that will run within containers to mimic more real networking and in addition be able to manipulate networking as we want.

@yossigo yossigo added the release-notes indication that this issue needs to be mentioned in the release notes label Jan 29, 2022
@yossigo yossigo merged commit 79f089b into redis:unstable Jan 29, 2022
@yossigo
Copy link
Collaborator

yossigo commented Jan 29, 2022

@moticless I think that being able to create tests that run beyond the scope of processes on a single instance will be useful, perhaps we can push this as part of bigger Redis Cluster efforts.

As for simulating resolution errors, at least on Linux/glibc systems, I think there's another option: use the HOSTALIASES environment variable to point to a local file that can be manipulated by the test.

@moticless moticless deleted the fix-sentinel-resolve-hostname branch January 30, 2022 07:23
@ghollies
Copy link

ghollies commented Mar 9, 2022

👀 We think we are running into the issue that this will fix. Given that the last 6.2 release was several months ago, what are the odds we'll see this fix in the next one?

@moticless
Copy link
Collaborator Author

@ghollies it is already part of 7.0 RC2. I will ask to add it to release notes. Thanks.

@ghollies
Copy link

ghollies commented Mar 9, 2022

@moticless thanks for the quick response! any chance it will be backported to 6.x? or should we just wait for 7.0 to go final?

@moticless
Copy link
Collaborator Author

@moticless thanks for the quick response! any chance it will be backported to 6.x? or should we just wait for 7.0 to go final?

@yossigo, please refer. Thanks.

@yossigo
Copy link
Collaborator

yossigo commented Mar 9, 2022

@ghollies I think it makes sense to backport it, yes.
Update: I see it's already listed for backport.

This was referenced Apr 27, 2022
oranagra pushed a commit that referenced this pull request Apr 27, 2022
Sentinel tries to resolve instances hostname to IP only during registration.
It might be that the instance is unavailable during that time, such as
leader crashed and failover took place. Yet, promoted replica must support:

 - Register leader, even if it fails to resolve its hostname during failover
 - Try later to resolve it, if instance is disconnected. Note that
   this condition also support ip-change of an instance.

(cherry picked from commit 79f089b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

4 participants