Resolve and switch master from sentinel. #121

CamW · 2014-11-03T11:51:43Z

So I've had a first go at putting service resolution and master switching based on sentinel messages into the client. I read #22 and http://redis.io/topics/sentinel-clients before getting started. While I tried to follow Marc's advice I ran into some trouble (possibly just because I'm unfamiliar with the code base) around having multiple server types in the same ConnectionMultiplexer.
I decided instead to do it with one multiplexer for sentinel servers and another for the standard redis servers. The issues I ran into were mostly down to config of the multiplexer and that you would often want different config for the sentinels and actual redis servers. Examples of this: Password, Admin, Tie Breaking.
Also there is a single CommandMap which would need to be switched around depending on which type of server you're communicating with.
Let me know what you think. I'm happy to look at reworking this if you have some ideas.

Thanks
Cameron.

psulek · 2015-07-19T19:51:09Z

@CamW I want to use your solution within our project and need to know if you successfully run changes in production and if master switch on failure works as expected?
Thanks a lot!

CamW · 2015-07-20T06:03:47Z

@psulek Yes, we have been running the changes in production since shortly after that commit, so for almost a year now. Master switch on failure has been working and we've found no issues with it. This version is getting a little old now though, I've been meaning to merge it in with the latest official Stack Exchange version. Also note, we're still running redis version 2.8. I'll try to find time to merge in the latest changes this week.
For details on how to use the changes I put in, have a look at the SentinelMasterSwitching.cs test which I built as part of the commit.

Let me know if you need any help.
Cameron.

psulek · 2015-07-20T07:12:27Z

@CamW thanks for reply. So you want to merge your commit to latest SE.Redis version? This will be great! Btw we have redis 3.x servers but i hope it should work with it too.

psulek · 2015-07-20T08:01:27Z

@CamW As i see in commit, you just utilize single sentinel server. Do you plan to add support for multiple sentinel servers?

CamW · 2015-07-20T08:21:19Z

Multiple sentinel servers are already supported, if you have a look at the
test, you'll see constants defined for Sentinel1Port and Sentinel2Port
those are the ports of 2 sentinel servers. In the test they are running on
the same machine. In production, we have 6 sentinel servers. Also, have a
look at line 31 in the test where the sentinel endpoints are configured.

On 20 July 2015 at 10:01, Peter Šulek [email protected] wrote:

@CamW https://github.com/CamW As i see in commit, you just utilize
single sentinel server. What about multiple sentinel servers? Do you plan
to add support for multiple sentinel servers?

—
Reply to this email directly or view it on GitHub
#121 (comment)
.

CamW · 2015-07-20T08:23:41Z

Yes, that's right I do want to merge the latest changes in, I'll let you
know when it's done although you could check the existing client. It may
work with newer servers unless you're using new functionality not available
in the older client.

Cameron Waldron
On 20 July 2015 at 09:12, Peter Šulek [email protected] wrote:

@CamW https://github.com/CamW thanks for reply. So you want to merge
your commit to latest SE.Redis version? This will be great! Btw we have
redis 3.x servers but i hope it should work with it too.

—
Reply to this email directly or view it on GitHub
#121 (comment)
.

psulek · 2015-07-20T08:36:05Z

Does method "GetConfiguredMasterForService" or other iterate all sentinel servers to detect master redis? I mean if majority of sentinels agree on same master, that redis server will act as master? Or other code which skip faling sentinel server and ask another working sentinel server for master?

CamW · 2015-07-20T09:07:52Z

The "GetConfiguredMasterForService" method makes parallel calls to all
configured sentinel servers to ask for the master server of the requested
service. When the first sentinel responds with a master, We accept that as
the master for the service. We do not need to check that the majority of
sentinels agree. It is the sentinel protocol and sentinel servers that are
responsible for that. We can trust the response from any one of the
configured sentinels.

Cameron Waldron
On 20 July 2015 at 10:36, Peter Šulek [email protected] wrote:

Does method "GetConfiguredMasterForService" or other iterate all sentinel
servers to detect master redis? I mean if majority of sentinels agree on
same master, that redis server will act as master? Or other code which skip
faling sentinel server and ask another working sentinel server for master?

—
Reply to this email directly or view it on GitHub
#121 (comment)
.

psulek · 2015-07-20T09:20:10Z

Thanks, i understand that. Really thanks for that contribution!

psulek · 2015-08-12T05:59:02Z

Any update on merge? Is it already in main branch?

chester89 · 2015-08-12T11:00:51Z

@CamW any chance you can merge the latest master?

psulek · 2015-08-12T13:26:01Z

@CamW I'm experiencing problems with this solution after merge with current master branch (12th of August 2015). When master-switch occurs , then your code block:

sub.Subscribe("+switch-master", (channel, message) =>
{
    string[] messageParts = ((string)message).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    EndPoint switchBlame = Format.TryParseEndPoint(string.Format("{0}:{1}", messageParts[1], messageParts[2]));
    ReconfigureAsync(false, false, log, switchBlame, "master switch", false, CommandFlags.PreferMaster).Wait();
});

"fight" with internal ReconfigureIfNeeded called from OnConnectionFailed and real swtich to new master does not affect.

Will you have time to try your code with current latest master branch and post update?

…ge.Redis Conflicts: MigratedBookSleeveTestSuite/MigratedBookSleeveTestSuite.csproj StackExchange.Redis.Tests/StackExchange.Redis.Tests.csproj StackExchange.Redis/StackExchange/Redis/ConfigurationOptions.cs StackExchange.Redis/StackExchange/Redis/ConnectionMultiplexer.cs

CamW · 2015-08-12T15:11:23Z

@psulek I performed the merge when we discussed it several weeks ago but haven't had a chance to test so I didn't commit it but I have committed now (still not tested). You or @chester89 are welcome to have a look at my latest push and let me know if there are any issues.

psulek · 2015-08-13T05:52:42Z

@CamW Thanks a lot, i will test it right now a let you know.

psulek · 2015-08-13T07:23:43Z

@CamW I test your latest merge and it still failing on same problem described above.

Long problem description:
When received event +switch-master your code calls ReconfigureAsync but that call can be blocked in some situations (detected by log message Reconfiguration was already in progress). For example when some internal code calls ReconfigureAsync (event OnConnectionFailed or other). In such case recondiguration required to switch master is not called and connection still using old master server and never switch to new one.

In your unit test SentinelMasterSwitching.cs you cant see it because you havent code with can throw error like connection failed or other.

I managed to work for me when i change ResponseTimeout option for RedisConnection (not sentinel). Default value is taken from SyncTimeout and you set this timeout to 5000. When i set ResponseTimeout to lower value like 1000 it seems that your code is not blocked and everything works as expected.

But anyway there should be some way to ensure that reconfiguration of master happens ALWAYS when needed and will not be blocked.

I just created sample Console app to demonstrate the problem, found it on my GitHub-StackExchange.Redis.SentinelTests.
Before run, modify app.config key <add key="sentinel.cfg" value="ip1:26379,ip2:26379,ip3:26379,serviceName=XYZ"/> to point to yours sentinel servers and correct serviceName. If key redis.responsetimeout is set to 5000 your should see Reconfiguration was already in progress in produced file log.txt. Changing this value to 1000, error is no longer and everthing works fine.

@CamW if you have time to look at it and try to "fix" it somehow i will buy you a beer. I try it myself to fix it but without success.

obscurerichard · 2015-10-16T13:42:00Z

@CamW it would be so nice to have this working, is there any chance you might take a look again?

I'll second @psulek's notion, but I'd buy you a whole case of beer if you can get this integrated 🍻

ranjithvikram · 2015-10-16T21:48:47Z

Hi Everyone, I tried redis master/slave server setup (slave - readonly) where there are 4 sentinels (1 in master server , 2 in slave and 2 witness sentinels in 2 separate servers).

Instead of connecting to sentinels from the client to know who is the master, i configured my connection multiplexer as mentioned below
ConfigurationOptions config = new ConfigurationOptions
{
EndPoints =
{
{ "redis0", 6379 },
{ "redis1", 6380 }
},
}

Whenever the master goes down and the slave is elected as a master , the redis client connects to the current master. It tested this automatic failover serveral times and it works fine. Is there any reason on why client must connect to the sentinels ?

Below is an extract from https://github.com/StackExchange/StackExchange.Redis/blob/8e6c8c0d281862df01ce47eef0c2efa536959f95/Docs/Basics.md

"A more complicated scenario might involve a master/slave setup; for this usage, simply specify all the desired nodes that make up that logical redis tier (it will automatically identify the master):

ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("server1:6379,server2:6379");".

obscurerichard · 2015-10-16T22:22:53Z

Thanks for contributing to the discussion, @ranjithvikram. The trouble I encountered with attempting to use the automatic master detection when I last tried it, in the December 2014 to January 2015 state, was thus:

When the ConnectionMultiplexer was first created, it identified the master sucessfully and sent writes to it.
However, if the master failed and a slave was promoted to master, and the old master then became a slave, the StackExchange.Redis software kept trying to send writes to the old master. It did not understand that a state change had happened.
Listening to the sentinels for state change messages would address this issue, and let the software know when to switch masters.

If the underlying behavior of this client software has changed, that would be great news.

Maybe the software, on failure of a write operation, could poll the set of servers it has to determine which is the new master. That might be an alternate way to solve this problem that does not require integrating Sentinel, although this strategy has its own pitfalls. I'm going to go read some of the source code now to better educate myself on what is really happening under the hood.

ranjithvikram · 2015-10-18T04:00:43Z

@obscurerichard Thanks for the details provided. What you said is correct. I looked into the code and found that after every failure, the connectionmultiplexer reconfigures and connects to the newly elected Master.

Refer : PhysicalBridge.OnConnectionFailed(PhysicalConnection connection, ConnectionFailureType failureType, Exception innerException)

In both the approaches(connecting to sentinels/connecting to logical redis tier), there will be write failures until the sentinels elects the new master. This is well known. Other than this, can you tell me what are the pitfalls in this approach?

obscurerichard · 2015-10-18T14:00:54Z

In a split brain cluster partition situation, where the sentinels have elected a new master, but the old master still thinks it is a master, this reliance on the clients to poll the set of servers fails. This will lead to a possibly extended period of lost writes when the cluster heals.

That's why you really want to talk to the sentinels in order to figure out who the real master of the cluster is.

psulek · 2015-10-18T19:06:21Z

Using sentinels over direct list of master servers has another benefit. For example we have 2 masters(and some slaves) and 3 sentinels watching on them. But anytime we need to "hot plug" another master server, we can do this without stopping our application and add new master endpoint to app config. Instead we add new master to redis sentinel configs. After adding new master to sentinel, if any/all of old masters fails, sentinel will switch to new master just plugged in and our application stay alive.

CamW · 2015-12-18T06:04:13Z

@psulek I can't reproduce the error you're seeing using your test app.

Are you connecting to local redis instances or remote ones?
How many redis instances are you using to test?
If they're local could you provide the config files you're using?
How to you produce the error? Is it by running the console app, pressing enter to start and then shutting down your master redis instance while the console app is running through the GET# process?
Where do you see the error message? Is it in the logs or an unhandled exception thrown by the console app?
Are you using windows or linux redis instances to test?

psulek · 2016-02-09T07:35:07Z

@CamW , my answers:

Redis in our private network so not public cloud but also not local on dev pc
1xmaster, 1xslave and 1xsentinel
I cant provide whole config but can copy some part, just tell which
run console app, simulate master is down to take sentinel action and switch masters using redis-cli: redis-cli -p 6379 DEBUG sleep 30
exception from console app
Redis runs on linux (centos)

Hopes this helps

NickCraver · 2017-08-31T00:41:08Z

Not closing this out to close the idea, just consolidating PRs. Working on #692 to get Sentinel support in soon.

Resolve and switch master from sentinel.

2f8900e

CamW mentioned this pull request Feb 24, 2015

Sentinel support #22

Closed

Just merging in latest from upstream - this fork is still broken

bc15a25

This was referenced May 17, 2016

Sentinel Support Areson/StackExchange.Redis#1

Closed

Added Sentinel Support to ConnectionMultiplexer #406

Closed

NickCraver added the ⚙️ area:sentinel label Nov 20, 2016

vijay8059 mentioned this pull request May 3, 2017

How i can ask sentinel to give the current master #624

Closed

NickCraver closed this Aug 31, 2017

Resolve and switch master from sentinel. #121

Resolve and switch master from sentinel. #121

Uh oh!

Conversation

CamW commented Nov 3, 2014

Uh oh!

psulek commented Jul 19, 2015

Uh oh!

CamW commented Jul 20, 2015

Uh oh!

psulek commented Jul 20, 2015

Uh oh!

psulek commented Jul 20, 2015

Uh oh!

CamW commented Jul 20, 2015 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CamW commented Jul 20, 2015 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

psulek commented Jul 20, 2015

Uh oh!

CamW commented Jul 20, 2015 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

psulek commented Jul 20, 2015

Uh oh!

psulek commented Aug 12, 2015

Uh oh!

chester89 commented Aug 12, 2015

Uh oh!

psulek commented Aug 12, 2015

Uh oh!

CamW commented Aug 12, 2015

Uh oh!

psulek commented Aug 13, 2015

Uh oh!

psulek commented Aug 13, 2015

Uh oh!

obscurerichard commented Oct 16, 2015

Uh oh!

ranjithvikram commented Oct 16, 2015

Uh oh!

obscurerichard commented Oct 16, 2015

Uh oh!

ranjithvikram commented Oct 18, 2015

Uh oh!

obscurerichard commented Oct 18, 2015

Uh oh!

psulek commented Oct 18, 2015

Uh oh!

CamW commented Dec 18, 2015

Uh oh!

psulek commented Feb 9, 2016

Uh oh!

NickCraver commented Aug 31, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

CamW commented Jul 20, 2015 •

edited

Loading

CamW commented Jul 20, 2015 •

edited

Loading

CamW commented Jul 20, 2015 •

edited

Loading