Skip to content

Conversation

@CamW
Copy link

@CamW CamW commented Nov 3, 2014

So I've had a first go at putting service resolution and master switching based on sentinel messages into the client. I read #22 and http://redis.io/topics/sentinel-clients before getting started. While I tried to follow Marc's advice I ran into some trouble (possibly just because I'm unfamiliar with the code base) around having multiple server types in the same ConnectionMultiplexer.
I decided instead to do it with one multiplexer for sentinel servers and another for the standard redis servers. The issues I ran into were mostly down to config of the multiplexer and that you would often want different config for the sentinels and actual redis servers. Examples of this: Password, Admin, Tie Breaking.
Also there is a single CommandMap which would need to be switched around depending on which type of server you're communicating with.
Let me know what you think. I'm happy to look at reworking this if you have some ideas.

Thanks
Cameron.

@CamW CamW mentioned this pull request Feb 24, 2015
@psulek
Copy link

psulek commented Jul 19, 2015

@CamW I want to use your solution within our project and need to know if you successfully run changes in production and if master switch on failure works as expected?
Thanks a lot!

@CamW
Copy link
Author

CamW commented Jul 20, 2015

@psulek Yes, we have been running the changes in production since shortly after that commit, so for almost a year now. Master switch on failure has been working and we've found no issues with it. This version is getting a little old now though, I've been meaning to merge it in with the latest official Stack Exchange version. Also note, we're still running redis version 2.8. I'll try to find time to merge in the latest changes this week.
For details on how to use the changes I put in, have a look at the SentinelMasterSwitching.cs test which I built as part of the commit.

Let me know if you need any help.
Cameron.

@psulek
Copy link

psulek commented Jul 20, 2015

@CamW thanks for reply. So you want to merge your commit to latest SE.Redis version? This will be great! Btw we have redis 3.x servers but i hope it should work with it too.

@psulek
Copy link

psulek commented Jul 20, 2015

@CamW As i see in commit, you just utilize single sentinel server. Do you plan to add support for multiple sentinel servers?

@CamW
Copy link
Author

CamW commented Jul 20, 2015

Multiple sentinel servers are already supported, if you have a look at the
test, you'll see constants defined for Sentinel1Port and Sentinel2Port
those are the ports of 2 sentinel servers. In the test they are running on
the same machine. In production, we have 6 sentinel servers. Also, have a
look at line 31 in the test where the sentinel endpoints are configured.

On 20 July 2015 at 10:01, Peter Šulek [email protected] wrote:

@CamW https://github.com/CamW As i see in commit, you just utilize
single sentinel server. What about multiple sentinel servers? Do you plan
to add support for multiple sentinel servers?


Reply to this email directly or view it on GitHub
#121 (comment)
.

@CamW
Copy link
Author

CamW commented Jul 20, 2015

Yes, that's right I do want to merge the latest changes in, I'll let you
know when it's done although you could check the existing client. It may
work with newer servers unless you're using new functionality not available
in the older client.

Cameron Waldron
On 20 July 2015 at 09:12, Peter Šulek [email protected] wrote:

@CamW https://github.com/CamW thanks for reply. So you want to merge
your commit to latest SE.Redis version? This will be great! Btw we have
redis 3.x servers but i hope it should work with it too.


Reply to this email directly or view it on GitHub
#121 (comment)
.

@psulek
Copy link

psulek commented Jul 20, 2015

Does method "GetConfiguredMasterForService" or other iterate all sentinel servers to detect master redis? I mean if majority of sentinels agree on same master, that redis server will act as master? Or other code which skip faling sentinel server and ask another working sentinel server for master?

@CamW
Copy link
Author

CamW commented Jul 20, 2015

The "GetConfiguredMasterForService" method makes parallel calls to all
configured sentinel servers to ask for the master server of the requested
service. When the first sentinel responds with a master, We accept that as
the master for the service. We do not need to check that the majority of
sentinels agree. It is the sentinel protocol and sentinel servers that are
responsible for that. We can trust the response from any one of the
configured sentinels.

Cameron Waldron
On 20 July 2015 at 10:36, Peter Šulek [email protected] wrote:

Does method "GetConfiguredMasterForService" or other iterate all sentinel
servers to detect master redis? I mean if majority of sentinels agree on
same master, that redis server will act as master? Or other code which skip
faling sentinel server and ask another working sentinel server for master?


Reply to this email directly or view it on GitHub
#121 (comment)
.

@psulek
Copy link

psulek commented Jul 20, 2015

Thanks, i understand that. Really thanks for that contribution!

@psulek
Copy link

psulek commented Aug 12, 2015

Any update on merge? Is it already in main branch?

@chester89
Copy link

@CamW any chance you can merge the latest master?

@psulek
Copy link

psulek commented Aug 12, 2015

@CamW I'm experiencing problems with this solution after merge with current master branch (12th of August 2015). When master-switch occurs , then your code block:

sub.Subscribe("+switch-master", (channel, message) =>
{
    string[] messageParts = ((string)message).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    EndPoint switchBlame = Format.TryParseEndPoint(string.Format("{0}:{1}", messageParts[1], messageParts[2]));
    ReconfigureAsync(false, false, log, switchBlame, "master switch", false, CommandFlags.PreferMaster).Wait();
});

"fight" with internal ReconfigureIfNeeded called from OnConnectionFailed and real swtich to new master does not affect.

Will you have time to try your code with current latest master branch and post update?

…ge.Redis

Conflicts:
	MigratedBookSleeveTestSuite/MigratedBookSleeveTestSuite.csproj
	StackExchange.Redis.Tests/StackExchange.Redis.Tests.csproj
	StackExchange.Redis/StackExchange/Redis/ConfigurationOptions.cs
	StackExchange.Redis/StackExchange/Redis/ConnectionMultiplexer.cs
@CamW
Copy link
Author

CamW commented Aug 12, 2015

@psulek I performed the merge when we discussed it several weeks ago but haven't had a chance to test so I didn't commit it but I have committed now (still not tested). You or @chester89 are welcome to have a look at my latest push and let me know if there are any issues.

@psulek
Copy link

psulek commented Aug 13, 2015

@CamW Thanks a lot, i will test it right now a let you know.

@psulek
Copy link

psulek commented Aug 13, 2015

@CamW I test your latest merge and it still failing on same problem described above.

Long problem description:
When received event +switch-master your code calls ReconfigureAsync but that call can be blocked in some situations (detected by log message Reconfiguration was already in progress). For example when some internal code calls ReconfigureAsync (event OnConnectionFailed or other). In such case recondiguration required to switch master is not called and connection still using old master server and never switch to new one.

In your unit test SentinelMasterSwitching.cs you cant see it because you havent code with can throw error like connection failed or other.

I managed to work for me when i change ResponseTimeout option for RedisConnection (not sentinel). Default value is taken from SyncTimeout and you set this timeout to 5000. When i set ResponseTimeout to lower value like 1000 it seems that your code is not blocked and everything works as expected.

But anyway there should be some way to ensure that reconfiguration of master happens ALWAYS when needed and will not be blocked.

I just created sample Console app to demonstrate the problem, found it on my GitHub-StackExchange.Redis.SentinelTests.
Before run, modify app.config key <add key="sentinel.cfg" value="ip1:26379,ip2:26379,ip3:26379,serviceName=XYZ"/> to point to yours sentinel servers and correct serviceName. If key redis.responsetimeout is set to 5000 your should see Reconfiguration was already in progress in produced file log.txt. Changing this value to 1000, error is no longer and everthing works fine.

@CamW if you have time to look at it and try to "fix" it somehow i will buy you a beer. I try it myself to fix it but without success.

@obscurerichard
Copy link

@CamW it would be so nice to have this working, is there any chance you might take a look again?

I'll second @psulek's notion, but I'd buy you a whole case of beer if you can get this integrated 🍻

@ranjithvikram
Copy link

Hi Everyone, I tried redis master/slave server setup (slave - readonly) where there are 4 sentinels (1 in master server , 2 in slave and 2 witness sentinels in 2 separate servers).

Instead of connecting to sentinels from the client to know who is the master, i configured my connection multiplexer as mentioned below
ConfigurationOptions config = new ConfigurationOptions
{
EndPoints =
{
{ "redis0", 6379 },
{ "redis1", 6380 }
},
}

Whenever the master goes down and the slave is elected as a master , the redis client connects to the current master. It tested this automatic failover serveral times and it works fine. Is there any reason on why client must connect to the sentinels ?

Below is an extract from https://github.com/StackExchange/StackExchange.Redis/blob/8e6c8c0d281862df01ce47eef0c2efa536959f95/Docs/Basics.md

"A more complicated scenario might involve a master/slave setup; for this usage, simply specify all the desired nodes that make up that logical redis tier (it will automatically identify the master):

ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("server1:6379,server2:6379");".

@obscurerichard
Copy link

Thanks for contributing to the discussion, @ranjithvikram. The trouble I encountered with attempting to use the automatic master detection when I last tried it, in the December 2014 to January 2015 state, was thus:

  • When the ConnectionMultiplexer was first created, it identified the master sucessfully and sent writes to it.
  • However, if the master failed and a slave was promoted to master, and the old master then became a slave, the StackExchange.Redis software kept trying to send writes to the old master. It did not understand that a state change had happened.
  • Listening to the sentinels for state change messages would address this issue, and let the software know when to switch masters.

If the underlying behavior of this client software has changed, that would be great news.

Maybe the software, on failure of a write operation, could poll the set of servers it has to determine which is the new master. That might be an alternate way to solve this problem that does not require integrating Sentinel, although this strategy has its own pitfalls. I'm going to go read some of the source code now to better educate myself on what is really happening under the hood.

@ranjithvikram
Copy link

@obscurerichard Thanks for the details provided. What you said is correct. I looked into the code and found that after every failure, the connectionmultiplexer reconfigures and connects to the newly elected Master.

Refer : PhysicalBridge.OnConnectionFailed(PhysicalConnection connection, ConnectionFailureType failureType, Exception innerException)

In both the approaches(connecting to sentinels/connecting to logical redis tier), there will be write failures until the sentinels elects the new master. This is well known. Other than this, can you tell me what are the pitfalls in this approach?

@obscurerichard
Copy link

In a split brain cluster partition situation, where the sentinels have elected a new master, but the old master still thinks it is a master, this reliance on the clients to poll the set of servers fails. This will lead to a possibly extended period of lost writes when the cluster heals.

That's why you really want to talk to the sentinels in order to figure out who the real master of the cluster is.

@psulek
Copy link

psulek commented Oct 18, 2015

Using sentinels over direct list of master servers has another benefit. For example we have 2 masters(and some slaves) and 3 sentinels watching on them. But anytime we need to "hot plug" another master server, we can do this without stopping our application and add new master endpoint to app config. Instead we add new master to redis sentinel configs. After adding new master to sentinel, if any/all of old masters fails, sentinel will switch to new master just plugged in and our application stay alive.

@CamW
Copy link
Author

CamW commented Dec 18, 2015

@psulek I can't reproduce the error you're seeing using your test app.

  • Are you connecting to local redis instances or remote ones?
  • How many redis instances are you using to test?
  • If they're local could you provide the config files you're using?
  • How to you produce the error? Is it by running the console app, pressing enter to start and then shutting down your master redis instance while the console app is running through the GET# process?
  • Where do you see the error message? Is it in the logs or an unhandled exception thrown by the console app?
  • Are you using windows or linux redis instances to test?

@psulek
Copy link

psulek commented Feb 9, 2016

@CamW , my answers:

  1. Redis in our private network so not public cloud but also not local on dev pc
  2. 1xmaster, 1xslave and 1xsentinel
  3. I cant provide whole config but can copy some part, just tell which
  4. run console app, simulate master is down to take sentinel action and switch masters using redis-cli: redis-cli -p 6379 DEBUG sleep 30
  5. exception from console app
  6. Redis runs on linux (centos)

Hopes this helps

@NickCraver
Copy link
Collaborator

Not closing this out to close the idea, just consolidating PRs. Working on #692 to get Sentinel support in soon.

@NickCraver NickCraver closed this Aug 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants