Auto retry message on connection failure #1755

deepakverma · 2021-06-07T21:58:27Z

PR to Auto Retry requests on a connection failure:

Requirement

During a network blip with the current implementation Se.Redis multiplexer marks requests as failed for any inflight or yet to be sent requests.
The feature is to have a SE.Redis inbuilt retriable mechanism. Se.Redis based on a retry manager will queue the requests and play them back once connection has been restored.

Allow SetWriteTime for sync messages as well

…design is solidified

NickCraver · 2021-06-21T19:49:53Z

@deepakverma if you merge main in here, should help with test stability - can you when time allows please?

# Conflicts: # src/StackExchange.Redis/PhysicalBridge.cs

deepakverma · 2021-06-21T20:27:59Z

@deepakverma if you merge main in here, should help with test stability - can you when time allows please?

Great, I have merged & see at least one of the flaky tests I had failures is green now :)

NickCraver · 2021-06-22T16:54:41Z

Will try and dig in tonight or tomorrow, but I do see SSL.Issue883_Exhaustive is for some reason failing across all environments - something's up there.

deepakverma · 2021-06-22T20:00:26Z

Will try and dig in tonight or tomorrow, but I do see SSL.Issue883_Exhaustive is for some reason failing across all environments - something's up there.

Thanks, test is fixed now.

NickCraver

Added initial thoughts for discussion - I'm unsure about the queues but wanted review in to chat with Marc on Monday.

src/StackExchange.Redis/ConfigurationOptions.cs

src/StackExchange.Redis/MessageRetry/MessageRetryManager.cs

src/StackExchange.Redis/ConnectionMultiplexer.cs

src/StackExchange.Redis/Message.cs

src/StackExchange.Redis/ConnectionMultiplexer.cs

NickCraver · 2021-06-27T11:26:52Z

Had some more thoughts after sleeping on it: there are inherent problems that this being global yields, for example we probably don't want to re-issue REPLICAOF commands and things like that to the server, or SHUTDOWN for that matter. Which things should be retried needs to be determined, and either flagged or have some other approach.

There's one other point of confusion I thought of: we do retry a command when we get a MOVED response on cluster already today - let's just be cognizant in the final story that CommandFlags.NoRedirect remains clearly different.

deepakverma · 2021-06-29T01:02:57Z

Had some more thoughts after sleeping on it: there are inherent problems that this being global yields, for example we probably don't want to re-issue REPLICAOF commands and things like that to the server, or SHUTDOWN for that matter. Which things should be retried needs to be determined, and either flagged or have some other approach.

There's one other point of confusion I thought of: we do retry a command when we get a MOVED response on cluster already today - let's just be cognizant in the final story that CommandFlags.NoRedirect remains clearly different.
Good point, Shutdown, replicaof or any other similar command shouldn't be retried (generally cloud providers disable this command)
for noredirect flag, I have removed the reset of no redirect flag in messageretry so that it's being honored during retry

deepakverma · 2021-07-29T05:39:37Z

ah, I will be taking a look at how to resolve conflicts here tomm morning.

….Redis into StackExchange-main # Conflicts: # src/StackExchange.Redis/ConnectionMultiplexer.cs # src/StackExchange.Redis/Message.cs

deepakverma · 2021-08-02T15:53:13Z

I have resolved the conflicts. CommandFlags unit test is not liking the addition of the new enums, where it's failing with the duplicate enum value for prefer*. @mgravell , do you have any recommendations on how to fix it? thanks.

Make ConfigOption specifically about "RetryCommandsOnReconnect"

Autoretry2

src/StackExchange.Redis/MessageRetry/MessageRetryManager.cs

TimLovellSmith · 2021-06-21T17:29:44Z

src/StackExchange.Redis/MessageRetry/MessageRetryManager.cs

+        internal bool PushMessageForRetry(Message message)
+        {
+            bool wasEmpty;
+            lock (queue)


This code seems very similar to the old PhysicalBridge backlog code, though it has changed, I wonder if we should extract out a MessageBacklog class to handle both functions? Reducing duplication of bugs is usually a win, and it might even turn out to be good for contention to be doing it the lock-free way.

TimLovellSmith · 2021-07-29T16:35:56Z

src/StackExchange.Redis/CommandRetry/DefaultRetry.cs

+        /// <returns></returns>
+        public ICommandRetryPolicy AlwaysRetryOnConnectionException()
+        {
+            return new AlwaysRetryOnConnectionException();


Nit, don't really need to new each time, could just have a static instance.

TimLovellSmith · 2021-07-29T16:40:47Z