Auto retry message on connection failure #1755
Conversation
Allow SetWriteTime for sync messages as well
…design is solidified
@deepakverma if you merge main in here, it should help with test stability - can you, when time allows, please?
# Conflicts:
#   src/StackExchange.Redis/PhysicalBridge.cs
Great, I have merged and I see that at least one of the flaky tests I had failures on is green now :)
Will try and dig in tonight or tomorrow, but I do see …
Thanks, the test is fixed now.
NickCraver left a comment:
Added initial thoughts for discussion - I'm unsure about the queues, but I wanted to get the review in so we can chat with Marc on Monday.
Had some more thoughts after sleeping on it: there are inherent problems that this being global yields; for example, we probably don't want to re-issue … There's one other point of confusion I thought of: we do retry a command when we get a …
Ah, I will be taking a look at how to resolve the conflicts here tomorrow morning.
….Redis into StackExchange-main
# Conflicts:
#   src/StackExchange.Redis/ConnectionMultiplexer.cs
#   src/StackExchange.Redis/Message.cs
I have resolved the conflicts. The CommandFlags unit test is not liking the addition of the new enum values; it's failing on a duplicate enum value for prefer*. @mgravell, do you have any recommendations on how to fix it? Thanks.
Make ConfigOption specifically about "RetryCommandsOnReconnect"
Autoretry2
internal bool PushMessageForRetry(Message message)
{
    bool wasEmpty;
    lock (queue)
This code seems very similar to the old PhysicalBridge backlog code. Though it has changed, I wonder if we should extract a MessageBacklog class to handle both functions? Reducing duplication of bugs is usually a win, and it might even turn out to be good for contention to be doing it the lock-free way.
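For discussion, a minimal sketch of what an extracted, shared backlog type could look like, assuming a lock-free ConcurrentQueue underneath; the MessageBacklog name and its members are hypothetical, not code from this PR.

```csharp
using System.Collections.Concurrent;

// Hypothetical shape only - a single backlog type that both PhysicalBridge and the
// retry manager could share, built on ConcurrentQueue for the lock-free option.
internal sealed class MessageBacklog<T>
{
    private readonly ConcurrentQueue<T> _queue = new ConcurrentQueue<T>();

    public bool IsEmpty => _queue.IsEmpty;
    public int Count => _queue.Count;

    public void Push(T message) => _queue.Enqueue(message);
    public bool TryDequeue(out T message) => _queue.TryDequeue(out message);
    public bool TryPeek(out T message) => _queue.TryPeek(out message);
}
```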
/// <returns></returns>
public ICommandRetryPolicy AlwaysRetryOnConnectionException()
{
    return new AlwaysRetryOnConnectionException();
Nit: we don't really need to new this up each time; it could just be a static instance.
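To illustrate the nit, a hedged sketch of the static-instance pattern; the interface members and the policy class name here are stand-ins, since the PR's actual ICommandRetryPolicy may differ.

```csharp
using System;
using StackExchange.Redis;

// Stand-in interface for the sketch; the real ICommandRetryPolicy in the PR may differ.
public interface ICommandRetryPolicy
{
    bool ShouldRetry(Exception exception);
}

public sealed class AlwaysRetryOnConnectionExceptionPolicy : ICommandRetryPolicy
{
    // One shared, immutable instance instead of allocating a new policy on every call.
    public static AlwaysRetryOnConnectionExceptionPolicy Instance { get; } =
        new AlwaysRetryOnConnectionExceptionPolicy();

    private AlwaysRetryOnConnectionExceptionPolicy() { }

    public bool ShouldRetry(Exception exception) => exception is RedisConnectionException;
}
```

The factory method could then simply return the shared Instance.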
{
    var task = Task.Run(ProcessRetryQueueAsync);
    if (task.IsFaulted)
        throw task.Exception;
throw task.Exception - Won't that overwrite the exception stack?
I think I saw a good helper method used for this scenario elsewhere, called 'ObserveException' or something like that.
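On that point, if the goal is only to make sure a fire-and-forget task's fault is observed (rather than rethrown on the caller's stack), a helper along these lines works; the ObserveException name and wiring below are an assumption based on this comment, not the library's actual helper.

```csharp
using System.Threading.Tasks;

internal static class TaskObserveExtensions
{
    // Marks a faulted task's exception as observed so it never surfaces as an
    // unobserved-task exception; it does not rethrow anything.
    public static Task ObserveException(this Task task)
    {
        task.ContinueWith(
            t => { _ = t.Exception; },
            TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously);
        return task;
    }
}

// Illustrative usage in place of the throw:
// Task.Run(ProcessRetryQueueAsync).ObserveException();
```

If the intent really is to rethrow, System.Runtime.ExceptionServices.ExceptionDispatchInfo.Capture(ex).Throw() preserves the original stack trace.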
{
    lock (queue)
    {
        var now = Environment.TickCount;
I don't see 'now' used anywhere, is it just for debugging?
It's indeed unused, but the current implementation could end up accessing Environment.TickCount n times, which isn't awesome - I think we should refactor that a bit.
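To make the refactor concrete, a small self-contained sketch that reads Environment.TickCount once per drain pass and reuses it for every check; the PendingMessage type and timeout handling are stand-ins, not the PR's types.

```csharp
using System;
using System.Collections.Generic;

internal static class RetryQueueTimeouts
{
    internal readonly struct PendingMessage
    {
        public PendingMessage(int writtenTick) => WrittenTick = writtenTick;
        public int WrittenTick { get; }
    }

    internal static void DrainTimedOut(Queue<PendingMessage> queue, int timeoutMilliseconds)
    {
        lock (queue)
        {
            var now = Environment.TickCount; // captured once per pass, not once per message
            while (queue.Count != 0)
            {
                // unchecked subtraction copes with TickCount wrap-around
                var elapsed = unchecked(now - queue.Peek().WrittenTick);
                if (elapsed < timeoutMilliseconds) break; // FIFO: everything behind is newer
                queue.Dequeue(); // timed out - remove (and fail/complete) it
            }
        }
    }
}
```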
try
{
    if (messageRetryHelper.HasTimedOut(message))
Shouldn't we check messages for timeout before we break from the loop? Otherwise timed-out messages would linger in (and block?) the queue for a long time, i.e. we stop dequeueing until their endpoint is available, when instead we could have timed them out and unblocked the rest of the queue.

    if (!messageRetryHelper.IsEndpointAvailable(message))
    {
        break;
    }
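A hedged sketch of the ordering being suggested, with a hypothetical helper interface standing in for messageRetryHelper: time out stale messages first so they can never block the queue, and only then stop draining when the endpoint is unavailable.

```csharp
using System.Collections.Generic;

// Stand-in for the PR's message retry helper; member names mirror the snippets above.
internal interface IMessageRetryHelper<TMessage>
{
    bool HasTimedOut(TMessage message);
    bool IsEndpointAvailable(TMessage message);
    void FailAsTimedOut(TMessage message);
}

internal static class RetryQueueDrain
{
    internal static void Drain<TMessage>(Queue<TMessage> queue, IMessageRetryHelper<TMessage> helper)
    {
        lock (queue)
        {
            while (queue.Count != 0)
            {
                var message = queue.Peek();

                // 1. Timed-out messages are failed and dropped regardless of endpoint state,
                //    so they never hold up everything queued behind them.
                if (helper.HasTimedOut(message))
                {
                    queue.Dequeue();
                    helper.FailAsTimedOut(message);
                    continue;
                }

                // 2. Only stop draining when the next live message's endpoint is still down.
                if (!helper.IsEndpointAvailable(message)) break;

                queue.Dequeue();
                // ... hand the message off to be retried here ...
            }
        }
    }
}
```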
{
    while (queue.Count != 0)
    {
        message = queue.Dequeue();
Optional, could change to TryDequeue.
Same < netstandard2.0 issue here :(
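For comparison, a small sketch of the two dequeue styles being discussed: on older target frameworks Queue&lt;T&gt; has no TryDequeue, so the Count-then-Dequeue pattern under the lock is the portable option, while ConcurrentQueue&lt;T&gt;.TryDequeue is available everywhere and needs no external lock.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

internal static class DequeuePatterns
{
    // Portable on every target the library builds for: check Count while holding the lock.
    internal static int DrainLocked(Queue<int> queue)
    {
        var processed = 0;
        lock (queue)
        {
            while (queue.Count != 0)
            {
                _ = queue.Dequeue();
                processed++;
            }
        }
        return processed;
    }

    // ConcurrentQueue<T>.TryDequeue exists down-level and is already thread-safe.
    internal static int DrainConcurrent(ConcurrentQueue<int> queue)
    {
        var processed = 0;
        while (queue.TryDequeue(out _)) processed++;
        return processed;
    }
}
```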
lock (queue)
{
    var now = Environment.TickCount;
    while (queue.Count != 0)
Optional, could change to while (queue.TryPeek)
bool createWorker = !_backlog.IsEmpty;
if (createWorker) StartBacklogProcessor();

Multiplexer.RetryQueueManager.StartRetryQueueProcessor();
Are retries always in a race with the current backlog to be dequeued and written? This could cause confusing message reordering.
@deepakverma Apparently I fail at using GitHub - I had some old comments which were sitting around in a 'review' that I had not 'published'. Were you able to see them before now? How about now? Sorry for the ones that are outdated.
I have a PR against this for the simplifications we talked about at deepakverma#5, but it's against a fork repo so a bit hard to see... and we can't change the source branch for this PR, while opening a new one loses all the discussion and the great points & suggestions above. I'm not sure on the best path here since @deepakverma is out this week; @TimLovellSmith, should we work on those changes against my amendment branch? I want to talk through some items, like not being in love with the …
Collapsing into #1856 to get us onto a branch in this repo to collaborate on, as discussed on the call!
PR to auto-retry requests on a connection failure:
Requirement
With the current implementation, during a network blip the SE.Redis multiplexer marks any in-flight or yet-to-be-sent requests as failed.
The feature is to add a built-in retry mechanism to SE.Redis: based on a retry manager, SE.Redis will queue the requests and play them back once the connection has been restored.
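For context, a hypothetical sketch of what opting into this behaviour might look like, based on names that appear in the PR (a "RetryCommandsOnReconnect"-style ConfigurationOptions setting and an ICommandRetryPolicy such as AlwaysRetryOnConnectionException); the property name and wiring below are assumptions, not the shipped StackExchange.Redis API.

```csharp
using StackExchange.Redis;

// ConfigurationOptions.Parse and ConnectionMultiplexer.Connect are the real, shipped API;
// the commented-out retry line is the PR's proposed, hypothetical surface.
var options = ConfigurationOptions.Parse("localhost:6379");
// options.CommandRetryPolicy = CommandRetryPolicy.AlwaysRetryOnConnectionException(); // proposed in this PR, not shipped

var muxer = ConnectionMultiplexer.Connect(options);
var db = muxer.GetDatabase();

// With such a policy, a command that fails due to a connection blip would be queued by the
// retry manager and replayed once the connection is restored, instead of failing immediately.
db.StringSet("key", "value");
```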