Handle node failure properly in sender #1135
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1135      +/-   ##
==========================================
- Coverage   94.83%   94.80%   -0.03%
==========================================
  Files          88       88
  Lines       15671    15716      +45
  Branches     1374     1374
==========================================
+ Hits        14861    14900      +39
- Misses        565      570       +5
- Partials      245      246       +1
@ods Does this look like a good approach to you?
@ods The logic of this PR is to mark the (transaction or group) coordinator dead when we encounter a low-level error while trying to send a message to it. On retry, this gives the sender the opportunity to find a possible new coordinator, whereas currently it just loops over and over as described in #1134.
Back to draft.
@ods I was wrong! I made a dumb mistake: on my testing project, the __transaction_state topic had a replication factor of 1. When I killed the coordinator, no new leader was elected and the client was getting a … After setting the proper replication factor for __transaction_state, I managed to get the proper use cases:
On … I reduced the exception scope to only two exceptions; we might need to add more if needed, but it feels more appropriate like this. So I would consider this PR ready to merge.
Various sender activities require the sender to talk to a dedicated coordinator node, either for a related group or for transaction management.
When trying to reach this node, an error can happen because the broker is unavailable or unreachable. In this case, we must mark the coordinator dead so that, instead of retrying the same node over and over, we first try to find an active coordinator.
fixes #1134
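The retry behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `FakeCluster`, `Sender`, `NodeUnreachableError`, and every method name here are hypothetical stand-ins, and no aiokafka API is used. It only shows the idea of dropping the cached coordinator on a low-level send error so that the next attempt starts with a fresh lookup.

```python
import asyncio


class NodeUnreachableError(Exception):
    """Hypothetical stand-in for a low-level connection error."""


class FakeCluster:
    """Toy cluster: one elected coordinator and a set of live node ids."""

    def __init__(self, coordinator, alive):
        self.coordinator = coordinator
        self.alive = set(alive)


class Sender:
    def __init__(self, cluster):
        self._cluster = cluster
        self._coordinator_id = None  # cached coordinator, None means unknown
        self.lookups = 0

    async def _find_coordinator(self):
        # Ask the cluster who the coordinator currently is and cache it.
        self.lookups += 1
        self._coordinator_id = self._cluster.coordinator
        return self._coordinator_id

    def _coordinator_dead(self):
        # Forget the cached coordinator so the next attempt looks it up again.
        self._coordinator_id = None

    async def _send(self, node_id, request):
        if node_id not in self._cluster.alive:
            raise NodeUnreachableError(node_id)
        return f"{request} handled by node {node_id}"

    async def send_to_coordinator(self, request, retries=3):
        for _ in range(retries):
            node_id = self._coordinator_id
            if node_id is None:
                node_id = await self._find_coordinator()
            try:
                return await self._send(node_id, request)
            except NodeUnreachableError:
                # Without this call, every retry would hit the same dead node.
                self._coordinator_dead()
                await asyncio.sleep(0)  # placeholder for a retry backoff
        raise RuntimeError("coordinator unavailable")


async def demo():
    cluster = FakeCluster(coordinator=1, alive={1, 2})
    sender = Sender(cluster)
    first = await sender.send_to_coordinator("AddPartitionsToTxn")
    # Node 1 goes away and the cluster elects node 2 as the new coordinator.
    cluster.alive.discard(1)
    cluster.coordinator = 2
    second = await sender.send_to_coordinator("AddPartitionsToTxn")
    return first, second, sender.lookups


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the second request first fails against the cached node 1, the cache is cleared, and the retry performs a new lookup that resolves to node 2. If `_coordinator_dead()` were not called, all retries would target node 1 and the call would end with `RuntimeError`.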
Changes
Fixes #1134
Checklist
CHANGES folder
<issue_id>.<type> (e.g. 588.bugfix)
issue_id: change it to the pr id after creating the PR
.feature: Signifying a new feature.
.bugfix: Signifying a bug fix.
.doc: Signifying a documentation improvement.
.removal: Signifying a deprecation or removal of public API.
.misc: A ticket has been closed, but it is not of interest to users.
Fix issue with non-ascii contents in doctest text files.