feat(translator): set min retry limit of 100#2769
feat(translator): set min retry limit of 100#2769guydc wants to merge 1 commit intoenvoyproxy:mainfrom
Conversation
Signed-off-by: Guy Daich <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2769 +/- ##
==========================================
+ Coverage 63.63% 63.66% +0.02%
==========================================
Files 123 123
Lines 20189 20193 +4
==========================================
+ Hits 12847 12855 +8
+ Misses 6519 6516 -3
+ Partials 823 822 -1 ☔ View full report in Codecov by Sentry. |
| Priority: corev3.RoutingPriority_DEFAULT, | ||
| // concurrent retries limit = max(100, current-active-requests) | ||
| RetryBudget: &clusterv3.CircuitBreakers_Thresholds_RetryBudget{ | ||
| BudgetPercent: &xdstype.Percent{ |
There was a problem hiding this comment.
if min_rety_concurrency is adequate, should be rm retryBudget:100% setting ?
There was a problem hiding this comment.
also can we disable min_retry_concurrency by setting it to 0 ?
There was a problem hiding this comment.
what im trying to get at is can we do without this RetryBudget knob, if the 20% default is too low, can we increase maxParallelRequests (guesing the retry budget percentage is off this value) in the circuit breaker settings ?
There was a problem hiding this comment.
if
min_rety_concurrencyis adequate, should be rmretryBudget:100%setting ?
I understand them as being complementary, with min retries acting as a guardrail. The budget-based limit will dynamically scale with traffic. However, in smaller numbers, the budget could be a problem. Imagine that you had some traffic that generated transient failures, and then client traffic drops to 0: this means that the budget-based limit is now also 0 and all retries are aborted.
There was a problem hiding this comment.
@guydc isnt that what the knob is designed for ? to not invest more cycles when failure is evident ?
since youve already added the host predicate setting for retry it should hopefully retry and achieve success
There was a problem hiding this comment.
what im trying to get at is can we do without this
RetryBudgetknob, if the 20% default is too low, can we increasemaxParallelRequests(guesing the retry budget percentage is off this value) in the circuit breaker settings ?
The circuit breaker value only define the upper limit of the budget. But, the actual limit is calculated dynamically from the volume of traffic seen. So, increasing one of the circuit breakers wouldn't necessarily produce the expected outcome (actual retry limit being increased).
There was a problem hiding this comment.
based on issues like envoyproxy/envoy#30205, id like to avoid this setting for now
There was a problem hiding this comment.
@arkodg - Sure, that makes sense. During the API discussion for retries, we preferred the more recent budget-based approach, which is why I tried to use it here.
I have to admit that I'm also a bit surprised by the behavior of this feature, and this follow-up PR is trying to further tweak our defaults so that our upgrades are hitless. Maybe the best approach would be to wait for this feature to receive some additional improvements, like the ones outlined in the mentioned issue.
In that case, I can remove the budget altogether and replace it with some default max_retries value that's derived from one of the circuit breakers (e.g. 20% of parallel requests) . WDYT? Is that ok?
|
closing in favor of #2773 |
What this PR does / why we need it:
In addition to #2757, set a default minimum value of
100for the limit on concurrent retries. Since the concurrent budget is calculated based on the current active requests, the dynamic retry limit may be too low when the volume of traffic is not significant and there are many upstream failures.In the future, we should make these options configurable, so that users can tune them to their needs.
Which issue(s) this PR fixes:
Fixes #2754