Skip to content

feat(translator): set min retry limit of 100#2769

Closed
guydc wants to merge 1 commit intoenvoyproxy:mainfrom
guydc:add-retry-min-concurrency
Closed

feat(translator): set min retry limit of 100#2769
guydc wants to merge 1 commit intoenvoyproxy:mainfrom
guydc:add-retry-min-concurrency

Conversation

@guydc
Copy link
Copy Markdown
Contributor

@guydc guydc commented Mar 4, 2024

What this PR does / why we need it:
In addition to #2757, set a default minimum value of 100 for the limit on concurrent retries. Since the concurrent budget is calculated based on the current active requests, the dynamic retry limit may be too low when the volume of traffic is not significant and there are many upstream failures.

In the future, we should make these options configurable, so that users can tune them to their needs.

Which issue(s) this PR fixes:
Fixes #2754

@guydc guydc requested a review from a team as a code owner March 4, 2024 18:44
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.66%. Comparing base (ca572b8) to head (869be1d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2769      +/-   ##
==========================================
+ Coverage   63.63%   63.66%   +0.02%     
==========================================
  Files         123      123              
  Lines       20189    20193       +4     
==========================================
+ Hits        12847    12855       +8     
+ Misses       6519     6516       -3     
+ Partials      823      822       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Priority: corev3.RoutingPriority_DEFAULT,
// concurrent retries limit = max(100, current-active-requests)
RetryBudget: &clusterv3.CircuitBreakers_Thresholds_RetryBudget{
BudgetPercent: &xdstype.Percent{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if min_rety_concurrency is adequate, should be rm retryBudget:100% setting ?

Copy link
Copy Markdown
Contributor

@arkodg arkodg Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can we disable min_retry_concurrency by setting it to 0 ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what im trying to get at is can we do without this RetryBudget knob, if the 20% default is too low, can we increase maxParallelRequests (guesing the retry budget percentage is off this value) in the circuit breaker settings ?

Copy link
Copy Markdown
Contributor Author

@guydc guydc Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if min_rety_concurrency is adequate, should be rm retryBudget:100% setting ?

I understand them as being complementary, with min retries acting as a guardrail. The budget-based limit will dynamically scale with traffic. However, in smaller numbers, the budget could be a problem. Imagine that you had some traffic that generated transient failures, and then client traffic drops to 0: this means that the budget-based limit is now also 0 and all retries are aborted.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guydc isnt that what the knob is designed for ? to not invest more cycles when failure is evident ?
since youve already added the host predicate setting for retry it should hopefully retry and achieve success

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what im trying to get at is can we do without this RetryBudget knob, if the 20% default is too low, can we increase maxParallelRequests (guesing the retry budget percentage is off this value) in the circuit breaker settings ?

The circuit breaker value only define the upper limit of the budget. But, the actual limit is calculated dynamically from the volume of traffic seen. So, increasing one of the circuit breakers wouldn't necessarily produce the expected outcome (actual retry limit being increased).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on issues like envoyproxy/envoy#30205, id like to avoid this setting for now

Copy link
Copy Markdown
Contributor Author

@guydc guydc Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkodg - Sure, that makes sense. During the API discussion for retries, we preferred the more recent budget-based approach, which is why I tried to use it here.

I have to admit that I'm also a bit surprised by the behavior of this feature, and this follow-up PR is trying to further tweak our defaults so that our upgrades are hitless. Maybe the best approach would be to wait for this feature to receive some additional improvements, like the ones outlined in the mentioned issue.

In that case, I can remove the budget altogether and replace it with some default max_retries value that's derived from one of the circuit breakers (e.g. 20% of parallel requests) . WDYT? Is that ok?

@guydc
Copy link
Copy Markdown
Contributor Author

guydc commented Mar 4, 2024

closing in favor of #2773

@guydc guydc closed this Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

set a default retry budget and host retry predicate

2 participants