retry policy: make retry policy pluggable#10099
retry policy: make retry policy pluggable#10099yxue wants to merge 31 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
|
I think I was imagining the scope of this extension to be a bit bigger, eventually making it able to express the entire current RetryPolicy, including managing retry attempts if desirable (e.g. someone might want certain failures to not count against the retry count). How about something that looks like this: With this kind of interface its now capable of tracking the lifetime of multiple requests, so retry policies that might want to do dynamic retry count based on what happened during the attempts could be implemented. The key thing to notice is that the router might call it multiple times to determine whether something should or would have been be retried (in order to accept one of multiple hedged requests), so splitting up recording a response with the decision makes it possible to reason about the individual attempts. Then we can probably update the My envisioned end goal is basically to be able to express the entire current RetryPolicy as a combination of such extensions, reducing the complexity of the default retry handling while making it more expressive for power users. Hopefully this makes sense! @mattklein123 @alyssawilk might have thoughts as well |
That totally makes sense. I misunderstood and thought that we'd like to have another retry plugin. I agree with the envision to make the whole retry policy pluggable.
I didn't get the meaning of Should the create factory interface be: class RetryPolicyFactory {
RetryPolicySharedPtr createRetryPolicy(const Http::HeaderMap& request_headers, <config>);
};since the retry policy will consume |
|
By tracking requests I mean upstream requests, i.e. each retry attempt. Right now the retry state logic is immutable, so calling re the request headers yeah my thinking was that all the state that is known before the upstream requests are made should be passed to the factory, as it makes it possible to do some neat optimizations (eg return a noop RetryPolicy when the request headers imply that the request should not be retried) |
This reverts commit 9d56a23. Signed-off-by: Yan Xue <[email protected]>
This reverts commit 3b7a33f. Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
|
I made some change based on the discussion. I didn't added // Might need something like this to support the existing hedging behavior, can probably
// skip for now and not call this during the hedging path.
bool wouldRetry(const Http::HeaderMap& response_headers) const;since I haven't fully understand what it's used for. If it looks good to you, I am going to work on test and doc. @snowp |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for working on this. A few high level questions. Thank you!
/wait
| oneof config_type { | ||
| google.protobuf.Struct config = 12 [deprecated = true]; | ||
|
|
||
| google.protobuf.Any typed_config = 13; |
There was a problem hiding this comment.
Does this replace the entire policy above? As such should it really live as a oneof next to the policy definition where that gets defined? Or does this use part of the above policy?
There was a problem hiding this comment.
The retry policy extension doesn't use above fields in the existing retry policy. I was thinking to use oneof to include existing retry policy and pluggable retry policy, like this:
message RetryPolicy{
// RetryPolicy definition
}
message PluggableRetryPolicy{
// PluggableRetryPolicy definition
}
message RouteAction{
...
oneof retry_policy_speficier{
RetryPolicy retry_policy = x;
PluggableRetryPolicy pluggable_retry_policy = y;
}
}I thought we were going to replace existing retry policy with an extension which means we need to deprecate current retry policy definition in the future. Will using the oneof make the deprecation complicate? If putting them in one message, we could just deprecate fields without deprecating the whole message.
There was a problem hiding this comment.
When we separate the extension message from the above message, another problem is the retry plugin. Currently, the extension retry policy interface doesn't include the retry priority and host selection, which means that user cannot integrate the retry policy extension with retry policy extensions. In order to replace the whole message, I think we should make the extension interface the same as the core retry policy interface.
envoy/include/envoy/router/router.h
Lines 150 to 231 in 3d16207
At least
virtual std::chrono::milliseconds perTryTimeout() const PURE;
virtual uint32_t numRetries() const PURE;
virtual std::vector<Upstream::RetryHostPredicateSharedPtr> retryHostPredicates() const PURE;
virtual Upstream::RetryPrioritySharedPtr retryPriority() const PURE;
virtual uint32_t hostSelectionMaxAttempts() const PURE;
virtual absl::optional<std::chrono::milliseconds> baseInterval() const PURE;
virtual absl::optional<std::chrono::milliseconds> maxInterval() const PURE; from my perspective. WDYT @mattklein123 @snowp
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Lizan Zhou <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
|
I remove the retry policy extension implementation from the PR and keep the API only for review. I can send another PR to implement the API after this one gets merged. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks a ton for adding the docs. This is great. I've added a bunch of docs/API questions. I would definitely like @snowp to take a look as I'm still unsure about how the plugin and the core retry logic should interact. I'm concerned it's going to be confusing if we let them both coexist.
/wait
|
|
||
| // Implementation specific configuration which depends on the implementation being instantiated. | ||
| // See the supported retry policy implementations for further documentation. | ||
| google.protobuf.Any typed_config = 11; |
There was a problem hiding this comment.
Please ref link to the new docs you added from the field docs.
| api.v2.route.HeaderMatcher http_request_header_match = 5; | ||
|
|
||
| // HTTP request trailer match configuration. | ||
| api.v2.route.HeaderMatcher http_request_trailer_match = 6; |
There was a problem hiding this comment.
Out of curiosity what would this actually be used for?
There was a problem hiding this comment.
Maybe I don't need the trailer match now. The initial goal is to match the retry policy by request method.
| // | ||
| // The above configuration shows how to configure different retry policies for | ||
| // different HTTP request methods. | ||
| repeated ConditionalRetryPolicy retry_policies = 1; |
There was a problem hiding this comment.
Can you clarify that this is an or match? Also, given that we have an or match within the predicate, do we really need this so be repeated? Could this be a required singleton message?
|
|
||
| Envoy allows :ref:`pluggable retry policy <envoy_api_field_route.RetryPolicy.typed_config>` to be | ||
| configured in the :ref:`route configuration <envoy_api_field_route.RouteAction.retry_policy>`. The | ||
| extenison applies additional logic to the retry decision logic when the downstream Envoy receives |
There was a problem hiding this comment.
typo extenison. Same below.
| :ref:`retriable_status_codes <envoy_api_field_route.RetryPolicy.retriable_status_codes>`, | ||
| :ref:`retriable_headers<envoy_api_field_route.RetryPolicy.retriable_headers>` or | ||
| :ref:`retriable_request_headers<envoy_api_field_route.RetryPolicy.retriable_request_headers>` | ||
| is specific in :ref:`retry policy <envoy_api_field_route.RouteAction.retry_policy>`, the retry |
| :ref:`retriable_headers<envoy_api_field_route.RetryPolicy.retriable_headers>` or | ||
| :ref:`retriable_request_headers<envoy_api_field_route.RetryPolicy.retriable_request_headers>` | ||
| is specific in :ref:`retry policy <envoy_api_field_route.RouteAction.retry_policy>`, the retry | ||
| decision could be overridden by the core retry policy implementation. |
There was a problem hiding this comment.
This part doesn't thrill me and I think could be pretty confusing. Should we just block setting these fields if a retry extension is provided? This is related to my comment/question about where the retry config should live. WDYT?
| The final retry decision is the **OR** value of extension retry policy decision and core retry | ||
| policy decision. |
There was a problem hiding this comment.
This contradicts the above statement around overriding, or is that because it's an or? Hmm. I'm pretty inclined to just have a retry plugin take over the whole thing? @snowp WDYT?
| The retry policy extension can count the retry number inside the extension. There is a global | ||
| :ref:`configuration<envoy_api_field_route.RetryPolicy.num_retries>` for controlling the retry | ||
| count. If the retry number inside the extension is greater than the global retry number, the | ||
| downstream Envoy won't retry after exceeding the gobal retry number. |
There was a problem hiding this comment.
This is a bit unclear to me. Can you clarify?
snowp
left a comment
There was a problem hiding this comment.
Long term I'd like to see that we pull all the existing retry policies into extensions so that we avoid having to deal with both the extension and the core retry policy stuff. My thinking to begin with was that we could incrementally support both so that we didn't need to migrate all the existing matchers, but if that makes for a very confusing API then perhaps we should just make them exclusive and fully flesh out the extension?
| MatchPredicate not_match = 3; | ||
|
|
||
| // The match configuration will always match. | ||
| bool any_match = 4 [(validate.rules).bool = {const: true}]; |
There was a problem hiding this comment.
would it make more sense to pull this out of the oneof and say that in the absence of any rules (or the absence of a MatchPredicate?) every request will be matched? Not sure if having it be part of the expression tree is helpful
| bool any_match = 4 [(validate.rules).bool = {const: true}]; | ||
|
|
||
| // HTTP request header match configuration. | ||
| api.v2.route.HeaderMatcher http_request_header_match = 5; |
There was a problem hiding this comment.
maybe skip the http_ prefix?
| // conditions documented for | ||
| // :ref:`config_http_filters_router_x-envoy-retry-on` and | ||
| // :ref:`config_http_filters_router_x-envoy-retry-grpc-on`. | ||
| string retry_on = 2; |
There was a problem hiding this comment.
does this make more sense as a repeated field? or is there a reason why we want to stick to the comma separated format?
There was a problem hiding this comment.
I think making this field an array makes sense to me.
| MatchPredicate match_config = 1; | ||
|
|
||
| // Specifies the conditions under which retry takes place. These are the | ||
| // same |
|
I agree on the point that allowing extension retry policy and core retry policy cause a confusing API for users. Let me refactor the PR
message RetryPolicy{
// RetryPolicy definition
}
message RetryPolicyExtension{
// RetryPolicyExtension definition
}
message RouteAction{
...
oneof retry_policy_speficier{
RetryPolicy retry_policy = x;
RetryPolicyExtension retry_policy_extension = y;
}
}
class RetryPolicyExtension{
/*
* Record headers and reset during the request lifecycle.
*/
virtual void recordHeader(const Http::HeaderMap& response_headers);
virtual void recordReset(const Http::StreamResetReason reset_reason);
/*
* Functions for querying the retry decision.
*/
virtual bool wouldRetryFromHeaders() PURE;
virtual bool wouldRetryFromReset() PURE;
/*
* Functions for retry plugins.
*/
virtual void onHostAttempted(Upstream::HostDescriptionConstSharedPtr host) PURE;
virtual bool shouldSelectAnotherHost(const Upstream::Host& host) PURE;
virtual const Upstream::HealthyAndDegradedLoad& priorityLoadForRetry
(const Upstream::PrioritySet& priority_set, const Upstream::HealthyAndDegradedLoad& original_priority_load) PURE;
virtual uint32_t hostSelectionMaxAttempts() const PURE;
/*
* Retry remainings.
*/
virtual uint32_t retriesRemaining() PURE;
/*
* Retry backoff.
*/
virtual std::unique_ptr<BackOffStrategy> backoffStrategy() PURE;
/*
* Per try timeout.
*/
virtual std::chrono::milliseconds perTryTimeout() const PURE;
};WDYT @mattklein123 @snowp |
|
Yes ^ looks great and is I think what I would expect from an API/user perspective. |
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
Signed-off-by: Yan Xue <[email protected]>
|
I just push the commit to refactor the
WDYT? @mattklein123 @snowp |
| // route level entry will take precedence over this config and it'll be treated | ||
| // independently (e.g.: values are not inherited). | ||
| RetryPolicy retry_policy = 16; | ||
| oneof retry_policy_specifier { |
There was a problem hiding this comment.
A oneof upgrade is a breaking compile change so we can't do it. I would just add the new field and have a check in code that both can't be specified. There is a oneof upgrade annotation that can be applied so it gets upgraded in the next major version.
| // Implementation specific configuration which depends on the implementation being instantiated. | ||
| // Documentation for :ref:`retry policy extension <arch_overview_http_routing_retry_policy>`. See | ||
| // the supported retry policy implementations for further documentation. | ||
| google.protobuf.Any typed_config = 1; |
There was a problem hiding this comment.
If we think this is going to be the only field (it probably will be) I would just inline this directly in the message above?
snowp
left a comment
There was a problem hiding this comment.
Yeah this aligns with my thinking as well, thanks for iterating!
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Description: Extract functions from
retry_state_implinto pluggable class to make retry policy configurable and support composite retry policy.Risk Level: Med
Testing: Unit test
Docs Changes: N/A
Release Notes:
added :ref:typed_config <envoy_api_field_route.RetryPolicy.typed_config> to support pluggable retry policy.Fixes #Issue: #9946