Control % of requests faults are applied to with HTTP headers#10724
Control % of requests faults are applied to with HTTP headers#10724mattklein123 merged 28 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks really nice work (as usual!). Just a few small comments. Thank you!
/wait
docs/root/intro/version_history.rst
Outdated
| * ext_authz: disabled the use of lowercase string matcher for headers matching in HTTP-based `ext_authz`. | ||
| Can be reverted temporarily by setting runtime feature `envoy.reloadable_features.ext_authz_http_service_enable_case_sensitive_string_matcher` to false. | ||
| * fault: added support for controlling abort faults with :ref:`HTTP header fault configuration <config_http_filters_fault_injection_http_header>` to the HTTP fault filter. | ||
| * fault: added support for controlling the percentage of requests that abort, delay and response rate limits faults are applied to using :ref:`HTTP headers <config_http_filters_fault_injection_http_header>` to the HTTP fault filter. |
There was a problem hiding this comment.
Needs a master merge and the move to the new "current" version history file.
|
|
||
| fault.http.abort.http_status | ||
| HTTP status code that will be used as the of requests that will be | ||
| HTTP status code that will be used as the of requests that will be |
There was a problem hiding this comment.
not your bug but this sentence doesn't make sense. Can you fix?
| return false; | ||
| } | ||
|
|
||
| const auto percentage_header = |
There was a problem hiding this comment.
From a perf perspective it would be optimal to not do this lookup unless we are actually using a header provider (similar elsewhere), especially because right now this is going to be an O(N) scan to find the header. Can you figure out a way to only do the lookup on demand? Perhaps by passing the entire map and initializing the header provider with the header to look for?
There was a problem hiding this comment.
Good idea, I updated all headers providers to work like this - we pass headers map and we do a specific header lookup only if a header provider is used. After my recent changes, there should be no performance impact related to header lookup at all if fault filter is not configured to use header faults.
One more question regarding the implementation of the percentage header providers. As it is now, we try to read a percentage header (such as x-envoy-fault-abort-request-percentage) and if it's not present or its value is invalid we fallback to the value that's specified in proto config file. Do you think that falling back to 100% would make more sense (as opposed to looking at proto config file)?
envoy/source/extensions/filters/common/fault/fault_config.cc
Lines 14 to 37 in ed6fa7f
There was a problem hiding this comment.
Do you think that falling back to 100% would make more sense (as opposed to looking at proto config file)?
I think falling back to the config is more flexible, as the operator can configure a default of 100%, 0%, etc. if they want?
There was a problem hiding this comment.
That was my thought too 👍 Leaving it as it is then.
Signed-off-by: Rafal Augustyniak <[email protected]>
…tp-headers Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
|
/azp run envoy-presubmit |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
|
|
CI is broken right now. We are trying to figure out how to fix it. |
|
Please merge master. /wait |
…tp-headers Signed-off-by: Rafal Augustyniak <[email protected]>
|
I think that 44eedc7 broke some of the git pre-push checks. |
|
Fixing spell checking in #10763 |
Signed-off-by: Rafal Augustyniak <[email protected]>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, a few more comments.
/wait
| return percentage_; | ||
| } | ||
|
|
||
| if (header_numerator > 100) { |
There was a problem hiding this comment.
Sorry just noticed this. Fractional percent supports 1/100, 1/10,000, etc. Do we want to support this somehow in a header? I'm wondering if this should somehow be scaled by the default percentage that is configured? WDYT?
There was a problem hiding this comment.
We could definitely use a denominator of a configured default percentage - that could work. If we do this we could also have a logic that would compute an output percentage's numerator as the minimum of two values - the value of HTTP percentage header and the value defined in default configured percentage.
Why would this additional complexity be helpful?
- That would be an additional safety mechanism that would allow Envoy users to define the maximum percentage of requests a given type of fault should be applied to.
- It would ensure that no change in the configuration of a default percentage results in too many faults being ingested via HTTP headers. Example when the lack of this mechanism could lead to unintended behavior is presented below.
Example
Step 1
default percentage configuration:
numerator: 100
denumerator: 1000Effective fault injection rate is 10% since 100/1000 = 10.
HTTP headers used by clients injecting faults via HTTP headers:
`x-fault-abort-request-abort`: 404
`x-fault-abort-request-abort-percentage`: 100Effective fault injection rate is 10% since 100/1000 = 10.
Step 2
Somebody on the server side decides to change a default percentage configuration to:
numerator: 10
denumerator: 100From their perspective there should be no negative results of this change since the effective failure rate doesn't change (it's still equal to 10% since 100/1000 = 10/100 = 10)
The person responsible for a change doesn't update clients that use HTTP headers to ingest faults.
What happens with clients using HTTP headers to inject faults?
Without proposed min logic for the calculation of a numerator
`x-fault-abort-request-abort`: 404
`x-fault-abort-request-abort-percentage`: 100Effective fault injection rate increases to 100% since 100/100 (new denominator of a default percentage) = 100.
With proposed min logic for the calculation of a numerator
`x-fault-abort-request-abort`: 404
`x-fault-abort-request-abort-percentage`: 100Effective fault injection rate is still at 100% since min(100/100, 10 [Default Percentage Effective Rate]) = 100.
Let me know what you think about the proposed additional to your proposed change.
There was a problem hiding this comment.
Thanks for the super detailed proposal! I agree the capping makes sense, but please make sure it's well documented.
There was a problem hiding this comment.
Thanks for your suggestion! Implemented here https://github.com/envoyproxy/envoy/pull/10724/files#diff-186e4ff5610d03e9dc6f8fcace2aa4a3R26-R29.
| if (!request_headers) { | ||
| return absl::nullopt; | ||
| } |
There was a problem hiding this comment.
I don't think this can happen? Same elsehwere.
There was a problem hiding this comment.
Yes, it will never happen. I added this a safety measure in case we start using this class in more places in the future. That being said, I can remove this check if this kind of defensive programming is not something we do in Envoy repo.
There was a problem hiding this comment.
Yes in general we don't put in this type of defensive logic which we know we can't hit currently, so I would remove.
There was a problem hiding this comment.
Just so that we are on the same page. There is one place where we call percentage method with empty headers (nullptr) in Mongo related code but I don't think that header faults are supported there so we should be fine. The code I'm talking about can be found here https://github.com/envoyproxy/envoy/pull/10724/files#diff-eb7cf1a112138011ff766f2a9dc402f8R405
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
…tp-headers Signed-off-by: Rafal Augustyniak <[email protected]>
Signed-off-by: Rafal Augustyniak <[email protected]>
|
Thanks, can you check CI? You can build the docs locally with |
Signed-off-by: Rafal Augustyniak <[email protected]>
Description: Add an option to control the percentage of requests we apply fault to using HTTP headers. See #10648 for more details.
Risk Level: Low, new functionality.
Testing: Added unit tests
Docs Changes: Updated
Release Notes: Updated