circuit breaker: track remaining resource count#6269
circuit breaker: track remaining resource count#6269snowp merged 11 commits intoenvoyproxy:masterfrom
Conversation
This commit adds cluster stats tracking the resources remaining until a circuit breaker opens. Signed-off-by: Spencer Lewis <[email protected]>
Signed-off-by: Spencer Lewis <[email protected]>
Signed-off-by: Spencer Lewis <[email protected]>
|
@snowp I think it's possible to configure stats via the StatsMatcher. So, we may not actually need a config flag. |
Signed-off-by: Spencer Lewis <[email protected]>
snowp
left a comment
There was a problem hiding this comment.
Thanks, looks good for the most part
| // to the resource max | ||
| for (const Stats::GaugeSharedPtr& gauge : host_->cluster_.stats_store_.gauges()) { | ||
| EXPECT_EQ(0U, gauge->value()); | ||
| if (!absl::StrContains(gauge->name(), "circuit_breakers")) { |
There was a problem hiding this comment.
I think this whitelisting is fine, thought it might make sense at one point to add a initialValue() function to Gauge so that these checks can just be EXPECT_EQ(gauge->initialValue(), gauge->value()), but that would arguably fall out of the scope of this PR
There was a problem hiding this comment.
I will defer to your (or anyone else's) judgment on whether or not to make that change here. It makes sense to me.
Signed-off-by: Spencer Lewis <[email protected]>
snowp
left a comment
There was a problem hiding this comment.
One comment but otherwise LGTM
|
@snowp Thanks for the heads up. Yeah, adding 8 additional stats per cluster will, by default, increase the per-cluster cost by probably over 1k bytes, which is significant It might not be reasonable to expect high-cluster-cardinality sites that care about memory to explicitly construct efficient stats-matcher expressions to exclude the ones they don't want. Also note that stats-matcher isn't free. Currently it adds CPU overhead to every stat lookup, and after #6207 is merged (if its merged :), it will not add CPU overhead, but it will add some smaller memory overhead for excluded stats that we hold in a name-cache. If we can register the stats only if actually useful or explicitly enabled, that might be better. |
|
@jmarantz Thanks for chiming in, sounds like this should be guarded by a config flag (since circuit breakers cant be turned off themselves). From what I can tell to make registration optional we'd avoid calling |
Signed-off-by: Spencer Lewis <[email protected]>
* master: (59 commits) http fault: add response rate limit injection (envoyproxy#6267) xds: introduce initial_fetch_timeout option to limit initialization time (envoyproxy#6048) test: fix cpuset-threads tests (envoyproxy#6278) server: add an API for registering for notifications for server instance life… (envoyproxy#6254) remove remains of TestBase (envoyproxy#6286) dubbo_proxy: Implement the routing of Dubbo requests (envoyproxy#5973) Revert "stats: add new BoolIndicator stat type (envoyproxy#5813)" (envoyproxy#6280) runtime: codifying runtime guarded features (envoyproxy#6134) mysql_filter: fix integration test flakes (envoyproxy#6272) tls: update BoringSSL to debed9a4 (3683). (envoyproxy#6273) rewrite buffer implementation to eliminate evbuffer dependency (envoyproxy#5441) Remove the dependency from TimeSystem to libevent by using the Event::Scheduler abstraction as a delegate. (envoyproxy#6240) fuzz: fix use of literal in default initialization. (envoyproxy#6268) http: add HCM functionality required for rate limiting (envoyproxy#6242) Disable mysql_integration_test until it is deflaked. (envoyproxy#6250) test: use ipv6_only IPv6 addresses in custom cluster integration tests. (envoyproxy#6260) tracing: If parent span is propagated with empty string, it causes th… (envoyproxy#6263) upstream: fix oss-fuzz issue envoyproxy#11095. (envoyproxy#6220) Wire up panic mode subset to receive updates (envoyproxy#6221) docs: clarify xds docs with warming information (envoyproxy#6236) ...
Signed-off-by: Spencer Lewis <[email protected]>
Signed-off-by: Spencer Lewis <[email protected]>
Signed-off-by: Spencer Lewis <[email protected]>
jmarantz
left a comment
There was a problem hiding this comment.
Thanks for setting up this limitation, and for being explicit in the doc, etc.
mattklein123
left a comment
There was a problem hiding this comment.
Thanks looks awesome. 1 small test comment.
/wait
| // gauges which default to the resource max. | ||
| for (const Stats::GaugeSharedPtr& gauge : cluster_->stats_store_.gauges()) { | ||
| EXPECT_EQ(0U, gauge->value()); | ||
| if (!absl::StrContains(gauge->name(), "remaining")) { |
There was a problem hiding this comment.
This seems a little fragile to me. Is it possible to do a regex match here which matches on "circuit_breaker.*.remaining"? Same below? Also given this pattern has been copy/pasted into a bunch of different tests, maybe move into a utility function in case we need to do anything like this in the future?
There was a problem hiding this comment.
Sure thing, I gave this a shot. Thanks for taking a look.
Signed-off-by: Spencer Lewis <[email protected]>
Description: Adds additional cluster stats to track the resources remaining until circuit breakers open.
Risk Level: Low
Testing: Added unit tests to verify gauges are set correctly.
Docs Changes: Added note about new stats in circuit breaking doc and documented them in cluster stats.
Release Notes: Added
Fixes #6163
Should these gauges be optional based on a config flag?