Skip to content

circuit breaker: track remaining resource count#6269

Merged
snowp merged 11 commits intoenvoyproxy:masterfrom
spenceral:addl-cb-stats
Mar 21, 2019
Merged

circuit breaker: track remaining resource count#6269
snowp merged 11 commits intoenvoyproxy:masterfrom
spenceral:addl-cb-stats

Conversation

@spenceral
Copy link
Copy Markdown
Contributor

Description: Adds additional cluster stats to track the resources remaining until circuit breakers open.
Risk Level: Low
Testing: Added unit tests to verify gauges are set correctly.
Docs Changes: Added note about new stats in circuit breaking doc and documented them in cluster stats.
Release Notes: Added
Fixes #6163

Should these gauges be optional based on a config flag?

This commit adds cluster stats tracking the resources remaining
until a circuit breaker opens.

Signed-off-by: Spencer Lewis <[email protected]>
Signed-off-by: Spencer Lewis <[email protected]>
@snowp
Copy link
Copy Markdown
Contributor

snowp commented Mar 12, 2019

This adds 8 stats per cluster (4 for DEFAULT, 4 for HIGH priority) which will be used by everyone, which seems like it might warrant a config flag to avoid increasing memory usage per cluster. WDYT @htuch @jmarantz ?

@venilnoronha
Copy link
Copy Markdown
Member

@snowp I think it's possible to configure stats via the StatsMatcher. So, we may not actually need a config flag.

Signed-off-by: Spencer Lewis <[email protected]>
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good for the most part

// to the resource max
for (const Stats::GaugeSharedPtr& gauge : host_->cluster_.stats_store_.gauges()) {
EXPECT_EQ(0U, gauge->value());
if (!absl::StrContains(gauge->name(), "circuit_breakers")) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whitelisting is fine, thought it might make sense at one point to add a initialValue() function to Gauge so that these checks can just be EXPECT_EQ(gauge->initialValue(), gauge->value()), but that would arguably fall out of the scope of this PR

Copy link
Copy Markdown
Contributor Author

@spenceral spenceral Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer to your (or anyone else's) judgment on whether or not to make that change here. It makes sense to me.

Signed-off-by: Spencer Lewis <[email protected]>
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment but otherwise LGTM

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Mar 13, 2019

@snowp Thanks for the heads up. Yeah, adding 8 additional stats per cluster will, by default, increase the per-cluster cost by probably over 1k bytes, which is significant It might not be reasonable to expect high-cluster-cardinality sites that care about memory to explicitly construct efficient stats-matcher expressions to exclude the ones they don't want.

Also note that stats-matcher isn't free. Currently it adds CPU overhead to every stat lookup, and after #6207 is merged (if its merged :), it will not add CPU overhead, but it will add some smaller memory overhead for excluded stats that we hold in a name-cache.

If we can register the stats only if actually useful or explicitly enabled, that might be better.

@snowp
Copy link
Copy Markdown
Contributor

snowp commented Mar 13, 2019

@jmarantz Thanks for chiming in, sounds like this should be guarded by a config flag (since circuit breakers cant be turned off themselves).

From what I can tell to make registration optional we'd avoid calling gauge(std::string) on the StatsStore and add a nullGauge() function to it so we can point to a null gauge instead?

Signed-off-by: Spencer Lewis <[email protected]>
* master: (59 commits)
  http fault: add response rate limit injection (envoyproxy#6267)
  xds: introduce initial_fetch_timeout option to limit initialization time (envoyproxy#6048)
  test: fix cpuset-threads tests (envoyproxy#6278)
  server: add an API for registering for notifications for server instance life… (envoyproxy#6254)
  remove remains of TestBase (envoyproxy#6286)
  dubbo_proxy: Implement the routing of Dubbo requests (envoyproxy#5973)
  Revert "stats: add new BoolIndicator stat type (envoyproxy#5813)" (envoyproxy#6280)
  runtime: codifying runtime guarded features (envoyproxy#6134)
  mysql_filter: fix integration test flakes (envoyproxy#6272)
  tls: update BoringSSL to debed9a4 (3683). (envoyproxy#6273)
  rewrite buffer implementation to eliminate evbuffer dependency (envoyproxy#5441)
  Remove the dependency from TimeSystem to libevent by using the Event::Scheduler abstraction as a delegate. (envoyproxy#6240)
  fuzz: fix use of literal in default initialization. (envoyproxy#6268)
  http: add HCM functionality required for rate limiting (envoyproxy#6242)
  Disable mysql_integration_test until it is deflaked. (envoyproxy#6250)
  test: use ipv6_only IPv6 addresses in custom cluster integration tests. (envoyproxy#6260)
  tracing: If parent span is propagated with empty string, it causes th… (envoyproxy#6263)
  upstream: fix oss-fuzz issue envoyproxy#11095. (envoyproxy#6220)
  Wire up panic mode subset to receive updates (envoyproxy#6221)
  docs: clarify xds docs with warming information (envoyproxy#6236)
  ...
Signed-off-by: Spencer Lewis <[email protected]>
Signed-off-by: Spencer Lewis <[email protected]>
snowp
snowp previously approved these changes Mar 20, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I think this looks good. @jmarantz Mind looking over the stats related changes?

jmarantz
jmarantz previously approved these changes Mar 20, 2019
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for setting up this limitation, and for being explicit in the doc, etc.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks awesome. 1 small test comment.

/wait

// gauges which default to the resource max.
for (const Stats::GaugeSharedPtr& gauge : cluster_->stats_store_.gauges()) {
EXPECT_EQ(0U, gauge->value());
if (!absl::StrContains(gauge->name(), "remaining")) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little fragile to me. Is it possible to do a regex match here which matches on "circuit_breaker.*.remaining"? Same below? Also given this pattern has been copy/pasted into a bunch of different tests, maybe move into a utility function in case we need to do anything like this in the future?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I gave this a shot. Thanks for taking a look.

@spenceral spenceral dismissed stale reviews from jmarantz and snowp via 40fa909 March 21, 2019 02:14
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@snowp snowp merged commit 0cd4054 into envoyproxy:master Mar 21, 2019
@spenceral spenceral deleted the addl-cb-stats branch March 21, 2019 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants