Skip to content

upstream: allow configuration of connection pool limits#6298

Merged
mattklein123 merged 20 commits intoenvoyproxy:masterfrom
klarose:add_limit_config
Mar 26, 2019
Merged

upstream: allow configuration of connection pool limits#6298
mattklein123 merged 20 commits intoenvoyproxy:masterfrom
klarose:add_limit_config

Conversation

@klarose
Copy link
Copy Markdown
Contributor

@klarose klarose commented Mar 15, 2019

Description: We recently added a connection pool map class which maps keys to connection pools. A new key means a new connection pool. To prevent unbounded growth of connection pools, we placed a limit on the pool. Allocating beyond the limit causes the map to attempt to reclaim unused pools prior to allocating a new one. If none can be reclaimed, the map returns a failure.

We did not add the ability to configure the limit; its default was unlimited. This commit adds the ability to configure a limit by extending the cluster circuit breaker thresholds with a new optional configuration field: max_connection_pools.

Worth noting:

  1. The default if no configuration is set is still unlimited.
  2. The configuration is per priority, like the other thresholds.
  3. The per priority configuration required some small refactoring: I added a class which creates one connection pool map per priority, so that each connection pool map can track its resources independently.
  4. I added an extremely simple integration test which shows the behaviour when the limit is reached.

Risk Level: Medium
We're in the main code path for http. But, the default behaviour should not have changed, so there shouldn't be any major impact.

Testing:
Unit testing + integration testing
I spun up a local envoy instance, showing that:

  1. When the limit was reached, I got a 503 back
  2. If a pool was idle, even though I was at the limit, an old pool would be recycled and the request would go through.

Docs Changes: Protobuf docs + additions to the circuit breaking section in the overview.

Release Notes: Added a release not for the new circuit breaker.

klarose added 5 commits March 14, 2019 14:54
We want to limit the number of connection pools per cluster. Add it to
the circut breaker thresholds so we can do it per priority.

Signed-off-by: Kyle Larose <[email protected]>
Signed-off-by: Kyle Larose <[email protected]>
@klarose
Copy link
Copy Markdown
Contributor Author

klarose commented Mar 15, 2019

There are a few TODOs here for my next steps. First, though, I want to make sure my approach is good. Does everyone agree with using circuit breakers to limit the number of connection pools?

  1. I may need to change the error handling: Is my error handling (returning 503) the correct response here?
  2. I need to add stats in a similar vein to the other circuit breakers.
  3. I need to add documentation for this new circuit breaker.
  4. I need to add a release note.
  5. Is the "unlimited" default correct? Should I instead set it to be equal to the max number of concurrent connections?

Following that, I think I can move on from the connection pool map work and add an http plugin to infer the original source from the XFF.

Finally, snooping through the open PRs, I noticed: #6269. There were some concerns raised there about increasing memory use by adding more stats. Does my circuit breaker approach exacerbate the problem/does it have a similar problem?

Ultimately, I think any solution I take that is per cluster will need a similar number of stats per cluster. I guess the question is: how can I prevent excess resource consumption if no limits are necessary?

@klarose
Copy link
Copy Markdown
Contributor Author

klarose commented Mar 15, 2019

@mattklein123 Hey Matt, I finally got around to implementing the next phase of this. Sorry it took so long: flu + conference == not much time for other stuff.

PTAL! I'm not sure my approach is any good (nobody replied to my slack question about it. :)) So, any feedback is appreciated.

@mattklein123 mattklein123 self-assigned this Mar 15, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this approach makes sense to me.

It would probably be good to document the intended use for this vs the max connections circuit breaker. And yes, a release note seems in order since we're adding a new circuit breaker.

@klarose
Copy link
Copy Markdown
Contributor Author

klarose commented Mar 18, 2019

@snowp Thanks for the quick feedback. I'll address your comments today.

klarose added 5 commits March 18, 2019 15:53
- Fixed some nits and typos
- Removed priority from the http conn pool hash key

Signed-off-by: Kyle Larose <[email protected]>
Signed-off-by: Kyle Larose <[email protected]>
Somehow missed 'Guage` despite ample proper examples around it...

Signed-off-by: Kyle Larose <[email protected]>
snowp
snowp previously approved these changes Mar 19, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment but I think this looks good

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, cool stuff. A bunch of random comments but generally LGTM.

/wait

klarose added 4 commits March 20, 2019 17:17
Signed-off-by: Kyle Larose <[email protected]>
Signed-off-by: Kyle Larose <[email protected]>
Merge in new connection pool limits with changes to track resources
remaining.

Signed-off-by: Kyle Larose <[email protected]>
@klarose
Copy link
Copy Markdown
Contributor Author

klarose commented Mar 21, 2019

@mattklein123 Thanks for the review. I've addressed most of your comments. I've left two open -- they need a bit more feedback.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow ups. I dropped a comment on the deferred delete issue and flushing a comment here about the TODO. Thank you!

/wait

klarose added 3 commits March 22, 2019 18:17
This reverts commit 3a7dce7.

Signed-off-by: Kyle Larose <[email protected]>
 - Add a todo re: resource counters
 - Go back to old method of bypassing deferredDelete

Signed-off-by: Kyle Larose <[email protected]>
@klarose
Copy link
Copy Markdown
Contributor Author

klarose commented Mar 25, 2019

I've addressed the last few comments. Thanks.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM. I found one nit if you don't mind fixing.

/wait

@klarose
Copy link
Copy Markdown
Contributor Author

klarose commented Mar 26, 2019

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🙀 failed invoking rebuild of ci/circleci: format: 500 Internal Server Error
🙀 failed invoking rebuild of ci/circleci: ipv6_tests: 500 Internal Server Error

🐱

Caused by: a #6298 (comment) was created by @klarose.

see: more, trace.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 7de2b39 into envoyproxy:master Mar 26, 2019
@klarose klarose deleted the add_limit_config branch March 26, 2019 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants