Closed
Description
Hi there!
Thanks for making Concourse!
Bug Report
Just after updating from 3.13.0 to 3.14.1 we noticed 100% CPU usage and a 10x increase in the DB connections on the ATC:
- the high CPU usage lasts for about 3 minutes and recurs with a period of 5 minutes

- when the CPU usage is high, the ATC seems to be busy with checking resources
- the "build scheduling duration" shows peaks of ~20 minutes which, like the CPU load, recur with a 5-minute period

- the worker load graph forms a sawtooth-like shape, also with a 5-minute period

- the number of database connections goes as high as 40 (the nominal value is around 4), and the number of database accesses increases as well

In relation to this problem (not sure if it's a cause or an effect), our SSM credential store starts to throttle and the resource checks fail.
Very similar behavior is reported in
- this issue: High CPU usage as a result of constant to CredHub client construction #2300
- this forum post: https://discuss.concourse-ci.org/t/increased-db-connections-after-upgrade-to-3-14-1/301
I have the feeling that when a resource check fails, the ATC leaks DB (and possibly other) handles. This results in an avalanche of queries to the credential store.
The following can also be handy:
- Concourse version: 3.14.1
- Deployment type (BOSH/Docker/binary): binary
- Infrastructure/IaaS: AWS
- Browser (if applicable):
- Did this used to work? yes, in 3.13.0 it worked fine
Workaround
We managed to stabilize the system by increasing the resource check interval from 1 minute to 5 minutes.
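
A rough back-of-the-envelope sketch of why this workaround could relieve the credential store: if every resource check resolves its secrets against SSM, the average lookup rate scales inversely with the check interval. The resource count and secrets-per-check figures below are hypothetical (we have not measured them), and the model assumes checks are spread evenly over the interval and that nothing is cached.

```python
def lookups_per_minute(resources: int, secrets_per_check: int,
                       interval_minutes: float) -> float:
    """Average credential-store lookups per minute.

    Assumes every check resolves all of its secrets (no caching) and
    that checks are spread evenly across the check interval.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return resources * secrets_per_check / interval_minutes


# Hypothetical deployment size, NOT taken from this report.
RESOURCES = 200
SECRETS_PER_CHECK = 2

before = lookups_per_minute(RESOURCES, SECRETS_PER_CHECK, 1)  # 1-minute interval
after = lookups_per_minute(RESOURCES, SECRETS_PER_CHECK, 5)   # 5-minute interval

print(f"1m interval: {before:.0f} lookups/min")
print(f"5m interval: {after:.0f} lookups/min")
```

Under these assumptions the lookup rate drops by a factor of five, which would be consistent with the throttling going away. It does not explain the regression between 3.13.0 and 3.14.1 itself.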