Description
Bug Report
To monitor our Concourse instance, we run a canary build (which exercises various pieces of functionality) every 30 seconds. This creates a large number of builds (~2800 / day). To help mitigate this, we have set build_logs_to_retain to 3000. Over the time that we've been running this canary build, we have observed consistent growth in outgoing database network traffic (bytes sent / time period) that seems to be proportional to the number of builds. Deleting all old canary build rows except the last 3500 cut our outgoing network traffic in half. (We deleted around 64K builds.)
We do not have any monitors that show the canary pipeline's UI, and people only occasionally look at the canary pipeline's UI on weekdays. Given that, we can probably rule out web requests as the cause, since this network traffic also grows steadily on weekends. On a related note, we believe this is due to a process on the web node, because CPU utilization on the web node dropped when we cleared old build rows from the database.
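For scale, here is a rough back-of-the-envelope sketch of the numbers above. It assumes the canary ran continuously at exactly one build per 30 seconds and that build_logs_to_retain trims logs but leaves build rows in place (which is what the observations suggest, not something confirmed here). The 64K figure is approximate, so the results are estimates only.

```python
# Rough estimate of how build rows accumulate relative to log retention.
# All inputs come from the report; the continuous-run assumption is ours.
SECONDS_PER_DAY = 24 * 60 * 60
BUILD_INTERVAL_SECONDS = 30      # canary runs every 30 seconds
BUILD_LOGS_TO_RETAIN = 3000      # configured build_logs_to_retain
DELETED_BUILDS = 64_000          # approximate number of rows deleted
KEPT_BUILDS = 3_500              # rows kept after the cleanup

builds_per_day = SECONDS_PER_DAY // BUILD_INTERVAL_SECONDS
days_of_logs_retained = BUILD_LOGS_TO_RETAIN / builds_per_day
days_of_rows_accumulated = (DELETED_BUILDS + KEPT_BUILDS) / builds_per_day

print(f"builds per day: {builds_per_day}")
print(f"days covered by retained logs: {days_of_logs_retained:.2f}")
print(f"days of build rows before cleanup: {days_of_rows_accumulated:.1f}")
```

Under these assumptions, the retained logs cover roughly one day of canary builds, while the build rows had been piling up for a few weeks before the cleanup.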
Steps to Reproduce
- Run a bunch of builds over time (with build_logs_to_retain set to a reasonable number).
- Observe the outgoing network traffic from the database steadily grow without requesting the job's build history from the web.
Expected Results
The bytes sent / time period should be relatively constant once the system reaches a steady state.
Actual Results
The bytes sent / time period grows proportionally to the number of builds (not build logs).
Additional Info
We're pretty sure this is due to a read query whose number of results grows with the number of builds.
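To illustrate the kind of query we suspect, here is a minimal sketch using an in-memory SQLite database. The builds table, its columns, and the rows_returned helper are hypothetical and made up for this example. They are not Concourse's actual schema or queries. The point is only that a periodic query with no bound on its result set returns more rows (and therefore more bytes) as build rows accumulate, while a bounded query stays constant.

```python
import sqlite3
from typing import Optional


def rows_returned(total_builds: int, limit: Optional[int] = None) -> int:
    """Return how many rows a per-job build query yields.

    Hypothetical schema for illustration only; not Concourse's real tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE builds (id INTEGER PRIMARY KEY, job TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO builds (job, status) VALUES (?, ?)",
        (("canary", "succeeded") for _ in range(total_builds)),
    )

    # Without a LIMIT, the result set grows with the number of build rows.
    query = "SELECT id, status FROM builds WHERE job = ? ORDER BY id DESC"
    params: list = ["canary"]
    if limit is not None:
        query += " LIMIT ?"
        params.append(limit)

    count = len(conn.execute(query, params).fetchall())
    conn.close()
    return count


for n in (1_000, 5_000, 20_000):
    print(n, rows_returned(n), rows_returned(n, limit=100))
```

If something on the web node runs a query shaped like the unbounded one on a timer, that would match both the traffic growth and the drop we saw after deleting old rows.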
Version Info
- Concourse version: 4.2.1
- Deployment type (BOSH/Docker/binary): BOSH
- Infrastructure/IaaS: GCP
- Did this used to work? Unknown