-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
What problem are you trying to solve?
I have an issue when our Kubernetes cluster is under high CPU load. In this case kubelet will be slow to read readiness and liveness probe responses. In some cases we see kubernetes restarting linkerd-proxy pods due to failed livenessProbes. To reduce the chance of this happing we would like to increase the timeout (from the default 1s) to something like 10s or 20s which would be enough even under very high load.
How should the problem be solved?
Add some config parameter for probe timeouts in linkerd-proxy-injector. I would also set the timeout for livenessProbe a bit higher by default to follow kubernetes best practice.
Any alternatives you've considered?
This is partially caused by kubernetes/kubernetes#89898 and there already have been some improvements. More is coming. As a workaround we can reserve more CPU for kubelet but that harms resource utilization because less CPU will be available for payload on our nodes.
How would users interact with this feature?
Users can optionally set this timeout in their helm chart or in their linkerd-proxy-injector config.
Would you like to work on this feature?
yes