Set timeout when collecting metrics from shim's Stat #6781
estesp merged 1 commit into containerd:main
e7bb2b8 to f7cc385
The goroutine dump, for reference.
Build succeeded.
/ok-to-test
fuweid left a comment
Left comment about fixed timeout
f7cc385 to 58fdbb1
I've updated the PR to use the timeout package.
Would you mind rebasing and resolving the merge conflicts?
Signed-off-by: Nguyen Phan Huy <[email protected]>
58fdbb1 to c525aa5
Updated, thanks.
Issue
containerd/metrics/cgroups/v2/metrics.go, line 118 in d3aa7ee
containerd/metrics/cgroups/v1/metrics.go, line 125 in d3aa7ee
If the shim process is not responsive, the request blocks at these lines, and Collect.Collect() blocks in turn:
containerd/metrics/cgroups/v2/metrics.go, line 92 in d3aa7ee
This leaks goroutines and memory, and requests to the Prometheus metrics endpoint /v1/metrics hang whenever there is a "bad shim".
Reproducing
cat /etc/containerd/config.toml
[metrics]
  address = ":8080"
[debug]
  address = "/run/containerd/debug.sock"
I launched a container with a mock shim implementation that hangs on Stats: phanhuyn@a6add78
Observe that the request to the Prometheus endpoint hangs:
curl localhost:8080/v1/metrics # hangs
Stats call to the shim process
Fix
Timeout setting
The default timeout is 2 seconds. It can be configured by:
Signed-off-by: Nguyen Phan Huy [email protected]