### Description
We have a Kubernetes CI job (https://testgrid.k8s.io/google-gce#gci-gce-alpha-enabled-default&width=20) where we are consistently seeing errors from `ListPodSandboxStats`.

In https://github.com/containerd/containerd/pull/9905/files we handled the condition where we were getting `errdefs.ErrUnavailable`. I think we need to handle this case as well:

    if errdefs.IsNotFound(err) {
        return nil, fmt.Errorf("no running task found: %w", err)
    }

```
STEP: Gather node-problem-detector cpu and memory stats - k8s.io/kubernetes/test/e2e/node/node_problem_detector.go:192 @ 03/28/24 11:47:15.168
I0328 11:47:20.969542 10424 node_problem_detector.go:380] Unexpected error:
<*errors.StatusError | 0xc00280a280>:
an error on the server ("Internal Error: failed to list pod stats: rpc error: code = NotFound desc = 2 errors occurred:\n\t* failed to decode sandbox container metrics for sandbox \"44c8a7812bfbbd43c9607c017f77db5dd976d774d14d9881d1d7c63f8c3e76fd\": no running task found: task a1416456bfb47e71f5446700fef5f24b7fe31f965017df25ce85a4d37108af82 not found: not found\n\t* failed to decode sandbox container metrics for sandbox \"61f4e0d6e4b3d1ee296908ebeaf8a2668c2e6b5e895906b277b945ea40fa5393\": no running task found: task 1b3c578821277faf98e6355697309c7db529daeb7e42e51bf8355b366a95449d not found: not found") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-3q6b:10250)
{
ErrStatus:
code: 500
details:
causes:
- message: "Internal Error: failed to list pod stats: rpc error: code = NotFound
desc = 2 errors occurred:\n\t* failed to decode sandbox container metrics for
sandbox \"44c8a7812bfbbd43c9607c017f77db5dd976d774d14d9881d1d7c63f8c3e76fd\":
no running task found: task a1416456bfb47e71f5446700fef5f24b7fe31f965017df25ce85a4d37108af82
not found: not found\n\t* failed to decode sandbox container metrics for sandbox
\"61f4e0d6e4b3d1ee296908ebeaf8a2668c2e6b5e895906b277b945ea40fa5393\": no running
task found: task 1b3c578821277faf98e6355697309c7db529daeb7e42e51bf8355b366a95449d
not found: not found"
reason: UnexpectedServerResponse
kind: nodes
name: bootstrap-e2e-minion-group-3q6b:10250
message: 'an error on the server ("Internal Error: failed to list pod stats: rpc error:
code = NotFound desc = 2 errors occurred:\n\t* failed to decode sandbox container
metrics for sandbox \"44c8a7812bfbbd43c9607c017f77db5dd976d774d14d9881d1d7c63f8c3e76fd\":
no running task found: task a1416456bfb47e71f5446700fef5f24b7fe31f965017df25ce85a4d37108af82
not found: not found\n\t* failed to decode sandbox container metrics for sandbox
\"61f4e0d6e4b3d1ee296908ebeaf8a2668c2e6b5e895906b277b945ea40fa5393\": no running
task found: task 1b3c578821277faf98e6355697309c7db529daeb7e42e51bf8355b366a95449d
not found: not found") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-3q6b:10250)'
metadata: {}
reason: InternalError
status: Failure,
}
[FAILED] an error on the server ("Internal Error: failed to list pod stats: rpc error: code = NotFound desc = 2 errors occurred:\n\t* failed to decode sandbox container metrics for sandbox \"44c8a7812bfbbd43c9607c017f77db5dd976d774d14d9881d1d7c63f8c3e76fd\": no running task found: task a1416456bfb47e71f5446700fef5f24b7fe31f965017df25ce85a4d37108af82 not found: not found\n\t* failed to decode sandbox container metrics for sandbox \"61f4e0d6e4b3d1ee296908ebeaf8a2668c2e6b5e895906b277b945ea40fa5393\": no running task found: task 1b3c578821277faf98e6355697309c7db529daeb7e42e51bf8355b366a95449d not found: not found") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-3q6b:10250)
In [It] at: k8s.io/kubernetes/test/e2e/node/node_problem_detector.go:380 @ 03/28/24 11:47:20.969
```
### Steps to reproduce the issue
The CI jobs can be modified to run with newer versions of containerd.
### Describe the results you received and expected
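Received: `ListPodSandboxStats` fails the whole request with `code = NotFound` when the task for any sandbox is not found.

Expected: `ListPodSandboxStats` should succeed and return stats for whatever pods it can process.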
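
To make the expected behaviour concrete, here is a rough sketch of the "skip what is gone, report the rest" pattern. It is not the containerd implementation: `errNotFound` stands in for `errdefs.ErrNotFound`, and `sandboxStats`, `statsFunc`, `listStats`, and `fakeFetch` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for containerd's errdefs.ErrNotFound.
var errNotFound = errors.New("not found")

// sandboxStats is a hypothetical, simplified stats record.
type sandboxStats struct {
	ID       string
	CPUNanos uint64
}

// statsFunc fetches stats for a single sandbox (hypothetical signature).
type statsFunc func(id string) (sandboxStats, error)

// listStats collects stats for every sandbox it can process.
// A sandbox whose task no longer exists (errNotFound) is skipped;
// any other error is joined and returned next to the partial results.
func listStats(ids []string, fetch statsFunc) ([]sandboxStats, error) {
	var (
		out  []sandboxStats
		errs []error
	)
	for _, id := range ids {
		st, err := fetch(id)
		switch {
		case err == nil:
			out = append(out, st)
		case errors.Is(err, errNotFound):
			// The task exited between listing and querying; skip it.
			continue
		default:
			errs = append(errs, fmt.Errorf("sandbox %q: %w", id, err))
		}
	}
	// errors.Join returns nil when errs is empty.
	return out, errors.Join(errs...)
}

// fakeFetch simulates one sandbox whose task has already gone away.
func fakeFetch(id string) (sandboxStats, error) {
	if id == "gone" {
		return sandboxStats{}, fmt.Errorf("no running task found: %w", errNotFound)
	}
	return sandboxStats{ID: id, CPUNanos: 42}, nil
}

func main() {
	stats, err := listStats([]string{"a", "gone", "b"}, fakeFetch)
	fmt.Println(len(stats), err)
}
```

With the wrapped not-found error skipped, the call returns stats for the two remaining sandboxes and a nil error. Whether to skip silently or log the skipped sandbox is a design choice left open here.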
### What version of containerd are you using?
1.7.14
### Any other relevant information
containerd version 1.7.14
### Show configuration if it is related to CRI plugin.
not applicable
Code referenced in the description: `containerd/client/container.go`, lines 398 to 400 in b0d00f8.