[cri] podSandboxStats should return errdefs.ErrUnavailable without wrapping#9897
Conversation
…apping Signed-off-by: Davanum Srinivas <[email protected]>
|
The bug here seems to be on the check side rather than generation side. The wrapping is done correctly, somewhere isn't unwrapping though like it should. |
|
/retest |
3 similar comments
|
/retest |
|
/retest |
|
/retest |
|
|
||
| if sandbox.Status.Get().State != sandboxstore.StateReady { | ||
| return nil, fmt.Errorf("failed to get pod sandbox stats since sandbox container %q is not in ready state: %w", meta.ID, errdefs.ErrUnavailable) | ||
| log.G(ctx).WithField("podsandboxid", sandbox.ID).Debugf("failed to get pod sandbox stats since sandbox container %q is not in ready state", meta.ID) |
There was a problem hiding this comment.
is it client side issue? I mean that we use fmt.Errorf(%w) to mark root cause as ErrUnavailable. The client side should unwrap error.
There was a problem hiding this comment.
@fuweid not a client side issue. please see where this is called from within containerd. (wrote a long PR text to describe why this is needed)
There was a problem hiding this comment.
Sorry, I didn't follow the description.
containerd/internal/cri/server/sandbox_stats_list.go
Lines 39 to 49 in 9a2b855
Checked the main branch and we did expect err to unavailable.
And the error looks weird because it doesn't have unavailable information. Do you know how to check what containerd version used in CI?
: failed to get pod sandbox stats since sandbox container \"c821a1cfa253eca57db68a1ae85184359d1422c7595982e32f47addeb8d8a78e\" is not in ready state"
There was a problem hiding this comment.
@fuweid yes originally i saw it in 1.7.13, but inspecting the code in main i see the issue can be here as well. right?
There was a problem hiding this comment.
I'd like to fix it first here and then cherry pick back to 1.7 branch
There was a problem hiding this comment.
I don't think so. main doesn't have the issue you mentioned. please provide the log. thanks
There was a problem hiding this comment.
For the ErrUnabailable error, we did use errors.Is to check. That's why the issue has been fixed.
Based on your description, the error is like
no running task found
I think we should skip task not found error instead of unavailable.
|
ok i had forgotten the unwrap/compare!! thanks |
kubelet is calling CRI's
ListPodSandboxStats, when this method is looping through sandbox(es), it callspodSandboxStatsand expects to get info about sandboxes that areStateReady.ListPodSandboxStatsexpectserrdefs.ErrUnavailablewhen sandbox is NOTStateReady. But in the current implementation, the ErrUnavailable is wrapped into another one usingfmt.Errorfso theListPodSandboxStatsfails fast without skipping the sandboxes that are not ready.from: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-alpha-enabled-default/1762909002125021184
test:
Kubernetes e2e suite: [It] [sig-node] NodeProblemDetector [NodeFeature:NodeProblemDetector] should run without errorlog snippet: