[Flaky test] [sig-node] Pods Extended Pod Container lifecycle should not create extra sandbox if all containers are done #106904

@knight42

Description

Which jobs are flaking?

pull-kubernetes-e2e-kind-ipv6

Which tests are flaking?

[sig-node] Pods Extended Pod Container lifecycle should not create extra sandbox if all containers are done

Since when has it been flaking?

Not sure

Testgrid link

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/106860/pull-kubernetes-e2e-kind-ipv6/1468756327072272384

The board also shows that this test usually flakes:
https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-e2e-kind-ipv6

https://storage.googleapis.com/k8s-triage/index.html?pr=1&sig=node&test=if%20all%20containers%20are%20done

Reason for failure (if possible)

I think the flaky test [sig-node] Pods Extended Pod Container lifecycle should not create extra sandbox if all containers are done is caused by the short 60-second timeout on the event-polling loop:

ginkgo.By("Getting events about the pod")
framework.ExpectNoError(wait.Poll(time.Second*2, time.Second*60, func() (bool, error) {
	selector := fields.Set{
		"involvedObject.kind":      "Pod",
		"involvedObject.uid":       string(createdPod.UID),
		"involvedObject.namespace": f.Namespace.Name,
		"source":                   "kubelet",
	}.AsSelector().String()
	options := metav1.ListOptions{FieldSelector: selector}
	eventList, err = f.ClientSet.CoreV1().Events(f.Namespace.Name).List(context.TODO(), options)
	if err != nil {
		return false, err
	}
	if len(eventList.Items) > 0 {
		return true, nil
	}
	return false, nil
}))

In this job, the pod reached the "Succeeded or Failed" condition at 02:01:26, and the test aborted with the timeout error 60 seconds later, at 02:02:26:

Dec  9 02:01:26.931: INFO: Pod "pod-always-succeed14e323b0-bc50-47ea-b94d-4ab967e86daf" satisfied condition "Succeeded or Failed"
STEP: Getting events about the pod
Dec  9 02:02:26.968: FAIL: Unexpected error:
    <*errors.errorString | 0xc000238280>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

but according to the kubelet log, the "Created" event (FirstTimestamp 02:01:21) was not posted until 02:02:28, at which point the API server rejected it because the test namespace was already being terminated:

Dec 09 02:02:28 kind-worker kubelet[249]: E1209 02:02:28.266113     249 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"pod-always-succeed14e323b0-bc50-47ea-b94d-4ab967e86daf.16bef3b6faf11c40", GenerateName:"", Namespace:"pods-2272", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"pods-2272", Name:"pod-always-succeed14e323b0-bc50-47ea-b94d-4ab967e86daf", UID:"af05933a-30c6-4f04-bc87-ec5c904f2bd9", APIVersion:"v1", ResourceVersion:"13320", FieldPath:"spec.initContainers{foo}"}, Reason:"Created", Message:"Created container foo", Source:v1.EventSource{Component:"kubelet", Host:"kind-worker"}, FirstTimestamp:time.Date(2021, time.December, 9, 2, 1, 21, 629142080, time.Local), LastTimestamp:time.Date(2021, time.December, 9, 2, 1, 21, 629142080, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "pod-always-succeed14e323b0-bc50-47ea-b94d-4ab967e86daf.16bef3b6faf11c40" is forbidden: unable to create new content in namespace pods-2272 because it is being terminated' (will not retry!)

Anything else we need to know?

No response

Relevant SIG(s)

/sig node

Metadata

Labels

kind/flake: Categorizes issue or PR as related to a flaky test.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Projects

Status

Done
