Image I'm using:
ami-035c9bfb9c905837c - but I believe this happens with any AWS K8s AMI variant. I've reproduced this on x86_64.
What I expected to happen:
The cluster nodes to keep working without memory filling up and without "Timed out while waiting for systemd to remove" errors appearing in the kubelet logs.
What actually happened:
Apr 26 21:38:55 ip-192-168-81-168.us-west-2.compute.internal kubelet[1373]: I0426 21:38:55.258940 1373 pod_container_manager_linux.go:192] "Failed to delete cgroup paths" cgroupName=[kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] err="unable to destroy cgroup paths for cgroup [kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] : Timed out while waiting for systemd to remove kubepods-besteffort-pod482dc361_4c7d_4502_b54d_4ce53af7089e.slice"
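For anyone trying to confirm the same symptom, this is roughly how I'd check on an affected node whether the transient pod slices are piling up (a sketch assuming the systemd cgroup driver and a cgroup v1 layout, which matches the slice names in the log above; paths may differ on other setups):

# Count the besteffort pod slices systemd is still tracking; during the leak this keeps growing.
systemctl list-units --type=slice | grep -c 'kubepods-besteffort-pod'
# Count the matching paths left behind in the cgroup filesystem.
ls /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice | grep -c 'pod'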
How to reproduce the problem:
I launch a cluster with 2 nodes, add some load (in this case simulated by a webserver and some pods calling the webserver), and add a CronJob as follows:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster; sleep $(( ( RANDOM % 10 ) + 1 ));
          restartPolicy: OnFailure
          hostNetwork: true
      parallelism: 60
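To run it, I just apply the manifest and let the pods churn every minute (assuming the manifest above is saved as cronjob.yaml; the filename is arbitrary):

kubectl apply -f cronjob.yaml
# Each minute the job fans out 60 pods; watch them cycle through to Completed.
kubectl get pods --watch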
After some time, the memory on the node fills up, with kubelet taking up GBs of RAM, and the logs fill with errors about failing to clean up the cgroups.
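A crude way to watch the kubelet growth on the node (a sketch; the one-minute sampling interval is arbitrary):

# Print kubelet's resident set size (KiB) once a minute.
while true; do ps -o rss= -C kubelet; sleep 60; done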
This seems to be a problem when lots of pods are cycled through. It may also have to do with the CronJob terminating the container, since just simulating load and deleting the running containers manually doesn't seem to trigger it.
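For reference, the manual-deletion control I mean looks like this (the label selector is hypothetical; match whatever your load pods actually use):

# Churn pods by deleting them by hand instead of letting the CronJob terminate them;
# this path did not reproduce the leak for me.
kubectl delete pods -l app=load-sim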
This does not happen on 1.13.3.