High memory utilization on 1.13.4 release  #3057

@yeazelm

Image I'm using:
ami-035c9bfb9c905837c, but I believe this affects any AWS K8S variant. I've reproduced this on x86_64.

What I expected to happen:
The cluster nodes should run without exhausting memory and without "Timed out while waiting for systemd to remove" errors appearing in the kubelet logs.

What actually happened:

Apr 26 21:38:55 ip-192-168-81-168.us-west-2.compute.internal kubelet[1373]: I0426 21:38:55.258940    1373 pod_container_manager_linux.go:192] "Failed to delete cgroup paths" cgroupName=[kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] err="unable to destroy cgroup paths for cgroup [kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] : Timed out while waiting for systemd to remove kubepods-besteffort-pod482dc361_4c7d_4502_b54d_4ce53af7089e.slice"
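
To gauge how often this happens, the kubelet log can be scanned for the message above. The following is only a minimal sketch, assuming the log lines are available as text and keep the format shown in this issue; `count_stuck_slices` is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# Matches the kubelet message quoted in this issue and captures the
# systemd slice name that could not be removed.
SLICE_RE = re.compile(
    r'"Failed to delete cgroup paths".*'
    r"Timed out while waiting for systemd to remove (\S+\.slice)"
)


def count_stuck_slices(lines):
    """Return a Counter mapping slice name -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = SLICE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, passing it the lines of a saved kubelet log (such as output captured from `journalctl -u kubelet`) and printing `counts.most_common(10)` would show which pod slices are retried most often.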

How to reproduce the problem:
I launch a cluster with 2 nodes, add some load (in this case simulated by a webserver and some pods calling the webserver), and add a CronJob as follows:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster; sleep $(( ( RANDOM % 10 )  + 1 ));
          restartPolicy: OnFailure
          hostNetwork: true
      parallelism: 60

After some time, the memory on the node fills up, with kubelet taking up gigabytes of RAM, and the logs contain lots of errors about failing to clean up the cgroups.
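
One way to watch the kubelet's memory growth over time is to sample its resident set size from `/proc`. This is a rough sketch, assuming it runs somewhere with Python and access to the node's `/proc`; the function names are hypothetical, and the kubelet PID has to be supplied by the caller (it was 1373 in the log line above).

```python
import re


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process in KiB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())
```

Calling `read_rss_kib` once a minute and logging the result alongside the CronJob runs should make it visible whether memory climbs steadily as pods are cycled.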

This seems to be a problem when lots of pods are cycled through. It may also be related to the CronJob terminating the container, since just simulating load and deleting the running containers manually doesn't seem to trigger it.

This does not happen on 1.13.3.

Labels:
area/core: Issues core to the OS (variant independent)
status/in-progress: This issue is currently being worked on
type/bug: Something isn't working
