
containerd-shim processes are leaking inotify instances with cgroups v2 #5670

@bharathguvvala

Description


containerd-shim processes are leaking inotify instances when set up with cgroups v2. The following is the count of instances held by the shim processes (other processes have been filtered out of the output); the count only goes up as time progresses. With cgroups v1, no inotify instances are held by the containerd-shim processes. I suspect this could be due to a runc PR that listens for OOM events using the inotify mechanism, but that code appears to handle and close its inotify instances cleanly. Since I don't have much insight into the containerd-shim and runc interaction, I suspect some corner case in that interaction is causing this.

To add more context, this is running on a kubernetes node with containerd as the runtime. Pods which are crash looping are contributing more to the problem.

$ find /proc/*/fd -lname anon_inode:inotify | cut -d/ -f3 | xargs -I '{}' -- ps --no-headers -o '%p %U %c %a %P' -p '{}' | uniq -c | sort -nr | grep root

COUNT PID USER COMMAND COMMAND PPID
46 3276111 root containerd-shim /usr/local/bin/containerd-s 1
15 1585532 root containerd-shim /usr/local/bin/containerd-s 1
14 19306 root containerd-shim /usr/local/bin/containerd-s 1
14 19279 root containerd-shim /usr/local/bin/containerd-s 1
7 2937840 root containerd-shim /usr/local/bin/containerd-s 1
3 28459 root containerd-shim /usr/local/bin/containerd-s 1
3 19232 root containerd-shim /usr/local/bin/containerd-s 1
2 3251811 root containerd-shim /usr/local/bin/containerd-s 1
2 19200 root containerd-shim /usr/local/bin/containerd-s 1
2 19156 root containerd-shim /usr/local/bin/containerd-s 1
1 3602946 root containerd-shim /usr/local/bin/containerd-s 1
1 3171279 root containerd-shim /usr/local/bin/containerd-s 1
1 213009 root containerd /usr/local/bin/containerd 1

Steps to reproduce the issue:

  1. Set up containerd on a Linux machine with cgroups v2
  2. List the inotify instances held by the containerd-shim processes and observe the count over a period of time
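To make step 2 concrete, the per-process counting from the command above can be wrapped in a small sketch. This is an illustration, not part of the report: the `count_inotify` helper name and the `pgrep -f containerd-shim` pattern are assumptions; adjust the pattern for your shim binary.

```shell
#!/bin/sh
# Hypothetical helper: count the inotify instances held by one PID by
# counting its fd symlinks that point at anon_inode:inotify.
count_inotify() {
  find "/proc/$1/fd" -lname 'anon_inode:inotify' 2>/dev/null | wc -l
}

# One sampling pass over all containerd-shim processes. Run this
# periodically (e.g. every minute) and compare successive samples to
# confirm the counts only ever grow.
for pid in $(pgrep -f containerd-shim); do
  printf '%s\t%s\n' "$pid" "$(count_inotify "$pid")"
done
```

Comparing two samples taken an hour apart should show the per-shim counts monotonically increasing on a cgroups v2 node, and flat on a cgroups v1 node.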

Describe the results you received:
The number of inotify instances increases in an unbounded manner, raising the concern that the fs.inotify.max_user_instances limit will be breached over time. I don't currently know what the impact on containerd-shim is once that limit is hit. On a Kubernetes node, 'kubectl logs -f' to tail logs on any pod on that node would fail, unable to create a new inotify instance.
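The headroom can be checked directly: fs.inotify.max_user_instances caps inotify instances per real UID, so all root-owned shims on a node draw from one shared budget. A rough sketch (the `used` count spans all visible processes, so it slightly over-approximates any single user's consumption):

```shell
#!/bin/sh
# Per-UID cap on inotify instances (a kernel-wide sysctl, independent of
# the cgroup version).
limit=$(cat /proc/sys/fs/inotify/max_user_instances)

# Total inotify instances currently open across all visible processes;
# an upper bound on what any one user is consuming.
used=$(find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l)

echo "inotify instances in use: $used (per-user limit: $limit)"
```

Once a leaking node's `used` figure approaches `limit` for the root user, any further inotify_init call by a root process on that node will fail with EMFILE.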

Describe the results you expected:
The number of inotify instances should remain bounded and should not leak over time.

What version of containerd are you using:

$ containerd --version
containerd github.com/containerd/containerd v1.5.0 8c906ff108ac28da23f69cc7b74f8e7a470d1df0

Any other relevant information (runC version, CRI configuration, OS/Kernel version, etc.):

$ runc --version
runc version 1.0.0-rc93
commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
spec: 1.0.2-dev
go: go1.16.3
libseccomp: 2.3.3
$ uname -a
Linux sparrow-hyd-playground-1-fk-prod-nodes-4-8340279 5.10.0-0.bpo.3-cloud-amd64 #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11) x86_64 GNU/Linux
