Creation and removal of containers via ctr CLI does not release inodes in /run/containerd/fifo #2421

@konrad-ohms

Description

Containerd does not release inodes in /run/containerd/fifo if containers are created and removed in a loop via the ctr CLI. Over time this leads to 100% inode usage on the /run file system, which prevents containerd from creating new containers.

Steps to reproduce the issue:

  1. The described behavior was noticed during a stress test that executed the following commands in a loop and stored the results as CSV. The CSV output was monitored for failures. If a command returned a non-zero exit code, up to 10 retries were executed before giving up on that iteration.
ctr run -d docker.io/library/alpine:latest test.${iteration} sleep 60
sleep 0.1
ctr tasks pause test.${iteration}
sleep 0.1
ctr tasks resume test.${iteration}
sleep 0.1
ctr tasks kill -s SIGKILL test.${iteration}
sleep 0.1
ctr container delete test.${iteration}
  2. During the test execution, system metrics (based on the /proc file system and the df command) were collected every 10 minutes and stored as CSV.

  3. After multiple failed iterations appeared in the log, the CSV files were analyzed.
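
The retry and logging behavior described in step 1 could be sketched roughly as follows. This is not the original test script: the function name `run_with_retries`, the `MAX_RETRIES` variable, and the CSV column layout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the retry wrapper; names and CSV layout are assumptions.

MAX_RETRIES=10

# run_with_retries ITERATION LOGFILE COMMAND [ARGS...]
# Runs COMMAND; on a non-zero exit code it retries up to MAX_RETRIES times.
# Appends one CSV line per attempt: iteration,epoch,exit_code,command
run_with_retries() {
  local iteration=$1 logfile=$2
  shift 2
  local attempt=0 rc
  while :; do
    if "$@" >/dev/null 2>&1; then rc=0; else rc=$?; fi
    printf '%s,%s,%s,%s\n' "$iteration" "$(date +%s)" "$rc" "$*" >>"$logfile"
    [ "$rc" -eq 0 ] && return 0
    attempt=$((attempt + 1))
    # Give up once the first attempt plus MAX_RETRIES retries have all failed.
    [ "$attempt" -gt "$MAX_RETRIES" ] && return "$rc"
    sleep 0.1
  done
}

# Example usage (requires a running containerd; not executed here):
#   i=1
#   run_with_retries "$i" results.csv ctr run -d docker.io/library/alpine:latest "test.$i" sleep 60
#   run_with_retries "$i" results.csv ctr tasks kill -s SIGKILL "test.$i"
#   run_with_retries "$i" results.csv ctr container delete "test.$i"
```

Logging every attempt rather than only the final one keeps transient failures, such as the delete error shown further down, visible in the CSV.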

Describe the results you received:
After running the loop for 2 days and 17 minutes (126,064 container iterations), containerd could no longer start new containers.

The error reported by ctr was:

126065,1529869634,ctr:,OCI,runtime,create,failed:,no,space,left,on,device:,unknown,real,0.134,user,0.004,sys,0.024,1,ctr run -d docker.io/library/alpine:latest test.126065 sleep 60

During the first 126,064 cycles, the following error occurred only in iterations 87631 and 116278 and was resolved by the first retry (probably a timing problem).

87631,1529817767,real,0.292,user,0.016,sys,0.012,0,ctr run -d docker.io/library/alpine:latest test.87631 sleep 60
87631,1529817768,real,0.073,user,0.016,sys,0.000,0,ctr tasks pause test.87631
87631,1529817768,real,0.248,user,0.008,sys,0.008,0,ctr tasks resume test.87631
87631,1529817768,real,0.367,user,0.012,sys,0.004,0,ctr tasks kill -s SIGKILL test.87631
87631,1529817769,time="2018-06-24T05:22:49Z",level=error,msg="failed,to,delete,container,"test.87631"",error="cannot,delete,a,non,stopped,container:,{running,0,0001-01-01,00:00:00,+0000,UTC}",ctr:,cannot,delete,a,non,stopped,container:,{running,0,0001-01-01,00:00:00,+0000,UTC},real,0.299,user,0.004,sys,0.012,1,ctr container delete test.87631
87631,1529817770,real,0.774,user,0.012,sys,0.004,0,ctr container delete test.87631

The analysis of /run/containerd showed the following:

# created 126074 folders
root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# ls -l | wc -l
126074
# each folder seems to contain pipes to already deleted containers
root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# ls -lah 000001714
total 0
drwx------      2 root root  100 Jun 23 08:07 .
drwx------ 126075 root root 2.5M Jun 24 20:09 ..
prwx------      1 root root    0 Jun 23 08:07 test.32583-stderr
prwx------      1 root root    0 Jun 23 08:07 test.32583-stdin
prwx------      1 root root    0 Jun 23 08:07 test.32583-stdout

It looks like /run/containerd/fifo still contains the pipes of containers that have already been deleted, so they were not cleaned up completely.
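
To find such leftovers without deleting anything, a sketch like the following could be used. The function name `list_stale_fifo_dirs` is hypothetical, and the sketch assumes that FIFOs are named `<container-id>-stdin`, `-stdout` or `-stderr`, as in the listing above. It only prints candidates, because the list of live IDs must cover every containerd namespace in use before anything is removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report fifo sub-directories that belong to no known container.

# list_stale_fifo_dirs FIFO_ROOT LIVE_IDS_FILE
# Prints each sub-directory of FIFO_ROOT in which no FIFO belongs to a
# container ID listed (one per line) in LIVE_IDS_FILE.
# Directories without any FIFO are reported as stale too.
list_stale_fifo_dirs() {
  local root=$1 live_ids=$2 dir fifo id stale
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    stale=1
    for fifo in "$dir"*; do
      [ -p "$fifo" ] || continue          # only look at named pipes
      id=$(basename "$fifo")
      id=${id%-std*}                      # strip the -stdin/-stdout/-stderr suffix
      if grep -Fxq -- "$id" "$live_ids"; then
        stale=0
        break
      fi
    done
    [ "$stale" -eq 1 ] && printf '%s\n' "${dir%/}"
  done
  return 0
}

# Example usage (as root, default namespace only; not executed here):
#   ctr containers list -q > /tmp/live_ids
#   list_stale_fifo_dirs /run/containerd/fifo /tmp/live_ids
```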

Checking the file system usage on /run showed that there was still space left, but no free inodes:

root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# df -i /run
Filesystem     Inodes  IUsed IFree IUse% Mounted on
tmpfs          505078 505066    12  100% /run
root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# df -h /run
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           395M   41M  354M  11% /run
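
Because `df -h` looks healthy in this situation, a monitoring check has to look at inode usage explicitly. The following is a minimal sketch; the function names `inode_use_percent` and `inodes_exhausted` and the default threshold of 90% are assumptions, and the parser expects the six-column `df -i` layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical inode check; function names and threshold are assumptions.

# inode_use_percent: reads `df -Pi <mount>` output on stdin and prints the
# IUse% column of the first data line without the trailing percent sign.
inode_use_percent() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# inodes_exhausted MOUNT [THRESHOLD]
# Returns 0 if inode usage on MOUNT is at or above THRESHOLD percent,
# 1 if it is below, and 2 if the value could not be parsed as a number.
inodes_exhausted() {
  local mount=$1 threshold=${2:-90} pct
  pct=$(df -Pi "$mount" | inode_use_percent)
  case $pct in
    '' | *[!0-9]*) return 2 ;;
  esac
  [ "$pct" -ge "$threshold" ]
}

# Example usage:
#   if inodes_exhausted /run 90; then echo "inode usage on /run is high"; fi
```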

Plotting the file system usage of /run shows that inodes were not released.

[Figure: run_filesystem_usage]

To verify the problem, I rebooted the system (which cleaned the /run tmpfs) and then explicitly deleted the task before deleting the container. An additional folder still remained, containing manual-stderr, manual-stdin and manual-stdout.

$ ls -lah /run/containerd/fifo/ | wc -l
28
$ ctr run -d docker.io/library/alpine:latest manual sleep 60
$ ctr tasks pause manual
$ ctr tasks resume manual
$ ctr tasks kill -s SIGKILL manual
$ ctr tasks delete manual
$ ctr container delete manual
$ ls -lah /run/containerd/fifo/ | wc -l
29

These leftover folders can be removed manually by root afterwards, which releases 4 inodes per folder (the directory itself plus its three pipes):

$ df -i . && ls | wc && rm -rf 497171102 && df -i . && ls | wc
Filesystem     Inodes IUsed  IFree IUse% Mounted on
tmpfs          505078   864 504214    1% /run
     24      24     240
Filesystem     Inodes IUsed  IFree IUse% Mounted on
tmpfs          505078   860 504218    1% /run
     23      23     230
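
As a rough plausibility check, the figure of 4 inodes per container can be combined with the 505078 inodes reported by `df -i /run` to estimate how many containers fit before the file system runs out. The function name `max_leaked_containers` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical estimate: how many leaked fifo directories fit on the file system.

# max_leaked_containers TOTAL_INODES INODES_PER_CONTAINER
max_leaked_containers() {
  echo $(( $1 / $2 ))   # integer division
}

max_leaked_containers 505078 4   # prints 126269
```

This upper bound is roughly consistent with the first hard failure at iteration 126065; the actual limit is somewhat lower because other files on /run also consume inodes.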

Describe the results you expected:
Containerd should not show any degradation over time and should not fail to create new containers.

Output of containerd --version:

$ ctr version
Client:
  Version:  v1.1.0
  Revision: 209a7fc3e4a32ef71a8c7b50c68fc8398415badf

Server:
  Version:  v1.1.0
  Revision: 209a7fc3e4a32ef71a8c7b50c68fc8398415badf

$ containerd --version
containerd github.com/containerd/containerd v1.1.0 209a7fc3e4a32ef71a8c7b50c68fc8398415badf


BUG REPORT INFORMATION

Ubuntu and Linux Kernel version

$ uname -a
Linux kube-tor01-xxx-w2.cloud.ibm 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

The system has 4 GB of RAM; swap space is disabled.
