Description
Containerd does not release inodes under /run/containerd/fifo when containers are created and removed in a loop via the ctr CLI. Over time this drives inode usage on the /run file system to 100%, which prevents containerd from creating new containers.
Steps to reproduce the issue:
- The described behavior was noticed during a stress test which executed the following commands in a loop and stored the results as CSV. The CSV output was monitored for failures; whenever a command exited with a non-zero code, up to 10 retries were executed before giving up on that iteration.
ctr run -d docker.io/library/alpine:latest test.${iteration} sleep 60
sleep 0.1
ctr tasks pause test.${iteration}
sleep 0.1
ctr tasks resume test.${iteration}
sleep 0.1
ctr tasks kill -s SIGKILL test.${iteration}
sleep 0.1
ctr container delete test.${iteration}
- During the test execution, system metrics (based on the /proc file system and the df command) were collected every 10 minutes and stored as CSV.
- After multiple failed iterations appeared in the log, the CSV files were analyzed.
Describe the results you received:
After running the loop for 2 days and 17 minutes (126,064 container iterations), containerd could no longer start new containers.
The error reported by ctr was:
126065,1529869634,ctr:,OCI,runtime,create,failed:,no,space,left,on,device:,unknown,real,0.134,user,0.004,sys,0.024,1,ctr run -d docker.io/library/alpine:latest test.126065 sleep 60
During the first 126,064 cycles, the following error occurred only in iterations 87631 and 116278, and in both cases it was resolved by the first retry (probably a timing problem):
87631,1529817767,real,0.292,user,0.016,sys,0.012,0,ctr run -d docker.io/library/alpine:latest test.87631 sleep 60
87631,1529817768,real,0.073,user,0.016,sys,0.000,0,ctr tasks pause test.87631
87631,1529817768,real,0.248,user,0.008,sys,0.008,0,ctr tasks resume test.87631
87631,1529817768,real,0.367,user,0.012,sys,0.004,0,ctr tasks kill -s SIGKILL test.87631
87631,1529817769,time="2018-06-24T05:22:49Z",level=error,msg="failed,to,delete,container,"test.87631"",error="cannot,delete,a,non,stopped,container:,{running,0,0001-01-01,00:00:00,+0000,UTC}",ctr:,cannot,delete,a,non,stopped,container:,{running,0,0001-01-01,00:00:00,+0000,UTC},real,0.299,user,0.004,sys,0.012,1,ctr container delete test.87631
87631,1529817770,real,0.774,user,0.012,sys,0.004,0,ctr container delete test.87631
The analysis of /run/containerd showed the following:
# created 126074 folders
root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# ls -l | wc -l
126074
# each folder seems to contain pipes to already deleted containers
root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# ls -lah 000001714
total 0
drwx------ 2 root root 100 Jun 23 08:07 .
drwx------ 126075 root root 2.5M Jun 24 20:09 ..
prwx------ 1 root root 0 Jun 23 08:07 test.32583-stderr
prwx------ 1 root root 0 Jun 23 08:07 test.32583-stdin
prwx------ 1 root root 0 Jun 23 08:07 test.32583-stdout
It looks like /run/containerd/fifo holds the named pipes (FIFOs) for container I/O, and they are not removed when the container is deleted.
Checking the file system usage on /run confirmed that space was still available, but no inodes were left:
root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# df -i /run
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 505078 505066 12 100% /run
root@kube-tor01-crcb1c41560b7245269aa7262e7882c27f-w2:/run/containerd/fifo# df -h /run
Filesystem Size Used Avail Use% Mounted on
tmpfs 395M 41M 354M 11% /run
Plotting the file system usage of /run shows that inodes were not released.
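The 10-minute metric collection can be sketched with a small shell helper. The `inode_use_pct` function is hypothetical (the report does not show the collection script); it extracts the bare IUse% number from `df -i` output.

```shell
# Hypothetical helper for the metric collection: given the output of
# `df -i <mount>` on stdin, print the IUse% value without the % sign.
inode_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (sketch): append a timestamped CSV sample for /run.
# echo "$(date +%s),$(df -i /run | inode_use_pct)" >> run-inodes.csv
```

Sampling this every 10 minutes and plotting the CSV makes the monotonic inode growth visible long before the file system is exhausted.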

To verify the problem, I rebooted the system (which cleared the /run tmpfs) and ran one iteration manually, this time explicitly deleting the task before deleting the container. Even so, an additional folder remains, containing manual-stderr, manual-stdin and manual-stdout:
$ ls -lah /run/containerd/fifo/ | wc -l
28
$ ctr run -d docker.io/library/alpine:latest manual sleep 60
$ ctr tasks pause manual
$ ctr tasks resume manual
$ ctr tasks kill -s SIGKILL manual
$ ctr tasks delete manual
$ ctr container delete manual
$ ls -lah /run/containerd/fifo/ | wc -l
29
Those stale folders can be removed manually by root afterwards, which releases 4 inodes per folder (the directory itself plus the three FIFOs):
$ df -i . && ls | wc && rm -rf 497171102 && df -i . && ls | wc
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 505078 864 504214 1% /run
24 24 240
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 505078 860 504218 1% /run
23 23 230
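The manual removal can be scripted as a workaround (not a fix). This is a hypothetical sketch: it deletes fifo directories untouched for more than a day, and is only safe while the corresponding containers are certainly gone.

```shell
# Hypothetical workaround: remove leaked FIFO directories that have not
# been modified for more than 24 hours (-mmin +1440). Run as root, and
# only when no container still uses FIFOs under the given directory.
cleanup_stale_fifo_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mmin +1440 \
        -exec rm -rf {} +
}

# Usage (sketch):
# cleanup_stale_fifo_dirs /run/containerd/fifo
```

Each removed directory frees 4 inodes, matching the numbers in the `df -i` output above.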
Describe the results you expected:
Containerd should not show any degradation over time and should not fail to create new containers.
Output of containerd --version:
$ ctr version
Client:
Version: v1.1.0
Revision: 209a7fc3e4a32ef71a8c7b50c68fc8398415badf
Server:
Version: v1.1.0
Revision: 209a7fc3e4a32ef71a8c7b50c68fc8398415badf
$ containerd --version
containerd github.com/containerd/containerd v1.1.0 209a7fc3e4a32ef71a8c7b50c68fc8398415badf
BUG REPORT INFORMATION
Ubuntu and Linux Kernel version
$ uname -a
Linux kube-tor01-xxx-w2.cloud.ibm 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
The system holds 4 GB of RAM, the swap space is disabled.