Description
First, I am not sure this qualifies as a bug report. The problem occurs only when both of these conditions hold:
- the container rootfs is placed on a remote disk
- containerd is compiled with golang 1.12
Normally the fd leak causes no visible problem. It only matters with a remote disk, because the leaked fds make the remote disk fail to umount.
Here is the detailed description:
The remote disk cannot be deleted because file fds on it are held by a shim.
#ls -l /proc/7418/fd
total 0
lr-x------ 1 root root 64 Jul 30 00:15 0 -> /dev/null
l-wx------ 1 root root 64 Jul 30 00:15 1 -> /dev/null
lrwx------ 1 root root 64 Jul 30 00:15 10 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Jul 30 00:15 11 -> socket:[54356]
lrwx------ 1 root root 64 Jul 30 00:15 12 -> socket:[46343]
lr-x------ 1 root root 64 Jul 30 00:15 13 -> pipe:[46346]
l--------- 1 root root 64 Jul 30 00:15 14 -> /run/containerd/fifo/840409537/16020bb7d61a0bb1c93402249ab0476f1c3c2689ab963feefb75fb915b9128a7-stdout
lr-x------ 1 root root 64 Jul 30 00:15 15 -> pipe:[46347]
l-wx------ 1 root root 64 Jul 30 00:15 16 -> /run/containerd/fifo/840409537/16020bb7d61a0bb1c93402249ab0476f1c3c2689ab963feefb75fb915b9128a7-stdout
l--------- 1 root root 64 Jul 30 00:15 17 -> /run/containerd/fifo/840409537/16020bb7d61a0bb1c93402249ab0476f1c3c2689ab963feefb75fb915b9128a7-stdout
lr-x------ 1 root root 64 Jul 30 00:15 18 -> /run/containerd/fifo/840409537/16020bb7d61a0bb1c93402249ab0476f1c3c2689ab963feefb75fb915b9128a7-stdout
lr-x------ 1 root root 64 Jul 30 00:15 19 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282 (deleted)
l-wx------ 1 root root 64 Jul 30 00:15 2 -> /dev/null
lr-x------ 1 root root 64 Jul 30 00:15 20 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs (deleted)
lr-x------ 1 root root 64 Jul 30 00:15 21 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs/usr (deleted)
lr-x------ 1 root root 64 Jul 30 00:15 22 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs/usr/alisys (deleted)
lr-x------ 1 root root 64 Jul 30 00:15 23 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs/usr/alisys/dragoon (deleted)
lr-x------ 1 root root 64 Jul 30 00:15 24 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs/usr/alisys/dragoon/libexec (deleted)
lr-x------ 1 root root 64 Jul 30 00:15 25 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs/usr/alisys/dragoon/libexec/hwqc (deleted)
lr-x------ 1 root root 64 Jul 30 00:15 26 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs/usr/alisys/dragoon/libexec/hwqc/lib (deleted)
lr-x------ 1 root root 64 Jul 30 00:15 27 -> /mnt/aliyun-disk/pvc-1afa9e8d-b216-11e9-bbef-42737b8c90e6/.rootDir/rm-282/fs/usr/alisys/dragoon/libexec/hwqc/lib/ali_algor_t
containerd was garbage-collecting snapshots when this shim was created. In golang 1.12, os.RemoveAll opens files without the O_CLOEXEC flag, so a shim created during that window inherits these fds and never closes them.
The golang implementation changed between go1.11 and go1.12; go1.11 does not have the problem. Details are in golang/go#33405.
I tested both the v1 and v2 shims; both have this problem.
We see two ways to resolve the problem:
- make the shim close unused fds (which would also cover other leaks). I tried this, but found two fds that look like they were opened by ttrpc, and I do not know how to tell which fds are unused:
0: /dev/null
1: pipe:[5784162]
2: pipe:[5784163]
3: socket:[5784159]
5: anon_inode:[eventpoll]
6: anon_inode:[eventpoll]
I know 0-3 are stdio and the socket, but I do not know what 5-6 are used for.
- update golang to the newly released version, go1.12.8
/cc @fuweid @yyb196 @rudyfly
Describe the results you received:
Describe the results you expected:
Output of containerd --version:
Any other relevant information: