Reported by @lbernail.
After a node reboot, all network namespaces in /var/run/netns will be wiped out.
In that case, we skip tearing down the CNI network because the network namespace is already gone: https://github.com/containerd/cri/blob/master/pkg/server/sandbox_stop.go#L64
However, if host-local IPAM is in use, it checkpoints each allocated IP under /var/lib/cni/networks/containerd-net/, and those checkpoint files are not wiped out by the reboot, e.g.:
$ ls /var/lib/cni/networks/containerd-net/
10.88.0.2 last_reserved_ip.0 lock
If we skip teardown, host-local never releases the IP, so it is leaked forever.
To fix this problem, we need to:
- Tear down the network even if the network namespace doesn't exist.
- Ignore the "no such file or directory" error in go-cni.
In principle, we should be able to pass in an empty network namespace path when the namespace doesn't exist (see the PR). However, plugins don't handle this consistently right now: at least loopback and host-device still return an error, and we use loopback by default.
I've filed containernetworking/plugins#210 for this. Until that is fixed, let's just ignore the "no such file or directory" error, which the Kubernetes network plugins already do as well: cni, kubenet.
We should fix this and cherry-pick the fix into all supported branches.