Description
While running the containerd tests, we wanted to capture the containerd-shim output to understand how the shim works. When we did so, the tests started to hang while the test runner waited for the containerd daemon (you can easily reproduce this by running TestDaemonRestart).
It turned out that the tests hang while trying to Wait for the containerd daemon to stop/restart. This is due to the implementation of (*Cmd).Wait in the Go standard library, which will "wait for any copying to stdin or copying from stdout or stderr to complete".
Enabling the plugin.linux/shim_debug flag in the containerd configuration makes containerd wire the shim's stdout and stderr to the daemon's own stdout and stderr.
Given the shim is designed to survive the killing of the server, we believe the following is happening:
- the tests try to kill the containerd daemon;
- the daemon dies, but the shim survives;
- the tests hang while waiting for the shim's stdout/stderr to be flushed.
We have confirmed that kill -9ing the shim unlocks the tests.
To demonstrate the issue, we have updated the TestDaemonRestart test in a containerd fork. After the shim process is killed, some containerd leftovers remain, which can be cleaned up using this cleanup script.
Steps to reproduce the issue:
- Clone https://github.com/gcapizzi/containerd
- Run the test:
sudo -E go test -run "TestDaemonRestart\b" -test.root=true
Describe the results you received:
Test times out
Describe the results you expected:
Test passes
Output of containerd --version:
containerd github.com/containerd/containerd v1.2.0-beta.2-27-gacced5d5.m acced5d58f61f342ef012643d6c5d6405f709f26.m
cc @gcapizzi