=== BEGIN goroutine stack dump ===
goroutine 28 [running]:
github.com/containerd/containerd/cmd/containerd/command.dumpStacks(0xc0004f8301)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/cmd/containerd/command/main.go:336 +0x9a
github.com/containerd/containerd/cmd/containerd/command.handleSignals.func1(0xc000090540, 0xc0000904e0, 0x55889a1227f0, 0xc000122000, 0xc00010e180)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/cmd/containerd/command/main_unix.go:57 +0x2ba
created by github.com/containerd/containerd/cmd/containerd/command.handleSignals
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/cmd/containerd/command/main_unix.go:40 +0x8b
goroutine 1 [select]:
github.com/containerd/containerd/pkg/dialer.timeoutDialer(0xc000472667, 0x52, 0x174876e800, 0xc000472601, 0x59, 0x0, 0x0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/pkg/dialer/dialer.go:66 +0x145
github.com/containerd/containerd/runtime/v2/shim.AnonDialer(0xc000472660, 0x59, 0x174876e800, 0x59, 0x0, 0x0, 0x0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/shim/util_unix.go:79 +0x88
github.com/containerd/containerd/runtime/v2/shim.AnonReconnectDialer(...)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/shim/util_unix.go:84
github.com/containerd/containerd/runtime/v2/shim.Connect(...)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/shim/util.go:149
github.com/containerd/containerd/runtime/v2.loadShim(0x55889a122860, 0xc00009dc80, 0xc0004a6450, 0xc00049c028, 0xc000590540, 0xc0004c71d0, 0x0, 0x0, 0x0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/shim.go:70 +0x173
github.com/containerd/containerd/runtime/v2.(*TaskManager).loadTasks(0xc000090ba0, 0x55889a122860, 0xc00009dc80, 0x6, 0x55889a122860)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/manager.go:291 +0x9a8
github.com/containerd/containerd/runtime/v2.(*TaskManager).loadExistingTasks(0xc000090ba0, 0x55889a1227f0, 0xc000122000, 0x0, 0x0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/manager.go:234 +0x30a
github.com/containerd/containerd/runtime/v2.New(0x55889a1227f0, 0xc000122000, 0xc000499090, 0x42, 0xc0004990e0, 0x43, 0xc0000445a0, 0x1f, 0xc000040bd0, 0x25, ...)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/manager.go:97 +0x21a
github.com/containerd/containerd/runtime/v2.init.0.func1(0xc0003bca80, 0xc00004c440, 0x24, 0xc000509dc0, 0x1d)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/runtime/v2/manager.go:76 +0x245
github.com/containerd/containerd/plugin.(*Registration).Init(0xc000219560, 0xc0003bca80, 0x558899e15540)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/plugin/plugin.go:110 +0x3a
github.com/containerd/containerd/services/server.New(0x55889a1227f0, 0xc000122000, 0xc0004924e0, 0x1, 0x1, 0xc000040bd0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/services/server/server.go:168 +0xd25
github.com/containerd/containerd/cmd/containerd/command.App.func1(0xc0003d6840, 0xc0003d6840, 0xc0004c8a60)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/cmd/containerd/command/main.go:179 +0x77b
github.com/urfave/cli.HandleAction(0x558899e7a880, 0x55889a0c91e0, 0xc0003d6840, 0xc0003d6840, 0x0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:523 +0x107
github.com/urfave/cli.(*App).Run(0xc0003a2540, 0xc00012c000, 0x2, 0x2, 0x0, 0x0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:285 +0x655
main.main()
github.com/containerd/containerd/cmd/containerd/main.go:33 +0x51
goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x55889aaeec80)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/vendor/k8s.io/klog/v2/klog.go:1169 +0x8d
created by k8s.io/klog/v2.init.0
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/vendor/k8s.io/klog/v2/klog.go:417 +0xdf
goroutine 30 [syscall]:
os/signal.signal_recv(0x55889a105618)
/opt/hostedtoolcache/go/1.16.4/x64/src/runtime/sigqueue.go:168 +0xa5
os/signal.loop()
/opt/hostedtoolcache/go/1.16.4/x64/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
/opt/hostedtoolcache/go/1.16.4/x64/src/os/signal/signal.go:151 +0x46
goroutine 31 [select, 2 minutes]:
github.com/docker/go-events.(*Broadcaster).run(0xc00009a0a0)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/vendor/github.com/docker/go-events/broadcast.go:117 +0x1be
created by github.com/docker/go-events.NewBroadcaster
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/vendor/github.com/docker/go-events/broadcast.go:39 +0x1b5
goroutine 32 [select]:
github.com/containerd/containerd/gc/scheduler.(*gcScheduler).run(0xc000090840, 0x55889a1227f0, 0xc000122000)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:268 +0x18f
created by github.com/containerd/containerd/gc/scheduler.init.0.func1
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:132 +0x445
goroutine 50 [chan receive]:
github.com/containerd/containerd/pkg/dialer.timeoutDialer.func1(0xc000472720, 0xc0004726c0, 0xc000472667, 0x52, 0x174876e800)
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/pkg/dialer/dialer.go:58 +0x77
created by github.com/containerd/containerd/pkg/dialer.timeoutDialer
/home/runner/work/containerd/containerd/src/github.com/containerd/containerd/pkg/dialer/dialer.go:49 +0xc5
=== END goroutine stack dump ===
Description
I am using containerd as the K8s container runtime. It works fine with my configuration, but after a node reboot, containerd takes hours to finish starting up.
It looks like it hangs for hours while loading the plugin io.containerd.runtime.v2.task.
Steps to reproduce the issue:
reboot now
Describe the results you received:
From the log you can see that it takes about 100 seconds for containerd to clean up the first dead shim.
So I guess the more workloads I have, the longer it takes to start up after a reboot.
containerd logs
Describe the results you expected:
containerd should start up within a few seconds after a reboot.
What version of containerd are you using:
Any other relevant information (runC version, CRI configuration, OS/Kernel version, etc.):
runc --version
containerd config.toml
uname -r
Here I captured the stack trace using SIGUSR1. The main goroutine is definitely stuck in loadExistingTasks. I will dig deeper to see why.
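For anyone who wants to reproduce this kind of dump in their own daemon, here is a minimal sketch of the general technique: catch SIGUSR1 and print the stacks of all goroutines with runtime.Stack. It is not containerd's dumpStacks implementation; allStacks is a hypothetical helper, and the program signals itself only so that the example terminates. It is Unix-only because of syscall.SIGUSR1 and syscall.Kill.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// allStacks returns the stack traces of all goroutines. The buffer is
// doubled until the whole dump fits, since runtime.Stack truncates output.
func allStacks() []byte {
	buf := make([]byte, 16384)
	for {
		n := runtime.Stack(buf, true) // true = include every goroutine
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)

	// A real daemon would wait for an external signal in a loop;
	// here the process signals itself once so the sketch exits.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		fmt.Fprintln(os.Stderr, "kill:", err)
		os.Exit(1)
	}
	<-sigs

	fmt.Fprintf(os.Stderr,
		"=== BEGIN goroutine stack dump ===\n%s\n=== END goroutine stack dump ===\n",
		allStacks())
}
```

With a long-running process, the same handler would be triggered from a shell by sending SIGUSR1 to the process ID.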
containerd stack dump