In both shim v1 (https://github.com/containerd/containerd/blob/master/runtime/v1/shim/v1/shim.proto) and v2 (https://github.com/containerd/containerd/blob/master/runtime/v2/task/shim.proto), the only long-running method is `Wait`. It makes sense to set a default timeout for all short-running methods. For `Wait`, we can leave the timeout to the caller, since it is intuitive for a `Wait` caller to set its own timeout.

This is important because today, if a containerd-shim hangs:

- Batch container operations hang, e.g. `ctr task ls`, CRI plugin restart recovery, etc.
- Any operation against the hung containerd-shim leaks a goroutine in the containerd daemon.

I think leaking goroutines in containerd-shim is less serious, because it only affects one container. However, leaking goroutines in containerd seems pretty bad to me. Think about users running `exec` or `state` periodically: there will be tons of leaked goroutines.

And we do need to consider the case where containerd-shim becomes unresponsive, because:

- containerd-shim implementations may become plugins in the future, so their behavior will be even less predictable. We may not want people to come to the containerd repo and file issues about containerd hanging, only for it to turn out that the containerd-shim they are using is the one that hangs.
@containerd/containerd-maintainers
There is one blocker: containerd/ttrpc#3.