Skip to content

Conversation

@fuweid
Copy link
Member

@fuweid fuweid commented Sep 7, 2020

The shim delete action needs bundle information to cleanup resources
created by shim. If the cleanup dead shim is called after delete bundle,
the part of resources maybe leaky.

The ttrpc client UserOnCloseWait() can make sure that resources are
cleanup before delete bundle, which synchronizes task deletion and
cleanup deadshim. It might slow down the task deletion, but it can make
sure that resources can be cleanup and avoid EBUSY umount case. For
example, the sandbox container like Kata/Firecracker might have mount
points over the rootfs. If containerd handles task deletion and cleanup
deadshim parallelly, the task deletion will meet EBUSY during umount and
fail to cleanup bundle, which makes case worse.

And also update cleanupAfterDeadshim, which makes sure that
cleanupAfterDeadshim must be called after shim disconnected. In some
case, shim fails to call runc-create for some reason, but the runc-create
already makes runc-init into ready state. If containerd doesn't call shim
deletion, the runc-init process will be leaky and hold the cgroup, which
makes pod terminating :(.

Signed-off-by: Wei Fu [email protected]


@theopenlab-ci
Copy link

theopenlab-ci bot commented Sep 7, 2020

Build succeeded.

@fuweid fuweid force-pushed the update-shim-cleanup branch from 2e52f91 to 6a82da4 Compare September 9, 2020 16:28
@theopenlab-ci
Copy link

theopenlab-ci bot commented Sep 9, 2020

Build succeeded.

@fuweid fuweid marked this pull request as ready for review September 10, 2020 00:02
@fuweid
Copy link
Member Author

fuweid commented Sep 11, 2020

ping @crosbymichael @AkihiroSuda @cpuguy83 @mikebrow PTAL~

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

@fuweid I see you tagged ttrpc v1.0.2; https://github.com/containerd/ttrpc/releases/tag/v1.0.2 can you update the vendor.conf to use that tag?

The shim delete action needs bundle information to cleanup resources
created by shim. If the cleanup dead shim is called after delete bundle,
the part of resources maybe leaky.

The ttrpc client UserOnCloseWait() can make sure that resources are
cleanup before delete bundle, which synchronizes task deletion and
cleanup deadshim. It might slow down the task deletion, but it can make
sure that resources can be cleanup and avoid EBUSY umount case. For
example, the sandbox container like Kata/Firecracker might have mount
points over the rootfs. If containerd handles task deletion and cleanup
deadshim parallelly, the task deletion will meet EBUSY during umount and
fail to cleanup bundle, which makes case worse.

And also update cleanupAfterDeadshim, which makes sure that
cleanupAfterDeadshim must be called after shim disconnected. In some
case, shim fails to call runc-create for some reason, but the runc-create
already makes runc-init into ready state. If containerd doesn't call shim
deletion, the runc-init process will be leaky and hold the cgroup, which
makes pod terminating :(.

Signed-off-by: Wei Fu <[email protected]>
@fuweid fuweid force-pushed the update-shim-cleanup branch from 6a82da4 to 4b05d03 Compare September 20, 2020 03:26
@theopenlab-ci
Copy link

theopenlab-ci bot commented Sep 20, 2020

Build succeeded.

@fuweid
Copy link
Member Author

fuweid commented Sep 20, 2020

@thaJeztah sorry for late update. I updated the vendor.conf and PTAL. Thanks!

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AkihiroSuda
Copy link
Member

This commit seems causing a regression: #4769

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants