Skip to content

Conversation

@fuweid
Copy link
Member

@fuweid fuweid commented Aug 11, 2023

integration: add case to reproduce #7496

When the shim unmounts overlayfs rootfs, kernel will force syncfs if there is no volatile option. In order to reproduce the high IO pressure, this patch uses strace to delay the umount2 syscall.

NOTE: I don't merge three commits into one because it's easy to backport to v1.6.

Fixes: #7496 #8931

integration: add ShouldRetryShutdown case based on #7496

Within current design, if the shim is killed before task-service.Delete API call, the callback on connect close will send 137 exit code because the callback doesn't have any context about container's exit code.

if response != nil {
pid = response.Pid
exitStatus = response.Status
exitedAt = response.Timestamp
} else {
exitStatus = 255
exitedAt = time.Now()
}
events.Publish(ctx, runtime.TaskExitEventTopic, &eventstypes.TaskExit{
ContainerID: id,
ID: id,
Pid: pid,
ExitStatus: exitStatus,
ExitedAt: protobuf.ToTimestamp(exitedAt),
})

And the moby/moby can't handle duplicate exit event well. Let's say that the moby receives exit code 0 at first and then the duplicate exit event with different code 137 can override 0, as the #4769 described.

I think the best solution to avoid leaky issue is to redesign the task-service.Delete API, remove the async callback and then let the caller retry. Since the task in shim can't be restart, we can cache the exit code in container bundle so that the shim.Delete binary call and read it and return exit code correctly. However, it doesn't work with running shim server.

In order to prevent from regression like #4769, I add skipped
integration case as TODO item and we should rethink about how to handle
the task/shim lifecycle.

@mikebrow @dmcgowan @mxpv @AkihiroSuda @thaJeztah @laurazard

fuweid added 4 commits August 11, 2023 17:41
Since the moby/moby can't handle duplicate exit event well, it's hard
for containerd to retry shutdown if there is error, like context
canceled.

In order to prevent from regression like containerd#4769, I add skipped
integration case as TODO item and we should rethink about how to handle
the task/shim lifecycle.

Signed-off-by: Wei Fu <[email protected]>
Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I have a feeling moby needs a similar patch.

@fuweid
Copy link
Member Author

fuweid commented Aug 16, 2023

I have a feeling moby needs a similar patch.

Basically, yes. The leaky shim can be cleanup after containerd restart by the way.

Thanks for the review.

@fuweid fuweid merged commit ba852fa into containerd:main Aug 17, 2023
@fuweid fuweid added cherry-pick/1.6.x cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch area/cri Container Runtime Interface (CRI) labels Aug 17, 2023
@fuweid fuweid deleted the fix-shim-leak branch August 17, 2023 00:17
@fuweid fuweid added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.6.x cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch ok-to-test

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

shim process leaked when the host having high disk i/o usage

3 participants