[releases/1.7] *: fix leaked shim caused by high IO pressure #9003
Merged
Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit 5bdd9ca) Signed-off-by: Wei Fu <[email protected]>
Fixes: containerd#7496 containerd#8931 Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit 72bc63d) Signed-off-by: Wei Fu <[email protected]>
Fixes: containerd#7496 containerd#8931 Uses logrus instead of log Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit 8dcb2a6) Signed-off-by: Wei Fu <[email protected]>
Because moby/moby can't handle duplicate exit events well, it is hard for containerd to retry shutdown after an error such as context canceled. To prevent a regression like containerd#4769, I added a skipped integration case as a TODO item; we should rethink how to handle the task/shim lifecycle. Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit 601699a) Signed-off-by: Wei Fu <[email protected]>
Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit 00ef8ba) Signed-off-by: Wei Fu <[email protected]>
dcantah approved these changes Aug 24, 2023
Signed-off-by: Wei Fu <[email protected]>
The test works in CI: https://github.com/containerd/containerd/actions/runs/5958713939/job/16163447453?pr=9003#step:18:602
=== RUN TestIssue7496
issue7496_linux_test.go:45: Checking CRI config's default runtime
issue7496_linux_test.go:56: Create a pod config and run sandbox container
issue7496_linux_test.go:64: [shim pid: 92854]: Injecting 12 seconds delay to umount2 syscall
strace: Process 92854 attached with 11 threads
issue7496_linux_test.go:70: Create a container config and run container in a pod
main_test.go:707: Image "registry.k8s.io/pause:3.8" already exists, not pulling.
[pid 92883] syscall_0x1b7(0xffffffffffffff9c, 0xc000140630, 0x1, 0x200, 0, 0) = 0
strace: Process 92902 attached
strace: Process 92902 detached
[pid 92883] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92883] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92902, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92857] syscall_0x1b7(0xffffffffffffff9c, 0xc0002dc0a8, 0x1, 0x200, 0, 0) = 0
strace: Process 92920 attached
strace: Process 92920 detached
[pid 92857] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92920, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
issue7496_linux_test.go:79: Start to StopPodSandbox and RemovePodSandbox
[pid 92857] syscall_0x1b7(0xffffffffffffff9c, 0xc0001404e0, 0x1, 0x200, 0, 0) = 0
strace: Process 92926 attached
strace: Process 92926 detached
[pid 92883] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=92914, si_uid=65535, si_status=SIGKILL, si_utime=0, si_stime=0} ---
[pid 92858] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92926, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92863] syscall_0x1b7(0xffffffffffffff9c, 0xc0002dc888, 0x1, 0x200, 0, 0) = 0
strace: Process 92932 attached
strace: Process 92932 detached
[pid 92863] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92932, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92857] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] syscall_0x1b7(0xffffffffffffff9c, 0xc000140618, 0x1, 0x200, 0, 0) = 0
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
strace: Process 92939 attached
strace: Process 92939 detached
[pid 92857] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92939, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92859] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92862] syscall_0x1b7(0xffffffffffffff9c, 0xc0001404e0, 0x1, 0x200, 0, 0) = 0
strace: Process 92946 attached
strace: Process 92946 detached
[pid 92862] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92946, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92858] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/f29938ca8136384ef6ef819cea5c9490da9ef478619d36c012ebd53c10e0bd21/rootfs", 0 <unfinished ...>
[pid 92861] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] <... umount2 resumed>) = 0
[pid 92858] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/f29938ca8136384ef6ef819cea5c9490da9ef478619d36c012ebd53c10e0bd21/rootfs", 0) = -1 EINVAL (Invalid argument)
[pid 92858] syscall_0x1b7(0xffffffffffffff9c, 0xc0002dc7b0, 0x1, 0x200, 0, 0) = 0
strace: Process 92969 attached
strace: Process 92969 detached
[pid 92854] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=92877, si_uid=65535, si_status=SIGKILL, si_utime=0, si_stime=0} ---
[pid 92883] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92969, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] syscall_0x1b7(0xffffffffffffff9c, 0xc0001406c0, 0x1, 0x200, 0, 0) = 0
strace: Process 92975 attached
strace: Process 92975 detached
[pid 92862] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92975, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92862] syscall_0x1b7(0xffffffffffffff9c, 0xc00003c2e8, 0x1, 0x200, 0, 0) = 0
strace: Process 92981 attached
strace: Process 92981 detached
[pid 92862] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92981, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92857] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/cd3b4dc16de6ca9a2c221d69df9bb6b11a141b96ef7de1477d1b9066d74976c2/rootfs", 0strace: Process 92987 attached
<unfinished ...>
[pid 92858] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92857] <... umount2 resumed>) = 0
[pid 92857] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92857] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/cd3b4dc16de6ca9a2c221d69df9bb6b11a141b96ef7de1477d1b9066d74976c2/rootfs", 0) = -1 EINVAL (Invalid argument)
[pid 92987] +++ exited with 0 +++
[pid 92857] +++ exited with 0 +++
[pid 92855] +++ exited with 0 +++
[pid 92883] +++ exited with 0 +++
[pid 92863] +++ exited with 0 +++
[pid 92861] +++ exited with 0 +++
[pid 92859] +++ exited with 0 +++
[pid 92856] +++ exited with 0 +++
[pid 92858] +++ exited with 0 +++
[pid 92862] +++ exited with 0 +++
[pid 92860] +++ exited with 0 +++
+++ exited with 0 +++
issue7496_linux_test.go:153: Strace has exited
issue7496_linux_test.go:103: PodSandbox cd3b4dc16de6ca9a2c221d69df9bb6b11a141b96ef7de1477d1b9066d74976c2 has been deleted and start to wait for strace exit
--- PASS: TestIssue7496 (49.84s)
AkihiroSuda approved these changes Aug 24, 2023
fuweid added a commit to fuweid/containerd that referenced this pull request Nov 7, 2023
```go
// Delete the initial process and container
func (s *Service) Delete(ctx context.Context, r *ptypes.Empty) (*shimapi.DeleteResponse, error) {
	p, err := s.getInitProcess()
	if err != nil {
		return nil, err
	}
	if err := p.Delete(ctx); err != nil {
		return nil, errdefs.ToGRPC(err)
	}
	// The client might have canceled the request while the shim service
	// still moved on. In that case `delete(s.processes, s.id)` was
	// executed successfully, so the next Delete call will return a
	// `container must be created` error. The client side should ignore
	// this error.
	s.mu.Lock()
	delete(s.processes, s.id)
	s.mu.Unlock()
	s.platform.Close()
	return &shimapi.DeleteResponse{
		ExitStatus: uint32(p.ExitStatus()),
		ExitedAt:   protobuf.ToTimestamp(p.ExitedAt()),
		Pid:        uint32(p.Pid()),
	}, nil
}
```
introduced by containerd#9003
fixes: containerd#9309
Signed-off-by: Wei Fu <[email protected]>
Cherry-pick: #8954
pkg/cri/sbserver: fix leaked shim issue for podsandbox mode: I changed the content to use logrus instead of log.L.Infof.
integration: issue7496 case should work for runc.v2 only: The test case should be skipped when the default runtime is runc.v1 or io.containerd.runtime.v1.linux.
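The skip rule in the last note can be sketched as a small helper. This is illustrative only and not the code from the integration test: `skipReason` is a hypothetical name, and `io.containerd.runc.v1` is assumed to be the full runtime type that "runc.v1" refers to.

```go
package main

import "fmt"

// skipReason returns a non-empty message when the given default runtime
// type is one the issue7496 case does not target. The helper name and the
// full "io.containerd.runc.v1" string are assumptions for illustration.
func skipReason(runtimeType string) string {
	switch runtimeType {
	case "io.containerd.runc.v1", "io.containerd.runtime.v1.linux":
		return fmt.Sprintf("runtime %q is not supported: the case targets runc.v2 only", runtimeType)
	}
	return ""
}

func main() {
	for _, rt := range []string{
		"io.containerd.runc.v2",
		"io.containerd.runc.v1",
		"io.containerd.runtime.v1.linux",
	} {
		fmt.Printf("%s skip=%t\n", rt, skipReason(rt) != "")
	}
}
```

In a real test, a non-empty result would typically be passed to `t.Skip` before any sandbox is created.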