fuweid commented Aug 24, 2023

Cherry-pick: #8954

pkg/cri/sbserver: fix leaked shim issue for podsandbox mode: I changed the code to use logrus instead of log.L.Infof.

integration: issue7496 case should work for runc.v2 only: The test case should be skipped when the default runtime is runc.v1 or io.containerd.runtime.v1.linux.
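
As a rough illustration of the skip rule described above, the check could be sketched as follows. The helper name `shouldSkipIssue7496` is hypothetical, and mapping "runc.v1" to the runtime type `io.containerd.runc.v1` is an assumption; the real test reads the default runtime from the CRI config and would call `t.Skipf` instead of printing.

```go
package main

import "fmt"

// Runtime type names. The two v1 names follow the description above
// (assuming "runc.v1" means io.containerd.runc.v1); the v2 name is
// included only for contrast.
const (
	runcV1Shim    = "io.containerd.runc.v1"
	legacyLinuxV1 = "io.containerd.runtime.v1.linux"
	runcV2Shim    = "io.containerd.runc.v2"
)

// shouldSkipIssue7496 is a hypothetical helper: it reports whether the
// issue7496 case should be skipped for the given default runtime type.
func shouldSkipIssue7496(defaultRuntimeType string) bool {
	switch defaultRuntimeType {
	case runcV1Shim, legacyLinuxV1:
		return true
	}
	return false
}

func main() {
	// In a real test this decision would feed t.Skipf(...).
	for _, rt := range []string{runcV1Shim, legacyLinuxV1, runcV2Shim} {
		fmt.Printf("%s: skip=%v\n", rt, shouldSkipIssue7496(rt))
	}
}
```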

fuweid added 5 commits August 24, 2023 08:17
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 5bdd9ca)
Signed-off-by: Wei Fu <[email protected]>
Fixes: containerd#7496 containerd#8931

Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 72bc63d)
Signed-off-by: Wei Fu <[email protected]>
Fixes: containerd#7496 containerd#8931

Uses logrus instead of log

Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 8dcb2a6)
Signed-off-by: Wei Fu <[email protected]>
Since moby/moby can't handle duplicate exit events well, it's hard
for containerd to retry shutdown if there is an error, such as context
canceled.

To prevent regressions like containerd#4769, I added a skipped
integration case as a TODO item. We should rethink how to handle
the task/shim lifecycle.

Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 601699a)
Signed-off-by: Wei Fu <[email protected]>
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 00ef8ba)
Signed-off-by: Wei Fu <[email protected]>
fuweid commented Aug 24, 2023

The test passes in CI: https://github.com/containerd/containerd/actions/runs/5958713939/job/16163447453?pr=9003#step:18:602

=== RUN   TestIssue7496
    issue7496_linux_test.go:45: Checking CRI config's default runtime
    issue7496_linux_test.go:56: Create a pod config and run sandbox container
    issue7496_linux_test.go:64: [shim pid: 92854]: Injecting 12 seconds delay to umount2 syscall
strace: Process 92854 attached with 11 threads
    issue7496_linux_test.go:70: Create a container config and run container in a pod
    main_test.go:707: Image "registry.k8s.io/pause:3.8" already exists, not pulling.
[pid 92883] syscall_0x1b7(0xffffffffffffff9c, 0xc000140630, 0x1, 0x200, 0, 0) = 0
strace: Process 92902 attached
strace: Process 92902 detached
[pid 92883] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92883] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92902, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92857] syscall_0x1b7(0xffffffffffffff9c, 0xc0002dc0a8, 0x1, 0x200, 0, 0) = 0
strace: Process 92920 attached
strace: Process 92920 detached
[pid 92857] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92920, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
    issue7496_linux_test.go:79: Start to StopPodSandbox and RemovePodSandbox
[pid 92857] syscall_0x1b7(0xffffffffffffff9c, 0xc0001404e0, 0x1, 0x200, 0, 0) = 0
strace: Process 92926 attached
strace: Process 92926 detached
[pid 92883] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=92914, si_uid=65535, si_status=SIGKILL, si_utime=0, si_stime=0} ---
[pid 92858] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92926, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92863] syscall_0x1b7(0xffffffffffffff9c, 0xc0002dc888, 0x1, 0x200, 0, 0) = 0
strace: Process 92932 attached
strace: Process 92932 detached
[pid 92863] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92932, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92857] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] syscall_0x1b7(0xffffffffffffff9c, 0xc000140618, 0x1, 0x200, 0, 0) = 0
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
strace: Process 92939 attached
strace: Process 92939 detached
[pid 92857] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92939, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92859] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92862] syscall_0x1b7(0xffffffffffffff9c, 0xc0001404e0, 0x1, 0x200, 0, 0) = 0
strace: Process 92946 attached
strace: Process 92946 detached
[pid 92862] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92946, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92858] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/f29938ca8136384ef6ef819cea5c9490da9ef478619d36c012ebd53c10e0bd21/rootfs", 0 <unfinished ...>
[pid 92861] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92858] <... umount2 resumed>)      = 0
[pid 92858] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/f29938ca8136384ef6ef819cea5c9490da9ef478619d36c012ebd53c10e0bd21/rootfs", 0) = -1 EINVAL (Invalid argument)
[pid 92858] syscall_0x1b7(0xffffffffffffff9c, 0xc0002dc7b0, 0x1, 0x200, 0, 0) = 0
strace: Process 92969 attached
strace: Process 92969 detached
[pid 92854] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=92877, si_uid=65535, si_status=SIGKILL, si_utime=0, si_stime=0} ---
[pid 92883] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92969, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92863] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92863] syscall_0x1b7(0xffffffffffffff9c, 0xc0001406c0, 0x1, 0x200, 0, 0) = 0
strace: Process 92975 attached
strace: Process 92975 detached
[pid 92862] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92975, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92862] syscall_0x1b7(0xffffffffffffff9c, 0xc00003c2e8, 0x1, 0x200, 0, 0) = 0
strace: Process 92981 attached
strace: Process 92981 detached
[pid 92862] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=92981, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 92857] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/cd3b4dc16de6ca9a2c221d69df9bb6b11a141b96ef7de1477d1b9066d74976c2/rootfs", 0strace: Process 92987 attached
 <unfinished ...>
[pid 92858] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92857] <... umount2 resumed>)      = 0
[pid 92857] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=92854, si_uid=0} ---
[pid 92857] umount2("/run/containerd-test/io.containerd.runtime.v2.task/k8s.io/cd3b4dc16de6ca9a2c221d69df9bb6b11a141b96ef7de1477d1b9066d74976c2/rootfs", 0) = -1 EINVAL (Invalid argument)
[pid 92987] +++ exited with 0 +++
[pid 92857] +++ exited with 0 +++
[pid 92855] +++ exited with 0 +++
[pid 92883] +++ exited with 0 +++
[pid 92863] +++ exited with 0 +++
[pid 92861] +++ exited with 0 +++
[pid 92859] +++ exited with 0 +++
[pid 92856] +++ exited with 0 +++
[pid 92858] +++ exited with 0 +++
[pid 92862] +++ exited with 0 +++
[pid 92860] +++ exited with 0 +++
+++ exited with 0 +++
    issue7496_linux_test.go:153: Strace has exited
    issue7496_linux_test.go:103: PodSandbox cd3b4dc16de6ca9a2c221d69df9bb6b11a141b96ef7de1477d1b9066d74976c2 has been deleted and start to wait for strace exit
--- PASS: TestIssue7496 (49.84s)

@AkihiroSuda AkihiroSuda merged commit 5ee9839 into containerd:release/1.7 Aug 24, 2023
@fuweid fuweid deleted the cp-17-8954 branch August 24, 2023 04:23
fuweid added a commit to fuweid/containerd that referenced this pull request Nov 7, 2023
```go
// Delete the initial process and container
func (s *Service) Delete(ctx context.Context, r *ptypes.Empty) (*shimapi.DeleteResponse, error) {
	p, err := s.getInitProcess()
	if err != nil {
		return nil, err
	}
	if err := p.Delete(ctx); err != nil {
		return nil, errdefs.ToGRPC(err)
	}

	// The client might have canceled the request while the shim service
	// still moved on, so `delete(s.processes, s.id)` was executed
	// successfully. In that case the next Delete call returns a
	// `container must be created` error, which the client side should
	// ignore.
	s.mu.Lock()
	delete(s.processes, s.id)
	s.mu.Unlock()
	s.platform.Close()
	return &shimapi.DeleteResponse{
		ExitStatus: uint32(p.ExitStatus()),
		ExitedAt:   protobuf.ToTimestamp(p.ExitedAt()),
		Pid:        uint32(p.Pid()),
	}, nil
}
```

introduced by containerd#9003
fixes: containerd#9309

Signed-off-by: Wei Fu <[email protected]>