
Fix shim deadlock #3529

Closed

keloyang wants to merge 1 commit into containerd:release/1.2 from keloyang:sigchld-lost-1.2

Conversation

@keloyang
Contributor

shim.Reap and shim.Default.Wait may deadlock, and #2748 doesn't fix this. Putting those two pieces of code together makes the problem easier to see.

func Reap() error {
	now := time.Now()
	exits, err := sys.Reap(false)
	Default.Lock()
	for c := range Default.subscribers {
		for _, e := range exits {
			c <- runc.Exit{ // line 44
				Timestamp: now,
				Pid:       e.Pid,
				Status:    e.Status,
			}
		}
	}
	Default.Unlock()
	return err
}

func (m *Monitor) Wait(c *exec.Cmd, ec chan runc.Exit) (int, error) {
	for e := range ec {
		if e.Pid == c.Process.Pid {
			c.Wait()
			m.Unsubscribe(ec) // line 85
			return e.Status, nil
		}
	}
}

If the first goroutine is blocked at line 44 (sending on an unbuffered subscriber channel while holding Default.Lock) just as the second goroutine reaches line 85 (it has stopped receiving from ec, and Unsubscribe needs that same lock), the two pieces of code deadlock.

The following shows how to reproduce it.

[root@centos0 c]# cat fork.c
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

void sleep_always() {
    while (1) {
        sleep(1);
    }
}


int childs(int num)
{
    int child_count = 0;
    pid_t pid;

refork:
    child_count++;
    pid = fork();
    if ( pid>0 )
    {
    }
    else if (pid == 0)
    {
        if ( child_count < num )
        {
            goto refork;
        }
    }
    else
    {
        exit(0);
    }
    return 0;
}

int defuncts(int num)
{
    int defunct_count = 0;
    pid_t pid;
    int idx = 0;
    for (idx =  0; idx < num; idx++)
    {
        pid = fork();
        if (pid == 0)
        {
            exit(0);
        }
    }
    return 0;
}

// usage ./fork child_num defunct_num
// e.g.  ./fork 1024 2048
int main(int argc, char *argv[])
{
    int child_num = 1;
    int defunct_num = 1;

    if (argc >= 2)
    {
        child_num = atoi(argv[1]);
    }

    if (argc >= 3)
    {
        defunct_num = atoi(argv[2]);
    }

    defuncts(defunct_num);
    childs(child_num);

    sleep_always();
}
[root@centos0 c]# gcc fork.c -o fork

[root@centos3 shim-sigchld]# cat test.sh
#!/bin/bash

i=0
while [ 1 ]
do
        docker rm fork >/dev/null 2>&1
        docker run  -tdi -v `pwd`:/home -w /home --name fork centos:7 sleep  1000
        docker exec fork ./fork 1024  20480 &
        sleep 10
        docker stop -t 1 fork
        ((i++))
        echo "test ${i}"
done

Finally, execute the shell script, and you will find the hang.

./test.sh

The other branches may have this problem too.

Signed-off-by: Shukui Yang [email protected]

shim.Reap and shim.Default.Wait may deadlock, use Monitor.Notify
to fix this issue.

Signed-off-by: Shukui Yang <[email protected]>
@keloyang
Contributor Author

I think this is a very serious problem, maybe even a CVE: it leads to a k8s node becoming NotReady.

How to make a k8s node NotReady:

[root@centos2 shim-sigchld]#  cat kill
#!/bin/bash

kill -9 `ps -eaf|grep fork|grep -v grep|awk '{print $2}'`
[root@centos2 shim-sigchld]#  chmod u+x kill
[root@centos2 shim-sigchld]# cat Dockerfile
FROM centos:7

COPY fork /

[root@centos2 shim-sigchld]# docker build -t fork .
[root@centos2 shim-sigchld]# cat fork.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fork
  namespace: default
spec:
  nodeName: centos2
  containers:
  - name: fork
    image: fork:latest
    imagePullPolicy: IfNotPresent
    command:
    - "sleep"
    - "300000"

[root@centos2 shim-sigchld]#  kubectl create -f fork.yaml
pod/fork created
[root@centos2 shim-sigchld]#  kubectl get pod
NAME      READY     STATUS              RESTARTS   AGE
fork      0/1       ContainerCreating   0          4s
[root@centos2 shim-sigchld]#  kubectl get pod
NAME      READY     STATUS    RESTARTS   AGE
fork      1/1       Running   0          6s

It may take a few attempts at the following two steps before you hit the problem.

[root@centos2 shim-sigchld]#  kubectl exec fork /fork 1024 20480 &
[root@centos2 shim-sigchld]#  kubectl exec fork /kill

If you find kubectl exec fork ... hung, please check the docker container too.

[root@centos2 shim-sigchld]# docker ps|grep fork
554ab7b69187        c08ec68b3921           "sleep 20000000"         23 minutes ago      Up 23 minutes                              k8s_fork_fork_default_02f5c1b4-bf0a-11e9-9dae-8cec4b5ed562_0
7f3a2b5747c2        k8s.gcr.io/pause:3.1   "/pause"                 23 minutes ago      Up 23 minutes                              k8s_POD_fork_default_02f5c1b4-bf0a-11e9-9dae-8cec4b5ed562_0
[root@centos2 shim-sigchld]# docker inspect 554ab7b69187

After the docker container hangs, please wait and check the kubelet log; kubelet will log lines like this:

Aug 15 11:46:12 centos2 kubelet[6904]: I0815 11:46:12.470631    6904 kubelet.go:1775] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m46.50397756s ago; threshold is 3m0s]

Then we can see the state of the k8s cluster:

[root@centos0 containerd]# kubectl get node
NAME      STATUS     ROLES     AGE       VERSION
centos0   Ready      master    14d       v1.11.1
centos1   Ready      <none>    14d       v1.11.1
centos2   NotReady   <none>    14d       v1.11.1

Using the following script, we can see that node centos2 goes NotReady roughly every 3 minutes, lasting about 1 minute each time.

[root@centos2 shim-sigchld]# cat  ./time.sh
#!/bin/bash

while [ 1 ]
do
        kubectl get node|grep NotReady >/dev/null 2>&1
        if [ $? == 0 ];then
                date
        fi
        sleep 1
done
[root@centos2 shim-sigchld]# ./time.sh
Thu Aug 15 12:09:41 CST 2019
Thu Aug 15 12:09:43 CST 2019
Thu Aug 15 12:09:44 CST 2019
Thu Aug 15 12:09:45 CST 2019
Thu Aug 15 12:09:46 CST 2019
Thu Aug 15 12:09:47 CST 2019
Thu Aug 15 12:09:48 CST 2019
Thu Aug 15 12:09:49 CST 2019
Thu Aug 15 12:09:50 CST 2019
Thu Aug 15 12:09:51 CST 2019
Thu Aug 15 12:09:52 CST 2019
Thu Aug 15 12:09:54 CST 2019
Thu Aug 15 12:09:55 CST 2019
Thu Aug 15 12:09:56 CST 2019
Thu Aug 15 12:09:57 CST 2019
Thu Aug 15 12:09:58 CST 2019
Thu Aug 15 12:09:59 CST 2019
Thu Aug 15 12:10:00 CST 2019
Thu Aug 15 12:10:01 CST 2019
Thu Aug 15 12:10:02 CST 2019
Thu Aug 15 12:10:03 CST 2019
Thu Aug 15 12:10:04 CST 2019
Thu Aug 15 12:10:05 CST 2019
Thu Aug 15 12:10:07 CST 2019
Thu Aug 15 12:10:08 CST 2019
Thu Aug 15 12:10:09 CST 2019
Thu Aug 15 12:10:10 CST 2019
Thu Aug 15 12:10:11 CST 2019
Thu Aug 15 12:10:12 CST 2019
Thu Aug 15 12:10:13 CST 2019
Thu Aug 15 12:10:14 CST 2019
Thu Aug 15 12:10:15 CST 2019
Thu Aug 15 12:10:16 CST 2019
Thu Aug 15 12:10:17 CST 2019
Thu Aug 15 12:10:19 CST 2019
Thu Aug 15 12:10:20 CST 2019
Thu Aug 15 12:10:21 CST 2019
Thu Aug 15 12:10:22 CST 2019
Thu Aug 15 12:10:23 CST 2019
Thu Aug 15 12:10:24 CST 2019
Thu Aug 15 12:10:25 CST 2019
Thu Aug 15 12:10:26 CST 2019
Thu Aug 15 12:10:27 CST 2019
Thu Aug 15 12:10:28 CST 2019
Thu Aug 15 12:10:29 CST 2019
Thu Aug 15 12:10:30 CST 2019
Thu Aug 15 12:10:32 CST 2019
Thu Aug 15 12:10:33 CST 2019
Thu Aug 15 12:10:34 CST 2019
Thu Aug 15 12:10:35 CST 2019
Thu Aug 15 12:10:36 CST 2019
Thu Aug 15 12:10:37 CST 2019
Thu Aug 15 12:10:38 CST 2019
Thu Aug 15 12:10:39 CST 2019
Thu Aug 15 12:10:40 CST 2019
Thu Aug 15 12:13:42 CST 2019
Thu Aug 15 12:13:43 CST 2019
Thu Aug 15 12:13:44 CST 2019
Thu Aug 15 12:13:45 CST 2019
Thu Aug 15 12:13:46 CST 2019
Thu Aug 15 12:13:47 CST 2019
Thu Aug 15 12:13:48 CST 2019
Thu Aug 15 12:13:49 CST 2019
Thu Aug 15 12:13:50 CST 2019


@Random-Liu
Member

For Kubernetes + containerd, this shouldn't cause kubelet to become not ready, but still good to fix it.

@onesolpark

onesolpark commented Aug 31, 2020

@Random-Liu Like you mentioned, this happens with kubelet + docker (dockershim) and not with kubelet + containerd (CRI).
Can you please tell us how it's different? (whether it's due to dockerd's implementation or to dockershim)

To be more precise, pods on both the docker and containerd runtimes hang, but docker nodes flap between Ready and NotReady due to "PLEG is not healthy".
Also, you can't get info on the affected container with docker inspect, while you can with crictl inspect.
