
Fix shim deadlock #3529

Closed

keloyang wants to merge 1 commit into containerd:release/1.2 from keloyang:sigchld-lost-1.2

Conversation

@keloyang
Contributor

shim.Reap and shim.Default.Wait may deadlock, and #2748 doesn't fix this. Putting those two pieces of code together makes the problem easier to see.

func Reap() error {
	now := time.Now()
	exits, err := sys.Reap(false)
	Default.Lock()
	for c := range Default.subscribers {
		for _, e := range exits {
			c <- runc.Exit{ // line 44
				Timestamp: now,
				Pid:       e.Pid,
				Status:    e.Status,
			}
		}
	}
	Default.Unlock()
	return err
}

func (m *Monitor) Wait(c *exec.Cmd, ec chan runc.Exit) (int, error) {
	for e := range ec {
		if e.Pid == c.Process.Pid {
			c.Wait()
			m.Unsubscribe(ec) // line 85
			return e.Status, nil
		}
	}
}

If the first goroutine is blocked at line 44 (sending on an unbuffered subscriber channel while holding Default.Lock) just as the second goroutine reaches line 85 (it has stopped receiving from ec, and Unsubscribe needs that same lock), the two pieces of code deadlock.

The following shows how to reproduce it.

[root@centos0 c]# cat fork.c
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

void sleep_always() {
    while (1) {
        sleep(1);
    }
}


int childs(int num)
{
    int child_count = 0;
    pid_t pid;

refork:
    child_count++;
    pid = fork();
    if ( pid>0 )
    {
    }
    else if (pid == 0)
    {
        if ( child_count < num )
        {
            goto refork;
        }
    }
    else
    {
        exit(0);
    }
    return 0;
}

int defuncts(int num)
{
    int defunct_count = 0;
    pid_t pid;
    int idx = 0;
    for (idx =  0; idx < num; idx++)
    {
        pid = fork();
        if (pid == 0)
        {
            exit(0);
        }
    }
    return 0;
}

// usage ./fork child_num defunct_num
// e.g.  ./fork 1024 2048
int main(int argc, char *argv[])
{
    int child_num = 1;
    int defunct_num = 1;

    if (argc >= 2)
    {
        child_num = atoi(argv[1]);
    }

    if (argc >= 3)
    {
        defunct_num = atoi(argv[2]);
    }

    defuncts(defunct_num);
    childs(child_num);

    sleep_always();
}
[root@centos0 c]# gcc fork.c -o fork

[root@centos3 shim-sigchld]# cat test.sh
#!/bin/bash

i=0
while [ 1 ]
do
        docker rm fork >/dev/null 2>&1
        docker run  -tdi -v `pwd`:/home -w /home --name fork centos:7 sleep  1000
        docker exec fork ./fork 1024  20480 &
        sleep 10
        docker stop -t 1 fork
        ((i++))
        echo "test ${i}"
done

Finally, execute the shell script, and you will find the hang.

./test.sh

The other branches may have this problem too.

Signed-off-by: Shukui Yang [email protected]

shim.Reap and shim.Default.Wait may deadlock, use Monitor.Notify
to fix this issue.

Signed-off-by: Shukui Yang <[email protected]>
@keloyang
Contributor Author

I think this is a very serious problem, maybe even a CVE: it leads to a k8s node becoming NotReady.

How to make a k8s node NotReady:

[root@centos2 shim-sigchld]#  cat kill
#!/bin/bash

kill -9 `ps -eaf|grep fork|grep -v grep|awk '{print $2}'`
[root@centos2 shim-sigchld]#  chmod u+x kill
[root@centos2 shim-sigchld]# cat Dockerfile
FROM centos:7

COPY fork /

[root@centos2 shim-sigchld]# docker build -t fork .
[root@centos2 shim-sigchld]# cat fork.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fork
  namespace: default
spec:
  nodeName: centos2
  containers:
  - name: fork
    image: fork:latest
    imagePullPolicy: IfNotPresent
    command:
    - "sleep"
    - "300000"

[root@centos2 shim-sigchld]#  kubectl create -f fork.yaml
pod/fork created
[root@centos2 shim-sigchld]#  kubectl get pod
NAME      READY     STATUS              RESTARTS   AGE
fork      0/1       ContainerCreating   0          4s
[root@centos2 shim-sigchld]#  kubectl get pod
NAME      READY     STATUS    RESTARTS   AGE
fork      1/1       Running   0          6s

It may take a few attempts at the following two steps before you hit the problem.

[root@centos2 shim-sigchld]#  kubectl exec fork /fork 1024 20480 &
[root@centos2 shim-sigchld]#  kubectl exec fork /kill

If you find kubectl exec fork ... hung, please check the docker container too.

[root@centos2 shim-sigchld]# docker ps|grep fork
554ab7b69187        c08ec68b3921           "sleep 20000000"         23 minutes ago      Up 23 minutes                              k8s_fork_fork_default_02f5c1b4-bf0a-11e9-9dae-8cec4b5ed562_0
7f3a2b5747c2        k8s.gcr.io/pause:3.1   "/pause"                 23 minutes ago      Up 23 minutes                              k8s_POD_fork_default_02f5c1b4-bf0a-11e9-9dae-8cec4b5ed562_0
[root@centos2 shim-sigchld]# docker inspect 554ab7b69187

After the docker container hangs, please wait and check the kubelet log; kubelet will log lines like this:

Aug 15 11:46:12 centos2 kubelet[6904]: I0815 11:46:12.470631    6904 kubelet.go:1775] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m46.50397756s ago; threshold is 3m0s]

Then we can see the state of the k8s cluster:

[root@centos0 containerd]# kubectl get node
NAME      STATUS     ROLES     AGE       VERSION
centos0   Ready      master    14d       v1.11.1
centos1   Ready      <none>    14d       v1.11.1
centos2   NotReady   <none>    14d       v1.11.1

Using the following script, we can see that node centos2 goes NotReady roughly every 3 minutes, lasting about 1 minute each time.

[root@centos2 shim-sigchld]# cat  ./time.sh
#!/bin/bash

while [ 1 ]
do
        kubectl get node|grep NotReady >/dev/null 2>&1
        if [ $? == 0 ];then
                date
        fi
        sleep 1
done
[root@centos2 shim-sigchld]# ./time.sh
Thu Aug 15 12:09:41 CST 2019
Thu Aug 15 12:09:43 CST 2019
Thu Aug 15 12:09:44 CST 2019
Thu Aug 15 12:09:45 CST 2019
Thu Aug 15 12:09:46 CST 2019
Thu Aug 15 12:09:47 CST 2019
Thu Aug 15 12:09:48 CST 2019
Thu Aug 15 12:09:49 CST 2019
Thu Aug 15 12:09:50 CST 2019
Thu Aug 15 12:09:51 CST 2019
Thu Aug 15 12:09:52 CST 2019
Thu Aug 15 12:09:54 CST 2019
Thu Aug 15 12:09:55 CST 2019
Thu Aug 15 12:09:56 CST 2019
Thu Aug 15 12:09:57 CST 2019
Thu Aug 15 12:09:58 CST 2019
Thu Aug 15 12:09:59 CST 2019
Thu Aug 15 12:10:00 CST 2019
Thu Aug 15 12:10:01 CST 2019
Thu Aug 15 12:10:02 CST 2019
Thu Aug 15 12:10:03 CST 2019
Thu Aug 15 12:10:04 CST 2019
Thu Aug 15 12:10:05 CST 2019
Thu Aug 15 12:10:07 CST 2019
Thu Aug 15 12:10:08 CST 2019
Thu Aug 15 12:10:09 CST 2019
Thu Aug 15 12:10:10 CST 2019
Thu Aug 15 12:10:11 CST 2019
Thu Aug 15 12:10:12 CST 2019
Thu Aug 15 12:10:13 CST 2019
Thu Aug 15 12:10:14 CST 2019
Thu Aug 15 12:10:15 CST 2019
Thu Aug 15 12:10:16 CST 2019
Thu Aug 15 12:10:17 CST 2019
Thu Aug 15 12:10:19 CST 2019
Thu Aug 15 12:10:20 CST 2019
Thu Aug 15 12:10:21 CST 2019
Thu Aug 15 12:10:22 CST 2019
Thu Aug 15 12:10:23 CST 2019
Thu Aug 15 12:10:24 CST 2019
Thu Aug 15 12:10:25 CST 2019
Thu Aug 15 12:10:26 CST 2019
Thu Aug 15 12:10:27 CST 2019
Thu Aug 15 12:10:28 CST 2019
Thu Aug 15 12:10:29 CST 2019
Thu Aug 15 12:10:30 CST 2019
Thu Aug 15 12:10:32 CST 2019
Thu Aug 15 12:10:33 CST 2019
Thu Aug 15 12:10:34 CST 2019
Thu Aug 15 12:10:35 CST 2019
Thu Aug 15 12:10:36 CST 2019
Thu Aug 15 12:10:37 CST 2019
Thu Aug 15 12:10:38 CST 2019
Thu Aug 15 12:10:39 CST 2019
Thu Aug 15 12:10:40 CST 2019
Thu Aug 15 12:13:42 CST 2019
Thu Aug 15 12:13:43 CST 2019
Thu Aug 15 12:13:44 CST 2019
Thu Aug 15 12:13:45 CST 2019
Thu Aug 15 12:13:46 CST 2019
Thu Aug 15 12:13:47 CST 2019
Thu Aug 15 12:13:48 CST 2019
Thu Aug 15 12:13:49 CST 2019
Thu Aug 15 12:13:50 CST 2019


@Random-Liu
Member

For Kubernetes + containerd, this shouldn't cause kubelet to become not ready, but still good to fix it.

@onesolpark

onesolpark commented Aug 31, 2020

@Random-Liu Like you mentioned, this happens with kubelet + docker (dockershim) and not with kubelet + containerd (CRI).
Can you please tell us how it's different? (whether it's due to dockerd's implementation or to dockershim)

To be more precise, pods on both the docker and containerd runtimes hang, but docker nodes flap between Ready and NotReady due to "PLEG is not healthy".
Also, you can't get info on the affected container with docker inspect, while you can with crictl inspect.
