Skip to content

fix race in exec delete and start(release/1.1)#2834

Merged
estesp merged 1 commit intocontainerd:release/1.1from
acmcodercom:execrace
Nov 26, 2018
Merged

fix race in exec delete and start(release/1.1)#2834
estesp merged 1 commit intocontainerd:release/1.1from
acmcodercom:execrace

Conversation

@lifubang
Copy link
Copy Markdown
Contributor

I don't know whether the release/1.1 is still receiving new PR or not. If not, feel free to close this PR.

As mentioned in #2826

But the PR #2624 is not perfect, because execCreatedState should lock the whole func, or it will cause race situation.

Of course, we will meet this race in a very small probability.
There is an example:

  1. Add a time sleep in func 'delete' and 'start' in file exec.go;
    https://github.com/acmcodercom/containerd/blob/raceexample/linux/proc/exec.go#L95-L108
    https://github.com/acmcodercom/containerd/blob/raceexample/linux/proc/exec.go#L194-L202

  2. Start a container;
    ctr --address /run/docker/containerd/containerd.sock run -d docker.io/library/redis:alpine redis

  3. create, delete and start a exec:
    go run testexecrace.go
    cat testexecrace.go

root@dockerdemo:~/testd# cat testexecrace.go 
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	if err := redisExample(); err != nil {
		log.Fatal(err)
	}
}

func redisExample() error {
	// create a new client connected to the default socket path for containerd
	client, err := containerd.New("/run/docker/containerd/containerd.sock")
	if err != nil {
		return err
	}
	defer client.Close()

	// create a new context with an "example" namespace
	ctx := namespaces.WithNamespace(context.Background(), "default")

	// create a container
	container, err := client.LoadContainer(
		ctx,
		"redis",
	)
	if err != nil {
		return err
	}
	spec, err := container.Spec(ctx)
	if err != nil {
		return err
	}
	task, err := container.Task(ctx, nil)
	if err != nil {
		return err
	}

	pspec := spec.Process
	pspec.Terminal = false
	pspec.Args = []string{"sleep", "1000"}

	cioOpts := []cio.Opt{cio.WithStdio}
	ioCreator := cio.NewCreator(cioOpts...)

	// create aexec
	process, err := task.Exec(ctx, "testexec", pspec, ioCreator)
	if err != nil {
		return err
	}

	statusC, err := process.Wait(ctx)
	if err != nil {
		return err
	}

	// Two cases of simulation
	// First we call process.Delete
	go func() {
		fmt.Println("delete")
		if r, err := process.Delete(ctx); r != nil || err != nil {
			fmt.Println(r)
			fmt.Println(err)
		}
		fmt.Println("process.Delete")
	}()

	// Then, after 100ms, we start it
	time.Sleep(100 * time.Millisecond)
	go func() {
		fmt.Println("start")
		if err := process.Start(ctx); err != nil {
			fmt.Println(err)
		}
		fmt.Println("process.Start")
	}()

	// sleep for a lil bit to see the logs
	time.Sleep(2 * time.Second)

	// wait for the process to fully exit and print out the exit status

	status := <-statusC
	code, _, err := status.Result()
	if err != nil {
		return err
	}
	fmt.Printf("redis-server exited with status: %d\n", code)
	if r, err := process.Delete(ctx); r != nil || err != nil {
		fmt.Println(r)
		fmt.Println(err)
	}
	return nil
}
  1. after run this go program, the exec process is still running, but it can't be managed by containerd:
root@dockerdemo:~# ctr --address /run/docker/containerd/containerd.sock t ps redis
PID      INFO
3757     &ProcessDetails{ExecID:redis,}
10272    - (Hear meas running but execProcess has been deleted)

in runc, it is also running:

root@dockerdemo:~# runc --root /run/docker/runc/default/ ps redis
UID        PID  PPID  C STIME TTY          TIME CMD
systemd+  3757  3734  0 13:13 ?        00:00:05 redis-server
root     10272  3734  0 14:49 ?        00:00:00 sleep 1000

testexec,pid is also be deleted:

root@dockerdemo:~# ls /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/default/redis
config.json  init.pid  log.json  rootfs

and can't be killed:

root@dockerdemo:~# ctr --address /run/docker/containerd/containerd.sock t kill --exec-id testexec redis
ctr: process testexec does not exist: not found

I think we can move the lock location when the state of execProcess is execStoppedState, but cant' when the state is execCreatedState. So, we should revert it in func (s *execCreatedState) Delete(ctx context.Context).

Signed-off-by: Lifubang [email protected]

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #2834 into release/1.1 will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           release/1.1    #2834   +/-   ##
============================================
  Coverage        48.99%   48.99%           
============================================
  Files               85       85           
  Lines             7603     7603           
============================================
  Hits              3725     3725           
  Misses            3203     3203           
  Partials           675      675
Flag Coverage Δ
#linux 48.99% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5960cad...33c860f. Read the comment docs.

@crosbymichael
Copy link
Copy Markdown
Member

LGTM

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit 6bb83f2 into containerd:release/1.1 Nov 26, 2018
@lifubang lifubang deleted the execrace branch November 27, 2018 03:47
@thaJeztah
Copy link
Copy Markdown
Member

Is this also needed for the 1.2 and 1.0 branches?

@lifubang
Copy link
Copy Markdown
Contributor Author

Is this also needed for the 1.2 and 1.0 branches?

1.2 doesn't need this patch, but need #2826.
1.0 doesn't need it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants