
Evented Process Monitor #11529

@crosbymichael

Description


TLDR:

Pros:

  • Evented, lock-free code
  • One goroutine for all containers
  • Generic zombie-fighting abilities for free

Cons:

  • none

Currently Docker uses exec.Cmd and cmd.Wait() inside a goroutine to block on a container's process until it finishes. After a container dies, one of two things happens. One, the container's process is restarted, if the user requested that the container always restart. Two, we tear down the container, update its status with the exit code, and release any resources it held.

Writing a process monitor this way is not very efficient, and Docker is unable to handle SIGCHLD events to reap zombies that were not direct children of the daemon's process.

It also means one goroutine per container: if we have 1000 containers, we have 1000 blocking goroutines in the daemon. Booooo.

We can do better. The proper way is to move to an evented system for monitoring when child processes change state. This can be handled via SIGCHLD: a process can set up a signal handler for SIGCHLD, and when the status of a child process changes, this signal is delivered to the handler. We can use it to extract the pid and exit status and make decisions on how to handle the event.

Using an evented system like this, we can reduce the number of goroutines to 1 for N containers and also reduce the number of locks required to handle the previous level of concurrency. Running one container requires 1 goroutine; running 10k containers requires 1 goroutine. Win. This model also allows us to reap zombies (because zombies are bad, m'kay) in the daemon process that are not its direct children, i.e. processes other than a container's PID 1.

Sample code of what the process monitor would look like is as follows:

```go
package main

import (
    "os"
    "os/exec"
    "os/signal"
    "sync"
    "syscall"
    "time"

    "github.com/Sirupsen/logrus"
)

var (
    pidsICareAbout map[int]string
    // pidsMu guards pidsICareAbout: main registers pids in runSomething
    // while the handler goroutine deletes them. In the real design,
    // registration would happen on the event loop itself, making the
    // lock unnecessary.
    pidsMu sync.Mutex
)

func handleEvents(signals chan os.Signal, group *sync.WaitGroup) {
    defer group.Done()
    for sig := range signals {
        switch sig {
        case syscall.SIGTERM, syscall.SIGINT:
            // return cuz user said so
            return
        case syscall.SIGCHLD:
            // SIGCHLD is not queued: several children can exit behind a
            // single signal, so drain with WNOHANG until no exited
            // children remain.
            for {
                var (
                    status syscall.WaitStatus
                    usage  syscall.Rusage
                )
                pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, &usage)
                if err != nil {
                    // ECHILD just means there are no children left to reap.
                    if err != syscall.ECHILD {
                        logrus.Error(err)
                    }
                    break
                }
                if pid <= 0 {
                    // no more exited children right now
                    break
                }
                logrus.WithField("pid", pid).Info("process status changed")
                pidsMu.Lock()
                _, ok := pidsICareAbout[pid]
                if ok {
                    delete(pidsICareAbout, pid)
                }
                remaining := len(pidsICareAbout)
                pidsMu.Unlock()
                if !ok {
                    logrus.Infof("---> i don't care about %d", pid)
                    continue
                }
                logrus.Infof("i care about %d", pid)
                if remaining == 0 {
                    // return after everything is dead
                    return
                }
            }
        }
    }
}

func runSomething() error {
    // just add some delay for demo
    time.Sleep(1 * time.Second)
    cmd := exec.Command("sh", "-c", "sleep 5")
    // cmd.Start() is non blocking
    if err := cmd.Start(); err != nil {
        return err
    }
    // record the pid because I care about this one.
    logrus.WithField("pid", cmd.Process.Pid).Info("spawned new process")
    pidsMu.Lock()
    pidsICareAbout[cmd.Process.Pid] = "sleep 5"
    pidsMu.Unlock()
    return nil
}

func randomFork() error {
    // NOTE: raw fork(2) is not generally safe in a Go program (the child
    // inherits only the forking thread); this is demo-only code.
    syscall.ForkLock.Lock()
    pid, _, errno := syscall.RawSyscall(syscall.SYS_FORK, 0, 0, 0)
    syscall.ForkLock.Unlock()
    if errno != 0 {
        return errno
    }
    if pid == 0 {
        // child
        logrus.Info("i'm on a boat")
        os.Exit(0)
    }
    logrus.Infof("forked off %d", pid)
    return nil
}

func main() {
    signals := make(chan os.Signal, 1024)
    signal.Notify(signals, syscall.SIGCHLD, syscall.SIGTERM, syscall.SIGINT)
    pidsICareAbout = make(map[int]string)
    group := &sync.WaitGroup{}
    group.Add(1)
    go handleEvents(signals, group)
    for i := 0; i < 5; i++ {
        if err := runSomething(); err != nil {
            logrus.Fatal(err)
        }
    }
    // fork off a random process that we don't care about but make sure the
    // signal handler reaps it when it dies.
    if err := randomFork(); err != nil {
        logrus.Error(err)
    }
    logrus.Info("waiting on processes to finish")
    group.Wait()
    logrus.Info("all processes are done, exiting...")
}
```

We should build this in a generic way, so that this monitor, with restart capabilities, is available to any type of process the daemon can spawn.

Labels: exp/expert, kind/enhancement, roadmap