Docker should fall back on its feet again if it crashes or gets upgraded #131

@jpetazzo

This is a summary of discussions with @shykes and personal thoughts.

Current situation: docker is the parent of all the lxc-start processes (i.e. all the "containerized" processes). This allows relatively easy implementation of the following cool features:

  • docker can waitpid on those processes, and detect when they are terminated;
  • docker knows exactly which containers are running;
  • docker can capture stdout+stderr of those processes for logging purposes.

However, if the docker daemon process is stopped (because it crashes, or because it needs to be upgraded and restarted), the relationship with the child processes is lost (as well as the file descriptors).

There are some tricks that can be considered to work around this:

  • before an upgrade, docker can serialize its state on disk, and exec the new binary, which will re-read the serialized state to "know" which file descriptors correspond to which containers;
  • it is possible to pass file descriptors through AF_UNIX sockets, so the "old" and the "new" docker processes could cooperate to hand over the management of the processes (but the process lineage would be lost anyway);
  • in some cases, it might be possible to rely on /proc/<pid>/fd/<#> to re-acquire lost file descriptors.
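
As a rough illustration of the second trick, here is a minimal sketch of passing file descriptors over an AF_UNIX socket. It assumes Python 3.9+ (for `socket.send_fds` and `socket.recv_fds`) on Linux; the function names `hand_over` and `take_over` are hypothetical and not part of docker.

```python
import os
import socket

def hand_over(sock, fds, label=b"container-fds"):
    """'Old' process side: send open descriptors over an AF_UNIX socket.

    At least one byte of ordinary payload must accompany the descriptors,
    so a short label is sent along with them.
    """
    socket.send_fds(sock, [label], fds)

def take_over(sock, max_fds=16):
    """'New' process side: receive the label and the duplicated descriptors."""
    label, fds, _flags, _addr = socket.recv_fds(sock, 1024, max_fds)
    return label, fds
```

In a real hand-over the two ends of the socket would live in different processes; the received descriptors refer to the same open files as the ones that were sent, but the new process still does not become the parent of the containers.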

Those solutions aren't totally satisfactory, though.

To address the logging issue, I suggest the following strategy:

  • when starting a container, docker creates a pair of named pipes on the filesystem (one for stdout, another for stderr), and redirects the stdout+stderr of the container process to those named pipes;
  • docker reads from those named pipes to relay the logging information (as expected);
  • to prevent SIGPIPE from being sent to the container process when (if) docker exits or crashes, docker would open the container's end of each named pipe with the O_RDWR flag (as opposed to the usual O_WRONLY flag): this ensures that the number of readers is always at least one, even if docker is terminated;
  • when docker starts, it re-opens the named pipes, and thus "reconnects" to the outputs of the processes, without losing any data.
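
The steps above can be sketched as follows. This is only an illustration under stated assumptions: it relies on Linux behaviour (opening a FIFO with O_RDWR does not block there, while POSIX leaves it unspecified), and the helper names `make_log_fifo`, `open_container_end` and `drain` are hypothetical.

```python
import os

def make_log_fifo(directory, name):
    """Create a named pipe for one output stream of a container."""
    path = os.path.join(directory, name)
    os.mkfifo(path, 0o600)
    return path

def open_container_end(path):
    """Descriptor to install as the container's stdout or stderr.

    O_RDWR keeps a reader attached at all times, so writes do not fail
    with SIGPIPE/EPIPE while no docker process is reading.
    """
    return os.open(path, os.O_RDWR)

def drain(path, bufsize=65536):
    """Docker side: (re)open the pipe and return whatever is buffered."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        try:
            return os.read(fd, bufsize)
        except BlockingIOError:
            # A writer is attached but nothing is buffered right now.
            return b""
    finally:
        os.close(fd)
```

Calling `drain` a second time after the reader has gone away and the writer has kept writing plays the role of the restarted docker "reconnecting" to the output.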

This, however, has the following drawbacks:

  • since stdout and stderr will be open in read+write mode, the process will be able to read from them; and if it does that, it will receive its own output, and prevent docker from receiving it;
  • if docker is stopped, or takes a while to restart, the pipe will fill up, and the writer will be blocked (messages can even be dropped if the writer is not behaving properly).
    Note, however, that pipes have a capacity of 64kB on Linux (since 2.6.11), so the latter is not a huge issue.
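
To get a feel for how much output can accumulate before the writer blocks, the capacity can be measured empirically. The sketch below uses an anonymous pipe for simplicity, on the assumption that a named pipe behaves the same way in this respect; `measure_pipe_capacity` is a hypothetical helper.

```python
import os

def measure_pipe_capacity(chunk=4096):
    """Fill a pipe that nobody reads from and return how many bytes fit."""
    r, w = os.pipe()
    os.set_blocking(w, False)
    total = 0
    try:
        while True:
            try:
                total += os.write(w, b"\0" * chunk)
            except BlockingIOError:
                # The pipe is full: a blocking writer would stall here.
                return total
    finally:
        os.close(r)
        os.close(w)
```

The measured value depends on the kernel and its configuration, so it is better treated as an observation than as a guarantee.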

If the containers are detached from the docker process, it also means that docker has no straightforward way to know when the processes exit. This could be annoying if we want docker to restart processes automatically (or notify their termination in a timely manner).

With the named pipe strategy described above, when docker is attached to stdout+stderr, if the process exits, docker will detect it, because it will reach EOF on its side of the named pipe. However, it can also reach EOF if the container closes its file descriptors. Therefore, when EOF is detected, docker must check if the process is still running. If it is, docker should fall back to another method, e.g. polling /proc/<pid> at short intervals to check whether the container is still running.
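
A minimal sketch of that fallback, assuming a Linux /proc filesystem. The names `is_running`, `wait_for_exit` and `on_eof` are hypothetical, and treating zombie processes as "not running" is an assumption of this sketch rather than something stated above.

```python
import time

def is_running(pid):
    """True if /proc/<pid> exists and the process is not a zombie."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return False
    # The comm field may contain spaces or ')', so split after the last ')'.
    state = stat.rsplit(")", 1)[1].split()[0]
    return state not in ("Z", "X")

def wait_for_exit(pid, interval=0.05, timeout=None):
    """Poll until the process is gone; return False if the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while is_running(pid):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True

def on_eof(pid, notify, interval=0.05, timeout=None):
    """To be called when the reader hits EOF on the container's pipe."""
    if is_running(pid):
        # The container only closed its descriptors: fall back to polling.
        if not wait_for_exit(pid, interval, timeout):
            return False
    notify(pid)
    return True
```

A PID alone can be recycled, so in practice this check would be combined with a start time comparison.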

Additionally, docker must be able to detect when containers are stopped while docker itself was stopped (for instance, during an upgrade of docker). This can be made possible by persisting the state of containers. For each container, docker could store the PID of the lxc-start process, its start time (as indicated by the starttime field in /proc/<pid>/stat), and its launch parameters. When restarting, docker would check those PIDs, compare the start times (to make sure that the PID wasn't recycled for another process), and if the process exited, it would be able to notify listeners and possibly restart it with its parameters.
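
The persisted record and the check at restart could look roughly like this. It is a sketch assuming Linux, where starttime is field 22 of /proc/<pid>/stat; the record layout and the names `read_starttime`, `snapshot` and `still_same_process` are hypothetical.

```python
import json

def read_starttime(pid):
    """Return the starttime field (field 22) of /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Split after the last ')' because the comm field may contain spaces.
    # What remains starts at field 3 (state), so field 22 is at index 19.
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[19])

def snapshot(pid, params):
    """Build the record to persist for one container (hypothetical layout)."""
    return {"pid": pid, "starttime": read_starttime(pid), "params": params}

def still_same_process(record):
    """True if the recorded PID exists and was not recycled."""
    try:
        return read_starttime(record["pid"]) == record["starttime"]
    except (FileNotFoundError, ProcessLookupError):
        return False

def save_state(path, records):
    with open(path, "w") as f:
        json.dump(records, f)

def load_state(path):
    with open(path) as f:
        return json.load(f)
```

On restart, any record for which `still_same_process` returns False corresponds to a container that exited (or whose PID was reused) while docker was down, and its stored parameters can be used to restart it.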

The main drawback of this method is the loss of the exit status of the process (assuming that lxc-start relays this information correctly).

Overall, the strategies described here make it possible to restart docker without impacting running containers, without losing log data, and without losing features. Docker can even be stopped for short periods of time (or even longer ones, as long as the containers do not generate large amounts of logs).

Comments/feedback welcome!
