Docker should fall back on its feet again if it crashes or gets upgraded

This is a summary of discussions with @shykes and personal thoughts.

Current situation: docker is the parent of all the `lxc-start` processes (i.e. all the "containerized" processes). This allows relatively easy implementation of the following cool features:
- docker can `waitpid` on those processes, and detect when they are terminated;
- docker knows exactly which containers are running;
- docker can capture stdout+stderr of those processes for logging purposes.
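To make the current model concrete, here is a minimal sketch of a parent that starts a child, captures its stdout+stderr, and collects its exit status. This is only an illustration in Python, not docker's actual code; `run_and_supervise` is a hypothetical name.

```python
import subprocess
import sys


def run_and_supervise(argv):
    """Start a child process, capture its output, and collect its exit status."""
    child = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() drains both pipes and then reaps the child,
    # which is only possible because we are its parent.
    out, err = child.communicate()
    return child.returncode, out, err


if __name__ == "__main__":
    code, out, err = run_and_supervise(
        [sys.executable, "-c", "import sys; print('hello'); sys.exit(3)"]
    )
    print(code, out, err)
```

All three features depend on the parent/child relationship and on pipes held by the parent, which is exactly what is lost when the parent goes away.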

However, if the docker daemon process is stopped (because it crashes, or because it needs to be upgraded and restarted), the relationship with the child processes is lost (as well as the file descriptors).

There are some tricks that can be considered to work around this:
- before an upgrade, docker can serialize its state on disk, and `exec` the new binary, which will re-read the serialized state to "know" which file descriptors correspond to which containers;
- it is possible to pass file descriptors through `AF_UNIX` sockets, so the "old" and the "new" docker processes could cooperate to hand over the management of the processes (but the process lineage would be lost anyway);
- in some cases, it might be possible to rely on `/proc/<pid>/fd/<#>` to re-acquire lost file descriptors.
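As an illustration of the second trick, file descriptors can be handed over with an `SCM_RIGHTS` ancillary message. The sketch below is an assumption about how such a handover could look, not an existing docker mechanism: `send_fd`, `recv_fd` and `demo_handover` are hypothetical names, and both "daemons" live in one process to keep the example short.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor over a connected AF_UNIX socket."""
    fds = array.array("i", [fd])
    # At least one byte of regular data must accompany the ancillary message.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor; the kernel installs a new fd for the receiver."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo_handover():
    """The 'old' daemon passes the read end of a container pipe to the 'new' one."""
    old_daemon, new_daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()  # stands in for a container's stdout pipe
    send_fd(old_daemon, read_end)
    os.close(read_end)               # the old daemon drops its own copy
    received = recv_fd(new_daemon)
    os.write(write_end, b"container output")
    data = os.read(received, 4096)
    os.close(write_end)
    os.close(received)
    old_daemon.close()
    new_daemon.close()
    return data
```

This keeps the pipes alive across the handover, but the new process still cannot `waitpid` on the containers, since they are not its children.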

Those solutions aren't totally satisfactory, though.

To address the logging issue, I suggest the following strategy:
- when starting a container, docker creates a pair of named pipes on the filesystem (one for stdout, another for stderr), and redirects the stdout+stderr of the container process to those named pipes;
- docker reads from those named pipes to relay the logging information (as expected);
- to prevent `SIGPIPE` from being sent to the container process when (if) docker exits or crashes, docker would open the container's end of each named pipe with the `O_RDWR` flag (as opposed to the standard `O_WRONLY` flag) before handing it to the container: this makes sure that the number of readers is always at least one, even if docker is terminated;
- when docker starts, it re-opens the named pipes, and thus "reconnects" to the outputs of the processes, without losing any data.
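The steps above can be sketched as follows. This is a single-process simulation under a few assumptions: the FIFO path is hypothetical, and `container_fd` stands for the descriptor that would be installed as the container's stdout (e.g. with `dup2`) before it starts. It relies on the Linux behavior that a FIFO can be opened with `O_RDWR` without blocking.

```python
import os
import tempfile


def demo_reconnect():
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, "stdout.fifo")  # hypothetical location
    os.mkfifo(fifo, 0o600)

    # Container side: O_RDWR means this descriptor also counts as a reader,
    # so writes never hit a pipe with zero readers.
    container_fd = os.open(fifo, os.O_RDWR)

    # Daemon side: a plain read-only open (does not block, a writer exists).
    daemon_fd = os.open(fifo, os.O_RDONLY)
    os.write(container_fd, b"before restart\n")
    first = os.read(daemon_fd, 4096)

    # The daemon goes away; the container keeps writing.
    os.close(daemon_fd)
    os.write(container_fd, b"during restart\n")  # buffered in the pipe, no EPIPE

    # The daemon comes back, re-opens the FIFO and finds the buffered data.
    daemon_fd = os.open(fifo, os.O_RDONLY)
    second = os.read(daemon_fd, 4096)

    os.close(daemon_fd)
    os.close(container_fd)
    os.unlink(fifo)
    os.rmdir(workdir)
    return first, second
```

The data written while no daemon was attached stays in the kernel pipe buffer because the FIFO never drops to zero open descriptors.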

This, however, has the following drawbacks:
- since stdout and stderr will be open in read+write mode, the process will be able to read from them; and if it does that, it will receive its own output, and prevent docker from receiving it;
- if docker is stopped, or takes a while to restart, the pipe will fill up, and the writer will be blocked (messages can even be dropped if the writer is not behaving properly).
  Note, however, that pipes have a capacity of 64kB on Linux (since 2.6.11), so the latter is not a huge issue.
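To check the pipe capacity on a given machine, and to see what "the pipe will fill up" means in practice, something like the following could be used. It is a Linux-only sketch: `measure_pipe` is a hypothetical helper, it uses an anonymous pipe for brevity (a named pipe is backed by the same kind of kernel buffer), and the fallback value `1032` for `F_GETPIPE_SZ` is the Linux constant, used when the Python `fcntl` module does not export it.

```python
import fcntl
import os

# Linux-specific fcntl command; newer Python versions export it directly.
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def measure_pipe():
    """Return (reported capacity, bytes accepted before a write would block)."""
    r, w = os.pipe()
    try:
        capacity = fcntl.fcntl(w, F_GETPIPE_SZ)
        # Non-blocking mode turns "the writer blocks" into BlockingIOError.
        os.set_blocking(w, False)
        accepted = 0
        chunk = b"x" * 4096
        while True:
            try:
                accepted += os.write(w, chunk)
            except BlockingIOError:
                break  # the pipe is full: a blocking writer would stall here
        return capacity, accepted
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(measure_pipe())
```

A container that logs less than this amount while docker is away is never blocked; one that logs more will stall on `write` until docker drains the pipe.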

If the containers are detached from the docker process, it also means that docker has no straightforward way to know when the processes exit. This could be annoying if we want docker to restart processes automatically (or notify their termination in a timely manner).

With the named pipe strategy described above, when docker is attached to stdout+stderr, if the process exits, docker will detect it, because it will reach `EOF` on its side of the named pipe. However, it can also reach `EOF` if the container closes its file descriptors. Therefore, when `EOF` is detected, docker must check whether the process is still running. If it is, docker should fall back to another method, e.g. polling `/proc/<pid>` at short intervals to check whether the container is still running.
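The fallback could look roughly like this. It is a sketch with hypothetical names (`is_running`, `wait_until_gone`, `demo_polling`); it reads `/proc/<pid>/stat` rather than just testing that the directory exists, so that a zombie (state `Z`) is also treated as exited.

```python
import os
import subprocess
import sys
import time


def is_running(pid):
    """True if /proc/<pid> exists and the process is not a zombie."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return False
    # The command name (field 2) may contain spaces or parentheses,
    # so split on the last ')' before reading the state field.
    state = stat.rsplit(")", 1)[1].split()[0]
    return state != "Z"


def wait_until_gone(pid, poll_interval=0.1, timeout=None):
    """Poll after EOF until the process has exited; False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while is_running(pid):
        if deadline is not None and time.monotonic() > deadline:
            return False
        time.sleep(poll_interval)
    return True


def demo_polling():
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
    seen_alive = is_running(child.pid)
    gone = wait_until_gone(child.pid, poll_interval=0.05, timeout=10)
    child.wait()  # reap it; a restarted docker could not do this
    return seen_alive, gone, is_running(child.pid)
```

Polling trades notification latency for simplicity: the shorter the interval, the sooner the exit is noticed, at the cost of more wakeups.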

Additionally, docker must be able to detect when containers are stopped _while docker itself was stopped_ (for instance, during an upgrade of docker). This can be made possible by persisting the state of containers. For each container, docker could store the PID of the `lxc-start` process, its start time (as indicated by the _starttime_ field in [`/proc/<pid>/stat`](http://man7.org/linux/man-pages/man5/proc.5.html)), and its launch parameters. When restarting, docker would check those PIDs, compare the start times (to make sure that the PID wasn't recycled to another process), and if the process exited, it would be able to notify listeners and possibly restart it with its parameters.
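A minimal sketch of that bookkeeping, assuming a JSON file per container (the file layout and the names `read_starttime`, `save_record` and `load_and_check` are made up for the example):

```python
import json
import os
import tempfile


def read_starttime(pid):
    """Return field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Split after the last ')' because the command name may contain spaces.
    fields = stat.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so field 22 sits at index 19.
    return int(fields[19])


def save_record(path, pid, params):
    """Persist what is needed to recognise (or relaunch) the process later."""
    record = {"pid": pid, "starttime": read_starttime(pid), "params": params}
    with open(path, "w") as f:
        json.dump(record, f)
    return record


def load_and_check(path):
    """Return (record, alive); alive is False if the PID is gone or was recycled."""
    with open(path) as f:
        record = json.load(f)
    try:
        alive = read_starttime(record["pid"]) == record["starttime"]
    except (FileNotFoundError, ProcessLookupError):
        alive = False
    return record, alive


if __name__ == "__main__":
    state_file = os.path.join(tempfile.mkdtemp(), "container.json")
    save_record(state_file, os.getpid(), ["lxc-start", "..."])
    print(load_and_check(state_file))
```

Comparing the start time is what makes the PID check safe: a recycled PID belongs to a process with a different start time, so it is reported as not alive.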

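The main drawback of this method is the loss of the exit status of the process: since docker is no longer the parent of `lxc-start`, it cannot `waitpid` on it to collect that status (and this assumes that `lxc-start` relays the exit status correctly in the first place).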

Overall, the strategies described here allow docker to be restarted without impacting running containers, without losing log data, and without losing features. Docker can even be stopped for short periods of time (or even longer ones, as long as the containers do not generate large amounts of logs).

Comments/feedback welcome!

