[Proposal] Add file descriptor store to daemon and fd mapping args to CLI commands #48302

@MayCXC

Description

An old-and-now-new-again technique for scaling and updating daemons that listen on sockets of all kinds is to rely on the daemon's executor to bind and listen on the sockets and pass their file descriptors along, which allows the daemon to be stopped and restarted while its sockets remain bound and listening. This is the function of tools like inetd, launchd, systemd socket activation, s6-fdholderd, etc. podman supports this functionality for containers with systemd: https://github.com/containers/podman/blob/main/docs/tutorials/socket_activation.md#socket-activation-of-containers

Any container runtime daemon can just as easily support this functionality on its own, and the sockets themselves can comfortably be made part of an image configuration. Here is an example of instructions that could declare such file descriptors in a Dockerfile:

SOCKET 3/tcp
SOCKET 8/unix
ENTRYPOINT ...

This documents that the container expects to receive file descriptors 3 and 8 from the host, similar to the EXPOSE instruction for tcp/udp ports, and that they should be sockets that listen on the tcp and unix networks. Here is a corresponding service-level element in a compose.yml:

services:
  www:
    sockets:
      - 0.0.0.0:8080:3/tcp
      - /run/www.sock:8/unix

Here the daemon is instructed to open, bind, and listen on these sockets on the host, and to pass them to the www service container as fds 3 and 8.
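The host-side mechanics can be sketched in a few lines of Python (a hypothetical stand-in for the daemon, not actual moby code): the parent binds and listens, places the listener at fd 3, and spawns a child that adopts it, the way a sockets: entry like 0.0.0.0:8080:3/tcp would. The fd number and the child's one-liner are illustrative assumptions.

```python
import os
import socket
import subprocess
import sys

# Bind and listen in the "daemon" (this process), as the sockets:
# element above would instruct it to do.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))  # an ephemeral port stands in for 8080
srv.listen()
port = srv.getsockname()[1]

# Place the listener at the fd number the container declared (3).
if srv.fileno() != 3:
    os.dup2(srv.fileno(), 3)

# The "container" process adopts fd 3 as its listening socket.
child = subprocess.Popen(
    [sys.executable, "-c", (
        "import socket;"
        "s = socket.socket(fileno=3);"  # adopt the inherited listener
        "c, _ = s.accept();"
        "c.sendall(b'served on inherited fd 3')"
    )],
    pass_fds=(3,),
)

# A client can connect whether or not the child has started accepting,
# because the socket is already bound and listening in the parent.
with socket.create_connection(("127.0.0.1", port)) as conn:
    data = conn.recv(64)
child.wait()
```

Because the parent owns the bound socket, the child can exit and be replaced at any time without the listener ever disappearing, which is the restart behavior described below.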

A program that supports socket activation, like traefik, can be executed seamlessly in this manner: even while its container restarts or updates, it appears to be listening on both of these sockets. A savvy daemon can scale it to zero instances, wait for either socket to receive a connection, and then activate it again. This has an added benefit in compose projects: certain depends_on and healthcheck elements become unnecessary, because the host can listen on every socket before the services that use them start. Services can then connect to these listeners as early as they want, using host.docker.internal for network sockets or a bind mount for named sockets, and simply wait for their connections to unblock.
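The scale-to-zero behavior can be sketched the same way (again a hypothetical Python stand-in, not daemon code): the host keeps the socket bound while running zero service instances, waits for the listener to become readable, and only then starts the service and hands it the socket.

```python
import selectors
import socket
import subprocess
import sys

# The "daemon" binds and listens, but runs zero service instances.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen()
port = srv.getsockname()[1]

# A client connects while no service is running; the kernel queues
# the connection on the listening socket's backlog.
client = socket.create_connection(("127.0.0.1", port))

# Wait for the listener to become readable, i.e. a pending connection.
sel = selectors.DefaultSelector()
sel.register(srv, selectors.EVENT_READ)
sel.select()  # returns once the queued connection is pending

# Activate: start the "service" and hand it the listening socket
# (pass_fds keeps the fd at the same number in the child).
service = subprocess.Popen(
    [sys.executable, "-c", (
        "import socket;"
        f"s = socket.socket(fileno={srv.fileno()});"
        "c, _ = s.accept();"
        "c.sendall(b'activated on demand')"
    )],
    pass_fds=(srv.fileno(),),
)

reply = client.recv(64)
service.wait()
client.close()
```

The client never observes the service being down: its connection simply blocks in the backlog until activation, which is why the depends_on and healthcheck ordering concerns above can fall away.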

I believe that declarative socket file descriptors carry the same advantages as bind mounts and bridge networks for containers that listen on sockets. They can be configured via the CLI as well, like so: docker run -s 0.0.0.0:80:3/tcp -s /run/www.sock:8/unix traefik ...

In other cases, CLI users may want to pass extra fds to a container without binding them on the host. This case can be documented with a similar instruction:

FD 4
FD 5
ENTRYPOINT ...

and configured via the CLI as well: docker run -f 4 -f 5 ... to receive fds 4 and 5 from the parent process without binding them, or docker run -F ... to receive all the declared fds in this way. It could also be convenient to map fds with the CLI: docker run -f 6:4 -f 9:5 ... passes fd 6 from the host to fd 4 in the container, and does the same for 9 to 5.

This enables any docker host to enjoy the seamless restarts and reduced initialization complexity of socket activation, without relying on a particular init system. I think it follows in the spirit of #2658, but offers seamless restarts for containers and not just the daemon. I'd love to know what others think of this feature.
