Skip to content

Create docker-proxy TCP/UDP listener sockets in the daemon#48132

Merged
thaJeztah merged 9 commits intomoby:masterfrom
robmry:bind_socket_for_docker_proxy
Aug 8, 2024
Merged

Create docker-proxy TCP/UDP listener sockets in the daemon#48132
thaJeztah merged 9 commits intomoby:masterfrom
robmry:bind_socket_for_docker_proxy

Conversation

@robmry
Copy link
Contributor

@robmry robmry commented Jul 4, 2024

- What I did

Before commit 4f09af6, when allocating host ports for a new port mapping iptables rules were set up, then docker-proxy was started. If the host port was already in-use, docker-proxy exited with an error, and the iptables rules were removed. That could potentially interfere with a non-docker service that was already using the host port for something unrelated.

Commit 4f09af6 swapped that problem for a different one... in order to check that a port was available before creating iptables rules, it attempted to start docker-proxy first. If it failed, it could then try a different host port, without interfering with any other service. The problem with that is docker-proxy would start listening before the iptables rules were in place, so it could accept connections then become unusable because new NAT rules diverted packets directly to the container. This would leave the client with a broken connection, causing at-least a delay while it figured that out and reconnected.

- How I did it

This change creates and binds the socket in the daemon, before creating iptables rules. If the bind fails, it may try a different port. When or if the bind succeeds, iptables rules are created, then the daemon calls listen on the socket. If docker-proxy is needed, the socket is handed over to it at that point.

If docker-proxy is not needed (--userland-proxy=false), the daemon still listens on TCP sockets as the dummyProxy would have done. This makes the socket show up in netstat output.

The dummyProxy is no longer needed on Linux. Its job was to bind the host ports if docker-proxy was disabled, but that's now already handled by binding the sockets early.

This change doesn't affect SCTP, because it's not currently possible for docker-proxy to convert the file descriptor into an SCTPListener. So, docker-proxy is still started early, and the window for lost connections remains.

A new option has been added to docker-proxy, -use-listen-fd to tell it that fd 4 is a listener socket. It no longer needs to know the "frontend" address/port for TCP/UDP, but it's useful to see the options in the command line to work out which docker-proxy is which. (As a follow-up, it'd be possible to read the address/port from the fd and display it in the process title.)

If the user has an old docker-proxy in their path it'll fail for TCP/UDP, because it won't recognise the new command line option that says an listen-fd has been supplied as fd 4. In this case, the daemon's error message suggests checking $PATH.

RootlessKit used a proxy for docker-proxy called rootlesskit-docker-proxy. Its job was to make RootlessKit API calls to tell it about mapped host ports when necessary, before starting the usual docker-proxy, and then to remove that mapping when stopping the proxy. The rootlesskit-docker-proxy didn't know about new option -use-listen-fd, or the open sockets passed-in. Rather than update rootlesskit-docker-proxy, replace it by making the RootlessKit API calls directly from the daemon. That'll make it possible to continue to evolve docker-proxy, without having to keep rootlesskit-docker-proxy in-step (for example, we may go to a single docker-proxy per container rather than per port-mapping). It also makes it possible to set up those mappings in RootlessKit without running the proxy, when --userland-proxy=false. Because rootlesskit-docker-proxy is no longer used, it's no longer built.

- How to verify it

Updated tests.

The port mapping tests relied on startProxy being called to check that expected ports had been selected. That call is not made now, becausedummyProxy isn't used. So, changed most of the tests to make them start the proxy.

For tests of behaviour where a host port is unexpectedly unavailable, a real bind() now needs to fail in the daemon code being tested (the test's mock proxy can't report that it would have failed, because it's called much later in the process). So, the tests now set up a dummy bridge in their n/w namespace, add any host addresses necessary to that bridge, and bind ports that need to be busy - before the test starts.

With an old docker-proxy in the path, the error looks like this ...

docker: Error response from daemon: driver failed programming external connectivity on endpoint fervent_brattain (b4a9c5ebe14946f70c12c304da95abf41ed8b804903879c1163030ddbdbf1273): failed to start userland proxy for port mapping 0.0.0.0:8080:172.21.0.2:80/tcp: failed to start docker-proxy, check that the current version is in your $PATH.

(I hacked a NewSCTPListener(fd) into the sctp package, and checked that it worked with the socket bound in the daemon. So we can propose a change like that, then align SCTP socket creation with TCP/UDP.)

Rootless

Mapping from unspecified host address ...

$ docker run --rm -ti --network n1 -p 8080:80 nginx
$ ps -ef | grep docker-proxy
robm       39815   39165  0 18:37 ?        00:00:00 /home/robm/bin/docker-proxy -proto tcp -host-ip 127.0.0.1 -host-port 8080 -container-ip 172.18.0.2 -container-port 80
robm       39822   39165  0 18:37 ?        00:00:00 /home/robm/bin/docker-proxy -proto tcp -host-ip ::1 -host-port 8080 -container-ip 172.18.0.2 -container-port 80

These work ...
curl http://192.168.175.43:8080 # host ip
curl http://127.0.0.1:8080
curl http://[::1]:8080

Mapping from a specific host address ...

$ docker run --rm -ti --network n1 -p 192.168.175.43:8080:80 nginx
$ ps -ef | grep docker-proxy
robm       39978   39165  0 18:39 ?        00:00:00 /home/robm/bin/docker-proxy -proto tcp -host-ip 127.0.0.1 -host-port 8080 -container-ip 172.18.0.2 -container-port 80

Mapping from two specific host addresses, before the change ...

$ docker run --rm -ti --network n1 -p 192.168.175.43:8080:80 -p 127.0.0.1:8080:80 nginx
<long delay, the 16s timeout on trying to start rootlesskit-docker-proxy>
docker: Error response from daemon: driver failed programming external connectivity on endpoint admiring_williamson (b3d757bf1d0832fcc85eccd9692c72fb24ba2ec817495e977824516cbfbdcda2): failed to bind port 192.168.175.43:8080/tcp: Timed out proxy starting the userland proxy.

<then subsequent valid requests for the same mapped port fail with an error from RootlessKit>
$ docker run --rm -ti --network n1 -p 192.168.175.43:8080:80 nginx
docker: Error response from daemon: driver failed programming external connectivity on endpoint xenodochial_taussig (24118b2865fa4fc3f314cf172c21d1118e3a28aa0cccce82d33ceff002eaf3a9): failed to bind port 192.168.175.43:8080/tcp: Error starting userland proxy: error while calling PortManager.AddPort(): conflict with ID 2.

<need to restart rootlesskit>

After the change - binding fails immediately, because both mappings end up trying to use the same child address ...

$ docker run --rm -ti --network n1 -p 192.168.175.43:8080:80 -p 127.0.0.1:8080:80 nginx
docker: Error response from daemon: driver failed programming external connectivity on endpoint charming_johnson (1d749d940f091a93108d8364ae658a9c8b44b57b362963cfc18f5e01cf904f25): failed to bind port 192.168.175.43:8080/tcp: Error starting userland proxy: failed to listen on 127.0.0.1:8080: listen tcp4 127.0.0.1:8080: bind: address already in use.

With --userland-proxy=false ...

$ docker run --rm -ti --network n1 -p 8080:80 nginx
-> no docker-proxy process
-> curl works for http://127.0.0.1:8080, http://192.168.175.43:8080 - but not http://[::1]:8080

$ docker run --rm -ti --network n1 -p 192.168.175.43:8080:80 nginx
-> no docker-proxy process
-> curl works for http://192.168.175.43:8080

- Description for the changelog

- Close a window in which `docker-proxy` could accept TCP connections, which would fail after NAT rules were set up. The `docker-proxy` binary has been updated, the old version will not work with the updated `dockerd`.
- The executable `rootlesskit-docker-proxy` is no longer used, it has been removed from the build.

@robmry robmry self-assigned this Jul 4, 2024
@robmry robmry added this to the 28.0.0 milestone Jul 4, 2024
@robmry robmry force-pushed the bind_socket_for_docker_proxy branch 2 times, most recently from 76afba0 to 50ed3a8 Compare July 4, 2024 13:56
@robmry robmry marked this pull request as ready for review July 4, 2024 16:27
@robmry robmry requested review from akerouanton and corhere July 4, 2024 16:27
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion: go all Single Responsibility Principle on the proxy implementations and make it the caller's responsibility to construct the net.Listener. https://github.com/robmry/moby/compare/bind_socket_for_docker_proxy...corhere:moby:bind_socket_for_docker_proxy?expand=1

I think docker-proxy should be backwards compatible with older dockerd. It's easy enough to do (if my suggestion is followed...) and could save users some grief if they accidentally mix versions of the binaries.

@robmry robmry force-pushed the bind_socket_for_docker_proxy branch from 50ed3a8 to e6728e6 Compare July 8, 2024 18:11
@robmry
Copy link
Contributor Author

robmry commented Jul 8, 2024

I have a suggestion: go all Single Responsibility Principle on the proxy implementations and make it the caller's responsibility to construct the net.Listener. https://github.com/robmry/moby/compare/bind_socket_for_docker_proxy...corhere:moby:bind_socket_for_docker_proxy?expand=1

I think docker-proxy should be backwards compatible with older dockerd. It's easy enough to do (if my suggestion is followed...) and could save users some grief if they accidentally mix versions of the binaries.

Thanks Cory - I picked up your changes and rebased/reworked a bit.

As discussed, before main() Go, at-least on Linux, has used fd 4 (5, 6, for and epoll and a pipe). So, I added a command line option to tell docker-proxy that the listen fd is present. An old docker-proxy won't understand that flag, it just exits with no response on the pipe to the daemon ... so, I used that to suggest checking $PATH. If used with an old daemon, the new docker-proxy will open sockets as it did before.

@robmry robmry force-pushed the bind_socket_for_docker_proxy branch from e6728e6 to b021e87 Compare July 9, 2024 10:59
@robmry
Copy link
Contributor Author

robmry commented Jul 9, 2024

The rootless tests are failing because, it turns out, there's a rootlesskit-docker-proxy (a docker-proxy proxy) - and it doesn't know about the new -use-listen-fd option.

I think the tests passed before the last set of updates added that option, even though dockerd would already have bound the ports in the namespace it's using. Maybe dockerd needs to know if it's rootless and not-bind the ports, or just not use the new option to say it has. Or, something. I'll investigate ...

@corhere
Copy link
Contributor

corhere commented Jul 9, 2024

rootlesskit-docker-proxy (a docker-proxy proxy)

That smells a lot like a port-forwarding "driver" for rootless Docker. I wonder if it would be worthwhile to move the rootlesskit client stuff into the dockerd side so rootlesskit-docker-proxy can go away. Dockerd is already integrating with rootlesskit in other respects, and getting rootlesskit-docker-proxy out of the way would make it that much more practical to tackle e.g. implementing proxy-per-container multiplexing.

@robmry robmry force-pushed the bind_socket_for_docker_proxy branch from b021e87 to 66a6bf3 Compare July 10, 2024 16:45
@robmry robmry requested a review from tianon as a code owner July 10, 2024 16:45
@robmry
Copy link
Contributor Author

robmry commented Jul 10, 2024

rootlesskit-docker-proxy (a docker-proxy proxy)

That smells a lot like a port-forwarding "driver" for rootless Docker. I wonder if it would be worthwhile to move the rootlesskit client stuff into the dockerd side so rootlesskit-docker-proxy can go away. Dockerd is already integrating with rootlesskit in other respects, and getting rootlesskit-docker-proxy out of the way would make it that much more practical to tackle e.g. implementing proxy-per-container multiplexing.

Thanks Cory, I've had a go at that - there are three new commits ...

  • Make the RootlessKit API calls from the daemon before starting docker-proxy (instead of rootlesskit-docker-proxy).
  • Make those calls in rootless mode, even when --userland-proxy=false, which didn't work in rootless mode before.
  • Remove rootlesskit-docker-proxy from the build.

@AkihiroSuda - it'd be good to get your thoughts on this, there may well be something we missed.

@robmry robmry force-pushed the bind_socket_for_docker_proxy branch from 66a6bf3 to b0caf36 Compare July 11, 2024 17:37
robmry and others added 8 commits August 5, 2024 14:04
In preparation for the daemon passing a listen fd, add command line
option -use-listen-fd to indicate that the fd is present (as fd 4).

If the new flag isn't given, open the listener as normal.

Refactor the TCP and UDP proxies to be constructed with an existing
TCPListener or UDPConn, respectively. Lift the responsibilty of opening
the listener to the entrypoint. Per the Single Responsibility Principle,
this structure affords changing how the listener is created without
having to touch the proxy implementations.

Co-authored-by: Cory Snider <[email protected]>
Signed-off-by: Rob Murray <[email protected]>
It was only used in tests.

Signed-off-by: Cory Snider <[email protected]>
Get rid of "FIXME: Got an API for which error does not match
any expected type!!!  error="driver failed programming external
connectivity on endpoint..." from debug logs.

Signed-off-by: Rob Murray <[email protected]>
Before this change, when running rootless, instead of running
docker-proxy the daemon would run rootlesskit-docker-proxy.

The job of rootlesskit-docker-proxy was to tell RootlessKit
about mapped host ports before starting docker-proxy, and then
to remove the mapping when it was stopped.

So, rootlesskit-docker-proxy would need to be kept in-step
with changes to docker-proxy (particuarly the upcoming change
to bind TCP/UDP ports in the daemon and pass them to the proxy,
but also possible-future changes like running proxy per-container
rather than per-port-mapping).

This change runs the docker-proxy in rootless mode, instead of
rootlesskit-docker-proxy, and the daemon itself tells RootlessKit
about changes in host port mappings.

Signed-off-by: Rob Murray <[email protected]>
It's not needed, now the daemon tells RootlessKit about port
mappings directly.

Signed-off-by: Rob Murray <[email protected]>
Before commit 4f09af6, when allocating host ports for a new
port mapping, iptables rules were set up then docker-proxy was
started. If the host port was already in-use, docker-proxy exited
with an error, and the iptables rules were removed. That could
potentially interfere with a non-docker service that was already
using the host port for something unrelated.

Commit 4f09af6 swapped that problem for a different one... in
order to check that a port was available before creating iptables
rules, it attempted to start docker-proxy first. If it failed, it
could then try a different host port, without interfering with
any other service. The problem with that is docker-proxy would
start listening before the iptables rules were in place, so it
could accept connections then become unusable because new NAT
rules diverted packets directly to the container. This would leave
the client with a broken connection, causing at-least a delay
while it figured that out and reconnected.

This change creates and binds the socket in the daemon, before
creating iptables rules. If the bind fails, it may try a different
port. When or if the bind succeeds, iptables rules are created,
then the daemon calls listen on the socket. If docker-proxy is
needed, the socket is handed over to it at that point.

In rootless mode, the ports have to be bound to an address in the
rootless network namespace (where dockerd is running). DNAT rules
now use the same address.

If docker-proxy is not needed ("--userland-proxy=false"), the daemon
still listens on TCP sockets as the old dummyProxy would have done.
This makes the socket show up in "netstat" output.

The dummyProxy is no longer needed on Linux. Its job was to bind the
host ports if docker-proxy was disabled, but that's now already
handled by binding the sockets early.

This change doesn't affect SCTP, because it's not currently possible
for docker-proxy to convert the file descriptor into an SCTPListener.
So, docker-proxy is still started early, and the window for lost
connections remains.

If the user has an old docker-proxy in their path and it's given a
listener docker with '-use-listen-fd', it'll fail because of the
unknown option. In this case, the daemon's error message suggests
checking $PATH.

Signed-off-by: Rob Murray <[email protected]>
The daemon was modified to tell RootlessKit about host port
mappings directly, rather than by running rootlesskit-docker-proxy
to make those updates.

DNAT rules created in rootless mode referred to the host IP address,
rather than the address seen as host address in the rootless network
namespace.

With these changes, port mappings work in rootless mode when
--userland-proxy=false - so, don't gate the RootlessKit API calls
on starting docker-proxy.

Signed-off-by: Rob Murray <[email protected]>
@thaJeztah
Copy link
Member

Two quick questions;

  • The location of the userland-proxy binary is configurable; do we know if there's existing alternative implementations that try to match ABI compatibility? I guess we may fail if we'd encounter one of those, as they don't know about the new use-listen-fd option? Have we tried what this situation would look like (e.g. by using a v27.1.1 version of the proxy with this PR)?
  • ☝️ I do expect this to be a really niche case, but mostly wondering how loud it fails, and if the error is clear enough for users in such a situation to trace back the cause.
  • Slightly similar; how do we handle proxies when live-restore is enabled? Do we keep those running? And in that case, what happens when updating docker (thus: updating to a newer version of docker-proxy) ?
  • ☝️ Note that we do NOT support live-restore when doing "major version updates" (only patch releases), so depending on what version update this lands in may be just "expected behavior", but just thought I'd ask. (And to get an understanding what amount of release-notes are needed to warn users about these).

@thaJeztah
Copy link
Member

Also FWIW; regardless what version this PR lands in, the first commit would likely be fine on its own; it's an unused binary, and with docker/docker-ce-packaging#1045 in place, means that our packaging scripts also no longer expect the unused binary to be present.

@robmry
Copy link
Contributor Author

robmry commented Aug 6, 2024

  • The location of the userland-proxy binary is configurable; do we know if there's existing alternative implementations that try to match ABI compatibility? I guess we may fail if we'd encounter one of those, as they don't know about the new use-listen-fd option? Have we tried what this situation would look like (e.g. by using a v27.1.1 version of the proxy with this PR)?

An old version of our docker-proxy fails with a long-winded error message that ends with failed to start docker-proxy, check that the current version is in your $PATH. (There's a full version in the PR description.)

But no, I don't know if anyone's implemented their own version. If they have, and they ignore the new --use-listen-fd option, it'll probably fail because the ports are already bound... and the error response from their version of the proxy command is under their control.

  • Slightly similar; how do we handle proxies when live-restore is enabled? Do we keep those running? And in that case, what happens when updating docker (thus: updating to a newer version of docker-proxy) ?

We don't keep them running ... which seems pretty wrong. But, at least it makes this simpler.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

thanks for answering my questions!

@thaJeztah thaJeztah merged commit 8d06e70 into moby:master Aug 8, 2024
@robmry robmry deleted the bind_socket_for_docker_proxy branch September 16, 2024 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

docker-proxy accepts connections before NAT rules are set up

6 participants