Create docker-proxy TCP/UDP listener sockets in the daemon#48132
Create docker-proxy TCP/UDP listener sockets in the daemon#48132thaJeztah merged 9 commits intomoby:masterfrom
Conversation
76afba0 to
50ed3a8
Compare
corhere
left a comment
There was a problem hiding this comment.
I have a suggestion: go all Single Responsibility Principle on the proxy implementations and make it the caller's responsibility to construct the net.Listener. https://github.com/robmry/moby/compare/bind_socket_for_docker_proxy...corhere:moby:bind_socket_for_docker_proxy?expand=1
I think docker-proxy should be backwards compatible with older dockerd. It's easy enough to do (if my suggestion is followed...) and could save users some grief if they accidentally mix versions of the binaries.
50ed3a8 to
e6728e6
Compare
Thanks Cory - I picked up your changes and rebased/reworked a bit. As discussed, before |
e6728e6 to
b021e87
Compare
|
The rootless tests are failing because, it turns out, there's a rootlesskit-docker-proxy (a docker-proxy proxy) - and it doesn't know about the new I think the tests passed before the last set of updates added that option, even though dockerd would already have bound the ports in the namespace it's using. Maybe dockerd needs to know if it's rootless and not-bind the ports, or just not use the new option to say it has. Or, something. I'll investigate ... |
That smells a lot like a port-forwarding "driver" for rootless Docker. I wonder if it would be worthwhile to move the rootlesskit client stuff into the dockerd side so rootlesskit-docker-proxy can go away. Dockerd is already integrating with rootlesskit in other respects, and getting rootlesskit-docker-proxy out of the way would make it that much more practical to tackle e.g. implementing proxy-per-container multiplexing. |
b021e87 to
66a6bf3
Compare
Thanks Cory, I've had a go at that - there are three new commits ...
@AkihiroSuda - it'd be good to get your thoughts on this, there may well be something we missed. |
66a6bf3 to
b0caf36
Compare
In preparation for the daemon passing a listen fd, add command line option -use-listen-fd to indicate that the fd is present (as fd 4). If the new flag isn't given, open the listener as normal. Refactor the TCP and UDP proxies to be constructed with an existing TCPListener or UDPConn, respectively. Lift the responsibilty of opening the listener to the entrypoint. Per the Single Responsibility Principle, this structure affords changing how the listener is created without having to touch the proxy implementations. Co-authored-by: Cory Snider <[email protected]> Signed-off-by: Rob Murray <[email protected]>
Signed-off-by: Cory Snider <[email protected]>
It was only used in tests. Signed-off-by: Cory Snider <[email protected]>
Get rid of "FIXME: Got an API for which error does not match any expected type!!! error="driver failed programming external connectivity on endpoint..." from debug logs. Signed-off-by: Rob Murray <[email protected]>
Before this change, when running rootless, instead of running docker-proxy the daemon would run rootlesskit-docker-proxy. The job of rootlesskit-docker-proxy was to tell RootlessKit about mapped host ports before starting docker-proxy, and then to remove the mapping when it was stopped. So, rootlesskit-docker-proxy would need to be kept in-step with changes to docker-proxy (particuarly the upcoming change to bind TCP/UDP ports in the daemon and pass them to the proxy, but also possible-future changes like running proxy per-container rather than per-port-mapping). This change runs the docker-proxy in rootless mode, instead of rootlesskit-docker-proxy, and the daemon itself tells RootlessKit about changes in host port mappings. Signed-off-by: Rob Murray <[email protected]>
It's not needed, now the daemon tells RootlessKit about port mappings directly. Signed-off-by: Rob Murray <[email protected]>
Before commit 4f09af6, when allocating host ports for a new port mapping, iptables rules were set up then docker-proxy was started. If the host port was already in-use, docker-proxy exited with an error, and the iptables rules were removed. That could potentially interfere with a non-docker service that was already using the host port for something unrelated. Commit 4f09af6 swapped that problem for a different one... in order to check that a port was available before creating iptables rules, it attempted to start docker-proxy first. If it failed, it could then try a different host port, without interfering with any other service. The problem with that is docker-proxy would start listening before the iptables rules were in place, so it could accept connections then become unusable because new NAT rules diverted packets directly to the container. This would leave the client with a broken connection, causing at-least a delay while it figured that out and reconnected. This change creates and binds the socket in the daemon, before creating iptables rules. If the bind fails, it may try a different port. When or if the bind succeeds, iptables rules are created, then the daemon calls listen on the socket. If docker-proxy is needed, the socket is handed over to it at that point. In rootless mode, the ports have to be bound to an address in the rootless network namespace (where dockerd is running). DNAT rules now use the same address. If docker-proxy is not needed ("--userland-proxy=false"), the daemon still listens on TCP sockets as the old dummyProxy would have done. This makes the socket show up in "netstat" output. The dummyProxy is no longer needed on Linux. Its job was to bind the host ports if docker-proxy was disabled, but that's now already handled by binding the sockets early. This change doesn't affect SCTP, because it's not currently possible for docker-proxy to convert the file descriptor into an SCTPListener. So, docker-proxy is still started early, and the window for lost connections remains. If the user has an old docker-proxy in their path and it's given a listener docker with '-use-listen-fd', it'll fail because of the unknown option. In this case, the daemon's error message suggests checking $PATH. Signed-off-by: Rob Murray <[email protected]>
The daemon was modified to tell RootlessKit about host port mappings directly, rather than by running rootlesskit-docker-proxy to make those updates. DNAT rules created in rootless mode referred to the host IP address, rather than the address seen as host address in the rootless network namespace. With these changes, port mappings work in rootless mode when --userland-proxy=false - so, don't gate the RootlessKit API calls on starting docker-proxy. Signed-off-by: Rob Murray <[email protected]>
fe9c7b1 to
cdea750
Compare
|
Two quick questions;
|
|
Also FWIW; regardless what version this PR lands in, the first commit would likely be fine on its own; it's an unused binary, and with docker/docker-ce-packaging#1045 in place, means that our packaging scripts also no longer expect the unused binary to be present. |
An old version of our But no, I don't know if anyone's implemented their own version. If they have, and they ignore the new
We don't keep them running ... which seems pretty wrong. But, at least it makes this simpler. |
thaJeztah
left a comment
There was a problem hiding this comment.
LGTM
thanks for answering my questions!
- What I did
Before commit 4f09af6, when allocating host ports for a new port mapping iptables rules were set up, then docker-proxy was started. If the host port was already in-use, docker-proxy exited with an error, and the iptables rules were removed. That could potentially interfere with a non-docker service that was already using the host port for something unrelated.
Commit 4f09af6 swapped that problem for a different one... in order to check that a port was available before creating iptables rules, it attempted to start docker-proxy first. If it failed, it could then try a different host port, without interfering with any other service. The problem with that is docker-proxy would start listening before the iptables rules were in place, so it could accept connections then become unusable because new NAT rules diverted packets directly to the container. This would leave the client with a broken connection, causing at-least a delay while it figured that out and reconnected.
- How I did it
This change creates and binds the socket in the daemon, before creating iptables rules. If the bind fails, it may try a different port. When or if the bind succeeds, iptables rules are created, then the daemon calls listen on the socket. If docker-proxy is needed, the socket is handed over to it at that point.
If docker-proxy is not needed (
--userland-proxy=false), the daemon still listens on TCP sockets as thedummyProxywould have done. This makes the socket show up innetstatoutput.The
dummyProxyis no longer needed on Linux. Its job was to bind the host ports if docker-proxy was disabled, but that's now already handled by binding the sockets early.This change doesn't affect SCTP, because it's not currently possible for docker-proxy to convert the file descriptor into an SCTPListener. So, docker-proxy is still started early, and the window for lost connections remains.
A new option has been added to
docker-proxy,-use-listen-fdto tell it that fd 4 is a listener socket. It no longer needs to know the "frontend" address/port for TCP/UDP, but it's useful to see the options in the command line to work out which docker-proxy is which. (As a follow-up, it'd be possible to read the address/port from the fd and display it in the process title.)If the user has an old docker-proxy in their path it'll fail for TCP/UDP, because it won't recognise the new command line option that says an listen-fd has been supplied as fd 4. In this case, the daemon's error message suggests checking $PATH.
RootlessKit used a proxy for
docker-proxycalledrootlesskit-docker-proxy. Its job was to make RootlessKit API calls to tell it about mapped host ports when necessary, before starting the usualdocker-proxy, and then to remove that mapping when stopping the proxy. Therootlesskit-docker-proxydidn't know about new option-use-listen-fd, or the open sockets passed-in. Rather than updaterootlesskit-docker-proxy, replace it by making the RootlessKit API calls directly from the daemon. That'll make it possible to continue to evolvedocker-proxy, without having to keeprootlesskit-docker-proxyin-step (for example, we may go to a singledocker-proxyper container rather than per port-mapping). It also makes it possible to set up those mappings in RootlessKit without running the proxy, when--userland-proxy=false. Becauserootlesskit-docker-proxyis no longer used, it's no longer built.- How to verify it
Updated tests.
The port mapping tests relied on
startProxybeing called to check that expected ports had been selected. That call is not made now, becausedummyProxyisn't used. So, changed most of the tests to make them start the proxy.For tests of behaviour where a host port is unexpectedly unavailable, a real
bind()now needs to fail in the daemon code being tested (the test's mock proxy can't report that it would have failed, because it's called much later in the process). So, the tests now set up a dummy bridge in their n/w namespace, add any host addresses necessary to that bridge, and bind ports that need to be busy - before the test starts.With an old
docker-proxyin the path, the error looks like this ...docker: Error response from daemon: driver failed programming external connectivity on endpoint fervent_brattain (b4a9c5ebe14946f70c12c304da95abf41ed8b804903879c1163030ddbdbf1273): failed to start userland proxy for port mapping 0.0.0.0:8080:172.21.0.2:80/tcp: failed to start docker-proxy, check that the current version is in your $PATH.(I hacked a
NewSCTPListener(fd)into the sctp package, and checked that it worked with the socket bound in the daemon. So we can propose a change like that, then align SCTP socket creation with TCP/UDP.)Rootless
Mapping from unspecified host address ...
Mapping from a specific host address ...
Mapping from two specific host addresses, before the change ...
After the change - binding fails immediately, because both mappings end up trying to use the same child address ...
With
--userland-proxy=false...- Description for the changelog