Support hairpin NAT without going through docker server#4442
vieux merged 1 commit into moby:master
Hairpin NAT is currently done by passing through the docker server. If two containers on the same box try to access each other through exposed ports and using the host IP, the current iptables rules will not match the DNAT and thus the traffic goes to 'docker -d'.
This change drops the restriction that DNAT traffic must not originate from docker0. It should be safe to drop this restriction because the DOCKER chain is already gated by jumps that check for the destination address to be a local address.
Docker-DCO-1.1-Signed-off-by: Darren Shepherd <[email protected]> (github: ibuildthecloud)
|
Knowing the fragile nature of iptables rules, there are probably some terribly bad and hideous side effects of this simple change. If this is not the correct change, I'd like to discuss what the proper change is. In my application, if I pass the traffic through docker -d, it will crash in minutes. The logic behind this change is that all jumps to DOCKER already check that the destination address is type LOCAL. If we already know that the packet is intended for this server, then any dport matching a port mapping should be DNAT'd. My guess is that the "! -i docker0" was there to avoid DNATing outbound traffic from the containers, but again, in the DOCKER chain we already know the traffic is destined for the local machine. Possibly, just to be safe, one could add "match dst-type LOCAL" to all port mapping rules in the DOCKER chain. |
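To make the proposal concrete, here is a minimal sketch of the kind of port-mapping rule under discussion. It is not Docker's actual code: the helper name `dnat_rule`, the port numbers, and the container address are illustrative assumptions, and the function only prints an iptables command instead of running it. The `local` mode shows roughly what the "just to be safe" variant with an address-type match might look like.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a DNAT rule for one port mapping.
# usage: dnat_rule <restrict|open|local> <host_port> <container_ip> <container_port>
dnat_rule() {
    mode=$1 host_port=$2 ip=$3 port=$4
    case "$mode" in
        # existing behaviour: never DNAT packets that arrive on docker0
        restrict) guard="! -i docker0 " ;;
        # "just to be safe" variant: only DNAT packets addressed to the host
        local)    guard="-m addrtype --dst-type LOCAL " ;;
        # proposed behaviour: no extra condition on the rule itself
        *)        guard="" ;;
    esac
    printf 'iptables -t nat -A DOCKER %s-p tcp --dport %s -j DNAT --to-destination %s:%s\n' \
        "$guard" "$host_port" "$ip" "$port"
}

dnat_rule restrict 80 172.17.0.2 8080
dnat_rule open 80 172.17.0.2 8080
dnat_rule local 80 172.17.0.2 8080
```

Printing the rules side by side makes it easy to review the difference before applying anything to a live nat table.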
|
thanks, i'll look at this after 0.9 is out |
|
I opened this PR because when my application traffic actually goes through docker it dies. So I looked into it a bit. I haven't tracked down the root cause, but basically every UDP packet that goes through docker creates an "ESTABLISHED" connection to the destination port. Docker eventually runs out of file descriptors and then everything hangs. So I'm not too sure why "ESTABLISHED" connections are being created (honestly, I didn't know something like that existed with UDP). |
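One rough way to watch for the descriptor exhaustion described above is to count the entries under a process's `/proc/<pid>/fd` directory. This is only a sketch for Linux hosts: `count_fds` is a hypothetical helper name, and finding the daemon's PID is left to the reader.

```shell
#!/bin/sh
# Hypothetical helper: count the open file descriptors of a process.
# Relies on the Linux /proc filesystem; prints 0 if the PID is not readable.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: descriptors held by the current shell.
count_fds "$$"
```

Running the same call repeatedly against the daemon's PID while UDP traffic flows would show whether the count keeps climbing.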
|
ping @crosbymichael |
|
@ibuildthecloud I think this can be closed because it is an issue with the userland UDP proxy right? |
|
There is an issue with the userland UDP proxy, but that is separate. Currently the hairpin traffic goes through docker but it does not need to. Even if the userland UDP proxy worked fine, it's still not acceptable because the communication between two containers on the same host is going through user space when it doesn't need to. |
|
ping @jpoimboe I know you have helped with the iptables rules before. Can you check this out and see if you can think of any side effects of removing this line? In my testing everything seems to be working fine. |
|
I verified with links and icc true&false |
|
@crosbymichael This change looks fine to me. |
|
@jpetazzo Can you see any side effects of this? All my tests look good but it's always these small one liners that end up being a pain when they go wrong. |
|
@crosbymichael : LGTM. |
|
LGTM |
|
LGTM |
|
This appears to have broken hairpin NAT for me. Connecting from a container to the same container using the host IP and forwarded port worked in previous versions and now times out in v0.10 with this patch. |
|
This has now been reverted in master, but let me look and see why it's causing an issue. This is a pretty critical patch for some things that I'm doing so I'd like to not revert it. |
|
The whole point of the userland proxy is this. If you find a way without proxifying, it would be nicer :) So we can remove all that code. |
|
First off, sorry for breaking things. Basically, there are the following scenarios to look at for hairpin. Assuming your host IP is 192.168.0.1, imagine you have a setup like
docker run -p 80:8080 --name container-server ubuntu ... (IP 172.17.0.2)
Now the following should work:
scenario A: from container-server: telnet 192.168.0.1:80
Before this PR, scenario A and B would go through user space. The intention was to get the traffic to not go through user space. So this PR made scenario B work without user space. Unfortunately it totally broke scenario A. It broke because source==dest from the bridging and routing perspective. With the help of @krislindgren, what we figured out is that the following two things are missing:
for i in /sys/devices/virtual/net/*/brport/hairpin_mode; do
    echo 1 > "$i"
done
iptables -t nat -I POSTROUTING -o docker0 -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp --dport 8080 -j MASQUERADE
First, bridge hairpin mode needs to be turned on for the bridge port. This allows a packet to go out and then back in the same interface. Secondly, if traffic is going out docker0, the source and destination IP are the same, and it's going to port 8080, you need to masquerade it. Otherwise the container will get a packet where source and destination are the same and drop it. This change is a bit more involved because of the need to turn on hairpin_mode for the interface added to the bridge. I don't know what would be involved from the coding perspective. I'm also interested in feedback on whether changing the hairpin_mode of the bridge port is acceptable. The net effect is that whatever packets are broadcast from a container will get echoed back to it. |
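The two missing pieces could be wrapped up roughly as follows. This is a sketch under stated assumptions, not what Docker does: `enable_hairpin` and `hairpin_masq_rule` are hypothetical helper names, the loop is narrowed to the ports of a single bridge via `/sys/class/net/<bridge>/brif/`, and the sysfs root is a parameter only so the function can be tried against a scratch directory. The MASQUERADE rule is printed rather than applied.

```shell
#!/bin/sh
# Hypothetical helper: turn on hairpin mode for every port of one bridge.
# usage: enable_hairpin <bridge> [sysfs_root]   (sysfs_root defaults to /sys/class/net)
enable_hairpin() {
    bridge=$1 sysfs_root=${2:-/sys/class/net}
    for f in "$sysfs_root/$bridge"/brif/*/hairpin_mode; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo 1 > "$f"
    done
}

# Hypothetical helper: print (do not run) the MASQUERADE rule that lets a
# container reach its own published port through the host IP.
# usage: hairpin_masq_rule <container_ip> <container_port>
hairpin_masq_rule() {
    ip=$1 port=$2
    printf 'iptables -t nat -I POSTROUTING -o docker0 -s %s/32 -d %s/32 -p tcp --dport %s -j MASQUERADE\n' \
        "$ip" "$ip" "$port"
}

hairpin_masq_rule 172.17.0.2 8080
```

On a real host, `enable_hairpin docker0` would need root, and it would have to be repeated whenever a new interface is attached to the bridge, which is the extra coding work mentioned above.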
|
I think it's worth noting also that this doesn't require setting |
|
@tianon Right, that was just laziness on my part |
|
Any progress on this issue? Support for hairpin NAT without going through the Docker daemon in userspace would help a lot, as going through the daemon reduces network throughput between containers on the same host (while making Docker completely eat up a CPU core). |
This is to support being able to DNAT/MASQ traffic from a container back into itself (moby/moby#4442) Docker-DCO-1.1-Signed-off-by: Patrick Hemmer <[email protected]> (github: phemmer)