
Support hairpin NAT without going through docker server#4442

Merged
vieux merged 1 commit into moby:master from ibuildthecloud:hairpin-nat
Mar 28, 2014

Conversation

@ibuildthecloud
Contributor

Hairpin NAT is currently done by passing through the docker server. If
two containers on the same box try to access each other through exposed
ports using the host IP, the current iptables rules will not match the
DNAT and the traffic instead goes to 'docker -d'.

This change drops the restriction that DNAT traffic must not originate
from docker0. It should be safe to drop this restriction because the
DOCKER chain is already gated by jumps that check for the destination
address to be a local address.

Docker-DCO-1.1-Signed-off-by: Darren Shepherd <[email protected]> (github: ibuildthecloud)

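As a rough sketch of what the commit message describes (the chain layout follows Docker's nat rules of the era from memory; the address 172.17.0.2 and the 80→8080 mapping are illustrative), the gating jumps and the relaxed DNAT rule look like:

```shell
# Jumps into the DOCKER chain are already gated on locally-owned destinations:
iptables -t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
iptables -t nat -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER

# Before this change, the per-port DNAT rule excluded traffic arriving on docker0:
#   iptables -t nat -A DOCKER ! -i docker0 -p tcp --dport 80 \
#     -j DNAT --to-destination 172.17.0.2:8080
# After it, the "! -i docker0" match is dropped:
iptables -t nat -A DOCKER -p tcp --dport 80 -j DNAT --to-destination 172.17.0.2:8080
```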
@ibuildthecloud
Contributor Author

Knowing the fragile nature of iptables rules, there are probably some terribly bad and hideous side effects of this simple change. If this is not the correct change, I'd like to discuss what the proper change is. In my application, if I pass the traffic through docker -d, it will crash in minutes.

The logic behind this change is that all jumps to DOCKER already check that the destination address is type LOCAL. If we already know that the packet is intended for this server, then any dport matching a port mapping should be DNAT'd.

My guess is that the "! -i docker0" was there to avoid DNATing outbound traffic from the containers, but again, in the DOCKER chain we already know the traffic is destined for the local machine. To be safe, one could also add "-m addrtype --dst-type LOCAL" to all port mapping rules in the DOCKER chain.
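That belt-and-braces variant would look something like this (an illustrative rule, not taken from Docker's source; address and ports match the earlier example):

```shell
# DNAT only when the packet's destination is an address owned by this host,
# even though the jump into the DOCKER chain already checks this
iptables -t nat -A DOCKER -m addrtype --dst-type LOCAL -p tcp --dport 80 \
  -j DNAT --to-destination 172.17.0.2:8080
```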

@crosbymichael
Contributor

thanks, i'll look at this after 0.9 is out

@ibuildthecloud
Contributor Author

I opened this PR because when my application traffic actually goes through docker, it dies. So I looked into it a bit. I haven't tracked down the root cause, but basically every UDP packet that goes through docker creates an "ESTABLISHED" connection to the destination port. Docker eventually runs out of file descriptors and then everything hangs. So I'm not sure why "ESTABLISHED" connections are being created (honestly, I didn't know something like that existed with UDP).
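The fd-exhaustion failure mode described here can be illustrated with a toy relay (a hypothetical sketch of the bug pattern, not Docker's actual proxy code): opening a fresh upstream socket per datagram, instead of reusing one per client flow, means every packet permanently costs one file descriptor.

```python
import socket

def relay_one(listen_sock, upstream_addr, flows):
    """Relay a single datagram, opening a fresh upstream socket each time.

    Because the socket is appended but never closed or reused per client
    flow, sustained UDP traffic grows the fd count without bound.
    """
    data, client = listen_sock.recvfrom(65535)
    up = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    up.sendto(data, upstream_addr)
    flows.append((client, up))  # leaked: one fd per datagram
    return data
```

A correct proxy would instead keep one upstream socket per client address and close it when the flow goes idle.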

@vieux
Contributor

vieux commented Mar 18, 2014

ping @crosbymichael

@crosbymichael
Contributor

@ibuildthecloud I think this can be closed because it is an issue with the userland UDP proxy, right?

@ibuildthecloud
Contributor Author

There is an issue with the userland UDP proxy, but that is separate. Currently the hairpin traffic goes through docker, but it does not need to. Even if the userland UDP proxy worked fine, it's still not acceptable, because communication between two containers on the same host goes through user space when it doesn't need to.

@crosbymichael
Contributor

ping @jpoimboe

I know you have helped with the iptables rules before. Can you check this out and see if you can think of any side effects of removing this line? In my testing everything seems to be working fine.

@crosbymichael
Contributor

I verified with links, and with icc set to both true and false.

@jpoimboe
Contributor

@crosbymichael This change looks fine to me.

@crosbymichael
Contributor

@jpetazzo Can you see any side effects of this? All my tests look good, but it's always these small one-liners that end up being a pain when they go wrong.

@jpetazzo
Contributor

@crosbymichael : LGTM.

@crosbymichael
Contributor

LGTM

@vieux
Contributor

vieux commented Mar 28, 2014

LGTM

vieux added a commit that referenced this pull request Mar 28, 2014
Support hairpin NAT without going through docker server
@vieux vieux merged commit d232700 into moby:master Mar 28, 2014
@ibuildthecloud ibuildthecloud deleted the hairpin-nat branch March 31, 2014 18:14
@titanous
Contributor

titanous commented Apr 9, 2014

This appears to have broken hairpin NAT for me. Connecting from a container to that same container using the host IP and forwarded port worked in previous versions, and now times out in v0.10 with this patch.

@ibuildthecloud
Contributor Author

This has now been reverted in master, but let me look and see why it's causing an issue. This is a pretty critical patch for some things that I'm doing, so I'd like not to revert it.

@ibuildthecloud ibuildthecloud restored the hairpin-nat branch April 10, 2014 16:14
@creack
Contributor

creack commented Apr 10, 2014

The whole point of the userland proxy is this. If you find a way to do it without proxying, that would be nicer :) Then we could remove all that code.

@ibuildthecloud
Contributor Author

First off, sorry for breaking things. Basically, there are the following scenarios to consider for hairpin NAT. Assuming your host IP is 192.168.0.1, imagine you have a setup like

docker run -p 80:8080 --name container-server ubuntu ... (IP 172.17.0.2)
docker run --name container-client ubuntu ... (IP 172.17.0.3)

Now the following should work

scenario A: from container-server: telnet 192.168.0.1:80
scenario B: from container-client: telnet 192.168.0.1:80
scenario C: from host: telnet 192.168.0.1:80

Before this PR, scenarios A and B would go through user space. The intention was to keep the traffic out of user space, so this PR made scenario B work without user space. Unfortunately, it totally broke scenario A. It broke because source == dest from the bridging and routing perspective.

With the help of @krislindgren, what we figured out is that the following two things are missing:

```shell
# enable reflective relay (hairpin mode) on the bridge ports
for i in /sys/devices/virtual/net/*/brport/hairpin_mode; do
  echo 1 > $i
done

# masquerade hairpinned traffic so the container never sees src == dst
iptables -t nat -I POSTROUTING -o docker0 -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp --dport 8080 -j MASQUERADE
```

First, bridge hairpin mode needs to be turned on for the bridge port. This allows a packet to go out and come back in on the same interface. Second, if traffic is going out docker0 with the same source and destination IP and it's headed to port 8080, you need to masquerade it. Otherwise the container will receive a packet where source and destination are the same, and it drops it.

This change is a bit more involved because of the need to turn on hairpin_mode for the interface added to the bridge. I don't know what would be involved from the coding perspective. I'm also interested in feedback on whether changing the hairpin_mode of the bridge port is acceptable. The net effect is that any packets broadcast from a container will get echoed back to it.

@tianon
Member

tianon commented Apr 10, 2014

I think it's also worth noting that this doesn't require setting hairpin_mode to 1 for all bridge ports. When I ls /sys/devices/virtual/net/*/brport/hairpin_mode on my own system here, I only get the three currently running containers' interfaces listed, so we just need to set this after we create the container's interface, which shouldn't be too bad (I think).

@ibuildthecloud
Contributor Author

@tianon Right, using * there was just laziness on my part. To clarify, you only have to set brport/hairpin_mode to 1 on the interfaces of the containers that need reflective relay. If a container does not publicly expose any ports, you do not need to set it. So it's a case-by-case basis.
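Programmatically, the per-interface version boils down to a one-line sysfs write. A minimal sketch (the helper names are mine, not Docker's; /sys/class/net is the stable alias for the /sys/devices/virtual/net paths above; actually writing requires root and an interface enslaved to a bridge):

```python
def hairpin_path(ifname: str) -> str:
    """Sysfs knob controlling hairpin (reflective relay) mode for a bridge port."""
    return f"/sys/class/net/{ifname}/brport/hairpin_mode"

def set_hairpin(ifname: str, enabled: bool = True) -> None:
    """Toggle reflective relay for one container's host-side veth interface."""
    with open(hairpin_path(ifname), "w") as f:
        f.write("1" if enabled else "0")
```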

@mpetazzoni

Any progress on this issue? Support for hairpin NAT without going through the Docker daemon in userspace would help a lot, as the proxy reduces network throughput between containers on the same host (while making Docker completely eat up a CPU core).

icecrime pushed a commit to icecrime/libcontainer that referenced this pull request Feb 9, 2015
This is to support being able to DNAT/MASQ traffic from a container back into itself (moby/moby#4442)

Docker-DCO-1.1-Signed-off-by: Patrick Hemmer <[email protected]> (github: phemmer)