
Support hairpin NAT without going through docker server#4442

Merged
vieux merged 1 commit into moby:master from ibuildthecloud:hairpin-nat
Mar 28, 2014

Conversation

@ibuildthecloud
Contributor

Hairpin NAT is currently done by passing through the docker server. If
two containers on the same box try to access each other through exposed
ports using the host IP, the current iptables rules will not match the
DNAT and the traffic instead goes to 'docker -d'.

This change drops the restriction that DNAT traffic must not originate
from docker0. It should be safe to drop this restriction because the
DOCKER chain is already gated by jumps that check for the destination
address to be a local address.

Docker-DCO-1.1-Signed-off-by: Darren Shepherd <[email protected]> (github: ibuildthecloud)

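As a rough sketch of what the commit message describes (the chain layout follows Docker's nat rules of the era from memory; the address 172.17.0.2 and the 80→8080 mapping are illustrative), the gating jumps and the relaxed DNAT rule look like:

```shell
# Jumps into the DOCKER chain are already gated on locally-owned destinations:
iptables -t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
iptables -t nat -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER

# Before this change, the per-port DNAT rule excluded traffic arriving on docker0:
#   iptables -t nat -A DOCKER ! -i docker0 -p tcp --dport 80 \
#     -j DNAT --to-destination 172.17.0.2:8080
# After it, the "! -i docker0" match is dropped:
iptables -t nat -A DOCKER -p tcp --dport 80 -j DNAT --to-destination 172.17.0.2:8080
```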
@ibuildthecloud
Contributor Author

Knowing the fragile nature of iptables rules, there are probably some terribly bad and hideous side effects of this simple change. If this is not the correct change, I'd like to discuss what the proper change is. In my application, if I pass the traffic through docker -d, it will crash in minutes.

The logic behind this change is that all jumps to DOCKER already check that the destination address is type LOCAL. If we already know that the packet is intended for this server, then any dport matching a port mapping should be DNAT'd.

My guess is that the "! -i docker0" was there to avoid DNATing outbound traffic from the containers, but again, in the DOCKER chain we already know the traffic is destined for the local machine. To be safe, one could also add "-m addrtype --dst-type LOCAL" to all port mapping rules in the DOCKER chain.
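That belt-and-braces variant would look something like this (an illustrative rule, not taken from Docker's source; address and ports match the earlier example):

```shell
# DNAT only when the packet's destination is an address owned by this host,
# even though the jump into the DOCKER chain already checks this
iptables -t nat -A DOCKER -m addrtype --dst-type LOCAL -p tcp --dport 80 \
  -j DNAT --to-destination 172.17.0.2:8080
```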

@crosbymichael
Contributor

thanks, i'll look at this after 0.9 is out

@ibuildthecloud
Contributor Author

I opened this PR because when my application traffic actually goes through docker, it dies. So I looked into it a bit. I haven't tracked down the root cause, but basically every UDP packet that goes through docker creates an "ESTABLISHED" connection to the destination port. Docker eventually runs out of file descriptors and then everything hangs. So I'm not sure why "ESTABLISHED" connections are being created (honestly, I didn't know something like that existed with UDP).
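The fd-exhaustion failure mode described here can be illustrated with a toy relay (a hypothetical sketch of the bug pattern, not Docker's actual proxy code): opening a fresh upstream socket per datagram, instead of reusing one per client flow, means every packet permanently costs one file descriptor.

```python
import socket

def relay_one(listen_sock, upstream_addr, flows):
    """Relay a single datagram, opening a fresh upstream socket each time.

    Because the socket is appended but never closed or reused per client
    flow, sustained UDP traffic grows the fd count without bound.
    """
    data, client = listen_sock.recvfrom(65535)
    up = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    up.sendto(data, upstream_addr)
    flows.append((client, up))  # leaked: one fd per datagram
    return data
```

A correct proxy would instead keep one upstream socket per client address and close it when the flow goes idle.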

@vieux
Contributor

vieux commented Mar 18, 2014

ping @crosbymichael

@crosbymichael
Contributor

@ibuildthecloud I think this can be closed because it is an issue with the userland UDP proxy, right?

@ibuildthecloud
Contributor Author

There is an issue with the userland UDP proxy, but that is separate. Currently the hairpin traffic goes through docker, but it does not need to. Even if the userland UDP proxy worked fine, it's still not acceptable, because communication between two containers on the same host goes through user space when it doesn't need to.

@crosbymichael
Contributor

ping @jpoimboe

I know you have helped with the iptables rules before. Can you check this out and see if you can think of any side effects of removing this line? In my testing everything seems to be working fine.

@crosbymichael
Contributor

I verified with links, and with icc set to both true and false.

@jpoimboe
Contributor

@crosbymichael This change looks fine to me.

@crosbymichael
Contributor

@jpetazzo Can you see any side effects of this? All my tests look good, but it's always these small one-liners that end up being a pain when they go wrong.

@jpetazzo
Contributor

@crosbymichael : LGTM.

@crosbymichael
Contributor

LGTM

@vieux
Contributor

vieux commented Mar 28, 2014

LGTM

vieux added a commit that referenced this pull request Mar 28, 2014
Support hairpin NAT without going through docker server
@vieux vieux merged commit d232700 into moby:master Mar 28, 2014
@ibuildthecloud ibuildthecloud deleted the hairpin-nat branch March 31, 2014 18:14
@titanous
Contributor

titanous commented Apr 9, 2014

This appears to have broken hairpin NAT for me. Connecting from a container to that same container using the host IP and forwarded port worked in previous versions, and now times out in v0.10 with this patch.

@ibuildthecloud
Contributor Author

This has now been reverted in master, but let me look and see why it's causing an issue. This is a pretty critical patch for some things that I'm doing, so I'd like not to revert it.

@ibuildthecloud ibuildthecloud restored the hairpin-nat branch April 10, 2014 16:14
@creack
Contributor

creack commented Apr 10, 2014

The whole point of the userland proxy is this. If you find a way to do it without proxying, that would be nicer :) Then we could remove all that code.

@ibuildthecloud
Contributor Author

First off, sorry for breaking things. Basically, there are the following scenarios to consider for hairpin NAT. Assuming your host IP is 192.168.0.1, imagine you have a setup like

docker run -p 80:8080 --name container-server ubuntu ... (IP 172.17.0.2)
docker run --name container-client ubuntu ... (IP 172.17.0.3)

Now the following should work

scenario A: from container-server: telnet 192.168.0.1:80
scenario B: from container-client: telnet 192.168.0.1:80
scenario C: from host: telnet 192.168.0.1:80

Before this PR, scenarios A and B would go through user space. The intention was to keep the traffic out of user space, so this PR made scenario B work without user space. Unfortunately, it totally broke scenario A. It broke because source == dest from the bridging and routing perspective.

With the help of @krislindgren, what we figured out is that the following two things are missing:

```shell
# enable reflective relay (hairpin mode) on the bridge ports
for i in /sys/devices/virtual/net/*/brport/hairpin_mode; do
  echo 1 > $i
done

# masquerade hairpinned traffic so the container never sees src == dst
iptables -t nat -I POSTROUTING -o docker0 -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp --dport 8080 -j MASQUERADE
```

First, bridge hairpin mode needs to be turned on for the bridge port. This allows a packet to go out and come back in on the same interface. Second, if traffic is going out docker0 with the same source and destination IP and it's headed to port 8080, you need to masquerade it. Otherwise the container will receive a packet where source and destination are the same, and it drops it.

This change is a bit more involved because of the need to turn on hairpin_mode for the interface added to the bridge. I don't know what would be involved from the coding perspective. I'm also interested in feedback on whether changing the hairpin_mode of the bridge port is acceptable. The net effect is that any packets broadcast from a container will get echoed back to it.

@tianon
Member

tianon commented Apr 10, 2014

I think it's also worth noting that this doesn't require setting hairpin_mode to 1 for all bridge ports. When I ls /sys/devices/virtual/net/*/brport/hairpin_mode on my own system here, I only get the three currently running containers' interfaces listed, so we just need to set this after we create the container's interface, which shouldn't be too bad (I think).

@ibuildthecloud
Contributor Author

@tianon Right, using * there was just laziness on my part. To clarify, you only have to set brport/hairpin_mode to 1 on the interfaces of the containers that need reflective relay. If a container does not publicly expose any ports, you do not need to set it. So it's a case-by-case basis.
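Programmatically, the per-interface version boils down to a one-line sysfs write. A minimal sketch (the helper names are mine, not Docker's; /sys/class/net is the stable alias for the /sys/devices/virtual/net paths above; actually writing requires root and an interface enslaved to a bridge):

```python
def hairpin_path(ifname: str) -> str:
    """Sysfs knob controlling hairpin (reflective relay) mode for a bridge port."""
    return f"/sys/class/net/{ifname}/brport/hairpin_mode"

def set_hairpin(ifname: str, enabled: bool = True) -> None:
    """Toggle reflective relay for one container's host-side veth interface."""
    with open(hairpin_path(ifname), "w") as f:
        f.write("1" if enabled else "0")
```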

@mpetazzoni

Any progress on this issue? Support for hairpin NAT without going through the Docker daemon in userspace would help a lot, as the proxy reduces network throughput between containers on the same host (while making Docker completely eat up a CPU core).

icecrime pushed a commit to icecrime/libcontainer that referenced this pull request Feb 9, 2015
This is to support being able to DNAT/MASQ traffic from a container back into itself (moby/moby#4442)

Docker-DCO-1.1-Signed-off-by: Patrick Hemmer <[email protected]> (github: phemmer)