Skip to content

Enable bridge netfiltering if userland-proxy=false#48676

Merged
akerouanton merged 1 commit intomoby:masterfrom
robmry:48664_br_netfilter_noproxy
Oct 17, 2024
Merged

Enable bridge netfiltering if userland-proxy=false#48676
akerouanton merged 1 commit intomoby:masterfrom
robmry:48664_br_netfilter_noproxy

Conversation

@robmry
Copy link
Copy Markdown
Contributor

@robmry robmry commented Oct 16, 2024

- What I did

In release 27.0, ip6tables was enabled by default. That caused a problem on some hosts where iptables was explicitly disabled and loading the br_netfilter module (which loads with its nf-call-iptables settings enabled) caused user-defined iptables rules to block traffic on bridges, breaking inter-container communication.

In 27.3.0, #48511 delayed loading of the br_netfilter module until it was needed. The load now happens in the function that sets bridge-nf-call-ip[6]tables when needed. It was only called for icc=false networks.

However, br_netfilter is also needed when userland-proxy=false. Without it, packets addressed to a host-mapped port for a container on the same network are not DNAT'd properly (responses have the server container's address instead of the host's).

That means, in all releases including 26.x, if br_netfilter was loaded before the daemon started - and the OS/user/other-application had disabled bridge-nf-call-ip[6]tables (which apparently some OSs try to do), it would not be enabled by the daemon. So, ICC would fail for host-mapped ports with the userland-proxy disabled.

The change in 27.3.0 made this worse - previously, loading br_netfilter whenever iptables/ip6tables was enabled meant that bridge-netfiltering got enabled, even though the daemon didn't check it was enabled.

- How I did it

Check that br_netfilter is loaded, with bridge-nf-call-ip[6]tables enabled, if userland-proxy=false.

- How to verify it

New test that disables bridge-nf-call-iptables and the userland proxy, then tries to access another container's mapped port ... not quite the same as unloading br_netfilter - but this test fails with a daemon built from the 26.1 branch (as well as 27.x).

Manually tested on an Ubuntu VM, including with br_netfilter unloaded before starting the daemon.

- Description for the changelog

- Fix an issue that meant published ports from one container on a bridge network were not accessible
  from another container on the same network with `userland-proxy` disabled, if the kernel's `br_netfilter`
  module was not loaded and enabled. The daemon will now attempt to load the module and enable
  `bridge-nf-call-iptables` or `bridge-nf-call-ip6tables` when creating a network with the userland proxy
   disabled.

@robmry robmry self-assigned this Oct 16, 2024
@robmry robmry added this to the 28.0.0 milestone Oct 16, 2024
@robmry robmry marked this pull request as ready for review October 16, 2024 12:55
@robmry robmry requested a review from akerouanton October 16, 2024 12:55

// Find an address on the test host.
hostAddr := func() string {
conn, err := net.Dial("tcp4", "hub.docker.com:80")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this needs to be an actual domain, we should consider using one of the designated ones (could be "example.com" here)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"example.com" is for documentation, I'm not sure using it in a test that expects it to be actually resolvable and listening is really in the spirit of the thing.

An alternative would be to look up the host's addresses using netlink, and pick one. Which is do-able as this is a Linux-only test. Other tests use this Dial trick now ... if they need to change, I could follow-up with a separate PR?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, yeah, I was looking for something not tied to "docker.com" and more generic. Not a big issue.

container.WithCmd("wget", "http://"+hostAddr+":"+hostPort),
)
defer c.ContainerRemove(ctx, res.ContainerID, containertypes.RemoveOptions{Force: true})
assert.Check(t, is.Contains(res.Stderr.String(), "404 Not Found"))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would -NS work with busybox as well, considering if the 404 Not Found would always have that format, and thus if we could somehow just get the status code only; https://serverfault.com/questions/1088710/how-to-use-wget-http-status-codes-in-an-if-else-statement-to-take-an-action-base

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

busybox's wget doesn't have a -N. Its -S shows HTTP response headers, so ...

/ # wget http://google.com/doesnotexist
Connecting to google.com (142.250.200.46:80)
wget: server returned error: HTTP/1.1 404 Not Found

Compared to ...

/ # wget -S http://google.com/doesnotexist
Connecting to google.com (142.250.200.46:80)
  HTTP/1.1 404 Not Found
wget: server returned error: HTTP/1.1 404 Not Found

So, without the -S the error message is just the HTTP first line. With -S, that first line is just shown twice. It's a standard format, very unlikely to change.

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, yeah, I was wondering if the output could contain a server response, which could be anything "different".

using -q (quiet) seems to work if we want to reduce additional output, but I guess it's fine;

/ #  wget -q https://google.com/doesnotexist
wget: server returned error: HTTP/1.1 404 Not Found

In release 27.0, ip6tables was enabled by default. That caused a
problem on some hosts where iptables was explicitly disabled and
loading the br_netfilter module (which loads with its nf-call-iptables
settings enabled) caused user-defined iptables rules to block traffic
on bridges, breaking inter-container communication.

In 27.3.0, commit 5c499fc delayed
loading of the br_netfilter module until it was needed. The load
now happens in the function that sets bridge-nf-call-ip[6]tables when
needed. It was only called for icc=false networks.

However, br_netfilter is also needed when userland-proxy=false.
Without it, packets addressed to a host-mapped port for a container
on the same network are not DNAT'd properly (responses have the server
container's address instead of the host's).

That means, in all releases including 26.x, if br_netfilter was loaded
before the daemon started - and the OS/user/other-application had
disabled bridge-nf-call-ip[6]tables, it would not be enabled by the
daemon. So, ICC would fail for host-mapped ports with the userland-proxy
disabled.

The change in 27.3.0 made this worse - previously, loading br_netfilter
whenever iptables/ip6tables was enabled meant that bridge-netfiltering
got enabled, even though the daemon didn't check it was enabled.

So... check that br_netfilter is loaded, with bridge-nf-call-ip[6]tables
enabled, if userland-proxy=false.

Signed-off-by: Rob Murray <[email protected]>
@robmry robmry force-pushed the 48664_br_netfilter_noproxy branch from 72a1bfe to 0548fe2 Compare October 17, 2024 11:33
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@ledlamp
Copy link
Copy Markdown

ledlamp commented Jan 9, 2025

However, br_netfilter is also needed when userland-proxy=false. Without it, packets addressed to a host-mapped port for a container on the same network are not DNAT'd properly (responses have the server container's address instead of the host's).

Why do you need bridge filtering? All you need is SNAT.

iptables -t nat -A POSTROUTING -s 172.17.0.0/16 -d 172.17.0.0/16 -j MASQUERADE

Edit: I found the reason. With br_netfilter, the source IP is preserved (i.e. 172.17.0.3) and you can tell which container it's from. With MASQUERADE, the source IP will be the host's IP (i.e. 172.17.0.1). But this is the same case for the userland-proxies. So I think iptables MASQUERADE should be used as fallback instead of userland proxies. Userland proxy should only be used as last resort if iptables cannot be used (rarely the case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docker reports "WARNING: bridge-nf-call-iptables/ip6tables is disabled" at startup after upgrading to 27.3.1

4 participants