Only enable bridge netfiltering when needed#48492
Conversation
09119a0 to
0677efa
Compare
| if out, err := exec.Command("modprobe", "-va", "bridge", "br_netfilter").CombinedOutput(); err != nil { | ||
| log.G(context.TODO()).WithError(err).Warnf("Running modprobe bridge br_netfilter failed with message: %s", out) |
There was a problem hiding this comment.
Wondering if we should start looking at plumbing through context to these networking options (both for OTEL, as well as potentially context-cancellation (e.g. this could use exec.CommandContext).
Definitely not for this PR, but curious if it would make sense to start looking at.
There was a problem hiding this comment.
Yes, it would make sense.
| if _, err := os.Stat("/proc/sys/net/bridge"); err != nil { | ||
| if out, err := exec.Command("modprobe", "-va", "bridge", "br_netfilter").CombinedOutput(); err != nil { | ||
| log.G(context.TODO()).WithError(err).Warnf("Running modprobe bridge br_netfilter failed with message: %s", out) | ||
| return fmt.Errorf("modprobe br_netfilter failed: %w", err) |
There was a problem hiding this comment.
This now both returns and logs the error, whereas previously we would only log it; is there a risk that we're changing behavior?
(Also wondering if the logging should happen on the caller side now that we return these)?
There was a problem hiding this comment.
As it's an error now; perhaps the Warnf should become an Errorf now as well 🤔 (we tend to use "warning" for errors that we can recover from, but as we now no longer recover, it's an Error I guess?
There was a problem hiding this comment.
This now both returns and logs the error, whereas previously we would only log it; is there a risk that we're changing behavior?
If the modprobe failed on startup before, network creation with icc=false would have failed here, with the "cannot restrict inter-container communication" error (just below). I'll add "cannot restrict inter-container communication" to the modprobe error.
So, I don't think there's any change in behaviour, apart from the text in the error message.
(Also wondering if the logging should happen on the caller side now that we return these)?
I think that'd mean duplicating the log lines, would it be an improvement? We could just get rid of the log line, but it might still provide a useful clue for something.
As it's an error now; perhaps the Warnf should become an Errorf now as well 🤔 (we tend to use "warning" for errors that we can recover from, but as we now no longer recover, it's an Error I guess?
I'll change it to an Errorf. (If the modprobe fails, it now means an network create "icc=false" didn't work - before it meant a future network create with "icc=false" would fail if attempted. Recovery is the same as before, just reject the network creation. The rest of the daemon wouldn't be affected in either case.)
Kernel module br_netfilter is loaded when the daemon starts with either iptables or ip6tables enabled. That automatically sets: net.bridge.bridge-nf-call-arptables = 1 net.bridge.bridge-nf-call-ip6tables = 1 net.bridge.bridge-nf-call-iptables = 1 So, when: - docker was running happily with iptables=false, and - no explicit ip6tables=false, and - br_netfilter was not loaded ... the change in moby 27.0 to enable ip6tables by default, resulted in net.bridge.bridge-nf-call-iptables being enabled. If the host also had a firewall with default-drop on its forward chain - that resulted in packets getting dropped between containers on a bridge network. So, only try to load br_netfilter when it's needed - it's only needed to implement "--icc=false", which can only be used when iptables or ip6tables is enabled. Signed-off-by: Rob Murray <[email protected]>
0677efa to
db25b0d
Compare
Docker versions prior to 27.3.1 automatically loaded the br_netfilter kernel module at daemon startup. When loaded, br_netfilter causes bridge traffic to pass through the host's iptables FORWARD chain, where DOCKER-ISOLATION-STAGE-1 drops packets from --internal networks destined for external IPs. This silently breaks membrane's transparent proxy; the agent's TCP connections time out with no useful error. Insert a narrow iptables rule into DOCKER-USER (which runs before DOCKER-ISOLATION-STAGE-1) that accepts forwarded traffic from the membrane internal bridge. The rule is scoped to the per-session bridge interface and is removed on session teardown. If cleanup fails, the orphaned rule references a nonexistent interface and is harmless; Docker rewrites its chains on restart anyway. This fix makes membrane work correctly regardless of Docker version or whether br_netfilter is loaded on the host. Ref: moby/moby#48492 (Docker stopped auto-loading br_netfilter in 27.3.1)
Docker versions prior to 27.3.1 automatically loaded the br_netfilter kernel module at daemon startup. When loaded, br_netfilter causes bridge traffic to pass through the host's iptables FORWARD chain, where DOCKER-ISOLATION-STAGE-1 drops packets from --internal networks destined for external IPs. This silently breaks membrane's transparent proxy; the agent's TCP connections time out with no useful error. Insert a narrow iptables rule into DOCKER-USER (which runs before DOCKER-ISOLATION-STAGE-1) that accepts forwarded traffic from the membrane internal bridge. The rule is scoped to the per-session bridge interface and is removed on session teardown. If cleanup fails, the orphaned rule references a nonexistent interface and is harmless; Docker rewrites its chains on restart anyway. Only inject the rule when br_netfilter is actually loaded, checked via /proc/modules. On hosts where it is not loaded (Docker 27.3.1+ default), membrane works without touching iptables at all. Use sudo for the iptables commands since membrane does not run as root. This fix makes membrane work correctly regardless of Docker version or whether br_netfilter is loaded on the host. Ref: moby/moby#48492 (Docker stopped auto-loading br_netfilter in 27.3.1)
Docker versions prior to 27.3.1 automatically loaded the br_netfilter kernel module at daemon startup. When loaded, br_netfilter causes bridge traffic to pass through the host's iptables FORWARD chain, where DOCKER-ISOLATION-STAGE-1 drops packets from --internal networks destined for external IPs. This silently breaks membrane's transparent proxy; the agent's TCP connections time out with no useful error. Insert a narrow iptables rule into DOCKER-USER (which runs before DOCKER-ISOLATION-STAGE-1) that accepts forwarded traffic from the membrane internal bridge. The rule is scoped to the per-session bridge interface and is removed on session teardown. If cleanup fails, the orphaned rule references a nonexistent interface and is harmless; Docker rewrites its chains on restart anyway. Only inject the rule when br_netfilter is actually loaded, checked via /proc/modules. On hosts where it is not loaded (Docker 27.3.1+ default), membrane works without touching iptables at all. Use sudo for the iptables commands since membrane does not run as root. Tighten the DOCKER-USER rule to also restrict the outbound interface to the external bridge (-o br-<external>), so packets from the internal bridge can only be forwarded toward the handler and not to any other interface on the host. This fix makes membrane work correctly regardless of Docker version or whether br_netfilter is loaded on the host. Ref: moby/moby#48492 (Docker stopped auto-loading br_netfilter in 27.3.1)
Docker versions prior to 27.3.1 automatically loaded the br_netfilter kernel module at daemon startup. When loaded, br_netfilter causes bridge traffic to pass through the host's iptables FORWARD chain, where DOCKER-ISOLATION-STAGE-1 drops packets from --internal networks destined for external IPs. This silently breaks membrane's transparent proxy; the agent's TCP connections time out with no useful error. When br_netfilter is detected in /proc/modules, insert a rule into DOCKER-USER (-i br-<internal> -j ACCEPT) before session start. DOCKER-USER is evaluated before DOCKER-ISOLATION-STAGE-1, so the ACCEPT prevents the DROP from firing. The rule is scoped to the per-session internal bridge interface and is removed on session teardown. If cleanup fails, the orphaned rule references a nonexistent interface and is harmless; Docker rewrites its chains on restart anyway. The rule uses only -i (inbound interface) without an outbound constraint. The agent's packets are dnat'd by the handler's nftables PREROUTING rule before the forwarding decision, rewriting the destination back into the internal subnet, so adding -o would prevent the rule from ever matching. Ref: moby/moby#48492 (Docker stopped auto-loading br_netfilter in 27.3.1)
- What I did
- How I did it
Kernel module br_netfilter is loaded when the daemon starts with either iptables or ip6tables enabled. That automatically sets:
So, when:
... the change in moby 27.0 to enable ip6tables by default, resulted in
net.bridge.bridge-nf-call-iptablesbeing enabled, where it wasn't before.If the host also had a firewall with default-drop on its forward chain - that resulted in packets getting dropped between containers on a bridge network.
So, only try to load br_netfilter when it's needed - it's only needed to implement
--icc=false, which only works wheniptables/ip6tablesare enabled.- How to verify it
br_netfilterloaded)."iptables":false, and"ip6tables"left as default.docker network create br4.br4can communicate. For example:docker run --rm -d --name c1 --network br4 nginx; docker run --rm -ti --network br4 alpine wget -O- http://c1br_netfilterhas been loaded,rmmod br_netfilteror reboot to get rid of it.br_netfilterhasn't been loaded.sysctl -a | grep net.bridge.bridge-nf-call.docker network create --ipv6 -o com.docker.network.bridge.enable_icc=false iccf.br_netfilterhas been loaded.- Description for the changelog