libnet/d/bridge: port mappings: filter by input iface#48721
libnet/d/bridge: port mappings: filter by input iface#48721akerouanton merged 1 commit intomoby:masterfrom
Conversation
52fa34f to
1a4e9ef
Compare
1a4e9ef to
519ef77
Compare
robmry
left a comment
There was a problem hiding this comment.
As discussed - maybe worth clarifying in the commit message that the attacker needs to share an L2 segment / subnet with one of the docker host's interfaces (and it's the attacked interface that's on a different segment).
| stdOut := strings.TrimSpace(actualStdout.String()) | ||
| assert.Assert(t, !strings.Contains(stdOut, "foobar")) | ||
| } | ||
|
|
There was a problem hiding this comment.
It'd probably be worth adding a test that's expected to succeed? (By sending to the "eth-test" addresses I think?)
That'd help to confirm the packet's not getting through in the other tests because of the new rule - rather than the container log read being too quick, or some other test problem?
There was a problem hiding this comment.
Maybe also run the tests twice, with the escape hatch env-var set and unset?
There was a problem hiding this comment.
The tests now run with and without the escape hatch, and all test cases also run the 'happy case' to make sure the test is working as expected.
|
|
||
| // WithIPv6 Enables IPv6 on the network | ||
| func WithIPv6() func(*network.CreateOptions) { | ||
| func WithIPv6(enable bool) func(*network.CreateOptions) { |
There was a problem hiding this comment.
I'm slightly on the fence on adding the argument (but could use input from others); perhaps a WithDisableIPv6() could make it easier to find places where we disable, and we could keep the WithIPv6 as-is.
Also curious what is the default if not set? (we don't have an option to reset, correct?)
There was a problem hiding this comment.
I added WithDisableIPv6() and removed the commit that was changing WithIPv6() signature.
519ef77 to
402268c
Compare
402268c to
50c391b
Compare
robmry
left a comment
There was a problem hiding this comment.
LGTM - except the new iptables doc needs to be manually added to integration/network/bridge/iptablesdoc/index.md.
| if v, _ := strconv.ParseBool(os.Getenv("DOCKER_DISABLE_INPUT_IFACE_FILTERING")); v { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Some questions about this env-var;
- Do we consider this one to be permanent or temporary ?
- Should we have a log entry to indicate this option was set?
- If we consider it temporary, have a warning added to
docker info? (check if it's set, and if so, add a warning to the list of warnings returned)?
There was a problem hiding this comment.
Do we consider this one to be permanent or temporary ?
That's a temporary escape hatch in case the new filtering does break someone, somewhere.
Should we have a log entry to indicate this option was set?
I added an info log to that code path. Now, whenever users create a container with NAT-based port-mappings, they'll get this logged:
DOCKER_DISABLE_INPUT_IFACE_FILTERING is set, skipping input iface filtering
If we consider it temporary, have a warning added to docker info ? (check if it's set, and if so, add a warning to the list of warnings returned)?
I added these two warnings to make it clear that it's insecure, and that it'll be removed at a later time.
WARNING: input interface filtering is disabled on port mappings, this might be insecure
DEPRECATED: DOCKER_DISABLE_INPUT_IFACE_FILTERING is deprecated and will be removed in a future release
50c391b to
2526269
Compare
| // DOCKER_DISABLE_INPUT_IFACE_FILTERING can be used as an escape hatch if | ||
| // this filtering doesn't work out well for some users. | ||
| if v, _ := strconv.ParseBool(os.Getenv("DOCKER_DISABLE_INPUT_IFACE_FILTERING")); v { | ||
| log.G(context.TODO()).Info("DOCKER_DISABLE_INPUT_IFACE_FILTERING is set, skipping input iface filtering.") |
There was a problem hiding this comment.
Thanks! I'd even be ok with Warn for this if we consider it "bad", but Info works for me.
| libprotobuf-c1 \ | ||
| libyajl2 \ | ||
| net-tools \ | ||
| netcat-openbsd \ |
There was a problem hiding this comment.
Is this only for our tests, or is this a new dependency we should have installed (and to be added as "requires" in our packages?)
I can't find a direct reference to this in this PR; was it accidentally left behind (for local debugging?)
There was a problem hiding this comment.
That's only for testing. Here's where it's used: https://github.com/moby/moby/pull/48721/files#diff-4fd76a7e4270f16a03d5160517ee2454558f127af10e95c55451963b17429414R817
| // not reach the container (ie. the container returns an ICMP | ||
| // 'Port Unreachable' message). | ||
| time.Sleep(50 * time.Millisecond) | ||
| icmd.RunCommand("/bin/sh", "-c", fmt.Sprintf("echo foobar | nc -w1 -u %s %s", hostAddr, hostPort)).Assert(t, icmd.Success) |
There was a problem hiding this comment.
OK 🙈 I went looking for netcat not nc; @akerouanton pointed me it's used here, and only needed for testing, so no packaging changes required.
| // TODO(aker): we need to figure out what we want to do for rootlesskit. | ||
| skip.If(t, testEnv.IsRootless, "rootlesskit has its own netns") |
There was a problem hiding this comment.
@AkihiroSuda I'm wondering what we should do for rootless mode.
On one hand, accepting packets with a given dest addr from any interface is just how Linux sockets work. So this issue also exists with rootless mode I think.
OTOH, this proved to be surprising for many users. The biggest issue is for loopback addresses, ie. in non-rootless mode we'd happily forward packets with a dest addr in 127.0.0.0/8 and coming from any interface. Rootless mode shouldn't be susceptible to this security issue (because the iptables rules that cause this issue don't live in the host netns).
I'm wondering if rootless / non-rootless modes should diverge from each other (one protecting, and the other not protecting against that issue). Or if we should diverge from Linux sockets' semantics in both cases. WDYT?
There was a problem hiding this comment.
Could you explain what is the attack scenario?
Rootless mode has its own 127.0.0.0/8 (until #47103 is merged) that is isolated from the host-side 127.0.0.0/8
There was a problem hiding this comment.
There are two different attack scenarios.
First one involves ports bound to one of the loopback address. In that case, the attacker could send packets with the loopback address set as the daddr, and the target Docker host would accept them. That's not a problem for rootless mode because the port mapping is proxied by rootlesskit, and thus packets are correctly filtered by the kernel by default.
The second attack scenario involves ports mapped to a specific subnet, and an attacker part of another subnet. In that case, the attacker could send packets destined to the IP address where ports are mapped, without being part of that subnet, and the victim Docker host would accept them too. Whether it's a security issue is debatable because that's just how networking works on Linux. However, it can be fixed relatively easily in rootless mode through an iptables rule, but that's harder for a rootful Engine as you need to insert rule(s) in chains managed by Docker -- so we considered this as a vulnerability too.
But now with the current fix, that means rootful docker will be protected against both attack scenarios, and rootless mode will be affected by one.
So either rootful mode diverges from default kernel behavior, introducing a discrepancy between rootless and rootful mode, but keeping users at bay.
Or we could consider the 2nd scenario as not being a security issue, but then users are on their own to prevent such access patterns - and that means manually setting up iptables rules, ensuring these rules are always correctly positioned in the chains, etc...
2526269 to
0b39171
Compare
| // For instance, if HostIP == 127.0.0.1, no ingress should come from anything | ||
| // but lo. | ||
| func (n *bridgeNetwork) filterPortByInputIface(b portBinding, enable bool) error { | ||
| hostIP := b.childHostIP |
There was a problem hiding this comment.
nit (no need to change unless other changes are needed); this could've been put closer to where it's used (also because it may not be used at all due to early returns)
| d := newDriver() | ||
| d.nlh, err = nlwrap.NewHandle() |
There was a problem hiding this comment.
qNot for this PR, but I noticed the weird newDriver() / driver.configure() design, which is a bit .. odd, as it looks like driver.configure() is the actual constructor, and newDriver() is mostly a useless thing; (almost?) all locations where it's used, look like;
d := newDriver()
if err := d.configure(options); err != nil {
....
}We should consider making newDriver accept the options (if my interpretation of the flow is correct); looks like this was the commit introducing some of it; d565a4d, although it was there in some form before that.
There was a problem hiding this comment.
This was addressed in #49158, but I now realize that d.nlh should probably be initialized in newDriver too. I'll open another PR to fix that.
There was a problem hiding this comment.
oh; this is NewHandle() not NlHandle(), so not same I guess
0b39171 to
e6af2fc
Compare
e6af2fc to
08b696f
Compare
When a NAT-based port mapping is created with a HostIP specified, we insert a DNAT rule in nat-DOCKER to replace the dest addr with the container IP. Then, in filter chains, we allow access to the container port for any packet not coming from the container's network itself (if hairpinning is disabled), nor from another host bridge. However we don't set any rule that prevents a rogue neighbor that shares a L2 segment with the host, but not the one where the port binding is expected to be published, from sending packets destined to that HostIP. For instance, if a port binding is created with HostIP == '127.0.0.1', this port should not be accessible from anything but the lo interface. That's currently not the case and this provides a false sense of security. Since nat-DOCKER mangles the dest addr, and the nat table rejects DROP rules, this change adds rules into raw-PREROUTING to filter ingress packets destined to mapped ports based on the input interface, the dest addr and the dest port. Interfaces are dynamically resolved when packets hit the host, thanks to iptables' addrtype extension. This extension does a fib lookup of the dest addr and checks that it's associated with the interface reached. Also, when a proxy-based port mapping is created, as is the case when an IPv6 HostIP is specified but the container is only IPv4-capable, we don't set any sort of filtering. So the same issue might happen. The reason is a bit different - in that case, that's just how the kernel works. But, in order to stay consistent with NAT-based mappings, these rules are also applied. The env var `DOCKER_DISABLE_INPUT_IFACE_FILTERING` can be set to any true-ish value to globally disable this behavior. Signed-off-by: Albin Kerouanton <[email protected]>
08b696f to
433b1f9
Compare
|
Last force-pushes were only rebase, so let me merge it. |
|
This is excellent. Thanks @akerouanton and everyone else who worked on this. |
Fixes Publishing ports explicitly to private networks should not be accessible from LAN hosts #45610- What I did
When a NAT-based port mapping is created with a HostIP specified, we insert a DNAT rule in nat DOCKER to replace the dest addr with the container IP. Then, in filter chains, we allow access to the container port for any packet not coming from the container's network itself (if hairpinning is disabled), nor from another host bridge.
However we don't set any rule that prevent a rogue neighbor located in another L2 segment / subnet from sending packets destined to that HostIP.
For instance, if a port binding is created with HostIP == '127.0.0.1', this port should not be accessible from anything but the lo interface. That's currently not the case and this provides a false sense of security.
Since nat-DOCKER mangles the dest addr, and the nat table rejects DROP rules, this change adds rules into raw-PREROUTING to filter ingress packets destined to mapped ports based on the input interface, the dest addr and the dest port.
Interfaces are dynamically resolved when packets hit the host, thanks to iptables' addrtype extension. This extension does a fib lookup of the dest addr and checks that it's associated with the interface reached.
Also, when a proxy-based port mapping is created, as is the case when an IPv6 HostIP is specified but the container is only IPv4-capable, we don't set any sort of filtering. So the same issue might happen. The reason is a bit different - in that case, that's just how the kernel works. But, in order to stay consistent with NAT-based mappings, these rules are also applied.
The env var
DOCKER_DISABLE_INPUT_IFACE_FILTERINGcan be set to any true-ish value to globally disable this behavior.- How to verify it
A new regression / integration test
TestAccessPublishedPortFromNonMatchingIfacechecks that remote access is correctly disallowed for ports published on loopback and routable addresses.- Description for the changelog