Do not DNAT packets from WSL2's loopback0#48075
Conversation
| func mirroredWSL2() bool { | ||
| if _, err := netlink.LinkByName("loopback0"); err != nil { | ||
| if !errors.As(err, &netlink.LinkNotFoundError{}) { | ||
| log.G(context.TODO()).Warnf("Failed to check for WSL interface: %v", err) |
There was a problem hiding this comment.
Minor nit; can you use WithError() or perhaps even WithFields() in case we want to log the name of the interface we're looking for?
| log.G(context.TODO()).Warnf("Failed to check for WSL interface: %v", err) | |
| log.G(context.TODO()).WithError(err).Warn("Failed to check for WSL interface") |
| log.G(context.TODO()).Warnf("Failed to check for WSL interface: %v", err) | |
| log.G(context.TODO()).WithFields(log.Fields{"error": err, "interface": "loopback0"}).Warn("Failed to check for WSL interface") |
There was a problem hiding this comment.
Oops, yes - I've split out the err, thank you.
I didn't feel the need to report the loopback0 interface name ... it might be in the netlink message anyway, but the idea was just to log a failed netlink call (having ignored LinkNotFoundError).
| return false | ||
| } | ||
| output, err := exec.Command(wslinfoPath, "--networking-mode", "-n").CombinedOutput() | ||
| log.G(context.TODO()).Debugf("wslinfo --networking-mode:%s err:%v", string(output), err) |
There was a problem hiding this comment.
Same here, at least for the error, but perhaps output could also be a field.
| ipv: iptables.IPv4, | ||
| table: iptables.Nat, | ||
| chain: DockerChain, | ||
| args: []string{"-i", "loopback0", "-d", "127.0.0.0/8", "-j", "RETURN"}, |
There was a problem hiding this comment.
As (even more with my suggestions), the magic loopback0 interface will be used more now, I'm wondering it would make sense to define a const for it as well; at least it would allow documenting "what's this magic loopback0?
(downside is that it's less grep'able for loopback0 in the code, although the const can still be a good starting point I guess)
So, yeah, not a "strong" opinion one way of another 😅
There was a problem hiding this comment.
I tried it with a const name ... but I don't think it made things any clearer, so put it back.
The gigantic comment just above explains what it is.
| func mirroredWSL2Workaround(ipv iptables.IPVersion) error { | ||
| // WSL2 does not (currently) support Windows<->Linux communication via ::1. |
There was a problem hiding this comment.
I notice that we don't check if wslinfoPath (/usr/bin/wslinfo) exists, which (I think) is the way to detect if we're actually running on WSL2?
- Should we have a
detectWSL2()utility to verify if we're running on WSL2, as this workaround is not for other situations - Perhaps that check should be a
sync.OnceorsyncOnceValueif we only need to check if once (but may not be needed if we only run this code once 😂)
For testing, maybe we need something to override the detection 🤔 (not sure?)
There was a problem hiding this comment.
I notice that we don't check if
wslinfoPath(/usr/bin/wslinfo) exists
There's no need to: the exec.Command call would fail with ENOENT if the path does not exist.
There was a problem hiding this comment.
Indeed - but it's gone now anyway.
corhere
left a comment
There was a problem hiding this comment.
Running /usr/bin/wslinfo as root is not ideal, especially on non-WSL environments. Some ideas:
- Run wslinfo as an unprivileged user. But which user?
- Sandbox the wslinfo process in a container
- More sanity checks to limit the exposure to non-WSL users:
- Test that
/run/WSLexists and is a directory - Test that both
/usr/binand/usr/bin/wslinfoare owned by root and not world-writable?
- Test that
| // mirroredWSL2 returns true if the host Linux appears to be running under | ||
| // Windows WSL2 with mirrored mode networking. If a loopback0 device exists, it | ||
| // checks that /usr/bin/wslinfo is executable and reports mirrored networking. | ||
| func mirroredWSL2() bool { |
There was a problem hiding this comment.
Should we use sync.Once and cache the answer? Can the networking mode change dynamically?
There was a problem hiding this comment.
This only happens once, when the DOCKER chain etc are set up.
As far as I can tell, the mode can't change dynamically ... I guess it's a bit of a major change under Linux's feet. New config seems to take effect on a guest-Linux restart.
There was a problem hiding this comment.
FWIW: WSL cannot change network modes dynamically - it's set when we start the VM. (as it has impacts beyond just the Linux configuration, but also the host vswitch and tcpip stack configuration).
Yes - I think we could just use the presence of loopback0 and wslinfo to infer that it's wsl2 and mirrored networking. If we're wrong, which seems unlikely (unless it's deliberate) ... we create the rule to skip DNAT for packets from that interface addressed to 127.0.0.0/8 - there won't be any but, even if there were, they'd only be DNAT'd if they were for a mapped port ... in which case they don't want DNAT either. (But I've asked the Microsoft folks if there's an API equivalent anyway.) |
That's what I'm worried about: some attacker tricking dockerd into executing attacker-controlled code as root. |
Yes, agreed (that's why I raised the issue on our call). I'm suggesting we don't run it ... just see if it and loopback0 are there. There aren't really any consequences if we infer wrongly. |
|
Let me post the $ strace -ff wslinfo --networking-mode
execve("/usr/bin/wslinfo", ["wslinfo", "--networking-mode"], 0x7fff40ffd070 /* 36 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x40cc60) = 0
set_tid_address(0x40cc00) = 6710
gettid() = 6710
gettid() = 6710
gettid() = 6710
gettid() = 6710
brk(NULL) = 0x1b8e000
brk(0x1b90000) = 0x1b90000
mmap(0x1b8e000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1b8e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a11000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a10000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a0f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a0e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a0d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a0c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a0b000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a0a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a09000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a07000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f62e9a05000
sched_getaffinity(0, 128, [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]) = 32
getpid() = 6710
getpid() = 6710
uname({sysname="Linux", nodename="catra", ...}) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/run/WSL/1_interop"}, 110) = 0
write(3, "\32\0\0\0\10\0\0\0", 8) = 8
poll([{fd=3, events=POLLIN}], 1, 10000) = 1 ([{fd=3, revents=POLLIN|POLLHUP}])
read(3, "\1\0\0\0", 4) = 4
close(3) = 0
ioctl(1, TIOCGWINSZ, {ws_row=24, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
writev(1, [{iov_base="nat", iov_len=3}, {iov_base="\n", iov_len=1}], 2nat
) = 4
munmap(0x7f62e9a05000, 8192) = 0
munmap(0x7f62e9a07000, 8192) = 0
munmap(0x7f62e9a09000, 4096) = 0
munmap(0x7f62e9a0b000, 4096) = 0
munmap(0x7f62e9a0c000, 4096) = 0
munmap(0x7f62e9a0d000, 4096) = 0
munmap(0x7f62e9a0e000, 4096) = 0
munmap(0x7f62e9a0f000, 4096) = 0
munmap(0x7f62e9a10000, 4096) = 0
munmap(0x7f62e9a11000, 4096) = 0
exit_group(0) = ?
+++ exited with 0 +++Only reference I found to that on GitHub was this; https://github.com/nullpo-head/wsl-distrod/blob/c3181f4e49566a1d92988b71487716f0dceddd1e/distrod/distrod/tests/test_runner.sh#L133-L134 |
37e8f55 to
1cd6cb3
Compare
| _, _, _, _, err := setupIPChains(configuration{EnableIPTables: true}, iptables.IPv4) | ||
| assert.NilError(t, err) | ||
| assert.Check(t, mirroredWSL2Rule().Exists() == tc.expLoopback0Rule, | ||
| fmt.Sprintf("Expected exists=%v", tc.expLoopback0Rule)) |
There was a problem hiding this comment.
Everyone forgets that assert.Check takes a format string and arguments.
| fmt.Sprintf("Expected exists=%v", tc.expLoopback0Rule)) | |
| "Expected exists=%v", tc.expLoopback0Rule) |
There was a problem hiding this comment.
Ah! I didn't spot it this time; good catch.
Using is.Equal(mirroredWSL2Rule().Exists(), tc.expLoopback0Rule) will also do all of that; it prints the variable names in failure, so if those are descriptive, it will be printing something like;
foo_test.go:35: assertion failed: false (bool) != true (tc.expLoopback0Rule bool)
There was a problem hiding this comment.
Swapped for is.Equal.
| if stat, err := os.Stat(wslinfoPath); err == nil { | ||
| return stat.Mode().IsRegular() && (stat.Mode().Perm()&0111) != 0 | ||
| } | ||
| return false |
There was a problem hiding this comment.
Nit: the hit to readability doesn't seem worth saving one line vertically here. https://go.dev/wiki/CodeReviewComments#indent-error-flow
| if stat, err := os.Stat(wslinfoPath); err == nil { | |
| return stat.Mode().IsRegular() && (stat.Mode().Perm()&0111) != 0 | |
| } | |
| return false | |
| stat, err := os.Stat(wslinfoPath) | |
| if err != nil { | |
| return false | |
| } | |
| return stat.Mode().IsRegular() && (stat.Mode().Perm()&0111) != 0 |
| // Avoid overwriting a real "/usr/bin/wslinfo" (and clashing with a real | ||
| // loopback0). | ||
| if _, err := os.Stat(wslinfoPath); err == nil { | ||
| t.Skip("Skipping test because " + wslinfoPath + " exists") | ||
| } |
There was a problem hiding this comment.
Sounds like a job for namespaces! We don't have to worry about clashing with a real loopback0 because the test cases are invoked in an unshared network namespace thanks to netnsutils.SetupTestOSContext(). While we don't have a ready-made test utility for unsharing a mount namespace and setting things up such that parts of the host filesystem can be mutated such that the mutations are only visible to the test, we do have all the building blocks. The general idea is to over-mount a path with an overlayfs with the same path as the lowerdir, and a tmpfs path as the upper:
root@56e9443a7364:~# mkdir /tmp/upper /tmp/work
root@56e9443a7364:~# unshare --mount-proc --propagation slave bash
root@56e9443a7364:~# mount -t overlay -olowerdir=/usr/bin,upperdir=/tmp/upper,workdir=/tmp/work overlay /usr/bin
root@56e9443a7364:~# rm /usr/bin/ls
root@56e9443a7364:~# ls -ld
bash: ls: command not found
root@56e9443a7364:~# exit
exit
root@56e9443a7364:~# ls -ld
drwx------ 1 root root 4096 Jun 28 23:05 ....it may be too much to tack onto this PR, though. Save for a follow-up?
There was a problem hiding this comment.
I forgot to delete that check when I changed wslinfoPath from const to var so that the test could modify it to avoid the clash that way instead. It's gone now.
1cd6cb3 to
708b1e6
Compare
|
This PR fixes the issue listed in microsoft/WSL#10494 for the loopback interface only, sadly. Running The |
Hi @dannyhpy - this seems to be unrelated to this I took a look, and could repro ... Packets arrive on the WSL2 guest's That's because the dest MAC address in response packets belongs to the docker network's bridge (the container's default gateway). The WSL2 guest has that IP address itself, on its own I don't have an idea for a workaround right now. Maybe somehow forcing use of |
|
Hi @robmry, Thank you for your very detailled explanations. I originally thought this issue was similar to the loopback0 issue and wrongly assumed it could be solved using the same workaround. My apologies for my off-topic comment.
I'm afraid I won't be as capable as you are to properly describe the issue. |
|
Oh, no problem - thank you for raising the issue ... I've created #48136. |
|
Hope this pull request can be merged promptly, as I have been troubled by this issue for quite some time. |
|
will mirroredWSL2Rule() still be called if userland-proxy=false ? |
Hi @shigenobuokamoto - I think we could do that ... how will it help? Could you describe the use-case? |
|
@robmry thank you so much. WSL 2.3.11 adds some improvements to communicating with Windows hosts. |
|
Ah, excellent - thank you! |
708b1e6 to
9600e1a
Compare
I've added the exception for running without userland-proxy. In testing it, I found with the latest WSL2 pre-release (2.3.17 but, presumably, anything >2.3.11) no workaround in docker, and no extra |
|
@robmry, thank you for dealing with this matter. |
| desc string | ||
| loopback0 bool | ||
| userlandProxy bool | ||
| wslinfoExists bool |
There was a problem hiding this comment.
| wslinfoExists bool |
When running WSL2 with mirrored mode networking, add an iptables rule to skip DNAT for packets arriving on interface loopback0 that are addressed to a localhost address - they're from the Windows host. Signed-off-by: Rob Murray <[email protected]>
9600e1a to
f9c0103
Compare
keith-horton
left a comment
There was a problem hiding this comment.
I'm not 100% fluent in GO - but this looks good to me! (from the wsl side)
- What I did
When running WSL2 with mirrored mode networking, add an iptables rule to skip DNAT for packets arriving on interface loopback0 that are addressed to a localhost address - they're from the Windows host.
networkingMode=mirroredmakes Docker unable to forward ports microsoft/WSL#10494WSL2's mirrored mode networking is outlined here.
- How I did it
Detect WSL2 mirrored mode by the presence of interface
loopback0, and (inspired by this workaround linked from the WSL ticket)/usr/bin/wslinfo --networking-modereportingmirrored, see wslinfo release note.If needed, create a rule in the nat-DOCKER chain to return early for packets arriving on
loopback0for127.0.0.0/8.There's no IPv6 rule, because WSL2 mirrored mode doesn't support it.
- How to verify it
As described on the ticket, with docker-ce installed in an instance of Linux (Ubuntu) running under WSL2 with
networkingMode=mirrored- run an nginx container with-p 8080:80, check that the Windows host can connect to it viahttp://localhost:8080.Also checked that the new iptables rule is not created unless it's needed.
Access from Linux to a service running on the Windows localhost address worked before and after this change.
(
--userland-proxy=true, the default, is required for this to work.)New unit test, just to check the conditions for adding the rule.
- Description for the changelog