cmd/docker-proxy: re-add SO_REUSEADDR #48567
Conversation
robmry
left a comment
LGTM - just some ignorable suggestions ...
```go
container.WithName(sanitizeCtrName(t.Name()+"-server")),
container.WithExposedPorts("80/tcp"),
container.WithPortMap(nat.PortMap{"80/tcp": {{HostPort: "1780"}}}),
container.WithCmd("httpd", "-f"),
container.WithNetworkMode("nat-time-wait"))
```
Could make a []func(*TestContainerConfig) of these options and use it in both Run calls - just to make it more obvious that the networks are the same, and in case something needs to change one day?
Force-pushed d764f48 to d2aa364

Force-pushed 729769a to 2badf35
This one ready for review?

@thaJeztah Yes! I was waiting for the CI to be green.
```go
assert.NilError(t, c.ContainerRemove(ctx, serverID, containertypes.RemoveOptions{Force: true}))
```

```go
serverID = container.Run(ctx, t, c, ctrOpts...)
defer c.ContainerRemove(ctx, serverID, containertypes.RemoveOptions{Force: true})
```
This one doesn't have to be a defer;
```diff
-defer c.ContainerRemove(ctx, serverID, containertypes.RemoveOptions{Force: true})
+assert.NilError(t, c.ContainerRemove(ctx, serverID, containertypes.RemoveOptions{Force: true}))
```
Right, actually the call to ContainerRemove doesn't even need to be made here. At this point, the 1st defer will take care of deleting that container (if the container name is passed).
```go
httpClient := &http.Client{Timeout: 3 * time.Second}
resp, err := httpClient.Get("http://127.0.0.1:1780")
assert.NilError(t, err)
assert.Check(t, is.Equal(resp.StatusCode, 404))
```
I was actually curious if this would pass if no container was created, but looks like that gives a dial tcp 127.0.0.1:1780: connect: connection refused
Regardless; wondering if we should have something that's more clearly "it works", because I had trouble understanding the test here, and thought 404 was meant to indicate "should not return a result".
I guess we could either add some file to serve, or use some file that's expected to be present (e.g.) http://127.0.0.1:1780/etc/resolv.conf (but maybe that's confusing).
> I guess we could either add some file to serve, or use some file that's expected to be present (e.g.) http://127.0.0.1:1780/etc/resolv.conf (but maybe that's confusing).
There are other tests doing exactly that in this suite. Generally speaking, we don't care at all about what's happening in L7. It's just an easy way to realistically test L4 (full TCP connection lifecycle, exchange a message back-and-forth, etc...).
I don't think we need to make things more complicated, like serving a real file, etc... That wouldn't add any value. But I added an inline comment stating that we don't care about the 404 response itself, and that we're just looking for a full-duplex TCP connection that we know is working correctly.
Yeah, I'm fine with keeping the 404 for now. Perhaps we could find some other minimal approach. My main train of thought here was that usually a 404 means "something went wrong!" which could be confusing (at least it initially confused me).
nit: Actually wondering now if the response needs to be checked at all for this; i.e., the err returned will be non-nil when failing to connect.
Right, we don't need to check the status code. I'll open a follow-up PR to fix the few tests that use this pattern.
Since commit b3fabed, the Engine creates the listening sockets used by docker-proxy by making raw syscalls (ie. socket, setsockopt, bind). Before that commit, those sockets were created by docker-proxy through Go's `net.ListenX` functions.

Unlike `net.ListenX` functions, the raw syscall code doesn't set the `SO_REUSEADDR` option. This option is typically used by TCP servers to make sure that they can be restarted even if there are client sockets referencing the server port as their sport (eg. in TIME_WAIT state, or any other state).

Citing UNIX Network Programming, Section 7.5 (p210):

> By default, when the listening server is restarted by calling socket,
> bind, and listen, the call to bind fails because the listening server
> is trying to bind a port that is part of an existing connection [...]
> _All_ TCP servers should specify this socket option to allow the
> server to be restarted in this situation.

Signed-off-by: Albin Kerouanton <[email protected]>
Force-pushed 2badf35 to 77e5165
Related to:
- What I did
Since commit b3fabed, the Engine creates the listening sockets used by docker-proxy by making raw syscalls (ie. socket, setsockopt, bind). Before that commit, those sockets were created by docker-proxy through Go's `net.ListenX` functions.

Unlike `net.ListenX` functions, the raw syscall code doesn't set the `SO_REUSEADDR` option. This option is typically used by TCP servers to make sure that they can be restarted even if there are client sockets referencing that sport (eg. in TIME_WAIT state, or any other state).

Citing UNIX Network Programming, Section 7.5 (p210):
- How to verify it
The new integration test `TestRestartUserlandProxyUnder2MSL` should be enough.

- Description for the changelog
- Fix a bug that prevented a container exposing a TCP port on the host from being restarted if that port had been accessed by another container (or from the host) shortly before.