
Add http(s) proxy properties to daemon configuration (carry 42647)#42835

Merged
thaJeztah merged 6 commits into moby:master from thaJeztah:proxy_daemon_config_carry2 on Oct 28, 2021

Conversation

@thaJeztah (Member) commented Sep 9, 2021

⚠️ note that the format in daemon.json changed in a follow-up; #43448

The new format uses a "proxies" key that holds the proxies;

{
  "proxies": {
    "http-proxy": "http-config",
    "https-proxy": "https-config",
    "no-proxy": "no-proxy-config"
  }
}
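For concreteness, a daemon.json using the follow-up format could look like this (the proxy addresses here are illustrative placeholders, not values from this PR):

```json
{
  "proxies": {
    "http-proxy": "http://proxy.example.com:3128",
    "https-proxy": "https://proxy.example.com:3129",
    "no-proxy": "localhost,127.0.0.0/8"
  }
}
```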

carry of #42647
fixes #24758
closes #42647
addresses #40201

This allows configuring the daemon's proxy server through the daemon.json configuration file or command-line flags, in addition to the existing option of configuring it through environment variables.

On Windows, configuring environment variables for a service is more complicated than on Linux, and adding these alternatives to the daemon configuration makes the proxy setup more transparent and easier to use.

The configuration set through command-line flags or through the daemon.json configuration file takes precedence over env-vars in the daemon's environment, which allows the daemon to use a different proxy. If both a command-line flag and the corresponding daemon.json option are set, an error is produced when starting the daemon.

Note that this configuration is not "live reloadable": Go's standard library caches the proxy configuration behind a sync.Once, which means that changing the proxy configuration requires a restart of the daemon (reload / SIGHUP will not update the configuration).

With this patch: (⚠️ #43448 moves these fields into a "proxies" struct within daemon.json)

cat /etc/docker/daemon.json
{
    "http-proxy": "http://proxytest.example.com:80",
    "https-proxy": "https://proxytest.example.com:443"
}

docker pull busybox
Using default tag: latest
Error response from daemon: Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp: lookup proxytest.example.com on 127.0.0.11:53: no such host

docker build .
Sending build context to Docker daemon  89.28MB
Step 1/3 : FROM golang:1.16-alpine AS base
Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp: lookup proxytest.example.com on 127.0.0.11:53: no such host

Integration tests were added to test the behavior:

  • verify that all configuration mechanisms (env-vars, command-line flags,
    daemon.json) are used, in the expected order of precedence.
  • verify that conflicting options produce an error.
  • verify that logs and error messages sanitise proxy URLs (as they may contain a username / password)
make BIND_DIR=. DOCKER_GRAPHDRIVER=vfs TEST_FILTER=TestDaemonProxy binary test-integration
Running integration-test (iteration 1)
Running /go/src/github.com/docker/docker/integration/daemon (amd64.integration.daemon) flags=-test.v -test.timeout=5m  -test.run TestDaemonProxy
=== RUN   TestDaemonProxy
=== RUN   TestDaemonProxy/environment_variables
=== RUN   TestDaemonProxy/command-line_options
=== RUN   TestDaemonProxy/configuration_file
=== RUN   TestDaemonProxy/conflicting_options
=== RUN   TestDaemonProxy/reload_sanitized
--- PASS: TestDaemonProxy (6.75s)
    --- PASS: TestDaemonProxy/environment_variables (1.84s)
    --- PASS: TestDaemonProxy/command-line_options (1.84s)
    --- PASS: TestDaemonProxy/configuration_file (1.93s)
    --- PASS: TestDaemonProxy/conflicting_options (0.52s)
    --- PASS: TestDaemonProxy/reload_sanitized (0.63s)
PASS

DONE 6 tests in 6.942s

Description for the changelog:

- Add options to the `daemon.json` configuration file and `dockerd` command-line
  to configure the daemon's proxy. With these options it is possible to configure
  http(s) proxies for the daemon through the configuration file as an alternative
  to the `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` environment variables, or to
  override the system-wide proxy configuration set in those environment variables.


@thaJeztah (Member, Author)

@tianon @fredericdalleau @cpuguy83 PTAL

}

func TestDaemonProxy(t *testing.T) {
skip.If(t, runtime.GOOS == "windows", "cannot start multiple daemons on windows")
Member Author

If someone knows if we can somehow test this in Windows CI, let me know 😅

@thaJeztah (Member Author) Sep 9, 2021

Looks like we may have to skip this on rootless as well; it's not connecting to the "proxy". Not exactly sure why it would not be supported by rootless though, but perhaps something in how networking is set up there? 🤔

=== RUN   TestDaemonProxy/environment_variables
    daemon_test.go:183: assertion failed:  (received string) != example.org:5000 (string)
    --- FAIL: TestDaemonProxy/environment_variables (0.53s)
time="2021-09-09T12:24:03.244052911Z" level=debug msg="Calling POST /v1.42/images/create?fromImage=example.org%3A5000%2Fsome%2Fimage&tag=latest"
time="2021-09-09T12:24:03.251949489Z" level=debug msg="hostDir: /home/unprivilegeduser/.config/docker/certs.d/example.org:5000"
time="2021-09-09T12:24:03.251995467Z" level=debug msg="Trying to pull example.org:5000/some/image from https://example.org:5000 v2"
time="2021-09-09T12:24:03.252172111Z" level=warning msg="Error getting v2 registry: Get \"https://example.org:5000/v2/\": proxyconnect tcp: dial tcp 127.0.0.1:45999: connect: connection refused"
time="2021-09-09T12:24:03.252193892Z" level=info msg="Attempting next endpoint for pull after error: Get \"https://example.org:5000/v2/\": proxyconnect tcp: dial tcp 127.0.0.1:45999: connect: connection refused"
time="2021-09-09T12:24:03.255022400Z" level=error msg="Handler for POST /v1.42/images/create returned error: Get \"https://example.org:5000/v2/\": proxyconnect tcp: dial tcp 127.0.0.1:45999: connect: connection refused"

Comment thread cmd/dockerd/config.go Outdated
Comment thread integration/daemon/daemon_test.go Outdated

@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from db8edec to cb5cfa0 on September 13, 2021 07:55
Comment thread cmd/dockerd/config.go Outdated
Comment thread cmd/dockerd/config.go Outdated
Comment thread cmd/dockerd/daemon.go
Comment on lines +786 to +797
if p := conf.HTTPProxy; p != "" {
_ = os.Setenv("HTTP_PROXY", p)
_ = os.Setenv("http_proxy", p)
}
if p := conf.HTTPSProxy; p != "" {
_ = os.Setenv("HTTPS_PROXY", p)
_ = os.Setenv("https_proxy", p)
}
if p := conf.NoProxy; p != "" {
_ = os.Setenv("NO_PROXY", p)
_ = os.Setenv("no_proxy", p)
}
Member

I'm probably overthinking this, but we likely want the precedence of these to remain familiar to users, right? CLI flag > environment variable > config file? I'm not sure what the best way to accomplish that would be (nor if it's even worth doing so).

(If we were able to resolve that disparity somehow, we could probably remove the getConfigOrEnv function. 😬)

Member Author

Hmm.. good point.

I recall we had discussions around that when daemon.json was implemented. AFAIK, we currently treat --some-option and configfile.some-option as being at the same level (which is why the daemon produces a "conflicting options" error if both are set). IIRC, the motivation at the time was that --some-option would usually be set in a config file as well (e.g. a systemd unit), and therefore should be at the same level of priority.

I thought we made them both > env-var, but it looks like for e.g. DOCKER_DRIVER we make the env-var take precedence, i.e. the following will use overlay2, not vfs;

DOCKER_DRIVER=overlay2 dockerd --debug --storage-driver=vfs

I thought it would make sense to make the config (or --flag) > env-var to allow overriding any default proxy that's set in the environment, but perhaps that was the wrong idea 🤔

Member Author

^^ discussing the above in the maintainers meeting, we'll keep the current implementation (allow both --flag and daemon.json to override the "system" configuration), but if we find that a proxy env-var is present, we'll print a warning in the daemon logs.

Comment thread daemon/reload.go Outdated
@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch 2 times, most recently from 3bf661f to f948e6e on September 14, 2021 13:49
@thaJeztah
Member Author

Added a warning if we're overriding the "system" value (environment variables);

HTTPS_PROXY="https://user:[email protected]" \
https_proxy="https://user:[email protected]" \
HTTP_PROXY="http://user:[email protected]" \
http_proxy="http://user:[email protected]" \
NO_PROXY="*" \
no_proxy="*" \
dockerd --https-proxy="https://user:[email protected]" --http-proxy="http://user:[email protected]" --no-proxy="hello"

WARN[2021-09-17T14:55:36.554465385Z] overriding existing proxy variable with value from configuration  name=HTTP_PROXY new-value="http://xxxxx:[email protected]" old-value="http://xxxxx:[email protected]"
WARN[2021-09-17T14:55:36.554547154Z] overriding existing proxy variable with value from configuration  name=http_proxy new-value="http://xxxxx:[email protected]" old-value="http://xxxxx:[email protected]"
WARN[2021-09-17T14:55:36.554572109Z] overriding existing proxy variable with value from configuration  name=HTTPS_PROXY new-value="https://xxxxx:[email protected]" old-value="https://xxxxx:[email protected]"
WARN[2021-09-17T14:55:36.554617061Z] overriding existing proxy variable with value from configuration  name=https_proxy new-value="https://xxxxx:[email protected]" old-value="https://xxxxx:[email protected]"
WARN[2021-09-17T14:55:36.554627868Z] overriding existing proxy variable with value from configuration  name=NO_PROXY new-value=hello old-value="*"
WARN[2021-09-17T14:55:36.554683409Z] overriding existing proxy variable with value from configuration  name=no_proxy new-value=hello old-value="*"

@tianon PTAL

@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from f948e6e to 828b449 on September 20, 2021 13:25
@thaJeztah
Member Author

Added a warning if we're overriding the "system" value (environment variables);

Hm... looks like I didn't push those changes? 🤔 😂
Anyway; just did now

@thaJeztah
Member Author

Hmm, somehow I broke the "reload" test. Not sure how.

@tianon (Member) left a comment

Nice, I like it! 👍

@tianon
Member

tianon commented Sep 20, 2021

The reload code paths shouldn't even hit this code, right? (Otherwise it would/should print a warning at most 😕)

@thaJeztah
Member Author

The reload code paths shouldn't even hit this code, right? (Otherwise it would/should print a warning at most 😕)

The reload code should print the active configuration that it loaded. Thinking now if it also does so if nothing changed, or only if a "reloadable" configuration was updated 🤔

@thaJeztah
Member Author

Looks like it's either a race-condition, or something else is happening when reloading the config that causes the test-utility to not read the full content.

Here's a diff between what the test found, and what's in the actual daemon logs for that test daemon (downloaded bundle);

diff --git a/./test_output.log b/./docker.log
index 923aa3d..f39707b 100644
--- a/./test_output.log
+++ b/./docker.log
@@ -213,3 +213,17 @@ time="2021-09-20T13:42:37.511662185Z" level=debug msg="Registering DELETE, /netw
 time="2021-09-20T13:42:37.512786542Z" level=info msg="API listen on /tmp/docker-integration/d76f680f641d9.sock"
 time="2021-09-20T13:42:37.810621126Z" level=debug msg="Calling GET /_ping"
 time="2021-09-20T13:42:37.810994855Z" level=debug msg="Calling GET /info"
+time="2021-09-20T13:42:37.818311878Z" level=info msg="Got signal to reload configuration, reloading from: /etc/docker/daemon.json"
+time="2021-09-20T13:42:37.818446122Z" level=debug msg="Reset Max Concurrent Downloads: 3"
+time="2021-09-20T13:42:37.818465007Z" level=debug msg="Reset Max Concurrent Uploads: 5"
+time="2021-09-20T13:42:37.818474917Z" level=debug msg="Reset Max Download Attempts: 5"
+time="2021-09-20T13:42:37.819029352Z" level=info msg="Reloaded configuration: {\"storage-driver\":\"overlay2\",\"mtu\":1500,\"pidfile\":\"/go/src/github.com/docker/docker/bundles/test-integration/TestDaemonProxy/reload_sanitized/d76f680f641d9/docker.pid\",\"data-root\":\"/go/src/github.com/docker/docker/bundles/test-integration/TestDaemonProxy/reload_sanitized/d76f680f641d9/root\",\"exec-root\":\"/tmp/dxr/d76f680f641d9\",\"group\":\"docker\",\"deprecated-key-path\":\"/etc/docker/key.json\",\"max-concurrent-downloads\":3,\"max-concurrent-uploads\":5,\"max-download-attempts\":5,\"shutdown-timeout\":15,\"debug\":true,\"hosts\":[\"unix:///tmp/docker-integration/d76f680f641d9.sock\"],\"log-level\":\"info\",\"swarm-default-advertise-addr\":\"\",\"swarm-raft-heartbeat-tick\":0,\"swarm-raft-election-tick\":0,\"metrics-addr\":\"\",\"host-gateway-ip\":\"172.18.0.1\",\"log-driver\":\"json-file\",\"ip\":\"0.0.0.0\",\"icc\":true,\"iptables\":true,\"ip-forward\":true,\"ip-masq\":true,\"userland-proxy\":true,\"default-address-pools\":{\"Values\":null},\"network-control-plane-mtu\":1500,\"experimental\":false,\"containerd\":\"/var/run/docker/containerd/containerd.sock\",\"builder\":{\"GC\":{},\"Entitlements\":{}},\"containerd-namespace\":\"d76f680f641d9\",\"containerd-plugin-namespace\":\"d76f680f641d9p\",\"runtimes\":{\"io.containerd.runc.v2\":{\"path\":\"runc\"},\"io.containerd.runtime.v1.linux\":{\"path\":\"runc\"},\"runc\":{\"path\":\"runc\"}},\"default-runtime\":\"runc\",\"seccomp-profile\":\"builtin\",\"default-shm-size\":67108864,\"default-ipc-mode\":\"private\",\"default-cgroupns-mode\":\"private\",\"resolv-conf\":\"/etc/resolv.conf\",\"http-proxy\":\"https://xxxxx:[email protected]\",\"https-proxy\":\"https://xxxxx:[email protected]\",\"no-proxy\":\"example.com\"}"
+time="2021-09-20T13:42:37.819650900Z" level=info msg="Processing signal 'interrupt'"
+time="2021-09-20T13:42:37.819752770Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
+time="2021-09-20T13:42:37.819783138Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
+time="2021-09-20T13:42:37.819821110Z" level=debug msg="found 0 orphan layers"
+time="2021-09-20T13:42:37.820970956Z" level=debug msg="Unix socket /tmp/dxr/d76f680f641d9/libnetwork/86a2f8b505ab.sock doesn't exist. cannot accept client connections"
+time="2021-09-20T13:42:37.821025906Z" level=debug msg="Cleaning up old mountid : start."
+time="2021-09-20T13:42:37.821538602Z" level=debug msg="Cleaning up old mountid : done."
+time="2021-09-20T13:42:37.821950436Z" level=debug msg="Clean shutdown succeeded"
+time="2021-09-20T13:42:37.821968287Z" level=info msg="Daemon shutdown complete"

Comment thread testutil/daemon/daemon.go

// ReadLogFile returns the content of the daemon log file
func (d *Daemon) ReadLogFile() ([]byte, error) {
_ = d.logFile.Sync()
Member Author

YOLO; wondering if it's just not written to disk the moment we try to read it 🤔

Member Author

Looks "better", but still flaky

So, looks like test-integration-flaky (perhaps because of the subtests) runs the test 25 times in total:

cd /Users/sebastiaan/Downloads/bundles/1/test-integration-flaky/TestDaemonProxy/reload_sanitized

ls -l
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d03a7aefa903c/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d12825289bf2b/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d35d0bbf7367a/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d47371778e153/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d4c7283bf1c32/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d59a9befb682b/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d5c38961c3cca/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d5c6cf3878a96/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d5f09c9777209/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d6e6a3dc8339f/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d7ef34388452f/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d7fc5ed00fc65/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d83c195f1c19d/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d9ac12b0678f4/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 d9da71ee7bf96/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 db95143c8d488/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 dba5677b1543e/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 dbe1bc8fb4409/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 dc09093c6ca96/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 dc3a50433e7c6/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 dc7d0850ec28a/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 dd0482bb66e57/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 dd4312712c345/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 ddc2b2078d2bb/
drwxr-xr-x@ 3 sebastiaan  staff  96 Sep 21 16:06 def3e032e5255/

Checking if logs for all of those contains the Reloaded configuration string, that looks to be the case:

grep -lr 'Reloaded configuration' . | wc -l
25

But the test failed twice;

grep -r 'does not contain "Reloaded configuration:"' . | wc -l
2

@thaJeztah (Member Author) Sep 21, 2021

The failure that was reported was in d9ac12b0678f4, which also contains the string. I do see it ran into another failure with the libnetwork socket, but that looks to be in the shutdown cycle;

time="2021-09-21T12:55:04.290097255Z" level=info msg="API listen on /tmp/docker-integration/d9ac12b0678f4.sock"
time="2021-09-21T12:55:04.414946112Z" level=debug msg="Calling GET /_ping"
time="2021-09-21T12:55:04.415317278Z" level=debug msg="Calling GET /info"
time="2021-09-21T12:55:04.430754586Z" level=info msg="Got signal to reload configuration, reloading from: /etc/docker/daemon.json"
time="2021-09-21T12:55:04.431178648Z" level=debug msg="Reset Max Concurrent Downloads: 3"
time="2021-09-21T12:55:04.431201851Z" level=debug msg="Reset Max Concurrent Uploads: 5"
time="2021-09-21T12:55:04.431221426Z" level=debug msg="Reset Max Download Attempts: 5"
time="2021-09-21T12:55:04.432309643Z" level=info msg="Reloaded configuration: {\"storage-driver\":\"overlay2\",\"mtu\":1500,\"pidfile\":\"/go/src/github.com/docker/docker/bundles/test-integration-flaky/TestDaemonProxy/reload_sanitized/d9ac12b0678f4/docker.pid\",\"data-root\":\"/go/src/github.com/docker/docker/bundles/test-integration-flaky/TestDaemonProxy/reload_sanitized/d9ac12b0678f4/root\",\"exec-root\":\"/tmp/dxr/d9ac12b0678f4\",\"group\":\"docker\",\"deprecated-key-path\":\"/etc/docker/key.json\",\"max-concurrent-downloads\":3,\"max-concurrent-uploads\":5,\"max-download-attempts\":5,\"shutdown-timeout\":15,\"debug\":true,\"hosts\":[\"unix:///tmp/docker-integration/d9ac12b0678f4.sock\"],\"log-level\":\"info\",\"swarm-default-advertise-addr\":\"\",\"swarm-raft-heartbeat-tick\":0,\"swarm-raft-election-tick\":0,\"metrics-addr\":\"\",\"host-gateway-ip\":\"172.18.0.1\",\"log-driver\":\"json-file\",\"ip\":\"0.0.0.0\",\"icc\":true,\"iptables\":true,\"ip-forward\":true,\"ip-masq\":true,\"userland-proxy\":true,\"default-address-pools\":{\"Values\":null},\"network-control-plane-mtu\":1500,\"experimental\":false,\"containerd\":\"/var/run/docker/containerd/containerd.sock\",\"builder\":{\"GC\":{},\"Entitlements\":{}},\"containerd-namespace\":\"d9ac12b0678f4\",\"containerd-plugin-namespace\":\"d9ac12b0678f4p\",\"runtimes\":{\"io.containerd.runc.v2\":{\"path\":\"runc\"},\"io.containerd.runtime.v1.linux\":{\"path\":\"runc\"},\"runc\":{\"path\":\"runc\"}},\"default-runtime\":\"runc\",\"seccomp-profile\":\"builtin\",\"default-shm-size\":67108864,\"default-ipc-mode\":\"private\",\"default-cgroupns-mode\":\"host\",\"resolv-conf\":\"/etc/resolv.conf\",\"http-proxy\":\"https://xxxxx:[email protected]\",\"https-proxy\":\"https://xxxxx:[email protected]\",\"no-proxy\":\"example.com\"}"
time="2021-09-21T12:55:04.435328719Z" level=info msg="Processing signal 'interrupt'"
time="2021-09-21T12:55:04.435439253Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
time="2021-09-21T12:55:04.435514624Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
time="2021-09-21T12:55:04.435590112Z" level=debug msg="found 0 orphan layers"
time="2021-09-21T12:55:04.437848674Z" level=debug msg="Unix socket /tmp/dxr/d9ac12b0678f4/libnetwork/f7725651065a.sock doesn't exist. cannot accept client connections"
time="2021-09-21T12:55:04.437938435Z" level=debug msg="Cleaning up old mountid : start."
time="2021-09-21T12:55:04.439961024Z" level=debug msg="Cleaning up old mountid : done."
time="2021-09-21T12:55:04.444201003Z" level=debug msg="Clean shutdown succeeded"
time="2021-09-21T12:55:04.444236880Z" level=info msg="Daemon shutdown complete"

The same failure is appearing in all of them, it seems:

grep -lr 'cannot accept client connections' . | wc -l
25

Actually, perhaps it even appears in all tests that shut down the daemon (running within the bundles top-level directory);

grep -lr 'cannot accept client connections' . | wc -l
     419

@fredericdalleau (Contributor) left a comment

LGTM

@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from a24db0f to 016981c on September 24, 2021 16:47
@tianon (Member) left a comment

LGTM, although with a minor note 👀

Comment thread cmd/dockerd/daemon.go Outdated
@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from 016981c to ccc6f9e on September 27, 2021 11:50
Comment thread integration/daemon/daemon_test.go Outdated
Comment on lines +326 to +332
// FIXME: there appears to be a race condition, which causes ReadLogFile
// to not contain the full logs after signaling the daemon to reload,
// causing the test to fail here. As a workaround, check if we
// received the "reloaded" message after signaling, and only then
// check that it's sanitized properly. For more details on this
// issue, see https://github.com/moby/moby/pull/42835/files#r713120315
if strings.Contains(string(logs), "Reloaded configuration:") {
Member Author

Running out of ideas on where the race condition is, and how to work around it. From CI, it looks like it's passing 23 out of 25 times (roughly 92%), so this workaround only verifies the sanitization if we actually read the logs containing the "Reloaded configuration" message, and otherwise skips the check.

If anyone has ideas on how to fix the root cause of the race condition, let me know; then we can remove this hack.

Member

Should we put in a t.Skip or something? Too bad there's not something that's the opposite of -short like -extended for tests like this. 😅

Member Author

Ah, yes, a t.Skip() could make sense; didn't come to mind, because we would normally do that only at the start of a test, but I guess it's ok to do it elsewhere.

I'll update

@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from ccc6f9e to 50793bc on September 27, 2021 14:06
@thaJeztah
Member Author

Failures are unrelated (see #42892)

@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from 50793bc to 215d461 on October 20, 2021 07:59
@tianon (Member) left a comment

LGTM


@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from 215d461 to 82efc3e on October 22, 2021 08:09
@thaJeztah
Member Author

Rebased, and updated the test to use t.Skip()

@thaJeztah
Member Author

Failures are unrelated and will be fixed / addressed by #42960

thaJeztah and others added 6 commits October 27, 2021 12:38
This allows the utility to be used in other places.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
This allows configuring the daemon's proxy server through the daemon.json configuration file or command-line flags, in addition to the existing option of configuring it through environment variables.

On Windows, configuring environment variables for a service is more complicated than on Linux, and adding these alternatives to the daemon configuration makes the proxy setup more transparent and easier to use.

The configuration set through command-line flags or through the daemon.json configuration file takes precedence over env-vars in the daemon's environment, which allows the daemon to use a different proxy. If both a command-line flag and the corresponding daemon.json option are set, an error is produced when starting the daemon.

Note that this configuration is not "live reloadable" due to Golang's use of
`sync.Once()` for proxy configuration, which means that changing the proxy
configuration requires a restart of the daemon (reload / SIGHUP will not update
the configuration).

With this patch:

    cat /etc/docker/daemon.json
    {
        "http-proxy": "http://proxytest.example.com:80",
        "https-proxy": "https://proxytest.example.com:443"
    }

    docker pull busybox
    Using default tag: latest
    Error response from daemon: Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp: lookup proxytest.example.com on 127.0.0.11:53: no such host

    docker build .
    Sending build context to Docker daemon  89.28MB
    Step 1/3 : FROM golang:1.16-alpine AS base
    Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp: lookup proxytest.example.com on 127.0.0.11:53: no such host

Integration tests were added to test the behavior:

- verify that the configuration through all means (env-vars, command-line
  flags, daemon.json) is used, and in the expected order of preference.
- verify that conflicting options produce an error.

Signed-off-by: Anca Iordache <[email protected]>
Signed-off-by: Sebastiaan van Stijn <[email protected]>
The proxy configuration works, but looks like we're unable to connect to the
test proxy server as part of our test;

    level=debug msg="Trying to pull example.org:5000/some/image from https://example.org:5000 v2"
    level=warning msg="Error getting v2 registry: Get \"https://example.org:5000/v2/\": proxyconnect tcp: dial tcp 127.0.0.1:45999: connect: connection refused"
    level=info msg="Attempting next endpoint for pull after error: Get \"https://example.org:5000/v2/\": proxyconnect tcp: dial tcp 127.0.0.1:45999: connect: connection refused"
    level=error msg="Handler for POST /v1.42/images/create returned error: Get \"https://example.org:5000/v2/\": proxyconnect tcp: dial tcp 127.0.0.1:45999: connect: connection refused"

Signed-off-by: Sebastiaan van Stijn <[email protected]>
The daemon can print the proxy configuration as part of error-messages,
and when reloading the daemon configuration (SIGHUP). Make sure that
the configuration is sanitized before printing.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
For consistency, and to allow easier grepping for all tests
that we skip on Windows.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Make sure it's written to disk before we try reading the logs.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah thaJeztah force-pushed the proxy_daemon_config_carry2 branch from 82efc3e to e7583ab Compare October 27, 2021 10:39
@thaJeztah
Member Author

Looks like we're (fairly consistently) hitting cleanup errors from the arm64 and cgroup2 jobs. In both cases, Jenkins is complaining about leftover files from TestBuildUserNamespaceValidateCapabilitiesAreV2;

https://ci-next.docker.com/public/job/moby/job/PR-42835/19/execution/node/398/

Also:   java.nio.file.AccessDeniedException: /home/ubuntu/workspace/moby_PR-42835/bundles/test-integration/TestBuildUserNamespaceValidateCapabilitiesAreV2/de3b5140cc048/root/overlay2/9b312dec55e10fe243a5e0422a7f9375c0cc2880d41696ac19fd6d9c191f1f56-init/work/work

https://ci-next.docker.com/public/job/moby/job/PR-42835/19/execution/node/420/

Also:   java.nio.file.AccessDeniedException: /home/ubuntu/workspace/moby_PR-42835/bundles/test-integration/TestBuildUserNamespaceValidateCapabilitiesAreV2/d4b91bdaf46b5/root/165536.165536/overlay2/d6050e82750d178c5151ec1427a44ce8352d9aa6e9843d61483e0f54b4a3125f-init/work/work

Interestingly, in the arm64 case the path looks to be a remapped path (root/165536.165536), whereas in the cgroup2 case the path does not have the remapped user.

I guess that could be explained because the test starts two separate daemons; one with user-namespace remapping, and one without;

dUserRemap := daemon.New(t)
dUserRemap.Start(t, "--userns-remap", "default")
ctx := context.Background()
clientUserRemap := dUserRemap.NewClientT(t)

dNoUserRemap := daemon.New(t)
dNoUserRemap.Start(t)
defer dNoUserRemap.Stop(t)

Wondering if one of those daemons is not successfully killed, and is therefore causing the permission denied errors

@thaJeztah
Member Author

Nothing immediately apparent in the logs for that test; there are indeed leftover files:

bundles/test-integration/TestBuildUserNamespaceValidateCapabilitiesAreV2
├── d2fa016d2485d
│   └── docker.log
└── de3b5140cc048
    ├── docker.log
    └── root
        └── containers
            └── 8a0ecd0bda119b4da4b797e27917e4e197e005df08e966c0a1f9ab3eb0dd7466
                └── 8a0ecd0bda119b4da4b797e27917e4e197e005df08e966c0a1f9ab3eb0dd7466-json.log

5 directories, 3 files

non-userns daemon logs (looks to have shut down, but does show a "context canceled" related to containerd?)

time="2021-10-27T14:02:20.048470533Z" level=debug msg="begin logs" container=8a0ecd0bda119b4da4b797e27917e4e197e005df08e966c0a1f9ab3eb0dd7466 method="(*Daemon).ContainerLogs" module=daemon
time="2021-10-27T14:02:20.048541160Z" level=debug msg="end logs (<nil>)" container=8a0ecd0bda119b4da4b797e27917e4e197e005df08e966c0a1f9ab3eb0dd7466 method="(*Daemon).ContainerLogs" module=daemon
time="2021-10-27T14:02:20.048726785Z" level=info msg="Processing signal 'interrupt'"
time="2021-10-27T14:02:20.048816313Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
time="2021-10-27T14:02:20.048836922Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
time="2021-10-27T14:02:20.048907749Z" level=debug msg="found 0 orphan layers"
time="2021-10-27T14:02:20.049310916Z" level=debug msg="Unix socket /tmp/dxr/de3b5140cc048/libnetwork/81821a0b9210.sock doesn't exist. cannot accept client connections"
time="2021-10-27T14:02:20.049340267Z" level=debug msg="Cleaning up old mountid : start."
time="2021-10-27T14:02:20.049365776Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=de3b5140cc048
time="2021-10-27T14:02:20.049619940Z" level=debug msg="Cleaning up old mountid : done."
time="2021-10-27T14:02:20.049750375Z" level=debug msg="Clean shutdown succeeded"
time="2021-10-27T14:02:20.049772168Z" level=info msg="Daemon shutdown complete"
time="2021-10-27T14:02:20.049820176Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=de3b5140cc048

userns daemon also shut down (but had no containers running);

time="2021-10-27T14:02:17.367669475Z" level=info msg="Processing signal 'interrupt'"
time="2021-10-27T14:02:17.367779297Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
time="2021-10-27T14:02:17.367801677Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
time="2021-10-27T14:02:17.367889754Z" level=debug msg="found 0 orphan layers"
time="2021-10-27T14:02:17.368254229Z" level=debug msg="Unix socket /tmp/dxr/d2fa016d2485d/libnetwork/6141813aca14.sock doesn't exist. cannot accept client connections"
time="2021-10-27T14:02:17.368324828Z" level=debug msg="Cleaning up old mountid : start."
time="2021-10-27T14:02:17.368509860Z" level=debug msg="Cleaning up old mountid : done."
time="2021-10-27T14:02:17.368618217Z" level=debug msg="Clean shutdown succeeded"
time="2021-10-27T14:02:17.368629737Z" level=info msg="Daemon shutdown complete"

Full logs for both:

docker.log
docker.log



Development

Successfully merging this pull request may close these issues.

Feature Request: Enable HTTP_PROXY Configuration as part of deamon json options

4 participants