fix(docker): suppress Go HTTP/2 auto-config breaking plain-tcp:// hijack (#668)#681
Conversation
…hijack Closes #668. With DOCKER_HOST=tcp://socket-proxy:2375 (plain HTTP, no TLS) the regular HTTP path works but ContainerExecAttach (the hijack path used by run_exec, ContainerAttach, ContainerLogs --follow) fails with tls: first record does not look like a TLS handshake Root cause: Go's net/http lazily auto-configures HTTP/2 on the first request and ALLOCATES Transport.TLSClientConfig in place (to seed NextProtos=[h2 http/1.1] for ALPN). The Docker SDK's hijack dialer reads baseTransport.TLSClientConfig as its "TLS is required" signal. After the first plain-HTTP request lands the SDK misreads the auto-installed config as operator-configured TLS and calls tls.Dial against the plain daemon. Reproducer (drives ContainerExecAttach against a plain-HTTP TCP listener that issues 101 Switching Protocols) fails with the exact user error pre- fix and passes post-fix. The existing client_tcp_upgrade suite only asserts DaemonHost() strings — it never exercises the hijack dialer. Fix: set transport.TLSNextProto to a non-nil empty map on the non-TLS apply paths (applyUnixTransport, applyTCPTransport, applyPlainTransport). That is the documented stdlib contract for "do not auto-enable HTTP/2", which prevents the in-place TLSClientConfig mutation. TLS Docker hosts (https://, tcp+tls://) keep TLSNextProto nil so ALPN h2 still works. Note: http:// hijack has a separate, pre-existing failure mode (SDK falls to net.Dial("http", addr) which Go rejects with "unknown network http"). That requires a TCP DialContext on the http:// transport and is filed as a follow-up enhancement. Signed-off-by: Sebastian Mendel <[email protected]>
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
There was a problem hiding this comment.
Pull request overview
Fixes a TLS handshake error on the hijack path (ContainerExecAttach, used by run_exec) when DOCKER_HOST is a plain tcp:// URL. Root cause: Go's net/http lazily auto-configures HTTP/2 on first request, allocating Transport.TLSClientConfig in place; the Docker SDK's hijack dialer then misreads that as "TLS required" and attempts tls.Dial against a plain daemon.
Changes:
- Adds
disableHTTP2AutoConfighelper that setstransport.TLSNextPrototo a non-nil empty map (the documented stdlib opt-out for HTTP/2 auto-config). - Wires the helper into the three non-TLS apply paths:
applyUnixTransport,applyTCPTransport,applyPlainTransport(TLS paths keepTLSNextProtonil to allow ALPN h2). - Adds a regression test that primes the transport via
Pingand then exercisesContainerExecAttachagainst a fake plain-HTTP Docker proxy with a101 Switching Protocolshijack.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| core/adapters/docker/client.go | Introduces disableHTTP2AutoConfig and applies it to non-TLS transports (unix/tcp/plain). |
| core/adapters/docker/client_tcp_hijack_test.go | New regression test driving the hijack path against a fake plain-HTTP TCP Docker proxy. |
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #681 +/- ##
=======================================
Coverage 88.01% 88.02%
=======================================
Files 89 89
Lines 11280 11286 +6
=======================================
+ Hits 9928 9934 +6
Misses 1103 1103
Partials 249 249
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Tightens PR #681 based on the multi-axis review fan-out: - Add unit test on disableHTTP2AutoConfig itself (transport-level invariant) so a future stdlib refactor cannot silently re-introduce the bug while the end-to-end SDK test still passes by chance. - Add TLS-path negative control (TestApplyTLSTransport_DoesNotDisableHTTP2AutoConfig) that guards against accidentally extending the suppression to the TLS apply path, which would silently disable ALPN h2 negotiation for https:// and tcp+tls:// hosts. - Add table test pinning that every non-TLS apply function calls disableHTTP2AutoConfig — catches the silent-omit footgun when new schemes are added in future. - Add t.Skip placeholder for http:// hijack so the gap is visible in `go test -v` output and reachable via grep from the test directory. - Apply disableHTTP2AutoConfig to the createHTTPClient unknown-scheme fallback (defensive — same hijack risk applies to direct test callers with exotic schemes). - Fix t.Logf race in the fake-proxy serve goroutine by routing through http.Server.Shutdown and a done channel so the goroutine exits before test cleanup completes. - Trim disableHTTP2AutoConfig doc comment (drop cost-justification paragraph and the upstream-quote nicety; keep the load-bearing WHY and the upstream-code reference). - Add CHANGELOG.md Unreleased > Fixed entry (matches the docker-adapter release-notes convention from prior PRs #634 / #627 / #653 / #613). - Add docs/TROUBLESHOOTING.md entry for v0.25.0 plain-tcp:// run_exec failure mode (slots next to the related #605 socket-proxy section); cross-reference from the older v0.11.0 HTTP/2 section so operators scanning by topic land on the right bug. Deferred to follow-up issues (out of scope for this PR): - Refactor to schemeHandler.tlsCapable flag to kill the silent-omit footgun structurally. - Dead ClientConfig.HTTPClient field (pre-existing, unrelated cleanup). Signed-off-by: Sebastian Mendel <[email protected]>
|
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
Address PR review: the new json-post bundled preset (#676) is an Added feature and per semver belongs in a minor release (0.26.0), not a patch (0.25.1). Move it back under [Unreleased] above the new [0.25.1] heading. The four bugfixes (#681, #685, #686, #670) remain in 0.25.1. Signed-off-by: Sebastian Mendel <[email protected]>
…Capable Promote "non-TLS schemes must suppress HTTP/2 auto-config" from a manually-duplicated call inside four apply* functions to a data-driven property on the schemeHandler struct. createHTTPClient now invokes disableHTTP2AutoConfig when !handler.tlsCapable, so: - applyTLSTransport (https://, tcp+tls://): tlsCapable=true, no suppression — ALPN h2 negotiation still works. - All other apply functions: tlsCapable=false, suppression handled centrally — a future scheme added to schemeHandlers either declares tlsCapable=true (and accepts the implications) or gets the #668 fix for free. Closes the "silent regression risk" gap from #683: the per-apply disableHTTP2AutoConfig call could be forgotten when adding a new scheme. The structural invariant is now compile-time enforced via the flag on the dispatch table. No behavior change for existing schemes. Two test reshapes: - TestApplyTransport_NonTLSDisableHTTP2AutoConfig renamed to TestSchemeHandler_NonTLSCapableSuppressesHTTP2 and rewritten to iterate schemeHandlers and assert via createHTTPClient instead of calling each apply function directly. The new shape automatically covers any future tlsCapable=false scheme. - New TestSchemeHandler_TLSCapableMeansHTTP2Allowed locks the inverse half of the invariant per the #683 issue spec. Refs #683, #681, #668. Signed-off-by: Sebastian Mendel <[email protected]>



Summary
Closes #668.
With
DOCKER_HOST=tcp://socket-proxy:2375(plain HTTP, no TLS) the regular HTTP path works butContainerExecAttach(the hijack path used byrun_exec,ContainerAttach,ContainerLogs --follow) fails with:Root cause
Go's
net/httplazily auto-configures HTTP/2 on the first request and allocatesTransport.TLSClientConfigin place (to seedNextProtos=[h2 http/1.1]for ALPN). The Docker SDK's hijack dialer readsbaseTransport.TLSClientConfigas its "TLS is required" signal. After the first plain-HTTP request lands, the SDK misreads the auto-installed config as operator-configured TLS and callstls.Dialagainst the plain Docker daemon — handshake fails because the daemon answers with raw HTTP.DialContextTLSClientConfigNewClientWithOpts_pingNextProtos=[h2 http/1.1]) ❌The SDK author already flagged the hijack dialer in
moby/mobyclient.go#L439-L443as suspect: "I honestly don't know why it doesn't do that."Fix
One knob: set
transport.TLSNextPrototo a non-nil empty map on the non-TLS apply paths (applyUnixTransport,applyTCPTransport,applyPlainTransport) plus thecreateHTTPClientunknown-scheme fallback. That is the documented Go stdlib contract for "do not auto-enable HTTP/2", which prevents the in-placeTLSClientConfigmutation. TLS Docker hosts (https://,tcp+tls://) keepTLSNextProtonil so ALPN h2 negotiation still works there.Extracted as
disableHTTP2AutoConfigwith a// Why:comment so the subtle stdlib contract isn't deleted as dead code by a future reader.Tests
New
client_tcp_hijack_test.go(verified bidirectionally — fails with the exact user error pre-fix, passes post-fix):TestPlainTCPHijack_ContainerExecAttachWorks— drives the fullContainerExecAttachhijack path against a plain-HTTP TCP listener with 101-Switching-Protocols semantics. The exact-shape regression test the existing suite lacked.TestDisableHTTP2AutoConfig_KeepsTLSClientConfigNil— unit test pinning the stdlib invariant directly so a future net/http refactor cannot silently re-introduce the bug while the end-to-end SDK test still passes by chance.TestApplyTLSTransport_DoesNotDisableHTTP2AutoConfig— negative control: TLS apply paths must keepTLSNextProtonil so ALPN h2 negotiation remains intact. Guards against accidentally extending the suppression to the TLS path.TestApplyTransport_NonTLSDisableHTTP2AutoConfig— table test pinning that every non-TLS apply function calls the helper. Catches the silent-omit footgun if new schemes are added.TestPlainHTTPHijack_ContainerExecAttach_FollowUp—t.Skipplaceholder so the gap is visible ingo test -vand grep-able to fix(docker): http:// hijack falls through to invalid net.Dial("http", addr) #682.Race-clean over 10 iterations under
go test -race.Docs
CHANGELOG.md—[Unreleased] > Fixedentry (matches the docker-adapter release-notes convention from prior PRs Docker SDK:tcp://+ TLS env vars wires TLSClientConfig but the URL stays http:// #634 / Docker SDK:tcp+tls://without cert material silently uses system CA / no client cert (should fail-closed) #627 / docker: 3 silent-TLS-downgrade vectors (applyDockerTLS warns, SMTP OpportunisticStartTLS, webhook empty-allowlist) #653 / fix(docker): honor DOCKER_TLS_VERIFY/DOCKER_CERT_PATH env vars (#607) #613).docs/TROUBLESHOOTING.md— new section slotted next to the related Failing to start when using DOCKER_HOST #605 socket-proxy section; cross-referenced from the older v0.11.0 HTTP/2 section so operators scanning by topic land on the right bug.Out of scope
http://hijack has a separate, pre-existing failure mode: SDK falls tonet.Dial("http", addr), which Go rejects withunknown network http. Fixing it needs a TCPDialContexton the http transport. Filed as #682; called out in test comments andt.Skip-tracked above.Test plan
go test ./core/...cleango test -race -count=10clean (no flakes, no goroutine race)golangci-lint run ./core/adapters/docker/...clean