fix(docker): honor DOCKER_TLS_VERIFY/DOCKER_CERT_PATH env vars (#607)#613
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
✅ Mutation Testing ResultsMutation Score: 100.00% (threshold: 60%)
What is mutation testing?Mutation testing measures test quality by introducing small changes (mutations) to the code and checking if tests detect them. A higher score means better test effectiveness.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #613 +/- ##
==========================================
+ Coverage 87.29% 87.34% +0.04%
==========================================
Files 88 88
Lines 10677 10719 +42
==========================================
+ Hits 9320 9362 +42
- Misses 1115 1116 +1
+ Partials 242 241 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request updates the Docker SDK adapter to properly honor DOCKER_TLS_VERIFY and DOCKER_CERT_PATH for HTTPS hosts, addressing an issue where custom HTTP clients would silently discard TLS material. It introduces TLSCertPath and TLSVerify to ClientConfig and implements logic to resolve host and TLS settings with precedence for configuration over environment variables. Feedback indicates a potential security risk where errors in TLS configuration resolution are ignored; it is recommended to refactor the client creation to fail-fast upon encountering such errors.
There was a problem hiding this comment.
Pull request overview
This pull request fixes a Docker SDK adapter bug where DOCKER_TLS_VERIFY / DOCKER_CERT_PATH TLS material was being lost when Ofelia installs its custom http.Client via client.WithHTTPClient, ensuring TLS configuration is applied on the custom transport as well.
Changes:
- Add
ClientConfig.TLSCertPathandClientConfig.TLSVerifyto allow explicit TLS configuration with config > env precedence. - Extend
createHTTPClientto resolve the effective host (includingDOCKER_HOST) and apply TLS config for HTTPS transports. - Add regression tests generating temporary TLS fixtures and update
CHANGELOG.mdwith the fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| core/adapters/docker/client.go | Adds host resolution for transport selection and TLS config resolution/application for HTTPS hosts. |
| core/adapters/docker/client_mutation_test.go | Adds regression tests ensuring TLS env/config precedence and TLS/no-TLS behavior across host types. |
| CHANGELOG.md | Documents the TLS env var fix and new config overrides under Unreleased. |
Comments suppressed due to low confidence (2)
core/adapters/docker/client.go:199
- TLS is only applied when the resolved host string starts with "https://". However repo docs show the common TLS setup as
DOCKER_HOST=tcp://...withDOCKER_TLS_VERIFY/DOCKER_CERT_PATH(e.g. docs/SECURITY.md). Withtcp://the code falls into the default branch and never calls resolveTLSConfig, so client cert/CA material will still be discarded in that configuration. Consider applying TLS whenever a cert path is configured (and host is not unix://), or explicitly handling thetcp://scheme used by Docker for TLS endpoints.
This issue also appears on line 189 of the same file.
// Configure dialer based on host type
switch {
case strings.HasPrefix(host, "unix://"):
socketPath := strings.TrimPrefix(host, "unix://")
transport.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
dialer := &net.Dialer{Timeout: config.DialTimeout}
return dialer.DialContext(ctx, "unix", socketPath)
}
// HTTP/2 not supported on Unix sockets
transport.ForceAttemptHTTP2 = false
case strings.HasPrefix(host, "https://"):
// HTTPS connections can use HTTP/2 via ALPN
transport.ForceAttemptHTTP2 = true
// Apply TLS from explicit config / env. The SDK's client.FromEnv
// also tries to apply this, but client.WithHTTPClient below
// overwrites the SDK's http.Client wholesale and would discard
// that TLS material — see issue #607.
if tlsCfg, err := resolveTLSConfig(config); err == nil && tlsCfg != nil {
transport.TLSClientConfig = tlsCfg
}
default:
// TCP without TLS - HTTP/2 not supported (no h2c in Docker)
transport.ForceAttemptHTTP2 = false
}
core/adapters/docker/client.go:195
- Errors from resolveTLSConfig are silently ignored (
err == nilguard). If TLS files are missing/invalid, Ofelia will proceed with an HTTPS transport but without the intended client cert/CA config, which can lead to confusing failures or an unintended unauthenticated TLS connection. It would be safer to propagate this error (e.g. have createHTTPClient/NewClientWithConfig return an error) so misconfiguration is surfaced early.
// Apply TLS from explicit config / env. The SDK's client.FromEnv
// also tries to apply this, but client.WithHTTPClient below
// overwrites the SDK's http.Client wholesale and would discard
// that TLS material — see issue #607.
if tlsCfg, err := resolveTLSConfig(config); err == nil && tlsCfg != nil {
transport.TLSClientConfig = tlsCfg
}
- Surface resolveTLSConfig errors via slog warning. Previously the call site silently swallowed the error, reproducing the very class of bug this PR fixes (security + test + DX reviewers all flagged this). - Extend TLS branch to also match tcp+tls:// so PR #609's new scheme is not a silent downgrade. Caught by the security review of PR #612. - Use SDK-exported env constants (client.EnvOverrideHost, EnvOverrideCertPath, EnvTLSVerify) instead of string literals; if the SDK ever renames them we get a compile error. - Append "(expected ca.pem, cert.pem, key.pem)" to the cert-load error so operators get the remediation hint inline. - Note in resolveHostForTransport that PR #606 introduces a duplicate helper — whichever PR merges second consolidates. Tests: - TestCreateHTTPClient_TLSVerifyExplicitFalse pins the explicit-opt-out precedence (config > env) for the verify polarity. - TestCreateHTTPClient_InsecureSkipVerifyDefaults pins the SDK-mirroring default: empty DOCKER_TLS_VERIFY -> InsecureSkipVerify=true. - TestCreateHTTPClient_TCPPlusTLSEnablesTLS asserts tcp+tls:// gets the same TLS material as https://. CHANGELOG: moved entry from ### Fixed to ### Security with explicit upgrade-impact callout (operators relying on silent insecure dial will break on upgrade — that is the point). docs/TROUBLESHOOTING.md: new section on TLS handshake / cert path errors covering the post-fix failure modes. Signed-off-by: Sebastian Mendel <[email protected]>
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
Security review (NO-SHIP) identified that adding tcp+tls:// to the allow-list without wiring TLS material into the custom http.Transport is a silent downgrade to plain TCP - exactly the bug class this PR was meant to prevent. PR #613 (issue #607) adds the TLS plumbing. Until that lands, withhold tcp+tls:// from the allow-list and reject it loudly with the same error as ssh:// / fd://. Once #613 merges, follow-up to re-enable tcp+tls with a transport-level test that asserts TLS is actually wired up. Test review (conditional ship) noted the tcp+tls assertion was at string-transform level, not transport level. With tcp+tls now rejected, the assertion belongs on the rejection path - moved into the existing TestCreateHTTPClient_UnsupportedSchemes table. DX review: changelog reclassified to ### Changed with **BREAKING:** marker per project convention; ssh:// migration advice added inline so operators see the recommendation in the PR notes, not just in TROUBLESHOOTING. docs/TROUBLESHOOTING.md: drop tcp+tls from supported table; add to unsupported list with explanation pointing at #613. Signed-off-by: Sebastian Mendel <[email protected]>
PR #613's codecov/patch check failed at 78.37% (target 80%) because the new `slog.Default().Warn(...)` branch on resolveTLSConfig errors was unexercised. Add two tests: - TestCreateHTTPClient_TLSLoadErrorDoesNotPanic exercises the surfacing behavior end-to-end: DOCKER_CERT_PATH points at an empty dir, so tlsconfig.Client errors out, the Warn branch fires, and the transport falls back to no TLS without panicking. - TestResolveTLSConfig_MissingCertFiles directly exercises the helper's error wrap, pinning the operator-facing remediation hint ("expected ca.pem, cert.pem, key.pem") into the error message. Coverage of createHTTPClient and resolveTLSConfig now 100%. Signed-off-by: Sebastian Mendel <[email protected]>
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
…k on PR #613) Copilot review noted that the docker CLI / SDK has long supported DOCKER_HOST=tcp://... + DOCKER_TLS_VERIFY=1 + DOCKER_CERT_PATH=... and silently upgrades the connection to HTTPS. Mirroring that here closes a silent-plaintext-downgrade gap for users following Docker's canonical mTLS docs. Refactor: - Extract applyDockerTLS / hasTLSMaterial helpers so the https://, tcp+tls://, and tcp:// branches share the same TLS-application path. applyDockerTLS surfaces resolveTLSConfig errors via slog (same warn-don't-swallow pattern from the previous review-feedback commit). - tcp:// remains plain TCP / HTTP/1.1 when no TLS material is configured; it upgrades to HTTPS / HTTP/2 only when DOCKER_CERT_PATH (or ClientConfig.TLSCertPath) is present. Tests: - TestCreateHTTPClient_TCPWithTLSEnvUpgrades pins the upgrade. - TestCreateHTTPClient_TCPWithoutTLSEnvStaysPlaintext is the negative case: tcp:// without TLS env stays plaintext (otherwise the upgrade test alone could silently make every tcp:// HTTPS). - TestHasTLSMaterial_ExplicitConfigOnly covers the explicit-config branch of the new helper. Signed-off-by: Sebastian Mendel <[email protected]>
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
Security review (NO-SHIP) identified that adding tcp+tls:// to the allow-list without wiring TLS material into the custom http.Transport is a silent downgrade to plain TCP - exactly the bug class this PR was meant to prevent. PR #613 (issue #607) adds the TLS plumbing. Until that lands, withhold tcp+tls:// from the allow-list and reject it loudly with the same error as ssh:// / fd://. Once #613 merges, follow-up to re-enable tcp+tls with a transport-level test that asserts TLS is actually wired up. Test review (conditional ship) noted the tcp+tls assertion was at string-transform level, not transport level. With tcp+tls now rejected, the assertion belongs on the rejection path - moved into the existing TestCreateHTTPClient_UnsupportedSchemes table. DX review: changelog reclassified to ### Changed with **BREAKING:** marker per project convention; ssh:// migration advice added inline so operators see the recommendation in the PR notes, not just in TROUBLESHOOTING. docs/TROUBLESHOOTING.md: drop tcp+tls from supported table; add to unsupported list with explanation pointing at #613. Signed-off-by: Sebastian Mendel <[email protected]>
The Docker SDK adapter built its custom *http.Client via createHTTPClient and attached it with client.WithHTTPClient. That option replaces the SDK's c.client wholesale — and in doing so silently discards the *tls.Config that client.FromEnv had just installed via WithTLSClientConfigFromEnv, which is where DOCKER_CERT_PATH / DOCKER_TLS_VERIFY are consumed. Net effect: with DOCKER_HOST=https://... + DOCKER_TLS_VERIFY=1 + DOCKER_CERT_PATH=..., the resulting transport had a nil TLSClientConfig, so Ofelia either failed the handshake or fell back to system roots without the operator-supplied client cert. Latent since the SDK adapter landed in v0.12.0. Fix: in createHTTPClient, when the resolved host scheme is https://, build a *tls.Config via go-connections/tlsconfig.Client (already a transitive dep) using the same precedence as the SDK and assign it to transport.TLSClientConfig. Also resolve the host via DOCKER_HOST env so the HTTPS branch is selected in the same configurations the SDK targets. Add ClientConfig.TLSCertPath and ClientConfig.TLSVerify so callers can override env-supplied values explicitly. Precedence: config field > env var > unset (no TLS modification) DOCKER_TLS_VERIFY semantics match the SDK exactly: any non-empty value (including "0") means verify; only an unset/empty value disables it. Tests in client_mutation_test.go cover all four combinations (env-only / explicit-override / unix-socket-ignored / no-cert-path) using self-signed throwaway material under t.TempDir() and 127.0.0.1:0. This is the TLS sibling of #606 (DOCKER_HOST dialer divergence). Fixes #607 Signed-off-by: Sebastian Mendel <[email protected]> fix(docker): address PR #613 review feedback - Surface resolveTLSConfig errors via slog warning. Previously the call site silently swallowed the error, reproducing the very class of bug this PR fixes (security + test + DX reviewers all flagged this). - Extend TLS branch to also match tcp+tls:// so PR #609's new scheme is not a silent downgrade. Caught by the security review of PR #612. - Use SDK-exported env constants (client.EnvOverrideHost, EnvOverrideCertPath, EnvTLSVerify) instead of string literals; if the SDK ever renames them we get a compile error. - Append "(expected ca.pem, cert.pem, key.pem)" to the cert-load error so operators get the remediation hint inline. - Note in resolveHostForTransport that PR #606 introduces a duplicate helper — whichever PR merges second consolidates. Tests: - TestCreateHTTPClient_TLSVerifyExplicitFalse pins the explicit-opt-out precedence (config > env) for the verify polarity. - TestCreateHTTPClient_InsecureSkipVerifyDefaults pins the SDK-mirroring default: empty DOCKER_TLS_VERIFY -> InsecureSkipVerify=true. - TestCreateHTTPClient_TCPPlusTLSEnablesTLS asserts tcp+tls:// gets the same TLS material as https://. CHANGELOG: moved entry from ### Fixed to ### Security with explicit upgrade-impact callout (operators relying on silent insecure dial will break on upgrade — that is the point). docs/TROUBLESHOOTING.md: new section on TLS handshake / cert path errors covering the post-fix failure modes. Signed-off-by: Sebastian Mendel <[email protected]> test(docker): cover the resolveTLSConfig error path PR #613's codecov/patch check failed at 78.37% (target 80%) because the new `slog.Default().Warn(...)` branch on resolveTLSConfig errors was unexercised. Add two tests: - TestCreateHTTPClient_TLSLoadErrorDoesNotPanic exercises the surfacing behavior end-to-end: DOCKER_CERT_PATH points at an empty dir, so tlsconfig.Client errors out, the Warn branch fires, and the transport falls back to no TLS without panicking. - TestResolveTLSConfig_MissingCertFiles directly exercises the helper's error wrap, pinning the operator-facing remediation hint ("expected ca.pem, cert.pem, key.pem") into the error message. Coverage of createHTTPClient and resolveTLSConfig now 100%. Signed-off-by: Sebastian Mendel <[email protected]> fix(docker): wire TLS for tcp:// + DOCKER_TLS_VERIFY (Copilot feedback on PR #613) Copilot review noted that the docker CLI / SDK has long supported DOCKER_HOST=tcp://... + DOCKER_TLS_VERIFY=1 + DOCKER_CERT_PATH=... and silently upgrades the connection to HTTPS. Mirroring that here closes a silent-plaintext-downgrade gap for users following Docker's canonical mTLS docs. Refactor: - Extract applyDockerTLS / hasTLSMaterial helpers so the https://, tcp+tls://, and tcp:// branches share the same TLS-application path. applyDockerTLS surfaces resolveTLSConfig errors via slog (same warn-don't-swallow pattern from the previous review-feedback commit). - tcp:// remains plain TCP / HTTP/1.1 when no TLS material is configured; it upgrades to HTTPS / HTTP/2 only when DOCKER_CERT_PATH (or ClientConfig.TLSCertPath) is present. Tests: - TestCreateHTTPClient_TCPWithTLSEnvUpgrades pins the upgrade. - TestCreateHTTPClient_TCPWithoutTLSEnvStaysPlaintext is the negative case: tcp:// without TLS env stays plaintext (otherwise the upgrade test alone could silently make every tcp:// HTTPS). - TestHasTLSMaterial_ExplicitConfigOnly covers the explicit-config branch of the new helper. Signed-off-by: Sebastian Mendel <[email protected]>
|
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
The actual code fix for #605 cascaded into main via #613 (which added DOCKER_HOST env-driven host resolution to createHTTPClient as a side-effect of wiring TLS material). This commit adds the parts of PR #606 that did not duplicate code already on main: - TROUBLESHOOTING entry "DOCKER_HOST with TCP Socket Proxy (v0.12.0 - v0.24.0)" with the original symptom, root cause, and a working tecnativa/docker-socket-proxy compose snippet. - CHANGELOG attribution under [Unreleased] -> Fixed for #606/#605 alongside the existing #611/#612/#613/#618 entries, plus a Tests section noting the TestHealthStatus race fix that was also in #606. The TestHealthStatus race fix was cherry-picked separately as the preceding commit (it predates the recent batch and is unrelated to the docker adapter). Signed-off-by: Sebastian Mendel <[email protected]>
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://... is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme to https:// before passing it to client.WithHost. Without the rewrite, the SDK was given a tcp:// URL while the custom HTTP transport was wired with TLS material via #613. Go's http.Transport only triggers TLS for https:// URLs, so the cert material was silently unused and connections went out plaintext, leaving operators believing they had mTLS when they did not. The rewrite is implemented as a tested helper (upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig right after host resolution. Other schemes (tcp+tls://, https://, unix://, http://, npipe://) pass through unchanged. The applyTCPTransport hasTLSMaterial branch stays in place to support direct callers of createHTTPClient (notably TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite means dispatch goes through applyTLSTransport. Updates the godoc on the tcp: schemeHandlers entry and the TROUBLESHOOTING.md scheme table to make the new contract explicit. Closes #634, follow-up to #613. Signed-off-by: Sebastian Mendel <[email protected]>
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://... is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme to https:// before passing it to client.WithHost. Without the rewrite, the SDK was given a tcp:// URL while the custom HTTP transport was wired with TLS material via #613. Go's http.Transport only triggers TLS for https:// URLs, so the cert material was silently unused and connections went out plaintext, leaving operators believing they had mTLS when they did not. The rewrite is implemented as a tested helper (upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig right after host resolution. Other schemes (tcp+tls://, https://, unix://, http://, npipe://) pass through unchanged. The applyTCPTransport hasTLSMaterial branch stays in place to support direct callers of createHTTPClient (notably TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite means dispatch goes through applyTLSTransport. Updates the godoc on the tcp: schemeHandlers entry and the TROUBLESHOOTING.md scheme table to make the new contract explicit. Closes #634, follow-up to #613. Signed-off-by: Sebastian Mendel <[email protected]>
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://... is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme to https:// before passing it to client.WithHost. Without the rewrite, the SDK was given a tcp:// URL while the custom HTTP transport was wired with TLS material via #613. Go's http.Transport only triggers TLS for https:// URLs, so the cert material was silently unused and connections went out plaintext, leaving operators believing they had mTLS when they did not. The rewrite is implemented as a tested helper (upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig right after host resolution. Other schemes (tcp+tls://, https://, unix://, http://, npipe://) pass through unchanged. The applyTCPTransport hasTLSMaterial branch stays in place to support direct callers of createHTTPClient (notably TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite means dispatch goes through applyTLSTransport. Updates the godoc on the tcp: schemeHandlers entry and the TROUBLESHOOTING.md scheme table to make the new contract explicit. Closes #634, follow-up to #613. Signed-off-by: Sebastian Mendel <[email protected]>
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://... is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme to https:// before passing it to client.WithHost. Without the rewrite, the SDK was given a tcp:// URL while the custom HTTP transport was wired with TLS material via #613. Go's http.Transport only triggers TLS for https:// URLs, so the cert material was silently unused and connections went out plaintext, leaving operators believing they had mTLS when they did not. The rewrite is implemented as a tested helper (upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig right after host resolution. Other schemes (tcp+tls://, https://, unix://, http://, npipe://) pass through unchanged. The applyTCPTransport hasTLSMaterial branch stays in place to support direct callers of createHTTPClient (notably TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite means dispatch goes through applyTLSTransport. Updates the godoc on the tcp: schemeHandlers entry and the TROUBLESHOOTING.md scheme table to make the new contract explicit. Closes #634, follow-up to #613. Signed-off-by: Sebastian Mendel <[email protected]>
Tightens PR #681 based on the multi-axis review fan-out: - Add unit test on disableHTTP2AutoConfig itself (transport-level invariant) so a future stdlib refactor cannot silently re-introduce the bug while the end-to-end SDK test still passes by chance. - Add TLS-path negative control (TestApplyTLSTransport_DoesNotDisableHTTP2AutoConfig) that guards against accidentally extending the suppression to the TLS apply path, which would silently disable ALPN h2 negotiation for https:// and tcp+tls:// hosts. - Add table test pinning that every non-TLS apply function calls disableHTTP2AutoConfig — catches the silent-omit footgun when new schemes are added in future. - Add t.Skip placeholder for http:// hijack so the gap is visible in `go test -v` output and reachable via grep from the test directory. - Apply disableHTTP2AutoConfig to the createHTTPClient unknown-scheme fallback (defensive — same hijack risk applies to direct test callers with exotic schemes). - Fix t.Logf race in the fake-proxy serve goroutine by routing through http.Server.Shutdown and a done channel so the goroutine exits before test cleanup completes. - Trim disableHTTP2AutoConfig doc comment (drop cost-justification paragraph and the upstream-quote nicety; keep the load-bearing WHY and the upstream-code reference). - Add CHANGELOG.md Unreleased > Fixed entry (matches the docker-adapter release-notes convention from prior PRs #634 / #627 / #653 / #613). - Add docs/TROUBLESHOOTING.md entry for v0.25.0 plain-tcp:// run_exec failure mode (slots next to the related #605 socket-proxy section); cross-reference from the older v0.11.0 HTTP/2 section so operators scanning by topic land on the right bug. Deferred to follow-up issues (out of scope for this PR): - Refactor to schemeHandler.tlsCapable flag to kill the silent-omit footgun structurally. - Dead ClientConfig.HTTPClient field (pre-existing, unrelated cleanup). Signed-off-by: Sebastian Mendel <[email protected]>
…ack (#668) (#681) ## Summary Closes #668. With `DOCKER_HOST=tcp://socket-proxy:2375` (plain HTTP, no TLS) the regular HTTP path works but `ContainerExecAttach` (the hijack path used by `run_exec`, `ContainerAttach`, `ContainerLogs --follow`) fails with: ``` tls: first record does not look like a TLS handshake ``` ## Root cause Go's `net/http` lazily auto-configures HTTP/2 on the first request and **allocates `Transport.TLSClientConfig` in place** (to seed `NextProtos=[h2 http/1.1]` for ALPN). The Docker SDK's hijack dialer reads `baseTransport.TLSClientConfig` as its "TLS is required" signal. After the first plain-HTTP request lands, the SDK misreads the auto-installed config as operator-configured TLS and calls `tls.Dial` against the plain Docker daemon — handshake fails because the daemon answers with raw HTTP. | Stage | `DialContext` | `TLSClientConfig` | | --- | --- | --- | | After `NewClientWithOpts` | nil | **nil** ✅ | | After first `_ping` | nil | **non-nil** (`NextProtos=[h2 http/1.1]`) ❌ | The SDK author already flagged the hijack dialer in [`moby/moby` `client.go#L439-L443`](https://github.com/moby/moby/blob/v28.5.2/client/client.go#L439-L443) as suspect: *"I honestly don't know why it doesn't do that."* ## Fix One knob: set `transport.TLSNextProto` to a non-nil empty map on the non-TLS apply paths (`applyUnixTransport`, `applyTCPTransport`, `applyPlainTransport`) plus the `createHTTPClient` unknown-scheme fallback. That is the documented Go stdlib contract for *"do not auto-enable HTTP/2"*, which prevents the in-place `TLSClientConfig` mutation. TLS Docker hosts (`https://`, `tcp+tls://`) keep `TLSNextProto` nil so ALPN h2 negotiation still works there. Extracted as `disableHTTP2AutoConfig` with a `// Why:` comment so the subtle stdlib contract isn't deleted as dead code by a future reader. ## Tests New `client_tcp_hijack_test.go` (verified bidirectionally — fails with the exact user error pre-fix, passes post-fix): - **`TestPlainTCPHijack_ContainerExecAttachWorks`** — drives the full `ContainerExecAttach` hijack path against a plain-HTTP TCP listener with 101-Switching-Protocols semantics. The exact-shape regression test the existing suite lacked. - **`TestDisableHTTP2AutoConfig_KeepsTLSClientConfigNil`** — unit test pinning the stdlib invariant directly so a future net/http refactor cannot silently re-introduce the bug while the end-to-end SDK test still passes by chance. - **`TestApplyTLSTransport_DoesNotDisableHTTP2AutoConfig`** — negative control: TLS apply paths must keep `TLSNextProto` nil so ALPN h2 negotiation remains intact. Guards against accidentally extending the suppression to the TLS path. - **`TestApplyTransport_NonTLSDisableHTTP2AutoConfig`** — table test pinning that every non-TLS apply function calls the helper. Catches the silent-omit footgun if new schemes are added. - **`TestPlainHTTPHijack_ContainerExecAttach_FollowUp`** — `t.Skip` placeholder so the gap is visible in `go test -v` and grep-able to #682. Race-clean over 10 iterations under `go test -race`. ## Docs - `CHANGELOG.md` — `[Unreleased] > Fixed` entry (matches the docker-adapter release-notes convention from prior PRs #634 / #627 / #653 / #613). - `docs/TROUBLESHOOTING.md` — new section slotted next to the related #605 socket-proxy section; cross-referenced from the older v0.11.0 HTTP/2 section so operators scanning by topic land on the right bug. ## Out of scope `http://` hijack has a separate, pre-existing failure mode: SDK falls to `net.Dial("http", addr)`, which Go rejects with `unknown network http`. Fixing it needs a TCP `DialContext` on the http transport. Filed as #682; called out in test comments and `t.Skip`-tracked above. ## Test plan - [x] New regression test passes with fix, fails without it - [x] `go test ./core/...` clean - [x] `go test -race -count=10` clean (no flakes, no goroutine race) - [x] `golangci-lint run ./core/adapters/docker/...` clean - [ ] CI green



Summary
Fixes #607. TLS sibling of #606 (which fixes the dialer-divergence half of the same root cause).
When
DOCKER_HOST=https://...is used together withDOCKER_TLS_VERIFY=1andDOCKER_CERT_PATH=..., Ofelia silently discarded the TLS material — the resulting*http.Transporthad a nilTLSClientConfig, so the daemon either rejected the handshake or got an unauthenticated connection where mTLS was intended. Latent since the SDK adapter landed in v0.12.0.Root cause
NewClientWithConfigconfigures the SDK in two passes that don't compose:client.FromEnvrunsWithTLSClientConfigFromEnv(), which builds a*tls.ConfigfromDOCKER_CERT_PATH/DOCKER_TLS_VERIFYand assigns it onto a freshhttp.Client.Transport— i.e. it replacesc.client.client.WithHTTPClient(httpClient)then overwritesc.clientagain with our custom transport, which never sees the TLS config from step 1.So whichever step runs last wins, and step 2 always wins. The exact same shape as #606's
WithHostFromEnvvs custom dialer divergence.Fix
In
createHTTPClient, when the resolved host scheme ishttps://, build a*tls.Configviago-connections/tlsconfig.Client(already a dependency) and assign it totransport.TLSClientConfig. Two new fields onClientConfigallow explicit override:TLSCertPath stringandTLSVerify *bool. Precedence mirrorsresolveDockerHostfrom #606: config field > env var > unset.DOCKER_TLS_VERIFYsemantics match the SDK exactly: any non-empty value (including"0") means verify; only an unset/empty value disables it. When neither config nor env supply a cert path, no TLS is applied (matching the SDK's "leave TLS untouched" behavior).A small helper,
resolveHostForTransport, also consultsDOCKER_HOSTso the HTTPS branch is selected in the same configurations the SDK targets — avoiding a related divergence where emptyconfig.Hostwould fall back to the unix default and skip the TLS branch entirely.Test plan
core/adapters/docker/client_mutation_test.go:TestCreateHTTPClient_HonorsTLSEnv—DOCKER_HOST=https://127.0.0.1:0+ cert path, assertsTLSClientConfig.{RootCAs, Certificates}populated andInsecureSkipVerify=falseTestCreateHTTPClient_ExplicitTLSOverridesEnv—cfg.TLSCertPath+cfg.TLSVerifywin over env-supplied valuesTestCreateHTTPClient_NoTLSForUnixSocket— unix socket path → no TLS even with env setTestCreateHTTPClient_NoTLSWhenCertPathEmpty— empty cert path → SDK-equivalent "no TLS modification"BUG #607 confirmed: transport.TLSClientConfig is nil; TLS env not honoredbefore the fixgo test -race ./core/adapters/docker/ ./core/passesgo test -race ./...passes (full module)golangci-lint run --timeout=3m ./...cleant.TempDir(); loopback-only127.0.0.1:0(no parallel-test contamination, per fix(docker): honor DOCKER_HOST env var when selecting transport dialer #606 review)Scope
Self-contained — no
go.mod/go.sumchange (go-connections/tlsconfigwas already a transitive dep), no impact on the unix-socket / TCP code paths.