Skip to content

fix(docker): honor DOCKER_TLS_VERIFY/DOCKER_CERT_PATH env vars (#607)#613

Merged
CybotTM merged 1 commit into
mainfrom
fix/issue-607
May 13, 2026
Merged

fix(docker): honor DOCKER_TLS_VERIFY/DOCKER_CERT_PATH env vars (#607)#613
CybotTM merged 1 commit into
mainfrom
fix/issue-607

Conversation

@CybotTM

@CybotTM CybotTM commented May 13, 2026

Copy link
Copy Markdown
Member

Summary

Fixes #607. TLS sibling of #606 (which fixes the dialer-divergence half of the same root cause).

When DOCKER_HOST=https://... is used together with DOCKER_TLS_VERIFY=1 and DOCKER_CERT_PATH=..., Ofelia silently discarded the TLS material — the resulting *http.Transport had a nil TLSClientConfig, so the daemon either rejected the handshake or got an unauthenticated connection where mTLS was intended. Latent since the SDK adapter landed in v0.12.0.

Root cause

NewClientWithConfig configures the SDK in two passes that don't compose:

  1. client.FromEnv runs WithTLSClientConfigFromEnv(), which builds a *tls.Config from DOCKER_CERT_PATH / DOCKER_TLS_VERIFY and assigns it onto a fresh http.Client.Transport — i.e. it replaces c.client.
  2. client.WithHTTPClient(httpClient) then overwrites c.client again with our custom transport, which never sees the TLS config from step 1.

So whichever step runs last wins, and step 2 always wins. The exact same shape as #606's WithHostFromEnv vs custom dialer divergence.

Fix

In createHTTPClient, when the resolved host scheme is https://, build a *tls.Config via go-connections/tlsconfig.Client (already a dependency) and assign it to transport.TLSClientConfig. Two new fields on ClientConfig allow explicit override: TLSCertPath string and TLSVerify *bool. Precedence mirrors resolveDockerHost from #606: config field > env var > unset.

DOCKER_TLS_VERIFY semantics match the SDK exactly: any non-empty value (including "0") means verify; only an unset/empty value disables it. When neither config nor env supply a cert path, no TLS is applied (matching the SDK's "leave TLS untouched" behavior).

A small helper, resolveHostForTransport, also consults DOCKER_HOST so the HTTPS branch is selected in the same configurations the SDK targets — avoiding a related divergence where empty config.Host would fall back to the unix default and skip the TLS branch entirely.

Test plan

  • New tests in core/adapters/docker/client_mutation_test.go:
    • TestCreateHTTPClient_HonorsTLSEnvDOCKER_HOST=https://127.0.0.1:0 + cert path, asserts TLSClientConfig.{RootCAs, Certificates} populated and InsecureSkipVerify=false
    • TestCreateHTTPClient_ExplicitTLSOverridesEnvcfg.TLSCertPath + cfg.TLSVerify win over env-supplied values
    • TestCreateHTTPClient_NoTLSForUnixSocket — unix socket path → no TLS even with env set
    • TestCreateHTTPClient_NoTLSWhenCertPathEmpty — empty cert path → SDK-equivalent "no TLS modification"
  • Verified the bug exists on the unmodified code: a stripped-down version of the env test fails with BUG #607 confirmed: transport.TLSClientConfig is nil; TLS env not honored before the fix
  • go test -race ./core/adapters/docker/ ./core/ passes
  • go test -race ./... passes (full module)
  • golangci-lint run --timeout=3m ./... clean
  • All TLS material in tests is generated under t.TempDir(); loopback-only 127.0.0.1:0 (no parallel-test contamination, per fix(docker): honor DOCKER_HOST env var when selecting transport dialer #606 review)

Scope

Self-contained — no go.mod/go.sum change (go-connections/tlsconfig was already a transitive dep), no impact on the unix-socket / TCP code paths.

Copilot AI review requested due to automatic review settings May 13, 2026 10:09
@github-actions github-actions Bot added documentation Improvements or additions to documentation tests labels May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

github-actions[bot]
github-actions Bot previously approved these changes May 13, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

✅ Mutation Testing Results

Mutation Score: 100.00% (threshold: 60%)

✨ Good job! Mutation score meets the threshold.

What is mutation testing?

Mutation testing measures test quality by introducing small changes (mutations) to the code and checking if tests detect them. A higher score means better test effectiveness.

  • Killed mutants: Tests caught the mutation (good!)
  • Survived mutants: Tests missed the mutation (needs improvement)

@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.34%. Comparing base (73bbc6a) to head (db9f62f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #613      +/-   ##
==========================================
+ Coverage   87.29%   87.34%   +0.04%     
==========================================
  Files          88       88              
  Lines       10677    10719      +42     
==========================================
+ Hits         9320     9362      +42     
- Misses       1115     1116       +1     
+ Partials      242      241       -1     
Flag Coverage Δ
integration 87.32% <100.00%> (+0.03%) ⬆️
unittests 83.93% <100.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the Docker SDK adapter to properly honor DOCKER_TLS_VERIFY and DOCKER_CERT_PATH for HTTPS hosts, addressing an issue where custom HTTP clients would silently discard TLS material. It introduces TLSCertPath and TLSVerify to ClientConfig and implements logic to resolve host and TLS settings with precedence for configuration over environment variables. Feedback indicates a potential security risk where errors in TLS configuration resolution are ignored; it is recommended to refactor the client creation to fail-fast upon encountering such errors.

Comment thread core/adapters/docker/client.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes a Docker SDK adapter bug where DOCKER_TLS_VERIFY / DOCKER_CERT_PATH TLS material was being lost when Ofelia installs its custom http.Client via client.WithHTTPClient, ensuring TLS configuration is applied on the custom transport as well.

Changes:

  • Add ClientConfig.TLSCertPath and ClientConfig.TLSVerify to allow explicit TLS configuration with config > env precedence.
  • Extend createHTTPClient to resolve the effective host (including DOCKER_HOST) and apply TLS config for HTTPS transports.
  • Add regression tests generating temporary TLS fixtures and update CHANGELOG.md with the fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
core/adapters/docker/client.go Adds host resolution for transport selection and TLS config resolution/application for HTTPS hosts.
core/adapters/docker/client_mutation_test.go Adds regression tests ensuring TLS env/config precedence and TLS/no-TLS behavior across host types.
CHANGELOG.md Documents the TLS env var fix and new config overrides under Unreleased.
Comments suppressed due to low confidence (2)

core/adapters/docker/client.go:199

  • TLS is only applied when the resolved host string starts with "https://". However repo docs show the common TLS setup as DOCKER_HOST=tcp://... with DOCKER_TLS_VERIFY/DOCKER_CERT_PATH (e.g. docs/SECURITY.md). With tcp:// the code falls into the default branch and never calls resolveTLSConfig, so client cert/CA material will still be discarded in that configuration. Consider applying TLS whenever a cert path is configured (and host is not unix://), or explicitly handling the tcp:// scheme used by Docker for TLS endpoints.

This issue also appears on line 189 of the same file.

	// Configure dialer based on host type
	switch {
	case strings.HasPrefix(host, "unix://"):
		socketPath := strings.TrimPrefix(host, "unix://")
		transport.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
			dialer := &net.Dialer{Timeout: config.DialTimeout}
			return dialer.DialContext(ctx, "unix", socketPath)
		}
		// HTTP/2 not supported on Unix sockets
		transport.ForceAttemptHTTP2 = false
	case strings.HasPrefix(host, "https://"):
		// HTTPS connections can use HTTP/2 via ALPN
		transport.ForceAttemptHTTP2 = true
		// Apply TLS from explicit config / env. The SDK's client.FromEnv
		// also tries to apply this, but client.WithHTTPClient below
		// overwrites the SDK's http.Client wholesale and would discard
		// that TLS material — see issue #607.
		if tlsCfg, err := resolveTLSConfig(config); err == nil && tlsCfg != nil {
			transport.TLSClientConfig = tlsCfg
		}
	default:
		// TCP without TLS - HTTP/2 not supported (no h2c in Docker)
		transport.ForceAttemptHTTP2 = false
	}

core/adapters/docker/client.go:195

  • Errors from resolveTLSConfig are silently ignored (err == nil guard). If TLS files are missing/invalid, Ofelia will proceed with an HTTPS transport but without the intended client cert/CA config, which can lead to confusing failures or an unintended unauthenticated TLS connection. It would be safer to propagate this error (e.g. have createHTTPClient/NewClientWithConfig return an error) so misconfiguration is surfaced early.
		// Apply TLS from explicit config / env. The SDK's client.FromEnv
		// also tries to apply this, but client.WithHTTPClient below
		// overwrites the SDK's http.Client wholesale and would discard
		// that TLS material — see issue #607.
		if tlsCfg, err := resolveTLSConfig(config); err == nil && tlsCfg != nil {
			transport.TLSClientConfig = tlsCfg
		}

Comment thread core/adapters/docker/client_mutation_test.go
CybotTM added a commit that referenced this pull request May 13, 2026
- Surface resolveTLSConfig errors via slog warning. Previously the call
  site silently swallowed the error, reproducing the very class of bug
  this PR fixes (security + test + DX reviewers all flagged this).
- Extend TLS branch to also match tcp+tls:// so PR #609's new scheme is
  not a silent downgrade. Caught by the security review of PR #612.
- Use SDK-exported env constants (client.EnvOverrideHost,
  EnvOverrideCertPath, EnvTLSVerify) instead of string literals; if the
  SDK ever renames them we get a compile error.
- Append "(expected ca.pem, cert.pem, key.pem)" to the cert-load error
  so operators get the remediation hint inline.
- Note in resolveHostForTransport that PR #606 introduces a duplicate
  helper — whichever PR merges second consolidates.

Tests:
- TestCreateHTTPClient_TLSVerifyExplicitFalse pins the explicit-opt-out
  precedence (config > env) for the verify polarity.
- TestCreateHTTPClient_InsecureSkipVerifyDefaults pins the SDK-mirroring
  default: empty DOCKER_TLS_VERIFY -> InsecureSkipVerify=true.
- TestCreateHTTPClient_TCPPlusTLSEnablesTLS asserts tcp+tls:// gets the
  same TLS material as https://.

CHANGELOG: moved entry from ### Fixed to ### Security with explicit
upgrade-impact callout (operators relying on silent insecure dial will
break on upgrade — that is the point).

docs/TROUBLESHOOTING.md: new section on TLS handshake / cert path errors
covering the post-fix failure modes.

Signed-off-by: Sebastian Mendel <[email protected]>
github-actions[bot]
github-actions Bot previously approved these changes May 13, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

CybotTM added a commit that referenced this pull request May 13, 2026
Security review (NO-SHIP) identified that adding tcp+tls:// to the
allow-list without wiring TLS material into the custom http.Transport is
a silent downgrade to plain TCP - exactly the bug class this PR was
meant to prevent.

PR #613 (issue #607) adds the TLS plumbing. Until that lands, withhold
tcp+tls:// from the allow-list and reject it loudly with the same error
as ssh:// / fd://. Once #613 merges, follow-up to re-enable tcp+tls
with a transport-level test that asserts TLS is actually wired up.

Test review (conditional ship) noted the tcp+tls assertion was at
string-transform level, not transport level. With tcp+tls now rejected,
the assertion belongs on the rejection path - moved into the existing
TestCreateHTTPClient_UnsupportedSchemes table.

DX review: changelog reclassified to ### Changed with **BREAKING:**
marker per project convention; ssh:// migration advice added inline so
operators see the recommendation in the PR notes, not just in
TROUBLESHOOTING.

docs/TROUBLESHOOTING.md: drop tcp+tls from supported table; add to
unsupported list with explanation pointing at #613.

Signed-off-by: Sebastian Mendel <[email protected]>
CybotTM added a commit that referenced this pull request May 13, 2026
PR #613's codecov/patch check failed at 78.37% (target 80%) because the
new `slog.Default().Warn(...)` branch on resolveTLSConfig errors was
unexercised. Add two tests:

- TestCreateHTTPClient_TLSLoadErrorDoesNotPanic exercises the surfacing
  behavior end-to-end: DOCKER_CERT_PATH points at an empty dir, so
  tlsconfig.Client errors out, the Warn branch fires, and the transport
  falls back to no TLS without panicking.
- TestResolveTLSConfig_MissingCertFiles directly exercises the helper's
  error wrap, pinning the operator-facing remediation hint
  ("expected ca.pem, cert.pem, key.pem") into the error message.

Coverage of createHTTPClient and resolveTLSConfig now 100%.

Signed-off-by: Sebastian Mendel <[email protected]>
github-actions[bot]
github-actions Bot previously approved these changes May 13, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

CybotTM added a commit that referenced this pull request May 13, 2026
…k on PR #613)

Copilot review noted that the docker CLI / SDK has long supported
DOCKER_HOST=tcp://... + DOCKER_TLS_VERIFY=1 + DOCKER_CERT_PATH=... and
silently upgrades the connection to HTTPS. Mirroring that here closes a
silent-plaintext-downgrade gap for users following Docker's canonical
mTLS docs.

Refactor:
- Extract applyDockerTLS / hasTLSMaterial helpers so the https://,
  tcp+tls://, and tcp:// branches share the same TLS-application path.
  applyDockerTLS surfaces resolveTLSConfig errors via slog (same
  warn-don't-swallow pattern from the previous review-feedback commit).
- tcp:// remains plain TCP / HTTP/1.1 when no TLS material is configured;
  it upgrades to HTTPS / HTTP/2 only when DOCKER_CERT_PATH (or
  ClientConfig.TLSCertPath) is present.

Tests:
- TestCreateHTTPClient_TCPWithTLSEnvUpgrades pins the upgrade.
- TestCreateHTTPClient_TCPWithoutTLSEnvStaysPlaintext is the negative
  case: tcp:// without TLS env stays plaintext (otherwise the upgrade
  test alone could silently make every tcp:// HTTPS).
- TestHasTLSMaterial_ExplicitConfigOnly covers the explicit-config
  branch of the new helper.

Signed-off-by: Sebastian Mendel <[email protected]>
github-actions[bot]
github-actions Bot previously approved these changes May 13, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

CybotTM added a commit that referenced this pull request May 13, 2026
Security review (NO-SHIP) identified that adding tcp+tls:// to the
allow-list without wiring TLS material into the custom http.Transport is
a silent downgrade to plain TCP - exactly the bug class this PR was
meant to prevent.

PR #613 (issue #607) adds the TLS plumbing. Until that lands, withhold
tcp+tls:// from the allow-list and reject it loudly with the same error
as ssh:// / fd://. Once #613 merges, follow-up to re-enable tcp+tls
with a transport-level test that asserts TLS is actually wired up.

Test review (conditional ship) noted the tcp+tls assertion was at
string-transform level, not transport level. With tcp+tls now rejected,
the assertion belongs on the rejection path - moved into the existing
TestCreateHTTPClient_UnsupportedSchemes table.

DX review: changelog reclassified to ### Changed with **BREAKING:**
marker per project convention; ssh:// migration advice added inline so
operators see the recommendation in the PR notes, not just in
TROUBLESHOOTING.

docs/TROUBLESHOOTING.md: drop tcp+tls from supported table; add to
unsupported list with explanation pointing at #613.

Signed-off-by: Sebastian Mendel <[email protected]>
The Docker SDK adapter built its custom *http.Client via createHTTPClient
and attached it with client.WithHTTPClient. That option replaces the
SDK's c.client wholesale — and in doing so silently discards the
*tls.Config that client.FromEnv had just installed via
WithTLSClientConfigFromEnv, which is where DOCKER_CERT_PATH /
DOCKER_TLS_VERIFY are consumed.

Net effect: with DOCKER_HOST=https://... + DOCKER_TLS_VERIFY=1 +
DOCKER_CERT_PATH=..., the resulting transport had a nil TLSClientConfig,
so Ofelia either failed the handshake or fell back to system roots
without the operator-supplied client cert. Latent since the SDK adapter
landed in v0.12.0.

Fix: in createHTTPClient, when the resolved host scheme is https://,
build a *tls.Config via go-connections/tlsconfig.Client (already a
transitive dep) using the same precedence as the SDK and assign it to
transport.TLSClientConfig. Also resolve the host via DOCKER_HOST env so
the HTTPS branch is selected in the same configurations the SDK targets.

Add ClientConfig.TLSCertPath and ClientConfig.TLSVerify so callers can
override env-supplied values explicitly. Precedence:

  config field > env var > unset (no TLS modification)

DOCKER_TLS_VERIFY semantics match the SDK exactly: any non-empty value
(including "0") means verify; only an unset/empty value disables it.

Tests in client_mutation_test.go cover all four combinations
(env-only / explicit-override / unix-socket-ignored / no-cert-path) using
self-signed throwaway material under t.TempDir() and 127.0.0.1:0.

This is the TLS sibling of #606 (DOCKER_HOST dialer divergence).

Fixes #607

Signed-off-by: Sebastian Mendel <[email protected]>

fix(docker): address PR #613 review feedback

- Surface resolveTLSConfig errors via slog warning. Previously the call
  site silently swallowed the error, reproducing the very class of bug
  this PR fixes (security + test + DX reviewers all flagged this).
- Extend TLS branch to also match tcp+tls:// so PR #609's new scheme is
  not a silent downgrade. Caught by the security review of PR #612.
- Use SDK-exported env constants (client.EnvOverrideHost,
  EnvOverrideCertPath, EnvTLSVerify) instead of string literals; if the
  SDK ever renames them we get a compile error.
- Append "(expected ca.pem, cert.pem, key.pem)" to the cert-load error
  so operators get the remediation hint inline.
- Note in resolveHostForTransport that PR #606 introduces a duplicate
  helper — whichever PR merges second consolidates.

Tests:
- TestCreateHTTPClient_TLSVerifyExplicitFalse pins the explicit-opt-out
  precedence (config > env) for the verify polarity.
- TestCreateHTTPClient_InsecureSkipVerifyDefaults pins the SDK-mirroring
  default: empty DOCKER_TLS_VERIFY -> InsecureSkipVerify=true.
- TestCreateHTTPClient_TCPPlusTLSEnablesTLS asserts tcp+tls:// gets the
  same TLS material as https://.

CHANGELOG: moved entry from ### Fixed to ### Security with explicit
upgrade-impact callout (operators relying on silent insecure dial will
break on upgrade — that is the point).

docs/TROUBLESHOOTING.md: new section on TLS handshake / cert path errors
covering the post-fix failure modes.

Signed-off-by: Sebastian Mendel <[email protected]>

test(docker): cover the resolveTLSConfig error path

PR #613's codecov/patch check failed at 78.37% (target 80%) because the
new `slog.Default().Warn(...)` branch on resolveTLSConfig errors was
unexercised. Add two tests:

- TestCreateHTTPClient_TLSLoadErrorDoesNotPanic exercises the surfacing
  behavior end-to-end: DOCKER_CERT_PATH points at an empty dir, so
  tlsconfig.Client errors out, the Warn branch fires, and the transport
  falls back to no TLS without panicking.
- TestResolveTLSConfig_MissingCertFiles directly exercises the helper's
  error wrap, pinning the operator-facing remediation hint
  ("expected ca.pem, cert.pem, key.pem") into the error message.

Coverage of createHTTPClient and resolveTLSConfig now 100%.

Signed-off-by: Sebastian Mendel <[email protected]>

fix(docker): wire TLS for tcp:// + DOCKER_TLS_VERIFY (Copilot feedback on PR #613)

Copilot review noted that the docker CLI / SDK has long supported
DOCKER_HOST=tcp://... + DOCKER_TLS_VERIFY=1 + DOCKER_CERT_PATH=... and
silently upgrades the connection to HTTPS. Mirroring that here closes a
silent-plaintext-downgrade gap for users following Docker's canonical
mTLS docs.

Refactor:
- Extract applyDockerTLS / hasTLSMaterial helpers so the https://,
  tcp+tls://, and tcp:// branches share the same TLS-application path.
  applyDockerTLS surfaces resolveTLSConfig errors via slog (same
  warn-don't-swallow pattern from the previous review-feedback commit).
- tcp:// remains plain TCP / HTTP/1.1 when no TLS material is configured;
  it upgrades to HTTPS / HTTP/2 only when DOCKER_CERT_PATH (or
  ClientConfig.TLSCertPath) is present.

Tests:
- TestCreateHTTPClient_TCPWithTLSEnvUpgrades pins the upgrade.
- TestCreateHTTPClient_TCPWithoutTLSEnvStaysPlaintext is the negative
  case: tcp:// without TLS env stays plaintext (otherwise the upgrade
  test alone could silently make every tcp:// HTTPS).
- TestHasTLSMaterial_ExplicitConfigOnly covers the explicit-config
  branch of the new helper.

Signed-off-by: Sebastian Mendel <[email protected]>
@sonarqubecloud

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

@CybotTM CybotTM added this pull request to the merge queue May 13, 2026
Merged via the queue into main with commit 824478f May 13, 2026
27 checks passed
@CybotTM CybotTM deleted the fix/issue-607 branch May 13, 2026 12:19
CybotTM added a commit that referenced this pull request May 13, 2026
The actual code fix for #605 cascaded into main via #613 (which added
DOCKER_HOST env-driven host resolution to createHTTPClient as a
side-effect of wiring TLS material). This commit adds the parts of
PR #606 that did not duplicate code already on main:

- TROUBLESHOOTING entry "DOCKER_HOST with TCP Socket Proxy
  (v0.12.0 - v0.24.0)" with the original symptom, root cause, and a
  working tecnativa/docker-socket-proxy compose snippet.
- CHANGELOG attribution under [Unreleased] -> Fixed for #606/#605
  alongside the existing #611/#612/#613/#618 entries, plus a Tests
  section noting the TestHealthStatus race fix that was also in #606.

The TestHealthStatus race fix was cherry-picked separately as the
preceding commit (it predates the recent batch and is unrelated to
the docker adapter).

Signed-off-by: Sebastian Mendel <[email protected]>
CybotTM added a commit that referenced this pull request May 14, 2026
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://...
is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent
ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme
to https:// before passing it to client.WithHost.

Without the rewrite, the SDK was given a tcp:// URL while the custom
HTTP transport was wired with TLS material via #613. Go's http.Transport
only triggers TLS for https:// URLs, so the cert material was silently
unused and connections went out plaintext, leaving operators believing
they had mTLS when they did not.

The rewrite is implemented as a tested helper
(upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig
right after host resolution. Other schemes (tcp+tls://, https://,
unix://, http://, npipe://) pass through unchanged. The
applyTCPTransport hasTLSMaterial branch stays in place to support
direct callers of createHTTPClient (notably
TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite
means dispatch goes through applyTLSTransport.

Updates the godoc on the tcp: schemeHandlers entry and the
TROUBLESHOOTING.md scheme table to make the new contract explicit.

Closes #634, follow-up to #613.

Signed-off-by: Sebastian Mendel <[email protected]>
CybotTM added a commit that referenced this pull request May 14, 2026
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://...
is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent
ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme
to https:// before passing it to client.WithHost.

Without the rewrite, the SDK was given a tcp:// URL while the custom
HTTP transport was wired with TLS material via #613. Go's http.Transport
only triggers TLS for https:// URLs, so the cert material was silently
unused and connections went out plaintext, leaving operators believing
they had mTLS when they did not.

The rewrite is implemented as a tested helper
(upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig
right after host resolution. Other schemes (tcp+tls://, https://,
unix://, http://, npipe://) pass through unchanged. The
applyTCPTransport hasTLSMaterial branch stays in place to support
direct callers of createHTTPClient (notably
TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite
means dispatch goes through applyTLSTransport.

Updates the godoc on the tcp: schemeHandlers entry and the
TROUBLESHOOTING.md scheme table to make the new contract explicit.

Closes #634, follow-up to #613.

Signed-off-by: Sebastian Mendel <[email protected]>
CybotTM added a commit that referenced this pull request May 14, 2026
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://...
is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent
ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme
to https:// before passing it to client.WithHost.

Without the rewrite, the SDK was given a tcp:// URL while the custom
HTTP transport was wired with TLS material via #613. Go's http.Transport
only triggers TLS for https:// URLs, so the cert material was silently
unused and connections went out plaintext, leaving operators believing
they had mTLS when they did not.

The rewrite is implemented as a tested helper
(upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig
right after host resolution. Other schemes (tcp+tls://, https://,
unix://, http://, npipe://) pass through unchanged. The
applyTCPTransport hasTLSMaterial branch stays in place to support
direct callers of createHTTPClient (notably
TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite
means dispatch goes through applyTLSTransport.

Updates the godoc on the tcp: schemeHandlers entry and the
TROUBLESHOOTING.md scheme table to make the new contract explicit.

Closes #634, follow-up to #613.

Signed-off-by: Sebastian Mendel <[email protected]>
CybotTM added a commit that referenced this pull request May 14, 2026
Mirror the docker CLI's silent scheme upgrade: when DOCKER_HOST=tcp://...
is combined with DOCKER_TLS_VERIFY / DOCKER_CERT_PATH (or the equivalent
ClientConfig.TLSCertPath / TLSVerify overrides), rewrite the host scheme
to https:// before passing it to client.WithHost.

Without the rewrite, the SDK was given a tcp:// URL while the custom
HTTP transport was wired with TLS material via #613. Go's http.Transport
only triggers TLS for https:// URLs, so the cert material was silently
unused and connections went out plaintext, leaving operators believing
they had mTLS when they did not.

The rewrite is implemented as a tested helper
(upgradeTCPToHTTPSIfTLSMaterial) called once from NewClientWithConfig
right after host resolution. Other schemes (tcp+tls://, https://,
unix://, http://, npipe://) pass through unchanged. The
applyTCPTransport hasTLSMaterial branch stays in place to support
direct callers of createHTTPClient (notably
TestCreateHTTPClient_TCPWithTLSEnvUpgrades); in production this rewrite
means dispatch goes through applyTLSTransport.

Updates the godoc on the tcp: schemeHandlers entry and the
TROUBLESHOOTING.md scheme table to make the new contract explicit.

Closes #634, follow-up to #613.

Signed-off-by: Sebastian Mendel <[email protected]>
@CybotTM CybotTM mentioned this pull request May 14, 2026
5 tasks
CybotTM added a commit that referenced this pull request May 15, 2026
Tightens PR #681 based on the multi-axis review fan-out:

- Add unit test on disableHTTP2AutoConfig itself (transport-level invariant)
  so a future stdlib refactor cannot silently re-introduce the bug while
  the end-to-end SDK test still passes by chance.
- Add TLS-path negative control (TestApplyTLSTransport_DoesNotDisableHTTP2AutoConfig)
  that guards against accidentally extending the suppression to the TLS
  apply path, which would silently disable ALPN h2 negotiation for https://
  and tcp+tls:// hosts.
- Add table test pinning that every non-TLS apply function calls
  disableHTTP2AutoConfig — catches the silent-omit footgun when new
  schemes are added in future.
- Add t.Skip placeholder for http:// hijack so the gap is visible in
  `go test -v` output and reachable via grep from the test directory.
- Apply disableHTTP2AutoConfig to the createHTTPClient unknown-scheme
  fallback (defensive — same hijack risk applies to direct test callers
  with exotic schemes).
- Fix t.Logf race in the fake-proxy serve goroutine by routing through
  http.Server.Shutdown and a done channel so the goroutine exits before
  test cleanup completes.
- Trim disableHTTP2AutoConfig doc comment (drop cost-justification
  paragraph and the upstream-quote nicety; keep the load-bearing WHY
  and the upstream-code reference).
- Add CHANGELOG.md Unreleased > Fixed entry (matches the docker-adapter
  release-notes convention from prior PRs #634 / #627 / #653 / #613).
- Add docs/TROUBLESHOOTING.md entry for v0.25.0 plain-tcp:// run_exec
  failure mode (slots next to the related #605 socket-proxy section);
  cross-reference from the older v0.11.0 HTTP/2 section so operators
  scanning by topic land on the right bug.

Deferred to follow-up issues (out of scope for this PR):
- Refactor to schemeHandler.tlsCapable flag to kill the silent-omit
  footgun structurally.
- Dead ClientConfig.HTTPClient field (pre-existing, unrelated cleanup).

Signed-off-by: Sebastian Mendel <[email protected]>
github-merge-queue Bot pushed a commit that referenced this pull request May 15, 2026
…ack (#668) (#681)

## Summary

Closes #668.

With `DOCKER_HOST=tcp://socket-proxy:2375` (plain HTTP, no TLS) the
regular HTTP path works but `ContainerExecAttach` (the hijack path used
by `run_exec`, `ContainerAttach`, `ContainerLogs --follow`) fails with:

```
tls: first record does not look like a TLS handshake
```

## Root cause

Go's `net/http` lazily auto-configures HTTP/2 on the first request and
**allocates `Transport.TLSClientConfig` in place** (to seed
`NextProtos=[h2 http/1.1]` for ALPN). The Docker SDK's hijack dialer
reads `baseTransport.TLSClientConfig` as its "TLS is required" signal.
After the first plain-HTTP request lands, the SDK misreads the
auto-installed config as operator-configured TLS and calls `tls.Dial`
against the plain Docker daemon — handshake fails because the daemon
answers with raw HTTP.

| Stage | `DialContext` | `TLSClientConfig` |
| --- | --- | --- |
| After `NewClientWithOpts` | nil | **nil** ✅ |
| After first `_ping` | nil | **non-nil** (`NextProtos=[h2 http/1.1]`) ❌
|

The SDK author already flagged the hijack dialer in [`moby/moby`
`client.go#L439-L443`](https://github.com/moby/moby/blob/v28.5.2/client/client.go#L439-L443)
as suspect: *"I honestly don't know why it doesn't do that."*

## Fix

One knob: set `transport.TLSNextProto` to a non-nil empty map on the
non-TLS apply paths (`applyUnixTransport`, `applyTCPTransport`,
`applyPlainTransport`) plus the `createHTTPClient` unknown-scheme
fallback. That is the documented Go stdlib contract for *"do not
auto-enable HTTP/2"*, which prevents the in-place `TLSClientConfig`
mutation. TLS Docker hosts (`https://`, `tcp+tls://`) keep
`TLSNextProto` nil so ALPN h2 negotiation still works there.

Extracted as `disableHTTP2AutoConfig` with a `// Why:` comment so the
subtle stdlib contract isn't deleted as dead code by a future reader.

## Tests

New `client_tcp_hijack_test.go` (verified bidirectionally — fails with
the exact user error pre-fix, passes post-fix):

- **`TestPlainTCPHijack_ContainerExecAttachWorks`** — drives the full
`ContainerExecAttach` hijack path against a plain-HTTP TCP listener with
101-Switching-Protocols semantics. The exact-shape regression test the
existing suite lacked.
- **`TestDisableHTTP2AutoConfig_KeepsTLSClientConfigNil`** — unit test
pinning the stdlib invariant directly so a future net/http refactor
cannot silently re-introduce the bug while the end-to-end SDK test still
passes by chance.
- **`TestApplyTLSTransport_DoesNotDisableHTTP2AutoConfig`** — negative
control: TLS apply paths must keep `TLSNextProto` nil so ALPN h2
negotiation remains intact. Guards against accidentally extending the
suppression to the TLS path.
- **`TestApplyTransport_NonTLSDisableHTTP2AutoConfig`** — table test
pinning that every non-TLS apply function calls the helper. Catches the
silent-omit footgun if new schemes are added.
- **`TestPlainHTTPHijack_ContainerExecAttach_FollowUp`** — `t.Skip`
placeholder so the gap is visible in `go test -v` and grep-able to #682.

Race-clean over 10 iterations under `go test -race`.

## Docs

- `CHANGELOG.md` — `[Unreleased] > Fixed` entry (matches the
docker-adapter release-notes convention from prior PRs #634 / #627 /
#653 / #613).
- `docs/TROUBLESHOOTING.md` — new section slotted next to the related
#605 socket-proxy section; cross-referenced from the older v0.11.0
HTTP/2 section so operators scanning by topic land on the right bug.

## Out of scope

`http://` hijack has a separate, pre-existing failure mode: SDK falls to
`net.Dial("http", addr)`, which Go rejects with `unknown network http`.
Fixing it needs a TCP `DialContext` on the http transport. Filed as
#682; called out in test comments and `t.Skip`-tracked above.

## Test plan

- [x] New regression test passes with fix, fails without it
- [x] `go test ./core/...` clean
- [x] `go test -race -count=10` clean (no flakes, no goroutine race)
- [x] `golangci-lint run ./core/adapters/docker/...` clean
- [ ] CI green
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docker SDK adapter: WithHTTPClient clobbers TLS config from DOCKER_TLS_VERIFY / DOCKER_CERT_PATH

2 participants