fix(docker): validate DOCKER_HOST scheme and reject unsupported transports (#609)#612
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
✅ Mutation Testing ResultsMutation Score: 88.89% (threshold: 60%)
What is mutation testing?Mutation testing measures test quality by introducing small changes (mutations) to the code and checking if tests detect them. A higher score means better test effectiveness.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #612 +/- ##
==========================================
+ Coverage 87.23% 87.27% +0.04%
==========================================
Files 88 88
Lines 10643 10677 +34
==========================================
+ Hits 9284 9318 +34
Misses 1118 1118
Partials 241 241
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR hardens Ofelia’s Docker adapter by validating and normalizing the DOCKER_HOST URL scheme up-front, preventing unsupported/unknown schemes from silently falling through to an incorrect transport configuration and producing confusing runtime dial errors.
Changes:
- Add allow-list validation + scheme normalization for the effective Docker host during client construction, with a sentinel error for unsupported schemes.
- Update HTTP transport selection to switch on the parsed scheme (including
tcp+tlshandling for HTTP/2 enablement). - Add documentation, changelog entry, and unit tests covering supported/unsupported schemes and case-insensitivity.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/TROUBLESHOOTING.md | Documents supported/unsupported DOCKER_HOST schemes and the new actionable startup error. |
| core/adapters/docker/client.go | Introduces scheme validation/normalization and updates transport selection logic to avoid silent fallthrough. |
| core/adapters/docker/client_mutation_test.go | Adds table-driven tests for scheme validation, normalization, and env-var normalization behavior. |
| CHANGELOG.md | Notes the behavior change under Unreleased “Fixed”. |
There was a problem hiding this comment.
Code Review
This pull request introduces strict validation and normalization for the DOCKER_HOST URL scheme to prevent silent TLS downgrades and resolve case-sensitivity issues. It adds an allow-list of supported schemes, improves error reporting at startup, and updates the internal HTTP client logic. Documentation and tests were also updated to reflect these changes. Review feedback highlighted a potential nil pointer dereference in NewClientWithConfig and recommended using a local copy of the configuration to avoid mutating the caller's input.
- Surface resolveTLSConfig errors via slog warning. Previously the call site silently swallowed the error, reproducing the very class of bug this PR fixes (security + test + DX reviewers all flagged this). - Extend TLS branch to also match tcp+tls:// so PR #609's new scheme is not a silent downgrade. Caught by the security review of PR #612. - Use SDK-exported env constants (client.EnvOverrideHost, EnvOverrideCertPath, EnvTLSVerify) instead of string literals; if the SDK ever renames them we get a compile error. - Append "(expected ca.pem, cert.pem, key.pem)" to the cert-load error so operators get the remediation hint inline. - Note in resolveHostForTransport that PR #606 introduces a duplicate helper — whichever PR merges second consolidates. Tests: - TestCreateHTTPClient_TLSVerifyExplicitFalse pins the explicit-opt-out precedence (config > env) for the verify polarity. - TestCreateHTTPClient_InsecureSkipVerifyDefaults pins the SDK-mirroring default: empty DOCKER_TLS_VERIFY -> InsecureSkipVerify=true. - TestCreateHTTPClient_TCPPlusTLSEnablesTLS asserts tcp+tls:// gets the same TLS material as https://. CHANGELOG: moved entry from ### Fixed to ### Security with explicit upgrade-impact callout (operators relying on silent insecure dial will break on upgrade — that is the point). docs/TROUBLESHOOTING.md: new section on TLS handshake / cert path errors covering the post-fix failure modes. Signed-off-by: Sebastian Mendel <[email protected]>
Security review (NO-SHIP) identified that adding tcp+tls:// to the allow-list without wiring TLS material into the custom http.Transport is a silent downgrade to plain TCP - exactly the bug class this PR was meant to prevent. PR #613 (issue #607) adds the TLS plumbing. Until that lands, withhold tcp+tls:// from the allow-list and reject it loudly with the same error as ssh:// / fd://. Once #613 merges, follow-up to re-enable tcp+tls with a transport-level test that asserts TLS is actually wired up. Test review (conditional ship) noted the tcp+tls assertion was at string-transform level, not transport level. With tcp+tls now rejected, the assertion belongs on the rejection path - moved into the existing TestCreateHTTPClient_UnsupportedSchemes table. DX review: changelog reclassified to ### Changed with **BREAKING:** marker per project convention; ssh:// migration advice added inline so operators see the recommendation in the PR notes, not just in TROUBLESHOOTING. docs/TROUBLESHOOTING.md: drop tcp+tls from supported table; add to unsupported list with explanation pointing at #613. Signed-off-by: Sebastian Mendel <[email protected]>
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
- Add nil-config guard at the top of NewClientWithConfig so a nil arg
no longer panics on the daemon at startup (Gemini HIGH).
- Stop mutating the caller's *ClientConfig.Host. Copy the struct locally
for createHTTPClient so the dialer-selection switch sees the
normalized lowercase scheme without surprising the caller (Copilot
+ Gemini both flagged the silent mutation).
- Distinguish "missing scheme" from "unsupported scheme" with a
dedicated ErrMissingDockerHostScheme sentinel and rephrase the message
("missing URL scheme in DOCKER_HOST ...") so operators get a clearer
diagnosis (Copilot).
- Rename TestCreateHTTPClient_UnsupportedSchemes ->
TestNewClientWithConfig_UnsupportedSchemes since the validation lives
in NewClientWithConfig, not createHTTPClient (Copilot).
- Add TestNewClientWithConfig_NilConfig to pin the nil-safety contract.
- Add TestNewClientWithConfig_DoesNotMutateConfigHost to pin the
no-mutation contract.
- Add TestNewClientWithConfig_MissingScheme for the new error class.
Drive-by lint cleanups picked up by the linter on these files: switch
strings.Index to strings.Cut and the manual scheme-loop to
slices.Contains.
Signed-off-by: Sebastian Mendel <[email protected]>
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
…ports
The Docker adapter's createHTTPClient switched on the DOCKER_HOST URL scheme
to select a dialer, but only matched "unix://" and "https://" explicitly.
Any other scheme silently fell through to a plain-TCP transport, producing
broken or surprising behavior:
- tcp+tls:// silently downgraded to plain TCP (insecure)
- ssh:// and fd:// failed with opaque dial errors
- TCP:// / UNIX:// (uppercase) fell through to default because the prefix
check was case-sensitive — RFC 3986 says schemes are case-insensitive
Resolve the effective host from config.Host, then DOCKER_HOST env, then the
SDK default. Validate the scheme against an explicit allow-list
(unix, tcp, tcp+tls, http, https, npipe) and normalize it to lowercase
before dispatch. Unsupported schemes now return a wrapped
ErrUnsupportedDockerHostScheme at construction time, listing the supported
schemes so operators get an actionable startup error instead of an opaque
dial error later.
tcp+tls:// is treated like https:// for transport selection (TLS, HTTP/2
via ALPN). http:// is treated like tcp:// (plain HTTP/1.1, no h2c).
npipe:// is on the allow-list so Windows configurations work
transparently; on non-Windows builds the actual named-pipe dialer is
absent and the connection fails at dial time — documented in
docs/TROUBLESHOOTING.md.
ssh:// and fd:// are deliberately not supported; the dialers (SSH tunnel,
systemd socket activation) are out of scope.
Adds table-driven tests:
- validateAndNormalizeHost covers the allow-list, case normalization,
Unix path-case preservation, and unsupported schemes.
- NewClientWithConfig_UnsupportedSchemes asserts the wrapped sentinel
error and that the message lists supported schemes.
- NewClientWithConfig_NormalizesDockerHostEnv asserts DOCKER_HOST=TCP://
is normalized (no t.Parallel — t.Setenv is incompatible).
- createHTTPClient_NpipeTransport documents the npipe contract.
Fixes #609
Signed-off-by: Sebastian Mendel <[email protected]>
Security review (NO-SHIP) identified that adding tcp+tls:// to the allow-list without wiring TLS material into the custom http.Transport is a silent downgrade to plain TCP - exactly the bug class this PR was meant to prevent. PR #613 (issue #607) adds the TLS plumbing. Until that lands, withhold tcp+tls:// from the allow-list and reject it loudly with the same error as ssh:// / fd://. Once #613 merges, follow-up to re-enable tcp+tls with a transport-level test that asserts TLS is actually wired up. Test review (conditional ship) noted the tcp+tls assertion was at string-transform level, not transport level. With tcp+tls now rejected, the assertion belongs on the rejection path - moved into the existing TestCreateHTTPClient_UnsupportedSchemes table. DX review: changelog reclassified to ### Changed with **BREAKING:** marker per project convention; ssh:// migration advice added inline so operators see the recommendation in the PR notes, not just in TROUBLESHOOTING. docs/TROUBLESHOOTING.md: drop tcp+tls from supported table; add to unsupported list with explanation pointing at #613. Signed-off-by: Sebastian Mendel <[email protected]>
- Add nil-config guard at the top of NewClientWithConfig so a nil arg
no longer panics on the daemon at startup (Gemini HIGH).
- Stop mutating the caller's *ClientConfig.Host. Copy the struct locally
for createHTTPClient so the dialer-selection switch sees the
normalized lowercase scheme without surprising the caller (Copilot
+ Gemini both flagged the silent mutation).
- Distinguish "missing scheme" from "unsupported scheme" with a
dedicated ErrMissingDockerHostScheme sentinel and rephrase the message
("missing URL scheme in DOCKER_HOST ...") so operators get a clearer
diagnosis (Copilot).
- Rename TestCreateHTTPClient_UnsupportedSchemes ->
TestNewClientWithConfig_UnsupportedSchemes since the validation lives
in NewClientWithConfig, not createHTTPClient (Copilot).
- Add TestNewClientWithConfig_NilConfig to pin the nil-safety contract.
- Add TestNewClientWithConfig_DoesNotMutateConfigHost to pin the
no-mutation contract.
- Add TestNewClientWithConfig_MissingScheme for the new error class.
Drive-by lint cleanups picked up by the linter on these files: switch
strings.Index to strings.Cut and the manual scheme-loop to
slices.Contains.
Signed-off-by: Sebastian Mendel <[email protected]>
|
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
The Docker SDK adapter built its custom *http.Client via createHTTPClient and attached it with client.WithHTTPClient. That option replaces the SDK's c.client wholesale — and in doing so silently discards the *tls.Config that client.FromEnv had just installed via WithTLSClientConfigFromEnv, which is where DOCKER_CERT_PATH / DOCKER_TLS_VERIFY are consumed. Net effect: with DOCKER_HOST=https://... + DOCKER_TLS_VERIFY=1 + DOCKER_CERT_PATH=..., the resulting transport had a nil TLSClientConfig, so Ofelia either failed the handshake or fell back to system roots without the operator-supplied client cert. Latent since the SDK adapter landed in v0.12.0. Fix: in createHTTPClient, when the resolved host scheme is https://, build a *tls.Config via go-connections/tlsconfig.Client (already a transitive dep) using the same precedence as the SDK and assign it to transport.TLSClientConfig. Also resolve the host via DOCKER_HOST env so the HTTPS branch is selected in the same configurations the SDK targets. Add ClientConfig.TLSCertPath and ClientConfig.TLSVerify so callers can override env-supplied values explicitly. Precedence: config field > env var > unset (no TLS modification) DOCKER_TLS_VERIFY semantics match the SDK exactly: any non-empty value (including "0") means verify; only an unset/empty value disables it. Tests in client_mutation_test.go cover all four combinations (env-only / explicit-override / unix-socket-ignored / no-cert-path) using self-signed throwaway material under t.TempDir() and 127.0.0.1:0. This is the TLS sibling of #606 (DOCKER_HOST dialer divergence). Fixes #607 Signed-off-by: Sebastian Mendel <[email protected]> fix(docker): address PR #613 review feedback - Surface resolveTLSConfig errors via slog warning. Previously the call site silently swallowed the error, reproducing the very class of bug this PR fixes (security + test + DX reviewers all flagged this). - Extend TLS branch to also match tcp+tls:// so PR #609's new scheme is not a silent downgrade. Caught by the security review of PR #612. - Use SDK-exported env constants (client.EnvOverrideHost, EnvOverrideCertPath, EnvTLSVerify) instead of string literals; if the SDK ever renames them we get a compile error. - Append "(expected ca.pem, cert.pem, key.pem)" to the cert-load error so operators get the remediation hint inline. - Note in resolveHostForTransport that PR #606 introduces a duplicate helper — whichever PR merges second consolidates. Tests: - TestCreateHTTPClient_TLSVerifyExplicitFalse pins the explicit-opt-out precedence (config > env) for the verify polarity. - TestCreateHTTPClient_InsecureSkipVerifyDefaults pins the SDK-mirroring default: empty DOCKER_TLS_VERIFY -> InsecureSkipVerify=true. - TestCreateHTTPClient_TCPPlusTLSEnablesTLS asserts tcp+tls:// gets the same TLS material as https://. CHANGELOG: moved entry from ### Fixed to ### Security with explicit upgrade-impact callout (operators relying on silent insecure dial will break on upgrade — that is the point). docs/TROUBLESHOOTING.md: new section on TLS handshake / cert path errors covering the post-fix failure modes. Signed-off-by: Sebastian Mendel <[email protected]> test(docker): cover the resolveTLSConfig error path PR #613's codecov/patch check failed at 78.37% (target 80%) because the new `slog.Default().Warn(...)` branch on resolveTLSConfig errors was unexercised. Add two tests: - TestCreateHTTPClient_TLSLoadErrorDoesNotPanic exercises the surfacing behavior end-to-end: DOCKER_CERT_PATH points at an empty dir, so tlsconfig.Client errors out, the Warn branch fires, and the transport falls back to no TLS without panicking. - TestResolveTLSConfig_MissingCertFiles directly exercises the helper's error wrap, pinning the operator-facing remediation hint ("expected ca.pem, cert.pem, key.pem") into the error message. Coverage of createHTTPClient and resolveTLSConfig now 100%. Signed-off-by: Sebastian Mendel <[email protected]> fix(docker): wire TLS for tcp:// + DOCKER_TLS_VERIFY (Copilot feedback on PR #613) Copilot review noted that the docker CLI / SDK has long supported DOCKER_HOST=tcp://... + DOCKER_TLS_VERIFY=1 + DOCKER_CERT_PATH=... and silently upgrades the connection to HTTPS. Mirroring that here closes a silent-plaintext-downgrade gap for users following Docker's canonical mTLS docs. Refactor: - Extract applyDockerTLS / hasTLSMaterial helpers so the https://, tcp+tls://, and tcp:// branches share the same TLS-application path. applyDockerTLS surfaces resolveTLSConfig errors via slog (same warn-don't-swallow pattern from the previous review-feedback commit). - tcp:// remains plain TCP / HTTP/1.1 when no TLS material is configured; it upgrades to HTTPS / HTTP/2 only when DOCKER_CERT_PATH (or ClientConfig.TLSCertPath) is present. Tests: - TestCreateHTTPClient_TCPWithTLSEnvUpgrades pins the upgrade. - TestCreateHTTPClient_TCPWithoutTLSEnvStaysPlaintext is the negative case: tcp:// without TLS env stays plaintext (otherwise the upgrade test alone could silently make every tcp:// HTTPS). - TestHasTLSMaterial_ExplicitConfigOnly covers the explicit-config branch of the new helper. Signed-off-by: Sebastian Mendel <[email protected]>
The actual code fix for #605 cascaded into main via #613 (which added DOCKER_HOST env-driven host resolution to createHTTPClient as a side-effect of wiring TLS material). This commit adds the parts of PR #606 that did not duplicate code already on main: - TROUBLESHOOTING entry "DOCKER_HOST with TCP Socket Proxy (v0.12.0 - v0.24.0)" with the original symptom, root cause, and a working tecnativa/docker-socket-proxy compose snippet. - CHANGELOG attribution under [Unreleased] -> Fixed for #606/#605 alongside the existing #611/#612/#613/#618 entries, plus a Tests section noting the TestHealthStatus race fix that was also in #606. The TestHealthStatus race fix was cherry-picked separately as the preceding commit (it predates the recent batch and is unrelated to the docker adapter). Signed-off-by: Sebastian Mendel <[email protected]>
tcp+tls:// is an *explicit* TLS opt-in scheme. Until now, if an operator set DOCKER_HOST=tcp+tls://daemon:2376 but forgot DOCKER_CERT_PATH / DOCKER_TLS_VERIFY (or the equivalent ClientConfig.TLSCertPath / TLSVerify overrides), resolveTLSConfig returned (nil, nil), applyDockerTLS left transport.TLSClientConfig nil, and the SDK silently dialed TLS using Go's stdlib defaults -- system CA bundle, NO client certificate. Operators who declared mTLS were getting unauthenticated TLS handshakes against any daemon that did not strictly require client auth. This is the analog, for tcp+tls://, of the silent plain-TCP downgrade closed by #612 / #625. Surfaced during the security review of #625; tracked in #627. Fix: in NewClientWithConfig, after resolveDockerHost succeeds, if scheme == tcp+tls and hasTLSMaterial(config) == false, wrap and return the new typed sentinel ErrTCPTLSRequiresCertMaterial. The error message points operators at docs/TROUBLESHOOTING.md, which gains a new section explaining the fix. tcp:// remains fail-open (it is ambiguous -- operators may rely on stdlib defaults). https:// remains fail-open with a warning, matching the upstream SDK's documented posture. TDD: TestNewClientWithConfig_TCPPlusTLSRequiresCertMaterial drives the gate; two positive counterparts cover both ClientConfig.TLSCertPath and DOCKER_CERT_PATH branches of hasTLSMaterial. Existing TestCreateHTTPClient_TCPPlusTLSEnablesTLS and TestNewClientWithConfig_TCPPlusTLSScheme remain green. Refs: #627, #625 Signed-off-by: Sebastian Mendel <[email protected]>



Summary
Fixes #609 — the Docker adapter's
createHTTPClientswitched on theDOCKER_HOSTURL scheme to pick a dialer, but only matchedunix://andhttps://explicitly. Every other scheme silently fell through to a plain-TCP transport, producing broken or surprising behavior:tcp+tls://silently downgraded to plain TCP (insecure).ssh://andfd://failed with opaque dial errors (no SSH/systemd-socket dialer wired up).TCP:///UNIX://(uppercase) fell intodefaultbecausestrings.HasPrefixis case-sensitive — RFC 3986 says URL schemes are case-insensitive.npipe://(Windows default) limped along only because the SDK has its own fallback dialer.Root cause
core/adapters/docker/client.go::createHTTPClientused a two-case prefix switch with no allow-list, no normalization, and no validation of the input scheme.Fix
NewClientWithConfig:config.Host→$DOCKER_HOST→ SDK default.validateAndNormalizeHostenforces an explicit allow-list —unix,tcp,tcp+tls,http,https,npipe— and lowercases the scheme (path is preserved verbatim sounix:///Var/Run/docker.sockstill works).ErrUnsupportedDockerHostScheme. Unsupported schemes return a wrapped error at construction listing the supported values, so operators get an actionable startup error instead of an opaque dial failure.createHTTPClientnow handlestcp+tls(HTTP/2 via ALPN, likehttps) and the implicithttp/npipe/tcpcases explicitly.ssh://andfd://are deliberately not supported — out of scope.Documentation
docs/TROUBLESHOOTING.md: new section listing supported / unsupported schemes and the resulting error message.CHANGELOG.md: entry under Unreleased / Fixed.Test plan
go test -race -count=3 ./core/adapters/docker/— passesgolangci-lint run --timeout=3m— 0 issuesgo test -count=1 ./...— full suite passesssh://,fd://,gopher://, missing scheme,tcp+tls://(now supported), case-insensitivity (TCP://,UNIX://,HtTpS://), Unix path-case preservation, empty string, andnpipe://.DOCKER_HOST=TCP://127.0.0.1:0test confirms env-var normalization (cannot bet.Parallel()—t.Setenvincompatibility, noted in code).Production paths untouched
unix://,tcp://,https://continue to take the same dialer / HTTP/2 setting as before.