Skip to content

client: prevent idle connections leaking FDs#48736

Merged
thaJeztah merged 1 commit intomoby:masterfrom
laurazard:client-set-conn-timeout
Oct 25, 2024
Merged

client: prevent idle connections leaking FDs#48736
thaJeztah merged 1 commit intomoby:masterfrom
laurazard:client-set-conn-timeout

Conversation

@laurazard
Copy link
Copy Markdown
Member

@laurazard laurazard commented Oct 23, 2024

Patch from tonistiigi@af6ada9

- What I did

Without this change, if a long-lived process is using the client, idle connections are not released and grow over time.

We can also look into addressing this issue from the server side, but it doesn't hurt for the client package to have good defaults and not cause this.

- How I did it

Set MaxIdleConns and IdleConnTimeout on the defaultHTTPClient.

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@laurazard
Copy link
Copy Markdown
Member Author

I'll look into adding some tests for this too.

Comment thread client/client.go Outdated
Comment on lines +254 to +257
transport.MaxIdleConns = 6
transport.IdleConnTimeout = 30 * time.Second
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth to add a comment linking to the issue here

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious; would setting these on the transport we pass to ConfigureTransport work? i.e., I don't think it overwrites properties it doesn't set? But perhaps it would make it more future-proof to consider these defaults we set (but can be overwritten by ConfigureTransport;

i.e.;

	transport := &http.Transport{
		MaxIdleConns:    6,
		IdleConnTimeout: 30 * time.Second,
	}
	err := sockets.ConfigureTransport(transport, hostURL.Scheme, hostURL.Host)

Copy link
Copy Markdown
Member Author

@laurazard laurazard Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth to add a comment linking to the issue here

Will do!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious; would setting these on the transport we pass to ConfigureTransport work?

It would work, sure, I'm okay with doing that. I'd rather not bikeshed too much on this change though, I want to take a proper look at go-connections afterwards and see if we can some changes there and then we can revisit this, this was just meant as a quick port of tonis's patch to address the immediate issue.

Comment thread client/client.go Outdated
Comment on lines +254 to +257
transport.MaxIdleConns = 6
transport.IdleConnTimeout = 30 * time.Second
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want these configurable on client opts like WithMaxIdleCons / WithIdleConnTimeout so it would be applied to client transport like we do for WithTLSClientConfig ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe! I'll ping people for thoughts.

Regardless though, imo this change is about having better defaults – we can discuss/introduce options separately, unless this change breaks any existing client as-is.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurazard laurazard force-pushed the client-set-conn-timeout branch from ccf7889 to 5c64770 Compare October 23, 2024 14:40
Patch from tonistiigi@af6ada9

Without this change, if a long-lived process uses the client and creates
connections, these connections are not released and grow over time.

We can also look into addressing this issue from the server side, but it
doesn't hurt for the `client` package to have good defaults and not
cause this.

Signed-off-by: Laura Brehm <[email protected]>
@laurazard laurazard force-pushed the client-set-conn-timeout branch from 5c64770 to 5c72a95 Compare October 23, 2024 15:14
Comment thread client/client.go
Comment on lines +256 to +257
transport.MaxIdleConns = 6
transport.IdleConnTimeout = 30 * time.Second
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values seem fairly low. Maybe we could keep them the same as defaults in http.DefaultTransport?

var DefaultTransport RoundTripper = &Transport{
    ...
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
...

https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/net/http/transport.go;l=51-52

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like 100 idle connections is a lot, especially when those are all open FDs, but I'm open to changing it. Thoughts? @thaJeztah @corhere @crazy-max ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need as much as 100, but 6 feels a bit too conservative IMO 😅 These are still subject to the IdleConnTimeout so I think we can afford to go a little bit higher than that.

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for regular use in the CLI it wouldn't make a huge difference, but possibly more so if the client is used in a long-lived process.

Looking at the defaults, MaxIdleConns is unlimited; https://github.com/golang/go/blob/2ef8e41f9543478a51a0147a735e4415737de09f/src/net/http/transport.go#L205-L207

// MaxIdleConns controls the maximum number of idle (keep-alive)
// connections across all hosts. Zero means no limit.
MaxIdleConns int

But MaxIdleConnsPerHost defaults to 2 (DefaultMaxIdleConnsPerHost)

https://github.com/golang/go/blob/2ef8e41f9543478a51a0147a735e4415737de09f/src/net/http/transport.go#L209-L212

// MaxIdleConnsPerHost, if non-zero, controls the maximum idle
// (keep-alive) connections to keep per-host. If zero,
// DefaultMaxIdleConnsPerHost is used.
MaxIdleConnsPerHost int

https://github.com/golang/go/blob/2ef8e41f9543478a51a0147a735e4415737de09f/src/net/http/transport.go#L58-L60

// DefaultMaxIdleConnsPerHost is the default value of [Transport]'s
// MaxIdleConnsPerHost.
const DefaultMaxIdleConnsPerHost = 2

Which would mean that with the current settings, the client would have to have idle connections for 5 different hosts before this limit kicks in. Raising MaxIdleConns to 100 _without_ raising DefaultMaxIdleConnsPerHost` means that the client would need to have idle connections for 50 different hosts before it taking effect. That may be very unlikely (as hosts here would be "Engine APIs" or perhaps "registry to authenticate").

Also did a check for prior art in our codebase; here's some defaults that are set by SDKs we use;

git grep -E '(MaxIdleConns|IdleConnTimeout)[^!]*[:=]'
plugin/registry.go:                                                                     IdleConnTimeout:     30 * time.Second,
vendor/cloud.google.com/go/compute/metadata/CHANGES.md:* **compute/metadata:** Set IdleConnTimeout for http.Client ([#7084](https://github.com/googleapis/google-cloud-go/issues/7084)) ([766516a](https://github.com/googleapis/google-cloud-go/commit/766516aaf3816bfb3159efeea65aa3d1d205a3e2)), refs [#5430](https://github.com/googleapis/google-cloud-go/issues/5430)
vendor/cloud.google.com/go/compute/metadata/metadata.go:                                IdleConnTimeout: 60 * time.Second,
vendor/github.com/aws/aws-sdk-go-v2/aws/transport/http/client.go:                       DefaultHTTPTransportMaxIdleConns        = 100
vendor/github.com/aws/aws-sdk-go-v2/aws/transport/http/client.go:                       DefaultHTTPTransportMaxIdleConnsPerHost = 10
vendor/github.com/aws/aws-sdk-go-v2/aws/transport/http/client.go:                       DefaultHTTPTransportIdleConnTimeout       = 90 * time.Second
vendor/github.com/aws/aws-sdk-go-v2/aws/transport/http/client.go:                       MaxIdleConns:          DefaultHTTPTransportMaxIdleConns,
vendor/github.com/aws/aws-sdk-go-v2/aws/transport/http/client.go:                       MaxIdleConnsPerHost:   DefaultHTTPTransportMaxIdleConnsPerHost,
vendor/github.com/aws/aws-sdk-go-v2/aws/transport/http/client.go:                       IdleConnTimeout:       DefaultHTTPTransportIdleConnTimeout,
vendor/github.com/moby/buildkit/util/resolver/resolver.go:                              MaxIdleConns:          30,
vendor/github.com/moby/buildkit/util/resolver/resolver.go:                              IdleConnTimeout:       120 * time.Second,
vendor/github.com/moby/buildkit/util/resolver/resolver.go:                              MaxIdleConnsPerHost:   4,
vendor/go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp/client.go:     MaxIdleConns:          100,
vendor/go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp/client.go:     IdleConnTimeout:       90 * time.Second,
vendor/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp/client.go:       MaxIdleConns:          100,
vendor/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp/client.go:       IdleConnTimeout:       90 * time.Second,
vendor/google.golang.org/api/internal/creds.go:                                         MaxIdleConns:          100,
vendor/google.golang.org/api/internal/creds.go:                                         MaxIdleConnsPerHost:   100,
vendor/google.golang.org/api/internal/creds.go:                                         IdleConnTimeout:       90 * time.Second,
vendor/google.golang.org/api/transport/http/dial.go:                                    trans.MaxIdleConnsPerHost = 100
vendor/google.golang.org/api/transport/http/dial.go:                                    MaxIdleConns:          100,
vendor/google.golang.org/api/transport/http/dial.go:                                    MaxIdleConnsPerHost:   100,
vendor/google.golang.org/api/transport/http/dial.go:                                    IdleConnTimeout:       90 * time.Second,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm right, I missed that MaxIdleConnsPerHost defaults to 2.

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

client idle connections leak FDs against Docker Desktop

5 participants