networking: investigate EINTR regression after updating github.com/vishvananda/netlink to v1.3.0 #48400

@thaJeztah

Description

github.com/vishvananda/netlink was updated to v1.3.0 in #46982, but the update made CI flaky, with failures such as:

Error initializing network controller: list bridge addresses failed: interrupted system call

At first glance, the cause appeared to be a missing retry on EINTR; see #46982 (comment):

EINTR on netlink sockets is a new one. I suspect it has more to do with the netlink dependency bump you pulled in when rebasing than on the Go toolchain bump. I think the bug is here: https://github.com/vishvananda/netlink/blob/92645823f36c7ed03faf4baa566078d9d5e06fda/nl/nl_linux.go#L821-L824 It retries on EWOULDBLOCK (a.k.a. EAGAIN) but neglects to retry on EINTR.

However, the cause may instead be our use of SetSocketTimeout; see vishvananda/netlink#793 (comment):

IO calls on non-blocking sockets will never return -EINTR. The problem here is that Moby calls SetSocketTimeout, which sets SO_SNDTIMEO and SO_RCVTIMEO. These socket options are only useful for sockets in blocking mode. Setting these probably places the socket back into blocking mode.

// Init initializes a new network namespace
func Init() {
	var err error
	initNs, err = netns.Get()
	if err != nil {
		log.G(context.TODO()).Errorf("could not get initial namespace: %v", err)
	}
	initNl, err = netlink.NewHandle(getSupportedNlFamilies()...)
	if err != nil {
		log.G(context.TODO()).Errorf("could not create netlink handle on initial namespace: %v", err)
	}
	err = initNl.SetSocketTimeout(NetlinkSocketsTimeout)
	if err != nil {
		log.G(context.TODO()).Warnf("Failed to set the timeout on the default netlink handle sockets: %v", err)
	}
}

https://github.com/vishvananda/netlink/blob/92645823f36c7ed03faf4baa566078d9d5e06fda/nl/nl_linux.go#L848-L860

I'm not very familiar with this project, but it seems to me that before this PR, only blocking IO is used. This PR adds calls to set the socket as non-blocking, but still allows setting the timeout socket options.

Those SetSocketTimeout uses were added in moby/libnetwork@f459afb

to address

Also related:
