Skip to content

Add timeout for send and receive#170

Merged
vishvananda merged 1 commit intovishvananda:masterfrom
aboch:to
Nov 15, 2016
Merged

Add timeout for send and receive#170
vishvananda merged 1 commit intovishvananda:masterfrom
aboch:to

Conversation

@aboch
Copy link
Collaborator

@aboch aboch commented Nov 8, 2016

Because of kernel bugs, we have seen cases where the call to netlink socket send or receive will wait indefinitely causing docker engine to hang.

This PR allows client to set a timeout for socket send and receive for a netlink handle via Handle.SetTimeout(), in order to gracefully workaround the above situation.

A timeout logic was also requested in #131

Signed-off-by: Alessandro Boch [email protected]

@aboch aboch changed the title Add timeout to for send and receive Add timeout for send and receive Nov 8, 2016
@cpuguy83
Copy link

+1

@cpuguy83
Copy link

Seems a little funky that this is implemented using just go with a timeout waiting for the goroutine to return rather on the socket itself.

nl/nl_linux.go Outdated
if err != nil {
return nil, err
}
case <-time.After(req.Timeout):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will leak a lot of timers. You should use NewTimer and stop them.

handle_linux.go Outdated
// SetTimeout sets the amount of time the library will wait for the socket send
// and for the socket receive on this handle to terminate before declaring a
// a timeout failure to the client.
func (h *Handle) SetTimeout(to time.Duration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd better go with NewHandleWithTimeout to avoid mutexes. But it's up to @vishvananda for decide of course.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid a precedence for adding a new constructor each time we add a new option.
This is more flexible and client will likely set the timeout once at handle creation. After that it is only going to be read locks.

nl/nl_linux.go Outdated
select {
case err := <-rch:
return res, err
case <-time.After(req.Timeout):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think send and read should be in the same goroutine for having one timeout for whole Execute and simplify code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen reports where the hung happen in both the send and receive phase, I wanted the client to know during which phase the timeout happened.

@LK4D4
Copy link
Contributor

LK4D4 commented Nov 10, 2016

@cpuguy83 yeah, there is syscall.SO_RCVTIMEO.

@aboch
Copy link
Collaborator Author

aboch commented Nov 10, 2016

Thanks @cpuguy83 @LK4D4
Agree it would be ideal to let the socket handle the timeout.
I will take a look at it.

@aboch
Copy link
Collaborator Author

aboch commented Nov 11, 2016

@cpuguy83 @LK4D4 Addressed the comments. Thanks. PTAL.

@vishvananda vishvananda merged commit 17ea11b into vishvananda:master Nov 15, 2016
@cpuguy83
Copy link

🎉 thanks @vishvananda!

@aboch aboch mentioned this pull request Aug 8, 2017
@Sn0rt
Copy link

Sn0rt commented Oct 23, 2017

it may a kernel bug.

you can try insert a probe into nettling_recvmsg and nettling_recvmsg.return by systemtap , then to count the number of call and return .

the count will be stable temporary , you will the its different between the number of call and return .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants