Retry on EINTR from netlink dump calls#48407
Conversation
66b72ce to
0f90653
Compare
8e96013 to
7afc608
Compare
7afc608 to
9e74044
Compare
internal/nlutil/nlutil_linux.go
Outdated
| "golang.org/x/sys/unix" | ||
| ) | ||
|
|
||
| const maxAttempts = 5 |
There was a problem hiding this comment.
Can you add a comment "why 5"?
There was a problem hiding this comment.
Yes - done ... it's an arbitrary limit, just a small number that's more than one!
9e74044 to
e0aec27
Compare
internal/nlutil/nlutil_linux.go
Outdated
| if err := f(); !errors.Is(err, unix.EINTR) { | ||
| return | ||
| } |
There was a problem hiding this comment.
Silly question; any other error is intentionally swallowed, correct?
There was a problem hiding this comment.
Swallowed here in the retry loop, yes ... but also assigned in the caller's closure, and returned from this package from there.
f83c3d5 to
6b96a9f
Compare
6b96a9f to
c2f4710
Compare
A recent change to the vishvananda/netlink package exposes NLM_F_DUMP_INTR in some netlink responses as an EINTR (with no data). Retry the requests when that happens, up to five times, before returning the error. The limit of five is arbitrary, on most systems a single retry will be rare but, there's no guarantee that a retry will succeed. So, on a very busy or misbehaving system the error may still be returned. In most cases, this will lead to failure of the operation being attempted (which may lead to daemon startup failure, network initialisation failure etc). Signed-off-by: Rob Murray <[email protected]>
Spot netlink functions that may return EINTR because network configuration changed during a state dump, and point at the nlutil wrappers. Signed-off-by: Rob Murray <[email protected]>
c2f4710 to
edaa0eb
Compare
| func NewHandle(nlFamilies ...int) (Handle, error) { | ||
| nlh, err := netlink.NewHandle(nlFamilies...) | ||
| if err != nil { | ||
| return Handle{}, err |
There was a problem hiding this comment.
Curious; should we return a pointer here (and nil on error), similar to how the thing we wrap returns a pointer?
moby/vendor/github.com/vishvananda/netlink/handle_linux.go
Lines 50 to 52 in b6398f1
There was a problem hiding this comment.
That's what I did initially, but @corhere pointed out ...
My only complaint is the extra pointer indirection. Since
nlwrap.Handleis effectively a reference type, a*nlwrap.Handleis laid out in memory identically to a**netlink.Handle. Passing aroundnlwrap.Handlevalues by-value would be more efficient
It's a reference type in the same vein as slices -- a value which carries a reference
There was a problem hiding this comment.
Ah, right, hm, yeah, I was trying to see if we could make it a full drop-in replacement
EINTRregression after updating github.com/vishvananda/netlink to v1.3.0 #48400- What I did
A recent change to the vishvananda/netlink package (vishvananda/netlink#925 in v1.2.1) exposes
NLM_F_DUMP_INTRin some netlink responses as anEINTRerror return, with no data.This has been causing failures in CI testing.
- How I did it
Retry the requests on
EINTR, up to five times, before returning the error.The limit of five is arbitrary, on most systems a single retry will be rare but, there's no guarantee that it will succeed. So, on a very busy or misbehaving system the error may still be returned. In most cases, this will lead to failure of the operation being attempted (which may lead to daemon startup failure, network initialisation failure etc).
Also added a
forbidigolinting rule to spot use of un-wrapped netlink functions.If vishvananda/netlink#1018 is accepted and merged, we'll have the option of using the inconsistent/incomplete results - either in cases where the retry is unnecessary because some version of the interesting data was returned anyway, or because retries have all failed and we want to continue with the same data (and possibly bugs) we'd have seen before the netlink package changes.
- How to verify it
CI tests pass more reliably, with no errors about interrupted calls.
- Description for the changelog