Retry AttachNetwork when it fails to find network#27466
Retry AttachNetwork when it fails to find network#27466thaJeztah merged 1 commit intomoby:masterfrom
Conversation
daemon/container_operations.go
Outdated
There was a problem hiding this comment.
@mrjana discussing this; should we have a limit on this loop? And a delay?
There was a problem hiding this comment.
@thaJeztah The retry should only happen when the last container on a network is going away or disconnecting while at some short time window a new container is trying to connect to the same network. So even though in theory this might loop indefinitely in practice it should never retry more than once. Also we should probably not add a delay since the chance of getting into the another such time window probably increases with more delay so the best possible course is to retry immediately rather than waiting.
|
ping @mavenugo |
|
LGTM |
daemon/container_operations.go
Outdated
There was a problem hiding this comment.
@mrjana discussing this; should we have a limit on this loop? And a delay?
When trying to attach to swarm scope network for an unmanaged container sometimes even if attaching to network succeeds, we may not find the network because some other container which was using the network went down and removed the network. So if it is not found, try to detach and reattach to re-download the network from the manager. Fixes moby#26588 Signed-off-by: Jana Radhakrishnan <[email protected]>
|
@thaJeztah Added a retry limit. |
|
SGTM (not a maintainer) |
|
ping @thaJeztah |
| // the process of attaching. | ||
| if config != nil { | ||
| if _, ok := err.(libnetwork.ErrNoSuchNetwork); ok { | ||
| if retryCount >= 5 { |
There was a problem hiding this comment.
Any reason for doing this, instead of for i := 0; i < 5; i++ { (or for retries := 0; retries < 5; retries++ {)?
There was a problem hiding this comment.
Because I need to return a specific error if this failed because of exceeding number of retries.
There was a problem hiding this comment.
Ah, hm, I'd have expected the error to be after the for loop (i.e., if no network was found after 5 attempts, generate, and return an error).
It's just a nit, so let's keep it
When trying to attach to swarm scope network for an unmanaged container
sometimes even if attaching to network succeeds, we may not find the
network because some other container which was using the network went
down and removed the network. So if it is not found, try to detach and
reattach to re-download the network from the manager.
Fixes #26588
Signed-off-by: Jana Radhakrishnan [email protected]