Daemon to take care of ingress cleanup on cluster leave and graceful shutdown#32283
vdemeester merged 2 commits into moby:master from aboch:clearingress
Conversation
Thanks for figuring out and fixing the panic, @aboch!
vendor.conf
Should we wait for a specific PR to go into libnetwork upstream before merging this PR?
Yes, currently this PR is vendoring from a private libnetwork.
daemon/network.go
How about having SetupIngress return a channel instead of taking one as an argument? It looks like SetupIngress is called rarely, and the overhead of allocating an extra channel when it's not needed would be negligible. The function signature could be:
func (daemon *Daemon) SetupIngress(create clustertypes.NetworkCreateRequest, nodeIP string) (<-chan struct{}, error) {
Thanks, that would be cleaner.
I agree the overhead is negligible.
daemon/network.go
Similarly to my comment on SetupIngress, ReleaseIngress could return a channel that gets closed when the job is completed.
func (daemon *Daemon) ReleaseIngress() (<-chan struct{}, error) {
daemon/daemon.go
This function calls clearAttachableNetworks - is that safe to call as part of the shutdown process? It seems to delete some networks.
Yes, as long as it is being called before netController.Stop().
Currently, on daemon graceful shutdown, we were skipping all the cluster resources cleanup.
I feel having this gracefully clean up what was done during leave is the proper thing to do, although I am not sure yet whether it is better to invoke this before or after the containers shutdown.
@tonistiigi's input here would help.
Regarding the networks being deleted, that is the right thing to do, as the swarm networks are dynamic networks. Otherwise the respective plumbing would not be deleted on graceful shutdown of a daemon participating in a cluster. Nor will it be taken care of on the next daemon boot, as libnetwork is no longer aware those networks existed (they are dynamic). So this is actually adding logic that is missing today.
Just to confirm, would these networks be recreated when the daemon next starts up?
No. The dynamic networks are owned by swarm and created on demand when the first task (or user-run container) is connected to them, and they are removed from the node once the last task is disconnected. This is the current behavior.
So yes, on the next daemon startup they will be recreated if the node is meant to join the cluster and the scheduler dispatches a task on this node on that network, or if a user-run container on that network has a restart policy configured.
daemon/daemon.go
Let's either check the error here, or change ReleaseIngress so it does not return an error (currently it always returns nil).
OK, I will check the error, just in case the function's scope increases in the future.
Ok, this is interesting:
I moved the ingress cleanup after resetting the cluster provider, so that it runs after the network agent stops listening on cluster events. Experimental now passed: https://jenkins.dockerproject.org/job/Docker-PRs-experimental/32391/console I will spin a few more runs to make sure the issue is gone.
@aboch: How's it looking?
New failure. At first glance it does not seem related to this change. Janky test logs: Daemon logs:
The Windows RS1 failure (https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS1/12239/console) does not seem related, as I saw it in other runs from other PRs: https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS1/12252/console.
@aboch: Is this ready to merge?
Yes, the Windows RS1 failure is not related.
daemon/daemon.go
I noticed that setClusterProvider calls daemon.netController.SetClusterProvider, which writes to the c.cfg.Daemon.DisableProvider channel. Since this is asynchronous, do we have to wait for the message sent on that channel to be received? It seems like it's important that the cluster provider be disabled before moving on to ReleaseIngress, because when these were in the opposite order the panic still happened.
Yes, ideally we should wait for the network cluster agent to shut down. In practice it looks like the ingress removal job queuing and processing take longer than that.
The proper solution would be to add an AgentStopWait() function to Controller.
// DaemonLeavesCluster informs the daemon has left the cluster
func (daemon *Daemon) DaemonLeavesCluster() {
I feel like this function name/comment needs modification. A daemon shutting down is different from a daemon leaving a cluster. If we are not careful, upgrading/restarting the daemon may result in errors, e.g., the node not being able to rejoin the cluster.
Please refer to #30157 (comment), where the function naming was decided.
This function takes care of the things that need to be done when the node leaves the cluster.
At the same time, the same set of actions needs to be taken care of when a node is shutting down.
I did not want to duplicate the same code for the shutdown path.
Signed-off-by: Alessandro Boch <[email protected]>
Regarding the powerpc failure: the next test failure message is not really a panic, it's an induced trace dump on a failure to stop the node in cluster.Cleanup. Looks like the integ-ci interprets it as a panic? I will trigger a new run now.
@aaronlehmann Taken care of your comment. Thanks. For reference, I have been logging all the extraneous CI failures that could have been incorrectly linked to the one this PR is fixing.
@aboch: Did that failure show stack traces for any other goroutines? It would be useful to see where it was stuck.
@aaronlehmann I will attach the full logs in a snippet somewhere.
I think the agent is stuck, and I remember fixing a very similar bug recently, so I wonder if there's another case I didn't handle, or whether something about this change causes the fix to regress. cc @dongluochen
The full daemon logs are now linked in my previous post.
Indeed, it looks like there is a deadlock involving libnetwork: a few other goroutines are blocked on what looks like the same lock. I'm not sure which goroutine is holding it.
Oh, this one is holding it: The deadlock is that cc @tonistiigi
I don't think the deadlock discussed in the last few comments is related to the changes in this PR, so... LGTM. But let's make sure to open an issue for the deadlock and try to fix it ASAP.
Yes, I was about to post the same findings you posted above.
@aaronlehmann It looks like the reason why the lock is being held that long is only because the waiting for
...but it times out because of the deadlock. That's the problem.
aaronlehmann
left a comment
Put my LGTM in a comment but forgot to do this...
we can have the daemon naturally take care of the cleanup of the resources it created.
The cleanup will be done on cluster leave and on graceful shutdown.
and libnetwork were racing in disconnecting the ingress network endpoints on a daemon shutdown.
Fixes #32173
- A picture of a cute animal (not mandatory but encouraged)
