Move load balancer sandbox creation/deletion into libnetwork#35422
Move load balancer sandbox creation/deletion into libnetwork#35422mavenugo merged 3 commits intomoby:masterfrom
Conversation
|
ping @mavenugo PTAL |
There was a problem hiding this comment.
Can you pls explain why we need this ?
There was a problem hiding this comment.
Design wise, I treated ingress like other overlay networks with a load balancer sandbox.
It requires the same steps to setup and remove the network (i.e. we must create the ingress-sbox, the ingress-endpoint, etc).
However, the ingress sandbox's lifetime is slightly different than the other networks.
The ingress sandbox exists even when there are no tasks connected to it.
But, other networks create the load balancer sandbox on the fly when the first task is scheduled, and remove it when the last task goes away.
Currently, when a task is removed, we delete all the networks it is connect to.
For ingress, this always returns "ActiveEndpoints" error.
The ingress sandbox and network are actually deleted in releaseIngress().
Now, by moving the deletion of the lb sandbox into libnetwork, libnetwork needs to figure out when to delete the lb sandbox (https://github.com/docker/libnetwork/pull/2011/files#diff-cba6b1b3488fe48ebbef7e1821a3745dR962). Libnetwork, deletes the network when the only endpoint left is the lb endpoint.
But, the ingress sandbox's exists even when there are no tasks left. So, we don't delete the ingress network here and only in releaseIngress().
One alternative I considered was to make the ingress's sandbox lifetime behave exactly like other overlay sandbox's. I.e. only create the ingress sandbox when a task is scheduled and remove it when the last task is gone. That would remove these special cases for ingress. However, I felt like that would be a bigger design change with more ramifications.
There was a problem hiding this comment.
The suggestion to change the ingress sandbox lifetime wont work, because the purpose of ingress network was to provide the overlay connectivity from any node for any service with a task attached to any other node.
Also, technically the ingress-sbox is similar to the lb-sandbox that you have introduced. But the management of this sandbox and the ingress-endpoint is handled outside of your existing changes (I may have to spend some more time to understand if that is not the case). PS: I didnt review your previous changes and that might answer the lack of my knowledge in this area of the code.
I will comment more on this once I review the ingress-sbox management more thoroughly.
There was a problem hiding this comment.
Why is Ingress network special here ?
There was a problem hiding this comment.
As mentioned above, I treated ingress like other overlay networks with a load balancer sandbox since setting up and deleting the load balancer sandbox is the same as other networks.
So, on linux, if we are creating the ingress network, we go ahead and create the load balancer sandbox. For all other overlay networks on linux, we don't do this...yet. Until linux takes advantage of the load balancer sandboxes.
There was a problem hiding this comment.
Same comment as above. I have to make sure this changes doesnt adversely impact Linux ingress network since the ingress-sbox is already managed in different code path. So, I would like to see if we can narrow down the scope of this change only to Windows without any impacts to Linux. We can remove the windows check once Linux makes full use of the LB-Sandbox concept.
But again, I have to spend a bit more time to understand the current changes made to ingress-sbox management in your previous ILB/ELB changes in to give a strong recommendation.
There was a problem hiding this comment.
I can limit the change to windows even more if you prefer. It would just mean duplicate code for setting up the sandbox for ingress and other overlay networks on windows. But, it would be less risky for linux.
Here is the previous code to setup the ingress-sbox (https://github.com/moby/moby/blob/9be245f438f9fb2eaeb7891673b16aed9262a192/daemon/network.go)
And here is the current code in libnetwork PR (https://github.com/docker/libnetwork/pull/2011/files#diff-cba6b1b3488fe48ebbef7e1821a3745dR2057)
There was a problem hiding this comment.
Thanks for the clarification. I think this change is fine as is.
|
@pradipd apologies on the delay. I didnt find the time to evaluate the impact on the ingress changes (both the current one and the one introduced prior to this change). We can get this and the libnetwork PR moving once I get to complete that analysis... In the meantime if someone else can do the same, that might accelerate this review. |
|
@pradipd can you revendor the latest libnetwork? |
There was a problem hiding this comment.
we should consider also the case where the error is associated with the ingress network else this message can be misleading
There was a problem hiding this comment.
actually considering that the ingress is actually an instance of a load balancer sandbox, the id should give enough information on what to look for
|
Will try and get it today. |
Signed-off-by: Pradip Dhara <[email protected]>
Signed-off-by: Pradip Dhara <[email protected]>
Signed-off-by: Pradip Dhara <[email protected]>
|
@thaJeztah @mavenugo @fcrisciani Can someone restart/retry the "z" and "experimental" builds? |
|
@pradipd i tried. it is still failing. we have to look into this. |
|
@thaJeztah @mavenugo @fcrisciani Can someone restart/retry the "experimental" build? |
|
Thanks! |
|
This is still persisting on 17.12 version I get |
The commit moby#35422 had the result of accidentally causing the removal of the ingress network when the last member of a service left the network. This did not appear in swarm instances because the swarm manager would still maintain and return cluster state about the network even though it had removed its sandbox and endpoint. This test verifies that after a service gets added and removed that the ingress sandbox remains in a functional state. Signed-off-by: Chris Telfer <[email protected]>
The commit moby/moby#35422 had the result of accidentally causing the removal of the ingress network when the last member of a service left the network. This did not appear in swarm instances because the swarm manager would still maintain and return cluster state about the network even though it had removed its sandbox and endpoint. This test verifies that after a service gets added and removed that the ingress sandbox remains in a functional state. Signed-off-by: Chris Telfer <[email protected]> (cherry picked from commit 805b6a7) Signed-off-by: Sebastiaan van Stijn <[email protected]>
The commit moby/moby#35422 had the result of accidentally causing the removal of the ingress network when the last member of a service left the network. This did not appear in swarm instances because the swarm manager would still maintain and return cluster state about the network even though it had removed its sandbox and endpoint. This test verifies that after a service gets added and removed that the ingress sandbox remains in a functional state. Signed-off-by: Chris Telfer <[email protected]> Upstream-commit: 805b6a7 Component: engine
The commit moby/moby#35422 had the result of accidentally causing the removal of the ingress network when the last member of a service left the network. This did not appear in swarm instances because the swarm manager would still maintain and return cluster state about the network even though it had removed its sandbox and endpoint. This test verifies that after a service gets added and removed that the ingress sandbox remains in a functional state. Signed-off-by: Chris Telfer <[email protected]> (cherry picked from commit 805b6a7) Signed-off-by: Sebastiaan van Stijn <[email protected]>
The commit moby/moby#35422 had the result of accidentally causing the removal of the ingress network when the last member of a service left the network. This did not appear in swarm instances because the swarm manager would still maintain and return cluster state about the network even though it had removed its sandbox and endpoint. This test verifies that after a service gets added and removed that the ingress sandbox remains in a functional state. Signed-off-by: Chris Telfer <[email protected]> (cherry picked from commit 805b6a7) Signed-off-by: Sebastiaan van Stijn <[email protected]>
The commit moby/moby#35422 had the result of accidentally causing the removal of the ingress network when the last member of a service left the network. This did not appear in swarm instances because the swarm manager would still maintain and return cluster state about the network even though it had removed its sandbox and endpoint. This test verifies that after a service gets added and removed that the ingress sandbox remains in a functional state. Signed-off-by: Chris Telfer <[email protected]> (cherry picked from commit 805b6a7) Signed-off-by: Sebastiaan van Stijn <[email protected]>
The commit moby/moby#35422 had the result of accidentally causing the removal of the ingress network when the last member of a service left the network. This did not appear in swarm instances because the swarm manager would still maintain and return cluster state about the network even though it had removed its sandbox and endpoint. This test verifies that after a service gets added and removed that the ingress sandbox remains in a functional state. Signed-off-by: Chris Telfer <[email protected]> (cherry picked from commit 805b6a7) Signed-off-by: Sebastiaan van Stijn <[email protected]>
- What I did
Fix for #35310
- How I did it
2 changes:
a. ingress network (both linux/windows)
b. any overlay network on windows.
- How to verify it
Reran repro (described at the end of #35310)
UT/IT/PY
- Description for the changelog
Move load balancer sandbox creation/deletion into libnetwork
- A picture of a cute animal (not mandatory but encouraged)
kittens.