Skip to content

Move load balancer sandbox creation/deletion into libnetwork#35422

Merged
mavenugo merged 3 commits intomoby:masterfrom
pradipd:lbfix
Dec 1, 2017
Merged

Move load balancer sandbox creation/deletion into libnetwork#35422
mavenugo merged 3 commits intomoby:masterfrom
pradipd:lbfix

Conversation

@pradipd
Copy link
Copy Markdown
Contributor

@pradipd pradipd commented Nov 7, 2017

- What I did
Fix for #35310

- How I did it
2 changes:

  1. Move sandbox creation/deletion into libnetwork. (Move load balancer sandbox creation/deletion into libnetwork. libnetwork#2011)
  2. Limit the number of scenarios in which we create the per-network lb sandbox. We only create the per network lb sandbox for:
    a. ingress network (both linux/windows)
    b. any overlay network on windows.

- How to verify it
Reran repro (described at the end of #35310)
UT/IT/PY

- Description for the changelog
Move load balancer sandbox creation/deletion into libnetwork

- A picture of a cute animal (not mandatory but encouraged)
kittens.

@pradipd pradipd changed the title Lbfix Move load balancer sandbox creation/deletion into libnetwork Nov 7, 2017
@thaJeztah
Copy link
Copy Markdown
Member

ping @mavenugo PTAL

Comment thread daemon/network.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls explain why we need this ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design wise, I treated ingress like other overlay networks with a load balancer sandbox.
It requires the same steps to setup and remove the network (i.e. we must create the ingress-sbox, the ingress-endpoint, etc).
However, the ingress sandbox's lifetime is slightly different than the other networks.
The ingress sandbox exists even when there are no tasks connected to it.
But, other networks create the load balancer sandbox on the fly when the first task is scheduled, and remove it when the last task goes away.

Currently, when a task is removed, we delete all the networks it is connect to.
For ingress, this always returns "ActiveEndpoints" error.
The ingress sandbox and network are actually deleted in releaseIngress().

Now, by moving the deletion of the lb sandbox into libnetwork, libnetwork needs to figure out when to delete the lb sandbox (https://github.com/docker/libnetwork/pull/2011/files#diff-cba6b1b3488fe48ebbef7e1821a3745dR962). Libnetwork, deletes the network when the only endpoint left is the lb endpoint.
But, the ingress sandbox's exists even when there are no tasks left. So, we don't delete the ingress network here and only in releaseIngress().

One alternative I considered was to make the ingress's sandbox lifetime behave exactly like other overlay sandbox's. I.e. only create the ingress sandbox when a task is scheduled and remove it when the last task is gone. That would remove these special cases for ingress. However, I felt like that would be a bigger design change with more ramifications.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion to change the ingress sandbox lifetime wont work, because the purpose of ingress network was to provide the overlay connectivity from any node for any service with a task attached to any other node.

Also, technically the ingress-sbox is similar to the lb-sandbox that you have introduced. But the management of this sandbox and the ingress-endpoint is handled outside of your existing changes (I may have to spend some more time to understand if that is not the case). PS: I didnt review your previous changes and that might answer the lack of my knowledge in this area of the code.
I will comment more on this once I review the ingress-sbox management more thoroughly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradipd okay. the changes makes sense.

Comment thread daemon/network.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is Ingress network special here ?

Copy link
Copy Markdown
Contributor Author

@pradipd pradipd Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, I treated ingress like other overlay networks with a load balancer sandbox since setting up and deleting the load balancer sandbox is the same as other networks.

So, on linux, if we are creating the ingress network, we go ahead and create the load balancer sandbox. For all other overlay networks on linux, we don't do this...yet. Until linux takes advantage of the load balancer sandboxes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. I have to make sure this changes doesnt adversely impact Linux ingress network since the ingress-sbox is already managed in different code path. So, I would like to see if we can narrow down the scope of this change only to Windows without any impacts to Linux. We can remove the windows check once Linux makes full use of the LB-Sandbox concept.

But again, I have to spend a bit more time to understand the current changes made to ingress-sbox management in your previous ILB/ELB changes in to give a strong recommendation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can limit the change to windows even more if you prefer. It would just mean duplicate code for setting up the sandbox for ingress and other overlay networks on windows. But, it would be less risky for linux.
Here is the previous code to setup the ingress-sbox (https://github.com/moby/moby/blob/9be245f438f9fb2eaeb7891673b16aed9262a192/daemon/network.go)

And here is the current code in libnetwork PR (https://github.com/docker/libnetwork/pull/2011/files#diff-cba6b1b3488fe48ebbef7e1821a3745dR2057)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. I think this change is fine as is.

@coolljt0725
Copy link
Copy Markdown
Contributor

ping @pradipd @mavenugo

@mavenugo
Copy link
Copy Markdown
Contributor

@pradipd apologies on the delay. I didnt find the time to evaluate the impact on the ingress changes (both the current one and the one introduced prior to this change). We can get this and the libnetwork PR moving once I get to complete that analysis... In the meantime if someone else can do the same, that might accelerate this review.

@fcrisciani
Copy link
Copy Markdown
Contributor

@pradipd can you revendor the latest libnetwork?

Comment thread daemon/network.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider also the case where the error is associated with the ingress network else this message can be misleading

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually considering that the ingress is actually an instance of a load balancer sandbox, the id should give enough information on what to look for

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Nov 29, 2017

Will try and get it today.

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Nov 30, 2017

@thaJeztah @mavenugo @fcrisciani Can someone restart/retry the "z" and "experimental" builds?

@mavenugo
Copy link
Copy Markdown
Contributor

@pradipd i tried. it is still failing. we have to look into this.

Copy link
Copy Markdown
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Dec 1, 2017

@thaJeztah @mavenugo @fcrisciani Can someone restart/retry the "experimental" build?

Copy link
Copy Markdown
Contributor

@fcrisciani fcrisciani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mavenugo mavenugo merged commit 4bb2c24 into moby:master Dec 1, 2017
@pradipd pradipd deleted the lbfix branch December 1, 2017 22:32
@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Dec 1, 2017

Thanks!

@swaroopkundeti
Copy link
Copy Markdown

swaroopkundeti commented Jan 8, 2018

This is still persisting on 17.12 version

I get

Unable to complete atomic operation, key modified
--

ctelfer added a commit to ctelfer/moby that referenced this pull request Mar 12, 2018
The commit moby#35422 had the result of
accidentally causing the removal of the ingress network when the
last member of a service left the network.  This did not appear
in swarm instances because the swarm manager would still maintain
and return cluster state about the network even though it had
removed its sandbox and endpoint.  This test verifies that after a
service gets added and removed that the ingress sandbox remains
in a functional state.

Signed-off-by: Chris Telfer <[email protected]>
thaJeztah pushed a commit to thaJeztah/docker-ce that referenced this pull request Mar 14, 2018
The commit moby/moby#35422 had the result of
accidentally causing the removal of the ingress network when the
last member of a service left the network.  This did not appear
in swarm instances because the swarm manager would still maintain
and return cluster state about the network even though it had
removed its sandbox and endpoint.  This test verifies that after a
service gets added and removed that the ingress sandbox remains
in a functional state.

Signed-off-by: Chris Telfer <[email protected]>
(cherry picked from commit 805b6a7)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Mar 14, 2018
The commit moby/moby#35422 had the result of
accidentally causing the removal of the ingress network when the
last member of a service left the network.  This did not appear
in swarm instances because the swarm manager would still maintain
and return cluster state about the network even though it had
removed its sandbox and endpoint.  This test verifies that after a
service gets added and removed that the ingress sandbox remains
in a functional state.

Signed-off-by: Chris Telfer <[email protected]>
Upstream-commit: 805b6a7
Component: engine
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Jan 31, 2020
The commit moby/moby#35422 had the result of
accidentally causing the removal of the ingress network when the
last member of a service left the network.  This did not appear
in swarm instances because the swarm manager would still maintain
and return cluster state about the network even though it had
removed its sandbox and endpoint.  This test verifies that after a
service gets added and removed that the ingress sandbox remains
in a functional state.

Signed-off-by: Chris Telfer <[email protected]>
(cherry picked from commit 805b6a7)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Feb 3, 2020
The commit moby/moby#35422 had the result of
accidentally causing the removal of the ingress network when the
last member of a service left the network.  This did not appear
in swarm instances because the swarm manager would still maintain
and return cluster state about the network even though it had
removed its sandbox and endpoint.  This test verifies that after a
service gets added and removed that the ingress sandbox remains
in a functional state.

Signed-off-by: Chris Telfer <[email protected]>
(cherry picked from commit 805b6a7)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
glours pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 13, 2020
The commit moby/moby#35422 had the result of
accidentally causing the removal of the ingress network when the
last member of a service left the network.  This did not appear
in swarm instances because the swarm manager would still maintain
and return cluster state about the network even though it had
removed its sandbox and endpoint.  This test verifies that after a
service gets added and removed that the ingress sandbox remains
in a functional state.

Signed-off-by: Chris Telfer <[email protected]>
(cherry picked from commit 805b6a7)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
glours pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 23, 2020
The commit moby/moby#35422 had the result of
accidentally causing the removal of the ingress network when the
last member of a service left the network.  This did not appear
in swarm instances because the swarm manager would still maintain
and return cluster state about the network even though it had
removed its sandbox and endpoint.  This test verifies that after a
service gets added and removed that the ingress sandbox remains
in a functional state.

Signed-off-by: Chris Telfer <[email protected]>
(cherry picked from commit 805b6a7)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants