Skip to content

Enabling ILB/ELB on windows using per-node, per-network LB endpoint.#2363

Merged
nishanttotla merged 1 commit intomoby:masterfrom
pradipd:windows_routingmesh
Sep 14, 2017
Merged

Enabling ILB/ELB on windows using per-node, per-network LB endpoint.#2363
nishanttotla merged 1 commit intomoby:masterfrom
pradipd:windows_routingmesh

Conversation

@pradipd
Copy link
Copy Markdown
Contributor

@pradipd pradipd commented Aug 29, 2017

This is 1 of 3 PRs (the other 2 are in libnetwork and moby repos).

This change enables ILB/ELB functionality on windows using a per-node, per-network LB endpoint. The per-node, per-network LB endpoint was discussed with docker
as a solution to moby/moby#30820. Note that only windows is using the LB endpoint. We have not changed the linux networking code to take advantage of the LB endpoint.

Signed-off-by: Pradip Dhara [email protected]

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Aug 29, 2017

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Aug 29, 2017

ping @mavenugo @fcrisciani @msabansal

@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 31, 2017

Codecov Report

Merging #2363 into master will increase coverage by 0.41%.
The diff coverage is 56.06%.

@@            Coverage Diff             @@
##           master    #2363      +/-   ##
==========================================
+ Coverage   60.16%   60.58%   +0.41%     
==========================================
  Files         128      128              
  Lines       26191    26305     +114     
==========================================
+ Hits        15759    15937     +178     
+ Misses       9035     8957      -78     
- Partials     1397     1411      +14

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Sep 1, 2017

@aluzzardi , @aaronlehmann , @stevvooe Can someone take a look at this PR and give feedback? Thanks!

DeallocateTask(t *api.Task) error

// AllocateLBAttachment Allocates a load balancer endpoint
AllocateLBAttachment(node *api.Node, nid string) error
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a problem if this interface keeps growing. Are we at the right level of abstraction?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the interface is not growing. I just renamed AllocateNode to AllocateLBAttachment.
Given your other comment about "attachment" in api/objects.proto, I'm going to leave the name as AllocateNode in the next PR.
// AllocateNode allocates the IP addresses for the networks to which
// the node is attached.
AllocateNode(node *api.Node, nid string) error

Thanks again for the feedback.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these previously doing attachment allocations?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear I'm going to remove AllocateLBAttachment and go back to AllocateNode. So, the interface won't have any additional members. Just the method signature will change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradipd I'm just wondering if we need a bit of refactoring here before making changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. These were previously doing attachment allocations. Before it was only doing attachment for the ingress network. Now it is doing attachment allocations for all overlay networks.

As for refactoring: IMHO, the interface/abstraction seems correct to me. The NetworkAllocator handles network resources across the cluster. I.e. what network resource are needed for a task, service, node, network.

On the other hand, network allocator is the only allocator in the package. So, perhaps the abstraction looks off since there is only one resource being managed. Perhaps, if there were other resources (maybe a CPUallocator or MemoryAllocator), the abstraction would make more sense. E.g. I can easily see func (a *Allocator) allocateService (network.go:930) being updated in the future to also allocate other resource (CPU, Memory, etc.).

I don't really have a great answer on whether it's worth refactoring or not. The current abstraction make immediate sense to me.

Comment thread api/objects.proto Outdated

// LBAttachments carry per-network node attachment required for
// optimally placed distributed load-balancers.
map<string, NetworkAttachment> lb_attachments = 10;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to operate cooperatively with the attachment field. Do we need to deprecate that field?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attachment can be deprecated. How do I deprecate it? Can you point me to an example? Thanks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a GRPC directive that can mark it as deprecated. We'll have to look more into how this field appears in the Docker API to see impact there. (Check the swagger docs)

Also, this should not be a map type. Declare a new message with the decorated attributes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Thanks for the feedback.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding deprecating attachment: I followed the other examples in api*.proto and marked the field at deprecated using [deprecated=true].
Note that this differs from the recommendation from protobuf documentation https://developers.google.com/protocol-buffers/docs/proto3#updating
but, it is consistent with other deprecated fields in swarmkit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding "Also, this should not be a map type. Declare a new message with the decorated attributes."
Thought about this, tried it out, and it looks really odd doing that. I moved the map into it's own message and the map was the only field.
Why do you think the map should be a new message? We use maps in many other messages.
Maybe I misunderstood your comment.
For now, I left it as is, and updated the documentation as suggested by aaronlehmann.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradipd Maps are almost never the right data structure, unless there is a uniqueness constraint (and even then, I would recommend against their usage). It also makes it unclear what the value of the key is.

Using a proper structure makes it clear what the values actually are. However, I am pretty confused about why you have it this way, since the network id is already a part of the attachment. This looks completely redundant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redundancy is intentional. Maps are useful for scalability. I opted for a map (as opposed to an array) to make the lookup's more efficient. E.g. in AllocateLBAttachment().
If we use an array then we will iterate over all the networks for every node we allocate.
My assumptions is that there will be multiple networks. But, perhaps we will always have a very small number of networks (like 1 or 2).
I can change it to an array if you feel strongly about it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradipd So, index the array in the implementation. Don't choose the datastructure based on the current implementation but on what it represents.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest changes.

@aaronlehmann
Copy link
Copy Markdown
Collaborator

ping @ijc

Comment thread api/objects.proto Outdated
NodeRole role = 9;

// LBAttachments carry per-network node attachment required for
// optimally placed distributed load-balancers.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should explain what keys are used in this map.

// Allocator state to indicate if allocation has been
// successfully completed for this node.
nodes map[string]struct{}
nodes map[string]map[string]struct{}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful for the comment to explain what the keyspace of the inner map is.

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Sep 8, 2017

@stevvooe @aaronlehmann
I think I have addressed all of your feedback. Any other feedback?

Comment thread manager/allocator/network.go Outdated

for _, na := range node.LbAttachments {
if _, ok := storeNodeLBMap[na.Network.ID]; !ok {
//PR-Question: Do we need to NetworkAttachment.Copy() the attachments?
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nishanttotla : Do we need to copy each network attachment? Or can I just do:
storeNode.LBAttachemtns = node.LbAttachments.
It is not clear why I need to make a copy since a copy will be done when the node is updated in the store.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary to copy them here because UpdateNode will automatically make a deep copy of the node.

Comment thread manager/allocator/network.go Outdated
lbAttachment.Network = network.Copy()
if err := a.netCtx.nwkAllocator.AllocateLBAttachment(node, lbAttachment); err != nil {
log.G(ctx).WithError(err).Errorf("Failed to allocate network resources for node %s", node.ID)
//TODO: Should we add a unallocatedNode and retry allocating resources like we do for network, tasks, services?
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nishanttotla : Seems like we should have some retry logic here. As far as I can tell, the original code does not have any retry logic if we fail to allocate network resources for the ingress network.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradipd if retry logic does not exist, perhaps this PR doesn't need to add it. If I understand correctly, not having retry logic here will mean that if the LB endpoint attachment fails, then the endpoint does not work. Is there a way to allocate later in that case? What is the fix in case of failure?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the existing code, if we fail to allocate the endpoint for the ingress network, the node still becomes active, and we can schedule tasks on it. But, the node, won't be part of an load balancers (ILB or ELB(routing mesh)). This would be "bad" (I know that's arbitrary) for a customer, because they may not realize some node is not participating in the routing mesh.

My change does nothing to add this, other than leave a TODO.
If we were to address this, we should add retry logic similar to unallocatedTasks, unallocatedServices, and unallocatedNetworks. See L317 when we retry unallocated items after a timeout or when something was deleted.

Also, I feel addressing this should be a separate issue and have a separate PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradipd I agree. Let's keep that separate.

cc @mavenugo @vieux for confirmation on this. We should track this as a follow-up because the load balancing may not work properly without it.

}
if err := na.allocateNetworkIPs(nAttach); err != nil {
if err := na.releaseEndpoints(t.Networks[:i]); err != nil {
log.G(context.TODO()).WithError(err).Errorf("Failed to release IP addresses while rolling back allocation for task %s network %s", t.ID, nAttach.Network.ID)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: error messages start with lower case across the repo.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this verbatim from old code. some stuff got moved and made the diff wonky.

Copy link
Copy Markdown
Contributor

@nishanttotla nishanttotla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like for this PR to have some unit tests. @pradipd do you think it's possible to add some quickly? Let me know if you need help with that.

Also cc @dperny @anshulpundir for additional reviews.

@pradipd
Copy link
Copy Markdown
Contributor Author

pradipd commented Sep 14, 2017

I have one. I'll add it to the PR. And address the other issues/questions.

@dperny
Copy link
Copy Markdown
Collaborator

dperny commented Sep 14, 2017

It would also be helpful to have a small doc written explaining what this change does and means. I'm mostly following from the PR history but I'm unfamiliar with exactly what this is and why it's needed.

@dperny
Copy link
Copy Markdown
Collaborator

dperny commented Sep 14, 2017

Ok this has been explained to me out of band. It LGTM, with the caveat that I don't know this portion of the code very well.

Copy link
Copy Markdown
Contributor

@nishanttotla nishanttotla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (I've read the design and the SwarmKit changes seem reasonable, even though I haven't reviewed other related PRs).

@pradipd couple of things:

  1. nits: fix comments in some places (space needed after //
  2. Squash your commits.

We can merge after that.

@pradipd pradipd force-pushed the windows_routingmesh branch from bc6f51c to 96a9860 Compare September 14, 2017 23:01
@GordonTheTurtle
Copy link
Copy Markdown

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "windows_routingmesh" [email protected]:pradipd/swarmkit.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@pradipd pradipd force-pushed the windows_routingmesh branch from 96a9860 to 9fa9ce1 Compare September 14, 2017 23:02
@nishanttotla nishanttotla merged commit bd7bafb into moby:master Sep 14, 2017
@pradipd pradipd deleted the windows_routingmesh branch September 19, 2017 04:59
@stevvooe
Copy link
Copy Markdown
Contributor

@nishanttotla Make sure that api/api.pb.txt is updated before merging.

@stevvooe
Copy link
Copy Markdown
Contributor

#2385 submitted with fixes. Please ensure those flow through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants