Enabling ILB/ELB on windows using per-node, per-network LB endpoint.#2363
Enabling ILB/ELB on windows using per-node, per-network LB endpoint.#2363nishanttotla merged 1 commit intomoby:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2363 +/- ##
==========================================
+ Coverage 60.16% 60.58% +0.41%
==========================================
Files 128 128
Lines 26191 26305 +114
==========================================
+ Hits 15759 15937 +178
+ Misses 9035 8957 -78
- Partials 1397 1411 +14 |
|
@aluzzardi , @aaronlehmann , @stevvooe Can someone take a look at this PR and give feedback? Thanks! |
| DeallocateTask(t *api.Task) error | ||
|
|
||
| // AllocateLBAttachment Allocates a load balancer endpoint | ||
| AllocateLBAttachment(node *api.Node, nid string) error |
There was a problem hiding this comment.
I think we have a problem if this interface keeps growing. Are we at the right level of abstraction?
There was a problem hiding this comment.
Technically, the interface is not growing. I just renamed AllocateNode to AllocateLBAttachment.
Given your other comment about "attachment" in api/objects.proto, I'm going to leave the name as AllocateNode in the next PR.
// AllocateNode allocates the IP addresses for the networks to which
// the node is attached.
AllocateNode(node *api.Node, nid string) error
Thanks again for the feedback.
There was a problem hiding this comment.
Were these previously doing attachment allocations?
There was a problem hiding this comment.
Just to be clear I'm going to remove AllocateLBAttachment and go back to AllocateNode. So, the interface won't have any additional members. Just the method signature will change.
There was a problem hiding this comment.
@pradipd I'm just wondering if we need a bit of refactoring here before making changes.
There was a problem hiding this comment.
Yes. These were previously doing attachment allocations. Before it was only doing attachment for the ingress network. Now it is doing attachment allocations for all overlay networks.
As for refactoring: IMHO, the interface/abstraction seems correct to me. The NetworkAllocator handles network resources across the cluster. I.e. what network resource are needed for a task, service, node, network.
On the other hand, network allocator is the only allocator in the package. So, perhaps the abstraction looks off since there is only one resource being managed. Perhaps, if there were other resources (maybe a CPUallocator or MemoryAllocator), the abstraction would make more sense. E.g. I can easily see func (a *Allocator) allocateService (network.go:930) being updated in the future to also allocate other resource (CPU, Memory, etc.).
I don't really have a great answer on whether it's worth refactoring or not. The current abstraction make immediate sense to me.
|
|
||
| // LBAttachments carry per-network node attachment required for | ||
| // optimally placed distributed load-balancers. | ||
| map<string, NetworkAttachment> lb_attachments = 10; |
There was a problem hiding this comment.
This needs to operate cooperatively with the attachment field. Do we need to deprecate that field?
There was a problem hiding this comment.
attachment can be deprecated. How do I deprecate it? Can you point me to an example? Thanks.
There was a problem hiding this comment.
There is a GRPC directive that can mark it as deprecated. We'll have to look more into how this field appears in the Docker API to see impact there. (Check the swagger docs)
Also, this should not be a map type. Declare a new message with the decorated attributes.
There was a problem hiding this comment.
Will do. Thanks for the feedback.
There was a problem hiding this comment.
Regarding deprecating attachment: I followed the other examples in api*.proto and marked the field at deprecated using [deprecated=true].
Note that this differs from the recommendation from protobuf documentation https://developers.google.com/protocol-buffers/docs/proto3#updating
but, it is consistent with other deprecated fields in swarmkit.
There was a problem hiding this comment.
Regarding "Also, this should not be a map type. Declare a new message with the decorated attributes."
Thought about this, tried it out, and it looks really odd doing that. I moved the map into it's own message and the map was the only field.
Why do you think the map should be a new message? We use maps in many other messages.
Maybe I misunderstood your comment.
For now, I left it as is, and updated the documentation as suggested by aaronlehmann.
There was a problem hiding this comment.
@pradipd Maps are almost never the right data structure, unless there is a uniqueness constraint (and even then, I would recommend against their usage). It also makes it unclear what the value of the key is.
Using a proper structure makes it clear what the values actually are. However, I am pretty confused about why you have it this way, since the network id is already a part of the attachment. This looks completely redundant.
There was a problem hiding this comment.
The redundancy is intentional. Maps are useful for scalability. I opted for a map (as opposed to an array) to make the lookup's more efficient. E.g. in AllocateLBAttachment().
If we use an array then we will iterate over all the networks for every node we allocate.
My assumptions is that there will be multiple networks. But, perhaps we will always have a very small number of networks (like 1 or 2).
I can change it to an array if you feel strongly about it.
There was a problem hiding this comment.
@pradipd So, index the array in the implementation. Don't choose the datastructure based on the current implementation but on what it represents.
There was a problem hiding this comment.
Addressed in the latest changes.
|
ping @ijc |
| NodeRole role = 9; | ||
|
|
||
| // LBAttachments carry per-network node attachment required for | ||
| // optimally placed distributed load-balancers. |
There was a problem hiding this comment.
The comment should explain what keys are used in this map.
| // Allocator state to indicate if allocation has been | ||
| // successfully completed for this node. | ||
| nodes map[string]struct{} | ||
| nodes map[string]map[string]struct{} |
There was a problem hiding this comment.
It would be helpful for the comment to explain what the keyspace of the inner map is.
|
@stevvooe @aaronlehmann |
|
|
||
| for _, na := range node.LbAttachments { | ||
| if _, ok := storeNodeLBMap[na.Network.ID]; !ok { | ||
| //PR-Question: Do we need to NetworkAttachment.Copy() the attachments? |
There was a problem hiding this comment.
@nishanttotla : Do we need to copy each network attachment? Or can I just do:
storeNode.LBAttachemtns = node.LbAttachments.
It is not clear why I need to make a copy since a copy will be done when the node is updated in the store.
There was a problem hiding this comment.
It's not necessary to copy them here because UpdateNode will automatically make a deep copy of the node.
| lbAttachment.Network = network.Copy() | ||
| if err := a.netCtx.nwkAllocator.AllocateLBAttachment(node, lbAttachment); err != nil { | ||
| log.G(ctx).WithError(err).Errorf("Failed to allocate network resources for node %s", node.ID) | ||
| //TODO: Should we add a unallocatedNode and retry allocating resources like we do for network, tasks, services? |
There was a problem hiding this comment.
@nishanttotla : Seems like we should have some retry logic here. As far as I can tell, the original code does not have any retry logic if we fail to allocate network resources for the ingress network.
There was a problem hiding this comment.
@pradipd if retry logic does not exist, perhaps this PR doesn't need to add it. If I understand correctly, not having retry logic here will mean that if the LB endpoint attachment fails, then the endpoint does not work. Is there a way to allocate later in that case? What is the fix in case of failure?
There was a problem hiding this comment.
In the existing code, if we fail to allocate the endpoint for the ingress network, the node still becomes active, and we can schedule tasks on it. But, the node, won't be part of an load balancers (ILB or ELB(routing mesh)). This would be "bad" (I know that's arbitrary) for a customer, because they may not realize some node is not participating in the routing mesh.
My change does nothing to add this, other than leave a TODO.
If we were to address this, we should add retry logic similar to unallocatedTasks, unallocatedServices, and unallocatedNetworks. See L317 when we retry unallocated items after a timeout or when something was deleted.
Also, I feel addressing this should be a separate issue and have a separate PR.
| } | ||
| if err := na.allocateNetworkIPs(nAttach); err != nil { | ||
| if err := na.releaseEndpoints(t.Networks[:i]); err != nil { | ||
| log.G(context.TODO()).WithError(err).Errorf("Failed to release IP addresses while rolling back allocation for task %s network %s", t.ID, nAttach.Network.ID) |
There was a problem hiding this comment.
nit: error messages start with lower case across the repo.
There was a problem hiding this comment.
this verbatim from old code. some stuff got moved and made the diff wonky.
nishanttotla
left a comment
There was a problem hiding this comment.
I would like for this PR to have some unit tests. @pradipd do you think it's possible to add some quickly? Let me know if you need help with that.
Also cc @dperny @anshulpundir for additional reviews.
|
I have one. I'll add it to the PR. And address the other issues/questions. |
|
It would also be helpful to have a small doc written explaining what this change does and means. I'm mostly following from the PR history but I'm unfamiliar with exactly what this is and why it's needed. |
|
Ok this has been explained to me out of band. It LGTM, with the caveat that I don't know this portion of the code very well. |
nishanttotla
left a comment
There was a problem hiding this comment.
LGTM (I've read the design and the SwarmKit changes seem reasonable, even though I haven't reviewed other related PRs).
@pradipd couple of things:
- nits: fix comments in some places (space needed after
// - Squash your commits.
We can merge after that.
bc6f51c to
96a9860
Compare
|
Please sign your commits following these rules: $ git clone -b "windows_routingmesh" [email protected]:pradipd/swarmkit.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -fAmending updates the existing PR. You DO NOT need to open a new one. |
Signed-off-by: Pradip Dhara <[email protected]>
96a9860 to
9fa9ce1
Compare
|
@nishanttotla Make sure that api/api.pb.txt is updated before merging. |
|
#2385 submitted with fixes. Please ensure those flow through. |
This is 1 of 3 PRs (the other 2 are in libnetwork and moby repos).
This change enables ILB/ELB functionality on windows using a per-node, per-network LB endpoint. The per-node, per-network LB endpoint was discussed with docker
as a solution to moby/moby#30820. Note that only windows is using the LB endpoint. We have not changed the linux networking code to take advantage of the LB endpoint.
Signed-off-by: Pradip Dhara [email protected]