Skip to content

Comments

api: add GwPriority field to EndpointSettings#48936

Merged
thaJeztah merged 2 commits intomoby:masterfrom
akerouanton:EndpointSettings-Priority-v2
Nov 29, 2024
Merged

api: add GwPriority field to EndpointSettings#48936
thaJeztah merged 2 commits intomoby:masterfrom
akerouanton:EndpointSettings-Priority-v2

Conversation

@akerouanton
Copy link
Member

@akerouanton akerouanton commented Nov 23, 2024

Fixes:

Supersedes:

- What I did

1st commit is just a minor refactoring.

2nd commit adds a new GwPriority field to EndpointSettings. It can be specified both during initial ContainerCreate or during subsequent NetworkConnect. This field is passed as is to libnetwork. Libnet already supports this property -- it'll use it to determine which endpoint should provide the default gateway for a container. If two endpoints have the same priority, they're sorted in lexicographic order, and the one sorted first is picked.

- How I did it

- How to verify it

- Description for the changelog

- Add a way to specify which network should provide the default gateway for a container.

@akerouanton akerouanton added area/api API status/2-code-review kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny area/networking Networking impact/api impact/changelog labels Nov 23, 2024
@akerouanton akerouanton added this to the 28.0.0 milestone Nov 23, 2024
@akerouanton akerouanton requested a review from robmry November 23, 2024 10:47
@akerouanton akerouanton self-assigned this Nov 23, 2024
@akerouanton akerouanton force-pushed the EndpointSettings-Priority-v2 branch 5 times, most recently from 635e18c to 6160568 Compare November 25, 2024 13:17
Comment on lines 53 to 58
* `POST /networks/{id}/connect` and `POST /containers/create` now accept a
`Priority` field in `EndpointsConfig`. This value is used to determine which
network endpoint provides the default gateway for the container. The endpoint
with the highest priority is selected. If multiple endpoints have the same
priority, endpoints are sorted lexicographically by their network name, and
the one that sorts first is picked.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question (ISTR @robmry also mentioned something along those lines); if the intent of this option is to determine the default gateway, would it make sense to instead consider adding a new option to define which network should be used for this?

Basically considering that the order itself was a workaround for not being able to configure the default; it seems like now we're improving the workaround, but perhaps we could implement the actual thing that it's used for somehow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also curious what the behavior is when connecting to a new network with a higher priority; will it update the existing config to reconfigure the default gateway, or does this only happen when the container is initially created?

Copy link
Member Author

@akerouanton akerouanton Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the intent of this option is to determine the default gateway

That's what it is.

would it make sense to instead consider adding a new option to define which network should be used for this?

Such a new option would have a major drawback: since you can't update a container, your container would have no gateway and loose access to the outside world when you'd disconnect it from the network providing the default gateway.

Another drawback: you couldn't use docker network connect to attach a network that should become the default gateway as the parameter you're ideating would be attached to the container and would need to be valid (ie. reference an existing endpoint) when the container is started.

Also curious what the behavior is when connecting to a new network with a higher priority; will it update the existing config to reconfigure the default gateway

Yup, that's what's happening -- libnetwork already does that automatically when the gateway changes for any reason (eg. because the container was not connected to a non-internal network, or because the network getting connected sorts before in lexicographic order).

This gives the ability to end-users to choose which network should be the default gateway without worrying about the order of calls made to the NetworkConnect endpoint.

On NetworkDisconnect the endpoint with the highest priority or sorting first in lexicographic order will be used.

Basically considering that the order itself was a workaround for not being able to configure the default

You mean, the lexicographic sorting? It's not going away, but with this change it's neither the only nor the recommended way to set the default gateway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so considering that (and this would already be the current situation); in the following case, project_network would become the default gateway; this would (e.g.) be the project's default network used in a compose-stack;

docker network connect project_network container

But if the container now is later connected to global_network, that one (sorting before project_network) would now become the default gateway; this can be a common scenario for users using (e.g.) traefik or nginx-proxy as proxy to map domains;

docker network connect global_network container

So to counteract that, I would need to disconnect from the project_network, find a value that's higher than any of the existing networks it's connected to, and re-connect to that network. While a use this network as default gateway option is not ideal, it may give a slightly stronger indication "this is what to use";

  • if no connected network as use this network as default gateway option is set, we would still use the current (sort networks by name)
  • if a network has the option set though, then we use that as the default (priority over any other ordering)
  • ❓ we could disallow connecting if there's already a network connected that's configured as default gateway
  • if that network would be disconnected, we'd probably have to fall back to the "sort by name" approach; I could see use-cases where disconnecting would mean "no gateway", but that would be trickier to implement probably.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A priority value seems better to me - because it's clear what will happen when the current gateway network is disconnected, or a new/desired-gateway network is connected to a running container.

If a specific network should always be the gateway, only that network will need a priority value. In that case, setting "priority=1" is equivalent to saying "this is the gateway" - no more or less confusing, but the corner cases or wanting to switch gateway networks are easier to deal with. (A fixed choice of gateway network probably the common case though - after not having a choice of gateway networks, or it not mattering which network is used.)

In the global/project_network example ... if the user wants to keep project_network as the gateway, they can either give it a high priority when creating the container, then not specify a priority for other networks. Or, specify a negative priority (I think, maybe worth a testcase?) when connecting global_network. If they want global_network to take over, they just use a higher priority.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I want a dual stack network acting as the IPv4 gateway, and an IPv6 only network acting as the v6 gateway ...

At the moment, even with the new priority field, the dual stack network will be the gateway for both. But, with priority plumbed in we could/should change that, and pick the v4 and v6 endpoints separately.

Then, if I give the v6 network higher priority than the dual stack, and the dual stack network higher priority than any other networks - it'll do what I want.

We can make that change in a separate PR - but I think it strengthens the case for priority values. Without them, we'd need separate options for selecting v4 and v6 gateways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I added a negative priority to TestConnectWithPriority, and it works as expected (the endpoint with the default priority stays the default gateway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make that change in a separate PR - but I think it strengthens the case for priority values. Without them, we'd need separate options for selecting v4 and v6 gateways.

Although ... if I have two dual-stack networks, and want one to be the IPv4 gateway and the other the IPv6 gateway - we can't describe that with a priority field. It'd have to be done with two priority values, or two "this is the gateway" flags. So, that doesn't help!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although ... if I have two dual-stack networks, and want one to be the IPv4 gateway and the other the IPv6 gateway - we can't describe that with a priority field. It'd have to be done with two priority values, or two "this is the gateway" flags. So, that doesn't help!

You can still split your network into a v4-only, and a v6-only network for that. I think we shouldn't go too far trying to handle edge-cases for an option that's not going to be used by the masses anyway.

Copy link
Contributor

@robmry robmry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider a "driver option" way of setting the gateway priority, so that it can be set using an old (current) version of Compose that doesn't know about the new top-level field?

Something like com.docker.network.endpoint.gwpriority ... which we have for macaddress, exposedports, dnsservers, sysctls.

I'm not sure it's needed, but maybe worth ruling out.

api/swagger.yaml Outdated
example:
com.example.some-label: "some-value"
com.example.some-other-label: "some-other-value"
Priority:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should call it GatewayPriority or GwPriority, to make it clear what it controls - specifically, that it doesn't affect which order the endpoints are connected in (and therefore how network interfaces are numbered in the container)... but that's what people might expect because of Compose's Priority field. We might want to add a JoinPriority or something like that as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to add a JoinPriority or something like that as well?

I don't think that's necessary. Users who care about interface names should rather explicitly set the interface name (I have a wip branch for that). A JoinPriority would implicitly define the interface name, and it wouldn't guarantee that ifnames are stable across restarts (eg. if NetworkConnect is called after the initial ContainerStart with a priority higher than existing endpoints).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and +1 on renaming this field into something more explicit)

for _, epConfig := range networkingConfig.EndpointsConfig {
// Before 1.48, all endpoints had the same priority, so
// reinitialize this field.
epConfig.Priority = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be an error, so we don't get bugs about the feature not working if someone forgot to update their client's API version number?

There's no good reason for the field to be present in a request from an older client, and the default is the old behaviour - so it's not like some of the other options where we need to set config to mimic old behaviour. (If there is a Priority field in an old-API request, returning an error would be a breaking change. I'm not sure how strict we need to be about that.)

It's like VolumeOptions.Subpath, which causes an error rather than getting ignored. But, it's also like Healthcheck.StartInterval and we zero that if it's set in an old-API request. So, it's a bit mixed!

Copy link
Member Author

@akerouanton akerouanton Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about VolumeOptions.Subpath, but in general I think we prefer resetting a field that shouldn't be usable with a given API version. This ensures that a new field is handled the same way on new and old engines.

@robmry
Copy link
Contributor

robmry commented Nov 27, 2024

Did you consider a "driver option" way of setting the gateway priority, so that it can be set using an old (current) version of Compose that doesn't know about the new top-level field?

Something like com.docker.network.endpoint.gwpriority ... which we have for macaddress, exposedports, dnsservers, sysctls.

I'm not sure it's needed, but maybe worth ruling out.

We discussed this on the networking maintainers call ... and decided it isn't needed.

Without it, the priority field won't be usable without compose/cli updates. So, we'll just have to get those done (which hopefully we'd be able to anyway).

@akerouanton akerouanton force-pushed the EndpointSettings-Priority-v2 branch from 6160568 to bfc074d Compare November 28, 2024 15:53
@akerouanton akerouanton changed the title api: add Priority field to EndpointSettings api: add GwPriority field to EndpointSettings Nov 28, 2024
@akerouanton akerouanton requested a review from robmry November 28, 2024 15:54
This new field is used by libnetwork to determine which endpoint
provides the default gateway for a container.

Signed-off-by: Albin Kerouanton <[email protected]>
@akerouanton akerouanton force-pushed the EndpointSettings-Priority-v2 branch from bfc074d to 5b752fa Compare November 28, 2024 16:06
Copy link
Contributor

@robmry robmry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


var joinOptions []libnetwork.EndpointOption
joinOptions := []libnetwork.EndpointOption{
libnetwork.JoinOptionPriority(epConfig.GwPriority),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardest part reviewing this PR was to check "how does the sorting work?", and OHBOY things are libnetwork confusing; so

  • the GwPriority is set on the endpoint config
  • but set on the sandbox that it belongs to, indexed by endpoint-ID

sb, ok := c.sandboxes[ep.sandboxID]

So when does sorting happen?

This happens when an endpoint is added to a sandbox, it is injected in the right place in the list of endpoints attached to the sandbox

moby/libnetwork/sandbox.go

Lines 317 to 320 in 71b93a8

l := len(sb.endpoints)
i := sort.Search(l, func(j int) bool {
return ep.Less(sb.endpoints[j])
})

☝️ this is relevant, because it means that endpoints MUST go through that function to be added to the sandbox (just appending to the list won't put them in the right order)

But (prepare for a bumpy road here) 👇

The ordering happens by;

  • each endpoint to get the sandbox it belongs to
  • ❓ as they're both either already connected, or to be connected to the sandbox, wouldn't that sandbox always be the same???
  • 👉 even more fun; to get the sandbox;
    • the endpoint gets the network it belongs to
    • from which it gets the controller the network belongs to
    • which has a map of sandbox-ID -> sandbox
    • which is used to get the sandbox (which ... may be the sandbox we're coming from in the first place?)

moby/libnetwork/endpoint.go

Lines 1086 to 1097 in 71b93a8

func (ep *Endpoint) getSandbox() (*Sandbox, bool) {
c := ep.network.getController()
ep.mu.Lock()
sid := ep.sandboxID
ep.mu.Unlock()
c.mu.Lock()
ps, ok := c.sandboxes[sid]
c.mu.Unlock()
return ps, ok
}

After we have the sandbox, we can look up what priority was stored for the endpoint .. for that sandbox .. so that we can (if set) sort;

moby/libnetwork/sandbox.go

Lines 664 to 669 in 71b93a8

if sbi != nil {
prioi = sbi.epPriority[epi.ID()]
}
if sbj != nil {
prioj = sbj.epPriority[epj.ID()]
}

After which we do other things again, like (AGAIN) getting the network that the endpoint belongs to, to see if that network "internal" (which is also taken into account for priority);

moby/libnetwork/sandbox.go

Lines 681 to 683 in 71b93a8

inti := epi.getNetwork().Internal()
intj := epj.getNetwork().Internal()
if inti != intj {

@akerouanton akerouanton deleted the EndpointSettings-Priority-v2 branch November 29, 2024 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api API area/networking Networking impact/api impact/changelog kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny status/2-code-review

Projects

None yet

3 participants