Align fixed-cidr-v6 with fixed-cidr, use default ULA prefix if no fixed-cidr-v6#48319
Align fixed-cidr-v6 with fixed-cidr, use default ULA prefix if no fixed-cidr-v6#48319robmry merged 10 commits intomoby:masterfrom
Conversation
9c43cd2 to
2aa8461
Compare
2aa8461 to
adc3724
Compare
72dc89e to
704765d
Compare
daemon/daemon_unix.go
Outdated
| if hip.IsGlobalUnicast() { | ||
| ipamV4Conf.Gateway = nw.IP.String() | ||
| } | ||
| ipamV4Conf.Gateway = nw.IP.String() |
There was a problem hiding this comment.
Maybe we should log something if this IP address isn't a 'global unicast'?
There was a problem hiding this comment.
I don't think it'd be too meaningful - LL addresses have been allowed by the broken check, so they're supported now.
There was a problem hiding this comment.
Would a comment be useful to document "we know, and we allow because .."?
There was a problem hiding this comment.
I don't think so ... there's not really anything special about allowing link-local addresses.
Maybe, although I'm not sure, the original idea was to disallow them. But, that would have been the special case that needed explaining.
daemon/daemon_unix.go
Outdated
| // is wrong. But, just log rather than raise an error that would cause daemon | ||
| // startup to fail, because this has been allowed by earlier versions. Remove | ||
| // the SubPool, so that addresses are allocated from the whole of PreferredPool. | ||
| log.G(context.TODO()).WithFields(log.Fields{ |
There was a problem hiding this comment.
I'm wondering if we should really allow this misconfiguration to slip in. I think it's fine to log something for now, but maybe we should consider barfing out in later releases (and temporarily add an escape hatch).
Your last commit reuses the same code and so makes it possible to misconfigure IPv6 the same way. We should avoid that.
There was a problem hiding this comment.
It's a pity not to have the same behaviour for both again, but I've made it an error for IPv6 and split the testcases.
There was a problem hiding this comment.
Is this something we should consider "deprecating" (incorrect config); possibly warn, and in future error?
There was a problem hiding this comment.
A warning is logged, and it's now an error for IPv6.
I'll make it log only for IPv4, and add "this may be treated as an error in a future release" to the log message ... then we can deal with it later if we want to.
daemon/daemon_unix.go
Outdated
| } | ||
| } | ||
|
|
||
| if cfg.DefaultGatewayIPv4 != nil { |
There was a problem hiding this comment.
This condition doesn't seem related to the rest of the function. It's probably better to keep it in initBridgeDriver.
There was a problem hiding this comment.
The function is getDefaultBridgeIPAMConf, and this is part of the IPAM conf. In later commits, I think it makes more sense to keep it in the function rather than deal with it separately for IPv4 and IPv6.
daemon/daemon_unix.go
Outdated
| // Select v4/v6 config based on family | ||
| fixedCIDR, fixedCIDROpt := cfg.FixedCIDR, "fixed-cidr" | ||
| cfgBIP, cfgBIPOpt := cfg.IP, "bip" | ||
| cfgDefGw, cfgDefGwOpt := cfg.DefaultGatewayIPv4, "DefaultGatewayIPv4" | ||
| if family == netlink.FAMILY_V6 { | ||
| fixedCIDR, fixedCIDROpt = cfg.FixedCIDRv6, "fixed-cidr-v6" | ||
| cfgBIP, cfgBIPOpt = cfg.IP6, "bip6" | ||
| cfgDefGw, cfgDefGwOpt = cfg.DefaultGatewayIPv6, "DefaultGatewayIPv6" | ||
| } |
There was a problem hiding this comment.
Why not parse these outside of this function and pass them as arguments? I'm not a big fan of these *Opt vars.
There was a problem hiding this comment.
I've tarted it up a bit ... any better?
There was a problem hiding this comment.
Not sure we get it any better without touching how daemon settings are organized.
| // on this network until after the driver has created the endpoint and returned the | ||
| // constructed address. Libnetwork will then reserve this address with the ipam driver. | ||
| ones, _ := fCIDRv6.Mask.Size() | ||
| deferIPv6Alloc = ones <= 80 |
There was a problem hiding this comment.
I think it should also apply when there's no more --fixed-cidr-v6 / --bip option but the bridge already exists and has a subnet assigned.
Also, before that change there was no --bip6, but it should now be taken into account.
You can probably solve both issues by using ipamV6Conf.PreferredPool.
There was a problem hiding this comment.
Hmm, yes ... if the bridge doesn't exist, and there's no --fixed-cidr-v6/--bip, there won't be an ipamV6Conf.PreferredPool either.
Without asking for deferred IPv6 address allocation in that case too, container IPv6 addresses will be IPAM-allocated when the daemon first starts, then "MAC based" after restarts when the bridge has an address.
I think it's ok for this code to assume the daemon's default ULA prefix is big enough. (It's 64-bits, and I don't think it can be overridden separately, fixed-cidr-v6 is the way to do that?)
(This special case is grim - derive a MAC addresses from an IPAM-allocated IPv4 address, then use the MAC address to derive an IPv6 address - but only for containers in the default network, with bits of the implementation scattered betwen daemon/libnetwork/bridge driver. It'll be even worse when we move to using random MAC addresses with gratuitous ARPs/NAs, to deal with IPv6-only. We should think about dropping it.)
There was a problem hiding this comment.
Done, and added tests.
704765d to
878949e
Compare
878949e to
ff4cfe0
Compare
|
Rebased to resolve conflicts. |
ff4cfe0 to
8a6354d
Compare
|
Added |
| // outside the daemon, and it's still possible to use '--bip'. It is | ||
| // intended only for use in moby tests; it may be removed, its behaviour | ||
| // may be modified, and it may not do what you want anyway! | ||
| if bn := os.Getenv("DOCKER_TEST_CREATE_DEFAULT_BRIDGE"); bn != "" { |
There was a problem hiding this comment.
Not for this PR, because some of this is existing but;
- I was confused for a second, as I thought
DOCKER_TEST_CREATE_DEFAULT_BRIDGEwould be a boolean, but it's to override the name to use. Perhaps a different name would be clearer (DOCKER_TEST_OVERRIDE_DEFAULT_BRIDGE_NAMEor "something" indicating it's setting the name) - It looks like this env-var found its way into multiple locations, which makes me wonder if it's in too many places, and if it should really be in a single place that sets "use custom name instead of the default default name".
git grep 'DOCKER_TEST_CREATE_DEFAULT_BRIDGE'
daemon/daemon_unix.go: // Env var DOCKER_TEST_CREATE_DEFAULT_BRIDGE env var modifies the default
daemon/daemon_unix.go: if bn := os.Getenv("DOCKER_TEST_CREATE_DEFAULT_BRIDGE"); bn != "" {
integration/daemon/daemon_linux_test.go: d := daemon.New(t, daemon.WithEnvVars("DOCKER_TEST_CREATE_DEFAULT_BRIDGE="+bridgeName))
integration/daemon/daemon_linux_test.go: dOpts = append(dOpts, daemon.WithEnvVars("DOCKER_TEST_CREATE_DEFAULT_BRIDGE="+bridgeName))
libnetwork/drivers/bridge/setup_device_linux.go: // DOCKER_TEST_CREATE_DEFAULT_BRIDGE env var. It should be used only for
libnetwork/drivers/bridge/setup_device_linux.go: if defaultBridgeName = os.Getenv("DOCKER_TEST_CREATE_DEFAULT_BRIDGE"); defaultBridgeName == "" {
daemon/daemon_unix.go
Outdated
| if hip.IsGlobalUnicast() { | ||
| ipamV4Conf.Gateway = nw.IP.String() | ||
| } | ||
| ipamV4Conf.Gateway = nw.IP.String() |
There was a problem hiding this comment.
Would a comment be useful to document "we know, and we allow because .."?
daemon/daemon_unix.go
Outdated
| _, fCIDR, err := net.ParseCIDR(cfg.FixedCIDR) | ||
| fCidrIP, fCidrNet, err := net.ParseCIDR(cfg.FixedCIDR) |
There was a problem hiding this comment.
Interesting; I see that we have a defBrOptsV4 option specifically for handling these options, but we end up parsing the raw cfg here.
Also not for this PR probably , but wondering if parsing these options should happen earlier in the daemon's lifecycle, so that also dockerd --validate would already catch invalid values (and possibly we'd have a strong-typed value once we arrive here.
Slightly related to that, wondering if at this point we should still refer to fixed-cider (name of the flag), or FixedCIDR instead, but I guess if we don't validate before this, we don't have an error related to what the flag (or field) is named.
There was a problem hiding this comment.
defBrOptsv4 is introduced and used in the commit after this one, and the option name is included in the error message. (But I'll update the error for failing to parse --bip/--bip6 to include the option name.)
daemon/daemon_unix.go
Outdated
| ipamV4Conf.PreferredPool = bIPNet.String() | ||
| ipamV4Conf.Gateway = bIP.String() |
There was a problem hiding this comment.
Was about to suggest; we should look at making this a proper type, but looks like there's already a TODO for that;
Lines 84 to 85 in 48e43eb
daemon/daemon_unix.go
Outdated
| bIP, bIPNet = selectBIP(nwList, fCidrIP, fCidrIPNet) | ||
| // If the bridge is docker-managed (docker0) and fixed-cidr is not | ||
| // inside a subnet belonging to any existing bridge IP, ignore the | ||
| // bridge IP. | ||
| if !userManagedBridge && fCidrIP != nil && bIPNet != nil && !bIPNet.Contains(fCidrIP) { | ||
| bIP = nil | ||
| bIPNet = nil | ||
| if !userManagedBridge && fCidrIP != nil && bIPNet != nil { |
There was a problem hiding this comment.
Only glancing over, but wondering if the result of selectBIP is always used here or if if should be executed conditionally.
This looks to be the only location where selectBIP is used, so perhaps some of the logic should/could be combined.
There was a problem hiding this comment.
If selectBIP is called, and it's able to determine an address, the result is always used (in if bIP != nil {, just after this if-else). It's only called if there's no --bip/--bip6 to override the value.
I've moved the logic to fix-up the result based on current (changed) fixed-cidr/bip options for docker0 into selectBIP. (I think that's what you're suggesting, rather than inlining selectBIP?)
daemon/daemon_unix.go
Outdated
| // is wrong. But, just log rather than raise an error that would cause daemon | ||
| // startup to fail, because this has been allowed by earlier versions. Remove | ||
| // the SubPool, so that addresses are allocated from the whole of PreferredPool. | ||
| log.G(context.TODO()).WithFields(log.Fields{ |
There was a problem hiding this comment.
Is this something we should consider "deprecating" (incorrect config); possibly warn, and in future error?
daemon/daemon_unix.go
Outdated
| return o.cfg.IP, "bip" | ||
| } | ||
| func (o defBrOptsV4) defGw() (gw net.IP, optName string) { | ||
| return o.cfg.DefaultGatewayIPv4, "DefaultGatewayIPv4" |
There was a problem hiding this comment.
Is DefaultGatewayIPV4 the name that will be used in the daemon.json ? I noticed that all other options we use hyphenated names, e.g.
moby/daemon/config/config_linux.go
Lines 62 to 63 in 48e43eb
Also curious if it would make sense to make these options in the internal/opts package (but haven't looked too closely)
There was a problem hiding this comment.
Is DefaultGatewayIPV4 the name that will be used in the daemon.json ? I noticed that all other options we use hyphenated name
Ah, no - it's the label used in ipamConf.AuxAddresses[] ... which is confusing. I've made defGw return both (the optName isn't used, but I think it's good to have it there anyway).
6aec9c2 to
e482eb9
Compare
| libnetwork.NetworkOptionDriverOpts(netOption), | ||
| libnetwork.NetworkOptionIpam("default", "", v4Conf, v6Conf, nil), | ||
| libnetwork.NetworkOptionDeferIPv6Alloc(deferIPv6Alloc)) | ||
| return []*libnetwork.IpamConf{ipamConf}, nil |
There was a problem hiding this comment.
Probably fine to look for a future change, but my eye dropped on the []*libnetwork.IpamConf{} as type used;
- it made me curious why we use a slice with pointers (as a slice already is implicitly a pointer); also see https://andrii-kushch.medium.com/go-what-to-return-a-slice-of-structs-vs-a-slice-of-pointers-42647912530a
- This function always returns a single ipamConf, so I was curious why we convert it to a slice
- ☝️ but it looks like we use
[]*libnetwork.IpamConf{}in many places, so this was probably out of convenience - 👉 probably something to look at in a follow-up.
Becuase I'm about to add tests that use netlink, and the netlink package breaks compilation under Windows. Signed-off-by: Rob Murray <[email protected]>
Signed-off-by: Rob Murray <[email protected]>
The intention may have been to only accept a globally routable address as the gateway address on a user-supplied default bridge. But, the test didn't do anything, so it's been allowing LL subnets. It's too late to change that now so, remove the broken check. Signed-off-by: Rob Murray <[email protected]>
When a user-managed bridge is used instead of docker0 (--bridge), with a fixed-cidr - the bridge should have an IP address/subnet that encompasses fixed-cidr ... the bridge address's subnet then defines the network's subnet, and fixed-cidr defines the allocatable range within that. But, selection of the correct subnet/address from the bridge depended on the address being within fixed-cidr (within the allocatable range). This change removes that assumption. So, a bridge address with a subnet that includes fixed-cidr is selected. Signed-off-by: Rob Murray <[email protected]>
Factor out selection of addresses from an existing bridge from the code that uses the selected address to set up IPAM for the default bridge. Signed-off-by: Rob Murray <[email protected]>
When a docker-managed default bridge (docker0) already has an address, and the fixed-cidr subnet fits within the subnet defined by that address, the existing address should be used as the gateway and to define the subnet. But, when fixed-cidr is changed, no --bip is supplied, and no existing bridge network includes fixed-cidr ... the existing bridge address needs to be updated. Signed-off-by: Rob Murray <[email protected]>
For a docker-managed default bridge (docker0), when no --bip is supplied, the gateway address and subnet size can be inferred from existing bridge addresses. But, if fixed-cidr's subnet size is increased so that it's biggger than the subnet of the bridge's existing address - the bridge's subnet needs to be incresed to match. (fixed-cidr determines the range of addresses that can be automatically allocated, and these should not fall outside the default bridge's subnet.) Signed-off-by: Rob Murray <[email protected]>
When a user-managed bridge is used for the default network (--bridge), an address from the bridge determines the subnet for the network. If a fixed-cidr is supplied, it should fall within that subnet. If it doesn't, it's a misconfiguration - fixed-cidr is the range of allocatable addresses, and they need to be in the network. (Either the user's bridge is missing an address that matches their fixed-cidr or the fixed-cidr is wrong.) When this happens, because it's been allowed in the past (and, because the address-pool implementation treats fixed-cidr/SubPool as an offset into the network rather than an actual address range, so working IP addresses would normally still be assigned to containers) ... don't reject the config and cause daemon startup to fail. Just log a warning and ignore fixed-cidr. Signed-off-by: Rob Murray <[email protected]>
Signed-off-by: Rob Murray <[email protected]>
Use the same logic to generate IPAMConf for IPv6 as for IPv4.
- When no fixed-cidr-v6 is specified, rather than error out, use
the default address pools (as for an IPv4 default bridge with no
fixed-cidr, and as for user-defined networks).
- Add daemon option --bip6, similar to --bip.
- Necessary because it's the only way to override an old address
on docker0 (daemon-managed default bridge), as illustrated by
test cases.
- For a user-managed default bridge (--bridge), use IPv6 addresses
on the user's bridge to determine the pool, sub-pool and gateway.
Following the same rules as IPv4.
- Don't set up IPv6 IPAMConf if IPv6 is not enabled.
Signed-off-by: Rob Murray <[email protected]>
e482eb9 to
0b5b1db
Compare
|
Rebased - the conflict was just in |
## Description Updates for moby 28.0 networking. ## Related issues or tickets Series of commits ... - Fix description of 'inhibit_ipv4' - Not changed in moby 28.0, updated to clarify difference from (new) IPv6-only networks. - Updates to default bridge address config - moby/moby#48319 - Describe IPv6-only network config - moby/moby#48271 - docker/cli#5599 - Update description of gateway modes - moby/moby#48594 - moby/moby#48596 - moby/moby#48597 - Describe gateway selection in the networking overview. - docker/cli#5664 - Describe gateway mode `isolated` - moby/moby#49262 ## Reviews <!-- Notes for reviewers here --> <!-- List applicable reviews (optionally @tag reviewers) --> - [ ] Technical review - [ ] Editorial review - [ ] Product review --------- Signed-off-by: Rob Murray <[email protected]>
- What I did
fixed-cidr-v6.fixed-cidr, it uses default pools (or the daemon's default ULA prefix).fixed-cidr's implementation.fixed-cidr-v6withfixed-cidr.--bip6(to set the default bridge's address and subnet, like--bip).- How I did it
The existing code's intent was a bit opaque, but seemed to be:
--bip(if supplied for a docker-managed bridge,docker0), else an existing bridge address:Gatewayaddress (the bridge's address), and thePreferredPool(the docker network's subnet).fixed-cidris provided:PreferredPool, if it hasn't been determined from--bipor an existing bridge address.SubPool(equivalent to--ip-rangefor a user-defined network, it's the range of addresses the IPAM driver can allocate from. (SubPoolis expected to be withinPreferredPool, but that's not enforced.)So ...
fixed-cidr.FIXMEcomments in the tests ... mostly confusion between daemon-managed (docker0), and user-managed (--bridge) bridges, leading to surprising and incorrect behaviour.fixed-cidr/bip/existing bridge addresses into anIPAMConfcan be used forfixed-cidr-v6.--bip6because it's the only way to be suredocker0's address and subnet size are updated (similar to existing-bipfor IPv4 addresses).fixed-cidr-v6when IPv6 is enabled.fixed-cidr-v6.Example of confusion between daemon/user-managed default bridge network configuration:
default-address-pool, for example172.18.0.0/16.fixed-cidr: "10.0.0.0/24".network inspectsays ...docker0being treated like a user-managed bridge - the bridge's existing address takes priority, even though there's no configuration saying it can't change.Issues with existing implementation:
docker0'sfixed-cidrwithout specifying--bip(the issue described in the example above).fixed-cidr, or by expanding the bridge's subnet iffixed-cidrencompasses the existing subnet.TestDaemonDefaultBridgeIPAM_Docker0/old_bridge_subnet_within_fixed-cidrTestDaemonDefaultBridgeIPAM_Docker0/old_bridge_subnet_outside_fixed-cidrdocker0, when there's no --bip to dictate the subnet, it's derived from an existing bridge address. If fixed-cidr's subnet is made smaller following a daemon restart, a user might reasonably expect the default bridge network's subnet to shrink to match. However, that has not been the behaviour - instead, only the allocatable range is reduced (as would happen with a user-managed bridge).docker0, or supply a--bip.TestDaemonDefaultBridgeIPAM_Docker0/fixed-cidr_within_old_bridge_subnetTestDaemonDefaultBridgeIPAM_Docker0/fixed-cidr_within_old_bridge_subnet_with_bipto show that--bipsorts things out.)fixed-cidr. But, that's just the allocatable range.fixed-cidr, even if the address itself isn't withinfixed-cidr.fixed-cidr, the allocatable address range, to be bigger than the subnet inferred from the bridge's addresss (then fix that, as described below).TestDaemonDefaultBridgeIPAM_UserBr/fcidr_in_bridge_subnet_and_bridge_ip_not_in_fcidr.fixed-cidrdid not overlap with the subnet belonging to any address on the bridge, or was bigger than a bridge address's subnet (the user-managed bridge version of thedocker0example above) - any address was selected from the bridge, and the allocatable range would just be wrong.fixed-cidr, making the whole subnet inferred from the bridge address allocatable. (Before, because the allocator's implementation treats the sub-pool as an offset into the pool rather than an actual address range, this could result in an allocatable range within or bigger than the pool - so, this is a change in behaviour, but the configuration is broken ... I don't think it's worth trying to preserve the range that would have happened-to-work.)TestDaemonDefaultBridgeIPAM_UserBr/fixed-cidr_bigger_than_bridge_subnetTestDaemonDefaultBridgeIPAM_UserBr/no_bridge_ip_within_fixed-cidrTestDaemonDefaultBridgeIPAM_UserBr/fixed-cidr_contains_bridge_subnetIsGlobalUnicast().fixed-cidr-v6was supplied with IPv6 was disabled.fixed-cidr-v6's prefix length.--bip6to make it explicitly configurable.fixed-cidr-v6now behaves like a change infixed-cidrrather than just being an override. But, that's important for a user-managed bridge, and overridable with--bip6fordocker0).- How to verify it
New integration tests...
- Description for the changelog