generate: implement just-in-time behaviour for generate #162
slyon merged 6 commits into canonical:master
If system is initializing (basic.target not reached yet), and `netplan generate` is called ensure that any netplan generated service units are added to the initial boot transaction. This should resolve cloud-init first-time booting with systemd-networkd disabled, or with requirement to add wlan/wifi units.
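For illustration, the "system is initializing" condition could be detected roughly as follows. This is only a sketch, not the code from this PR: the helper names `state_is_initializing` and `system_is_initializing` are hypothetical, and `popen()` is used here merely as a simple stand-in for whatever process-spawning API the real implementation uses. The only fact relied on is that `systemctl is-system-running` prints `initializing` during early boot.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the text printed by
 * `systemctl is-system-running` is exactly "initializing"
 * (a trailing newline is ignored). */
bool state_is_initializing(const char *output)
{
    if (output == NULL)
        return false;
    size_t len = strcspn(output, "\r\n");
    return len == strlen("initializing")
        && strncmp(output, "initializing", len) == 0;
}

/* Hypothetical helper: ask systemd for the overall system state.
 * Returns false if systemctl cannot be run or prints nothing. */
bool system_is_initializing(void)
{
    char buf[64] = "";
    FILE *p = popen("systemctl is-system-running 2>/dev/null", "r");
    if (p == NULL)
        return false;
    if (fgets(buf, sizeof buf, p) == NULL)
        buf[0] = '\0';
    /* The exit status is ignored on purpose: systemctl exits
     * non-zero for every state other than "running". */
    pclose(p);
    return state_is_initializing(buf);
}
```

Splitting the string check from the process call keeps the classification testable without a running systemd.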
@OddBloke FYI
Codecov Report
@@ Coverage Diff @@
## master #162 +/- ##
=======================================
Coverage 88.15% 88.16%
=======================================
Files 9 9
Lines 2770 2771 +1
=======================================
+ Hits 2442 2443 +1
Misses 328 328
slyon left a comment
Thank you very much @xnox for this sleek solution to the tricky situation!
Using my reproducer from #157 I can confirm that cloud-init/OVS first boot is working on Focal & Groovy, without modifications required in cloud-init or systemd.
Semantically it is IMHO still not correct for netplan generate to actively start service units (i.e. apply changes), instead of just generating the required config/setup files (as described by the documentation). But as this behavior is limited to a very specific situation during the systemd initializing phase (i.e. "Early bootup, before basic.target is reached or the maintenance state entered"), I think that would be acceptable to fix the problems we see with OVS and WPA cloud-init first-boot.
Before I put my ACK on this review, I'd like to see a relevant integration test added (and all new/old tests passing) and possibly some kind of documentation regarding this special case added as well.
Groovy systemd 246.4-1ubuntu1 + netplan 0.100-0ubuntu1+pr162
root@ovs-y:~# journalctl -u systemd-networkd -u systemd-networkd.socket -u systemd-networkd-wait-online -u network.target -u basic.target -u network-online.target -u netplan-*.service -b
-- Logs begin at Thu 2020-07-30 12:11:40 UTC, end at Tue 2020-09-08 08:17:01 UTC. --
Sep 08 08:15:51 ovs-y systemd[1]: Starting Network Service...
Sep 08 08:15:51 ovs-y systemd-networkd[125]: /run/systemd/network/10-netplan-eth0.21.network: ovs0 NetDev could not be found, ignoring assignment.
Sep 08 08:15:51 ovs-y systemd-networkd[125]: eth0: IPv6 successfully enabled
Sep 08 08:15:51 ovs-y systemd-networkd[125]: eth0: Gained IPv6LL
Sep 08 08:15:51 ovs-y systemd-networkd[125]: Enumeration completed
Sep 08 08:15:51 ovs-y systemd[1]: Started Network Service.
Sep 08 08:15:51 ovs-y systemd-networkd[125]: eth0.21: netdev ready
Sep 08 08:15:51 ovs-y systemd-networkd[125]: eth0.21: Link UP
Sep 08 08:15:51 ovs-y systemd[1]: Starting Wait for Network to be Configured...
Sep 08 08:15:51 ovs-y systemd-networkd[125]: eth0.21: Gained carrier
Sep 08 08:15:51 ovs-y systemd-networkd-wait-online[157]: managing: eth0.21
Sep 08 08:15:51 ovs-y systemd-networkd-wait-online[157]: managing: eth0.21
Sep 08 08:15:52 ovs-y systemd-networkd-wait-online[157]: managing: eth0.21
Sep 08 08:15:54 ovs-y systemd-networkd-wait-online[157]: managing: eth0.21
Sep 08 08:15:54 ovs-y systemd-networkd-wait-online[157]: managing: eth0
Sep 08 08:15:54 ovs-y systemd[1]: Finished Wait for Network to be Configured.
Sep 08 08:15:55 ovs-y systemd[1]: Reached target Basic System.
Sep 08 08:15:56 ovs-y systemd[1]: Starting OpenVSwitch configuration for cleanup...
Sep 08 08:15:56 ovs-y systemd[1]: netplan-ovs-cleanup.service: Succeeded.
Sep 08 08:15:56 ovs-y systemd[1]: Finished OpenVSwitch configuration for cleanup.
Sep 08 08:15:56 ovs-y systemd[1]: Starting OpenVSwitch configuration for ovs0...
Sep 08 08:15:56 ovs-y ovs-vsctl[262]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --may-exist add-br ovs0
Sep 08 08:15:56 ovs-y systemd-networkd[125]: ovs0: IPv6 successfully enabled
Sep 08 08:15:56 ovs-y systemd-networkd[125]: ovs0: Link UP
Sep 08 08:15:56 ovs-y systemd-networkd[125]: ovs0: Gained carrier
Sep 08 08:15:56 ovs-y ovs-vsctl[279]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --may-exist add-port ovs0 eth0.21
Sep 08 08:15:56 ovs-y ovs-vsctl[280]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan=true
Sep 08 08:15:56 ovs-y ovs-vsctl[281]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set-fail-mode ovs0 standalone
Sep 08 08:15:56 ovs-y ovs-vsctl[282]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan/global/set-fail-mode=standalone
Sep 08 08:15:56 ovs-y ovs-vsctl[283]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 mcast_snooping_enable=false
Sep 08 08:15:56 ovs-y ovs-vsctl[284]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan/mcast_snooping_enable=false
Sep 08 08:15:56 ovs-y ovs-vsctl[285]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 rstp_enable=false
Sep 08 08:15:56 ovs-y ovs-vsctl[286]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan/rstp_enable=false
Sep 08 08:15:56 ovs-y systemd[1]: netplan-ovs-ovs0.service: Succeeded.
Sep 08 08:15:56 ovs-y systemd[1]: Finished OpenVSwitch configuration for ovs0.
Sep 08 08:15:56 ovs-y systemd[1]: Reached target Network.
Sep 08 08:15:56 ovs-y systemd[1]: Reached target Network is Online.
Sep 08 08:15:58 ovs-y systemd-networkd[125]: ovs0: Gained IPv6LL
root@ovs-y:~# ovs-vsctl show
557f8343-b72b-4a23-b62f-9ebe80c9d116
    Bridge ovs0
        fail_mode: standalone
        Port eth0.21
            Interface eth0.21
        Port ovs0
            Interface ovs0
                type: internal
    ovs_version: "2.13.0"
Focal systemd 245.4-4ubuntu3.2 + netplan 0.100-0ubuntu1+pr162
-- Logs begin at Tue 2020-09-08 08:37:33 UTC, end at Tue 2020-09-08 08:43:07 UTC. --
Sep 08 08:43:03 f-cloud systemd[1]: Starting Network Service...
Sep 08 08:43:03 f-cloud systemd-networkd[121]: /run/systemd/network/10-netplan-eth0.21.network: ovs0 NetDev could not be found, ignoring assignment.
Sep 08 08:43:03 f-cloud systemd-networkd[121]: eth0: IPv6 successfully enabled
Sep 08 08:43:03 f-cloud systemd-networkd[121]: Enumeration completed
Sep 08 08:43:03 f-cloud systemd[1]: Started Network Service.
Sep 08 08:43:03 f-cloud systemd[1]: Starting Wait for Network to be Configured...
Sep 08 08:43:03 f-cloud systemd-networkd[121]: eth0.21: netdev ready
Sep 08 08:43:03 f-cloud systemd-networkd[121]: eth0.21: Link UP
Sep 08 08:43:03 f-cloud systemd-networkd[121]: eth0.21: Gained carrier
Sep 08 08:43:03 f-cloud systemd-networkd[121]: eth0: Gained IPv6LL
Sep 08 08:43:03 f-cloud systemd-networkd-wait-online[152]: managing: eth0
Sep 08 08:43:03 f-cloud systemd-networkd-wait-online[152]: managing: eth0.21
Sep 08 08:43:03 f-cloud systemd[1]: Finished Wait for Network to be Configured.
Sep 08 08:43:04 f-cloud systemd[1]: Reached target Basic System.
Sep 08 08:43:04 f-cloud systemd[1]: Starting OpenVSwitch configuration for cleanup...
Sep 08 08:43:05 f-cloud systemd[1]: netplan-ovs-cleanup.service: Succeeded.
Sep 08 08:43:05 f-cloud systemd[1]: Finished OpenVSwitch configuration for cleanup.
Sep 08 08:43:05 f-cloud systemd[1]: Starting OpenVSwitch configuration for ovs0...
Sep 08 08:43:05 f-cloud ovs-vsctl[260]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --may-exist add-br ovs0
Sep 08 08:43:05 f-cloud systemd-networkd[121]: ovs0: IPv6 successfully enabled
Sep 08 08:43:05 f-cloud systemd-networkd[121]: ovs0: Link UP
Sep 08 08:43:05 f-cloud systemd-networkd[121]: ovs0: Gained carrier
Sep 08 08:43:05 f-cloud ovs-vsctl[278]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --may-exist add-port ovs0 eth0.21
Sep 08 08:43:05 f-cloud ovs-vsctl[279]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan=true
Sep 08 08:43:05 f-cloud ovs-vsctl[280]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set-fail-mode ovs0 standalone
Sep 08 08:43:05 f-cloud ovs-vsctl[281]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan/global/set-fail-mode=standalone
Sep 08 08:43:05 f-cloud ovs-vsctl[282]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 mcast_snooping_enable=false
Sep 08 08:43:05 f-cloud ovs-vsctl[283]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan/mcast_snooping_enable=false
Sep 08 08:43:05 f-cloud ovs-vsctl[284]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 rstp_enable=false
Sep 08 08:43:05 f-cloud ovs-vsctl[285]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs0 external-ids:netplan/rstp_enable=false
Sep 08 08:43:05 f-cloud systemd[1]: netplan-ovs-ovs0.service: Succeeded.
Sep 08 08:43:05 f-cloud systemd[1]: Finished OpenVSwitch configuration for ovs0.
Sep 08 08:43:05 f-cloud systemd[1]: Reached target Network.
Sep 08 08:43:05 f-cloud systemd[1]: Reached target Network is Online.
Sep 08 08:43:07 f-cloud systemd-networkd[121]: ovs0: Gained IPv6LL
root@f-cloud:~# ovs-vsctl show
504abab8-4484-4cd1-bbf5-c239dbe131d9
    Bridge ovs0
        fail_mode: standalone
        Port eth0.21
            Interface eth0.21
        Port ovs0
            Interface ovs0
                type: internal
    ovs_version: "2.13.0"
Note netplan generate is not quite inert already, as it does call udevadm control --reload. But this jit & udevadm control could be moved into a standalone service unit, started by network-pre.target, in a similar spirit to the sriov / ovs-cleanup units. We can scope things further to check for INVOCATION_ID / systemd cgroup name, to ensure that these interactions with the system only happen if we are called by a systemd service unit, such that interactive calls to netplan generate remain completely inert / static.
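A rough sketch of the INVOCATION_ID idea, assuming only that systemd sets `INVOCATION_ID` in the environment of the services it runs. The function name `called_from_systemd_unit` is hypothetical and not part of netplan. Note that this is a heuristic: the variable is inherited by child processes and can be set by hand, which is presumably why the cgroup name is mentioned as an alternative.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: guess whether this process was started by a
 * systemd service unit, based on the INVOCATION_ID environment
 * variable. An unset or empty value is treated as "interactive". */
bool called_from_systemd_unit(void)
{
    const char *id = getenv("INVOCATION_ID");
    return id != NULL && id[0] != '\0';
}
```

With a check like this, an interactive `netplan generate` from a shell would skip the side effects, while the same call from a unit would keep them.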
Ack.
Hm, I'm now wondering if we should remove system-generator altogether, and move it to network-pre.target netplan-generate.service. Because we only ever want to generate networking info just-in-time, and we don't want to regenerate it every time
raharper left a comment
Should this capability be enumerated in netplan features?
/* When booting with cloud-init, network configuration
 * might be provided just-in-time. Specifically after
 * system-generators were executed, but before
 * network.target is started. In such case, auxiliary
If this runs after network.target is started, does it matter? If so, should the check_called_just_in_time() ensure it only returns true if network.target is not started?
IMO this is correct. I introduced an additional check in check_called_just_in_time().
if (any_networkd) {
    start_unit_jit("systemd-networkd.socket");
    start_unit_jit("systemd-networkd-wait-online.service");
    start_unit_jit("systemd-networkd.service");
What about NetworkManager?
All service units created by netplan (OVS/WPA) are only related to the networkd backend, so no special handling is needed for NetworkManager.
In my mind, NetworkManager should be enabled/started in a similar fashion too. Because for example, we may have on Ubuntu Core cases where cloud-init provides netplan yaml, that expects to start ModemManager and NetworkManager.
And I still ideally want to have i.e. both pre-installed in the image (or cloud-init-local installing them) and enabling things on the fly. Because after this is done, I want to have the ability to install packages during boot too specifically to support OVS if it's not seeded in the image.
But I think NetworkManager may need to come separately, later. As i.e. we don't currently generate enable for NM, nor return something like "any_networkmanager" flag to see if we need to start it.
Units shall only be started JIT during early boot, by the system user (root). If a normal user calls 'netplan generate' it shall not start the units. Avoid asking for a password if a user (or test) executes this command and rather fail with missing authentication.
Do not try to enqueue new network related netplan-*.service units, if network.target was already started.
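The two rules above could be combined along these lines. This is a sketch under stated assumptions, not the code that was merged: `struct jit_context`, `should_start_units_jit` and `build_start_argv` are hypothetical names, the explicit UID check is one possible reading of "by the system user (root)", and the caller is assumed to have already queried systemd for the two state flags. The `--no-block` and `--no-ask-password` options are existing `systemctl` flags; the latter makes an unprivileged call fail instead of prompting for a password.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs gathered by the caller. */
struct jit_context {
    bool system_initializing;   /* system state is "initializing" */
    bool network_target_active; /* network.target was already started */
    unsigned int euid;          /* effective user ID of the caller */
};

/* Start units just-in-time only for root, during early boot,
 * and only while network.target has not been started yet. */
bool should_start_units_jit(const struct jit_context *ctx)
{
    return ctx != NULL
        && ctx->euid == 0
        && ctx->system_initializing
        && !ctx->network_target_active;
}

/* Fill argv with a non-blocking, non-interactive start request for
 * one unit. Needs room for 6 entries (5 arguments plus the NULL
 * terminator). Returns the argument count, or 0 on bad input. */
size_t build_start_argv(const char *unit, const char *argv[], size_t capacity)
{
    if (unit == NULL || argv == NULL || capacity < 6)
        return 0;
    argv[0] = "systemctl";
    argv[1] = "start";
    argv[2] = "--no-block";        /* enqueue the job, do not wait for it */
    argv[3] = "--no-ask-password"; /* fail rather than prompt */
    argv[4] = unit;
    argv[5] = NULL;
    return 5;
}
```

Keeping the decision separate from the spawn call means a test run as a normal user can exercise the logic without touching the running system.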
I put that question on the agenda for the next netplan sync meeting. For this PR/fix I'd like to stick to the current architecture, but then discuss with the team about switching away from the
Yes, that might be useful. I added the
@slyon I like / approve all the changes you did to this. Github doesn't let me put a checkmark in.
slyon left a comment
@xnox Thanks! That's what I wanted to ask you next :-)
I verified again in my Focal reproducer and made sure all tests (incl. the new cloud-init reboot test) are working: https://paste.ubuntu.com/p/4ZH6HGXqBx/
So I think this should be good for merging!
(And preparing a distro-patch on top of 0.100-0ubuntu1)
I have run into this just now - or something very similar.
This is not enough information. Please open a bug report on https://launchpad.net/cloud-images and include full details of your image and the full journal logs too.
This is still not working on Ubuntu 20.04.2: the netplan wpa service does not start at first boot. Is check_called_just_in_time failing? If so, what might be the problem?
Can you please open a new bug report about that? I have seen, for example, the wifi driver being loaded from the initrd without firmware, resulting in the wifi device being there but unable to actually do any connectivity.
I just found out that this works on systemd 245.4-4ubuntu3.5, but Ubuntu 20.04.2 for rpi4 has systemd 245.4-4ubuntu3.4, so it's not working! Any theory what the difference is?
It's good to know that it works with 245.4-4ubuntu3.5! Please keep your system updated to get all the fixes. The exact difference can be seen here: http://launchpadlibrarian.net/526292852/systemd_245.4-4ubuntu3.4_245.4-4ubuntu3.5.diff.gz
@slyon what's your suggestion for the time being? When there is only a wifi network for the rpi4, how do I do cloud-init things like package upgrades? Do I have to use ethernet, or somehow repackage the 20.04.2 img with an updated systemd? I'd appreciate your thoughts.
@danywheeler @slyon there are no changes in said systemd SRU related to this netplan behaviour. Thus it is unlikely that the systemd change is what's making things work. @danywheeler I still do not have answers to all of the requested information from you about your board, how you configured netplan & cloud-init. Thus it is still not possible to diagnose this. Also, discussing an unrelated new bug report on a closed pull request is not very productive, as very few people are monitoring these and we cannot use normal bug triaging and escalation processes; set status; have attachments. Please please please stop commenting here, and open a new bug report on launchpad. Explain which board you use, which model/revision, which wifi chip it has (even the same Pi models can have different wifi chips depending on batch), how you did first boot & subsequent boots, how you changed your netplan configuration, how you changed your cloud-init configuration. Please attach the full journal logs if you can, and ideally sosreport too. The information you so far have provided is inconclusive, and it's impossible to tell at the moment what made things work for you. At the moment it seems that first boot is still broken for you, but any subsequent boot is fine - with or without applying any package updates.
@xnox I've put together a bug in launchpad https://bugs.launchpad.net/netplan/+bug/1922358 with logs, sorry for commenting again.
See original PR: canonical#162
…nerator This was originally implemented to generate & start systemd units just-in-time during the boot transaction (canonical#162). With the implementation of a proper systemd-generator, Netplan generates the corresponding units in /run/systemd/generator* and systemd automatically re-loads them and re-calculates dependencies during "daemon-reload". Therefore, we do not need to inject them manually. This is covered by the "cloud-init" autopkgtest.
If system is initializing (basic.target not reached yet), and `netplan generate` is called, ensure that any netplan generated service units are added to the initial boot transaction.
This should resolve cloud-init first-time booting with systemd-networkd disabled, or with requirement to add wlan/wifi units.
Description
I've tested this by sideloading this new code branch to an lxd container.
Disabling systemd-networkd: `systemctl disable systemd-networkd`
Removing: `rm -rf /etc/netplan/50-cloud-init.yaml /var/log/cloud-init-output.log /var/log/cloud-init.log`
Executing: `cloud-init clean`
And rebooting.
On reboot, I observed that, despite not being enabled, systemd-networkd & netplan-ovs-cleanup did start or attempted to start, at the right time during initial boot, and correctly ordered before reaching network.target / network-online.target.
The below was tested on focal.
This thus implements the just-in-time nature of ensuring that required units from netplan are started during boot, even if netplan configuration arrives on disk just-in-time, before network-pre.target.
Furthermore, we could potentially completely eliminate the requirement for cloud-init to call `netplan generate` by hand if we were to add a service unit which does `ExecStart=/usr/bin/netplan generate` and is wanted by the `network-pre.target`. That is true, because cloud-init-local.service is before network-pre.target and ensures that network configuration is on disk before network-pre.target is starting.
Checklist
`make check` successfully.
`make check-coverage`: Coverage is reduced! But this code path will need an integration test-suite that requires root, as newly added code paths must not be called with a root-dir arg.