Despite our best efforts in #113 and #114, it is still possible to repeatedly get containerd to start up too late for kubeadm to succeed in cloud-final.
This can be reproduced on a 3-node shared storage cluster with the networking deliberately slowed to 1Gb, a setup that is probably not uncommon in home labs and smaller organisations.
In this situation, if the backing storage is busy, then during machine boot, the following will happen:
- systemd will start containerd prior to cloud-final, and will report the unit as started almost immediately
- containerd will not actually have initialized until after cloud-final has already failed
Mar 11 20:09:35 test-md-0-7bf7c858b4-6p6rx systemd[1]: Starting containerd container runtime...
Mar 11 20:09:35 test-md-0-7bf7c858b4-6p6rx systemd[1]: Started containerd container runtime.
...
20 seconds of nothing, thanks
...
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk containerd[806]: time="2020-03-11T20:09:57.472851668Z" level=info msg="starting containerd" revision=d76c121f76a5fc8a462dc64594aea72fe18e1178 version=v1.3.3
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk containerd[806]: time="2020-03-11T20:09:57.504408678Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk containerd[806]: time="2020-03-11T20:09:57.505534329Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
Mar 11 20:09:42 test-md-0-7bf7c858b4-6p6rx systemd[1]: Starting Execute cloud user/final scripts...
Mar 11 20:09:51 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: W0311 20:09:51.668517 1067 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
Mar 11 20:09:51 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: W0311 20:09:51.991649 1067 common.go:77] your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta1". Please use 'kubeadm config migrate --old-config old.yam
Mar 11 20:09:52 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: [preflight] Running pre-flight checks
...
you could have waited one more second!
...
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: error execution phase preflight: [preflight] Some fatal errors occurred:
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: [ERROR CRI]: container runtime is not running: output: time="2020-03-11T20:09:56Z" level=fatal msg="failed to connect: failed to connect, make sure you are running as ro
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: , error: exit status 1
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
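To put numbers on the log excerpts above, here is a small arithmetic sketch using only timestamps copied from them. It assumes the journal timestamps and containerd's own `time=` fields are on the same clock (they appear to agree at 20:09:57); the variable names are just for illustration.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above (assumed to share one clock).
systemd_started = datetime.fromisoformat("2020-03-11T20:09:35")             # "Started containerd container runtime."
preflight_fatal = datetime.fromisoformat("2020-03-11T20:09:56")             # time= in the [ERROR CRI] message
containerd_first_log = datetime.fromisoformat("2020-03-11T20:09:57.472851")  # "starting containerd"

# How long containerd was "started" according to systemd before it logged anything.
silent_gap = (containerd_first_log - systemd_started).total_seconds()

# How far the failing CRI check was ahead of containerd's first log line.
missed_by = (containerd_first_log - preflight_fatal).total_seconds()

print(f"silent gap after 'Started': {silent_gap:.2f}s")
print(f"preflight failed {missed_by:.2f}s before containerd's first log line")
```

So the unit was nominally up for roughly 22 seconds before containerd logged its first line, and the preflight check gave up about a second and a half before that.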
root@test-md-0-7bf7c858b4-6p6rx:/etc/systemd/system# systemd-analyze blame
17.243s motd-news.service
15.152s cloud-final.service
6.650s cloud-config.service
3.554s networkd-dispatcher.service
3.397s systemd-networkd-wait-online.service
2.816s dev-sda1.device
2.770s cloud-init-local.service
1.784s apparmor.service
1.624s cloud-init.service
1.505s grub-common.service
1.294s ebtables.service
910ms accounts-daemon.service
898ms ssh.service
882ms rsyslog.service
867ms conntrackd.service
773ms systemd-timesyncd.service
669ms setvtrgb.service
530ms systemd-machine-id-commit.service
340ms systemd-logind.service
206ms keyboard-setup.service
109ms systemd-resolved.service
109ms systemd-remount-fs.service
104ms fstrim.service
94ms systemd-tmpfiles-setup.service
90ms systemd-journald.service
70ms systemd-udevd.service
69ms systemd-sysctl.service
69ms systemd-user-sessions.service
64ms systemd-journal-flush.service
63ms [email protected]
61ms systemd-tmpfiles-setup-dev.service
46ms console-setup.service
45ms kmod-static-nodes.service
42ms systemd-modules-load.service
root@test-md-0-7bf7c858b4-6p6rx:/etc/systemd/system# systemd-analyze critical-chain cloud-final.service
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
cloud-final.service +15.152s
└─cloud-config.service @12.013s +6.650s
└─containerd.service @12.003s +10ms
└─basic.target @11.120s
└─sockets.target @11.120s
└─dbus.socket @11.120s
└─sysinit.target @11.119s
└─cloud-init.service @9.494s +1.624s
└─systemd-networkd-wait-online.service @6.096s +3.397s
└─systemd-networkd.service @6.061s +34ms
└─network-pre.target @6.060s
└─cloud-init-local.service @3.289s +2.770s
└─open-vm-tools.service @3.289s
└─vgauth.service @3.288s
└─apparmor.service @1.504s +1.784s
└─local-fs.target @1.502s
└─local-fs-pre.target @1.502s
└─keyboard-setup.service @1.296s +206ms
└─systemd-journald.socket @1.296s
└─system.slice @1.295s
└─-.slice @1.292s
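The critical chain shows containerd.service counted as started after 10ms, so unit ordering alone does not tell cloud-final that the runtime is usable. One possible mitigation, sketched here and not what #113 or #114 implement, is to poll the containerd socket before kubeadm runs. The helper name `wait_for_unix_socket` is hypothetical, and `/run/containerd/containerd.sock` is assumed to be the socket path on the image. A successful connect is also only a rough proxy: it does not prove the CRI plugin has finished loading.

```python
import socket
import time


def wait_for_unix_socket(path, timeout=60.0, interval=0.5):
    """Poll a Unix socket until it accepts a connection or the timeout expires.

    Hypothetical helper for illustration. Returns True once a connect
    succeeds, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                s.settimeout(interval)
                s.connect(path)
                return True
        except OSError:
            # Socket missing, refusing connections, or timing out: retry
            # until the deadline is reached.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Something like `wait_for_unix_socket("/run/containerd/containerd.sock", timeout=120)` could run ahead of `kubeadm join` in the bootstrap script, with the join skipped or retried if it returns False. The timeout value is an arbitrary choice here and would need tuning for slow storage.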