containerd - cloud-init ordering part 2 #173

@randomvariable

Despite our best efforts in #113 and #114, it is still possible to reproducibly have containerd start up too late for kubeadm to succeed in cloud-final.

This can be reproduced on a 3-node shared-storage cluster with the networking deliberately slowed to 1Gb, a setup that may not be uncommon in home lab environments and smaller organisations.

In this setup, if the backing storage is busy while a machine boots, the following happens:

  1. systemd starts containerd before cloud-final, as intended
  2. containerd does not finish initializing until after cloud-final has already failed
Mar 11 20:09:35 test-md-0-7bf7c858b4-6p6rx systemd[1]: Starting containerd container runtime...
Mar 11 20:09:35 test-md-0-7bf7c858b4-6p6rx systemd[1]: Started containerd container runtime.
...
20 seconds of nothing, thanks
...
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk containerd[806]: time="2020-03-11T20:09:57.472851668Z" level=info msg="starting containerd" revision=d76c121f76a5fc8a462dc64594aea72fe18e1178 version=v1.3.3
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk containerd[806]: time="2020-03-11T20:09:57.504408678Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk containerd[806]: time="2020-03-11T20:09:57.505534329Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
Mar 11 20:09:42 test-md-0-7bf7c858b4-6p6rx systemd[1]: Starting Execute cloud user/final scripts...
Mar 11 20:09:51 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: W0311 20:09:51.668517    1067 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
Mar 11 20:09:51 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: W0311 20:09:51.991649    1067 common.go:77] your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta1". Please use 'kubeadm config migrate --old-config old.yam
Mar 11 20:09:52 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: [preflight] Running pre-flight checks
...
you could have waited one more second!
...
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: error execution phase preflight: [preflight] Some fatal errors occurred:
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]:         [ERROR CRI]: container runtime is not running: output: time="2020-03-11T20:09:56Z" level=fatal msg="failed to connect: failed to connect, make sure you are running as ro
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: , error: exit status 1
Mar 11 20:09:57 test-md-0-7bf7c858b4-6p6rx.home.internal.randomvariable.co.uk cloud-init[1035]: [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
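
To make the race easier to see, here is a small sketch that lays the timestamps from the journal excerpts above on one timeline. The event labels are my own shorthand, and the containerd timestamp is truncated to whole seconds; nothing here is measured beyond what the logs already show.

```python
from datetime import datetime

# Timestamps copied from the journal excerpts above (2020-03-11, whole seconds).
# The labels are informal shorthand, not systemd or kubeadm terminology.
EVENTS = {
    "systemd reports containerd started": "20:09:35",
    "cloud-final starts": "20:09:42",
    "kubeadm preflight begins": "20:09:52",
    "kubeadm CRI check fails": "20:09:56",
    "containerd logs its first line": "20:09:57",
}


def seconds_between(start: str, end: str) -> int:
    """Return the number of whole seconds from `start` to `end` (HH:MM:SS, same day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Print each event as an offset from the moment systemd considered containerd started.
t0 = EVENTS["systemd reports containerd started"]
for name, stamp in EVENTS.items():
    print(f"+{seconds_between(t0, stamp)}s  {name}")
```

By this reading, the CRI preflight check fails about 21 seconds after systemd marked containerd as started, and roughly one second before containerd writes its first log line.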
root@test-md-0-7bf7c858b4-6p6rx:/etc/systemd/system# systemd-analyze blame
         17.243s motd-news.service
         15.152s cloud-final.service
          6.650s cloud-config.service
          3.554s networkd-dispatcher.service
          3.397s systemd-networkd-wait-online.service
          2.816s dev-sda1.device
          2.770s cloud-init-local.service
          1.784s apparmor.service
          1.624s cloud-init.service
          1.505s grub-common.service
          1.294s ebtables.service
           910ms accounts-daemon.service
           898ms ssh.service
           882ms rsyslog.service
           867ms conntrackd.service
           773ms systemd-timesyncd.service
           669ms setvtrgb.service
           530ms systemd-machine-id-commit.service
           340ms systemd-logind.service
           206ms keyboard-setup.service
           109ms systemd-resolved.service
           109ms systemd-remount-fs.service
           104ms fstrim.service
            94ms systemd-tmpfiles-setup.service
            90ms systemd-journald.service
            70ms systemd-udevd.service
            69ms systemd-sysctl.service
            69ms systemd-user-sessions.service
            64ms systemd-journal-flush.service
            63ms [email protected]
            61ms systemd-tmpfiles-setup-dev.service
            46ms console-setup.service
            45ms kmod-static-nodes.service
            42ms systemd-modules-load.service
root@test-md-0-7bf7c858b4-6p6rx:/etc/systemd/system# systemd-analyze critical-chain cloud-final.service
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

cloud-final.service +15.152s
└─cloud-config.service @12.013s +6.650s
  └─containerd.service @12.003s +10ms
    └─basic.target @11.120s
      └─sockets.target @11.120s
        └─dbus.socket @11.120s
          └─sysinit.target @11.119s
            └─cloud-init.service @9.494s +1.624s
              └─systemd-networkd-wait-online.service @6.096s +3.397s
                └─systemd-networkd.service @6.061s +34ms
                  └─network-pre.target @6.060s
                    └─cloud-init-local.service @3.289s +2.770s
                      └─open-vm-tools.service @3.289s
                        └─vgauth.service @3.288s
                          └─apparmor.service @1.504s +1.784s
                            └─local-fs.target @1.502s
                              └─local-fs-pre.target @1.502s
                                └─keyboard-setup.service @1.296s +206ms
                                  └─systemd-journald.socket @1.296s
                                    └─system.slice @1.295s
                                      └─-.slice @1.292s
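
The critical chain above shows containerd.service becoming active in about 10ms, while the journal shows containerd logging nothing for another 20 seconds or so. That suggests unit ordering alone does not tell cloud-final when the runtime is actually reachable. One possible mitigation, sketched here purely as an illustration and not as an agreed fix, is to poll the containerd socket before kubeadm runs. The socket path is assumed to be containerd's default, and `wait_for_unix_socket` is a hypothetical helper name.

```python
import socket
import time

# Assumption: containerd's default socket path; adjust if it is configured differently.
CONTAINERD_SOCKET = "/run/containerd/containerd.sock"


def wait_for_unix_socket(path: str, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a connection to the Unix socket at `path` succeeds.

    Returns False if no connection succeeds before `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return True
        except OSError:
            # Socket file missing or nothing listening yet; retry until the deadline.
            pass
        finally:
            sock.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A pre-kubeadm step could call `wait_for_unix_socket(CONTAINERD_SOCKET)` and abort if it returns False. Note that a successful connect only shows that something is listening on the socket; it does not prove the CRI plugin has finished loading, so this narrows the window rather than closing it.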
