containerd: Required by cloud-config instead of cloud-final #114

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:master from randomvariable:containerd-systemd-fixup
Jan 9, 2020

@randomvariable
Member

@randomvariable randomvariable commented Jan 6, 2020

Fixes kubernetes-sigs/cluster-api#1714 (comment)
Critical chain becomes:

cloud-final.service +461ms
└─cloud-config.service @4.999s +638ms
  └─containerd.service @4.941s +9ms
    └─cloud-init.service @4.055s +875ms
      └─network.service @1.848s +2.201s
        └─network-pre.target @1.847s
          └─cloud-init-local.service @1.099s +748ms
            └─basic.target @1.042s
              └─sockets.target @1.042s
                └─dbus.socket @1.041s
                  └─sysinit.target @923ms
                    └─systemd-update-utmp.service @910ms +12ms
                      └─systemd-tmpfiles-setup.service @828ms +15ms
                        └─local-fs.target @821ms
                          └─local-fs-pre.target @820ms
                            └─lvm2-monitor.service @305ms +514ms
                              └─lvm2-lvmetad.service @399ms
                                └─lvm2-lvmetad.socket @274ms
                                  └─-.slice
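
As a rough way to check a chain like the one above without reading it by eye, the following Python sketch pulls the `@` activation times out of `systemd-analyze critical-chain` output and compares two units. The regular expression, the helper names and the abridged sample are assumptions made for illustration, and only the `s` and `ms` suffixes seen in this thread are handled. Comparing activation times is a sanity check on one boot, not proof of a dependency between the units.

```python
import re

# A unit name followed by "@<time>"; only the "s" and "ms" suffixes are handled.
UNIT_AT = re.compile(r"(\w[\w.@:-]*\.\w+)\s+@([\d.]+)(ms|s)\b")

def activation_times(text):
    """Map each unit that carries an @time to that time in seconds."""
    times = {}
    for line in text.splitlines():
        match = UNIT_AT.search(line)
        if match is None:
            continue  # e.g. the first line, which only has a +duration
        unit, value, suffix = match.groups()
        times[unit] = float(value) / 1000 if suffix == "ms" else float(value)
    return times

def activated_after(times, later, earlier):
    """True if `later` became active no earlier than `earlier`."""
    return times[later] >= times[earlier]

# Abridged copy of the chain in the PR description.
CHAIN = """\
cloud-final.service +461ms
└─cloud-config.service @4.999s +638ms
  └─containerd.service @4.941s +9ms
    └─cloud-init.service @4.055s +875ms
      └─sysinit.target @923ms
"""

times = activation_times(CHAIN)
print(activated_after(times, "containerd.service", "cloud-init.service"))
print(activated_after(times, "cloud-config.service", "containerd.service"))
```

Both checks should hold for the chain above, which matches the intent of the change: containerd comes up after cloud-init.service and before cloud-config.service.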

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 6, 2020
@randomvariable randomvariable changed the title from "containerd: Require cloud-config instead of cloud-final" to "containerd: Required by cloud-config instead of cloud-final" Jan 6, 2020
@randomvariable randomvariable force-pushed the containerd-systemd-fixup branch from cd497ce to c432ea0 Compare January 6, 2020 12:05
@figo
Contributor

figo commented Jan 6, 2020

Thank you for looking into it, @randomvariable.

I have two questions:

  1. cloud-init.service is not listed as one of the boot stages in the docs (https://cloudinit.readthedocs.io/en/latest/topics/boot.html).
    A local test shows that it can run quite early:
multi-user.target @47.324s
`-vmtoolsd.service @47.324s
  `-cloud-final.service @4.645s +42.677s
    `-containerd.service @4.642s +1ms
      `-cloud-config.service @4.224s +417ms
        `-network-online.target @4.224s
          `-cloud-init.service @3.456s +765ms
            `-systemd-networkd-wait-online.service @1.767s +1.688s
              `-systemd-networkd.service @1.756s +10ms
                `-network-pre.target @1.756s
                  `-cloud-init-local.service @420ms +1.335s
                    `-basic.target @412ms

I would like to understand more about cloud-init.service.

  2. According to the same boot-stage doc, runcmd runs in the cloud-final stage:
    "user-scripts (including runcmd) run at Final stage"

I would like to understand more about those two questions.
cc @akutz @detiber

@akutz
Contributor

akutz commented Jan 6, 2020

Hi @figo,

cloud-init.service is defined here https://cloudinit.readthedocs.io/en/latest/topics/boot.html#network

@randomvariable
Member Author

randomvariable commented Jan 6, 2020

Vendors have customised /etc/cloud/cloud.cfg: RH/CentOS, Ubuntu and AL2, at least, have all moved runcmd into the cloud-config modules. More importantly, we want containerd queued after cloud-init.service, because that is where write_files runs.

@randomvariable
Member Author

For completeness:

CentOS 7:

users:
 - default

disable_root: 1
ssh_pwauth:   0

mount_default_fields: [~, ~, 'auto', 'defaults,nofail,x-systemd.requires=cloud-init.service', '0', '2']
resize_rootfs_tmp: /dev
ssh_deletekeys:   0
ssh_genkeytypes:  ~
syslog_fix_perms: ~
disable_vmware_customization: false

cloud_init_modules:
 - disk_setup
 - migrator
 - bootcmd
 - write-files
 - growpart
 - resizefs
 - set_hostname
 - update_hostname
 - update_etc_hosts
 - rsyslog
 - users-groups
 - ssh

cloud_config_modules:
 - mounts
 - locale
 - set-passwords
 - rh_subscription
 - yum-add-repo
 - package-update-upgrade-install
 - timezone
 - puppet
 - chef
 - salt-minion
 - mcollective
 - disable-ec2-metadata
 - runcmd

cloud_final_modules:
 - rightscale_userdata
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - ssh-authkey-fingerprints
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change

system_info:
  default_user:
    name: centos
    lock_passwd: true
    gecos: Cloud User
    groups: [wheel, adm, systemd-journal]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  distro: rhel
  paths:
    cloud_dir: /var/lib/cloud
    templates_dir: /etc/cloud/templates
  ssh_svcname: sshd

# vim:syntax=yaml

CentOS 8:

users:
 - default

disable_root: 1
ssh_pwauth:   0

mount_default_fields: [~, ~, 'auto', 'defaults,nofail,x-systemd.requires=cloud-init.service', '0', '2']
resize_rootfs_tmp: /dev
ssh_deletekeys:   0
ssh_genkeytypes:  ~
syslog_fix_perms: ~
disable_vmware_customization: false

cloud_init_modules:
 - disk_setup
 - migrator
 - bootcmd
 - write-files
 - growpart
 - resizefs
 - set_hostname
 - update_hostname
 - update_etc_hosts
 - rsyslog
 - users-groups
 - ssh

cloud_config_modules:
 - mounts
 - locale
 - set-passwords
 - rh_subscription
 - yum-add-repo
 - package-update-upgrade-install
 - timezone
 - puppet
 - chef
 - salt-minion
 - mcollective
 - disable-ec2-metadata
 - runcmd

cloud_final_modules:
 - rightscale_userdata
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - ssh-authkey-fingerprints
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change

system_info:
  default_user:
    name: cloud-user
    lock_passwd: true
    gecos: Cloud User
    groups: [wheel, adm, systemd-journal]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  distro: rhel
  paths:
    cloud_dir: /var/lib/cloud
    templates_dir: /etc/cloud/templates
  ssh_svcname: sshd

# vim:syntax=yaml

Ubuntu 18.04:

# The top level settings are used as module
# and system configuration.

# A set of users which may be applied and/or used by various modules
# when a 'default' entry is found it will reference the 'default_user'
# from the distro configuration specified below
users:
   - default

# If this is set, 'root' will not be able to ssh in and they
# will get a message to login instead as the default $user
disable_root: true

# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false

# Example datasource config
# datasource:
#    Ec2:
#      metadata_urls: [ 'blah.com' ]
#      timeout: 5 # (defaults to 50 seconds)
#      max_wait: 10 # (defaults to 120 seconds)

# The modules that run in the 'init' stage
cloud_init_modules:
 - migrator
 - seed_random
 - bootcmd
 - write-files
 - growpart
 - resizefs
 - disk_setup
 - mounts
 - set_hostname
 - update_hostname
 - update_etc_hosts
 - ca-certs
 - rsyslog
 - users-groups
 - ssh

# The modules that run in the 'config' stage
cloud_config_modules:
# Emit the cloud config ready event
# this can be used by upstart jobs for 'start on cloud-config'.
 - emit_upstart
 - snap
 - snap_config  # DEPRECATED- Drop in version 18.2
 - ssh-import-id
 - locale
 - set-passwords
 - grub-dpkg
 - apt-pipelining
 - apt-configure
 - ubuntu-advantage
 - ntp
 - timezone
 - disable-ec2-metadata
 - runcmd
 - byobu

# The modules that run in the 'final' stage
cloud_final_modules:
 - snappy  # DEPRECATED- Drop in version 18.2
 - package-update-upgrade-install
 - fan
 - landscape
 - lxd
 - ubuntu-drivers
 - puppet
 - chef
 - mcollective
 - salt-minion
 - rightscale_userdata
 - scripts-vendor
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - ssh-authkey-fingerprints
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change

# System and/or distro specific settings
# (not accessible to handlers/transforms)
system_info:
   # This will affect which distro class gets used
   distro: ubuntu
   # Default user name + that default users groups (if added/used)
   default_user:
     name: ubuntu
     lock_passwd: True
     gecos: Ubuntu
     groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
     sudo: ["ALL=(ALL) NOPASSWD:ALL"]
     shell: /bin/bash
   # Automatically discover the best ntp_client
   ntp_client: auto
   # Other config here will be given to the distro class and/or path classes
   paths:
      cloud_dir: /var/lib/cloud/
      templates_dir: /etc/cloud/templates/
      upstart_dir: /etc/init/
   package_mirrors:
     - arches: [i386, amd64]
       failsafe:
         primary: http://archive.ubuntu.com/ubuntu
         security: http://security.ubuntu.com/ubuntu
       search:
         primary:
           - http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/
           - http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/
           - http://%(region)s.clouds.archive.ubuntu.com/ubuntu/
         security: []
     - arches: [arm64, armel, armhf]
       failsafe:
         primary: http://ports.ubuntu.com/ubuntu-ports
         security: http://ports.ubuntu.com/ubuntu-ports
       search:
         primary:
           - http://%(ec2_region)s.ec2.ports.ubuntu.com/ubuntu-ports/
           - http://%(availability_zone)s.clouds.ports.ubuntu.com/ubuntu-ports/
           - http://%(region)s.clouds.ports.ubuntu.com/ubuntu-ports/
         security: []
     - arches: [default]
       failsafe:
         primary: http://ports.ubuntu.com/ubuntu-ports
         security: http://ports.ubuntu.com/ubuntu-ports
   ssh_svcname: ssh

Amazon Linux 2:

# WARNING: Modifications to this file may be overridden by files in
# /etc/cloud/cloud.cfg.d

users:
 - default

disable_root: true
ssh_pwauth:   false

mount_default_fields: [~, ~, 'auto', 'defaults,nofail', '0', '2']
resize_rootfs: noblock
resize_rootfs_tmp: /dev
ssh_deletekeys:   false
ssh_genkeytypes:  ~
syslog_fix_perms: ~

datasource_list: [ Ec2, None ]
repo_upgrade: security
repo_upgrade_exclude:
 - kernel
 - nvidia*
 - cuda*

# Might interfere with ec2-net-utils
network:
  config: disabled

cloud_init_modules:
 - migrator
 - bootcmd
 - write-files
 - write-metadata
 - growpart
 - resizefs
 - set-hostname
 - update-hostname
 - update-etc-hosts
 - rsyslog
 - users-groups
 - ssh
 - resolv-conf

cloud_config_modules:
 - disk_setup
 - mounts
 - locale
 - set-passwords
 - yum-configure
 - yum-add-repo
 - package-update-upgrade-install
 - timezone
 - disable-ec2-metadata
 - runcmd

cloud_final_modules:
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - ssh-authkey-fingerprints
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change

system_info:
  # This will affect which distro class gets used
  distro: amazon
  distro_short: amzn
  default_user:
    name: ec2-user
    lock_passwd: true
    gecos: EC2 Default User
    groups: [wheel, adm, systemd-journal]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  paths:
    cloud_dir: /var/lib/cloud
    templates_dir: /etc/cloud/templates
  ssh_svcname: sshd

mounts:
 - [ ephemeral0, /media/ephemeral0 ]
 - [ swap, none, swap, sw, "0", "0" ]
# vim:syntax=yaml
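
To compare the stage lists quoted above without scanning them by hand, a small Python sketch can look up which stage a given module belongs to. The dictionaries below are abridged, hand-copied excerpts of the CentOS 7 and Ubuntu 18.04 files (a real check would load the full /etc/cloud/cloud.cfg with a YAML parser), and `stage_of` and `normalise` are hypothetical helper names. Treating `write-files` and `write_files` as the same module mirrors how the two spellings are used interchangeably in this thread.

```python
STAGES = ("cloud_init_modules", "cloud_config_modules", "cloud_final_modules")

def normalise(name):
    """Treat hyphenated and underscored module names as the same module."""
    return name.replace("-", "_")

def stage_of(cfg, module):
    """Return the stage key whose list contains `module`, or None if absent."""
    wanted = normalise(module)
    for stage in STAGES:
        if wanted in (normalise(entry) for entry in cfg.get(stage, [])):
            return stage
    return None

# Abridged excerpts of the files quoted above (not the complete lists).
CENTOS_7 = {
    "cloud_init_modules": ["disk_setup", "migrator", "bootcmd", "write-files"],
    "cloud_config_modules": ["mounts", "locale", "runcmd"],
    "cloud_final_modules": ["scripts-per-once", "scripts-user"],
}
UBUNTU_1804 = {
    "cloud_init_modules": ["migrator", "seed_random", "bootcmd", "write-files"],
    "cloud_config_modules": ["emit_upstart", "snap", "runcmd", "byobu"],
    "cloud_final_modules": ["package-update-upgrade-install", "scripts-user"],
}

for distro, cfg in (("CentOS 7", CENTOS_7), ("Ubuntu 18.04", UBUNTU_1804)):
    print(distro, stage_of(cfg, "write_files"), stage_of(cfg, "runcmd"))
```

For both excerpts, write_files resolves to cloud_init_modules and runcmd to cloud_config_modules, which is the point made in the comment above.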

@figo
Contributor

figo commented Jan 6, 2020

/lgtm
/approve
/hold
to allow for more reviews, but feel free to cancel it.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jan 6, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: figo, randomvariable

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 6, 2020
@figo
Contributor

figo commented Jan 9, 2020

/hold cancel
I think we should check this in.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2020
@k8s-ci-robot k8s-ci-robot merged commit 68adeef into kubernetes-sigs:master Jan 9, 2020
@detiber
Contributor

detiber commented Jan 21, 2020

One thing to note is that the original targets were chosen so that containerd configuration could be specified in cloud-init before containerd started. With this change we can no longer expect that to work; instead, one would also have to issue a systemctl restart containerd for any configuration made during cloud-init to take effect.

I think this is fine, but I'm wondering if we should document this somewhere.

@randomvariable
Member Author

randomvariable commented Jan 24, 2020

@detiber I already checked the stages for that. write-files runs in the cloud-init stage, which comes before containerd starts, so we're safe.

See #114 (comment)
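
The reasoning in this exchange can be sketched as a tiny ordering model in Python. The unit order is copied from the critical chain in the PR description. The mapping from module lists to systemd units is an assumption based on the stock cloud-init units, and `applied_before` and `needs_restart` are hypothetical helper names, not part of any tool.

```python
# Unit order taken from the critical chain in the PR description, earliest first.
BOOT_ORDER = [
    "cloud-init-local.service",
    "cloud-init.service",
    "containerd.service",
    "cloud-config.service",
    "cloud-final.service",
]

# Assumed mapping from each module list to the unit that runs it.
STAGE_UNIT = {
    "cloud_init_modules": "cloud-init.service",
    "cloud_config_modules": "cloud-config.service",
    "cloud_final_modules": "cloud-final.service",
}

def applied_before(stage, unit="containerd.service"):
    """True if the modules in `stage` run before `unit` in BOOT_ORDER."""
    return BOOT_ORDER.index(STAGE_UNIT[stage]) < BOOT_ORDER.index(unit)

def needs_restart(stage):
    """True if config written in `stage` lands after containerd has started."""
    return not applied_before(stage)

for stage in STAGE_UNIT:
    print(stage, "restart needed:", needs_restart(stage))
```

Under these assumptions, files written by write-files (an init-stage module) are in place before containerd starts, while anything done from runcmd on the distros quoted above happens after containerd is up and would need a systemctl restart containerd to take effect.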


Development

Successfully merging this pull request may close these issues.

Race condition in cloud init