@nkinkade
Contributor

@nkinkade nkinkade commented Apr 27, 2023

This PR is primarily around deploying the new epoxy-extension-server to API machines, and removing the old token-server and bmc-store-password services from API machines.

Additionally, the physical machine images now embed the script setup_k8s.sh in the filesystem. The script is no longer a template; it is now a static script that uses the new V2 allocate_k8s_token extension, which returns all the data the machine needs to join the cluster.

Beyond that, this PR also contains a few bug fixes in the create-control-plane.sh script which I discovered while testing all of this.



nkinkade added 12 commits April 14, 2023 15:55
The token-server and bmc-store-password ePoxy extensions are now
replaced by a new ePoxy "extension server." Instead of individual
extension container images, they are now all combined into a single
binary and container image that listens on a single port.
See long comment in change set for details
Previously there were separate token-server and bmc-store-password
containers and systemd units. These have now been combined into a single
extension server that listens on a single port.
flannel was failing to start on sandbox nodes, leaving the node in a
NotReady state because networking was not ready. The pod description
had this event:

"Error: failed to create containerd container: get apparmor_parser
version: exec: "apparmor_parser": executable file not found in $PATH"

This appears to be related to some changes going on in containerd:

containerd/containerd#8087
The create-control-plane.service is supposed to run _after_
mount-data-api, but that ordering was broken because the name of the
service changed and I failed to update the "After" block with the new
name.
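
A stale "After" entry like this is easy to miss, so a small check can help. The sketch below is not part of this PR: the helper name `unit_orders_after` and the `.service` suffix on the unit name are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if a unit file lists the given
# dependency in one of its After= lines.
unit_orders_after() {
  local unit_file="$1" dep="$2"
  awk -F= -v dep="$dep" '
    # After= may appear more than once and holds space-separated unit names.
    $1 == "After" {
      n = split($2, units, " ")
      for (i = 1; i <= n; i++) if (units[i] == dep) found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "$unit_file"
}

# Example use (the path and unit name are assumptions):
#   unit_orders_after /etc/systemd/system/create-control-plane.service \
#     mount-data-api.service || echo "ordering is missing"
```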
If the query to the live cluster for its version fails, then don't
bother doing any version checking. The live cluster may not even exist,
and possibly needs the images from this build so that it can be created.
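
The fallback described above could be sketched roughly as follows. This is not the code from the PR: `check_cluster_version` and `LIVE_VERSION_CMD` are hypothetical names, and the equality comparison is only a placeholder for whatever version check the build actually performs.

```shell
#!/bin/bash
# Hypothetical sketch: skip version checking when the live cluster
# cannot be queried. LIVE_VERSION_CMD stands in for the real query
# command, which is not shown in this PR description.
check_cluster_version() {
  local build_version="$1"
  local live_version

  # A failed query may simply mean the cluster does not exist yet,
  # so do not block the build in that case.
  if ! live_version=$(${LIVE_VERSION_CMD:-false} 2>/dev/null); then
    echo "skip"
    return 0
  fi

  # Placeholder comparison (an assumption, not the real check).
  if [ "$live_version" = "$build_version" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}
```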
Adds an additional, redundant check for the existence of
/etc/kubernetes/admin.conf before initializing the cluster. A bug in our
config caused the service unit to run even though that file existed, and
kubeadm overwrote numerous things before finally erroring out. Can't
hurt to add the additional check in this file.

For nodes joining the cluster, wait for 90s (up from 60s) before trying
to join to give the primary control plane node time to finish setting
everything up. I discovered that 60s was not quite enough, and nodes
joining the control plane might get a connection refused from the
primary API endpoint.
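
A minimal sketch of the two guards described above, assuming they are written as small shell functions. The function and variable names are hypothetical and do not come from create-control-plane.sh; only the admin.conf path and the 90s delay are taken from the commit message.

```shell
#!/bin/bash
# Hypothetical sketch of the guards; names are illustrative only.
ADMIN_CONF="${ADMIN_CONF:-/etc/kubernetes/admin.conf}"
JOIN_DELAY="${JOIN_DELAY:-90}"   # seconds, up from 60

# Succeeds only when the cluster has not been initialized yet.
should_init_cluster() {
  [ ! -f "$ADMIN_CONF" ]
}

# Gives the primary control plane node time to finish setting up.
wait_before_join() {
  sleep "$JOIN_DELAY"
}

# Example use (the kubeadm arguments are omitted on purpose):
#   if should_init_cluster; then kubeadm init ...; fi
#   wait_before_join && kubeadm join ...
```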
On control plane machines, /etc/kubernetes is supposed to be a symlink
to /mnt/cluster-data/kubernetes. When /etc/kubernetes already exists as
a regular dir, ln creates the symlink inside /etc/kubernetes instead,
which breaks the configuration and the create-control-plane service.
Anyway, on control plane nodes that directory will be created
automatically by kubeadm.
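
The ln behavior described above can be reproduced safely in a scratch directory. The function name `demo_symlink_into_dir` is hypothetical, and the paths under the temp dir only stand in for the real ones.

```shell
#!/bin/bash
# Reproduces the nested-symlink behavior in a temp dir.
demo_symlink_into_dir() {
  local tmp
  tmp=$(mktemp -d)
  mkdir -p "$tmp/mnt/cluster-data/kubernetes"
  mkdir -p "$tmp/etc/kubernetes"   # destination already exists as a regular dir

  # When the destination is an existing directory, ln -s places the
  # link inside it, named after the target's basename.
  ln -s "$tmp/mnt/cluster-data/kubernetes" "$tmp/etc/kubernetes"

  if [ -L "$tmp/etc/kubernetes/kubernetes" ] && [ ! -L "$tmp/etc/kubernetes" ]; then
    echo "nested"
  fi
  rm -rf "$tmp"
}
```

Calling `demo_symlink_into_dir` prints `nested`: /etc/kubernetes stays a regular directory and gains a `kubernetes` symlink inside it.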
ePoxy extension allocate_k8s_token V2 returns all the data needed to
join the cluster. This commit removes all templating from setup_k8s.sh
and moves it into the physical image filesystem. It is now a static
script which can fetch everything it needs from allocate_k8s_token V2.
@nkinkade nkinkade requested a review from robertodauria April 28, 2023 17:14
Previously, the script assumed that all VMs were going to be part of a
MIG. We have decided to have a hybrid approach with both MIGs and
standard VMs, which required a few changes.

Additionally, configure the script to use the V2 allocate_k8s_token
ePoxy extension, which returns all the data needed to join the cluster,
not just the token. This also required some refactoring of the code.
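
As a rough illustration of the request side, the sketch below wraps the request body that join-cluster.sh builds (the same JSON shape as the line quoted in the review further down) in a function. The name `build_extension_request` and the `EXTENSION_URL` variable are hypothetical, and the response format of the V2 extension is not shown here, so no parsing is attempted.

```shell
#!/bin/bash
# Hypothetical wrapper around the request body from join-cluster.sh.
# Relies on GNU date for --utc and %N.
build_extension_request() {
  local hostname="$1"
  local last_boot
  last_boot=$(date --utc +%Y-%m-%dT%T.%NZ)
  printf '{"v1":{"hostname":"%s","last_boot":"%s"}}' "$hostname" "$last_boot"
}

# Example use (EXTENSION_URL is an assumption, not a real endpoint):
#   curl --silent --fail --data "$(build_extension_request "$(hostname)")" \
#     "$EXTENSION_URL"
```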
Contributor

@robertodauria robertodauria left a comment

:lgtm:

Reviewed 2 of 13 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @nkinkade)


configs/virtual_ubuntu/opt/mlab/bin/join-cluster.sh line 71 at r1 (raw file):

extension_v1="{\"v1\":{\"hostname\":\"${hostname}\",\"last_boot\":\"$(date --utc +%Y-%m-%dT%T.%NZ)\"}}"

# Fetch cluser bootstrap join data from the epoxy-extension-server.

Typo: cluster

@nkinkade nkinkade merged commit ca6a0b1 into main May 9, 2023
@nkinkade nkinkade deleted the sandbox-kinkade branch May 9, 2023 16:35