
Linux Foundations – Kubernetes Fundamentals

Content before the Course


1. Linux Distributions

 Standard?
 Learn More?
 Certification?

Debian: stability. An open source project with the most complete software repository. Ubuntu is Debian-based. It is used for servers and desktops, and uses apt-get and front-ends for installs.

 https://wiki.debian.org/systemd/CheatSheet

Red Hat / Fedora Family: standard. Red Hat Enterprise Linux, CentOS, Scientific Linux and Oracle Linux. RHEL is the most popular for enterprise use, and CentOS 7 is the most commonly deployed. RPM-based, with yum or dnf for installs.

 https://fedoraproject.org/wiki/SysVinit_to_Systemd_Cheatsheet

openSUSE Family: servers. RPM-based. Uses zypper for installs.

 https://en.opensuse.org/openSUSE:Cheat_sheet_13.1#Services

Common:
2. How to set up a GCE Lab Environment

3. How to set up an AWS Lab Environment

4. How to use PuTTY

01. COURSE INTRODUCTION


Course formatting:

The Linux Foundation


The Linux Foundation provides a neutral, trusted hub for developers
to code, manage, and scale open technology projects. Founded in
2000, The Linux Foundation is supported by more than 1,000
members and is the world’s leading home for collaboration on open
source software, open standards, open data and open hardware. The
Linux Foundation’s methodology focuses on leveraging best practices
and addressing the needs of contributors, users and solution
providers to create sustainable models for open collaboration.

The Linux Foundation hosts Linux, the world's largest and most
pervasive open source software project in history. It is also home to
Linux creator Linus Torvalds and lead maintainer Greg Kroah-
Hartman. The success of Linux has catalyzed growth in the open
source community, demonstrating the commercial efficacy of open
source and inspiring countless new projects across all industries and
levels of the technology stack.

As a result, the Linux Foundation today hosts far more than Linux; it is
the umbrella for many critical open source projects that power
corporations today, spanning virtually all industry sectors. Some of
the technologies we focus on include big data and analytics,
networking, embedded systems and IoT, web tools, cloud computing,
edge computing, automotive, security, blockchain, and many more.

Cloud Native Computing Foundation (CNCF)
Cloud Native Computing Foundation (CNCF) is an open source
software foundation under the Linux Foundation umbrella dedicated
to making cloud native computing universal and sustainable. Cloud
native computing uses an open source software stack to deploy
applications as microservices, packaging each part into its own
container, and dynamically orchestrating those containers to optimize
resource utilization. Cloud native technologies enable software
developers to build great products faster.

CNCF serves as a vendor-neutral home for many of the fastest-growing projects on GitHub, including Kubernetes, Prometheus, and Envoy, fostering collaboration between the industry's top developers, end users and vendors.

The Linux Foundation Events


Over 85,000 open source technologists and leaders worldwide gather at Linux Foundation events annually to share ideas, learn and collaborate. Linux Foundation events are the meeting place of choice for open source maintainers, developers, architects, infrastructure managers, sysadmins, and technologists leading open source program offices and other critical leadership functions.

These events are the best place to gain visibility within the open
source community quickly and advance open source development
work by forming connections with the people evaluating and creating
the next generation of technology. They provide a forum to share and
gain knowledge, help organizations identify software trends early to
inform future technology investments, connect employers with talent,
and showcase technologies and services to influential open source
professionals, media, and analysts around the globe.

The Linux Foundation hosts an increasing number of events each year, including:

 Open Source Summit North America, Europe, and Japan
 Embedded Linux Conference North America and Europe
 Open Networking & Edge Summit
 KubeCon + CloudNativeCon North America, Europe, and China
 Automotive Linux Summit
 KVM Forum
 Linux Storage Filesystem and Memory Management Summit
 Linux Security Summit North America and Europe
 Linux Kernel Maintainer Summit
 The Linux Foundation Member Summit
 Open Compliance Summit
 And many more.

You can learn more about the Linux Foundation events online.

Training Venues
The Linux Foundation's training is for the community, by the
community, and features instructors and content straight from the
leaders of the Linux developer community.

The Linux Foundation offers several types of training:

 Classroom
 Online
 On-site
 Events-based.

Attendees receive Linux and open source software training that is distribution-flexible, technically advanced and created with the actual
leaders of the Linux and open source software development
community themselves. The Linux Foundation courses give attendees
the broad, foundational knowledge and networking needed to thrive
in their careers today. With either online or in-person training, The
Linux Foundation classes can keep you or your developers ahead of
the curve on open source essentials.

The Linux Foundation Training Offerings
Our current course offerings include:

 Linux Programming & Development Training
 Enterprise IT & Open Source System Administration Courses
 Open Source Compliance Courses.
To get more information about specific courses offered by the Linux
Foundation, including technical requirements and other logistics, visit
the Linux Foundation training website.

The Linux Foundation Certifications
The Linux Foundation certifications give you a way to differentiate
yourself in a job market that's hungry for your skills. We've taken a
new, innovative approach to open source certification that allows you
to showcase your skills in a way that other peers will respect and
employers will trust:

 You can take your certification exam from any computer, anywhere,
at any time.
 The certification exams are either performance-based or multiple
choice.
 The exams are distribution-flexible.
 The exams are up-to-date, testing knowledge and skills that actually
matter in today's IT environment.

Training/Certification Firewall
The Linux Foundation has two separate training divisions: Course
Delivery and Certification. These two divisions are separated by
a firewall.

The curriculum development and maintenance division of the Linux Foundation Training department has no direct role in developing, administering, or grading certification exams.
Enforcing this self-imposed firewall ensures that independent
organizations and companies can develop third party training
material, geared towards helping test takers pass their certification
exams.

Furthermore, it ensures that there are no secret "tips" (or secrets in general) that one needs to be familiar with in order to succeed.

It also permits the Linux Foundation to develop a very robust set of courses that do far more than teach the test, but rather equip attendees with a broad knowledge of the many areas they may be required to master to have a successful career in open source system administration.

02. BASICS OF KUBERNETES


INTRODUCTION
Chapter Overview
In this session we are going to cover the basics of Kubernetes and understand how Kubernetes fits into a production environment. The word Kubernetes comes from the Greek κυβερνήτης, meaning "helmsman" or "pilot". The name fits because Kubernetes is all about orchestration: the automated deployment and cleanup of resources. Kubernetes builds on an internal Google project called Borg, used to deploy resources around the world, which ran for some 15 years before its lessons were given to the open source community.

Kubernetes is all about decoupled and transient services. Decoupling means that everything has been designed not to require anything else in particular. Transient means that the whole system expects various components to be terminated and replaced. A framework which allows new components to connect to existing ones, and replacements to be spawned, is what ensures scalability and flexibility.

Learning Objectives
By the end of this chapter, you should be able to:
 Discuss Kubernetes.
 Learn the basic Kubernetes terminology.
 Discuss the configuration tools.
 Learn what community resources are available.

BASICS OF KUBERNETES

What Is Kubernetes?
Running a container on a laptop is relatively simple. But connecting containers across multiple hosts, scaling them, deploying applications without downtime, and service discovery, among other aspects, can be difficult.

Kubernetes addresses those challenges from the start with a set of primitives and a powerful open and extensible API. The ability to add new objects and controllers allows easy customization for various production needs.

According to the kubernetes.io website, Kubernetes is:

"an open-source software for automating deployment, scaling, and


management of containerized applications".

A key aspect of Kubernetes is that it builds on 15 years of experience at Google in a project called Borg.

Google's infrastructure started reaching high scale before virtual machines became pervasive in the datacenter, and containers provided a fine-grained solution for packing clusters efficiently. Efficiency in using clusters and managing distributed applications has been at the core of Google's challenges.

Components of Kubernetes
Deploying containers and using Kubernetes may require a change in
the development and the system administration approach to
deploying applications. In a traditional environment, an application
(such as a web server) would be a monolithic application placed on a
dedicated server. As the web traffic increases, the application would
be tuned, and perhaps moved to bigger and bigger hardware. After a
couple of years, a lot of customization may have been done in order
to meet the current web traffic needs.

Instead of using a large server, Kubernetes approaches the same issue by deploying a large number of small servers, or microservices. The server and client sides of the application are written to expect that there are many possible agents available to respond to a request. It is also important that clients expect the server processes to die and be replaced, leading to a transient server deployment. Instead of a large Apache web server with many httpd daemons responding to page requests, there would be many nginx servers, each responding.

The transient nature of smaller services also allows for decoupling. Each aspect of the traditional application is replaced with a dedicated, but transient, microservice or agent. To join these agents, or their replacements, together, we use services. A service ties traffic from one agent to another (for example, a frontend web server to a backend database) and handles new IP or other information, should either one die and be replaced.

Communication is entirely API call-driven, which allows for flexibility. Cluster configuration information is stored in a JSON format inside of etcd, but is most often written in YAML by the community. Kubernetes agents convert the YAML to JSON prior to persistence to the database.
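As a quick illustration of the two representations, kubectl can render the same object either way (the pod name here is a placeholder):

$ kubectl get pod <pod-name> -o yaml   # the form the community usually writes
$ kubectl get pod <pod-name> -o json   # closer to what is persisted in etcd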

Challenges
Containers provide a great way to package, ship, and run applications
- that is the Docker motto.

The developer experience has been boosted tremendously thanks to containers. Containers, and Docker specifically, have empowered developers with ease of building container images, simplicity of sharing images via registries, and a powerful user experience for managing containers.
However, managing containers at scale and designing a distributed
application based on microservices' principles may be challenging.

A smart first step is deciding on a continuous integration/continuous delivery (CI/CD) pipeline to build, test and verify container images. Tools such as Spinnaker, Jenkins and Helm can be helpful, among other possible tools. This will help with the challenges of a dynamic environment.

Then, you need a cluster of machines acting as your base infrastructure on which to run your containers. You also need a system to launch your containers, watch over them when things fail, and replace them as required. Rolling updates and easy rollbacks of containers are important features, as is eventually tearing down resources when they are no longer needed.

All of these actions require flexible, scalable, and easy-to-use network and storage. As containers are launched on any worker node, the network must join the resource to other containers, while still keeping the traffic secure from others. We also need a storage structure which provides and keeps or recycles storage in a seamless manner.

Once Kubernetes answers these concerns, one of the biggest challenges to adoption is the applications themselves, running inside the container. They need to be written, or re-written, to be truly transient. A good question to ponder: if you were to deploy Chaos Monkey, which could terminate any containers at any time, would your customers notice?

Other Solutions
Built on open source and easily extensible, Kubernetes is definitely a
solution to manage containerized applications. There are other
solutions as well.

Docker Swarm

Docker Swarm is the solution provided by Docker Inc. It has been re-architected recently and is based on SwarmKit. It is embedded with the Docker Engine.

Apache Mesos

Apache Mesos is a data center scheduler, which can run containers through the use of frameworks. Marathon is the framework that lets you orchestrate containers.

Nomad

Nomad, from HashiCorp, the makers of Vagrant and Consul, is another solution for managing containerized applications. Nomad schedules tasks defined in Jobs. It has a Docker driver which lets you define a running container as a task.

Rancher

Rancher is a container orchestrator-agnostic system, which provides a single pane of glass interface to manage applications. It supports Mesos, Swarm, and Kubernetes.

Borg Heritage
What primarily distinguishes Kubernetes from other systems is its
heritage. Kubernetes is inspired by Borg - the internal system used by
Google to manage its applications (e.g. Gmail, Apps, GCE).

Google poured the valuable lessons it learned from writing and operating Borg for over 15 years into Kubernetes, which makes Kubernetes a safe choice when deciding on a system to manage containers. While already a powerful tool, much of the current growth in Kubernetes is focused on making it easier to work with and on handling workloads not found in a Google data center.

To learn more about the ideas behind Kubernetes, you can read
the Large-scale cluster management at Google with Borg paper.
The Kubernetes Lineage (Chip Childers, Cloud Foundry Foundation; retrieved from The Platform for Forking Cloud Native Applications presentation)

Borg has inspired current data center systems, as well as the underlying technologies used in container runtimes today. Google contributed cgroups to the Linux kernel in 2007; it limits the resources used by a collection of processes. Both cgroups and Linux namespaces are at the heart of containers today, including Docker.

Mesos was inspired by discussions with Google when Borg was still a
secret. Indeed, Mesos builds a multi-level scheduler, which aims to
better use a data center cluster.

The Cloud Foundry Foundation embraces the 12-factor application principles. These principles provide great guidance to build web applications that can scale easily, can be deployed in the cloud, and whose build is automated. Borg and Kubernetes address these principles as well.

Kubernetes Architecture
To quickly demystify Kubernetes, let's have a look at the Kubernetes
Architecture graphic, which shows a high-level architecture diagram
of the system components. Not all components are shown. Every
node running a container would have kubelet and kube-proxy, for
example.

Kubernetes Architecture

In its simplest form, Kubernetes is made of control plane nodes (aka cp nodes) and worker nodes, once called minions. We will see in a follow-on chapter how you can actually run everything on a single node for testing purposes. The cp runs an API server, a scheduler, various controllers and a storage system to keep the state of the cluster, container settings, and the networking configuration.

Kubernetes exposes an API via the API server. You can communicate with the API using a local client called kubectl, or you can write your own client and use curl commands. The kube-scheduler is forwarded the pod spec for containers coming to the API and finds a suitable node to run those containers. Each node in the cluster runs two processes: a kubelet, which is often a systemd process, not a container, and kube-proxy. The kubelet receives requests to run the containers, manages any necessary resources and works with the container engine to manage them on the local node. The local container engine could be Docker, cri-o, containerd, or some other.
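A short sketch of both access paths; kubectl proxy opens an authenticated local proxy so plain curl works against the API:

$ kubectl get nodes                                          # typical kubectl access
$ kubectl proxy &                                            # listens on 127.0.0.1:8001 by default
$ curl http://127.0.0.1:8001/api/v1/namespaces/default/pods   # raw REST call via the proxy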

The kube-proxy creates and manages networking rules to expose the container on the network to other containers or the outside world.

Using an API-based communication scheme allows for non-Linux worker nodes and containers. Support for Windows Server 2019 graduated to Stable with the 1.14 release. Only Linux nodes can be cp nodes of the cluster at this time.

Terminology
We have learned that Kubernetes is an orchestration system to
deploy and manage containers. Containers are not managed
individually; instead, they are part of a larger object called a Pod. A
Pod consists of one or more containers which share an IP address,
access to storage and namespace. Typically, one container in a Pod
runs an application, while other containers support the primary
application.
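A minimal sketch of such a Pod, with a primary application container and a supporting sidecar (all names and images are illustrative):

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: web-with-helper
spec:
  containers:
  - name: web              # primary application container
    image: nginx
  - name: log-helper       # supporting container; shares the Pod's IP and volumes
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
EOF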

Kubernetes uses namespaces to keep objects distinct from each other, for resource control and multi-tenant considerations. Some objects are cluster-scoped, others are scoped to one namespace at a time. As the namespace is a segregation of resources, pods would need to leverage services to communicate.
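For example, creating a namespace and scoping requests to it takes only a flag (the dev name is illustrative):

$ kubectl create namespace dev
$ kubectl get pods -n dev        # only objects in the dev namespace are returned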

Orchestration is managed through a series of watch-loops, also called controllers or operators. Each controller interrogates the kube-apiserver for a particular object state, then modifies the object until the declared state matches the current state. These controllers are compiled into the kube-controller-manager, but others can be added using custom resource definitions. The default and feature-filled operator for containers is a Deployment. A Deployment does not directly work with pods. Instead it manages ReplicaSets. The ReplicaSet is an operator which will create or terminate pods according to a podSpec. The podSpec is sent to the kubelet, which then interacts with the container engine to download and make available the required resources, then spawn or terminate containers until the status matches the spec.
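A minimal sketch of that chain, assuming a reasonably recent kubectl (the --replicas flag for create deployment arrived around v1.19):

$ kubectl create deployment web --image=nginx --replicas=3
$ kubectl get deployments,replicasets,pods   # one Deployment, one ReplicaSet, three Pods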

The service operator requests existing IP addresses and information from the endpoint operator, and will manage the network connectivity based on labels. A service is used to communicate between pods, namespaces, and outside the cluster. There are also Jobs and CronJobs to handle single or recurring tasks, among other default operators.
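Continuing the sketch above, exposing the Deployment creates a service that follows its pods by label:

$ kubectl expose deployment web --port=80
$ kubectl get service web      # the persistent ClusterIP
$ kubectl get endpoints web    # the ephemeral IPs of the backing pods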

Managing thousands of Pods across hundreds of nodes by name could be difficult. To make management easier, we can use labels, arbitrary strings which become part of the object metadata. These can then be used when checking or changing the state of objects without having to know individual names or UIDs. Nodes can have taints to discourage Pod assignments, unless the Pod has a toleration in its metadata.
There is also space in metadata for annotations, which remain with the object but are not used as selectors. This information could be used by the containers, by third-party agents or other tools.
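A hedged sketch of each mechanism (the app=web label, worker1 node, and pod name are placeholders):

$ kubectl get pods -l app=web                               # select by label, not by name
$ kubectl taint nodes worker1 maintenance=true:NoSchedule   # discourage new Pod assignments
$ kubectl annotate pod <pod-name> build=v3.2                # metadata, not usable as a selector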

Innovation
Since its inception, Kubernetes has seen a terrific pace of innovation
and adoption. The community of developers, users, testers, and
advocates is continuously growing every day. The software is also
moving at an extremely fast pace, which is even putting GitHub to the
test:

 Given to open source in June 2014
 Thousands of contributors
 More than 100k commits
 Tens of thousands on Slack
 Currently, on a four month major release cycle
 Minor releases every ten days or so
 Constant changes.

Understanding and expecting constant change is often a challenge for those used to single-vendor, monolithic applications. While there are major releases every four months, you can expect minor releases every ten days. Constant testing of new releases and changes is essential to properly maintain a Kubernetes cluster.

User Community
Kubernetes is being adopted at a very rapid pace. To learn more, you
should check out the case studies presented on the Kubernetes
website. Ebay, Box, Pearson and Wikimedia have all shared their
stories.

Pokémon GO, the fastest growing mobile game, also runs on Google Kubernetes Engine (GKE), the Kubernetes service from Google Cloud Platform (GCP).

Tools
There are several tools you can use to work with Kubernetes. As the
project has grown, new tools are made available, while old ones are
being deprecated. Minikube is a very simple tool meant to run inside
of VirtualBox. If you have limited resources and do not want much
hassle, it is the easiest way to get up and running. We mention it for
those who are not interested in a typical production environment, but
want to use the tool.
Our labs will focus on the use of kubeadm and kubectl, which are
very powerful and complex tools.

In a later chapter, we will work with helm, an easy tool for using
Kubernetes, to search for and install software using charts. There are
also helpful commands like Kompose to translate Docker Compose
files into Kubernetes objects, should that be desired.
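As a sketch, Kompose is typically driven with a single subcommand (the compose file name is illustrative):

$ kompose convert -f docker-compose.yaml   # writes out equivalent Kubernetes manifests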

Expect these tools to change often!

Cloud Native Computing Foundation (CNCF)
Kubernetes is open source software with an Apache license. Google donated Kubernetes to a newly formed collaborative project within The Linux Foundation in July 2015, when Kubernetes reached the v1.0 release. This project is known as the Cloud Native Computing Foundation (CNCF).

CNCF is not just about Kubernetes; it serves as the governing body for open source software that solves specific issues faced by cloud native applications (i.e. applications that are written specifically for a cloud environment).

CNCF has many corporate members that collaborate, such as Cisco, the Cloud Foundry Foundation, AT&T, Box, Goldman Sachs, and many others.

Resource Recommendations
If you want to go beyond this general introduction to Kubernetes, here
are a few things we recommend:

 Read the Borg paper.
 Listen to John Wilkes talking about Borg and Kubernetes.
 Add the Kubernetes community hangout to your calendar, and attend
at least once.
 Join the community on Slack and go in the #kubernetes-
users channel.
 Check out the very active Stack Overflow community.
Exercise: Lab 2.1. View Online Resources

Knowledge Check

Question 2.1
Which of the following are part of a Pod?

 A. One or more containers

 B. Shared IP address

 C. One namespace

 D. All of the above

Question 2.2
Which company developed Borg as an internal project?

 A. Amazon

 B. Google

 C. IBM

 D. Toyota

Question 2.3
In what database are the objects and the state of the cluster stored?

 A. ZooKeeper

 B. MySQL

 C. etcd
 D. Couchbase

Question 2.4
Orchestration is managed through a series of watch-loops or
controllers. Each interrogates the ___________________ for a particular
object state.

 A. kube-apiserver

 B. etcd

 C. kubelet

 D. ntpd

Correct answers: d,b,c,a

03. INSTALLATION AND CONFIGURATION

INTRODUCTION
Chapter Overview
In this session we are going to talk about installation and basic configuration of our Kubernetes cluster. The session focuses on the kubeadm command, a vendor-neutral tool which has become popular in the Kubernetes environment.

There are other tools that you can use. For example, you can use various gcloud commands as part of the Google Kubernetes Engine environment. The minikube command allows you to configure a simple Kubernetes cluster in a single VM.

We are also going to talk about the kubectl command. This is the primary command that you will use to interact with the cluster. It gets tokens, contexts and other information from the .kube/config file, found in your home directory.

Part of the configuration that one does with kubeadm is to determine which pod network to use. There are several to choose from. Flannel is probably the easiest to use, but it does not support network security policies. Others do, such as the Calico project and WeaveNet. The network needs to be decided prior to initializing components and objects inside of your Kubernetes cluster. While you can change it after the fact, it takes an awful lot of effort.

Let's get started!

Learning Objectives
By the end of this chapter, you should be able to:

 Download installation and configuration tools.
 Install a Kubernetes master and grow a cluster.
 Configure a network solution for secure communications.
 Discuss highly-available deployment considerations.

INSTALLATION AND CONFIGURATION


Installation Tools
This chapter is about Kubernetes installation and configuration. We
are going to review a few installation mechanisms that you can use to
create your own Kubernetes cluster.

To get started without having to dive right away into installing and
configuring a cluster, there are a few choices.

One way is to use Google Kubernetes Engine (GKE), a cloud service from the Google Cloud Platform that lets you request a Kubernetes cluster with the latest stable version. Amazon has a service called Elastic Kubernetes Service (EKS) which allows more control of the cp nodes.

Another easy way to get started is to use Minikube. It is a single binary which deploys into the Oracle VirtualBox software, which can run on several operating systems. While Minikube is local and single node, it will give you a learning, testing, and development platform. MicroK8s is a newer tool developed by Canonical and aimed at easy, appliance-like installations. It currently runs on Ubuntu 16.04 and later.

To be able to use the Kubernetes cluster, you will need to have installed the Kubernetes command line, called kubectl, or a wrapper command such as gcloud. This runs locally on your machine and targets the API server endpoint. It allows you to create, manage, and delete all Kubernetes resources (e.g. Pods, Deployments, Services). It is a powerful CLI that we will use throughout the rest of this course, so you should become familiar with it.

In this course, we will use kubeadm, the community-suggested tool from the Kubernetes project, that makes installing Kubernetes easy and avoids vendor-specific installers. Getting a cluster running involves two commands: kubeadm init, that you run on a cp node, and then kubeadm join, that you run on your worker or redundant cp nodes, and your cluster bootstraps itself. The flexibility of these tools allows Kubernetes to be deployed in a number of places. Lab exercises use this method.

Other installation mechanisms, such as kubespray or kops, can be used to create a Kubernetes cluster on AWS nodes. As Kubernetes is so popular, there are several other tools you may use to configure a cluster. Some may become popular, others may become defunct within months.

Installing kubectl
To configure and manage your cluster, you will probably use
the kubectl command. You can use RESTful calls or the Go language,
as well.

Enterprise Linux distributions have the various Kubernetes utilities and other files available in their repositories. For example, on RHEL/CentOS, you would find kubectl in the kubernetes-client package. On OpenShift, they use a command very similar to kubectl, called oc.

You can (if needed) download the code from GitHub, and go through
the usual steps to compile and install kubectl.

This command line will use $HOME/.kube/config as a configuration file. This contains all the Kubernetes endpoints that you might use. If you examine it, you will see cluster definitions (i.e. IP endpoints), credentials, and contexts.
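To inspect what kubectl has loaded from that file, these commands are handy:

$ kubectl config view           # clusters, users, and contexts, with credentials redacted
$ kubectl config get-contexts   # available contexts and which one is current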

A context is a combination of a cluster and user credentials. You can pass these parameters on the command line, or switch the shell between contexts with a command, as in:

$ kubectl config use-context foobar

This is handy when going from a local environment to a cluster in the cloud, or from one cluster to another, such as from development to production.

Using Google Kubernetes Engine (GKE)
Google takes every Kubernetes release through rigorous testing and
makes it available via its GKE service. To be able to use GKE, you will
need the following:

 An account on Google Cloud.
 A method of payment for services you will use.
 The gcloud command line client.

There is extensive documentation on getting it installed. Pick your favorite installation method and set it up. For more details, you can visit the Installing Cloud SDK web page.

You will then be able to follow the GKE quickstart guide and you will
be ready to create your first Kubernetes cluster:

$ gcloud container clusters create linuxfoundation

$ gcloud container clusters list

$ kubectl get nodes

By installing gcloud, you will have automatically installed kubectl. In the commands above, we created the cluster, listed it, and then listed the nodes of the cluster with kubectl.

Once you are done, do not forget to delete your cluster; otherwise, you will keep on getting charged for it. See the following command:

$ gcloud container clusters delete linuxfoundation


Using Minikube
You can also use Minikube, an open source project within the
GitHub Kubernetes organization. While you can download a release
from GitHub, following listed directions, it may be easier to download
a pre-compiled binary. Make sure to verify and get the latest version.

For example, to fetch the latest macOS (darwin) build, run the following commands:

$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64

$ chmod +x minikube

$ sudo mv minikube /usr/local/bin

With Minikube now installed, starting Kubernetes on your local machine is very easy. Use these commands:

$ minikube start

$ kubectl get nodes

This will start a VirtualBox virtual machine that will contain a single
node Kubernetes deployment and the Docker engine. Internally,
minikube runs a single Go binary called localkube. This binary runs
all the components of Kubernetes together. This makes Minikube
simpler than a full Kubernetes deployment. In addition, the Minikube
VM also runs Docker, in order to be able to run containers.

Installing with kubeadm

Once you become familiar with Kubernetes using Minikube, you may want to start building a real cluster. Currently, the most straightforward method is to use kubeadm, which appeared in Kubernetes v1.4.0 and can be used to bootstrap a cluster quickly. As the community has focused on kubeadm, it has moved from beta to stable and added high availability with v1.15.0.

The Kubernetes website provides documentation on how to use kubeadm to create a cluster.
Package repositories are available for current versions of Ubuntu and
CentOS, among others. We will work with Ubuntu in our lab exercises.

To join other nodes to the cluster, you will need at least one token and an SHA256 hash. This information is returned by the command kubeadm init. Once the cp has initialized, you would apply a network plugin. The main steps, with a command sketch after the list:

 Run kubeadm init on the control plane node.
 Create a network for IP-per-Pod criteria.
 Run kubeadm join on workers or secondary cp nodes.
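A minimal sketch of those steps (the CIDR, IP address, token, and hash are placeholders; kubeadm init prints the exact join command for you):

$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16    # on the cp node
$ kubectl apply -f <network-plugin-manifest>             # e.g. Calico or Weave, see below
$ sudo kubeadm join 10.128.0.3:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>         # on each worker or secondary cp node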

You can also create the network with kubectl by using a resource
manifest of the network plugin to be used. Each plugin may have a
different method of installation.

For example, to use the Weave network, you would run the following
command:

$ kubectl create -f https://git.io/weave-kube

Once all the steps are completed, workers and other cp nodes joined,
you will have a functional multi-node Kubernetes cluster and you will
be able to use kubectl to interact with it.

kubeadm-upgrade
If you build your cluster with kubeadm, you also have the option to
upgrade the cluster using the kubeadm upgrade command. While
most choose to remain with a version for as long as possible, and will
often skip several releases, this does offer a useful path to regular
upgrades for security reasons.

 plan
This will check the installed version against the newest found in the
repository, and verify if the cluster can be upgraded.
 apply
Upgrades the first control plane node of the cluster to the specified
version.
 diff
Similar to an apply --dry-run, this command will show the
differences applied during an upgrade.
 node
This allows for updating the local kubelet configuration on worker
nodes, or the control planes of other cp nodes if there is more than
one. Also, it will access a phase command to step through the
upgrade process.

General upgrade process:


 Update the software
 Check the software version
 Drain the control plane
 View the planned upgrade
 Apply the upgrade
 Uncordon the control plane to allow pods to be scheduled.
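A hedged sketch of that flow on a control plane node (the version, node name, and Debian/Ubuntu package commands are illustrative):

$ sudo apt-get update && sudo apt-get install -y kubeadm=1.21.1-00   # update the software
$ kubeadm version                                                    # check the software version
$ kubectl drain cp-node --ignore-daemonsets                          # drain the control plane
$ sudo kubeadm upgrade plan                                          # view the planned upgrade
$ sudo kubeadm upgrade apply v1.21.1                                 # apply the upgrade
$ kubectl uncordon cp-node                                           # allow pods to be scheduled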

Detailed steps can be found in the Kubernetes documentation: Upgrading kubeadm clusters.

Installing a Pod Network

Prior to initializing the Kubernetes cluster, the network must be considered and IP conflicts avoided. There are several Pod networking choices, in varying levels of development and feature set.

Many of the projects will mention the Container Network Interface (CNI), which is a CNCF project. Several container runtimes currently use CNI. As a standard to handle deployment management and cleanup of network resources, CNI will become more popular.


Pod Networking Choices

Calico

A flat Layer 3 network which communicates without IP encapsulation, used in production with software such as Kubernetes, OpenShift, Docker, Mesos and OpenStack. Viewed as a simple and flexible networking model, it scales well for large environments. Another network option, Canal, also part of this project, allows for integration with Flannel. Calico allows for implementation of network policies.

For more details, check out the Project Calico web page.
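As a sketch, installing Calico is typically a single manifest applied after kubeadm init (the URL follows the project's documentation and may change between releases):

$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml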
Flannel

A Layer 3 IPv4 network between the nodes of a cluster. Developed by CoreOS, it has a long history with Kubernetes. Focused on traffic between hosts, not how containers configure local networking, it can use one of several backend mechanisms, such as VXLAN. A flanneld agent on each node allocates subnet leases for the host. While it can be configured after deployment, it is much easier prior to any Pods being added.

You can learn more about Flannel from their GitHub pages.

Kube-Router

A feature-filled single binary which claims to "do it all". The project is in the alpha stage, but promises to offer a distributed load balancer, firewall, and router purposely built for Kubernetes.

For more details, check out the Kube-Router web page.

Cilium

This is a newer but incredibly powerful network plugin which is used by major cloud providers. Via the use of eBPF and other features, this network plugin has become so powerful it is considered a service mesh, which we will discuss later in the course.

To learn more about Cilium, visit the project page.

More Installation Tools

Since Kubernetes is, after all, like any other application that you install on a server (whether physical or virtual), all of the configuration management systems (e.g., Chef, Puppet, Ansible, Terraform) can be used. Various recipes are available on the Internet.

The best way to learn how to install Kubernetes using step-by-step manual commands is to examine the Kelsey Hightower walkthrough.

Examples of Installation Tools

kubespray

kubespray is now in the Kubernetes incubator. It is an advanced Ansible playbook which allows you to set up a Kubernetes cluster on various operating systems and use different network providers. It was once known as kargo.

To learn more about kubespray, check out their GitHub page.

kops

kops (Kubernetes Operations) lets you create a Kubernetes cluster on AWS via a single command line. It is also in beta for GKE and alpha for VMware.

Learn more about kops from their GitHub page.

kube-aws

kube-aws is a command line tool that makes use of AWS CloudFormation to provision a Kubernetes cluster on AWS.

For more details about kube-aws, check out their web page.

kind

kind is one of a few methods to run Kubernetes locally. It is currently written to work with Docker.
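Getting a local cluster with kind is a one-liner, assuming Docker is already running:

$ kind create cluster                        # boots a single-node cluster in a Docker container
$ kubectl cluster-info --context kind-kind   # kind names its default context kind-kind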
Installation Considerations
To begin the installation process, you should start experimenting with
a single-node deployment. This single-node will run all the Kubernetes
components (e.g., API server, controller, scheduler, kubelet, and
kube-proxy). You can do this with Minikube for example.

Once you want to deploy on a cluster of servers (physical or virtual), you will have many choices to make, just like with any other distributed system:

 Which provider should I use? A public or private cloud? Physical or virtual?
 Which operating system should I use? Kubernetes runs on most
operating systems (e.g. Debian, Ubuntu, CentOS, etc.), plus on
container-optimized OSes (e.g. CoreOS, Atomic).
 Which networking solution should I use? Do I need an overlay?
 Where should I run my etcd cluster?
 Can I configure Highly Available (HA) head nodes?

To learn more about how to choose the best options, you can read
the Getting Started documentation page.

With systemd becoming the dominant init system on Linux, your Kubernetes components will end up being run as systemd unit files in most cases. Or, they will be run via a kubelet running on the head node (i.e. kubeadm).
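For example, on a kubeadm-built node you can inspect the kubelet unit directly:

$ systemctl status kubelet     # the kubelet runs as a systemd service
$ sudo journalctl -u kubelet   # its logs, via the systemd journal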

Lab exercises in this course were written using Google Compute Engine (GCE) nodes. Each node has 2 vCPUs and 7.5GB of memory, running Ubuntu 20.04. Smaller nodes should work, but you should
expect slow response. Other operating system images are also
possible, but there may be slight differences in some command
outputs. Use of GCE requires setting up an account and will incur
expenses if using nodes of the size suggested. You can view
the Getting Started pages for more details.

Amazon Web Services (AWS) is another provider of cloud-based nodes, and requires an account; you will incur expenses for nodes of the suggested size. You can find videos and information on how to get started online.

Virtual machines such as KVM, VirtualBox, or VMware can also be used for lab systems. Putting the VMs on a private network can make troubleshooting easier.

Finally, using bare metal nodes with access to the Internet will also
work for lab exercises.
Main Deployment Configurations
At a high level, you have four main deployment configurations:

 Single-node
With a single-node deployment, all the components run on the same
server. This is great for testing, learning, and developing around
Kubernetes.
 Single head node, multiple workers
Adding more workers, a single head node and multiple workers
typically will consist of a single node etcd instance running on the
head node with the API, the scheduler, and the controller-manager.
 Multiple head nodes with HA, multiple workers
Multiple head nodes in an HA configuration and multiple workers add
more durability to the cluster. The API server will be fronted by a load
balancer, the scheduler and the controller-manager will elect a leader
(which is configured via flags). The etcd setup can still be single node.
 HA etcd, HA head nodes, multiple workers
The most advanced and resilient setup would be an HA etcd cluster,
with HA head nodes and multiple workers. Also, etcd would run as a
true cluster, which would provide HA and would run on nodes
separate from the Kubernetes head nodes.

Which of the four you will use will depend on how advanced you are in
your Kubernetes journey, but also on what your goals are.

The use of Kubernetes Federation also offers high availability. Multiple clusters are joined together with a common control plane, allowing movement of resources from one cluster to another, administratively or after failure. While Federation has had some issues, there is hope that v2 will be a stronger product.

Compiling from Source

The list of binary releases is available on GitHub. Together with gcloud, minikube, and kubeadm, these cover several scenarios to get started with Kubernetes.

Kubernetes can also be compiled from source relatively quickly. You can clone the repository from GitHub, and then use the Makefile to build the binaries. You can build them natively on your platform if you have a Golang environment properly set up, or via containers or virtual machines.
To build natively with Golang, first install Golang. Download files and
directions can be found online.

Once Golang is working, you can clone the kubernetes repository, which is around 500MB in size. Change into the directory and use make. See the following commands:

$ cd $GOPATH
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make

There may be other software and settings you need in order for
the make to work properly. Review the output until it completes
properly.

The _output/bin directory will contain the newly built binaries.

You may find some materials via GitLab, but most resources are
on GitHub at the moment.

Lab Exercises

Lab 3.1. Install Kubernetes
Lab 3.2. Grow the Cluster
Lab 3.3. Finish Cluster Setup
Lab 3.4. Deploy a Simple Application
Lab 3.5. Access from Outside the Cluster

Knowledge Check
Question 3.1
What is the kubeadm command used for?

 A. Assign an administrator to the cluster

 B. Start a new Pod

 C. Create a cluster and add nodes

 D. All of the above

Question 3.2
Which of the following is the main binary for working with objects of a
Kubernetes cluster?

 A. OpenStack

 B. Make

 C. adminCreate

 D. kubectl

Question 3.3
How many pod networks can you have per cluster?

 A. 1

 B. 2

 C. 3

 D. 4

Question 3.4
The ~/.kube/config file contains _____________.

 A. Endpoints

 B. SSL keys
 C. Contexts

 D. All of the above

Correct answers: c,d,a,d

04. KUBERNETES ARCHITECTURE
Introduction

Chapter Overview

In this session, we will be talking about Kubernetes architecture. There are two basic node types: worker nodes and master nodes. There are many agents running on the master nodes, the primary of which is the kube-apiserver. All other agents send their requests to the kube-apiserver, which authenticates and authorizes them, and sends them along to where they need to go.

The kube-apiserver is also responsible for persisting the state of the cluster. It updates the etcd database with the current state, and only the kube-apiserver talks to the etcd database.

On the worker nodes, our primary agent is the kubelet. It does all of the local configuration: deploying controllers and pods, downloading images for the container runtime, and handling all of the local configuration it gets from the kube-apiserver. The kubelet also sends status and other information back to the kube-apiserver on a master node.

There are many other objects available to us inside of Kubernetes. We are going to see some of the basic architecture in this module.

Let’s begin!
Learning Objectives
By the end of this chapter, you should be able to:

 Discuss the main components of a Kubernetes cluster.
 Learn details of the master agent kube-apiserver.
 Explain how the etcd database keeps the cluster state and
configuration.
 Study the kubelet local agent.
 Examine how controllers are used to manage the cluster state.
 Discover what a Pod is to the cluster.
 Examine network configurations of a cluster.
 Discuss Kubernetes services.

Kubernetes Architecture

Main Components
Kubernetes has the following main components:

 Control plane(s) and worker node(s)
 Operators
 Services
 Pods of containers
 Namespaces and quotas
 Network and policies
 Storage.

A Kubernetes cluster is made of one or more cp nodes and a set of worker nodes. The cluster is all driven via API calls to operators. A network plugin helps handle both interior as well as exterior traffic. We will take a closer look at these components next.
We will take a closer look at these components next.

Most of the processes are executed inside a container. There are some differences, depending on the vendor and the tool used to build the cluster.

When upgrading a cluster, be aware that each of these components is developed to work together by multiple teams. Care should be taken to ensure a proper match of versions. The kubeadm upgrade plan command is useful to discover this information.
Control Plane Node
The Kubernetes cp runs various server and manager processes for the cluster. As the software has matured, new components have been created to handle dedicated needs, such as the cloud-controller-manager; it handles tasks once handled by the kube-controller-manager to interact with other tools, such as Rancher or DigitalOcean, for third-party cluster management and reporting.

There are several add-ons which have become essential to a typical production cluster, such as DNS services. Others are third-party solutions where Kubernetes has not yet developed a local component, such as cluster-level logging and resource monitoring.

As a concept, the various pods responsible for ensuring the current state of the cluster matches the desired state are called the control plane.

When building a cluster using kubeadm, the kubelet process is managed by systemd. Once running, it will start every pod found in /etc/kubernetes/manifests/.
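On a kubeadm-built cluster you would typically see the control plane's static pod manifests there (filenames follow kubeadm defaults):

$ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml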

Components of the Control Plane Node

 kube-apiserver
The kube-apiserver is central to the operation of the
Kubernetes cluster. All calls, both internal and external traffic,
are handled via this agent. All actions are accepted and
validated by this agent, and it is the only connection to
the etcd database. It validates and configures data for API
objects, and services REST operations. As a result, it acts as a
cp process for the entire cluster, and acts as a frontend of the
cluster's shared state.

Starting as a beta feature in v1.18, the Konnectivity service provides the ability to separate user-initiated traffic from server-initiated traffic. Until these features are developed, most network plugins commingle the traffic, which has performance, capacity, and security ramifications.

 kube-scheduler
The kube-scheduler uses an algorithm to determine which
node will host a Pod of containers. The scheduler will try to view
available resources (such as volumes) to bind, and then try and
retry to deploy the Pod based on availability and success. There
are several ways you can affect the algorithm, or a custom
scheduler could be used instead. You can also bind a Pod to a
particular node, though the Pod may remain in a pending state
due to other settings. One of the first settings referenced is if
the Pod can be deployed within the current quota restrictions. If
so, then the taints and tolerations, and labels of the Pods are
used along with the metadata of the nodes to determine the
proper placement.

The details of the scheduler can be found on GitHub.
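A minimal sketch of binding a Pod to a particular node by setting spec.nodeName, which bypasses the scheduler (the worker1 name is a placeholder):

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pinned
spec:
  nodeName: worker1    # skips kube-scheduler; the Pod stays Pending if the node cannot host it
  containers:
  - name: app
    image: nginx
EOF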

 etcd Database
The state of the cluster, networking, and other persistent
information is kept in an etcd database, or, more accurately, a
b+tree key-value store. Rather than finding and changing an
entry, values are always appended to the end. Previous copies
of the data are then marked for future removal by a compaction
process. It works with curl and other HTTP libraries, and
provides reliable watch queries.

Simultaneous requests to update a value all travel via the kube-apiserver, which then passes along the requests to etcd in a series. The first request would update the database. The second request would no longer have the same version number, in which case the kube-apiserver would reply with an error 409 to the requester. There is no logic past that response on the server side, meaning the client needs to expect this and act upon the denial to update.

There is a Leader database along with possible followers, or non-voting Learners who are in the process of joining the cluster. They communicate with each other on an ongoing basis to determine which will be the Leader, and determine another in the event of failure. While very fast and potentially durable, there have been some hiccups with new tools, such as kubeadm, and features like whole cluster upgrades.

While most Kubernetes objects are designed to be decoupled, transient microservices which can be terminated without much concern, etcd is the exception. As the persistent state of the entire cluster, it must be protected and secured. Before upgrades or maintenance, you should plan on backing up etcd. The etcdctl command allows for snapshot save and snapshot restore.
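A hedged sketch of such a backup (the endpoint and certificate paths follow kubeadm defaults and may differ on your cluster):

$ sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save /var/backup/etcd-snapshot.db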

 Other Agents

The kube-controller-manager is a core control loop daemon which interacts with the kube-apiserver to determine the state of the cluster. If the state does not match, the manager will contact the necessary controller to match the desired state. There are several operators in use, such as endpoints, namespace, and replication. The full list has expanded as Kubernetes has matured.

Remaining in beta since v1.11, the cloud-controller-manager (ccm) interacts with agents outside of the cloud. It handles tasks once handled by kube-controller-manager. This allows faster changes without altering the core Kubernetes control process. Each kubelet must use the --cloud-provider=external setting passed to the binary. You can also develop your own ccm, which can be deployed as a daemonset, as an in-tree deployment or as a free-standing out-of-tree installation. The cloud-controller-manager is an optional agent which takes a few steps to enable. You can learn more about the cloud-controller-manager online.

Depending on which network plugin has been chosen, there may be various pods to control network traffic. To handle DNS queries, Kubernetes service discovery, and other functions, the CoreDNS server has replaced kube-dns. Using chains of plugins, one of many provided or custom written, the server is easily extensible.

Worker Nodes
All nodes run the kubelet and kube-proxy, as well as the container
engine, such as Docker or cri-o, among several options. Other
management daemons are deployed to watch these agents or provide
services not yet included with Kubernetes.

The kubelet interacts with the underlying container engine also installed on all the nodes, and makes sure that the containers that need to run are actually running. The kube-proxy is in charge of managing the network connectivity to the containers. It does so through the use of iptables entries. It also has a userspace mode, in which it monitors Services and Endpoints using a random port to proxy traffic, and an ipvs mode. A network plugin pod, such as calico-node, may be found, depending on the plugin in use.

Each node could run a different engine, and it is likely that Kubernetes will support additional container runtime engines.

Supervisord is a lightweight process monitor used in traditional Linux environments to monitor and notify about other processes. In a non-systemd cluster, this daemon can be used to monitor both the kubelet and docker processes. It will try to restart them if they fail, and log events. While not part of a typical installation, some may add this monitor for added reporting.

Kubernetes does not have cluster-wide logging yet. Instead, another CNCF project is used, called Fluentd. When implemented, it provides a unified logging layer for the cluster, which filters, buffers, and routes messages.

Cluster-wide metrics is another area with limited functionality. The metrics-server SIG provides basic node and pod CPU and memory utilization. For more metrics, many use the Prometheus project.

Kubelet
The kubelet systemd process is the heavy lifter for changes and
configuration on worker nodes. It accepts the API calls for Pod
specifications (a PodSpec is a JSON or YAML file that describes a pod).
It will work to configure the local node until the specification has been
met.

Should a Pod require access to storage, Secrets or ConfigMaps, the kubelet will ensure access or creation. It also sends back status to the kube-apiserver for eventual persistence.

 Uses PodSpec
 Mounts volumes to Pod
 Downloads secrets
 Passes request to local container engine
 Reports status of Pods and node to cluster.

The kubelet calls other components such as the Topology Manager, which uses hints from other components to configure topology-aware NUMA resource assignments, such as for CPU and hardware accelerators. As an alpha feature, it is not enabled by default.

Operators
An important concept for orchestration is the use of operators,
otherwise known as controllers or watch-loops. Various operators ship
with Kubernetes, and you can create your own, as well. A simplified
view of an operator is an agent, or Informer, and a downstream store.
Using a DeltaFIFO queue, the source and downstream are compared.
A loop process receives an obj or object, which is an array of deltas
from the FIFO queue. As long as the delta is not of the type Deleted,
the logic of the operator is used to create or modify some object until
it matches the specification.

The Informer which uses the API server as a source requests the state
of an object via an API call. The data is cached to minimize API server
transactions. A similar agent is the SharedInformer; objects are often
used by multiple other objects. It creates a shared cache of the state
for multiple requests.

A Workqueue uses a key to hand out tasks to various workers. The standard Go work queues of rate limiting, delayed, and time queue are typically used.

The endpoints, namespace, and serviceaccounts operators each manage the eponymous resources for Pods. Deployments manage ReplicaSets, which manage Pods running the same podSpec, or replicas.
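You can watch this reconciliation at work: delete a pod owned by a ReplicaSet and the operator spawns a replacement (the pod name is whatever kubectl get pods shows):

$ kubectl get pods --watch &
$ kubectl delete pod <pod-name>   # a replacement appears moments later with a new name suffix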

Service Operator
With every object and agent decoupled, we need a flexible and scalable agent which connects resources together and will reconnect, should something die and a replacement be spawned. A service is an operator which listens to the endpoint operator to provide a persistent IP for Pods. Pods have ephemeral IP addresses chosen from a pool. The service operator then sends messages via the kube-apiserver, which forwards settings to kube-proxy on every node, as well as to the network plugin, such as calico-kube-controllers.

A service also handles access policies for inbound requests, useful for
resource control, as well as for security.

 Connect Pods together
 Expose Pods to the Internet
 Decouple settings
 Define Pod access policy.
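
As a hedged sketch, a minimal service manifest might look as follows (the name, label, and ports are illustrative); Pods whose labels match the selector become the endpoints behind the persistent IP:

apiVersion: v1
kind: Service
metadata:
  name: web-svc                # illustrative name
spec:
  selector:
    app: web                   # Pods carrying this label become endpoints
  ports:
  - port: 80                   # stable ClusterIP port
    targetPort: 8080           # port the Pod actually listens on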

Pods
The whole point of Kubernetes is to orchestrate the lifecycle of a
container. We do not interact with particular containers. Instead, the
smallest unit we can work with is a Pod. Some would say a pod of
whales or peas-in-a-pod. Due to shared resources, the design of a Pod
typically follows a one-process-per-container architecture.

Containers in a Pod are started in parallel. As a result, there is no way to determine which container becomes available first inside a pod. The use of InitContainers can order startup, to some extent. To support a single process running in a container, you may need logging, a proxy, or a special adapter. These tasks are often handled by other containers in the same pod.

There is only one IP address per Pod, for almost every network plugin.
If there is more than one container in a pod, they must share the IP.
To communicate with each other, they can either use IPC, the
loopback interface, or a shared filesystem.

While Pods are often deployed with one application container in each, a common reason to have multiple containers in a Pod is for logging. You may find the term sidecar used for a container dedicated to performing a helper task, like handling logs and responding to requests, as the primary application container may not have this ability. The term sidecar, like ambassador and adapter, does not have a special setting, but refers to the concept of what secondary containers are included to do.
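
A hedged sketch of such a multi-container Pod follows (names and images are illustrative). The sidecar shares the Pod's network namespace and a common volume with the primary container:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger        # illustrative name
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: logger               # sidecar: reads what the app writes to the shared volume
    image: busybox
    command: ['sh', '-c', 'touch /var/log/nginx/access.log; tail -f /var/log/nginx/access.log']
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  volumes:
  - name: logs
    emptyDir: {}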

Rewrite Legacy Applications

Moving legacy applications to Kubernetes often brings up the question of whether the application should be containerized as is, or rewritten as a transient, decoupled microservice. The cost and time of rewriting legacy applications can be high, but there is also value in leveraging the flexibility of Kubernetes.

 Video 04 - Kubernetes Architecture - 01 - Rewrite Legacy Applications.mp4

This video discusses the issue, comparing a city bus (monolithic legacy application) to a scooter (transient, decoupled microservices).

Containers
While Kubernetes orchestration does not allow direct manipulation at the container level, we can manage the resources containers are allowed to consume.

In the resources section of the PodSpec you can pass parameters which will be passed to the container runtime on the scheduled node:

resources:
  limits:
    cpu: "1"
    memory: "4Gi"
  requests:
    cpu: "0.5"
    memory: "500Mi"

Another way to manage resource usage of the containers is by creating a ResourceQuota object, which allows hard and soft limits to be set in a namespace. The quotas allow management of more resources than just CPU and memory, and allow limiting several objects.
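
A hedged ResourceQuota sketch (the namespace and values are illustrative), limiting an object count as well as CPU and memory:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota              # illustrative name
  namespace: dev
spec:
  hard:
    pods: "10"                 # caps the number of Pods, not just compute
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.memory: 16Gi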

A beta feature in v1.12 uses the scopeSelector field in the quota spec to run a pod at a specific priority if it has the appropriate priorityClassName in its pod spec.

Init Containers
Not all containers are the same. Standard containers are sent to the
container engine at the same time, and may start in any order.
LivenessProbes, ReadinessProbes, and StatefulSets can be used to
determine the order, but can add complexity. Another option can be
an Init container, which must complete before app containers will
be started. Should the init container fail, it will be restarted until
completion, without the app container running.

The init container can have a different view of the storage and security settings, which allows utilities and commands to be used which the application would not be allowed to use. Init containers can contain code or utilities that are not in an app. They also have security settings independent of the app containers.

The code below will run the init container until the ls command
succeeds; then the database container will start.

spec:
  containers:
  - name: main-app
    image: databaseD
  initContainers:
  - name: wait-database
    image: busybox
    command: ['sh', '-c', 'until ls /db/dir ; do sleep 5; done; ']

Component Review
Now that we have seen some of the components, let's take another
look with some of the connections shown. Not all connections are
shown in the diagram below. Note that all of the components are
communicating with kube-apiserver. Only kube-apiserver
communicates with the etcd database.
Kubernetes Architectural Review

We also see some commands, which we may need to install separately to work with various components. There is an etcdctl command to interrogate the database and calicoctl to view more of how the network is configured. We can see Felix, which is the primary Calico agent on each machine. This agent, or daemon, is responsible for interface monitoring and management, route programming, ACL configuration and state reporting.

BIRD is a dynamic IP routing daemon used by Felix to read routing state and distribute that information to other nodes in the cluster. This allows a client to connect to any node, and eventually be connected to the workload on a container, even if not the node originally contacted.

API Call Flow

 Video: 04 - Kubernetes Architecture - 02 - API Call Flow.mp4

This video should help you understand the API call flow from a
request for a new pod through to pod and container
deployment and ongoing cluster status.

Node
A node is an API object created outside the cluster, representing an instance. While a control plane (cp) node must be Linux, worker nodes can also run Microsoft Windows Server 2019. Once the node has the necessary software installed, it is ingested into the API server.

At the moment, you can create a cp node with the kubeadm init command and worker nodes by passing join. In the near future, secondary cp nodes and/or etcd nodes may be joined.

If the kube-apiserver cannot communicate with the kubelet on a node for 5 minutes, the default NodeLease will schedule the node for deletion and the NodeStatus will change from ready. The pods will be evicted once a connection is re-established. They are no longer forcibly removed and rescheduled by the cluster.

Each node object exists in the kube-node-lease namespace. To remove a node from the cluster, first use kubectl delete node <node-name> to remove it from the API server. This will cause pods to be evacuated. Then, use kubeadm reset to remove cluster-specific information. You may also need to remove iptables information, depending on whether you plan on re-using the node.
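
Putting those steps together, a hedged sketch (the node name is illustrative):

$ kubectl drain worker-2 --ignore-daemonsets   # evict workloads gracefully
$ kubectl delete node worker-2                 # remove the API object

Then, on the node itself:

$ sudo kubeadm reset                           # remove cluster-specific information
$ sudo iptables -F                             # optional: flush leftover rules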

To view CPU, memory, and other resource usage, requests, and limits, use the kubectl describe node command. The output will show capacity and pods allowed, as well as details on current pods and resource utilization.

Single IP per Pod

A pod represents a group of co-located containers with some associated data volumes. All containers in a pod share the same network namespace.
Pod Network

The graphic shows a pod with two containers, A and B, and two data
volumes, 1 and 2. Containers A and B share the network namespace
of a third container, known as the pause container. The pause
container is used to get an IP address, then all the containers in the
pod will use its network namespace. Volumes 1 and 2 are shown for
completeness.

To communicate with each other, containers within pods can use the loopback interface, write to files on a common filesystem, or use inter-process communication (IPC). There is now a network plugin from HPE Labs which allows multiple IP addresses per pod, but this feature has not grown past this new plugin.

Starting as an alpha feature in 1.16 is the ability to use IPv4 and IPv6
for pods and services. In the current version, when creating a service,
you need to create the network for each address family separately.

Container to Outside Path

This graphic shows a node with a single, dual-container pod. A NodePort service connects the Pod to the outside network.
Container/Services Networking

Even though there are two containers, they share the same namespace and the same IP address, which would be configured by the kubelet working with kube-proxy. The IP address is assigned before the containers are started, and will be inserted into the containers. The container will have an interface like eth0@tun10. This IP is set for the life of the pod.

The endpoint is created at the same time as the service. Note that it
uses the pod IP address, but also includes a port. The service
connects network traffic from a node high-number port to the
endpoint using iptables with ipvs on the way. The kube-controller-
manager handles the watch loops to monitor the need for endpoints
and services, as well as any updates or deletions.

Services
We can use a service to connect one pod to another, or to outside of
the cluster.

Service Network
This graphic shows a pod with a primary container, App, with an optional sidecar, Logger. Also seen is the pause container, which is used by the cluster to reserve the IP address in the namespace prior to starting the other containers. This container is not seen from within Kubernetes, but can be seen using docker and crictl.

This graphic also shows a ClusterIP, which is used to connect inside the cluster, not the IP of the cluster. As the graphic shows, this can be used to connect to a NodePort for outside the cluster, an IngressController or proxy, or another "backend" pod or pods.

Networking Setup
Getting all the previous components running is a common task for
system administrators who are accustomed to configuration
management. But, to get a fully functional Kubernetes cluster, the
network will need to be set up properly, as well.

A detailed explanation about the Kubernetes networking model can be seen on the Cluster Networking page in the Kubernetes documentation.

If you have experience deploying virtual machines (VMs) based on IaaS solutions, this will sound familiar. The only caveat is that, in Kubernetes, the lowest compute unit is not a container, but what we call a pod.

A pod is a group of co-located containers that share the same IP address. From a networking perspective, a pod can be seen as a virtual machine or physical host. The network needs to assign IP addresses to pods, and needs to provide traffic routes between all pods on any nodes.

The three main networking challenges to solve in a container orchestration system are:

 Coupled container-to-container communication (solved by the pod concept).
 Pod-to-pod communication.
 External-to-pod communication (solved by the services concept, which we will discuss later).

Kubernetes expects the network configuration to enable pod-to-pod communication to be available; it will not do it for you.

Tim Hockin, one of the lead Kubernetes developers, has created a very useful slide deck for understanding Kubernetes networking: An Illustrated Guide to Kubernetes Networking.
CNI Network Configuration File
To provide container networking, Kubernetes is standardizing on the
Container Network Interface (CNI) specification. Since v1.6.0, the goal
of kubeadm (the Kubernetes cluster bootstrapping tool) has been to
use CNI, but you may need to recompile to do so.

CNI is an emerging specification with associated libraries to write plugins that configure container networking and remove allocated resources when the container is deleted. Its aim is to provide a common interface between the various networking solutions and container runtimes. As the CNI specification is language-agnostic, there are many plugins, from Amazon ECS, to SR-IOV, to Cloud Foundry, and more.

With CNI, you can write a network configuration file:

{
  "cniVersion": "0.2.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}

This configuration defines a standard Linux bridge named cni0, which will give out IP addresses in the subnet 10.22.0.0/16. The bridge plugin will configure the network interfaces in the correct namespaces to define the container network properly.

The main README of the CNI GitHub repository has more information.

Pod-to-Pod Communication
While a CNI plugin can be used to configure the network of a pod and
provide a single IP per pod, CNI does not help you with pod-to-pod
communication across nodes.

The requirement from Kubernetes is the following:

 All pods can communicate with each other across nodes.
 All nodes can communicate with all pods.
 No Network Address Translation (NAT).

Basically, all IPs involved (nodes and pods) are routable without NAT.
This can be achieved at the physical network infrastructure if you
have access to it (e.g. GKE). Or, this can be achieved with a software
defined overlay with solutions like:

 Weave
 Flannel
 Calico
 Romana.

See this documentation page or the list of networking add-ons for a more complete list.

Mesos
At a high level, there is nothing different between Kubernetes and
other clustering systems.

A central manager exposes an API, a scheduler places the workloads on a set of nodes, and the state of the cluster is stored in a persistent layer.

For example, you could compare Kubernetes with Mesos, and you
would see the similarities. In Kubernetes, however, the persistence
layer is implemented with etcd, instead of Zookeeper for Mesos.
Mesos Architecture
The Apache Software Foundation
Retrieved from the Mesos website

You should also consider systems like OpenStack and CloudStack. Think about what runs on their head node, and what runs on their worker nodes. How do they keep state? How do they handle networking? If you are familiar with those systems, Kubernetes will not seem that different.

What really sets Kubernetes apart is its features oriented towards fault-tolerance, self-discovery, and scaling, coupled with a mindset that is purely API-driven.

Lab Exercises

Lab 4.1. Basic Node Maintenance
Lab 4.2. Working with CPU and Memory Constraints
Lab 4.3. Resource Limits for a Namespace

Knowledge Check

Question 4.1
What is the smallest object or unit we can work with in Kubernetes?

 A. Container

 B. Pod

 C. ReplicaSet

 D. Deployment

Question 4.2
How many IP addresses can be configured for a Pod?

 A. 1

 B. 2

 C. 8

 D. None

Question 4.3
What is the main configuration agent on a master server?
 A. watcher

 B. kubelet

 C. loop

 D. kube-apiserver

Question 4.4
What is the main agent on a worker node?

 A. watcher

 B. kubelet

 C. loop

 D. kube-apiserver

Question 4.5
What object connects other resources together and handles Ingress
and Egress traffic?

 A. Pod

 B. Controller

 C. etcd

 D. Service

Correct answers: b,a,d,b,d

05. APIS AND ACCESS
Introduction
Chapter Overview
In this session, we are going to talk about APIs and accessing the Kubernetes cluster. Kubernetes is an API-driven architecture. We can use standard HTTP verbs to make RESTful calls to view or change the cluster.

We are also going to discuss annotations. An annotation is a string in metadata that we can access, but it’s not used with kubectl. Instead, it’s helpful for exterior projects to gain access and view a string in each object, without needing some third-party database tying the object to that metadata.

We are also going to talk about namespaces. Some objects are namespaced, meaning that the commands only work if you declare the namespace they are in. Other objects do not exist in a namespace, and can be used among any namespace simultaneously.

We are also going to talk about the versions of the API: everything from alpha, which would indicate that it’s not stable and might change at any moment; beta, of which we have two different versions – this typically indicates that it’s more stable, and should have some backwards compatibility, but probably is not fully trusted in a production environment; and finally, a stable version, which is backwards compatible and considered to be safe for production.

Let’s begin!

Learning Objectives
By the end of this chapter, you should be able to:

 Understand the API REST-based architecture.
 Work with annotations.
 Understand a simple Pod template.
 Use kubectl with greater verbosity for troubleshooting.
 Separate cluster resources using namespaces.

APIs and Access

API Access
Kubernetes has a powerful REST-based API. The entire architecture is API-driven. Knowing where to find resource endpoints and understanding how the API changes between versions can be important to ongoing administrative tasks, as there is much ongoing change and growth. Starting with v1.16, deprecated objects are no longer honored by the API server.

As we learned in the Kubernetes Architecture chapter, the main agent for communication between cluster agents and from outside the cluster is the kube-apiserver. A curl query to the agent will expose the current API groups. Groups may have multiple versions, which evolve independently of other groups, and follow a domain-name format with several names reserved, such as single-word domains, the empty group, and any name ending in .k8s.io.

RESTful
kubectl makes API calls on your behalf, responding to typical HTTP verbs (GET, POST, DELETE). You can also make calls externally, using curl or another program. With the appropriate certificates and keys, you can make requests, or pass JSON files to make configuration changes. See the following command:

$ curl --cert userbob.pem --key userBob-key.pem \
  --cacert /path/to/ca.pem \
  https://k8sServer:6443/api/v1/pods

The ability to impersonate other users or groups, subject to RBAC configuration, allows a manual override of authentication. This can be helpful for debugging the authorization policies of other users.

Checking Access
While there is more detail on security in a later chapter, it is helpful to check the current authorizations, both as an administrator and as another user. The following shows what user bob could do in the default namespace and the developer namespace, using the auth can-i subcommand to query (commands and outputs):

$ kubectl auth can-i create deployments
yes

$ kubectl auth can-i create deployments --as bob
no

$ kubectl auth can-i create deployments --as bob --namespace developer
yes

There are currently three APIs which can be applied to set who and
what can be queried:

 SelfSubjectAccessReview
Access review for any user, helpful for delegating to others.

 LocalSubjectAccessReview
Review is restricted to a specific namespace.

 SelfSubjectRulesReview
A review which shows allowed actions for a user within a
particular namespace.

The use of reconcile allows a check of the authorization necessary to create an object from a file. No output indicates the creation would be allowed.
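
To survey everything a user may do in a namespace at once, the same subcommand accepts a --list flag (user and namespace as in the earlier example):

$ kubectl auth can-i --list --as bob --namespace developer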

Optimistic Concurrency
The default serialization for API calls must be JSON. There is an effort
to use Google's protobuf serialization, but this remains experimental.
While we may work with files in a YAML format, they are converted to
and from JSON.

Kubernetes uses the resourceVersion value to determine API updates and implement optimistic concurrency. In other words, an object is not locked from the time it has been read until the object is written.

Instead, upon an updated call to an object, the resourceVersion is checked, and a 409 CONFLICT is returned, should the number have changed. The resourceVersion is currently backed via the modifiedIndex parameter in the etcd database, and is unique to the namespace, kind, and server. Operations which do not change an object, such as WATCH or GET, do not update this value.
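
As a hedged illustration (object from the earlier example; output abbreviated and indicative), a write carrying a stale resourceVersion is rejected:

$ kubectl get pod firstpod -o jsonpath='{.metadata.resourceVersion}'
774003

$ kubectl replace -f firstpod-stale.yaml      # manifest still holds the old version
Error from server (Conflict): ... the object has been modified; please
apply your changes to the latest version and try again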

Using Annotations
Labels are used to work with objects or collections of objects;
annotations are not.

Instead, annotations allow for metadata to be included with an object that may be helpful outside of the Kubernetes object interaction. Similar to labels, they are key-to-value maps. They are also able to hold more information, and more human-readable information, than labels.
Having this kind of metadata can be used to track information such as
a timestamp, pointers to related objects from other ecosystems, or
even an email from the developer responsible for that object's
creation.

The annotation data could otherwise be held in an exterior database, but that would limit the flexibility of the data. The more this metadata is included, the easier it is to integrate management and deployment tools or shared client libraries.

For example, you can annotate all Pods within a namespace, then overwrite the annotation, and finally delete it. See the following commands:

$ kubectl annotate pods --all description='Production Pods' -n prod

$ kubectl annotate --overwrite pod webpod description="Old Production Pods" -n prod

$ kubectl -n prod annotate pod webpod description-

Simple Pod
As discussed earlier, a Pod is the lowest compute unit and individual
object we can work with in Kubernetes. It can be a single container,
but often, it will consist of a primary application container and one or
more supporting containers.

Below is an example of a simple pod manifest in YAML format. You can see the apiVersion (it must match the existing API group), the kind (the type of object to create), the metadata (at least a name), and its spec (what to create and parameters), which define the container that actually runs in this pod:

apiVersion: v1
kind: Pod
metadata:
  name: firstpod
spec:
  containers:
  - image: nginx
    name: stan
You can use the kubectl create command to create this pod in
Kubernetes. Once it is created, you can check its status with kubectl
get pods. The output is omitted to save space:

$ kubectl create -f simple.yaml

$ kubectl get pods

$ kubectl get pod firstpod -o yaml

$ kubectl get pod firstpod -o json

Manage API Resources with kubectl
Kubernetes exposes resources via RESTful API calls, which allows all resources to be managed via HTTP, JSON or even XML, the typical protocol being HTTP. The state of the resources can be changed using standard HTTP verbs (e.g. GET, POST, PATCH, DELETE, etc.).

kubectl has a verbose mode argument which shows details of where the command gets and updates information. Other output includes curl commands you could use to obtain the same result. While the verbosity accepts levels from zero to any number, there is currently no verbosity value greater than ten. You can check this out for kubectl get. The output below has been formatted for clarity:

$ kubectl --v=10 get pods firstpod

....
I1215 17:46:47.860958   29909 round_trippers.go:417]
curl -k -v -XGET -H "Accept: application/json"
-H "User-Agent: kubectl/v1.8.5 (linux/amd64) kubernetes/cce11c6"
https://10.128.0.3:6443/api/v1/namespaces/default/pods/firstpod
....

If you delete this pod, you will see that the HTTP method changes
from XGET to XDELETE.

$ kubectl --v=10 delete pods firstpod

....
I1215 17:49:32.166115   30452 round_trippers.go:417]
curl -k -v -XDELETE -H "Accept: application/json, */*"
-H "User-Agent: kubectl/v1.8.5 (linux/amd64) kubernetes/cce11c6"
https://10.128.0.3:6443/api/v1/namespaces/default/pods/firstpod
....

Access from Outside the Cluster
The primary tool used from the command line will be kubectl, which calls curl on your behalf. You can also use the curl command from outside the cluster to view or make changes.

The basic server information, with redacted TLS certificate information, can be found in the output of the following command:

$ kubectl config view

If you view the verbose output from a previous page, you will note that the first line references the configuration file from which this information is pulled, ~/.kube/config:

I1215 17:35:46.725407   27695 loader.go:357] Config loaded from file /home/student/.kube/config

Without the certificate authority, key and certificate from this file,
only insecure curl commands can be used, which will not expose
much due to security settings. We will use curl to access our cluster
using TLS in an upcoming lab.

~/.kube/config
Take a look at the output below:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdF.....
    server: https://10.128.0.3:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTib.....
    client-key-data: LS0tLS1CRUdJTi....

The output above shows 19 lines of output, with each of the keys
being heavily truncated. While the keys may look similar, close
examination shows them to be distinct.

Keys:

apiVersion: As with other objects, this instructs the kube-apiserver where to assign the data.

clusters: This key contains the name of the cluster, as well as where to send the API calls. The certificate-authority-data is passed to authenticate the curl request.

contexts: This is a setting which allows easy access to multiple clusters, possibly as various users, from one configuration file. It can be used to set the namespace, user, and cluster.

current-context: This shows which cluster and user the kubectl command would use. These settings can also be passed on a per-command basis.

kind: Every object within Kubernetes must have this setting; in this case, a declaration of object type Config.

preferences: Currently not used, this is an optional setting for the kubectl command, such as colorizing output.

users: A nickname associated with client credentials, which can be client key and certificate, username and password, and a token. Token and username/password are mutually exclusive. These can be configured via the kubectl config set-credentials command.

Namespaces
The term namespace is used to reference both the kernel feature and
the segregation of API objects by Kubernetes. Both are means to keep
resources distinct.

Every API call includes a namespace, using default if not otherwise declared:

https://10.128.0.3:6443/api/v1/namespaces/default/pods

Namespaces, a Linux kernel feature that segregates system resources, are intended to isolate multiple groups and the resources they have access to work with via quotas. Eventually, access control policies will work on namespace boundaries, as well. One could use labels to group resources for administrative reasons.
Namespaces:

default: This is where all the resources are assumed, unless set otherwise.

kube-node-lease: This is the namespace where worker node lease information is kept.

kube-public: A namespace readable by all, even those not authenticated. General information is often included in this namespace.

kube-system: This namespace contains infrastructure pods.

Should you want to see all the resources on a system, you must pass
the --all-namespaces option to the kubectl command.

Working with Namespaces

Take a look at the following commands:

$ kubectl get ns

$ kubectl create ns linuxcon

$ kubectl describe ns linuxcon

$ kubectl get ns/linuxcon -o yaml

$ kubectl delete ns/linuxcon

The above commands show how to view, create and delete namespaces. Note that the describe subcommand shows several settings, such as Labels, Annotations, resource quotas, and resource limits, which we will discuss later in the course.

Once a namespace has been created, you can reference it via YAML when creating a resource (command and output below):

$ cat redis.yaml

apiVersion: v1
kind: Pod
metadata:
  name: redis
  namespace: linuxcon
...

API Resources with kubectl
All API resources exposed are available via kubectl. To get more information, do kubectl help.

kubectl [command] [type] [Name] [flag]

Expect the list below to change:

Table: List of API Resources

all                                        events (ev)                       podsecuritypolicies (psp)
certificatesigningrequests (csr)           horizontalpodautoscalers (hpa)    podtemplates
clusterrolebindings                        ingresses (ing)                   replicasets (rs)
clusterroles                               jobs                              replicationcontrollers (rc)
clusters (valid only for federation apiservers)  limitranges (limits)       resourcequotas (quota)
componentstatuses (cs)                     namespaces (ns)                   rolebindings
configmaps (cm)                            networkpolicies (netpol)          roles
controllerrevisions                        nodes (no)                        secrets
cronjobs                                   persistentvolumeclaims (pvc)      serviceaccounts (sa)
customresourcedefinition (crd)             persistentvolumes (pv)            services (svc)
daemonsets (ds)                            poddisruptionbudgets (pdb)        statefulsets
deployments (deploy)                       podpreset                         storageclasses
endpoints (ep)                             pods (po)


Additional Resource Methods
In addition to basic resource management via REST, the API also
provides some extremely useful endpoints for certain resources.

For example, you can access the logs of a container, exec into it, and
watch changes to it with the following endpoints:

$ curl --cert /tmp/client.pem --key /tmp/client-key.pem \
  --cacert /tmp/ca.pem -v -XGET \
  https://10.128.0.3:6443/api/v1/namespaces/default/pods/firstpod/log

This would be the same as the following. If the container does not
have any standard out, there would be no logs:

$ kubectl logs firstpod

There are other calls you could make, following the various API groups
on your cluster:

GET /api/v1/namespaces/{namespace}/pods/{name}/exec

GET /api/v1/namespaces/{namespace}/pods/{name}/log

GET /api/v1/watch/namespaces/{namespace}/pods/{name}
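
For comparison, the kubectl equivalents of the exec and watch endpoints:

$ kubectl exec firstpod -- ls /etc
$ kubectl get pod firstpod --watch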

Swagger and OpenAPI

The entire Kubernetes API was built using a Swagger specification. This has been evolving towards the OpenAPI initiative. It is extremely useful, as it allows, for example, auto-generating client code. All the stable resource definitions are available on the documentation site.

You can browse some of the API groups via a Swagger UI on the OpenAPI Specification web page.
API Maturity
The use of API groups and different versions allows for development
to advance without changes to an existing group of APIs. This allows
for easier growth and separation of work among separate teams.
While there is an attempt to maintain some consistency between API
and software versions, they are only indirectly linked.

The use of JSON and Google's Protobuf serialization scheme will follow
the same release guidelines.

API Versions:
 Alpha: An Alpha level release, noted with alpha in the names, may be buggy
and is disabled by default. Features could change or disappear at any time,
and backward compatibility is not guaranteed. Only use these features on a
test cluster which is often rebuilt.
 Beta: The Beta levels, found with beta in the names, have more well-tested code and are enabled by default. This level also ensures that, as changes move forward, they will be tested for backwards compatibility between versions. It has not been adopted and tested enough to be called stable. You can expect some bugs and issues.

 Stable: Use of the Stable version, denoted by only an integer which may be
preceded by the letter v, is for stable APIs. At the moment, v1 is the only
stable version.

Lab Exercises

Lab 5.1. Configuring TLS Access
Lab 5.2. Explore API Calls

Knowledge Check

Question 5.1
Kubernetes uses a RESTful API-driven architecture, accepting
standard HTTP verbs. True or False?

 A. True

 B. False

Question 5.2
_____________ allow for metadata to be included with an object that
may be helpful outside the Kubernetes object interaction.

 A. Annotations

 B. Labels

 C. kubectl

 D. Controllers

Question 5.3
Which of the following must be included in a pod template?

 A. kind

 B. metadata

 C. spec

 D. apiVersion
 E. All of the above

Question 5.4
What should be appended to the command in order to affect every
namespace with kubectl?

 A. This cannot be done

 B. --every-object

 C. --all-namespaces

 D. --all

Question 5.5
All objects are restricted to a single namespace. True or False?

 A. True

 B. False

Correct answers: a,a,e,c,b

06. API OBJECTS

Introduction

Chapter Overview
In this session, we are going to talk about some of the API objects available to us inside of Kubernetes. We’ve already touched on a pod, which is one or more containers with access to an IP address and storage.

We are also going to talk about services, which allow for IP traffic
between pods, or to the outside world. Adding to that, we are going to
talk about a deployment.

A deployment is a controller that ensures ReplicaSets are created on your behalf. The ReplicaSet then deploys pods of the particular version you requested. When you decide to upgrade your application, you tell the deployment that you’d like a new version, and the deployment updates the ReplicaSets. The ReplicaSets will then deploy new pods on your behalf. You can also roll back to previous versions.

We have other types of controllers, as well. For example, the DaemonSet. This controller ensures that a particular type of pod is deployed on each of your nodes. If you add a new node, the DaemonSet will start a new pod on your behalf. This could be very handy if you’d like something like a logging daemon to be running on each of your nodes without having to remember to start it when you add the node to the cluster.

We’ll talk about some of the autoscaling resources available, whether it’s node scaling or pod scaling based on resource utilization. Then we will talk about jobs, which could either be cronjobs, which run on a regular basis, or batch jobs, which just run once.

Let’s begin!

Learning Objectives
By the end of this chapter, you should be able to:

 Explore API versions.
 Discuss rapid change and development.
 Deploy and configure an application using a Deployment.
 Examine primitives for a self-healing application.
 Scale an application.

API Objects
Overview
This chapter is about additional API resources or objects. We will learn about resources in the v1 API group, among others. Code becomes more stable as objects move from alpha versions, to beta, and then to v1, indicating production-level stability.

DaemonSets, which ensure a Pod on every node, and StatefulSets, which stick a container to a node and otherwise act like a deployment, have progressed to apps/v1 stability. Jobs and CronJobs are now in batch/v1.

Role-Based Access Control (RBAC), essential to security, has made the leap from v1alpha1 to the stable v1 status in one release.

As Kubernetes is a fast-moving project, keeping track of changes can be an important part of ongoing system administration. Release notes, as well as discussions of release notes, can be found in version-dependent subdirectories in the Features tracking repository for Kubernetes releases on GitHub. For example, the v1.17 release feature status can be found online, on the Kubernetes v1.17.0 Release Notes page.

v1 API Group
The v1 API group is no longer a single group, but rather a collection of
groups for each main object category. For example, there is
a v1 group, a storage.k8s.io/v1 group, and
an rbac.authorization.k8s.io/v1, etc. Currently, there are eight
v1 groups.

Objects:

 Node: Represents a machine - physical or virtual - that is part of your Kubernetes cluster. You can get more information about nodes with the kubectl get nodes command. You can turn on and off the scheduling to a node with the kubectl cordon/uncordon commands.

 Service Account: Provides an identifier for processes running in a pod to access the API server and perform actions that they are authorized to do.

 Resource Quota: It is an extremely useful tool, allowing you to define quotas per namespace. For example, if you want to limit a specific namespace to only run a given number of pods, you can write a resourcequota manifest, create it with kubectl, and the quota will be enforced.

 Endpoint: Generally, you do not manage endpoints. They represent the set of IPs for pods that match a particular service. They are handy when you want to check that a service actually matches some running pods. If an endpoint is empty, then it means that there are no matching pods and something is most likely wrong with your service definition.

Discovering API Groups

We can take a closer look at the output of the request for current APIs. Each of the name values can be appended to the URL to see details of that group. For example, you could drill down to find included objects at this URL: https://localhost:6443/apis/apiregistration.k8s.io/v1beta1.

If you follow this URL, you will find only one resource, with a name of
apiservices. If it seems to be listed twice, the lower output is for
status. You'll notice that there are different verbs or actions for each.
Another entry is if this object is namespaced, or restricted to only one
namespace. In this case, it is not. See the command and output
below:

$ curl https://localhost:6443/apis --header "Authorization: Bearer $token" -k

{
  "kind": "APIGroupList",
  "apiVersion": "v1",
  "groups": [
    {
      "name": "apiregistration.k8s.io",
      "versions": [
        {
          "groupVersion": "apiregistration.k8s.io/v1",
          "version": "v1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "apiregistration.k8s.io/v1",
        "version": "v1"
      }
    }

You can then curl each of these URIs and discover additional API
objects, their characteristics and associated verbs.

Deploying an Application
Using the kubectl create command, we can quickly deploy an
application. We have looked at the Pods created running the
application, like nginx. Looking closer, you will find that a Deployment
was created, which manages a ReplicaSet, which then deploys the
Pod.

Objects:

 Deployment: It is a controller which manages the state of ReplicaSets and the pods within. The higher level control allows for more flexibility with upgrades and administration. Unless you have a good reason, use a deployment.

 ReplicaSet: Orchestrates individual pod lifecycle and updates. These are newer versions of Replication Controllers, which differ only in selector support.

 Pod: As we've already mentioned, it is the lowest unit we can manage; it runs the application container, and possibly support containers.

DaemonSets
Should you want to have a logging application on every node, a
DaemonSet may be a good choice. The controller ensures that a
single pod, of the same type, runs on every node in the cluster. When
a new node is added to the cluster, a Pod, same as deployed on the
other nodes, is started. When the node is removed, the DaemonSet
makes sure the local Pod is deleted. DaemonSets are often used for
logging, metrics and security pods, and can be configured to avoid
nodes.

As usual, you get all the CRUD operations via the kubectl command:

$ kubectl get daemonsets

$ kubectl get ds
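
A hedged DaemonSet sketch (the name and image are illustrative); note the selector must match the Pod template labels:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent              # illustrative name
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluent/fluentd  # one copy lands on every node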

StatefulSets
According to Kubernetes documentation, a StatefulSet is the workload
API object used to manage stateful applications. Pods deployed using
a StatefulSet use the same Pod specification. How this is different
than a Deployment is that a StatefulSet considers each Pod as unique
and provides ordering to Pod deployment.

In order to track each Pod as a unique object, the controller uses an identity composed of stable storage, stable network identity, and an ordinal. This identity remains with the Pod, regardless of which node the Pod is running on at any one time.

The default deployment scheme is sequential, starting with 0, such as app-0, app-1, app-2, etc. A following Pod will not launch until the current Pod reaches a running and ready state. They are not deployed in parallel.

StatefulSets are stable as of Kubernetes v1.9.

Autoscaling
In the autoscaling group we find the Horizontal Pod
Autoscalers (HPA). This is a stable resource. HPAs automatically
scale Replication Controllers, ReplicaSets, or Deployments based on a
target of 50% CPU usage by default. The usage is checked by the
kubelet every 30 seconds, and retrieved by the Metrics Server API call
every minute. HPA checks with the Metrics Server every 30 seconds.
Should a Pod be added or removed, HPA waits 180 seconds before
further action.
Other metrics can be used and queried via REST. The autoscaler does not collect the metrics; it only makes a request for the aggregated information and increases or decreases the number of replicas to match the configuration.
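
For example, a Deployment can be placed under an HPA with a single command (the deployment name is illustrative):

$ kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=50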

The Cluster Autoscaler (CA) adds or removes nodes to the cluster, based on the inability to deploy a Pod or having nodes with low utilization for at least 10 minutes. This allows dynamic requests of resources from the cloud provider and minimizes expenses for unused nodes. If you are using CA, nodes should be added and removed through cluster-autoscaler- commands. Scale-up and scale-down of nodes is checked every 10 seconds, but decisions are made on a node every 10 minutes. Should a scale-down fail, the group will be rechecked in 3 minutes, with the failing node being eligible in five minutes. The total time to allocate a new node is largely dependent on the cloud provider.

Another project still under development is the Vertical Pod Autoscaler. This component will adjust the amount of CPU and memory requested by Pods.

Jobs
Jobs are part of the batch API group. They are used to run a set number of pods to completion. If a pod fails, it will be restarted until the number of completions is reached.

While they can be seen as a way to do batch processing in Kubernetes, they can also be used to run one-off pods. A Job specification will have a parallelism and a completion key. If omitted, they will be set to one. If they are present, the parallelism number will set the number of pods that can run concurrently, and the completion number will set how many pods need to run successfully for the Job itself to be considered done. Several Job patterns can be implemented, like a traditional work queue.

Cronjobs work in a similar manner to Linux cron jobs, with the same time syntax. There are some cases where a job would not be run during a time period, or could run twice; as a result, the requested Pod should be idempotent.

An optional spec field is .spec.concurrencyPolicy, which determines how to handle existing jobs should the time segment expire. If set to Allow, the default, another concurrent job will be run. If set to Forbid, the current job continues and the new job is skipped. A value of Replace cancels the current job and starts a new job in its place.
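
A hedged CronJob sketch showing the schedule and concurrencyPolicy fields (the name, image, and schedule are illustrative; older clusters may require apiVersion batch/v1beta1):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report         # illustrative name
spec:
  schedule: "0 2 * * *"        # standard cron syntax: 02:00 every day
  concurrencyPolicy: Forbid    # skip the new run if the last one is still going
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: busybox
            command: ['sh', '-c', 'echo generating report']
          restartPolicy: OnFailure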

RBAC
The last API resources that we will look at are in
the rbac.authorization.k8s.io group. We actually have four
resources: ClusterRole, Role, ClusterRoleBinding, and RoleBinding.
They are used for Role Based Access Control (RBAC) to Kubernetes.

Take a look at the command and output presented below:

$ curl localhost:8080/apis/rbac.authorization.k8s.io/v1

...
"groupVersion": "rbac.authorization.k8s.io/v1",
"resources": [
...
"kind": "ClusterRoleBinding"
...
"kind": "ClusterRole"
...
"kind": "RoleBinding"
...
"kind": "Role"
...

These resources allow us to define Roles within a cluster and associate users to these Roles. For example, we can define a Role for someone who can only read pods in a specific namespace, or a Role that can create deployments, but no services. We will talk more about RBAC later in the course, in the Security chapter.

Lab Exercises

Lab 6.1. RESTful API Access
Lab 6.2. Using the Proxy
Lab 6.3. Working with Jobs

Knowledge Check

Question 6.1
All API versions should be considered stable. True or False?

 A. True

 B. False

Question 6.2
Which of the following is the suggested object for deploying and
scaling an application?

 A. Pod

 B. Container

 C. Deployment

 D. Service

Question 6.3
From the smallest object to the largest, which is the correct order of
the following Kubernetes objects?

 A. Container, ReplicaSet, Pod, Deployment

 B. Pod, Container, ReplicaSet, Deployment

 C. Container, Pod, ReplicaSet, Deployment

 D. Container, Pod, Deployment, ReplicaSet

Question 6.4
How many Pods does a DaemonSet run on each node?
 A. None

 B. One

 C. Depends on the replica setting

 D. None of the above

Question 6.5
Deployments handle scaling of an application based on administrative
configuration. Which of the following scales resources based on CPU
usage (50% by default)?

 A. ReplicaSet

 B. Deployment

 C. Horizontal Pod Autoscaling

 D. Vertical Node Autoscaling

Question 6.6
What API group do Jobs and CronJobs belong to?

 A. v1

 B. security.k8s.api

 C. batch

 D. kubeadm

Correct answers: b,c,c,b,c,c


07. MANAGING STATE WITH DEPLOYMENTS

Introduction

Chapter Overview

In this session, we are going to talk about managing the state of an application with a deployment. We will dig into the details of a YAML file used to create a new deployment, and we’re also going to understand that, when we create a deployment, it creates a ReplicaSet on our behalf. The ReplicaSet then creates a new pod. The pod will download or run whatever container we have configured. We can then use a deployment to upgrade or roll back our ReplicaSet, which will then be responsible for deploying a new pod or using a previously configured pod.

We’re also going to talk about labels. A label is a string that we, as admins, can give to various objects inside of Kubernetes. We can then use the label with the kubectl command to act upon objects that might not be related in other ways. So, if I give a label to pods, nodes, and even controllers, I can operate on all of them with a single kubectl command.

Let’s begin!

Learning Objectives
By the end of this chapter, you should be able to:

 Discuss Deployment configuration details.
 Scale a Deployment up and down.
 Implement rolling updates and rollback.
 Use Labels to select various objects.
Managing State with Deployments

Overview
The default controller for a container deployed via the kubectl run command is a Deployment. While we have been working with them already, we will take a closer look at configuration options.

As with other objects, a deployment can be made from a YAML or JSON spec file. When added to the cluster, the controller will create a ReplicaSet and a Pod automatically. The containers, their settings and applications can be modified via an update, which generates a new ReplicaSet, which, in turn, generates new Pods.

The updated objects can be staged to replace previous objects as a block or as a rolling update, which is determined as part of the deployment specification. Most updates can be configured by editing a YAML file and running kubectl apply. You can also use kubectl edit to modify the in-use configuration. Previous versions of the ReplicaSets are kept, allowing a rollback to return to a previous configuration.

We will also talk more about labels. Labels are essential to administration in Kubernetes, but are not an API resource. They are user-defined key-value pairs which can be attached to any resource, and are stored in the metadata. Labels are used to query or select resources in your cluster, allowing for flexible and complex management of the cluster.

As a label is arbitrary, you could select all resources used by developers, or belonging to a user, or any attached string, without having to figure out what kind or how many of such resources exist.
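
As a hedged example of that flexibility (the label key and value are illustrative):

$ kubectl label pod firstpod owner=alice       # attach an arbitrary key-value pair
$ kubectl label node worker-2 owner=alice
$ kubectl get pods,nodes -l owner=alice        # select across object kinds at once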

Deployments
ReplicationControllers (RC) ensure that a specified number of pod
replicas is running at any one time. ReplicationControllers also give
you the ability to perform rolling updates. However, those updates are
managed on the client side. This is problematic if the client loses
connectivity, and can leave the cluster in an unplanned state. To
avoid problems when scaling the ReplicationControllers on the client
side, a new resource was introduced in the apps/v1 API group:
Deployments.
Deployments allow server-side updates to pods at a specified rate.
They are used for canary and other deployment patterns.
Deployments generate ReplicaSets, which offer more selection
features than ReplicationControllers, such as matchExpressions. See
the command and output below:

$ kubectl create deployment dev-web --image=nginx:1.13.7-alpine

deployment "dev-web" created

Object Relationship
Here you can see the relationship between objects from the
container, which Kubernetes does not directly manage, up to the
deployment.

Nested Objects

The boxes and shapes are logical, in that they represent the
controllers, or watch loops, running as a thread of the kube-controller-
manager. Each controller queries the kube-apiserver for the current
state of the object they track. The state of each object on a worker
node is sent back from the local kubelet.

The graphic in the upper left represents a container running nginx 1.11. Kubernetes does not directly manage the container. Instead, the kubelet daemon checks the pod specifications by asking the container engine, which could be Docker or cri-o, for the current status. The graphic to the right of the container shows a pod which represents a watch loop checking the container status. kubelet compares the current pod spec against what the container engine replies and will terminate and restart the pod if necessary.

A multi-container pod is shown next. While there are several names used, such as sidecar or ambassador, these are all multi-container pods. The names are used to indicate the particular reason for having a second container in the pod, instead of denoting a new kind of pod.

On the lower left we see a replicaSet. This controller will ensure you
have a certain number of pods running. The pods are all deployed
with the same podSpec, which is why they are called replicas. Should
a pod terminate or a new pod be found, the replicaSet will create or
terminate pods until the current number of running pods matches the
specifications. Any of the current pods could be terminated should the
spec demand fewer pods running.

The graphic in the lower right shows a deployment. This controller allows us to manage the versions of images deployed in the pods. Should an edit be made to the deployment, a new replicaSet is created, which will deploy pods using the new podSpec. The deployment will then direct the old replicaSet to shut down pods as the new replicaSet pods become available. Once the old pods are all terminated, the deployment terminates the old replicaSet and the deployment returns to having only one replicaSet running.

Deployment Details
On the previous page, we created a new deployment running a particular version of the nginx web server.

To generate the YAML file of the newly created objects, run the
following command:

$ kubectl get deployments,rs,pods -o yaml

Sometimes, a JSON output can make it more clear. Try this command:

$ kubectl get deployments,rs,pods -o json

Now we will look at the YAML output, which also shows default values
not passed to the object when created:

apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
Explanation of Objects:

 apiVersion: A value of v1 indicates this object is considered to be a stable resource. In this case, it is not the deployment. It is a reference to the List type.

 items: As the previous line is a List, this declares the list of items the command is showing.

 - apiVersion: The dash is a YAML indication of the first item of the list, which declares the apiVersion of the object as apps/v1. This indicates the object is considered stable. Deployments are an operator used in many cases.

 kind: This is where the type of object to create is declared; in this case, a deployment.

Deployment Configuration Metadata
Continuing with the YAML output, we see the next general block of output concerns the metadata of the deployment. This is where we would find labels, annotations, and other non-configuration information. Note that this output will not show all possible configuration. Many settings which are set to false by default are not shown, like podAffinity or nodeAffinity.

metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2017-12-21T13:57:07Z
  generation: 1
  labels:
    app: dev-web
  name: dev-web
  namespace: default
  resourceVersion: "774003"
  uid: d52d3a63-e656-11e7-9319-42010a800003

Information present in the deployment metadata:

 annotations: These values do not configure the object, but provide further information that could be helpful to third-party applications or administrative tracking. Unlike labels, they cannot be used to select an object with kubectl.

 creationTimestamp: Shows when the object was originally created. Does not update if the object is edited.

 generation: How many times this object has been edited, such as changing the number of replicas, for example.

 labels: Arbitrary strings used to select or exclude objects for use with kubectl, or other API calls. Helpful for administrators to select objects outside of typical object boundaries.

 name: This is a required string, which we passed from the command line. The name must be unique to the namespace.

 resourceVersion: A value tied to the etcd database to help with concurrency of objects. Any changes to the database will cause this number to change.

 uid: Remains a unique ID for the life of the object.

Deployment Configuration Spec
There are two spec declarations for the deployment. The first will modify the ReplicaSet created, while the second will pass along the Pod configuration.

spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: dev-web
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

Elements present in our example:

 spec: A declaration that the following items will configure the object being created.

 progressDeadlineSeconds: Time in seconds until a progress error is reported during a change. Reasons could be quotas, image issues, or limit ranges.

 replicas: As the object being created is a ReplicaSet, this parameter determines how many Pods should be created. If you were to use kubectl edit and change this value to two, a second Pod would be generated.

 revisionHistoryLimit: How many old ReplicaSet specifications to retain for rollback.

 selector: A collection of values ANDed together. All must be satisfied for the replica to match. Do not create Pods which match these selectors, as the deployment controller may try to control the resource, leading to issues.

 matchLabels: Set-based requirements of the Pod selector. Often found with the matchExpressions statement, to further designate where the resource should be scheduled.

 strategy: A header for values having to do with updating Pods. Works with the later listed type. Could also be set to Recreate, which would delete all existing pods before new pods are created. With RollingUpdate, you can control how many Pods are deleted at a time with the following parameters.

 maxSurge: Maximum number of Pods over the desired number of Pods to create. Can be a percentage, default of 25%, or an absolute number. This creates a certain number of new Pods before deleting old ones, for continued access.

 maxUnavailable: A number or percentage of Pods which can be in a state other than Ready during the update process.

 type: Even though listed last in the section, due to the level of white space indentation, it is read as the type of object being configured (e.g. RollingUpdate).

Deployment Configuration Pod Template
Next, we will take a look at a configuration template for the pods to be deployed. We will see some similar values.

template:
  metadata:
    creationTimestamp: null
    labels:
      app: dev-web
  spec:
    containers:
    - image: nginx:1.17.7-alpine
      imagePullPolicy: IfNotPresent
      name: dev-web
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
    dnsPolicy: ClusterFirst
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    terminationGracePeriodSeconds: 30

Explanation of Configuration Elements:

 template: Data being passed to the ReplicaSet to determine how to deploy an object (in this case, containers).

 containers: Key word indicating that the following items of this indentation are for a container.

 image: This is the image name passed to the container engine, typically Docker. The engine will pull the image and create the Pod.

 imagePullPolicy: Policy settings passed along to the container engine, about when and if an image should be downloaded or used from a local cache.

 name: The leading stub of the Pod names. A unique string will be appended.

 resources: By default, empty. This is where you would set resource restrictions and settings, such as a limit on CPU or memory for the containers (see the sketch after this list).

 terminationMessagePath: A customizable location of where to output success or failure information of a container.

 terminationMessagePolicy: The default value is File, which holds the termination message. It could also be set to FallbackToLogsOnError, which will use the last chunk of container log if the message file is empty and the container shows an error.

 dnsPolicy: Determines if DNS queries should go to coredns or, if set to Default, use the node's DNS resolution configuration.

 restartPolicy: Should the container be restarted if killed? Automatic restarts are part of the typical strength of Kubernetes.

 schedulerName: Allows for the use of a custom scheduler, instead of the Kubernetes default.

 securityContext: Flexible setting to pass one or more security settings, such as SELinux context, AppArmor values, and users and UIDs for the containers to use.

 terminationGracePeriodSeconds: The amount of time the Pod is given to shut down after a SIGTERM, before a SIGKILL is used to terminate the container.
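As a hedged illustration of the resources field mentioned above, requests and limits could be set like this (the values are illustrative, not from the course example):

      resources:
        requests:
          cpu: 250m
          memory: 64Mi
        limits:
          cpu: 500m
          memory: 128Mi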

Deployment Configuration Status
The status output is generated when the information is requested:

status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2017-12-21T13:57:07Z"
    lastUpdateTime: "2017-12-21T13:57:07Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-07-29T06:00:24Z"
    lastUpdateTime: "2021-07-29T06:00:33Z"
    message: ReplicaSet "test-5f6778868d" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 2
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2

The output above shows what the same deployment would look like if the number of replicas were increased to two. The times differ from when the deployment was first generated.

Explanation of Additional Elements:

 availableReplicas: Indicates how many replicas were configured by the ReplicaSet. This would be compared to the later value of readyReplicas, which would be used to determine if all replicas have been fully generated and without error.

 observedGeneration: Shows how often the deployment has been updated. This information can be used to understand the rollout and rollback situation of the deployment.

Scaling and Rolling Updates

The API server allows the configuration settings to be updated for most values. There are some immutable values, which may be different depending on the version of Kubernetes you have deployed.

A common update is to change the number of replicas running. If this number is set to zero, there would be no containers, but there would still be a ReplicaSet and Deployment. This is the backend process when a Deployment is deleted. See the commands and outputs presented below:

$ kubectl scale deploy/dev-web --replicas=4

deployment "dev-web" scaled

$ kubectl get deployments

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
dev-web   4/4     4            4           20s

Non-immutable values can be edited via a text editor, as well. Use edit to trigger an update. For example, to change the deployed version of the nginx web server to an older version, run this command (followed by the output):

$ kubectl edit deployment dev-web

....
containers:
- image: nginx:1.8    #<<---Set to an older version
  imagePullPolicy: IfNotPresent
  name: dev-web
....

This would trigger a rolling update of the deployment. While the deployment would show an older age, a review of the Pods would show a recent update and the older version of the web server application deployed.
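To watch the update progress, assuming the same deployment name, the kubectl rollout status subcommand can be used:

$ kubectl rollout status deployment/dev-web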
Deployment Rollbacks
With some of the previous ReplicaSets of a Deployment being kept,
you can also roll back to a previous revision by scaling up and down.
The number of previous configurations kept is configurable, and has
changed from version to version. Next, we will have a closer look at
rollbacks, using the --record option of the kubectl
create command, which allows annotation in the resource definition
(commands followed by the output).

$ kubectl create deploy ghost --image=ghost --record

$ kubectl get deployments ghost -o yaml

deployment.kubernetes.io/revision: "1"
kubernetes.io/change-cause: kubectl create deploy ghost --image=ghost --record

Should an update fail, due to an improper image version, for example, you can roll back the change to a working version with the kubectl rollout undo command (followed by the output):

$ kubectl set image deployment/ghost ghost=ghost:09 --all

$ kubectl rollout history deployment/ghost

deployments "ghost":
REVISION  CHANGE-CAUSE
1         kubectl create deploy ghost --image=ghost --record
2         kubectl set image deployment/ghost ghost=ghost:09 --all

$ kubectl get pods

NAME                     READY   STATUS             RESTARTS   AGE
ghost-2141819201-tcths   0/1     ImagePullBackOff   0          1m

$ kubectl rollout undo deployment/ghost ; kubectl get pods

NAME                     READY   STATUS    RESTARTS   AGE
ghost-3378155678-eq5i6   1/1     Running   0          7s

You can roll back to a specific revision with the --to-revision=2 option.
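For example, as a usage sketch of the option just mentioned:

$ kubectl rollout undo deployment/ghost --to-revision=2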

You can also edit a Deployment using the kubectl edit command.
You can also pause a Deployment, and then resume. See the
following two commands:

$ kubectl rollout pause deployment/ghost

$ kubectl rollout resume deployment/ghost

Please note that you can still do a rolling update on ReplicationControllers with the kubectl rolling-update command, but this is done on the client side. Hence, if you close your client, the rolling update will stop.

Using DaemonSets
A newer object to work with is the DaemonSet. This controller ensures
that a single pod exists on each node in the cluster. Every Pod uses
the same image. Should a new node be added, the DaemonSet
controller will deploy a new Pod on your behalf. Should a node be
removed, the controller will delete the Pod also.

The use of a DaemonSet ensures a particular container is always running. In a large and dynamic environment, it can be helpful to have a logging or metric generation application on every node without an administrator remembering to deploy that application.

Use kind: DaemonSet.

There are ways of affecting the kube-scheduler such that some nodes will not run a DaemonSet.
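A minimal sketch of such a manifest, assuming a hypothetical logging image, could look like this:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: log-agent
        image: fluentd:v1.14    # illustrative image, not from the course labs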

Labels
Part of the metadata of an object is a label. Though labels are not API
objects, they are an important tool for cluster administration. They
can be used to select an object based on an arbitrary string,
regardless of the object type. As of API version apps/v1,
a Deployment's label selector is immutable after it gets created.

Every resource can contain labels in its metadata. By default, creating a Deployment with kubectl create adds a label, as we saw in:

....
labels:
  pod-template-hash: "3378155678"
  run: ghost
....

You could then view labels in new columns (commands and outputs
below):

$ kubectl get pods -l run=ghost

NAME                     READY   STATUS    RESTARTS   AGE
ghost-3378155678-eq5i6   1/1     Running   0          10m

$ kubectl get pods -L run

NAME                     READY   STATUS    RESTARTS   AGE   RUN
ghost-3378155678-eq5i6   1/1     Running   0          10m   ghost
nginx-3771699605-4v27e   1/1     Running   1          1h    nginx

While you typically define labels in Pod templates and in the specifications of Deployments, you can also add labels on the fly (commands and output below):

$ kubectl label pods ghost-3378155678-eq5i6 foo=bar

$ kubectl get pods --show-labels

NAME                     READY   STATUS    RESTARTS   AGE   LABELS
ghost-3378155678-eq5i6   1/1     Running   0          11m   foo=bar,pod-template-hash=3378155678,run=ghost

For example, if you want to force the scheduling of a Pod on a specific node, you can use a nodeSelector in a Pod definition, add specific labels to certain nodes in your cluster, and use those labels in the Pod. See the following example:

....
spec:
  containers:
  - image: nginx
  nodeSelector:
    disktype: ssd
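For the selector above to match, a node needs the corresponding label, which can be added on the fly (the node name is illustrative):

$ kubectl label nodes worker-1 disktype=ssd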

Lab Exercises
Lab 7.1. Working with ReplicaSets
Lab 7.2. Working with DaemonSets
Lab 7.3. Rolling Updates and Rollbacks

Knowledge Check

Question 7.1
What Deployment value determines the number of duplicate Pods
deployed?

 A. label

 B. uid

 C. status

 D. replicas

Question 7.2
Which of the following is a header value having to do with updating
Pods?

 A. selector

 B. type

 C. strategy

 D. None of the above

Question 7.3
Which of the following metadata is used to select an object
with kubectl, based on an arbitrary string, regardless of the object
type?

 A. strategy

 B. uid

 C. replicas

 D. label

Question 7.4
Which of the following arguments do we pass to the kubectl
rollout command to view object revisions?

 A. history

 B. rollout

 C. undo

 D. redo

Question 7.5
What argument do we pass to the kubectl rollout command in
order to return to a previous version?

 A. history

 B. rollout

 C. undo

 D. redo

Correct answers: d,c,d,a,c


08. VOLUMES AND DATA

Introduction

Chapter Overview
In this session, we are going to talk about volumes and data, the persistent
storage available to Kubernetes. We have many backend options available
to us, from local storage, to Ceph, to dynamic provisioned storage from a
provider like Google or Amazon. When storage is given to a pod, all of the
containers inside of the pod have equal access, just like an IP address. It is
another way that containers can actually talk to each other, by writing to
shared storage, so the other container can then read it.
We’re going to talk about persistent volumes. A persistent volume is how we
make storage available to the cluster. Then, the persistent volume can be
used by an object, like a pod, through a persistent volume claim.
We will talk about the entire lifecycle: from ingesting the storage, to binding
it to a particular pod, to the various recycle options that you have once it’s
been detached from the pod.
We’ll also talk about passing pre-configured information to our pods through Secrets or ConfigMaps. A Secret is base64-encoded; it is not encrypted. To have encryption, you’d have to look at LUKS or some other type of at-rest encryption. A Secret is not encrypted data; it is only encoded. A ConfigMap is neither encrypted nor encoded; it’s just plain raw data that is very easily accessible. This can be handy for passing /etc/hosts files, /etc/resolv.conf files, or anything else that you might want a pod to have.
Let’s dig in!

Learning Objectives
By the end of this chapter, you should be able to:

 Understand and create persistent volumes.
 Configure persistent volume claims.
 Manage volume access modes.
 Deploy an application with access to persistent storage.
 Discuss the dynamic provisioning of storage.
 Configure secrets and ConfigMaps.
Volumes and Data

Overview
Container engines have traditionally not offered storage that outlives
the container. As containers are considered transient, this could lead
to a loss of data, or complex exterior storage options. A
Kubernetes volume shares the Pod lifetime, not the containers
within. Should a container terminate, the data would continue to be
available to the new container.

A volume is a directory, possibly pre-populated, made available to containers in a Pod. The creation of the directory, the backend storage of the data and the contents depend on the volume type. As of v1.13, there were 27 different volume types, ranging from rbd to gain access to Ceph, to NFS, to dynamic volumes from a cloud provider like Google's gcePersistentDisk. Each has particular configuration options and dependencies.

The adoption of the Container Storage Interface (CSI) enables the goal of an industry-standard interface for container orchestration to allow access to arbitrary storage systems. Currently, volume plugins are "in-tree", meaning they are compiled and built with the core Kubernetes binaries. The "out-of-tree" CSI approach will allow storage vendors to develop a single driver and allow the plugin to be containerized. This will replace the existing Flex plugin, which requires elevated access to the host node, a large security concern.

Should you want your storage lifetime to be distinct from a Pod, you
can use Persistent Volumes. These allow for empty or pre-populated
volumes to be claimed by a Pod using a Persistent Volume Claim,
then outlive the Pod. Data inside the volume could then be used by
another Pod, or as a means of retrieving data.

There are two API objects which exist to provide data to a Pod
already. Encoded data can be passed using a Secret and non-encoded
data can be passed with a ConfigMap. These can be used to pass
important data like SSH keys, passwords, or even a configuration file
like /etc/hosts.

Introducing Volumes
A Pod specification can declare one or more volumes and where they
are made available. Each requires a name, a type, and a mount point.
The same volume can be made available to multiple containers within
a Pod, which can be a method of container-to-container
communication. A volume can be made available to multiple Pods,
with each given an access mode to write. There is no concurrency
checking, which means data corruption is probable, unless outside
locking takes place.

Kubernetes Pod Volumes

A particular access mode is part of a Pod request. As a request, the user may be granted more, but not less access, though a direct match is attempted first. The cluster groups volumes with the same mode together, then sorts volumes by size, from smallest to largest. The claim is checked against each in that access mode group, until a volume of sufficient size matches. The three access modes are:

 ReadWriteOnce, which allows read-write by a single node
 ReadOnlyMany, which allows read-only by multiple nodes
 ReadWriteMany, which allows read-write by many nodes.

Thus two pods on the same node can write to a ReadWriteOnce, but a
third pod on a different node would not become ready due to a
FailedAttachVolume error.
When a volume is requested, the local kubelet uses
the kubelet_pods.go script to map the raw devices, determine and
make the mount point for the container, then create the symbolic link
on the host node filesystem to associate the storage to the container.
The API server makes a request for the storage to
the StorageClass plugin, but the specifics of the requests to the
backend storage depend on the plugin in use.

If a request for a particular StorageClass was not made, then the only parameters used will be access mode and size. The volume could come from any of the storage types available, and there is no configuration to determine which of the available ones will be used.

Volume Spec
One of the many types of storage available is an emptyDir. The
kubelet will create the directory in the container, but not mount any
storage. Any data created is written to the shared container space. As
a result, it would not be persistent storage. When the Pod is
destroyed, the directory would be deleted along with the container.

apiVersion: v1
kind: Pod
metadata:
  name: fordpinto
  namespace: default
spec:
  containers:
  - image: simpleapp
    name: gastank
    command:
    - sleep
    - "3600"
    volumeMounts:
    - mountPath: /scratch
      name: scratch-volume
  volumes:
  - name: scratch-volume
    emptyDir: {}

The YAML file above would create a Pod with a single container and a volume named scratch-volume, which would appear as the /scratch directory inside the container.

Volume Types
There are several types that you can use to define volumes, each with
their pros and cons. Some are local, and many make use of network-
based resources.

In GCE or AWS, you can use volumes of type gcePersistentDisk or awsElasticBlockStore, which allow you to mount GCE and EBS disks in your Pods, assuming you have already set up accounts and privileges.

emptyDir and hostPath volumes are easy to use. As mentioned, emptyDir is an empty directory that gets erased when the Pod dies, but survives container restarts. The hostPath volume mounts a resource from the host node filesystem. The resource could be a directory, file, socket, character device, or block device. These resources must already exist on the host to be used, unless the DirectoryOrCreate or FileOrCreate types are used, which create the resources on the host if they don't already exist.
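As a brief sketch, a hostPath volume declaration inside a Pod spec could look like this (the path and name are illustrative):

volumes:
- name: host-logs
  hostPath:
    path: /var/log
    type: DirectoryOrCreate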

NFS (Network File System) and iSCSI (Internet Small Computer System Interface) are straightforward choices for multiple-reader scenarios.

rbd for block storage, or CephFS and GlusterFS, if available in your Kubernetes cluster, can be a good choice for multiple-writer needs.

Besides the volume types we just mentioned, there are many others possible, with more being added: azureDisk, azureFile, csi, downwardAPI, fc (fibre channel), flocker, gitRepo, local, projected, portworxVolume, quobyte, scaleIO, secret, storageos, vsphereVolume, persistentVolumeClaim, CSIPersistentVolumeSource, etc.

CSI allows for even more flexibility and decoupling plugins without the
need to edit the core Kubernetes code. It was developed as a
standard for exposing arbitrary plugins in the future.

Shared Volume Example
The following YAML file creates a pod, exampleA, with two containers, both with access to a shared volume:

....
containers:
- name: alphacont
  image: busybox
  volumeMounts:
  - mountPath: /alphadir
    name: sharevol
- name: betacont
  image: busybox
  volumeMounts:
  - mountPath: /betadir
    name: sharevol
volumes:
- name: sharevol
  emptyDir: {}

Now, take a look at the following commands and outputs:

$ kubectl exec -ti exampleA -c betacont -- touch /betadir/foobar

$ kubectl exec -ti exampleA -c alphacont -- ls -l /alphadir

total 0
-rw-r--r-- 1 root root 0 Nov 19 16:26 foobar

You could use emptyDir or hostPath easily, since those types do not
require any additional setup, and will work in your Kubernetes
cluster.

Note that one container (betacont) wrote, and the other container
(alphacont) had immediate access to the data. There is nothing to
keep the containers from overwriting the other's data. Locking or
versioning considerations must be part of the containerized
application to avoid corruption.

Persistent Volumes and Claims
A persistent volume (pv) is a storage abstraction used to retain data longer than the Pod using it. Pods define a volume of type persistentVolumeClaim (pvc) with various parameters for size and possibly the type of backend storage known as its StorageClass. The cluster then attaches the persistentVolume.

Kubernetes will dynamically use volumes that are available, irrespective of their storage type, allowing claims to any backend storage.

Persistent Storage Phases:

1. Provision:

Provisioning can be from PVs created in advance by the cluster administrator, or requested from a dynamic source, such as the cloud provider.

2. Bind:

Binding occurs when a control loop on the control plane notices the PVC, containing an amount of storage, an access request, and, optionally, a particular StorageClass. The watcher locates a matching PV or waits for the StorageClass provisioner to create one. The PV must match at least the storage amount requested, but may provide more.

3. Use:

The use phase begins when the bound volume is mounted for the Pod
to use, which continues as long as the Pod requires.

4. Release:

Releasing happens when the Pod is done with the volume and an API request is sent, deleting the PVC. The volume remains in the state from when the claim is deleted until available to a new claim. Whether the resident data remains depends on the persistentVolumeReclaimPolicy.

5. Reclaim:

The reclaim phase has three options:

 Retain, which keeps the data intact, allowing for an administrator to handle the storage and data.
 Delete tells the volume plugin to delete the API object, as well as the storage behind it.
 The Recycle option runs an rm -rf /mountpoint and then makes it available to a new claim. With the stability of dynamic provisioning, the Recycle option is planned to be deprecated.

Note the following two commands:

$ kubectl get pv

$ kubectl get pvc


Persistent Volume
The following example shows a basic declaration of a Persistent
Volume using the hostPath type.

kind: PersistentVolume
apiVersion: v1
metadata:
  name: 10gpv01
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/somepath/data01"

Each type will have its own configuration settings. For example, an
already created Ceph or GCE Persistent Disk would not need to be
configured, but could be claimed from the provider.

Persistent volumes are not namespaced objects, but persistent volume claims are. A beta feature of v1.13 allows for static provisioning of Raw Block Volumes, which currently support the Fibre Channel, AWS EBS, Azure Disk and RBD plugins, among others.

The use of locally attached storage has been graduated to a stable feature. This feature is often used as part of distributed filesystems and databases.

Persistent Volume Claim
With a persistent volume created in your cluster, you can then write a manifest for a claim, and use that claim in your Pod definition. In the Pod, the volume uses the persistentVolumeClaim.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

In the Pod:

spec:
  containers:
  ....
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: myclaim

The Pod configuration could also be as complex as this:

volumeMounts:
- name: rbdpd
  mountPath: /data/rbd
volumes:
- name: rbdpd
  rbd:
    monitors:
    - '10.19.14.22:6789'
    - '10.19.14.23:6789'
    - '10.19.14.24:6789'
    pool: k8s
    image: client
    fsType: ext4
    readOnly: true
    user: admin
    keyring: /etc/ceph/keyring

Dynamic Provisioning
While handling volumes with a persistent volume definition and
abstracting the storage provider using a claim is powerful, a cluster
administrator still needs to create those volumes in the first place.
Starting with Kubernetes v1.4, Dynamic Provisioning allowed for the
cluster to request storage from an exterior, pre-configured source. API
calls made by the appropriate plugin allow for a wide range of
dynamic storage use.

The StorageClass API resource allows an administrator to define a persistent volume provisioner of a certain type, passing storage-specific parameters.

With a StorageClass created, a user can request a claim, which the API Server fills via auto-provisioning. The resource will also be reclaimed as configured by the provider. AWS and GCE are common choices for dynamic storage, but other options exist, such as a Ceph cluster or iSCSI. A single default class is possible via annotation.

Here is an example of a StorageClass using GCE:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast      # Could be any name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
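A claim could then request this class by name; a sketch, with an illustrative claim name and size:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: fast-claim
spec:
  storageClassName: fast
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi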

Using Rook for Storage Orchestration
In keeping with the decoupled and distributed nature of cloud technology, the Rook project allows orchestration of storage using multiple storage providers.

As with other agents of the cluster, Rook uses custom resource definitions (CRDs) and a custom operator to provision storage according to the backend storage type, upon API call.

Several storage providers are supported:

 Ceph
 Cassandra
 Network File System (NFS).

Secrets
Pods can access local data using volumes, but there is some data you
don't want readable to the naked eye. Passwords may be an example.
Using the Secret API resource, the same password could be encoded
or encrypted.

You can create, get, or delete secrets (see the following commands):

$ kubectl get secrets

Secrets can be encoded manually or via kubectl create secret:

$ kubectl create secret generic --help

$ kubectl create secret generic mysql --from-literal=password=root

A secret is not encrypted, only base64-encoded, by default. You must create an EncryptionConfiguration with a key and proper identity. Then, the kube-apiserver needs the --encryption-provider-config flag set to a previously configured provider, such as aescbc or kms. Once this is enabled, you need to recreate every secret, as they are encrypted upon write.

Multiple keys are possible. Each key for a provider is tried during
decryption. The first key of the first provider is used for encryption. To
rotate keys, first create a new key, restart (all) kube-apiserver
processes, then recreate every secret.
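A minimal sketch of such an EncryptionConfiguration, assuming the aescbc provider and a locally generated key (the key material shown is a placeholder):

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>   # e.g. head -c 32 /dev/urandom | base64
  - identity: {}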

You can see the encoded string inside the Secret with kubectl. The Secret will be decoded and presented as a string saved to a file. The file can then be consumed as an environment variable or in a new directory, similar to the presentation of a volume.

A secret can be made manually as well, then inserted into a YAML file
(see commands and outputs below):

$ echo LFTr@1n | base64

TEZUckAxbgo=

$ vim secret.yaml

apiVersion: v1
kind: Secret
metadata:
  name: lf-secret
data:
  password: TEZUckAxbgo=
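The file can then be ingested and inspected like any other manifest:

$ kubectl create -f secret.yaml

$ kubectl get secret lf-secret -o yaml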

Using Secrets via Environment Variables
A secret can be used as an environment variable in a Pod. You can see one being configured in the following example:
...
spec:
  containers:
  - image: mysql:5.5
    name: dbpod
    env:
    - name: MYSQL_ROOT_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mysql
          key: password

There is no limit to the number of Secrets used, but there is a 1MB limit to their size. Each secret occupies memory, along with other API objects, so very large numbers of secrets could deplete memory on a host.

They are stored in tmpfs storage on the host node, and are only sent to the host running the Pod. All volumes requested by a Pod must be mounted before the containers within the Pod are started. So, a secret must exist prior to being requested.

Mounting Secrets as Volumes
You can also mount secrets as files using a volume definition in a pod manifest. The mount path will contain a file whose name will be the key of the secret created with the kubectl create secret step earlier.

...
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    volumeMounts:
    - mountPath: /mysqlpassword
      name: mysql
    name: busy
  volumes:
  - name: mysql
    secret:
      secretName: mysql
Once the pod is running, you can verify that the secret is indeed
accessible in the container by running this command (followed by the
output):

$ kubectl exec -ti busybox -- cat /mysqlpassword/password

LFTr@1n

Portable Data with ConfigMaps
A similar API resource to Secrets is the ConfigMap, except the data is not encoded. In keeping with the concept of decoupling in Kubernetes, using a ConfigMap decouples a container image from configuration artifacts.

They store data as sets of key-value pairs or plain configuration files in any format. The data can come from a collection of files, or all files in a directory. It can also be populated from a literal value.
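For instance, assuming a local config.js file like the one referenced below, ConfigMaps could be created from a literal value, a single file, or a directory:

$ kubectl create configmap colors --from-literal=text=black

$ kubectl create configmap foobar --from-file=config.js

$ kubectl create configmap allfiles --from-file=./config-dir/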

A ConfigMap can be used in several different ways. A container can use the data as environmental variables from one or more sources. The values contained inside can be passed to commands inside the pod. A Volume or a file in a Volume can be created, including different names and particular access modes. In addition, cluster components like controllers can use the data.

Let's say you have a file on your local filesystem called config.js.
You can create a ConfigMap that contains this file.
The configmap object will have a data section containing the content
of the file (see the command and output below):

$ kubectl get configmap foobar -o yaml

kind: ConfigMap
apiVersion: v1
metadata:
  name: foobar
data:
  config.js: |
    {
...

ConfigMaps can be consumed in various ways:

 Pod environmental variables from single or multiple ConfigMaps
 Use ConfigMap values in Pod commands
 Populate Volume from ConfigMap
 Add ConfigMap data to specific path in Volume
 Set file names and access mode in Volume from ConfigMap data
 Can be used by system components and controllers.

Using ConfigMaps
Like secrets, you can use ConfigMaps as environment variables or
using a volume mount. They must exist prior to being used by a Pod,
unless marked as optional. They also reside in a specific namespace.

In the case of environment variables, your pod manifest will use the valueFrom key and the configMapKeyRef value to read the values. For instance:

env:
- name: SPECIAL_LEVEL_KEY
  valueFrom:
    configMapKeyRef:
      name: special-config
      key: special.how

With volumes, you define a volume with the configMap type in your
pod and mount it where it needs to be used:

volumes:
- name: config-volume
  configMap:
    name: special-config
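To make the data visible inside the container, the volume also needs a corresponding mount; a sketch with an illustrative mount path:

volumeMounts:
- name: config-volume
  mountPath: /etc/special-config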

Lab Exercises
Lab 8.1. Create a ConfigMap
Lab 8.2. Create a Persistent NFS Volume (PV)
Lab 8.3. Creating a Persistent Volume Claim (PVC)
Lab 8.4. Use a ResourceQuota to Limit PVC Count and Usage

Knowledge Check

Question 8.1
Applications must use persistent storage. True or False?

 A. True

 B. False

Question 8.2
Does a Deployment use a Persistent Volume or a Persistent Volume
Claim?

 A. Persistent Volume

 B. Persistent Volume Claim

Question 8.3
Which of the following settings determines what happens to persistent storage upon release?

 A. persistentVolumeReclaimPolicy

 B. releasePolicy

 C. StorageSettingK8S

 D. None of the above

Question 8.4
A Secret contains encrypted data. True or False?
 A. True

 B. False

Question 8.5
ConfigMaps can be created from _____________.

 A. Literal values

 B. Individual files

 C. Multiple files in the same directory

 D. All of the above

Correct answers: b,b,a,b,d

09. SERVICES
Introduction

Chapter Overview

In this session, we are going to talk about services, which are essential to the architecture of Kubernetes. We expect various objects to be transient. As a result, should a pod fail and be replaced, we need an agent capable of getting the traffic to the replacement pod. That’s part of what a service does for us. A service gets traffic from the outside world to a pod, or from one pod to another.
We have four different types of services, currently.
 There is the ClusterIP – this is meant to be an internal-facing IP
address that we can use for access, troubleshooting, and other
maintenance type of operations.
 We have a NodePort. This exposes a static port on each node, accessible to the outside world. Because it’s static, it can be very useful when you’re trying to go through a firewall to gain access to your Kubernetes cluster.
 LoadBalancer is similar, but is willing to spread the traffic to
multiple pods. Should one of the pods fail, when a replacement
becomes available, the LoadBalancer will send the traffic along
to it. This is a very flexible service that we implemented in an
earlier lab exercise.
 A newer service is called ExternalName. While not really
handling traffic, it’s an alias for interacting with DNS.
In this chapter, we will also talk about DNS usage by the cluster, and
get some understanding and interaction with local or remote DNS
services.
Let’s begin!

Learning Objectives
By the end of this chapter, you should be able to:

 Explain Kubernetes services.
 Expose an application.
 Discuss the service types available.
 Start a local proxy.
 Use the cluster DNS.

Services

Overview
As touched on previously, the Kubernetes architecture is built on the
concept of transient, decoupled objects connected together. Services
are the agents which connect Pods together, or provide access
outside of the cluster, with the idea that any particular Pod could be
terminated and rebuilt. Typically using Labels, the refreshed Pod is
connected and the microservice continues to provide the expected
resource via an Endpoint object. Google has been working on
Extensible Service Proxy (ESP), based off the nginx HTTP reverse
proxy server, to provide a more flexible and powerful object than
Endpoints, but ESP has not been adopted much outside of the Google
App Engine or GKE environments.

There are several different service types, with the flexibility to add
more, as necessary. Each service can be exposed internally or
externally to the cluster. A service can also connect internal resources
to an external resource, such as a third-party database.

The kube-proxy agent watches the Kubernetes API for new services
and endpoints being created on each node. It opens random ports
and listens for traffic to the ClusterIP:Port, and redirects the traffic
to the randomly generated service endpoints.

Services provide automatic load-balancing, matching a label query. While there is no configuration of this option, there is the possibility of session affinity via IP. Also, a headless service, one without a fixed IP nor load-balancing, can be configured.

Unique IP addresses are assigned and configured via the etcd database. Services implement iptables to route traffic, but could leverage other technologies to provide access to resources in the future.

Service Update Pattern
Labels are used to determine which Pods should receive traffic from a service. As we have learned, labels can be dynamically updated for an object, which may affect which Pods continue to connect to a service.

The default update pattern is for a rolling deployment, where new Pods are added, with different versions of an application, and, due to automatic load balancing, receive traffic along with previous versions of the application.

Should there be a difference in applications deployed, such that clients would have issues communicating with different versions, you may consider a more specific label for the deployment, which includes a version number. When the deployment creates a new replication controller for the update, the label would not match. Once the new Pods have been created, and perhaps allowed to fully initialize, we would edit the labels for which the Service connects. Traffic would shift to the new and ready version, minimizing client version confusion.
Accessing an Application with
a Service
The basic step to access a new service is to use kubectl. See the
following commands and outputs:

$ kubectl expose deployment/nginx --port=80 --type=NodePort

$ kubectl get svc

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP        18h
nginx        NodePort    10.0.0.112   <none>        80:31230/TCP   5s

$ kubectl get svc nginx -o yaml

apiVersion: v1
kind: Service
...
spec:
  clusterIP: 10.0.0.112
  ports:
  - nodePort: 31230
...

Open a browser to http://<Public-IP>:31230.

The kubectl expose command created a service for the nginx deployment. This service used port 80 and generated a random port on all the nodes. A particular port and targetPort can also be passed during object creation to avoid random values. The targetPort defaults to the port, but could be set to any value, including a string referring to a port on a backend Pod. Each Pod could have a different port, but traffic is still passed via the name. Switching traffic to a different port would maintain a client connection, while changing versions of software, for example.
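For example, explicit values could be requested at creation time (the target port here is illustrative):

$ kubectl expose deployment/nginx --port=80 --target-port=8000 --type=NodePort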

The kubectl get svc command gave you a list of all the existing
services, and we saw the nginx service, which was created with an
internal cluster IP.

The range of cluster IPs and the range of ports used for the random
NodePort are configurable in the API server startup options.
Services can also be used to point to a service in a different
namespace, or even a resource outside the cluster, such as a legacy
application not yet in Kubernetes.

Service Types
 ClusterIP: The ClusterIP service type is the default, and only provides access internally (except if manually creating an external endpoint). The range of ClusterIP addresses used is defined via an API server startup option.

 NodePort: The NodePort type is great for debugging, or when a static IP address is necessary, such as opening a particular address through a firewall. The NodePort range is defined in the cluster configuration.

 LoadBalancer: The LoadBalancer service was created to pass requests to a cloud provider like GKE or AWS. Private cloud solutions also may implement this service type if there is a cloud provider plugin, such as with CloudStack and OpenStack. Even without a cloud provider, the address is made available to public traffic, and packets are spread among the Pods in the deployment automatically.

 ExternalName: A newer service is ExternalName, which is a bit different. It has no selectors, nor does it define ports or endpoints. It allows the return of an alias to an external service. The redirection happens at the DNS level, not via a proxy or forward. This object can be useful for services not yet brought into the Kubernetes cluster. A simple change of the type in the future would redirect traffic to the internal objects (see the sketch after this list).
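A minimal sketch of an ExternalName service, assuming an external hostname of db.example.com:

apiVersion: v1
kind: Service
metadata:
  name: legacy-db
spec:
  type: ExternalName
  externalName: db.example.com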

The kubectl proxy command creates a local service to access a ClusterIP. This can be useful for troubleshooting or development work.

While we have talked about several service types, some build upon others. A Service is an operator running inside the kube-controller-manager, which sends API calls via the kube-apiserver to the network plugin (such as Calico) and the kube-proxy pods running on all nodes. The Service operator also creates an Endpoint operator, which queries for the ephemeral IP addresses of Pods with a particular label. These agents work together to manage firewall rules using iptables or ipvs.

Take a look at the image below. The ClusterIP service configures a persistent IP address and directs traffic sent to that address to the existing pod's ephemeral addresses. This only handles traffic inside the cluster.
When a request for a NodePort is made, the operator first creates a
ClusterIP. After the ClusterIP has been created, a high numbered port
is determined and a firewall rule is sent out so that traffic to the high
numbered port on any node will be sent to the persistent IP, which
then will be sent to the pod(s).

A LoadBalancer does not create a load balancer. Instead, it creates a NodePort and makes an async request to use a load balancer. If a listener sees the request, as found when using public cloud providers, one would be created. Otherwise, the status will remain Pending, as no load balancer has responded to the API call.

An ingress controller is a microservice running in a pod, listening to a high port on whichever node the pod may be running, which will send traffic to a Service based on the URL requested. It is not a built-in service, but is often used with services to centralize traffic to services. More on an ingress controller is found in a future chapter.

Services Diagram
The controllers of services and endpoints run inside the kube-
controller-manager and send API calls to the kube-apiserver. API calls
are then sent to the network plugin, such as calico-kube-controller,
which then communicates with agents on each node, such as calico-
node. Every kube-proxy is also sent an API call so that it can manage
the firewall locally. The firewall is often iptables or ipvs. The kube-
proxy mode is configured via a flag sent during initialization, such
as mode=iptables, and could also be IPVS or userspace.
Service Traffic

In the iptables proxy mode, kube-proxy continues to get updates from the API server for changes in Service and Endpoint objects, and updates rules for each object when created or removed.

The graphic above shows two workers, each with a replica of MyApp
running. A NodePort has been configured, which will direct traffic from
port 35001 to the ClusterIP and on to the ephemeral IP of the pod. All
nodes use the same firewall rule. As a result, you can connect to any
node, and Calico will get the traffic to a node which is running the
pod.

Overall Network View
An example of a multi-container pod with two services sending traffic to its ephemeral IP can be seen in the diagram below. The diagram also shows an ingress controller, which would typically be represented as a pod, but has a different shape to show that it is listening to a high numbered port of an interface and is sending traffic to a service. Typically, the service the ingress controller sends traffic to would be a ClusterIP, but the diagram shows that it would be possible to send traffic to a NodePort or a LoadBalancer.
Example of Cluster Networking

Local Proxy for Development
When developing an application or service, one quick way to check your service is to run a local proxy with kubectl. It will capture the shell, unless you place it in the background. When running, you can make calls to the Kubernetes API on localhost and also reach the ClusterIP services on their API URL. The IP and port where the proxy listens can be configured with command arguments.

Run a proxy (command and output below):


$ kubectl proxy

Starting to serve on 127.0.0.1:8001

Next, to access a ghost service using the local proxy, we could use the following URL, for example: http://localhost:8001/api/v1/namespaces/default/services/ghost.

If the service port has a name, the path will be http://localhost:8001/api/v1/namespaces/default/services/ghost:<port_name>.

DNS
DNS has been provided as CoreDNS by default as of v1.13. The use of CoreDNS allows for a great amount of flexibility. Once the container starts, it will run a server for the zones it has been configured to serve. Then, each server can load one or more plugin chains to provide other functionality. As with other microservices, clients would access it using a service, kube-dns.

The thirty or so in-tree plugins provide most common functionality, with an easy process to write and enable other plugins as necessary.

Common plugins can provide metrics for consumption by Prometheus, error logging, health reporting, and TLS to configure certificates for TLS and gRPC servers.

More can be found on the CoreDNS Plugins web page.

Verifying DNS Registration
To make sure that your DNS setup works well and that services get registered, the easiest way is to run a pod with a shell and network tools in the cluster, create a service to connect to the pod, then exec into it to do a DNS lookup.
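A quick sketch of such a check, assuming the busybox:1.28 image (whose nslookup is known to behave well):

$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default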

Troubleshooting of DNS uses typical tools such as nslookup, dig, nc, wireshark and more. The difference is that we leverage a service to access the DNS server, so we need to check labels and selectors in addition to standard network concerns.

Other steps, similar to any DNS troubleshooting, would be to check the /etc/resolv.conf file of the container, as well as Network Policies and firewalls. We will cover more on Network Policies in the Security chapter.

Lab Exercises
Lab 9.1. Deploy a New Service
Lab 9.2. Configure a NodePort
Lab 9.3. Working with CoreDNS
Lab 9.4. Use Labels to Manage Resources

Knowledge Check

Question 9.1
Which of the following are Kubernetes service types?

 A. ClusterIP

 B. NodePort

 C. LoadBalancer

 D. ExternalName

 E. All of the above

Question 9.2
Which Kubernetes agent watches the API server for configuration changes and iptables updates?

 A. kube-proxy

 B. kubeadm
 C. kubectl

 D. kubernetes

Question 9.3
Which of the following service types spreads packets among Pods in a
Deployment automatically?

 A. NodePort

 B. LoadBalancer

 C. PacketSpreader

 D. None of the above

Question 9.4
How can you start a local proxy, which is useful for development and
testing?

 A. kubectl proxy

 B. kubeadm proxy

 C. proxy-start

 D. None of the above

Correct answers: e,a,b,a

10. HELM
Introduction

Chapter Overview
In this session, we’re going to talk about Helm. Helm is like a package
manager for Kubernetes. There are three major components when working
with Helm.
There is a Chart. This is the template of what Helm should install. It would
declare the volumes used, policies, pods, and applications that should be
deployed.
The installation is done for us through a process called Tiller (used by Helm v2). Tiller then uses the chart to determine what to install and how to install it.
We interact with it through a command called “helm”. We begin by
initializing a database to keep track of what is installed, and how it’s
installed, and continue to use the helm command for the entire lifecycle of
the package.
The use of Helm can be very handy for installing complex applications with many parts, both ones that we might create ourselves and, by adding repositories, vendor-provided software, in a very easy-to-use manner.
Let’s begin!

Learning Objectives
By the end of this chapter, you should be able to:

 Examine easy Kubernetes deployments using the Helm package manager.
 Understand the Chart template used to describe what application to deploy.
 Discuss how Tiller creates the Deployment based on the Chart.
 Initialize Helm in a cluster.

Helm

Deploying Complex Applications
We have used Kubernetes tools to deploy simple containers and services. It is also necessary to have a canonical location for software. Helm is similar to a package manager like yum or apt, with a chart being similar to a package, in that it has the binaries, as well as the installation and removal scripts.

A typical containerized application will have several manifests: manifests for deployments, services, and ConfigMaps. You will probably also create some Secrets, Ingress rules, and other objects. Each of these will need a manifest.

With Helm, you can package all those manifests and make them
available as a single tarball. You can put the tarball in a repository,
search that repository, discover an application, and then, with a single
command, deploy and start the entire application, one or more times.

The tarballs can be collected in a repository for sharing. You can connect to multiple repositories of applications, including those provided by vendors.

You will also be able to upgrade or roll back an application easily from
the command line.

Helm v3
With the near complete overhaul of Helm, the processes and
commands have changed quite a bit. Expect to spend some time
updating and integrating these changes if you are currently using the
outdated Helm v2.

One of the most noticeable changes is the removal of the Tiller pod.
This was an ongoing security issue, as the pod needed elevated
permissions to deploy charts. The functionality is in the command
alone, and no longer requires initialization to use.

In version 2, an update to a chart and deployment used a 2-way strategic merge for patching. This compared the previous manifest to the intended manifest, but not possible edits done outside of helm commands. The third way now checked is the live state of objects.

Among other changes, software installation no longer generates a release name automatically. One must be provided, or the --generate-name option must be passed.

Chart Contents
A chart is an archived set of Kubernetes resource manifests that make up a distributed application. You can learn more from the Helm 3 documentation. Other charts exist and can be easily created, for example, by a vendor providing software. Charts are similar to the use of independent YUM repositories.
├── Chart.yaml
├── README.md
├── templates
│ ├── NOTES.txt
│ ├── _helpers.tpl
│ ├── configmap.yaml
│ ├── deployment.yaml
│ ├── pvc.yaml
│ ├── secrets.yaml
│ └── svc.yaml
└── values.yaml

Chart Components:
 Chart.yaml: The Chart.yaml file contains some metadata about the
Chart, like its name, version, keywords, and so on, in this case, for MariaDB.

 values.yaml: The values.yaml file contains keys and values that are used
to generate the release in your cluster. These values are replaced in the
resource manifests using the Go templating syntax.

 templates: The templates directory contains the resource manifests that make up this MariaDB application.

Templates
The templates are resource manifests that use the Go templating
syntax. Variables defined in the values file, for example, get injected
in the template when a release is created. In the MariaDB example we
provided, the database passwords are stored in a Kubernetes secret,
and the database configuration is stored in a Kubernetes ConfigMap.

We can see that a set of labels are defined in the Secret metadata
using the Chart name, Release name, etc. The actual values of the
passwords are read from the values.yaml file.

apiVersion: v1
kind: Secret
metadata:
  name: {{ template "fullname" . }}
  labels:
    app: {{ template "fullname" . }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    release: "{{ .Release.Name }}"
    heritage: "{{ .Release.Service }}"
type: Opaque
data:
  mariadb-root-password: {{ default "" .Values.mariadbRootPassword | b64enc | quote }}
  mariadb-password: {{ default "" .Values.mariadbPassword | b64enc | quote }}
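The corresponding entries in values.yaml could look like this (the key names match the template above; the values are placeholders, not from the actual chart):

mariadbRootPassword: changeme-root
mariadbPassword: changeme-user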

Chart Repositories and Hub
Repositories are currently simple HTTP servers that contain an index file and a tarball of all the Charts present. Prior to adding a repository, you can only search the Artifact Hub, using the helm search hub command.

$ helm search hub redis

You can interact with a repository using the helm repo commands (commands are followed by the output):

$ helm repo add bitnami https://charts.bitnami.com/bitnami

$ helm repo list

NAME      URL
bitnami   https://charts.bitnami.com/bitnami

Once you have a repository available, you can search for Charts based on keywords. Below, we search the newly added bitnami repository:

Once you find the chart within a repository, you can deploy it on your
cluster.

Deploying a Chart
To deploy a Chart, you can just use the helm install command. There may be several required resources for the installation to be successful, such as available PVs to match the chart's PVCs. Currently, the only way to discover which resources need to exist is by reading the README for each chart. This can be found by downloading the tarball and expanding it into the current directory. Once requirements are met and edits are made, you can install using the local files. Commands and output below:

$ helm fetch bitnami/apache --untar

$ cd apache/

$ ls
Chart.lock  Chart.yaml  README.md  charts  ci  files  templates  values.schema.json  values.yaml

$ helm install anotherweb .

You will be able to list the release, delete it, even upgrade it and roll
back.
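A sketch of that lifecycle, using the release name from the install above:

$ helm list

$ helm upgrade anotherweb .

$ helm rollback anotherweb 1

$ helm uninstall anotherweb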

The output of the deployment should be carefully reviewed. It often includes information on access to the applications within. If your cluster did not have a required cluster resource, the output is often the first place to begin troubleshooting.

Lab Exercises
Lab 10.1. Working with Helm and Charts

Knowledge Check

Question 10.1
Which of the following is the template that describes the application
to deploy, configurations, and dependencies?

 A. Chart

 B. Tiller

 C. Helm

 D. Repository

Question 10.2
A chart deployment output tells us about missing dependencies. True
or False?
 A. True

 B. False

Question 10.3
Which of the following is the agent that deploys objects based on a
chart?

 A. Chart

 B. Tiller

 C. Helm

 D. Repository

Question 10.4
What is a collection of charts called?

 A. Chart

 B. Tiller

 C. Helm

 D. Repository

Correct answers: a,a,b,d

11. INGRESS
Introduction

Chapter Overview
In this session, we’re going to talk about Ingress. In previous sessions, we
talked about services that allow us to expose a pod to other pods or to the
outside world. An Ingress Controller allows the same sort of activity, but is
more efficient.
Instead of individual services for each of the pods, we can set up a controller that handles all the traffic. Currently, there are two supported controllers: one for nginx, the other for GCE. HAProxy is being developed, but is not yet considered a supported and stable resource.
We're also going to talk about, once we've deployed a controller, how we create rules and how we set where the traffic should go.
Let’s begin!

Learning Objectives
By the end of this chapter, you should be able to:

 Discuss the difference between an Ingress Controller and a Service.
 Learn about nginx and GCE Ingress Controllers.
 Deploy an Ingress Controller.
 Configure an Ingress Rule.

Ingress

Overview
In an earlier chapter, we learned about using a Service to expose a containerized application outside of the cluster. Ingress Controllers and Rules perform the same function. The difference is efficiency. Instead of using lots of services, such as LoadBalancer, you can route traffic based on the request host or path. This allows for centralization of many services to a single point.

An Ingress Controller is different than most controllers, as it does not run as part of the kube-controller-manager binary. You can deploy multiple controllers, each with unique configurations. A controller uses Ingress Rules to handle traffic to and from outside the cluster.

There are many ingress controllers, such as GKE, nginx, Traefik, Contour and Envoy, to name a few. Any tool capable of reverse proxying should work. These agents consume rules and listen for associated traffic. An Ingress Rule is an API resource that you can create with kubectl. When you create that resource, it reprograms and reconfigures your Ingress Controller to allow traffic to flow from the outside to an internal service. You can leave a service as a ClusterIP type and define how the traffic gets routed to that internal service using an Ingress Rule.

Ingress Controller
An Ingress Controller is a daemon running in a Pod which watches
the /ingresses endpoint on the API server, which is found under
the networking.k8s.io/v1beta1 group for new objects. When a new
endpoint is created, the daemon uses the configured set of rules to
allow inbound connection to a service, most often HTTP traffic. This
allows easy access to a service through an edge router to Pods,
regardless of where the Pod is deployed.

Multiple Ingress Controllers can be deployed. Traffic should use annotations to select the proper controller. The lack of a matching annotation will cause every controller to attempt to satisfy the ingress traffic.

The Ingress Controller for Inbound Connections


nginx
Deploying an nginx controller has been made easy through the use of
provided YAML files, which can be found in
the ingress-nginx/docs/deploy GitHub repository.

This page has configuration files to configure nginx on several platforms, such as AWS, GKE, Azure, and bare metal, among others.

As with any Ingress Controller, there are some configuration requirements for proper deployment. Customization can be done via a ConfigMap, Annotations, or, for detailed configuration, a custom template:

 Easy integration with RBAC
 Uses the annotation kubernetes.io/ingress.class: "nginx"
 L7 traffic requires the proxy-real-ip-cidr setting
 Bypasses kube-proxy to allow session affinity
 Does not use conntrack entries for iptables DNAT
 TLS requires the host field to be defined.

Google Load Balancer Controller (GLBC)
There are several objects which need to be created to deploy the GCE Ingress Controller. YAML files are available to make the process easy. Be aware that several objects would be created for each service, and, currently, quotas are not evaluated prior to creation.

The GLBC Controller must be created and started first. Also, you must
create a ReplicationController with a single replica, three services for
the application Pod, and an Ingress with two hostnames and three
endpoints for each service. The backend is a group of virtual machine
instances, Instance Group.

Each path for traffic uses a group of like objects referred to as a pool.
Each pool regularly checks the next hop up to ensure connectivity.

The multi-pool path is:

Global Forwarding Rule -> Target HTTP Proxy -> URL map ->
Backend Service -> Instance Group
Currently, the TLS Ingress only supports port 443 and assumes TLS
termination. It does not support SNI, only using the first certificate.
The TLS secret must contain keys named tls.crt and tls.key.

Ingress API Resources
Ingress objects are now part of the networking.k8s.io API, but still a beta object. A typical Ingress object that you can POST to the API server is:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ghost
spec:
  rules:
  - host: ghost.192.168.99.100.nip.io
    http:
      paths:
      - backend:
          serviceName: ghost
          servicePort: 2368
        path: /
        pathType: ImplementationSpecific

You can manage ingress resources like you do pods, deployments, services, etc. See the following commands:

$ kubectl get ingress

$ kubectl delete ingress <ingress_name>

$ kubectl edit ingress <ingress_name>

Deploying the Ingress Controller
To deploy an Ingress Controller, it can be as simple as creating it with kubectl. The source for a sample controller deployment is available on GitHub. Here is the command to do so:

$ kubectl create -f backend.yaml


The result will be a set of pods managed by a replication controller
and some internal services. You will notice a default HTTP backend
which serves 404 pages. See the following command and the output:

$ kubectl get pods,rc,svc

NAME                                READY   STATUS    RESTARTS   AGE
po/default-http-backend-xvep8       1/1     Running   0          4m
po/nginx-ingress-controller-fkshm   1/1     Running   0          4m

NAME                      DESIRED   CURRENT   READY   AGE
rc/default-http-backend   1         1         0       4m

NAME                       CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/default-http-backend   10.0.0.212   <none>        80/TCP    4m
svc/kubernetes             10.0.0.1     <none>        443/TCP   77d

Creating an Ingress Rule

To expose an application with Ingress quickly, you can create a rule
similar to the one shown on the previous page. First, start a ghost
deployment and expose it with an internal ClusterIP service. See the
following commands:

$ kubectl run ghost --image=ghost

$ kubectl expose deployments ghost --port=2368

Next, define the Ingress rule itself. With the deployment exposed and
the rule in place, you should be able to access the application from
outside the cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ghost
spec:
  rules:
  - host: ghost.192.168.99.100.nip.io
    http:
      paths:
      - backend:
          service:
            name: ghost
            port:
              number: 2368
        path: /
        pathType: ImplementationSpecific
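To verify access (a sketch, assuming the Ingress Controller is serving
HTTP on 192.168.99.100), note that nip.io resolves the hostname to its
embedded IP address, so a simple curl should return the ghost
application:

$ curl http://ghost.192.168.99.100.nip.io/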

Multiple Rules
On the previous page, we defined a single rule. If you have multiple
services, you can define multiple rules in the same Ingress, each rule
forwarding traffic to a specific service.

- host: ghost.192.168.99.100.nip.io
  http:
    paths:
    - backend:
        service:
          name: external
          port:
            number: 80
....
- host: nginx.192.168.99.100.nip.io
  http:
    paths:
    - backend:
        service:
          name: internal
          port:
            number: 8080
....

Intelligent Connected Proxies

For more complex connections or resources, such as service
discovery, rate limiting, traffic management, and advanced metrics,
you may want to implement a service mesh.

A service mesh consists of edge and embedded proxies
communicating with each other and handling traffic based on rules
from a control plane. Various options are available, including Envoy,
Istio, and linkerd.

(Figure: Istio Service Mesh, retrieved from the Istio documentation)

Service Mesh Options:

 Envoy: Envoy is a modular and extensible proxy favored for its
open architecture and dedication to remaining unmonetized. It is
often used as a data plane under other tools of a service mesh.

 Istio: Istio is a powerful tool set which leverages Envoy proxies via a
multi-component control plane. It is built to be platform-independent,
and it can be used to make the service mesh flexible and
feature-filled.

 linkerd: linkerd is another service mesh, purposely built to be easy
to deploy, fast, and ultralight.
Lab Exercises

Lab 11.1. Service Mesh
Lab 11.2. Ingress Controller

Knowledge Check

Question 11.1
According to the kubernetes.io documentation, how many Ingress
Controllers are currently supported?

 A. One

 B. Two

 C. Three

Question 11.2
What is the main reason to use an Ingress Controller instead of
multiple services?

 A. It is fun

 B. Efficiency

 C. It is required

 D. All of the above

Question 11.3
Both L4 and L7 traffic can be configured. True or False?

 A. True
 B. False

Correct answers: c,b,a

12. SCHEDULING
Introduction

Chapter Overview
This session covers scheduling. We have a very flexible scheduler available
to us with Kubernetes. We can affect where pods will be deployed based on
several features which we set with labels. The kube-scheduler agent then
compares these labels with the labels of the nodes to determine where a
pod should be deployed. We have the ability to configure affinity and
anti-affinity.
For example, for higher availability, we may want to spread out our pods
so that they don't run on the same node. On the other hand, for
performance, we may want locality of data, with our pods running on the
same node as much as possible.
We can then taint a node, with a string declared for that node. Our pod
will then avoid the node, unless the pod has a toleration: another
declaration that allows it to run on a tainted node.
We can also set a nodeSelector and call out a particular node for a
pod to run on. In a large and diverse environment, this may not be the
easiest way to go about doing things, but, should you have very specialized
hardware needs, it is an option. As well, we can deploy our own custom
scheduler, and run it alongside the existing kube-scheduler agent.
Let's dig in!

Learning Objectives
By the end of this chapter, you should be able to:

 Learn how kube-scheduler schedules Pod placement.
 Use Labels to manage Pod scheduling.
 Configure taints and tolerations.
 Use podAffinity and podAntiAffinity.
 Understand how to run multiple schedulers.

Scheduling

kube-scheduler
The larger and more diverse a Kubernetes deployment becomes, the
more important the administration of scheduling becomes. The
kube-scheduler determines which nodes will run a Pod, using a
topology-aware algorithm.

Users can set the priority of a pod, which will allow preemption of
lower priority pods. The eviction of lower priority pods would then
allow the higher priority pod to be scheduled.

The scheduler tracks the set of nodes in your cluster, filters them
based on a set of predicates, then uses priority functions to score
or determine on which node each Pod should be scheduled. The Pod
specification as part of a request is sent to the kubelet on the node
for creation.

The default scheduling decision can be affected through the use of
Labels on nodes or Pods. Labels of podAffinity, taints, and pod
bindings allow for configuration from the Pod or the node perspective.
Some, like tolerations, allow a Pod to work with a node, even when
the node has a taint that would otherwise preclude a Pod being
scheduled.

Not all labels are drastic. Affinity settings may encourage a Pod to be
deployed on a node, but would deploy the Pod elsewhere if the node
was not available. Sometimes, documentation may use the
term require, but practice shows the setting to be more of a request.
As beta features, expect the specifics to change. Some settings will
evict Pods from a node should the required condition no longer be
true, such as requiredDuringSchedulingRequiredDuringExecution.

Other options, like a custom scheduler, need to be programmed and
deployed into your Kubernetes cluster.

Filtering (Predicates)
The scheduler goes through a set of filters, or predicates, to find
available nodes, then ranks each node using priority functions. The
node with the highest rank is selected to run the Pod.

predicatesOrdering = []string{CheckNodeConditionPred,
    GeneralPred, HostNamePred, PodFitsHostPortsPred,
    MatchNodeSelectorPred, PodFitsResourcesPred,
    NoDiskConflictPred, PodToleratesNodeTaintsPred,
    PodToleratesNodeNoExecuteTaintsPred,
    CheckNodeLabelPresencePred, checkServiceAffinityPred,
    MaxEBSVolumeCountPred, MaxGCEPDVolumeCountPred,
    MaxAzureDiskVolumeCountPred, CheckVolumeBindingPred,
    NoVolumeZoneConflictPred, CheckNodeMemoryPressurePred,
    CheckNodeDiskPressurePred, MatchInterPodAffinityPred}

The predicates, such as PodFitsHost or NoDiskConflict, are
evaluated in a particular and configurable order. In this way, a node
has the least amount of checks for new Pod deployment, which can
be useful to exclude a node from unnecessary checks if the node is
not in the proper condition.

For example, there is a filter called HostNamePred, also known
as HostName, which filters out nodes that do not match the
node name specified in the pod specification. Another predicate
is PodFitsResources, which makes sure that the available CPU and
memory can fit the resources required by the Pod.

The scheduler can be updated by passing a configuration of kind:
Policy, which can order predicates, give special weights to priorities,
and even set hardPodAffinitySymmetricWeight, which deploys Pods
such that if we set Pod A to run with Pod B, then Pod B should
automatically be run with Pod A.

Scoring (Priorities)

Priorities are functions used to weight resources. Unless Pod and
node affinity has been configured otherwise, the
SelectorSpreadPriority setting, which ranks nodes based on the
number of existing running pods, will select the node with the least
amount of Pods. This is a basic way to spread Pods across the cluster.
Other priorities can be used for particular cluster needs.
The ImageLocalityPriorityMap favors nodes which have already
downloaded container images. The total sum of image sizes is
compared, with the largest having the highest priority, but the check
does not consider the image about to be used.

Currently, there are more than ten included priorities, which range
from checking the existence of a label to choosing a node with the
most requested CPU and memory usage. You can view a list of
priorities at cp/pkg/scheduler/algorithm/priorities.

A stable feature as of v1.14 allows the setting of
a PriorityClass and assigning pods via the use
of priorityClassName settings. This allows users to preempt, or
evict, lower priority pods so that their higher priority pods can be
scheduled. The kube-scheduler determines a node where the pending
pod could run if one or more existing pods were evicted. If a node is
found, the low priority pod(s) are evicted and the higher priority pod
is scheduled. The use of a Pod Disruption Budget (PDB) is a way to
limit the number of pods preemption evicts, to ensure enough pods
remain running. The scheduler will remove pods even if the PDB is
violated if no other options are available.
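A minimal sketch of a PriorityClass, followed by a Pod field that
references it (the class name and value here are illustrative):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Used for the most critical workloads."

A Pod then requests the class via its specification:

spec:
  priorityClassName: high-priority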

Scheduling Policies
The default scheduler contains a number of predicates and priorities;
however, these can be changed via a scheduler policy file.

A short version is shown below:

"kind" : "Policy",
"apiVersion" : "v1",
"predicates" : [
{"name" : "MatchNodeSelector", "order": 6},
{"name" : "PodFitsHostPorts", "order": 2},
{"name" : "PodFitsResources", "order": 3},
{"name" : "NoDiskConflict", "order": 4},
{"name" : "PodToleratesNodeTaints", "order": 5},
{"name" : "PodFitsHost", "order": 1}
],
"priorities" : [
{"name" : "LeastRequestedPriority", "weight" : 1},
{"name" : "BalancedResourceAllocation", "weight" :
1},
{"name" : "ServiceSpreadingPriority", "weight" :
2},
{"name" : "EqualPriority", "weight" : 1}
],
"hardPodAffinitySymmetricWeight" : 10
}

Typically, you will configure a scheduler with this policy using
the --policy-config-file parameter, and define a name for this
scheduler using the --scheduler-name parameter. You will then have
two schedulers running and will be able to specify which scheduler to
use in the pod specification.

With multiple schedulers, there could be conflict in the Pod allocation.
Each Pod should declare which scheduler should be used. But, if
separate schedulers determine that a node is eligible because of
available resources and both attempt to deploy, causing the resource
to no longer be available, a conflict would occur. The current solution
is for the local kubelet to return the Pods to the scheduler for
reassignment. Eventually, one Pod will succeed and the other will be
scheduled elsewhere.
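For example, a Pod opts into a particular scheduler by name in its
specification (the scheduler name below is hypothetical, and must
match the --scheduler-name value the second scheduler was started
with):

spec:
  schedulerName: my-scheduler
  containers:
  - name: nginx
    image: nginx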

Pod Specification

Most scheduling decisions can be made as part of the Pod
specification, which contains several fields that inform scheduling,
namely:

 nodeName & nodeSelector: The nodeName and nodeSelector options
allow a Pod to be assigned to a single node or a group of nodes with
particular labels.

 affinity & anti-affinity: Affinity and anti-affinity can be used
to require or prefer which node is used by the scheduler. If using a
preference instead, a matching node is chosen first, but other nodes
would be used if no match is present.

 schedulerName: Should none of the options above meet the needs
of the cluster, there is also the ability to deploy a custom scheduler.
Each Pod could then include a schedulerName to choose which
scheduler to use.

 taints & tolerations: The use of taints allows a node to be
labeled such that Pods would not be scheduled for some reason, such
as the cp node after initialization. A toleration allows a Pod to ignore
the taint and be scheduled, assuming other requirements are met.

Specifying the Node Label
The nodeSelector field in a pod specification provides a
straightforward way to target a node or a set of nodes, using one or
more key-value pairs.

spec:
  containers:
  - name: redis
    image: redis
  nodeSelector:
    net: fast

Setting the nodeSelector tells the scheduler to place the pod on a
node that matches the labels. All listed selectors must be met, but the
node could have more labels. In the example above, any node with a
key of net set to fast would be a candidate for scheduling.
Remember that labels are administrator-created tags, with no tie to
actual resources: this node could have a slow network.

The pod would remain Pending until a node is found with the
matching labels.
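To make a node eligible for this example, an administrator could
apply the label with kubectl (the node name worker-1 is illustrative):

$ kubectl label nodes worker-1 net=fast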

The affinity/anti-affinity feature should be able to express everything
that can be expressed with nodeSelector.

Scheduler Profiles
Another way to configure the scheduler is via the use of scheduling
profiles. These profiles allow the configuration of extension points at
which plugins can be used.

An extension point is one of the twelve stages of scheduling at which
a plugin can be used to modify how that stage of the scheduler works:

 queueSort
 preFilter
 filter
 postFilter
 preScore
 score
 reserve
 permit
 preBind
 bind
 postBind
 multiPoint

There are quite a few plugins which are enabled, or can be enabled,
to affect how the scheduler chooses a node for a podSpec. You can
take a look at the current scheduling plugin options.

A scheduler can have more than one profile enabled at the same
time, which may remove the need to have multiple schedulers. Each
podSpec would need to declare which profile to use, or will use
the default-scheduler if a profile has not been declared.
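A minimal sketch of a configuration with two profiles, passed to
kube-scheduler via its --config flag (the second profile name and its
disabled plugins are illustrative; depending on the Kubernetes
release, the apiVersion may still be a beta version of
kubescheduler.config.k8s.io):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
- schedulerName: no-scoring-scheduler
  plugins:
    preScore:
      disabled:
      - name: '*'
    score:
      disabled:
      - name: '*'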

Pod Affinity Rules

Pods which may communicate a lot or share data may operate best if
co-located, which would be a form of affinity. For greater fault
tolerance, you may want Pods to be as separate as possible, which
would be anti-affinity. These settings are used by the scheduler based
on the labels of Pods that are already running. As a result, the
scheduler must interrogate each node and track the labels of running
Pods. Clusters larger than several hundred nodes may see significant
performance loss. Pod affinity rules use the In, NotIn, Exists,
and DoesNotExist operators.

Pod Affinity Rules:

 requiredDuringSchedulingIgnoredDuringExecution: The
use of requiredDuringSchedulingIgnoredDuringExecution means
that the Pod will not be scheduled on a node unless the following
operator is true. If the operator changes to become false in the future,
the Pod will continue to run. This could be seen as a hard rule.

 preferredDuringSchedulingIgnoredDuringExecution:
Similarly, preferredDuringSchedulingIgnoredDuringExecution will
choose a node with the desired setting before those without. If no
properly-labeled nodes are available, the Pod will execute anyway.
This is more of a soft setting, which declares a preference instead of a
requirement.

 podAffinity: With the use of podAffinity, the scheduler will try to
schedule Pods together.

 podAntiAffinity: The use of podAntiAffinity would cause the
scheduler to keep Pods on different nodes.
podAffinity Example

An example of affinity and podAffinity settings can be seen below.
This requires a particular label to be matched when the Pod starts,
but the label need not persist: the Pod keeps running if the label is
later removed. A topologyKey is required by the API and defines the
scope of co-location; using the node hostname co-locates Pods on the
same node.

spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: kubernetes.io/hostname

The Pod can be scheduled on a node running a Pod with a key label
of security and a value of S1. If this requirement is not met, the Pod
will remain in a Pending state.

podAntiAffinity Example

With podAntiAffinity, we can prefer to avoid nodes with a particular
label. In this case, the scheduler will prefer to avoid a node with a key
set to security and a value of S2, again scoped by the
required topologyKey.

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S2
      topologyKey: kubernetes.io/hostname

In a large, varied environment, there may be multiple situations to be
avoided. As a preference, this setting tries to avoid certain labels, but
will still schedule the Pod on some node. As the Pod will still run, we
can provide a weight to a particular rule. The weights can be declared
as a value from 1 to 100. The scheduler then tries to choose, or avoid,
the node with the greatest combined value.
Node Affinity Rules

Where Pod affinity/anti-affinity has to do with other Pods, the use
of nodeAffinity allows Pod scheduling based on node labels. This is
similar to, and may some day replace, the nodeSelector setting. The
scheduler will not look at other Pods on the system, but at the labels
of the nodes. This should have much less performance impact on the
cluster, even with a large number of nodes.

 Uses In, NotIn, Exists, DoesNotExist operators
 requiredDuringSchedulingIgnoredDuringExecution
 preferredDuringSchedulingIgnoredDuringExecution
 Planned for
future: requiredDuringSchedulingRequiredDuringExecution.

Until nodeSelector has been fully deprecated, both the selector and
required labels must be met for a Pod to be scheduled.

Node Affinity Example

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: diskspeed
            operator: In
            values:
            - quick
            - fast

The nodeAffinity prefers a node with the above rule, but the pod
would be scheduled even if there were no matching nodes. The rule
gives extra weight to nodes with a key of diskspeed with a value
of fast or quick.

Taints
A node with a particular taint will repel Pods without tolerations for
that taint. A taint is expressed as key=value:effect. The key and the
value are created by the administrator.

The key and value used can be any legal string, and this allows
flexibility to prevent Pods from running on nodes based on any
need. If a Pod does not have an existing toleration, the scheduler will
not consider the tainted node.
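For example, a taint can be applied, and later removed with a trailing
dash, using kubectl (the node name and key/value are illustrative):

$ kubectl taint nodes worker-1 server=ap-east:NoSchedule
$ kubectl taint nodes worker-1 server=ap-east:NoSchedule-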

Ways to Handle Pod Scheduling

1. NoSchedule: The scheduler will not schedule a Pod on this node,
unless the Pod has this toleration. Existing Pods continue to run,
regardless of toleration.

2. PreferNoSchedule: The scheduler will avoid using this node, unless
there are no untainted nodes for the Pod's toleration. Existing Pods
are unaffected.

3. NoExecute: This taint will cause existing Pods to be evacuated and
no future Pods scheduled. Should an existing Pod have a toleration, it
will continue to run. If the Pod's tolerationSeconds is set, it will
remain for that many seconds, then be evicted. Certain node issues
will cause the kubelet to add 300-second tolerations to avoid
unnecessary evictions.

If a node has multiple taints, the scheduler ignores those with
matching tolerations. The remaining unignored taints have their
typical effect.

The use of TaintBasedEvictions is still an alpha feature. The kubelet
uses taints to rate-limit evictions when the node has problems.

Tolerations

Tolerations are set on Pods and allow them to be scheduled on
tainted nodes. This provides an easy way to keep Pods off a node:
only those with a particular toleration would be scheduled.

An operator can be included in a Pod specification, defaulting
to Equal if not declared. The use of the operator Equal requires a
value to match. With the Exists operator, a value should not be
specified. If an empty key uses the Exists operator, it will tolerate
every taint. If there is no effect, but a key and operator are declared,
all effects are matched with the declared key.
tolerations:
- key: "server"
  operator: "Equal"
  value: "ap-east"
  effect: "NoExecute"
  tolerationSeconds: 3600

In the above example, the Pod will remain on a node tainted with a
key of server and a value of ap-east for 3600 seconds after the node
has been tainted with NoExecute. When the time runs out, the Pod
will be evicted.

Custom Scheduler
If the default scheduling mechanisms (affinity, taints, policies) are not
flexible enough for your needs, you can write your own scheduler. The
programming of a custom scheduler is outside the scope of this
course, but you may want to start with the existing scheduler code,
which can be found in the Scheduler repository on GitHub.

If a Pod specification does not declare which scheduler to use, the
standard scheduler is used by default. If the Pod declares a scheduler,
and that scheduler is not running, the Pod would remain in
a Pending state forever.

The end result of the scheduling process is that a pod gets a binding
that specifies which node it should run on. A binding is a Kubernetes
API primitive in the api/v1 group. Technically, without any scheduler
running, you could still schedule a pod on a node, by specifying a
binding for that pod.
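As a sketch, a Binding object that manually assigns a Pod to a node
might look like this (the Pod and node names are illustrative); it would
be POSTed to the pod's binding subresource:

apiVersion: v1
kind: Binding
metadata:
  name: mypod
target:
  apiVersion: v1
  kind: Node
  name: worker-1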

You can also run multiple schedulers simultaneously.

You can view the scheduler and other information with this command:

$ kubectl get events

Lab Exercises

Lab 12.1. Assign Pods Using Labels
Lab 12.2. Using Taints to Control Pod Deployment

Knowledge Check

Question 12.1
Multiple schedulers can be deployed at the same time. True or False?

 A. True

 B. False

Question 12.2
Labels and annotations are used for the same purpose. True or False?

 A. True

 B. False

Question 12.3
When a node has been tainted, what does a Pod require to be
deployed on that node?

 A. Annotation

 B. Permission

 C. Ability

 D. Toleration

Question 12.4
All taints cause Pods to stop running on a node. True or False?

 A. True

 B. False

Correct answers: a,b,d,b

13. LOGGING AND TROUBLESHOOTING
Introduction

Chapter Overview
