The Kubernetes Troubleshooting Handbook
Piotr
Published in ITNEXT · 16 min read · 1 day ago
Introduction
Debugging Kubernetes applications can feel like navigating a labyrinth. With its
distributed nature and myriad of components, identifying and resolving issues in
Kubernetes requires a robust set of tools and techniques.
In this post we explore techniques and tools for troubleshooting and debugging
Kubernetes. Whether you’re an experienced Kubernetes user or just getting started,
this guide offers practical insights into efficient debugging practices.
https://itnext.io/the-kubernetes-troubleshooting-handbook-7596a1fdf2ff 1/24
7/20/24, 4:06 PM The Kubernetes Troubleshooting Handbook | by Piotr | Jul, 2024 | ITNEXT
To analyze the lifecycle events of a pod, you can use the kubectl get and kubectl
describe commands.
The kubectl get command provides a high-level overview of the status of pods:
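A minimal sketch of that command, wrapped in a small shell helper. The helper name list_pods and its optional namespace argument are assumptions made for illustration:

```shell
# list_pods: print a status overview of the pods in a namespace.
# The namespace argument is optional and falls back to "default".
list_pods() {
  kubectl get pods --namespace "${1:-default}"
}
```

Run as a one-off, this is simply kubectl get pods --namespace default.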
Output:
This output shows the current status of each pod, which can help you identify pods
that need further investigation.
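To dig into a single pod, kubectl describe can be sketched the same way. The helper name describe_pod is hypothetical; web-server-pod is the example pod that appears in this guide's output snippets:

```shell
# describe_pod: show detailed state and recent events for one pod.
# $1 = pod name, $2 = namespace (optional, defaults to "default").
describe_pod() {
  kubectl describe pod "$1" --namespace "${2:-default}"
}
```

For example, describe_pod web-server-pod is equivalent to kubectl describe pod web-server-pod --namespace default.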
Output snippet:
Name:         web-server-pod
Namespace:    default
Node:         node-1/192.168.1.1
Start Time:   Mon, 01 Jan 2024 10:00:00 GMT
Labels:       app=web-server
Status:       Running
IP:           10.244.0.2
Containers:
  web-container:
    Container ID:   docker://abcdef123456
    Image:          nginx:latest
    State:          Running
      Started:      Mon, 01 Jan 2024 10:01:00 GMT
    Ready:          True
    Restart Count:  0
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  10m   default-scheduler  Successfully assigned default/web-
  Normal  Pulled     9m    kubelet, node-1    Container image "nginx:latest" alr
  Normal  Created    9m    kubelet, node-1    Created container web-container
  Normal  Started    9m    kubelet, node-1    Started container web-container
The Events section in the kubectl describe output provides a chronological log of
significant events that have occurred for the pod. These events can help you
understand the lifecycle transitions and identify issues such as:
Image Pull Errors: Failures in pulling container images can indicate network
issues or problems with the container registry.
Audit logs, on the other hand, are useful for ensuring compliance and security on
the cluster. They can show login attempts, pod privilege escalation, and more.
Kubernetes Events
Viewing Events
To view events in your cluster, use the kubectl get events command:
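A minimal sketch, assuming you want the events of one namespace in chronological order. The helper name list_events is hypothetical, and the --sort-by expression is an optional addition rather than part of the basic command:

```shell
# list_events: list the events of a namespace, oldest first.
# The namespace argument is optional and falls back to "default".
list_events() {
  kubectl get events --namespace "${1:-default}" --sort-by=.metadata.creationTimestamp
}
```

Dropping --sort-by leaves the plain kubectl get events call described above.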
Output example:
Filtering Events
You can filter events to focus on specific namespaces, resource types, or time
periods. For example, to view events related to a specific pod:
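One way to do this is a field selector on the object an event refers to. The sketch below assumes a hypothetical helper pod_events and uses the involvedObject fields of the Event resource:

```shell
# pod_events: show only the events whose involved object is the given pod.
# $1 = pod name, $2 = namespace (optional, defaults to "default").
pod_events() {
  kubectl get events --namespace "${2:-default}" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1"
}
```

For example, pod_events web-server-pod limits the list to events about that one pod.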
Describing Resources
The kubectl describe command includes events in its output, providing detailed
information about a specific resource along with its event history:
Output snippet:
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  10m   default-scheduler  Successfully assigned default/web-
  Normal  Pulled     9m    kubelet, node-1    Container image "nginx:latest" alr
  Normal  Created    9m    kubelet, node-1    Created container web-container
  Normal  Started    9m    kubelet, node-1    Started container web-container
To enable audit logging, configure the API server with the appropriate flags and audit
policy. Here’s an example of an audit policy configuration:
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["pods"]
  - level: RequestResponse
    users: ["admin"]
    verbs: ["update", "patch"]
    resources:
      - group: ""
        resources: ["configmaps"]
Specify the audit policy file and log file location when starting the API server:
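The two kube-apiserver flags involved are --audit-policy-file and --audit-log-path. The sketch below only prints them; both paths and the helper name audit_flags are assumptions, so adjust them to your control-plane layout:

```shell
# Both paths are assumptions; point them at wherever the policy file and
# the audit log should live on the control-plane node.
AUDIT_POLICY_FILE="/etc/kubernetes/audit-policy.yaml"
AUDIT_LOG_PATH="/var/log/kubernetes/audit/audit.log"

# audit_flags: print the flags to add to the kube-apiserver command line.
audit_flags() {
  printf '%s\n' \
    "--audit-policy-file=${AUDIT_POLICY_FILE}" \
    "--audit-log-path=${AUDIT_LOG_PATH}"
}
```

On kubeadm-based clusters these flags typically go into the static pod manifest of the API server, and the policy file and log directory also have to be mounted into that pod.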
Audit logs are typically written to a file. You can use standard log analysis tools to
view and filter the logs. Here’s an example of an audit log entry:
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "12345",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/pods",
  "verb": "create",
  "user": {
    "username": "admin",
    "groups": ["system:masters"]
  },
  "sourceIPs": ["192.168.1.1"],
  "objectRef": {
    "resource": "pods",
    "namespace": "default",
    "name": "web-server-pod"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
  "requestReceivedTimestamp": "2024-01-01T12:00:00Z",
  "stageTimestamp": "2024-01-01T12:00:01Z"
}
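As a rough illustration of filtering such a log with standard tools, the sketch below uses grep. It assumes the log file holds one compact JSON event per line (the entry above is pretty-printed for readability); the helper name audit_events_for is hypothetical, and a JSON-aware tool is more robust for anything beyond a quick look:

```shell
# audit_events_for: print audit events with a given verb and resource.
# $1 = audit log file, $2 = verb (e.g. create), $3 = resource (e.g. pods).
# Matches on the raw JSON text, so it relies on compact one-line events.
audit_events_for() {
  grep -F "\"verb\":\"$2\"" "$1" | grep -F "\"resource\":\"$3\""
}
```

For example, audit_events_for /var/log/kubernetes/audit/audit.log create pods lists pod creation events; the log path is an assumption.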
Kubernetes Dashboard
The Kubernetes Dashboard is a web-based UI that provides an easy way to manage
and troubleshoot your Kubernetes cluster. It allows you to visualize cluster resources,
deploy applications, and perform various administrative tasks.
Please refer to the Kubernetes documentation for details on installing and accessing
the dashboard:
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
The Dashboard provides various features to help manage and troubleshoot your
Kubernetes cluster:
1. Cluster Overview: View the overall status of your cluster, including nodes,
namespaces, and resource usage.
3. Services and Ingress: Manage services and ingress resources to control network
traffic.
5. Logs and Events: View logs and events for troubleshooting and auditing
purposes.
Monitoring resource usage helps you understand how your applications consume
resources and identify opportunities for optimization.
The kubectl top command shows the current CPU and memory usage of pods and
nodes.
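A minimal sketch of both forms. The helper names top_pods and top_nodes are hypothetical, and the commands rely on the Metrics API, which usually means metrics-server must be installed in the cluster:

```shell
# top_pods: current CPU and memory usage per pod in a namespace.
# The namespace argument is optional and falls back to "default".
top_pods() {
  kubectl top pods --namespace "${1:-default}"
}

# top_nodes: current CPU and memory usage per node.
top_nodes() {
  kubectl top nodes
}
```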
Example output:
Basic Usage
The simplest way to retrieve logs from a pod is to run the kubectl logs command
with the pod name and namespace. Here’s a basic example for a pod running
in the default namespace:
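A minimal sketch of that basic call. The helper name pod_logs is an assumption for illustration:

```shell
# pod_logs: print the logs of a pod.
# $1 = pod name, $2 = namespace (optional, defaults to "default").
pod_logs() {
  kubectl logs "$1" --namespace "${2:-default}"
}
```

For example, pod_logs web-server-pod runs kubectl logs web-server-pod --namespace default.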
This command fetches the logs from the first container in the specified pod. If your
pod has multiple containers, you need to specify the container name as well:
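For a multi-container pod, the container is selected with -c (long form --container). A sketch with a hypothetical helper container_logs:

```shell
# container_logs: print the logs of one named container in a pod.
# $1 = pod name, $2 = container name, $3 = namespace (optional).
container_logs() {
  kubectl logs "$1" --container "$2" --namespace "${3:-default}"
}
```

For example, container_logs web-server-pod web-container reads only the web-container logs.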
Following the log stream with the -f flag is particularly useful for monitoring logs
as your application runs and observing the output of live processes.
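Streaming can be sketched with the --follow flag (short form -f). The helper name follow_logs is hypothetical:

```shell
# follow_logs: stream a pod's logs until interrupted with Ctrl+C.
# $1 = pod name, $2 = namespace (optional, defaults to "default").
follow_logs() {
  kubectl logs --follow "$1" --namespace "${2:-default}"
}
```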
Some projects enhance log tailing with additional capabilities, for example stern.
If a pod has restarted, you can view the logs from the previous instance using the
--previous flag:
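A minimal sketch, with a hypothetical helper previous_logs:

```shell
# previous_logs: print the logs of the previous (terminated) container
# instance of a pod. $1 = pod name, $2 = namespace (optional).
previous_logs() {
  kubectl logs "$1" --previous --namespace "${2:-default}"
}
```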
This helps in understanding what caused the pod to restart by examining the logs
before the failure.
You can combine kubectl logs with other Linux commands to enhance your
debugging process. For example, to search for a specific error message in the logs,
you can use grep:
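A sketch of that pipeline. The helper name grep_logs and the case-insensitive match are choices made for illustration:

```shell
# grep_logs: search a pod's logs for a pattern, ignoring case.
# $1 = pod name, $2 = pattern, $3 = namespace (optional).
grep_logs() {
  kubectl logs "$1" --namespace "${3:-default}" | grep -i -- "$2"
}
```

For example, grep_logs web-server-pod error prints only the log lines that contain "error" in any letter case.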
Practical Tips
Log Rotation and Retention: Ensure that your application handles log rotation to
prevent the logs from consuming excessive disk space.
Structured Logging: Use structured logging (e.g., JSON format) to make it easier to
parse and analyze logs using tools like jq.
Basic Usage
To execute a command in a specific container within a pod, use the -c flag. Note
that this runs the command and returns immediately; it does not open a session
inside the container.
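A minimal sketch of a one-shot kubectl exec call. The helper name exec_in_container is hypothetical:

```shell
# exec_in_container: run one command in a named container and return.
# $1 = pod name, $2 = container name; remaining arguments are the command.
exec_in_container() {
  pod="$1"
  container="$2"
  shift 2
  kubectl exec "$pod" --container "$container" -- "$@"
}
```

For example, exec_in_container web-server-pod web-container ls /etc lists a directory inside the container and exits.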
One of the most common uses of kubectl exec is to open an interactive shell
session within a container. This allows you to run multiple commands interactively.
Here’s how to do it:
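A sketch of an interactive session. The helper name shell_into is an assumption, and /bin/sh is chosen because minimal images often ship without bash:

```shell
# shell_into: open an interactive shell in a pod's default container.
# $1 = pod name, $2 = namespace (optional, defaults to "default").
shell_into() {
  kubectl exec -it "$1" --namespace "${2:-default}" -- /bin/sh
}
```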
Suppose you need to inspect a configuration file inside the container. You can
use cat or any text editor available inside the container:
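A minimal sketch using cat. The helper name show_file is hypothetical, and the file path in the usage note is only an example:

```shell
# show_file: print a file from inside a container without opening a shell.
# $1 = pod name, $2 = absolute path of the file inside the container.
show_file() {
  kubectl exec "$1" -- cat "$2"
}
```

For an nginx-based pod this could be show_file web-server-pod /etc/nginx/nginx.conf.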
If a binary you need is missing inside a container, it’s easy to copy files to and
from containers using kubectl cp. For example, to copy a file from your local
machine to a container:
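A sketch of the local-to-container direction. The helper name copy_to_pod and the file names in the usage note are assumptions; note that kubectl cp needs the tar binary to be present in the container:

```shell
# copy_to_pod: copy a local file into a container.
# $1 = local path, $2 = pod name, $3 = destination path in the container,
# $4 = namespace (optional, defaults to "default").
copy_to_pod() {
  kubectl cp "$1" "${4:-default}/$2:$3"
}
```

For example, copy_to_pod ./debug.sh web-server-pod /tmp/debug.sh. Swapping the two arguments of kubectl cp copies in the other direction.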
Practical Tips
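Use the -i and -t Flags: The -i flag keeps stdin open so the session is interactive,
and the -t flag allocates a pseudo-terminal.
Run as a Specific User: kubectl exec has no flag for selecting a user; commands run
as the user the container was started with, so set that user through
securityContext.runAsUser in the pod spec or the USER instruction in the image, if
required.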
Most debugging techniques focus on the application level; however, it’s also
possible to debug a specific Kubernetes node using kubectl debug with a node target.
Node-level debugging is crucial for diagnosing issues that affect the Kubernetes
nodes themselves, such as resource exhaustion, misconfigurations, or hardware
failures.
This way the debugging Pod can access the root filesystem of the Node, mounted
at /host in the Pod.
Use the kubectl debug command to start a debugging session on a node. This
command creates a pod running a debug container on the specified node.
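A minimal sketch of that session, using the busybox image. The helper name debug_node is an assumption for illustration:

```shell
# debug_node: start an interactive debug pod on the given node.
# $1 = node name, as shown by "kubectl get nodes".
debug_node() {
  kubectl debug "node/$1" -it --image=busybox
}
```

Call it as debug_node <node-name>; the underlying command is kubectl debug node/<node-name> -it --image=busybox.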
Replace <node-name> with the name of the node you want to debug. The -it flag
opens an interactive terminal, and --image=busybox specifies the image to use for
the debug container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-netshoot
  labels:
    app: nginx-netshoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-netshoot
  template:
    metadata:
      labels:
        app: nginx-netshoot
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
        - name: netshoot
          image: nicolaka/netshoot
          command: ["/bin/bash"]
          args: ["-c", "while true; do ping localhost; sleep 60;done"]
Practical Tips
Set Restart Policies: Ensure that your pod specifications have appropriate restart
policies to handle different failure scenarios.
Automated Monitoring: Set up automated monitoring and alerting for critical issues
such as CrashLoopBackOff using Prometheus and Alertmanager.
Tool Availability: Allows the use of specialized tools that may not be present in
the application container.
Temporary Nature: These pods can be easily created and destroyed as needed,
without leaving residual impact on the cluster.
This command creates a debug pod using the netshoot image and opens an
interactive shell.
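One way to write such a command is kubectl run with the nicolaka/netshoot image. The helper name debug_pod and the default pod name tmp-shell are assumptions:

```shell
# debug_pod: start a throwaway pod from the netshoot image with an
# interactive shell; --rm deletes the pod when the shell exits.
# $1 = pod name (optional, defaults to "tmp-shell").
debug_pod() {
  kubectl run "${1:-tmp-shell}" --rm -it --image=nicolaka/netshoot -- /bin/bash
}
```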
Tool Availability: Ensure the debug container image includes all necessary tools for
troubleshooting, such as curl, netcat, nslookup, df, top, and others.
Let’s walk through an example of using a custom debug container for advanced
debugging tasks.
redis5 ~
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
Server: 10.96.0.10
Address:10.96.0.10#53
** server can't find my-db-service: NXDOMAIN
Inspect the logs of CoreDNS pods to identify any DNS resolution issues.
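A sketch of pulling those logs. It assumes CoreDNS runs in the kube-system namespace with the label k8s-app=kube-dns, as it does on kubeadm-based clusters; the helper name coredns_logs is hypothetical:

```shell
# coredns_logs: print the most recent log lines of the CoreDNS pods.
# $1 = number of lines per pod (optional, defaults to 50).
coredns_logs() {
  kubectl logs --namespace kube-system --selector k8s-app=kube-dns --tail="${1:-50}"
}
```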
Ensure that the service and endpoints exist and are correctly configured.
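A minimal sketch of that check, with a hypothetical helper check_service. An empty ENDPOINTS column usually means the service selector matches no ready pods:

```shell
# check_service: show a service and the endpoints behind it.
# $1 = service name, $2 = namespace (optional, defaults to "default").
check_service() {
  kubectl get service "$1" --namespace "${2:-default}"
  kubectl get endpoints "$1" --namespace "${2:-default}"
}
```

For example, check_service my-db-service. Once the service is in place, rerun nslookup my-db-service from the debug pod to confirm that the name resolves.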
Expected output:
Server: 10.96.0.10
Address:10.96.0.10#53
Name: my-db-service.default.svc.cluster.local
Address:10.96.0.11
Practical Tips
Stateful applications maintain state information across sessions and restarts, often
using persistent storage. Examples include databases, message queues, and other
applications that require data persistence.
Persistent Storage Issues: Problems with PVCs or PVs can lead to data loss or
unavailability.
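When a StatefulSet misbehaves, describing the StatefulSet and its PersistentVolumeClaims is a reasonable first step. A minimal sketch with a hypothetical helper describe_resource; my-mysql and data-my-mysql-0 are the example names used in this guide:

```shell
# describe_resource: describe any namespaced resource by kind and name.
# $1 = kind (e.g. statefulset, pvc), $2 = name, $3 = namespace (optional).
describe_resource() {
  kubectl describe "$1" "$2" --namespace "${3:-default}"
}
```

Here describe_resource statefulset my-mysql shows the controller's events, and describe_resource pvc data-my-mysql-0 shows whether the claim is Bound.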
Output snippet:
Name:       my-mysql
Namespace:  default
Selector:   app=my-mysql
Replicas:   3 desired | 3 total
...
Events:
  Type    Reason            Age   From                    Message
  ----    ------            ----  ----                    -------
  Normal  SuccessfulCreate  1m    statefulset-controller  create Pod my-mysql-0
  Normal  SuccessfulCreate  1m    statefulset-controller  create Pod my-mysql-1
  Normal  SuccessfulCreate  1m    statefulset-controller  create Pod my-mysql-2
Output snippet:
Name: data-my-mysql-0
Namespace: default
Status: Bound
Volume: pvc-1234abcd-56ef-78gh-90ij-klmnopqrstuv
...
Output snippet:
Output snippet:
Jaeger is an open-source, end-to-end distributed tracing tool that helps monitor and
troubleshoot transactions in complex distributed systems. Profiling with Jaeger can
provide insights into the performance of your microservices and help identify latency
issues.
You can install Jaeger in your Kubernetes cluster using the Jaeger Operator or Helm.
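A sketch of the Helm route, using the chart repository published by the Jaeger project. The release name jaeger, the helper name install_jaeger, and the default namespace are assumptions; which components the chart deploys depends on the chart version and values:

```shell
# install_jaeger: add the Jaeger chart repository and install a release.
# $1 = target namespace (optional, defaults to "default").
install_jaeger() {
  helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
  helm repo update
  helm install jaeger jaegertracing/jaeger --namespace "${1:-default}" --create-namespace
}
```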
Ensure your application is instrumented to send tracing data to Jaeger. This typically
involves adding Jaeger client libraries to your application code and configuring them
to report to the Jaeger backend.
Example in a Go application:
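import (
	"io"
	"log"

	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go"
	"github.com/uber/jaeger-client-go/config"
)

// initJaeger builds a tracer that samples every request, reports spans to
// the Jaeger agent, and registers itself as the global tracer.
// The caller should Close the returned io.Closer on shutdown to flush spans.
func initJaeger(service string) (opentracing.Tracer, io.Closer) {
	cfg := config.Configuration{
		ServiceName: service,
		Sampler: &config.SamplerConfig{
			Type:  jaeger.SamplerTypeConst, // "const": same decision for every trace
			Param: 1,                       // 1 = sample everything
		},
		Reporter: &config.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "jaeger-agent.default.svc.cluster.local:6831",
		},
	}
	tracer, closer, err := cfg.NewTracer()
	if err != nil {
		log.Fatalf("cannot initialize Jaeger tracer: %v", err)
	}
	opentracing.SetGlobalTracer(tracer)
	return tracer, closer
}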
Setting Up mirrord
Start a mirrord session to connect your local environment to your Kubernetes cluster.
mirrord connect
Swap Deployment:
Use mirrord to swap a deployment in your cluster with your local service.
This command redirects traffic, environment variables, and file operations from your
Kubernetes cluster to your local machine, allowing you to debug the service as if it
were running locally.
Once the mirrord session is set up, you can use your favorite debugging tools and
IDEs to debug the service running on your local machine.
Set Breakpoints: Use your IDE to set breakpoints and step through the code.
Make Changes: Make code changes and immediately see the effects without
redeploying to the cluster.
For a detailed example and more information on using mirrord for debugging,
read this blog post.
Additional Tools
In addition to the core Kubernetes commands and open-source tools, there are
several other tools available that can enhance your troubleshooting capabilities
across various categories. Here are a few noteworthy tools:
Closing Thoughts
Debugging Kubernetes applications can be a complex and challenging task, but with
the right tools and techniques, it becomes much more manageable.
Remember, effective debugging is not just about resolving issues as they arise but
also about proactive monitoring, efficient resource management, and a deep
understanding of your application’s architecture and dependencies.
By implementing the strategies and best practices outlined in this guide, you can
build a robust debugging framework that empowers you to quickly identify,
diagnose, and resolve issues, ensuring the smooth operation of your Kubernetes
deployments.
Thanks for taking the time to read this post. I hope you found it interesting and
informative.