By aiwebix

@aiwebix
Foundational Concepts

1. What is DevOps?

DevOps is a modern approach to building and delivering software that combines cultural
philosophies, practices, and tools to increase an organization's ability to deliver applications at
high velocity. It aims to break down the traditional walls or "silos" that exist between software
development teams (Dev) and IT operations teams (Ops).

Detailed Explanation: Think of a traditional restaurant. The chefs who cook the food
(Developers) work in the kitchen, and the waiters who serve the food to customers (Operations)
work in the dining area. If a customer complains that the food is cold, the waiters might blame
the chefs for slow cooking, and the chefs might blame the waiters for slow serving. Each team
works in its own silo, and the customer remains unhappy. This is the traditional, inefficient model
of software delivery.

DevOps makes the chefs and waiters work as one single, collaborative team with a shared goal:
delivering a great experience to the customer. They share ownership of the entire process.

●​ Cultural Philosophy: The team adopts a mindset of shared responsibility. If the food is
cold, it's the team's problem to solve together.
●​ Practices: They implement better processes, like creating a direct communication
channel between the kitchen and the dining floor.
●​ Tools & Automation: They use tools like food warmers (automation) and an efficient
kitchen-to-table routing system (a CI/CD pipeline) to ensure the software (hot, delicious
food) is delivered quickly and reliably.

The core business value of DevOps is to shorten the systems development life cycle, which
means faster time to market for new features, increased reliability and stability of the system,
and the ability to innovate more quickly than competitors.

2. How is DevOps different from the Agile methodology?

While they are highly complementary and often used together, they focus on different parts of the software delivery process.

Detailed Explanation: Agile is a project management methodology that is primarily focused on the software development process. Using our restaurant analogy, Agile is all about how the
kitchen is run. It's a way of organizing the chefs (developers) to cook the food efficiently. It
focuses on breaking down a large, complex meal into small, manageable dishes (called "sprints")
so the team can quickly adapt if the customer changes their order. Agile answers the question,
"How do we build the software efficiently and adapt to change?"

DevOps, on the other hand, is a broader philosophy that covers the entire delivery lifecycle,
from the moment an order is placed to the moment the food is delivered and even after. It
includes the kitchen's process (Agile) but also focuses on how the food is packaged, delivered to
the table, and how customer feedback is collected and sent back to the kitchen. DevOps answers
the bigger question, "How do we deliver and run the software reliably and quickly?"

Aspect     | Agile Methodology                                 | DevOps Philosophy
Scope      | Primarily the development lifecycle.              | The entire software delivery and operations lifecycle.
Focus      | Adaptive planning, iterative development.         | Automation, collaboration, and fast feedback.
Teams      | Collaboration within the dev team.                | Collaboration between Dev, Ops, and Security.
Goal       | To build the right software and adapt to change.  | To deliver and run that software reliably and at speed.
Core Unit  | The "sprint" or iteration.                        | The automated "CI/CD pipeline."

3. What are the core principles of DevOps?

The key principles are often summarized by the acronym CAMS:

●​ Culture: This is the most important and foundational principle. It's about changing the
organizational mindset from finger-pointing to shared ownership. It means fostering a
culture of collaboration between all teams (Dev, Ops, Security, Business). A key part of
this is creating psychological safety and adopting a blameless approach to failures.
When an incident occurs, the goal is to fix the systemic process that allowed the failure,
not to blame the individual who executed the command.
●​ Automation: This is the technical engine of DevOps. The goal is to automate everything
that is repetitive and prone to human error. This includes:
○​ CI/CD Pipelines: Automating the build, testing, and deployment of code.
○​ Infrastructure as Code (IaC): Automating the provisioning of servers and
infrastructure.
○​ Configuration Management: Automating the configuration of servers.
○​ Monitoring & Alerting: Automating the detection of issues.

Automation reduces manual work ("toil"), increases speed, and ensures consistency.
●​ Measurement: You can't improve what you can't measure. This principle is about making
data-driven decisions by collecting metrics on every part of the system. This includes:
○​ Pipeline Metrics: How long does a build take? What is the deployment failure
rate?
○​ Application Metrics: What is the application's error rate? What is the average
response time?
○​ Business Metrics: How does a new feature impact user engagement?

This data provides a crucial feedback loop that helps the team continuously improve.
●​ Sharing: This is about breaking down silos by encouraging open communication and the
sharing of knowledge, tools, and code across teams. When developers are encouraged to
look at monitoring dashboards and operations engineers are encouraged to look at the
application code, they both gain a better understanding of the entire system. This shared
context allows them to build better, more reliable software together.

4. What are the "Three Ways" of DevOps?

These are the founding principles from the book The Phoenix Project that describe the journey to
achieving a DevOps mindset. They provide a high-level framework for thinking about your
processes.

●​ The First Way (Systems Thinking): This means looking at the entire software delivery
process as one single system, from a business idea all the way to a feature running in
production and delivering value to the customer. The goal is to maximize the flow of
work through this system, moving from left (Development) to right (Operations) as
quickly and smoothly as possible. This involves identifying and removing bottlenecks,
just like a factory would optimize its assembly line.
●​ The Second Way (Amplify Feedback Loops): This is about creating fast and constant
feedback from the right side (Operations) back to the left side (Development). When a
problem happens in production, developers should know about it immediately, not days
later. This is achieved through comprehensive monitoring, automated testing that provides
instant results, and quick communication channels. The goal is to find and fix problems as
close to the source as possible, which makes them cheaper and easier to fix.
●​ The Third Way (Culture of Continual Experimentation and Learning): This is about
creating a high-trust culture where it's safe to experiment, take calculated risks, and learn
from failure. It means understanding that mastery comes from practice and repetition.
This culture encourages trying new things, institutionalizing learning through practices
like blameless post-mortems, and dedicating time to improvement (e.g., fixing technical
debt or improving automation).

5. What is a "blameless culture" and why is it important for DevOps?

A blameless culture is an environment where, when something goes wrong, the immediate
question is "What went wrong?" and "How can we prevent this from happening again?"—not
"Who did this?" It's a foundational element of DevOps because it focuses on fixing the system
and the process, not on blaming an individual.

This is critically important because DevOps encourages speed and frequent deployments, which
naturally increases the chance of errors. If people are afraid of being punished for making a
mistake, they will start to hide problems, avoid taking risks, and point fingers at other teams. This
creates a culture of fear that destroys collaboration and slows everything down. A blameless
culture fosters psychological safety, which encourages transparency, learning, and a sense of
shared ownership, ultimately making the entire system more reliable and resilient.

6. What is Site Reliability Engineering (SRE)? How does it relate to DevOps?

Site Reliability Engineering (SRE) is a discipline that takes the principles of software
engineering and applies them to operations problems. It can be thought of as a very specific,
prescriptive implementation of DevOps, originally developed at Google.

While DevOps is a broad philosophy about culture and collaboration ("what" and "why"), SRE
provides the specific practices and roles for "how" to do it. For example, SRE introduces
concrete concepts like:

●​ Service Level Objectives (SLOs): Setting specific, measurable reliability targets for a
service.
●​ Error Budgets: A data-driven way to balance reliability with the need to release new
features.
●​ Toil Automation: A commitment to automating manual, repetitive operational tasks.

Essentially, if DevOps is the goal of making Dev and Ops work together, SRE is a job description
for an engineer who achieves that goal using software engineering practices.

CI/CD (Continuous Integration / Continuous Delivery / Continuous Deployment)

7. What is Continuous Integration (CI)?


Continuous Integration (CI) is the practice where developers merge their code changes into a
central repository (like a Git main branch) frequently—often multiple times a day. Each time a
developer merges their code, an automated process kicks off that builds the software and runs a
suite of tests.

Think of it like multiple authors writing a book. Instead of each author writing their entire
chapter in isolation and then trying to combine them all at the end (which would be a nightmare
of conflicting storylines), CI is like having them add one paragraph at a time to a shared
document. After each new paragraph is added, an automated editor checks for spelling and
grammar mistakes. This allows the team to find and fix integration problems early and quickly,
leading to higher quality code and a more stable project.

8. What is Continuous Delivery (CD)?

Continuous Delivery (CD) is the logical next step after Continuous Integration. It's the practice
of automating the entire software release process. After the code is successfully built and tested
in the CI stage, the CD process automatically packages it and prepares it for release.

The result is a production-ready artifact that has passed all automated checks. With Continuous
Delivery, the operations team can deploy this new version to production with the push of a
button. The key idea is that the software is always in a releasable state, which dramatically
reduces the risk and anxiety of release days.

9. How is Continuous Deployment different from Continuous Delivery?

Continuous Deployment takes Continuous Delivery one step further by removing the manual
button push. In this practice, every single code change that passes the entire automated test suite
is automatically deployed to production without any human intervention.

This is the ultimate goal for many DevOps teams, but it requires a very high level of confidence
in the automated testing and monitoring systems. While Continuous Delivery ensures you can
release at any time, Continuous Deployment ensures you do release every time a change is ready.

For example, a team practicing Continuous Deployment might release new code to their users 50
times a day, automatically.

10. What is a CI/CD pipeline? What are its typical stages?

A CI/CD pipeline is the automated workflow that acts as the backbone of the entire DevOps
process. It's the set of steps that takes a developer's code from their machine all the way to the
production environment.

The typical stages are:

●​ Source Stage: This stage is triggered when a developer commits code to a version control
system like Git.
●​ Build Stage: The pipeline checks out the code, compiles it if necessary, and packages it
into a runnable artifact. For example, it might create a Docker image or a Java JAR file.
●​ Test Stage: The pipeline runs a series of automated tests against the artifact to ensure it's
high quality and doesn't have any bugs. This can include unit tests, integration tests, and
security scans.
●​ Deploy Stage: If all the tests pass, the artifact is deployed to an environment. This
usually starts with a staging or testing environment, and after further checks, it's promoted
to the production environment for users.
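
To make these stages concrete, here is a minimal sketch of such a pipeline in GitLab CI YAML. The job names, image tag, and script commands are illustrative assumptions, not part of any specific project:

```yaml
# Minimal illustrative pipeline (job names and commands are assumptions)
stages:
  - build
  - test
  - deploy

build-job:
  stage: build
  script:
    - docker build -t registry.example.com/myapp:$CI_COMMIT_SHORT_SHA .  # package the code into an artifact

test-job:
  stage: test
  script:
    - ./run-unit-tests.sh          # hypothetical test script

deploy-staging:
  stage: deploy
  environment: staging
  script:
    - ./deploy.sh staging          # hypothetical deployment script
```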

11. Name some popular CI/CD tools.

●​ Jenkins: The most popular open-source automation server. It's incredibly powerful and
flexible due to its massive library of plugins, but this flexibility can also make it complex
to set up and manage.
●​ GitLab CI/CD: A CI/CD tool that is tightly integrated into the GitLab platform. It's
known for its power and simplicity, using a single YAML file (.gitlab-ci.yml) for
configuration and providing built-in features like a container registry.
●​ GitHub Actions: A CI/CD tool that is natively integrated into GitHub. It's event-driven,
meaning you can trigger workflows on any GitHub event (like a push, a pull request, or a
new issue). It has a large marketplace of reusable "actions" that make it easy to build
complex pipelines.
●​ CircleCI: A popular cloud-based CI/CD tool known for its speed, performance, and
easy-to-use YAML configuration.

12. What is a Jenkinsfile?

A Jenkinsfile is a text file that contains the definition of a Jenkins pipeline. It is checked into the
source code repository alongside the application code. This is the core of "pipeline as code."

Instead of configuring a pipeline by clicking around in the Jenkins user interface, you define all
the stages, steps, and logic of your pipeline in this file using a Groovy-based language. There are
two types of syntax:

●​ Declarative: A newer, more structured, and simpler syntax that is easier to read and
write.
●​ Scripted: An older, more flexible syntax that offers more power for complex pipelines.

Using a Jenkinsfile is the standard best practice because it makes your pipeline version-controlled, reviewable, and reusable.

13. How would you design a CI/CD pipeline for a microservices architecture?

The key principle for a microservices pipeline is that each service must be independently
deployable. To achieve this without creating a management nightmare, you would design a
reusable pipeline template.

Instead of creating a unique, handcrafted pipeline for each of the dozens or hundreds of services,
you create a master template that defines the standard stages (build, test, scan, deploy). In
Jenkins, this is done with a Shared Library. In GitLab CI, this is done with an include keyword
to import a template. In GitHub Actions, this is a Reusable Workflow.

Each microservice then has its own repository and a very small pipeline configuration file that
simply calls this reusable template. This approach keeps the core pipeline logic centralized and
easy to update, while giving each service team the autonomy to release their service whenever
they need to.
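
As a sketch of this pattern in GitLab CI (the template project path, file name, and variable are hypothetical), each microservice repository's pipeline file can be reduced to a few lines that pull in the shared template:

```yaml
# .gitlab-ci.yml in one microservice repository (paths and names are assumptions)
include:
  - project: 'platform/pipeline-templates'   # hypothetical repo holding the shared template
    ref: main
    file: 'service-pipeline.yml'

variables:
  SERVICE_NAME: 'payments-api'               # hypothetical service-specific setting
```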

Git and Version Control

14. What is Git? Explain the difference between git pull and git fetch.

Git is a distributed version control system, which means it's a tool for tracking changes in code
and coordinating work among multiple people.

●​ git fetch: This command connects to a remote repository (like GitHub) and downloads all
the new changes, but it does not integrate any of them into your current working files. It's
like going to the post office to pick up your mail; you have the letters in your hand, but
you haven't opened them yet. This is a safe way to review what others have done without
affecting your own work.
●​ git pull: This command is actually two commands in one: git fetch followed by git merge.
It downloads all the new changes and then immediately tries to merge them into your
current branch. Using the same analogy, this is like having the mail delivered directly to
your desk and automatically opened and sorted into your current paperwork.

15. What is the difference between git rebase and git merge?

Both commands are used to integrate changes from one branch into another, but they do so in
very different ways, resulting in a different project history.

●​ git merge is like taking two separate storylines and tying them together with a new
concluding chapter. It takes the commits from two branches and creates a new, special
"merge commit" that combines them. This preserves the original history of both branches
exactly as they happened, but it can make the project history look complex and
graph-like.
●​ git rebase is like taking one storyline and rewriting it as if it happened after the other. It
takes all the commits from your current branch, temporarily saves them, and then replays
them one by one on top of the target branch. This creates a perfectly linear and clean
history, as if the work was done in a straight line. However, because it rewrites the
commit history, it should only be used on private, local branches before sharing your
work.

16. What is git cherry-pick?

git cherry-pick is a command that allows you to select a single commit from one branch and
apply it onto another branch.

Imagine you have a bug fix on a feature branch that you also need on your main production
branch right now, but you're not ready to merge the entire feature branch. git cherry-pick allows
you to go into the feature branch, "pick" just that one bug fix commit, and apply it to your main
branch. It's a surgical way to move a specific change between branches.

17. What is a Git branching strategy? Compare GitFlow and Trunk-Based Development.

A branching strategy is a set of rules and conventions that a team agrees on for how they will
create, name, and merge branches in Git.

●​ GitFlow is a very structured and complex strategy that is well-suited for projects with
scheduled, versioned releases (like traditional software). It uses multiple long-lived
branches, including main for stable releases and develop for integrating features. It also
uses temporary branches for features, releases, and hotfixes. This makes it robust but can
be slow and complex for modern web applications.
●​ Trunk-Based Development is a much simpler strategy where all developers work on a
single main branch, often called trunk or main. Any new work is done on very
short-lived feature branches that are merged back into the main trunk quickly, often
within a day. This model is essential for practicing true CI/CD because it ensures the main
branch is always up-to-date and ready to be released.

18. Explain the purpose of a .gitignore file.

A .gitignore file is a simple text file where you list files and directories that Git should
intentionally ignore. Any file or directory listed in it will not be tracked by Git, meaning it won't
be staged or committed to the repository.

Every project generates files that are specific to a local environment or are temporary build
artifacts (like log files, dependency folders such as node_modules, or compiled code).
Committing these files to a shared repository would cause unnecessary conflicts and bloat. The
.gitignore file prevents this. Most importantly, it's used to ignore files containing secrets and
credentials (like .env files). Committing secrets is a major security risk, and using .gitignore is
the first line of defense against it.

Infrastructure as Code (IaC) and Configuration Management

19. What is "Infrastructure as Code" (IaC)?

Infrastructure as Code (IaC) is the practice of managing and provisioning your IT infrastructure (servers, networks, databases, etc.) through code and definition files, rather than
through manual configuration in a web console.

Imagine building a house with a detailed architectural blueprint instead of just telling workers to
"build a wall here." The blueprint is your code. This approach makes your infrastructure
repeatable, consistent, and scalable. You can store your infrastructure code in Git, review
changes with your team, and automatically deploy it, just like you would with application code.

●​ Tools: Terraform (cloud-agnostic), AWS CloudFormation (AWS-specific), Pulumi.

20. What is Configuration Management?

Configuration Management is a process for ensuring that your servers and software are
maintained in a desired, consistent state. It's a subset of IaC that focuses on what happens on a
server after it has been created.

For example, you might use an IaC tool like Terraform to create a new virtual server. Then, you
would use a Configuration Management tool like Ansible to log into that server and perform
tasks like installing a web server, configuring its settings, creating user accounts, and starting
services. It's about managing the software and settings on the machine itself.

●​ Tools: Ansible (agentless, uses YAML), Puppet (agent-based), Chef (agent-based).

21. What is the difference between imperative and declarative IaC tools?

This describes the two main approaches to writing automation code.

●​ Imperative (How): An imperative approach is like giving someone turn-by-turn directions. You define the exact steps or commands needed to get to the desired state. For
example, "First, create a server. Second, check if a web server is installed. Third, if it's
not, install it." Ansible is primarily imperative.
●​ Declarative (What): A declarative approach is like telling a GPS your destination
address. You define the desired final state, and the tool is responsible for figuring out the
steps to get there. For example, "I want one server of this type with a web server
installed." Terraform and Kubernetes are declarative. The tool will check the current
state and automatically figure out if it needs to create, update, or delete resources to
match your declaration. The industry has largely shifted towards declarative tools for
infrastructure provisioning because they are more robust.

22. What is Terraform state? Why is it important?

Terraform state is a file (by default, terraform.tfstate) that acts as Terraform's brain. It's a JSON
file that keeps track of the infrastructure it manages. It's critically important because it maps the
resources defined in your code to the real-world resources that have been created in your
cloud provider.

Without the state file, Terraform would have no memory of what it created. For team
collaboration, the state file must be stored in a remote, shared location (like an AWS S3
bucket). This is called remote state. This remote location should also support state locking,
which is a crucial feature that prevents multiple people from running Terraform at the same time
and corrupting the state.

23. What is Ansible? Why is it popular?

Ansible is a configuration management and automation tool. It's extremely popular for two main
reasons:

1.​ Agentless: Unlike many other tools, Ansible does not require you to install any special
software (an "agent") on the servers you want to manage. It communicates with them
over standard, existing protocols like SSH for Linux and WinRM for Windows. This
makes it incredibly easy and fast to set up.
2.​ Simple (YAML): Ansible's automation workflows, called "playbooks," are written in
YAML, which is a simple, human-readable language. This lowers the barrier to entry,
allowing system administrators and developers who aren't expert programmers to write
powerful automation.
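
As an illustration, a minimal playbook might look like the following; the host group and package are assumptions used only for this sketch, and it would be run with ansible-playbook against an inventory:

```yaml
# playbook.yml — a minimal sketch (host group and package are assumptions)
- name: Configure web servers
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Ensure nginx is running and starts on boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```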

24. What is a "Golden Image"?

A "Golden Image" is a perfectly configured, standardized template for a virtual machine (VM)
or a container. It's like a master cookie cutter that you use to create all your new servers.

Instead of starting with a bare operating system and configuring it every time, you create a
golden image that already has the OS hardened, all necessary security patches applied, and
common tools like monitoring and logging agents pre-installed. This ensures that every new
server you create is consistent, secure, and ready to go much faster. This process should be
automated using a tool like HashiCorp's Packer, which allows you to define the image as code.

Containers and Orchestration

25. What is a container? How is it different from a virtual machine (VM)?

A container is a lightweight, standalone package that contains an application and all its
dependencies. It's like a standardized shipping container for software. A Virtual Machine (VM),
on the other hand, is a complete emulation of a physical computer.

The key difference is in what they virtualize. A VM virtualizes the hardware, so each VM needs
its own full copy of an operating system. Containers virtualize the operating system itself, so
multiple containers can share the same host OS kernel. This makes containers much smaller
(megabytes vs. gigabytes for VMs), faster to start (seconds vs. minutes), and more efficient in
resource usage.

26. What is Docker?

Docker is the most popular platform for developing, shipping, and running applications inside
containers. It provides a simple set of tools and a standardized format that made containers easy
to use. The Dockerfile is a text file that acts as a recipe, containing all the instructions needed to
build a Docker image, which is the template for creating a container.

27. What is a container orchestration tool? Why do you need one?

A container orchestrator is a tool that automates the deployment, management, scaling, and
networking of containers. When you move from running a few containers on your laptop to
running hundreds or thousands of containers in production across multiple servers, you need an
orchestrator to manage the complexity.

You need one to handle critical tasks like:

●​ High Availability: Automatically restarting a container if it fails.
●​ Scalability: Automatically adding or removing containers based on traffic.
●​ Load Balancing: Distributing network traffic across all your containers.
●​ Zero-Downtime Deployments: Performing rolling updates to your application.

28. What is Kubernetes (K8s)?

Kubernetes (often abbreviated as K8s) is the leading open-source container orchestration platform. It was originally developed by Google and is now the industry standard for running
containerized applications at scale. It provides a powerful, declarative framework for managing
distributed systems resiliently.

29. Can you explain the main components of a Kubernetes cluster?

A Kubernetes cluster has two main parts:

●​ Control Plane (The Brain): This is the set of master nodes that manage the state of the
cluster. Its components include the kube-apiserver (the front door to the cluster), etcd (the
cluster's database), kube-scheduler (which decides which node a new application should
run on), and kube-controller-manager (which keeps everything running correctly).
●​ Worker Nodes (The Muscle): These are the machines where your actual applications
run. Each worker node has a kubelet (an agent that communicates with the control plane),
kube-proxy (which handles networking), and a Container Runtime (like Docker) to run
the containers.

30. What is a Pod in Kubernetes?

A Pod is the smallest and most basic deployable object in Kubernetes. It represents a single
instance of a running application. A Pod can contain one or more containers, but they are all
managed as a single unit. Containers within the same Pod are always located on the same worker
node and share the same network and storage resources.
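
For reference, a minimal Pod manifest looks like this; the image and port are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
    - name: web
      image: nginx:1.27        # assumed example image
      ports:
        - containerPort: 80
```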

31. What is the difference between a Deployment and a StatefulSet?

●​ A Deployment is the standard way to run a stateless application, like a web server or an
API. It manages a set of identical, interchangeable Pods (replicas) and can handle tasks
like rolling updates and scaling. If a Pod in a Deployment dies, Kubernetes simply
replaces it with a new, identical one.
●​ A StatefulSet is used to run stateful applications, like a database, where each instance is
unique and not interchangeable. It provides important guarantees like a stable, unique
network identity for each Pod and stable, persistent storage that is tied to that identity.

32. What is a Service in Kubernetes?

A Service is a Kubernetes object that provides a stable network endpoint (a single, consistent IP
address and DNS name) for a set of Pods. Pods in Kubernetes are ephemeral—they can be
created and destroyed, and their IP addresses change. A Service solves this problem by creating a
reliable way for other applications to connect to a group of Pods, and it also automatically
balances traffic across them.
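
A minimal Service manifest that selects Pods labelled app: web (matching the Pod sketch above) could look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # traffic is balanced across all Pods carrying this label
  ports:
    - port: 80          # stable port exposed by the Service
      targetPort: 80    # port the containers actually listen on
```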

33. What is an Ingress in Kubernetes?

An Ingress is an API object that manages external access to the services in a cluster, primarily
for HTTP and HTTPS traffic. While a Service provides internal access, an Ingress is what
exposes your application to the outside world.

It acts as a smart router or reverse proxy. It can handle tasks like routing traffic to different
services based on the requested URL (e.g., /api goes to the API service, / goes to the web service)
and managing SSL/TLS certificates. To work, an Ingress requires an Ingress Controller (like
Nginx or Traefik) to be running in the cluster.
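
A simplified Ingress that routes by path might look like the sketch below; the host name and backend Service names are assumptions, and it presumes an NGINX Ingress Controller is installed in the cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx          # requires an NGINX Ingress Controller in the cluster
  rules:
    - host: app.example.com        # assumed host name
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service  # assumed backend Service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service  # assumed backend Service
                port:
                  number: 80
```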

34. What are Persistent Volumes (PV) and Persistent Volume Claims (PVC)?

These are the two objects that manage storage in Kubernetes.

●​ Persistent Volume (PV): This is a piece of storage in the cluster, like an AWS EBS
volume or a Google Persistent Disk. It's an administrator-level resource that represents
the available storage.
●​ Persistent Volume Claim (PVC): This is a request for storage made by a user or an
application. A Pod uses a PVC to request a specific size and type of storage. Kubernetes
then binds this PVC to an available PV. This system decouples the application's need for
storage from the underlying storage infrastructure.
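
A typical PVC request might look like this (the size and storage class are illustrative); a Pod then references the claim by name in its volumes section:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce         # mountable read-write by a single node
  resources:
    requests:
      storage: 10Gi         # assumed size
  storageClassName: standard   # assumed class; triggers dynamic provisioning of a matching PV
```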

35. What is Helm?

Helm is often called "the package manager for Kubernetes." It's a tool that simplifies the process
of installing and managing complex Kubernetes applications.

An application might require many different Kubernetes objects to run. Helm packages all of
these YAML definitions into a single, versioned unit called a Chart. This Chart can be easily
shared and installed with a single command. Helm also allows you to use templates, so you can
customize a Chart for different environments (like dev and prod) without duplicating all the files.
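
As a sketch of that customization, a per-environment values file overrides a Chart's defaults and is passed at install time (for example, helm install web ./web-chart -f values-prod.yaml). The keys shown are common conventions and depend entirely on the specific Chart:

```yaml
# values-prod.yaml — hypothetical overrides (keys depend on the chart)
replicaCount: 3
image:
  repository: registry.example.com/web
  tag: "1.4.2"
resources:
  requests:
    cpu: 250m
    memory: 256Mi
```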

Monitoring, Logging, and Alerting

36. Why is monitoring crucial in DevOps?

Monitoring is the eyes and ears of your system. In a fast-paced DevOps environment where
changes are deployed frequently, you need immediate feedback on the health and performance
of your applications and infrastructure.

Effective monitoring allows you to:

●​ Proactively detect issues: Find and resolve problems before they impact your users.
●​ Understand deployment impact: See immediately if a new release has improved or
degraded performance.
●​ Make data-driven decisions: Use metrics to decide when to scale your infrastructure or
optimize your application.
●​ Ensure reliability: Track your performance against your SLOs and SLAs.

37. What is the difference between monitoring and logging?

●​ Logging is about recording discrete, timestamped events. A log is a detailed record of something that happened at a specific point in time, like "User X failed to log in" or
"Error: Database connection timed out." Logs are essential for debugging and auditing.
●​ Monitoring is about collecting and analyzing metrics, which are numeric data points
measured over time. Metrics give you a high-level view of your system's health, like
"CPU usage is at 80%" or "Average application response time is 200ms." Monitoring is
used for alerting and trend analysis.

38. What are the "Three Pillars of Observability"?

Observability is a more advanced concept than monitoring. It's a measure of how well you can
understand a system's internal state just by observing its external outputs. The three pillars that
provide this understanding are:

●​ Metrics: Aggregated numeric data over time. They tell you what is happening (e.g., CPU
is high).
●​ Logs: Detailed, timestamped records of events. They tell you why it's happening (e.g., a
specific error message is flooding the logs).
●​ Traces: A detailed view of a single request as it flows through all the different services in
a distributed system. They tell you where the problem is happening (e.g., which specific
microservice is slow).

39. What are some popular monitoring tools (the "Prometheus Stack")?

The open-source stack built around Prometheus is the de facto standard for cloud-native
monitoring.

●​ Prometheus: A powerful time-series database and monitoring system. It works by "pulling" or "scraping" metrics from instrumented applications and infrastructure at
regular intervals.
●​ Grafana: A leading visualization tool. It connects to Prometheus (and many other data
sources) to create rich, interactive dashboards for visualizing your metrics.
●​ Alertmanager: The component that handles alerts sent by Prometheus. It can
deduplicate, group, and route alerts to the correct notification channels, like Slack,
PagerDuty, or email.
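
To illustrate how these pieces connect, here is a sketch of a Prometheus alerting rule that Alertmanager would route; the metric name and threshold are assumptions based on common conventions:

```yaml
groups:
  - name: api-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05   # assumed metric name
        for: 10m                      # condition must hold for 10 minutes before firing
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 5% for 10 minutes"
```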

40. What is the ELK Stack?

The ELK Stack is a popular open-source solution for centralized logging.

●​ Elasticsearch: A powerful search and analytics engine based on Lucene. It's where all the
log data is stored and indexed.
●​ Logstash: A data processing pipeline that can ingest data from various sources, transform
or enrich it, and then send it to a destination like Elasticsearch.
●​ Kibana: A visualization tool that allows you to explore, search, and create dashboards for the log data stored in Elasticsearch. (Note: Today, this is often called the Elastic Stack, and a lightweight data shipper called Beats is often used to send data to Elasticsearch.)

41. What are SLOs, SLAs, and SLIs?

These three acronyms are the foundation of SRE and are used to define and measure reliability.

●​ SLI (Service Level Indicator): An SLI is a quantitative measure of your service's performance. It's a direct metric, like request latency or error rate.
●​ SLO (Service Level Objective): An SLO is the internal target you set for an SLI. For
example, "99.9% of requests will be served in under 300ms." This is the goal your team
works towards.
●​ SLA (Service Level Agreement): An SLA is a formal contract with your customers
that includes one or more SLOs and specifies the consequences (like financial penalties)
if you fail to meet them. This is an external-facing promise.

In short: You measure SLIs to see if you're meeting your internal SLOs, which helps you stay in
compliance with your external SLAs.

DevSecOps and Cloud

42. What is DevSecOps?

DevSecOps is the philosophy of integrating security practices into every phase of the DevOps
lifecycle. The goal is to "shift left," which means moving security considerations to the earliest
stages of development, rather than treating security as a final gate at the end of the process. It's
about making security a shared responsibility of the entire team and automating security checks
throughout the CI/CD pipeline.

43. What are some security practices you would integrate into a CI/CD
pipeline?

●​ SAST (Static Application Security Testing): Automatically scanning the source code
for security vulnerabilities (like the OWASP Top 10) before it's even compiled.
●​ DAST (Dynamic Application Security Testing): Automatically testing the running
application for vulnerabilities after it's deployed to a staging environment.
●​ SCA (Software Composition Analysis): Automatically scanning for known
vulnerabilities in the open-source libraries and dependencies that your project uses.
●​ Container Scanning: Automatically scanning your Docker images for known
vulnerabilities in the OS packages and libraries they contain.
●​ Secret Management: Scanning the code to ensure no secrets (like API keys or
passwords) have been accidentally hardcoded.
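
As one hedged example of wiring such checks into a pipeline, GitLab ships ready-made CI templates for several of these scanners (which scanners are available depends on your GitLab tier); including them adds the corresponding jobs to an existing pipeline:

```yaml
# Adds SAST and secret-detection jobs to an existing GitLab pipeline
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml
```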

44. What is the "Shared Responsibility Model" in the cloud?

This is a critical security concept when using a cloud provider like AWS, Azure, or GCP. It
defines which security tasks are handled by the cloud provider and which are handled by you, the
customer.

In general, the provider is responsible for the security of the cloud. This includes the physical
security of the data centers, the hardware, and the core networking and virtualization
infrastructure. The customer is responsible for security in the cloud. This includes configuring
your network security (like security groups), managing user access (IAM), encrypting your data,
and securing your applications.

45. Why might a company in India choose a multi-cloud strategy?

A multi-cloud strategy is chosen for several key reasons:

●​ Avoiding Vendor Lock-in: To prevent over-reliance on a single cloud provider and maintain negotiating power.
●​ Leveraging Best-of-Breed Services: To use the best service for a specific job from
different clouds (e.g., Google's AI/ML services with AWS's data warehousing).
●​ Compliance and Data Sovereignty: To meet regulatory requirements, like those in
India's Digital Personal Data Protection (DPDP) Act, by hosting certain data in data
centers located within the country.
●​ Resilience and Disaster Recovery: To build a more resilient architecture that can
withstand a major outage in a single cloud provider's region.

46. How would you handle secrets in a Kubernetes cluster?

The most secure and recommended practice is to integrate Kubernetes with an external,
dedicated secret management system like HashiCorp Vault or AWS Secrets Manager.

While Kubernetes has a built-in Secret object, it only Base64 encodes the data by default, which
is not truly encrypted. An external vault provides strong encryption, fine-grained access control
policies, and detailed audit logs. You would use a tool like the Secrets Store CSI driver, which
allows a Pod to authenticate with the external vault and securely mount the secrets into the
container as a temporary file or environment variable at runtime. This ensures the secrets are
never stored in the Kubernetes database at all.
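
To see why the built-in object is not enough on its own, note that a native Kubernetes Secret only Base64-encodes its data, as in this sketch; the value decodes trivially back to plain text:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  password: cGFzc3dvcmQxMjM=   # just base64("password123"), not encryption
```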

47. You have a critical security vulnerability in a library used by a production service. What do you do?

This scenario requires a swift, systematic response known as a "hotfix." The process is:

1.​ Assess: Immediately assess the impact and exploitability of the vulnerability to determine
its urgency.
2.​ Patch: Work with the development team to update the vulnerable library to a patched,
secure version.
3.​ Test & Deploy: Push the fix through the full, automated CI/CD pipeline. It is critical not
to skip tests, as the new library version could introduce breaking changes.
4.​ Hotfix: Deploy the patched version to production immediately as a high-priority release.
5.​ Verify: Monitor the service closely after deployment to confirm that the fix is effective
and has not negatively impacted application performance.

Advanced and Situational Questions

48. A developer tells you "It works on my machine." How do you solve this for
good?

The permanent solution to this classic problem is containerization with Docker.

This phrase indicates an inconsistency between the developer's local environment and the testing
or production environment. This can be due to differences in OS versions, system libraries, or
language runtimes. Docker solves this by using a Dockerfile to define the exact environment for
the application. This file is used to build a Docker image, which is a self-contained, immutable
template. Because the container packages the application with its environment, you create a
consistent and portable unit that runs the exact same way everywhere.

49. How would you troubleshoot a CrashLoopBackOff error for a Pod in Kubernetes?

A CrashLoopBackOff error means Kubernetes is trying to start a container, but it's repeatedly
crashing. To troubleshoot, I would follow these steps:

1.​ Check the Logs: The first and most important command is kubectl logs <pod-name>.
This will show the output from the container and will usually contain an error message
that tells you exactly why the application is failing.
2.​ Describe the Pod: If the logs are empty or unhelpful, the next step is kubectl describe
pod <pod-name>. This provides more context, including an "Events" section that might
show issues like failed liveness probes or the container being killed because it exceeded
its memory limits. Common causes are application bugs, misconfigurations (like a
missing environment variable), or resource issues.

50. What is a reverse proxy?

A reverse proxy is a server that sits in front of one or more backend servers and forwards client
requests to them. From the client's perspective, it appears as if they are communicating directly
with the reverse proxy itself.

It plays several crucial roles in modern web architecture:

●​ Load Balancing: Distributing traffic across multiple backend servers.
●​ SSL Termination: Handling HTTPS encryption and decryption.
●​ Caching: Storing and serving static content to reduce load on backend servers.
●​ Security: Hiding the IP addresses of the backend servers.

Nginx and HAProxy are very popular reverse proxies.

51. What is GitOps?

GitOps is an operational framework that uses a Git repository as the single source of truth for
both declarative infrastructure and applications.

The core idea is that the desired state of your entire system is declared in a Git repository. Any
change to production must be made via a commit to this repository. An automated agent (like
Argo CD or Flux) runs inside your Kubernetes cluster, constantly comparing the live state of the
cluster to the desired state in Git. If it detects a difference, it automatically "pulls" the changes
and applies them to the cluster. This provides a perfect audit trail, eliminates configuration drift,
and makes rollbacks as simple as reverting a Git commit.
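
A sketch of what this looks like with Argo CD (the repository URL, path, and namespaces are assumptions): the Application object below tells the in-cluster agent which Git path to keep the cluster synchronized with.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/web-manifests.git   # assumed Git repo (source of truth)
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: web
  syncPolicy:
    automated:
      prune: true       # delete resources that were removed from Git
      selfHeal: true    # revert manual drift back to the Git state
```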

52. What is a sidecar container?

A sidecar container is an auxiliary container that runs alongside a main application container
within the same Kubernetes Pod. It's used to extend or enhance the functionality of the main
container without being part of the application's code.

Because containers in the same Pod share the same network and storage, the sidecar can interact
closely with the main container. This pattern allows you to separate concerns. Common use cases
include a logging agent sidecar (like Fluentd) that collects and forwards logs, a monitoring
sidecar that exports metrics, or a service mesh proxy (like Envoy) that intercepts all network
traffic.

53. How do you manage database schema changes in a CI/CD pipeline?

The best practice is to use dedicated database migration tools like Flyway or Liquibase. These
tools allow you to version-control your database schema changes in the same way you
version-control your application code.

You write each schema change (like CREATE TABLE) in a separate, version-numbered SQL
script, which is then committed to your Git repository. The migration tool maintains a special
history table in your database to track which scripts have already been applied. As part of your
CI/CD pipeline, you run the migration tool, which automatically applies only the new, unapplied
migration scripts in the correct order. This makes database changes repeatable,
version-controlled, and safe to automate.
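
For example, with Flyway the scripts follow a versioned naming convention such as V1__create_users_table.sql and V2__add_email_index.sql; on each pipeline run, Flyway compares those version numbers against its history table and applies only the scripts not yet recorded there.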

54. What is a service mesh?

A service mesh is a dedicated and transparent infrastructure layer that handles service-to-service
communication within a microservices architecture. It provides a way to manage complex
communication challenges like traffic management, security, and observability without requiring
any changes to the application code.

It is typically implemented by injecting a lightweight network proxy (like Envoy) as a sidecar container into every application Pod. All network traffic between services is routed through
these proxies. This creates a "mesh" of proxies that can be controlled from a central control plane
to provide features like sophisticated traffic routing, automatic mutual TLS (mTLS) encryption,
and detailed metrics and traces for all traffic. Istio and Linkerd are popular examples.

55. What is Chaos Engineering?

Chaos Engineering is the discipline of proactively and deliberately injecting failures into a
production system to build confidence in its ability to withstand turbulent and unexpected
conditions. It's about finding weaknesses before they cause a real outage.

The goal is not to break things randomly, but to run controlled experiments. You start with a
hypothesis about how your system will behave when a failure is injected (e.g., "If we shut down
one of our three API servers, latency will not increase"). You then carefully inject that failure in
production and measure the impact to see if your hypothesis was correct. This practice helps you
uncover hidden dependencies and build more resilient systems. Netflix's Chaos Monkey is the
most famous example.

56. How do you keep your skills updated in the fast-changing world of
DevOps?

Staying current in DevOps requires a proactive and continuous learning mindset. My approach is
multi-faceted:

●​ Community Engagement: I actively participate in online communities like Reddit's r/devops and attend local meetups in cities like Bengaluru and Pune to learn from my peers.
●​ Following Industry News: I subscribe to official blogs from cloud providers and key
tool creators.
●​ Hands-on Practice: The best way to learn a new tool is to use it. I maintain personal
projects on GitHub where I experiment with new technologies.
●​ Formal Learning: I pursue relevant certifications (like CKA or AWS DevOps
Professional) to get a structured learning path and validate my skills.

57. What is a "pull request" (or "merge request")?

A pull request (PR) or merge request (MR) is a feature of Git hosting platforms like GitHub and
GitLab. It's a formal way to propose changes to a repository. A developer creates a PR to notify
team members that they have completed a feature on a separate branch and it's ready to be
reviewed and merged into the main branch. It's a central place to discuss the changes, perform
code reviews, and run automated CI checks before integration.

58. What is the difference between git stash, git clean, and git reset?

●​ git stash: Temporarily saves your uncommitted local changes (both staged and unstaged)
so you can switch branches or work on something else. You can later re-apply these
changes with git stash pop.
●​ git clean: Removes untracked files from your working directory. This is a destructive
command and is useful for getting a completely clean working state.
●​ git reset: Unstages files and can be used to undo commits. git reset --soft moves the
HEAD pointer but keeps your changes. git reset --hard is destructive and discards all local
changes, resetting your branch to a specific commit.

59. What is a webhook in the context of CI/CD?

A webhook is an automated HTTP callback. In CI/CD, a Git repository (like GitHub) is configured to send a webhook to a CI server (like Jenkins) whenever a specific event occurs,
such as a git push. This webhook is the trigger that automatically starts the CI/CD pipeline,
eliminating the need for manual intervention or polling.

60. What is "pipeline as code"?

Pipeline as code is the practice of defining your CI/CD pipeline's configuration in a text file that
is stored in the project's source code repository (e.g., a Jenkinsfile or .gitlab-ci.yml). This
provides several benefits: the pipeline is version-controlled, it can be reviewed and edited like
any other code, and it provides a single source of truth for the entire build and release process.

61. What is an artifact repository? Why is it important?

An artifact repository is a storage system designed to hold the binary outputs (artifacts) of your
build process. Examples include Docker images, Java JAR files, or Python packages. It's
important because it provides a centralized, versioned, and reliable location to store and retrieve
these artifacts. This decouples the build process from the deployment process and ensures that
you are deploying the exact same tested artifact to every environment.

●​ Tools: Artifactory, Nexus, Docker Hub, AWS ECR.

62. What is the difference between a Docker image and a Docker container?

●​ Image: A Docker image is a read-only, inert template that contains the application and all
its dependencies. It's the blueprint.
●​ Container: A Docker container is a runnable instance of an image. It's the live, running
application. You can have many containers running from the same image.

63. What are Docker volumes? Why are they used?

Docker volumes are the preferred mechanism for persisting data generated by and used by
Docker containers. They are managed by Docker and are stored in a dedicated part of the host
filesystem. Volumes are used because the container's own filesystem is ephemeral; if the
container is removed, its data is lost. Volumes provide a way to store data (like a database's files)
outside the container's lifecycle, allowing it to persist.

64. What is a Dockerfile ENTRYPOINT vs. CMD?

Both specify the command to be executed when a container starts, but they have different
purposes.

●​ CMD: Provides the default command and/or parameters for an executing container. These
can be easily overridden by supplying a command when you run the container (e.g.,
docker run <image> new-command).
●​ ENTRYPOINT: Configures a container that will run as an executable. It's harder to
override. The best practice is to use ENTRYPOINT to specify the main executable and
CMD to specify the default flags or arguments for that executable.

65. What is a multi-stage build in Docker?

A multi-stage build is a feature in a Dockerfile that allows you to use multiple FROM statements.
Each FROM instruction begins a new build stage. This is extremely useful for creating small,
secure production images. For example, you can use a first stage with a full SDK to build your
application, and then a second, minimal stage that only copies the compiled artifact from the first
stage, without including all the build tools and dependencies.

66. What is a Kubernetes Operator?

An Operator is a method of packaging, deploying, and managing a Kubernetes application. It's a custom controller that uses the Kubernetes API to automate the lifecycle of a complex, stateful application. For example, a database operator can automate tasks like creating a database cluster,
handling backups, and managing failovers, encoding human operational knowledge into
software.

67. What is a DaemonSet in Kubernetes?

A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the
cluster, Pods are added to them. As nodes are removed, those Pods are garbage collected. This is
useful for deploying cluster-wide daemons like a log collector (e.g., Fluentd) or a monitoring
agent (e.g., Prometheus Node Exporter) on every node.

68. What are Network Policies in Kubernetes?

Network Policies are a Kubernetes resource that control the traffic flow between Pods. By
default, all Pods in a cluster can communicate with each other. Network Policies allow you to
define firewall rules to restrict that communication, creating a more secure, zero-trust network.
For example, you can create a policy that only allows frontend Pods to communicate with
backend Pods on a specific port.
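
The frontend-to-backend example from this answer, sketched as a manifest (the labels and port are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend           # the Pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # only frontend Pods may connect
      ports:
        - protocol: TCP
          port: 8080         # assumed backend port
```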

69. What is a rolling update strategy in Kubernetes?

A rolling update is the default deployment strategy for Kubernetes Deployments. It ensures zero
downtime by slowly replacing Pods with the old version of an application with Pods with the
new version, one by one or in small batches. During the update, it ensures that a certain number
of Pods are always available to serve traffic.
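
The pace of a rolling update is tuned on the Deployment itself; a conservative sketch of that fragment of a Deployment spec (the values are illustrative):

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra Pod above the desired count during the update
      maxUnavailable: 0    # never drop below the desired count while replacing Pods
```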

70. What is a liveness probe vs. a readiness probe in Kubernetes?

●​ Liveness Probe: The kubelet uses liveness probes to know when to restart a container. If
the liveness probe fails, the kubelet kills the container, and the container is subject to its
restart policy. This is used to recover from deadlocks.
●​ Readiness Probe: The kubelet uses readiness probes to know when a container is ready
to start accepting traffic. If the readiness probe fails, the endpoints controller removes the
Pod's IP address from the endpoints of all Services that match the Pod. This is used to
ensure a Pod only receives traffic when it's ready.
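
A container spec fragment showing both probes (the paths, port, and timings are assumptions):

```yaml
containers:
  - name: api
    image: registry.example.com/api:1.0   # assumed image
    livenessProbe:
      httpGet:
        path: /healthz                     # assumed liveness endpoint
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /ready                       # assumed readiness endpoint
        port: 8080
      periodSeconds: 5
```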

71. What is "toil" in the context of SRE?

Toil is the kind of operational work that is manual, repetitive, automatable, tactical, devoid of
enduring value, and that scales linearly as a service grows. The goal of an SRE team is to reduce
and eliminate toil by automating these tasks, which frees up engineers to work on more valuable,
long-term engineering projects.

72. What is an "error budget"?

An error budget is a key SRE concept derived from an SLO. If your SLO is 99.9% availability,
your error budget is the remaining 0.1% of time where the service is allowed to be unavailable.
This budget is a data-driven way to balance reliability with the need to innovate and release new
features. As long as the team is within their error budget, they are free to deploy new releases. If
they exceed the budget, a deployment freeze is often enacted until reliability is restored.
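
As a quick worked example, a 99.9% availability SLO over a 30-day month (43,200 minutes) leaves an error budget of 0.1% of that time, i.e. roughly 43 minutes of allowable downtime.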

73. What is a CDN? Why is it used?

A Content Delivery Network (CDN) is a geographically distributed network of proxy servers and
their data centers. It is used to provide high availability and performance by distributing the
service spatially relative to end-users. CDNs are commonly used to deliver static content like
images, CSS, and JavaScript files. By caching this content closer to the user, a CDN reduces
latency and offloads traffic from the origin servers.

●​ Examples: Cloudflare, Akamai, Amazon CloudFront.

74. What is DNS? Explain the role of an A record.

DNS (Domain Name System) is the phonebook of the Internet. It translates human-readable domain names (like www.example.com) into machine-readable IP addresses (like 93.184.216.34).
An A record is the most basic type of DNS record and is used to map a domain name to the IP
address (IPv4) of the computer hosting that domain.

75. What is the difference between a load balancer and a reverse proxy?

The terms are often used interchangeably, but there is a subtle difference. A reverse proxy
accepts a request from a client, forwards it to a server that can fulfill it, and returns the server's
response. A load balancer is a specific type of reverse proxy that distributes incoming traffic
among a group of backend servers to improve reliability and capacity. All load balancers are
reverse proxies, but not all reverse proxies are load balancers.

76. What is mutual TLS (mTLS)?

Mutual TLS is a method for mutual authentication. Unlike standard TLS, where only the client
verifies the server's identity, mTLS requires both the client and the server to present a certificate
and verify each other's identity before establishing a secure connection. This is a key security
feature in zero-trust networks and is often implemented by service meshes like Istio.
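
As a sketch, assuming Istio is the mesh in use, strict mTLS can be enforced mesh-wide with a PeerAuthentication resource along these lines:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system    # applying it in the root namespace makes it mesh-wide
spec:
  mtls:
    mode: STRICT             # only mutually authenticated, encrypted traffic is accepted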

77. What is a bastion host (or jump server)?

A bastion host is a special-purpose computer on a network specifically designed and configured to withstand attacks. It is a hardened server that provides access to a private network from an external network, such as the Internet. It acts as a single, secure entry point that an administrator can connect to via SSH, and from there, jump to other servers in the private network.

78. What is a Web Application Firewall (WAF)?

A WAF is a specific type of firewall that applies a set of rules to an HTTP conversation. It sits in
front of a web application and is designed to protect it from common web-based attacks like SQL
injection, cross-site scripting (XSS), and file inclusion by filtering and monitoring the HTTP
traffic.

79. What is the principle of "least privilege"?

The principle of least privilege is a security concept that states that a user or process should only
be given the minimum levels of access – or permissions – needed to perform its job function. For
example, an application that only needs to read from a database should be given a read-only user
account, not an administrator account.
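
The same idea can be expressed as a Kubernetes RBAC Role; this hypothetical Role grants only read access to ConfigMaps in a single namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: app              # permissions are scoped to one namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]    # read-only: no create, update, or delete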

80. How would you handle a production outage?

I would follow the established incident response plan.

1.​ Acknowledge & Communicate: Immediately acknowledge the alert and inform the team
and stakeholders that an incident is in progress.
2.​ Triage & Mitigate: Form a response team and work to quickly identify the impact and
scope. The immediate priority is to restore service, even if it's a temporary fix or a
rollback.
3.​ Identify Root Cause: Once the service is stable, begin the investigation to find the
underlying root cause of the issue.
4.​ Resolve & Document: Implement a permanent fix for the root cause.
5.​ Post-Mortem: Conduct a blameless post-mortem to document the incident, the
resolution, and the action items to prevent it from happening again.

81. How would you reduce the build time of a CI pipeline?

●	Optimize Docker builds: Use multi-stage builds and layer caching.
●	Parallelize tests: Run different test suites in parallel jobs.
●​ Use caching: Cache dependencies (like npm packages or Maven artifacts) so they don't
need to be downloaded on every run.
●​ Use more powerful runners: Use CI runners with more CPU and memory.
●​ Split large jobs: Break down long-running jobs into smaller, faster ones.
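
A sketch of how caching and test parallelism might look in a GitLab CI job (the image, cache key, and commands are illustrative assumptions):

test:
  image: node:20
  parallel: 4                    # run four concurrent copies of this job
  cache:
    key:
      files:
        - package-lock.json      # reuse the cache until dependencies change
    paths:
      - node_modules/
  script:
    - npm ci --prefer-offline    # install from cache where possible
    # the test runner is assumed to shard the suite using $CI_NODE_INDEX / $CI_NODE_TOTAL
    - npm test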

82. What is a "pull-based" vs. a "push-based" deployment?

●​ Push-based: A central server (like Jenkins) actively "pushes" the new version of the
application to the target servers. This is the traditional model.
●	Pull-based: An agent running on the target server "pulls" the desired configuration from a central repository and applies it. This is the model used in GitOps, where an agent like Argo CD pulls the state from a Git repository.
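
A sketch of an Argo CD Application that pulls its desired state from Git (the repository URL, path, and namespace are hypothetical):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/deploy-repo.git   # hypothetical Git repository
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes back to the Git state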

83. What is idempotency? Why is it important in automation?

Idempotency is the property of an operation that ensures it can be applied multiple times without
changing the result beyond the initial application. It's critical for automation because it makes
scripts safe to re-run. For example, an idempotent script to create a user would create the user if it
doesn't exist, and do nothing if it already exists, without throwing an error. Tools like Ansible are
designed to be idempotent.
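
A minimal Ansible task illustrating this: running it a second time reports "ok" with no change instead of failing (the user name is an assumption):

- name: Ensure the deploy user exists
  ansible.builtin.user:
    name: deploy
    state: present      # creates the user only if it is missing
    shell: /bin/bash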

84. What is a "feature flag" (or "feature toggle")?

A feature flag is a technique that allows you to turn certain functionality on and off during
runtime without deploying new code. It's like a set of if/else statements in your code that can be
controlled remotely. This allows you to deploy new features "dark" (turned off) to production,
and then turn them on for specific users (like internal testers or a small percentage of customers)
for testing before a full release. It decouples deployment from release.
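
As a sketch, a flag might be defined in a configuration file or flag service that the application reads at runtime (the flag name and fields are hypothetical):

features:
  new-checkout-flow:
    enabled: false           # deployed "dark": the code is in production but switched off
    rollout_percentage: 5    # later enabled for a small slice of users
    allowed_groups:
      - internal-testers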

85. What is a "Value Stream Map"?

A Value Stream Map is a lean management tool used to visualize the entire flow of work required
to deliver a product or service to a customer. In DevOps, it's used to map the entire software
delivery process, from idea to production. The goal is to identify bottlenecks, waste, and areas for
improvement in the delivery pipeline.

86. What is the difference between a monolith and a microservices architecture?

●​ Monolith: An application built as a single, unified unit. All components are tightly
coupled and deployed together. It's simpler to develop initially but becomes harder to
scale and maintain.
●​ Microservices: An application built as a collection of small, independent services. Each
service is responsible for a specific business capability, is developed and deployed
independently, and communicates with other services over a network. This architecture is
more complex but offers better scalability, resilience, and team autonomy.

87. What is an API Gateway?

An API Gateway is a management tool that sits between a client and a collection of backend
services. In a microservices architecture, it acts as a single entry point for all clients. It can handle
tasks like request routing, authentication, rate limiting, and logging, which simplifies the client's
interaction with the backend and reduces the burden on individual microservices.

88. What is "observability-driven development"?

This is a practice where you build observability (metrics, logs, and traces) into your application
from the very beginning of the development process. Instead of adding monitoring as an
afterthought, developers are responsible for instrumenting their code so that its behavior in
production is transparent and easy to understand.

89. What is a "canary in the coal mine" in the context of a release?

This refers to the small group of users or servers that receive a new release in a Canary Release
strategy. Just like the canaries that were used to detect toxic gases in coal mines, this small group
serves as an early warning system. If they experience issues, the release is rolled back before it
affects the wider user base.

90. What is a "shift-left" strategy in security?

"Shift-left" is the core principle of DevSecOps. It means moving security practices to the left, or
earlier, in the software development lifecycle. Instead of performing a security audit at the end of
the process, security is integrated from the beginning, with automated scans in the CI pipeline,
threat modeling during design, and security-conscious coding practices.

91. What is a "service catalog"?

A service catalog is a centralized portal that provides a list of standardized IT services that can be
requested by developers or other users. In a DevOps context, this could be a self-service portal
where a developer can request a new CI/CD pipeline, a new Kubernetes namespace, or a new
cloud environment, all of which are provisioned automatically.

92. What is "cattle vs. pets" in infrastructure management?

This is an analogy used to describe the shift in mindset required for cloud computing.

●​ Pets: Servers that are treated as unique and indispensable, like a family pet. They are
given names, and when they get sick, you nurse them back to health. This is the
traditional data center model.
●​ Cattle: Servers that are treated as identical, disposable units in a herd. They are given
numbers, not names. When one gets sick, you don't try to fix it; you simply replace it with
a new, healthy one. This is the cloud-native, immutable infrastructure model.

93. What is "immutable infrastructure"?

Immutable infrastructure is a model where servers are never modified after they are deployed. If
you need to update an application or apply a patch, you don't log in and change the server.
Instead, you build a new server from a new image with the changes, deploy it, and then terminate
the old one. This approach leads to more consistent and reliable systems and prevents
configuration drift.

94. What is a "side-effect" in the context of automation?

A side-effect is an unintended consequence of running a script or process. A good automation script should have no side-effects; it should only perform the specific task it was designed for. This is related to the principle of idempotency.

95. What is "technical debt"?

Technical debt is a concept in software development that reflects the implied cost of rework
caused by choosing an easy (limited) solution now instead of using a better approach that would
take longer. Just like financial debt, if technical debt is not repaid, it can accumulate "interest,"
making it harder and harder to implement changes in the future.

96. What is "YAML"? Why is it so common in DevOps?

YAML (YAML Ain't Markup Language) is a human-readable data serialization standard. It is extremely common in DevOps tools (like Kubernetes, Ansible, and GitLab CI) because its simple, indentation-based syntax is very easy for humans to read and write, making it ideal for defining configuration files.
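
A small, illustrative configuration expressed in YAML (the keys and values are hypothetical):

service:
  name: payments-api
  replicas: 3
  ports:
    - 8080
  environment:
    LOG_LEVEL: info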

97. What is a "linter"?

A linter is a static code analysis tool used to check for programmatic and stylistic errors in source
code. In DevOps, linters are used to automatically check the syntax and quality of everything
from application code to Dockerfiles and Terraform configurations as part of the CI pipeline.
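
A sketch of a CI job that lints YAML files (assumes GitLab CI syntax and the yamllint tool; the image tag is illustrative):

lint-yaml:
  image: python:3.12-slim
  script:
    - pip install yamllint
    - yamllint .          # fail the pipeline on syntax or style errors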

98. What is the role of a DevOps evangelist?

A DevOps evangelist is a leader who promotes and champions the adoption of DevOps culture
and practices within an organization. They work to break down silos, educate teams, and
demonstrate the value of collaboration, automation, and continuous improvement.

99. What is "platform engineering"?

Platform engineering is an emerging discipline in DevOps. A platform engineering team is responsible for building and maintaining an "Internal Developer Platform" (IDP). This platform provides developers with a set of self-service tools and automated workflows for building, deploying, and running their applications, which reduces their cognitive load and accelerates the delivery process.

100. What is your favorite DevOps tool and why?

This is a personal question, but a good answer shows your passion and experience.

●	Example: "My favorite tool is currently Terraform. I appreciate its declarative approach to infrastructure, which makes managing complex cloud environments much more predictable and reliable. The strong community and the vast number of providers mean you can manage almost any service with it. The workflow of plan and apply also provides a great safety net, allowing you to review changes before they are made, which is crucial for production systems."

101. Where do you see the future of DevOps heading?

The future of DevOps is heading towards greater abstraction and intelligence. We're seeing a rise
in Platform Engineering to provide developers with seamless self-service experiences. AI and
AIOps will play a much larger role in automating complex tasks, from analyzing pipeline data to
predicting and preventing production incidents. Finally, with the growth of technologies like
WebAssembly, we'll see even more focus on building secure, portable, and efficient applications
that can run anywhere, from the cloud to the edge.

102. What is "Infrastructure Drift"? How do you prevent it?

Infrastructure Drift is the term for when the configuration of your production environment
"drifts" or becomes different from the state defined in your Infrastructure as Code (IaC)
repository. This usually happens when someone makes a manual change to the live environment
(e.g., changing a security group rule in the AWS console) instead of updating the code.

This is dangerous because it makes your IaC an unreliable source of truth and can cause
automated deployments to fail or have unintended consequences. The best way to prevent it is
by:

1.​ Enforcing a GitOps Workflow: Mandate that all changes must be made through code
and a pull request.
2.​ Implementing Drift Detection: Use tools that can periodically scan your live
environment and compare it to the state defined in your code. Tools like Terraform can
show you the drift when you run a terraform plan, and other specialized tools can run
these checks automatically and alert you if drift is detected.
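
As a sketch, a scheduled CI job can surface drift by running a plan against the live state (assumes GitLab CI and Terraform; the image tag is illustrative, and -detailed-exitcode is standard Terraform behaviour):

drift-check:
  image:
    name: hashicorp/terraform:1.7
    entrypoint: [""]        # override the image entrypoint so the runner can use a shell
  script:
    - terraform init -input=false
    # exit code 2 means the live infrastructure differs from the code
    - terraform plan -detailed-exitcode -input=false
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'   # run on a nightly schedule, not on every push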
