
Unit 5

Zabbix lab
Zabbix is an open-source monitoring software tool for diverse IT components, including
networks, servers, virtual machines, and cloud services. Zabbix provides monitoring for
metrics such as network utilization, CPU load, and disk space consumption.

Configure Zabbix integration for Grafana OnCall


This integration is available for Grafana Cloud OnCall. You must have an Admin role to
create integrations in Grafana OnCall.

1. In the Integrations tab, click + New integration to receive alerts.

2. Select Zabbix from the list of available integrations.

3. Follow the instructions in the How to connect window to get your unique
integration URL and review next steps.

Configure the Zabbix server


1. Deploy a Zabbix playground if you don’t have one set up:

bash
docker run --name zabbix-appliance -t \
-p 10051:10051 \
-p 80:80 \
-d zabbix/zabbix-appliance:latest

2. Open a shell on the Zabbix server (with the Docker playground, exec into the container):

bash
docker exec -it zabbix-appliance bash

3. Place the grafana_oncall.sh script in the AlertScriptsPath directory specified
within the Zabbix server configuration file (zabbix_server.conf).

bash
grep AlertScriptsPath /etc/zabbix/zabbix_server.conf
Note: The script must be executable by the user running the
zabbix_server binary (usually “zabbix”) on the Zabbix server. For
example, chmod +x grafana_oncall.sh

bash
ls -lh /usr/lib/zabbix/alertscripts/grafana_oncall.sh
-rw-r--r-- 1 root root 1.5K Jun 6 07:52 /usr/lib/zabbix/alertscripts/grafana_oncall.sh

Configure Zabbix alerts


Within the Zabbix web interface, do the following:

1. In a browser, open localhost:80.


2. Navigate to Administration > Media Types > Create Media Type.
3. Create a Media Type with the following fields.
o Name: Grafana OnCall

o Type: script

o Script parameters:

 {ALERT.SENDTO}

 {ALERT.SUBJECT}

 {ALERT.MESSAGE}

Set the {ALERT.SENDTO} value

To send alerts to Grafana OnCall, the {ALERT.SENDTO} value must be set in the user
media configuration.

1. In the web UI, navigate to Administration > Users and open the user
properties form.
2. In the Media tab, click Add and copy the link from Grafana OnCall in the Send
to field.
3. Click Test in the last column to send a test alert to Grafana OnCall.
4. In the testing window that opens, set the Send to value to the unique integration URL
from the step above.
Create a test message with a body and an optional subject, and click Test. (A curl sketch
of the same request follows below.)
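If you want to verify the integration outside Zabbix, a quick curl sketch like the one below exercises the integration URL directly; the URL is a placeholder for your own unique URL, and the payload fields mirror what the grafana_oncall.sh script sends.

bash
# Hypothetical test: POST a sample alert to your unique Grafana OnCall integration URL.
# Replace <your-integration-url> with the URL copied from Grafana OnCall.
curl -X POST "<your-integration-url>" \
  -H "Content-Type: application/json" \
  -d '{
        "alert_uid": "12345",
        "title": "PROBLEM: High CPU load ONCALL_GROUP: 12345",
        "state": "alerting",
        "message": "CPU load is above the configured threshold"
      }'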

Grouping and auto-resolve of Zabbix notifications


Grafana OnCall provides grouping and auto-resolve of Zabbix notifications. Use the
following procedure to configure grouping and auto-resolve.
1. Provide a parameter as an identifier for group differentiation to Grafana OnCall.
2. Append that variable to the subject of the action as ONCALL_GROUP: ID, where ID is
any of the Zabbix macros. For example, {EVENT.ID}. The Grafana OnCall
script grafana_oncall.sh extracts this event and passes the alert_uid to Grafana
OnCall.
3. To enable auto-resolve within Grafana OnCall, the “Resolved” keyword is
required in the Default subject field in Recovered operations. Example subjects
are shown below.
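For illustration, a Zabbix action configured along these lines might use subjects such as the following; the exact trigger macros you include are up to you.

Default subject (problem operations):
{TRIGGER.STATUS}: {TRIGGER.NAME} ONCALL_GROUP: {EVENT.ID}

Default subject (recovery operations):
Resolved: {TRIGGER.NAME} ONCALL_GROUP: {EVENT.ID}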

grafana_oncall.sh script

bash
#!/bin/bash
# This is a modification of ericos's original shell script.

# Get the url ($1), subject ($2), and message ($3)


url="$1"
subject="${2//$'\r\n'/'\n'}"
message="${3//$'\r\n'/'\n'}"

# Alert state depends on the subject, indicating whether the trigger is going
# into problem state or recovering
recoversub='^RECOVER(Y|ED)?$|^OK$|^Resolved.*'

if [[ "$subject" =~ $recoversub ]]; then
    state='ok'
else
    state='alerting'
fi

payload='{
"title": "'${subject}'",
"state": "'${state}'",
"message": "'${message}'"
}'

# Alert group identifier from the subject of the action. Grouping will not work
# without ONCALL_GROUP in the action subject.
regex='ONCALL_GROUP: ([a-zA-Z0-9_\"]*)'
if [[ "$subject" =~ $regex ]]; then
alert_uid=${BASH_REMATCH[1]}
payload='{
"alert_uid": "'${alert_uid}'",
"title": "'${subject}'",
"state": "'${state}'",
"message": "'${message}'"
}'
fi

return=$(curl $url -d "${payload}" -H "Content-Type: application/json" -X POST)


What is Nagios?
Nagios is an open source IT system monitoring tool. It was designed to run on
the Linux operating system and can monitor devices running Linux, Windows and
Unix OSes.

Nagios software runs periodic checks on critical parameters of application, network
and server resources. For example, Nagios can monitor memory use, disk use and
microprocessor load, as well as the number of currently running processes and log
files. Nagios also can monitor services such as Simple Mail Transfer Protocol
(SMTP), Post Office Protocol 3, Hypertext Transfer Protocol (HTTP) and other
common network protocols. Nagios initiates active checks, while passive checks come
from external applications connected to the monitoring tool.

Originally released in 1999 as NetSaint, Nagios was developed by Ethan Galstad and
subsequently refined by numerous contributors as an open source project. Nagios
Enterprises, a company based around the Nagios Core technology, offers multiple
products, such as Nagios XI, Log Server, Network Analyzer and Fusion.

How Nagios works


Users can choose to work in the command-line interface or select a web-based
graphical user interface in some versions of Nagios and from third parties. Nagios'
dashboard provides an overview of the critical parameters monitored on assets.

Based on the parameters and thresholds defined, Nagios can send out alerts if critical
levels are reached. These notifications can be sent through email and text messages.
An authorization system enables administrators to restrict access.

Nagios runs both agent-based and agentless configurations. Independent agents are
installed on any hardware or software system to collect data that is then reported back
to the management server. Agentless monitoring uses existing protocols to emulate an
agent. Both approaches can monitor file system use, OS metrics, service and process
states. Examples of Nagios agents include Nagios Remote Data Processor (NRDP),
Nagios Cross Platform Agent and NSClient++.

Nagios plugins
Nagios can also run remote scripts and plugins using the Nagios Remote Plugin
Executor (NRPE) agent. NRPE enables remote monitoring of system metrics such as
system load, memory and disk use. It consists of the check_nrpe plugin, which is
stored on the local monitoring machine, and the NRPE daemon, which runs on the remote
machine. Nagios uses a plugin to consolidate data from the NRPE agent before it goes
to the management server for processing. NRPE can also communicate with Windows
agents to monitor Windows machines.
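As a rough illustration (the install path, host name, and remote command are placeholders), a check from the Nagios server through NRPE typically looks like this:

bash
# Ask the NRPE daemon on a remote host to run its locally defined check_load command
/usr/local/nagios/libexec/check_nrpe -H remote-host.example.com -c check_load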

Nagios supports plugins that are stand-alone add-ons and extensions so users can
define targets and which target parameters to monitor. Nagios plugins process
command-line arguments and communicate commands with Nagios Core.
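To make that plugin contract concrete, here is a minimal sketch of a custom plugin: Nagios reads one line of output and maps the exit code to a state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The thresholds and filesystem are illustrative only.

bash
#!/bin/bash
# Minimal example plugin: warn/alert on root filesystem usage.
usage=$(df --output=pcent / | tail -1 | tr -dc '0-9')

if [ "$usage" -ge 95 ]; then
    echo "DISK CRITICAL - / at ${usage}% | usage=${usage}%"
    exit 2
elif [ "$usage" -ge 85 ]; then
    echo "DISK WARNING - / at ${usage}% | usage=${usage}%"
    exit 1
else
    echo "DISK OK - / at ${usage}% | usage=${usage}%"
    exit 0
fi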

There are around 50 plugins developed and maintained by Nagios, while there are
over 3,000 from the community. These plugins are categorized into lists including
hardware, software, cloud, OSes, security, log files and network connections. As an
example, when used in conjunction with environmental-sensing systems, a Nagios
plugin can share data on environmental variables, such as temperature, humidity or
barometric pressure.

Nagios tools
Nagios has proven popular among small and large businesses, as well as internet
service providers, educational institutions, government agencies, healthcare
institutions, manufacturing companies and financial institutions.

Users can choose among free and paid options, depending on the needed services and
support.
Nagios Core

The service that was originally known as Nagios is now referred to as Nagios Core.
Core is freely available as open source monitoring software for IT systems,
networks and infrastructure. Core covers a wide range of infrastructure monitoring
by allowing plugins to extend its capabilities. It is the base for paid
Nagios monitoring systems.

Nagios Core has an optional web interface, which displays network status,
notifications and log files. Core can notify its user when there are server or host
issues. Additionally, Core can monitor network services such as SMTP, HTTP
and Ping.

Nagios XI

Nagios XI is an extended interface of Nagios Core, intended as the enterprise-level
version of the monitoring tool. XI acts as monitoring software, configuration manager
and toolkit. While Nagios Core is free, XI must be purchased from Nagios
Enterprises. Atop the same features as Core, XI adds preconfigured virtual machines
(VMs), a web configuration user interface, performance graphing, a mobile
application, dashboards, scheduled reporting and technical support through email.

Nagios XI monitors IT infrastructure components such as applications, OSes,
networks and system metrics. Plugins are supported for these infrastructure
components to expand on XI's monitoring capabilities.

Nagios commercial extensions


Nagios Log Server is a log monitoring and management tool that enables an
organization to view, sort and configure logs from its IT infrastructure, including
Windows event logs. Log Server can analyze, collect and store logged data based on
custom and preassigned specifications. Administrators can set alerts to notify Log
Server users when there is a potential threat or malfunction on a monitored asset. For
example, an alert goes out to the Microsoft Exchange administrator when there are
three failed login attempts to Exchange Server, meaning there could be an
unwarranted person trying to guess the password to the system.

Nagios Network Analyzer tracks network traffic and bandwidth use. Network
Analyzer can resolve network outages, abnormalities and security threats. Features
include automated security alerts, customizable application monitoring, integration
with Nagios XI and a bandwidth use calculator.

Nagios Fusion is an aggregation service for Nagios Core and Nagios XI servers that
shows multiple systems in one view. Fusion condenses network management by
centralizing features and data from Nagios XI and Core in one location, creating a
granular view of a network infrastructure. With Fusion, administrators can specify
which XI and Core servers are displayed and manage which users are allowed to view
those servers. Additionally, Fusion users can log into any managed server and
use cached or live data to configure charts and other graphics to appear on
dashboards.

Nagios competitors
Nagios competitors include Zenoss, Zabbix, Microsoft System Center Operations
Manager (SCOM) and SolarWinds, among other open source and commercial
monitoring tools.

Zenoss

Zenoss is IT monitoring software for cloud, virtual and physical IT environments.


Zenoss monitors servers, networks, VMs, databases and other hardware and software
assets in an IT infrastructure. Similar to Nagios, Zenoss is available as an open source
version called Zenoss Core or more extensive paid, supported options including
Zenoss Service Dynamics and Zenoss as a Service. Service Dynamics is the on-site
version of the software, while Zenoss as a Service is a software-as-a-service option.
Similar to Nagios products, Zenoss products provide plugins, called ZenPacks, which
extend monitoring capabilities.
Zabbix

Zabbix is an open source monitoring tool for Linux, Unix and Windows OSes that
relies on agents to collect monitoring data. It can also use common protocols for
agentless operation. The technology monitors physical and cloud assets, VMs,
services and applications. Zabbix is evolving for cloud deployment, as well as on
premises.

Microsoft SCOM

Microsoft SCOM enables users to configure, manage and monitor devices and
applications via the same console. SCOM tracks server hardware, system services,
OSes, hypervisors and applications. SCOM, like Nagios, relies on agent-based or
agentless monitoring for its data collection, and supports plugins.

SolarWinds

SolarWinds' Server & Application Monitor software works with applications, servers
and databases. Server & Application Monitor includes performance monitoring,
server management, alerts and reporting through agentless monitoring. Server &
Application Monitor also supports other SolarWinds products.

Google cloud network


If your organization or customers haven’t yet migrated to the cloud, don’t think it
won’t happen, because it very likely will.

If you’re a network engineer, system admin, or anyone supporting infrastructure in
the cloud, it’s in your best interest to understand how it differs from your
traditional on-premises data center environments – networking included.

If you’ve worked with Amazon Web Services (AWS) or Microsoft Azure, you’ll
feel quite at home with GCP, which shares much of the same concepts and
capabilities.
However, there are a few key differences between each provider that you should
be aware of if migrating or mobilizing workloads between them.

GCP Components & Fundamentals include:


In this article, we’ll be going over some of the networking basics of GCP for 2023,
including:

 VPCs
 Projects
 Networks
 Regions
 Zones
 Subnets
 Switching
 Routing
 Firewalls
Google’s Global Cloud Network
Needless to say, Google is global, meaning as a customer of GCP you can be
global too. As of this writing, GCP has 11 regions, 33 zones and over 100 points
of presence throughout the globe.

Regions and Zones


When architecting your apps in GCP, it’s important to understand regions and
zones, as well as the resources that are regional or zonal.
A region is a specific geographical location that is sub-divided into zones. For
example, in the Americas, you have four regions, and each of these regions is
broken down into multiple zones:

Americas:

 Region: us-central1
o Zone: us-central1-a
o Zone: us-central1-b
o Zone: us-central1-c
o Zone: us-central1-f
 Region: us-west1
o Zone: us-west1-a
o Zone: us-west1-b
o Zone: us-west1-c
 Region: us-east4
o Zone: us-east4-a
o Zone: us-east4-b
o Zone: us-east4-c
 Region: us-east1
o Zone: us-east1-b
o Zone: us-east1-c
o Zone: us-east1-d

While some of the core resources in GCP are global, others may be restricted by
region or zone. Regional resources can be used anywhere within the same
region, while zonal resources can be used anywhere within the same zone.
Some examples of this are:

Global Resources:

 Images
 Snapshots
 VPC Network
 Firewalls
 Routes
Regional Resources:

 Static external IP addresses


 Subnets
Zonal Resources:

 Instances (VMs)
 Persistent Disks
For example, I can attach a disk from one instance to another within the same
zone, but I cannot do this across zones. However, since images and snapshots
are global resources, I can use them across zones and regions.

Why use Regions and Zones?


When building an application for high availability and fault tolerance, it’s crucial to
distribute your resources across multiple zones and regions.
Zones are independent of each other, with completely separate physical
infrastructure, networking, and isolated control planes that ensure typical failure
events only affect that zone. This is why you should have your application
distributed across zones, to handle the failure of any particular zone. The same
applies to regional issues.
Another design consideration is speed and latency. Zones have high-bandwidth,
low-latency connections to other zones in the same region. Moreover, if most
user traffic will be initiated from certain parts of the globe, it’s best to design for
regions and zones closest to that point of service.

Virtual Private Cloud (VPC)


Verbatim from Google’s documentation:

“A Virtual Private Cloud (VPC) is a global private isolated virtual network partition
that provides managed networking functionality for your Google Cloud Platform
(GCP) resources.”

You can think of a VPC as a virtual version of your traditional physical network.
VPCs are global, spanning all regions. The instances within the VPC have
internal IP addresses and can communicate privately with each other across the
globe. This logical representation of your network infrastructure abstracts much
of the complexities of dealing with on-premises architectures.
The default VPC network spans multiple regions and zones, with subnets in various
parts of the network serving VMs. All of these subnets can natively route to each
other, and as long as the firewall rules permit it, VMs can reach one another
within this VPC.

GCP Projects
Before going further, I just want to briefly mention projects within GCP. Before
you can do anything, you must create a project. A project is really an
organizational construct used for billing and permissions. Some organizations
use projects for various apps or various environments like Prod/Test/Dev; some
use it for departments like Finance/HR/Marketing etc.; some use it to provide
billing to customers based on their usage within a cloud-hosted environment. The
important part to understand is that it’s simply a way to organize resources from
a billing and permissions perspective, and each project has its own VPC
network(s) isolated from other projects in GCP.

VPC Networks and Subnets


When you create your first project in GCP, you’ll have the option to automatically
create your VPC network or custom build it. Assuming you choose auto mode –
a default network with a series of default subnets will be deployed ready for
immediate use. VM instances can be deployed on the default subnets without
any network configuration.
Don’t confuse a “VPC network” with an IP address range. Instead, think of a “VPC
network” as a network topology – or, if you have a networking background, a VRF.

Each network has its own subnets, routes, firewall, internal DNS, and more
beyond the basics listed here.

GCP Project

 VPC Network
 Subnets
 Routes
 Firewall
 Internal DNS
The VPC network called default is global, while each of the subnets within it is
regional. When you create an instance, you place it in a subnet. Instances in this
subnet can be in any zone within the region it’s
assigned to. Even though subnets are regional, instances can communicate with
other instances in the same VPC network using their private IP addresses. Of
course, you can isolate these subnets within the network if you wish using
firewall policies.

If you want complete isolation between various applications, customers, etc., you
could create multiple networks.
You can have up to five networks per project, including the default network.
Multiple networks within a single project can provide multi-tenancy, IP overlap, or
isolation within the project itself. Just another option instead of having multiple
projects.

IP Addresses
Each VM instance in GCP will have an internal IP address and typically an
external IP address. The internal IP address is used to communicate between
instances in the same VPC network, while the external IP address is used to
communicate with instances in other networks or the Internet. These IP
addresses are ephemeral by default but can be statically assigned.
Internal IPs are allocated to instances from the subnet’s IP range via DHCP. This
means the IPs are ephemeral and will be released if the instance is deleted.

External IPs are also assigned via DHCP from some Google-provided pool.
These IPs are mapped to the internal IPs of the VM instances for you. You can
reserve static IP addresses if needed.

Static external IP addresses can be either global or regional, depending on your
requirements. For example, global static IP addresses are available for global
forwarding rules used for global load balancing.
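For example, reserving a regional and a global static external IP with the gcloud CLI might look like this; the address names and region are placeholders:

bash
# Reserve a regional static external IP (e.g. for a VM or regional load balancer)
gcloud compute addresses create my-regional-ip --region=us-central1

# Reserve a global static external IP (e.g. for a global forwarding rule)
gcloud compute addresses create my-global-ip --global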

Routes
GCP is a global software-defined network with layers of controllers and
impressive routing capabilities that have been mostly abstracted for the end-user.
All networks have routes in order to communicate with each other. The default
network has a default route to the internet and individual routes to each subnet.

Routes are considered a “network resource” and cannot be shared between projects
or networks. Tables are used to tell which routes and rules apply to each
VM instance. Routes could apply to multiple instances or single instances
depending on the tags used in the route statement. If an instance tag is used, the
route applies to that instance, and if an instance tag is not used, then the route
applies to all instances in that network. Individual read-only route tables are
created for each VM instance based off of the parent route table.

Even though there are no “routers” in the software-defined network, you can still
think of each VM instance as connected to some core router, with all traffic
passing through it, from the perspective of each node’s individual route table.
Routing decisions apply to traffic egressing a VM. The most specific route in the
table will match. Traffic must match a route and firewall rule in order for it to pass.

You can add custom static routes or set up BGP for dynamic routing between
clouds or on-premises environments.
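A custom static route can be added with gcloud; the route name, destination range, and next-hop instance below are illustrative only:

bash
# Send traffic for 10.10.0.0/16 to a VM acting as a gateway (e.g. a VPN instance)
gcloud compute routes create to-onprem \
  --network=default \
  --destination-range=10.10.0.0/16 \
  --next-hop-instance=vpn-gateway \
  --next-hop-instance-zone=us-central1-a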

Firewalls
Each VPC network has its own distributed firewall, which allows or denies traffic
into and out of the network, and between VM instances. The firewall has an
explicit-deny policy, meaning that any traffic that needs to be permitted must
have a rule created. You cannot create “deny” rules, only “allow” rules.
If you have a concept in your mind that all this traffic is flowing through some
single firewall chokepoint device somewhere, you’re mistaken. GCP is a full
SDN, with firewall policies applied at the instance-level, no matter where it
resides. These checks are performed immediately without having to funnel traffic
through dedicated security appliances.
Firewall rules can match IP addresses or ranges, but can also match tags. Tags
are user-defined strings that help organize firewall policies for a standards-based
a firewall policy that says any VM with the tag web-server should have ports
HTTP, HTTPS, and SSH opened.

Firewall rules exist at the network resource level and are not shared between
projects or other networks.
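Continuing the web-server tag example above, a matching allow rule could be created like this; the rule name, ports, and source range are placeholders:

bash
gcloud compute firewall-rules create allow-web \
  --network=default \
  --allow=tcp:80,tcp:443,tcp:22 \
  --target-tags=web-server \
  --source-ranges=0.0.0.0/0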

DNS Resolution
Another great thing about GCP is the way it handles DNS. When a VM instance
is created, DNS entries are automatically created resolving to a formatted
hostname.

FQDN = [hostname].c.[project-id].internal

So, if I had an instance named “porcupine” in a project called “tree”, my DNS
FQDN would be:

porcupine.c.tree.internal

Resolution of this name is handled by an internal metadata server that acts as a
DNS resolver (169.254.169.254), provided as part of Google Compute Engine
(GCE). This resolver will answer both internal queries and external DNS queries
using Google’s public DNS servers.
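From any instance in the project you could verify this with a simple lookup; the example reuses the hypothetical porcupine/tree names from above:

bash
# The metadata server (169.254.169.254) is the default resolver inside the VM
dig +short porcupine.c.tree.internal
nslookup porcupine.c.tree.internal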
If an instance or service needs to be accessed publicly by FQDN, a public-facing
DNS record will need to exist pointing to the external IP address of the instance
or service. This can be done by publishing public DNS records. You have the
option of using some external DNS service outside of GCP or using Google
Cloud DNS.

Network Billing
I won’t go into too much detail here, but the gist is that GCP bills clients for
egress traffic only. Egress traffic is traffic to the Internet, traffic from one
region to another (in the same network), and traffic between zones within a region.

You are not billed for ingress traffic, for VM-to-VM traffic within a single zone
(same region and network), or for traffic to most GCP services.

Some notes on caveats/limitations

 VPC networks only support IPv4 unicast traffic (No IPv6, or broadcast/multicast)
 Maximum of 7000 VM instances per VPC network

Automation with terraform

For teams that use Terraform as a key part of a change management


and deployment pipeline, it can be desirable to orchestrate Terraform
runs in some sort of automation in order to ensure consistency
between runs, and provide other interesting features such as
integration with version control hooks.
Automation of Terraform can come in various forms, and to varying
degrees. Some teams continue to run Terraform locally but
use wrapper scripts to prepare a consistent working directory for
Terraform to run in, while other teams run Terraform entirely within
an orchestration tool such as Jenkins.
This tutorial covers some things that should be considered when
implementing such automation, both to ensure safe operation of
Terraform and to accommodate some current limitations in
Terraform's workflow that require careful attention in automation. It
assumes that Terraform will be running in a non-interactive
environment, where it is not possible to prompt for input
at the terminal. This is not necessarily true for wrapper scripts, but is
often true when running in orchestration tools.
This tutorial's goal is to give an overview of things to consider when
automating the standard Terraform workflows. The following
tutorials will guide you through implementing the concepts discussed
in this tutorial.
1. The Deploy Terraform infrastructure with CircleCI tutorial guides
you through automating the standard Terraform workflow using
AWS S3 as a backend. This approach uses
the hashicorp/terraform:light Docker image to run Terraform
locally in each CircleCI job.
2. The Automate Terraform with GitHub Actions tutorial guides you
through automating the standard Terraform Cloud workflow.
This approach leverages Terraform Cloud for remote runs and
state management. While Terraform Cloud offers version control
system integrations, including GitHub, this approach enables you
to add status checks before or after Terraform Cloud remote
runs are triggered, better adapting Terraform Cloud to your use
case.
Automated Terraform CLI Workflow
When running Terraform in automation, the focus is usually on the
core plan/apply cycle. The main path, then, is broadly the same as for
CLI usage:
1. Initialize the Terraform working directory.
2. Produce a plan for changing resources to match the current
configuration.
3. Have a human operator review that plan, to ensure it is
acceptable.
4. Apply the changes described by the plan.
Steps 1, 2 and 4 can be carried out using the familiar Terraform CLI
commands, with some additional options:
 terraform init -input=false to initialize the working directory.
 terraform plan -out=tfplan -input=false to create a plan and save
it to the local file tfplan.
 terraform apply -input=false tfplan to apply the plan stored in
the file tfplan.
The -input=false option indicates that Terraform should not attempt
to prompt for input, and instead expect all necessary values to be
provided by either configuration files or the command line. It may
therefore be necessary to use the -var and -var-file options
on terraform plan to specify any variable values that would
traditionally have been manually-entered under interactive usage.
It is strongly recommended to use a backend that supports remote
state, since that allows Terraform to automatically save the state in a
persistent location where it can be found and updated by subsequent
runs. Selecting a backend that supports state locking will additionally
provide safety against race conditions that can be caused by
concurrent Terraform runs.
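Putting these pieces together, a CI job might run something like the following shell sketch; the variable file name is an assumption for illustration, and the remote backend is presumed to be configured in the Terraform files themselves:

bash
#!/bin/bash
set -euo pipefail

# Hint to Terraform that it is running under automation (see below)
export TF_IN_AUTOMATION=1

# 1. Initialize the working directory (remote state backend configured in *.tf)
terraform init -input=false

# 2. Produce and save a plan
terraform plan -out=tfplan -input=false -var-file=production.tfvars

# 3. (A human operator reviews the plan output here, e.g. as a held pipeline step)

# 4. Apply exactly the saved plan
terraform apply -input=false tfplan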
Controlling Terraform Output in Automation
By default, some Terraform commands conclude by presenting a
description of a possible next step to the user, often including a
specific command to run next.
An automation tool will often abstract away the details of exactly
which commands are being run, causing these messages to be
confusing and un-actionable, and possibly harmful if they
inadvertently encourage a user to bypass the automation tool
entirely.
When the environment variable TF_IN_AUTOMATION is set to any
non-empty value, Terraform makes some minor adjustments to its
output to de-emphasize specific commands to run. The specific
changes made will vary over time, but generally-speaking Terraform
will consider this variable to indicate that there is some wrapping
application that will help the user with the next step.
To reduce complexity, this feature is implemented primarily for the
main workflow commands described above. Other ancillary
commands may still produce command line suggestions, regardless of
this setting.
Plan and Apply on different machines
When running in an orchestration tool, it can be difficult or impossible
to ensure that the plan and apply subcommands are run on the same
machine, in the same directory, with all of the same files present.
Running plan and apply on different machines requires some
additional steps to ensure correct behavior. A robust strategy is as
follows:
 After plan completes, archive the entire working directory,
including the .terraform subdirectory created during init, and
save it somewhere where it will be available to the apply step. A
common choice is as a "build artifact" within the chosen
orchestration tool.
 Before running apply, obtain the archive created in the previous
step and extract it at the same absolute path. This re-creates
everything that was present after plan, avoiding strange issues
where local files were created during the plan step. (A sketch of this
hand-off follows.)
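A minimal sketch of that archive/restore hand-off, assuming the orchestration tool can pass a tarball between steps and that /workspace/terraform is the fixed absolute path used on both machines:

bash
# On the machine that ran `terraform plan`:
cd /workspace/terraform
terraform plan -out=tfplan -input=false
tar -czf /tmp/plan-artifact.tar.gz .      # includes .terraform/ and tfplan
# ...upload /tmp/plan-artifact.tar.gz as a build artifact...

# On the machine that will run `terraform apply`:
mkdir -p /workspace/terraform && cd /workspace/terraform
# ...download plan-artifact.tar.gz from the artifact store...
tar -xzf /tmp/plan-artifact.tar.gz
terraform apply -input=false tfplan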
Terraform currently makes some assumptions which must be
accommodated by such an automation setup:
 The saved plan file can contain absolute paths to child modules
and other data files referred to by configuration. Therefore it is
necessary to ensure that the archived configuration is extracted
at an identical absolute path. This is most commonly achieved by
running Terraform in some sort of isolation, such as a Docker
container, where the filesystem layout can be controlled.
 Terraform assumes that the plan will be applied on the same
operating system and CPU architecture as where it was created.
For example, this means that it is not possible to create a plan
on a Windows computer and then apply it on a Linux server.
 Terraform expects the provider plugins that were used to
produce a plan to be available and identical when the plan is
applied, to ensure that the plan is interpreted correctly. An error
will be produced if Terraform or any plugins are upgraded
between creating and applying a plan.
 Terraform can't automatically detect if the credentials used to
create a plan grant access to the same resources used to apply
that plan. If using different credentials for each (e.g. to generate
the plan using read-only credentials) it is important to ensure
that the two are consistent in which account on the
corresponding service they belong to.
The plan file contains a full copy of the configuration, the state that
the plan applies to, and any variables passed to terraform plan. If any
of these contain sensitive data then the archived working directory
containing the plan file should be protected accordingly. For provider
authentication credentials, it is recommended to use environment
variables instead where possible since these are not included in the
plan or persisted to disk by Terraform in any other way.
Interactive Approval of Plans
Another challenge with automating the Terraform workflow is the
desire for an interactive approval step between plan and apply. To
implement this robustly, it is important to ensure that either only one
plan can be outstanding at a time or that the two steps are connected
such that approving a plan passes along enough information to the
apply step to ensure that the correct plan is applied, as opposed to
some later plan that also exists.
Different orchestration tools address this in different ways, but
generally this is implemented via a build pipeline feature, where
different steps can be applied in sequence, with later steps having
access to data produced by earlier steps.
The recommended approach is to allow only one plan to be
outstanding at a time. When a plan is applied, any other existing plans
that were produced against the same state are invalidated, since they
must now be recomputed relative to the new state. By forcing plans
to be approved (or dismissed) in sequence, this can be avoided.
Auto-Approval of Plans
While manual review of plans is strongly recommended for
production use-cases, it is sometimes desirable to take a more
automatic approach when deploying in pre-production or
development situations.
Where manual approval is not required, a simpler sequence of
commands can be used:
 terraform init -input=false
 terraform apply -input=false -auto-approve
This variant of the apply command implicitly creates a new plan and
then immediately applies it. The -auto-approve option tells Terraform
not to require interactive approval of the plan before applying it.
When Terraform is empowered to make destructive changes to
infrastructure, manual review of plans is always recommended unless
downtime is tolerated in the event of unintended changes. Use
automatic approval only with non-critical infrastructure.
Testing Pull Requests with terraform plan
terraform plan can be used as a way to perform certain limited
verification of the validity of a Terraform configuration, without
affecting real infrastructure. Although the plan step updates the state
to match real resources, thus ensuring an accurate plan, the updated
state is not persisted, and so this command can safely be used to
produce "throwaway" plans that are created only to aid in code
review.
When implementing such a workflow, hooks can be used within the
code review tool in question (for example, Github Pull Requests) to
trigger an orchestration tool for each new commit under review.
Terraform can be run in this case as follows:
 terraform plan -input=false
As in the "main" workflow, it may be necessary to provide -var or -
var-file as appropriate. The -out option is not used in this scenario
because a plan produced for code review purposes will never be
applied. Instead, a new plan can be created and applied from the
primary version control branch once the change is merged.
Beware that passing sensitive/secret data to Terraform via variables
or via environment variables will make it possible for anyone who can
submit a PR to discover those values, so this flow must be used with
care on an open source project, or on any private project where some
or all contributors should not have direct access to credentials, etc.
Multi-environment Deployment
Automation of Terraform often goes hand-in-hand with creating the
same configuration multiple times to produce parallel environments
for use-cases such as pre-release testing or multi-tenant
infrastructure. Automation in such a situation can help ensure that
the correct settings are used for each environment, and that the
working directory is properly configured before each operation.
The two most interesting commands for multi-environment
orchestration are terraform init and terraform workspace. The former
can be used with additional options to tailor the backend
configuration for any differences between environments, while the
latter can be used to safely switch between multiple states for the
same config stored in a single backend.
Where possible, it's recommended to use a single backend
configuration for all environments and use the terraform
workspace command to switch between workspaces:
 terraform init -input=false
 terraform workspace select QA
In this usage model, a fixed naming scheme is used within the
backend storage to allow multiple states to exist without any further
configuration.
Alternatively, the automation tool can set the environment
variable TF_WORKSPACE to an existing workspace name, which
overrides any selection made with the terraform workspace
select command. Using this environment variable is recommended
only for non-interactive usage, since in a local shell environment it can
be easy to forget the variable is set and apply changes to the wrong
state.
In some more complex situations it is impossible to share the
same backend configuration across environments. For example, the
environments may exist in entirely separate accounts within the
target service, and thus need to use different credentials or endpoints
for the backend itself. In such situations, backend configuration
settings can be overridden via the -backend-config option to terraform
init.
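For example, per-environment backend settings could be kept in small files and passed at init time; the file names and the idea of one pipeline per environment are illustrative:

bash
# QA pipeline
terraform init -input=false -backend-config=qa.backend.hcl

# Production pipeline (different credentials/endpoints for the backend itself)
terraform init -input=false -backend-config=prod.backend.hcl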
Pre-installed Plugins
In default usage, terraform init downloads and installs the plugins for
any providers used in the configuration automatically, placing them in
a subdirectory of the .terraform directory. This affords a simpler
workflow for straightforward cases, and allows each configuration to
potentially use different versions of plugins.
In automation environments, it can be desirable to disable this
behavior and instead provide a fixed set of plugins already installed
on the system where Terraform is running. This then avoids the
overhead of re-downloading the plugins on each execution, and
allows the system administrator to control which plugins are
available.
To use this mechanism, create a directory somewhere on the system
where Terraform will run and place into it the plugin executable files.
The plugin release archives are available for download
on releases.hashicorp.com. Be sure to download the appropriate
archive for the target operating system and architecture.
After extracting the necessary plugins, the contents of the new plugin
directory will look something like this:
$ ls -lah /usr/lib/custom-terraform-plugins
-rwxrwxr-x 1 user user 84M Jun 13 15:13 terraform-provider-aws-v1.0.0-x3
-rwxrwxr-x 1 user user 84M Jun 13 15:15 terraform-provider-rundeck-v2.3.0-x3
-rwxrwxr-x 1 user user 84M Jun 13 15:15 terraform-provider-mysql-v1.2.0-x3
The version information at the end of the filenames is important so
that Terraform can infer the version number of each plugin. Multiple
versions of the same provider plugin can be installed, and Terraform
will use the newest one that matches the provider version
constraints in the Terraform configuration.
With this directory populated, the usual auto-download and plugin
discovery behavior can be bypassed using the -plugin-dir option
to terraform init:
 terraform init -input=false -plugin-dir=/usr/lib/custom-terraform-plugins
When this option is used, only the plugins in the given directory are
available for use. This gives the system administrator a high level of
control over the execution environment, but on the other hand it
prevents use of newer plugin versions that have not yet been installed
into the local plugin directory. Which approach is more appropriate
will depend on unique constraints within each organization.
Plugins can also be provided along with the configuration by creating
a terraform.d/plugins/OS_ARCH directory, which will be searched
before automatically downloading additional plugins. The -get-plugins=false
flag can be used to prevent Terraform from automatically downloading
additional plugins.
Terraform Cloud
As an alternative to home-grown automation solutions, HashiCorp
offers Terraform Cloud, which adds additional features to the core
Terraform CLI functionality, including direct integration with version
control systems, plan and apply lifecycle orchestration, remote state
storage, an ephemeral environment for Terraform operations, role-
based access control, and a user interface for reviewing and approving
plans.
Internally, Terraform Cloud runs the same Terraform CLI commands as
Terraform Community Edition, using the same release binaries. Most
of the considerations in this guide apply to infrastructure provisioning
pipelines that use Terraform Community Edition with a backend for
remote state storage. However, you can also use Terraform Cloud
within your automated build pipelines.
In local execution mode, Terraform operations occur in your CI
environment, and Terraform Cloud stores the state remotely. In that
case, automating Terraform Cloud in your pipeline requires the same
considerations as using Terraform Community Edition with a remote
state backend.
However, when using remote execution with CLI-driven runs,
Terraform operations take place in Terraform Cloud instead of
within your pipeline. You cannot apply a saved plan to Terraform
Cloud remote operations. You can review probable infrastructure
changes by triggering a speculative plan, but speculative plans are not
applyable. When you trigger a new apply run to implement those
changes, if your infrastructure configuration has drifted since the
speculative plan, Terraform may apply changes that you never have
the chance to manually review or approve. Protect against drift by
making sure that no one can change your infrastructure outside of
your automated build pipeline, and be sure to approve any runs
promptly so the configuration is not stale.
To configure the CLI to use Terraform Cloud for either local or remote
operations, your configuration must include a cloud block that
establishes a Terraform Cloud integration. As of Terraform 1.2, you
can configure the cloud block using environment variables. This allows
for partial configuration as shown below, and lets you set your
Terraform Cloud token as an environment variable rather than writing
it to a credentials file.
main.tf
terraform {
  cloud {}

  required_providers {
    ##...
  }

  required_version = ">= 1.2.0"
}
Terraform reads the following environment variables to configure
the cloud block (an example follows the list):
 Use TF_CLOUD_ORGANIZATION for the organization name
 Use TF_CLOUD_HOSTNAME for the hostname if using Terraform
Enterprise
 Use TF_WORKSPACE for the workspace to operate in
 Use TF_TOKEN_<hostname> for the Terraform Cloud token to
use to authenticate operations. Name the
variable TF_TOKEN_app_terraform_io for Terraform Cloud, or
update the hostname with your Terraform Enterprise endpoint,
replacing any periods with underscores.
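A pipeline step might export these before running Terraform; the organization, workspace, and token values below are placeholders:

bash
export TF_CLOUD_ORGANIZATION="my-org"
export TF_WORKSPACE="networking-prod"
export TF_TOKEN_app_terraform_io="xxxxxxxx.atlasv1.example"
terraform init -input=false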
It will always be possible to run Terraform via in-house automation,
to allow for usage in situations where Terraform Cloud is not
appropriate. Terraform Cloud is an alternative to in-house solutions,
since it provides an out-of-the-box solution that already incorporates
the best practices described in this guide and can thus reduce time
spent developing and maintaining an in-house alternative.
