Observability and Monitoring
Context:
In a cloud-native microservices environment, it is very tough and tedious to
monitor hundreds or thousands of microservices and their scaled containers.
So we need observability and monitoring patterns to manage this heavy
microservices traffic and the inter-communication between services. Some of the
important patterns are described in detail below:
Distributed tracing
Log aggregation
Application metrics
Problem:
1. DEBUG A PROBLEM IN MICROSERVICES COMMUNICATION:
How do we trace one or more transactions across multiple services,
physical machines, and different data stores, and find where exactly
the problem or bug is?
2. AGGREGATE ALL MICROSERVICES LOGS:
How do we combine all the logs from multiple services into a central
location where they can be indexed, searched, filtered, and grouped to
find the bugs that are contributing to a problem?
3. MONITOR A CHAIN OF MICROSERVICE CALLS:
How do we understand, for a specific chain of service calls, the path it
travelled inside our microservices network and the time it took at each
microservice?
Solution: Observability and Monitoring Pattern
Distributed Tracing
When multiple microservices interact with each other for various business
use cases, services may fail during this inter-communication, which can break
the business flow and make it complex and tedious to identify and debug the
issue.
Logging alone is not enough; it is a very tedious process to read logs and
identify issues when we have thousands of lines of logs from multiple
microservices on different containers in a multi-cloud environment.
In some cases, the same microservice is deployed on multiple clusters and
data centers. The following diagram shows two microservices, Customer and
Order, which create and persist tracing logs and push them to the
centralized tracing service:
We need a mechanism to manage, monitor, and debug production
issues. The Distributed Tracing pattern is therefore based on tracing REST API
calls, to quickly track an issue down to the culprit or buggy service.
In this pattern, client request and response logs for the microservices
REST API are recorded and monitored asynchronously.
This tracing can be done with various techniques; a common correlation ID is
one of the popular techniques, where end-to-end API calls from the client/UI to
the backend services are tied together by a unique correlation ID.
Every external client request is assigned a unique request/correlation ID,
which is passed on to all subsequent services. In this way, tracking can be
done easily by persisting all request-response payloads.
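As a minimal sketch of this technique (not taken from the original services; the header name X-Correlation-ID and the class name are assumptions), a Spring Boot servlet filter can reuse an incoming correlation ID or generate a new one, store it in the logging MDC so every log line carries it, and echo it back to the caller:

// Sketch only: assumes Spring Boot 2.x with spring-boot-starter-web and SLF4J logging
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.UUID;

@Component
public class CorrelationIdFilter extends OncePerRequestFilter {

    private static final String CORRELATION_ID_HEADER = "X-Correlation-ID"; // assumed header name

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        // Reuse the caller's correlation ID if present, otherwise create a new one
        String correlationId = request.getHeader(CORRELATION_ID_HEADER);
        if (correlationId == null || correlationId.isEmpty()) {
            correlationId = UUID.randomUUID().toString();
        }
        // Put the ID in the MDC so all log statements of this request include it,
        // and return it to the client so downstream calls can propagate the same ID
        MDC.put("correlationId", correlationId);
        response.setHeader(CORRELATION_ID_HEADER, correlationId);
        try {
            filterChain.doFilter(request, response);
        } finally {
            MDC.remove("correlationId");
        }
    }
}

Outgoing calls (for example, through a Feign or RestTemplate interceptor) would then copy the same header onto downstream requests. Spring Cloud Sleuth, discussed below, automates exactly this kind of propagation.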
Advantages:
Customize and visualize a centralized web UI dashboard to track and
observe all microservices, environments, databases, and so on.
It also profiles the application and checks its performance.
It filters requests using many business-logic and data points.
It shows all tracing data from different clusters and multiple clouds on a
single dashboard. It is easy to manage and monitor from a single pane of
glass.
Custom queries can be run on these API tracing logs through the dashboard and APIs.
It also tracks request and response times and peak usage durations.
These traces are collected automatically, without any code
changes.
Use cases
The use cases are as follows:
Microservices tracing for debugging production issues.
Application performance monitoring using APM tools.
Log tracing for apps, databases, message brokers, Kubernetes container
solutions, and other infrastructure systems.
Implementation:
Spring Cloud Sleuth (https://spring.io/projects/spring-cloud-sleuth):
• Spring Cloud Sleuth provides Spring Boot auto-configuration for
distributed tracing.
• It adds trace and span IDs to all the logs, so we can extract the entries for a
given trace or span in a log aggregator.
• It does this by adding filters and interacting with other Spring
components so that the correlation IDs being generated are passed through to all
the system calls.
Spring Cloud Sleuth adds three pieces of information to all the logs
written by a microservice, i.e. [<App Name>, <Trace ID>, <Span ID>]:
1. Application name of the service: This is the name of the application
where the log entry is being made. Spring Cloud Sleuth gets this name
from the ‘spring.application.name’ property.
2. Trace ID: Trace ID is the equivalent term for correlation ID. It is a unique
number that represents an entire transaction.
3. Span ID: A span ID is a unique ID that represents part of the overall
transaction. Each service participating in the transaction has its
own span ID. Span IDs are particularly relevant when we integrate with
Zipkin to visualize our transactions.
Step-1: Add the Sleuth dependency in the pom.xml file of the category service and the product-brands service, as shown below.
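A minimal sketch of this dependency (the version is managed by the Spring Cloud BOM) looks like the following:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>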
Step-2: Add logger statements in the category and brands services as shown below, just to see the logs
when inter-microservices communication is happening.
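A sketch of such logger statements (the class, service, and endpoint names below are assumptions, not the actual project code) could look like this in the brands service, with a similar log statement in the category service controller:

// Sketch only: assumes spring-cloud-starter-openfeign on the classpath
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;

@FeignClient(name = "category-service") // assumed service name
interface CategoryClient {
    @GetMapping("/categories")          // assumed endpoint
    List<String> getAllCategories();
}

@Service
public class ProductBrandsService {

    private static final Logger log = LoggerFactory.getLogger(ProductBrandsService.class);

    private final CategoryClient categoryClient;

    public ProductBrandsService(CategoryClient categoryClient) {
        this.categoryClient = categoryClient;
    }

    public List<String> getAllCategories() {
        log.info("Brands service: calling category service to fetch all categories");
        List<String> categories = categoryClient.getAllCategories();
        log.info("Brands service: received {} categories from category service", categories.size());
        return categories;
    }
}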
Step-3: Start the product service and the category service; in the console log we can see statements like the ones below.
Step-4: Invoke the get-product-categories endpoint, which will also invoke the category service's get-all-categories
endpoint internally using the Feign client. Then, in the category service, we can see the same trace ID
as the one in the brands service.
Zipkin (https://zipkin.io/):
• Zipkin is an open-source data-visualization tool that helps aggregate
all the logs and gather the timing data needed to troubleshoot
latency problems in microservices architectures.
• It allows us to break a transaction down into its component pieces and
visually identify where there might be performance hot spots, thus
reducing triage time by contextualizing errors and delays.
Zipkin Architecture Overview
To get started with Zipkin, we do not need to build a separate microservice inside our application.
We can always set up Zipkin on a separate server with the help of its Docker image or its installer.
Step-1: Download the self-executable Zipkin jar file as shown below:
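For example, the Zipkin quickstart script fetches the latest self-executing jar:

curl -sSL https://zipkin.io/quickstart.sh | bash -s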
Step-2: Run the jar file downloaded above as shown below:
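Assuming the jar was saved as zipkin.jar in the current directory, it can be started with:

java -jar zipkin.jar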
Step-3: Access the Zipkin dashboard in a browser (by default it is served at http://localhost:9411).
Step-4: Add the Zipkin dependency in both the brands and categories services as shown below:
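As a sketch, depending on the Spring Cloud release in use, this is typically one of the following dependencies (versions managed by the Spring Cloud BOM):

<!-- Older Spring Cloud releases (e.g. Hoxton and earlier) -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

<!-- Spring Cloud Sleuth 3.x -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>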
Step-5: To have the traces collected by Zipkin, add the below properties in the application.properties file of the
brands and categories services, as shown below:
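A minimal sketch of these properties, assuming Zipkin runs locally on its default port, would be:

# Where the Zipkin collector is listening
spring.zipkin.base-url=http://localhost:9411
# Sample every request (1.0 = 100%; fine for a demo, usually lower in production)
spring.sleuth.sampler.probability=1.0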
Step-6: Restart the brands and category services; then we can see the services registered in the Zipkin server
as shown below:
Step-7: Now execute the brands get-all-categories request as shown below:
Step-8: Then we can see the complete trace of the request from the brands service to the category service, as
shown below:
Log aggregation
This is a pattern to aggregate logs at a centralized location. It is a technique to collect logs from various
microservices and other apps and persist them at another location for querying and visualization on log-monitoring
UI dashboards.
To complete business use cases, multiple microservices interact with each other on different containers
and servers. During this interaction, they also write thousands of lines of log messages on different
containers and servers.
It is difficult for developers and DevOps operators to analyze the humongous end-to-end logs of business use cases
spread across different servers. Sometimes it takes a couple of tedious days and nights to analyze
logs and identify production issues, which may cause loss of revenue and of the customer’s trust, which is
MOST important.
Log aggregation and analysis are very important for any organization. The following diagram shows two
microservices, which write logs locally/externally; finally, all the logs are aggregated and forwarded to the
centralized Log Aggregation service:
There should be a mechanism to aggregate end-to-end logs for given use cases sequentially, for faster
analysis and debugging. There are a lot of open-source (OSS) and enterprise tools which
aggregate these logs from many sources and persist them asynchronously at a centralized external location
dedicated to central logging and analysis. It can be on the same cluster or on
the cloud.
It is important to write logs asynchronously to improve application performance,
because in this scenario the actual API/application response time won’t be impacted.
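As an illustrative Logback sketch of this idea (not part of the Logstash setup shown later, where the LogstashTcpSocketAppender already writes asynchronously), a blocking appender can be wrapped in Logback's AsyncAppender so log writes happen off the request thread:

<!-- Sketch: wrap an existing appender in ch.qos.logback.classic.AsyncAppender -->
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
        <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
</appender>
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <queueSize>512</queueSize> <!-- number of buffered log events -->
    <appender-ref ref="CONSOLE" />
</appender>
<root level="info">
    <appender-ref ref="ASYNC" />
</root>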
There are multiple log aggregation solutions like ELK, EFK, Splunk, and other enterprise APM
tools.
Structured logging can be done using various tools, which filter and format unstructured logs into the
required format. The open-source tools Fluentd/Fluent Bit and Logstash are
useful for making the data structured.
Advantages
Easy to debug and quickly identify issues with a unique correlation ID.
Stores structured and organized logs for complex queries on a given filter condition.
An external, centralized log location avoids the overhead of extra storage and compute.
Central logging and analysis happen asynchronously; async logging keeps the application fast.
Use cases
Good integration with distributed microservices.
Async logging on an external server using the file system or messaging topic.
Log analytics, dashboarding, visualization graphs, and so on.
Implementation
ELK Local Setup
Step-1: Setup Elasticsearch
Download the latest version of Elasticsearch from the download page and unzip it into any folder.
Run bin\elasticsearch.bat from the command prompt.
By default, it starts at http://localhost:9200
Step-2: Setup Kibana
Download the latest distribution from the download page and unzip it into any folder.
Open config/kibana.yml in an editor and set elasticsearch.url to point at your
Elasticsearch instance. In our case, as we use the local instance, just uncomment
elasticsearch.url: "http://localhost:9200"
Run bin\kibana.bat from the command prompt.
Once started successfully, Kibana runs on the default port 5601 and the Kibana UI is available at
http://localhost:5601
Step-3: Setup Logstash
Download the latest distribution from the download page and unzip it into any folder.
Create a file named logstash.conf with the following content in the bin directory.
Now run bin/logstash -f logstash.conf to start Logstash.
input {
  tcp {
    port => 5000
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "shopifyme-%{serviceName}"
  }
}
Step-4: Emitting logs from our application to logstash
1. Configure the below dependencies in brands and categories service
<dependency>
<groupId>net.logstash.logback</groupId>
<artifactId>logstash-logback-encoder</artifactId>
<version>4.9</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.2.3</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-core</artifactId>
<version>1.2.3</version>
</dependency>
2. Configure logback.xml with the LogstashTcpSocketAppender, with its destination pointing at the Logstash
input configured above.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread, %X{X-B3-TraceId:-},%X{X-B3-SpanId:-}] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <appender name="STASH"
              class="net.logstash.logback.appender.LogstashTcpSocketAppender">
        <destination>localhost:5000</destination>
        <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
            <providers>
                <mdc /> <!-- MDC variables on the thread are written as JSON fields -->
                <context /> <!-- Outputs entries from logback's context -->
                <version /> <!-- Logstash JSON format version, the @version field in the output -->
                <logLevel />
                <loggerName />
                <pattern>
                    <pattern>
                        {
                            "serviceName": "product-brands-service"
                        }
                    </pattern>
                </pattern>
                <threadName />
                <message />
                <logstashMarkers />
                <stackTrace />
            </providers>
        </encoder>
    </appender>
    <root level="info">
        <appender-ref ref="STDOUT" />
        <appender-ref ref="STASH" />
    </root>
</configuration>
Application metrics
Application metrics patterns will help us to check application metrics, their performance, audit logs, and
so on. It’s a measure of microservices applications/REST APIs characteristics which are quantifiable or
countable.
It helps to check the performance of a REST API, such as how many requests the API is handling per second
and what its response time is.
It helps to scale the application and provide faster applications to web/mobile clients. It also checks
Transactions Per Second (TPS) and other application metrics.
There are many tools available to check the metrics of applications, like Spring Boot Micrometer,
Prometheus, and APM tools like Wavefront, Dynatrace, Datadog, and so on.
They work with either push or pull models using REST APIs. For example, Prometheus pulls metrics from the
application's integrated Prometheus REST endpoint, and Grafana visualizes them on its
dashboard.
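As an illustrative sketch (not taken from the services built earlier), a Spring Boot application can expose such metrics for Prometheus to pull by adding Actuator and the Micrometer Prometheus registry (versions managed by the Spring Boot BOM):

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

and exposing the scrape endpoint in application.properties, after which Prometheus can pull metrics from /actuator/prometheus:

management.endpoints.web.exposure.include=health,info,prometheus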
Advantages
Helps in scaling hardware resources for the future.
Identifies the performance of REST APIs.
Identifies Return on Investment (ROI).
Monitors and analyzes application behavior during peak and odd hours.
Limitations
It needs extra hardware like compute, memory, and storage. It can slow down the performance of
applications because it runs alongside the applications and consumes the same memory and compute
on the server.
ELK Setup on AWS
Step-1: Create a machine of type t2.large with the Ubuntu platform
Step-2: Update the machine as shown below:
sudo apt-get update
Step-3: Install Java on the machine:
sudo apt-get install default-jre -y
Elastic Search Install and Setup
Step-1: Add the Elasticsearch repository to the Ubuntu repositories list
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a
/etc/apt/sources.list.d/elastic-7.x.list
echo "deb https://artifacts.elastic.co/packages/oss-7.x/apt stable main" | sudo tee -a
/etc/apt/sources.list.d/elastic-7.x.list
Step-2: Install the Elastic Search
sudo apt-get update
sudo apt-get install elasticsearch -y
Step-3: Update the ES network config:
sudo vim /etc/elasticsearch/elasticsearch.yml
network.host: "localhost"
http.port: 9200
Step-4: Start Elasticsearch as a service
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
sudo service elasticsearch status
Step-5: Verify the elastic search service status
sudo curl http://localhost:9200
Logstash Installation
Step-1: Update the machine
Step-2: Install Logstash
sudo apt-get install logstash
Step-3: Enable Logstash as a service
systemctl start logstash
systemctl enable logstash
service logstash status
Kibana Installation
Step-1: Update the machine
Step-2: Install Kibana
sudo apt-get install kibana
Step-3: Enable kibana as a service
systemctl start kibana
systemctl enable kibana
service kibana status
Step-4: Update Kibana Network settings to let it know where the ES is.
vim /etc/kibana/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
Step-5: Restart Kibana (for example, sudo systemctl restart kibana)
Install MetricBeat
Step-1: Update the machine
Step-2: Install MetricBeat
sudo apt-get install metricbeat
Step-3: Enable MetricBeat as a service
systemctl start metricbeat
systemctl enable metricbeat
service metricbeat status
Logstash-ElasticSearch Integration
Step-1: Create a file called apache-01.conf in the /etc/logstash/conf.d/ directory
Step-2: Update the file with the below content
input {
  file {
    path => "/home/ubuntu/apache-daily-access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  tcp {
    port => 5000
    codec => "json"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"   # field populated by the COMBINEDAPACHELOG grok pattern
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-services-%{serviceName}"
  }
}
Step-3: Restart Logstash