All About Splunk
www.linkedin.com/in/farhathnathvi
Table of Contents
What Is Splunk Used For? (2024)
What Is Splunk?
SPL Syntax
Index Statistics
Reload apps
Debug Traces
Configuration
Capacity Planning
What Is Splunk?
In today's data-driven cyber landscape, organizations across the globe are faced with an ever-
increasing volume of data from various assets and network infrastructure. To harness the
power of this data and enable cyber resilience, they need tools and technologies that can help
them collect, analyze, and visualize the logs and events effectively to detect and prevent cyber
security threats.
Splunk is a powerful SIEM (Security Information and Event Management) tool that is widely used
for this purpose. It offers a comprehensive platform for collecting, analyzing, and
visualizing machine-generated data to gain valuable insights and detect potential security
threats.
Though Splunk is usually considered a SIEM tool, it has been recently rebranded as a Unified
Security and Observability Platform, and currently, Splunk is offered as Splunk Cloud, Splunk
Enterprise, and Splunk Observability Cloud platforms.
So, what is Splunk used for? Splunk is designed to ingest and index large volumes of data from
various sources, including logs, sensors, devices, applications, and systems. It provides real-
time monitoring, analysis, security, and observability capabilities, allowing organizations to
identify and respond to security incidents proactively.
One of the key features of Splunk is its ability to correlate and aggregate data from different
sources like servers, firewalls, load balancers, network devices, etc., enabling security analysts
to investigate and identify patterns, anomalies, and potential threats. Its advanced search and
query functionalities allow users to perform complex searches and create custom reports and
dashboards.
Splunk also offers a wide range of security-specific applications and add-ons that provide
additional functionality and help automate various security tasks. These include threat
intelligence, incident response, compliance monitoring, observability, and user behavior
analytics, among others.
By analyzing and visualizing data in real-time, Splunk helps organizations improve their
security posture by identifying and mitigating vulnerabilities, detecting and responding to
security incidents, and ensuring compliance with industry regulations and best practices.
In addition to its security applications, Splunk is also widely used for other purposes, such as IT
operations monitoring, application performance monitoring, business analytics, and log
management. Its versatility and scalability make it a popular choice for organizations of all sizes
and across various industries.
How Does Splunk Work?
Splunk's architecture consists of various components that work together to enable data
ingestion, indexing, searching, and visualization. The key components of a typical Splunk
architecture are:
1. Forwarders:
Universal Forwarder: A lightweight component installed on data sources to collect and forward
data to the Splunk indexer. It has minimal resource requirements and is suitable for high-
volume data sources.
Heavy Forwarder: A more feature-rich forwarder that can preprocess data before indexing. It is
suitable for environments requiring additional data manipulation.
2. Indexer:
Indexer Cluster: Indexers receive data from forwarders, index it, and make it searchable.
Multiple indexers can be configured in a cluster to ensure high availability and fault tolerance.
3. Search Head:
Search Head Cluster: The search head is responsible for handling search requests and
presenting the results. A cluster of search heads can be configured for load balancing and
redundancy.
Search Head Pooling: Distributes search requests across a pool of search heads, optimizing
performance and providing fault tolerance. This is a legacy feature, deprecated in favor of
search head clustering.
4. Deployment Modules:
Deployment Server: Manages configurations for forwarders, ensuring consistency across the
environment. It simplifies the process of deploying and managing Splunk components.
Deployment Manager: Facilitates the management of configurations across multiple Splunk
instances. It ensures consistency and simplifies the deployment process.
5. License Master:
Manages licenses for all Splunk components in the environment. It ensures that the usage
complies with licensing agreements.
6. Monitoring Console:
Provides a centralized interface for monitoring the health and performance of the Splunk
deployment. It helps administrators track the status of components and troubleshoot issues.
7. Data Inputs:
Various mechanisms for ingesting data into Splunk, including file monitoring, scripted inputs,
modular inputs, and various protocol-based inputs.
Core Features of Splunk
Splunk is a powerful SIEM software platform that offers a wide range of features that help
businesses gain valuable insights from their data and ensure cyber resilience.
Primary Use Cases for Splunk
Splunk's application spans various critical areas. As we embark on this exploration, we'll
discover how Splunk's versatility addresses critical operational challenges across various
domains, making it a cornerstone for organizations seeking holistic IT, security, and business
intelligence solutions.
1. IT Operations Management
In the cyber security domain, IT operations management is synonymous with threat detection,
incident response, and system integrity. Splunk's role extends beyond IT operations, ensuring a
holistic security posture.
3. Application Performance Monitoring (APM)
Applications are prime targets for cyber attacks. Splunk's APM capabilities enhance cyber
security by monitoring application performance, detecting anomalies, and mitigating potential
security risks.
Advantages of Using Splunk
Splunk is a strong choice for cyber security and data analysis. Its core features and primary
use cases show how its capabilities help organizations navigate the cyber security landscape
and derive actionable insights from their data. Splunk's adoption in cyber security is
underpinned by several advantages:
Comparing Splunk to Other Data Analysis Tools
Splunk's cyber security and data analysis prowess is further highlighted through a
comprehensive comparison with other leading solutions. Here, we compare Splunk with other
leading tools, providing detailed insights into their features, strengths, and unique offerings:
Splunk vs. ELK
Comparison Highlights
Cost: ELK is open-source, making it cost-effective. Splunk offers free versions, but
enterprise solutions have licensing fees.
Ease of Use: Splunk has a more user-friendly interface and search language (SPL). ELK,
being open-source, may require more technical expertise.
Scalability: Both are scalable, but Splunk offers commercial support for demanding cyber
security needs.
Community and Ecosystem: ELK gets most of its support from a large open-source
community. Splunk has its own community and the Splunkbase marketplace.
Splunk vs. New Relic
Comparison Highlights
Focus: New Relic specializes in APM. Splunk's versatility makes it suitable for a broader
spectrum of cyber security and data analysis.
Pricing: New Relic follows a subscription model. Splunk's pricing varies based on cyber
security needs and data volumes.
Versatility: Splunk's adaptability makes it a better choice for organizations with diverse
cyber security requirements.
Splunk vs. IBM QRadar
Comparison Highlights
Focus: Splunk offers a broader focus on data analysis and cyber security. IBM QRadar
specializes in security information and event management (SIEM).
Ease of Use: Splunk is known for its intuitive interface. IBM QRadar may have a steeper
learning curve.
Scalability: Both are scalable, but Splunk's commercial support enhances scalability for
demanding cyber security environments.
Community and Ecosystem: Splunk's active community and Splunkbase Marketplace
provide a robust ecosystem. IBM QRadar also has a community but may have fewer
community-driven resources.
Splunk vs. ArcSight
Comparison Highlights
Focus: Splunk offers a broader focus on data analysis and cyber security. ArcSight
specializes in security information and event management (SIEM).
Ease of Use: Splunk is known for its intuitive interface. ArcSight may have a steeper learning
curve.
Scalability: Both are scalable, but Splunk's commercial support enhances scalability for
demanding cybersecurity environments.
Community and Ecosystem: Splunk's active community and Splunkbase Marketplace
provide a robust ecosystem. ArcSight also has a community but may have fewer
community-driven resources.
Search Language in Splunk
Splunk uses what’s called Search Processing Language (SPL), which consists of keywords,
quoted phrases, Boolean expressions, wildcards (*), parameter/value pairs, and comparison
expressions. Unless you’re joining two explicit Boolean expressions, omit the AND operator
because Splunk assumes the space between any two search terms to be AND.
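The implicit AND can be pictured with a small Python sketch. This is only an analogy under simplifying assumptions: it uses plain case-insensitive substring matching, ignores quoted phrases, wildcards, OR/NOT, and field=value pairs, and the sample events are invented for illustration.

```python
def matches(event: str, query: str) -> bool:
    """Return True if every whitespace-separated term occurs in the event.

    Simplified model of implicit AND: a space between terms means
    that all of them must be present.
    """
    text = event.lower()
    return all(term.lower() in text for term in query.split())


# Hypothetical sample events, invented for illustration.
events = [
    "2024-01-01 10:00:01 sshd failed password for root",
    "2024-01-01 10:00:05 sshd accepted password for alice",
    "2024-01-01 10:00:09 nginx GET /index.html 200",
]

# "sshd failed" behaves like "sshd AND failed".
hits = [e for e in events if matches(e, "sshd failed")]
print(hits)
```

Only the first event contains both terms, so it is the only hit.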
Basic Search offers a shorthand for simple keyword searches in a body of indexed data myIndex
without further processing:
index=myIndex keyword
An event is an entry of data representing a set of values associated with a timestamp. It can be a
text document, configuration file, or entire stack trace. Here is an example of an event in a web
activity log:
Search commands help filter unwanted events, extract additional information, calculate values,
transform data, and statistically analyze the indexed data. It is a process of narrowing the data
down to your focus. Note the decreasing number of results below:
Common Search Commands
Command Description
SPL Syntax
Begin by specifying the data using the parameter index, the equal sign =, and the data
index of your choice:
Complex queries involve the pipe character |, which feeds the output of the previous query into
the next.
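As a rough analogy in Python (not Splunk code), the pipe can be modeled as function composition: each stage receives the list produced by the previous stage. The stage names, field names, and sample events below are hypothetical.

```python
from functools import reduce


def run_pipeline(events, stages):
    """Feed the output of each stage into the next, like the SPL pipe."""
    return reduce(lambda current, stage: stage(current), stages, list(events))


def only_404(events):
    """Filtering stage: keep events whose status is 404."""
    return [e for e in events if e["status"] == 404]


def count_by_host(events):
    """Transforming stage: count events per host."""
    counts = {}
    for e in events:
        counts[e["host"]] = counts.get(e["host"], 0) + 1
    return [{"host": h, "count": c} for h, c in sorted(counts.items())]


# Hypothetical events, invented for illustration.
events = [
    {"host": "web1", "status": 404},
    {"host": "web1", "status": 200},
    {"host": "web2", "status": 404},
    {"host": "web2", "status": 404},
]

result = run_pipeline(events, [only_404, count_by_host])
print(result)
```

Each stage sees fewer or differently shaped rows than the one before it, which is why the result count typically shrinks as a query grows.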
Basic Search
This is the shorthand query to find the word hacker in an index called cybersecurity:
index=cybersecurity hacker
Filter by fields
source="/var/log/myapp/access.log" status=404
    All lines where the field status has value 404 from the file /var/log/myapp/access.log
source="bigdata.rar:*" index="data_tutorial" Code=RED
    All entries where the field Code has value RED in the archive bigdata.rar indexed as
    data_tutorial
index="customer_feedback" _raw="*excellent*"
    All entries whose text contains the keyword “excellent” in the indexed data set
    customer_feedback
SPL search terms Description
Filter by host
Selecting an index
This syntax also applies to the arguments following the search keyword. Here is an example of a
longer SPL search string:
Basic Filtering
You can filter your data using regular expressions and the Splunk keywords rex and regex. An
example of finding deprecation warnings in the logs of an app would be:
rex
    Description: Extract fields according to specified regular expression(s) into a new field
    for further processing
    Example: Extract email addresses:
    source="email_dump.txt" | rex field=_raw "From: <(?<from>.*)> To: <(?<to>.*)>"
The biggest difference between search and regex is that you can only exclude query strings with
regex. These two are equivalent:
source="access.log" Fatal
source="access.log" | regex _raw=".*Fatal.*"
But you can only use regex to find events that do not include your desired search term:
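The include and exclude cases can be sketched in Python with the standard re module. This is an analogy rather than Splunk code, and the sample log lines are invented for illustration.

```python
import re

# Same pattern as in the regex example above.
pattern = re.compile(r".*Fatal.*")

# Hypothetical log lines, invented for illustration.
lines = [
    "Fatal error in module A",
    "Request served in 12 ms",
    "Non-Fatal warning in module B",
]

# Keep only the lines that match the pattern.
matching = [line for line in lines if pattern.search(line)]

# Keep only the lines that do NOT match the pattern.
non_matching = [line for line in lines if not pattern.search(line)]

print(matching)
print(non_matching)
```

Note that "Non-Fatal" also matches, because the pattern only requires the substring Fatal to appear somewhere in the line.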
Calculations
Combine the following with eval to do computations on your data, such as finding the mean,
longest and shortest comments in the following example:
Function / Return value or Action / Usage: eval foo=…

coalesce(X,…)
    Returns: The first value that is not NULL
    Usage: coalesce(null(), "Returned val", null())

exact(X)
    Returns: Evaluates an expression X using double precision floating point arithmetic
    Usage: exact(3.14*num)

exp(X)
    Returns: e (natural number) to the power X (e^X)
    Usage: exp(3)

tostring(X,Y)
    Returns: Field value of X as a string. If X is a number, it reformats it as a string. If X
    is a Boolean value, it reformats to "True" or "False" strings. If X is a number, the
    optional second argument Y is one of: "hex": convert X to hexadecimal, "commas": formats X
    with commas and two decimal places, or "duration": converts seconds X to readable time
    format HH:MM:SS.
    Usage: This example returns bar=00:08:20:
    | makeresults | eval bar = tostring(500, "duration")

typeof(X)
    Returns: String representation of the field type
    Usage: This example returns "NumberBool":
    | makeresults | eval n=typeof(12) + typeof(1==2)

urldecode(X)
    Returns: URL X, decoded.
    Usage: urldecode("http%3A%2F%2Fwww.site.com%2Fview%3Fr%3Dabout")

validate(X,Y,…)
    Returns: For pairs of Boolean expressions X and strings Y, returns the string Y
    corresponding to the first expression X which evaluates to False, and defaults to NULL if
    all X are True.
    Usage: validate(isint(N), "Not an integer", N>0, "Not positive")
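To make the behavior of a few of these functions concrete, here is a minimal Python sketch of rough equivalents. These helpers are illustrative stand-ins written for this guide, not Splunk's implementation: NULL is modeled as None, and to_duration covers only the "duration" option of tostring.

```python
from urllib.parse import unquote


def coalesce(*values):
    """Return the first value that is not None (None stands in for NULL)."""
    return next((v for v in values if v is not None), None)


def to_duration(seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS, like tostring(X, "duration")."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def validate(*pairs):
    """Take alternating (condition, message) arguments and return the message
    for the first condition that is False, or None if all are True."""
    for condition, message in zip(pairs[0::2], pairs[1::2]):
        if not condition:
            return message
    return None


n = -3
print(coalesce(None, "Returned val", None))
print(to_duration(500))
print(unquote("http%3A%2F%2Fwww.site.com%2Fview%3Fr%3Dabout"))
print(validate(isinstance(n, int), "Not an integer", n > 0, "Not positive"))
```

The duration example mirrors the table entry: 500 seconds is 8 minutes and 20 seconds.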
Statistical and Graphing Functions
Common statistical functions used with the chart, stats, and timechart commands. Field names
can contain wildcards (*), so avg(*delay) might calculate the average of the delay field and of
any other field whose name ends in delay.
count(X)
    number of occurrences of the field X. To indicate a specific field value to match, format
    X as eval(field="desired_value").
earliest(X), latest(X)
    chronologically earliest/latest seen value of X
max(X)
    maximum value of the field X. For non-numeric values of X, compute the max using
    alphabetical ordering.
min(X)
    minimum value of the field X. For non-numeric values of X, compute the min using
    alphabetical ordering.
range(X)
    difference between the max and min values of the field X
values(X)
    list of all distinct values of the field X as a multi-value entry. The order of the values is
    alphabetical
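The functions above can be approximated in plain Python to show what each one computes. This is a sketch over invented sample events with hypothetical field names (host, delay), not Splunk code.

```python
# Hypothetical events, invented for illustration.
events = [
    {"host": "web2", "delay": 120},
    {"host": "web1", "delay": 45},
    {"host": "web3", "delay": 300},
    {"host": "web1", "delay": 45},
    {"host": "web2", "delay": 80},
]

delays = [e["delay"] for e in events]

summary = {
    "count": len(delays),                          # count(delay)
    "max": max(delays),                            # max(delay)
    "min": min(delays),                            # min(delay)
    "range": max(delays) - min(delays),            # range(delay)
    "values": sorted({e["host"] for e in events}), # values(host): distinct, ordered
}

print(summary)
```

The values entry deduplicates and sorts the host names, which matches the description of a distinct, alphabetically ordered multi-value result.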
Index Statistics
Compute index-related statistics.
From this point onward, splunk refers to the partial or full path of the Splunk executable on
your device, $SPLUNK_HOME/bin/splunk, such as /Applications/Splunk/bin/splunk on macOS. If
you have already changed directory (cd) into /Applications/Splunk/bin/, simply use ./splunk.
Function / Description

| eventcount summarize=false report_size=true index=* | eval size_MB = round(size_bytes/1024/1024,2)
    Show the number of events in your indexes and their sizes in MB and bytes

| rest /services/data/indexes | table title currentDBSizeMB
    List the titles and current database sizes in MB of the indexes on your Indexers

index=_internal source=*metrics.log group=per_index_thruput series=* | eval MB = round(kb/1024,2) | timechart sum(MB) as MB by series
    Query write amount in MB per index from metrics.log
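The unit conversions inside these queries are simple arithmetic. A minimal Python sketch of the two eval expressions, with hypothetical helper names and sample values, might look like this:

```python
def bytes_to_mb(size_bytes: float) -> float:
    """Mirror of round(size_bytes/1024/1024, 2): bytes to MB, two decimals."""
    return round(size_bytes / 1024 / 1024, 2)


def kb_to_mb(kb: float) -> float:
    """Mirror of round(kb/1024, 2): KB to MB, two decimals."""
    return round(kb / 1024, 2)


# Sample values chosen for illustration only.
print(bytes_to_mb(1572864))  # 1.5 MB
print(kb_to_mb(2560))        # 2.5 MB
```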
Reload apps
To reload Splunk, enter the following in the address bar or command line interface.
Debug Traces
You can enable traces listed in
$SPLUNK_HOME/var/log/splunk/splunkd.log.
Then
becomes
To change the trace settings only for the current instance of Splunk, go to Settings > Server
Settings > Server Logging:
Select your new log trace topic and click Save. This persists until you stop the server.
Configuration
The following changes Splunk settings. Where necessary, append -auth user:pass to the end of
your command to authenticate with your Splunk web server credentials.
Troubleshooting
Input management
User management
Capacity Planning
Importing large volumes of data takes a long time. If you’re using Splunk in-house, the software
installation of Splunk Enterprise alone requires ~2GB of disk space. Online sizing calculators
can help you estimate requirements; they typically ask for the following inputs:
Input data
Specify the amount of data concerned. The more data you send to Splunk Enterprise, the
more time Splunk needs to index it into results that you can search, report and generate
alerts on.
Data Retention
Specify how long you want to keep the data. You can only keep your imported data for a
maximum length of 90 days or approximately three months.
Hot/Warm: short-term, in days.
Cold: mid-term, in weeks.
Archived (Frozen): long-term, in months.
Architecture
Specify the number of nodes required. The more data to ingest, the greater the number of
nodes required. Adding more nodes will improve indexing throughput and search
performance.
Storage Required
Specify how much space you need for hot/warm, cold, and archived data storage.
Storage Configuration
Specify the location of the storage configuration. If possible, spread each type of data across
separate volumes to improve performance: hot/warm data on the fastest disk, cold data on a
slower disk, and archived data on the slowest.
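A back-of-the-envelope version of this estimate can be sketched in Python. The formula (daily ingest × retention days × on-disk ratio) and especially the 0.5 ratio are assumptions chosen for illustration, not figures from Splunk; replace them with values measured in your own environment.

```python
def storage_gb(daily_ingest_gb: float, retention_days: int,
               disk_ratio: float = 0.5) -> float:
    """Estimate disk space for one storage tier.

    disk_ratio is an assumed fraction of raw data that ends up on disk
    after compression and indexing; 0.5 is a placeholder, not a measurement.
    """
    return daily_ingest_gb * retention_days * disk_ratio


# Hypothetical inputs: 100 GB/day, 30 days hot/warm, 60 days cold.
daily_gb = 100
hot_warm = storage_gb(daily_gb, retention_days=30)
cold = storage_gb(daily_gb, retention_days=60)

print(f"hot/warm: {hot_warm} GB, cold: {cold} GB, total: {hot_warm + cold} GB")
```

Keeping the tiers separate in the calculation makes it easier to size the fast and slow volumes independently.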
Thank you
Farhath Nathvi
www.linkedin.com/in/farhathnathvi