Splunk Account: KANIPILLAY / Temenos1@
Splunk for windows login: admin / Temenos1@
SPLUNK – Notes Document
Start - 30/11/2021
Answers.splunk.com – get help there
Intro:
Upload a sample file, e.g. test.csv – it appears the file has to be ‘structured’?
1. Click Add data button
2. Click ‘Upload file from computer’ – presumably we can also upload
from various other sources
3. Choose the file
4. Next
5. Rename ‘Host field value’ field to something useful
6. Click Review & then Submit, then ‘Start Searching’
The system then comes up with standard Splunk fields + Interesting fields,
split by field name, number of occurrences etc
You can build a custom table view by using a command like the one below
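A minimal SPL sketch of a table command (the field names status, user
and message are hypothetical – substitute your own csv columns):
    source="test.csv" | table status, user, message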
- You can now also build a visualisation – Click on VISUALISATION
Tab
Use the stats command as per below to define a count on specific fields
from your csv file as needed
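For example (again with a hypothetical status field):
    source="test.csv" | stats count by status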
Then click the Search button
You can then choose a suitable visualisation by clicking on the Splunk
Visualisations menu option
You can then create a DASHBOARD where you save your visualisation or
multiple visuals
Click ‘Save as’, choose new/existing dashboard
Under visualisations, you can also opt for ‘SINGLE VALUE’, which is a
number display. Per below, we’re counting only when
status=error
We use the stats count command on the status field and then
we count the table output value
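Roughly like this (the status field and error value come from the
sample csv and are hypothetical):
    source="test.csv" status=error | stats count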
So you can edit the theme on the dashboard etc, save / refresh and so on
Section (2) – Basics of Splunk
SPLUNK COMPONENTS
The software does 3 things mainly:
- Ingests data
- Indexes, parses and stores data
- Runs searches on indexed data
Processing components:
- Forwarder
Forwards data from one splunk component to another
e.g. from a source system to an indexer or index cluster
or from a source system directly to a search head
2 types of forwarders:
UNIVERSAL – most common – simply directs data from a source
to target
HEAVY – can also parse data before forwarding
- Indexer
Indexes data and stores data in indexes
In a distributed environment they can:
Reside on dedicated machines
Can be clustered or independent – clustered indexers are also
known as peer nodes – can attach load balancing to these
- Search head – users typically interact with this component
Manages search requests from users
Distributes searches across indexers
Consolidates results from the indexers
Management Components:
- Monitoring console
- Deployment server
Is a centralised configuration manager for any number of splunk
instances – splunk instances are called DEPLOYMENT CLIENTS
The deployment server pushes apps and config packages to
DEPLOYMENT CLIENTS – These are called DEPLOYMENT APPS
We configure the deployment server through the FORWARDER
MANAGEMENT INTERFACE
- Licence Master
Centralised licence manager
Clients are called licence slaves
Manages licence pools and stacks
ALL other SPLUNK components look to the licence Master to
validate licencing
- Indexer Cluster Master
Manages indexer clusters
Co-ordinates activity within the cluster
Manage data replication
Manage buckets(storage) for the cluster
Handles updates for the indexer cluster
- Search head Cluster Deployer
Manages baselines and apps for search head cluster members
Splunk scales via this component
Note – the Cluster deployer is NOT a member of the cluster
Every SPLUNK component is built using SPLUNK Enterprise – it’s only a
matter of configuration?
LICENCE MANAGEMENT IN SPLUNK
Licence Types:
- Measured by amount of data ingested per day, NOT the amount of
data stored
- Data volume is measured from midnight to midnight
STANDARD licence:
Can buy a licence, e.g. 5 GB of data per day
ENTERPRISE TRIAL
Installs with the product – limited to 500 MB per day. Valid for 60 days
– then converts to a free licence
SALES TRIAL
Set up a POC – so Splunk can size a temporary licence and you can see
how SPLUNK works for you
DEV TEST
FREE LICENCE
When the Enterprise licence expires – limited functionality
INDUSTRIAL IOT
FORWARDER
Used when setting up heavy forwarders
LICENCE VIOLATIONS Section
Will send warnings
From version 6.5 – the Enterprise version does not disable search if you
exceed your licensed data volume
DISTRIBUTED LICENCING:
Most components need access to an Enterprise licence except the Forwarder,
which needs a Forwarder licence
Ideally:
- Setup a licence master – then have all of your splunk instances talk
to the master for licencing
A collection of licences whose individual volume licencing amounts
aggregate to serve a single unified amount of indexing volume is called a
STACK
Licence POOLS are created from the licence STACKS. Stacks are made
up of one or more licence files
POOLS are sized for specific purposes
The licence master manages licence pools
Indexers and other splunk Enterprise instances are assigned to a licence
pool
Universal FORWARDER does not require a licence
The HEAVY forwarder requires a forwarder licence
LICENCE GROUPS – sets of licence stacks. A stack can only be a member
of (1) licence group, and only (1) group can be active at a point in time
Types of licence groups:
Enterprise / Sales Trial – allows stacking
Enterprise Trial / Free & Forwarders – do not allow stacking
To amend/update licencing go to Settings/Licence
Splunk Configuration Files
Splunk runs on CONFIG files – WITHIN a CONF file, behaviours and
functions are defined
You can have multiple conf files (even with the same name), which
Splunk evaluates based on precedence
Common conf files:
inputs.conf
Can set host values for a source, and which index to use to store
events
props.conf
Manages the indexing property behaviours
transforms.conf
Settings and values that govern data transformation
See admin manual for a full list of conf files
Above we have the Splunk config directory structure
Splunk home directory differs depending on your O/S
/bin – binaries and commands which you can execute from the CLI
/var/lib/splunk – contains index files and other important splunky stuff
/etc/system/default – contains the default config files for system /
apps/users
/local – custom config files live here
Regarding conf files, the system makes decisions based on the below
precedence
What’s inside a conf file?
Stanza – header in [ ] brackets
Settings are then defined by attribute = value pair
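For example, a minimal (illustrative) inputs.conf stanza – the header in
brackets, then attribute = value pairs:
    [monitor:///var/log]
    disabled = false
    sourcetype = syslog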
At run time, config files are evaluated as below
- Splunk merges the stanzas of all copies of the same file
- If the copies differ or have different attributes
o It uses the value with the highest precedence, as determined by
location in the directory structure (per above diagram)
Global config files live in /system
BTOOL – what’s this?
It is sometimes difficult (in a very large splunk setup), to understand
which config file is being used
So use BTOOL for this
- Command line tool to help troubleshoot – helps determine which
config file is being used
Example of a btool command – executed from /bin directory
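For example (a common form – ‘inputs’ here is just whichever conf file
you’re interested in):
    ./splunk btool inputs list --debug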
INDEXES – how does Splunk manage this?
Indexes are repositories of events – some come built in, OR you can
create custom ones
Indexes contain 3 types of data
- Raw data in compressed form
- Indexes that point to the raw data
- Metadata (timestamp, host value etc)
There are 2 types of INDEXES
1. EVENT
Default type which handles any type of data
2. METRICS
Optimised to store/retrieve metrics data
Metric data comprises a timestamp, a metric name (dotted, like a DNS
name), a value and dimensions
How does indexing work?
TYPES OF BUCKETS
A bucket is a place / folder on the filesystem
- HOT bucket – newly indexed data – an index can have one or more of
these hot buckets
- WARM – moved from hot, with no further writing / updates I assume
- COLD – moved from warm – can have many – cold data is data
that is hardly accessed
- FROZEN – moves from cold and is archived – the indexer deletes
frozen data unless you archive it
- THAWED – restored data
Each bucket has defined directory paths
HOW does splunk check DATA INTEGRITY
Double hash method
- Computes hash on newly indexed data
- Computes another hash when moving to warm directory
- Stores hash files in /rawdata path
Some CLI options around integrity / checking hash files etc
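One relevant setting: integrity checking is switched on per index in
indexes.conf (a sketch – ‘main’ is illustrative); the hash-check CLI
command appears in the demo notes below:
    [main]
    enableDataIntegrityControl = true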
Indexes.conf OPTIONS
You can set options at the following levels
We will focus on GLOBAL / PER INDEX
GLOBAL:
PER INDEX:
See the Admin Manual’s indexes.conf documentation for all options
regarding these settings
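A sketch of the two levels in indexes.conf (index name and values are
illustrative; the frozenTimePeriodInSecs shown is the ~6 year default):
    # Global – set at the top of the file, outside any stanza
    maxTotalDataSizeMB = 500000

    # Per index
    [web_logs]
    homePath = $SPLUNK_DB/web_logs/db
    coldPath = $SPLUNK_DB/web_logs/colddb
    thawedPath = $SPLUNK_DB/web_logs/thawedpath
    frozenTimePeriodInSecs = 188697600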
PER PROVIDER:
Options for external resource providers e.g. HADOOP
All provider Stanzas begin with [provider: <provider name>]
PER PROVIDER FAMILY:
Options for multiple external resources. HADOOP / HIVE / SPARK
Family values take precedence if the same options are specified in PER
PROVIDER
All provider family stanzas begin with [provider-family:<family name>]
PER VIRTUAL INDEX
Common for the HADOOP family
Lets splunk access data stored in external systems
FISH BUCKET
Is a means to keep track of which file/file-part has already been indexed
Contains seed pointers or CRCs (Cyclic Redundancy Checks) only for the
indexed files, not the files themselves
(demo notes) –
How to create an index
- Settings/Index/New Index
Can define/set various settings here as per notes above
- Apply a data retention policy
By default splunk keeps data for 6 years
You can change this in the indexes.conf file
Explore buckets in the file system – all buckets are in /defaultdb/*
Note – Buckets are organised and processed by AGE!!!
Check hashes for validation : CLI command as below
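For example (index name illustrative):
    ./splunk check-integrity -index main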
MANAGING USERS / GROUPS & CAPABILITIES
Authentication Mgmt
3 types of authentication
NATIVE, LDAP & SCRIPTED
Integrating SPLUNK & LDAP(Lightweight Directory Access Protocol)
LDAP defines the protocol to authenticate to, access and update objects
in an X.500 style directory
X.500 defines a hierarchical database of objects on the network
Objects can be anything on a network
Each Object then has attributes
Example:
LDAP integration can be done via Splunk Web OR by editing the
authentication.conf file
We will need an LDAP strategy comprising the below
LDAP Connection settings
Host – fully qualified name or IP address of the LDAP server or AD
controller
Port – Default is 389. If using SSL, then port 636
Bind DN (Distinguished Name) – recommended to use a service account.
Used by Splunk to bind to the LDAP server
USER settings
User base DN
User base filter
Username Attrib
Real Name attrib
Email attrib
Group mapping attrib
GROUP settings
Group base DN
Static group search filter
Group name attrib
Static member attrib
DYNAMIC group settings
Dynamic member attrib
Dynamic group search filter
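Pulled together, an authentication.conf LDAP strategy might look like
this sketch (the strategy name, host and all DNs/attributes are
hypothetical AD-style values):
    [authentication]
    authType = LDAP
    authSettings = corpLDAP

    [corpLDAP]
    host = dc01.example.local
    port = 389
    bindDN = CN=splunk-svc,OU=Service Accounts,DC=example,DC=local
    bindDNpassword = <service account password>
    userBaseDN = OU=Users,DC=example,DC=local
    userBaseFilter = (objectClass=user)
    userNameAttribute = sAMAccountName
    realNameAttribute = displayName
    emailAttribute = mail
    groupBaseDN = OU=Groups,DC=example,DC=local
    groupBaseFilter = (objectClass=group)
    groupNameAttribute = cn
    groupMemberAttribute = member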
There is also the Splunk native authentication route, where you create a
role, add capabilities to it and then link it to a user
MULTIFACTOR auth is also supported
DUO & RSA can be used as per the steps below
Spend a bit of time understanding LDAP (such as Microsoft AD) and how
to hook up with Splunk
Map LDAP groups to SPLUNK roles
Define the connection order
What is a user role – a definition of capabilities / actions that a user
can perform. There are 4 built-in roles in splunk. Roles can be edited
using the GUI or by updating the authorize.conf file
ADMIN, POWER, USER, CAN_DELETE
How to create a custom role
Access via the Settings/Roles/New role menu –
- Give it a name – can opt to choose other roles to inherit
functionality if you like – after creating – you can edit
ELSE
After creating the role without cloning
Click the Capabilities tab – add the required capabilities and save
Add Users
Access through Settings/Users/New user
You can then add them to a specific role
GETTING DATA INTO SPLUNK – new section
The pipeline looks like below
Inputs – this is when Splunk receives data from a source – it adds some
metadata like hostname/type etc – it does not actually look at the data
at this stage
- Can Monitor files/directories
- Can upload a file
- Once data is received it is compressed
- Can ingest data from TCP/UDP ports – network inputs
- Can also ingest from SNMP EVENTS
- Can be from Windows inputs
- Can ingest metrics data
- FIFO queues
- Scripted inputs – e.g. from APIs
- Modular inputs
- HTTP event collector – web data specifically
BASIC SETTINGS for an input
We can configure inputs in a few ways
- Through an app
- Splunk Web
- CLI e.g. ./splunk add monitor <path>
- By editing inputs.conf – add a stanza for each input (see the
sketch after this list)
- GDO (Guided Data Onboarding) – is new and acts like a data input
wizard
- Parsing phase – Splunk now looks at the data (examines, analyses
and transforms)
- Indexing phase – Splunk writes parsed data to indexes (buckets)
onto disk
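A minimal inputs.conf stanza sketch for the ‘edit inputs.conf’ option
above (a Windows event log input here, purely as an example):
    [WinEventLog://Security]
    disabled = 0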
MonitorNoHandle is available on Windows hosts.
‘syslog’ is a form of NETWORK input
WMI is not required for monitoring inputs on Windows
SPLUNK FORWARDER TYPES
UNIVERSAL – Collects data from a source and forwards it to a receiver.
Can forward to an indexer or index cluster
It has no licence – does not do anything other than forward data
HOW TO CONFIGURE Universal Forwarder
- Config receiving on Splunk Enterprise instance
- Download and install the UF
- Start the UF
- Config the UF to send data
- Config the UF to collect data from host
To configure receiving
See Settings/Forwarding and Receiving/Receiving/new Receiving port
Default port = 9997
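Receiving can also be enabled from the CLI (a sketch):
    ./splunk enable listen 9997 -auth <admin user>:<password>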
You need to download / install a Universal forwarder from Splunk
HEAVY – needs licence
Is more featured than the Universal forwarder
DISTRIBUTED SEARCH options
Enables you to scale your deployment – separates search management and
presentation from the indexing and search retrieval layers
Two components:
- SEARCH HEADS
- INDEXERS or search peers
Why the need for distribution?
- Enables horizontal scaling – we can put in more search heads
- We can manage access control better
- Data coming from multiple locations – geolocations
DIFFERENT TYPES OF DISTRIBUTION
- ONE OR MORE INDEPENDENT SEARCH HEADS – that can search
across multiple indexers (peers)
- INDIVIDUAL INDEXERS or peers
- A group of search heads clustered together
We have a deployer which sends config data to the Cluster
- Clustered indexers so that they share data as below
Here the data is replicated across the indexer cluster
The MASTER CLUSTER NODE manages data replication and
failover within the indexer cluster
We can also combine a Search head cluster & Indexer Cluster as
below
Scaling Options
By adding components to each tier we can scale horizontally
Adding search heads – creates a Search head cluster
Adding indexers – creates an indexer cluster
Adding forwarders creates a load balancing forwarder cluster
How to CONFIGURE A DISTRIBUTED SEARCH GROUP
We Need:
- Machines (virtual/physical or cloud) – Splunk Enterprise must be
installed on all
- Network – machines must be able to talk to each other. And all
machines must have a FQDN (Fully qualified domain name)
- Know the CLI commands
Notes:
- Network setup must be impeccable
- Static IP addresses recommended
- Must have a DNS server – or manually configure hosts files
- Firewalls setup
- Minimum of 1 deployer and 3 cluster members
General workflow to setup this distribution
So, assuming you have your requirement defined:
1. Configure the deployer
2. Initialise the cluster, by repeating the below for EVERY CLUSTER
MEMBER
3. Configure the CAPTAIN
4. Optionally adding other members to the cluster
The demo – notes on setting up this distribution
Note : we have 2 members in this cluster and 1 x deployer
Note their FQDNs
1. Setup deployer
Locate/edit the server.conf file in Program Files/Splunk/etc/system/local
and add the lines below. Save the file when done
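The lines are along these lines (the secret and label are your own
values – a sketch):
    [shclustering]
    pass4SymmKey = <your shared secret>
    shcluster_label = shcluster1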
2. Setup the members
Use PowerShell – go to Program Files/Splunk/bin
Use the following command:
./splunk init shcluster-config
-auth <splunk login>:<password>
-mgmt_uri https://<FQDN of this member>:8089 – 8089 being the
management port number
-replication_port <can be any free port>
-replication_factor <n> – optional switch; cannot exceed the number of
cluster members
-conf_deploy_fetch_url https://<deployer IP or FQDN>:8089
-secret <password saved in server.conf file>
-shcluster_label <as defined in server.conf file>
The server (splunkd) must then be restarted
3. Sort out the captain per the syntax below
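A sketch of the bootstrap command – run it on the member you want as the
initial captain, listing all members’ URIs (hostnames illustrative):
    ./splunk bootstrap shcluster-captain
    -servers_list "https://member1.example.local:8089,https://member2.example.local:8089"
    -auth <splunk login>:<password>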
To check the status of your distribution, you can run the below from the
/bin directory
./splunk show shcluster-status -auth <splunk login>:<password>
To add a new member to the cluster, do as below:
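For example, run this on the new member (the existing member URI is
illustrative):
    ./splunk add shcluster-member -current_member_uri https://<existing member FQDN>:8089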
STAGING DATA
The 3 phases of Splunk indexing process
- Data input from whatever source
Files and directories
Network events e.g. SNMP
Windows sources
Other – like HTTP Event Collector / Queues / Metrics
- Parsing – breaks the data into individual events / lines – adds
metadata
- Indexing – writes parsed data to index buckets
CONFIGURING FORWARDERS
Universal Forwarders – just forward data – no parsing or search
capability
5 steps to configuring Universal forwarder
4 ways to configure
1. GUI (Windows) during installation via MSI
2. CLI
3. Config files
4. Deployment server
Heavy forwarders – full Enterprise installation and must apply a licence
- Can parse and route data
- Can index locally
Forwarding can be configured via the GUI, or you can edit the
outputs.conf file directly
Can also set up some advanced configs as per below
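A minimal outputs.conf sketch (hostnames illustrative). Listing more
than one server in a target group is also how the load balancing
mentioned below is achieved:
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = idx1.example.local:9997, idx2.example.local:9997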
Other options with forwarders
Load balancing – a forwarder can split data amongst indexers / receivers
Forwarders can rotate between receivers based on time / volume intervals
INTERMEDIATE FORWARDERS
Are a setup where you configure forwarder A to route data to another
forwarder
MULTIPLE PIPELINE SETS
The forwarder can process multiple data pipelines at the same time
You can set this up like below
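This lives in server.conf (a sketch):
    [general]
    parallelIngestionPipelines = 2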
Demo notes – setting up heavy forwarder
We have (1) the search head with IP …149
We have (2) the forwarder with IP …121
1. We apply the forwarder licence to the forwarder instance
Settings/licencing > Change licence group > choose forwarder
licence and apply the licence
A restart is then required
2. Configure RECEIVING on the search head instance (IP …149)
Settings/Forwarding and Receiving – See Receive data – Add new
- Listen on port – default 9997
- Save
3. Set up forwarding on the forwarder instance (IP …121)
Settings/Forwarding & Receiving – choose Configure forwarding –
Add new
Host = add the DNS name or IP of the search head instance plus the
port, like this: 192.168.1.149:9997
Save
4. Tell the forwarder which data to forward
Settings/Add data – in this case we chose ‘Monitor’ / choose
something from the available options
Select Next – hostname – check it
Click review
Click submit
You can also tell forwarder to index data locally
Settings/Forwarding & Receiving / Forwarding defaults / select the radio
button to store a local copy
DEMO NOTES – setting up Universal forwarder
1. On windows – download the software from Splunk
2. Refer to earlier notes – I think we covered this
Forwarder Management / DEPLOYMENT servers
DEPLOYMENT servers:
- Group components by common characteristics
- Then distribute content based on these groups
When NOT to use a deployment server
- Indexer clusters – you can use the DS to update the master node only
- Search head clusters – except for the deployer node
To refresh the deployment server after creating a deployment app, run
the below
./splunk reload deploy-server
The deployment process at high level
DEPLOYMENT APPS
Not necessarily traditional Splunk apps
- Can be any content you want to send to the client instance
e.g. apps, configs
- default location = /opt/splunk/etc/deployment-apps
DEPLOYMENT CLIENTS
- Grouping by class / characteristic
- A client can belong to more than one class
CLI command to set up a deployment client is below – the IP address is
that of the deployment server
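A sketch (8089 is the default management port):
    ./splunk set deploy-poll <deployment server IP or FQDN>:8089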
Monitor Inputs – new section
3 options available
MONITOR
- Continuously monitors a specified directory / path, remotely or
locally
MONITORNOHANDLE
- Windows only – monitors files on Windows systems as they are
written, using a kernel-mode driver
UPLOAD
- You can upload a file for a one time analysis
If splunk is restarted, monitoring continues processing where it left off
Compressed files are decompressed before indexing
If you add data to a compressed file, the entire file is re-indexed
Restrictions:
- CANNOT monitor a file whose path exceeds 1024 chars
- Files with .splunk extension are not monitored
You can configure monitoring through inputs.conf like below
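For example (path, index and sourcetype illustrative):
    [monitor:///var/log/messages]
    index = main
    sourcetype = syslog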
Can also use the CLI to add monitoring as below
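For example (same illustrative values as above):
    ./splunk add monitor /var/log/messages -index main -sourcetype syslog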
Can also monitor via BATCH as per below
Typically used for large batches of historic data
BATCH will ingest the file, index, then delete the file
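A batch stanza sketch – move_policy = sinkhole is what tells Splunk to
delete the files after indexing (path illustrative):
    [batch:///opt/archive/*.log]
    move_policy = sinkhole
    index = main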
HOW TO CONFIGURE LOCAL MONITORING via Splunk Web
Settings / Add data – choose Monitor option
Select the type of source to monitor e.g. Files & Directories – browse
and choose the file / path
For remote monitoring – it can be done via the heavy forwarder using
similar commands
NETWORK and scripted Inputs – new section
Here we create network inputs using TCP/UDP protocols
- Can configure splunk to accept network inputs on any port
- Splunk can then consume data arriving on these ports
- TCP is recommended
- SPLUNK CLOUD will only accept data from forwarders with SSL
certificate
Scripted Inputs
How to config a network input
Settings/Data Inputs – choose TCP
Click New Local TCP
- DEFAULT PORT = 514 and leave other defaults as is
- Click Next
- Select source type = syslog
- App context = Search & Reporting
- Host method = ip address
- Review/Submit
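The equivalent inputs.conf stanza would look roughly like this (a
sketch):
    [tcp://:514]
    sourcetype = syslog
    connection_host = ip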
To add scripts in Splunk
Settings/Data Input – choose scripts
You have some default scripts or you can choose ‘New Local script’
To handle syslog data in splunk:
- Config an intermediate forwarder
- Splunk Cloud requires an intermediate forwarder
AGENTLESS inputs – new section
- Getting data into Splunk without using a forwarder
- Popular for syslog / SNMP data
Windows Input types can be
- Event logs
- PerfMon
- Registry
- WMI
- Active Directory
Ensure Firewall / Anti virus / AD Authentication isn’t blocking splunk
Set up for Windows – can use splunk app for Windows named INF
HTTP EVENT COLLECTOR (HEC)
- Allows sending of data to splunk over HTTP/S
- Useful for monitoring client side web applications
- Uses token based authentication
Data is sent with a curl command like the example below. Notes on the
token value follow
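A sketch (host is illustrative; 8088 is the default HEC port; the token
is whatever you generated):
    curl -k https://splunk.example.local:8088/services/collector/event \
      -H "Authorization: Splunk <your token>" \
      -d '{"event": "hello world", "sourcetype": "manual"}'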
Setting up the HEC:
Settings/Data Inputs – Choose HTTP Event Collector
The collector works on tokens
Check Global settings:
Then create a new token – Click ‘New token’
Click Next
Make changes as per next screenshot before clicking ‘Review’ & Submit
You then get a token value
The token becomes your AUTHENTICATION over HTTP like a
username/password
Remember the next settings changes: Select allowed Indexes e.g. main
Quiz note:
All of the above is the answer
FINE TUNING INPUTS – new section
- Notes on what SPLUNK does when it gets the data
Quiz notes:
- Splunk stores data in 64k block sizes
- What is a source type? – is a field that describes the structure of
data in an event
- What is the default host value – it is the DNS name of the machine
- Default char set encoding for splunk = UTF-8
- Can specify alternate char set encodings in the props.conf config file
The below happens during the parsing phase
There are 4 things that happen
How does Splunk determine event boundaries – how does it split data
into separate events?
Typically / commonly looks for CR / LF
Can also set custom line breaks by defining in props.conf file
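A custom line-break sketch in props.conf (sourcetype and pattern
illustrative):
    [my_sourcetype]
    LINE_BREAKER = ([\r\n]+)
    SHOULD_LINEMERGE = false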
TIMESTAMPS – if one exists in the data it is converted to UNIX time
(no. of seconds elapsed since 1970) and stored in the _time field
- Uses the timezone setting as defined on the splunk instance
How does Splunk look for timestamp data?
4 steps:
Last resort is current system time, at the point of indexing, for each
event
USING DATA PREVIEW – just lets you view events as they are created
We need to set up limits.conf
This will be in the path
Manipulating raw data – new section
Why the need to transform data?
Regulatory reasons usually mean data is masked
We also may want to route data to specific indexes based on event
content
2 main methods (SEDCMD and transforms, covered below) let us
- Transform
- Route
- Mask
1. SEDCMD – uses inputs.conf / props.conf – easier to use if you are
familiar with Unix
SED – Linux stream editor
- Edits data as it comes in – streaming
- Replace strings using the (s) token
- Substitute chars using the (y) token
- We put the SEDCMD key value pairs into props.conf
- we tell splunk to find data via inputs.conf
The /g flag above means make the change globally (every match, not just
the first)
Here’s an example (see the sketch after the syntax line below)
See regexr.com – useful for helping build regex commands
SEDCMD syntax : s/<regex>/<replacement>/g
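A masking sketch in props.conf (sourcetype, class name and pattern are
hypothetical):
    [my_sourcetype]
    SEDCMD-mask_acct = s/acct=\d+/acct=xxxx/g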
2. TRANSFORMATIONS – uses inputs.conf / props.conf /
transforms.conf – Takes longer to setup
Using Transforms to Anonymise data
- Edit inputs.conf – tells Splunk where the data is
- Use transforms.conf – use regular expressions, generally defining the
masking parameters
- Use props.conf – this references the parameters in transforms.conf
So – it would be something like below
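A sketch (stanza names, sourcetype and the ssn pattern are all
illustrative):
    # transforms.conf
    [mask-ssn]
    REGEX = (.*)ssn=\d{9}(.*)
    FORMAT = $1ssn=XXXXXXXXX$2
    DEST_KEY = _raw

    # props.conf
    [my_sourcetype]
    TRANSFORMS-anonymise = mask-ssn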
Using Transforms to Override Sourcetype or Host
- Occurs at parse time
- Only works on an indexer or heavy forwarder
- Need edits to transforms.conf / props.conf
…and it would look like this:
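e.g. forcing a new sourcetype when an event matches a pattern (all
names illustrative):
    # transforms.conf
    [force_error_st]
    REGEX = ^\s*ERROR
    FORMAT = sourcetype::app_error
    DEST_KEY = MetaData:Sourcetype

    # props.conf
    [source::/var/log/app/app.log]
    TRANSFORMS-override = force_error_st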
Using Transforms to Route events to specific indexes
- is configured on a heavy forwarder
- need edits to
- props.conf – define routing based on event data
- transforms.conf – specify criteria based on regular expressions
(REGEX)
- outputs.conf – Define target groups
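A sketch for sending matching events to a different index –
_MetaData:Index is the key for index routing (routing to other indexer
groups would instead set _TCP_ROUTING against an outputs.conf target
group); names are illustrative:
    # transforms.conf
    [route_errors_to_index]
    REGEX = ERROR
    DEST_KEY = _MetaData:Index
    FORMAT = error_index

    # props.conf
    [my_sourcetype]
    TRANSFORMS-routing = route_errors_to_index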
Using Transforms.conf to prevent unwanted data from being indexed
- define a REGEX to match events you want to keep in
transforms.conf
- Send everything else to nullqueue using props.conf
The <spec> stanza can be host / source or sourcetype
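A sketch of the keep/discard pattern (the keep regex is whatever matters
to you; order in the TRANSFORMS list matters – the later setparsing wins
for events it matches):
    # transforms.conf
    [setnull]
    REGEX = .
    DEST_KEY = queue
    FORMAT = nullQueue

    [setparsing]
    REGEX = <pattern for events to keep>
    DEST_KEY = queue
    FORMAT = indexQueue

    # props.conf
    [my_sourcetype]
    TRANSFORMS-set = setnull, setparsing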