What is perf_event in Linux? Linux Performance Monitoring

Heyan Maurya

You might have an application on your Linux system running slower than expected, or need to diagnose mysterious performance issues. That’s where Linux’s perf_event subsystem comes in—a powerful yet often underutilized performance monitoring tool built right into the kernel.

This guide covers what you need to know about perf_event, from basic concepts to advanced usage scenarios. Whether you’re a systems administrator, developer, or just a curious Linux enthusiast, you’ll see how to use perf_event to gain detailed insight into your system’s performance.

What exactly is perf_event?

perf_event (sometimes called perf_events or perf tools, originally known as Performance Counters for Linux) is a powerful performance analysis subsystem built into the Linux kernel since version 2.6.31, released in 2009. It provides a unified interface to various performance monitoring sources.

Think of perf_event as Linux’s internal fitness tracker: it can monitor vital signs across your system on request and gives you a way to access that data. Unlike tools that infer performance from coarse, periodically polled statistics, perf_event reads kernel and hardware counters directly, giving you precise, low-level insights with low overhead.

What makes perf_event special is its comprehensive approach. Rather than focusing on just one aspect of system performance, it offers a unified interface to collect data from multiple sources:

  • Hardware events: CPU cycles, instructions, cache misses
  • Software events: Page faults, context switches, CPU migrations
  • Kernel tracepoints: Predefined points in the kernel code
  • Dynamic tracing: Custom probes you insert anywhere in kernel or user code
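To see which of these sources a particular machine actually exposes, perf list accepts a category filter. The sketch below simply loops over a few categories; the function name and the PERF override (useful for a dry run with PERF=echo) are my own illustrative additions, not part of perf.

```shell
# List available events per category. Set PERF=echo for a dry run that only
# prints the commands instead of running them (PERF is an illustrative knob).
list_event_sources() {
    perf_cmd=${PERF:-perf}
    for category in hw sw cache tracepoint; do
        echo "== $category =="
        $perf_cmd list "$category"
    done
}

# Example (needs perf installed):
#   list_event_sources | less
```

Probes you define yourself are listed separately with perf probe --list.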

Why Should You Care About perf_event?

Now we know what perf_event is in Linux systems, but you might wonder, “Why should I learn about perf_event when there are plenty of monitoring tools available?” The answer is simple: power and precision.

When you need to answer questions like “Why is this specific application slow?” or “What’s causing these mysterious latency spikes?”, generic monitoring tools often fall short. They might tell you that something is wrong, but not why or where.

With perf_event, you can:

  • Pinpoint exactly where CPU cycles are being spent
  • Identify specific functions causing performance bottlenecks
  • Monitor detailed system behaviors like disk I/O patterns and network activity
  • Correlate different performance events to understand cause-and-effect relationships

I’ve personally used perf_event to solve performance mysteries that stumped an entire team—like discovering a poorly optimized function that was causing cache misses and slowing down the entire application. The insights you gain from perf_event can transform the way you think about performance tuning.

The perf Command: Your Gateway to perf_event

While perf_event is the kernel subsystem, most users interact with it through the perf command-line tool, which wraps the perf_event functionality in a set of subcommands for different performance analysis tasks.

Let’s look at some of the most useful perf subcommands:

  • perf stat: Collects and summarizes performance counter statistics
  • perf record: Captures detailed samples for later analysis
  • perf report: Analyzes and displays the data collected by perf record
  • perf top: Shows real-time performance counter profile
  • perf trace: Traces system calls (similar to strace but with lower overhead)
  • perf script: Outputs trace data in a scriptable format

These commands form the foundation of your performance analysis toolkit. As you become more familiar with perf, you’ll develop your own workflow by combining these tools to diagnose issues quickly.

Getting Started: Installing perf

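Before diving into perf_event, you’ll need to install the perf tool. On Ubuntu, it’s part of the linux-tools packages; other distributions package it as perf or linux-perf.

On Ubuntu (if perf later reports that it is missing for your running kernel, also install linux-tools-$(uname -r); Debian ships the tool in the linux-perf package instead):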

sudo apt install linux-tools-common linux-tools-generic

On Fedora/RHEL/CentOS:

sudo yum install perf

Once installed, verify it’s working:

perf --version

If you see a version number, you’re ready to start exploring!

Perf Stat: Your First Step into Performance Monitoring

Let’s start with the most straightforward and most useful perf command—perf stat. This command measures key performance counters for a specified command or time period.

For example, to measure performance statistics while running the gzip command:

perf stat gzip largefile

This will output something like:

 Performance counter stats for 'gzip largefile':

      1,920.15 msec  task-clock                #    0.991 CPUs utilized          
             13      context-switches          #    0.007 K/sec                  
              0      cpu-migrations            #    0.000 K/sec                  
            258      page-faults               #    0.134 K/sec                  
     5,649,595,479   cycles                    #    2.942 GHz                    
     8,625,207,199   instructions              #    1.53  insn per cycle         
     1,488,797,176   branches                  #    775.351 M/sec                
        53,395,139   branch-misses             #    3.59% of all branches        

       1.936842598 seconds time elapsed

Even this simple output provides valuable insights:

  • 1.53 instructions per cycle: This value indicates how efficiently the CPU is executing instructions. Higher is generally better.
  • 3.59% branch misses: Shows how often the CPU’s branch prediction is wrong, which can impact performance.
  • Context switches and migrations: Context switches count how often the task was switched off the CPU; CPU migrations count how often it was moved to a different core.

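With just this quick command, I can immediately see whether my program is CPU-bound (CPUs utilized close to 1.0) or suffering from excessive context switching, and a low instructions-per-cycle figure hints that it may be stalling on memory.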
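The derived figures in the right-hand column are plain ratios of the raw counters, so you can recompute them yourself when you collect counts some other way. Here is a minimal sketch using awk; the helper names are mine, and the inputs are the counts from the sample output above.

```shell
# Ratio helpers for raw perf stat counts (awk handles the large integers).
ipc() {             # usage: ipc INSTRUCTIONS CYCLES
    awk -v i="$1" -v c="$2" 'BEGIN { printf "%.2f\n", i / c }'
}
branch_miss_pct() { # usage: branch_miss_pct MISSES BRANCHES
    awk -v m="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * m / b }'
}

ipc 8625207199 5649595479              # prints 1.53
branch_miss_pct 53395139 1488797176    # prints 3.59
```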

If you get the following error while running the perf stat command:

Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)

Fix: Lower the perf_event_paranoid Value

To allow processes without those capabilities (typically non-root users) to use perf, you need to change the perf_event_paranoid setting.

✅ Temporary Fix (until reboot):

sudo sysctl -w kernel.perf_event_paranoid=1

Or, to allow full access (sensible only on machines where you trust every local user):

sudo sysctl -w kernel.perf_event_paranoid=-1

✅ Permanent Fix:

Open the sysctl config:

sudo nano /etc/sysctl.conf

Add this line at the end of the file and save it:

kernel.perf_event_paranoid = -1

Apply the changes:

sudo sysctl -p
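As an alternative to editing /etc/sysctl.conf by hand, many distributions also read drop-in files from /etc/sysctl.d/, which sysctl --system reloads. The sketch below writes the setting to such a file after a basic sanity check. The function name and the 99-perf.conf file name are assumptions of mine, and levels above 2 only have a meaning on some distribution kernels.

```shell
# Write a perf_event_paranoid setting to a sysctl drop-in file.
# usage: write_paranoid_conf LEVEL FILE
write_paranoid_conf() {
    case "$1" in
        -1|0|1|2|3|4) ;;   # 3 and 4 are distribution-specific levels
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
    printf 'kernel.perf_event_paranoid = %s\n' "$1" > "$2"
}

# Example (run as root; the file name is an assumption):
#   write_paranoid_conf 1 /etc/sysctl.d/99-perf.conf && sysctl --system
```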

Perf Record & Report: Capturing and Analyzing Performance Data

While perf stat gives you a high-level overview, perf record and perf report let you dive deeper by sampling what your system is doing over time.

The basic workflow is to use perf record to capture performance data and then analyze it with perf report. This powerful combination allows you to see exactly where your system is spending time, down to the specific functions.

Let’s see it in action:

# Sample CPU stack traces at 99 Hz for 30 seconds
perf record -F 99 -a -g -- sleep 30

# Analyze the collected data
perf report

The -F 99 option sets the sampling frequency to 99 Hz (samples per second), -a tells perf to monitor all CPUs, and -g enables call graph recording so you can see complete stack traces.

Why 99 Hz instead of a round number like 100? It’s a clever trick to avoid accidentally synchronizing with periodic system activities, which could skew your results.
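To make the lockstep effect concrete, here is a small arithmetic sketch. It assumes a hypothetical periodic activity that fires every 10 ms (100 Hz) and prints where each of the first few samples lands inside that 10 ms window, in microseconds. The function name and the 10 ms period are illustrative choices, not anything perf itself defines.

```shell
# Print the offset (in microseconds) of the first N samples within a 10 ms
# period, i.e. relative to a hypothetical activity that fires at 100 Hz.
# usage: sample_phases FREQ_HZ N
sample_phases() {
    awk -v f="$1" -v n="$2" 'BEGIN {
        for (i = 0; i < n; i++) {
            t_us = i * 1000000 / f             # time of sample i
            phase = int(t_us % 10000 + 0.5)    # offset inside the 10 ms period
            printf "%s%d", (i ? " " : ""), phase
        }
        print ""
    }'
}

sample_phases 100 5   # prints 0 0 0 0 0 (always the same point in the period)
sample_phases 99 5    # prints 0 101 202 303 404 (the sampling point drifts)
```

At 100 Hz every sample hits the same instant of the periodic activity, so you would either always see it or never see it. At 99 Hz the sampling point slides through the period and the activity gets its fair share of samples.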

When you run perf report, you’ll see something like:

[Screenshot: perf report output for the collected data]

This shows the functions where your CPU time is being spent, sorted by “overhead” (the percentage of samples captured in that function). The beauty of this approach is that you’re seeing real-world usage, not theoretical bottlenecks.
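The overhead column is nothing more than a share of samples. As a rough illustration, the sketch below takes one function name per sample and produces the same kind of table; the function name and the eight sample values are made up, and real perf data carries far more detail per sample.

```shell
# Turn one-function-name-per-sample input into an overhead table.
overhead_table() {
    awk '{ count[$1]++; total++ }
         END { for (f in count) printf "%.2f%% %s\n", 100 * count[f] / total, f }' |
        sort -rn
}

# Eight made-up samples: 5 in compress, 2 in read_block, 1 in main
printf '%s\n' compress compress read_block compress main compress read_block compress |
    overhead_table
```

For this input the table lists compress at 62.50%, read_block at 25.00%, and main at 12.50%.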

I often describe perf record/report as “putting your system under a microscope”—it reveals details you simply can’t see with higher-level tools.

Advanced Perf Usage: Tracing Specific Events

perf_events can trace a wide range of events, including hardware events from CPU performance counters, software events based on kernel counters, kernel tracepoints, user-level tracepoints, and dynamic events using kprobes and uprobes.

Here are some examples of more targeted perf commands:

Tracing System Calls

# Count all system calls made by a command
perf stat -e 'syscalls:sys_enter_*' command

# Trace specific system calls with stack traces
perf record -e 'syscalls:sys_enter_read' -ag

Analyzing Disk I/O

# Trace all block I/O events
perf record -e 'block:*' -a

# Track disk I/O latency
perf record -e block:block_rq_issue -e block:block_rq_complete -a
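Recording both events gives you the raw material for latency: the time between a request being issued and completed. The sketch below shows only the pairing step, and it assumes you have already reduced the recorded events to a simplified, hypothetical format of one event per line (timestamp in seconds, issue or complete, device, sector). The real perf script output has more fields and would need to be trimmed down first.

```shell
# Pair issue/complete events by device and sector and print the latency.
# Input (hypothetical, pre-processed): "<seconds> <issue|complete> <dev> <sector>"
# Output: "<dev> <sector> <latency_ms>"
io_latency_ms() {
    awk '$2 == "issue"    { start[$3 ":" $4] = $1 }
         $2 == "complete" { key = $3 ":" $4
                            if (key in start) {
                                printf "%s %s %.3f\n", $3, $4, ($1 - start[key]) * 1000
                                delete start[key]
                            } }'
}

# Two made-up events for one request
printf '1.000000 issue 8,0 1000\n1.002500 complete 8,0 1000\n' | io_latency_ms
# prints 8,0 1000 2.500
```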

Monitoring Network Activity

# Trace connect() calls (outbound connection attempts, not only TCP)
perf record -e syscalls:sys_enter_connect -ag

# Track socket buffer consumption
perf record -e 'skb:consume_skb' -ag

Each of these commands provides visibility into a different aspect of system behavior. The key is understanding what you’re looking for and choosing the appropriate events to trace.

Dynamic Tracing

Dynamic tracing allows you to create custom instrumentation points anywhere in the kernel (using kprobes) or user-level applications (using uprobes). This capability lets you analyze what’s happening at very specific points in the code.

For example, to trace a specific kernel function:

# Add a dynamic probe to the tcp_sendmsg function
perf probe --add tcp_sendmsg

# Record traces of this function
perf record -e probe:tcp_sendmsg -ag

# Remove the probe when done
perf probe --del tcp_sendmsg

If you have debug symbols installed, you can even probe specific lines within functions or access function arguments:

# Show available variables in the tcp_sendmsg function
perf probe -V tcp_sendmsg

# Add a probe that captures the 'size' argument
perf probe --add 'tcp_sendmsg size'

This level of flexibility is what makes perf_event such a powerful tool for performance debugging—you can look at exactly what matters for your specific problem.
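Because a forgotten probe stays registered until you delete it, I like to wrap the add, record, and delete steps in one function. This is a minimal sketch built from the commands above; the function name, the default duration, and the PERF override (set PERF=echo to print the commands without running them) are illustrative additions.

```shell
# Add a kprobe, record it system-wide for a while, then always remove it.
# usage: trace_kernel_function FUNCTION [SECONDS]
trace_kernel_function() {
    func=$1
    seconds=${2:-10}
    perf_cmd=${PERF:-perf}     # PERF=echo gives a dry run

    $perf_cmd probe --add "$func" || return 1
    $perf_cmd record -e "probe:$func" -a -g -- sleep "$seconds"
    status=$?
    $perf_cmd probe --del "$func"   # clean up even if recording failed
    return $status
}

# Example (needs root):
#   trace_kernel_function tcp_sendmsg 30 && perf report
```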

Real-World Example: Diagnosing a Performance Bottleneck

Let’s explore a real-world scenario where perf_event can save the day. Imagine you’re running a web server that suddenly experiences high latency. Traditional monitoring shows high CPU usage but doesn’t tell you why.

Here’s how I’d approach this with perf:

  1. First, get a high-level view:
     perf top -p $(pgrep -f "web-server")
     This shows me real-time usage. Let’s say I notice high activity in SSL/TLS functions.
  2. Next, I’d do a more detailed recording:
     perf record -F 99 -p $(pgrep -f "web-server") -g -- sleep 60
  3. Analyze the results:
     perf report --sort comm,dso,symbol
     This might reveal that most time is spent in a specific encryption function.
  4. To confirm my suspicion, I could trace SSL library calls. This assumes probes named ssl_read and ssl_write have already been defined with perf probe; the actual event names depend on how the probes were created:
     perf record -e probe:ssl_read -e probe:ssl_write -p $(pgrep -f "web-server")
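One practical wrinkle with the steps above: pgrep prints one PID per line, and a web server often has several worker processes, while perf’s -p option expects a comma-separated list. A small helper (the name is mine) bridges the two:

```shell
# Join newline-separated PIDs (as printed by pgrep) into a comma-separated list.
join_pids() {
    paste -sd, -
}

# Example ("web-server" is the placeholder pattern used in the steps above):
#   perf record -F 99 -g -p "$(pgrep -f web-server | join_pids)" -- sleep 60
printf '1201\n1202\n1207\n' | join_pids   # prints 1201,1202,1207
```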

Performance Visualization with Flame Graphs

While the text output of perf commands is informative, visualizing performance data can make patterns much more apparent. One of the most powerful visualization techniques for perf data is the flame graph.

Flame graphs display stack traces in a way that makes it immediately obvious where CPU time is being spent. Each rectangle represents a function in the stack, with the width proportional to how often it appears in the samples.

Example to create a flame graph:

Download the FlameGraph tools:

git clone https://github.com/brendangregg/FlameGraph.git

Collect stack traces with perf:

perf record -F 99 -a -g -- sleep 30

Generate the flame graph:

perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > profile.svg

The resulting SVG file can be opened in any web browser. Functions consuming the most CPU time will be the widest bars, making bottlenecks visually obvious.

[Screenshot: flame graph generated from the perf data]
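The intermediate output of stackcollapse-perf.pl is worth a look on its own: each line is a semicolon-separated stack followed by a sample count. As a quick sketch, the helper below sums those counts per leaf function, which is the function that was actually on the CPU. The helper name and the three sample stacks are made up, and it assumes frame names contain no spaces.

```shell
# Sum sample counts per leaf (on-CPU) function from folded stacks.
# Folded format: "frame1;frame2;...;leaf COUNT", one stack per line.
# Assumes frame names contain no spaces.
leaf_totals() {
    awk '{ n = split($1, frames, ";"); total[frames[n]] += $NF }
         END { for (f in total) print total[f], f }' |
        sort -rn
}

# Three made-up folded stacks
printf '%s\n' 'main;compress;deflate 70' 'main;read_block 20' 'main;compress 10' |
    leaf_totals
```

For this input, deflate comes out on top with 70 samples, followed by read_block with 20 and compress with 10. On real data you would feed it the output of perf script | FlameGraph/stackcollapse-perf.pl.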

I’ve found flame graphs to be incredibly effective for communicating performance issues to team members who might not be familiar with performance analysis. A picture truly is worth a thousand perf reports!

Challenges and Considerations When Using perf_event

While perf_event is incredibly powerful, it’s not without challenges:

Security and Permissions

perf_events has significant security implications because it can potentially expose sensitive information. Linux implements security controls based on process credentials, dividing them into privileged (root) and unprivileged processes.

By default, Linux restricts certain perf_event operations to privileged users. You can check the current settings:

cat /proc/sys/kernel/perf_event_paranoid

Higher values impose more restrictions. You can temporarily lower this for testing, but be careful about security implications:

sudo sysctl kernel.perf_event_paranoid=1
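If you can never remember what each level means, a tiny helper can translate the number for you. This sketch follows the thresholds printed in perf’s own error message; the function name is mine, and levels above 2 exist only on some distribution kernels, so it treats them as at least as strict as 2.

```shell
# Describe what a perf_event_paranoid level denies to unprivileged users,
# following the thresholds listed in perf's error message.
describe_paranoid() {
    level=$1
    if [ "$level" -le -1 ]; then
        echo "almost all events allowed for all users"
        return 0
    fi
    msg="no raw or ftrace function tracepoint access"
    if [ "$level" -ge 1 ]; then msg="$msg; no CPU event access"; fi
    if [ "$level" -ge 2 ]; then msg="$msg; no kernel profiling"; fi
    echo "$msg"
}

describe_paranoid 1
# prints: no raw or ftrace function tracepoint access; no CPU event access

# Example for the running system:
#   describe_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid)"
```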

Overhead Considerations

While perf is designed to be lightweight, tracing can still introduce overhead. This is especially true when:

  • Using high sampling frequencies (e.g., -F 1000 instead of -F 99)
  • Tracing very frequent events like context switches or memory allocations
  • Enabling stack traces (-g) for high-frequency events

Always be mindful of the overhead, especially on production systems. Start with targeted, low-frequency tracing and increase only as needed.
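The most reliable way to know the overhead on your own workload is to measure it: time a representative run without perf, time it again under perf, and compare. A minimal sketch of the comparison follows; the helper name and the two timings are made up for illustration.

```shell
# Percentage slowdown of a run under perf relative to a baseline run.
# usage: overhead_pct BASELINE_SECONDS PROFILED_SECONDS
overhead_pct() {
    awk -v base="$1" -v prof="$2" 'BEGIN { printf "%.1f\n", 100 * (prof - base) / base }'
}

overhead_pct 20.0 20.5   # prints 2.5
```

Repeat each measurement a few times before trusting the number, since run-to-run noise can easily exceed a small overhead.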

Kernel and Hardware Dependencies

Some perf features depend on:

  • Specific kernel configuration options
  • CPU hardware capabilities
  • Debug symbols availability

These dependencies can limit what’s possible on some systems, especially in virtualized environments where hardware access may be restricted.
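For the kernel configuration part, you can check the relevant options directly. The sketch below greps a kernel config file for options set to y; the function name is mine, and the location of the config file varies by distribution (some expose it as /proc/config.gz instead).

```shell
# Report whether kernel options are enabled (=y) in a config file.
# usage: check_kconfig FILE OPTION...
check_kconfig() {
    file=$1
    shift
    for opt in "$@"; do
        if grep -q "^${opt}=y" "$file"; then
            echo "$opt enabled"
        else
            echo "$opt not enabled"
        fi
    done
}

# Example; the config path is an assumption and varies by distribution:
#   check_kconfig "/boot/config-$(uname -r)" CONFIG_PERF_EVENTS CONFIG_KPROBES CONFIG_UPROBES
```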

Beyond the Basics: Topics Worth Exploring

As you become more comfortable with perf_event, here are some advanced topics worth exploring:

eBPF Integration

In newer Linux kernels (4.4+), perf has enhanced BPF support. This makes perf tracing programmable, transforming it from a mere counting and sampling tool to a fully programmable in-kernel tracer.

BPF programs can filter events in kernel space, drastically reducing overhead for complex analyses.

Scheduler Analysis with perf sched

The perf sched subcommand provides specialized tools for analyzing the kernel’s CPU scheduler behavior:

# Record scheduler events
perf sched record -- sleep 1

# Analyze scheduler latencies
perf sched latency

This is invaluable for diagnosing CPU scheduling issues and latency problems.

Hardware Cache Events

Modern CPUs have extensive performance monitoring capabilities for their cache hierarchies:

# Analyze Level 1 data cache behavior
perf stat -e L1-dcache-loads,L1-dcache-load-misses command

Understanding cache behavior is crucial for optimizing high-performance applications.

FAQs About perf_event

How does perf_event differ from other performance tools like top or htop?

While tools like top provide a high-level system overview, perf_event gives you much deeper insights by accessing hardware performance counters and kernel tracepoints. Top shows you which processes are using resources; perf shows you why and how they’re using those resources.

Is there a GUI for perf_event?

While perf is primarily a command-line tool, there are several ways to visualize its data:

  • The built-in perf report TUI (Text User Interface)
  • Flame graphs (as discussed earlier)
  • External tools like Hotspot or Intel VTune that can import perf data

Can perf_event monitor containers or virtual machines?

Yes, but with some limitations. You can use perf to monitor processes inside containers on the same host. For virtual machines, you can use perf inside the VM to monitor guest activity, but monitoring the VM from the host requires special configuration.

Does perf_event work on all Linux distributions?

The core functionality works on all modern Linux distributions, but availability of specific features depends on:

  • Kernel version and configuration
  • Distribution-specific packages
  • Hardware capabilities

Major distributions like Ubuntu, Fedora, and RHEL all include perf tools packages.

How much overhead does perf_event introduce?

It depends on how you use it. As rough guidance rather than a guarantee: simple counting with perf stat introduces minimal overhead (typically <1%), sampling with perf record -F 99 might add 1-3%, and heavy tracing of frequent events can cost considerably more, potentially 10% or above. Measure on your own workload before relying on these figures.

Conclusion: Understanding System Performance with perf_event

Linux’s perf_event subsystem provides a powerful window into what’s really happening on your system, from the hardware to the application code.

Whether you’re trying to optimize a high-performance application, diagnose a perplexing performance issue, or better understand your Linux system, perf_event is an invaluable tool.

I encourage you to start small: try perf stat on a command you use regularly, or run perf top to see what’s happening on your system right now. As you get comfortable, gradually explore the more advanced features. Before long, you’ll wonder how you ever managed without this powerful tool.
