Benchmark Suite

LunarLog ships a comprehensive benchmark suite built on Google Benchmark v1.9.1. It measures each layer of the logging pipeline in isolation (throughput, template formatting, filter evaluation, sink I/O, enricher overhead) and also runs a realistic end-to-end scenario that combines all of them.

Use these benchmarks to:

Establish a performance baseline for your platform
Compare sink types and formatter costs
Measure the overhead of filters, enrichers, and tag routing
Validate that configuration changes don't regress throughput

v1.22.0 CoW Optimization Notes

v1.22.0 replaces several hot-path deep copies with immutable snapshot sharing (std::shared_ptr<const T> + copy-on-write updates).

Operation	Before	After
Per-sink filter check	`vector<FilterRule>` deep copy	`shared_ptr` refcount bump
Template cache hit	`vector<PlaceholderInfo>` deep copy	`shared_ptr` refcount bump
Global filter eval	Evaluated inside mutex	Snapshot outside mutex
Tag routing check	`set<string>` evaluated inside lock	Snapshot outside lock

This change is internal only — public APIs and expected behavior are unchanged.
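The snapshot-sharing pattern in the table above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not LunarLog's internal code: `FilterRule`, `RuleSet`, and `countRules` are hypothetical names, and a mutex guards only the pointer copy and swap. Readers pay one refcount bump and then evaluate the rules with no lock held; writers copy the current vector, modify the copy, and publish it.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an internal filter rule (not LunarLog's type).
struct FilterRule {
    std::string expr;
};

// Copy-on-write holder: readers share an immutable snapshot,
// writers replace the whole vector.
class RuleSet {
public:
    using Snapshot = std::shared_ptr<const std::vector<FilterRule>>;

    RuleSet() : rules_(std::make_shared<std::vector<FilterRule>>()) {}

    // Reader path: the lock covers only the shared_ptr copy (a refcount
    // bump). The caller then iterates the snapshot with no lock held.
    Snapshot snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rules_;
    }

    // Writer path: copy the current rules, modify the copy, publish it.
    // Snapshots already handed out keep pointing at the old vector.
    void add(FilterRule rule) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto next = std::make_shared<std::vector<FilterRule>>(*rules_);
        next->push_back(std::move(rule));
        rules_ = std::move(next);
    }

private:
    mutable std::mutex mutex_;
    Snapshot rules_;
};

// Example reader: takes a snapshot, then works on it outside any lock.
std::size_t countRules(const RuleSet& set) {
    RuleSet::Snapshot snap = set.snapshot();
    return snap->size();
}
```

Because a published vector is never mutated, a reader holding an old snapshot always sees a consistent rule set, even while a writer publishes a new one. In C++20, `std::atomic<std::shared_ptr<T>>` could replace the mutex; the mutex keeps this sketch portable to C++17.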

Building the Benchmark Suite

Benchmarks are opt-in and disabled by default. Enable them with the LUNARLOG_BUILD_BENCHMARKS CMake option:

cmake -B build -DCMAKE_BUILD_TYPE=Release -DLUNARLOG_BUILD_BENCHMARKS=ON
cmake --build build --target lunar_log_bench

Google Benchmark is fetched automatically via FetchContent — no manual dependency installation required.

Important: Always build in Release mode for meaningful results. Debug builds disable compiler optimizations and keep assertions enabled (plus sanitizers, if your Debug configuration turns them on), so timings can be 5-50x slower and are not representative of production performance.

CMake Options

Option	Default	Description
`LUNARLOG_BUILD_BENCHMARKS`	`OFF`	Build the benchmark executable
`LUNARLOG_BUILD_TESTS`	`ON`	Build tests (disable with `OFF` for benchmark-only builds)
`LUNARLOG_BUILD_EXAMPLES`	`ON`	Build examples (disable with `OFF` for benchmark-only builds)

Minimal benchmark-only build:

cmake -B build -DCMAKE_BUILD_TYPE=Release \
    -DLUNARLOG_BUILD_BENCHMARKS=ON \
    -DLUNARLOG_BUILD_TESTS=OFF \
    -DLUNARLOG_BUILD_EXAMPLES=OFF
cmake --build build --target lunar_log_bench

Running Benchmarks

Run All

./build/bench/lunar_log_bench

Run Filtered

Use --benchmark_filter to run a subset by regex:

# Only throughput benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Log|BM_EmptyLogger"

# Only sink benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Sink"

# Only filtering benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Filter"

# Specific benchmark
./build/bench/lunar_log_bench --benchmark_filter="BM_E2E_Realistic"

JSON Output

Export results to a JSON file for analysis or CI comparison:

./build/bench/lunar_log_bench \
    --benchmark_format=json \
    --benchmark_out=bench_results.json

Repetitions and Aggregates

For stable results, run multiple repetitions and report aggregates:

./build/bench/lunar_log_bench \
    --benchmark_format=json \
    --benchmark_out=bench_results.json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true
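With repetitions enabled, Google Benchmark emits aggregate rows such as `_mean`, `_median`, `_stddev` and, in recent releases, `_cv` for each benchmark. The sketch below computes the same kinds of statistics over a list of per-repetition timings, which can help when sanity-checking a JSON report by hand. It is an illustration, not Google Benchmark's code; it uses the sample (n - 1) standard deviation and defines cv as stddev / mean, which to our understanding matches Google Benchmark's definitions. The `Aggregates` and `summarize` names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

struct Aggregates {
    double mean;
    double median;
    double stddev;  // sample standard deviation (divides by n - 1)
    double cv;      // coefficient of variation: stddev / mean
};

// Summarizes per-repetition timings (e.g. cpu_time values in ns).
Aggregates summarize(std::vector<double> samples) {
    if (samples.size() < 2) {
        throw std::invalid_argument("need at least two repetitions");
    }
    const double n = static_cast<double>(samples.size());
    const double mean =
        std::accumulate(samples.begin(), samples.end(), 0.0) / n;

    // Median: middle element, or the average of the two middle elements.
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    const double median = (samples.size() % 2 == 1)
        ? samples[mid]
        : (samples[mid - 1] + samples[mid]) / 2.0;

    double sq = 0.0;
    for (double s : samples) sq += (s - mean) * (s - mean);
    const double stddev = std::sqrt(sq / (n - 1.0));

    return {mean, median, stddev, stddev / mean};
}
```

If the stddev or cv is large compared with the difference you are trying to measure, the run was too noisy to draw conclusions from.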

Benchmark Descriptions

Throughput (`bench_throughput.cpp`)

Core message throughput under various conditions.

Benchmark	What it measures	Why it matters
`BM_EmptyLogger`	Producer-side cost with zero sinks: level check, template parse, entry construction	Absolute minimum overhead per log call
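`BM_LogInfo_SingleThread`	Full pipeline (parse, format, enqueue, NullSink write) on a single thread, flushing only at the end of the run	Baseline single-thread throughput (mostly producer-side enqueue cost, since the loop does not wait for sink writes)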
`BM_LogTrace_Disabled`	TRACE log call with INFO min level — level check short-circuits before any work	Cost of a disabled log call (should be near-zero)
`BM_LogInfo_MultiThread`	Shared logger under 1, 2, 4, and 8 concurrent threads	Lock/queue contention scaling under concurrent writes
`BM_LogInfo_FlushEvery1`	Flush after every message (enqueue + consumer + sink + sync)	Worst-case end-to-end latency per message
`BM_LogInfo_FlushEvery100`	Flush every 100 messages	Balanced latency/throughput trade-off
`BM_LogInfo_FlushEvery1000`	Flush every 1000 messages	High-throughput batch mode pipeline cost

Formatting (`bench_formatting.cpp`)

Template parsing, rendering, and caching.

Benchmark	What it measures	Why it matters
`BM_FormatSimple`	Single placeholder: `"Hello {name}"`	Baseline formatting cost
`BM_FormatComplex`	Multiple placeholders with format specifiers and pipes: `"{method} {path} {status:04} in {elapsed:.2f}ms [{region|upper}]"`	Realistic production template cost
`BM_FormatCacheHit`	Pre-warmed cache, same template on every iteration	Steady-state cost with warm cache
`BM_FormatCacheMiss`	256 rotating templates against a 128-entry cache (~50% miss rate)	Cache miss penalty — template re-parse cost
`BM_FormatPipeTransform`	Pipe chain: `"{value|upper|trim|quote}"`	Per-transform overhead
`BM_FormatIndexed`	Indexed placeholders with reuse: `"{0} bought {1} for {0}"`	Indexed placeholder resolution cost

Filtering (`bench_filtering.cpp`)

Per-layer filter overhead.

Benchmark	What it measures	Why it matters
`BM_Filter_None`	No filters — pure baseline	Establishes the zero-filter cost
`BM_Filter_MinLevel`	Global min level only (messages pass)	Level-gate overhead
`BM_Filter_Predicate`	Global predicate filter (lambda evaluation per entry)	Predicate dispatch cost
`BM_Filter_DSL_1Rule`	Single DSL rule: `"level >= INFO"`	DSL evaluation baseline
`BM_Filter_DSL_5Rules`	Five AND-combined DSL rules	Scaling cost for multiple rules
`BM_Filter_DSL_10Rules`	Ten AND-combined DSL rules	Filter chain scaling at higher counts
`BM_Filter_Compact`	Compact filter syntax: `"INFO+ ~request !~heartbeat"`	Compact-filter-to-DSL cost
`BM_Filter_TagRouting`	3 sinks with `only()` and `except()` tag routing	Tag parse + routing overhead

Sinks (`bench_sinks.cpp`)

Sink write cost across sink types and formatters.

Benchmark	What it measures	Why it matters
`BM_Sink_Null`	NullSink (no-op write)	Sink dispatch overhead without I/O
`BM_Sink_File_HumanReadable`	FileSink with default HumanReadableFormatter	File I/O + HR formatting cost
`BM_Sink_File_Json`	FileSink with JsonFormatter	File I/O + JSON formatting cost
`BM_Sink_File_CompactJson`	FileSink with CompactJsonFormatter	File I/O + compact JSON formatting cost
`BM_Sink_Rolling`	RollingFileSink with daily rotation policy	Rolling sink overhead vs plain FileSink

All file sink benchmarks write to temp files and clean up after each run.

Enrichers (`bench_enrichers.cpp`)

Per-enricher overhead.

Benchmark	What it measures	Why it matters
`BM_Enricher_None`	No enrichers — baseline	Establishes zero-enricher cost
`BM_Enricher_ThreadId`	Single `Enrichers::threadId()` (dynamic, evaluates per entry)	Cost of a per-entry enricher
`BM_Enricher_Three`	Three enrichers: ThreadId + ProcessId + Property	Enricher chain scaling
`BM_Enricher_Lambda`	Custom lambda enricher	Lambda dispatch cost

End-to-End (`bench_e2e.cpp`)

Full-pipeline latency with a production-like configuration.

Benchmark	What it measures	Why it matters
`BM_E2E_Realistic`	3 sinks (NullSink + JSON file + Rolling error file), 2 enrichers (ThreadId + Property), WARN+ global filter, tag routing on error sink, per-iteration `flush()`	True end-to-end latency including enqueue, consumer processing, and all sink writes

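This benchmark calls flush() inside the loop, so it measures the full round trip from the log call until every sink has written, not just enqueue latency. Read the Time (wall-clock) column for that round trip: CPU time does not include the periods when the calling thread is blocked in flush() waiting for the sinks.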

Interpreting Results

Google Benchmark reports several key metrics:

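Metric	Description
`Time`	Wall-clock time per iteration (includes I/O and other waits)
`CPU`	CPU time per iteration consumed by the benchmark thread(s); excludes time spent blocked (I/O, locks, flush()) and, by default, work done on other threads such as a background consumer
`items_per_second`	Throughput: messages processed per second, computed from CPU time unless the benchmark opts into real time
`Iterations`	Number of iterations the benchmark ran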

What to Look For

Throughput: items_per_second is the primary metric. Higher is better. Compare BM_LogInfo_SingleThread against BM_Sink_* benchmarks to isolate sink overhead.

Cache effectiveness: Compare BM_FormatCacheHit vs BM_FormatCacheMiss. If the miss penalty is large, consider increasing setTemplateCacheSize() for workloads with many distinct templates.
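How much a larger cache helps depends on how many distinct templates are in rotation and on the eviction policy. The sketch below simulates a generic LRU cache to show the effect; LRU is an assumption here, since this page does not state LunarLog's eviction policy, and `LruSim` and `cyclicMissRate` are hypothetical names. When the distinct templates outnumber the cache entries and are used in a strict cycle, every lookup misses; once the cache covers the whole working set, only the first (cold) lookups miss.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Generic LRU cache keyed by template id; counts hits and misses.
// This models cache behavior only, not LunarLog's template cache.
class LruSim {
public:
    explicit LruSim(std::size_t capacity) : capacity_(capacity) {}

    void access(int key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            ++hits_;
            // Move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        ++misses_;
        if (capacity_ == 0) return;  // nothing can be cached
        if (order_.size() == capacity_) {
            // Evict the least recently used entry at the back.
            index_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(key);
        index_[key] = order_.begin();
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::size_t capacity_;
    std::list<int> order_;
    std::unordered_map<int, std::list<int>::iterator> index_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};

// Cycles through `distinct` template ids `rounds` times; returns miss rate.
double cyclicMissRate(std::size_t capacity, int distinct, int rounds) {
    LruSim cache(capacity);
    for (int r = 0; r < rounds; ++r)
        for (int t = 0; t < distinct; ++t) cache.access(t);
    return static_cast<double>(cache.misses()) /
           static_cast<double>(cache.hits() + cache.misses());
}
```

With uniformly random rather than cyclic selection, 256 templates over 128 entries settle near a 50% miss rate (each lookup hits with probability 128/256), which matches the figure quoted for BM_FormatCacheMiss.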

Filter scaling: Compare BM_Filter_DSL_1Rule → BM_Filter_DSL_5Rules → BM_Filter_DSL_10Rules to understand the per-rule cost. If overhead is significant, consolidate rules or use compact filters.
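A least-squares line through the three DSL benchmarks gives a per-rule estimate that uses all three data points rather than only the endpoints. The sketch below applies it to the CPU times in the CI Benchmark Results table on this page (445.1, 478.0, and 513.8 ns for 1, 5, and 10 rules). `fitLine` and `dslRuleCost` are illustrative names, and a linear model is an assumption about how the cost scales.

```cpp
#include <cstddef>
#include <vector>

struct LineFit {
    double slope;      // extra ns per additional rule
    double intercept;  // extrapolated ns at zero rules
};

// Ordinary least-squares fit of y = intercept + slope * x.
// Assumes x and y have the same size and x has at least two distinct values.
LineFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    const double slope = sxy / sxx;
    return {slope, my - slope * mx};
}

// CPU times (ns per iteration) for BM_Filter_DSL_1Rule/5Rules/10Rules
// from the CI Benchmark Results table.
LineFit dslRuleCost() {
    return fitLine({1.0, 5.0, 10.0}, {445.1, 478.0, 513.8});
}
```

For these numbers the slope is about 7.6 ns per rule against a baseline of roughly 440 ns, so on that CI runner each extra rule is cheap relative to the rest of the pipeline.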

Sink comparison: Compare BM_Sink_Null → BM_Sink_File_HumanReadable → BM_Sink_File_Json → BM_Sink_File_CompactJson to see the formatter + I/O cost delta.

Enricher overhead: Subtract BM_Enricher_None from BM_Enricher_Three to see the total enricher cost. If it dominates your throughput budget, prefer cached enrichers (ProcessId, Property) over per-entry ones (ThreadId).
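Applying this to the CPU times in the CI Benchmark Results table on this page, and assuming the enricher costs add up independently:

$$
\begin{aligned}
\Delta_{\text{Three}} &= t_{\text{Three}} - t_{\text{None}} = 762.6 - 441.4 = 321.2\ \text{ns} \\
\Delta_{\text{ThreadId}} &= t_{\text{ThreadId}} - t_{\text{None}} = 632.4 - 441.4 = 191.0\ \text{ns}
\end{aligned}
$$

ThreadId alone accounts for about 191.0 / 321.2 ≈ 59% of the three-enricher overhead, leaving roughly 130 ns for ProcessId and Property combined, which is consistent with preferring cached enrichers.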

Contention: Compare the threads:1 result of BM_LogInfo_MultiThread against threads:2, 4, and 8. A drop in per-thread throughput as threads are added indicates lock or queue contention.

Example Output

-----------------------------------------------------------------------
Benchmark                             Time       CPU  items_per_second
-----------------------------------------------------------------------
BM_LogInfo_SingleThread            245 ns    243 ns        4.1M/s
BM_LogTrace_Disabled              1.2 ns    1.2 ns      833.3M/s
BM_LogInfo_MultiThread/threads:4   312 ns    298 ns       13.4M/s
BM_FormatCacheHit                  230 ns    228 ns        4.4M/s
BM_FormatCacheMiss                 580 ns    575 ns        1.7M/s
BM_Sink_Null                       240 ns    238 ns        4.2M/s
BM_Sink_File_Json                  890 ns    620 ns        1.6M/s
BM_E2E_Realistic                  4200 ns   2100 ns      476.2K/s

(Numbers are illustrative — actual results depend on hardware, OS, and compiler.)

NullSink Helper

The benchmark suite includes null_sink.hpp — a minimal no-op sink used to isolate non-I/O overhead:

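// ISink and LogEntry come from the LunarLog headers.
class NullSink : public ISink {
public:
    // Discards every entry: exercises sink dispatch with no formatting or I/O of its own.
    void write(const LogEntry&) override {}
};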

Use it in your own benchmarks or tests when you need to measure pipeline cost without file I/O.
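For tests, a close variant of NullSink that counts writes lets you check that entries actually reach a sink. In the sketch below, `LogEntry` and `ISink` are minimal stand-ins so the example compiles on its own; they are not LunarLog's definitions, and in real code you would derive from the library's `ISink` and keep only the `CountingSink` class (a hypothetical name). The counter is atomic because a sink may be called from a different thread than the one that logged.

```cpp
#include <atomic>
#include <cstddef>
#include <string>

// Minimal stand-ins so this sketch compiles on its own. They are NOT
// LunarLog's types; use the library's ISink and LogEntry instead.
struct LogEntry {
    std::string message;
};

class ISink {
public:
    virtual ~ISink() = default;
    virtual void write(const LogEntry& entry) = 0;
};

// Like NullSink, but counts writes so a test can assert on delivery.
class CountingSink : public ISink {
public:
    void write(const LogEntry&) override {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    std::size_t count() const {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::size_t> count_{0};
};
```

With a real logger, read the count only after flush(), so that queued entries have had a chance to reach the sink.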

CI Integration

The bench.yml workflow runs benchmarks on every push and pull request to master, and can also be started manually (workflow_dispatch):

name: Benchmarks
on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  workflow_dispatch:

What it does:

Configures a Release build with benchmarks enabled (tests and examples disabled)
Builds the lunar_log_bench target
Runs all benchmarks with 5 repetitions, aggregate-only reporting, JSON output
Uploads bench_results.json as a GitHub Actions artifact

The results artifact is available for download from the Actions tab on each workflow run.

Running CI Locally

Replicate the CI benchmark run on your machine:

cmake -B build -DCMAKE_BUILD_TYPE=Release \
    -DLUNARLOG_BUILD_BENCHMARKS=ON \
    -DLUNARLOG_BUILD_TESTS=OFF \
    -DLUNARLOG_BUILD_EXAMPLES=OFF
cmake --build build --target lunar_log_bench -j$(nproc)

./build/bench/lunar_log_bench \
    --benchmark_format=json \
    --benchmark_out=bench_results.json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true

Tips

Always benchmark in Release mode. Debug builds are 5-50x slower due to assertions and missing optimizations. Timings from Debug builds are not meaningful for performance analysis.
Close other workloads during benchmarking. Background processes, browser tabs, and system updates introduce noise.
Use --benchmark_repetitions=N (e.g. 5) for stable measurements. Google Benchmark reports mean, median, and stddev across repetitions.
Use --benchmark_filter to focus on specific areas. Running the full suite takes longer; targeted runs give faster feedback during development.
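Compare ratios rather than raw numbers when comparing across machines. items_per_second is essentially the reciprocal of per-item CPU time, so it shifts with the hardware just as much as nanoseconds do; ratios between benchmarks from the same run (for example, BM_Sink_File_Json versus BM_Sink_Null) tend to carry over much better.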
Benchmark your actual configuration. The suite tests isolated layers. For your specific setup (number of sinks, enrichers, filter rules), the E2E benchmark is the closest proxy — or write a custom benchmark targeting your exact config.
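One way to compare runs from different machines is to divide each result by a baseline measured in the same run, which tends to cancel a uniform speed difference between the machines. The sketch below does this over a plain map of CPU times; the values are the sink results from the CI Benchmark Results table on this page, and `relativeTo` and `ciSinkResults` are illustrative names.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Returns each benchmark's time divided by the baseline's time from the
// same run. A ratio of 2.0 means "twice as expensive as the baseline".
std::map<std::string, double> relativeTo(
        const std::map<std::string, double>& cpuNs,
        const std::string& baseline) {
    auto it = cpuNs.find(baseline);
    if (it == cpuNs.end() || it->second <= 0.0) {
        throw std::invalid_argument("missing or invalid baseline: " + baseline);
    }
    std::map<std::string, double> ratios;
    for (const auto& [name, ns] : cpuNs) {
        ratios[name] = ns / it->second;
    }
    return ratios;
}

// CPU time per iteration (ns) from the CI Benchmark Results table.
std::map<std::string, double> ciSinkResults() {
    return {
        {"BM_Sink_Null", 444.3},
        {"BM_Sink_File_HumanReadable", 2628.0},
        {"BM_Sink_File_Json", 3059.9},
        {"BM_Sink_File_CompactJson", 2839.8},
    };
}
```

On the CI runner, JSON file output costs about 6.9x the NullSink baseline. If your machine shows a similar ratio, the relative cost carries over even when the absolute times differ.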

Flush-Inclusive Throughput

The suite includes benchmarks that call flush() at different intervals to measure end-to-end pipeline throughput (enqueue + consumer + sink write):

Benchmark	Flush Frequency	What it measures
`BM_LogInfo_FlushEvery1`	Every message	Worst-case latency per message
`BM_LogInfo_FlushEvery100`	Every 100 msgs	Balanced throughput/latency
`BM_LogInfo_FlushEvery1000`	Every 1000 msgs	High-throughput batch mode
`BM_LogInfo_SingleThread`	End of run	Enqueue-only throughput

Rule of thumb: if your app flushes frequently (e.g. on every HTTP request), use FlushEvery1 numbers. For high-throughput logging, use FlushEvery1000.

CI Benchmark Results

Last updated: 2026-02-25 · Ubuntu (GitHub Actions) · 4 vCPU @ 3222 MHz · Release mode · Google Benchmark v1.9.1

Benchmark	CPU Time	Throughput
`BM_EmptyLogger`	473.8 ns	2.1M/s
`BM_LogInfo_SingleThread`	466.5 ns	2.1M/s
`BM_LogTrace_Disabled`	15.3 ns	65.5M/s
`BM_LogInfo_MultiThread/threads:1`	464.8 ns	2.2M/s
`BM_LogInfo_MultiThread/threads:2`	553.7 ns	1.8M/s
`BM_LogInfo_MultiThread/threads:4`	1165.5 ns	858.3K/s
`BM_LogInfo_MultiThread/threads:8`	1147.8 ns	871.4K/s
`BM_LogInfo_FlushEvery1`	510.7 ns	2.0M/s
`BM_LogInfo_FlushEvery100`	548.9 ns	1.8M/s
`BM_LogInfo_FlushEvery1000`	549.3 ns	1.8M/s
`BM_FormatSimple`	474.3 ns	2.1M/s
`BM_FormatComplex`	2867.8 ns	348.7K/s
`BM_FormatCacheHit`	2897.3 ns	345.2K/s
`BM_FormatCacheMiss`	1657.4 ns	603.5K/s
`BM_FormatPipeTransform`	839.2 ns	1.2M/s
`BM_FormatIndexed`	790.8 ns	1.3M/s
`BM_Filter_None`	443.6 ns	2.3M/s
`BM_Filter_MinLevel`	515.1 ns	1.9M/s
`BM_Filter_Predicate`	518.4 ns	1.9M/s
`BM_Filter_DSL_1Rule`	445.1 ns	2.2M/s
`BM_Filter_DSL_5Rules`	478.0 ns	2.1M/s
`BM_Filter_DSL_10Rules`	513.8 ns	1.9M/s
`BM_Filter_Compact`	597.1 ns	1.7M/s
`BM_Filter_TagRouting`	682.5 ns	1.5M/s
`BM_Sink_Null`	444.3 ns	2.3M/s
`BM_Sink_File_HumanReadable`	2628.0 ns	380.5K/s
`BM_Sink_File_Json`	3059.9 ns	326.8K/s
`BM_Sink_File_CompactJson`	2839.8 ns	352.1K/s
`BM_Sink_Rolling`	2627.3 ns	380.6K/s
`BM_Enricher_None`	441.4 ns	2.3M/s
`BM_Enricher_ThreadId`	632.4 ns	1.6M/s
`BM_Enricher_Three`	762.6 ns	1.3M/s
`BM_Enricher_Lambda`	563.1 ns	1.8M/s
`BM_E2E_Realistic`	6660.4 ns	150.1K/s

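Results vary by hardware. Run locally for your platform. The CPU Time column excludes time the benchmark thread spends blocked (for example, waiting inside flush()), so wall-clock latency for flush-heavy benchmarks such as BM_E2E_Realistic can be noticeably higher than shown.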
