
Benchmark Suite

github-actions[bot] edited this page Feb 25, 2026 · 39 revisions


LunarLog ships a comprehensive benchmark suite built on Google Benchmark v1.9.1. It measures each layer of the logging pipeline in isolation (throughput, template formatting, filter evaluation, sink I/O, and enricher overhead). It also runs a realistic end-to-end scenario that combines all of them.

Use these benchmarks to:

  • Establish a performance baseline for your platform
  • Compare sink types and formatter costs
  • Measure the overhead of filters, enrichers, and tag routing
  • Validate that configuration changes don't regress throughput

v1.22.0 CoW Optimization Notes

v1.22.0 replaces several hot-path deep copies with immutable snapshot sharing (std::shared_ptr<const T> + copy-on-write updates).

Operation | Before | After
Per-sink filter check | vector<FilterRule> deep copy | shared_ptr refcount bump
Template cache hit | vector<PlaceholderInfo> deep copy | shared_ptr refcount bump
Global filter eval | Evaluated inside mutex | Snapshot outside mutex
Tag routing check | set<string> evaluated inside lock | Snapshot outside lock

This change is internal only — public APIs and expected behavior are unchanged.
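The snapshot-sharing idea in the table can be illustrated in standard C++. The sketch below is a minimal illustration, not LunarLog's internal code. It assumes a mutex-protected `std::shared_ptr<const std::vector<std::string>>`, and the class name `RuleSnapshotHolder` is hypothetical.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder illustrating copy-on-write snapshot sharing (not
// LunarLog's internal class). Readers take an immutable snapshot, which is
// only a refcount bump, and can evaluate it outside the lock. Writers copy
// the current rules, modify the copy, and publish it.
class RuleSnapshotHolder {
public:
    using Rules = std::vector<std::string>;

    RuleSnapshotHolder() : rules_(std::make_shared<Rules>()) {}

    // Reader path: the mutex is held only while copying the pointer.
    std::shared_ptr<const Rules> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rules_;
    }

    // Writer path (copy-on-write): existing snapshots keep the old contents.
    void addRule(std::string rule) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto next = std::make_shared<Rules>(*rules_);
        next->push_back(std::move(rule));
        rules_ = std::move(next);
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Rules> rules_;
};
```

Copying a `shared_ptr` is an atomic reference-count increment instead of a deep copy of the vector. That is the per-operation saving the table describes. A reader that grabbed a snapshot before an update keeps a consistent view for as long as it holds the pointer.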

Building the Benchmark Suite

Benchmarks are opt-in and disabled by default. Enable them with the LUNARLOG_BUILD_BENCHMARKS CMake option:

cmake -B build -DCMAKE_BUILD_TYPE=Release -DLUNARLOG_BUILD_BENCHMARKS=ON
cmake --build build --target lunar_log_bench

Google Benchmark is fetched automatically via FetchContent — no manual dependency installation required.

Important: Always build in Release mode for meaningful results. Debug builds keep assertions enabled and compile without optimizations. Depending on your toolchain settings, they may also enable sanitizers. Timings can therefore be 5-50x slower and are not representative of production performance.

CMake Options

Option | Default | Description
LUNARLOG_BUILD_BENCHMARKS | OFF | Build the benchmark executable
LUNARLOG_BUILD_TESTS | ON | Build tests (set to OFF for benchmark-only builds)
LUNARLOG_BUILD_EXAMPLES | ON | Build examples (set to OFF for benchmark-only builds)

Minimal benchmark-only build:

cmake -B build -DCMAKE_BUILD_TYPE=Release \
    -DLUNARLOG_BUILD_BENCHMARKS=ON \
    -DLUNARLOG_BUILD_TESTS=OFF \
    -DLUNARLOG_BUILD_EXAMPLES=OFF
cmake --build build --target lunar_log_bench

Running Benchmarks

Run All

./build/bench/lunar_log_bench

Run Filtered

Use --benchmark_filter to run a subset by regex:

# Only throughput benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Log"

# Only sink benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Sink"

# Only filtering benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Filter"

# Specific benchmark
./build/bench/lunar_log_bench --benchmark_filter="BM_E2E_Realistic"

JSON Output

Export results to a JSON file for analysis or CI comparison:

./build/bench/lunar_log_bench \
    --benchmark_format=json \
    --benchmark_out=bench_results.json

Repetitions and Aggregates

For stable results, run multiple repetitions and report aggregates:

./build/bench/lunar_log_bench \
    --benchmark_format=json \
    --benchmark_out=bench_results.json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true

Benchmark Descriptions

Throughput (bench_throughput.cpp)

Core message throughput under various conditions.

Benchmark | What it measures | Why it matters
BM_EmptyLogger | Producer-side cost with zero sinks: level check, template parse, entry construction | Absolute minimum overhead per log call
BM_LogInfo_SingleThread | Full pipeline (parse, format, enqueue, NullSink write) on a single thread | Baseline single-thread throughput
BM_LogTrace_Disabled | TRACE log call with INFO min level; the level check short-circuits before any work | Cost of a disabled log call (should be near zero)
BM_LogInfo_MultiThread | Shared logger under 1, 2, 4, and 8 concurrent threads | Lock/queue contention scaling under concurrent writes
BM_LogInfo_FlushEvery1 | Flush after every message (enqueue + consumer + sink + sync) | Worst-case end-to-end latency per message
BM_LogInfo_FlushEvery100 | Flush every 100 messages | Balanced latency/throughput trade-off
BM_LogInfo_FlushEvery1000 | Flush every 1000 messages | High-throughput batch mode pipeline cost
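BM_LogTrace_Disabled is cheap because the level comparison runs before any template parsing or entry construction. The sketch below shows that ordering with a hypothetical `MiniLogger`, which is not LunarLog's API. Its counter stands in for the parse, format, and enqueue work, so you can see that none of it happens for a disabled level.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical severity levels, ordered so a numeric comparison works.
enum class Level { Trace = 0, Debug, Info, Warn, Error };

class MiniLogger {
public:
    explicit MiniLogger(Level minLevel) : minLevel_(minLevel) {}

    void log(Level level, const std::string& templ) {
        // Short-circuit: a disabled level returns before any other work.
        if (level < minLevel_) return;
        ++formatCalls_;  // stands in for parse + format + enqueue
        entries_.push_back(templ);
    }

    std::size_t formatCalls() const { return formatCalls_; }
    const std::vector<std::string>& entries() const { return entries_; }

private:
    Level minLevel_;
    std::size_t formatCalls_ = 0;
    std::vector<std::string> entries_;
};
```

In C++, the arguments of a call are still evaluated at the call site before `log()` runs. The early return only skips the logger's own work.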

Formatting (bench_formatting.cpp)

Template parsing, rendering, and caching.

Benchmark | What it measures | Why it matters
BM_FormatSimple | Single placeholder: "Hello {name}" | Baseline formatting cost
BM_FormatComplex | Multiple placeholders with format specifiers and pipes: "{method} {path} {status:04} in {elapsed:.2f}ms [{region|upper}]" | Realistic production template cost
BM_FormatCacheHit | Pre-warmed cache, same template on every iteration | Steady-state cost with warm cache
BM_FormatCacheMiss | 256 rotating templates against a 128-entry cache (~50% miss rate) | Cache miss penalty (template re-parse cost)
BM_FormatPipeTransform | Pipe chain: "{value|upper|trim|quote}" | Per-transform overhead
BM_FormatIndexed | Indexed placeholders with reuse: "{0} bought {1} for {0}" | Indexed placeholder resolution cost
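Caching pays off because placeholder extraction is the work that repeats for every call with the same template. The sketch below is a simplified, hypothetical parser and cache, not LunarLog's implementation. It collects placeholder names, stripping `:spec` and `|pipe` suffixes, and caches each result as a shared immutable vector. It ignores escaped braces and has no size bound or eviction policy.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

using Placeholders = std::vector<std::string>;

// Extract placeholder names from "{name}", "{name:spec}" or "{name|pipe}".
// Simplification: no escaped or nested braces.
Placeholders parsePlaceholders(const std::string& templ) {
    Placeholders names;
    std::size_t pos = 0;
    while ((pos = templ.find('{', pos)) != std::string::npos) {
        std::size_t close = templ.find('}', pos);
        if (close == std::string::npos) break;  // unterminated: stop parsing
        std::string body = templ.substr(pos + 1, close - pos - 1);
        std::size_t cut = body.find_first_of(":|");
        names.push_back(body.substr(0, cut));  // npos keeps the whole body
        pos = close + 1;
    }
    return names;
}

// Hypothetical unbounded cache: a hit hands out the shared parse result
// (a refcount bump); a miss parses the template and stores the result.
class TemplateCache {
public:
    std::shared_ptr<const Placeholders> get(const std::string& templ) {
        auto it = cache_.find(templ);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        std::shared_ptr<const Placeholders> parsed =
            std::make_shared<Placeholders>(parsePlaceholders(templ));
        cache_.emplace(templ, parsed);
        return parsed;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::unordered_map<std::string, std::shared_ptr<const Placeholders>> cache_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

The 128-entry bound and the eviction behavior that BM_FormatCacheMiss exercises are not modeled here.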

Filtering (bench_filtering.cpp)

Per-layer filter overhead.

Benchmark | What it measures | Why it matters
BM_Filter_None | No filters (pure baseline) | Establishes the zero-filter cost
BM_Filter_MinLevel | Global min level only (messages pass) | Level-gate overhead
BM_Filter_Predicate | Global predicate filter (lambda evaluation per entry) | Predicate dispatch cost
BM_Filter_DSL_1Rule | Single DSL rule: "level >= INFO" | DSL evaluation baseline
BM_Filter_DSL_5Rules | Five AND-combined DSL rules | Scaling cost for multiple rules
BM_Filter_DSL_10Rules | Ten AND-combined DSL rules | Filter chain scaling at higher counts
BM_Filter_Compact | Compact filter syntax: "INFO+ ~request !~heartbeat" | Compact-filter-to-DSL cost
BM_Filter_TagRouting | 3 sinks with only() and except() tag routing | Tag parse + routing overhead
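The DSL benchmarks grow with rule count because an AND chain evaluates rules one by one until one fails. Below is a minimal sketch of short-circuiting AND evaluation. It assumes rules have already been compiled to predicates over a simplified, hypothetical `Entry`. The DSL parser is not shown, and LunarLog's compiled form may differ.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class Level { Trace = 0, Debug, Info, Warn, Error };

// Simplified, hypothetical log entry: only what the rules below inspect.
struct Entry {
    Level level;
    std::string message;
};

using Rule = std::function<bool(const Entry&)>;

// AND-combined rule chain: every rule must pass; stops at the first failure.
class RuleChain {
public:
    void add(Rule rule) { rules_.push_back(std::move(rule)); }

    // Optionally reports how many rules were evaluated for this entry.
    bool passes(const Entry& e, std::size_t* evaluated = nullptr) const {
        std::size_t count = 0;
        bool ok = true;
        for (const auto& rule : rules_) {
            ++count;
            if (!rule(e)) { ok = false; break; }
        }
        if (evaluated) *evaluated = count;
        return ok;
    }

private:
    std::vector<Rule> rules_;
};
```

An entry that passes every rule pays for all N predicates, while a rejected entry stops at the first failing rule. Ordering cheap, selective rules first therefore lowers the average cost. The test reads the compact filter "INFO+ ~request !~heartbeat" as level >= INFO, message contains "request", and message does not contain "heartbeat". That reading is an assumption.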

Sinks (bench_sinks.cpp)

Sink write cost across sink types and formatters.

Benchmark | What it measures | Why it matters
BM_Sink_Null | NullSink (no-op write) | Sink dispatch overhead without I/O
BM_Sink_File_HumanReadable | FileSink with default HumanReadableFormatter | File I/O + human-readable formatting cost
BM_Sink_File_Json | FileSink with JsonFormatter | File I/O + JSON formatting cost
BM_Sink_File_CompactJson | FileSink with CompactJsonFormatter | File I/O + compact JSON formatting cost
BM_Sink_Rolling | RollingFileSink with daily rotation policy | Rolling sink overhead vs plain FileSink

All file sink benchmarks write to temp files and clean up after each run.

Enrichers (bench_enrichers.cpp)

Per-enricher overhead.

Benchmark | What it measures | Why it matters
BM_Enricher_None | No enrichers (baseline) | Establishes zero-enricher cost
BM_Enricher_ThreadId | Single Enrichers::threadId() (dynamic, evaluated per entry) | Cost of a per-entry enricher
BM_Enricher_Three | Three enrichers: ThreadId + ProcessId + Property | Enricher chain scaling
BM_Enricher_Lambda | Custom lambda enricher | Lambda dispatch cost
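The difference between cached and per-entry enrichers comes down to when the value is computed. This sketch assumes enrichers are callables that add key/value properties to a simplified, hypothetical `Entry`. The cached enricher captures its value once at construction. The per-entry one queries `std::this_thread::get_id()` and formats it on every call. The function names are illustrative and are not `Enrichers::threadId()` itself.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Simplified, hypothetical entry carrying enricher-added properties.
struct Entry {
    std::map<std::string, std::string> properties;
};

using Enricher = std::function<void(Entry&)>;

// Cached: the value is fixed when the enricher is created.
Enricher makePropertyEnricher(std::string key, std::string value) {
    return [key = std::move(key), value = std::move(value)](Entry& e) {
        e.properties[key] = value;
    };
}

// Per-entry: the thread id is queried and formatted for every entry.
Enricher makeThreadIdEnricher() {
    return [](Entry& e) {
        std::ostringstream os;
        os << std::this_thread::get_id();
        e.properties["ThreadId"] = os.str();
    };
}

// Runs each enricher in order against one entry.
void enrich(Entry& e, const std::vector<Enricher>& chain) {
    for (const auto& enricher : chain) enricher(e);
}
```

The stream formatting inside the per-entry lambda is the kind of repeated work that a cached enricher avoids.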

End-to-End (bench_e2e.cpp)

Full-pipeline latency with a production-like configuration.

Benchmark | What it measures | Why it matters
BM_E2E_Realistic | 3 sinks (NullSink + JSON file + Rolling error file), 2 enrichers (ThreadId + Property), WARN+ global filter, tag routing on error sink, per-iteration flush() | True end-to-end latency including enqueue, consumer processing, and all sink writes

This benchmark calls flush() inside the loop, so it measures the full round-trip time from log call to all sinks having written — not just enqueue latency.
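Calling flush() inside the loop changes what is measured. In an asynchronous design the log call only enqueues, and flush() blocks until a consumer thread has drained the queue into the sinks. The sketch below assumes a single consumer thread, a mutex-protected queue, and a hypothetical `AsyncPipeline` class. LunarLog's actual queue and flush implementation may differ.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical async pipeline: producers enqueue, one consumer "writes".
class AsyncPipeline {
public:
    AsyncPipeline() : consumer_([this] { run(); }) {}

    ~AsyncPipeline() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        consumer_.join();  // consumer drains remaining messages, then exits
    }

    // Producer side: returns as soon as the message is queued.
    void log(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
            ++enqueued_;
        }
        cv_.notify_one();
    }

    // Blocks until every message enqueued so far has been written.
    void flush() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t target = enqueued_;
        drained_.wait(lock, [&] { return written_ >= target; });
    }

    std::size_t written() {
        std::lock_guard<std::mutex> lock(mutex_);
        return written_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [&] { return stop_ || !queue_.empty(); });
            if (queue_.empty() && stop_) return;
            std::string msg = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            // A real sink would format and write msg here, outside the lock.
            lock.lock();
            ++written_;
            drained_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;       // wakes the consumer
    std::condition_variable drained_;  // wakes flush() callers
    std::deque<std::string> queue_;
    std::size_t enqueued_ = 0;
    std::size_t written_ = 0;
    bool stop_ = false;
    std::thread consumer_;  // declared last: starts after the members above
};
```

Timing `log()` alone measures enqueue. Timing `log()` followed by `flush()` also includes the hand-off to the consumer and the sink write. That hand-off, plus the file I/O, is part of why BM_E2E_Realistic reports a much higher per-iteration time than the enqueue-focused benchmarks.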

Interpreting Results

Google Benchmark reports several key metrics:

Metric | Description
Time | Wall-clock time per iteration (includes I/O waits)
CPU | CPU time per iteration (excludes I/O waits)
items_per_second | Throughput: messages processed per second
Iterations | Number of iterations the benchmark ran
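For a single-threaded benchmark that handles one message per iteration, items_per_second is the reciprocal of the per-iteration CPU time. Google Benchmark normally derives rates from CPU time unless a benchmark opts into real time. The helpers below make the conversion explicit and format the result the way the tables on this page do. They assume one item per iteration, and the function names are illustrative.

```cpp
#include <cstdio>
#include <string>

// Converts CPU nanoseconds per item into items per second,
// assuming exactly one item is processed per iteration.
double itemsPerSecond(double cpuNsPerItem) {
    return 1e9 / cpuNsPerItem;
}

// Formats a rate like the tables on this page, e.g. "4.1M/s" or "476.2K/s".
std::string formatRate(double perSecond) {
    char buf[32];
    if (perSecond >= 1e6) {
        std::snprintf(buf, sizeof buf, "%.1fM/s", perSecond / 1e6);
    } else if (perSecond >= 1e3) {
        std::snprintf(buf, sizeof buf, "%.1fK/s", perSecond / 1e3);
    } else {
        std::snprintf(buf, sizeof buf, "%.1f/s", perSecond);
    }
    return buf;
}
```

For example, 243 ns of CPU time gives 1e9 / 243 ≈ 4.1M/s, which matches BM_LogInfo_SingleThread in the example output. For multi-threaded benchmarks, read items_per_second directly rather than deriving it, because thread and item counts affect the figure.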

What to Look For

Throughput: items_per_second is the primary metric. Higher is better. Compare BM_LogInfo_SingleThread against BM_Sink_* benchmarks to isolate sink overhead.

Cache effectiveness: Compare BM_FormatCacheHit vs BM_FormatCacheMiss. If the miss penalty is large, consider increasing setTemplateCacheSize() for workloads with many distinct templates.

Filter scaling: Compare BM_Filter_DSL_1Rule → BM_Filter_DSL_5Rules → BM_Filter_DSL_10Rules to understand the per-rule cost. If overhead is significant, consolidate rules or use compact filters.

Sink comparison: Compare BM_Sink_Null → BM_Sink_File_HumanReadable → BM_Sink_File_Json → BM_Sink_File_CompactJson to see the formatter + I/O cost delta.

Enricher overhead: Subtract BM_Enricher_None from BM_Enricher_Three to see the total enricher cost. If it dominates your throughput budget, prefer cached enrichers (ProcessId, Property) over per-entry ones (ThreadId).

Contention: Compare the threads:1 result of BM_LogInfo_MultiThread against the threads:2, threads:4, and threads:8 results. A drop in per-thread throughput as threads are added indicates lock or queue contention.
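The comparisons above reduce to subtraction and division. The CI results table on this page gives two worked cases. Three enrichers add 762.6 - 441.4 = 321.2 ns per message. Each DSL rule beyond the first adds about (513.8 - 445.1) / 9 ≈ 7.6 ns. A trivial helper pair with illustrative names:

```cpp
// Overhead of a feature: cost with the feature minus the baseline cost.
double overheadNs(double baselineNs, double withFeatureNs) {
    return withFeatureNs - baselineNs;
}

// Marginal cost per added unit (rule, enricher, ...) between two configs.
double perUnitNs(double fewerNs, double moreNs, int fewerUnits, int moreUnits) {
    return (moreNs - fewerNs) / (moreUnits - fewerUnits);
}
```

Single measurements are noisy, so compute these deltas from repeated runs before drawing conclusions.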

Example Output

-----------------------------------------------------------------------
Benchmark                             Time       CPU  items_per_second
-----------------------------------------------------------------------
BM_LogInfo_SingleThread            245 ns    243 ns        4.1M/s
BM_LogTrace_Disabled              1.2 ns    1.2 ns      833.3M/s
BM_LogInfo_MultiThread/threads:4   312 ns    298 ns       13.4M/s
BM_FormatCacheHit                  230 ns    228 ns        4.4M/s
BM_FormatCacheMiss                 580 ns    575 ns        1.7M/s
BM_Sink_Null                       240 ns    238 ns        4.2M/s
BM_Sink_File_Json                  890 ns    620 ns        1.6M/s
BM_E2E_Realistic                  4200 ns   2100 ns      476.2K/s

(Numbers are illustrative — actual results depend on hardware, OS, and compiler.)

NullSink Helper

The benchmark suite includes null_sink.hpp — a minimal no-op sink used to isolate non-I/O overhead:

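// Minimal no-op sink: discards every entry so that benchmarks measure
// pipeline overhead without any file I/O.
class NullSink : public ISink {
public:
    void write(const LogEntry& /*entry*/) override {}  // intentionally empty
};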

Use it in your own benchmarks or tests when you need to measure pipeline cost without file I/O.
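A variant that counts writes can confirm that messages actually reached the sink, for example to check that a filter is not silently dropping everything. To stay self-contained, the sketch declares minimal stand-ins for `ISink` and `LogEntry`. These stand-ins are assumptions, not LunarLog's headers. In a real benchmark you would derive from the library's own `ISink` instead.

```cpp
#include <atomic>
#include <cstddef>
#include <string>

// Stand-ins so the sketch compiles on its own; not LunarLog's definitions.
struct LogEntry {
    std::string message;
};

class ISink {
public:
    virtual ~ISink() = default;
    virtual void write(const LogEntry& entry) = 0;
};

// No-op sink that only counts writes. The counter is atomic in case writes
// happen on a different thread from the one reading count().
class CountingSink : public ISink {
public:
    void write(const LogEntry&) override {
        count_.fetch_add(1, std::memory_order_relaxed);
    }
    std::size_t count() const { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::size_t> count_{0};
};
```

The atomic increment adds a small cost per write, so keep NullSink for the pure overhead numbers and use a counting variant for correctness checks.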

CI Integration

The bench.yml workflow runs benchmarks on every push and pull request to master:

name: Benchmarks
on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  workflow_dispatch:

What it does:

  1. Configures a Release build with benchmarks enabled (tests and examples disabled)
  2. Builds the lunar_log_bench target
  3. Runs all benchmarks with 5 repetitions, aggregate-only reporting, JSON output
  4. Uploads bench_results.json as a GitHub Actions artifact

The results artifact is available for download from the Actions tab on each workflow run.

Running CI Locally

Replicate the CI benchmark run on your machine:

cmake -B build -DCMAKE_BUILD_TYPE=Release \
    -DLUNARLOG_BUILD_BENCHMARKS=ON \
    -DLUNARLOG_BUILD_TESTS=OFF \
    -DLUNARLOG_BUILD_EXAMPLES=OFF
cmake --build build --target lunar_log_bench -j$(nproc)

./build/bench/lunar_log_bench \
    --benchmark_format=json \
    --benchmark_out=bench_results.json \
    --benchmark_repetitions=5 \
    --benchmark_report_aggregates_only=true
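The workflow steps listed above upload results but do not compare them with a baseline. If you want a simple gate, Google Benchmark's repository ships `tools/compare.py` for comparing two JSON outputs. Alternatively, a minimal threshold check can be sketched as follows. It assumes you have already extracted per-benchmark CPU times, such as the `cpu_time` field of each entry in bench_results.json, into maps. The function name and the 10% threshold are illustrative.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the names of benchmarks whose CPU time grew by more than
// `threshold` (0.10 == 10%) relative to the baseline. Benchmarks missing
// from the current results, or with a non-positive baseline, are skipped.
std::vector<std::string> findRegressions(
        const std::map<std::string, double>& baselineNs,
        const std::map<std::string, double>& currentNs,
        double threshold) {
    std::vector<std::string> regressed;
    for (const auto& [name, base] : baselineNs) {
        auto it = currentNs.find(name);
        if (it == currentNs.end() || base <= 0.0) continue;
        if ((it->second - base) / base > threshold) regressed.push_back(name);
    }
    return regressed;
}
```

Shared CI runners are noisy, so a tight threshold will produce false alarms. Compare aggregates from repeated runs rather than single samples.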

Tips

  • Always benchmark in Release mode. Debug builds are 5-50x slower due to assertions and missing optimizations. Timings from Debug builds are not meaningful for performance analysis.
  • Close other workloads during benchmarking. Background processes, browser tabs, and system updates introduce noise.
  • Use --benchmark_repetitions=N (e.g. 5) for stable measurements. Google Benchmark reports mean, median, and stddev across repetitions.
  • Use --benchmark_filter to focus on specific areas. Running the full suite takes longer; targeted runs give faster feedback during development.
  • Compare items_per_second rather than raw nanoseconds when comparing across machines. Throughput is more portable than absolute timing.
  • Benchmark your actual configuration. The suite tests isolated layers. For your specific setup (number of sinks, enrichers, filter rules), the E2E benchmark is the closest proxy — or write a custom benchmark targeting your exact config.
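The repetition aggregates are the mean and median of the per-repetition times plus their spread. The sketch below computes them from a list of samples. It uses the sample (n - 1) standard deviation. If you need exact agreement, check the aggregate definitions in your Google Benchmark version.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Aggregates {
    double mean;
    double median;
    double stddev;  // sample standard deviation (n - 1 denominator)
};

// Summarizes per-repetition timings, e.g. CPU ns from 5 repetitions.
// Precondition: samples is non-empty.
Aggregates summarize(std::vector<double> samples) {
    const double n = static_cast<double>(samples.size());
    const double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / n;

    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    const double median = samples.size() % 2 == 1
        ? samples[mid]
        : (samples[mid - 1] + samples[mid]) / 2.0;

    double sq = 0.0;
    for (double s : samples) sq += (s - mean) * (s - mean);
    const double stddev = samples.size() > 1 ? std::sqrt(sq / (n - 1.0)) : 0.0;

    return {mean, median, stddev};
}
```

A standard deviation that is large relative to the mean usually points to background noise rather than a real performance difference.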

Flush-Inclusive Throughput

The suite includes benchmarks that call flush() at different intervals to measure end-to-end pipeline throughput (enqueue + consumer + sink write):

Benchmark | Flush frequency | What it measures
BM_LogInfo_FlushEvery1 | Every message | Worst-case latency per message
BM_LogInfo_FlushEvery100 | Every 100 msgs | Balanced throughput/latency
BM_LogInfo_FlushEvery1000 | Every 1000 msgs | High-throughput batch mode
BM_LogInfo_SingleThread | End of run | Enqueue-only throughput

Rule of thumb: if your app flushes frequently (e.g. on every HTTP request), use FlushEvery1 numbers. For high-throughput logging, use FlushEvery1000.

CI Benchmark Results

Last updated: 2026-02-25 · Ubuntu (GitHub Actions) · 4 vCPU @ 3222 MHz · Release mode · Google Benchmark v1.9.1

Benchmark | CPU time | Throughput
BM_EmptyLogger | 473.8 ns | 2.1M/s
BM_LogInfo_SingleThread | 466.5 ns | 2.1M/s
BM_LogTrace_Disabled | 15.3 ns | 65.5M/s
BM_LogInfo_MultiThread/threads:1 | 464.8 ns | 2.2M/s
BM_LogInfo_MultiThread/threads:2 | 553.7 ns | 1.8M/s
BM_LogInfo_MultiThread/threads:4 | 1165.5 ns | 858.3K/s
BM_LogInfo_MultiThread/threads:8 | 1147.8 ns | 871.4K/s
BM_LogInfo_FlushEvery1 | 510.7 ns | 2.0M/s
BM_LogInfo_FlushEvery100 | 548.9 ns | 1.8M/s
BM_LogInfo_FlushEvery1000 | 549.3 ns | 1.8M/s
BM_FormatSimple | 474.3 ns | 2.1M/s
BM_FormatComplex | 2867.8 ns | 348.7K/s
BM_FormatCacheHit | 2897.3 ns | 345.2K/s
BM_FormatCacheMiss | 1657.4 ns | 603.5K/s
BM_FormatPipeTransform | 839.2 ns | 1.2M/s
BM_FormatIndexed | 790.8 ns | 1.3M/s
BM_Filter_None | 443.6 ns | 2.3M/s
BM_Filter_MinLevel | 515.1 ns | 1.9M/s
BM_Filter_Predicate | 518.4 ns | 1.9M/s
BM_Filter_DSL_1Rule | 445.1 ns | 2.2M/s
BM_Filter_DSL_5Rules | 478.0 ns | 2.1M/s
BM_Filter_DSL_10Rules | 513.8 ns | 1.9M/s
BM_Filter_Compact | 597.1 ns | 1.7M/s
BM_Filter_TagRouting | 682.5 ns | 1.5M/s
BM_Sink_Null | 444.3 ns | 2.3M/s
BM_Sink_File_HumanReadable | 2628.0 ns | 380.5K/s
BM_Sink_File_Json | 3059.9 ns | 326.8K/s
BM_Sink_File_CompactJson | 2839.8 ns | 352.1K/s
BM_Sink_Rolling | 2627.3 ns | 380.6K/s
BM_Enricher_None | 441.4 ns | 2.3M/s
BM_Enricher_ThreadId | 632.4 ns | 1.6M/s
BM_Enricher_Three | 762.6 ns | 1.3M/s
BM_Enricher_Lambda | 563.1 ns | 1.8M/s
BM_E2E_Realistic | 6660.4 ns | 150.1K/s

Results vary by hardware. Run locally for your platform.
