Benchmark Suite
LunarLog ships a comprehensive benchmark suite built on Google Benchmark v1.9.1. It measures each layer of the logging pipeline in isolation (throughput, template formatting, filter evaluation, sink I/O, enricher overhead) and includes a realistic end-to-end scenario that combines all of them.
Use these benchmarks to:
- Establish a performance baseline for your platform
- Compare sink types and formatter costs
- Measure the overhead of filters, enrichers, and tag routing
- Validate that configuration changes don't regress throughput
v1.22.0 replaces several hot-path deep copies with immutable snapshot sharing (std::shared_ptr<const T> + copy-on-write updates).
| Operation | Before | After |
|---|---|---|
| Per-sink filter check | `vector<FilterRule>` deep copy | `shared_ptr` refcount bump |
| Template cache hit | `vector<PlaceholderInfo>` deep copy | `shared_ptr` refcount bump |
| Global filter eval | Evaluated inside mutex | Snapshot outside mutex |
| Tag routing check | `set<string>` evaluated inside lock | Snapshot outside lock |
This change is internal only — public APIs and expected behavior are unchanged.
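The snapshot-sharing pattern described above can be sketched in a few lines of standard C++. This is a minimal illustration of the general technique, not LunarLog's internal code: `Rule`, `RuleSet`, and `passes` are hypothetical names invented for the example.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a filter rule; not LunarLog's actual type.
struct Rule {
    std::string field;
    int minLevel;
};

// Holds an immutable rule list that readers share by reference count.
class RuleSet {
public:
    using Snapshot = std::shared_ptr<const std::vector<Rule>>;

    RuleSet() : rules_(std::make_shared<std::vector<Rule>>()) {}

    // Reader path: copy the pointer under the lock (a refcount bump).
    // The caller evaluates the rules after the lock is released.
    Snapshot snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rules_;
    }

    // Writer path (copy-on-write): clone the current list, modify the
    // clone, then publish it. Snapshots already handed out keep the old
    // list alive and never observe the change.
    void add(Rule rule) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto next = std::make_shared<std::vector<Rule>>(*rules_);
        next->push_back(std::move(rule));
        rules_ = std::move(next);
    }

private:
    mutable std::mutex mutex_;
    Snapshot rules_;
};

// Evaluates every rule against a level without holding the RuleSet lock.
bool passes(const RuleSet& set, int level) {
    const RuleSet::Snapshot rules = set.snapshot();
    for (const Rule& r : *rules) {
        if (level < r.minLevel) return false;
    }
    return true;
}
```

The trade-off is that writers pay for a full copy on every update, which suits configuration such as filter rules that changes rarely but is read on every log call.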
Benchmarks are opt-in and disabled by default. Enable them with the LUNARLOG_BUILD_BENCHMARKS CMake option:
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLUNARLOG_BUILD_BENCHMARKS=ON
cmake --build build --target lunar_log_bench
```

Google Benchmark is fetched automatically via FetchContent, so no manual dependency installation is required.
Important: Always build in Release mode for meaningful results. Debug builds enable assertions and sanitizers and disable optimizations, so timings will be 5-50x slower and are not representative of production performance.
| Option | Default | Description |
|---|---|---|
| `LUNARLOG_BUILD_BENCHMARKS` | `OFF` | Build the benchmark executable |
| `LUNARLOG_BUILD_TESTS` | `ON` | Build tests (disable with `OFF` for benchmark-only builds) |
| `LUNARLOG_BUILD_EXAMPLES` | `ON` | Build examples (disable with `OFF` for benchmark-only builds) |
Minimal benchmark-only build:
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release \
  -DLUNARLOG_BUILD_BENCHMARKS=ON \
  -DLUNARLOG_BUILD_TESTS=OFF \
  -DLUNARLOG_BUILD_EXAMPLES=OFF
cmake --build build --target lunar_log_bench
```

Run the full suite:

```bash
./build/bench/lunar_log_bench
```

Use `--benchmark_filter` to run a subset by regex:
```bash
# Only throughput benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Log"
# Only sink benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Sink"
# Only filtering benchmarks
./build/bench/lunar_log_bench --benchmark_filter="BM_Filter"
# Specific benchmark
./build/bench/lunar_log_bench --benchmark_filter="BM_E2E_Realistic"
```

Export results to a JSON file for analysis or CI comparison:
```bash
./build/bench/lunar_log_bench \
  --benchmark_format=json \
  --benchmark_out=bench_results.json
```

For stable results, run multiple repetitions and report aggregates:
```bash
./build/bench/lunar_log_bench \
  --benchmark_format=json \
  --benchmark_out=bench_results.json \
  --benchmark_repetitions=5 \
  --benchmark_report_aggregates_only=true
```

Core message throughput under various conditions.
| Benchmark | What it measures | Why it matters |
|---|---|---|
| `BM_EmptyLogger` | Producer-side cost with zero sinks: level check, template parse, entry construction | Absolute minimum overhead per log call |
| `BM_LogInfo_SingleThread` | Full pipeline (parse, format, enqueue, NullSink write) on a single thread | Baseline single-thread throughput |
| `BM_LogTrace_Disabled` | TRACE log call with INFO min level; the level check short-circuits before any work | Cost of a disabled log call (should be near-zero) |
| `BM_LogInfo_MultiThread` | Shared logger under 1, 2, 4, and 8 concurrent threads | Lock/queue contention scaling under concurrent writes |
| `BM_LogInfo_FlushEvery1` | Flush after every message (enqueue + consumer + sink + sync) | Worst-case end-to-end latency per message |
| `BM_LogInfo_FlushEvery100` | Flush every 100 messages | Balanced latency/throughput trade-off |
| `BM_LogInfo_FlushEvery1000` | Flush every 1000 messages | High-throughput batch mode pipeline cost |
Template parsing, rendering, and caching.
| Benchmark | What it measures | Why it matters |
|---|---|---|
| `BM_FormatSimple` | Single placeholder: `"Hello {name}"` | Baseline formatting cost |
| `BM_FormatComplex` | Multiple placeholders with format specifiers and pipes: `"{method} {path} {status:04} in {elapsed:.2f}ms [{region\|upper}]"` | Realistic production template cost |
| `BM_FormatCacheHit` | Pre-warmed cache, same template on every iteration | Steady-state cost with warm cache |
| `BM_FormatCacheMiss` | 256 rotating templates against a 128-entry cache (~50% miss rate) | Cache miss penalty (template re-parse cost) |
| `BM_FormatPipeTransform` | Pipe chain: `"{value\|upper\|trim\|quote}"` | Per-transform overhead |
| `BM_FormatIndexed` | Indexed placeholders with reuse: `"{0} bought {1} for {0}"` | Indexed placeholder resolution cost |
Per-layer filter overhead.
| Benchmark | What it measures | Why it matters |
|---|---|---|
| `BM_Filter_None` | No filters (pure baseline) | Establishes the zero-filter cost |
| `BM_Filter_MinLevel` | Global min level only (messages pass) | Level-gate overhead |
| `BM_Filter_Predicate` | Global predicate filter (lambda evaluation per entry) | Predicate dispatch cost |
| `BM_Filter_DSL_1Rule` | Single DSL rule: `"level >= INFO"` | DSL evaluation baseline |
| `BM_Filter_DSL_5Rules` | Five AND-combined DSL rules | Scaling cost for multiple rules |
| `BM_Filter_DSL_10Rules` | Ten AND-combined DSL rules | Filter chain scaling at higher counts |
| `BM_Filter_Compact` | Compact filter syntax: `"INFO+ ~request !~heartbeat"` | Compact-filter-to-DSL cost |
| `BM_Filter_TagRouting` | 3 sinks with `only()` and `except()` tag routing | Tag parse + routing overhead |
Sink write cost across sink types and formatters.
| Benchmark | What it measures | Why it matters |
|---|---|---|
| `BM_Sink_Null` | NullSink (no-op write) | Sink dispatch overhead without I/O |
| `BM_Sink_File_HumanReadable` | FileSink with default HumanReadableFormatter | File I/O + HR formatting cost |
| `BM_Sink_File_Json` | FileSink with JsonFormatter | File I/O + JSON formatting cost |
| `BM_Sink_File_CompactJson` | FileSink with CompactJsonFormatter | File I/O + compact JSON formatting cost |
| `BM_Sink_Rolling` | RollingFileSink with daily rotation policy | Rolling sink overhead vs plain FileSink |
All file sink benchmarks write to temp files and clean up after each run.
Per-enricher overhead.
| Benchmark | What it measures | Why it matters |
|---|---|---|
| `BM_Enricher_None` | No enrichers (baseline) | Establishes zero-enricher cost |
| `BM_Enricher_ThreadId` | Single `Enrichers::threadId()` (dynamic, evaluates per entry) | Cost of a per-entry enricher |
| `BM_Enricher_Three` | Three enrichers: ThreadId + ProcessId + Property | Enricher chain scaling |
| `BM_Enricher_Lambda` | Custom lambda enricher | Lambda dispatch cost |
Full-pipeline latency with a production-like configuration.
| Benchmark | What it measures | Why it matters |
|---|---|---|
| `BM_E2E_Realistic` | 3 sinks (NullSink + JSON file + Rolling error file), 2 enrichers (ThreadId + Property), WARN+ global filter, tag routing on error sink, per-iteration `flush()` | True end-to-end latency including enqueue, consumer processing, and all sink writes |
This benchmark calls flush() inside the loop, so it measures the full round-trip time from log call to all sinks having written — not just enqueue latency.
Google Benchmark reports several key metrics:
| Metric | Description |
|---|---|
| `Time` | Wall-clock time per iteration (includes I/O waits) |
| `CPU` | CPU time per iteration (excludes I/O waits) |
| `items_per_second` | Throughput: messages processed per second |
| `Iterations` | Number of iterations the benchmark ran |
Throughput: items_per_second is the primary metric. Higher is better. Compare BM_LogInfo_SingleThread against BM_Sink_* benchmarks to isolate sink overhead.
Cache effectiveness: Compare BM_FormatCacheHit vs BM_FormatCacheMiss. If the miss penalty is large, consider increasing setTemplateCacheSize() for workloads with many distinct templates.
Filter scaling: Compare BM_Filter_DSL_1Rule → BM_Filter_DSL_5Rules → BM_Filter_DSL_10Rules to understand the per-rule cost. If overhead is significant, consolidate rules or use compact filters.
Sink comparison: Compare BM_Sink_Null → BM_Sink_File_HumanReadable → BM_Sink_File_Json → BM_Sink_File_CompactJson to see the formatter + I/O cost delta.
Enricher overhead: Subtract BM_Enricher_None from BM_Enricher_Three to see the total enricher cost. If it dominates your throughput budget, prefer cached enrichers (ProcessId, Property) over per-entry ones (ThreadId).
Contention: Compare single-thread vs multi-thread throughput in BM_LogInfo_MultiThread. The per-thread throughput drop indicates lock contention.
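The per-rule cost mentioned under filter scaling is a simple difference quotient. A small sketch of that arithmetic follows; `perRuleCostNs` is a hypothetical helper written for this page, not part of LunarLog or Google Benchmark.

```cpp
// Average incremental cost of one additional filter rule, in nanoseconds.
// fewNs / manyNs are per-iteration timings measured with fewRules and
// manyRules rules respectively. Precondition: manyRules > fewRules.
double perRuleCostNs(double fewNs, int fewRules,
                     double manyNs, int manyRules) {
    return (manyNs - fewNs) / static_cast<double>(manyRules - fewRules);
}
```

With the CI figures reported at the end of this page (445.1 ns for `BM_Filter_DSL_1Rule`, 513.8 ns for `BM_Filter_DSL_10Rules`), `perRuleCostNs(445.1, 1, 513.8, 10)` comes out at roughly 7.6 ns per extra rule. The same subtraction applies to the enricher comparison: `BM_Enricher_Three` minus `BM_Enricher_None`.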
```
-------------------------------------------------------------------------------
Benchmark                                Time        CPU    items_per_second
-------------------------------------------------------------------------------
BM_LogInfo_SingleThread                245 ns     243 ns        4.1M/s
BM_LogTrace_Disabled                   1.2 ns     1.2 ns      833.3M/s
BM_LogInfo_MultiThread/threads:4       312 ns     298 ns       13.4M/s
BM_FormatCacheHit                      230 ns     228 ns        4.4M/s
BM_FormatCacheMiss                     580 ns     575 ns        1.7M/s
BM_Sink_Null                           240 ns     238 ns        4.2M/s
BM_Sink_File_Json                      890 ns     620 ns        1.6M/s
BM_E2E_Realistic                      4200 ns    2100 ns      476.2K/s
```
(Numbers are illustrative — actual results depend on hardware, OS, and compiler.)
The benchmark suite includes null_sink.hpp — a minimal no-op sink used to isolate non-I/O overhead:
```cpp
// No-op sink: discards every entry, so only dispatch overhead is measured.
class NullSink : public ISink {
public:
    void write(const LogEntry&) override {}
};
```

Use it in your own benchmarks or tests when you need to measure pipeline cost without file I/O.
The bench.yml workflow runs benchmarks on every push and pull request to master:
```yaml
name: Benchmarks
on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  workflow_dispatch:
```

What it does:
- Configures a Release build with benchmarks enabled (tests and examples disabled)
- Builds the `lunar_log_bench` target
- Runs all benchmarks with 5 repetitions, aggregate-only reporting, and JSON output
- Uploads `bench_results.json` as a GitHub Actions artifact
The results artifact is available for download from the Actions tab on each workflow run.
Replicate the CI benchmark run on your machine:
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release \
  -DLUNARLOG_BUILD_BENCHMARKS=ON \
  -DLUNARLOG_BUILD_TESTS=OFF \
  -DLUNARLOG_BUILD_EXAMPLES=OFF
cmake --build build --target lunar_log_bench -j$(nproc)
./build/bench/lunar_log_bench \
  --benchmark_format=json \
  --benchmark_out=bench_results.json \
  --benchmark_repetitions=5 \
  --benchmark_report_aggregates_only=true
```

- Always benchmark in Release mode. Debug builds are 5-50x slower due to assertions and missing optimizations, so their timings are not meaningful for performance analysis.
- Close other workloads during benchmarking. Background processes, browser tabs, and system updates introduce noise.
- Use `--benchmark_repetitions=N` (e.g. 5) for stable measurements. Google Benchmark reports mean, median, and stddev across repetitions.
- Use `--benchmark_filter` to focus on specific areas. Running the full suite takes longer; targeted runs give faster feedback during development.
- Compare `items_per_second` rather than raw nanoseconds when comparing across machines. Throughput is more portable than absolute timing.
- Benchmark your actual configuration. The suite tests isolated layers. For your specific setup (number of sinks, enrichers, filter rules), the E2E benchmark is the closest proxy; alternatively, write a custom benchmark targeting your exact config.
The suite includes benchmarks that call flush() at different intervals to measure end-to-end pipeline throughput (enqueue + consumer + sink write):
| Benchmark | Flush Frequency | What it measures |
|---|---|---|
| `BM_LogInfo_FlushEvery1` | Every message | Worst-case latency per message |
| `BM_LogInfo_FlushEvery100` | Every 100 msgs | Balanced throughput/latency |
| `BM_LogInfo_FlushEvery1000` | Every 1000 msgs | High-throughput batch mode |
| `BM_LogInfo_SingleThread` | End of run | Enqueue-only throughput |
Rule of thumb: if your app flushes frequently (e.g. on every HTTP request), use FlushEvery1 numbers. For high-throughput logging, use FlushEvery1000.
Last updated: 2026-02-25 · Ubuntu (GitHub Actions) · 4 vCPU @ 3222 MHz · Release mode · Google Benchmark v1.9.1
| Benchmark | CPU Time | Throughput |
|---|---|---|
| `BM_EmptyLogger` | 473.8 ns | 2.1M/s |
| `BM_LogInfo_SingleThread` | 466.5 ns | 2.1M/s |
| `BM_LogTrace_Disabled` | 15.3 ns | 65.5M/s |
| `BM_LogInfo_MultiThread/threads:1` | 464.8 ns | 2.2M/s |
| `BM_LogInfo_MultiThread/threads:2` | 553.7 ns | 1.8M/s |
| `BM_LogInfo_MultiThread/threads:4` | 1165.5 ns | 858.3K/s |
| `BM_LogInfo_MultiThread/threads:8` | 1147.8 ns | 871.4K/s |
| `BM_LogInfo_FlushEvery1` | 510.7 ns | 2.0M/s |
| `BM_LogInfo_FlushEvery100` | 548.9 ns | 1.8M/s |
| `BM_LogInfo_FlushEvery1000` | 549.3 ns | 1.8M/s |
| `BM_FormatSimple` | 474.3 ns | 2.1M/s |
| `BM_FormatComplex` | 2867.8 ns | 348.7K/s |
| `BM_FormatCacheHit` | 2897.3 ns | 345.2K/s |
| `BM_FormatCacheMiss` | 1657.4 ns | 603.5K/s |
| `BM_FormatPipeTransform` | 839.2 ns | 1.2M/s |
| `BM_FormatIndexed` | 790.8 ns | 1.3M/s |
| `BM_Filter_None` | 443.6 ns | 2.3M/s |
| `BM_Filter_MinLevel` | 515.1 ns | 1.9M/s |
| `BM_Filter_Predicate` | 518.4 ns | 1.9M/s |
| `BM_Filter_DSL_1Rule` | 445.1 ns | 2.2M/s |
| `BM_Filter_DSL_5Rules` | 478.0 ns | 2.1M/s |
| `BM_Filter_DSL_10Rules` | 513.8 ns | 1.9M/s |
| `BM_Filter_Compact` | 597.1 ns | 1.7M/s |
| `BM_Filter_TagRouting` | 682.5 ns | 1.5M/s |
| `BM_Sink_Null` | 444.3 ns | 2.3M/s |
| `BM_Sink_File_HumanReadable` | 2628.0 ns | 380.5K/s |
| `BM_Sink_File_Json` | 3059.9 ns | 326.8K/s |
| `BM_Sink_File_CompactJson` | 2839.8 ns | 352.1K/s |
| `BM_Sink_Rolling` | 2627.3 ns | 380.6K/s |
| `BM_Enricher_None` | 441.4 ns | 2.3M/s |
| `BM_Enricher_ThreadId` | 632.4 ns | 1.6M/s |
| `BM_Enricher_Three` | 762.6 ns | 1.3M/s |
| `BM_Enricher_Lambda` | 563.1 ns | 1.8M/s |
| `BM_E2E_Realistic` | 6660.4 ns | 150.1K/s |
Results vary by hardware. Run locally for your platform.