[PERF] TSDB: Default stripe size=1024 (and 256) (was 16384) by bboreham · Pull Request #17101 · prometheus/prometheus

bboreham · 2025-08-28T16:25:41Z

Expected to reduce resource usage without noticeable extra contention

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

[PERF] TSDB: Reduce size of internal data structures to suit typical installations.

[Draft because I expect we will want to add a CLI flag to let people set it higher if they perceive contention]

Expected to reduce resource usage without noticeable extra contention

Signed-off-by: Bryan Boreham <[email protected]>

bboreham · 2025-08-28T16:25:55Z

/prombench main

prombot · 2025-08-28T16:25:58Z

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-17101 and main

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

To restart benchmark: /prombench restart main
To stop benchmark: /prombench cancel
To print help: /prombench help

bwplotka · 2025-08-29T04:35:00Z

Looks as this uses more memory (on avg) over time:

..and more CPU 🙈

bwplotka · 2025-08-29T04:36:17Z

BTW @bboreham I keep changing our prombench dashboard manually to use longer averages so I can really tell the difference, because mem & CPU spikes are hard to compare (they happen at different times for each version). Should we add a set of avg panels? (the spikes are still useful.. although max_over_time could tell us the same 🤔 ) WDYT?

prymitive · 2025-08-29T08:54:48Z

If you plot go_sync_mutex_wait_total_seconds_total there's more wait after this change:
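For readers who want to look at the same signal outside a dashboard: as far as I know, this metric is the Go client's export of the runtime metric `/sync/mutex/wait/total:seconds`. The sketch below reads that runtime metric directly before and after some deliberately contended work. The helper name `mutexWaitSeconds` and the toy workload are assumptions made for illustration and are not part of Prometheus or prombench.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"sync"
	"time"
)

// mutexWaitSeconds (hypothetical helper) returns the approximate cumulative
// time goroutines have spent blocked on sync.Mutex or sync.RWMutex, as
// reported by the Go runtime.
func mutexWaitSeconds() float64 {
	s := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64 {
		return 0 // metric not supported by this Go version
	}
	return s[0].Value.Float64()
}

func main() {
	before := mutexWaitSeconds()

	// Toy workload: eight goroutines fighting over a single mutex.
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				mu.Lock()
				time.Sleep(10 * time.Microsecond)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	after := mutexWaitSeconds()
	fmt.Printf("mutex wait grew by %.6fs\n", after-before)
}
```

The runtime samples this value, so small differences between two runs are not meaningful on their own.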

bboreham · 2025-08-29T09:26:14Z

> Looks as this uses more memory (on avg) over time:

RSS is pretty much meaningless; the Go heap is where I would expect to see any difference. But the expected difference is 3MB which is not visible on a 24GB scale.

Thanks @prymitive, the increased mutex time is interesting, seems to hit particularly during Head GC - the 5-second scrape interval lets me pin down the timing.

2025-08-29T09:01:22.052Z level=INFO source=compact.go:598 msg="write block" component=tsdb mint=1756447200014 maxt=1756454400000 ulid=01K3TGW7MPSZ9QC4FF5NYRGRXS duration=1m22.029964316s ooo=false
2025-08-29T09:01:42.321Z level=INFO source=head.go:1422 msg="Head GC completed" component=tsdb caller=truncateMemory duration=20.267960764s
2025-08-29T09:01:42.394Z level=INFO source=checkpoint.go:100 msg="Creating checkpoint" component=tsdb from_segment=100 to_segment=114 mint=1756454400000
2025-08-29T09:02:05.639Z level=INFO source=head.go:1384 msg="WAL checkpoint complete" component=tsdb first=100 last=114 duration=23.244824427s

I guess this proves that 256 is too low, and/or we should look at how Head GC takes locks.
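To make the trade-off concrete, here is a minimal sketch of a striped map, assuming a design in which each stripe has its own lock and a GC pass visits the stripes one at a time. The type `stripedMap` and its methods are hypothetical, written for illustration only, and are not the TSDB's actual code. With fewer stripes, each stripe holds more keys, so a GC pass holds each write lock for longer and concurrent writers to that stripe wait longer.

```go
package main

import (
	"fmt"
	"sync"
)

// stripedMap is a hypothetical, simplified striped hash map: keys are spread
// over a fixed number of stripes, each guarded by its own lock.
type stripedMap struct {
	size    uint64 // number of stripes; must be a power of two
	locks   []sync.RWMutex
	buckets []map[uint64]string
}

func newStripedMap(size uint64) *stripedMap {
	if size == 0 || size&(size-1) != 0 {
		panic("stripe size must be a power of two")
	}
	m := &stripedMap{
		size:    size,
		locks:   make([]sync.RWMutex, size),
		buckets: make([]map[uint64]string, size),
	}
	for i := range m.buckets {
		m.buckets[i] = map[uint64]string{}
	}
	return m
}

// stripe maps a key to its stripe index using a bit mask.
func (m *stripedMap) stripe(id uint64) uint64 { return id & (m.size - 1) }

func (m *stripedMap) set(id uint64, v string) {
	i := m.stripe(id)
	m.locks[i].Lock()
	m.buckets[i][id] = v
	m.locks[i].Unlock()
}

func (m *stripedMap) get(id uint64) (string, bool) {
	i := m.stripe(id)
	m.locks[i].RLock()
	v, ok := m.buckets[i][id]
	m.locks[i].RUnlock()
	return v, ok
}

// gc deletes every key for which drop returns true and reports how many were
// removed. It takes one stripe's write lock at a time, so the time a writer
// may wait grows with the number of keys per stripe.
func (m *stripedMap) gc(drop func(uint64) bool) int {
	removed := 0
	for i := range m.buckets {
		m.locks[i].Lock()
		for id := range m.buckets[i] {
			if drop(id) {
				delete(m.buckets[i], id)
				removed++
			}
		}
		m.locks[i].Unlock()
	}
	return removed
}

func main() {
	m := newStripedMap(256)
	for id := uint64(0); id < 1024; id++ {
		m.set(id, fmt.Sprintf("series-%d", id))
	}
	removed := m.gc(func(id uint64) bool { return id%2 == 0 })
	fmt.Println("removed:", removed) // prints "removed: 512"
}
```

In this sketch, 1024 keys over 256 stripes means 4 keys per stripe; the same keys over 16384 stripes would leave most stripes empty, which is the memory cost the PR tries to recover.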

bboreham · 2025-08-29T09:33:51Z

@prymitive I also notice your picture shows PR #17089. There is a bug in the way URLs are generated so it defaults to the lowest one active, sorry.

bboreham · 2025-08-29T15:32:10Z

/prombench cancel

prombot · 2025-08-29T15:32:13Z

Benchmark cancel is in progress.

bboreham · 2025-08-29T16:30:52Z

/prombench main

prombot · 2025-08-29T16:30:54Z

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-17101 and main

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

To restart benchmark: /prombench restart main
To stop benchmark: /prombench cancel
To print help: /prombench help

bboreham · 2025-09-01T14:52:24Z

/prombench cancel

prombot · 2025-09-01T14:52:27Z

Benchmark cancel is in progress.

bboreham · 2025-09-01T14:53:37Z

I guess this isn't a slam-dunk.

machine424 · 2025-09-01T15:00:27Z

I'm running a prombench benchmark of main against main (#17110 (comment)) to see what go_sync_mutex_wait_total_seconds_total variance is acceptable.

(The next step after #15339 is to have it under a Go metrics panel in the dashboard; I'll try to add that. The metric was relevant at least once in the past: #15242 (comment).)

machine424 · 2025-09-04T12:12:17Z

go_sync_mutex_wait_total_seconds_total main vs main:

Note that a variance of ~90% could be "acceptable": the two runs tend to catch up with each other eventually, but we shouldn't expect them to be close, and stay close, all the time.

[PERF] TSDB: Default stripe size=256 (was 16384)

ecd8c4e

Expected to reduce resource usage without noticeable extra contention

Signed-off-by: Bryan Boreham <[email protected]>

prombot added the prombench label Aug 28, 2025

kgeckhart mentioned this pull request Aug 29, 2025

remote_write: Reduce stripe series size by 4x grafana/alloy#4306

Merged

2 tasks

Try DefaultStripeSize = 1024

1e855aa

Signed-off-by: Bryan Boreham <[email protected]>

bboreham changed the title ~~[PERF] TSDB: Default stripe size=256 (was 16384)~~ [PERF] TSDB: Default stripe size=1024 (and 256) (was 16384) Sep 1, 2025

bboreham closed this Sep 1, 2025

machine424 mentioned this pull request Sep 1, 2025

Pick #16925 into v3.6.0 #17089

Merged

machine424 mentioned this pull request Feb 26, 2026

prombench: Is Prombench reliable enough? fix or document ignorable baseline #18046

Open


5 participants