
[PERF] TSDB: Default stripe size=1024 (and 256) (was 16384)#17101

Closed
bboreham wants to merge 2 commits into prometheus:main from bboreham:fewer-stripes

Conversation

@bboreham
Member

Expected to reduce resource usage without noticeable extra contention

Which issue(s) does the PR fix:

Fixes #17100

Does this PR introduce a user-facing change?

[PERF] TSDB: Reduce size of internal data structures to suit typical installations. 

[Draft because I expect we will want to add a CLI flag to let people set it higher if they perceive contention]
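For readers unfamiliar with the term, the trade-off behind the stripe size can be sketched roughly as follows. This is a minimal illustration of lock striping, not the TSDB code: `stripedMap`, `newStripedMap`, and the key values are hypothetical names chosen for the example. Each stripe carries its own lock and map, so fewer stripes means less fixed overhead but more keys sharing each lock.

```go
package main

import (
	"fmt"
	"sync"
)

// stripedMap is a hypothetical, simplified striped map: keys are spread over
// a fixed number of stripes, and each stripe has its own lock and its own map.
type stripedMap struct {
	mask   uint64
	locks  []sync.RWMutex
	shards []map[uint64]string
}

// newStripedMap allocates size stripes. size must be a power of two so that
// (key & mask) selects a stripe without a modulo operation.
func newStripedMap(size int) *stripedMap {
	if size <= 0 || size&(size-1) != 0 {
		panic("stripe size must be a power of two")
	}
	m := &stripedMap{
		mask:   uint64(size - 1),
		locks:  make([]sync.RWMutex, size),
		shards: make([]map[uint64]string, size),
	}
	for i := range m.shards {
		m.shards[i] = make(map[uint64]string)
	}
	return m
}

// stripe returns the index of the stripe that owns key.
func (m *stripedMap) stripe(key uint64) uint64 { return key & m.mask }

// Set stores v under key, holding only the owning stripe's write lock.
func (m *stripedMap) Set(key uint64, v string) {
	i := m.stripe(key)
	m.locks[i].Lock()
	m.shards[i][key] = v
	m.locks[i].Unlock()
}

// Get looks up key, holding only the owning stripe's read lock.
func (m *stripedMap) Get(key uint64) (string, bool) {
	i := m.stripe(key)
	m.locks[i].RLock()
	v, ok := m.shards[i][key]
	m.locks[i].RUnlock()
	return v, ok
}

func main() {
	// The three sizes discussed in this PR; the key is arbitrary.
	for _, size := range []int{256, 1024, 16384} {
		m := newStripedMap(size)
		m.Set(16385, "series")
		v, _ := m.Get(16385)
		fmt.Printf("size=%d stripe=%d value=%s\n", size, m.stripe(16385), v)
	}
}
```

With a smaller size, two writers are more likely to land on the same stripe and wait on the same lock, which is the contention this PR was expected not to make noticeably worse.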

Expected to reduce resource usage without noticeable extra contention

Signed-off-by: Bryan Boreham <[email protected]>
@bboreham
Member Author

/prombench main

@prombot
Contributor

prombot commented Aug 28, 2025

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-17101 and main

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

  • To restart benchmark: /prombench restart main
  • To stop benchmark: /prombench cancel
  • To print help: /prombench help

@bwplotka
Member

Looks like this uses more memory (on average) over time:

image

..and more CPU 🙈

image

@bwplotka
Member

BTW @bboreham I keep manually changing our prombench dashboard to use longer averages so I can really tell the difference, because memory and CPU spikes are hard to compare (they happen at different times for each version). Should we add a set of averaged panels? (The spikes are still useful, although max_over_time could tell us the same 🤔) WDYT?
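The point about spikes can be illustrated with a small sketch. The sample values below are made up for the example and are not prombench data; `mean` and `maxOf` are hypothetical helpers standing in for an average and a maximum over a window.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// maxOf returns the largest value in xs, or 0 for an empty slice.
func maxOf(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	m := xs[0]
	for _, x := range xs[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

func main() {
	// Made-up samples: same baseline and same spike height,
	// but the spike happens at a different time in each series.
	a := []float64{10, 10, 30, 10, 10, 10}
	b := []float64{10, 10, 10, 10, 30, 10}

	// Point-by-point, the series look very different at the spike times.
	for i := range a {
		fmt.Printf("t=%d a-b=%+.0f\n", i, a[i]-b[i])
	}

	// Over the whole window, both the average and the maximum agree.
	fmt.Printf("mean(a)=%.2f mean(b)=%.2f\n", mean(a), mean(b))
	fmt.Printf("max(a)=%.0f max(b)=%.0f\n", maxOf(a), maxOf(b))
}
```

A windowed average or maximum removes the dependence on when each version happens to spike, which is why either kind of panel makes two runs easier to compare than the raw series.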

@prymitive
Contributor

If you plot go_sync_mutex_wait_total_seconds_total there's more wait after this change:

image
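For context, this counter is, as far as I know, derived from the Go runtime metric `/sync/mutex/wait/total:seconds`. A minimal sketch of reading it directly in a standalone program, assuming a Go version that exposes that runtime metric (the helper `mutexWaitSeconds` and the contention loop are made up for illustration):

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"sync"
	"time"
)

// mutexWaitSeconds reads the runtime's cumulative time goroutines have spent
// blocked on sync.Mutex / sync.RWMutex. ok is false if the running Go
// version does not expose the metric.
func mutexWaitSeconds() (seconds float64, ok bool) {
	s := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64 {
		return 0, false
	}
	return s[0].Value.Float64(), true
}

func main() {
	before, ok := mutexWaitSeconds()
	if !ok {
		fmt.Println("mutex wait metric not supported by this Go version")
		return
	}

	// Artificial contention: several goroutines fight over one mutex.
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				mu.Lock()
				time.Sleep(100 * time.Microsecond)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	after, _ := mutexWaitSeconds()
	fmt.Printf("mutex wait grew by about %.3fs\n", after-before)
}
```

The value is cumulative, so on a dashboard it is the rate of increase, not the absolute number, that indicates contention.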

@bboreham
Member Author

Looks like this uses more memory (on average) over time:

RSS is pretty much meaningless; the Go heap is where I would expect to see any difference. But the expected difference is 3MB which is not visible on a 24GB scale.
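To put those two numbers side by side, here is the arithmetic as a short sketch. It assumes binary units (MiB and GiB); `fractionPercent` is a hypothetical helper.

```go
package main

import "fmt"

// fractionPercent returns part as a percentage of whole.
func fractionPercent(part, whole float64) float64 {
	return part / whole * 100
}

func main() {
	const mib = 1 << 20
	const gib = 1 << 30
	// 3 MiB expected saving against a 24 GiB axis.
	fmt.Printf("%.4f%%\n", fractionPercent(3*mib, 24*gib))
}
```

That works out to roughly 0.012% of the scale, far below what a dashboard panel can resolve.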

image

Thanks @prymitive, the increased mutex wait time is interesting; it seems to hit particularly during Head GC. The 5-second scrape interval lets me pin down the timing. Explore

image
2025-08-29T09:01:22.052Z level=INFO source=compact.go:598 msg="write block" component=tsdb mint=1756447200014 maxt=1756454400000 ulid=01K3TGW7MPSZ9QC4FF5NYRGRXS duration=1m22.029964316s ooo=false
2025-08-29T09:01:42.321Z level=INFO source=head.go:1422 msg="Head GC completed" component=tsdb caller=truncateMemory duration=20.267960764s
2025-08-29T09:01:42.394Z level=INFO source=checkpoint.go:100 msg="Creating checkpoint" component=tsdb from_segment=100 to_segment=114 mint=1756454400000
2025-08-29T09:02:05.639Z level=INFO source=head.go:1384 msg="WAL checkpoint complete" component=tsdb first=100 last=114 duration=23.244824427s

Logs Explore

I guess this proves that 256 is too low, and/or we should look at how Head GC takes locks.

@bboreham
Member Author

@prymitive I also notice your picture shows PR #17089. There is a bug in the way URLs are generated, so it defaults to the lowest-numbered active PR, sorry.

Signed-off-by: Bryan Boreham <[email protected]>
@bboreham
Member Author

/prombench cancel

@prombot
Contributor

prombot commented Aug 29, 2025

Benchmark cancel is in progress.

@bboreham
Member Author

/prombench main

@prombot
Contributor

prombot commented Aug 29, 2025

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-17101 and main

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

  • To restart benchmark: /prombench restart main
  • To stop benchmark: /prombench cancel
  • To print help: /prombench help

@bboreham
Member Author

bboreham commented Sep 1, 2025

/prombench cancel

@prombot
Contributor

prombot commented Sep 1, 2025

Benchmark cancel is in progress.

@bboreham changed the title from "[PERF] TSDB: Default stripe size=256 (was 16384)" to "[PERF] TSDB: Default stripe size=1024 (and 256) (was 16384)" on Sep 1, 2025
@bboreham
Member Author

bboreham commented Sep 1, 2025

I guess this isn't a slam-dunk.

@bboreham closed this on Sep 1, 2025
@machine424
Member

I'm running a prombench benchmark of main against main (#17110 (comment)) to see how much variance in go_sync_mutex_wait_total_seconds_total is acceptable.

(The next step after #15339 is to have it under a Go metrics panel in the dashboard; I'll try to add that. The metric was relevant at least once in the past: #15242 (comment).)

@machine424
Member

go_sync_mutex_wait_total_seconds_total main vs main:

Note that a variance of ~90% could be "acceptable". The two runs tend to catch up with each other eventually, but we shouldn't expect them to be close, and stay close, all the time.

Screenshot 2025-09-03 at 09 23 03

Development

Successfully merging this pull request may close these issues.

tsdb.DefaultStripeSize is way too big at 16384

5 participants