[PERF] TSDB: Default stripe size=1024 (and 256) (was 16384) #17101
bboreham wants to merge 2 commits into prometheus:main
Expected to reduce resource usage without noticeable extra contention
Signed-off-by: Bryan Boreham <[email protected]>
/prombench main
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: After the successful deployment (check status here), the benchmarking results can be viewed at: Available Commands:
BTW @bboreham I keep changing our prombench dashboard manually to use longer averages so I can really tell the difference, because mem & CPU spikes are hard to compare (they happen at different times for each version). Should we add a set of avg panels? (The spikes are still useful, although max_over_time could tell us the same 🤔) WDYT?
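To illustrate the point about spikes landing at different times, here is a minimal Go sketch with made-up sample values (not prombench data). It assumes two runs with the same baseline and the same spike height, offset in time: a point-by-point diff swings in both directions, while the window mean and window max agree. The helper names `mean` and `maxOf` are hypothetical.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs (0 for an empty slice).
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// maxOf returns the largest value in xs (0 for an empty slice).
func maxOf(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	m := xs[0]
	for _, x := range xs[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

func main() {
	// Made-up samples: same baseline and spike height, spikes at different times.
	a := []float64{10, 10, 30, 10, 10, 10}
	b := []float64{10, 10, 10, 10, 30, 10}

	// A point-by-point comparison depends entirely on spike timing.
	for i := range a {
		fmt.Printf("t=%d a-b=%+.0f\n", i, a[i]-b[i])
	}

	// Aggregating over the whole window removes the timing dependence.
	fmt.Printf("mean: a=%.2f b=%.2f\n", mean(a), mean(b))
	fmt.Printf("max:  a=%.0f b=%.0f\n", maxOf(a), maxOf(b))
}
```

The mean reflects sustained cost and the max reflects the worst spike; neither depends on when the spike occurred within the window.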
RSS is pretty much meaningless; the Go heap is where I would expect to see any difference. But the expected difference is 3MB which is not visible on a 24GB scale.
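As a rough check of the scale argument, the arithmetic can be sketched as follows. This assumes binary units (MiB and GiB) for the 3MB and 24GB figures quoted above; the helper `percentOf` is hypothetical.

```go
package main

import "fmt"

// percentOf returns part as a percentage of whole.
func percentOf(part, whole float64) float64 {
	return part / whole * 100
}

func main() {
	const mb = 1 << 20 // assumption: binary megabyte
	const gb = 1 << 30 // assumption: binary gigabyte

	// Expected saving relative to the scale of the memory graph.
	fmt.Printf("%.3f%%\n", percentOf(3*mb, 24*gb))
}
```

Under these assumptions the expected saving is about 0.012% of the graph's scale, well below what a dashboard panel can resolve.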
Thanks @prymitive, the increased mutex time is interesting; it seems to hit particularly during Head GC. The 5-second scrape interval lets me pin down the timing.
I guess this proves that 256 is too low, and/or we should look at how Head GC takes locks.
@prymitive I also notice your picture shows PR #17089. There is a bug in the way URLs are generated so it defaults to the lowest one active, sorry.
Signed-off-by: Bryan Boreham <[email protected]>
/prombench cancel
Benchmark cancel is in progress.
/prombench main
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: After the successful deployment (check status here), the benchmarking results can be viewed at: Available Commands:
/prombench cancel
Benchmark cancel is in progress.
I guess this isn't a slam-dunk.
I'm running a prombench run of main against main (#17110 (comment)) to see how much variance in go_sync_mutex_wait_total_seconds_total is acceptable. (The next step after #15339 is to have it under a






Expected to reduce resource usage without noticeable extra contention
Which issue(s) does the PR fix:
Fixes #17100
Does this PR introduce a user-facing change?
[Draft because I expect we will want to add a CLI flag to let people set it higher if they perceive contention]
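For readers unfamiliar with the trade-off, here is a minimal Go sketch of lock striping with a configurable stripe count. It is an illustration under stated assumptions, not the Prometheus TSDB implementation: the type `stripedLocks` and its methods are hypothetical, and the stripe count is assumed to be a power of two so the index can be taken with a bit mask. Fewer stripes mean fewer mutexes to allocate, but more keys share each mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// stripedLocks (hypothetical) guards a keyspace with a fixed number of mutexes.
type stripedLocks struct {
	mask  uint64
	locks []sync.RWMutex
}

// newStripedLocks rejects sizes that are not a power of two, because the
// stripe index is computed with a bit mask.
func newStripedLocks(size int) (*stripedLocks, error) {
	if size <= 0 || size&(size-1) != 0 {
		return nil, fmt.Errorf("stripe size %d is not a power of two", size)
	}
	return &stripedLocks{
		mask:  uint64(size - 1),
		locks: make([]sync.RWMutex, size),
	}, nil
}

// index maps a hash to the stripe that guards it.
func (s *stripedLocks) index(hash uint64) int {
	return int(hash & s.mask)
}

// withLock runs fn while holding the write lock of the stripe for hash.
func (s *stripedLocks) withLock(hash uint64, fn func()) {
	l := &s.locks[s.index(hash)]
	l.Lock()
	defer l.Unlock()
	fn()
}

func main() {
	// Two arbitrary example hashes that differ by 1024.
	const h1, h2 = uint64(5), uint64(5 + 1024)

	for _, size := range []int{256, 1024, 16384} {
		s, err := newStripedLocks(size)
		if err != nil {
			panic(err)
		}
		// The same stripe means the two keys contend for one mutex.
		fmt.Printf("size=%d: stripe(h1)=%d stripe(h2)=%d shared=%v\n",
			size, s.index(h1), s.index(h2), s.index(h1) == s.index(h2))
	}
}
```

In this sketch the two example hashes share a stripe at sizes 256 and 1024 but not at 16384. A command-line flag could feed the `size` argument; any such flag name would be an assumption, since the PR has not defined one.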