[REFACTORY] simplify appender commit#15112
Conversation
040c12d to
37cb166
Compare
| }() | ||
|
|
||
| collectOOORecords() | ||
| a.commitSamples(acc) |
There was a problem hiding this comment.
I wasn't sure if we gain any benefit on run these commit functions in parallel, I tested dispatching one go routine for each one, tests were working as expected, but not sure.
I'm open for feedback.
There was a problem hiding this comment.
We can play with this. idea in a separate PR and with a benchmark.
In theory these should not block on each other so I can see the argument for running them in parallel.
There was a problem hiding this comment.
I think a problem with running these in parallel is that you are not deterministic anymore with the behavior when a series receives a float and histogram sample as well. This is rare but can happen.
In fact I'd rather merge the 3 loops into one and actually write the samples (floats and histograms) in the order they were appended and not always in the floats then histograms order. See #15142
| // The function also manages the cleanup of append IDs and the pending commit state | ||
| // for each series. | ||
| func (a *headAppender) commitFloatHistograms(acc *appenderCommitContext) { | ||
| var ok, chunkCreated bool |
There was a problem hiding this comment.
I was planning to declare this variables on the Commit function, but CI lint is failing because the value will be overwritten before use it.
I think would be valuable, especially because we're declaring this var ok and chuckCreated on every commit function
There was a problem hiding this comment.
It's hard to say if there's an impact on performance here without a benchmark. Do you wanna try running it and checking the results?
In head_test.go we already have a benchmark called BenchmarkHeadAppender_Append_Commit_ExistingSeries but we want to benchmark only the the Commit() function here. So I guess it makes sense that you create a new benchmark?
There was a problem hiding this comment.
Yes, I was trying to look on how to benchmark only this function, but I didn't get there yet.
If you have any insights, I'm more than happy on hear them 🥳
There was a problem hiding this comment.
It's exactly the same we have in BenchmarkHeadAppender_Append_Commit_ExistingSeries, just move the append part out of the benchmark.
37cb166 to
9148a68
Compare
| acc := &appenderCommitContext{ | ||
| floatsAppended: len(a.samples), | ||
| histogramsAppended: len(a.histograms) + len(a.floatHistograms), | ||
| inOrderMint: math.MaxInt64, | ||
| inOrderMaxt: math.MinInt64, | ||
| oooMinT: math.MaxInt64, | ||
| oooMaxT: math.MinInt64, | ||
| oooCapMax: a.head.opts.OutOfOrderCapMax.Load(), | ||
| appendChunkOpts: chunkOpts{ | ||
| chunkDiskMapper: a.head.chunkDiskMapper, | ||
| chunkRange: a.head.chunkRange.Load(), | ||
| samplesPerChunk: a.head.opts.SamplesPerChunk, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Nice, I like the idea of creating an object for this. Maybe we could have a NewCommitContext function to decrease lines a little bit more?
| if a.head.writeNotified != nil { | ||
| a.head.writeNotified.Notify() | ||
| } | ||
| type appenderCommitContext struct { |
There was a problem hiding this comment.
Hmmm, I'm not certain about the naming here but I'm also struggling to come up with something better 😅
There was a problem hiding this comment.
maybe commitStatistics? does that make sense?
There was a problem hiding this comment.
We could even split this object in two or three:
- Holds information that will be passed to metrics, e.g.
floatsAppended,histogramsAppended, etc - Things related to OOO records, like oooMin/MaxT, wblSample/Histograms, records, mmapMarkers, etc
There was a problem hiding this comment.
Yes, I feel you, this name is not the best one, I was struggling to get something better as well.
I'm not sure also about commitStatistics because seems to me this object is holding much more information than the ones will be used to increment the metrics.
About split into sub structs, I don't see much value beyond making life easier to naming the root object, did you see anything else beyond that?
There was a problem hiding this comment.
Maybe commitMetrics?
NAMING! (insert old man yelling gif)
| // The function also manages the cleanup of append IDs and the pending commit state | ||
| // for each series. | ||
| func (a *headAppender) commitFloatHistograms(acc *appenderCommitContext) { | ||
| var ok, chunkCreated bool |
There was a problem hiding this comment.
It's hard to say if there's an impact on performance here without a benchmark. Do you wanna try running it and checking the results?
In head_test.go we already have a benchmark called BenchmarkHeadAppender_Append_Commit_ExistingSeries but we want to benchmark only the the Commit() function here. So I guess it makes sense that you create a new benchmark?
jesusvazquez
left a comment
There was a problem hiding this comment.
Thanks for addressing my feedback on the locks management. I think the method is much more readable now.
Looking forward to the experimenting on seeing if we can commit sample types in parallel.
I wont merge yet because I'd like Arthur's approval and possibly Krajo's review!
| if a.head.writeNotified != nil { | ||
| a.head.writeNotified.Notify() | ||
| } | ||
| type appenderCommitContext struct { |
There was a problem hiding this comment.
Maybe commitMetrics?
NAMING! (insert old man yelling gif)
| }() | ||
|
|
||
| collectOOORecords() | ||
| a.commitSamples(acc) |
There was a problem hiding this comment.
We can play with this. idea in a separate PR and with a benchmark.
In theory these should not block on each other so I can see the argument for running them in parallel.
Signed-off-by: Nicolas Takashi <[email protected]>
81f5a64 to
870ed74
Compare
krajorama
left a comment
There was a problem hiding this comment.
Awesome stuff! I agree with Arthur that we need a Benchmark test to verify. I've added some minor comments . Thanks!
| }() | ||
|
|
||
| collectOOORecords() | ||
| a.commitSamples(acc) |
There was a problem hiding this comment.
I think a problem with running these in parallel is that you are not deterministic anymore with the behavior when a series receives a float and histogram sample as well. This is rare but can happen.
In fact I'd rather merge the 3 loops into one and actually write the samples (floats and histograms) in the order they were appended and not always in the floats then histograms order. See #15142
Co-authored-by: George Krajcsovits <[email protected]> Signed-off-by: Nicolas Takashi <[email protected]>
Co-authored-by: George Krajcsovits <[email protected]> Signed-off-by: Nicolas Takashi <[email protected]>
Co-authored-by: George Krajcsovits <[email protected]> Signed-off-by: Nicolas Takashi <[email protected]>
26065db to
873db29
Compare
|
Hey @jesusvazquez @krajorama @ArthurSens, I just pushed the benchmark tests and would really appreciate your feedback. I’m not entirely confident that I’ve set everything up correctly 😅. Here’s how I ran the benchmark: go test -benchmem -run='^$' -bench='^BenchmarkHead_WalCommitWithVaryingSeriesSamplesExemplarsAndHistograms$' github.com/prometheus/prometheus
/tsdbBelow are the results: Main Branch: PR: 15112 Let me know if you think any adjustments are needed, or if you would like me to rerun the tests with different parameters before we trigger prombench. I’m happy to tweak things as needed! Looking forward to your feedback! |
b333719 to
1258711
Compare
Co-authored-by: George Krajcsovits <[email protected]> Signed-off-by: Nicolas Takashi <[email protected]>
tsdb/head_bench_test.go
Outdated
|
|
||
| for _, seriesCount := range seriesCounts { | ||
| b.Run(fmt.Sprintf("%d series", seriesCount), func(b *testing.B) { | ||
| for _, samplesPerAppend := range []int64{1, 2, 5, 100} { |
There was a problem hiding this comment.
This is only ever 1 when using Prometheus code.
There was a problem hiding this comment.
can you elaborate a little bit more @bboreham ?
I'm not familiar with this part of the code 🙏🏽
| } | ||
| } | ||
|
|
||
| func BenchmarkHead_WalCommit(b *testing.B) { |
There was a problem hiding this comment.
I've been thinking about this test and there's a lot of extra stuff it does beside measuring Commit(). I think we should utilize b.StopTimer(), b.StartTimer() to exclude the appender creation, sample append and measure just Commit(). That way you don't actually need to keep track of ts, because you can just create a new Appender and in fact new TSDB head.
There was a problem hiding this comment.
Something like
func BenchmarkHead_WalCommit(b *testing.B) {
seriesCounts := []int{100, 1000, 10000}
series := genSeries(10000, 10, 0, 0) // Only using the generated labels.
appendSamples := func(b *testing.B, app storage.Appender, seriesCount int, ts int64) {
var err error
for i, s := range series[:seriesCount] {
var ref storage.SeriesRef
if i%2 == 0 {
ref, err = app.Append(ref, s.Labels(), ts, float64(ts))
} else {
h := &histogram.Histogram{
Count: 7 + uint64(ts*5),
ZeroCount: 2 + uint64(ts),
ZeroThreshold: 0.001,
Sum: 18.4 * rand.Float64(),
Schema: 1,
PositiveSpans: []histogram.Span{
{Offset: 0, Length: 2},
{Offset: 1, Length: 2},
},
PositiveBuckets: []int64{ts + 1, 1, -1, 0},
}
ref, err = app.AppendHistogram(ref, s.Labels(), ts, h, nil)
}
require.NoError(b, err)
_, err = app.AppendExemplar(ref, s.Labels(), exemplar.Exemplar{
Labels: labels.FromStrings("trace_id", strconv.Itoa(rand.Int())),
Value: rand.Float64(),
Ts: ts,
})
require.NoError(b, err)
}
}
for _, seriesCount := range seriesCounts {
b.Run(fmt.Sprintf("%d series", seriesCount), func(b *testing.B) {
for _, commits := range []int64{1, 2} {
b.Run(fmt.Sprintf("%d commits", commits), func(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
b.StopTimer()
h, w := newTestHead(b, 10000, wlog.CompressionNone, false)
b.Cleanup(func() {
if h != nil {
h.Close()
}
if w != nil {
w.Close()
}
})
app := h.Appender(context.Background())
appendSamples(b, app, seriesCount, 0)
b.StartTimer()
require.NoError(b, app.Commit())
if commits == 2 {
b.StopTimer()
app = h.Appender(context.Background())
appendSamples(b, app, seriesCount, 1)
b.StartTimer()
require.NoError(b, app.Commit())
}
b.StopTimer()
h.Close()
h = nil
w.Close()
w = nil
}
})
}
})
}
}
Which gives me:
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb
cpu: AMD Ryzen 7 4700U with Radeon Graphics
│ main.txt │ pr.txt │
│ sec/op │ sec/op vs base │
Head_WalCommit/100_series/1_commits-8 464.3µ ± 4% 399.0µ ± 14% -14.07% (p=0.002 n=6)
Head_WalCommit/100_series/2_commits-8 835.5µ ± 0% 837.0µ ± 1% ~ (p=0.699 n=6)
Head_WalCommit/1000_series/1_commits-8 2.786m ± 2% 2.774m ± 5% ~ (p=0.699 n=6)
Head_WalCommit/1000_series/2_commits-8 3.740m ± 2% 3.718m ± 3% ~ (p=0.589 n=6)
Head_WalCommit/10000_series/1_commits-8 24.90m ± 4% 24.92m ± 5% ~ (p=0.699 n=6)
Head_WalCommit/10000_series/2_commits-8 34.50m ± ∞ ¹ 34.18m ± 4% ~ (p=1.000 n=5+6)
geomean 3.891m 3.784m -2.77%
¹ need >= 6 samples for confidence interval at level 0.95
│ main.txt │ pr.txt │
│ B/op │ B/op vs base │
Head_WalCommit/100_series/1_commits-8 221.3Ki ± 0% 221.3Ki ± 0% ~ (p=0.853 n=6)
Head_WalCommit/100_series/2_commits-8 226.8Ki ± 0% 226.8Ki ± 0% ~ (p=1.000 n=6)
Head_WalCommit/1000_series/1_commits-8 2.101Mi ± 0% 2.101Mi ± 0% ~ (p=0.394 n=6)
Head_WalCommit/1000_series/2_commits-8 2.152Mi ± 0% 2.152Mi ± 0% ~ (p=1.000 n=6)
Head_WalCommit/10000_series/1_commits-8 22.75Mi ± 0% 22.75Mi ± 0% ~ (p=0.937 n=6)
Head_WalCommit/10000_series/2_commits-8 23.63Mi ± ∞ ¹ 23.60Mi ± 1% ~ (p=0.662 n=5+6)
geomean 2.209Mi 2.209Mi -0.02%
¹ need >= 6 samples for confidence interval at level 0.95
│ main.txt │ pr.txt │
│ allocs/op │ allocs/op vs base │
Head_WalCommit/100_series/1_commits-8 1.055k ± 0% 1.055k ± 0% ~ (p=1.000 n=6) ¹
Head_WalCommit/100_series/2_commits-8 1.064k ± 0% 1.064k ± 0% ~ (p=1.000 n=6) ¹
Head_WalCommit/1000_series/1_commits-8 9.637k ± 0% 9.637k ± 0% ~ (p=1.000 n=6) ¹
Head_WalCommit/1000_series/2_commits-8 9.655k ± 0% 9.655k ± 0% ~ (p=1.000 n=6)
Head_WalCommit/10000_series/1_commits-8 95.30k ± 0% 95.30k ± 0% ~ (p=0.455 n=6)
Head_WalCommit/10000_series/2_commits-8 95.34k ± ∞ ² 95.34k ± 0% ~ (p=0.909 n=5+6)
geomean 9.913k 9.913k -0.00%
¹ all samples are equal
² need >= 6 samples for confidence interval at level 0.95
which is pretty good.
There was a problem hiding this comment.
😮 Lots of learning, thanks @krajorama I just pushed your suggestions.
Can you take another look?
Signed-off-by: Nicolas Takashi <[email protected]>
krajorama
left a comment
There was a problem hiding this comment.
I think this is pretty much done, I have one request to move series out of the struct and make it a simple local variable. @ArthurSens @jesusvazquez plz take another look.
Co-authored-by: George Krajcsovits <[email protected]> Signed-off-by: Nicolas Takashi <[email protected]>
Signed-off-by: Nicolas Takashi <[email protected]>
krajorama
left a comment
There was a problem hiding this comment.
LGTM, would like a second opinion
ArthurSens
left a comment
There was a problem hiding this comment.
LGTM
I was in doubt when reading the tests so maybe we can make it clearer with some extra comments, but they are nits :)
| if i%2 == 0 { | ||
| ref, err = app.Append(ref, s.Labels(), ts, float64(ts)) |
There was a problem hiding this comment.
Could we add a comment explaining why we do %2 == 0? I'm having a hard time understanding it myself 🥲
There was a problem hiding this comment.
I'll push a comment!
But it's basically if i is even then we append a sample otherwise append histograms.
There was a problem hiding this comment.
That part I understood, what I couldn't understand is why we do that 🤔. Couldn't we append both of them always?
There was a problem hiding this comment.
I think its to divide the samples between samples and histograms, but I might be wrong.
@krajorama probably will provide a better vision
There was a problem hiding this comment.
I suggested that to easily add float and histogram samples as well to the mix.
tsdb/head_bench_test.go
Outdated
|
|
||
| for _, seriesCount := range seriesCounts { | ||
| b.Run(fmt.Sprintf("%d series", seriesCount), func(b *testing.B) { | ||
| for _, commits := range []int64{1, 2} { |
There was a problem hiding this comment.
Similarly here, I don't understand why we're doing 1 or 2 commits 🤔
There was a problem hiding this comment.
If I'm not wrong this is acting as a warm up, can you help @krajorama?
There was a problem hiding this comment.
I suggested that to measure the case where series are created in the head vs the case where the series already exists.
0dbd961
|
@nicolastakashi I think you pushed some extra files by accident 😅 |
Signed-off-by: Nicolas Takashi <[email protected]>
0dbd961 to
4fdee0b
Compare
Signed-off-by: Arthur Silva Sens <[email protected]>
Whoops sorry, I just added an extra comment. I'll merge as soon as CI is green again :) |
krajorama
left a comment
There was a problem hiding this comment.
LGTM @ArthurSens please use squash commit , it's a lot of small commits.
Refactoring Head Appender Commit by splitting the Commit function into smaller functions and trying to reuse common code blocks between different samples type, it relates to #11329.