Skip to content

refactor(appenderV2)[PART1]: add AppenderV2 interface; add TSDB AppenderV2 implementation#17629

Merged
bwplotka merged 6 commits intomainfrom
bwplotka/a2-tsdb
Dec 9, 2025
Merged

refactor(appenderV2)[PART1]: add AppenderV2 interface; add TSDB AppenderV2 implementation#17629
bwplotka merged 6 commits intomainfrom
bwplotka/a2-tsdb

Conversation

@bwplotka
Copy link
Copy Markdown
Member

@bwplotka bwplotka commented Dec 1, 2025

This is a first PR towards moving to a new, unified, cleaner, more flexible AppenderV2 interface, for simpler review.

See

NOTE: This PR does NOT change any production flow (no one is using the new AppenderV2 logic that TSDB now implements). As a result there's no point to run prombench on this PR.

Changes

  • Added interface
  • Added TSDB AppenderV2 implementation
  • (majority of LOCs) Ported all tests that explicitly use Appender.
    • I tried removed bunch of tests totally unrelated to Appender implementation, but it's hard to judge sometimes. If I wasn't sure I was keeping those. Eventually we will need to port all of them once we remove Appender.
    • There are bunch of tests that use Appender inside various helps (e.g. createBlock). I decided to not port those as those likely don't test appending itself. We will port once we will switch prod to AppenderV2.

I suggest review by commit, especially if you are curious how Appender vs AppenderV2 implementation (and tests) are different.

image

On top of that I removed BenchmarkHead_WalCommit instead of adding V2 flow to it. This benchmark is broken on main, was odd. It feels we should write one from scratch if we need it:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/tsdb
cpu: Apple M1 Pro
BenchmarkHead_WalCommit
BenchmarkHead_WalCommit/100_series
BenchmarkHead_WalCommit/100_series/1_commits
    benchmark.go:412: B.Loop called with timer stopped

Benchmarks

For 3 exemplar case (histogram case), the new interface is clearly faster (one append vs 4)

benchstat -col /appender append.txt
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb
cpu: AMD EPYC 7B12
                                                                                             │     v1      │                 v2                 │
                                                                                             │   sec/op    │   sec/op     vs base               │
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=1-2                         14.67µ ± 4%   14.00µ ± 3%   -4.55% (p=0.015 n=6)
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=5-2                         28.65µ ± 1%   27.79µ ± 2%   -3.01% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=100-2                       352.4µ ± 1%   343.4µ ± 1%   -2.55% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=1-2                        65.31µ ± 3%   64.57µ ± 6%        ~ (p=0.699 n=6)
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=5-2                        207.6µ ± 2%   203.4µ ± 4%        ~ (p=0.240 n=6)
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=100-2                      3.126m ± 4%   3.051m ± 1%   -2.40% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=1-2      53.09µ ± 1%   49.43µ ± 2%   -6.89% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=5-2      159.4µ ± 2%   133.8µ ± 4%  -16.06% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=100-2    2.432m ± 1%   1.916m ± 3%  -21.25% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=1-2     343.2µ ± 1%   293.1µ ± 2%  -14.58% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=5-2     1.443m ± 2%   1.201m ± 1%  -16.80% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=100-2   22.90m ± 2%   17.94m ± 2%  -21.63% (p=0.002 n=6)
geomean                                                                                        326.5µ        294.7µ        -9.74%

                                                                                             │      v1      │                  v2                  │
                                                                                             │     B/op     │     B/op      vs base                │
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=1-2                           675.0 ± 0%     675.0 ± 0%       ~ (p=1.000 n=6) ¹
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=5-2                           924.0 ± 0%     925.0 ± 0%       ~ (p=0.405 n=6)
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=100-2                       5.644Ki ± 0%   5.647Ki ± 0%       ~ (p=0.710 n=6)
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=1-2                        2.931Ki ± 0%   2.930Ki ± 0%       ~ (p=0.892 n=6)
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=5-2                        4.430Ki ± 0%   4.458Ki ± 1%  +0.62% (p=0.041 n=6)
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=100-2                      52.21Ki ± 2%   51.91Ki ± 1%       ~ (p=0.310 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=1-2      4.553Ki ± 0%   4.679Ki ± 0%  +2.76% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=5-2      19.41Ki ± 0%   19.46Ki ± 0%  +0.24% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=100-2    376.3Ki ± 1%   374.3Ki ± 1%  -0.54% (p=0.041 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=1-2     40.79Ki ± 0%   40.97Ki ± 0%  +0.44% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=5-2     191.1Ki ± 0%   189.9Ki ± 0%  -0.61% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=100-2   3.899Mi ± 8%   3.862Mi ± 6%  -0.96% (p=0.041 n=6)
geomean                                                                                        19.10Ki        19.12Ki       +0.12%
¹ all samples are equal

                                                                                             │     v1      │                 v2                  │
                                                                                             │  allocs/op  │  allocs/op   vs base                │
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=1-2                          8.000 ± 0%    8.000 ± 0%       ~ (p=1.000 n=6) ¹
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=5-2                          9.000 ± 0%    9.000 ± 0%       ~ (p=1.000 n=6) ¹
HeadAppender_AppendCommit/case=floats/series=10/samples_per_append=100-2                        57.00 ± 0%    57.00 ± 0%       ~ (p=1.000 n=6) ¹
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=1-2                         44.00 ± 0%    44.00 ± 0%       ~ (p=1.000 n=6) ¹
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=5-2                         51.00 ± 0%    51.00 ± 0%       ~ (p=1.000 n=6) ¹
HeadAppender_AppendCommit/case=floats/series=100/samples_per_append=100-2                       513.0 ± 0%    513.0 ± 0%       ~ (p=1.000 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=1-2       86.00 ± 0%    87.00 ± 0%  +1.16% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=5-2       407.0 ± 0%    408.0 ± 0%  +0.25% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=10/samples_per_append=100-2    8.047k ± 0%   8.025k ± 0%  -0.27% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=1-2      816.0 ± 0%    817.0 ± 0%  +0.12% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=5-2     4.035k ± 0%   4.031k ± 0%  -0.10% (p=0.002 n=6)
HeadAppender_AppendCommit/case=floatsHistogramsExemplars/series=100/samples_per_append=100-2   80.27k ± 0%   80.05k ± 0%  -0.28% (p=0.002 n=6)
geomean                                                                                         289.9         290.1       +0.07%
¹ all samples are equal

Does this PR introduce a user-facing change?

NONE

Copy link
Copy Markdown
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks really good, will try this out! 🚀

Some small nits

@bwplotka
Copy link
Copy Markdown
Member Author

bwplotka commented Dec 1, 2025

Ran quick benchmark.

We expect:

  • Similar CPU/latency on Appending floats
  • Similar if not a bit better latency on Appending sample + exemplar (one big map/lock lookup less)
  • Similar memory

Currently we fail on memory, investigating why. I know one benchmark-side thing to fix (slice of exemplars), profiling.

EDIT: It was benchmark issue, not code. Updated PR description with the latest numbers (we are now faster).

@bwplotka bwplotka force-pushed the bwplotka/a2-tsdb branch 2 times, most recently from 3e22da5 to a1d088e Compare December 1, 2025 18:20
@krajorama krajorama self-assigned this Dec 2, 2025
Copy link
Copy Markdown
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, I've added a couple of comments.

Test coverage is pretty good, I think we need an extra test in head_append_v2_test.go for the metadata in WAL case.

Also maybe have db_append_v2_test.go ? We'll have to convert those eventually anyway.

bwplotka added a commit that referenced this pull request Dec 2, 2025
For tests only, we had various ways of opening DB. Reduced to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This so #17629 is smaller
and bit easier. Also for test maintainability and consistency.

Signed-off-by: bwplotka <[email protected]>
bwplotka added a commit that referenced this pull request Dec 2, 2025
For tests only, we had various ways of opening DB. Reduced to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This so #17629 is smaller
and bit easier. Also for test maintainability and consistency.

Signed-off-by: bwplotka <[email protected]>
bwplotka added a commit that referenced this pull request Dec 2, 2025
For tests only, we had various ways of opening DB. Reduced to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This so #17629 is smaller
and bit easier. Also for test maintainability and consistency.

Signed-off-by: bwplotka <[email protected]>
bwplotka added a commit that referenced this pull request Dec 2, 2025
For tests only, we had various ways of opening DB. Reduced to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This so #17629 is smaller
and bit easier. Also for test maintainability and consistency.

Signed-off-by: bwplotka <[email protected]>
bwplotka added a commit that referenced this pull request Dec 3, 2025
For tests only, we had various ways of opening DB. Reduced to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This so #17629 is smaller
and bit easier. Also for test maintainability and consistency.

Signed-off-by: bwplotka <[email protected]>
@bwplotka bwplotka force-pushed the bwplotka/a2-tsdb branch 4 times, most recently from 96308f4 to 82b783f Compare December 3, 2025 11:00
@bwplotka
Copy link
Copy Markdown
Member Author

bwplotka commented Dec 3, 2025

Addressed all, PTAL! @krajorama

dimitarvdimitrov pushed a commit to grafana/mimir-prometheus that referenced this pull request Dec 8, 2025
* promql: fix histogram_fraction issue when lower falls within the first bucket (#17424)

Signed-off-by: Mohammad Alavi <[email protected]>

* prepare release 3.8.0-rc.0

Signed-off-by: Jan Fajerski <[email protected]>

* test: skip TestRemoteWrite_ReshardingWithoutDeadlock temporarily as flaky (#17534) (#17543)

(cherry picked from commit 35c3232)

Signed-off-by: machine424 <[email protected]>
Signed-off-by: Jan Fajerski <[email protected]>
Co-authored-by: Ayoub Mrini <[email protected]>

* chore(deps): bump prometheus/promci from 0.4.7 to 0.5.0

Signed-off-by: Jan Fajerski <[email protected]>

* chore(deps): bump prometheus/promci from 0.5.0 to 0.5.1

Signed-off-by: Jan Fajerski <[email protected]>

* chore(deps): bump prometheus/promci from 0.5.1 to 0.5.2

Signed-off-by: Jan Fajerski <[email protected]>

* chore(deps): bump prometheus/promci from 0.5.2 to 0.5.3

Signed-off-by: Jan Fajerski <[email protected]>

* prw2: Move Remote Write 2.0 CT to be per Sample; Rename to ST (start timestamp) (#17411)

Relates to
prometheus/prometheus#16944 (comment)

Signed-off-by: bwplotka <[email protected]>
(cherry picked from commit cefefc6)

* chore: prepare 3.8.0-rc.1 entry

Signed-off-by: bwplotka <[email protected]>

* [chore]: bump common dep to support RFC7523 3.1

Signed-off-by: Jorge Turrado <[email protected]>

* Update Prometheus Agent doc (#17591)

* Add a nav title to fix docs website generator.
* Make it more clear that "Prometheus Agent" is a mode, not a seaparate
  service.
* Add to index.
* Cleanup some wording.
* Add a downsides section.

Signed-off-by: SuperQ <[email protected]>
(cherry picked from commit d0d2699)

* chore(deps): bump github.com/prometheus/common from 0.67.3 to 0.67.4 (#17594)

Signed-off-by: Jan Fajerski <[email protected]>

* prepare release v3.8.0-rc.1

Signed-off-by: Jan Fajerski <[email protected]>

* prepare release v3.8.0

Signed-off-by: Jan Fajerski <[email protected]>

* chore: Fix function name typo in createBatchSpan comment

Signed-off-by: zjumathcode <[email protected]>

* feat: Add flag that blocks lvl 1 compactions until upload is confirmed in an external JSON file (#17435)

* Delay compactions until Thanos uploads all blocks

Using Thanos sidecar with Prometheus requires us to disable TSDB compactions on Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks but that is aimed at one time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data delays compaction of blocks until they are marked as uploaded by Thanos. Thanks to this both services can coordinate with each other (in a way) and we can stop disabling compaction on Prometheus side when Thanos uploads are enabled.

The reason to have this is that disabling compactions have very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default) large chunks of block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that only exists in some blocks, but not all). Every range query that iterates over all 10 blocks would then have to read each index and so we're doing 10x more work then if we had a single compacted block.

Signed-off-by: Lukasz Mierzwa <[email protected]>

* Rename structs and functions to make this more generic

Signed-off-by: Lukasz Mierzwa <[email protected]>

* Address review comments

Signed-off-by: Lukasz Mierzwa <[email protected]>

* Cache UploadMeta for 1 minute

Signed-off-by: Lukasz Mierzwa <[email protected]>

---------

Signed-off-by: Lukasz Mierzwa <[email protected]>

* RW2: Allow custom scope in azuread (#17483)

Signed-off-by: Ben Edmunds <[email protected]>

* docs: Describe how time() is set to start at 0 in unit tests

The return value of functions relating to the current time, e.g. time(),
is set by promtool to start at timestamp 0 at the start of a test's
evaluation.

This has the very nice consequence that tests can run reliably without
depending on when they are run.

It does, however, mean that tests will give out results that can be
unexpected by users.

If this behaviour is documented, then users will be empowered to write
tests for their rules that use time-dependent functions.

(Closes: prometheus/docs#1464)

Signed-off-by: Gabriel Filion <[email protected]>

* refactor(tsdb): use one test newTestDB constructor (#17638)

For tests only, we had various ways of opening DB. Reduced to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This so prometheus/prometheus#17629 is smaller
and bit easier. Also for test maintainability and consistency.

Signed-off-by: bwplotka <[email protected]>

* Add start_timestamp field for unit tests.

This commit adds support for configuring a custom start timestamp
for Prometheus unit tests, allowing tests to use realistic timestamps
instead of starting at Unix epoch 0.

Signed-off-by: Julien Pivotto <[email protected]>

* Fix serialization for empty `ignoring()` in combination with `group_x()`

Currently both the backend and frontend printers/formatters/serializers
incorrectly transform the following expression:

```
up * ignoring() group_left(__name__) node_boot_time_seconds
```

...into:

```
up * node_boot_time_seconds
```

...which yields a different result (including the metric name in the result
vs. no metric name).

We need to keep empty `ignoring()` modifiers if there is a grouping modifier
present.

Signed-off-by: Julius Volz <[email protected]>

* Simplify StartTime assignment in unit test setup.

Remove redundant IsZero check since promqltest.LazyLoader already
handles zero StartTime by defaulting to Unix epoch.

Signed-off-by: Julien Pivotto <[email protected]>

* Update golangci-lint and add modernize check (#17640)

* add modernize check

Signed-off-by: dongjiang1989 <[email protected]>

* fix golangci lint

Signed-off-by: dongjiang1989 <[email protected]>

---------

Signed-off-by: dongjiang1989 <[email protected]>

* fix lint

---------

Signed-off-by: Mohammad Alavi <[email protected]>
Signed-off-by: Jan Fajerski <[email protected]>
Signed-off-by: machine424 <[email protected]>
Signed-off-by: bwplotka <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: SuperQ <[email protected]>
Signed-off-by: zjumathcode <[email protected]>
Signed-off-by: Lukasz Mierzwa <[email protected]>
Signed-off-by: Ben Edmunds <[email protected]>
Signed-off-by: Gabriel Filion <[email protected]>
Signed-off-by: Julien Pivotto <[email protected]>
Signed-off-by: Julius Volz <[email protected]>
Signed-off-by: dongjiang1989 <[email protected]>
Co-authored-by: Mohammad Alavi <[email protected]>
Co-authored-by: Jan Fajerski <[email protected]>
Co-authored-by: Jan Fajerski <[email protected]>
Co-authored-by: Ayoub Mrini <[email protected]>
Co-authored-by: Bartlomiej Plotka <[email protected]>
Co-authored-by: Jorge Turrado <[email protected]>
Co-authored-by: Ben Kochie <[email protected]>
Co-authored-by: zjumathcode <[email protected]>
Co-authored-by: Łukasz Mierzwa <[email protected]>
Co-authored-by: Ben Edmunds <[email protected]>
Co-authored-by: Julien <[email protected]>
Co-authored-by: Gabriel Filion <[email protected]>
Co-authored-by: Julius Volz <[email protected]>
Co-authored-by: dongjiang <[email protected]>
Co-authored-by: Jeanette Tan <[email protected]>
Copy link
Copy Markdown
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epic work! I've checked the test files side by side , but not every line, just whether the new test uses the new appender or not.

I'm fine with not so many tests on metadata, but I feel like the testing on the zero sample injection is maybe too light? Not sure, it's a lot tests.

@bwplotka bwplotka force-pushed the bwplotka/a2-tsdb branch 4 times, most recently from 6e2ef5f to 5d05dcf Compare December 9, 2025 10:35
Copy link
Copy Markdown
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, usual caveat from me: I'm not TSDB maintainer/generic maintainer.

@bwplotka
Copy link
Copy Markdown
Member Author

bwplotka commented Dec 9, 2025

@krajorama adding you #17663

@jesusvazquez @codesome @bboreham any objections to merge this/on this AppenderV2 shape?

Copy link
Copy Markdown
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM a good chunk of work went on here. It's true that in the past 3 years between OOO and Native Histograms the appender code kind of grew uncontrolled so I appreciate you taking the time to improve it.

@bwplotka
Copy link
Copy Markdown
Member Author

bwplotka commented Dec 9, 2025

CI is saying

image

But I looked 4 times and couldn't find comment that is unresolved... merging (:

@bwplotka bwplotka merged commit be419d8 into main Dec 9, 2025
46 checks passed
@bwplotka bwplotka deleted the bwplotka/a2-tsdb branch December 9, 2025 11:41
@bwplotka
Copy link
Copy Markdown
Member Author

bwplotka commented Dec 9, 2025

Thanks everyone for review 💪🏽


if isStale {
// For stale values we never attempt to process metadata/exemplars, claim the success.
return ref, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwplotka While vendoring this change into Mimir, Cursor pointed out that this might not be quite right:

When appending a stale sample, the function returns the input ref parameter instead of storage.SeriesRef(s.ref). If the caller passed ref=0 (unknown reference) and a series was created or looked up via getOrCreate, the actual series reference in s.ref would be lost. The normal return path at line 231 correctly returns storage.SeriesRef(s.ref), making this early return inconsistent and potentially breaking series reference caching for callers.

I also notice that on line 170 and 174 ref is used rather than s.ref, which seems like the same issue? (I'm no expert on this logic though.)

@tahirghaffar1380-source
Copy link
Copy Markdown

a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants