
storage/remote: compute highestTimestamp and dataIn at QueueManager level#17065

Merged
machine424 merged 2 commits into prometheus:main from machine424:queuelevel
Sep 1, 2025

Conversation

@machine424
Member

@machine424 machine424 commented Aug 20, 2025

Because of relabelling, an endpoint can select only a subset of the series that go through WriteStorage.

Having a highestTimestamp at WriteStorage level yields wrong values if the corresponding sample won't even make it to a remote queue.

Currently PrometheusRemoteWriteBehind is based on that, and would fire if an endpoint is only interested in a subset of series that take time to appear.

A "prometheus_remote_storage_queue_highest_timestamp_seconds" metric that only takes into account samples in the queue is introduced, and is used in PrometheusRemoteWriteBehind and in the dashboards under documentation/prometheus-mixin.

The same applies to samplesIn/dataIn: QueueManager knows best when to update those, namely when data is enqueued.

That makes dataDropped unnecessary, and thus helps simplify the logic in QueueManager.calculateDesiredShards()


  • add tests
  • validate changes under documentation/prometheus-mixin
  • run some benchmarks

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

[ENHANCEMENT] Remote-write: Add `prometheus_remote_storage_queue_highest_timestamp_seconds` metric, which tracks the highest timestamp that was actually enqueued per queue, i.e. accounting for relabeling.
[ENHANCEMENT] Mixin: Replace the `prometheus_remote_storage_highest_timestamp_in_seconds` metric with the new `prometheus_remote_storage_queue_highest_timestamp_seconds` metric in dashboards and alerts, so that relabeling is properly accounted for.
[CHANGE] Remote-write: Deprecate the `prometheus_remote_storage_{samples,exemplars,histograms}_in_total` and `prometheus_remote_storage_highest_timestamp_in_seconds` metrics; see their respective descriptions for alternatives.

@machine424 machine424 marked this pull request as draft August 20, 2025 21:46
@machine424
Member Author

still needs a double check and some extra tests, but wdyt @cstyan?

@machine424 machine424 changed the title WIP: storage/remote: compute highestTimestamp and dataIn at QueueManager level fix(storage/remote): compute highestTimestamp and dataIn at QueueManager level Aug 21, 2025
@machine424 machine424 changed the title fix(storage/remote): compute highestTimestamp and dataIn at QueueManager level storage/remote: compute highestTimestamp and dataIn at QueueManager level Aug 21, 2025
@machine424 machine424 marked this pull request as ready for review August 21, 2025 13:33
@cstyan
Member

cstyan commented Aug 21, 2025

Great idea! 👍 Thanks for doing this. I think there are a few things we need to verify, but we can likely get rid of those other two values you've pointed out.

What we want users to be able to do is differentiate between "Prometheus is scraping samples and writing them to the WAL, but the WAL reader for remote write (per queue) is stuck for some reason" vs "this queue manager is seeing samples but is deciding to drop them all". I think we can still get that information in other ways, but we should verify.

As for the reduction in complexity of calculateDesiredShards: love it.

@machine424 machine424 marked this pull request as draft August 25, 2025 08:56
@machine424 machine424 force-pushed the queuelevel branch 2 times, most recently from 466cffc to 19fbc90 Compare August 25, 2025 11:11
@machine424
Member Author

machine424 commented Aug 25, 2025

Thanks Callum for taking a look.
Do you have some concrete scenarios in mind for “Prometheus is scraping samples and writing them to the WAL, but the WAL reader for remote write (per queue) is stuck for some reason”?

The most likely cases I can think of are the watcher failing, or the queue being full (because pending samples can’t be sent). Both of these are already covered by existing metrics that expose such issues.

So yes, you're right: we'll be trading some false positives (since we currently don't take relabeling into account) for some possible false negatives, but I think the new "shadow areas" can be clarified by the existing metrics.

That said, we can keep the metrics at the storage level for now, 4 series per Prometheus instance isn’t a big deal. We can always reconsider this in the future.

@machine424 machine424 marked this pull request as ready for review August 25, 2025 11:28
@machine424
Member Author

machine424 commented Aug 27, 2025

cc @cstyan if you can take another look.

@machine424 machine424 force-pushed the queuelevel branch 2 times, most recently from 8ae3d9d to 9779d29 Compare August 27, 2025 09:56
@cstyan
Member

cstyan commented Aug 27, 2025

The most likely cases I can think of are the watcher failing, or the queue being full (because pending samples can’t be sent). Both of these are already covered by existing metrics that expose such issues.

Yeah I think you're right, it's just been a long time since I last looked at the data we have here in any detail.

I think the top level "most recent scrape timestamp" via prometheus_remote_storage_highest_timestamp_in_seconds and "most recent sent timestamp" via prometheus_remote_storage_queue_highest_sent_timestamp_seconds was useful to differentiate between "scraping but not sending" vs a more generic "not scraping", but in theory we do have other proxies for discovering that like total scrape metrics.

I think we also have a proxy for "samples are being scraped and written to/read from the WAL, we're just choosing to drop all of them (via relabel) for queue X, and that's why you're not seeing data in your remote storage": the watcher's records-read metrics plus the queue's dropped-samples metric. The dropped-samples metric even has a label to differentiate between unexpectedly dropped samples and samples dropped due to relabelling.

I think we're okay here 🤔
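The two proxies discussed above can be sketched as queries. These are illustrative only: `prometheus_remote_storage_samples_dropped_total` is taken to be the queue-level dropped-samples metric referred to above, and the exact name of its reason-style label is not shown here.

```promql
# "Scraping but not sending": the storage-level highest scrape timestamp
# keeps advancing while a queue's highest sent timestamp lags behind.
  prometheus_remote_storage_highest_timestamp_in_seconds
- ignoring(remote_name, url) group_right
  prometheus_remote_storage_queue_highest_sent_timestamp_seconds

# "Intentionally dropping via relabel": WAL records are still being read,
# but drops dominate; filtering on the metric's reason label separates
# relabel drops from unexpected ones.
rate(prometheus_remote_storage_samples_dropped_total[5m])
```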

@machine424
Member Author

Yes, because what gets sent is what the watcher sends to the queues; that's what we need to be monitoring.
Great, we're on the same page then.

And to be even more prudent (even though there are no stability guarantees for metrics), I'll only mark them as deprecated, propose the alternatives, and postpone the removal to another PR.

…evel

Signed-off-by: machine424 <[email protected]>
…ams}_in_total and prometheus_remote_storage_highest_timestamp_in_seconds

Signed-off-by: machine424 <[email protected]>
@machine424
Copy link
Copy Markdown
Member Author

cc @cstyan
A lot of action on storage/remote recently, had to resolve conflicts 3 times :)

@machine424 machine424 merged commit c574303 into prometheus:main Sep 1, 2025
28 checks passed
machine424 added a commit to machine424/prometheus that referenced this pull request Sep 2, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
machine424 added a commit to machine424/prometheus that referenced this pull request Sep 2, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
openshift-merge-bot bot pushed a commit to openshift/prometheus that referenced this pull request Sep 4, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
machine424 added a commit to machine424/prometheus that referenced this pull request Sep 10, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
machine424 added a commit to machine424/prometheus that referenced this pull request Sep 10, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/prometheus that referenced this pull request Sep 15, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
machine424 added a commit to machine424/prometheus that referenced this pull request Sep 15, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
machine424 added a commit to machine424/prometheus that referenced this pull request Sep 17, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/prometheus that referenced this pull request Sep 18, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
machine424 added a commit to machine424/prometheus that referenced this pull request Sep 19, 2025
partial cherry-pick of prometheus#17065
to make it easier to backport to older versions.

the new metric is "prometheus_remote_storage_queue_highest_timestamp_seconds"
bwplotka added a commit that referenced this pull request Oct 28, 2025
This reverts commit c574303, reversing
changes made to 2cbeef6.