Remote write loses samples when target is unavailable #14087
Description
What did you do?
I'm using remote write to a receiver, which can be temporarily down while it is being updated.
What did you expect to see?
All samples should eventually be written once the receiver is running again.
What did you see instead? Under which circumstances?
After upgrading from 2.50.1 to 2.51.2, I started missing samples on the receiver for the periods when it was not running. I checked this by querying some metric samples in both Prometheus and the receiver. I can see a drop in prometheus_remote_storage_samples_total, but nothing in the prometheus_remote_storage_samples_dropped_total or prometheus_remote_storage_samples_failed_total metrics. So the samples were probably never attempted and were simply skipped.
I suspect the changes made in #13583, specifically the tail bool parameter shared between these two functions:
prometheus/tsdb/wlog/watcher.go, line 393 in 3b8b577:
func (w *Watcher) watch(segmentNum int, tail bool) error {
prometheus/tsdb/wlog/watcher.go, line 533 in 3b8b577:
func (w *Watcher) readSegment(r *LiveReader, segmentNum int, tail bool) error {
Previously, tail apparently stayed true once set. However, it can now revert to false when processing of samples is paused for some time and then resumed.
Reverting back to 2.50.1 fixed this.
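A minimal sketch of the suspected failure mode, not the actual watcher code. It assumes that sample records are forwarded only while the tail flag is true and are otherwise silently skipped, which would explain why nothing appears in the dropped or failed counters. The record type, the simplified readSegment, and the replay helper are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a WAL record; it is not a Prometheus type.
type record struct {
	isSample bool
	value    int
}

// readSegment is a toy model of the assumed behaviour: sample records are
// forwarded only while tail is true. When tail is false they are skipped
// without being counted as dropped or failed anywhere.
func readSegment(recs []record, tail bool, send func(int)) (skipped int) {
	for _, r := range recs {
		if !r.isSample {
			continue // non-sample records are irrelevant to this sketch
		}
		if !tail {
			skipped++ // silently ignored
			continue
		}
		send(r.value)
	}
	return skipped
}

// replay feeds each segment through readSegment with the tail flag that was
// in effect for it and reports what was sent and how many samples were skipped.
func replay(segments [][]record, tailFlags []bool) (sent []int, skipped int) {
	for i, seg := range segments {
		skipped += readSegment(seg, tailFlags[i], func(v int) { sent = append(sent, v) })
	}
	return sent, skipped
}

func main() {
	segments := [][]record{
		{{true, 1}, {true, 2}},
		{{true, 3}, {false, 0}, {true, 4}}, // written while the receiver was down
		{{true, 5}},
	}

	// tail stays true once set: every sample is eventually sent.
	sent, skipped := replay(segments, []bool{true, true, true})
	fmt.Println(sent, skipped) // [1 2 3 4 5] 0

	// tail reverts to false for the middle segment: samples 3 and 4 are lost.
	sent, skipped = replay(segments, []bool{true, false, true})
	fmt.Println(sent, skipped) // [1 2 5] 2
}
```

In the second run the receiver ends up with a gap, while nothing in the model records a failure, which matches the symptoms described above.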
System information
No response
Prometheus version
2.51.2
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response