[PRW 2.0] (part X) generalize remote write logic for DRY/maintainability #14338

Closed

bwplotka wants to merge 2 commits into remote-write-2.0 from

Member

bwplotka commented Jun 24, 2024

This is chained on top of #14329

bwplotka added 2 commits

June 24, 2024 09:20


          [PRW-2.0] Moved to latest basic negotiation & spec semantics.

786e304

Spec: prometheus/docs#2462

Supersedes #13968

Signed-off-by: bwplotka <[email protected]>


          [PRW 2.0] (chain3) generalize remote write logic for DRY/maintainability.

68bf542

Signed-off-by: bwplotka <[email protected]>

bwplotka changed the title ~~[PRW 2.0] (part 3) generalize remote write logic for DRY/maintainability~~ [PRW 2.0] (part X) generalize remote write logic for DRY/maintainability

cstyan reviewed

Member

cstyan left a comment

Some basic comments just from trying to grok the changes. I also started trying to fix tests locally, but there are some bigger changes needed there as well to ensure we still have proper coverage after the removal of the direct buildWriteRequest functions.

It looks like some tests also just don't complete or get incorrect responses, such as TestBasicContentNegotiation.

storage/remote/queue_manager.go

-              		}
-              		if sendNativeHistograms {
-              			pendingData[nPending].Histograms = pendingData[nPending].Histograms[:0]
+              // protoTimeSeriesQueue is a generic queue for both v1 and v2 Remote Write

Member

cstyan Jun 26, 2024

Suggested change

-              // protoTimeSeriesQueue is a generic queue for both v1 and v2 Remote Write
+              // protoTimeSeriesBuffer is a generic queue for both v1 and v2 Remote Write

storage/remote/queue_manager.go

+              	if protoMsg == config.RemoteWriteProtoMsgV1 {
+              		ret.v1 = make([]prompb.TimeSeries, max)
+              		for i := range ret.v1 {
+              			// NOTO(bwplotka): Why empty one-elem samples and exemplar?

Member

cstyan Jun 26, 2024

That's because we never attempted to insert the exemplar into the TimeSeries it is associated with, so in theory any TimeSeries could carry either a sample or an exemplar.
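To illustrate the point, here is a minimal sketch of that preallocation pattern. The types and the helpers `newPendingV1`, `setSample` and `setExemplar` are hypothetical stand-ins, not the PR's code; the only thing taken from the diff is the idea of seeding each entry with a one-element slice so that a pending entry can hold one sample or one exemplar without allocating.

```go
package main

import "fmt"

// Sample, Exemplar and TimeSeries are simplified, hypothetical stand-ins
// for the prompb types; only the fields needed for this sketch are kept.
type Sample struct {
	Value     float64
	Timestamp int64
}

type Exemplar struct {
	Value     float64
	Timestamp int64
}

type TimeSeries struct {
	Samples   []Sample
	Exemplars []Exemplar
}

// newPendingV1 (hypothetical name) preallocates n entries. Each entry gets a
// one-element Samples slice and, if requested, a one-element Exemplars slice,
// so that later writes reuse the backing arrays instead of allocating.
func newPendingV1(n int, sendExemplars bool) []TimeSeries {
	pending := make([]TimeSeries, n)
	for i := range pending {
		pending[i].Samples = []Sample{{}}
		if sendExemplars {
			pending[i].Exemplars = []Exemplar{{}}
		}
	}
	return pending
}

// setSample turns the entry into a "sample" entry: exactly one sample, no exemplars.
func setSample(ts *TimeSeries, v float64, t int64) {
	ts.Samples = append(ts.Samples[:0], Sample{Value: v, Timestamp: t})
	ts.Exemplars = ts.Exemplars[:0]
}

// setExemplar turns the entry into an "exemplar" entry: exactly one exemplar, no samples.
func setExemplar(ts *TimeSeries, v float64, t int64) {
	ts.Exemplars = append(ts.Exemplars[:0], Exemplar{Value: v, Timestamp: t})
	ts.Samples = ts.Samples[:0]
}

func main() {
	pending := newPendingV1(2, true)
	setSample(&pending[0], 1.5, 1000)
	setExemplar(&pending[1], 0.7, 2000)
	for i, ts := range pending {
		fmt.Printf("entry %d: samples=%d exemplars=%d\n", i, len(ts.Samples), len(ts.Exemplars))
	}
}
```

Because an exemplar is never merged into the entry of the series it belongs to, every entry needs capacity for both kinds even though it only ever holds one of them.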

storage/remote/queue_manager.go

               }
-              func (s *shards) updateMetrics(_ context.Context, err error, sampleCount, exemplarCount, histogramCount, metadataCount int, duration time.Duration) {
+              func (p *protoTimeSeriesBuffer) FilterOutTooOldSamples(logger log.Logger, metrics *queueManagerMetrics, baseTime time.Time, sampleAgeLimit time.Duration) (highest, lowest int64) {

Member

cstyan Jun 26, 2024

Suggested change

-              func (p *protoTimeSeriesBuffer) FilterOutTooOldSamples(logger log.Logger, metrics *queueManagerMetrics, baseTime time.Time, sampleAgeLimit time.Duration) (highest, lowest int64) {
+              func (p *protoTimeSeriesBuffer) FilterOldSamples(logger log.Logger,

we might also want a comment

// Filter samples older than sampleAgeLimit, allows for quicker catching up when recovering.

storage/remote/queue_manager.go

-              	for i := range pendingDataV2 {
-              		pendingDataV2[i].Samples = []writev2.Sample{{}}
-              	}
+              	pendingSeries := newProtoTimeSeriesBuffer(s.qm.protoMsg, max, s.qm.sendExemplars, s.qm.sendNativeHistograms)

Member

cstyan Jun 26, 2024

can we call the var pendingProtoSeries to make it clearer this will already be serialized as protobuf by the time we get to sendSamplesWithBackoff? I had to trace back through the call chain to confirm that as it is atm

storage/remote/queue_manager.go

-              func (s *shards) sendV2SamplesWithBackoff(ctx context.Context, samples []writev2.TimeSeries, labels []string, sampleCount, exemplarCount, histogramCount, metadataCount int, pBuf, buf *[]byte, enc Compression) error {
-              	// Build the WriteRequest with no metadata.
-              	req, highest, lowest, err := buildV2WriteRequest(s.qm.logger, samples, labels, pBuf, buf, nil, enc)
+              func (s *shards) sendSamplesWithBackoff(ctx context.Context, series *protoTimeSeriesBuffer, enc Compression) error {

Member

cstyan Jun 26, 2024

maybe both send samples functions should have name changes to sendRequest?

storage/remote/queue_manager.go

Comment on lines +1836 to +1838

+              			// TODO(bwplotka): This does not count dropped samples in the filter above.
+              			// Is this on purpose? Given drop samples metric?
+              			attribute.Int("samples", series.nPendingSamples),

Member

cstyan Jun 26, 2024

not that I know of, I think this got missed during the review for the change that added the filtering functionality

storage/remote/queue_manager_test.go

+              func v2WriteRequestToWriteRequest(reqV2 *writev2.Request) (*prompb.WriteRequest, error) {
+              	req := &prompb.WriteRequest{
+              		Timeseries: make([]prompb.TimeSeries, len(reqV2.Timeseries)),
+              		// TODO handle metadata?

Member

cstyan Jun 26, 2024

probably we should, though we'll lose some granularity during the conversion

the MetricMetadata message only differentiates on metric family name, so either the first-seen or the last-seen set of metadata for a metric family name wins
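A small sketch of the granularity loss described above, using a first-seen-wins policy. Everything here is hypothetical: the `SeriesV2` and `MetricMetadata` structs are simplified stand-ins for the proto messages, `collapseMetadata` is not a function from the PR, and the metric family name is assumed to be already known for each series.

```go
package main

import "fmt"

// SeriesV2 is a hypothetical, simplified view of a v2 series: metadata is
// attached to every series individually.
type SeriesV2 struct {
	FamilyName string // metric family name, assumed to be already resolved
	Type       string
	Help       string
	Unit       string
}

// MetricMetadata is a hypothetical, simplified view of v1 metadata, which is
// keyed by metric family name only.
type MetricMetadata struct {
	MetricFamilyName string
	Type             string
	Help             string
	Unit             string
}

// collapseMetadata converts per-series metadata into per-family metadata.
// When several series share a family name, the first one seen wins and the
// metadata of the later ones is discarded.
func collapseMetadata(series []SeriesV2) []MetricMetadata {
	seen := make(map[string]struct{}, len(series))
	out := make([]MetricMetadata, 0, len(series))
	for _, s := range series {
		if _, ok := seen[s.FamilyName]; ok {
			continue // later metadata for the same family is lost
		}
		seen[s.FamilyName] = struct{}{}
		out = append(out, MetricMetadata{
			MetricFamilyName: s.FamilyName,
			Type:             s.Type,
			Help:             s.Help,
			Unit:             s.Unit,
		})
	}
	return out
}

func main() {
	series := []SeriesV2{
		{FamilyName: "http_requests_total", Type: "counter", Help: "Requests served."},
		{FamilyName: "http_requests_total", Type: "counter", Help: "Total HTTP requests."},
		{FamilyName: "queue_length", Type: "gauge", Help: "Items in the queue."},
	}
	for _, m := range collapseMetadata(series) {
		fmt.Printf("%s (%s): %s\n", m.MetricFamilyName, m.Type, m.Help)
	}
}
```

Switching to last-seen-wins would only mean overwriting the stored entry instead of skipping it; either way, one of the two help strings for the shared family name is dropped.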

bwplotka mentioned this pull request

[PRW 2.0] (part3) Moved type specific conversions to prompb and writev2 codecs. #14347

Merged

Member Author

bwplotka commented Jun 26, 2024

I will make sure these comments are applied in further PRs, but I switched tactics a bit (split), so this has to be recreated on top of #14347

bwplotka closed this
