Skip to content

[PRW 2.0 spec] Sample Write Confirmation in Response #14359

@bwplotka

Description

@bwplotka

Proposal

cc @cstyan @alexgreenbank @npazosmendez

Problem

In the PRW 2.0 spec, the request Content-Type tells Receiver what proto to use to parse the message.
This works great if everyone implements the spec properly. However, reality kicks in and bugs happens. After updating our PRW 2.0 Prometheus branch
we noticed a common issue:

🐛 Receivers (especially 1.0) don't check Content-Type header. This is literally what Prometheus receive handler is doing pre-2.0 changes
The same in Thanos receiving.
It's was understandable as in 1.0 those headers were hardcoded so receivers also didn't care to check those.

The consequence is that newer clients could decide to send a write request encoded as RW2.0 proto to a Receiver that only reads 1.0 proto. In this case we would expecting the Receiver to fail with at least 400 (ideally 415).

Currently the above will cause the "broken" 1.0 Receiver to unmarshal v2 payload with v1 proto. Thanks to 2.0-rc.1 change in spec, this at least does not corrupt the writes (or give underminitic error or empty payload), but always returns empty message.

Unfortunately this will result likely in "no write" and 2xx, while client might think the data was written.

Specifically in Thanos this would result in debug log and 200
The same with Prometheus.

This problem also exists in other broken Receiver cases (wrong proto is used), broken Senders (sending v2 payload, but content type v1), but also if we would add more field to e.g. TimeSeries proto in future (e.g. samplesv2) (the same problem exists in 1.0 already).

Potential Solution

Use Response headers to send number of samples, exemplars, histogram samples ingested.

This is actually very close to #14323, just reversed and using HTTP response header (simpler).

One could establish response headers like

X-Prometheus-Remote-Write-Stats-Series 1
X-Prometheus-Remote-Write-Stats-Samples 12144
X-Prometheus-Remote-Write-Stats-Histograms 1123
X-Prometheus-Remote-Write-Stats-Exemplars 123

Those has benefit of:

This could MUST or SHOULD. WDYT?

Alternatives

Fail on empty request (e.g empty symbol or no timeseries)

One way would be to add MUST/SHOULD to spec to fail on empty request. This would help with new Receivers accidently parsing wrong protos/payloads. However there are 2 cons:

  • This does not fixes 1.0 Receivers.
  • Empty request was used in the past as a nice "gateway" for probing Receivers.
  • This actually makes proto forward/backward compatibility in future tricky, because old Receivers (e.g. 2.0) would fail if no TimeSeries are sent, but one day we might have more fields than timeseries and want to put it in 2.x (stretch?)

Do Nothing

In some way, this only happens on "broken" situation and once Sender ensures Receiver checks content-type properly and uses correct proto (tests/manual checks) it would be perhaps fine?

🐛

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions