Skip to content

remote_write: return 422 when native histograms are sent but disabled; add tests#17196

Open
shibo911 wants to merge 2 commits intoprometheus:mainfrom
shibo911:shibo911/rw-17181-422-nh-disabled
Open

remote_write: return 422 when native histograms are sent but disabled; add tests#17196
shibo911 wants to merge 2 commits intoprometheus:mainfrom
shibo911:shibo911/rw-17181-422-nh-disabled

Conversation

@shibo911
Copy link
Copy Markdown

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

[BUGFIX] Remote Write: Return HTTP 422 Unprocessable Entity when native histograms are sent to a receiver with native histograms disabled. Applies to:

  • RW v1 (all-or-nothing).
  • RW v2 (partial write): samples/exemplars are accepted, histograms rejected; X-Prometheus-Remote-Write-* headers reflect zero histograms written.
  • OTLP write handler.
    Previously these requests returned HTTP 500.
[BUGFIX] Remote Write: Return HTTP 422 (Unprocessable Entity) instead of 500 when native histograms are sent to a receiver with native histograms disabled. For RW v2, valid samples/exemplars are still accepted (partial write) and X-Prometheus-Remote-Write-* headers reflect 0 histograms written. Also applied to the OTLP write endpoint.

@shibo911 shibo911 force-pushed the shibo911/rw-17181-422-nh-disabled branch from cc2a9a5 to 64db6ad Compare September 15, 2025 23:06
@shibo911
Copy link
Copy Markdown
Author

  • Using HTTP 422 Unprocessable Content (RFC 9110 §15.5.16) for native histograms when the feature is disabled. The request is syntactically valid but semantically unacceptable.

  • Behavior:

    • RW v1: 422, entire request rejected.
    • RW v2: partial write; samples/exemplars accepted, native histograms rejected; X-Prometheus-Remote-Write-* headers report 0 histograms.
    • OTLP: 422 mapped similarly.
  • This makes these client errors non-retriable. No change when NH is enabled.

  • Tests cover v1 all-or-nothing and v2 partial write, with headers. Follow-ups noted below.

CC @roidelapluie

@bboreham bboreham requested review from krajorama and roidelapluie and removed request for tomwilkie September 16, 2025 11:09
@roidelapluie
Copy link
Copy Markdown
Member

Is this compatible with #17197 ?

@shibo911
Copy link
Copy Markdown
Author

Is this compatible with #17197 ?

  • Yes. The 422 mapping is orthogonal to the handler abstraction used in [RW]: Adopt client_golang/exp/api/remote types for receiving RW1 and RW2 #17197. RW v2 remains compatible because appendV2 still returns 422 for ErrNativeHistogramsDisabled and the stats are propagated via the response object.
  • One small follow-up needed if [RW]: Adopt client_golang/exp/api/remote types for receiving RW1 and RW2 #17197 lands: in the v1 path within storage/remote/write_handler.go after the refactor (Store -> msgType == remoteapi.WriteV1MessageType), include a case to map storage.ErrNativeHistogramsDisabled to HTTP 422, same as this PR does today. Concretely, in the error switch after h.write(...), add: case errors.Is(err, storage.ErrNativeHistogramsDisabled): wr.SetStatusCode(http.StatusUnprocessableEntity) return wr, err This keeps v1 behavior aligned with this PR and ensures our new v1 test (asserting 422) continues to pass.

Notes:

So, aside from adding that single v1 case when #17197 merges, the behavior introduced in this PR is fully compatible.

PTAL @roidelapluie

if errors.Is(err, storage.ErrNativeHistogramsDisabled) {
// Receiver supports protocol, but cannot process native histograms.
nhDisabled = true
h.logger.Error("Out of order histogram from remote write", "err", err.Error(), "series", ls.String(), "timestamp", hp.Timestamp)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
h.logger.Error("Out of order histogram from remote write", "err", err.Error(), "series", ls.String(), "timestamp", hp.Timestamp)
h.logger.Error("Native histograms not enabled", "err", err.Error(), "series", ls.String(), "timestamp", hp.Timestamp)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to log this as error. The feature may be disabled on purpose, so maybe debug level is better.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Done. Updated the log to:
    • message: “Native histograms not enabled”
      
    • level: debug
      
  • Location: storage/remote/write_handler.go, in appendV2() under the ErrNativeHistogramsDisabled branch.
  • Behavior unchanged: we still mark nhDisabled = true, include the error in badRequestErrs, and return HTTP 422 for v2 with partial-write headers (0 histograms written).

CC @krajorama


switch {
case err == nil:
case errors.Is(err, storage.ErrNativeHistogramsDisabled):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for completeness, please add test

Copy link
Copy Markdown
Author

@shibo911 shibo911 Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Added a test covering the OTLP path:

    • Test name: TestOTLPWriteHandler_NativeHistogramsDisabled
    • File: storage/remote/write_test.go
    • It builds an OTLP request with histogram data, injects ErrNativeHistogramsDisabled via the mock appendable, and asserts HTTP 422.
  • We already had coverage for RW v1 and v2:

    • TestNativeHistogramsDisabled_V1Message (v1 returns 422)
    • TestNativeHistogramsDisabled_V2Message (v2 returns 422, with samples/exemplars accepted and headers reflecting 0 histograms)

    CC @krajorama

@shibo911 shibo911 force-pushed the shibo911/rw-17181-422-nh-disabled branch from 0b684ac to 47302d5 Compare September 18, 2025 07:12
@shibo911 shibo911 force-pushed the shibo911/rw-17181-422-nh-disabled branch from 47302d5 to 4a55466 Compare September 18, 2025 07:30
@github-actions github-actions bot added the stale label Nov 20, 2025
Copy link
Copy Markdown
Contributor

@pipiland2612 pipiland2612 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ! Thanks

Copy link
Copy Markdown
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I'd argue 422 for this case is an overkill, why not 400? Otherwise great!

switch {
case errors.Is(err, storage.ErrNativeHistogramsDisabled):
// Receiver supports the protocol, but cannot process native histograms.
http.Error(w, err.Error(), http.StatusUnprocessableEntity)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

422 is weird, quite rare. You could argue 422 for other cases too (e.g. too old samples, someone didn't configure out of order etc).

Why not invalid request?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically, 422 makes more sense. The request is correct but can not be processed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's align on 400

return samplesWithoutMetadata, 0, nil
}
// TODO(bwplotka): Better concat formatting? Perhaps add size limit?
if nhDisabled {
Copy link
Copy Markdown
Member

@bwplotka bwplotka Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, an odd snowflake IMO, just StatusBadRequest would be fine (simpler code)

@github-actions github-actions bot removed the stale label Nov 25, 2025
@roidelapluie
Copy link
Copy Markdown
Member

I am fine with both 400 and 422. Semantically 422 is better but we can be consistent and just reply 400.

@cstyan
Copy link
Copy Markdown
Member

cstyan commented Dec 1, 2025

I am fine with both 400 and 422. Semantically 422 is better but we can be consistent and just reply 400.

I'm fine with either, but this points out that we should probably update the v2 spec to include details about what error code the receiver should return when it wants to reject data like this.

We already have conditions for "receiver may reject", for example if the receiver requires that metadata be present. This is another "may reject" case IMO, and the decision about which code to return in such cases should be documented in the spec.

@github-actions github-actions bot added the stale label Feb 9, 2026
@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Feb 10, 2026

Hello from the bug scrub!

@cstyan @bwplotka @roidelapluie has there been any decision so that we can get this PR unblocked?

@roidelapluie
Copy link
Copy Markdown
Member

Let's align on 400 cc @shibo911

@bwplotka
Copy link
Copy Markdown
Member

We have other cases when something is not supported, so we would need to return 422 for those too. Then if we use 422 and make that a pattern, ideally we change the spec. So... I'd vote for 400 for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remote Write 2.0 returns 500 when Native Histograms are sent but the feature is disabled

7 participants