Add support for utf8 names on /v1/label/:name/values endpoint#15399
Add support for utf8 names on /v1/label/:name/values endpoint#15399jan--f merged 2 commits intorelease-3.0from
Conversation
461b63f to
609a49b
Compare
|
windows error seems to be spurious |
There was a problem hiding this comment.
Nice, LGTM!
Should we do it in release-3.0 branch first (if yes, rebase on that branch), so we do 3.0.1? cc @jan--f
We need to put cap on how much we put into 3.0.x here. We will likely have multiple UTF-8 gaps.. I don't think it must go to 3.0.1, but I think it could (:
|
@jan--f said to commit to main and then backport as necessary. I can set up that PR |
beorn7
left a comment
There was a problem hiding this comment.
Looks good code-wise, but shouldn't there be an update to the documentation in https://github.com/prometheus/prometheus/blob/main/docs/querying/api.md ?
90b9b4a to
23517a2
Compare
23517a2 to
919cb80
Compare
|
FTR: I'm not sure if this should really be in v3.0.1. I would see it more as a missing feature than a real bug. (And I would like to get back to a regular release cadence ASAP, so that v3.1.0 is not far down the road.) |
beorn7
left a comment
There was a problem hiding this comment.
Sorry for more comments. Documenting is hard…
In any case, before merging this, let's decide if this should go into v3.0.1 or v3.1.0, see other comment. (If need be, we need to rebase this on top of the release branch.)
| The `data` section of the JSON response is a list of string label values. | ||
|
|
||
| This example queries for all label values for the `job` label: | ||
| This example queries for all label values for the `job_name` label: |
There was a problem hiding this comment.
Sorry for nitpicking again, but let's not use a "weird" label name in the first example. job_name is a weird because it's so common to have a job label. Why would somebody have job_name? If your intention is to have a label name with an underscore here, let' use one that is actually common like http_status_code (which could then be http.status_code in the example below to even allude to the different meaning of . and _ in OTel).
| For label names that contain characters that are not valid in the legacy | ||
| Prometheus character set (letters, numbers, underscores, and colons), they |
There was a problem hiding this comment.
Strictly speaking, you can use non-legacy characters here as long as they are valid as URL components, can't we?
So I guess http://localhost:9090/api/v1/label/job.name/values would actually work?
If that's true, we only need the U__ encoding for characters that don't work as URL components (in particular /).
There was a problem hiding this comment.
I was thinking about that, but that might lead people to use HTML encoding: "job%20name" which is not supported. So while job.name would work, I'd prefer to make the reasoning simpler and just ask that people use encoding when the name is non-legacy-compatible.
There was a problem hiding this comment.
By HTML encoding, you mean percent encoding, also known as URL encoding, don't you? I.e. https://en.wikipedia.org/wiki/Percent-encoding
I would expect that URL encoding is in fact supported, because that should just be dealt with by the Go HTTP libraries and the Prometheus code wouldn't even notice.
It might very well possible that we only need the U__ encoding for a / in the label name. (At least that was the case long ago for label values as they show up in URLs for pushing to the PGW. Of course, back then there was no U__ encoding, so we elected for base64.)
Sorry for beating this point for so long, but it feels confusing to refer to the legacy Prometheus character set here when there is no technical reason for it.
There was a problem hiding this comment.
Assuming that my assumptions above are all correct, I would use the following flow of explanation:
- Don't mention URL encoding at all, because it is (IMHO) implied that that always works (it's part of the HTTP standard, and tools like
curland browsers will use it implicitly anyway). - Just state that label names starting with
U__are interpreted as encoded according to the Values Escaping method, which can be used at will, but also is the only way to specify label names that contain a/.
There was a problem hiding this comment.
I think I have rewritten it this way.
|
|
||
| For label names that contain characters that are not valid in the legacy | ||
| Prometheus character set (letters, numbers, underscores, and colons), they | ||
| should be escaped using the Values Escaping method: |
There was a problem hiding this comment.
While it's handy to have a short description here, do we have a comprehensive explanation of this encoding method that we can link to?
There was a problem hiding this comment.
the prometheus proposal is the closest thing
There was a problem hiding this comment.
@beorn7 I can file an issue for the doc issue, is that enough to unblock this PR?
There was a problem hiding this comment.
Linking to the Prometheus proposal is fine. I.e. link to https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping (as we did for PGW, just checked that out). No issue required.
|
Like beorn7 mentioned let's target |
Previously, the api was evaluating this regex to determine if the label name was valid or not: https://github.com/prometheus/common/blob/14bac55a992f7b83ab9d147a041e274606bdb607/model/labels.go#L94 However, I believe that the `IsValid()` function is what ought to be used in the brave new utf8 era. **Before** ``` $ curl localhost:9090/api/v1/label/host.name/values {"status":"error","errorType":"bad_data","error":"invalid label name: \"host.name\""} ``` **After** ``` $ curl localhost:9090/api/v1/label/host.name/values {"status":"success","data":["localhost"]} ``` It's very likely that I'm missing something here or you were already planning to do this at some point but I just encountered this issue and figured I'd give it a go. Signed-off-by: Owen Williams <[email protected]>
919cb80 to
528ecf6
Compare
|
rebased onto release-3.0 |
25085ee to
07b9c49
Compare
|
This should not block 3.0.1, but I believe it's ready to go in (pending getting the documentation right!) |
beorn7
left a comment
There was a problem hiding this comment.
It's actually good to have this in v3.0.1 because it is technically a breaking change (legacy label names could have the form U__...). I don't think it's relevant enough to call it out in the migration guide, though.
jan--f
left a comment
There was a problem hiding this comment.
Could use a follow up, but shouldn't block this fix.
Signed-off-by: Owen Williams <[email protected]>
07b9c49 to
12577e3
Compare
Follow-on from #14943
Previously, the api was evaluating this regex to determine if the label
name was valid or not:
https://github.com/prometheus/common/blob/14bac55a992f7b83ab9d147a041e274606bdb607/model/labels.go#L94
However, I believe that the
IsValid()function is what ought to beused in the brave new utf8 era.
Before
After
It's very likely that I'm missing something here or you were already
planning to do this at some point but I just encountered this issue and
figured I'd give it a go.