Allow UTF-8 characters in metric and label names as opt-in feature#15258
Allow UTF-8 characters in metric and label names as opt-in feature#15258ArthurSens merged 4 commits intomainfrom
Conversation
429758e to
5056345
Compare
5056345 to
5293e89
Compare
|
Taking a look. |
|
Could you fix conflicts/tests? |
storage/remote/otlptranslator/prometheus/normalize_label_test.go
Outdated
Show resolved
Hide resolved
fab4d8b to
b81eaef
Compare
|
Friendly ping @aknuds1, could we merge this now? :) |
@ArthurSens I'll have to review it first, getting around to it today. |
aknuds1
left a comment
There was a problem hiding this comment.
I found some more issues.
b81eaef to
5743d83
Compare
Signed-off-by: Arthur Silva Sens <[email protected]>
5743d83 to
58215fe
Compare
|
|
||
| // Metric name starts with a digit? Prefix it with an underscore. | ||
| if metricName != "" && unicode.IsDigit(rune(metricName[0])) { | ||
| // Metric name starts with a digit and utf8 not allowed? Prefix it with an underscore. |
There was a problem hiding this comment.
@jesusvazquez do you know if this change is correct?
The mode only says it will "keep UTF-8 characters as they are". Maybe it makes sense to also drop the rule of a metric not starting a metric with a digit, not sure.
Signed-off-by: Arthur Silva Sens <[email protected]>
| func TestAllowUTF8(t *testing.T) { | ||
| t.Run("allow UTF8", func(t *testing.T) { | ||
| require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", true)) | ||
| require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", true)) |
There was a problem hiding this comment.
If you try applying the following, it's going to fail. There's a fundamental bug in the UTF-8 handling.
| require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", true)) | |
| require.Equal(t, "unsupported.metric.+weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.+weird", "+=.:,!* & #"), "", true)) |
There was a problem hiding this comment.
Yes you're correct.
Just wanted to mention that although OTel allows full UTF8, the most common ones are dots and foreign characters, and this PR is focused to unblock OTel
There was a problem hiding this comment.
I don't think we should approve a PR with fundamentally broken logic. I think @jesusvazquez is also checking this out now. It can't be that difficult to fix FWIW.
Leaving to @jesusvazquez to review from here
Signed-off-by: Arthur Silva Sens <[email protected]>
| func TestAllowUTF8KnownBugs(t *testing.T) { | ||
| // Due to historical reasons, the translator code was copied from OpenTelemetry collector codebase. | ||
| // Over there, they tried to provide means to translate metric names following Prometheus conventions that are documented here: | ||
| // https://prometheus.io/docs/practices/naming/ | ||
| // | ||
| // Althogh not explicitly said, it was implied that words should be separated by a single underscore and the codebase was written | ||
| // with that in mind. | ||
| // | ||
| // Now that we're allowing OTel users to have their original names store in prometheus without any transformation, we're facing problems | ||
| // where two (or more) UTF-8 characters are being used to separate words. | ||
| // TODO(arthursens): Fix it! | ||
|
|
||
| // We're asserting on 'NotEqual', which proves the bug. | ||
| require.NotEqual(t, "metric....split_=+by_//utf8characters", normalizeName(createGauge("metric....split_=+by_//utf8characters", ""), "", true)) | ||
| // Here we're asserting on 'Equal', showing the current behavior | ||
| require.Equal(t, "metric.split_by_utf8characters", normalizeName(createGauge("metric....split_=+by_//utf8characters", ""), "", true)) | ||
| } |
There was a problem hiding this comment.
I've added a test showcasing the known bug, along with the explanation to why we've got to this situation.
This is a super edge case since appliations using the OTel SDK follow semantic convention, and those conventions do not have metrics with repeated dots or things like that.
Could we tackle this bug in a separate PR since it's not mission critical?
Signed-off-by: Arthur Silva Sens <[email protected]>
8c8154f to
6c79f5c
Compare
jesusvazquez
left a comment
There was a problem hiding this comment.
LGTM lets do a close followup on that bug in case it happens in unknown cases!
|
Issue opened, it's on my TODO list! |
…15258) * Allow UTF-8 characters in metric and label names as opt-in feature --------- Signed-off-by: Arthur Silva Sens <[email protected]>
|
FYI, porting to 3.0 #15384 |
In non-UTF-8 mode, use strings.FieldsFunc to split string into tokens, as it was before PR prometheus#15258. This makes the code more robust and readable. Signed-off-by: Arve Knudsen <[email protected]>
…#15573) In non-UTF-8 mode, use strings.FieldsFunc to split string into tokens, as it was before PR #15258. This makes the code more robust and readable. Signed-off-by: Arve Knudsen <[email protected]>
Work originally comes from #14974
This PR aims to unblock OTel users who want to keep their metric names unchanged when sending it to Prometheus. The decision is backed by a survey, and by dev-summits where this topic was discussed.
To keep the current behavior, we're adding a new configuration option to the config.file, so people will need to opt-in