Fix: OTLP translator now keeps multiple separators in metric names #15440
ArthurSens wants to merge 4 commits into main
Conversation
Signed-off-by: Arthur Silva Sens <[email protected]>
```diff
  require.Equal(t, "unsupported_metric_weird", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", false))
  require.Equal(t, "unsupported_metric_redundant_test_per_C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", false))
- require.Equal(t, "metric_with_foreign_characters", normalizeName(createGauge("metric_with_字符_foreign_characters", "ど"), "", false))
+ require.Equal(t, "metric_with____foreign_characters", normalizeName(createGauge("metric_with_字符_foreign_characters", "ど"), "", false))
```
The classical name translation mode, i.e. only allowing alphanumerics, colons and underscores, should not change. It should be obvious that the PR does the wrong thing when you have to change a test expectation for a case unrelated to the PR's topic.
Well, I understood that we would keep the number of characters at least and the previous behavior was a bug. The same way you did in your PR that allowed colons, you changed the test because it didn't reflect the behavior we wanted.
By keeping at least the same number of characters, I'm trying to avoid collisions like this:
| Original metric | Keeping same amount of characters (no name collision) | Keeping the current behavior (name collision) |
|---|---|---|
| http..requests..total | http__requests__total | http_requests_total |
| http.requests.total | http_requests_total | http_requests_total |
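The collision in the table above can be sketched with two hypothetical replacement strategies (illustrative only, not the actual `normalizeName` code):

```go
package main

import (
	"fmt"
	"regexp"
)

// collapseRe folds a whole run of unsupported characters into one
// underscore (the current behavior); perCharRe replaces each
// unsupported character one-for-one (the behavior proposed here).
var (
	collapseRe = regexp.MustCompile(`[^a-zA-Z0-9:]+`)
	perCharRe  = regexp.MustCompile(`[^a-zA-Z0-9:]`)
)

func collapseSeparators(name string) string {
	return collapseRe.ReplaceAllString(name, "_")
}

func keepSeparators(name string) string {
	return perCharRe.ReplaceAllString(name, "_")
}

func main() {
	// With collapsing, two distinct OTLP names map to the same result:
	fmt.Println(collapseSeparators("http..requests..total")) // http_requests_total
	fmt.Println(collapseSeparators("http.requests.total"))   // http_requests_total
	// One underscore per character preserves the distinction:
	fmt.Println(keepSeparators("http..requests..total")) // http__requests__total
	fmt.Println(keepSeparators("http.requests.total"))   // http_requests_total
}
```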
I have no strong opinions though, I can see how this would be considered a breaking change, but we could also call this a bug fix 🤔
> Well, I understood that we would keep the number of characters at least and the previous behavior was a bug.
When was that ever discussed? The bug that was pointed out by me is that the new UTF8 translation scheme translates some UTF8 characters to underscores, even though the scheme promises to not escape any UTF8 characters: "NoUTF8EscapingWithSuffixes will keep UTF-8 characters as they are". Please keep this PR on topic with that particular bug, and don't change the classical translation mode.
> The same way you did in your PR that allowed colons, you changed the test because it didn't reflect the behavior we wanted.
It's not the same because, A) it was on topic for the PR, B) not allowing colons in Prometheus metric names was a bug since the Prometheus metric name specification allows for colons.
If you think it's a bug that the classical translation mode folds multiple unsupported characters into a single underscore, please make a dedicated issue that can be discussed separately.
Ok, no problem, I'll update the PR soon.
storage/remote/otlptranslator/prometheus/helpers_from_stdlib.go
Seems like this is a candidate for a 3.0.1 release? If so, let's target the release-3.0 branch.
When is 3.0.1 scheduled? It looks like we're still discussing what we're trying to fix here. I wouldn't block the release for this fix since it's a rare scenario. With that said, if we end up finding consensus quickly, I'll rebase the PR on top of release-3.0 :)
```go
t.Run("allow UTF8", func(t *testing.T) {
	require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", true))
	require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", true))
	require.Equal(t, "unsupported.metric.+weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.+weird", "+=.:,!* & #"), "", true))
```
@aknuds1, is this the test you wanted to see?
aknuds1 left a comment
I tested normalizeName with the OTLP metric name "metric.+name+" and unit "°C", and the result comes out as "metric.name+°C" instead of "metric.name+_°C".
Also, please see my comment about the logic that I don't think makes sense for UTF-8.
```diff
@@ -130,9 +130,9 @@ func normalizeName(metric pmetric.Metric, namespace string, allowUTF8 bool) stri
	translationFunc = func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != ':' }
```
I think treating other letters than alphanumerics and colon specially doesn't make any sense in UTF-8 mode, since Prometheus in UTF-8 mode allows all valid UTF-8 characters. There must be a better way of doing this. Additionally, needlessly tokenizing metric names in UTF-8 mode is a performance hit, that's best avoided.
I think we need to tokenize because we don't want to repeat units in the metric name.
Imagine `http.requests.seconds` with `s` as unit. We don't want to create a metric called `http.requests.seconds.seconds`, so we check if the unit is already present in the token array. We could also use `strings.Contains()`; I'm just not sure which option is better 🤔
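A minimal sketch of the dedup idea described above, using a hypothetical `addUnitSuffix` helper (not the actual translator code): tokenize the name and only append the unit if it isn't already the final token.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// addUnitSuffix appends the unit to the metric name unless the unit
// is already the last token, to avoid names like
// "http.requests.seconds.seconds".
func addUnitSuffix(name, unit string) string {
	tokens := strings.FieldsFunc(name, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	if len(tokens) > 0 && tokens[len(tokens)-1] == unit {
		return name // unit already present as the final token
	}
	return name + "_" + unit
}

func main() {
	fmt.Println(addUnitSuffix("http.requests.seconds", "seconds")) // http.requests.seconds
	fmt.Println(addUnitSuffix("http.requests", "seconds"))         // http.requests_seconds
}
```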
I don't understand how it makes sense to tokenize metric names in UTF-8 mode based on the character set allowed in Prometheus legacy mode, though. I.e., treating characters that are not alphanumeric and not colons as separators doesn't make sense in UTF-8 mode. All characters should be treated the same, so long as they are valid UTF-8 (if they're invalid, the metric should be rejected).
I think that instead of tokenizing metric names based on legacy mode rules, there should be some sort of solution for determining whether a suffix should be added or not. Maybe pattern matching? In any case, it's a completely separate problem from escaping characters, which is what the non-UTF-8 mode has to do (implying that UTF-8 mode should not do this).
Fair enough!
I'm a bit confused about your acceptance criteria for this PR, though.
> I tested normalizeName with the OTLP metric name "metric.+name+" and unit "°C", and the result comes out as "metric.name+°C" instead of "metric.name+_°C".
Did you mean it comes out as metric.name_°C? The second option, metric.name+_°C, does look correct anyway; that's a good catch and can easily be fixed! I'll update the PR with the fix.
> I don't understand how it makes sense to tokenize metric names in UTF-8 mode based on the character set allowed in Prometheus legacy mode, though. I.e., treating characters that are not alphanumeric and not colons as separators doesn't make sense in UTF-8 mode. All characters should be treated the same, so long as they are valid UTF-8 (if they're invalid, the metric should be rejected).
Well, the code was initially written that way, and I was looking for the smallest change possible to fix the bug we've discovered. I believe tokens are still useful here to help us avoid duplicated unit/type suffixes in the final results, but I can agree that the term "separators" does not make sense anymore.
A redesign of the codebase sounds useful, and that's one of the things we've been discussing in the OTel-Prometheus SIG calls, right? Should we work on the redesign before committing to the fix here? Or is it ok if I just fix the bug you described above?
> Did you mean it comes out as metric.name_°C?
It's the opposite, it doesn't. Anyway, it's not an acceptance criterion, but a symptom of the UTF-8 mode doing the wrong thing, i.e. escaping valid characters.
> Well, the code was initially written that way, and I was looking for the smallest change possible to fix the bug we've discovered.
The thing is that this is the design flaw I originally pointed out when the UTF-8 mode was first proposed (and why I wouldn't approve it), and have since documented as #15534. The OTLP UTF-8 mode can't be considered to be working properly until it no longer escapes characters. It's simply the wrong approach to split the metric name into tokens.
> A redesign of the codebase sounds useful, and that's one of the things we've been discussing in the OTel-Prometheus SIG calls, right?
I'm not sure what you mean by re-design of the code base. All we need in this context is to fix the OTLP UTF-8 mode such that it no longer escapes characters. I believe it simply should not tokenize metric names, since it's going to be an unnecessary performance hit at the very least, and most likely a source of bugs.
Hey Owen (@ywwg)! This is an interesting thread Arve and Arthur are having about how incoming UTF-8 metric names should be handled in the OTLP endpoint.
I was wondering if you could chime in and share your thoughts, since you have context on how UTF-8 metric names are being handled across the Prometheus codebase for remote write, scraping, etc.
My opinion after reading through is that we should take a consistent approach so users can expect similar handling, while keeping in mind that this is OTel and past Prometheus conventions might not apply (like metric suffixes, etc.).
To clarify about metric suffixes, that is rather orthogonal to the discussion here, since it's instead controlled by the AddMetricSuffixes API parameter (not a user parameter in Prometheus, but e.g. in Grafana Mimir), and if I recall correctly the consensus at the Prometheus dev summit (Berlin?) was to keep metric suffixes even in UTF-8 mode. The discussion here is more about how to implement suffix generation in UTF-8 mode, while not modifying the base metric name.
> Imagine http.requests.seconds with s as unit. We don't want to create a metric called http.requests.seconds.seconds, so we check if the unit is already present in the token array. We could also use…
In the OTel world, isn't this metric normally declared as "http.requests" with the unit of "seconds"? Why would an OTel user put the unit in the metric name? Do we have evidence of metric names in OTel already including the type/unit, or is this purely hypothetical? And if this is not the OTel standard, why are we bending over backward to fix it?
I ask because this normalization approach is starting to smell like "too much magic" -- we are trying to detect every possible way someone might construct a metric name that would look weird in prometheus, but I don't think it's the job of prometheus to try to clean up someone else's mess.
In the case where the user is asking for prometheus normalization, we can provide guidance for how prometheus will translate names -- escape to underscores, append units, etc. Users should follow those guidelines if they want the names to appear in a consistent way.
If a user chooses to ignore that with a silly name like "http..requests+seconds++" with a unit "seconds", then we follow the normalization convention we defined, and they'll get "http..requests+seconds++_seconds" in utf-8 mode or "http__requests_seconds___seconds" with escaped mode, or maybe with underscore-reduction, "http_requests_seconds_seconds". Garbage in / garbage out.
For the original example in this thread, "metric.+name+_°C" seems like the correct result -- simply the input name + underscore + unit, no other mutations.
Regarding the original bug (#15362): this is an escaping-collision situation (prometheus/client_golang#1638), and two UTF-8 metrics that escape to the same Prometheus metric name should be handled as part of that process.
In the meantime, a consensus was reached offline with @ywwg and @ArthurSens. In UTF-8 mode, we will not try to detect pre-existing suffixes (and hence will not split up and re-join the metric name).
Can this PR be closed, since it seems to be replaced by #15664?
Yes, it can!
Fixes #15362
To solve this, I've refactored the function that splits metric names into tokens and separators. Now, the separators are inferred from the position gaps between tokens.
The fix looks easy, but I've hit a strange behavior with foreign characters: they skip positions when looping over the runes of a string, as shown in this playground: https://go.dev/play/p/qYywzhbbXcf (notice how the positions of 字 and 符 do not follow numerical order). Fixed in the second commit :)