Fix: OTLP translator now keeps multiple separators in metric names#15440

Closed
ArthurSens wants to merge 4 commits into main from keep-multiple-separators

Conversation

Member

@ArthurSens ArthurSens commented Nov 22, 2024

Fixes #15362


To solve this, I've refactored the function that splits metric names into tokens and separators. Now, the separators are inferred from the position gaps between tokens.

The fix looks easy, but I've hit strange behavior with non-ASCII characters: they skip positions when looping over a string's runes, as shown in this playground: https://go.dev/play/p/qYywzhbbXcf. Notice how the indices of the multi-byte characters do not follow numerical order. Fixed in the second commit :)

Contributor

@aknuds1 aknuds1 left a comment


This PR changes the classical name translation mode, which should not be the case. I.e., it would introduce a bug.

  require.Equal(t, "unsupported_metric_weird", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", false))
  require.Equal(t, "unsupported_metric_redundant_test_per_C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", false))
- require.Equal(t, "metric_with_foreign_characters", normalizeName(createGauge("metric_with_字符_foreign_characters", "ど"), "", false))
+ require.Equal(t, "metric_with____foreign_characters", normalizeName(createGauge("metric_with_字符_foreign_characters", "ど"), "", false))
Contributor

@aknuds1 aknuds1 Nov 24, 2024


The classical name translation mode, i.e. only allowing alphanumerics, colons and underscores, should not change. It should be obvious that the PR does the wrong thing when you have to change a test expectation for a case unrelated to the PR's topic.

Member Author


Well, my understanding was that we would at least keep the number of characters, and that the previous behavior was a bug. It's the same as in your PR that allowed colons: you changed the test because it didn't reflect the behavior we wanted.

What I'm trying to avoid by keeping at least the same number of characters is collisions like this:

| Original metric | Keeping same number of characters (no name collision) | Keeping the current behavior (name collision) |
|---|---|---|
| http..requests..total | http__requests__total | http_requests_total |
| http.requests.total | http_requests_total | http_requests_total |

I have no strong opinions, though. I can see how this would be considered a breaking change, but we could also call it a bug fix 🤔
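The collision above can be demonstrated with a small standalone sketch (hypothetical helper names, not the translator's actual code) contrasting per-character replacement with collapsing runs of unsupported characters:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// perChar replaces every unsupported character with its own underscore,
// preserving the character count, so distinct names stay distinct.
func perChar(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z' ||
			r >= '0' && r <= '9' || r == ':' {
			return r
		}
		return '_'
	}, s)
}

// collapseRuns folds each run of unsupported characters into a single
// underscore, which can make distinct names collide.
var unsupported = regexp.MustCompile(`[^a-zA-Z0-9:]+`)

func collapseRuns(s string) string {
	return unsupported.ReplaceAllString(s, "_")
}

func main() {
	fmt.Println(perChar("http..requests..total"))      // http__requests__total
	fmt.Println(collapseRuns("http..requests..total")) // http_requests_total
	fmt.Println(collapseRuns("http.requests.total"))   // http_requests_total (collision!)
}
```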

Contributor

@aknuds1 aknuds1 Nov 25, 2024


> Well, I understood that we would keep the number of characters at least and the previous behavior was a bug.

When was that ever discussed? The bug that was pointed out by me is that the new UTF8 translation scheme translates some UTF8 characters to underscores, even though the scheme promises to not escape any UTF8 characters: "NoUTF8EscapingWithSuffixes will keep UTF-8 characters as they are". Please keep this PR on topic with that particular bug, and don't change the classical translation mode.

> The same way you did in your PR that allowed colons, you changed the test because it didn't reflect the behavior we wanted.

It's not the same because, A) it was on topic for the PR, B) not allowing colons in Prometheus metric names was a bug since the Prometheus metric name specification allows for colons.

Contributor


If you think it's a bug that the classical translation mode folds multiple unsupported characters into a single underscore, please make a dedicated issue that can be discussed separately.

Member Author


Ok, no problem, I'll update the PR soon.

@jan--f added the backport-3.0 label (Fixes that need to be backported to release 3.0) on Nov 25, 2024
Contributor

jan--f commented Nov 25, 2024

Seems like this is a candidate for a 3.0.1 release? If so, let's target the release-3.0 branch.

Member Author

ArthurSens commented Nov 25, 2024

> Seems like this is a candidate for a 3.0.1 release? If so let target the release-3.0 branch.

When is 3.0.1 scheduled? It looks like we're still discussing what we're trying to fix here. I wouldn't block the release for this fix since it's a rare scenario.

With that said, if we end up finding consensus quickly, I'll rebase the PR on top of release-3.0 :)

t.Run("allow UTF8", func(t *testing.T) {
require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", true))
require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", true))
require.Equal(t, "unsupported.metric.+weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.+weird", "+=.:,!* & #"), "", true))
Member Author


@aknuds1, is this the test you wanted to see?

Contributor


Yep, that is the one.

Contributor

@aknuds1 aknuds1 left a comment


I tested normalizeName with the OTLP metric name "metric.+name+" and unit "°C", and the result comes out as "metric.name+°C" instead of "metric.name+_°C".

Also please see my comment about the logic I don't think makes sense for UTF-8.

@@ -130,9 +130,9 @@ func normalizeName(metric pmetric.Metric, namespace string, allowUTF8 bool) stri
translationFunc = func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != ':' }
Contributor

@aknuds1 aknuds1 Dec 4, 2024


I think treating characters other than alphanumerics and colons specially doesn't make any sense in UTF-8 mode, since Prometheus in UTF-8 mode allows all valid UTF-8 characters. There must be a better way of doing this. Additionally, needlessly tokenizing metric names in UTF-8 mode is a performance hit that's best avoided.

Member Author

@ArthurSens ArthurSens Dec 9, 2024


I think we need to tokenize because we don't want to repeat units in the metric name.

Imagine http.requests.seconds with s as the unit. We don't want to create a metric called http.requests.seconds.seconds, so we check whether the unit is already present in the token array. We could also use strings.Contains(); I'm just not sure which option is better 🤔
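A rough sketch of that dedup idea (a hypothetical helper, not the translator's actual code; the real translator also expands OTel units such as "s" into Prometheus suffixes such as "seconds", which is omitted here, and it splits on all separator characters rather than just '.'):

```go
package main

import (
	"fmt"
	"strings"
)

// addUnitSuffix appends the (already-expanded) unit as a suffix unless
// the metric name's last token is already that unit, avoiding names
// like "http.requests.seconds.seconds".
func addUnitSuffix(name, unit string) string {
	tokens := strings.Split(name, ".")
	if unit == "" || tokens[len(tokens)-1] == unit {
		return name
	}
	return name + "." + unit
}

func main() {
	fmt.Println(addUnitSuffix("http.requests.seconds", "seconds")) // http.requests.seconds
	fmt.Println(addUnitSuffix("http.requests", "seconds"))         // http.requests.seconds
}
```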

Contributor

@aknuds1 aknuds1 Dec 9, 2024


I don't understand how it makes sense to tokenize metric names in UTF-8 mode based on the character set allowed in Prometheus legacy mode, though. I.e., treating characters that are not alphanumeric and not colons as separators doesn't make sense in UTF-8 mode. All characters should be treated the same, so long as they are valid UTF-8 (if they're invalid, the metric should be rejected).

I think that instead of tokenizing metric names based on legacy mode rules, there should be some sort of solution for determining whether a suffix should be added or not. Maybe pattern matching? In any case, it's a completely separate problem from escaping characters, which is what the non-UTF-8 mode has to do (implying that UTF-8 mode should not do this).

Member Author


Fair enough!

I'm a bit confused about your acceptance criteria for this PR, though.

> I tested normalizeName with the OTLP metric name "metric.+name+" and unit "°C", and the result comes out as "metric.name+°C" instead of "metric.name+_°C".

Did you mean it comes out as metric.name_°C? Either way, the second option, metric.name+_°C, does look correct; that's a good catch and can easily be fixed! I'll update the PR with the fix.

> I don't understand how it makes sense to tokenize metric names in UTF-8 mode based on the character set allowed in Prometheus legacy mode, though. I.e., treating characters that are not alphanumeric and not colons as separators doesn't make sense in UTF-8 mode. All characters should be treated the same, so long as they are valid UTF-8 (if they're invalid, the metric should be rejected).

Well, the code was initially written that way, and I was looking for the smallest possible change to fix the bug we discovered. I believe tokens are still useful here to help us avoid duplicated unit/type suffixes in the final result, but I agree that the term "separators" no longer makes sense.

A redesign of the codebase sounds useful, and that's one of the things we've been discussing in the OTel-Prometheus SIG calls, right? Should we work on the redesign before committing to the fix here? Or is it ok if I just fix the bug you described above?

Contributor

@aknuds1 aknuds1 Dec 9, 2024


> Did you mean it comes out as metric.name_°C?

It's the opposite, it doesn't. Anyway, it's not an acceptance criterion, but a symptom of the UTF-8 mode doing the wrong thing, i.e. escaping valid characters.

> Well, the code was initially written that way, and I was looking for the smallest change possible to fix the bug we've discovered.

The thing is that this is the design flaw I originally pointed out when the UTF-8 mode was originally proposed (and why I wouldn't approve it), and have documented as #15534. The OTLP UTF-8 mode can't be considered as working well before it no longer escapes characters. It's simply the wrong approach to split the metric name into tokens.

> A redesign of the codebase sounds useful, and that's one of the things we've been discussing in the OTel-Prometheus SIG calls, right?

I'm not sure what you mean by re-design of the code base. All we need in this context is to fix the OTLP UTF-8 mode such that it no longer escapes characters. I believe it simply should not tokenize metric names, since it's going to be an unnecessary performance hit at the very least, and most likely a source of bugs.

Member


Hey Owen (@ywwg)! This is an interesting thread Arve and Arthur are having about how incoming UTF-8 metric names should be handled in the OTLP endpoint.

I was wondering if you could chime in and share your thoughts, since you have context on how UTF-8 metric names are handled across the Prometheus codebase for remote write, scraping, etc.

My opinion after reading through is that we should take a consistent approach, so users can expect similar handling, while keeping in mind that this is OTel and past Prometheus conventions (like metric suffixes, etc.) might not apply.

Contributor

@aknuds1 aknuds1 Dec 10, 2024


To clarify about metric suffixes, that is rather orthogonal to the discussion here, since it's instead controlled by the AddMetricSuffixes API parameter (not a user parameter in Prometheus, but e.g. in Grafana Mimir), and if I recall correctly the consensus at the Prometheus dev summit (Berlin?) was to keep metric suffixes even in UTF-8 mode. The discussion here is more about how to implement suffix generation in UTF-8 mode, while not modifying the base metric name.

Member


> Imagine http.requests.seconds with s as unit. We don't want to create a metrics called http.requests.seconds.seconds, so we check if the unit is already present in the token array. We could also use

In the OTel world, isn't this metric normally declared as "http.requests" with a unit of "seconds"? Why would an OTel user put the unit in the metric name? Do we have evidence of OTel metric names already including the type/unit, or is this purely hypothetical? And if this is not the OTel standard, why are we bending over backward to fix it?

I ask because this normalization approach is starting to smell like "too much magic" -- we are trying to detect every possible way someone might construct a metric name that would look weird in prometheus, but I don't think it's the job of prometheus to try to clean up someone else's mess.

In the case where the user is asking for prometheus normalization, we can provide guidance for how prometheus will translate names -- escape to underscores, append units, etc. Users should follow those guidelines if they want the names to appear in a consistent way.

If a user chooses to ignore that with a silly name like "http..requests+seconds++" with a unit "seconds", then we follow the normalization convention we defined, and they'll get "http..requests+seconds++_seconds" in utf-8 mode or "http__requests_seconds___seconds" with escaped mode, or maybe with underscore-reduction, "http_requests_seconds_seconds". Garbage in / garbage out.

For the original example in this thread, "metric.+name+_°C" seems like the correct result -- simply the input name + underscore + unit, no other mutations.
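The convention described above could be sketched as follows (a hypothetical helper, not Prometheus code): in UTF-8 mode, leave the name untouched and append the unit with an underscore, skipping the suffix only when the name already ends with the unit.

```go
package main

import (
	"fmt"
	"strings"
)

// appendUnit leaves the metric name as-is and appends "_<unit>",
// with no other mutations. The suffix is skipped only when the name
// already ends with the unit.
func appendUnit(name, unit string) string {
	if unit == "" || strings.HasSuffix(name, unit) {
		return name
	}
	return name + "_" + unit
}

func main() {
	fmt.Println(appendUnit("metric.+name+", "°C"))                 // metric.+name+_°C
	fmt.Println(appendUnit("http..requests+seconds++", "seconds")) // http..requests+seconds++_seconds
}
```

Note that this matches the "garbage in / garbage out" examples above: a unit embedded mid-name still gets the suffix appended, because only a trailing match suppresses it.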

Member


Regarding the original bug (#15362): this is an escaping-collision situation (prometheus/client_golang#1638), and two UTF-8 metrics that escape to the same Prometheus metric name should be handled as part of that process.

Contributor


In the meantime, a consensus was reached offline w/ @ywwg and @ArthurSens. In UTF-8 mode, we will not try to detect pre-escaping suffixes (and hence not split up and re-join the metric name).

Contributor

aknuds1 commented Dec 17, 2024

Can this PR be closed, since it seems to be replaced by #15664?

@ArthurSens
Member Author

Yes it can!

@ArthurSens ArthurSens closed this Dec 17, 2024
@ArthurSens ArthurSens deleted the keep-multiple-separators branch December 17, 2024 17:48

Labels

area/opentelemetry · backport-3.0 (Fixes that need to be backported to release 3.0) · kind/bug


Development

Successfully merging this pull request may close these issues.

otlp/translator: Metrics with words split by multiple UTF8 character lose characters during translation

6 participants