Fix: OTLP translator now keeps multiple separators in metric names #15440
ArthurSens wants to merge 4 commits into main
Conversation
Signed-off-by: Arthur Silva Sens <[email protected]>
```diff
  require.Equal(t, "unsupported_metric_weird", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", false))
  require.Equal(t, "unsupported_metric_redundant_test_per_C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", false))
- require.Equal(t, "metric_with_foreign_characters", normalizeName(createGauge("metric_with_字符_foreign_characters", "ど"), "", false))
+ require.Equal(t, "metric_with____foreign_characters", normalizeName(createGauge("metric_with_字符_foreign_characters", "ど"), "", false))
```
The classical name translation mode, i.e. only allowing alphanumerics, colons and underscores, should not change. It should be obvious that the PR does the wrong thing when you have to change a test expectation for a case unrelated to the PR's topic.
Well, I understood that we would keep the number of characters at least and the previous behavior was a bug. The same way you did in your PR that allowed colons, you changed the test because it didn't reflect the behavior we wanted.
By keeping at least the same number of characters, I'm trying to avoid collisions like this:
| Original metric | Keeping same amount of characters (no name collision) | Keeping the current behavior (name collision) |
|---|---|---|
| http..requests..total | http__requests__total | http_requests_total |
| http.requests.total | http_requests_total | http_requests_total |
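The collision in the table above can be sketched with two hypothetical replacement strategies (illustrative only, not the actual `normalizeName` code):

```go
package main

import (
	"fmt"
	"regexp"
)

// collapseRe folds a whole run of unsupported characters into one
// underscore (the current behavior); perCharRe replaces each
// unsupported character one-for-one (the behavior proposed here).
var (
	collapseRe = regexp.MustCompile(`[^a-zA-Z0-9:]+`)
	perCharRe  = regexp.MustCompile(`[^a-zA-Z0-9:]`)
)

func collapseSeparators(name string) string {
	return collapseRe.ReplaceAllString(name, "_")
}

func keepSeparators(name string) string {
	return perCharRe.ReplaceAllString(name, "_")
}

func main() {
	// With collapsing, two distinct OTLP names map to the same result:
	fmt.Println(collapseSeparators("http..requests..total")) // http_requests_total
	fmt.Println(collapseSeparators("http.requests.total"))   // http_requests_total
	// One underscore per character preserves the distinction:
	fmt.Println(keepSeparators("http..requests..total")) // http__requests__total
	fmt.Println(keepSeparators("http.requests.total"))   // http_requests_total
}
```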
I have no strong opinions though, I can see how this would be considered a breaking change, but we could also call this a bug fix 🤔
> Well, I understood that we would keep the number of characters at least and the previous behavior was a bug.
When was that ever discussed? The bug that was pointed out by me is that the new UTF8 translation scheme translates some UTF8 characters to underscores, even though the scheme promises to not escape any UTF8 characters: "NoUTF8EscapingWithSuffixes will keep UTF-8 characters as they are". Please keep this PR on topic with that particular bug, and don't change the classical translation mode.
> The same way you did in your PR that allowed colons, you changed the test because it didn't reflect the behavior we wanted.
It's not the same because, A) it was on topic for the PR, B) not allowing colons in Prometheus metric names was a bug since the Prometheus metric name specification allows for colons.
If you think it's a bug that the classical translation mode folds multiple unsupported characters into a single underscore, please make a dedicated issue that can be discussed separately.
Ok, no problem, I'll update the PR soon.
storage/remote/otlptranslator/prometheus/helpers_from_stdlib.go
Seems like this is a candidate for a 3.0.1 release? If so, let's target the release-3.0 branch.
When is 3.0.1 scheduled? It looks like we're still discussing what we're trying to fix here. I wouldn't block the release for this fix since it's a rare scenario. With that said, if we end up finding consensus quickly, I'll rebase the PR on top of release-3.0 :)
```go
t.Run("allow UTF8", func(t *testing.T) {
	require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", true))
	require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", true))
	require.Equal(t, "unsupported.metric.+weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.+weird", "+=.:,!* & #"), "", true))
```
@aknuds1, is this the test you wanted to see?
aknuds1 left a comment
I tested normalizeName with the OTLP metric name "metric.+name+" and unit "°C", and the result comes out as "metric.name+°C" instead of "metric.name+_°C".
Also, please see my comment about the logic that I don't think makes sense for UTF-8.
```diff
@@ -130,9 +130,9 @@ func normalizeName(metric pmetric.Metric, namespace string, allowUTF8 bool) stri
	translationFunc = func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != ':' }
```
I think treating other letters than alphanumerics and colon specially doesn't make any sense in UTF-8 mode, since Prometheus in UTF-8 mode allows all valid UTF-8 characters. There must be a better way of doing this. Additionally, needlessly tokenizing metric names in UTF-8 mode is a performance hit, that's best avoided.
I think we need to tokenize because we don't want to repeat units in the metric name.
Imagine `http.requests.seconds` with `s` as unit. We don't want to create a metric called `http.requests.seconds.seconds`, so we check if the unit is already present in the token array. We could also use `strings.Contains()`; I'm just not sure which option is better 🤔
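A minimal sketch of the dedup idea described above, using a hypothetical `addUnitSuffix` helper (not the actual translator code): tokenize the name and only append the unit if it isn't already the final token.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// addUnitSuffix appends the unit to the metric name unless the unit
// is already the last token, to avoid names like
// "http.requests.seconds.seconds".
func addUnitSuffix(name, unit string) string {
	tokens := strings.FieldsFunc(name, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	if len(tokens) > 0 && tokens[len(tokens)-1] == unit {
		return name // unit already present as the final token
	}
	return name + "_" + unit
}

func main() {
	fmt.Println(addUnitSuffix("http.requests.seconds", "seconds")) // http.requests.seconds
	fmt.Println(addUnitSuffix("http.requests", "seconds"))         // http.requests_seconds
}
```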
I don't understand how it makes sense to tokenize metric names in UTF-8 mode based on the character set allowed in Prometheus legacy mode, though. I.e., treating characters that are not alphanumeric and not colons as separators doesn't make sense in UTF-8 mode. All characters should be treated the same, so long as they are valid UTF-8 (if they're invalid, the metric should be rejected).
I think that instead of tokenizing metric names based on legacy mode rules, there should be some sort of solution for determining whether a suffix should be added or not. Maybe pattern matching? In any case, it's a completely separate problem from escaping characters, which is what the non-UTF-8 mode has to do (implying that UTF-8 mode should not do this).
Fair enough!
I'm a bit confused about your acceptance criteria for this PR, though.
> I tested normalizeName with the OTLP metric name "metric.+name+" and unit "°C", and the result comes out as "metric.name+°C" instead of "metric.name+_°C".
Did you mean it comes out as metric.name_°C? The second option, metric.name+_°C, does look correct anyway; that's a good catch and can easily be fixed! I'll update the PR with the fix.
> I don't understand how it makes sense to tokenize metric names in UTF-8 mode based on the character set allowed in Prometheus legacy mode, though. I.e., treating characters that are not alphanumeric and not colons as separators doesn't make sense in UTF-8 mode. All characters should be treated the same, so long as they are valid UTF-8 (if they're invalid, the metric should be rejected).
Well, the code was initially written that way, and I was looking for the smallest change possible to fix the bug we've discovered. I believe tokens are still useful here to help us avoid duplicated unit/type suffixes in the final results, but I can agree that the term "separators" does not make sense anymore.
A redesign of the codebase sounds useful, and that's one of the things we've been discussing in the OTel-Prometheus SIG calls, right? Should we work on the redesign before committing to the fix here? Or is it ok if I just fix the bug you described above?
> Did you mean it comes out as metric.name_°C?
It's the opposite, it doesn't. Anyway, it's not an acceptance criterion, but a symptom of the UTF-8 mode doing the wrong thing, i.e. escaping valid characters.
> Well, the code was initially written that way, and I was looking for the smallest change possible to fix the bug we've discovered.
The thing is that this is the design flaw I originally pointed out when the UTF-8 mode was first proposed (and why I wouldn't approve it), and have since documented as #15534. The OTLP UTF-8 mode can't be considered to be working properly until it no longer escapes characters. It's simply the wrong approach to split the metric name into tokens.
> A redesign of the codebase sounds useful, and that's one of the things we've been discussing in the OTel-Prometheus SIG calls, right?
I'm not sure what you mean by re-design of the code base. All we need in this context is to fix the OTLP UTF-8 mode such that it no longer escapes characters. I believe it simply should not tokenize metric names, since it's going to be an unnecessary performance hit at the very least, and most likely a source of bugs.
Hey Owen (@ywwg)! This is an interesting thread Arve and Arthur are having about how incoming UTF-8 metric names should be handled in the OTLP endpoint.
I was wondering if you could chime in and share your thoughts, since you have context on how UTF-8 metric names are being handled across the Prometheus codebase for remote write, scraping, etc.
My opinion after reading through is that we should take a consistent approach so users can expect similar handling, while keeping in mind that this is OTel and past Prometheus conventions might not apply (like metric suffixes, etc.).
To clarify about metric suffixes, that is rather orthogonal to the discussion here, since it's instead controlled by the AddMetricSuffixes API parameter (not a user parameter in Prometheus, but e.g. in Grafana Mimir), and if I recall correctly the consensus at the Prometheus dev summit (Berlin?) was to keep metric suffixes even in UTF-8 mode. The discussion here is more about how to implement suffix generation in UTF-8 mode, while not modifying the base metric name.
> Imagine http.requests.seconds with s as unit. We don't want to create a metric called http.requests.seconds.seconds, so we check if the unit is already present in the token array. We could also use…
In the OTel world, isn't this metric normally declared as "http.requests" with the unit of "seconds"? Why would an OTel user put the unit in the metric name? Do we have evidence of metric names in OTel already including the type/unit, or is this purely hypothetical? And if this is not the OTel standard, why are we bending over backward to fix it?
I ask because this normalization approach is starting to smell like "too much magic" -- we are trying to detect every possible way someone might construct a metric name that would look weird in prometheus, but I don't think it's the job of prometheus to try to clean up someone else's mess.
In the case where the user is asking for prometheus normalization, we can provide guidance for how prometheus will translate names -- escape to underscores, append units, etc. Users should follow those guidelines if they want the names to appear in a consistent way.
If a user chooses to ignore that with a silly name like "http..requests+seconds++" with a unit "seconds", then we follow the normalization convention we defined, and they'll get "http..requests+seconds++_seconds" in utf-8 mode or "http__requests_seconds___seconds" with escaped mode, or maybe with underscore-reduction, "http_requests_seconds_seconds". Garbage in / garbage out.
For the original example in this thread, "metric.+name+_°C" seems like the correct result -- simply the input name + underscore + unit, no other mutations.
Regarding the original bug (#15362): this is an escaping-collision situation (prometheus/client_golang#1638), and two UTF-8 metrics that escape to the same Prometheus metric name should be handled as part of that process.
In the meantime, a consensus was reached offline with @ywwg and @ArthurSens. In UTF-8 mode, we will not try to detect pre-existing suffixes (and hence will not split up and re-join the metric name).
Can this PR be closed, since it seems to be replaced by #15664?
Yes, it can!
Fixes #15362
To solve this, I've refactored the function that splits metric names into tokens and separators. Now, the separators are inferred from the position gaps between tokens.
The fix looks easy, but I've hit a strange behavior with foreign characters: they skip positions when looping over the runes of a string, as shown in this playground: https://go.dev/play/p/qYywzhbbXcf (notice how the positions of 字 and 符 do not follow numerical order). Fixed in the second commit :)