Skip to content

Allow UTF-8 characters in metric and label names as opt-in feature#15258

Merged
ArthurSens merged 4 commits intomainfrom
otlp-utf8-allowed
Nov 8, 2024
Merged

Allow UTF-8 characters in metric and label names as opt-in feature#15258
ArthurSens merged 4 commits intomainfrom
otlp-utf8-allowed

Conversation

@ArthurSens
Copy link
Copy Markdown
Member

@ArthurSens ArthurSens commented Nov 1, 2024

Work originally comes from #14974


This PR aims to unblock OTel users who want to keep their metric names unchanged when sending it to Prometheus. The decision is backed by a survey, and by dev-summits where this topic was discussed.

To keep the current behavior, we're adding a new configuration option to the config.file, so people will need to opt-in

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Nov 2, 2024

Taking a look.

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Nov 3, 2024

Could you fix conflicts/tests?

Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments.

@ArthurSens ArthurSens force-pushed the otlp-utf8-allowed branch 3 times, most recently from fab4d8b to b81eaef Compare November 4, 2024 14:27
@ArthurSens
Copy link
Copy Markdown
Member Author

Friendly ping @aknuds1, could we merge this now? :)

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Nov 7, 2024

Friendly ping @aknuds1, could we merge this now? :)

@ArthurSens I'll have to review it first, getting around to it today.

aknuds1
aknuds1 previously requested changes Nov 7, 2024
Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some more issues.


// Metric name starts with a digit? Prefix it with an underscore.
if metricName != "" && unicode.IsDigit(rune(metricName[0])) {
// Metric name starts with a digit and utf8 not allowed? Prefix it with an underscore.
Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesusvazquez do you know if this change is correct?

The mode only says it will "keep UTF-8 characters as they are". Maybe it makes sense to also drop the rule of a metric not starting a metric with a digit, not sure.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue 😱

func TestAllowUTF8(t *testing.T) {
t.Run("allow UTF8", func(t *testing.T) {
require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", true))
require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", true))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you try applying the following, it's going to fail. There's a fundamental bug in the UTF-8 handling.

Suggested change
require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", true))
require.Equal(t, "unsupported.metric.+weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.+weird", "+=.:,!* & #"), "", true))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're correct.

Just wanted to mention that although OTel allows full UTF8, the most common ones are dots and foreign characters, and this PR is focused to unblock OTel

Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should approve a PR with fundamentally broken logic. I think @jesusvazquez is also checking this out now. It can't be that difficult to fix FWIW.

@aknuds1 aknuds1 dismissed their stale review November 7, 2024 15:03

Leaving to @jesusvazquez to review from here

Signed-off-by: Arthur Silva Sens <[email protected]>
Comment on lines +100 to 116
func TestAllowUTF8KnownBugs(t *testing.T) {
// Due to historical reasons, the translator code was copied from OpenTelemetry collector codebase.
// Over there, they tried to provide means to translate metric names following Prometheus conventions that are documented here:
// https://prometheus.io/docs/practices/naming/
//
// Althogh not explicitly said, it was implied that words should be separated by a single underscore and the codebase was written
// with that in mind.
//
// Now that we're allowing OTel users to have their original names store in prometheus without any transformation, we're facing problems
// where two (or more) UTF-8 characters are being used to separate words.
// TODO(arthursens): Fix it!

// We're asserting on 'NotEqual', which proves the bug.
require.NotEqual(t, "metric....split_=+by_//utf8characters", normalizeName(createGauge("metric....split_=+by_//utf8characters", ""), "", true))
// Here we're asserting on 'Equal', showing the current behavior
require.Equal(t, "metric.split_by_utf8characters", normalizeName(createGauge("metric....split_=+by_//utf8characters", ""), "", true))
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test showcasing the known bug, along with the explanation to why we've got to this situation.

This is a super edge case since appliations using the OTel SDK follow semantic convention, and those conventions do not have metrics with repeated dots or things like that.

Could we tackle this bug in a separate PR since it's not mission critical?

Signed-off-by: Arthur Silva Sens <[email protected]>
Copy link
Copy Markdown
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM lets do a close followup on that bug in case it happens in unknown cases!

@ArthurSens ArthurSens merged commit 0399577 into main Nov 8, 2024
@ArthurSens ArthurSens deleted the otlp-utf8-allowed branch November 8, 2024 13:02
@ArthurSens
Copy link
Copy Markdown
Member Author

Issue opened, it's on my TODO list!

bwplotka pushed a commit that referenced this pull request Nov 11, 2024
…15258)

* Allow UTF-8 characters in metric and label names as opt-in feature

---------

Signed-off-by: Arthur Silva Sens <[email protected]>
@bwplotka
Copy link
Copy Markdown
Member

FYI, porting to 3.0 #15384

aknuds1 added a commit to aknuds1/prometheus that referenced this pull request Dec 10, 2024
In non-UTF-8 mode, use strings.FieldsFunc to split string into tokens,
as it was before PR prometheus#15258. This makes the code more robust and
readable.

Signed-off-by: Arve Knudsen <[email protected]>
aknuds1 added a commit that referenced this pull request Dec 11, 2024
…#15573)

In non-UTF-8 mode, use strings.FieldsFunc to split string into tokens,
as it was before PR #15258. This makes the code more robust and
readable.

Signed-off-by: Arve Knudsen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants