Skip to content

feat: otlp allow not translating UTF8 characters#14974

Closed
jmichalek132 wants to merge 7 commits intoprometheus:otlp-fix-sanitizafrom
jmichalek132:jm-feat-otlp-allow-not-translating-dots
Closed

feat: otlp allow not translating UTF8 characters#14974
jmichalek132 wants to merge 7 commits intoprometheus:otlp-fix-sanitizafrom
jmichalek132:jm-feat-otlp-allow-not-translating-dots

Conversation

@jmichalek132
Copy link
Copy Markdown
Contributor

@jmichalek132 jmichalek132 commented Sep 24, 2024

config/config.go Outdated
// OTLPConfig is the configuration for writing to the OTLP endpoint.
type OTLPConfig struct {
PromoteResourceAttributes []string `yaml:"promote_resource_attributes,omitempty"`
TranslateDots bool `default:"true" yaml:"translate_dots"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as said in the meeting, let's make this an Enum so that there is future-proofing for other possible translation modes. Something like "NameConversionMode" with values of "NoConversion" and "ClassicPrometheusConverstion"

@ArthurSens
Copy link
Copy Markdown
Member

@jmichalek132 as discussed in a few calls this week, let me take over this PR so you can focus on your mentorship project :)

@aknuds1 @ywwg, I'm wondering what we want to accomplish here. There are two things on my mind:

  1. OTLP users want their metrics as is. No underscores, no units, no nothing, and we want to allow this use case.
  2. Before 3.0 Prometheus did not accept UTF8 characters by default so we had to translate UTF8 characters to underscores. Now it's a different scenario, UTF8 is accepted by default and we can accept them as they are. (No need to worry about unit suffixes)

Which of those problems are we solving here? Depending on the answer I'm thinking about two different API for the config file:

otlp:
  otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)
otlp:
  add_unit_suffixes: false (default) | true #Probably not being worked on in this PR
  utf8_to_underscore: false (default) | true

@jmichalek132
Copy link
Copy Markdown
Contributor Author

@jmichalek132 as discussed in a few calls this week, let me take over this PR so you can focus on your mentorship project :)

@aknuds1 @ywwg, I'm wondering what we want to accomplish here. There are two things on my mind:

  1. OTLP users want their metrics as is. No underscores, no units, no nothing, and we want to allow this use case.
  2. Before 3.0 Prometheus did not accept UTF8 characters by default so we had to translate UTF8 characters to underscores. Now it's a different scenario, UTF8 is accepted by default and we can accept them as they are. (No need to worry about unit suffixes)

Which of those problems are we solving here? Depending on the answer I'm thinking about two different API for the config file:

otlp:
  otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)
otlp:
  add_unit_suffixes: false (default) | true #Probably not being worked on in this PR
  utf8_to_underscore: false (default) | true

For reference linking the consensus from the Prometheus in person dev summit on this topic

https://docs.google.com/document/d/1uurQCi5iVufhYHGlBZ8mJMK_freDFKPG0iYBQqJ9fvA/edit#bookmark=id.x5wujq3bsjjo

CONSENSUS: We recommend keeping the metric names from OTel SDKs as unchanged as possible when converting to Prometheus samples as the default.
DISAGREE, but not blocking Fabian Stäber and Julius Volz would like to keep units and _total
CONSENSUS: For classic histograms we will still add _bucket, _count and _sum,similar for summaries.
CONSENSUS: We intend for OpenMetrics 2.x to support this, targeting 2.0
CONSENSUS: We will have a Prometheus mode which converts . to _ with suffixes and units.

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Sep 27, 2024

@aknuds1 @ywwg, I'm wondering what we want to accomplish here. There are two things on my mind:

@ArthurSens @jmichalek132 @ywwg given the consensus in the latest dev summit ("We want to explore ways to do type and unit annotations in PromQL"), we've concluded that we need a way to ensure time series are identified through type and unit, right? So we avoid collisions between metrics that are named the same, but have different types/units.

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Sep 27, 2024

I think the PR needs to define properly which problem is being solved. Jumping into a solution isn't the best approach.

@ArthurSens
Copy link
Copy Markdown
Member

I think the PR needs to define properly which problem is being solved. Jumping into a solution isn't the best approach.

Yes exactly! I was hoping we could come to an agreement before I continue with the PR :)

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Sep 27, 2024

Yes exactly! I was hoping we could come to an agreement before I continue with the PR :)

Agree, we need to decide on a solution to the problem laid out in OpenTelemetry → Prometheus: type+unit suffixes, right?

The agreed upon solution would have to be well defined, which I think we haven't got to yet.

@ArthurSens
Copy link
Copy Markdown
Member

ArthurSens commented Sep 27, 2024

Agree, we need to decide on a solution to the problem laid out in OpenTelemetry → Prometheus: type+unit suffixes, right?

The agreed upon solution would have to be well defined, which I think we haven't got to yet.

If that's a requirement for the work here, I'm afraid this PR will have to wait until Prometheus 4.0 :/

I don't think we'll manage to implement any of those proposals before 3.0 (scheduled for november this year as far as I could understand). So I was hoping we could find a middle ground: let OTel users have what they want, even if not the best solution.

If we release 3.0 with the current behavior (doing translations), changing the default to no translation would be a breaking change. Could we release 3.0 with no translation, as far as it's well documented in the OTLP guide that name colisions are possible and people should opt-in to ClassicPrometheusTranslation to avoid these situations? Once we have good solution for units/types, we can finally drop the warning in our guides

@ArthurSens
Copy link
Copy Markdown
Member

If the team decides that we really want to wait for proper unit/type handling, I guess I could continue the PR... just changing the default to ClassicPrometheusTranslation

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Sep 27, 2024

If that's a requirement for the work here, I'm afraid this PR will have to wait until Prometheus 4.0 :/

Could you start a discussion with @gouthamve in CNCF Slack on what his current stance is wrt. dropping OTel metric name translation? I kind of forgot what he last communicated.

@ywwg
Copy link
Copy Markdown
Member

ywwg commented Sep 27, 2024

My sense from the Otel folks is that the conversion to underscores is a pain point right now, so I'd prefer to find a way to allow the default to be no-translation. If my understanding is correct the issue of same-name-different-type has only been hypothetical so far. I think the benefit to the otel community of having names pass through unchanged outweighs the unproven possibility of collisions.

Perhaps it would be sufficient to add a note in the documentation explaining "by default no translation is done to Otel metric names. If there are multiple metrics with the same name but different types or units, these metrics will collide when queried in Prometheus. In this situation, administrators should set the translation mode to ClassicPrometheusTranslation to avoid the collisions".

As for config choices:

otlp:
  add_unit_suffixes: false (default) | true #Probably not being worked on in this PR
  utf8_to_underscore: false (default) | true

if we do this, does add_unit_suffixes always append with an underscore? imho I think it might be fine to use a dot if the user has otherwise not specified underscore conversion, e.g.: "http.status" → "http.status.total". Because of ambiguities like this, I think I'm more in favor of the enum approach:

otlp:
  otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Sep 29, 2024

@ywwg @ArthurSens in the latest dev summit, @dashpole noted the following about @gouthamve's latest thinking:

After the inperson meeting Goutham realized that it’s not as easy as initially thought. We need to account for metric name collision if we don’t handle unit and types somehow.

Doesn't this indicate we need a solution that avoids collisions (as our current solution does, through suffixes)?

@ArthurSens
Copy link
Copy Markdown
Member

After the inperson meeting Goutham realized that it’s not as easy as initially thought. We need to account for metric name collision if we don’t handle unit and types somehow.

Doesn't this indicate we need a solution that avoids collisions (as our current solution does, through suffixes)?

For the perfect scenario, yes! I was just hoping that we could release something that allows no translation before 3.0, even knowing that name collisions is a possibility, while we keep working on a long term solution that will take a few months to release.

Would you be ok if we continue this PR but change the default behavior to keep translating things? NoTranslation would be opt-in, while being very well documented that it can lead to name collision.

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Sep 30, 2024

For the perfect scenario, yes! I was just hoping that we could release something that allows no translation before 3.0, even knowing that name collisions is a possibility, while we keep working on a long term solution that will take a few months to release.

I'm kind of agnostic, I'm just trying to figure out what is the actual consensus in this sense. I've asked @gouthamve in CNCF Slack. I primarily want to avoid adding changes we end up not wanting. @dashpole's note from the last summit gives the impression we don't want a translation mode that can lead to collisions.

@jmichalek132
Copy link
Copy Markdown
Contributor Author

My sense from the Otel folks is that the conversion to underscores is a pain point right now, so I'd prefer to find a way to allow the default to be no-translation. If my understanding is correct the issue of same-name-different-type has only been hypothetical so far. I think the benefit to the otel community of having names pass through unchanged outweighs the unproven possibility of collisions.

Perhaps it would be sufficient to add a note in the documentation explaining "by default no translation is done to Otel metric names. If there are multiple metrics with the same name but different types or units, these metrics will collide when queried in Prometheus. In this situation, administrators should set the translation mode to ClassicPrometheusTranslation to avoid the collisions".

As for config choices:

otlp:
  add_unit_suffixes: false (default) | true #Probably not being worked on in this PR
  utf8_to_underscore: false (default) | true

if we do this, does add_unit_suffixes always append with an underscore? imho I think it might be fine to use a dot if the user has otherwise not specified underscore conversion, e.g.: "http.status" → "http.status.total". Because of ambiguities like this, I think I'm more in favor of the enum approach:

otlp:
  otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)

I agree with @ywwg suggestions.
Also to add my 2 cents,

@ArthurSens ArthurSens force-pushed the jm-feat-otlp-allow-not-translating-dots branch 2 times, most recently from 3af6071 to d0a868c Compare September 30, 2024 18:46
@ArthurSens ArthurSens changed the title feat: otlp allow not translating dots feat: otlp allow not translating UTF8 characters Sep 30, 2024
@ArthurSens ArthurSens force-pushed the jm-feat-otlp-allow-not-translating-dots branch from d0a868c to 4e03883 Compare September 30, 2024 18:54
@ArthurSens
Copy link
Copy Markdown
Member

I've updated the PR to not translate UTF8 characters by default. I'm also not touching unit/type suffixes in any way!

I'm happy to change the PR again to allow not adding unit/type suffixes, but I don't want to block the UTF8 translation just because we can't find an agreement on unit/type handling

@ArthurSens ArthurSens force-pushed the jm-feat-otlp-allow-not-translating-dots branch from 4e03883 to 4796f57 Compare September 30, 2024 19:02
@ArthurSens ArthurSens marked this pull request as ready for review September 30, 2024 19:18
dashpole
dashpole previously approved these changes Oct 21, 2024
type translationStrategyOption string

var (
// NoUTF8EscapingWithSuffixes will keep UTF-8 characters as they are, units and type suffixes will still be added.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the two options were going to be "no alteration at all" and "underscoreescapingwithsuffixes"?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(is this because type/unit is differentiating?)

@ywwg
Copy link
Copy Markdown
Member

ywwg commented Oct 22, 2024

I am fine with adding an experimental mode to not translate provided we make it clear that this is experimental and can break in the future. Should we do it via a feature-flag or not needs to be decided.

context for why I thought we were also going to have a no-translation mode with caveats

@ywwg
Copy link
Copy Markdown
Member

ywwg commented Oct 23, 2024

Per our discussion, because units are differentiating and prometheus does not support otel unit metadata well, it makes sense for prom's otel endpoint to add suffixes in all available modes. In the future when prom does support unit metadata, we can revisit the enum and add a mode to pass names unaltered.

ywwg
ywwg previously approved these changes Oct 23, 2024
}

// Replace all non-alphanumeric runes with underscores
label = strings.Map(sanitizeRune, label)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you no longer use this method (for the non-allowUTF8 case), and why do call model.EscapeName(label, model.UnderscoreEscaping) (assuming this is supposed to have the same effect?) after potentially prefixing with key?

I basically don't understand why we would change the old mode (i.e. where we convert UTF8 chars to underscores).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using model.EscapeName as a suggestion from @ywwg!

The previous function isn't working as expected, it translates characters to underscores if unicode.IsLetter(r) || unicode.IsDigit(r) is not true.

This is problematic because:

It looks impractical to concatenate every foreign character in the sanitizeRunes conditional, so I'm just using the same function used in prometheus/common like owen suggested :)

Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using model.EscapeName as a suggestion from @ywwg!

If we are making changes unrelated to the PR's topic (add un-escaped UTF8 mode), it should be a separate PR. I want to see explicitly what the effect is from using model.EscapeName instead.

Copy link
Copy Markdown
Member

@ArthurSens ArthurSens Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was now rebased on top of #15249, so we need that one merged first

Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think tests should loop over true/false for allowUTF8, there should instead be one sub-test for each mode.

Also see my previous question regarding NormalizeLabel.

Comment on lines +86 to +96
for _, allowUTF8 := range []bool{true, false} {
if allowUTF8 {
require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", allowUTF8))
require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", allowUTF8))
require.Equal(t, "unsupported.metric.redundant___test $_per_°C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", allowUTF8))
} else {
require.Equal(t, "unsupported_metric_temperature_F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", allowUTF8))
require.Equal(t, "unsupported_metric_weird", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", allowUTF8))
require.Equal(t, "unsupported_metric_redundant_test_per_C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", allowUTF8))
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be a loop.

Suggested change
for _, allowUTF8 := range []bool{true, false} {
if allowUTF8 {
require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", allowUTF8))
require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", allowUTF8))
require.Equal(t, "unsupported.metric.redundant___test $_per_°C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", allowUTF8))
} else {
require.Equal(t, "unsupported_metric_temperature_F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", allowUTF8))
require.Equal(t, "unsupported_metric_weird", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", allowUTF8))
require.Equal(t, "unsupported_metric_redundant_test_per_C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", allowUTF8))
}
}
t.Run("allow UTF8", func(t *testing.T) {
require.Equal(t, "unsupported.metric.temperature_°F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", allowUTF8))
require.Equal(t, "unsupported.metric.weird_+=.:,!* & #", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", allowUTF8))
require.Equal(t, "unsupported.metric.redundant___test $_per_°C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", allowUTF8))
})
t.Run("disallow UTF8", func(t *testing.T) {
require.Equal(t, "unsupported_metric_temperature_F", normalizeName(createGauge("unsupported.metric.temperature", "°F"), "", allowUTF8))
require.Equal(t, "unsupported_metric_weird", normalizeName(createGauge("unsupported.metric.weird", "+=.:,!* & #"), "", allowUTF8))
require.Equal(t, "unsupported_metric_redundant_test_per_C", normalizeName(createGauge("unsupported.metric.redundant", "__test $/°C"), "", allowUTF8))
})

annots, err := converter.FromMetrics(r.Context(), req.Metrics(), otlptranslator.Settings{
AddMetricSuffixes: true,
PromoteResourceAttributes: otlpCfg.PromoteResourceAttributes,
AllowUTF8: (model.NameValidationScheme == model.UTF8Validation) && (otlpCfg.TranslationStrategy == config.NoUTF8EscapingWithSuffixes),
Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It concerns me that if model.NameValidationScheme is not model.UTF8Validation, the NoUTF8EscapingWithSuffixes translation strategy is silently ignored. How about we instead log a warning if model.NameValidationScheme != model.UTF8Validation && otlpCfg.TranslationStrategy == config.NoUTF8EscapingWithSuffixes, while also setting AllowUTF8: false?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

I think we can warn that while loading the configuration

Copy link
Copy Markdown
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In retrospect, I don't think CleanUpString should change. Just don't use it when allowing UTF8.

@ArthurSens ArthurSens dismissed stale reviews from ywwg and dashpole via e1158ca October 30, 2024 21:26
@ArthurSens ArthurSens force-pushed the jm-feat-otlp-allow-not-translating-dots branch from edf84a8 to e1158ca Compare October 30, 2024 21:26
@ArthurSens ArthurSens changed the base branch from main to otlp-fix-sanitiza October 30, 2024 21:28
@ArthurSens ArthurSens force-pushed the otlp-fix-sanitiza branch 3 times, most recently from 8df09f3 to ea06f1a Compare October 30, 2024 22:29
@ArthurSens
Copy link
Copy Markdown
Member

Folks, I've messed up my commits here 😓. Let's get this one merged first and I'll figure out what to do here later

@ArthurSens ArthurSens deleted the branch prometheus:otlp-fix-sanitiza October 31, 2024 22:05
@ArthurSens ArthurSens closed this Oct 31, 2024
@ArthurSens
Copy link
Copy Markdown
Member

Lol ok, after merging the chained PR the branch got automatically deleted and messed up this PR. I'll open another one, sorry everyone

@jmichalek132 jmichalek132 deleted the jm-feat-otlp-allow-not-translating-dots branch February 15, 2025 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants