feat(utf8): utf-8 content negotiation and flags by ywwg · Pull Request #14490 · prometheus/prometheus

ywwg · 2024-07-18T18:22:17Z

Implement content negotiation and flags for UTF-8 support in Prometheus.

ywwg · 2024-07-18T18:58:09Z

prompb/io/prometheus/write/v2/types.proto

+  // If true, metric names and labels can use the full UTF-8 character set.
+  // Otherwise, apply the legacy naming convention.
+  // XXXX do we even need this?  content negotiation should take care of it
+  bool utf8_names = 6;


pretty sure I can drop this completely but worth checking

This ideally has to be discussed in PRW 2.0 forum ideally, we want to make sure the spec is and this is source of truth 🙃 Do we need this? cc @cstyan

In the spec rw2.0 we already allow for utf8 characters in names and values, and just have SHOULD for the current prometheus name format. Don't think we need this in the proto for 2.0.

Do we need this for remote write? Can we parse everything and reject series based on the feature flag?

the content negotiation should resolve whether the metrics will be sent as UTF-8 or escaped. This would cover the case where the remote server could support UTF-8 but has that feature turned off for some reason, and the sender has UTF-8 metrics it is trying to send.

Alternatively we could just have the remote write fail validation and reject the incoming series

beorn7 · 2024-07-23T14:25:04Z

This in my review queue (but the queue is long, so if somebody beats me to it, even better).

bwplotka · 2024-08-01T16:02:50Z

config/config.go

 	// 0 means no limit.
 	KeepDroppedTargets uint `yaml:"keep_dropped_targets,omitempty"`
+	// Allow UTF8 Metric and Label Names
+	AllowUTF8Names bool `yaml:"utf8_names,omitempty"`


Hm, wonder if "escaping scheme" is not a better config option? Ideally we avoid bools and inflexibility of it.

so an escaping scheme would imply utf8 names being accepted? I think that would be ok

Escaping schema is something different from allowing utf8. I was confused initially as well until I checked out common/model. See this; allowing utf8 is a validation schema and has only two validation schemas (utf8 or legacy). I think we can keep it boolean for now.

I just realised that this is a config that users set and will be a breaking change if we choose to change it. Maybe boolean indeed is not the best. How about we have it as metric_name_validation_schema with legacy and utf8 as two options? The code later can choose to convert into boolean for using it for checks.

so an escaping scheme would imply utf8 names being accepted? I think that would be ok

I would rather be explicit and not change validation schema based on escaping schema at the moment.

ok sounds good

codesome · 2024-08-01T20:16:35Z

prompb/io/prometheus/write/v2/types.proto

+  // If true, metric names and labels can use the full UTF-8 character set.
+  // Otherwise, apply the legacy naming convention.
+  // XXXX do we even need this?  content negotiation should take care of it
+  bool utf8_names = 6;


Do we need this for remote write? Can we parse everything and reject series based on the feature flag?

codesome · 2024-08-01T20:30:15Z

config/config.go

 	// 0 means no limit.
 	KeepDroppedTargets uint `yaml:"keep_dropped_targets,omitempty"`
+	// Allow UTF8 Metric and Label Names
+	AllowUTF8Names bool `yaml:"utf8_names,omitempty"`


Escaping schema is something different from allowing utf8. I was confused initially as well until I checked out common/model. See this; allowing utf8 is a validation schema and has only two validation schemas (utf8 or legacy). I think we can keep it boolean for now.

codesome · 2024-08-01T20:32:41Z

cmd/prometheus/main.go

 		Hidden().Default("5s").SetValue(&cfg.scrape.DiscoveryReloadInterval)

-	a.Flag("enable-feature", "Comma separated feature names to enable. Valid options: agent, auto-gomemlimit, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-per-step-stats, promql-experimental-functions, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver, created-timestamp-zero-ingestion, concurrent-rule-eval. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details.").
+	a.Flag("scrape.name-escaping-scheme", `Method for escaping legacy invalid names when sending to a Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots")`).Default(scrape.DefaultNameEscapingScheme.String()).StringVar(&cfg.nameEscapingScheme)


Are these documented somewhere in the docs? Maybe a good idea to brief each option in the help text as well if possible.

I should have left this PR as "WIP" -- now that we have a consensus on how to name / arrange the flags, I will create docs for them

added some docs, please let me know if I should add more

codesome · 2024-08-01T20:36:51Z

scrape/scrape_test.go

+	runTest(acceptHeader(config.DefaultProtoFirstScrapeProtocols, false))
+}
+
+func TestUTF8TargetScraperScrapeOK(t *testing.T) {


I suggest merging this test with TestTargetScraperScrapeOK if possible. You can use booleans similar to protobufParsing in TestTargetScraperScrapeOK to determine if you should have the utf8 header or not when checking.

codesome · 2024-08-01T20:46:01Z

Question: do we ignore escaping schemas if we set validation schema to utf8?

ywwg · 2024-08-05T14:19:41Z

Question: do we ignore escaping schemas if we set validation schema to utf8?

If both sides of the negotiation have utf8 support enabled, then there will be no escaping, and yes the escaping setting will be ignored. The escaping mode choice is purely for the case where the system receiving the metrics does not support utf8, but the sender (or the target being scraped) does -- it's a backwards-compatibility setting.

beorn7 · 2024-08-08T12:10:55Z

@ywwg thanks for doing this and thanks @cstyan @codesome @bwplotka for commenting.
I assume this PR has enough qualified reviewers and will focus my review activity elsewhere.
Let me know if you need anything from me here.

ywwg · 2024-08-09T14:13:10Z

I will be addressing the comments and fixing the conflicts soon

ywwg · 2024-08-13T15:40:28Z

bryan says we should have utf-8 on by default for this and for v3, so I will update the PR accordingly

ywwg · 2024-08-13T20:17:00Z

bryan says we should have utf-8 on by default for this and for v3, so I will update the PR accordingly

I am not sure the best way to do this -- maybe change the enable-feature flag to "legacy-names" as a way to turn off utf8?

beorn7 · 2024-08-14T13:49:44Z

Hmm, I get the argument "switch this on by default in v3", but isn't it dangerous to switch this on by default right now? IIRC what @bboreham said, he even emphasized that this could be seen as a breaking change from a certain PoV.

Maybe the assumption was that 2.54 is the last 2.x release? But I wouldn't bet on that right now. (Even if we release v3 at PromCon, we might want to keep separate v2 and v3 branches for a while.)

ywwg · 2024-08-14T14:00:47Z

so should I revert this latest change and then when v3 is updated, apply the flip there?

beorn7 · 2024-08-14T16:35:43Z

Yes, that would be my understanding.

ywwg · 2024-08-14T17:46:33Z

reverted defaults so it is now default-off again.

the real difference between the default is:

a blank/default value for metric_name_validation_scheme flips from "legacy" to "utf8"
enable-feature flag changes from utf8-names to legacy-names
init() function sets validation mode in common/model to utf8. (If there is a v3 branch of common we can change the default there as well)

codesome

Mostly nit comments. One step away from merging!

cmd/prometheus/main.go

config/config.go

docs/feature_flags.md

scrape/scrape.go

scrape/scrape_test.go

ywwg · 2024-08-16T13:04:25Z

Do you want me to apply suggestions all as single unsquashed configs? Or should I apply them and then rebase/squash so we still only have a single commit? I am not sure what the prometheus standard is.

ywwg · 2024-08-16T14:48:44Z

the commit history to prometheus seems to be squash, so I'll do that

codesome · 2024-08-16T16:52:52Z

Do you want me to apply suggestions all as single unsquashed configs? Or should I apply them and then rebase/squash so we still only have a single commit? I am not sure what the prometheus standard is.

Multiple commits are fine in the PR. I can squash merge the PR. We don't have a strict rule. We do a normal merge when individual commits makes sense independently in the git history. Squash merge otherwise.

codesome · 2024-08-16T20:37:48Z

docs/command-line/prometheus.md

 | <code class="text-nowrap">--query.max-concurrency</code> | Maximum number of queries executed concurrently. Use with server mode only. | `20` |
 | <code class="text-nowrap">--query.max-samples</code> | Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load more samples than this into memory, so this also limits the number of samples a query can return. Use with server mode only. | `50000000` |
-| <code class="text-nowrap">--enable-feature</code> | Comma separated feature names to enable. Valid options: agent, auto-gomemlimit, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-per-step-stats, promql-experimental-functions, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver, created-timestamp-zero-ingestion, concurrent-rule-eval, delayed-compaction. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details. |  |
+| <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots") | `values` |


Suggested change

| <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots") | `values` |

| <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots". | `values` |

codesome

LGTM with the tiny fix above

Signed-off-by: Owen Williams <[email protected]>

ywwg commented Jul 18, 2024

View reviewed changes

ywwg force-pushed the owilliams/utf8-02-upstream branch 5 times, most recently from 3310266 to 8e11c13 Compare August 1, 2024 14:33

ywwg changed the title ~~DRAFT WIP: utf8 content negotiation / flags~~ feat(utf8): utf8 content negotiation and flags Aug 1, 2024

ywwg marked this pull request as ready for review August 1, 2024 14:36

ywwg requested review from bwplotka, cstyan, roidelapluie and tomwilkie as code owners August 1, 2024 14:36

ywwg force-pushed the owilliams/utf8-02-upstream branch 3 times, most recently from 7200a78 to 8c3881c Compare August 1, 2024 14:47

bwplotka reviewed Aug 1, 2024

View reviewed changes

ywwg mentioned this pull request Aug 1, 2024

[meta] Remote Write 2.0 Spec Feedback List #14360

Closed

4 tasks

codesome reviewed Aug 1, 2024

View reviewed changes

beorn7 removed request for cstyan, roidelapluie and tomwilkie August 8, 2024 12:09

ywwg force-pushed the owilliams/utf8-02-upstream branch 2 times, most recently from 499b186 to c610d98 Compare August 12, 2024 15:39

ywwg marked this pull request as draft August 12, 2024 15:39

ywwg force-pushed the owilliams/utf8-02-upstream branch from 5c23703 to 2cb1827 Compare August 12, 2024 18:10

ywwg changed the title ~~feat(utf8): utf8 content negotiation and flags~~ feat(utf8): utf-8 content negotiation and flags Aug 13, 2024

ywwg requested a review from codesome August 13, 2024 15:40

ywwg force-pushed the owilliams/utf8-02-upstream branch from 6d2747d to 2cb1827 Compare August 14, 2024 17:43

ywwg mentioned this pull request Aug 14, 2024

chore(v3): set UTF-8 default-on #14674

Closed

ywwg force-pushed the owilliams/utf8-02-upstream branch from 2cb1827 to 0f9278d Compare August 14, 2024 18:54

codesome reviewed Aug 15, 2024

View reviewed changes

ywwg force-pushed the owilliams/utf8-02-upstream branch 2 times, most recently from f0cc4ad to 9556d70 Compare August 16, 2024 14:48

ywwg force-pushed the owilliams/utf8-02-upstream branch from 9556d70 to 3844d20 Compare August 16, 2024 17:41

ywwg requested a review from codesome August 16, 2024 19:06

codesome reviewed Aug 16, 2024

View reviewed changes

codesome previously approved these changes Aug 16, 2024

View reviewed changes

feat(utf8): utf8 content negotation and flags

9e7308d

Signed-off-by: Owen Williams <[email protected]>

ywwg dismissed codesome’s stale review via 9e7308d August 16, 2024 20:41

ywwg force-pushed the owilliams/utf8-02-upstream branch from 3844d20 to 9e7308d Compare August 16, 2024 20:41

codesome enabled auto-merge August 16, 2024 20:42

codesome approved these changes Aug 16, 2024

View reviewed changes

codesome merged commit e86e4ed into prometheus:main Aug 16, 2024

ywwg mentioned this pull request Aug 29, 2024

Epic: Implement UTF-8 character support for label and metric names #13095

Closed

40 tasks

npazosmendez mentioned this pull request Sep 9, 2024

fix(utf8): propagate validationScheme config to scraping options #14880

Merged

matthewhughes-uw mentioned this pull request Jan 22, 2025

escaping=allow-utf-8 set in Accept header when scraping even with metric_name_validation_scheme: legacy with PrometheusText1.0.0 #15857

Closed

	\| <code class="text-nowrap">--scrape.name-escaping-scheme</code> \| Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots") \| `values` \|
	\| <code class="text-nowrap">--scrape.name-escaping-scheme</code> \| Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots". \| `values` \|

Conversation

ywwg commented Jul 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

beorn7 commented Jul 23, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codesome commented Aug 1, 2024

Uh oh!

ywwg commented Aug 5, 2024

Uh oh!

beorn7 commented Aug 8, 2024

Uh oh!

ywwg commented Aug 9, 2024

Uh oh!

ywwg commented Aug 13, 2024

Uh oh!

ywwg commented Aug 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

beorn7 commented Aug 14, 2024

Uh oh!

ywwg commented Aug 14, 2024

Uh oh!

beorn7 commented Aug 14, 2024

Uh oh!

ywwg commented Aug 14, 2024

Uh oh!

codesome left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ywwg commented Aug 16, 2024

Uh oh!

ywwg commented Aug 16, 2024

Uh oh!

codesome commented Aug 16, 2024

Uh oh!

ywwg commented Jul 18, 2024 •

edited

Loading

ywwg commented Aug 13, 2024 •

edited

Loading

codesome Aug 16, 2024 •

edited

Loading