Skip to content

feat(utf8): utf-8 content negotiation and flags#14490

Merged
codesome merged 1 commit intoprometheus:mainfrom
ywwg:owilliams/utf8-02-upstream
Aug 16, 2024
Merged

feat(utf8): utf-8 content negotiation and flags#14490
codesome merged 1 commit intoprometheus:mainfrom
ywwg:owilliams/utf8-02-upstream

Conversation

@ywwg
Copy link
Copy Markdown
Member

@ywwg ywwg commented Jul 18, 2024

Part of #13095

Implement content negotiation and flags for UTF-8 support in Prometheus.

// If true, metric names and labels can use the full UTF-8 character set.
// Otherwise, apply the legacy naming convention.
// XXXX do we even need this? content negotiation should take care of it
bool utf8_names = 6;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure I can drop this completely but worth checking

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ideally has to be discussed in PRW 2.0 forum ideally, we want to make sure the spec is and this is source of truth 🙃 Do we need this? cc @cstyan

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spec rw2.0 we already allow for utf8 characters in names and values, and just have SHOULD for the current prometheus name format. Don't think we need this in the proto for 2.0.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this for remote write? Can we parse everything and reject series based on the feature flag?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the content negotiation should resolve whether the metrics will be sent as UTF-8 or escaped. This would cover the case where the remote server could support UTF-8 but has that feature turned off for some reason, and the sender has UTF-8 metrics it is trying to send.

Alternatively we could just have the remote write fail validation and reject the incoming series

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Jul 23, 2024

This in my review queue (but the queue is long, so if somebody beats me to it, even better).

@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch 5 times, most recently from 3310266 to 8e11c13 Compare August 1, 2024 14:33
@ywwg ywwg changed the title DRAFT WIP: utf8 content negotiation / flags feat(utf8): utf8 content negotiation and flags Aug 1, 2024
@ywwg ywwg marked this pull request as ready for review August 1, 2024 14:36
@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch 3 times, most recently from 7200a78 to 8c3881c Compare August 1, 2024 14:47
config/config.go Outdated
// 0 means no limit.
KeepDroppedTargets uint `yaml:"keep_dropped_targets,omitempty"`
// Allow UTF8 Metric and Label Names
AllowUTF8Names bool `yaml:"utf8_names,omitempty"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, wonder if "escaping scheme" is not a better config option? Ideally we avoid bools and inflexibility of it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so an escaping scheme would imply utf8 names being accepted? I think that would be ok

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaping schema is something different from allowing utf8. I was confused initially as well until I checked out common/model. See this; allowing utf8 is a validation schema and has only two validation schemas (utf8 or legacy). I think we can keep it boolean for now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised that this is a config that users set and will be a breaking change if we choose to change it. Maybe boolean indeed is not the best. How about we have it as metric_name_validation_schema with legacy and utf8 as two options? The code later can choose to convert into boolean for using it for checks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so an escaping scheme would imply utf8 names being accepted? I think that would be ok

I would rather be explicit and not change validation schema based on escaping schema at the moment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sounds good

// If true, metric names and labels can use the full UTF-8 character set.
// Otherwise, apply the legacy naming convention.
// XXXX do we even need this? content negotiation should take care of it
bool utf8_names = 6;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this for remote write? Can we parse everything and reject series based on the feature flag?

config/config.go Outdated
// 0 means no limit.
KeepDroppedTargets uint `yaml:"keep_dropped_targets,omitempty"`
// Allow UTF8 Metric and Label Names
AllowUTF8Names bool `yaml:"utf8_names,omitempty"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaping schema is something different from allowing utf8. I was confused initially as well until I checked out common/model. See this; allowing utf8 is a validation schema and has only two validation schemas (utf8 or legacy). I think we can keep it boolean for now.

Hidden().Default("5s").SetValue(&cfg.scrape.DiscoveryReloadInterval)

a.Flag("enable-feature", "Comma separated feature names to enable. Valid options: agent, auto-gomemlimit, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-per-step-stats, promql-experimental-functions, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver, created-timestamp-zero-ingestion, concurrent-rule-eval. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details.").
a.Flag("scrape.name-escaping-scheme", `Method for escaping legacy invalid names when sending to a Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots")`).Default(scrape.DefaultNameEscapingScheme.String()).StringVar(&cfg.nameEscapingScheme)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these documented somewhere in the docs? Maybe a good idea to brief each option in the help text as well if possible.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have left this PR as "WIP" -- now that we have a consensus on how to name / arrange the flags, I will create docs for them

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some docs, please let me know if I should add more

runTest(acceptHeader(config.DefaultProtoFirstScrapeProtocols, false))
}

func TestUTF8TargetScraperScrapeOK(t *testing.T) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest merging this test with TestTargetScraperScrapeOK if possible. You can use booleans similar to protobufParsing in TestTargetScraperScrapeOK to determine if you should have the utf8 header or not when checking.

@codesome
Copy link
Copy Markdown
Member

codesome commented Aug 1, 2024

Question: do we ignore escaping schemas if we set validation schema to utf8?

@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 5, 2024

Question: do we ignore escaping schemas if we set validation schema to utf8?

If both sides of the negotiation have utf8 support enabled, then there will be no escaping, and yes the escaping setting will be ignored. The escaping mode choice is purely for the case where the system receiving the metrics does not support utf8, but the sender (or the target being scraped) does -- it's a backwards-compatibility setting.

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Aug 8, 2024

@ywwg thanks for doing this and thanks @cstyan @codesome @bwplotka for commenting.
I assume this PR has enough qualified reviewers and will focus my review activity elsewhere.
Let me know if you need anything from me here.

@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 9, 2024

I will be addressing the comments and fixing the conflicts soon

@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch 2 times, most recently from 499b186 to c610d98 Compare August 12, 2024 15:39
@ywwg ywwg marked this pull request as draft August 12, 2024 15:39
@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch from 5c23703 to 2cb1827 Compare August 12, 2024 18:10
@ywwg ywwg changed the title feat(utf8): utf8 content negotiation and flags feat(utf8): utf-8 content negotiation and flags Aug 13, 2024
@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 13, 2024

bryan says we should have utf-8 on by default for this and for v3, so I will update the PR accordingly

@ywwg ywwg requested a review from codesome August 13, 2024 15:40
@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 13, 2024

bryan says we should have utf-8 on by default for this and for v3, so I will update the PR accordingly

I am not sure the best way to do this -- maybe change the enable-feature flag to "legacy-names" as a way to turn off utf8?

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Aug 14, 2024

Hmm, I get the argument "switch this on by default in v3", but isn't it dangerous to switch this on by default right now? IIRC what @bboreham said, he even emphasized that this could be seen as a breaking change from a certain PoV.

Maybe the assumption was that 2.54 is the last 2.x release? But I wouldn't bet on that right now. (Even if we release v3 at PromCon, we might want to keep separate v2 and v3 branches for a while.)

@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 14, 2024

so should I revert this latest change and then when v3 is updated, apply the flip there?

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Aug 14, 2024

Yes, that would be my understanding.

@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch from 6d2747d to 2cb1827 Compare August 14, 2024 17:43
@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 14, 2024

reverted defaults so it is now default-off again.

the real difference between the default is:

  1. a blank/default value for metric_name_validation_scheme flips from "legacy" to "utf8"
  2. enable-feature flag changes from utf8-names to legacy-names
  3. init() function sets validation mode in common/model to utf8. (If there is a v3 branch of common we can change the default there as well)

@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch from 2cb1827 to 0f9278d Compare August 14, 2024 18:54
Copy link
Copy Markdown
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nit comments. One step away from merging!

@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 16, 2024

Do you want me to apply suggestions all as single unsquashed configs? Or should I apply them and then rebase/squash so we still only have a single commit? I am not sure what the prometheus standard is.

@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch 2 times, most recently from f0cc4ad to 9556d70 Compare August 16, 2024 14:48
@ywwg
Copy link
Copy Markdown
Member Author

ywwg commented Aug 16, 2024

the commit history to prometheus seems to be squash, so I'll do that

@codesome
Copy link
Copy Markdown
Member

Do you want me to apply suggestions all as single unsquashed configs? Or should I apply them and then rebase/squash so we still only have a single commit? I am not sure what the prometheus standard is.

Multiple commits are fine in the PR. I can squash merge the PR. We don't have a strict rule. We do a normal merge when individual commits makes sense independently in the git history. Squash merge otherwise.

@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch from 9556d70 to 3844d20 Compare August 16, 2024 17:41
@ywwg ywwg requested a review from codesome August 16, 2024 19:06
| <code class="text-nowrap">--query.max-concurrency</code> | Maximum number of queries executed concurrently. Use with server mode only. | `20` |
| <code class="text-nowrap">--query.max-samples</code> | Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load more samples than this into memory, so this also limits the number of samples a query can return. Use with server mode only. | `50000000` |
| <code class="text-nowrap">--enable-feature</code> | Comma separated feature names to enable. Valid options: agent, auto-gomemlimit, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-per-step-stats, promql-experimental-functions, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver, created-timestamp-zero-ingestion, concurrent-rule-eval, delayed-compaction. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details. | |
| <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots") | `values` |
Copy link
Copy Markdown
Member

@codesome codesome Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots") | `values` |
| <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots". | `values` |

codesome
codesome previously approved these changes Aug 16, 2024
Copy link
Copy Markdown
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the tiny fix above

@ywwg ywwg force-pushed the owilliams/utf8-02-upstream branch from 3844d20 to 9e7308d Compare August 16, 2024 20:41
@codesome codesome enabled auto-merge August 16, 2024 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants