feat(utf8): utf-8 content negotiation and flags#14490
feat(utf8): utf-8 content negotiation and flags#14490codesome merged 1 commit intoprometheus:mainfrom
Conversation
| // If true, metric names and labels can use the full UTF-8 character set. | ||
| // Otherwise, apply the legacy naming convention. | ||
| // XXXX do we even need this? content negotiation should take care of it | ||
| bool utf8_names = 6; |
There was a problem hiding this comment.
pretty sure I can drop this completely but worth checking
There was a problem hiding this comment.
This ideally has to be discussed in PRW 2.0 forum ideally, we want to make sure the spec is and this is source of truth 🙃 Do we need this? cc @cstyan
There was a problem hiding this comment.
In the spec rw2.0 we already allow for utf8 characters in names and values, and just have SHOULD for the current prometheus name format. Don't think we need this in the proto for 2.0.
There was a problem hiding this comment.
Do we need this for remote write? Can we parse everything and reject series based on the feature flag?
There was a problem hiding this comment.
the content negotiation should resolve whether the metrics will be sent as UTF-8 or escaped. This would cover the case where the remote server could support UTF-8 but has that feature turned off for some reason, and the sender has UTF-8 metrics it is trying to send.
Alternatively we could just have the remote write fail validation and reject the incoming series
|
This in my review queue (but the queue is long, so if somebody beats me to it, even better). |
3310266 to
8e11c13
Compare
7200a78 to
8c3881c
Compare
config/config.go
Outdated
| // 0 means no limit. | ||
| KeepDroppedTargets uint `yaml:"keep_dropped_targets,omitempty"` | ||
| // Allow UTF8 Metric and Label Names | ||
| AllowUTF8Names bool `yaml:"utf8_names,omitempty"` |
There was a problem hiding this comment.
Hm, wonder if "escaping scheme" is not a better config option? Ideally we avoid bools and inflexibility of it.
There was a problem hiding this comment.
so an escaping scheme would imply utf8 names being accepted? I think that would be ok
There was a problem hiding this comment.
Escaping schema is something different from allowing utf8. I was confused initially as well until I checked out common/model. See this; allowing utf8 is a validation schema and has only two validation schemas (utf8 or legacy). I think we can keep it boolean for now.
There was a problem hiding this comment.
I just realised that this is a config that users set and will be a breaking change if we choose to change it. Maybe boolean indeed is not the best. How about we have it as metric_name_validation_schema with legacy and utf8 as two options? The code later can choose to convert into boolean for using it for checks.
There was a problem hiding this comment.
so an escaping scheme would imply utf8 names being accepted? I think that would be ok
I would rather be explicit and not change validation schema based on escaping schema at the moment.
| // If true, metric names and labels can use the full UTF-8 character set. | ||
| // Otherwise, apply the legacy naming convention. | ||
| // XXXX do we even need this? content negotiation should take care of it | ||
| bool utf8_names = 6; |
There was a problem hiding this comment.
Do we need this for remote write? Can we parse everything and reject series based on the feature flag?
config/config.go
Outdated
| // 0 means no limit. | ||
| KeepDroppedTargets uint `yaml:"keep_dropped_targets,omitempty"` | ||
| // Allow UTF8 Metric and Label Names | ||
| AllowUTF8Names bool `yaml:"utf8_names,omitempty"` |
There was a problem hiding this comment.
Escaping schema is something different from allowing utf8. I was confused initially as well until I checked out common/model. See this; allowing utf8 is a validation schema and has only two validation schemas (utf8 or legacy). I think we can keep it boolean for now.
cmd/prometheus/main.go
Outdated
| Hidden().Default("5s").SetValue(&cfg.scrape.DiscoveryReloadInterval) | ||
|
|
||
| a.Flag("enable-feature", "Comma separated feature names to enable. Valid options: agent, auto-gomemlimit, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-per-step-stats, promql-experimental-functions, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver, created-timestamp-zero-ingestion, concurrent-rule-eval. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details."). | ||
| a.Flag("scrape.name-escaping-scheme", `Method for escaping legacy invalid names when sending to a Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots")`).Default(scrape.DefaultNameEscapingScheme.String()).StringVar(&cfg.nameEscapingScheme) |
There was a problem hiding this comment.
Are these documented somewhere in the docs? Maybe a good idea to brief each option in the help text as well if possible.
There was a problem hiding this comment.
I should have left this PR as "WIP" -- now that we have a consensus on how to name / arrange the flags, I will create docs for them
There was a problem hiding this comment.
added some docs, please let me know if I should add more
scrape/scrape_test.go
Outdated
| runTest(acceptHeader(config.DefaultProtoFirstScrapeProtocols, false)) | ||
| } | ||
|
|
||
| func TestUTF8TargetScraperScrapeOK(t *testing.T) { |
There was a problem hiding this comment.
I suggest merging this test with TestTargetScraperScrapeOK if possible. You can use booleans similar to protobufParsing in TestTargetScraperScrapeOK to determine if you should have the utf8 header or not when checking.
|
Question: do we ignore escaping schemas if we set validation schema to utf8? |
If both sides of the negotiation have utf8 support enabled, then there will be no escaping, and yes the escaping setting will be ignored. The escaping mode choice is purely for the case where the system receiving the metrics does not support utf8, but the sender (or the target being scraped) does -- it's a backwards-compatibility setting. |
|
I will be addressing the comments and fixing the conflicts soon |
499b186 to
c610d98
Compare
5c23703 to
2cb1827
Compare
|
bryan says we should have utf-8 on by default for this and for v3, so I will update the PR accordingly |
I am not sure the best way to do this -- maybe change the enable-feature flag to "legacy-names" as a way to turn off utf8? |
|
Hmm, I get the argument "switch this on by default in v3", but isn't it dangerous to switch this on by default right now? IIRC what @bboreham said, he even emphasized that this could be seen as a breaking change from a certain PoV. Maybe the assumption was that 2.54 is the last 2.x release? But I wouldn't bet on that right now. (Even if we release v3 at PromCon, we might want to keep separate v2 and v3 branches for a while.) |
|
so should I revert this latest change and then when v3 is updated, apply the flip there? |
|
Yes, that would be my understanding. |
6d2747d to
2cb1827
Compare
|
reverted defaults so it is now default-off again. the real difference between the default is:
|
2cb1827 to
0f9278d
Compare
codesome
left a comment
There was a problem hiding this comment.
Mostly nit comments. One step away from merging!
|
Do you want me to apply suggestions all as single unsquashed configs? Or should I apply them and then rebase/squash so we still only have a single commit? I am not sure what the prometheus standard is. |
f0cc4ad to
9556d70
Compare
|
the commit history to prometheus seems to be squash, so I'll do that |
Multiple commits are fine in the PR. I can squash merge the PR. We don't have a strict rule. We do a normal merge when individual commits makes sense independently in the git history. Squash merge otherwise. |
9556d70 to
3844d20
Compare
docs/command-line/prometheus.md
Outdated
| | <code class="text-nowrap">--query.max-concurrency</code> | Maximum number of queries executed concurrently. Use with server mode only. | `20` | | ||
| | <code class="text-nowrap">--query.max-samples</code> | Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load more samples than this into memory, so this also limits the number of samples a query can return. Use with server mode only. | `50000000` | | ||
| | <code class="text-nowrap">--enable-feature</code> | Comma separated feature names to enable. Valid options: agent, auto-gomemlimit, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-per-step-stats, promql-experimental-functions, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver, created-timestamp-zero-ingestion, concurrent-rule-eval, delayed-compaction. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details. | | | ||
| | <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots") | `values` | |
There was a problem hiding this comment.
| | <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots") | `values` | | |
| | <code class="text-nowrap">--scrape.name-escaping-scheme</code> | Method for escaping legacy invalid names when sending to Prometheus that does not support UTF-8. Can be one of "values", "underscores", or "dots". | `values` | |
codesome
left a comment
There was a problem hiding this comment.
LGTM with the tiny fix above
Signed-off-by: Owen Williams <[email protected]>
3844d20 to
9e7308d
Compare
Part of #13095
Implement content negotiation and flags for UTF-8 support in Prometheus.