Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rulefmt: support YAML aliases for Alert/Record/Expr #14957

Merged
merged 4 commits into from
Feb 13, 2025

Conversation

Frazew
Copy link
Contributor

@Frazew Frazew commented Sep 21, 2024

Summary

I believe this PR addresses #13158: although the description is quite vague, it's the closest reference to the issue this PR addresses that I could find.

This PR proposes a refactor of rulefmt to fix a surprising behavior when using YAML anchors/aliases to configure alert rules. Although their use is somewhat discouraged in favour of using proper configuration management tools to generate full YAML, they can still be useful in some situations and the following behavior is unexpected: YAML aliases work everywhere but in the name and expr of a rule (see the Example section).

This is due to the explicit use of yaml.Node for Alert/Record/Expr when decoding a RuleNode.

This PR instead proposes to unmarshal everything into the proper final types, and keep the decoding structs only for validation. As a consequence, Rule becomes the type used in RuleGroup instead of RuleNode, which means its Alert/Record/Expr fields are fully unmarshalled, with aliases resolved if needed. This has the side effect of reducing the creep of YAML types in other modules: for example rules/manager.go used r.Expr.Value, the .Value being YAML-specific.

I'm not sure whether this should considered a breaking change or not: I've chosen not to export RuleNode since it would no longer make sense for other modules to use it now, but that could be changed to make sure nothing breaks.

The first commit adds a set of failing tests. The second commit is the proposed implementation (which fixes the tests).

Example

The following configuration illustrates the issue:

groups:
  - name: my-group-name
    interval: 30s
    rules:
      - alert: &alertname HighAlert
        expr: some expression
        for: 5m
        labels:
          severity: &severity critical
      - alert: *alertname
        expr: some other expression
        for: 5m
        labels:
          severity: *severity

In this example, Prometheus will properly register the first rule, but the second rule will have .Alert == "alertname" even though it will properly have .Labels.Severity == "critical".

@Frazew Frazew requested a review from dgl as a code owner September 21, 2024 21:38
@Frazew Frazew marked this pull request as draft September 21, 2024 21:38
@Frazew Frazew marked this pull request as ready for review September 22, 2024 09:30
@github-actions github-actions bot added the stale label Nov 21, 2024
@bboreham
Copy link
Member

Hello from the bug-scrub! Sorry your PR has gone un-reviewed for a while.
We asked in Slack to find someone familiar with the code to review this change.

@github-actions github-actions bot removed the stale label Nov 26, 2024
@Frazew Frazew force-pushed the yaml-support-alias-rule-expr-2 branch 3 times, most recently from 2cbad40 to ce0c672 Compare November 28, 2024 15:30
@Frazew
Copy link
Contributor Author

Frazew commented Nov 28, 2024

Thanks! I've fixed the merge conflicts that had appeared in the meantime

@github-actions github-actions bot added the stale label Jan 29, 2025
Copy link
Member

@dgl dgl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow response.

I'm not sure whether this should considered a breaking change or not: I've chosen not to export RuleNode since it would no longer make sense for other modules to use it now, but that could be changed to make sure nothing breaks.

Prometheus doesn't guarantee any API stability for its internal packages, I think it makes sense to not export it anymore as you've done, the users of this I see will need changing anyway. The type in RuleGroup changing is what matters and it's much cleaner to do it like this as it doesn't export the YAML details anymore. It should be a simple change.

Frazew and others added 4 commits February 13, 2025 16:05
Altough somewhat discouraged in favour of using proper configuration
management tools to generate full YAML, it can still be useful in some
situations to use YAML anchors/aliases in rules.

The current implementation is however confusing: aliases will work
everywhere except on the alert/record name and expr

This first commit adds (failing) tests to illustrate the issue, the next
one fixes it. The YAML test file is intentionally filled with anchors
and aliases. Although this is probably not representative of a real-world
use case (which would have less of them), it errs on the safer side.

Signed-off-by: François HORTA <[email protected]>
This fixes the use of YAML aliases in alert/recording rule names and
expressions. A side effect of this change is that the RuleNode YAML type is
no longer propagated deeper in the codebase, instead the generic Rule type
can now be used.

Signed-off-by: François HORTA <[email protected]>
Currently this does work, but adding a test for the related
functionally here makes sense.

Signed-off-by: David Leadbeater <[email protected]>
Signed-off-by: David Leadbeater <[email protected]>
@dgl dgl force-pushed the yaml-support-alias-rule-expr-2 branch from 8f8aaa5 to 5d0b06c Compare February 13, 2025 05:12
@dgl dgl merged commit 9b4c8f6 into prometheus:main Feb 13, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants