Evaluate independent rules concurrently#12946
Conversation
|
Thank you for your contribution. Historically, Prometheus has ensured sequential execution of rules within a group, which is a significant aspect to consider. Introducing parallel evaluation could potentially lead to scenarios where large groups dominate the full concurrency lock of the PromQL engine, as opposed to the current guarantee of a maximum of one per group. A possible alternative could be enhancing the […]. Additionally, certain rules depend on the ALERT and ALERTS_* metrics, and it's unclear whether this proposal has accounted for such dependencies. |
Absolutely, I call this out in the description as well. I think this is a solvable problem, though. For example, we could limit the overall concurrency of the rule engine, introduce a priority queue, cooperative scheduling, etc. Would you be open to discussing those possibilities?
Can you elaborate on why this would be a problem? Do you mean a recording rule ordered after an alerting rule could rely on the changes to […]? |
|
I am not against this change but I think it should be clear to everyone that this can't be the default behaviour in Prometheus 2.x. I would also tend for it not to be the default in 3.x. |
|
That's fair. It's currently behind a flag in any case so the current behaviour will not be affected. |
|
@roidelapluie can you please take another look? I've added bounded concurrency; I think this should work quite nicely. |
Fixed in cf135c8 |
|
Relevant notes from the past that I don't see mentioned here yet:
That second point would strictly make it impossible to say for sure whether one rule depends on another, but it's a rare case for sure. |
gouthamve
left a comment
Haven't taken a close look, but it looks OK from first glance.
How do we handle cases where the query itself overrides the metric name? For example, if the expr is: `label_replace(<EXPR>, "custom_name", "$1", "__name__", ".+")`
```yaml
- record: TooManyHTTPRequests
  expr: job:http_requests:rate15m > 100
```

```yaml
- name: grpc
```
Can we have them spliced in one test?
As in:
- job:http_requests:rate1m
- job:http_requests:rate15m
- job:http_requests:rate5m
- TooManyHTTPRequests
@gouthamve I'm not understanding what you're asking, can you elaborate please?
They're all part of the same group and tested in TestBoundedRuleEvalConcurrency.
|
@roidelapluie @gouthamve following up on this, could I please request a review? Both the Loki & Mimir projects are pretty keen to see this merged so we can vendor it into our projects. As an aside, the Mimir team did an analysis and found that only 1% of their customers' rules have dependent rules. I think this is a good sign that this change will have a large impact with minimal disruption. |
|
About the name changing via `label_replace`: the result is metrics with the name […]. So I guess the moment some […] |
|
About remote-read: remote read can do weird things with a bunch of assumptions. Maybe let's not worry too much about it, but make clear that users should only enable the feature flag when they know that no remote-read endpoint doing those weird things is involved. |
|
@beorn7 Doesn't the name always get overwritten by the rule name anyway? I assume this was about the dependency rule, not the dependent (where we only care about the actually selected data and not how it is renamed either). |
|
I think you are right @juliusv. If recording rules always create a series with the rule name, then the […] |
|
Sounds like this is ready for a code-level review then. I am not very familiar with the rules engine and have a huge review backlog already… |
pracucci
left a comment
Nice work Danny! The overall design LGTM. I left several minor comments and a couple of major ones.
docs/feature_flags.md
Outdated
> Rule groups execute concurrently, but the rules within a group execute sequentially; this is because rules can use the output of a preceding rule as its input. However, if there is no detectable relationship between rules then there is no reason to run them sequentially. This can improve rule reliability at the expense of adding more concurrent query load. The number of concurrent rule evaluations can be configured with `--rules.max-concurrent-rule-evals` which is set to `4` by default.
[nit] I would clarify that you're talking about the default behaviour, not the behaviour when the feature flag is enabled.
Suggested change:

> By default, rule groups execute concurrently, but the rules within a group execute sequentially; this is because rules can use the output of a preceding rule as its input. However, if there is no detectable relationship between rules then there is no reason to run them sequentially.
>
> When the `concurrent-rule-eval` feature flag is enabled, rules without any dependency on other rules within a rule group will be evaluated concurrently. This can improve rule reliability at the expense of adding more concurrent query load. The number of concurrent rule evaluations can be configured with `--rules.max-concurrent-rule-evals` which is set to `4` by default.
```go
	defer c.mu.Unlock()

	// Clear out the memoized dependency maps because some or all groups may have been updated.
	c.depMaps = map[*Group]dependencyMap{}
```
[question] Given groups are indexed by pointer in the map, and given that whenever a group's config changes a new Group is created, can you confirm that the purpose of `concurrentRuleEvalController.Invalidate()` (I mean the actual implementation, not the interface) is to clear the cached dependency map to avoid it growing indefinitely, rather than to avoid stale-cache issues? I want to make sure I correctly understand it.
Yes that's correct, to avoid the map growing indefinitely.
rules/manager.go
Outdated
```go
	defer m.mtx.Unlock()

	if m.opts.RuleConcurrencyController != nil {
		m.opts.RuleConcurrencyController.Invalidate()
```
The `Invalidate()` contract (from the interface doc) is:

```go
// Invalidate instructs the controller to invalidate its state.
// This should be called when groups are modified (during a reload, for instance), because the controller may
// store some state about each group in order to more efficiently determine rule eligibility.
```

I think calling `Invalidate()` here (even while `m.mtx` is locked) doesn't prevent race conditions on stale cached data, because old rule groups will keep being evaluated. For example, consider this scenario:

1. The controller caches the group based on the rule group filepath.
2. `Manager.Update()` calls `Invalidate()`.
3. The old rule group at filepath X evaluates, calling `RuleEligible()`; the info is cached.
4. `Manager.Update()` reaches the point (below) where the old rule group is stopped and the new one is loaded.
5. The next `RuleEligible()` call for the rule group at filepath X will return stale data.

I intentionally picked "the controller caches the group based on the rule group filepath" as the example, because I think your implementation `concurrentRuleEvalController` doesn't suffer from this problem, given it caches the Group by pointer. However, since `RuleConcurrencyController` can be configured in the options and the interface contract doesn't say how a Group is supposed to be cached, I think it may lead to potential misbehaviour by upstream projects.

I think solving this problem here in `Manager.Update()` is non-trivial. Even if you move the call to `Invalidate()` below, there will be such a race condition anyway.

I'm wondering if we should change the `RuleConcurrencyController` interface contract instead.
I'm wondering if we should change the RuleConcurrencyController interface contract instead.
> I'm wondering if we should change the RuleConcurrencyController interface contract instead.

I may argue that we don't need `Invalidate()` at all. The controller implementation may use `Group.Equals()` to check whether the input Group is equal to the cached one. However, the annoying thing would be removing stale entries from the cache (e.g. you may need an LRU cache instead of a very simple map like the one in `concurrentRuleEvalController`).
We agreed to leave this as-is for now, and follow up with a subsequent change; we'll move the independent flag to the rule itself at first evaluation.
> we'll move the independent flag to the rule itself at first evaluation

I will take care of that follow-up PR, if this PR gets accepted.
rules/manager_test.go
Outdated
```go
		},
	}

	inflightTracker := func(ctx context.Context) {
```
Why do we need this at all? Can't we just keep track of the high watermark in `QueryFunc()`, right after the call to `inflightQueries.Add(1)`?
pracucci
left a comment
Nice work @dannykopping! The changes LGTM. I've started testing these changes in Mimir, as part of a bigger piece of work we're doing, and I haven't found any issues.
Overall, this LGTM code-wise! Great job @dannykopping on making this complicated change easy to follow.
Most of my suggestions are very minor nits that you can choose to implement or not. The only ones I expect to be addressed are the comments and the extra unit tests for the dependency map building.
@dannykopping, there's one more thing: could we move the code within manager.go and group.go to a separate file? Given this behaviour is completely optional, I don't think someone looking to understand or modify rule evaluation within Prometheus should need to read it unless they want to cover this specific use-case.
rules/group.go
Outdated
```go
	}
}

if g.metrics != nil {
```
Great question; we already have a nil check on line 97:

```go
if metrics == nil {
	metrics = NewGroupMetrics(o.Opts.Registerer)
}
```

This should be removed.
NOTE: Rebased from main after refactor in prometheus#13014 Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
…ch request Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Updated & added tests
Review feedback nits
Return empty map if not indeterminate
Use highWatermark to track inflight requests counter
Appease the linter
Clarify feature flag
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
…aluation has done Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
```go
	}

	concurrencyController := o.Opts.RuleConcurrencyController
	if concurrencyController == nil {
```
Note to reviewers: this is useful for unit tests, to avoid having to explicitly set it everywhere.
```go
			}
		}(i, rule)
	}
	if g.metrics != nil {
```
Note to reviewers: the check has been removed because it was superfluous.
gotjosh
left a comment
LGTM - thank you for taking the time to address my comments!
|
I'll let this bake for the next 24 hours in case any other maintainers have more thoughts. The way @pracucci has chosen to handle the comments (addressing each comment with a dedicated commit) indicates the direction of the feature has not changed much over the past couple of weeks. |

This PR aims to improve the problem of missed group evaluations by parallelizing work which can be parallelized to prevent groups from overrunning their interval.
Groups execute concurrently, while their constituent rules execute sequentially. Rules are evaluated sequentially because they can have dependencies on one another: if one rule depends on the result of another, the operator places it later in the execution order.
However, when no explicit relationship exists between rules (i.e. a rule does not depend on the resulting metric of a preceding recording rule), there is no reason to run these rules sequentially.
This PR determines whether an explicit relationship can be found between two rules and, if not, evaluates the rule concurrently. The group will still wait for all rules to complete in order to maintain its current behaviour (i.e. if all rules cumulatively overrun the group interval, skip the next interval). The detection is relatively simple: enumerate all metrics used in each rule, and if any of those metrics are produced by a recording rule in the group, neither rule can execute concurrently.
This change does introduce some risk that multiple rules evaluating concurrently could put more bursty strain on the query engine. We may want to introduce some jitter, but I wanted to open this up for discussion first. I've since implemented bounded concurrency, which allows operators to control how much additional query load this puts on their Prometheus server. If the concurrency limit is reached, rules will evaluate sequentially as before.