feat: distribute memberlist rejoins over time for different processes starting at the same time#892
feat: distribute memberlist rejoins over time for different processes starting at the same time#892
Conversation
… starting at the same time Signed-off-by: Marco Pracucci <[email protected]>
|
Bugbot Autofix prepared a fix for the bug found in the latest run.
Or push these changes by commenting: Preview (a2ae71bbcc)diff --git a/timeutil/variable_ticker_test.go b/timeutil/variable_ticker_test.go
--- a/timeutil/variable_ticker_test.go
+++ b/timeutil/variable_ticker_test.go
@@ -22,10 +22,10 @@
ticks = append(ticks, <-tickerChan)
}
- tolerance := 250 * time.Millisecond
- assert.InDelta(t, ticks[0].Sub(startTime).Seconds(), 1*time.Second.Seconds(), float64(tolerance))
- assert.InDelta(t, ticks[1].Sub(startTime).Seconds(), 3*time.Second.Seconds(), float64(tolerance))
- assert.InDelta(t, ticks[2].Sub(startTime).Seconds(), 5*time.Second.Seconds(), float64(tolerance))
+ tolerance := 250 * time.Millisecond
+ assert.InDelta(t, ticks[0].Sub(startTime).Seconds(), 1*time.Second.Seconds(), tolerance.Seconds())
+ assert.InDelta(t, ticks[1].Sub(startTime).Seconds(), 3*time.Second.Seconds(), tolerance.Seconds())
+ assert.InDelta(t, ticks[2].Sub(startTime).Seconds(), 5*time.Second.Seconds(), tolerance.Seconds())
})
t.Run("should not close the channel on stop function called", func(t *testing.T) { |
Signed-off-by: Marco Pracucci <[email protected]>
| "time" | ||
| ) | ||
|
|
||
| // NewVariableTicker wrap time.Ticker to Reset() the ticker with the next duration (picked from |
There was a problem hiding this comment.
Is it necessary to create this type with goroutines and channels to have an initial delay and a periodic delay? Could we combine time.NewTimer and time.NewTicker instead?
There was a problem hiding this comment.
Maybe we can do it in a follow up PR. This code is what we're already using since a long time in Mimir, so looks low risk to just move it from Mimir to dskit.
There was a problem hiding this comment.
Merging this PR because I need to get this stuff in by EOW (need to fix an high inter-AZ data transfer caused by memberlist join) but I'm not ignoring your feedback. Added it to my todo list and will re-iterate in a follow up PR.
| tickerChan = t.C | ||
| // Use a random initial delay between 0 and RejoinInterval to uniformly | ||
| // distribute rejoins across time when multiple processes start simultaneously. | ||
| initialDelay := 1 + time.Duration(math_rand.Int63n(int64(m.cfg.RejoinInterval))) |
There was a problem hiding this comment.
We could also do the Prometheus rule-eval thing here and do a hash of the node name modulo the RejoinInterval for the initial delay.
There was a problem hiding this comment.
What would be the benefit? Looks a complication to me.
There was a problem hiding this comment.
It would be deterministic. It doesn't seem complicated to me, it'd just be a hash of the node name from config.
There was a problem hiding this comment.
Not sure I get the point. The node name is random (we add a random suffix). For example see:
The last part of the node random is just a random suffix. It changes on each startup. We could hash without the suffix, but pod IDs are non deterministic too (change with each rollout), so it would be deterministic just for statefulsets.
#### What this PR does Update dskit to get a couple of changes to memberlist client: - grafana/dskit#888 - grafana/dskit#892 #### Which issue(s) this PR fixes or relates to N/A #### Checklist - [ ] Tests updated. - [x] Documentation added. - [x] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR. - [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features. Signed-off-by: Marco Pracucci <[email protected]>

What this PR does:
When multiple processes start at the same time (e.g., during a rolling restart or scale-up), they would all attempt to rejoin seed nodes nearly at the same intervals, creating a thundering herd effect. This PR introduces a random initial delay (between 0 and
RejoinInterval) before the first rejoin attempt, uniformly distributing rejoins across time.The variable ticker implementation has been copied from Mimir. Once this PR will be merged, I will update dskit in Mimir and remove the variable ticker implementation from there.
Here periodic push/pull rate caused by rejoin before and after the jitter rollout in a test Mimir cluster (before the rollout you can see it spikes every 5m, which is the configured rejoin interval):
Which issue(s) this PR fixes:
N/A
Checklist