Skip to content

feat: distribute memberlist rejoins over time for different processes starting at the same time#892

Merged
pracucci merged 2 commits intomainfrom
add-rejoin-jitter
Jan 30, 2026
Merged

feat: distribute memberlist rejoins over time for different processes starting at the same time#892
pracucci merged 2 commits intomainfrom
add-rejoin-jitter

Conversation

@pracucci
Copy link
Copy Markdown
Contributor

@pracucci pracucci commented Jan 30, 2026

What this PR does:

When multiple processes start at the same time (e.g., during a rolling restart or scale-up), they would all attempt to rejoin seed nodes nearly at the same intervals, creating a thundering herd effect. This PR introduces a random initial delay (between 0 and RejoinInterval) before the first rejoin attempt, uniformly distributing rejoins across time.

The variable ticker implementation has been copied from Mimir. Once this PR will be merged, I will update dskit in Mimir and remove the variable ticker implementation from there.

Here periodic push/pull rate caused by rejoin before and after the jitter rollout in a test Mimir cluster (before the rollout you can see it spikes every 5m, which is the configured rejoin interval):

542848503-e36a8611-7caf-41d0-8423-2b5a8c5f9624

Which issue(s) this PR fixes:

N/A

Checklist

  • Tests updated

… starting at the same time

Signed-off-by: Marco Pracucci <[email protected]>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.

Comment thread timeutil/variable_ticker_test.go Outdated
@cursor
Copy link
Copy Markdown

cursor bot commented Jan 30, 2026

Bugbot Autofix prepared a fix for the bug found in the latest run.

  • ✅ Fixed: Test tolerance uses wrong units, test ineffective
    • Changed float64(tolerance) to tolerance.Seconds() to convert the 250ms tolerance from nanoseconds to seconds (0.25), matching the units used in the test assertions.

Create PR

Or push these changes by commenting:

@cursor push a2ae71bbcc
Preview (a2ae71bbcc)
diff --git a/timeutil/variable_ticker_test.go b/timeutil/variable_ticker_test.go
--- a/timeutil/variable_ticker_test.go
+++ b/timeutil/variable_ticker_test.go
@@ -22,10 +22,10 @@
 			ticks = append(ticks, <-tickerChan)
 		}
 
-		tolerance := 250 * time.Millisecond
-		assert.InDelta(t, ticks[0].Sub(startTime).Seconds(), 1*time.Second.Seconds(), float64(tolerance))
-		assert.InDelta(t, ticks[1].Sub(startTime).Seconds(), 3*time.Second.Seconds(), float64(tolerance))
-		assert.InDelta(t, ticks[2].Sub(startTime).Seconds(), 5*time.Second.Seconds(), float64(tolerance))
+	tolerance := 250 * time.Millisecond
+	assert.InDelta(t, ticks[0].Sub(startTime).Seconds(), 1*time.Second.Seconds(), tolerance.Seconds())
+	assert.InDelta(t, ticks[1].Sub(startTime).Seconds(), 3*time.Second.Seconds(), tolerance.Seconds())
+	assert.InDelta(t, ticks[2].Sub(startTime).Seconds(), 5*time.Second.Seconds(), tolerance.Seconds())
 	})
 
 	t.Run("should not close the channel on stop function called", func(t *testing.T) {

Signed-off-by: Marco Pracucci <[email protected]>
"time"
)

// NewVariableTicker wrap time.Ticker to Reset() the ticker with the next duration (picked from
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to create this type with goroutines and channels to have an initial delay and a periodic delay? Could we combine time.NewTimer and time.NewTicker instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do it in a follow up PR. This code is what we're already using since a long time in Mimir, so looks low risk to just move it from Mimir to dskit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging this PR because I need to get this stuff in by EOW (need to fix an high inter-AZ data transfer caused by memberlist join) but I'm not ignoring your feedback. Added it to my todo list and will re-iterate in a follow up PR.

tickerChan = t.C
// Use a random initial delay between 0 and RejoinInterval to uniformly
// distribute rejoins across time when multiple processes start simultaneously.
initialDelay := 1 + time.Duration(math_rand.Int63n(int64(m.cfg.RejoinInterval)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do the Prometheus rule-eval thing here and do a hash of the node name modulo the RejoinInterval for the initial delay.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the benefit? Looks a complication to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be deterministic. It doesn't seem complicated to me, it'd just be a hash of the node name from config.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get the point. The node name is random (we add a random suffix). For example see:

Screenshot 2026-01-30 at 16 03 01

The last part of the node random is just a random suffix. It changes on each startup. We could hash without the suffix, but pod IDs are non deterministic too (change with each rollout), so it would be deterministic just for statefulsets.

@pracucci pracucci merged commit cbd6d52 into main Jan 30, 2026
12 checks passed
@pracucci pracucci deleted the add-rejoin-jitter branch January 30, 2026 16:27
pracucci added a commit to grafana/mimir that referenced this pull request Jan 30, 2026
#### What this PR does

Update dskit to get a couple of changes to memberlist client:
- grafana/dskit#888
- grafana/dskit#892

#### Which issue(s) this PR fixes or relates to

N/A

#### Checklist

- [ ] Tests updated.
- [x] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.

Signed-off-by: Marco Pracucci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants