Skip to content

Default to bigger remote_write sends#5267

Merged
csmarchbanks merged 2 commits intoprometheus:masterfrom
bboreham:raise-default-msps
Sep 9, 2020
Merged

Default to bigger remote_write sends#5267
csmarchbanks merged 2 commits intoprometheus:masterfrom
bboreham:raise-default-msps

Conversation

@bboreham
Copy link
Copy Markdown
Member

Fixes #5166

Raise the default MaxSamplesPerSend to amortise the cost of remote calls across more samples. Lower MaxShards to keep the expected max memory usage roughly the same.

Some adjustment of comment wording to make them less easy to misunderstand.

@simonpasquier simonpasquier mentioned this pull request Feb 26, 2019
@tomwilkie tomwilkie self-requested a review February 28, 2019 16:08
@tomwilkie
Copy link
Copy Markdown
Member

Not against this per se, but we need to do the maths wrt to all the other params to make sure it makes sense. For instance, the queue wait time might need tuning. Plus with bigger batches I think you might be more likely to hit a quasi live lock in the sharding code when one shard gets stuck. Lets get 2.8 release out in the next few days then come back to this.

@ashutoshgngwr
Copy link
Copy Markdown

@tomwilkie I am not pointing any fingers but we're at 2.13 now!

@stale stale bot added the stale label Feb 19, 2020
@beorn7
Copy link
Copy Markdown
Member

beorn7 commented May 4, 2020

Is there any update on this, dear remote-write experts?

@stale stale bot removed the stale label May 4, 2020
@ashutoshgngwr
Copy link
Copy Markdown

@beorn7 I don't think this is ever getting merged but applying the suggested defaults using the Prometheus config file did bring its memory usage in check.

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented May 5, 2020

I'm really not qualified when it comes to remote-write. @tomwilkie @bboreham @cstyan @csmarchbanks (semi-randomly picked experts on the matter) WDYT about this PR ?

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented May 5, 2020

(Context: We reached this during our weekly bug scrub, and the people in the room couldn't make a call here on their own.)

@csmarchbanks
Copy link
Copy Markdown
Member

You can see some recent conversation in #5166 around what default values to choose. I proposed 500, as I have seen some systems start to break down with 1000 samples per send. Once we have a better default decided Bryan can feel free to update this PR, or someone can make a new one.

@stale stale bot added the stale label Jul 4, 2020
@csmarchbanks
Copy link
Copy Markdown
Member

@bboreham Do you have any interest in resurrecting this PR and setting the default to 500 samples per send/200 max shards as discussed in #5166?

@stale stale bot removed the stale label Jul 24, 2020
@bboreham
Copy link
Copy Markdown
Member Author

Sure.

Raise the default MaxSamplesPerSend to amortise the cost of remote
calls across more samples. Lower MaxShards to keep the expected max
memory usage within reason.

Signed-off-by: Bryan Boreham <[email protected]>
@bboreham bboreham force-pushed the raise-default-msps branch from fb38dd2 to a88c0d2 Compare August 9, 2020 16:35
@bboreham
Copy link
Copy Markdown
Member Author

bboreham commented Aug 9, 2020

I have updated this PR to 200 shards and 500 samples.

config/config.go Outdated
// Each shard will have a max of 500 samples pending in its channel, plus the pending
// samples that have been enqueued. Theoretically we should only ever have about 1000 samples
// per shard pending. At 200 shards that's 200k.
Capacity: 500,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you also increase the default Capacity to 2500 so it stays at 5 full requests being buffered?

To maintain ratio with MaxSamplesPerSend

Signed-off-by: Bryan Boreham <[email protected]>
Copy link
Copy Markdown
Member

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@csmarchbanks
Copy link
Copy Markdown
Member

Windows failures look to be unrelated to the change. Merging anyway.

@csmarchbanks csmarchbanks merged commit 90fc6be into prometheus:master Sep 9, 2020
@bboreham bboreham deleted the raise-default-msps branch September 10, 2020 08:41
@roidelapluie
Copy link
Copy Markdown
Member

This seems a big change -- since it has been in the making for a very long time and would require a RC it will be for prometheus 2.22.

simonswine added a commit to simonswine/prometheus-docs that referenced this pull request Dec 9, 2020
prometheus/prometheus#5267 increase the defaults
by 5 times

Signed-off-by: Christian Simon <[email protected]>
csmarchbanks pushed a commit to prometheus/docs that referenced this pull request Dec 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

proposal: increase default value for max_samples_per_send

6 participants