Default to bigger remote_write sends#5267
Conversation
|
Not against this per se, but we need to do the maths wrt to all the other params to make sure it makes sense. For instance, the queue wait time might need tuning. Plus with bigger batches I think you might be more likely to hit a quasi live lock in the sharding code when one shard gets stuck. Lets get 2.8 release out in the next few days then come back to this. |
|
@tomwilkie I am not pointing any fingers but we're at 2.13 now! |
|
Is there any update on this, dear remote-write experts? |
|
@beorn7 I don't think this is ever getting merged but applying the suggested defaults using the Prometheus config file did bring its memory usage in check. |
|
I'm really not qualified when it comes to remote-write. @tomwilkie @bboreham @cstyan @csmarchbanks (semi-randomly picked experts on the matter) WDYT about this PR ? |
|
(Context: We reached this during our weekly bug scrub, and the people in the room couldn't make a call here on their own.) |
|
You can see some recent conversation in #5166 around what default values to choose. I proposed 500, as I have seen some systems start to break down with 1000 samples per send. Once we have a better default decided Bryan can feel free to update this PR, or someone can make a new one. |
|
Sure. |
Raise the default MaxSamplesPerSend to amortise the cost of remote calls across more samples. Lower MaxShards to keep the expected max memory usage within reason. Signed-off-by: Bryan Boreham <[email protected]>
fb38dd2 to
a88c0d2
Compare
|
I have updated this PR to 200 shards and 500 samples. |
config/config.go
Outdated
| // Each shard will have a max of 500 samples pending in its channel, plus the pending | ||
| // samples that have been enqueued. Theoretically we should only ever have about 1000 samples | ||
| // per shard pending. At 200 shards that's 200k. | ||
| Capacity: 500, |
There was a problem hiding this comment.
Would you also increase the default Capacity to 2500 so it stays at 5 full requests being buffered?
To maintain ratio with MaxSamplesPerSend Signed-off-by: Bryan Boreham <[email protected]>
4a99fae to
771266b
Compare
|
Windows failures look to be unrelated to the change. Merging anyway. |
|
This seems a big change -- since it has been in the making for a very long time and would require a RC it will be for prometheus 2.22. |
prometheus/prometheus#5267 increase the defaults by 5 times Signed-off-by: Christian Simon <[email protected]>
prometheus/prometheus#5267 increase the defaults by 5 times Signed-off-by: Christian Simon <[email protected]>
Fixes #5166
Raise the default
MaxSamplesPerSendto amortise the cost of remote calls across more samples. LowerMaxShardsto keep the expected max memory usage roughly the same.Some adjustment of comment wording to make them less easy to misunderstand.