Dynamically update mempool gossip request rate limit#4162
Dynamically update mempool gossip request rate limit#4162StephenButtolph merged 3 commits intomasterfrom
Conversation
b49df03 to
705dd49
Compare
| } | ||
|
|
||
| // Intersect returns the set intersection of s1 and s2 | ||
| func Intersect[T comparable](s1 Set[T], s2 Set[T]) Set[T] { |
There was a problem hiding this comment.
I thought this looked better than putting it on a receiver... but admittedly it is inconsistent w/ the rest of the code.
705dd49 to
023d3ff
Compare
| variance := d.requestsPerPeer * (n - 1) / (n * n) | ||
| stdDeviation := math.Sqrt(variance) | ||
|
|
||
| limit := int(expectedSamples + 4*stdDeviation) |
There was a problem hiding this comment.
seems like this limit should be lower. 4 std deviations means the node is in 99.9+th percentile of expectedSamples sent. wouldn't we want this to be between 2~3 sd's?
There was a problem hiding this comment.
It's just for outlier detection... each validator makes to a uniformly random selected validator, so we're modeling the distribution of requests we can get in expectation as a normal distribution. We just want to throttle people trying to spam us with requests while not throttling non-byzantine validators who are just happening to re-sample us... so this would throttle a non-byzantine validator 1-99.995% = 0.005% of the time. If we're doing 2 std devs it'd be closer to 2% (seems like a big false positive rate), and 3 would be 0.2% (still seems big). 4 std devs was chosen somewhat arbitrarily in an offline conversation... so it is a bit of a magic number. I could add a comment if that helps explain the reasoning?
| validatorState validators.State, | ||
| subnetID ids.ID, | ||
| maxValidatorSetStaleness time.Duration, |
There was a problem hiding this comment.
would it be better to pass in some sort of validator config? saying this because these three parameters are not used outside of creating the inner Validators struct, and I'm personally not a fan of methods with too many parameters.
feel free to ignore if this is just not preferred golang style
There was a problem hiding this comment.
I don't want us to just slap "preferred go style" as an answer (since that doesn't really give us insight into the reasoning behind a decision...). It's definitely an idea... my thoughts are:
- The number of things that need to be specified do not change if we put it into a config/as explicit parameters. One difference about params vs configs is that you break your callers when you modify it... which can be good or bad.
- Configs are nice when you have zero-safe/reasonable defaults that you can use - since the zero value of a config will recursively initialize its fields with their zero values + you can set defaults lazily.
- Some defaults we could use:
- Reasonably I think we could have default value for
maxValidatorSetStaleness - ... Maybe we could default to not logging or logging using stdout?
- Reasonably I think we could have default value for
- But introducing a config to just wrap one value (
maxValidatorSetStaleness) is a bit odd to add a layer of abstraction over a single value, and I don't know when we would ever want the default behavior for logging since we only use avalanchego's logger except for in tests.
- Some defaults we could use:
|
|
||
| func (n *Network) Connected(_ context.Context, nodeID ids.NodeID, _ *version.Application) error { | ||
| n.Peers.add(nodeID) | ||
| n.Validators.connected(nodeID) |
There was a problem hiding this comment.
im slightly confused on why it is ok to add to both structs here. Is it true that all validators are peers but not all peers are validators? If so then don't we have to do some sort of check to make sure the node is a validator
There was a problem hiding this comment.
ah i see we are adding peers, but are not considering them validators yet. I think this method should be named a bit more clear. Another thought, why can't we just pass in the *Peers object rather than creating a duplicate copy of it?
There was a problem hiding this comment.
I considered this, but then Peers and Validators would need to share a reference to a lock. We could do something like that - but I thought sharing a lock reference looked worse than just copying the data. Admittedly Validators is kind of in a bad spot right now... it owns both data for connected and non-connected validators which I find confusing.
| validatorState validators.State, | ||
| subnetID ids.ID, | ||
| maxValidatorSetStaleness time.Duration, |
There was a problem hiding this comment.
These arguments feel pretty unusual to me. We previously seemed to be trying to keep this package more modular in how things hook together... This seems to go against that.
Would it make sense to be able to register onConnected / onDisconnected notifications rather than having the p2p network struct own the validators entirely?
There was a problem hiding this comment.
I wanted this originally but ended up not doing it... and upon looking back, the reason I coupled it was because I wanted Validators to know about Peers but since I ended up decoupling them anyways the original reason for this is now gone... I think this should be possible to do.
| // PullGossipRequestsPerValidator = PullGossipThrottlingPeriod / PullGossipFrequency = | ||
| // 10 seconds/period / 1.5 requests/second ~= 6.67 requests | ||
| PullGossipRequestsPerValidator: 7, |
There was a problem hiding this comment.
Does this make sense being a config? It is derived from PullGossipFrequency and PullGossipThrottlingPeriod. Or is this a config because it is based on other peoples values of those provided configs?
There was a problem hiding this comment.
Is there a good reason that this can't be a float? We don't need the configs to be consensus critical.
There was a problem hiding this comment.
... Or is this a config because it is based on other peoples values of those provided configs?
Yeah that's correct
Is there a good reason that this can't be a float? We don't need the configs to be consensus critical.
There isn't a reason - I use a float here which does play more nicely w/ the limit which is also a float.
There was a problem hiding this comment.
Are we planning on changing the currentHits logic in a separate PR?
Recall that we want rate limited requests to increase currentHits as well as non-rate limited requests.
There was a problem hiding this comment.
I was thinking we would adjust it in a separate PR... if you want us to couple it to this I can but it seems like an independent change to me
There was a problem hiding this comment.
I'm happy with it as a separate change.
c72fdf0 to
27dce95
Compare
27dce95 to
a88abee
Compare
| } | ||
| } | ||
|
|
||
| func TestNewDynamicThrottlerHandler(t *testing.T) { |
There was a problem hiding this comment.
These tests are almost certainly overkill...
a88abee to
c4f4a72
Compare
3c4726c to
ccaf747
Compare
Signed-off-by: Joshua Kim <[email protected]>
ccaf747 to
21cb1a1
Compare
StephenButtolph
left a comment
There was a problem hiding this comment.
Code lgtm - we should canary it out to make sure it is working as intended before merging.
|
deadlock fix seems to be working as expected. Continuing to QA to verify the rate limiting is working as expected. |
commit 66ca7dc Author: rodrigo <[email protected]> Date: Tue Sep 2 16:49:55 2025 -0400 feat(load): add firewood flag (#4235) commit 274541b Author: Suyan Qu <[email protected]> Date: Tue Sep 2 14:10:48 2025 -0500 feat: add parameters to disable metrics (#4214) commit e03af84 Author: aaronbuchwald <[email protected]> Date: Tue Sep 2 12:30:07 2025 -0400 Add timeout parameter to C-Chain re-execution jobs (#4223) commit 0e20485 Author: aaronbuchwald <[email protected]> Date: Tue Sep 2 11:58:29 2025 -0400 Comment out schedule trigger for re-execution on w/container (#4234) Signed-off-by: aaronbuchwald <[email protected]> Co-authored-by: Copilot <[email protected]> commit 847eba1 Author: aaronbuchwald <[email protected]> Date: Fri Aug 29 14:38:11 2025 -0400 Add back empty schedule entry for reexecute w/ container job (#4230) commit a958b8a Author: aaronbuchwald <[email protected]> Date: Fri Aug 29 12:56:06 2025 -0400 Add newline to workflow dispatch (#4229) Signed-off-by: aaronbuchwald <[email protected]> commit 7ec2258 Author: aaronbuchwald <[email protected]> Date: Thu Aug 28 17:11:38 2025 -0400 Push benchmark re-execute results on master workflow dispatch (#4224) commit 34f983e Author: aaronbuchwald <[email protected]> Date: Thu Aug 28 15:33:12 2025 -0400 Disambiguate source vs exec variable names in reexecute tasks (#4200) Signed-off-by: aaronbuchwald <[email protected]> Co-authored-by: Copilot <[email protected]> commit 99578a2 Author: aaronbuchwald <[email protected]> Date: Thu Aug 28 12:52:31 2025 -0400 Write grafana link to logs and github step summary (#4219) commit 814300c Author: aaronbuchwald <[email protected]> Date: Thu Aug 28 12:37:05 2025 -0400 Remove firewood entry from PR triggers due to flakes (#4227) commit 40fbcd5 Author: rodrigo <[email protected]> Date: Thu Aug 28 00:24:54 2025 -0400 refactor(load): simulator contract (#4181) commit 6195e1f Author: rodrigo <[email protected]> Date: Wed Aug 27 17:31:51 2025 -0400 chore: remove unzip mention (#4226) commit 59e88f3 Author: aaronbuchwald <[email protected]> Date: Wed Aug 27 11:17:27 2025 -0400 Remove schedule trigger for w/ container job that evaluates to empty matrix (#4218) commit c2563d1 Author: Stephen Buttolph <[email protected]> Date: Tue Aug 26 19:07:47 2025 -0400 Update versions for v1.13.5 (#4217) commit a0ccd66 Author: aaronbuchwald <[email protected]> Date: Tue Aug 26 12:34:54 2025 -0400 Add support for passing config and predefined configs to VM re-execution tests (#4180) commit cc3242f Author: Joshua Kim <[email protected]> Date: Mon Aug 25 18:49:28 2025 -0400 Dynamically update mempool gossip request rate limit (#4162) Signed-off-by: Joshua Kim <[email protected]> Co-authored-by: Stephen Buttolph <[email protected]> commit f2e3273 Author: Draco <[email protected]> Date: Mon Aug 25 15:00:56 2025 -0400 Add ability to create zstd compressor with compression level (#4203) commit 441f441 Author: Joshua Kim <[email protected]> Date: Mon Aug 25 12:11:07 2025 -0400 Remove buf lint action (#4189) Signed-off-by: Joshua Kim <[email protected]> commit 4bcb221 Author: Stephen Buttolph <[email protected]> Date: Sat Aug 23 15:43:06 2025 -0400 Update block + validator + pgo checkpoints to 2025-08-23 (#4205) commit b18ffc1 Author: rodrigo <[email protected]> Date: Fri Aug 22 16:34:53 2025 -0400 Add s5cmd progress bar (#4204) commit 2100bee Author: Sam Liokumovich <[email protected]> Date: Fri Aug 22 11:52:31 2025 -0400 Rename Engine Types (#4193) Signed-off-by: Sam Liokumovich <[email protected]> Co-authored-by: Copilot <[email protected]> commit 33727a8 Author: Joshua Kim <[email protected]> Date: Fri Aug 22 11:00:12 2025 -0400 Count throttled requests as hits (#4199) Signed-off-by: Joshua Kim <[email protected]> commit b939be4 Author: Draco <[email protected]> Date: Thu Aug 21 14:22:54 2025 -0400 fix: blockdb file eviction race issue (#4186) commit 778ccfe Author: aaronbuchwald <[email protected]> Date: Thu Aug 21 11:40:03 2025 -0400 Add config option for AWS S3 read only credential duration (#4192) commit ae41355 Author: Stephen Buttolph <[email protected]> Date: Wed Aug 20 16:46:49 2025 -0400 Add redundant import alias linting (#4191) Signed-off-by: Stephen Buttolph <[email protected]> Co-authored-by: Copilot <[email protected]> commit a3b5c6a Author: Stephen Buttolph <[email protected]> Date: Wed Aug 20 10:53:24 2025 -0400 Make Draco the codeowner of the blockdb (#4187) commit a24ac68 Author: queryfast <[email protected]> Date: Wed Aug 20 22:18:21 2025 +0800 refactor: replace []byte(fmt.Sprintf) with fmt.Appendf (#4161) Signed-off-by: queryfast <[email protected]> commit 7aa6a17 Author: Sam Liokumovich <[email protected]> Date: Tue Aug 19 14:39:40 2025 -0400 Rename height field to numBlocks (#4184) commit 7d7e1fe Author: aaronbuchwald <[email protected]> Date: Tue Aug 19 13:24:59 2025 -0400 Add optional step to archive post-reexecution state to S3 (#4172) Signed-off-by: aaronbuchwald <[email protected]> Co-authored-by: Copilot <[email protected]> commit ebe0558 Author: aaronbuchwald <[email protected]> Date: Tue Aug 19 12:11:34 2025 -0400 Change cache path to tmp included in gitignore (#4183) commit e5593be Author: Draco <[email protected]> Date: Tue Aug 19 12:01:43 2025 -0400 Block Database (#4027) commit 940b96f Author: Sam Liokumovich <[email protected]> Date: Tue Aug 19 11:36:37 2025 -0400 Storage Component For Simplex (#4122) Signed-off-by: Sam Liokumovich <[email protected]> commit 6d7e2dc Author: Nicolas Arnedo Villanueva <[email protected]> Date: Tue Aug 19 16:59:58 2025 +0200 `config/config.md:` Added Env Variable representation of flags + improved UI design (#4110) Signed-off-by: Meaghan FitzGerald <[email protected]> Signed-off-by: Nicolas Arnedo Villanueva <[email protected]> Co-authored-by: Meaghan FitzGerald <[email protected]> Co-authored-by: Stephen Buttolph <[email protected]> commit 81f13b2 Author: Draco <[email protected]> Date: Mon Aug 18 13:59:43 2025 -0400 feat: add eviction callback in LRU cache (#4088) commit 4f5acfc Author: Jonathan Oppenheimer <[email protected]> Date: Mon Aug 18 13:16:44 2025 -0400 Migrate predicate package from evm repos (#4147) Signed-off-by: Jonathan Oppenheimer <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Stephen Buttolph <[email protected]> Co-authored-by: Joshua Kim <[email protected]> commit 335e79f Author: Kendra Karol Sevilla <[email protected]> Date: Mon Aug 18 18:45:52 2025 +0200 chore: fix typo (#4179) Signed-off-by: Kendra Karol Sevilla <[email protected]> commit 7275b02 Author: yinwenyu6 <[email protected]> Date: Mon Aug 18 22:29:03 2025 +0800 chore: fix function name (#4178) Signed-off-by: yinwenyu6 <[email protected]> commit 3b0c595 Author: yacovm <[email protected]> Date: Mon Aug 18 16:28:29 2025 +0200 Fix typo in comment - PChainHeight context (#4176) Signed-off-by: Yacov Manevich <[email protected]> commit 96f30d1 Author: rodrigo <[email protected]> Date: Fri Aug 15 02:15:44 2025 -0400 feat(load): add token test (#4171) commit e285ce0 Author: Sam Liokumovich <[email protected]> Date: Thu Aug 14 13:52:41 2025 -0400 Use EmptyVoteMetadata in Simplex Proto Messages (#4174) commit 5c72544 Author: aaronbuchwald <[email protected]> Date: Wed Aug 13 10:34:58 2025 -0400 Move C-Chain benchmark to custom action and add ARC + GH runner triggers (#4165) commit 3b0f8d4 Author: rodrigo <[email protected]> Date: Tue Aug 5 20:14:38 2025 -0400 refactor(load): remove context from test interface (#4157) commit a893a61 Author: Juan Leon <[email protected]> Date: Tue Aug 5 14:36:59 2025 -0400 Add @joshua-kim as CODEOWNER to testing-related packages (#4118) Signed-off-by: Juan Leon <[email protected]> commit be28a8b Author: Galoretka <[email protected]> Date: Mon Aug 4 22:39:41 2025 +0300 chore: fix a typo in gossip,go (#4154) Signed-off-by: Galoretka <[email protected]> commit b876d78 Author: aaronbuchwald <[email protected]> Date: Mon Aug 4 11:58:22 2025 -0400 Separate re-execution job params for PR from schedule (#4151) commit 752e12f Author: Stephen Buttolph <[email protected]> Date: Fri Aug 1 16:23:01 2025 -0400 Update coreth to v0.15.3-rc.5 (#4153) commit 3ba5246 Author: Joshua Kim <[email protected]> Date: Fri Aug 1 14:59:24 2025 -0400 fix metrics tests (#4146) Signed-off-by: Joshua Kim <[email protected]> commit 0cb887b Author: Afounso Souza <[email protected]> Date: Fri Aug 1 16:37:53 2025 +0200 Typo fix (#4150) Signed-off-by: Afounso Souza <[email protected]> commit 110807a Author: rodrigo <[email protected]> Date: Thu Jul 31 22:06:40 2025 -0400 docs: load (#4132) commit 24a051a Author: Jonathan Oppenheimer <[email protected]> Date: Thu Jul 31 19:06:15 2025 -0400 uplift: Add combined metrics package from evm repositories (#4135) Signed-off-by: Jonathan Oppenheimer <[email protected]> Co-authored-by: Stephen Buttolph <[email protected]> commit d9b512e Author: rodrigo <[email protected]> Date: Thu Jul 31 11:52:39 2025 -0400 Parameterize values in transfer tests (#4144) commit 6947e4c Author: rodrigo <[email protected]> Date: Wed Jul 30 12:27:45 2025 -0400 feat(load): add trie stress test (#4137) Signed-off-by: Joshua Kim <[email protected]>
Why this should be merged
Prevents us from being spammed by gossip requests
How this works
Dynamically updates the throttling limit based on the number of connected validators such that larger networks expect fewer samples against each node.
How this was tested
Added unit tests
Need to be documented in RELEASES.md?
Yes