Skip to content

Dynamically update mempool gossip request rate limit#4162

Merged
StephenButtolph merged 3 commits intomasterfrom
gossip-throttling
Aug 25, 2025
Merged

Dynamically update mempool gossip request rate limit#4162
StephenButtolph merged 3 commits intomasterfrom
gossip-throttling

Conversation

@joshua-kim
Copy link
Copy Markdown
Contributor

Why this should be merged

Prevents us from being spammed by gossip requests

How this works

Dynamically updates the throttling limit based on the number of connected validators such that larger networks expect fewer samples against each node.

How this was tested

Added unit tests

Need to be documented in RELEASES.md?

Yes

@joshua-kim joshua-kim force-pushed the gossip-throttling branch 2 times, most recently from b49df03 to 705dd49 Compare August 7, 2025 21:06
Comment thread utils/set/set.go
}

// Intersect returns the set intersection of s1 and s2
func Intersect[T comparable](s1 Set[T], s2 Set[T]) Set[T] {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this looked better than putting it on a receiver... but admittedly it is inconsistent w/ the rest of the code.

Comment thread utils/set/set.go
@joshua-kim joshua-kim self-assigned this Aug 12, 2025
@joshua-kim joshua-kim requested review from a team, JuanLeon2, rrazvan1 and samliok and removed request for a team, JuanLeon2 and rrazvan1 August 12, 2025 16:12
Comment thread network/p2p/handler.go
Comment thread network/p2p/handler.go Outdated
variance := d.requestsPerPeer * (n - 1) / (n * n)
stdDeviation := math.Sqrt(variance)

limit := int(expectedSamples + 4*stdDeviation)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this limit should be lower. 4 std deviations means the node is in 99.9+th percentile of expectedSamples sent. wouldn't we want this to be between 2~3 sd's?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just for outlier detection... each validator makes to a uniformly random selected validator, so we're modeling the distribution of requests we can get in expectation as a normal distribution. We just want to throttle people trying to spam us with requests while not throttling non-byzantine validators who are just happening to re-sample us... so this would throttle a non-byzantine validator 1-99.995% = 0.005% of the time. If we're doing 2 std devs it'd be closer to 2% (seems like a big false positive rate), and 3 would be 0.2% (still seems big). 4 std devs was chosen somewhat arbitrarily in an offline conversation... so it is a bit of a magic number. I could add a comment if that helps explain the reasoning?

Comment thread vms/example/xsvm/vm.go Outdated
Comment thread network/p2p/network.go Outdated
Comment on lines +62 to +64
validatorState validators.State,
subnetID ids.ID,
maxValidatorSetStaleness time.Duration,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to pass in some sort of validator config? saying this because these three parameters are not used outside of creating the inner Validators struct, and I'm personally not a fan of methods with too many parameters.

feel free to ignore if this is just not preferred golang style

Copy link
Copy Markdown
Contributor Author

@joshua-kim joshua-kim Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want us to just slap "preferred go style" as an answer (since that doesn't really give us insight into the reasoning behind a decision...). It's definitely an idea... my thoughts are:

  1. The number of things that need to be specified do not change if we put it into a config/as explicit parameters. One difference about params vs configs is that you break your callers when you modify it... which can be good or bad.
  2. Configs are nice when you have zero-safe/reasonable defaults that you can use - since the zero value of a config will recursively initialize its fields with their zero values + you can set defaults lazily.
    • Some defaults we could use:
      • Reasonably I think we could have default value for maxValidatorSetStaleness
      • ... Maybe we could default to not logging or logging using stdout?
    • But introducing a config to just wrap one value (maxValidatorSetStaleness) is a bit odd to add a layer of abstraction over a single value, and I don't know when we would ever want the default behavior for logging since we only use avalanchego's logger except for in tests.

Comment thread network/p2p/network.go Outdated

func (n *Network) Connected(_ context.Context, nodeID ids.NodeID, _ *version.Application) error {
n.Peers.add(nodeID)
n.Validators.connected(nodeID)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im slightly confused on why it is ok to add to both structs here. Is it true that all validators are peers but not all peers are validators? If so then don't we have to do some sort of check to make sure the node is a validator

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see we are adding peers, but are not considering them validators yet. I think this method should be named a bit more clear. Another thought, why can't we just pass in the *Peers object rather than creating a duplicate copy of it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this, but then Peers and Validators would need to share a reference to a lock. We could do something like that - but I thought sharing a lock reference looked worse than just copying the data. Admittedly Validators is kind of in a bad spot right now... it owns both data for connected and non-connected validators which I find confusing.

Comment thread network/p2p/handler_test.go Outdated
@joshua-kim joshua-kim moved this to In Progress 🏗️ in avalanchego Aug 18, 2025
Comment thread network/p2p/validators.go
Comment thread network/p2p/network.go Outdated
Comment on lines +62 to +64
validatorState validators.State,
subnetID ids.ID,
maxValidatorSetStaleness time.Duration,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These arguments feel pretty unusual to me. We previously seemed to be trying to keep this package more modular in how things hook together... This seems to go against that.

Would it make sense to be able to register onConnected / onDisconnected notifications rather than having the p2p network struct own the validators entirely?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted this originally but ended up not doing it... and upon looking back, the reason I coupled it was because I wanted Validators to know about Peers but since I ended up decoupling them anyways the original reason for this is now gone... I think this should be possible to do.

Comment thread vms/platformvm/config/network.go Outdated
Comment on lines +26 to +28
// PullGossipRequestsPerValidator = PullGossipThrottlingPeriod / PullGossipFrequency =
// 10 seconds/period / 1.5 requests/second ~= 6.67 requests
PullGossipRequestsPerValidator: 7,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense being a config? It is derived from PullGossipFrequency and PullGossipThrottlingPeriod. Or is this a config because it is based on other peoples values of those provided configs?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason that this can't be a float? We don't need the configs to be consensus critical.

Copy link
Copy Markdown
Contributor Author

@joshua-kim joshua-kim Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Or is this a config because it is based on other peoples values of those provided configs?

Yeah that's correct

Is there a good reason that this can't be a float? We don't need the configs to be consensus critical.

There isn't a reason - I use a float here which does play more nicely w/ the limit which is also a float.

Comment thread vms/platformvm/config/network.go Outdated
Comment thread network/p2p/handler.go
Comment thread network/p2p/throttler.go
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning on changing the currentHits logic in a separate PR?

Recall that we want rate limited requests to increase currentHits as well as non-rate limited requests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we would adjust it in a separate PR... if you want us to couple it to this I can but it seems like an independent change to me

Copy link
Copy Markdown
Contributor

@StephenButtolph StephenButtolph Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with it as a separate change.

Comment thread network/p2p/handler.go
@joshua-kim joshua-kim force-pushed the gossip-throttling branch 3 times, most recently from c72fdf0 to 27dce95 Compare August 21, 2025 19:01
}
}

func TestNewDynamicThrottlerHandler(t *testing.T) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are almost certainly overkill...

@joshua-kim joshua-kim force-pushed the gossip-throttling branch 4 times, most recently from 3c4726c to ccaf747 Compare August 21, 2025 21:19
Comment thread network/p2p/handler.go
Comment thread network/p2p/network.go Outdated
Comment thread network/p2p/network.go
Copy link
Copy Markdown
Contributor

@StephenButtolph StephenButtolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code lgtm - we should canary it out to make sure it is working as intended before merging.

@StephenButtolph StephenButtolph dismissed their stale review August 23, 2025 19:20

Code introduced a deadlock.

@StephenButtolph
Copy link
Copy Markdown
Contributor

deadlock fix seems to be working as expected. Continuing to QA to verify the rate limiting is working as expected.

@StephenButtolph StephenButtolph moved this from In Progress 🏗️ to In Review 🔎 in avalanchego Aug 25, 2025
@StephenButtolph StephenButtolph added this pull request to the merge queue Aug 25, 2025
Merged via the queue into master with commit cc3242f Aug 25, 2025
64 checks passed
@StephenButtolph StephenButtolph deleted the gossip-throttling branch August 25, 2025 23:06
@github-project-automation github-project-automation bot moved this from In Review 🔎 to Done 🎉 in avalanchego Aug 25, 2025
joshua-kim added a commit that referenced this pull request Sep 3, 2025
commit 66ca7dc
Author: rodrigo <[email protected]>
Date:   Tue Sep 2 16:49:55 2025 -0400

    feat(load): add firewood flag (#4235)

commit 274541b
Author: Suyan Qu <[email protected]>
Date:   Tue Sep 2 14:10:48 2025 -0500

    feat: add parameters to disable metrics (#4214)

commit e03af84
Author: aaronbuchwald <[email protected]>
Date:   Tue Sep 2 12:30:07 2025 -0400

    Add timeout parameter to C-Chain re-execution jobs (#4223)

commit 0e20485
Author: aaronbuchwald <[email protected]>
Date:   Tue Sep 2 11:58:29 2025 -0400

    Comment out schedule trigger for re-execution on w/container (#4234)

    Signed-off-by: aaronbuchwald <[email protected]>
    Co-authored-by: Copilot <[email protected]>

commit 847eba1
Author: aaronbuchwald <[email protected]>
Date:   Fri Aug 29 14:38:11 2025 -0400

    Add back empty schedule entry for reexecute w/ container job (#4230)

commit a958b8a
Author: aaronbuchwald <[email protected]>
Date:   Fri Aug 29 12:56:06 2025 -0400

    Add newline to workflow dispatch (#4229)

    Signed-off-by: aaronbuchwald <[email protected]>

commit 7ec2258
Author: aaronbuchwald <[email protected]>
Date:   Thu Aug 28 17:11:38 2025 -0400

    Push benchmark re-execute results on master workflow dispatch (#4224)

commit 34f983e
Author: aaronbuchwald <[email protected]>
Date:   Thu Aug 28 15:33:12 2025 -0400

    Disambiguate source vs exec variable names in reexecute tasks (#4200)

    Signed-off-by: aaronbuchwald <[email protected]>
    Co-authored-by: Copilot <[email protected]>

commit 99578a2
Author: aaronbuchwald <[email protected]>
Date:   Thu Aug 28 12:52:31 2025 -0400

    Write grafana link to logs and github step summary (#4219)

commit 814300c
Author: aaronbuchwald <[email protected]>
Date:   Thu Aug 28 12:37:05 2025 -0400

    Remove firewood entry from PR triggers due to flakes (#4227)

commit 40fbcd5
Author: rodrigo <[email protected]>
Date:   Thu Aug 28 00:24:54 2025 -0400

    refactor(load): simulator contract (#4181)

commit 6195e1f
Author: rodrigo <[email protected]>
Date:   Wed Aug 27 17:31:51 2025 -0400

    chore: remove unzip mention (#4226)

commit 59e88f3
Author: aaronbuchwald <[email protected]>
Date:   Wed Aug 27 11:17:27 2025 -0400

    Remove schedule trigger for w/ container job that evaluates to empty matrix (#4218)

commit c2563d1
Author: Stephen Buttolph <[email protected]>
Date:   Tue Aug 26 19:07:47 2025 -0400

    Update versions for v1.13.5 (#4217)

commit a0ccd66
Author: aaronbuchwald <[email protected]>
Date:   Tue Aug 26 12:34:54 2025 -0400

    Add support for passing config and predefined configs to VM re-execution tests (#4180)

commit cc3242f
Author: Joshua Kim <[email protected]>
Date:   Mon Aug 25 18:49:28 2025 -0400

    Dynamically update mempool gossip request rate limit (#4162)

    Signed-off-by: Joshua Kim <[email protected]>
    Co-authored-by: Stephen Buttolph <[email protected]>

commit f2e3273
Author: Draco <[email protected]>
Date:   Mon Aug 25 15:00:56 2025 -0400

    Add ability to create zstd compressor with compression level (#4203)

commit 441f441
Author: Joshua Kim <[email protected]>
Date:   Mon Aug 25 12:11:07 2025 -0400

    Remove buf lint action (#4189)

    Signed-off-by: Joshua Kim <[email protected]>

commit 4bcb221
Author: Stephen Buttolph <[email protected]>
Date:   Sat Aug 23 15:43:06 2025 -0400

    Update block + validator + pgo checkpoints to 2025-08-23 (#4205)

commit b18ffc1
Author: rodrigo <[email protected]>
Date:   Fri Aug 22 16:34:53 2025 -0400

    Add s5cmd progress bar (#4204)

commit 2100bee
Author: Sam Liokumovich <[email protected]>
Date:   Fri Aug 22 11:52:31 2025 -0400

    Rename Engine Types (#4193)

    Signed-off-by: Sam Liokumovich <[email protected]>
    Co-authored-by: Copilot <[email protected]>

commit 33727a8
Author: Joshua Kim <[email protected]>
Date:   Fri Aug 22 11:00:12 2025 -0400

    Count throttled requests as hits (#4199)

    Signed-off-by: Joshua Kim <[email protected]>

commit b939be4
Author: Draco <[email protected]>
Date:   Thu Aug 21 14:22:54 2025 -0400

    fix: blockdb file eviction race issue (#4186)

commit 778ccfe
Author: aaronbuchwald <[email protected]>
Date:   Thu Aug 21 11:40:03 2025 -0400

    Add config option for AWS S3 read only credential duration (#4192)

commit ae41355
Author: Stephen Buttolph <[email protected]>
Date:   Wed Aug 20 16:46:49 2025 -0400

    Add redundant import alias linting (#4191)

    Signed-off-by: Stephen Buttolph <[email protected]>
    Co-authored-by: Copilot <[email protected]>

commit a3b5c6a
Author: Stephen Buttolph <[email protected]>
Date:   Wed Aug 20 10:53:24 2025 -0400

    Make Draco the codeowner of the blockdb (#4187)

commit a24ac68
Author: queryfast <[email protected]>
Date:   Wed Aug 20 22:18:21 2025 +0800

    refactor: replace []byte(fmt.Sprintf) with fmt.Appendf (#4161)

    Signed-off-by: queryfast <[email protected]>

commit 7aa6a17
Author: Sam Liokumovich <[email protected]>
Date:   Tue Aug 19 14:39:40 2025 -0400

    Rename height field to numBlocks (#4184)

commit 7d7e1fe
Author: aaronbuchwald <[email protected]>
Date:   Tue Aug 19 13:24:59 2025 -0400

    Add optional step to archive post-reexecution state to S3 (#4172)

    Signed-off-by: aaronbuchwald <[email protected]>
    Co-authored-by: Copilot <[email protected]>

commit ebe0558
Author: aaronbuchwald <[email protected]>
Date:   Tue Aug 19 12:11:34 2025 -0400

    Change cache path to tmp included in gitignore (#4183)

commit e5593be
Author: Draco <[email protected]>
Date:   Tue Aug 19 12:01:43 2025 -0400

    Block Database (#4027)

commit 940b96f
Author: Sam Liokumovich <[email protected]>
Date:   Tue Aug 19 11:36:37 2025 -0400

    Storage Component For Simplex (#4122)

    Signed-off-by: Sam Liokumovich <[email protected]>

commit 6d7e2dc
Author: Nicolas Arnedo Villanueva <[email protected]>
Date:   Tue Aug 19 16:59:58 2025 +0200

    `config/config.md:` Added Env Variable representation of flags + improved UI design (#4110)

    Signed-off-by: Meaghan FitzGerald <[email protected]>
    Signed-off-by: Nicolas Arnedo Villanueva <[email protected]>
    Co-authored-by: Meaghan FitzGerald <[email protected]>
    Co-authored-by: Stephen Buttolph <[email protected]>

commit 81f13b2
Author: Draco <[email protected]>
Date:   Mon Aug 18 13:59:43 2025 -0400

    feat: add eviction callback in LRU cache (#4088)

commit 4f5acfc
Author: Jonathan Oppenheimer <[email protected]>
Date:   Mon Aug 18 13:16:44 2025 -0400

    Migrate predicate package from evm repos (#4147)

    Signed-off-by: Jonathan Oppenheimer <[email protected]>
    Co-authored-by: Copilot <[email protected]>
    Co-authored-by: Stephen Buttolph <[email protected]>
    Co-authored-by: Joshua Kim <[email protected]>

commit 335e79f
Author: Kendra Karol Sevilla <[email protected]>
Date:   Mon Aug 18 18:45:52 2025 +0200

    chore: fix typo (#4179)

    Signed-off-by: Kendra Karol Sevilla <[email protected]>

commit 7275b02
Author: yinwenyu6 <[email protected]>
Date:   Mon Aug 18 22:29:03 2025 +0800

    chore: fix function name (#4178)

    Signed-off-by: yinwenyu6 <[email protected]>

commit 3b0c595
Author: yacovm <[email protected]>
Date:   Mon Aug 18 16:28:29 2025 +0200

    Fix typo in comment - PChainHeight context (#4176)

    Signed-off-by: Yacov Manevich <[email protected]>

commit 96f30d1
Author: rodrigo <[email protected]>
Date:   Fri Aug 15 02:15:44 2025 -0400

    feat(load): add token test (#4171)

commit e285ce0
Author: Sam Liokumovich <[email protected]>
Date:   Thu Aug 14 13:52:41 2025 -0400

    Use EmptyVoteMetadata in Simplex Proto Messages (#4174)

commit 5c72544
Author: aaronbuchwald <[email protected]>
Date:   Wed Aug 13 10:34:58 2025 -0400

    Move C-Chain benchmark to custom action and add ARC + GH runner triggers (#4165)

commit 3b0f8d4
Author: rodrigo <[email protected]>
Date:   Tue Aug 5 20:14:38 2025 -0400

    refactor(load): remove context from test interface (#4157)

commit a893a61
Author: Juan Leon <[email protected]>
Date:   Tue Aug 5 14:36:59 2025 -0400

    Add @joshua-kim as CODEOWNER to testing-related packages (#4118)

    Signed-off-by: Juan Leon <[email protected]>

commit be28a8b
Author: Galoretka <[email protected]>
Date:   Mon Aug 4 22:39:41 2025 +0300

    chore: fix a typo in gossip,go (#4154)

    Signed-off-by: Galoretka <[email protected]>

commit b876d78
Author: aaronbuchwald <[email protected]>
Date:   Mon Aug 4 11:58:22 2025 -0400

    Separate re-execution job params for PR from schedule (#4151)

commit 752e12f
Author: Stephen Buttolph <[email protected]>
Date:   Fri Aug 1 16:23:01 2025 -0400

    Update coreth to v0.15.3-rc.5 (#4153)

commit 3ba5246
Author: Joshua Kim <[email protected]>
Date:   Fri Aug 1 14:59:24 2025 -0400

    fix metrics tests (#4146)

    Signed-off-by: Joshua Kim <[email protected]>

commit 0cb887b
Author: Afounso Souza <[email protected]>
Date:   Fri Aug 1 16:37:53 2025 +0200

    Typo fix (#4150)

    Signed-off-by: Afounso Souza <[email protected]>

commit 110807a
Author: rodrigo <[email protected]>
Date:   Thu Jul 31 22:06:40 2025 -0400

    docs: load (#4132)

commit 24a051a
Author: Jonathan Oppenheimer <[email protected]>
Date:   Thu Jul 31 19:06:15 2025 -0400

    uplift: Add combined metrics package from evm repositories (#4135)

    Signed-off-by: Jonathan Oppenheimer <[email protected]>
    Co-authored-by: Stephen Buttolph <[email protected]>

commit d9b512e
Author: rodrigo <[email protected]>
Date:   Thu Jul 31 11:52:39 2025 -0400

    Parameterize values in transfer tests (#4144)

commit 6947e4c
Author: rodrigo <[email protected]>
Date:   Wed Jul 30 12:27:45 2025 -0400

    feat(load): add trie stress test (#4137)

Signed-off-by: Joshua Kim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants