
Fluffy: Implement offer cache to hold content ids of recent offers #3233

Merged

bhartnett merged 19 commits into master from fluffy-offer-contentid-cache on Apr 24, 2025

Conversation

@bhartnett
Contributor

@bhartnett bhartnett commented Apr 23, 2025

Reduces load on the database during the gossip process: it is very common to receive multiple copies of the same content from different peers as content is gossiped through the network.

Changes in this PR:

  • The offer cache holds the content ids of recently stored offers.
  • The offer cache is checked before the database during the offer flow.
  • If the database is pruned, the content ids in the offer cache are invalidated so that we don't incorrectly report pruned content as already stored.
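The three points above can be sketched as follows. This is a minimal, illustrative Python sketch only — the actual implementation is in Nim (using minilru), and all names here (`OfferCache`, `already_stored`) are hypothetical:

```python
from collections import OrderedDict

class OfferCache:
    """Illustrative LRU cache of recently stored content ids.

    Hypothetical sketch; the real Fluffy code is Nim and uses minilru.
    """

    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self.ids = OrderedDict()  # content_id -> True, insertion-ordered

    def put(self, content_id: bytes) -> None:
        self.ids[content_id] = True
        self.ids.move_to_end(content_id)
        if len(self.ids) > self.capacity:
            self.ids.popitem(last=False)  # evict least recently used

    def contains(self, content_id: bytes) -> bool:
        if content_id in self.ids:
            self.ids.move_to_end(content_id)  # refresh recency on a hit
            return True
        return False

    def clear(self) -> None:
        # Called after a database prune: the cached ids can no longer be
        # trusted to reflect what is actually still stored.
        self.ids.clear()

def already_stored(cache: OfferCache, db: set, content_id: bytes) -> bool:
    # Check the cache first, avoiding a database lookup in the common
    # case of duplicate offers arriving from multiple peers.
    if cache.contains(content_id):
        return True
    return content_id in db
```

The point of the cache-first check is that duplicate offers, which are the common case during gossip, never touch the database at all.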

@bhartnett bhartnett requested a review from kdeme April 23, 2025 06:08
Contributor

@kdeme kdeme left a comment


I'm fine with this cache addition, as offers typically do come in fairly close to each other and the change is quite minimal.

Two things however:

  • Curious to see any actual data on this.
  • I was also wondering how often this occurs versus the scenario where offers come in so close to each other that the actual content of the first offer is not stored yet. Of course, for that scenario we cannot really add a cache, as the offered content could fail to send or turn out to be invalid.

Comment thread fluffy/network/wire/portal_protocol.nim Outdated
Comment on lines +1818 to +1819
for k, v in p.offerCache.mpairs():
v = false
Contributor


Probably faster to just reinitialize the cache? And that way you also don't need the boolean, I think?

Contributor Author


Probably faster to just reinitialize the cache? And that way you also don't need the boolean, I think?

I was looking through the minilru code and I didn't find a clear function, so I went with this method. But yes, reinit is probably better. I'll update.
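The trade-off under discussion can be sketched abstractly (Python, illustrative only; in the Nim code "reinit" would mean re-creating the minilru cache rather than iterating its entries):

```python
# Flag-based invalidation: walk every entry and mark it stale.
# O(n), and every entry has to carry a boolean payload just for this.
def invalidate_flags(cache: dict) -> None:
    for k in cache:
        cache[k] = False

# Reinitialization: drop the whole cache in one step and start fresh.
# No per-entry boolean is needed, since presence in the cache now
# means "recently stored" by itself.
def invalidate_reinit() -> dict:
    return {}
```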

@bhartnett
Contributor Author

bhartnett commented Apr 23, 2025

  • Curious to see any actual data on this.

I guess I can add cache hit/miss metrics for this and gossip some data in a local testnet to see the results.

  • I was also wondering how often this occurs versus the version where offers come in too close to each other that the actual content of the first offer is not stored yet. Of course for that scenario we cannot really add a cache as the content offered could be failed to send or invalid.

This is something we could use metrics to get data on as well. To address this problem I think we should put a limit on the max number of concurrent offers per content id. The current limits are per content id and per peer, but there is no limit on multiple peers sending the same content concurrently.
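The proposed limit could look roughly like this (a hypothetical Python sketch of the idea, not Fluffy code; the class and method names are invented):

```python
class ConcurrentOfferLimiter:
    """Hypothetical sketch: cap the number of in-flight offers being
    processed for the same content id, across all peers combined."""

    def __init__(self, max_per_content_id: int = 1):
        self.max_per_content_id = max_per_content_id
        self.in_flight = {}  # content_id -> number of offers in progress

    def try_acquire(self, content_id: bytes) -> bool:
        count = self.in_flight.get(content_id, 0)
        if count >= self.max_per_content_id:
            return False  # reject: too many concurrent offers for this id
        self.in_flight[content_id] = count + 1
        return True

    def release(self, content_id: bytes) -> None:
        # Called when the offer completes, whether it succeeded or failed.
        count = self.in_flight.get(content_id, 0)
        if count <= 1:
            self.in_flight.pop(content_id, None)
        else:
            self.in_flight[content_id] = count - 1
```

With `max_per_content_id = 1`, a second peer offering the same content while the first transfer is still in flight would be rejected up front, which also covers the "content not stored yet" window the cache itself cannot help with.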

@bhartnett
Contributor Author

bhartnett commented Apr 23, 2025

@kdeme Here are some metrics showing the usage of the offer cache when running a local testnet with 16 nodes and gossiping content using 20 workers for around 10 minutes. All content is sent to one of the fluffy instances (node 2), which gossips the content to its peers.

Node 1:

# HELP portal_offer_cache_hits Portal wire protocol local content lookups that hit the offer cache
# TYPE portal_offer_cache_hits counter
portal_offer_cache_hits_total{protocol_id="500a"} 47330.0
portal_offer_cache_hits_created{protocol_id="500a"} 1745418350.0

# HELP portal_offer_cache_misses Portal wire protocol local content lookups that don't hit the offer cache
# TYPE portal_offer_cache_misses counter
portal_offer_cache_misses_total{protocol_id="500a"} 9918.0
portal_offer_cache_misses_created{protocol_id="500a"} 1745418350.0

Node 2 (the node which the portal bridge is connected to):

# HELP portal_offer_cache_hits Portal wire protocol local content lookups that hit the offer cache
# TYPE portal_offer_cache_hits counter

# HELP portal_offer_cache_misses Portal wire protocol local content lookups that don't hit the offer cache
# TYPE portal_offer_cache_misses counter
portal_offer_cache_misses_total{protocol_id="500a"} 47465.0
portal_offer_cache_misses_created{protocol_id="500a"} 1745418349.0

Node 3:

# HELP portal_offer_cache_hits Portal wire protocol local content lookups that hit the offer cache
# TYPE portal_offer_cache_hits counter
portal_offer_cache_hits_total{protocol_id="500a"} 48479.0
portal_offer_cache_hits_created{protocol_id="500a"} 1745418350.0

# HELP portal_offer_cache_misses Portal wire protocol local content lookups that don't hit the offer cache
# TYPE portal_offer_cache_misses counter
portal_offer_cache_misses_total{protocol_id="500a"} 9175.0
portal_offer_cache_misses_created{protocol_id="500a"} 1745418350.0

Node 4:

# HELP portal_offer_cache_hits Portal wire protocol local content lookups that hit the offer cache
# TYPE portal_offer_cache_hits counter
portal_offer_cache_hits_total{protocol_id="500a"} 17025.0
portal_offer_cache_hits_created{protocol_id="500a"} 1745418350.0

# HELP portal_offer_cache_misses Portal wire protocol local content lookups that don't hit the offer cache
# TYPE portal_offer_cache_misses counter
portal_offer_cache_misses_total{protocol_id="500a"} 18976.0
portal_offer_cache_misses_created{protocol_id="500a"} 1745418350.0

Based on these numbers, roughly half or more of the content lookups on the receiving nodes hit the cache during the gossip process (about 83–84% on nodes 1 and 3, about 47% on node 4; node 2 records no hits since all content originates there). Of course, the other benefit of this change is DoS protection: rejecting recently offered content becomes much faster and does not require a database lookup.
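The per-node hit rates follow directly from the counters reported above:

```python
def hit_rate(hits: float, misses: float) -> float:
    """Cache hit rate as a percentage of all local content lookups."""
    return 100.0 * hits / (hits + misses)

# Counter values taken from the metrics output above.
print(round(hit_rate(47330.0, 9918.0), 1))   # node 1 -> 82.7
print(round(hit_rate(48479.0, 9175.0), 1))   # node 3 -> 84.1
print(round(hit_rate(17025.0, 18976.0), 1))  # node 4 -> 47.3
```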

@bhartnett bhartnett requested a review from kdeme April 23, 2025 15:20
@bhartnett bhartnett merged commit 24d1dcf into master Apr 24, 2025
@bhartnett bhartnett deleted the fluffy-offer-contentid-cache branch April 24, 2025 00:27