validation: Persist coins cache to disk and load on startup #18941
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Motivated by #15218 (comment)
I'm not sure this is useful. My information is several years old, so take this with a grain of salt; actual benchmarks would of course be better. I experimented with several different designs for partial flushing, where some data remained in the cache at the time it was flushed, and surprisingly, they were invariably all slower than just wiping everything every time. The reason is that our cache isn't really a cache but a buffer: its performance gains come from the fact that it (dramatically) reduces disk writes (it also reduces disk reads, but the impact of those is far smaller). Most UTXOs are created and spent within a short time frame, and if both happen within a cache window, with no flush in between, the created UTXO is deleted from memory without ever hitting disk. At least on the systems I tested, reserving a portion of the cache as "flushed but still cached" was never better than just using that area as additional write-saving buffer.
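A minimal sketch of the buffer effect described above, using simplified stand-in types (a string-keyed map, not Bitcoin Core's actual CCoinsViewCache): a coin created and spent between two flushes is erased from memory and never reaches disk at all.

```cpp
#include <cstdint>
#include <map>
#include <string>

struct Coin {
    int64_t value;
    bool fresh;  // created since the last flush, so the disk has never seen it
    bool spent;
};

struct CoinsBuffer {
    std::map<std::string, Coin> entries;  // serialized outpoint -> coin

    void AddCoin(const std::string& outpoint, int64_t value) {
        entries[outpoint] = Coin{value, /*fresh=*/true, /*spent=*/false};
    }

    void SpendCoin(const std::string& outpoint) {
        auto it = entries.find(outpoint);
        if (it == entries.end()) return;  // would have to fetch from disk
        if (it->second.fresh) {
            // Never flushed: just drop it. Disk I/O for this UTXO: zero.
            entries.erase(it);
        } else {
            // Already on disk: remember to delete it there at the next flush.
            it->second.spent = true;
        }
    }

    void Flush() {
        // Write additions/deletions to disk and wipe the map. Coins that were
        // added and erased above never show up here, which is where the
        // (dramatic) write savings come from.
        entries.clear();
    }
};
```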
@sipa I'm having trouble reconciling your above comment with the one you made here.

Either having data already in the cache is desirable, or we always want a clear cache so we can write as much as possible. Or perhaps I haven't made the motivation clear. This does not do a partial flush; it just warms up the cache after a restart. Say you have 100 MB in the cache but you need to reboot. On startup you will have an empty cache, which, as you say above, will temporarily kill your performance. I suppose I should benchmark connecting a block with an empty cache and one with a full cache to measure the read benefit.
I've put together some benchmarks. TL;DR: running with the entire UTXO set in memory can shave several hundred milliseconds off ConnectBlock.

I benchmarked with this patch, which records the ConnectBlock time and whenever a flush happens. I ran with [benchmark plot]. There's some noise after the assumevalid block, so to account for that I reran all three with [benchmark plot]. I also plotted the 20-block rolling average of deltas to reduce noise: [plot]. This shows performance degradation at points where the cache is cleared.

The above tests were using an internal SSD. I reran the tests with an external HDD, but the results are much more noisy. Here are the 20-block rolling average results: [plot]

It seems running with an empty cache can carry a performance penalty of several hundred milliseconds in many cases. #14387 and #14397 attempted to improve the performance of ConnectBlock by much less but were closed because they were too complex. I believe running in the configuration enabled by this PR could be useful to users who wish to connect blocks as fast as possible, as pointed out by these comments. This comment also suggests that an empty cache is harmful to performance.
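The timing patch linked above isn't reproduced here; a hedged sketch of that style of instrumentation, with hypothetical names, might look like:

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical wrapper: times a ConnectBlock-like call and logs the duration
// together with whether a cache flush occurred at this height.
template <typename Fn>
void TimeConnectBlock(int height, bool flushed, Fn&& connect) {
    const auto t0 = std::chrono::steady_clock::now();
    connect();  // the real ConnectBlock work would go here
    const auto t1 = std::chrono::steady_clock::now();
    const long long ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    std::printf("height=%d connect_ms=%lld flushed=%d\n",
                height, ms, flushed ? 1 : 0);
}
```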
@andrewtoth I think you can solve some of these problems more 'cleanly' (no new files) if you have a parallel cache-warming strategy for ConnectBlock that attempts to asynchronously load all the required inputs before rolling through the block.
@JeremyRubin Interesting, thanks. So you are suggesting to have the cache-warming thread begin to access the coins of the inputs in a block in, say,
Yep. As soon as some basic checks pass (probably just PoW?) you may as well begin to warm the coins. |
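To make the suggestion concrete, here is a rough sketch of the cache-warming idea with hypothetical types and helper names (CoinsView and WarmCoinsAsync are illustrative, not Bitcoin Core API). A real implementation would also need the coins cache to be safe for concurrent access, which it is not today.

```cpp
#include <future>
#include <string>
#include <vector>

struct CoinsView {
    // Reading a coin pulls it from disk into the in-memory cache (simplified).
    void AccessCoin(const std::string& outpoint);
};

struct Tx { std::vector<std::string> inputs; };
struct Block { std::vector<Tx> txs; };

// Kick off a background task that touches every input's coin so the entries
// are cached before ConnectBlock walks the block. The caller must keep `view`
// and `block` alive until the returned future completes.
std::future<void> WarmCoinsAsync(CoinsView& view, const Block& block) {
    return std::async(std::launch::async, [&view, &block] {
        for (const Tx& tx : block.txs)
            for (const std::string& in : tx.inputs)
                view.AccessCoin(in);  // prefetch only; result is discarded
    });
}

// Usage: after the cheap checks (e.g. PoW) pass, start warming, then proceed:
//   auto warm = WarmCoinsAsync(view, block);
//   ... other validation ...
//   warm.wait();  // optional; ConnectBlock hits a warmer cache either way
```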
Closing as this doesn't appear to have attracted any interest and is unlikely to be merged. |
I don't care whether the cache can be warmed as quickly as possible. What I care about is very simple: keep the UTXO work entirely in RAM, without touching the hard drive (especially a DM-SMR HDD) again. I think this should be shown in the GUI. It is very important to guide a newbie by pointing to where the typical performance bottleneck lies. I was once stuck on this point for a ridiculously long time, something like two weeks. I know the pain and helplessness of just watching the hard drive wear while the progress crawls along like a snail.
You may laugh at my ignorance, but I thought I knew what a "cache" is: I installed something like PrimoCache, hoping such a "cache" could relieve the problem. Now I think that idea was just dumb. Maybe it's actually not as dumb as I currently think? I don't know. I just think it's too cruel for a newbie to try to sync a full node without knowing where the typical bottleneck lies: the chainstate, or UTXO set.




This PR adds a way to persist the coins cache to disk on shutdown, to a file named coinscache.dat, similar to what is done for the mempool. On startup this file is used to warm the cache so it doesn't go cold between restarts. This can be useful for users who want to connect blocks quickly with a high -dbcache value.

This introduces a new config arg, -persistcoinscache, which defaults to false. With a higher cache value the amount of disk space used for the file could be very large, so it defaults to off to prevent any footguns. With lower cache values this configuration could cause the cache to flush sooner than necessary and would probably not provide any benefit.

With a max dbcache, after a reindex or IBD it will dump the entire UTXO set and load it into memory on startup. Testing this today, I had a file size of 2.4 GB and it took ~22 minutes to fully reinsert the UTXO set into the cache with an SSD.
After #17487 we can add a change to not wipe the cache on periodic flushes. Users could then run the node continuously with the entire utxo set in memory. Benchmarking shows running in this configuration could save several hundred milliseconds when connecting blocks, vs an empty cache that is cleared periodically or during restarts.
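For illustration, a minimal sketch of the dump/load cycle described above, using simplified stand-in types and an ad-hoc record format (length-prefixed outpoint plus value) rather than the PR's actual coinscache.dat serialization. Requires C++17 for structured bindings.

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

using CoinsMap = std::map<std::string, int64_t>;  // outpoint -> value (simplified)

// On shutdown: write every cached entry out to a file.
void DumpCoinsCache(const CoinsMap& cache, const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    for (const auto& [outpoint, value] : cache) {
        const uint32_t len = static_cast<uint32_t>(outpoint.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof(len));
        out.write(outpoint.data(), len);
        out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    }
}

// On startup: reinsert every entry so the cache starts warm.
void LoadCoinsCache(CoinsMap& cache, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    uint32_t len = 0;
    while (in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
        std::string outpoint(len, '\0');
        in.read(&outpoint[0], len);
        int64_t value = 0;
        in.read(reinterpret_cast<char*>(&value), sizeof(value));
        cache.emplace(std::move(outpoint), value);
    }
}
```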