validation: Persist coins cache to disk and load on startup #18941
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Motivated by #15218 (comment)
I'm not sure this is useful. My information is several years old, so take this with a grain of salt; actual benchmarks would of course be better. I experimented with several different designs for partial flushing, where some data remained in the cache at the time it was flushed, and surprisingly, they were invariably all slower than just wiping everything every time. The reason is that our cache isn't really a cache but a buffer: its performance gains come from the fact that it (dramatically) reduces disk writes (it also reduces disk reads, but the impact of those is far smaller). Most UTXOs are created and spent within a short time frame, and if both happen within a cache window, with no flush in between, the created UTXO is deleted from memory without ever hitting disk. At least on the systems I tested, reserving a portion of the cache as "flushed but still cached" was never better than just using that area as additional write-saving buffer.
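A minimal sketch of the buffer effect described above, using simplified stand-in types (a string-keyed map, not Bitcoin Core's actual CCoinsViewCache): a coin created and spent between two flushes is erased from memory and never reaches disk at all.

```cpp
#include <cstdint>
#include <map>
#include <string>

struct Coin {
    int64_t value;
    bool fresh;  // created since the last flush, so the disk has never seen it
    bool spent;
};

struct CoinsBuffer {
    std::map<std::string, Coin> entries;  // serialized outpoint -> coin

    void AddCoin(const std::string& outpoint, int64_t value) {
        entries[outpoint] = Coin{value, /*fresh=*/true, /*spent=*/false};
    }

    void SpendCoin(const std::string& outpoint) {
        auto it = entries.find(outpoint);
        if (it == entries.end()) return;  // would have to fetch from disk
        if (it->second.fresh) {
            // Never flushed: just drop it. Disk I/O for this UTXO: zero.
            entries.erase(it);
        } else {
            // Already on disk: remember to delete it there at the next flush.
            it->second.spent = true;
        }
    }

    void Flush() {
        // Write additions/deletions to disk and wipe the map. Coins that were
        // added and erased above never show up here, which is where the
        // (dramatic) write savings come from.
        entries.clear();
    }
};
```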
@sipa I'm having trouble reconciling your above comment with the one you made here.

Either having data already in the cache is desirable, or we always want a clear cache so we can write as much as possible. Or perhaps I haven't made the motivation clear. This does not do a partial flush; it just warms up the cache after a restart. Say you have 100 MB in the cache but you need to reboot. On startup you will have an empty cache, which, as you say above, will temporarily kill your performance. I suppose I should benchmark connecting a block with an empty cache and one with a full cache to measure the read benefit.
I've put together some benchmarks. TL;DR: running with the entire UTXO set in memory can shave several hundred milliseconds off ConnectBlock.

I benchmarked with this patch, which records the ConnectBlock time and whenever a flush happens. I ran with [benchmark plot]. There's some noise after the assumevalid block, so to account for that I reran all three with [benchmark plot]. I also plotted the 20-block rolling average of deltas to reduce noise: [plot]. This shows performance degradation at points where the cache is cleared.

The above tests were using an internal SSD. I reran the tests with an external HDD, but the results are much more noisy. Here are the 20-block rolling average results: [plot]

It seems running with an empty cache can carry a performance penalty of several hundred milliseconds in many cases. #14387 and #14397 attempted to improve the performance of ConnectBlock by much less but were closed because they were too complex. I believe running in the configuration enabled by this PR could be useful to users who wish to connect blocks as fast as possible, as pointed out by these comments. This comment also suggests that an empty cache is harmful to performance.
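The timing patch linked above isn't reproduced here; a hedged sketch of that style of instrumentation, with hypothetical names, might look like:

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical wrapper: times a ConnectBlock-like call and logs the duration
// together with whether a cache flush occurred at this height.
template <typename Fn>
void TimeConnectBlock(int height, bool flushed, Fn&& connect) {
    const auto t0 = std::chrono::steady_clock::now();
    connect();  // the real ConnectBlock work would go here
    const auto t1 = std::chrono::steady_clock::now();
    const long long ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    std::printf("height=%d connect_ms=%lld flushed=%d\n",
                height, ms, flushed ? 1 : 0);
}
```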
@andrewtoth I think you can solve some of these problems more 'cleanly' (no new files) if you have a parallel cache-warming strategy for ConnectBlock that attempts to asynchronously load all the required inputs before rolling through the block.
@JeremyRubin Interesting, thanks. So you are suggesting to have the cache-warming thread begin to access the coins of the inputs in a block in, say,
Yep. As soon as some basic checks pass (probably just PoW?) you may as well begin to warm the coins. |
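To make the suggestion concrete, here is a rough sketch of the cache-warming idea with hypothetical types and helper names (CoinsView and WarmCoinsAsync are illustrative, not Bitcoin Core API). A real implementation would also need the coins cache to be safe for concurrent access, which it is not today.

```cpp
#include <future>
#include <string>
#include <vector>

struct CoinsView {
    // Reading a coin pulls it from disk into the in-memory cache (simplified).
    void AccessCoin(const std::string& outpoint);
};

struct Tx { std::vector<std::string> inputs; };
struct Block { std::vector<Tx> txs; };

// Kick off a background task that touches every input's coin so the entries
// are cached before ConnectBlock walks the block. The caller must keep `view`
// and `block` alive until the returned future completes.
std::future<void> WarmCoinsAsync(CoinsView& view, const Block& block) {
    return std::async(std::launch::async, [&view, &block] {
        for (const Tx& tx : block.txs)
            for (const std::string& in : tx.inputs)
                view.AccessCoin(in);  // prefetch only; result is discarded
    });
}

// Usage: after the cheap checks (e.g. PoW) pass, start warming, then proceed:
//   auto warm = WarmCoinsAsync(view, block);
//   ... other validation ...
//   warm.wait();  // optional; ConnectBlock hits a warmer cache either way
```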
Closing as this doesn't appear to have attracted any interest and is unlikely to be merged. |
I don't care whether the cache can be warmed as quickly as possible. What I care about is very simple: keep the UTXO work entirely in RAM, without touching the hard drive (especially a DM-SMR HDD) again. I think this should be shown in the GUI. It is very important to guide a newbie by pointing to where the typical performance bottleneck lies. I was once stuck on this point for a ridiculously long time, something like two weeks. I know the pain and helplessness of just watching the hard drive wear while the progress crawls along like a snail.
You may laugh at my ignorance, but I thought I knew what a "cache" is: I installed something like PrimoCache, hoping such a "cache" could relieve the problem. Now I think that idea was just dumb. Maybe it's actually not as dumb as I currently think? I don't know. I just think it's too cruel for a newbie to try to sync a full node without knowing where the typical bottleneck lies: the chainstate, or UTXO set.




This PR adds a way to persist the coins cache to disk on shutdown, to a file named coinscache.dat, similar to what is done for the mempool. On startup this file is used to warm the cache so it doesn't go cold between restarts. This can be useful for users who want to connect blocks quickly with a high -dbcache value.

This introduces a new config arg, -persistcoinscache, which defaults to false. With a higher cache value the amount of disk space used for the file could be very large, so it defaults to off to prevent any footguns. With lower cache values this configuration could cause the cache to flush sooner than necessary and would probably not provide any benefit.

With a max dbcache, after a reindex or IBD it will dump the entire UTXO set and load it into memory on startup. Testing this today, I had a file size of 2.4 GB and it took ~22 minutes to fully reinsert the UTXO set into the cache with an SSD.
After #17487 we can add a change to not wipe the cache on periodic flushes. Users could then run the node continuously with the entire utxo set in memory. Benchmarking shows running in this configuration could save several hundred milliseconds when connecting blocks, vs an empty cache that is cleared periodically or during restarts.
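For illustration, a minimal sketch of the dump/load cycle described above, using simplified stand-in types and an ad-hoc record format (length-prefixed outpoint plus value) rather than the PR's actual coinscache.dat serialization. Requires C++17 for structured bindings.

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

using CoinsMap = std::map<std::string, int64_t>;  // outpoint -> value (simplified)

// On shutdown: write every cached entry out to a file.
void DumpCoinsCache(const CoinsMap& cache, const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    for (const auto& [outpoint, value] : cache) {
        const uint32_t len = static_cast<uint32_t>(outpoint.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof(len));
        out.write(outpoint.data(), len);
        out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    }
}

// On startup: reinsert every entry so the cache starts warm.
void LoadCoinsCache(CoinsMap& cache, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    uint32_t len = 0;
    while (in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
        std::string outpoint(len, '\0');
        in.read(&outpoint[0], len);
        int64_t value = 0;
        in.read(reinterpret_cast<char*>(&value), sizeof(value));
        cache.emplace(std::move(outpoint), value);
    }
}
```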