-
Notifications
You must be signed in to change notification settings - Fork 38.8k
Lazily update CBlockIndex entries when pruning #6118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f61911f to
b479eca
Compare
|
utACK. |
b479eca to
3d49b87
Compare
|
Nit fixed, thanks! |
|
Added this PR on top of master and left my pruned bitcoind mainnet node running over night. No issues. Tested ACK. |
602535a to
45027bf
Compare
|
Untested ACK. I don't understand the Travis failure. |
ed0e14a to
fb10422
Compare
|
Rebased. |
|
What is your reasoning behind this? From a conceptual point of view I like the flags to specify what they do now: does the block HAVE data or undo data. |
|
I was trying to eliminate the need to iterate the entire mapBlockIndex whenever a file is pruned (since we currently lack a map from block files to blocks, we check each block to see if it's in a blockfile being deleted). That seems like something that needs to be fixed at some point. One idea for addressing that would be to just maintain a (memory-only) map from block files to blockindex entries which we iterate when pruning; this would be straightforward, cpu efficient, and preserve existing HAVE_DATA semantics. However I think @sipa was concerned about the additional memory overhead of that approach (I believe it would be roughly an additional 10MB if you're running in prune mode with a target high enough that no files are actually deleted -- which is perhaps relevant if in the future most nodes may run with some level of pruning), and he suggested lazy updating instead. So I tried implementing the lazy-update approach to see how feasible (and/or cumbersome) this would be. It seems very efficient; the code changes to support it seem relatively minor; and I like the idea of not having to worry about updating state when block files are removed -- that seems like it should make future pruning-related improvements easier to make. I do agree there's a tradeoff, in that the current HAVE_DATA semantics seem cleaner versus the new STORED_DATA semantics, but it's hard for me right now to say which semantics are better in the long run. I think that uncertainty would be an argument for waiting to see what the future of pruning code looks like and then deciding what solution makes most sense, except one thing I realized after coding this up is that if we decide to go in this direction with lazy updating of block index entries, then I think it would be better if we deploy this behavior with pruning in 0.11 than wait until 0.12 -- otherwise there's a backwards compatibility issue, where once your block index has been lazily updated, you can't easily revert back to a version of pruning code that didn't support that (at least with the present implementation). At any rate, I don't feel strongly about which approach has the best tradeoffs, and if we end up not merging this for 0.11, then I'd say we may as well just wait until we implement sharding, and decide then what approach makes sense here. |
|
Is the overhead to have to iterate over the (in-memory) block index when a file is pruned significant? Muddying the database semantics to make a rare action slightly faster would be unfortunate. But I don't feel strongly about it either. Also: this makes the 'block has data' check more involved and that is performed very often. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment points out that the HaveBlockData() name is very misleading. Could you change the name to indicate that it also updates the state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change it, any suggestions? The function only changes state if in pruning mode, so I don't want to go overboard with the description, and nothing concise is jumping out at me.
I did notice the comment for the function doesn't really emphasize the potentially state-changing behavior, I'll start by rewording that for now to make it more obvious.
|
@laanwj I believe it's ~20ms to iterate mapBlockIndex. Arguably that is insignificant the way pruning works now, since it can only happen when we're calling Regarding the efficiency of |
|
I think @sdaftuar has a point that if we're ever going to implement lazy updating, the time to do it is now. But personally I think some sort of reverse lookup is eventually going to be needed. I implemented it to try it out and it's fine, but I also think the simple 20ms iteration every time you delete a blockfile is good enough for now and we should just do the reverse lookup in whatever way makes sense with the sharding algorithm we end up with. Hmm, I was going to say the argument against the 20ms iteration is that once a day or so (when you've got an extra block file you can prune) then it's possible (I think even probable) that you're holding up block propagation. But come to think of it, that's a problem with choosing when we're pruning because even without the 20ms cost, you're still doing a full chainstate flush then. If we think that's problematic, maybe we can move around the pruning calls to only happen when we're a bit less busy, it'll take some testing though.. |
Rename BLOCK_HAVE_DATA/ BLOCK_HAVE_UNDO to BLOCK_STORED_DATA/ BLOCK_STORED_UNDO. These status values now indicate that a block or its undo information has been stored on disk, but pruning may cause that information to no longer be true. HaveBlockData() uses vinfoBlockFile to determine whether we actually have block data for a given block, and it updates the CBlockIndex passed in when STORED_DATA was set to true but the block file has been pruned. Also remove the iteration of mapBlockIndex in PruneOneBlockFile, as we no longer need to update the CBlockIndex entries of the blocks referenced in a pruned file.
fb10422 to
15d9748
Compare
|
Updated a few comments and tried to make it clearer that |
|
Closing. This apparently need to be rebased, and it's probably not worth it as I don't think we're going to merge for 0.11, and given the backwards compatibility issues this pull would generate if this were accepted later (see description above and #6148 (comment)), I think it'd be better for us to wait until a sharding solution is on the table and revisit this idea in that context. |
Rename
BLOCK_HAVE_DATAandBLOCK_HAVE_UNDOtoBLOCK_STORED_DATA/ BLOCK_STORED_UNDO.These status values now indicate that a block or its undo information has been
stored on disk, but pruning could cause that data to no longer be present.
HaveBlockData()usesvinfoBlockFileto determine whether we actually have blockdata for a given block, and updates the
CBlockIndexpassed in whenBLOCK_STORED_DATAwas set to true but the block file has been pruned.
Also remove the iteration of
mapBlockIndexinPruneOneBlockFile(), as we nolonger need to update the
CBlockIndexentries of the blocks referenced in apruned file.
@sipa Is this along the lines of what you had in mind?