
Conversation

@sdaftuar (Member) commented May 8, 2015

Rename BLOCK_HAVE_DATA and BLOCK_HAVE_UNDO to BLOCK_STORED_DATA/BLOCK_STORED_UNDO.

These status values now indicate that a block or its undo information has been
stored on disk, but pruning could cause that data to no longer be present.

HaveBlockData() uses vinfoBlockFile to determine whether we actually have block
data for a given block, and updates the CBlockIndex passed in when
BLOCK_STORED_DATA is set but the block file has been pruned.

Also remove the iteration of mapBlockIndex in PruneOneBlockFile(), as we no
longer need to update the CBlockIndex entries of the blocks referenced in a
pruned file.
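For illustration, a minimal sketch of what the lazy check could look like (this is not the exact patch: detecting a pruned file via a nulled CBlockFileInfo entry and the setDirtyBlockIndex bookkeeping are assumed details):

```cpp
bool HaveBlockData(CBlockIndex* pindex)
{
    AssertLockHeld(cs_main);
    if (!(pindex->nStatus & BLOCK_STORED_DATA))
        return false; // never stored to disk in the first place
    // Assumption: pruning nulls the file's CBlockFileInfo, so an entry
    // with no blocks means the file has been deleted.
    if (fPruneMode && vinfoBlockFile[pindex->nFile].nBlocks == 0) {
        // Lazily downgrade the index entry now that its file is pruned,
        // and queue it to be rewritten at the next flush.
        pindex->nStatus &= ~BLOCK_STORED_DATA;
        pindex->nStatus &= ~BLOCK_STORED_UNDO;
        setDirtyBlockIndex.insert(pindex);
        return false;
    }
    return true;
}
```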

@sipa Is this along the lines of what you had in mind?

@sdaftuar force-pushed the lazy-blockindex-update branch from f61911f to b479eca on May 8, 2015 17:30
@jonasschnelli (Contributor)

utACK.
Plans to test this soon.
Little indentation/tabs nit in chain.h L88-90.

@sdaftuar force-pushed the lazy-blockindex-update branch from b479eca to 3d49b87 on May 8, 2015 18:14
@sdaftuar (Member, Author) commented May 8, 2015

Nit fixed, thanks!

@jonasschnelli (Contributor)

Applied this PR on top of master and left my pruned bitcoind mainnet node running overnight. No issues.
Also copied blocks/chainstate from a full node to a fresh node and pruned to a 550MB target. Worked as expected.

Tested ACK.

@sdaftuar force-pushed the lazy-blockindex-update branch twice, most recently from 602535a to 45027bf on May 10, 2015 15:29
@sipa (Member) commented May 10, 2015

Untested ACK. I don't understand the Travis failure.

@sdaftuar force-pushed the lazy-blockindex-update branch three times, most recently from ed0e14a to fb10422 on May 13, 2015 18:25
@sdaftuar (Member, Author)

Rebased.

@laanwj (Member) commented May 15, 2015

What is your reasoning behind this? From a conceptual point of view, I like that the flags specify what they mean now: does the block HAVE data or undo data.

@sdaftuar (Member, Author)

I was trying to eliminate the need to iterate the entire mapBlockIndex whenever a file is pruned (since we currently lack a map from block files to blocks, we check each block to see if it's in a blockfile being deleted). That seems like something that needs to be fixed at some point.
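For context, the full scan in question looks roughly like this (paraphrased from the 0.11-era PruneOneBlockFile; exact details may differ):

```cpp
void PruneOneBlockFile(const int fileNumber)
{
    // Without a reverse map, every entry in mapBlockIndex must be visited
    // to find the blocks stored in the file being deleted.
    for (BlockMap::iterator it = mapBlockIndex.begin(); it != mapBlockIndex.end(); ++it) {
        CBlockIndex* pindex = it->second;
        if (pindex->nFile == fileNumber) {
            pindex->nStatus &= ~BLOCK_HAVE_DATA;
            pindex->nStatus &= ~BLOCK_HAVE_UNDO;
            pindex->nFile = 0;
            pindex->nDataPos = 0;
            pindex->nUndoPos = 0;
            setDirtyBlockIndex.insert(pindex);
        }
    }
    vinfoBlockFile[fileNumber].SetNull();
    setDirtyFileInfo.insert(fileNumber);
}
```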

One idea for addressing that would be to maintain a (memory-only) map from block files to block index entries, which we'd iterate when pruning; this would be straightforward, CPU-efficient, and would preserve the existing HAVE_DATA semantics. However, I think @sipa was concerned about the additional memory overhead of that approach (I believe roughly an additional 10MB if you're running in prune mode with a target high enough that no files are actually deleted -- which is perhaps relevant if in the future most nodes run with some level of pruning), and he suggested lazy updating instead.
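A hypothetical sketch of that reverse-lookup alternative (the container and function names here are illustrative, not from any patch):

```cpp
// Memory-only reverse index: block file number -> index entries stored in it.
std::map<int, std::set<CBlockIndex*> > mapBlocksPerFile;

// Maintained wherever a block's file position is first recorded.
void RecordBlockFile(CBlockIndex* pindex)
{
    mapBlocksPerFile[pindex->nFile].insert(pindex);
}

// Pruning then visits only the entries in the file being deleted,
// instead of scanning all of mapBlockIndex.
void PruneBlockIndexEntries(int fileNumber)
{
    BOOST_FOREACH(CBlockIndex* pindex, mapBlocksPerFile[fileNumber]) {
        pindex->nStatus &= ~BLOCK_HAVE_DATA;
        pindex->nStatus &= ~BLOCK_HAVE_UNDO;
        pindex->nFile = 0;
        pindex->nDataPos = 0;
        pindex->nUndoPos = 0;
        setDirtyBlockIndex.insert(pindex);
    }
    mapBlocksPerFile.erase(fileNumber);
}
```

The extra cost is one pointer per block index entry plus container overhead, which is the source of the rough 10MB estimate above.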

So I tried implementing the lazy-update approach to see how feasible (and/or cumbersome) this would be. It seems very efficient; the code changes to support it seem relatively minor; and I like the idea of not having to worry about updating state when block files are removed -- that seems like it should make future pruning-related improvements easier to make. I do agree there's a tradeoff, in that the current HAVE_DATA semantics seem cleaner versus the new STORED_DATA semantics, but it's hard for me right now to say which semantics are better in the long run.

I think that uncertainty would be an argument for waiting to see what the future of the pruning code looks like and then deciding what solution makes the most sense. Except one thing I realized after coding this up: if we decide to go in this direction with lazy updating of block index entries, then I think it would be better to deploy this behavior with pruning in 0.11 than to wait until 0.12 -- otherwise there's a backwards-compatibility issue, where once your block index has been lazily updated, you can't easily revert to a version of the pruning code that didn't support that (at least with the present implementation).

At any rate, I don't feel strongly about which approach has the best tradeoffs, and if we end up not merging this for 0.11, then I'd say we may as well just wait until we implement sharding, and decide then what approach makes sense here.

@laanwj (Member) commented May 15, 2015

Is the overhead of iterating over the (in-memory) block index when a file is pruned significant?

Muddying the database semantics to make a rare action slightly faster would be unfortunate. But I don't feel strongly about it either.

Also: this makes the 'block has data' check more involved, and that check is performed very often.
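For context on that last point, before this change the check was a single side-effect-free flag test, roughly:

```cpp
// Pre-patch 'block has data' check: a pure bit test, nothing is mutated.
if (pindex->nStatus & BLOCK_HAVE_DATA) {
    // ... safe to attempt reading the block from disk ...
}
```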

Inline review comment (Member):

This comment points out that the HaveBlockData() name is very misleading. Could you change the name to indicate that it also updates the state?

Reply from @sdaftuar (Author):

Happy to change it, any suggestions? The function only changes state if in pruning mode, so I don't want to go overboard with the description, and nothing concise is jumping out at me.

I did notice that the comment for the function doesn't really emphasize the potentially state-changing behavior; I'll start by rewording that for now to make it more obvious.

@sdaftuar (Member, Author)

@laanwj I believe it's ~20ms to iterate mapBlockIndex. Arguably that's insignificant the way pruning works now, since it can only happen when we're calling FlushStateToDisk(), which is already an expensive operation.

Regarding the efficiency of HaveBlockData(): I think this implementation is very fast if there's no lock contention, and that seems to be the case presently. It's unclear to me whether lock contention could become a concern down the road, in particular if the code were to change in the future so that HaveBlockData() is called from more places.

@morcos (Contributor) commented May 15, 2015

I think @sdaftuar has a point that if we're ever going to implement lazy updating, the time to do it is now. But personally I think some sort of reverse lookup is eventually going to be needed. I implemented it to try it out and it's fine, but I also think the simple 20ms iteration every time you delete a blockfile is good enough for now and we should just do the reverse lookup in whatever way makes sense with the sharding algorithm we end up with.

Hmm, I was going to say the argument against the 20ms iteration is that once a day or so (when you've got an extra block file you can prune), it's possible (I think even probable) that you're holding up block propagation. But come to think of it, that's a problem with choosing when we prune: even without the 20ms cost, you're still doing a full chainstate flush then. If we think that's problematic, maybe we can move the pruning calls around to only happen when we're a bit less busy; it'll take some testing though.

@sdaftuar force-pushed the lazy-blockindex-update branch from fb10422 to 15d9748 on May 15, 2015 23:55
@sdaftuar (Member, Author)

Updated a few comments and tried to make it clearer that HaveBlockData() can change state when pruning is enabled.

@sdaftuar (Member, Author) commented Jun 5, 2015

Closing. This apparently needs to be rebased, and it's probably not worth it: I don't think we're going to merge this for 0.11, and given the backwards-compatibility issues this pull would generate if it were accepted later (see the description above and #6148 (comment)), I think it'd be better for us to wait until a sharding solution is on the table and revisit this idea in that context.

@sdaftuar closed this Jun 5, 2015
@bitcoin locked this as resolved and limited conversation to collaborators on Sep 8, 2021