-
Notifications
You must be signed in to change notification settings - Fork 38.7k
reduce cs_main scope, guard block index 'nFile' under a local mutex #27006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce cs_main scope, guard block index 'nFile' under a local mutex #27006
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
feb3f2a to
acddd42
Compare
src/node/blockstorage.cpp
Outdated
| pindex->nStatus &= ~BLOCK_HAVE_DATA; | ||
| pindex->nStatus &= ~BLOCK_HAVE_UNDO; | ||
| pindex->nFile = 0; | ||
| pindex->nFile = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
47bc99c This is probably fine, but there's a bunch a places that we have to make sure are never reached. For example in FindBlockPos( we do _blockfile_info[nFile] where nFile = ... pos.nFile. I'll have to review that a bit more.
| info.file_number = index->GetFileNum(); | ||
| info.data_pos = index->GetDataPos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these be fetched atomically? Maybe there should be a getter that returns all three values, similar to how there's a setter (CBlockIndex::SetFileData) than sets all three atomically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting to feel like a struct for those 3 members would be appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these be fetched atomically? Maybe there should be a getter that returns all three values, similar to how there's a setter (
CBlockIndex::SetFileData) than sets all three atomically.
@LarryRuane I didn't add it merely to not introduce that many changes but could be done too. The members set is guarded by cs_main (for now) so there is no possible race.
Starting to feel like a struct for those 3 members would be appropriate
Why exactly?. So far in the code, we either want the block data position or the block undo data position, we don't get both at the same time because they represent different files on disk (thus why they are returned inside FlatFilePos, the struct groups only one of them with the file number).
In other words, nData + nFile maps to a "blk.dat", while nUndoData + nFile maps to an specific "rev.dat".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these are protected by cs_main, is it? Running gdb test_bitcoin on the PR branch (leaving some irrelevant stuff out):
(gdb) b MakeBlockInfo
(gdb) run
Thread 29 "b-basic block f" hit Breakpoint 1, kernel::MakeBlockInfo (index=0x55555749b018, data=0x0) at kernel/chain.cpp:14
#0 kernel::MakeBlockInfo (index=0x55555749b018, data=0x0) at kernel/chain.cpp:14
#1 0x000055555623fa81 in BaseIndex::ThreadSync (this=0x7fffffffad00) at index/base.cpp:201
(gdb) p cs_main
$1 = {
<std::recursive_mutex> = {
<std::__recursive_mutex_base> = {
_M_mutex = {
__data = {
__lock = 0,
__count = 0,
__owner = 0,
}
}
}
}
}
I agree with you that a new struct should just be nFile and nDataPos (or nPos). This struct can be used for either block data or rev data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these are protected by cs_main, is it?
The members' write process, not the read. If the read would be protected by cs_main, this PR wouldn't make any sense :p.
The two places are: the CBlockIndex creation time and CBlockIndex pruning time.
Still, those cs_main locks can also be removed in a follow-up PR. I haven't removed them here because they require further changes.
Not sure if I'm understanding your concern here. You are thinking on a race in-between GetFileNum and GetDataPos?
Because, if that is the concern, the error that could arise from it is harmless. The read block call would just return false, and callers would continue. But, to not even have to think about such error, we could just call GetFilePos() instead of the two getters.
I agree with you that a new struct should just be nFile and nDataPos (or nPos). This struct can be used for either block data or rev data.
There is already a method for that: CBlockIndex::GetFilePos.
stickies-v
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Is my understanding correct that 1) replacing cs_main with g_cs_blockindex_data and 2) having g_cs_blockindex_data be a shared instead of recursive/exclusive mutex are orthogonal improvements? I think it makes sense for both of them to happen in this PR, just wondering.
Moving But, the usage of an exclusive mutex here still means no concurrent block data access. Which is a shame as accessing block 124 on disk shouldn't block the entire node from accessing block 750,000 on disk (nor any other block). They are on different files. |
dergoegge
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK on reducing cs_main scope.
Approach NACK for adding new globals. It is not worth going with a worse design to "rush in" an improvement for a feature that may or may not happen (#26966). I would prefer to go with a more sustainable approach from the get go, otherwise this will need to be refactored in the long run.
I don't see why the mutex could not be a member of BlockManager. Like you say in #27006 (comment), giving BlockManager its own internal lock(s) instead of cs_main is the preferable long term design.
|
Hey dergoegge, thanks for the feedback.
No rush here.
I'm not opposed to it but there are few problems with that approach:
The second point can be fixed by refactoring the functions into the class but the first one is more complex. We have to avoid adding a |
This has been the case for 10+ years, if we're gonna change it then we should take the time to do it without adding to our technical debt.
I agree that it is more work but it is worth doing imo and we are clearly in no rush.
This is being done in #27125.
Move |
I never said the opposite. Not sure why the "rush" term is being used here really. I haven't said anything like "this is great, we must have it now".. (nor here nor anywhere in the repository). I was trying to give you more context, answering to your "It is not worth going with a worse design to "rush in" an improvement for a feature that may or may not happen (#26966)". I'm happy to continue discussing stuff and improve what needs to be improved.
How that solves the problem? Placing the mutex inside block storage means that any field that is guarded by this new mutex will need block storage dependency. So don't care where we place A possible solution would be to place the mutex in a separate file. So blockstorage and Or.. thinking out loud: something somewhat "radical" could be move Which, in a first glance, sounds quite positive. As we are currently mixing two concepts into the same container. |
I'm sorry didn't mean to put words in your mouth. It seemed rushed to me because you are going with an "easier" solution here rather than the more complex but (imo) worth while long term design.
I was thinking of not adding the annotations in this case. Which is not great but seems better to me than adding more globals.
This separates the concerns nicely and lets us add annotations, I would be happy with this :) The only thing that's a bit weird then is maintaining the on-disk format for the block index entries but that seems workable. |
Well, this PR didn't have any long term design goal discussion prior to your arrival.
I'm on the other side of the fence actually. While globals aren't ideal, particularly for testing, I prefer them over loosing thread safety analysis as it is essential to prevent deadlocks.
Yeah, the read shouldn’t be that hard. And a db upgrade mechanism is doable too. |
acddd42 to
55a8c98
Compare
55a8c98 to
724a22e
Compare
|
7541841 to
0252bd2
Compare
0252bd2 to
496f38d
Compare
better to store an invalid file number to describe block indexes with no data on disk.
By not having to lock cs_main every time we access block data on disk, we can avoid slowing down concurrent tasks due to threads waiting for the lock to be released.
496f38d to
2a50bce
Compare
|
Closing for now. Might revive it in the future, post #26966. |
By not having to lock
cs_mainevery time we try to access blockposition on disk, we can avoid slowing down concurrent tasks due
to threads waiting for the global lock to be released.
Also, solves a edge case race inside
ReadBlockFromDiskwherewe obtain the file position locking
cs_mainand then try to accessthe file without it (the file could no longer exist if pruning occurred
just in-between these two actions).
Context from where this comes from #26966.