Member

@furszy furszy commented Jan 31, 2023

By not having to lock cs_main every time we access the block position on disk, we can avoid slowing down concurrent tasks due to threads waiting for the global lock to be released.

It also solves an edge-case race inside ReadBlockFromDisk, where we obtain the file position while holding cs_main and then access the file without it (the file could no longer exist if pruning occurred in between these two actions).

For context on where this comes from, see #26966.

Contributor

DrahtBot commented Jan 31, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK stickies-v
Approach NACK dergoegge

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31551 (optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
  • #31539 (optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
  • #31490 (refactor: cache block[undo] serialized size for consecutive calls by l0rinc)
  • #31144 (optimization: batch XOR operations 12% faster IBD by l0rinc)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot changed the title reduce cs_main scope, guard block index 'nFile' under a local mutex Jan 31, 2023
@furszy furszy force-pushed the 2022_reduce_cs_main_scope_blockindex_nfile branch 2 times, most recently from feb3f2a to acddd42 Compare February 2, 2023 19:44
pindex->nStatus &= ~BLOCK_HAVE_DATA;
pindex->nStatus &= ~BLOCK_HAVE_UNDO;
pindex->nFile = 0;
pindex->nFile = -1;
Member

47bc99c This is probably fine, but there are a bunch of places that we have to make sure are never reached. For example, in FindBlockPos() we do m_blockfile_info[nFile] where nFile = ... pos.nFile. I'll have to review that a bit more.

Comment on lines +19 to +22
info.file_number = index->GetFileNum();
info.data_pos = index->GetDataPos();
Contributor

Shouldn't these be fetched atomically? Maybe there should be a getter that returns all three values, similar to how there's a setter (CBlockIndex::SetFileData) that sets all three atomically.
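For illustration, a minimal sketch of what such an all-in-one getter could look like. The struct, lock, and class names here are hypothetical stand-ins, not the PR's actual code:

```cpp
#include <cassert>
#include <mutex>

// Illustrative: return all three on-disk fields in one locked read, so a
// caller can never observe a torn update between separate
// GetFileNum()/GetDataPos() calls.
struct BlockFileData {
    int nFile{-1};
    unsigned int nDataPos{0};
    unsigned int nUndoPos{0};
};

class CBlockIndexSketch {
    mutable std::mutex m_mutex;
    BlockFileData m_file_data;

public:
    // Sets all three fields under one lock, like the PR's SetFileData.
    void SetFileData(const BlockFileData& data)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_file_data = data;
    }

    // Hypothetical atomic getter returning all three fields at once.
    BlockFileData GetFileData() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_file_data;
    }
};
```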

Contributor

Starting to feel like a struct for those 3 members would be appropriate

Member Author

Shouldn't these be fetched atomically? Maybe there should be a getter that returns all three values, similar to how there's a setter (CBlockIndex::SetFileData) that sets all three atomically.

@LarryRuane I didn't add it merely to avoid introducing too many changes, but it could be done too. Writing these members is guarded by cs_main (for now), so there is no possible race.

Starting to feel like a struct for those 3 members would be appropriate

Why exactly? So far in the code we want either the block data position or the block undo data position; we never get both at the same time, because they represent different files on disk (which is why they are returned inside a FlatFilePos: the struct groups only one of them with the file number).

In other words, nData + nFile maps to a specific "blk.dat", while nUndoData + nFile maps to a specific "rev.dat".
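A rough sketch of that mapping, assuming the blkNNNNN.dat / revNNNNN.dat naming scheme. The helper functions are illustrative; Bitcoin Core's real FlatFilePos lives in flatfile.h:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Simplified FlatFilePos: one file number paired with a single offset.
// A block index holds nDataPos and nUndoPos separately, so a FlatFilePos
// carries nFile together with only ONE of them at a time.
struct FlatFilePos {
    int nFile{-1};
    unsigned int nPos{0};
};

// Hypothetical helpers showing which on-disk file each pairing points at:
// nDataPos + nFile -> "blkNNNNN.dat", nUndoPos + nFile -> "revNNNNN.dat".
std::string BlockFileName(const FlatFilePos& pos)
{
    char buf[16];
    std::snprintf(buf, sizeof(buf), "blk%05d.dat", pos.nFile);
    return buf;
}

std::string UndoFileName(const FlatFilePos& pos)
{
    char buf[16];
    std::snprintf(buf, sizeof(buf), "rev%05d.dat", pos.nFile);
    return buf;
}
```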

Contributor

Are you sure these are protected by cs_main? Running gdb on test_bitcoin on the PR branch (leaving some irrelevant stuff out):

(gdb) b MakeBlockInfo
(gdb) run
Thread 29 "b-basic block f" hit Breakpoint 1, kernel::MakeBlockInfo (index=0x55555749b018, data=0x0) at kernel/chain.cpp:14
#0  kernel::MakeBlockInfo (index=0x55555749b018, data=0x0) at kernel/chain.cpp:14
#1  0x000055555623fa81 in BaseIndex::ThreadSync (this=0x7fffffffad00) at index/base.cpp:201
(gdb) p cs_main
$1 = {
  <std::recursive_mutex> = {
    <std::__recursive_mutex_base> = {
      _M_mutex = {
        __data = {
          __lock = 0,
          __count = 0,
          __owner = 0,
          }
        }
      }
    }
  }

I agree with you that a new struct should contain just nFile and nDataPos (or nPos). This struct can be used for either block data or rev data.

Member Author

Are you sure these are protected by cs_main?

The members' writes are, not the reads. If the reads were protected by cs_main, this PR wouldn't make any sense :p.
The two places are CBlockIndex creation time and CBlockIndex pruning time.

Still, those cs_main locks can also be removed in a follow-up PR. I haven't removed them here because they require further changes.

Not sure I'm understanding your concern here. Are you thinking of a race between GetFileNum and GetDataPos?
If that is the concern, the error that could arise from it is harmless: the block read call would just return false, and callers would continue. But to not even have to think about such an error, we could just call GetFilePos() instead of the two getters.

I agree with you that a new struct should just be nFile and nDataPos (or nPos). This struct can be used for either block data or rev data.

There is already a method for that: CBlockIndex::GetFilePos.

Contributor

@stickies-v stickies-v left a comment

Concept ACK

Is my understanding correct that 1) replacing cs_main with g_cs_blockindex_data and 2) having g_cs_blockindex_data be a shared instead of recursive/exclusive mutex are orthogonal improvements? I think it makes sense for both of them to happen in this PR, just wondering.

@furszy
Member Author

furszy commented Mar 23, 2023

Concept ACK

Is my understanding correct that 1) replacing cs_main with g_cs_blockindex_data and 2) having g_cs_blockindex_data be a shared instead of recursive/exclusive mutex are orthogonal improvements? I think it makes sense for both of them to happen in this PR, just wondering.

Replacing cs_main with a blockstorage-local exclusive mutex would still be an improvement over the current state, yeah. The networking threads (and pretty much the entire node) shouldn't get blocked because the user called the getblock RPC command.

But using an exclusive mutex here still means no concurrent block data access, which is a shame: reading block 124 from disk shouldn't block the entire node from accessing block 750,000 (or any other block). They are in different files.

Member

@dergoegge dergoegge left a comment

Concept ACK on reducing cs_main scope.

Approach NACK for adding new globals. It is not worth going with a worse design to "rush in" an improvement for a feature that may or may not happen (#26966). I would prefer to go with a more sustainable approach from the get go, otherwise this will need to be refactored in the long run.

I don't see why the mutex could not be a member of BlockManager. Like you say in #27006 (comment), giving BlockManager its own internal lock(s) instead of cs_main is the preferable long term design.

@furszy
Member Author

furszy commented Apr 3, 2023

Hey dergoegge, thanks for the feedback.

Approach NACK for adding new globals. It is not worth going with a worse design to "rush in" an improvement for a feature that may or may not happen (#26966). I would prefer to go with a more sustainable approach from the get go, otherwise this will need to be refactored in the long run.

No rush here.
#26966 is only one clear use case. It isn't the main one actually; there are other areas that shouldn't be blocking the net message reception / validation thread because of a disk read: e.g. the “getblock”, “getblocktxn” and compact block net messages, the current indexes sync process (all of them), and the getblock/gettxoutproof RPC commands. They all lock the node on every single block read.

I don't see why the mutex could not be a member of BlockManager. Like you say in #27006 (comment), giving BlockManager its own internal lock(s) instead of cs_main is the preferable long term design.

I'm not opposed to it, but there are a few problems with that approach:

  • BlockStorage depends on CBlockIndex and not the other way around, so moving the mutex to BlockStorage would leave the CBlockIndex members with no thread safety analysis annotations.

  • The ReadBlockFromDisk and UndoReadFromDisk functions are currently global functions. They do not have access to the BlockManager class members.

The second point can be fixed by refactoring the functions into the class, but the first one is more complex: we have to avoid adding a chain.h <-> blockstorage.h cyclic dependency.
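A compilable sketch of the layering issue and the separate-header escape hatch. The GUARDED_BY macro here is a no-op stand-in so the sketch builds anywhere (Bitcoin Core's real annotations live in threadsafety.h and are enforced by Clang's thread safety analysis); the class and member names are illustrative:

```cpp
#include <cassert>
#include <mutex>

// The problem in a nutshell: if CBlockIndex members were GUARDED_BY() a
// mutex owned by BlockManager, chain.h would need to see blockstorage.h's
// mutex, while blockstorage.h already includes chain.h: a cycle. One way
// out is a standalone mutex in its own small header that both can include.
#define GUARDED_BY(x) // no-op stand-in for the Clang annotation

std::mutex g_blockindex_mutex; // would live in its own small header

class CBlockIndexAnnotated {
public:
    // With the real annotation, Clang warns if these are touched without
    // holding g_blockindex_mutex.
    int nFile GUARDED_BY(g_blockindex_mutex){-1};
    unsigned int nDataPos GUARDED_BY(g_blockindex_mutex){0};
};
```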

@dergoegge
Member

It isn't the main one actually; there are other areas that shouldn't be blocking the net message reception / validation thread because of a disk read: e.g. the “getblock”, “getblocktxn” and compact block net messages, the current indexes sync process (all of them), and the getblock/gettxoutproof RPC commands. They all lock the node on every single block read.

This has been the case for 10+ years, if we're gonna change it then we should take the time to do it without adding to our technical debt.

I'm not opposed to it but there are few problems with that approach:

I agree that it is more work but it is worth doing imo and we are clearly in no rush.

The ReadBlockFromDisk and UndoReadFromDisk functions are currently global functions. They do not have access to the BlockManager class members.

This is being done in #27125.

the first one is more complex. We have to avoid adding a chain.h <-> blockstorage.h cyclic dependency.

Move CBlockIndex to a new header file and have chain and blockstorage include that?

@furszy
Member Author

furszy commented Apr 4, 2023

It isn't the main one actually; there are other areas that shouldn't be blocking the net message reception / validation thread because of a disk read: e.g. the “getblock”, “getblocktxn” and compact block net messages, the current indexes sync process (all of them), and the getblock/gettxoutproof RPC commands. They all lock the node on every single block read.

This has been the case for 10+ years, if we're gonna change it then we should take the time to do it without adding to our technical debt.

I never said the opposite. Not sure why the "rush" term is being used here, really. I haven't said anything like "this is great, we must have it now" (neither here nor anywhere in the repository).

I was trying to give you more context, answering your "It is not worth going with a worse design to "rush in" an improvement for a feature that may or may not happen (#26966)".

I'm happy to continue discussing stuff and improve what needs to be improved.

the first one is more complex. We have to avoid adding a chain.h <-> blockstorage.h cyclic dependency.

Move CBlockIndex to a new header file and have chain and blockstorage include that?

How does that solve the problem?

Placing the mutex inside block storage means that any field guarded by this new mutex will need a block storage dependency. So no matter where we place CBlockIndex, it will need a blockstorage dependency to use the thread safety analysis annotations, which will cause a cyclic dependency.

A possible solution would be to place the mutex in a separate file, so blockstorage and CBlockIndex can include it independently.

Or, thinking out loud: something somewhat "radical" would be to move the nFile, nData and nUndoData members out of CBlockIndex and into the blockstorage class. We could place them in an unordered map <block_id, FileData>.

At first glance this sounds quite positive, as we are currently mixing two concepts in the same container: CBlockIndex stores the chain in a sort of linked-list structure while also holding the position of the block data on disk.
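A minimal sketch of that idea, with hypothetical type and member names (locking, serialization, and the on-disk index format are all omitted here):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// The on-disk position fields, pulled out of CBlockIndex.
struct FileData {
    int nFile{-1};
    unsigned int nDataPos{0};
    unsigned int nUndoPos{0};
};

// Block storage owns the positions, keyed by block id. CBlockIndex keeps
// only chain-structure data; thread safety annotations can then live
// entirely inside block storage, avoiding the header cycle.
class BlockStorageSketch {
    std::unordered_map<std::string, FileData> m_positions; // key: block hash

public:
    void SetFileData(const std::string& block_id, FileData data)
    {
        m_positions[block_id] = data;
    }

    std::optional<FileData> GetFileData(const std::string& block_id) const
    {
        auto it = m_positions.find(block_id);
        if (it == m_positions.end()) return std::nullopt;
        return it->second;
    }
};
```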

@dergoegge
Member

I never said the opposite. Not sure why the "rush" term is being used here, really. I haven't said anything like "this is great, we must have it now" (neither here nor anywhere in the repository).

I'm sorry, I didn't mean to put words in your mouth. It seemed rushed to me because you are going with an "easier" solution here rather than the more complex but (imo) worthwhile long-term design.

it will need blockstorage dependency to use the thread analysis annotation

I was thinking of not adding the annotations in this case. Which is not great but seems better to me than adding more globals.

Or, thinking out loud: something somewhat "radical" would be to move the nFile, nData and nUndoData members out of CBlockIndex and into the blockstorage class. We could place them in an unordered map <block_id, FileData>.

This separates the concerns nicely and lets us add annotations, I would be happy with this :) The only thing that's a bit weird then is maintaining the on-disk format for the block index entries but that seems workable.

@furszy
Member Author

furszy commented Apr 5, 2023

I never said the opposite. Not sure why the "rush" term is being used here, really. I haven't said anything like "this is great, we must have it now" (neither here nor anywhere in the repository).

I'm sorry, I didn't mean to put words in your mouth. It seemed rushed to me because you are going with an "easier" solution here rather than the more complex but (imo) worthwhile long-term design.

Well, this PR didn't have any long-term design discussion prior to your arrival.
I'm happy that it happened, as it pushed us to think about the "radical" change mentioned above.

it will need blockstorage dependency to use the thread analysis annotation

I was thinking of not adding the annotations in this case. Which is not great but seems better to me than adding more globals.

I'm on the other side of the fence, actually. While globals aren't ideal, particularly for testing, I prefer them over losing thread safety analysis, as it is essential to prevent deadlocks.

Or, thinking out loud: something somewhat "radical" would be to move the nFile, nData and nUndoData members out of CBlockIndex and into the blockstorage class. We could place them in an unordered map <block_id, FileData>.

This separates the concerns nicely and lets us add annotations, I would be happy with this :) The only thing that's a bit weird then is maintaining the on-disk format for the block index entries but that seems workable.

Yeah, the read shouldn’t be that hard. And a db upgrade mechanism is doable too.

@maflcko
Member

maflcko commented Feb 15, 2024

test/sharedlock_tests.cpp:18:17: error: use emplace_back instead of push_back [modernize-use-emplace,-warnings-as-errors]
   18 |         workers.push_back(std::thread(task));
      |                 ^~~~~~~~~~~~~~~~~~~~~~    ~
      |                 emplace_back(
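The clang-tidy fix is to construct the thread in place instead of building a temporary and pushing it. A minimal self-contained sketch of the corrected pattern (the surrounding test fixture is assumed, and spawn_workers/g_work_done are illustrative names):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> g_work_done{0}; // illustrative shared counter

// modernize-use-emplace: emplace_back forwards its arguments to the
// std::thread constructor, avoiding the temporary that push_back needs.
void spawn_workers(int n, void (*task)())
{
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back(task); // was: workers.push_back(std::thread(task));
    }
    for (auto& t : workers) t.join();
}
```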

better to store an invalid file number to describe
block indexes with no data on disk.
By not having to lock cs_main every time we access block data on disk,
we can avoid slowing down concurrent tasks due to threads waiting
for the lock to be released.
@furszy
Member Author

furszy commented Dec 30, 2024

Closing for now. Might revive it in the future, post #26966.

@furszy furszy closed this Dec 30, 2024
@bitcoin bitcoin locked and limited conversation to collaborators Dec 30, 2025