-
Notifications
You must be signed in to change notification settings - Fork 38.9k
index: block filters sync, reduce disk read operations by caching last header #28955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index: block filters sync, reduce disk read operations by caching last header #28955
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Ran bench on MacBook Pro 2019 (2,3 GHz 8-Core Intel Core i9, SSD): src/bench/bench_bitcoin -filter=BlockFilterIndexSyncBefore (486c71bc707f5765fc6698a746cf221bab1ac357):
After:
Not seeing any improvement in the bench. Haven't tried running the full indexer though. |
Probably, it is because the benchmarks, same as the unit tests, create temp directories. Which may be faster to access than regular directories. See #26684 (review). Also, you could increase the |
sedited
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 63ef83d72a979e9e557eec9fe87c2795a963dedb
63ef83d to
b19585e
Compare
sedited
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-ACK b19585e00a52c0b571e70ddb8f7c996d598c3435
Is it, though? I would think the OS should have this cached regardless? |
furszy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, it is because the benchmarks, same as the unit tests, create temp directories. Which may be faster to access than regular directories.
Is it, though? I would think the OS should have this cached regardless?
We will know it soon. I'm working on a way to test it inside #26564 (review).
But, it seems to be an orthogonal topic. Regardless of the result, which could vary depending on the OS, the changes in this PR should be good to go as is.
Conceptually, the block filter index synchronization process makes a db call on every new block (forever) just to get the tip's header hash when it could just cache those 32 bytes.
furszy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished testing this on a custom directory. Results are favorable @Sjors and @luke-jr.
I have observed that access to the OS temp directory is faster than accessing regular directories locally. Benchmark outputs are provided at the end.
The testing methodology is as following:
Running ./bench_bitcoin -filter="BlockFilterIndexSync" -testdatadir=<custom_test_datadir_path> on any of the following branches:
Branch 1 -> Which includes: this PR changes + #26564 + a commit that introduces the custom datadir feature for the benchmark framework.
Branch 2 -> Which includes: the block filter index benchmark + #26564 + a commit that introduces the custom datadir feature for the benchmark framework.
The results were:
For Branch 1
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 117,648,576.33 | 8.50 | 0.5% | 6.96 | BlockFilterIndexSync |
For Branch 2
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 121,370,493.17 | 8.24 | 0.5% | 7.16 | BlockFilterIndexSync |
andrewtoth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch seems to be doing more than what is in the description. From my understanding, it is also reducing cs_main lock contention in the renamed Sync method, which is also now made public. The former seems like a simple win, but it's unclear to me why we want the latter?
Also, a refactor to indexes in init.cpp that also seems fine but unrelated?
Maybe the PR description can be updated to describe the motivation behind these other changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated per feedback. Thanks @andrewtoth!
This patch seems to be doing more than what is in the description. From my understanding, it is also reducing
cs_mainlock contention in the renamedSyncmethod, which is also now made public. The former seems like a simple win, but it's unclear to me why we want the latter?
The benchmark, introduced in the second commit, makes use of it. If BaseIndex::Sync() is not public, the benchmark would need to call BaseIndex::StartBackgroundSync(), who wraps BaseIndex::Sync() on a separate thread, which goes against the objective of the benchmark (we want to bench the process, not the time it takes the OS to create, wait-for and destroy a thread).
Also, it facilitates the future decoupling of the index inner thread in #26966. Replacing it for a thread-pool class provided by the caller.
Also, a refactor to indexes in
init.cppthat also seems fine but unrelated?
Maybe the PR description can be updated to describe the motivation behind these other changes?
Sure. This PR improves the index sources. And, while the changes simplify the code, to me, pushing a PR just to do such small code refactoring does not seem worth the efforts (from reviewers and from myself).
Will update the description. Thanks!
andrewtoth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
I've ran the benchmark locally on this PR and the branches you mention in #28955 (review), and it is sometimes better on one branch then the other for both tests. The results you show are very small improvements which I think could just be attributed to noise.
Regardless of the result, which could vary depending on the OS, the changes in this PR should be good to go as is.
I agree, except for the first 2 commits. The benchmark as I mentioned doesn't seem particularly useful, and the Sync method being made public should then only be done in a patch that will take advantage of it's new accessibility.
src/index/blockfilterindex.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be const function.
src/index/blockfilterindex.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const bool.
|
Concept ACK I synced the block filter from scratch at c3e2915b14900c1379e4c3f1ac0262f60b362431 (rebased on master) and then on the last commit. AMD Ryzen 7950x with SSD drive running Ubuntu 23.10:
So not much difference, which seems fine with the goal of #26966 in mind - which does make a dramatic difference. I haven't measured the performance impact on syncing the index during IBD; in light of 5d5e22b979d00ca3d3c4b4013fea43ba58f571bc that might be more significant because there's more going on in Can you add some rationale to the b19585e00a52c0b571e70ddb8f7c996d598c3435 commit message as to what was preventing this simplification before? @andrewtoth wrote:
Benchmarks are also useful to prevent future regressions. |
e3d3763 to
08d8608
Compare
|
Updated per feedback. Thanks both.
Nothing was preventing it.
Have you tried it on a spinning disk? The diff should be noticeable there. We need to avoid disk writes/reads where is possible. See #28037 discussion as a good example of it.
Have merged the first two commits. |
|
tACK 08d8608c51beb66dab4e12ac06559a70517fb054 |
08d8608 to
1cf73a3
Compare
|
Rebased due a one-line conflict with #29236. |
|
Tidy complains about |
Avoid disk read operations on every new processed block.
Only NextSyncBlock requires cs_main lock. The other function calls like Commit or Rewind will lock or not cs_main internally when they need it. Avoiding keeping cs_main locked when Commit() or Rewind() write data to disk.
1cf73a3 to
99afb9d
Compare
|
re-ACK 99afb9d |
sedited
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-ACK 99afb9d
andrewtoth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 99afb9d
|
Concept ACK |
|
ACK 99afb9d |
|
Follow-up in #29867 (comment) |
| } | ||
|
|
||
| { | ||
| LOCK(cs_main); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "index: decrease ThreadSync cs_main contention" (0faafb5)
Note: This commit introduces a race condition, because it is no longer locking cs_main while calling NextSyncBlock and setting m_synced = true. As a result, a new block could be connected by another thread after NextSyncBlock returns null in this thread, but before m_synced is set to true, so the block will never be indexed because BlockConnected notifications are ignored while m_synced is false. This should be fixed in #29867
Work decoupled from #26966 per request.
The aim is to remove an unnecessary disk read operation that currently takes place with every new arriving block (or scanned block during background sync). Instead of reading the last filter header from disk merely to access its hash for constructing the next filter, this work caches it, occupying just 32 more bytes in memory.
Also, reduces
cs_mainlock contention during the index initial sync process. And, simplifies the indexes initialization and shutdown procedure.Testing Note:
To compare the changes, added a pretty basic benchmark in the second commit. Alternatively, could also test the changes by timing the block filter sync from scratch on any network; start the node with
-blockfilterindexand monitor the logs until the syncing process finish.Local Benchmark Results:
*Master (c252a0f):
BlockFilterIndexSync*PR (43a212cfdac6c64e82b601c664443d022f191520):
BlockFilterIndexSync