-
Notifications
You must be signed in to change notification settings - Fork 38.7k
index: Remove unused BlockFilter::Unserialize() #25637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index: Remove unused BlockFilter::Unserialize() #25637
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
Some nits if you retouch:
- Not sure all the scope brackets in the tests are useful.
- Prefere prefix iterators (faster due to avoiding a copy), see our developer notes
- Avoid unneeded temporaries in loop
- s/insert/emplace/ where it makes sense
- named params and constify where helpful/clearer
suggestions
diff --git a/src/test/blockfilter_tests.cpp b/src/test/blockfilter_tests.cpp
index a17a71bf9f..a20b49e0a7 100644
--- a/src/test/blockfilter_tests.cpp
+++ b/src/test/blockfilter_tests.cpp
@@ -62,7 +62,7 @@ static BlockFilter UnserializeBlockFilter(Stream& s) {
>> block_hash
>> encoded_filter;
- BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_uint8);
+ const BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_uint8);
BlockFilter block_filter(filter_type, block_hash, std::move(encoded_filter));
return block_filter;
}
diff --git a/src/test/fuzz/blockfilter.cpp b/src/test/fuzz/blockfilter.cpp
index db3788d46f..2a1d81d970 100644
--- a/src/test/fuzz/blockfilter.cpp
+++ b/src/test/fuzz/blockfilter.cpp
@@ -22,23 +22,17 @@ FUZZ_TARGET(blockfilter)
GCSFilter::ElementSet elements;
size_t num_elements = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 100000);
- for (size_t i=0; i < num_elements; i++) {
- std::vector<unsigned char> element = ConsumeRandomLengthIntegralVector<unsigned char>(fuzzed_data_provider, 32);
- elements.insert(element);
+ for (size_t i=0; i < num_elements; ++i) {
+ elements.emplace(ConsumeRandomLengthIntegralVector<unsigned char>(fuzzed_data_provider, /*max_vector_size=*/32));
char>(fuzzed_data_provider, 32));
}
GCSFilter filter({u256->GetUint64(0), u256->GetUint64(1), BASIC_FILTER_P, BASIC_FILTER_M}, elements);
- BlockFilter block_filter(BlockFilterType::BASIC, u256.value(), filter.GetEncoded());
- {
- (void)block_filter.ComputeHeader(ConsumeUInt256(fuzzed_data_provider));
- (void)block_filter.GetBlockHash();
- (void)block_filter.GetEncodedFilter();
- (void)block_filter.GetHash();
- }
- {
- const BlockFilterType block_filter_type = block_filter.GetFilterType();
- (void)BlockFilterTypeName(block_filter_type);
- }
+ const BlockFilter block_filter(BlockFilterType::BASIC, u256.value(), filter.GetEncoded());
+ (void)block_filter.ComputeHeader(ConsumeUInt256(fuzzed_data_provider));
+ (void)block_filter.GetBlockHash();
+ (void)block_filter.GetEncodedFilter();
+ (void)block_filter.GetHash();
+ (void)BlockFilterTypeName(block_filter.GetFilterType());
{
const GCSFilter gcs_filter = block_filter.GetFilter();
(void)gcs_filter.GetN();
shaavan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK f5563da
Since it is my first time reviewing a PR concerning filters and streams, I want to detail my understanding of this PR before proceeding with the review. If my account is erroneous, please do correct me. Thank you.
Notes:
- Normally, the filters are initialized manually and are not read from a stream. So we can be sure that the
encoded_filteralways contains the correct number of bits.
Eg.
GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);- And when the
encoded_filteris read from a data stream in a function, a preliminary check is done before passing it through the BlockFilter function.
Eg.
filein >> block_hash >> encoded_filter;
uint256 result;
CHash256().Write(encoded_filter).Finalize(result);
if (result != hash) return error("Checksum mismatch in filter decode.");- The only function which allows getting the
encoded_filterfrom the data stream and passing it through theGCSFilter()function without any preliminary checks is the Unserialize function. - Since this function is used only in the testing, it could be moved to the test file, which removes the need for checking
encoded_filtercorrectness in the BlockFilter/GCSFilter functions. - In the test file, the correctness of
encoded_filteris checked by checking if theencoded_filterremains unchanged after movingblock_filterin and out of the stream.
stream << block_filter;
stream >> block_filter2;
block_filter2 = UnserializeBlockFilter(stream);
...
BOOST_CHECK(block_filter.GetEncodedFilter() == block_filter2.GetEncodedFilter()); Review:
- I agree with moving the Unserialize function to the test file as it simplifies the code's logic while making it more straightforward.
- The code changes are clean and logical.
- And successful passing of CI shows that the fuzz testing was successful on the PR.
- I would like to suggest addressing @jonatack suggestions before merging this PR.
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
f5563da to
3295371
Compare
Pushed new code with the suggested changes stated here. |
shaavan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reACK 32953718a3fb8e2f298b99e1a2ee72af1ea5be8c
Changes since my last review:
- Addressed @jonatack's suggestions.
I verified that the update adequately addresses all the suggestions in this comment.
|
cc @mzumsande @ryanofsky @stickies-v @theStack given you all reviewed #24832. |
|
ACK 32953718a3fb8e2f298b99e1a2ee72af1ea5be8c edit: modulo removing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 32953718a , with the caveat that I don't understand the fuzzing code very well so count that as unreviewed from my end (even though I left a small nit)
I think GCSFilterDecodeSkipCheck() should be removed before ACK. Besides that, left 2 style comments but none blocking.
Summary: #24832 introduced a new file hash-based check to ensure filter integrity. This hash check (in BlockFilterIndex::ReadFilterFromDisk) can not be used when filters are unserialized from a stream because the hash is unknown, but that unserialization only happened in the test suite. As such, this PR moving unserialization logic to /src/test/ and simplifying the non-test code removing the skip_decode_check parameter and path makes sense.
src/test/fuzz/blockfilter.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't understand the fuzzing code very well yet, but is there no more descriptive name than u256?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mainly being used as a blockhash so I guess I could rename it to be block_hash
src/test/blockfilter_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd just initialize at once:
| block_filter2 = UnserializeBlockFilter(stream); | |
| BlockFilter block_filter2 {UnserializeBlockFilter(stream)}; |
src/bench/gcs_filter.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the entire GCSFilterDecodeSkipCheck() test can be removed? It's identical to GCSFilterDecode()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, thanks for noticing 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'll remove it
theStack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-review ACK 32953718a3fb8e2f298b99e1a2ee72af1ea5be8c (modulo potential benchmark/fuzz test improvements)
Left a nit regarding the fuzz test below. I also agree that the benchmark GCSFilterDecodeSkipCheck can be removed as it's identical to GCSFilterDecode now, as suggested by stickies-v (good catch!).
src/test/fuzz/blockfilter.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too deep into fuzzing, but IIUC, we just want a random uint256 here and there is not really any benefit by involving a deserialization (that could fail)?
| const std::optional<uint256> u256 = ConsumeDeserializable<uint256>(fuzzed_data_provider); | |
| if (!u256) { | |
| return; | |
| } | |
| const uint256 u256 = ConsumeUInt256(fuzzed_data_provider); |
Could also limit this variable's scope by moving it down before it's first use (that is, before GCSFilter is constructed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just want a random uint256 here and there is not really any benefit by involving a deserialization (that could fail)?
Yeah that's right. I'll apply the suggested change.
Could also limit this variable's scope by moving it down before it's first use (that is, before GCSFilter is constructed).
I'm also using the it for entropy with the siphash key. But I could get rid of that and just use the default key and initialize it right before the filter construction.
Oops this is also done during the filter construction. I'll move it down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just commenting on this PR since I got a review request above. I am -0 on this change. I don't think it's an improvement, but would not object if others want to go ahead with it.
I think this is worse for code organization and debugability. I think serialize and deserialize methods should live next to each other in the code so they can don't get out of sync and stay available for understanding and debugging. If there was a large amount of code that was only called by tests and not used by the application, I'd agree it should be moved or removed. But this is a small amount and IMO this change just moves it someplace harder to find.
mzumsande
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this similarly (as expressed here), but I think removing the skip_decode_check logic is orthogonal to this and would definitely be an improvement.
3295371 to
1204e3d
Compare
|
code review ACK 1204e3d4d Verified that since 3295371 the only changes made are incorporating (all) the suggestions made by myself and @theStack
I'm not sufficiently familiar with how (de)serialization is organized across the codebase, so I'll refrain from going beyond a code review ACK, even though otherwise I think this PR is in good shape. |
theStack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 1204e3d4d35a1adf7e2160f29b604bb1ba1ac899
While I agree that to each serialization routine there should also be a deserialization counterpart and it wouldn't be a good idea to remove it completely, I think just having it in the unit tests is fine. If serialization and deserialization went out of sync, the serialization/deserialization roundtrip test would likely detect this, no?
jonatack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 1204e3d4d35a1adf7e2160f29b604bb1ba1ac899
Some nit comments (feel free to ignore), most of which would be pre-empted by running
git diff -U0 HEAD~1.. | ./contrib/devtools/clang-format-diff.py -p1 -i -v
to apply clang formatting to the changes; see contrib/devtools/clang-format-diff.py.
src/test/blockfilter_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can drop a temporary here if you retouch
- BlockFilter block_filter(filter_type, block_hash, std::move(encoded_filter));
- return block_filter;
+ return BlockFilter{filter_type, block_hash, std::move(encoded_filter)};
src/test/blockfilter_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang format
| BlockFilter block_filter2 {UnserializeBlockFilter(stream)}; | |
| BlockFilter block_filter2{UnserializeBlockFilter(stream)}; |
src/test/blockfilter_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang format
template <typename Stream>
-static BlockFilter UnserializeBlockFilter(Stream& s) {
+static BlockFilter UnserializeBlockFilter(Stream& s)
+{
std::vector<unsigned char> encoded_filter;
uint8_t filter_type_uint8;
uint256 block_hash;
- s >> filter_type_uint8
- >> block_hash
- >> encoded_filter;
+ s >> filter_type_uint8 >> block_hash >> encoded_filter;
src/test/fuzz/blockfilter.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang format
| for (size_t i=0; i < num_elements; ++i) { | |
| for (size_t i = 0; i < num_elements; ++i) { |
1204e3d to
9b87854
Compare
Addressed. |
|
Looks like ae7ae36 added a new check to the Clang-tidy CI (unrelated to changes in the last push) that is red now: (ACK 9b87854 per |
9b87854 to
51fd464
Compare
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
51fd464 to
07fdd10
Compare
stickies-v
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review re-ACK 07fdd10
Verified that since 1204e3d changes are limited to clang formatting and declaring gcs_filter as a reference variable.
|
Diff-only review re-ACK 07fdd10 per |
…er where needed 89576cc refactor: add LIFETIMEBOUND to blockfilter where needed (stickies-v) Pull request description: Noticed from bitcoin/bitcoin#25637 (comment) that [`BlockFilter::GetFilter()`](https://github.com/bitcoin/bitcoin/blob/01e1627e25bc5477c40f51da03c3c31b609a85c9/src/blockfilter.h#L132) returns a reference to a member variable. Added LIFETIMEBOUND to all blockfilter-related code to ensure that the return values do not have a lifetime that exceeds the lifetime of what it is bound to. See https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#lifetimebound or bitcoin/bitcoin#25060 for a similar example. I used `grep -E '[a-zA-Z>0-9][&*] ([a-zA-Z]*)\((.*)\)' src/**/blockfilter*` to grep all possible occurrences (not all of them require LIFETIMEBOUND) ACKs for top commit: brunoerg: crACK 89576cc Tree-SHA512: 6fe61fc0c1ed9e446edce083d1b093e1a5e2ef8c39ff74125bb12a24e514d45711845809817fbd4a04d7a9c23c8b362203771c17b6d831d2560b1af268453019
… needed 89576cc refactor: add LIFETIMEBOUND to blockfilter where needed (stickies-v) Pull request description: Noticed from bitcoin#25637 (comment) that [`BlockFilter::GetFilter()`](https://github.com/bitcoin/bitcoin/blob/01e1627e25bc5477c40f51da03c3c31b609a85c9/src/blockfilter.h#L132) returns a reference to a member variable. Added LIFETIMEBOUND to all blockfilter-related code to ensure that the return values do not have a lifetime that exceeds the lifetime of what it is bound to. See https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#lifetimebound or bitcoin#25060 for a similar example. I used `grep -E '[a-zA-Z>0-9][&*] ([a-zA-Z]*)\((.*)\)' src/**/blockfilter*` to grep all possible occurrences (not all of them require LIFETIMEBOUND) ACKs for top commit: brunoerg: crACK 89576cc Tree-SHA512: 6fe61fc0c1ed9e446edce083d1b093e1a5e2ef8c39ff74125bb12a24e514d45711845809817fbd4a04d7a9c23c8b362203771c17b6d831d2560b1af268453019
|
cc @theStack for re-ACK? |
theStack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 07fdd10
(verified via git range-diff 1204e3d4...07fdd10c that changes since my last ACK are trivial refactors)
|
@furszy My understanding is that you are working on implementing something that requires block filter download, which would require having this |
|
Yeah, the new feature requires to unserialize block filters from the wire. Thanks for the ping @achow101. |
|
Talked to andrew and fursy and am convinced that the PR would just have to be reverted in a later commit. Closing. |
BlockFilter::Unserialize is only used in the test code. Moving it to
the tests allows for the removal of the optional sanity check in the
GCSFilter and BlockFilter constructor, resulting in simpler code.
This is a follow up to #24832.