-
Notifications
You must be signed in to change notification settings - Fork 38.8k
Safegcd-based modular inverses in MuHash3072 #21590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Using libgmp for inverses is 1.5x-2x faster still, which is somewhat expected - there are several optimizations to safegcd that become more relevant for larger input sizes but aren't useful in the 256-bit code which this is adapted from as well. I think it's fine to leave those for future improvements, as this already gets hash finalization down to ~1 signature check worth, which is probably far below what we care about. |
|
Concept and high-level review ACK. Did not check the algorithm in detail. |
|
Concept ACK |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/21590. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
498d1df to
f1f4eae
Compare
|
Rebased. |
0bf165c to
58b108e
Compare
|
I added additional fuzz tests, which seem to pass. However, |
3821b14 to
9cb1bc3
Compare
2c35a93 Generalize/simplify VectorReader into SpanReader (Pieter Wuille) Pull request description: Originally written for #21590 (safegcd-based MuHash inverses), but then found a better way that removed the need for it, so I'm submitting it independently. ACKs for top commit: MarcoFalke: re-ACK 2c35a93 🖨 shaavan: ACK 2c35a93 Tree-SHA512: 959e3251e0cfe20e13a50639b617c9dc2a561d613a0884d983c93d15dacb6d2305d760aa933d18ba055cef8a1651a344bcb6b3f93051ecf26d3f2efc5779efa4
2c35a93 Generalize/simplify VectorReader into SpanReader (Pieter Wuille) Pull request description: Originally written for bitcoin#21590 (safegcd-based MuHash inverses), but then found a better way that removed the need for it, so I'm submitting it independently. ACKs for top commit: MarcoFalke: re-ACK 2c35a93 🖨 shaavan: ACK 2c35a93 Tree-SHA512: 959e3251e0cfe20e13a50639b617c9dc2a561d613a0884d983c93d15dacb6d2305d760aa933d18ba055cef8a1651a344bcb6b3f93051ecf26d3f2efc5779efa4
This isn't an issue with the implementation here but with I will take a closer look at the code here soon! |
c055f6b test: Remove false coinstatsindex test (Fabian Jahr) Pull request description: This test never actually tested the behavior that it describes in the comments. This was discovered in #21590 which seems to speed up muhash which lead to the test failing. I can vaguely remember that the described behavior was desired by some reviewers of `coinstatsindex`: That `coinstatsindex` should be aware of stale blocks and able to return statistics on them as well. The index actually does this for blocks that it sees while the index is active, i.e. while running `coinstatsindex` all blocks will be indexed and even when they become stale the index (via `gettxoutsetinfo`) will still return a result for them when given the right hash. But this currently does not work for blocks that the node saw and that became stale _before_ the node activated `coinstatsindex`. While the index syncs initially everything but the active chain is ignored and I don't see any indication that this ever worked differently in the past. Introducing this behavior seems non-trivial at first glance so, while I will give this a shot, I think the test should be removed so it does not confuse users and does not block #21590. Top commit has no ACKs. Tree-SHA512: b05f5dfeea3e453c8bb7c761501d0d896d4412a3f0c08037955951fae9fe388c63402da401792591e18da8fb67734f47f1a297d573cdb66e0ced451698718067
c055f6b test: Remove false coinstatsindex test (Fabian Jahr) Pull request description: This test never actually tested the behavior that it describes in the comments. This was discovered in bitcoin#21590 which seems to speed up muhash which lead to the test failing. I can vaguely remember that the described behavior was desired by some reviewers of `coinstatsindex`: That `coinstatsindex` should be aware of stale blocks and able to return statistics on them as well. The index actually does this for blocks that it sees while the index is active, i.e. while running `coinstatsindex` all blocks will be indexed and even when they become stale the index (via `gettxoutsetinfo`) will still return a result for them when given the right hash. But this currently does not work for blocks that the node saw and that became stale _before_ the node activated `coinstatsindex`. While the index syncs initially everything but the active chain is ignored and I don't see any indication that this ever worked differently in the past. Introducing this behavior seems non-trivial at first glance so, while I will give this a shot, I think the test should be removed so it does not confuse users and does not block bitcoin#21590. Top commit has no ACKs. Tree-SHA512: b05f5dfeea3e453c8bb7c761501d0d896d4412a3f0c08037955951fae9fe388c63402da401792591e18da8fb67734f47f1a297d573cdb66e0ced451698718067
PastaPastaPasta
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change all the c-style casts to functional casts, I gave specific comments on a few instances, but stopped so it's not as spammy
02c333c to
ad67fd2
Compare
sedited
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK ad67fd2
Just nits, and all of them can be ignored. This was fun to review, the explanations in the safegcd_implementation.md are excellent. I also profited from the two review clubs and their notes that were done on the original MuHash introduction.
src/crypto/muhash.cpp
Outdated
| ce -= (signed_double_limb_t)1103717 * me; | ||
| /* Verify that the low SIGNED_LIMB_SIZE bits of the computation are indeed zero, and then throw them away. */ | ||
| assert((cd & MAX_SIGNED_LIMB) == 0); | ||
| assert((ce & MAX_SIGNED_LIMB) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question: Compared to the code in modinv32_impl.h, there are fewer bound checks done here. Is this on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which ones in particular are missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant these checks:
https://github.com/bitcoin/bitcoin/blob/master/src/secp256k1/src/modinv32_impl.h#L414-L419
https://github.com/bitcoin/bitcoin/blob/master/src/secp256k1/src/modinv32_impl.h#L456-L459
But I guess they are not really that important and it is not worth adding the required utilities?
src/crypto/muhash.cpp
Outdated
| // Add modulus if this was negative. This brings the range of *this to 1-2^3072..2^3072-1. | ||
| signed_limb_t cond_add = limbs[SIGNED_LIMBS-1] >> (LIMB_SIZE-1); // -1 if this is negative; 0 otherwise | ||
| limbs[0] += signed_limb_t(-MAX_PRIME_DIFF) & cond_add; | ||
| limbs[3072 / SIGNED_LIMB_SIZE] += (signed_limb_t(1) << (3072 % SIGNED_LIMB_SIZE)) & cond_add; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question: Compared to modinv32_normalize, this step seems to skip the inner limbs. Is there a reason why there is a difference between the implementations here? IIUC this works out in the end because of the carry step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. They aren't skipped, the carry step is the equivalent of processing the inner limbs. They're just easier, because the modulus here can be represented as [1 << FINAL_LIMB_MODULUS_BITS, 0, 0, 0, ..., 0, 0, -MAX_PRIME_DIFF]. In the modinv32_impl.h code, the modulus is treated as generic, where any limb can be nonzero.
| me -= (limb_t(0x70a1421da087d93) * limb_t(ce) + me) & MAX_SIGNED_LIMB; | ||
| /* Update the beginning of computation for t*[d,e]+modulus*[md,me] now md,me are known. */ | ||
| cd -= (signed_double_limb_t)1103717 * md; | ||
| ce -= (signed_double_limb_t)1103717 * me; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tripped me up at first, because something is added instead of subtracted compared to the implementation in modinv32_impl.h. But I think this is correct, because it subtracts the distance (I don't know what the correct terminology is for this) to the modulus instead of adding the modulus, which should work out to the same. Similarly, the operation in the for loop can also be moved to the final step, which I'm guessing is a further nice optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above. The modulus here is 2^3072 - MAX_PRIME_DIFF, which is represented in signed-limb representation as [1 << FINAL_LIMB_MODULUS_BITS, 0, 0, 0, ..., 0, -MAX_PRIME_DIFF], so we only need to do something for the bottom limb (where our modulus is negative) and the top limb.
theStack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fuzz-tested ACK ad67fd2
With the friendly help of @dergoegge I managed to get differential fuzzing running last week and let that ran for the last ~77 hours. Here are the rough instructions for those who also want to give it a try:
Instructions for differential fuzzing using semsan
-
created branches on top of master and the PR each that add a characterization to the MuHash fuzz test, writing to a shared memory for comparison (see https://github.com/theStack/bitcoin/tree/muhash_characterization_master and https://github.com/theStack/bitcoin/tree/muhash_characterization_pr21590, cherry-picking the commit originally from dergoegge@d327378; note that the environment variable value had to be adapted to
SEMSAN_CHARACTERIZATION_SHMEM_ID) -
built the
afl-clang-...binaries for clang 18:
$ git clone https://github.com/AFLplusplus/AFLplusplus
$ cd AFLplusplus
$ LLVM_CONFIG=llvm-config-18 make
- built both branches mentioned in step 1 above using
afl-clang-lto/afl-clang-lto++(built in step 2):
$ cmake -B build_fuzz -DCMAKE_C_COMPILER="/path/to/AFLplusplus/afl-clang-lto" -DCMAKE_CXX_COMPILER="/path/to/AFLplusplus/afl-clang-lto++" -DBUILD_FOR_FUZZING=ON
...
$ cmake --build build_fuzz/
...
(Note that this can take quite a while. Unfortunately, using the -fast binaries didn't work for me and resulted in a linker error.)
- cloned the qa-assets repo for the fuzzing seeds
$ git clone --depth=1 https://github.com/bitcoin-core/qa-assets
- built the dergoegge's semsan tool and run it with each of the built fuzzing binaries above and the fuzzing seed:
$ https://github.com/dergoegge/semsan
$ cd semsan
$ cargo build --release
$ AFL_DEBUG=1 FUZZ=muhash ./target/release/semsan --debug-children /path/to/master_characterization_branch/build_fuzz/src/test/fuzz/fuzz /path/to/pr21590_characterization_branch/build_fuzz/src/test/fuzz/fuzz fuzz --seeds ~/qa-assets/fuzz_corpora/muhash/ --solutions ./solutions
- wait and enjoy 🍻 🥃 🥩 🍨
The latest output looked like this on my machine:
[Client Heartbeat #0] run time: 77h-20m-55s, clients: 1, corpus: 22, objectives: 0, executions: 31816828, exec/sec: 114.3, combined-coverage: 262/563840 (0%), stability: 262/262 (100%)
This implements a safegcd-based modular inverse for MuHash3072. It is a fairly straightforward translation of the libsecp256k1 implementation, with the following changes:
A benchmark is also included for MuHash3072::Finalize. The new implementation is around 100x faster on x86_64 for me (from 5.8 ms to 57 μs); for 32-bit code the factor is likely even larger.
For more information: