Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented Apr 4, 2021

This implements a safegcd-based modular inverse for MuHash3072. It is a fairly straightforward translation of the libsecp256k1 implementation, with the following changes:

  • Generic for 32-bit and 64-bit
  • Specialized for the specific MuHash3072 modulus (2^3072 - 1103717).
  • A bit more C++ish
  • Far fewer sanity checks

A benchmark is also included for MuHash3072::Finalize. The new implementation is around 100x faster on x86_64 for me (from 5.8 ms to 57 μs); for 32-bit code the factor is likely even larger.

For more information:

  • Original paper by Daniel J. Bernstein and Bo-Yin Yang
  • Implementation for libsecp256k1 by Peter Dettman; and the final version
  • Explanation of the algorithm using Python snippets
  • Analysis of the maximum number of iterations the algorithm needs
  • Formal proof in Coq by Russell O'Connor (for the 256-bit version of the algorithm; here we use a 3072-bit one).

@sipa
Copy link
Member Author

sipa commented Apr 6, 2021

Using libgmp for inverses is 1.5x-2x faster still, which is somewhat expected - there are several optimizations to safegcd that become more relevant for larger input sizes but aren't useful in the 256-bit code which this is adapted from as well.

I think it's fine to leave those for future improvements, as this already gets hash finalization down to ~1 signature check worth, which is probably far below what we care about.

@laanwj
Copy link
Member

laanwj commented May 19, 2021

Concept and high-level review ACK. Did not check the algorithm in detail.

@theStack
Copy link
Contributor

Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 15, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/21590.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, dergoegge, achow101
Concept ACK laanwj
Stale ACK theStack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31507 ([POC] build: Use clang-cl to build on Windows natively by hebasto)
  • #31308 (ci, iwyu: Treat warnings as errors for specific targets by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipa
Copy link
Member Author

sipa commented Nov 15, 2021

Rebased.

@sipa sipa force-pushed the 202101_muhash_safegcd branch 2 times, most recently from 0bf165c to 58b108e Compare December 1, 2021 21:13
@sipa
Copy link
Member Author

sipa commented Dec 1, 2021

I added additional fuzz tests, which seem to pass.

However, test/functional/feature_coinstatsindex.py now fails, and I haven't figured out why. @fjahr ?

@sipa sipa force-pushed the 202101_muhash_safegcd branch 2 times, most recently from 3821b14 to 9cb1bc3 Compare December 2, 2021 16:48
maflcko pushed a commit that referenced this pull request Dec 3, 2021
2c35a93 Generalize/simplify VectorReader into SpanReader (Pieter Wuille)

Pull request description:

  Originally written for #21590 (safegcd-based MuHash inverses), but then found a better way that removed the need for it, so I'm submitting it independently.

ACKs for top commit:
  MarcoFalke:
    re-ACK 2c35a93 🖨
  shaavan:
    ACK 2c35a93

Tree-SHA512: 959e3251e0cfe20e13a50639b617c9dc2a561d613a0884d983c93d15dacb6d2305d760aa933d18ba055cef8a1651a344bcb6b3f93051ecf26d3f2efc5779efa4
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 3, 2021
2c35a93 Generalize/simplify VectorReader into SpanReader (Pieter Wuille)

Pull request description:

  Originally written for bitcoin#21590 (safegcd-based MuHash inverses), but then found a better way that removed the need for it, so I'm submitting it independently.

ACKs for top commit:
  MarcoFalke:
    re-ACK 2c35a93 🖨
  shaavan:
    ACK 2c35a93

Tree-SHA512: 959e3251e0cfe20e13a50639b617c9dc2a561d613a0884d983c93d15dacb6d2305d760aa933d18ba055cef8a1651a344bcb6b3f93051ecf26d3f2efc5779efa4
@fjahr
Copy link
Contributor

fjahr commented Dec 5, 2021

I added additional fuzz tests, which seem to pass.

However, test/functional/feature_coinstatsindex.py now fails, and I haven't figured out why. @fjahr ?

This isn't an issue with the implementation here but with me the test just being stupid. More info here #23681.

I will take a closer look at the code here soon!

maflcko pushed a commit that referenced this pull request Dec 6, 2021
c055f6b test: Remove false coinstatsindex test (Fabian Jahr)

Pull request description:

  This test never actually tested the behavior that it describes in the comments. This was discovered in #21590 which seems to speed up muhash which lead to the test failing.

  I can vaguely remember that the described behavior was desired by some reviewers of `coinstatsindex`: That `coinstatsindex` should be aware of stale blocks and able to return statistics on them as well. The index actually does this for blocks that it sees while the index is active, i.e. while running `coinstatsindex` all blocks will be indexed and even when they become stale the index (via `gettxoutsetinfo`) will still return a result for them when given the right hash. But this currently does not work for blocks that the node saw and that became stale _before_ the node activated `coinstatsindex`. While the index syncs initially everything but the active chain is ignored and I don't see any indication that this ever worked differently in the past.

  Introducing this behavior seems non-trivial at first glance so, while I will give this a shot, I think the test should be removed so it does not confuse users and does not block #21590.

Top commit has no ACKs.

Tree-SHA512: b05f5dfeea3e453c8bb7c761501d0d896d4412a3f0c08037955951fae9fe388c63402da401792591e18da8fb67734f47f1a297d573cdb66e0ced451698718067
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 6, 2021
c055f6b test: Remove false coinstatsindex test (Fabian Jahr)

Pull request description:

  This test never actually tested the behavior that it describes in the comments. This was discovered in bitcoin#21590 which seems to speed up muhash which lead to the test failing.

  I can vaguely remember that the described behavior was desired by some reviewers of `coinstatsindex`: That `coinstatsindex` should be aware of stale blocks and able to return statistics on them as well. The index actually does this for blocks that it sees while the index is active, i.e. while running `coinstatsindex` all blocks will be indexed and even when they become stale the index (via `gettxoutsetinfo`) will still return a result for them when given the right hash. But this currently does not work for blocks that the node saw and that became stale _before_ the node activated `coinstatsindex`. While the index syncs initially everything but the active chain is ignored and I don't see any indication that this ever worked differently in the past.

  Introducing this behavior seems non-trivial at first glance so, while I will give this a shot, I think the test should be removed so it does not confuse users and does not block bitcoin#21590.

Top commit has no ACKs.

Tree-SHA512: b05f5dfeea3e453c8bb7c761501d0d896d4412a3f0c08037955951fae9fe388c63402da401792591e18da8fb67734f47f1a297d573cdb66e0ced451698718067
Copy link
Contributor

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change all the c-style casts to functional casts, I gave specific comments on a few instances, but stopped so it's not as spammy

@sipa sipa force-pushed the 202101_muhash_safegcd branch from 02c333c to ad67fd2 Compare October 16, 2024 07:58
Copy link
Contributor

@sedited sedited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK ad67fd2

Just nits, and all of them can be ignored. This was fun to review, the explanations in the safegcd_implementation.md are excellent. I also profited from the two review clubs and their notes that were done on the original MuHash introduction.

ce -= (signed_double_limb_t)1103717 * me;
/* Verify that the low SIGNED_LIMB_SIZE bits of the computation are indeed zero, and then throw them away. */
assert((cd & MAX_SIGNED_LIMB) == 0);
assert((ce & MAX_SIGNED_LIMB) == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question: Compared to the code in modinv32_impl.h, there are fewer bound checks done here. Is this on purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which ones in particular are missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Add modulus if this was negative. This brings the range of *this to 1-2^3072..2^3072-1.
signed_limb_t cond_add = limbs[SIGNED_LIMBS-1] >> (LIMB_SIZE-1); // -1 if this is negative; 0 otherwise
limbs[0] += signed_limb_t(-MAX_PRIME_DIFF) & cond_add;
limbs[3072 / SIGNED_LIMB_SIZE] += (signed_limb_t(1) << (3072 % SIGNED_LIMB_SIZE)) & cond_add;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question: Compared to modinv32_normalize, this step seems to skip the inner limbs. Is there a reason why there is a difference between the implementations here? IIUC this works out in the end because of the carry step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. They aren't skipped, the carry step is the equivalent of processing the inner limbs. They're just easier, because the modulus here can be represented as [1 << FINAL_LIMB_MODULUS_BITS, 0, 0, 0, ..., 0, 0, -MAX_PRIME_DIFF]. In the modinv32_impl.h code, the modulus is treated as generic, where any limb can be nonzero.

me -= (limb_t(0x70a1421da087d93) * limb_t(ce) + me) & MAX_SIGNED_LIMB;
/* Update the beginning of computation for t*[d,e]+modulus*[md,me] now md,me are known. */
cd -= (signed_double_limb_t)1103717 * md;
ce -= (signed_double_limb_t)1103717 * me;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tripped me up at first, because something is added instead of subtracted compared to the implementation in modinv32_impl.h. But I think this is correct, because it subtracts the distance (I don't know what the correct terminology is for this) to the modulus instead of adding the modulus, which should work out to the same. Similarly, the operation in the for loop can also be moved to the final step, which I'm guessing is a further nice optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. The modulus here is 2^3072 - MAX_PRIME_DIFF, which is represented in signed-limb representation as [1 << FINAL_LIMB_MODULUS_BITS, 0, 0, 0, ..., 0, -MAX_PRIME_DIFF], so we only need to do something for the bottom limb (where our modulus is negative) and the top limb.

@DrahtBot DrahtBot requested a review from dergoegge November 7, 2024 20:26
Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fuzz-tested ACK ad67fd2

With the friendly help of @dergoegge I managed to get differential fuzzing running last week and let that ran for the last ~77 hours. Here are the rough instructions for those who also want to give it a try:

Instructions for differential fuzzing using semsan
  1. created branches on top of master and the PR each that add a characterization to the MuHash fuzz test, writing to a shared memory for comparison (see https://github.com/theStack/bitcoin/tree/muhash_characterization_master and https://github.com/theStack/bitcoin/tree/muhash_characterization_pr21590, cherry-picking the commit originally from dergoegge@d327378; note that the environment variable value had to be adapted to SEMSAN_CHARACTERIZATION_SHMEM_ID)

  2. built the afl-clang-... binaries for clang 18:

$ git clone https://github.com/AFLplusplus/AFLplusplus
$ cd AFLplusplus
$ LLVM_CONFIG=llvm-config-18 make
  1. built both branches mentioned in step 1 above using afl-clang-lto/afl-clang-lto++ (built in step 2):
$ cmake -B build_fuzz -DCMAKE_C_COMPILER="/path/to/AFLplusplus/afl-clang-lto" -DCMAKE_CXX_COMPILER="/path/to/AFLplusplus/afl-clang-lto++" -DBUILD_FOR_FUZZING=ON
...
$ cmake --build build_fuzz/
...

(Note that this can take quite a while. Unfortunately, using the -fast binaries didn't work for me and resulted in a linker error.)

  1. cloned the qa-assets repo for the fuzzing seeds
$ git clone --depth=1 https://github.com/bitcoin-core/qa-assets
  1. built the dergoegge's semsan tool and run it with each of the built fuzzing binaries above and the fuzzing seed:
$ https://github.com/dergoegge/semsan
$ cd semsan
$ cargo build --release
$ AFL_DEBUG=1 FUZZ=muhash ./target/release/semsan --debug-children /path/to/master_characterization_branch/build_fuzz/src/test/fuzz/fuzz /path/to/pr21590_characterization_branch/build_fuzz/src/test/fuzz/fuzz fuzz --seeds ~/qa-assets/fuzz_corpora/muhash/ --solutions ./solutions
  1. wait and enjoy 🍻 🥃 🥩 🍨

The latest output looked like this on my machine:

[Client Heartbeat #0] run time: 77h-20m-55s, clients: 1, corpus: 22, objectives: 0, executions: 31816828, exec/sec: 114.3, combined-coverage: 262/563840 (0%), stability: 262/262 (100%)