Skip to content

Conversation

@hebasto
Copy link
Member

@hebasto hebasto commented Nov 16, 2023

  1. Fix the test for SSE4.1 intrinsics during build system configuration, which currently can be false positive, for example, when CXXFLAGS="-mno-sse4.1" provided.

    This PR fixes the test by adding the _mm_blend_epi16 SSE4.1 function used in our codebase.

  2. Guard sha_x86_shani.cpp code with ENABLE_SSE41 macro as it uses the _mm_blend_epi16 function from
    the SSE4.1 instruction set.

    It is possible that SHA-NI is enabled even when SSE4.1 is disabled, which causes compile errors in the master branch.

Closes #28864.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 16, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK willcl-ark, sipa, theuni
Concept NACK luke-jr
Concept ACK laanwj, dergoegge, fanquake

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@hebasto hebasto marked this pull request as draft November 16, 2023 14:54
@laanwj
Copy link
Member

laanwj commented Nov 16, 2023

Concept ACK

@dergoegge
Copy link
Member

Cool! Concept ACK

@hebasto hebasto marked this pull request as ready for review November 16, 2023 15:35
@DrahtBot
Copy link
Contributor

Guix builds (on x86_64)

File commit 22025d0
(master)
commit 5096570b243b442ecf88f1bbb93eb483664b983a
(master and this pull)
SHA256SUMS.part 8c20bf13882367c1... 75d5f5f74ed8cf3d...
*-aarch64-linux-gnu-debug.tar.gz ddad48413a90edcc... 1d0fda9033fd33eb...
*-aarch64-linux-gnu.tar.gz 475776d445d01302... dd4f058043965533...
*-arm-linux-gnueabihf-debug.tar.gz 4ebfac8dcd043f01... 35aac4c502a6def9...
*-arm-linux-gnueabihf.tar.gz e99c3890d894e61d... c9ed0a67ccdc2842...
*-arm64-apple-darwin-unsigned.tar.gz 790282cee337f0d5... a34a41326f91a77c...
*-arm64-apple-darwin-unsigned.zip 1022a0e01d3bb7cc... 10593711628b82c2...
*-arm64-apple-darwin.tar.gz d03565cc6bb0d273... 20a1dbae3d6dfbab...
*-powerpc64-linux-gnu-debug.tar.gz 1e6f0a493dfb280a... cda5ba2e6ae593c8...
*-powerpc64-linux-gnu.tar.gz d411de4850aada0f... 3bec634f422ccc68...
*-powerpc64le-linux-gnu-debug.tar.gz 86e978abca77e790... 437896bcfbb37d76...
*-powerpc64le-linux-gnu.tar.gz ec0c9d2fc7a165c7... a5fa185c67a87a7c...
*-riscv64-linux-gnu-debug.tar.gz 35a02a18e2a885d8... 34429527e4ea014a...
*-riscv64-linux-gnu.tar.gz 216b9b5803674e54... 55d89463d4ac9d4e...
*-x86_64-apple-darwin-unsigned.tar.gz a4263e45b80de4ad... 2e1abdf0df04de80...
*-x86_64-apple-darwin-unsigned.zip 9c118c69289967e0... 080ac453313ecea2...
*-x86_64-apple-darwin.tar.gz 83388f8563d8043d... d205f8d5a5d7b34f...
*-x86_64-linux-gnu-debug.tar.gz 501990251709cbba... 3363860af25f86b0...
*-x86_64-linux-gnu.tar.gz 4548a959a4d1b049... 8f0d5f22d8df5251...
*.tar.gz ee37119e069929f2... b3f67d08d8805d0c...
guix_build.log a29e3e2cbad945f6... f293123a8da4e497...
guix_build.log.diff 05d62414aebc3558...

@fanquake
Copy link
Member

Concept ACK

@luke-jr
Copy link
Member

luke-jr commented Dec 5, 2023

Weak Concept NACK: We should probably still build the SSE4.1 code if the compiler supports it. The flag should only affect the rest of the codebase, so that the software will run on a system without SSE4.1 (but still use the optimised crypto on systems that have it).

@hebasto
Copy link
Member Author

hebasto commented Dec 5, 2023

@luke-jr

Weak Concept NACK: We should probably still build the SSE4.1 code if the compiler supports it.

This PR branch still builds the SSE4.1 code if the compiler supports it, no?

@hebasto
Copy link
Member Author

hebasto commented Feb 12, 2024

Rebased.

@fanquake
Copy link
Member

fanquake commented Mar 5, 2024

I think this could be rebased again, given the recent build system changes, and removal of the sse4 lib.

hebasto added 2 commits March 5, 2024 11:36
This change uses the `_mm_blend_epi16` SSE4.1 function used in our code
and fixes false-positive cases, for example, when CXXFLAGS="-mno-sse4.1"
provided.
The code in `sha_x86_shani.cpp` uses the `_mm_blend_epi16` function from
the SSE4.1 instruction set. However, it is possible that SHA-NI is
enabled even when SSE4.1 is disabled.

This changes avoid compilation errors in such a condition.
@hebasto
Copy link
Member Author

hebasto commented Mar 5, 2024

I think this could be rebased again, given the recent build system changes, and removal of the sse4 lib.

Sure! Rebased.

@fanquake
Copy link
Member

fanquake commented Mar 5, 2024

@theuni probably interested here, given you might als be refactoring the define usage?

@DrahtBot
Copy link
Contributor

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

File commit 327f08b
(master)
commit a1b91a1424b6e5e52e62dab7a62972e2cc2a8f1d
(master and this pull)
SHA256SUMS.part 2f7fbee107d854b9... 3634f9c71692d158...
*-aarch64-linux-gnu-debug.tar.gz 131ec1047dafc740... c0056ca782052d2d...
*-aarch64-linux-gnu.tar.gz 45fb78c1e31b46db... 18e3ac27aec42545...
*-arm-linux-gnueabihf-debug.tar.gz b6a522d6bc56f6a5... cc32b6549679f1cc...
*-arm-linux-gnueabihf.tar.gz decf947b99562eb5... 1722b603eae9da92...
*-arm64-apple-darwin-unsigned.tar.gz 0fff0b5c9d6d1652... f880e9d57b55c98d...
*-arm64-apple-darwin-unsigned.zip 6db0130c624a7ddf... ae18c652fbb4dfb5...
*-arm64-apple-darwin.tar.gz 84414b88744adffb... 8e4b981640a07436...
*-powerpc64-linux-gnu-debug.tar.gz cb76d396c6e08279... 614754e0d329121a...
*-powerpc64-linux-gnu.tar.gz aec752522feaaf87... 9641282279056855...
*-riscv64-linux-gnu-debug.tar.gz b2ea26b0b0db7815... ade38d756d20f814...
*-riscv64-linux-gnu.tar.gz 08bc325b2e2c3aeb... dcf02be96c063e78...
*-x86_64-apple-darwin-unsigned.tar.gz 1d437b14b7dd1096... be71af18e9165fe4...
*-x86_64-apple-darwin-unsigned.zip f18c51926fc0efee... 9286a16458d230d4...
*-x86_64-apple-darwin.tar.gz 0833d69d18e97013... 823c4d9a0d195273...
*-x86_64-linux-gnu-debug.tar.gz 2d70f334c41d4e60... 0a5b4dd1436b2e85...
*-x86_64-linux-gnu.tar.gz e76f8be29de9634c... 45bac980bbfe8266...
*.tar.gz 5467b517b7ad2c46... 72ee1a8832d6146e...
guix_build.log d68b4ff13be02f84... 6f82440b85f6e39c...
guix_build.log.diff 92a1b24a0c948c2c...

@willcl-ark
Copy link
Member

Concept ACK.

I was tacking this today (missed this PR as GH search for "sse" yielded no results 🤦🏼‍♂️ )

First I tried re-implementing @luke-jr previous approach, but refactoring into macros: cdaa9bf

Whilst this works well-enough, it didn't particularly fix #28864 in my opinion, so I tried master...willcl-ark:bitcoin:detect-sse4.1

Now that I've found this PR, the mechanics of the two change-sets appear very similar, but @hebasto gates SHANI from the Makefile whereas I opted to gate in configure.ac. Not sure if either has any advantages, but I'm concept ACK here!

Copy link
Member

@willcl-ark willcl-ark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK d440f13

These changes fix the issues correctly in my testing on Ubuntu.

I do slightly prefer how my changeset printed a configure-time warning message for the user that SHANI support would be disabled if SSE4.1 was not enabled, but overall prefer the approach of this fix.

@sipa
Copy link
Member

sipa commented Jul 16, 2024

utACK d440f13

Copy link
Member

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK d440f13

@fanquake fanquake merged commit 3679fa1 into bitcoin:master Jul 17, 2024
@hebasto hebasto deleted the 231116-sse41 branch July 17, 2024 16:24
@hebasto
Copy link
Member Author

hebasto commented Jul 18, 2024

Ported to the CMake-based build system in hebasto#268.

hebasto added a commit to hebasto/bitcoin that referenced this pull request Jul 18, 2024
07e7fdc fixup! cmake: Build `bitcoin_crypto` library (Hennadii Stepanov)
6e9c0c6 crypto: Guard code with `ENABLE_SSE41` macro (Hennadii Stepanov)
7db01e1 build: Fix test for SSE4.1 intrinsics (Hennadii Stepanov)

Pull request description:

  This PR ports bitcoin#28893. All commits from there are picked here to avoid conflicts and CI failures.

ACKs for top commit:
  theuni:
    utACK 07e7fdc

Tree-SHA512: 58b5167d644e47f4e5d499f21a11acc66f9c8c2d23ee76ec1886d701d4f34965069af25a0a86d5c40ccbc3e064346bbe68f1e7451e7653419ad68a13ccc1492b
kwvg added a commit to kwvg/dash that referenced this pull request Jul 15, 2025
kwvg added a commit to kwvg/dash that referenced this pull request Jul 16, 2025
PastaPastaPasta added a commit to dashpay/dash that referenced this pull request Jul 17, 2025
, bitcoin#27598, bitcoin#29180, bitcoin#29407, bitcoin#29528, bitcoin#28893, bitcoin#29834, partial bitcoin#26158, bitcoin#25200, bitcoin#24322 (auxiliary backports: part 25)

2bb8e5b chore: resolve nit from review (Kittywhiskers Van Gogh)
61bba13 fix: tweak bls/bls_dkg benchmarks to work faster in `-sanity-check` mode (UdjinM6)
0eff28c merge bitcoin#29834: Change MAC_OSX macro to __APPLE__ in crypto (Kittywhiskers Van Gogh)
76cef17 merge bitcoin#28893: Fix SSE4.1-related issues (Kittywhiskers Van Gogh)
16209a8 merge bitcoin#29528: move sha256_sse4 into libbitcoin_crypto_base (Kittywhiskers Van Gogh)
ec1f492 merge bitcoin#29407: remove confusing and inconsistent disable-asm option (Kittywhiskers Van Gogh)
9440c18 merge bitcoin#29180: remove use of BUILD_BITCOIN_INTERNAL macro in sha256 (Kittywhiskers Van Gogh)
68f6bea fix: place `LIBTEST_*` before `LIBBITCOIN_*` for fuzzing binaries (Kittywhiskers Van Gogh)
a3c11b6 partial bitcoin#24322: Introduce initial `libbitcoinkernel` (Kittywhiskers Van Gogh)
9f981dd merge bitcoin#27598: Add SHA256 implementation specific benchmarks (Kittywhiskers Van Gogh)
306fa9a chore: realign function names in `crypto_hash` bench with upstream (Kittywhiskers Van Gogh)
3bdc15a chore: replace leftover `minEpochIterations` from `crypto_hash` benches (Kittywhiskers Van Gogh)
f300277 chore: drop `minEpochIterations(10)` from `crypto_hash` bench functions (Kittywhiskers Van Gogh)
c8f0d54 chore: move Dash-specific tests downward, sort like upstream (Kittywhiskers Van Gogh)
fda951f merge bitcoin#26481: Suppress output when running with `-sanity-check` option (Kittywhiskers Van Gogh)
a5718a0 partial bitcoin#25200: Fix spelling errors identified by codespell in comments (Kittywhiskers Van Gogh)
58a6248 partial bitcoin#26158: add "priority level" to the benchmark framework (Kittywhiskers Van Gogh)
160b3fc merge bitcoin#25107: Add `--sanity-check` flag, use it in `make check` (Kittywhiskers Van Gogh)
2ad60b3 fix: use atomic in `BLS_Verify_BatchedParallel` to avoid data race (Kittywhiskers Van Gogh)
24ad076 fix: ensure that globals are set and checks present to avoid bench errs (Kittywhiskers Van Gogh)
1c54997 trivial: re-enable benchmarks for `make check` (Kittywhiskers Van Gogh)
2a0df15 merge bitcoin#25058: Move output script RPCs to separate file, rename misc.cpp (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * `make check` upstream includes a dry run of the benchmarking suite, this behavior was introduced in [bitcoin#14092](bitcoin#14092) (a557022) but was disabled for Dash Core as our benchmarking suite is substantially heavier. Unfortunately, as a side effect of that, failures in the benchmark suite go unnoticed by CI (see e507a51 for one such fix).

    With the introduction of [bitcoin#25107](bitcoin#25107) and [bitcoin#26158](bitcoin#26158), the sanity checks require significantly less overhead and should be suitable for reintroduction into the `make check` rotation.

    * The sanity check caught failures in the test suite ([build](https://github.com/dashpay/dash/actions/runs/16291896463/job/46003746200#step:10:206)) due to missing initialization or insufficient checking and they have been remedied in this pull request. An example is available below.

      <details>

      <summary>Debugger stacktrace</summary>

      ```
      #0  StatsdClient::send<unsigned long> (this=0x0, key="CheckBlock_us", value=2847, type="ms", sample_rate=1) at stats/client.cpp:108
      #1  0x0000555555c3603f in StatsdClient::timing (this=0x0, key="CheckBlock_us", ms=2847, sample_rate=1) at stats/client.cpp:100
      #2  0x0000555555ca0b51 in CheckBlock (block=..., state=..., consensusParams=..., fCheckPOW=true, fCheckMerkleRoot=true) at validation.cpp:3932
      #3  0x000055555560cca5 in DeserializeAndCheckBlockTest(ankerl::nanobench::Bench&)::$_0::operator()() const (this=0x7fffffffd120) at bench/checkblock.cpp:47
      #4  ankerl::nanobench::Bench::run<DeserializeAndCheckBlockTest(ankerl::nanobench::Bench&)::$_0>(DeserializeAndCheckBlockTest(ankerl::nanobench::Bench&)::$_0&&) (this=0x7fffffffd430, op=...) at ./bench/nanobench.h:1221
      #5  0x000055555560c74b in DeserializeAndCheckBlockTest (bench=...) at bench/checkblock.cpp:40
      #6  0x00005555555e2f4a in std::function<void (ankerl::nanobench::Bench&)>::operator()(ankerl::nanobench::Bench&) const (this=0x555556942a90, __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:591
      #7  benchmark::BenchRunner::RunAll (args=...) at bench/bench.cpp:120
      #8  0x00005555555de5f4 in main (argc=<optimized out>, argv=<optimized out>) at bench/bench_bitcoin.cpp:134
      ```

      </details>

    * The benchmark `BLS_Verify_BatchedParallel` also has a data race that was spotted courtesy of `make check` ([build](https://github.com/dashpay/dash/actions/runs/16295305205/job/46015360773#step:10:183)) that has been resolved by using an atomic.

  * To allow backporting [bitcoin#27598](bitcoin#27598), the naming scheme changes in `src/bench/crypto_hash.cpp` introduced by [dash#1925](#1925) have been effectively reverted and Dash-specific benchmarks have been moved downward to mitigate merge conflict issues.
    * Similarly, changes to minimum iteration and batching by input size were made to realign with upstream.

  * The order of libraries for the linker had to be rearranged as the linker reads dependencies from left to right (i.e. top to bottom), which makes them ordering sensitive. Without this, the linker complains visibly (see below).

      <details>

      <summary>Linker errors:</summary>

      ```
      $ make -j10
      [...]
        AR       libbitcoin_node.a
        CXXLD    dashd
        CXXLD    test/test_dash
        CXXLD    qt/dash-qt
        CXXLD    test/fuzz/fuzz
        CXXLD    qt/test/test_dash-qt
        CXXLD    bench/bench_dash
      /usr/bin/ld: libtest_util.a(libtest_util_a-setup_common.o): in function `DashChainstateSetup(ChainstateManager&, node::NodeContext&, bool, bool, Consensus::Params const&)':
      /src/dash/dash/src/test/util/setup_common.cpp:128: undefined reference to `node::DashChainstateSetup(ChainstateManager&, CGovernanceManager&, CMasternodeMetaMan&, CMasternodeSync&, CSporkManager&, std::unique_ptr<CActiveMasternodeManager, std::default_delete<CActiveMasternodeManager> >&, std::unique_ptr<CChainstateHelper, std::default_delete<CChainstateHelper> >&, std::unique_ptr<CCreditPoolManager, std::default_delete<CCreditPoolManager> >&, std::unique_ptr<CDeterministicMNManager, std::default_delete<CDeterministicMNManager> >&, std::unique_ptr<CEvoDB, std::default_delete<CEvoDB> >&, std::unique_ptr<CMNHFManager, std::default_delete<CMNHFManager> >&, std::unique_ptr<LLMQContext, std::default_delete<LLMQContext> >&, CTxMemPool*, bool, bool, Consensus::Params const&)'
      /usr/bin/ld: libtest_util.a(libtest_util_a-setup_common.o): in function `DashChainstateSetupClose(node::NodeContext&)':
      /src/dash/dash/src/test/util/setup_common.cpp:136: undefined reference to `node::DashChainstateSetupClose(std::unique_ptr<CChainstateHelper, std::default_delete<CChainstateHelper> >&, std::unique_ptr<CCreditPoolManager, std::default_delete<CCreditPoolManager> >&, std::unique_ptr<CDeterministicMNManager, std::default_delete<CDeterministicMNManager> >&, std::unique_ptr<CMNHFManager, std::default_delete<CMNHFManager> >&, std::unique_ptr<LLMQContext, std::default_delete<LLMQContext> >&, CTxMemPool*)'
      /usr/bin/ld: libtest_util.a(libtest_util_a-setup_common.o): in function `BasicTestingSetup::BasicTestingSetup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<char const*, std::allocator<char const*> > const&)':
      [...]
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      make[2]: *** [Makefile:7723: test/fuzz/fuzz] Error 1
      make[2]: *** Waiting for unfinished jobs....
      make[2]: Leaving directory '/src/dash/dash/src'
      make[1]: *** [Makefile:20430: all-recursive] Error 1
      make[1]: Leaving directory '/src/dash/dash/src'
      make: *** [Makefile:796: all-recursive] Error 1
      ```

     </details>

  ## Breaking Changes

  None expected.

  ## Checklist

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)**
  - [x] I have added or updated relevant unit/integration/functional/e2e tests **(note: N/A)**
  - [x] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  PastaPastaPasta:
    utACK 2bb8e5b

Tree-SHA512: 1771f6af9d7a4da9f5417dc60fa0a69f7f1adb1c366ab458d9c2d572c2924ac03fd2c4b9e00cda8a63022cbc193b6b262349c5eab79654a1d0006e5e65327340
@bitcoin bitcoin locked and limited conversation to collaborators Jul 18, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

build: Configuring with -mno-sse4.1 does not fail the sse4.1 instrinsics check

10 participants