feefrac: 128-bit multiply support in MSVC #29758

sipa · 2024-03-28T15:33:00Z

Feerate comparisons in the recently (#29242) introduced FeeFrac type rely on multiplications between 32-bit and 64-bit integers. On 64-bit systems, the hardware can do this natively. On GCC and Clang we can use the __int128 type for this, but on MSVC one needs to use the _mul128 or __mulh intrinsics instead. This PR adds the use of _mul128, which is available on x86_64 systems.
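To illustrate why a wide multiply is needed: two feerates fee1/size1 and fee2/size2 can be compared without division by cross-multiplying, and for an int64_t fee times an int32_t size the product needs up to 96 bits including the sign. The following is a minimal sketch of the GCC/Clang path only, assuming strictly positive sizes; `MulWide` and `FeeRateLess` are hypothetical names, not FeeFrac's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, GCC/Clang only: __int128 is a compiler extension that
// MSVC does not provide. |int64_t * int32_t| <= 2^94, so it cannot overflow.
inline __int128 MulWide(int64_t a, int32_t b) noexcept
{
    return __int128{a} * b;
}

// Hypothetical comparison: is fee1/size1 strictly lower than fee2/size2?
// Assumes size1 > 0 and size2 > 0, so cross-multiplying preserves the order.
// Doing this in plain int64_t could overflow for large fees.
inline bool FeeRateLess(int64_t fee1, int32_t size1, int64_t fee2, int32_t size2) noexcept
{
    return MulWide(fee1, size2) < MulWide(fee2, size1);
}
```

For example, `FeeRateLess(1, 3, 1, 2)` compares 1·2 against 1·3 and reports that 1/3 is the lower feerate.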

Performance of these operations isn't currently very important, but they will become crucial with cluster mempool.

I have not tested this code myself, though it's based on similar code in libsecp256k1 (see https://github.com/bitcoin-core/secp256k1/blob/v0.4.1/src/int128_struct_impl.h#L7L30).

DrahtBot · 2024-03-28T15:33:03Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	theuni
Concept ACK	hebasto

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

hebasto

Concept ACK.

src/util/feefrac.h

sipa · 2024-03-28T15:58:49Z

Hmm, does our CI not run the fuzz corpus when building with MSVC?

dergoegge · 2024-03-28T16:04:44Z

Hmm, does our CI not run the fuzz corpus when building with MSVC?

AFAIK it does not, but it would probably make sense. We already do it in the macOS job.

hebasto · 2024-03-28T16:04:57Z

Hmm, does our CI not run the fuzz corpus when building with MSVC?

No, it does not. The fuzz binary is not even built.

sipa · 2024-03-28T16:12:16Z

I think it would be very useful if we did. Fuzzing itself won't work in MSVC, but I don't think there is a technical reason why we can't run the existing fuzz corpus.

18fd522 ci, msvc: Add "Run fuzz binaries" step (Hennadii Stepanov)
52933d7 fuzz: Pass `SystemRoot` environment variable to subprocess (Hennadii Stepanov)
23cb820 ci, msvc: Add "Clone fuzz corpus" step (Hennadii Stepanov)
19dcedd build, msvc: Build `fuzz.exe` binary (Hennadii Stepanov)
4c078d7 build, msvc: Enable preprocessor conformance mode (Hennadii Stepanov)
09f5a74 fuzz: Re-implement `read_stdin` in portable way (Hennadii Stepanov)

Pull request description:
Closes #29760. Suggested in #29758 (comment).

ACKs for top commit:
maflcko: lgtm ACK 18fd522 🔍
sipsorcery: tACK 18fd522
sipa: utACK 18fd522

Tree-SHA512: 672ed6926ee9091f68f13780e77b60fc1d48731f16e847d849374f8426ffe1dafd9bcab06a27af62e8052ba345bb57f20f40579d6be8540c12ef85c23a6eec8b

theuni

utACK 5fb70b5

(I didn't look into the additional unit tests)

Godbolt output here:
https://godbolt.org/z/8Msz984bv

It compiles down to roughly the same code.

theuni · 2024-05-28T16:09:21Z

src/util/feefrac.h

+    {
+        // On 64-bit MSVC, use _mul128 intrinsic for wide multiplication.
+        std::pair<int64_t, uint64_t> ret;
+        ret.second = _mul128(a, b, &ret.first);


To help other reviewers:

__int64 _mul128(__int64 Multiplier, __int64 Multiplicand, __int64 *HighProduct);

Return value: the low 64 bits of the product.

So this should result in:
ret = <high, low>

Which matches the other impls.

Note the weirdly signed return type __int64 here, which IMHO makes no sense whatsoever. Semantically, the full product is 2⁶⁴·HighProduct + (uint64_t)ret.
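For reviewers without an MSVC toolchain, the `<high, low>` convention can be checked against a portable emulation. This is a sketch under stated assumptions: `MulHighLow` is a hypothetical name, it only covers the int64_t × int32_t case that FeeFrac needs, it splits `a` into 32-bit limbs rather than calling `_mul128`, and it relies on arithmetic right shift of negative values (guaranteed since C++20 and what mainstream compilers do).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical portable stand-in for the MSVC Mul(): returns the signed
// product a*b as {high, low}, meaning a*b = 2^64 * high + low.
inline std::pair<int64_t, uint64_t> MulHighLow(int64_t a, int32_t b) noexcept
{
    // Split a = a_hi * 2^32 + a_lo, with a_hi signed and a_lo in [0, 2^32).
    const int64_t a_hi = a >> 32;
    const int64_t a_lo = static_cast<int64_t>(static_cast<uint32_t>(a));
    // Both partial products fit: |a_hi * b| <= 2^62 and |a_lo * b| < 2^63.
    const int64_t p1 = a_hi * b;
    const int64_t p0 = a_lo * b;
    // a*b = p1 * 2^32 + p0
    //     = (p1 >> 32) * 2^64 + ((p1 & 0xffffffff) + (p0 >> 32)) * 2^32 + (p0 & 0xffffffff)
    const int64_t mid = (p1 & 0xffffffff) + (p0 >> 32);
    const int64_t high = (p1 >> 32) + (mid >> 32);
    // Only the low 32 bits of mid survive the shift; OR in the low limb of p0.
    const uint64_t low = (static_cast<uint64_t>(mid) << 32) | (static_cast<uint64_t>(p0) & 0xffffffff);
    return {high, low};
}
```

For instance, `MulHighLow(-1, 1)` yields high = -1 and low = 2⁶⁴ − 1, i.e. 2⁶⁴·(−1) + (2⁶⁴ − 1) = −1, matching the semantics described above.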

hebasto · 2024-06-03T10:01:27Z

src/util/feefrac.h

        return __int128{a} * b;
    }
+#elif defined(_MSC_VER) && defined(_M_X64)
+    static inline std::pair<int64_t, uint64_t> Mul(int64_t a, int32_t b) noexcept


Why do the return types of this Mul and MulFallback differ?

That's the point; they represent the 96-bit integer product in a different way.

It would be possible to use std::pair<int32_t, uint64_t> here instead (as in, that type would be big enough to store the result), but _mul128 returns 64-bit results, which match the register size of x86_64, so no conversion is needed.

Also note that std::pair<int64_t, uint64_t> probably roughly corresponds to how __int128 is represented internally (the CPU has no 128-bit general-purpose registers, and the mul instruction returns two 64-bit registers).
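One reason a `{high, low}` pair works as a stand-in for a 128-bit integer is that `std::pair` compares lexicographically (first, then second). With a signed high word and an unsigned low word in [0, 2⁶⁴), that order coincides with numeric order, so comparing two products needs no extra code. A minimal sketch; the `Wide` alias and `WideLess` are hypothetical names for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical alias: a value v stored as {high, low}, v = 2^64 * high + low.
using Wide = std::pair<int64_t, uint64_t>;

// Since low is always in [0, 2^64), a larger high word means a larger value,
// and equal high words are decided by the unsigned low word. That is exactly
// std::pair's built-in lexicographic operator<.
inline bool WideLess(const Wide& x, const Wide& y) noexcept
{
    return x < y;
}
```

The same argument applies to a split at bit 32 with a `uint32_t` low part, as long as the low part is unsigned and the high part carries the sign.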

hebasto · 2024-06-04T09:09:22Z

I've added a benchmark for the FeeRateCompare function, which calls Mul twice.

Here are the results on Windows 11 with the Release configuration, which implies the /O2 /Oi compile flags:

master branch @ 61de64d:

> .\build_msvc\master\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.47 |      680,300,487.93 |    1.9% |      0.17 | `FeefracMultipication`
> .\build_msvc\master\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.47 |      679,903,607.75 |    1.1% |      0.18 | `FeefracMultipication`
> .\build_msvc\master\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.43 |      698,557,147.13 |    1.3% |      0.17 | `FeefracMultipication`

this PR:

> .\build_msvc\pr29758\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.53 |      654,804,351.62 |    0.7% |      0.18 | `FeefracMultipication`
> .\build_msvc\pr29758\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.52 |      659,374,380.82 |    0.7% |      0.18 | `FeefracMultipication`
> .\build_msvc\pr29758\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.51 |      660,797,183.37 |    0.6% |      0.18 | `FeefracMultipication`

The performance worsened by ~4%.
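As a rough check of that figure, averaging the three runs on each side (ignoring the reported err%):

$$\bar t_{\text{master}} = \frac{1.47 + 1.47 + 1.43}{3} \approx 1.457\ \text{ns/op}, \qquad \bar t_{\text{PR}} = \frac{1.53 + 1.52 + 1.51}{3} = 1.52\ \text{ns/op}, \qquad \frac{1.52}{1.457} \approx 1.043$$

which is about 4.3% slower per operation.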

cc @sipsorcery

FWIW, on Linux this benchmark shows less than a 10% performance improvement for the __int128 implementation compared to the naive one.

hebasto · 2024-06-04T10:45:01Z

src/util/feefrac.h

+        // On 64-bit MSVC, use _mul128 intrinsic for wide multiplication.
+        std::pair<int64_t, uint64_t> ret;
+        ret.second = _mul128(a, b, &ret.first);
+        return ret;


on MSVC one needs to use the _mul128 or __mulh intrinsics instead

From my research, __mulh seems to be more performant.

This code:

Suggested change

-        // On 64-bit MSVC, use _mul128 intrinsic for wide multiplication.
-        std::pair<int64_t, uint64_t> ret;
-        ret.second = _mul128(a, b, &ret.first);
-        return ret;
+        // On 64-bit MSVC, use __mulh intrinsic for wide multiplication.
+        return {__mulh(a, b), std::bit_cast<uint64_t>(a) * b};

gives the following numbers for the benchmark:

> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.41 |      707,119,862.93 |    2.6% |      0.17 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.36 |      734,209,621.15 |    0.9% |      0.16 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.39 |      720,775,228.66 |    3.1% |      0.17 | `FeefracMultipication`
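In the suggested `__mulh` version, only the high half needs an intrinsic: the low 64 bits of a signed product equal the wrapped unsigned product. In `uint64_t * int32_t`, the usual arithmetic conversions turn `b` into `uint64_t` (reducing it modulo 2⁶⁴, which amounts to sign extension), and unsigned multiplication wraps modulo 2⁶⁴ without undefined behavior. Since `__mulh` is MSVC-only, this sketch covers just the low half; `MulLow` is a hypothetical name, and it uses `static_cast<uint64_t>`, which yields the same bits as `std::bit_cast<uint64_t>` for this conversion without requiring C++20.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of the signed product a*b, i.e. (a * b) mod 2^64.
// b is implicitly converted to uint64_t before the (wrapping) multiplication.
inline uint64_t MulLow(int64_t a, int32_t b) noexcept
{
    return static_cast<uint64_t>(a) * b;
}
```

For example, `MulLow(INT64_MAX, INT32_MIN)` returns 2³¹, the low word of (2⁶³ − 1)·(−2³¹) = −2⁹⁴ + 2³¹.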

sipa · 2024-06-04T17:36:14Z

@hebasto Would you mind benchmarking with smaller realistic numbers (say fee and size both between 0 and 2^30)? If the top limb is equal, it's possible the naive code does worse.
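With fee and size both drawn from [0, 2³⁰), a simple bound shows what the 128-bit `{high, low}` result from `_mul128` looks like for such inputs:

$$0 \le \text{fee} \cdot \text{size} \le (2^{30} - 1)^2 < 2^{60} < 2^{64}$$

so the high 64-bit word is always 0, and if products are compared high word first (as a `std::pair` comparison does), every comparison is decided by the low word.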

hebasto · 2024-06-06T12:19:21Z

@sipa

Would you mind benchmarking with smaller realistic numbers (say fee and size both between 0 and 2^30)?

I switched to FastRandomContext::randbits(30).

If the top limb is equal, it's possible the naive code does worse.

The numbers have not changed:

the master branch:

> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.47 |      682,496,667.82 |    1.1% |      0.17 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.46 |      686,538,677.70 |    1.3% |      0.17 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.43 |      701,448,254.82 |    1.5% |      0.17 | `FeefracMultipication`

this PR:

> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.55 |      646,069,682.69 |    0.8% |      0.18 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.52 |      656,962,444.61 |    0.6% |      0.18 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.53 |      653,076,477.59 |    0.9% |      0.18 | `FeefracMultipication`

sipa · 2024-06-12T01:07:49Z

@hebasto Thanks. I was assuming this would be an obvious improvement, but if it isn't, it'll need some more investigation into what this is all compiled to. That's not something I'm interested in doing for a half-supported architecture.

hebasto reviewed Mar 28, 2024

src/util/feefrac.h (outdated)

sipa force-pushed the 202403_feefrac128_msvc branch from 6988dd6 to acf59f5 March 28, 2024 17:32

maflcko mentioned this pull request Mar 28, 2024

build: Enable Fuzz binary in MSVC #29760

Closed

sipa force-pushed the 202403_feefrac128_msvc branch from acf59f5 to d2b94d3 March 28, 2024 19:39

sipa added 2 commits March 29, 2024 20:03

feefrac: 128-bit multiply support in MSVC

119f0a5

feefrac: add some more unit tests

5fb70b5

sipa force-pushed the 202403_feefrac128_msvc branch from d2b94d3 to 5fb70b5 March 30, 2024 00:03

hebasto mentioned this pull request Mar 31, 2024

build: Enable fuzz binary in MSVC #29774

Merged

DrahtBot added CI failed and removed CI failed labels Apr 18, 2024

theuni approved these changes May 28, 2024

DrahtBot requested a review from hebasto May 28, 2024 16:28

hebasto reviewed Jun 3, 2024

DrahtBot requested a review from hebasto June 3, 2024 10:01

hebasto reviewed Jun 4, 2024

DrahtBot requested a review from hebasto June 4, 2024 10:45

sipa closed this Jun 12, 2024

bitcoin locked and limited conversation to collaborators Jun 12, 2025

5 participants