Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented Mar 28, 2024

Feerate comparisons in the recently (#29242) introduced FeeFrac type rely on multiplications between 32-bit and 64-bit integers. On 64-bit systems, hardware can do this natively. On GCC and Clang we can use the __int128 type for this, but on MSVC one needs to use the _mul128 or _mulh intrinsics instead. This PR adds the use of _mul128 which is available on x86_64 systems.

Performance of these operations isn't currently very important, but they will become crucial with cluster mempool.

I have not tested this code myself, though it's based on similar code in libsecp256k1 (see https://github.com/bitcoin-core/secp256k1/blob/v0.4.1/src/int128_struct_impl.h#L7L30).

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 28, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK theuni
Concept ACK hebasto

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

@sipa
Copy link
Member Author

sipa commented Mar 28, 2024

Hmm, does our CI not run the fuzz corpus when building with MSVC?

@dergoegge
Copy link
Member

Hmm, does our CI not run the fuzz corpus when building with MSVC?

Afaik it does not but it would probably make sense. We also do it in the MacOS job.

@hebasto
Copy link
Member

hebasto commented Mar 28, 2024

Hmm, does our CI not run the fuzz corpus when building with MSVC?

No, it does not. Even a fuzz binary is not built.

@sipa
Copy link
Member Author

sipa commented Mar 28, 2024

I think it would be very useful if we did. Fuzzing itself won't work in MSVC, but I don't think there is a technical reason why we can't run the existing fuzz corpus.

@sipa sipa force-pushed the 202403_feefrac128_msvc branch from 6988dd6 to acf59f5 Compare March 28, 2024 17:32
@sipa sipa force-pushed the 202403_feefrac128_msvc branch from acf59f5 to d2b94d3 Compare March 28, 2024 19:39
@sipa sipa force-pushed the 202403_feefrac128_msvc branch from d2b94d3 to 5fb70b5 Compare March 30, 2024 00:03
fanquake added a commit that referenced this pull request Apr 28, 2024
18fd522 ci, msvc: Add "Run fuzz binaries" step (Hennadii Stepanov)
52933d7 fuzz: Pass `SystemRoot` environment variable to subprocess (Hennadii Stepanov)
23cb820 ci, msvc: Add "Clone fuzz corpus" step (Hennadii Stepanov)
19dcedd build, msvc: Build `fuzz.exe` binary (Hennadii Stepanov)
4c078d7 build, msvc: Enable preprocessor conformance mode (Hennadii Stepanov)
09f5a74 fuzz: Re-implement `read_stdin` in portable way (Hennadii Stepanov)

Pull request description:

  Closes #29760.

  Suggested in #29758 (comment).

ACKs for top commit:
  maflcko:
    lgtm ACK 18fd522 🔍
  sipsorcery:
    tACK 18fd522
  sipa:
    utACK 18fd522

Tree-SHA512: 672ed6926ee9091f68f13780e77b60fc1d48731f16e847d849374f8426ffe1dafd9bcab06a27af62e8052ba345bb57f20f40579d6be8540c12ef85c23a6eec8b
Copy link
Member

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 5fb70b5

(I didn't look into the additional unit tests)

Godbolt output here:
https://godbolt.org/z/8Msz984bv

Compiles down to roughly the same.

{
// On 64-bit MSVC, use _mul128 intrinsic for wide multiplication.
std::pair<int64_t, uint64_t> ret;
ret.second = _mul128(a, b, &ret.first);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help other reviewers:

__int64 _mul128(
   __int64 Multiplier,
   __int64 Multiplicand,
   __int64 *HighProduct
);

Return value
The low 64 bits of the product.

So this should result in:
ret = <high, low>

Which matches the other impls.

Copy link
Member Author

@sipa sipa May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the weirdly signed return value __int64 here, which IMHO makes no sense whatsoever. Semantically, the return value is 264HighProduct + (uint64_t)ret.

@DrahtBot DrahtBot requested a review from hebasto May 28, 2024 16:28
return __int128{a} * b;
}
#elif defined(_MSC_VER) && defined(_M_X64)
static inline std::pair<int64_t, uint64_t> Mul(int64_t a, int32_t b) noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this Mul's return type and MulFallback's one differ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the point; they represent the 96-bit integer product in a different way.

Copy link
Member Author

@sipa sipa Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible to use std::pair<int32_t, uint64_t> here instead (as in, that type would be big enough to store the result), but _mul128 returns 64-bit results, which match the register size of x86_64, so no conversion is needed.

Also note that std::pair<int64_t, uint64_t> probably roughly corresponds to how __int128 is represented internally (the CPU has no 128-bit general-purpose registers, and the mul instruction returns two 64-bit registers).

@DrahtBot DrahtBot requested a review from hebasto June 3, 2024 10:01
@hebasto
Copy link
Member

hebasto commented Jun 4, 2024

I've add a benchmark for the FeeRateCompare function, which calls Mul twice.

Here are results on Windows 11, Release configuration, which implies /O2 /Oi compile flags:

> .\build_msvc\master\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.47 |      680,300,487.93 |    1.9% |      0.17 | `FeefracMultipication`
> .\build_msvc\master\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.47 |      679,903,607.75 |    1.1% |      0.18 | `FeefracMultipication`
> .\build_msvc\master\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.43 |      698,557,147.13 |    1.3% |      0.17 | `FeefracMultipication`
  • this PR:
> .\build_msvc\pr29758\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.53 |      654,804,351.62 |    0.7% |      0.18 | `FeefracMultipication`
> .\build_msvc\pr29758\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.52 |      659,374,380.82 |    0.7% |      0.18 | `FeefracMultipication`
> .\build_msvc\pr29758\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.51 |      660,797,183.37 |    0.6% |      0.18 | `FeefracMultipication`

The performance worsened by ~4%.

cc @sipsorcery


FWIW, this benchmark shows <10% of performance improvement for __int128 implementation comparing to the naive one on Linux.

Comment on lines +66 to +69
// On 64-bit MSVC, use _mul128 intrinsic for wide multiplication.
std::pair<int64_t, uint64_t> ret;
ret.second = _mul128(a, b, &ret.first);
return ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on MSVC one needs to use the _mul128 or _mulh intrinsics instead

From my research, it follows that __mulh seems more performant.

This code:

Suggested change
// On 64-bit MSVC, use _mul128 intrinsic for wide multiplication.
std::pair<int64_t, uint64_t> ret;
ret.second = _mul128(a, b, &ret.first);
return ret;
// On 64-bit MSVC, use __mulh intrinsic for wide multiplication.
return {__mulh(a, b), std::bit_cast<uint64_t>(a) * b};

gives the following numbers for the benchmark:

> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.41 |      707,119,862.93 |    2.6% |      0.17 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.36 |      734,209,621.15 |    0.9% |      0.16 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.39 |      720,775,228.66 |    3.1% |      0.17 | `FeefracMultipication`

@DrahtBot DrahtBot requested a review from hebasto June 4, 2024 10:45
@sipa
Copy link
Member Author

sipa commented Jun 4, 2024

@hebasto Would you mind benchmarking with smaller realistic numbers (say fee and size both between 0 and 2^30)? If the top limb is equal, it's possible the naive code does worse.

@hebasto
Copy link
Member

hebasto commented Jun 6, 2024

@sipa

Would you mind benchmarking with smaller realistic numbers (say fee and size both between 0 and 2^30)?

I switched to FastRandomContext::randbits(30).

If the top limb is equal, it's possible the naive code does worse.

Numbers have not changed:

  • the master branch:
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.47 |      682,496,667.82 |    1.1% |      0.17 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.46 |      686,538,677.70 |    1.3% |      0.17 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.43 |      701,448,254.82 |    1.5% |      0.17 | `FeefracMultipication`
  • this PR:
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.55 |      646,069,682.69 |    0.8% |      0.18 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.52 |      656,962,444.61 |    0.6% |      0.18 | `FeefracMultipication`
> build_msvc\x64\Release\bench_bitcoin.exe -filter=FeefracMultipication

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.53 |      653,076,477.59 |    0.9% |      0.18 | `FeefracMultipication`

@sipa
Copy link
Member Author

sipa commented Jun 12, 2024

@hebasto Thanks. I was assuming this would be an obvious improvement, but if it isn't, it'll need some more investigation into what this is all compiled to. That's not something I'm interested in doing for a half-supported architecture.

@sipa sipa closed this Jun 12, 2024
@bitcoin bitcoin locked and limited conversation to collaborators Jun 12, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants