@AaronChen0
Contributor

@AaronChen0 AaronChen0 commented Mar 22, 2024

Using four local uint64 variables enables register allocation instead of memory allocation for uint256.
This improves the performance of Mul and squared by about 40%, and of Exp by about 60% (see the benchmarks below).
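
For readers unfamiliar with the pattern, here is a minimal sketch of what "four local uint64 variables" can look like for a truncated 256-bit multiply. It is an illustration only, not the diff from this pull request: the `Int` type is redeclared locally, and the names `mulLocals`, `umulHop` and `umulStep` are assumptions chosen for this example. Whether the compiler actually keeps the limbs in registers depends on the Go version and target architecture.

```go
package main

import "math/bits"

// Int is a 256-bit unsigned integer stored as four little-endian 64-bit limbs.
// It is redeclared here so that the sketch is self-contained.
type Int [4]uint64

// umulHop returns z + x*y as a 128-bit value (hi, lo).
func umulHop(z, x, y uint64) (hi, lo uint64) {
	hi, lo = bits.Mul64(x, y)
	lo, carry := bits.Add64(lo, z, 0)
	hi, _ = bits.Add64(hi, 0, carry)
	return hi, lo
}

// umulStep returns z + x*y + carry as a 128-bit value (hi, lo).
func umulStep(z, x, y, carry uint64) (hi, lo uint64) {
	hi, lo = bits.Mul64(x, y)
	lo, carry = bits.Add64(lo, carry, 0)
	hi, _ = bits.Add64(hi, 0, carry)
	lo, carry = bits.Add64(lo, z, 0)
	hi, _ = bits.Add64(hi, 0, carry)
	return hi, lo
}

// mulLocals sets z = x*y mod 2^256 (hypothetical name, for illustration).
// All limbs are loaded into local uint64 variables first and the result
// is written back only at the end, so z may alias x or y.
func mulLocals(z, x, y *Int) *Int {
	x0, x1, x2, x3 := x[0], x[1], x[2], x[3]
	y0, y1, y2, y3 := y[0], y[1], y[2], y[3]

	// Column for y0: produces limb 0 and partial sums for limbs 1 and 2.
	carry0, r0 := bits.Mul64(x0, y0)
	carry0, res1 := umulHop(carry0, x1, y0)
	carry0, res2 := umulHop(carry0, x2, y0)

	// Column for y1: finalizes limb 1 and updates the partial sum for limb 2.
	carry1, r1 := umulHop(res1, x0, y1)
	carry1, res2 = umulStep(res2, x1, y1, carry1)

	// Column for y2: finalizes limb 2.
	carry2, r2 := umulHop(res2, x0, y2)

	// Limb 3 only needs the low 64 bits of each term; overflow is discarded.
	r3 := x3*y0 + x2*y1 + x1*y2 + x0*y3 + carry0 + carry1 + carry2

	z[0], z[1], z[2], z[3] = r0, r1, r2, r3
	return z
}
```

Calling `mulLocals(&z, &x, &y)` from a `main` function or a test stores the low 256 bits of the product in `z`. Because the inputs are read before any output limb is written, `mulLocals(&a, &a, &a)` squares `a` in place.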


Edit:
I later found that the above is not the reason this pull request boosts performance.
Optimizing uint256.Set achieves the same effect. See #153.


Running

go test ./...

returns

ok  	github.com/holiman/uint256	0.978s

Benchmark

go test -run - -bench BenchmarkExp -benchmem -count=10 >/tmp/old

Squared benchmark:

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Square/single/uint256-16   7.235n ± 3%   4.304n ± 1%  -40.52% (p=0.000 n=10)
Square/single/big-16       36.73n ± 1%   36.72n ± 1%        ~ (p=0.782 n=10)
geomean                    16.30n        12.57n       -22.89%

                         │     old      │                 new                 │
                         │     B/op     │    B/op     vs base                 │
Square/single/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Square/single/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                         │     old      │                 new                 │
                         │  allocs/op   │ allocs/op   vs base                 │
Square/single/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Square/single/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Exp benchmark:

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                     │     old     │                 new                 │
                     │   sec/op    │   sec/op     vs base                │
Exp/large/big-16       15.70µ ± 4%   16.22µ ± 2%   +3.30% (p=0.011 n=10)
Exp/large/uint256-16   3.986µ ± 1%   1.592µ ± 3%  -60.06% (p=0.000 n=10)
Exp/small/big-16       5.275µ ± 3%   5.277µ ± 2%        ~ (p=0.684 n=10)
Exp/small/uint256-16   338.6n ± 0%   137.6n ± 2%  -59.37% (p=0.000 n=10)
geomean                3.252µ        2.081µ       -36.01%

                     │      old       │                  new                  │
                     │      B/op      │     B/op      vs base                 │
Exp/large/big-16       17.72Ki ± 0%     17.72Ki ± 0%       ~ (p=1.000 n=10) ¹
Exp/large/uint256-16     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/big-16       7.219Ki ± 0%     7.219Ki ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/uint256-16     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                             ²                 +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                     │     old      │                 new                 │
                     │  allocs/op   │ allocs/op   vs base                 │
Exp/large/big-16       189.0 ± 0%     189.0 ± 0%       ~ (p=1.000 n=10) ¹
Exp/large/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/big-16       77.00 ± 0%     77.00 ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                           ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Mul benchmark:

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                                 │     old     │                 new                 │
                                 │   sec/op    │   sec/op     vs base                │
Mul/single/uint256-16              6.892n ± 3%   4.358n ± 1%  -36.77% (p=0.000 n=10)
Mul/single/big-16                  37.78n ± 1%   37.78n ± 1%        ~ (p=0.868 n=10)
MulOverflow/single/uint256-16      15.01n ± 2%   15.01n ± 2%        ~ (p=0.853 n=10)
MulOverflow/single/big-16          37.74n ± 1%   37.73n ± 2%        ~ (p=0.956 n=10)
MulMod/small/uint256-16            21.39n ± 1%   21.38n ± 1%        ~ (p=0.956 n=10)
MulMod/mod64/uint256-16            38.74n ± 1%   39.01n ± 1%   +0.68% (p=0.027 n=10)
MulMod/mod128/uint256-16           64.02n ± 5%   63.04n ± 1%   -1.53% (p=0.022 n=10)
MulMod/mod192/uint256-16           79.52n ± 0%   79.16n ± 1%        ~ (p=0.128 n=10)
MulMod/mod256/uint256-16           94.50n ± 1%   94.59n ± 1%        ~ (p=0.739 n=10)
MulMod/mod256/uint256r-16          44.35n ± 1%   43.95n ± 1%        ~ (p=0.109 n=10)
MulMod/small/big-16                37.88n ± 3%   38.48n ± 3%        ~ (p=0.052 n=10)
MulMod/mod64/big-16                62.54n ± 1%   61.74n ± 4%        ~ (p=0.436 n=10)
MulMod/mod128/big-16               210.6n ± 3%   209.9n ± 3%        ~ (p=0.684 n=10)
MulMod/mod192/big-16               243.6n ± 3%   240.1n ± 3%        ~ (p=0.481 n=10)
MulMod/mod256/big-16               276.8n ± 2%   277.4n ± 3%        ~ (p=0.631 n=10)
MulDivOverflow/small/uint256-16    2.099n ± 3%   1.972n ± 1%   -6.00% (p=0.000 n=10)
MulDivOverflow/div64/uint256-16    2.085n ± 1%   1.964n ± 0%   -5.80% (p=0.000 n=10)
MulDivOverflow/div128/uint256-16   2.143n ± 2%   2.030n ± 1%   -5.25% (p=0.000 n=10)
MulDivOverflow/div192/uint256-16   2.160n ± 1%   2.037n ± 3%   -5.69% (p=0.000 n=10)
MulDivOverflow/div256/uint256-16   2.220n ± 2%   2.095n ± 2%   -5.65% (p=0.000 n=10)
MulDivOverflow/small/big-16        14.10n ± 2%   14.17n ± 1%        ~ (p=0.148 n=10)
MulDivOverflow/div64/big-16        14.06n ± 2%   14.25n ± 4%   +1.35% (p=0.016 n=10)
MulDivOverflow/div128/big-16       16.79n ± 5%   16.96n ± 4%        ~ (p=0.353 n=10)
MulDivOverflow/div192/big-16       17.10n ± 2%   17.19n ± 3%        ~ (p=0.927 n=10)
MulDivOverflow/div256/big-16       18.89n ± 5%   18.68n ± 7%        ~ (p=0.280 n=10)
geomean                            22.14n        21.47n        -3.01%

                                 │     old      │                 new                  │
                                 │     B/op     │    B/op      vs base                 │
Mul/single/uint256-16              0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
Mul/single/big-16                  0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/uint256-16      0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/big-16          0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/small/uint256-16            0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod64/uint256-16            0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/uint256-16           0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/uint256-16           0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256-16           0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256r-16          0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/small/big-16                7.500 ± 7%     7.000 ± 14%       ~ (p=0.650 n=10)
MulMod/mod64/big-16                48.00 ± 0%     48.00 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/big-16               128.0 ± 0%     128.0 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/big-16               144.0 ± 0%     144.0 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/big-16               176.0 ± 0%     176.0 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/uint256-16    0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/uint256-16    0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/uint256-16   0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/uint256-16   0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/uint256-16   0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/big-16        0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/big-16        0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/big-16       1.000 ± 0%     1.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/big-16       1.000 ± 0%     1.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/big-16       3.000 ± 0%     3.000 ± 33%       ~ (p=0.087 n=10)
geomean                                       ²                -0.28%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                 │     old      │                 new                 │
                                 │  allocs/op   │ allocs/op   vs base                 │
Mul/single/uint256-16              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Mul/single/big-16                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/uint256-16      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/big-16          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/small/uint256-16            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod64/uint256-16            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/uint256-16           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/uint256-16           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256-16           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256r-16          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/small/big-16                0.000 ±  ?     0.000 ±  ?       ~ (p=1.000 n=10)
MulMod/mod64/big-16                1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/big-16               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/big-16               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/big-16               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/big-16        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/big-16        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                       ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@codecov

codecov bot commented Mar 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (97405b6) to head (d6c1d7d).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #152   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines         1643      1642    -1     
=========================================
- Hits          1643      1642    -1     

Owner

@holiman holiman left a comment

Nice!!

@holiman holiman changed the title from "uint256: optimize Mul, squared" to "uint256: optimize Mul, squared and Exp" Mar 25, 2024
@holiman holiman merged commit c9fc0ce into holiman:master Mar 25, 2024