Use arithmetic primitives from intops#170
Conversation
|
I've removed the entire private/primitives directory and updated all places where functions from its modules had been used so that they used intops instead. Currently, I'm seeing a ×2 drop in latency, roughly 13k ms to 26k ms (running I'll keep the PR as a draft until I'm able to improve the performance. |
|
I've found the source of the performance regression. The reason is that in intops, a pure Nim implementation of an operation is guaranteed to rely only on other pure Nim implementations. For composite ops like mulAcc this effectively omits all optimized implementations. |
|
@arnetheduck the performance of the original and intops versions is now the same. At least, I can't see any difference in my local benchmarks. That makes this PR ready for review. |
…ed during compilation. Discussion: vacp2p/nim-intops#20 (comment)
| # Undo normalization | ||
| result = result shr clz | ||
|
|
||
| for i in countdown(a.high, 0): |
There was a problem hiding this comment.
this should be staticFor to unroll the loop at compile time
There was a problem hiding this comment.
actually, it looks like we only unroll loops sometimes - let's leave that for a separate PR so that it can be measured appropriately.
There was a problem hiding this comment.
@arnetheduck made a separate PR from this branch to test this variant: #174
Locally, I'm not seeing any difference in performance (or a slight drop):
$ git co feature/intops
$ nim r -d:danger --passC:"-march=native -O3" .\benchmarks\bench.nim
Modmul (stint): 15112 ms
$ git co feature/intops_staticfor
$ nim r -d:danger --passC:"-march=native -O3" .\benchmarks\bench.nim
Modmul (stint): 15734 msThere was a problem hiding this comment.
Locally, I'm not seeing any difference in performance
yes, this is somewhat expected in a "simple" / small test since the C compiler will also do unrolling so it will largely depend on the target CPU, C compiler and its options whether it can "understand" an unrolled loop better or not .. the opportunity in "pre-unrolling" here instead of waiting for the C compiler to do it is that it's a targeted unroll for an operation that almost always benefits from it, until general C compiler optimizations that are applied to all loops and therefore tend to be more conservative in their unrolling.
There was a problem hiding this comment.
@arnetheduck I've added a template the injects staticFor for StInt[4096] and below (can be tuned but seems reasonable) and regular loops for larger StInts.
Locally, I'm still not seeing any improvements but I've decided to push a PR to get it reviewed nonetheless: #175
| for i in 0 ..< a.limbs.len: | ||
| subB(borrow, diff, a[i], b[i], borrow) | ||
| return bool(borrow) | ||
| (diff, borrow) = borrowingSub(a[i], b[i], borrow) |
There was a problem hiding this comment.
since we discard the diff here, we should have variations of borrowingSub in intops that only return the carry (since this is the minimal information needed to correctly implement this algo)
There was a problem hiding this comment.
I've created an issue for that: vacp2p/nim-intops#22
arnetheduck
left a comment
There was a problem hiding this comment.
LGTM - next step before merging is a PR in nimbus-eth1 that validates this change with the EVM code and the integration tests that run there (they are heavy on stint ops)
…optimize the proc. Discussion: status-im#170 (comment)
|
nimbus-eth1 PR: status-im/nimbus-eth1#3956 |
|
nimbus-eth2 PR: status-im/nimbus-eth2#7938 |
…on, not forks. The forks have been merged to the upstream repos, so this is no longer needed. - status-im/nim-stint#170 - status-im/nim-bncurve#16
…cy (#3956) * Deps: Add intops as a new dependency, use stint and bncurve forks that use intops. * Bump copyright year. * Update stint and bncurve dependencies to point to the status-im version, not forks. The forks have been merged to the upstream repos, so this is no longer needed. - status-im/nim-stint#170 - status-im/nim-bncurve#16 * Update stint and bncurve submodules. * Replace tabs with spaces.
No description provided.