-
Notifications
You must be signed in to change notification settings - Fork 5.3k
System.Random performance tuning. #47390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Bring up to date.
… state from the heap onto the stack before performong PRNG generation and mxing operations.
|
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
|
Can you share the cited perf tests? |
|
@AndyAyersMS, thoughts on this from a JIT perspective? Would you expect these kinds of code changes to be necessary, or would you expect the JIT to be able to "get this"? |
Bear in mind that the baseline version (i.e. prior to my change) changes the object fields on each loop, whereas my change avoids this, so I think it is technically a change in functionality, and thus something that the JIT cannot do (because you may want that field state change - it just so happens we don't need to update it right away). |
They're not volatile, so as long as the difference can't be observed by this thread (e.g. if some method were being called, since it might look at the field value), technically it should be fine. But maybe we just never eliminate field writes out of an abundance of caution for asynchronous exceptions (e.g. thread aborts in the old days, OOMs, etc.) or something like that? |
No, this isn't something the JIT does currently. Class field loads and subsequent operations can be CSEd (and hence held in registers), but nothing can promote entire webs involving stores... |
stephentoub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
src/libraries/System.Private.CoreLib/src/System/Random.Xoshiro128StarStarImpl.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Random.Xoshiro128StarStarImpl.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Random.Xoshiro256StarStarImpl.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Random.Xoshiro256StarStarImpl.cs
Outdated
Show resolved
Hide resolved
|
We're missing perf tests for the new API and for the unseeded Random. I'll add something. |

Performance improvements, achieved by operating on a copy of the PRNG state copied into stack variables, and then back onto the heap when done. This benefits all of the various Next() overloads in the region of 10% better performance, and more than doubles performance of NextBytes().
The NextBytes() change means duplicating the PRNG inner logic inside that method, instead of calling out to NextUInt64(). This change alone seems to offer a 2x performance improvement, so is worth considering despite the code duplication. I think the fact that this works suggests that NextUInt64() isn't being inlined(?) (but perhaps I caused that by adding the extra method/stack variables).
I thought this was worth considering given the fairly chunky performance gains. Here are my own BenchmarkDotNet figures (the 10M suffix means a timing for 10 million ops, so divide by 10^7 to get per op time - or just look at the relative times for before and after :)
(for 64bit runtime)
Before
After
/cc @stephentoub