Skip to content

Conversation

@colgreen
Copy link
Contributor

@colgreen colgreen commented Jan 24, 2021

Performance improvements, achieved by operating on a copy of the PRNG state copied into stack variables, and then back onto the heap when done. This benefits all of the various Next() overloads in the region of 10% better performance, and more than doubles performance of NextBytes().

The NextBytes() change means duplicating the PRNG inner logic inside that method, instead of calling out to NextUInt64(). This change alone seems to offer a 2x performance improvement, so is worth considering despite the code duplication. I think the fact that this works suggests that NextUInt64() isn't being inlined(?) (but perhaps I caused that by adding the extra method/stack variables).

I thought this was worth considering given the fairly chunky performance gains. Here are my own BenchmarkDotNet figures (the 10M suffix means a timing for 10 million ops, so divide by 10^7 to get per op time - or just look at the relative times for before and after :)

(for 64bit runtime)

Before

Method Mean Error StdDev
Next10M 25.61 ms 0.056 ms 0.050 ms
NextUpperBound10M 37.55 ms 0.080 ms 0.075 ms
NextLowerUpperBound10M 113.47 ms 0.149 ms 0.132 ms
NextDouble10M 31.40 ms 0.626 ms 0.792 ms
NextBytes100M 32.56 ms 0.078 ms 0.069 ms

After

Method Mean Error StdDev
Next10M 23.78 ms 0.280 ms 0.262 ms
NextUpperBound10M 30.91 ms 0.325 ms 0.304 ms
NextLowerUpperBound10M 102.07 ms 1.436 ms 1.344 ms
NextDouble10M 29.51 ms 0.584 ms 0.717 ms
NextBytes100M 14.36 ms 0.190 ms 0.177 ms

/cc @stephentoub

… state from the heap onto the stack before performong PRNG generation and mxing operations.
@ghost
Copy link

ghost commented Jan 24, 2021

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@colgreen colgreen changed the title Random performance tuning. System.Random performance tuning. Jan 24, 2021
@stephentoub
Copy link
Member

Can you share the cited perf tests?

@colgreen
Copy link
Contributor Author

@stephentoub
Copy link
Member

Thanks.

I think the fact that this works suggests that NextUInt64() isn't being inlined(?)

No, the NextUInt32/64 methods are getting inlined. It's just that the tight loop ends up having a lot more moves when the fields are involved.
image

@stephentoub
Copy link
Member

@AndyAyersMS, thoughts on this from a JIT perspective? Would you expect these kinds of code changes to be necessary, or would you expect the JIT to be able to "get this"?

@colgreen
Copy link
Contributor Author

Would you expect these kinds of code changes to be necessary, or would you expect the JIT to be able to "get this"?

Bear in mind that the baseline version (i.e. prior to my change) changes the object fields on each loop, whereas my change avoids this, so I think it is technically a change in functionality, and thus something that the JIT cannot do (because you may want that field state change - it just so happens we don't need to update it right away).

@stephentoub
Copy link
Member

stephentoub commented Jan 25, 2021

so I think it is technically a change in functionality, and thus something that the JIT cannot do

They're not volatile, so as long as the difference can't be observed by this thread (e.g. if some method were being called, since it might look at the field value), technically it should be fine. But maybe we just never eliminate field writes out of an abundance of caution for asynchronous exceptions (e.g. thread aborts in the old days, OOMs, etc.) or something like that?

@AndyAyersMS
Copy link
Member

would you expect the JIT to be able to "get this"?

No, this isn't something the JIT does currently. Class field loads and subsequent operations can be CSEd (and hence held in registers), but nothing can promote entire webs involving stores...

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@stephentoub stephentoub merged commit 9f193c9 into dotnet:master Jan 26, 2021
@colgreen colgreen deleted the random-perftune branch January 26, 2021 15:53
@danmoseley
Copy link
Member

We're missing perf tests for the new API and for the unseeded Random. I'll add something.

@adamsitnik adamsitnik added this to the 6.0.0 milestone Jan 27, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants