Deferencing only once per function call for RangeEnumerator #1532

liboz · 2016-09-13T14:00:46Z

Thanks to @manofstick's test gist, I found out that one major slowdown for his test case was because of dereferencing. This PR switches to using a mutable variable instead of dereferencing multiple times. It makes the state machine more comparable to his test case (2x slower for 64bit and slightly faster for 32bit), those translate to a 3x speed increase for 64bit and a 7x for 32bit

liboz · 2016-09-13T14:12:58Z

Actually I'm pretty confused now. Looking at the git history, it was a mutable variable in @manofstick's original implementation, but it was changed to use a ref during some unknown code review step. Maybe someone can give some insight into this?

dsyme · 2016-09-13T14:56:46Z

Th F# compiler is compiled with all warnings enabled, including the one for when "let mutable" values are promoted to refs, to help make allocations pain. So a compiler warning would be triggered by your code.

liboz · 2016-09-13T15:45:14Z

@dsyme I don't see any errors from using the mutable. That said instead of using mutable is it instead preferred to do something like the following so that we only have to dereference once per function?

let value : Ref<'T> = ref (n - LanguagePrimitives.GenericOne)

let inline current () =
    let derefValue = !value
    if derefValue < n then
        notStarted ()
    elif derefValue > m then
        alreadyFinished ()
    else 
        derefValue

forki

Are there similar functions which use ref cells?

liboz · 2016-09-16T09:01:40Z

@forki, I didn't see any in the same file. I didn't check too hard for other files though.

KevinRansom · 2016-09-16T16:15:29Z

These look good to me.

manofstick · 2016-09-17T04:35:42Z

This doesn't actually do anything; the "let mutable ..." does just get converted into a ref field, as per @dsyme's comments.

I think where the observed speed up comes from is that in 4.4.1.x there is the "singleStepRangeEnumerator" optimization which I did a while back (which I guess I thought already existed in the 4.4.0.x codebase... Seemed like a long time ago!)

Anyway, it appears that my original gist was a bit of a red herring. I still think that seq {} blocks should drag the iteration into the state machine, rather than deferring it to the helper enumerable, but the effect of this is obviously not as bad as it had once been.

I think this PR should just be closed.

liboz · 2016-09-18T01:33:32Z

@manofstick You're right the mutable doesn't do anything in reality. The emitted IL is effectively the same. However, I think if we were to make the relevant function only dereference once per function call, we would in fact get some savings. Here's some of my results (note that the new Range is marginally slower for upperbound is 10,000 and 1,000,000, though it is within the measurement error of the old range) :

Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4710HQ CPU 2.50GHz, ProcessorCount=8
Frequency=2435778 ticks, Resolution=410.5464 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1586.0

Type=Range  Mode=Throughput  LaunchCount=5  
TargetCount=50

Method	Platform	Jit	upperBound	Median	StdDev	Gen 0	Gen 1	Gen 2	Bytes Allocated/Op
oldRange	X64	RyuJit	10	108.4412 ns	1.9252 ns	0.10	-	-	38.26
newRange	X64	RyuJit	10	92.0246 ns	1.4069 ns	0.10	-	-	37.93
oldRange	X86	LegacyJit	10	106.0151 ns	1.5874 ns	0.06	-	-	22.52
newRange	X86	LegacyJit	10	95.8192 ns	2.0231 ns	0.06	-	-	22.94
oldRange	X64	RyuJit	100	621.1180 ns	7.3605 ns	0.10	-	-	38.94
newRange	X64	RyuJit	100	603.8810 ns	8.6720 ns	0.09	-	-	37.64
oldRange	X86	LegacyJit	100	566.0929 ns	8.8126 ns	0.06	-	-	22.72
newRange	X86	LegacyJit	100	514.6959 ns	8.2234 ns	0.06	-	-	23.12
oldRange	X64	RyuJit	1000	5,625.5500 ns	70.0235 ns	0.06	-	-	37.51
newRange	X64	RyuJit	1000	5,615.5864 ns	58.6605 ns	0.06	-	-	37.76
oldRange	X86	LegacyJit	1000	4,969.5424 ns	58.0718 ns	0.06	-	-	22.72
newRange	X86	LegacyJit	1000	4,473.5380 ns	56.8198 ns	0.06	-	-	22.53
oldRange	X64	RyuJit	10000	55,660.0148 ns	904.1689 ns	-	-	-	40.87
newRange	X64	RyuJit	10000	55,674.8740 ns	1,106.6792 ns	-	-	-	40.40
oldRange	X86	LegacyJit	10000	49,051.7305 ns	847.4043 ns	-	-	-	24.11
newRange	X86	LegacyJit	10000	43,441.4970 ns	632.6051 ns	-	-	-	23.69
oldRange	X64	RyuJit	1000000	5,554,898.6813 ns	62,065.7178 ns	-	-	-	183.26
newRange	X64	RyuJit	1000000	*5,555,662.0410 ns*	73,486.9761 ns	-	-	-	168.34
oldRange	X86	LegacyJit	1000000	4,890,223.9859 ns	68,223.0582 ns	-	-	-	175.41
newRange	X86	LegacyJit	1000000	4,334,440.3205 ns	69,872.2515 ns	-	-	-	162.46

Here is the gist for that test: https://gist.github.com/liboz/c9e529d0292f25135243721157b93fae

manofstick · 2016-09-18T03:55:01Z

(...to ensure consumption of "Current", you should probably change Seq.len to something like Seq.sumBy sign.)

Anyway, no detrimental effects, and bit of a kick with x86, so why not?

forki · 2016-09-20T16:31:03Z

❤️

OkayX6 · 2016-09-20T17:23:28Z

(off-topic) Hey @forki, do you have like a performance comparison of the last releases of F# versus the current master? Just as a summary. It feels like a lot have been implemented lately?

msftclas added the cla-already-signed label Sep 13, 2016

liboz changed the title ~~Using Mutable Int Instead of Dereferencing for RangeEnumerator~~ Using Mutable instead of Dereferencing repeatedly for RangeEnumerator Sep 14, 2016

forki approved these changes Sep 16, 2016

View reviewed changes

deferencing only once per function call

a0831fb

liboz force-pushed the working2 branch from acb7822 to a0831fb Compare September 18, 2016 01:22

liboz changed the title ~~Using Mutable instead of Dereferencing repeatedly for RangeEnumerator~~ Using Deferencing Only once per function call for RangeEnumerator Sep 18, 2016

liboz changed the title ~~Using Deferencing Only once per function call for RangeEnumerator~~ Deferencing only once per function call for RangeEnumerator Sep 18, 2016

KevinRansom merged commit 782451a into dotnet:master Sep 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Deferencing only once per function call for RangeEnumerator #1532

Deferencing only once per function call for RangeEnumerator #1532

Uh oh!

liboz commented Sep 13, 2016 •

edited

Loading

Uh oh!

liboz commented Sep 13, 2016

Uh oh!

dsyme commented Sep 13, 2016

Uh oh!

liboz commented Sep 13, 2016 •

edited

Loading

Uh oh!

forki left a comment

Uh oh!

liboz commented Sep 16, 2016

Uh oh!

KevinRansom commented Sep 16, 2016

Uh oh!

manofstick commented Sep 17, 2016

Uh oh!

liboz commented Sep 18, 2016

Uh oh!

manofstick commented Sep 18, 2016

Uh oh!

forki commented Sep 20, 2016

Uh oh!

OkayX6 commented Sep 20, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Deferencing only once per function call for RangeEnumerator #1532

Deferencing only once per function call for RangeEnumerator #1532

Uh oh!

Conversation

liboz commented Sep 13, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

liboz commented Sep 13, 2016

Uh oh!

dsyme commented Sep 13, 2016

Uh oh!

liboz commented Sep 13, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

forki left a comment

Choose a reason for hiding this comment

Uh oh!

liboz commented Sep 16, 2016

Uh oh!

KevinRansom commented Sep 16, 2016

Uh oh!

manofstick commented Sep 17, 2016

Uh oh!

liboz commented Sep 18, 2016

Uh oh!

manofstick commented Sep 18, 2016

Uh oh!

forki commented Sep 20, 2016

Uh oh!

OkayX6 commented Sep 20, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

liboz commented Sep 13, 2016 •

edited

Loading

liboz commented Sep 13, 2016 •

edited

Loading