Skip to content

Conversation

@liboz
Copy link
Contributor

@liboz liboz commented Sep 13, 2016

Thanks to @manofstick's test gist, I found out that one major slowdown for his test case was because of dereferencing. This PR switches to using a mutable variable instead of dereferencing multiple times. It makes the state machine more comparable to his test case (2x slower for 64bit and slightly faster for 32bit), those translate to a 3x speed increase for 64bit and a 7x for 32bit

@liboz
Copy link
Contributor Author

liboz commented Sep 13, 2016

Actually I'm pretty confused now. Looking at the git history, it was a mutable variable in @manofstick's original implementation, but it was changed to use a ref during some unknown code review step. Maybe someone can give some insight into this?

@dsyme
Copy link
Contributor

dsyme commented Sep 13, 2016

Th F# compiler is compiled with all warnings enabled, including the one for when "let mutable" values are promoted to refs, to help make allocations pain. So a compiler warning would be triggered by your code.

@liboz
Copy link
Contributor Author

liboz commented Sep 13, 2016

@dsyme I don't see any errors from using the mutable. That said instead of using mutable is it instead preferred to do something like the following so that we only have to dereference once per function?

let value : Ref<'T> = ref (n - LanguagePrimitives.GenericOne)

let inline current () =
    let derefValue = !value
    if derefValue < n then
        notStarted ()
    elif derefValue > m then
        alreadyFinished ()
    else 
        derefValue

@liboz liboz changed the title Using Mutable Int Instead of Dereferencing for RangeEnumerator Using Mutable instead of Dereferencing repeatedly for RangeEnumerator Sep 14, 2016
Copy link
Contributor

@forki forki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there similar functions which use ref cells?

@liboz
Copy link
Contributor Author

liboz commented Sep 16, 2016

@forki, I didn't see any in the same file. I didn't check too hard for other files though.

@KevinRansom
Copy link
Contributor

These look good to me.

@manofstick
Copy link
Contributor

This doesn't actually do anything; the "let mutable ..." does just get converted into a ref field, as per @dsyme's comments.

I think where the observed speed up comes from is that in 4.4.1.x there is the "singleStepRangeEnumerator" optimization which I did a while back (which I guess I thought already existed in the 4.4.0.x codebase... Seemed like a long time ago!)

Anyway, it appears that my original gist was a bit of a red herring. I still think that seq {} blocks should drag the iteration into the state machine, rather than deferring it to the helper enumerable, but the effect of this is obviously not as bad as it had once been.

I think this PR should just be closed.

@liboz liboz changed the title Using Mutable instead of Dereferencing repeatedly for RangeEnumerator Using Deferencing Only once per function call for RangeEnumerator Sep 18, 2016
@liboz liboz changed the title Using Deferencing Only once per function call for RangeEnumerator Deferencing only once per function call for RangeEnumerator Sep 18, 2016
@liboz
Copy link
Contributor Author

liboz commented Sep 18, 2016

@manofstick You're right the mutable doesn't do anything in reality. The emitted IL is effectively the same. However, I think if we were to make the relevant function only dereference once per function call, we would in fact get some savings. Here's some of my results (note that the new Range is marginally slower for upperbound is 10,000 and 1,000,000, though it is within the measurement error of the old range) :

Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4710HQ CPU 2.50GHz, ProcessorCount=8
Frequency=2435778 ticks, Resolution=410.5464 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1586.0

Type=Range  Mode=Throughput  LaunchCount=5  
TargetCount=50  

Method Platform Jit upperBound Median StdDev Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
oldRange X64 RyuJit 10 108.4412 ns 1.9252 ns 0.10 - - 38.26
newRange X64 RyuJit 10 92.0246 ns 1.4069 ns 0.10 - - 37.93
oldRange X86 LegacyJit 10 106.0151 ns 1.5874 ns 0.06 - - 22.52
newRange X86 LegacyJit 10 95.8192 ns 2.0231 ns 0.06 - - 22.94
oldRange X64 RyuJit 100 621.1180 ns 7.3605 ns 0.10 - - 38.94
newRange X64 RyuJit 100 603.8810 ns 8.6720 ns 0.09 - - 37.64
oldRange X86 LegacyJit 100 566.0929 ns 8.8126 ns 0.06 - - 22.72
newRange X86 LegacyJit 100 514.6959 ns 8.2234 ns 0.06 - - 23.12
oldRange X64 RyuJit 1000 5,625.5500 ns 70.0235 ns 0.06 - - 37.51
newRange X64 RyuJit 1000 5,615.5864 ns 58.6605 ns 0.06 - - 37.76
oldRange X86 LegacyJit 1000 4,969.5424 ns 58.0718 ns 0.06 - - 22.72
newRange X86 LegacyJit 1000 4,473.5380 ns 56.8198 ns 0.06 - - 22.53
oldRange X64 RyuJit 10000 55,660.0148 ns 904.1689 ns - - - 40.87
newRange X64 RyuJit 10000 55,674.8740 ns 1,106.6792 ns - - - 40.40
oldRange X86 LegacyJit 10000 49,051.7305 ns 847.4043 ns - - - 24.11
newRange X86 LegacyJit 10000 43,441.4970 ns 632.6051 ns - - - 23.69
oldRange X64 RyuJit 1000000 5,554,898.6813 ns 62,065.7178 ns - - - 183.26
newRange X64 RyuJit 1000000 5,555,662.0410 ns 73,486.9761 ns - - - 168.34
oldRange X86 LegacyJit 1000000 4,890,223.9859 ns 68,223.0582 ns - - - 175.41
newRange X86 LegacyJit 1000000 4,334,440.3205 ns 69,872.2515 ns - - - 162.46

Here is the gist for that test: https://gist.github.com/liboz/c9e529d0292f25135243721157b93fae

@manofstick
Copy link
Contributor

(...to ensure consumption of "Current", you should probably change Seq.len to something like Seq.sumBy sign.)

Anyway, no detrimental effects, and bit of a kick with x86, so why not?

@KevinRansom KevinRansom merged commit 782451a into dotnet:master Sep 20, 2016
@forki
Copy link
Contributor

forki commented Sep 20, 2016

❤️

@OkayX6
Copy link
Contributor

OkayX6 commented Sep 20, 2016

(off-topic) Hey @forki, do you have like a performance comparison of the last releases of F# versus the current master? Just as a summary. It feels like a lot have been implemented lately?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants