Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Jan 7, 2016

copied from fsharp/fsharp#533

There may be a missing length check on exists, as without hte length check the cost of a comparison and a call is incurred.

Adding the simple len > 0 check before the loop appears (in the basic cases I have tested) to near triple performance.

let f arr = 
    Array.exists (fun e -> e = true) arr

for i in 0..60000000 do f ([||]:bool []) ;; // 1.8 seconds

let f2 arr = 
    if Array.length arr > 0 then
        Array.exists (fun e -> e = true) arr
    else 
        false

for i in 0..60000000 do f2 ([||]:bool []) ;; // 0.430 seconds

Note: this is something I have found an isolated instance of, it may be appropriate to ensure that such checks (if they are required, I can only speak for my use case) are consistant accross other modules, such as Seq\List.

*copied from fsharp/fsharp#533

There may be a missing length check on exists, as without hte length check the cost of a comparison and a `call` is incurred.

Adding the simple len > 0 check before the loop appears (in the basic cases I have tested) to near triple performance.
```
let f arr = 
    Array.exists (fun e -> e = true) arr

for i in 0..60000000 do f ([||]:bool []) ;; // 1.8 seconds

let f2 arr = 
    if Array.length arr > 0 then
        Array.exists (fun e -> e = true) arr
    else 
        false

for i in 0..60000000 do f2 ([||]:bool []) ;; // 0.430 seconds
```

Note: this is something I have found an isolated instance of, it may be appropriate to ensure that such checks (if they are required, I can only speak for my use case) are consistant accross other modules, such as Seq\List.
@msftclas
Copy link

msftclas commented Jan 7, 2016

Hi @xenocons, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

This seems like a small (but important) contribution, so no Contribution License Agreement is required at this point. Real humans will now evaluate your PR.

TTYL, MSBOT;

@vasily-kirichenko
Copy link
Contributor

👍

@forki
Copy link
Contributor

forki commented Jan 7, 2016

Are you sure your changes give the same perf advantage? Looks like the first call to the inner loop function has the same check. So you are optimizing one function call, right?

@mexx
Copy link
Contributor

mexx commented Jan 7, 2016

@forki according to my quick test on my machine with the inner implementation:

let exists (f: 'T -> bool) (array:'T[]) =
    let len = array.Length
    let rec loop i = i < len && (f array.[i] || loop (i+1))
    loop 0

let exists2 (f: 'T -> bool) (array:'T[]) =
    let len = array.Length
    if len = 0 then false else
    let rec loop i = i < len && (f array.[i] || loop (i+1))
    loop 0

#time
for i in 0..60000000 do exists (fun e -> e = true) ([||]:bool []) ;;
// Real: 00:00:00.506, CPU: 00:00:00.514, GC gen0: 343, gen1: 0, gen2: 0

for i in 0..60000000 do exists2 (fun e -> e = true) ([||]:bool []) ;;
// Real: 00:00:00.418, CPU: 00:00:00.421, GC gen0: 344, gen1: 0, gen2: 0

about 100ms for 60 billion calls

@forki
Copy link
Contributor

forki commented Jan 7, 2016

So it's indeed faster, but only a little bit. I wonder if a loop would be
better...
On Jan 7, 2016 12:23, "Max Malook" [email protected] wrote:

@forki https://github.com/forki according to my quick test on my
machine with the inner implementation:

let exists (f: 'T -> bool) (array:'T[]) =
let len = array.Length
let rec loop i = i < len && (f array.[i] || loop (i+1))
loop 0

let exists2 (f: 'T -> bool) (array:'T[]) =
let len = array.Length
if len = 0 then false else
let rec loop i = i < len && (f array.[i] || loop (i+1))
loop 0

#time
for i in 0..60000000 do exists (fun e -> e = true) ([||]:bool []) ;;
// Real: 00:00:00.506, CPU: 00:00:00.514, GC gen0: 343, gen1: 0, gen2: 0

for i in 0..60000000 do exists2 (fun e -> e = true) ([||]:bool []) ;;
Real: 00:00:00.418, CPU: 00:00:00.421, GC gen0: 344, gen1: 0, gen2: 0

about 100ms for 60 billion calls


Reply to this email directly or view it on GitHub
#854 (comment)
.

@mexx
Copy link
Contributor

mexx commented Jan 7, 2016

@forki I'm not good at loops, it's so imperative :)

@forki
Copy link
Contributor

forki commented Jan 7, 2016

yeah, but computers are really good in such things.

@ghost
Copy link
Author

ghost commented Jan 7, 2016

@mexx your same code on my PC, win10,16GB RAM,i7-4600U @ 2.10GHz, 64bit, 32bit FSI with --optimize:

Real: 00:00:00.927, CPU: 00:00:00.906, GC gen0: 686, gen1: 2, gen2: 1
val it : unit = ()
>
Real: 00:00:00.507, CPU: 00:00:00.500, GC gen0: 687, gen1: 0, gen2: 0
val it : unit = ()
>

@forki
Copy link
Contributor

forki commented Jan 7, 2016

Mhm. What about 1 element arrays? IIRC we have lots of functions that do:

match a.Length with
| 0 ->...
| 1 ->...
| n ->...

On Jan 8, 2016 00:46, "David Klein" [email protected] wrote:

@mexx https://github.com/mexx your same code on my PC, win10,16GB
RAM,i7-4600U @ 2.10GHz, 64bit, 32bit FSI with --optimize:

Real: 00:00:00.927, CPU: 00:00:00.906, GC gen0: 686, gen1: 2, gen2: 1
val it : unit = ()

Real: 00:00:00.507, CPU: 00:00:00.500, GC gen0: 687, gen1: 0, gen2: 0
val it : unit = ()


Reply to this email directly or view it on GitHub
#854 (comment)
.

@ghost
Copy link
Author

ghost commented Jan 8, 2016

FWIW, debug builds of originals, exists orig takes 20 instructions if empty array, exists2 takes 6 IL instructions if empty, this becomes clearer when you look at the assembler, exists2 exits the function reasonably efficiently.

I guess the question is, how often is the input to exists an empty array? In my case it happens a lot, but it turns out to be more efficient to do the Length = 0 check at the site of the my function that calls exists, rather than allow it to go into FSharp.Core first (as each function call incurs a cost), but from my personal expectations, I would assume something like exists to return immediately on empty, but once again this is my own personal tilt:

let exists (f: 'T -> bool) (array:'T[]) =
    let len = array.Length
    let rec loop i = i < len && (f array.[i] || loop (i+1))
    loop 0

IL:
    IL_0000: nop
    IL_0001: ldarg.1
    IL_0002: ldlen
    IL_0003: conv.i4
    IL_0004: stloc.0
    IL_0005: ldarg.0
    IL_0006: stloc.1
    IL_0007: ldarg.1
    IL_0008: stloc.2
    IL_0009: ldloc.0
    IL_000a: stloc.3
    IL_000b: ldarg.0
    IL_000c: ldarg.1
    IL_000d: ldloc.0
    IL_000e: ldc.i4.0
    IL_000f: call bool Program::loop@5<!!T>(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!0, bool>, !!0[], int32, int32)
    IL_0014: ret

ASM (trimmed to remove getting .Length):
    let rec loop i = i < len && (f array.[i] || loop (i+1))
    loop 0
04B005BB  mov         ecx,dword ptr [ebp-14h]  
04B005BE  xor         edx,edx  
04B005C0  mov         eax,dword ptr [ecx]  
04B005C2  mov         eax,dword ptr [eax+28h]  
04B005C5  call        dword ptr [eax+10h]  
04B005C8  mov         dword ptr [ebp-8],eax  
04B005CB  movzx       eax,byte ptr [ebp-8]  
04B005CF  mov         esp,ebp  
04B005D1  pop         ebp  
04B005D2  ret  
let exists2 (f: 'T -> bool) (array:'T[]) =
    let len = array.Length
    if len = 0 then false else
    let rec loop i = i < len && (f array.[i] || loop (i+1))
    loop 0

IL:
    IL_0000: nop
    IL_0001: ldarg.1
    IL_0002: ldlen
    IL_0003: conv.i4
    IL_0004: stloc.0
    IL_0005: ldloc.0
    IL_0006: brtrue.s IL_000a

    IL_0008: ldc.i4.0
    IL_0009: ret

    IL_000a: ldarg.0
    IL_000b: stloc.1
    IL_000c: ldarg.1
    IL_000d: stloc.2
    IL_000e: ldloc.0
    IL_000f: stloc.3
    IL_0010: ldarg.0
    IL_0011: ldarg.1
    IL_0012: ldloc.0
    IL_0013: ldc.i4.0
    IL_0014: call bool Program::'loop@11-1'<!!T>(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!0, bool>, !!0[], int32, int32)
    IL_0019: ret

ASM(trimmed to remove getting .Length):
    if len = 0 then false else
02320766  cmp         dword ptr [ebp-4],0  
0232076A  jne         0232076F  
0232076C  nop  
0232076D  jmp         02320772  
0232076F  nop  
02320770  jmp         02320778  
02320772  xor         eax,eax  
02320774  mov         esp,ebp  
02320776  pop         ebp  
02320777  ret  
02320778  mov         ecx,0C169A0h  
0232077D  call        008B30F4  
02320782  mov         dword ptr [ebp-18h],eax  
02320785  push        dword ptr [ebp-10h]  
02320788  push        dword ptr [ebp-4]  
0232078B  mov         ecx,dword ptr [ebp-18h]  
0232078E  mov         edx,dword ptr [ebp-0Ch]  
02320791  call        dword ptr ds:[0C169C4h]  
02320797  mov         eax,dword ptr [ebp-18h]  
0232079A  mov         dword ptr [ebp-14h],eax  
    let rec loop i = i < len && (f array.[i] || loop (i+1))
    loop 0
0232079D  mov         ecx,dword ptr [ebp-14h]  
023207A0  xor         edx,edx  
023207A2  mov         eax,dword ptr [ecx]  
023207A4  mov         eax,dword ptr [eax+28h]  
023207A7  call        dword ptr [eax+10h]  
023207AA  mov         dword ptr [ebp-8],eax  
023207AD  movzx       eax,byte ptr [ebp-8]  
023207B1  mov         esp,ebp  
023207B3  pop         ebp  
023207B4  ret  

update:fixed paragraphing and spelling

@KevinRansom
Copy link
Contributor

@xenocons Do we have any measurements of memory usage, allocations and wall clock improvements associated with this change?

Thanks

Kevin

@ghost
Copy link
Author

ghost commented Jan 20, 2016

@KevinRansom Is there a suggested methodology for producing such information (apart from counting the number of x86 instructions and the cycle count)?

@dsyme
Copy link
Contributor

dsyme commented Jan 25, 2016

@KevinRansom - The perf results are here: #854 (comment). I think those results are sufficient.

I'm happy with this change, it is reasonable to check once for emptiness early to avoid the call. So we can pull this.

KevinRansom added a commit that referenced this pull request Feb 25, 2016
@KevinRansom KevinRansom merged commit 527a54f into dotnet:master Feb 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants