@SwapnilGaikwad
Contributor

@SwapnilGaikwad SwapnilGaikwad commented Aug 16, 2022

Add small performance improvement to IndexOfAny Char intrinsics

@ghost ghost added area-System.Memory community-contribution Indicates that the PR has been added by a community member labels Aug 16, 2022
@ghost

ghost commented Aug 16, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: SwapnilGaikwad
Assignees: -
Labels:

area-System.Memory

Milestone: -

@SwapnilGaikwad
Contributor Author

Benchmarking numbers for x64 and Arm64

x64 (Xeon Gold 6152)

|                Method |        Job |                                                                                               Toolchain | Size |     Mean |    Error |   StdDev |   Median |      Min |      Max | Ratio | MannWhitney(2%) | RatioSD | Allocated | Alloc Ratio |
|---------------------- |----------- |-------------------------------------------------------------------------------------------------------- |----- |---------:|---------:|---------:|---------:|---------:|---------:|------:|---------------- |--------:|----------:|------------:|
|   IndexOfAnyTwoValues | Job-XWCMLH |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 39.54 ns | 0.551 ns | 0.515 ns | 39.60 ns | 38.93 ns | 40.22 ns |  1.00 |            Base |    0.00 |         - |          NA |
|   IndexOfAnyTwoValues | Job-WSUDKD | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 33.70 ns | 0.052 ns | 0.049 ns | 33.69 ns | 33.62 ns | 33.78 ns |  0.85 |          Faster |    0.01 |         - |          NA |
|                       |            |                                                                                                         |      |          |          |          |          |          |          |       |                 |         |           |             |
| IndexOfAnyThreeValues | Job-XWCMLH |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 31.72 ns | 0.582 ns | 0.544 ns | 31.41 ns | 31.36 ns | 32.69 ns |  1.00 |            Base |    0.00 |         - |          NA |
| IndexOfAnyThreeValues | Job-WSUDKD | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 30.23 ns | 0.110 ns | 0.103 ns | 30.21 ns | 30.05 ns | 30.43 ns |  0.95 |          Faster |    0.02 |         - |          NA |
|                       |            |                                                                                                         |      |          |          |          |          |          |          |       |                 |         |           |             |
|  IndexOfAnyFourValues | Job-XWCMLH |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 51.42 ns | 0.012 ns | 0.010 ns | 51.43 ns | 51.40 ns | 51.43 ns |  1.00 |            Base |    0.00 |         - |          NA |
|  IndexOfAnyFourValues | Job-WSUDKD | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 51.89 ns | 0.314 ns | 0.278 ns | 51.87 ns | 51.51 ns | 52.52 ns |  1.01 |            Same |    0.01 |         - |          NA |

Arm64 (Altra)

|                Method |        Job |                                                                                                 Toolchain | Size |     Mean |    Error |   StdDev |   Median |      Min |      Max | Ratio | MannWhitney(2%) | RatioSD | Allocated | Alloc Ratio |
|---------------------- |----------- |---------------------------------------------------------------------------------------------------------- |----- |---------:|---------:|---------:|---------:|---------:|---------:|------:|---------------- |--------:|----------:|------------:|
|   IndexOfAnyTwoValues | Job-LUMEUV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 43.61 ns | 0.883 ns | 0.826 ns | 43.43 ns | 42.64 ns | 44.61 ns |  1.00 |            Base |    0.00 |         - |          NA |
|   IndexOfAnyTwoValues | Job-PCATDM | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 40.03 ns | 0.198 ns | 0.185 ns | 40.02 ns | 39.74 ns | 40.34 ns |  0.92 |          Faster |    0.02 |         - |          NA |
|                       |            |                                                                                                           |      |          |          |          |          |          |          |       |                 |         |           |             |
| IndexOfAnyThreeValues | Job-LUMEUV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 60.21 ns | 0.107 ns | 0.100 ns | 60.20 ns | 60.10 ns | 60.38 ns |  1.00 |            Base |    0.00 |         - |          NA |
| IndexOfAnyThreeValues | Job-PCATDM | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 52.09 ns | 0.143 ns | 0.134 ns | 52.10 ns | 51.90 ns | 52.30 ns |  0.87 |          Faster |    0.00 |         - |          NA |
|                       |            |                                                                                                           |      |          |          |          |          |          |          |       |                 |         |           |             |
|  IndexOfAnyFourValues | Job-LUMEUV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 70.51 ns | 0.418 ns | 0.391 ns | 70.65 ns | 69.48 ns | 70.90 ns |  1.00 |            Base |    0.00 |         - |          NA |
|  IndexOfAnyFourValues | Job-PCATDM | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 | 64.60 ns | 0.258 ns | 0.242 ns | 64.60 ns | 64.25 ns | 65.00 ns |  0.92 |          Faster |    0.00 |         - |          NA |

Contributor Author

Avoiding goto showed a performance improvement on x64. @adamsitnik, feel free to confirm this on your end. If it is useful, we can apply it to other places.

Member

hm.. I'd expect inputVector != Vector128<ushort>.Zero to be lowered to MaxPairwise too. Does it have a different codegen?

Member

Namely with #65632
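
For reference, the two ways of asking "did any lane match?" that this exchange contrasts can be sketched in C# as follows. This is a minimal illustration and not the code from this PR: `AnyMatchSketch`, `AnyMatch`, and `AnyMatchPairwise` are hypothetical names, and the pairwise variant is only one plausible hand-written form. It assumes .NET 7 or later.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

static class AnyMatchSketch
{
    // Portable form: true if any lane of the comparison result is non-zero.
    // How the JIT lowers this on Arm64 is exactly what the thread is asking about.
    public static bool AnyMatch(Vector128<ushort> matches)
        => matches != Vector128<ushort>.Zero;

    // Hand-written Arm64 form (hypothetical helper, not taken from the PR).
    public static bool AnyMatchPairwise(Vector128<ushort> matches)
    {
        if (!AdvSimd.Arm64.IsSupported)
        {
            return AnyMatch(matches); // fall back on other architectures
        }

        Vector128<uint> lanes = matches.AsUInt32();
        // Pairwise max of the vector with itself folds all 128 bits
        // into the lower 64 bits of the result.
        Vector128<uint> folded = AdvSimd.Arm64.MaxPairwise(lanes, lanes);
        return folded.AsUInt64().ToScalar() != 0;
    }
}
```

Both methods should return the same answer for every input; the open question in the thread is only which instruction sequence the JIT emits for each.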

Contributor Author

Interesting! Apparently, the JIT is somehow failing to optimise this one. It's still emitting umaxv.

...
cmeq    v19.8h, v16.8h, v18.8h
cmeq    v18.8h, v17.8h, v18.8h
orr     v18.8h, v19.8h, v18.8h
umaxv   s19, v18.4s
umov    w0, v19.s[0]
cmp     w0, #0
...
Full Arm64 Assembly
; Assembly listing for method System.SpanHelpers:IndexOfAny(byref,ushort,ushort,int):int
; Emitting BLENDED_CODE for generic ARM64 CPU - Unix
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 4 single block inlinees; 3 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T05] (  5,  8.50)   byref  ->  x19         single-def
;  V01 arg1         [V01,T09] (  5,  5   )  ushort  ->  x22         single-def
;  V02 arg2         [V02,T06] (  5,  8.50)  ushort  ->  x20         single-def
;  V03 arg3         [V03,T08] (  6,  5.50)     int  ->  x21         single-def
;  V04 loc0         [V04,T00] ( 18, 38.50)    long  ->  x23
;  V05 loc1         [V05,T02] ( 13, 27.50)    long  ->  x24
;  V06 loc2         [V06,T01] ( 15, 36   )     int  ->  x25
;  V07 loc3         [V07,T13] (  3,  2.50)    long  ->   x0
;* V08 loc4         [V08    ] (  0,  0   )    long  ->  zero-ref
;  V09 loc5         [V09,T07] (  5, 10   )   byref  ->   x0
;  V10 loc6         [V10,T14] (  3,  1.50)     int  ->   x0
;  V11 loc7         [V11,T11] (  3,  5   )   byref  ->  x19
;* V12 loc8         [V12    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;* V13 loc9         [V13    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;* V14 loc10        [V14    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;  V15 loc11        [V15,T15] (  6, 13.50)  simd16  ->  d18         HFA(simd16)
;  V16 loc12        [V16,T16] (  6, 10   )  simd16  ->  d18         HFA(simd16)
;  V17 loc13        [V17,T17] (  3,  5   )  simd16  ->  d16         HFA(simd16)
;  V18 loc14        [V18,T18] (  3,  5   )  simd16  ->  d17         HFA(simd16)
;* V19 loc15        [V19    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  ld-addr-op
;* V20 loc16        [V20    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  ld-addr-op
;* V21 loc17        [V21    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)
;* V22 loc18        [V22    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  ld-addr-op
;# V23 OutArgs      [V23    ] (  1,  1   )  lclBlk ( 0) [sp+00H]   "OutgoingArgSpace"
;  V24 tmp1         [V24,T19] (  3,  3   )  simd16  ->  d16         HFA(simd16)  "Clone op1 for vector extractmostsignificantbits"
;  V25 tmp2         [V25,T20] (  3,  3   )  simd16  ->  d16         HFA(simd16)  "Clone op1 for vector extractmostsignificantbits"
;* V26 tmp3         [V26    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V27 tmp4         [V27    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V28 tmp5         [V28    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V29 tmp6         [V29    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V30 tmp7         [V30    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
;  V31 tmp8         [V31,T10] (  5,  5   )     int  ->  x23         "Single return block return value"
;  V32 cse0         [V32,T12] (  6,  3   )     ref  ->   x1         "CSE - moderate"
;  V33 cse1         [V33,T04] (  9, 15.50)     int  ->  x26         "CSE - aggressive"
;  V34 cse2         [V34,T03] (  9, 19   )     int  ->  x27         "CSE - aggressive"
;
; Lcl frame size = 8

G_M34414_IG01:              ;; offset=0000H
        A9BA7BFD          stp     fp, lr, [sp,#-96]!
        A901D3F3          stp     x19, x20, [sp,#24]
        A902DBF5          stp     x21, x22, [sp,#40]
        A903E3F7          stp     x23, x24, [sp,#56]
        A904EBF9          stp     x25, x26, [sp,#72]
        F9002FFB          str     x27, [sp,#88]
        910003FD          mov     fp, sp
        AA0003F3          mov     x19, x0
        2A0103F6          mov     w22, w1
        2A0203F4          mov     w20, w2
        2A0303F5          mov     w21, w3
						;; size=44 bbWeight=1    PerfScore 8.50
G_M34414_IG02:              ;; offset=002CH
        710002BF          cmp     w21, #0
        5400016A          bge     G_M34414_IG04
						;; size=8 bbWeight=1    PerfScore 1.50
G_M34414_IG03:              ;; offset=0034H
        D2840500          movz    x0, #0x2028
        F2BB6800          movk    x0, #0xdb40 LSL #16
        F2DFF020          movk    x0, #0xff81 LSL #32
        F9400001          ldr     x1, [x0]
        AA0103E0          mov     x0, x1
        D28E3F02          movz    x2, #0x71f8      // code for System.Diagnostics.Debug:Fail
        F2AAAA42          movk    x2, #0x5552 LSL #16
        F2DFFFE2          movk    x2, #0xffff LSL #32
        F9400042          ldr     x2, [x2]
        D63F0040          blr     x2
						;; size=40 bbWeight=0.50 PerfScore 5.25
G_M34414_IG04:              ;; offset=005CH
        AA1F03F7          mov     x23, xzr
        2A1503F8          mov     w24, w21
        93407EA0          sxtw    x0, w21
        D1002000          sub     x0, x0, #8
        F100001F          cmp     x0, #0
        540003AB          blt     G_M34414_IG07
						;; size=24 bbWeight=1    PerfScore 3.50
G_M34414_IG05:              ;; offset=0074H
        AA0003F8          mov     x24, x0
        14000039          b       G_M34414_IG16
                          align   [0 bytes for IG09]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
						;; size=8 bbWeight=0.50 PerfScore 0.75
G_M34414_IG06:              ;; offset=007CH
        D37FFAE0          lsl     x0, x23, #1
        8B000260          add     x0, x19, x0
        79400019          ldrh    w25, [x0]
        53003EDA          uxth    w26, w22
        6B19035F          cmp     w26, w25
        54000640          beq     G_M34414_IG15
        53003E9B          uxth    w27, w20
        6B19037F          cmp     w27, w25
        540005E0          beq     G_M34414_IG15
        79400419          ldrh    w25, [x0,#2]
        6B19035F          cmp     w26, w25
        54000540          beq     G_M34414_IG14
        6B19037F          cmp     w27, w25
        54000500          beq     G_M34414_IG14
        79400819          ldrh    w25, [x0,#4]
        6B19035F          cmp     w26, w25
        54000460          beq     G_M34414_IG13
        6B19037F          cmp     w27, w25
        54000420          beq     G_M34414_IG13
        79400C19          ldrh    w25, [x0,#6]
        6B19035F          cmp     w26, w25
        54000360          beq     G_M34414_IG12
        6B19037F          cmp     w27, w25
        54000320          beq     G_M34414_IG12
        910012F7          add     x23, x23, #4
        D1001318          sub     x24, x24, #4
						;; size=104 bbWeight=2    PerfScore 55.00
G_M34414_IG07:              ;; offset=00E4H
        F100131F          cmp     x24, #4
        54FFFCA2          bhs     G_M34414_IG06
						;; size=8 bbWeight=4    PerfScore 6.00
G_M34414_IG08:              ;; offset=00ECH
        B4000198          cbz     x24, G_M34414_IG10
        53003EDA          uxth    w26, w22
						;; size=8 bbWeight=0.50 PerfScore 0.75
G_M34414_IG09:              ;; offset=00F4H
        D37FFAE0          lsl     x0, x23, #1
        78606A79          ldrh    w25, [x19, x0]
        6B19035F          cmp     w26, w25
        540002C0          beq     G_M34414_IG15
        53003E9B          uxth    w27, w20
        6B19037F          cmp     w27, w25
        54000260          beq     G_M34414_IG15
        910006F7          add     x23, x23, #1
        D1000718          sub     x24, x24, #1
        B5FFFEF8          cbnz    x24, G_M34414_IG09
						;; size=40 bbWeight=4    PerfScore 38.00
G_M34414_IG10:              ;; offset=011CH
        12800000          movn    w0, #0
						;; size=4 bbWeight=0.50 PerfScore 0.25
G_M34414_IG11:              ;; offset=0120H
        F9402FFB          ldr     x27, [sp,#88]
        A944EBF9          ldp     x25, x26, [sp,#72]
        A943E3F7          ldp     x23, x24, [sp,#56]
        A942DBF5          ldp     x21, x22, [sp,#40]
        A941D3F3          ldp     x19, x20, [sp,#24]
        A8C67BFD          ldp     fp, lr, [sp],#96
        D65F03C0          ret     lr
						;; size=28 bbWeight=0.50 PerfScore 4.00
G_M34414_IG12:              ;; offset=013CH
        11000EF7          add     w23, w23, #3
        1400004D          b       G_M34414_IG22
        D503201F          align   [4 bytes for IG18]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
						;; size=12 bbWeight=0.50 PerfScore 0.75
G_M34414_IG13:              ;; offset=0148H
        11000AF7          add     w23, w23, #2
        1400004A          b       G_M34414_IG22
						;; size=8 bbWeight=0.50 PerfScore 0.75
G_M34414_IG14:              ;; offset=0150H
        110006F7          add     w23, w23, #1
        14000048          b       G_M34414_IG22
						;; size=8 bbWeight=0.50 PerfScore 0.75
G_M34414_IG15:              ;; offset=0158H
        14000047          b       G_M34414_IG22
						;; size=4 bbWeight=0.50 PerfScore 0.50
G_M34414_IG16:              ;; offset=015CH
        710022BF          cmp     w21, #8
        5400016A          bge     G_M34414_IG17
        D2840500          movz    x0, #0x2028
        F2BB6800          movk    x0, #0xdb40 LSL #16
        F2DFF020          movk    x0, #0xff81 LSL #32
        F9400001          ldr     x1, [x0]
        AA0103E0          mov     x0, x1
        D28E3F02          movz    x2, #0x71f8      // code for System.Diagnostics.Debug:Fail
        F2AAAA42          movk    x2, #0x5552 LSL #16
        F2DFFFE2          movk    x2, #0xffff LSL #32
        F9400042          ldr     x2, [x2]
        D63F0040          blr     x2
						;; size=48 bbWeight=0.50 PerfScore 6.00
G_M34414_IG17:              ;; offset=018CH
        53003EDA          uxth    w26, w22
        4E020F50          dup     v16.8h, w26
        53003E9B          uxth    w27, w20
        4E020F71          dup     v17.8h, w27
        B40001B8          cbz     x24, G_M34414_IG19
						;; size=20 bbWeight=0.50 PerfScore 3.00
G_M34414_IG18:              ;; offset=01A0H
        D37FFAE0          lsl     x0, x23, #1
        3CE06A72          ldr     q18, [x19, x0]
        6E728E13          cmeq    v19.8h, v16.8h, v18.8h
        6E728E32          cmeq    v18.8h, v17.8h, v18.8h
        4EB21E72          orr     v18.8h, v19.8h, v18.8h
        6EB0AA53          umaxv   s19, v18.4s
        0E043E60          umov    w0, v19.s[0]
        7100001F          cmp     w0, #0
        54000361          bne     G_M34414_IG20
        910022F7          add     x23, x23, #8
        EB17031F          cmp     x24, x23
        54FFFEA8          bhi     G_M34414_IG18
						;; size=48 bbWeight=4    PerfScore 56.00
G_M34414_IG19:              ;; offset=01D0H
        D37FFB00          lsl     x0, x24, #1
        3CE06A72          ldr     q18, [x19, x0]
        AA1803F7          mov     x23, x24
        6E728E10          cmeq    v16.8h, v16.8h, v18.8h
        6E728E31          cmeq    v17.8h, v17.8h, v18.8h
        4EB11E12          orr     v18.8h, v16.8h, v17.8h
        6EB0AA50          umaxv   s16, v18.4s
        0E043E00          umov    w0, v16.s[0]
        7100001F          cmp     w0, #0
        54FFF940          beq     G_M34414_IG10
        9C000550          ldr     q16, [@RWD00]
        4E301E52          and     v18.16b, v18.16b, v16.16b
        9C000590          ldr     q16, [@RWD16]
        6E304650          ushl    v16.16b, v18.16b, v16.16b
        4F000411          movi    v17.4s, #0x00
        6E114211          ext     v17.16b, v16.16b, v17.16b, #8
        0E31BA31          addv    b17, v17.8b
        0E013E20          umov    w0, v17.b[0]
        53185C00          lsl     w0, w0, #8
        0E31BA10          addv    b16, v16.8b
        0E013E01          umov    w1, v16.b[0]
        2A010000          orr     w0, w0, w1
        1400000D          b       G_M34414_IG21
						;; size=92 bbWeight=0.50 PerfScore 14.00
G_M34414_IG20:              ;; offset=022CH
        9C0003B0          ldr     q16, [@RWD00]
        4E301E50          and     v16.16b, v18.16b, v16.16b
        9C0003F1          ldr     q17, [@RWD16]
        6E314610          ushl    v16.16b, v16.16b, v17.16b
        4F000411          movi    v17.4s, #0x00
        6E114211          ext     v17.16b, v16.16b, v17.16b, #8
        0E31BA31          addv    b17, v17.8b
        0E013E20          umov    w0, v17.b[0]
        53185C00          lsl     w0, w0, #8
        0E31BA10          addv    b16, v16.8b
        0E013E01          umov    w1, v16.b[0]
        2A010000          orr     w0, w0, w1
						;; size=48 bbWeight=0.50 PerfScore 7.25
G_M34414_IG21:              ;; offset=025CH
        5AC00000          rbit    w0, w0
        5AC01000          clz     w0, w0
        2A0003E0          mov     w0, w0
        D341FC00          lsr     x0, x0, #1
        8B0002F7          add     x23, x23, x0
        17FFFFBA          b       G_M34414_IG15
						;; size=24 bbWeight=0.50 PerfScore 2.25
G_M34414_IG22:              ;; offset=0274H
        2A1703E0          mov     w0, w23
						;; size=4 bbWeight=0.50 PerfScore 0.25
G_M34414_IG23:              ;; offset=0278H
        F9402FFB          ldr     x27, [sp,#88]
        A944EBF9          ldp     x25, x26, [sp,#72]
        A943E3F7          ldp     x23, x24, [sp,#56]
        A942DBF5          ldp     x21, x22, [sp,#40]
        A941D3F3          ldp     x19, x20, [sp,#24]
        A8C67BFD          ldp     fp, lr, [sp],#96
        D65F03C0          ret     lr
						;; size=28 bbWeight=0.50 PerfScore 4.00
RWD00  	dq	8080808080808080h, 8080808080808080h
RWD16  	dq	00FFFEFDFCFBFAF9h, 00FFFEFDFCFBFAF9h


; Total bytes of code 660, prolog size 44, PerfScore 285.00, instruction count 172, allocated bytes for code 660 (MethodHash=41e07991) for method System.SpanHelpers:IndexOfAny(byref,ushort,ushort,int):int
; ============================================================
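
The tail of the listing (the `and`/`ushl`/`addv` sequence tagged "extractmostsignificantbits", followed by `rbit`, `clz`, and `lsr #1` in IG21) turns the comparison result into an element offset. A minimal C# sketch of that step is shown below, assuming .NET 7 or later; `MatchIndexSketch` and `FirstMatchOffset` are hypothetical names, and the actual source in `SpanHelpers` may be organised differently.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;

static class MatchIndexSketch
{
    // Given a per-lane comparison result (all ones for a match, zero otherwise),
    // returns the offset, in chars, of the first matching lane.
    // The caller is assumed to have checked that at least one lane matched.
    public static int FirstMatchOffset(Vector128<ushort> matches)
    {
        // One bit per byte: each matching 16-bit lane contributes two set bits.
        uint mask = matches.AsByte().ExtractMostSignificantBits();

        // Index of the lowest set bit (rbit + clz on Arm64), then
        // bytes -> chars (the lsr #1 in the listing).
        return BitOperations.TrailingZeroCount(mask) / sizeof(char);
    }
}
```

The returned offset would then be added to the current position in the span, which corresponds to the `add x23, x23, x0` at the end of IG21.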

@adamsitnik
Member

@SwapnilGaikwad I am afraid I've caused a huge merge conflict with #73768. Could you please apply this optimization to the new generic SpanHelpers.T methods? https://github.com/DOTNEt/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs#L1329-L2615

@SwapnilGaikwad
Contributor Author

> @SwapnilGaikwad I am afraid I've caused a huge merge conflict with #73768. Could you please apply this optimization to the new generic SpanHelpers.T methods? https://github.com/DOTNEt/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs#L1329-L2615

Sure. I'll also take a look at the x64 failures. Interestingly, when I downloaded a bundle of the failing tests locally using the runfo tool, they passed. Not sure whether it's a build-related issue or a problem with the patch.

@adamsitnik
Member

> I'll also take a look at x64 failures.

The ones in your PR, like this one? They were caused by our internal service outage (tests were passing but were being reported as failures). So as soon as you sync your branch and re-run the CI, they should be gone.

@danmoseley
Member

@stephentoub would this materially improve regex scenarios, do you think?

@stephentoub
Member

stephentoub commented Aug 18, 2022

> would this materially improve regex scenarios, do you think?

Depends on the expression and how much they're dominated by such a call. For example, if the expression were [Hh]ello, then yeah, it'd probably be measurable, as the operation would be dominated by an IndexOfAny('H', 'h'), but not huge.
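
To make the `[Hh]ello` example concrete, here is a small C# sketch showing that the first candidate position for that pattern is what `IndexOfAny('H', 'h')` returns. `RegexPrefixSketch` and its methods are hypothetical names for illustration; the sketch does not show how the regex engine is implemented internally.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexPrefixSketch
{
    // First position at which a match of [Hh]ello could possibly start.
    public static int FirstCandidate(ReadOnlySpan<char> input)
        => input.IndexOfAny('H', 'h');

    // Position of the first actual match, or -1 if there is none.
    public static int FirstMatch(string input)
    {
        Match m = Regex.Match(input, "[Hh]ello");
        return m.Success ? m.Index : -1;
    }
}
```

For an input such as `"well, hello there"`, both methods return 6: the search for `'H'` or `'h'` lands directly on the start of the match, which is why a faster `IndexOfAny` would show up in such an expression.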

@SwapnilGaikwad SwapnilGaikwad force-pushed the github-indexOfAny-char-arm64 branch from bf93551 to 5898e80 Compare August 25, 2022 16:02
@adamsitnik adamsitnik added the tenet-performance Performance related issue label Aug 26, 2022
@adamsitnik adamsitnik added this to the 8.0.0 milestone Aug 26, 2022
@adamsitnik
Member

@SwapnilGaikwad could you please provide updated benchmark results?

@SwapnilGaikwad
Contributor Author

> @SwapnilGaikwad could you please provide updated benchmark results?

Here are the numbers from an Altra.

Byte:

|                Method |        Job |                                                                                           Toolchain | Size |      Mean |    Error |   StdDev |    Median |       Min |       Max | Ratio | MannWhitney(2%) | Allocated | Alloc Ratio |
|---------------------- |----------- |---------------------------------------------------------------------------------------------------- |----- |----------:|---------:|---------:|----------:|----------:|----------:|------:|---------------- |----------:|------------:|
|   IndexOfAnyTwoValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 |  17.53 ns | 0.032 ns | 0.028 ns |  17.54 ns |  17.46 ns |  17.54 ns |  0.82 |          Faster |         - |          NA |
|   IndexOfAnyTwoValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 |  21.43 ns | 0.007 ns | 0.007 ns |  21.43 ns |  21.41 ns |  21.43 ns |  1.00 |            Base |         - |          NA |
|                       |            |                                                                                                     |      |           |          |          |           |           |           |       |                 |           |             |
| IndexOfAnyThreeValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 |  23.30 ns | 0.100 ns | 0.088 ns |  23.30 ns |  23.22 ns |  23.51 ns |  0.87 |          Faster |         - |          NA |
| IndexOfAnyThreeValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 |  26.85 ns | 0.113 ns | 0.106 ns |  26.90 ns |  26.65 ns |  26.97 ns |  1.00 |            Base |         - |          NA |
|                       |            |                                                                                                     |      |           |          |          |           |           |           |       |                 |           |             |
|  IndexOfAnyFourValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 862.24 ns | 0.073 ns | 0.068 ns | 862.25 ns | 862.15 ns | 862.33 ns |  1.00 |            Same |         - |          NA |
|  IndexOfAnyFourValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 862.13 ns | 0.041 ns | 0.036 ns | 862.12 ns | 862.10 ns | 862.23 ns |  1.00 |            Base |         - |          NA |

Char:

|                Method |        Job |                                                                                           Toolchain | Size |     Mean |    Error |   StdDev |   Median |      Min |      Max | Ratio | MannWhitney(2%) | Allocated | Alloc Ratio |
|---------------------- |----------- |---------------------------------------------------------------------------------------------------- |----- |---------:|---------:|---------:|---------:|---------:|---------:|------:|---------------- |----------:|------------:|
|   IndexOfAnyTwoValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 33.19 ns | 0.078 ns | 0.073 ns | 33.21 ns | 33.07 ns | 33.28 ns |  0.82 |          Faster |         - |          NA |
|   IndexOfAnyTwoValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 40.66 ns | 0.058 ns | 0.054 ns | 40.67 ns | 40.54 ns | 40.74 ns |  1.00 |            Base |         - |          NA |
|                       |            |                                                                                                     |      |          |          |          |          |          |          |       |                 |           |             |
| IndexOfAnyThreeValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 44.79 ns | 0.097 ns | 0.091 ns | 44.82 ns | 44.64 ns | 44.92 ns |  0.88 |          Faster |         - |          NA |
| IndexOfAnyThreeValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 51.19 ns | 0.035 ns | 0.033 ns | 51.18 ns | 51.13 ns | 51.25 ns |  1.00 |            Base |         - |          NA |
|                       |            |                                                                                                     |      |          |          |          |          |          |          |       |                 |           |             |
|  IndexOfAnyFourValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 55.97 ns | 0.214 ns | 0.200 ns | 56.08 ns | 55.65 ns | 56.17 ns |  0.88 |          Faster |         - |          NA |
|  IndexOfAnyFourValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 63.90 ns | 0.702 ns | 0.656 ns | 64.25 ns | 62.81 ns | 64.45 ns |  1.00 |            Base |         - |          NA |

Int32:

|                Method |        Job |                                                                                           Toolchain | Size |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | MannWhitney(2%) | Allocated | Alloc Ratio |
|---------------------- |----------- |---------------------------------------------------------------------------------------------------- |----- |---------:|--------:|--------:|---------:|---------:|---------:|------:|---------------- |----------:|------------:|
|   IndexOfAnyTwoValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 179.3 ns | 0.02 ns | 0.02 ns | 179.3 ns | 179.2 ns | 179.3 ns |  1.00 |            Same |         - |          NA |
|   IndexOfAnyTwoValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 179.2 ns | 0.02 ns | 0.02 ns | 179.2 ns | 179.1 ns | 179.2 ns |  1.00 |            Base |         - |          NA |
|                       |            |                                                                                                     |      |          |         |         |          |          |          |       |                 |           |             |
| IndexOfAnyThreeValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 265.4 ns | 0.01 ns | 0.01 ns | 265.4 ns | 265.4 ns | 265.4 ns |  1.00 |            Same |         - |          NA |
| IndexOfAnyThreeValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 265.1 ns | 0.02 ns | 0.02 ns | 265.1 ns | 265.1 ns | 265.2 ns |  1.00 |            Base |         - |          NA |
|                       |            |                                                                                                     |      |          |         |         |          |          |          |       |                 |           |             |
|  IndexOfAnyFourValues | Job-UKSACN | /patch/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 864.1 ns | 0.07 ns | 0.07 ns | 864.1 ns | 864.1 ns | 864.3 ns |  1.00 |            Same |         - |          NA |
|  IndexOfAnyFourValues | Job-TOOAOK |  /main/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/corerun |  512 | 864.0 ns | 0.04 ns | 0.03 ns | 864.0 ns | 864.0 ns | 864.1 ns |  1.00 |            Base |         - |          NA |

@SwapnilGaikwad
Contributor Author

@adamsitnik what can we do to push this forward? 🙂

@ghost ghost locked as resolved and limited conversation to collaborators Oct 28, 2022