Skip to content

Conversation

@tannergooding
Copy link
Member

@tannergooding tannergooding commented May 5, 2021

This is a continuation of past PRs (#35421 and #37882) which ported the legacy System.Numerics code to be implemented in terms of the newer System.Runtime.Intrinsics code.

In particular, this updates SIMDIntrinsicGetItem and SIMDIntrinsicSetItem to be implemented via the corresponding HWIntrinsics functionality bringing along benefits to both codepaths. Namely, the System.Numerics codepath becomes VEX aware and has better instruction selection, while the System.Runtime.Intrinsics codepath picked up an optimization for non-constant operands.

Some example diffs are:

  vextractf128 xmm0, ymm0, 1
- vpsrldq  xmm0, 8
- vmovd    edx, xmm0
+ vpextrq  edx, xmm0, 1

and:

- vmovaps  xmm4, xmm3
- vpsrldq  xmm4, 4
- vmovaps  xmm5, xmm2
- vpsrldq  xmm5, 8
+ vmovshdup xmm4, xmm3
+ vunpckhps xmm5, xmm2, xmm2
  vmulss   xmm6, xmm4, xmm5
- vmovaps  xmm7, xmm3
- vpsrldq  xmm7, 8
- vmovaps  xmm8, xmm2
- vpsrldq  xmm8, 4
+ vunpckhps xmm7, xmm3, xmm3
+ vmovshdup xmm8, xmm2
  vmulss   xmm9, xmm7, xmm8

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 5, 2021
@tannergooding
Copy link
Member Author

tannergooding commented May 5, 2021

Framework diff is:

Found 317 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53448994
Total bytes of diff: 53442523
Total bytes of delta: -6471 (-0.01% of base)
    diff is an improvement.


Top file improvements (bytes):
       -6440 : System.Private.CoreLib.dasm (-0.14% of base)
         -19 : System.Drawing.Common.dasm (-0.01% of base)
         -12 : System.Text.Json.dasm (-0.00% of base)

3 total files with Code Size differences (3 improved, 0 regressed), 268 unchanged.

Top method regressions (bytes):
           4 (12.50% of base) : System.Private.CoreLib.dasm - Vector128:AsVector128(Vector2):Vector128`1

Top method improvements (bytes):
       -3830 (-99.61% of base) : System.Private.CoreLib.dasm - Sse41:Extract(Vector128`1,ubyte):float
        -409 (-12.10% of base) : System.Private.CoreLib.dasm - Vector256`1:ToString():String:this (6 methods)
        -327 (-46.12% of base) : System.Private.CoreLib.dasm - Vector128`1:<Equals>g__SoftwareFallback|12_0(byref,Vector128`1):bool (6 methods)
        -273 (-40.15% of base) : System.Private.CoreLib.dasm - Vector256`1:<Equals>g__SoftwareFallback|14_0(byref,Vector256`1):bool (6 methods)
        -253 (-7.89% of base) : System.Private.CoreLib.dasm - Vector128`1:ToString():String:this (6 methods)
        -206 (-27.43% of base) : System.Private.CoreLib.dasm - Vector128`1:GetHashCode():int:this (6 methods)
        -198 (-28.01% of base) : System.Private.CoreLib.dasm - Vector256`1:GetHashCode():int:this (6 methods)
        -104 (-9.52% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4
         -98 (-75.38% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):ubyte:this
         -98 (-74.81% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):short:this
         -98 (-75.97% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):int:this
         -98 (-73.13% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):double:this
         -98 (-75.38% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):long:this
         -72 (-4.25% of base) : System.Private.CoreLib.dasm - Matrix4x4:Decompose(Matrix4x4,byref,byref,byref):bool
         -61 (-8.32% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateLookAt(Vector3,Vector3,Vector3):Matrix4x4
         -31 (-5.24% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateBillboard(Vector3,Vector3,Vector3,Vector3):Matrix4x4
         -30 (-5.80% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateWorld(Vector3,Vector3,Vector3):Matrix4x4
         -20 (-7.30% of base) : System.Private.CoreLib.dasm - Plane:CreateFromVertices(Vector3,Vector3,Vector3):Plane
         -17 (-1.95% of base) : System.Drawing.Common.dasm - Graphics:GetContextInfo(byref,bool,byref):this
         -16 (-1.63% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOf(byref,ubyte,int):int (2 methods)

Top method regressions (percentages):
           4 (12.50% of base) : System.Private.CoreLib.dasm - Vector128:AsVector128(Vector2):Vector128`1

Top method improvements (percentages):
       -3830 (-99.61% of base) : System.Private.CoreLib.dasm - Sse41:Extract(Vector128`1,ubyte):float
         -98 (-75.97% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):int:this
         -98 (-75.38% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):ubyte:this
         -98 (-75.38% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):long:this
         -98 (-74.81% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):short:this
         -98 (-73.13% of base) : System.Private.CoreLib.dasm - Vector`1:get_Item(int):double:this
        -327 (-46.12% of base) : System.Private.CoreLib.dasm - Vector128`1:<Equals>g__SoftwareFallback|12_0(byref,Vector128`1):bool (6 methods)
        -273 (-40.15% of base) : System.Private.CoreLib.dasm - Vector256`1:<Equals>g__SoftwareFallback|14_0(byref,Vector256`1):bool (6 methods)
        -198 (-28.01% of base) : System.Private.CoreLib.dasm - Vector256`1:GetHashCode():int:this (6 methods)
        -206 (-27.43% of base) : System.Private.CoreLib.dasm - Vector128`1:GetHashCode():int:this (6 methods)
        -409 (-12.10% of base) : System.Private.CoreLib.dasm - Vector256`1:ToString():String:this (6 methods)
         -12 (-10.00% of base) : System.Private.CoreLib.dasm - Plane:op_Equality(Plane,Plane):bool
        -104 (-9.52% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4
         -12 (-9.30% of base) : System.Private.CoreLib.dasm - Plane:op_Inequality(Plane,Plane):bool
          -6 (-8.33% of base) : System.Private.CoreLib.dasm - Plane:Dot(Plane,Vector4):float
         -61 (-8.32% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateLookAt(Vector3,Vector3,Vector3):Matrix4x4
        -253 (-7.89% of base) : System.Private.CoreLib.dasm - Vector128`1:ToString():String:this (6 methods)
         -20 (-7.30% of base) : System.Private.CoreLib.dasm - Plane:CreateFromVertices(Vector3,Vector3,Vector3):Plane
          -8 (-6.84% of base) : System.Private.CoreLib.dasm - SpanHelpers:LocateLastFoundByte(Vector`1):int
          -8 (-6.84% of base) : System.Private.CoreLib.dasm - SpanHelpers:LocateLastFoundChar(Vector`1):int

39 total methods with Code Size differences (38 improved, 1 regressed), 269971 unchanged.

The regression in Vector128:AsVector128(Vector2) is due to an additional xorps that isn't elided. HWIntrinsics need this optimization enabled for them like we do for GPR zero and limitedly for System.Numerics intrinsics

I'm working on getting the PMI diffs for tests as well, but they take a while to run.

@tannergooding
Copy link
Member Author

CC. @echesakovMSFT

@tannergooding tannergooding force-pushed the improved-numerics-vector branch 5 times, most recently from 5e62003 to 7c9380d Compare May 6, 2021 20:13
@echesakov echesakov self-requested a review May 6, 2021 22:42
@tannergooding
Copy link
Member Author

tannergooding commented May 7, 2021

Diff for benchmarks:

Found 87 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 507624
Total bytes of diff: 507381
Total bytes of delta: -243 (-0.05% of base)
    diff is an improvement.


Top file improvements (bytes):
        -139 : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm (-0.41% of base)
         -82 : SIMD\RayTracer\RayTracer\RayTracer.dasm (-0.33% of base)
         -20 : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm (-0.65% of base)
          -2 : BenchmarksGame\mandelbrot\mandelbrot-7\mandelbrot-7.dasm (-0.11% of base)

4 total files with Code Size differences (4 improved, 0 regressed), 78 unchanged.

Top method regressions (bytes):
           2 ( 0.15% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleStrictRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
           2 ( 0.15% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleStrictRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this

Top method improvements (bytes):
         -63 (-13.91% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorHelper:ForEach(Vector`1,Action`2) (6 methods)
         -52 (-6.18% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - Camera:Create(Vector,Vector):Camera
         -20 (-9.13% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Cross(Vector,Vector):Vector
         -14 (-0.92% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
         -14 (-0.93% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
         -13 (-1.51% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedNoADT>b__1(int):this
         -13 (-1.50% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedWithADT>b__1(int):this
          -9 (-0.41% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedNoADT>b__1(int):this (3 methods)
          -6 (-0.40% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedWithADT>b__1(int):this (2 methods)
          -5 (-2.18% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - <>c:<.cctor>b__3_0(Vector):Color:this
          -5 (-4.76% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - <>c:<.cctor>b__3_2(Vector):double:this
          -4 (-3.15% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:FindByte(byref):int
          -4 (-2.00% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:InnerLoop(byref,byref)
          -4 (-0.65% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:XunitBenchmarkLoop(byref,byref)
          -4 (-0.70% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:ManualTimerLoop(byref,byref)
          -4 (-0.56% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:Test(int,bool):bool
          -4 (-0.30% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
          -4 (-0.29% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
          -3 (-0.44% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass6_0:<RenderMultiThreadedWithADT>b__1(int):this
          -1 (-0.50% of base) : BenchmarksGame\mandelbrot\mandelbrot-7\mandelbrot-7.dasm - MandelBrot_7:GetByte(long,double,int,int):ubyte

Top method regressions (percentages):
           2 ( 0.15% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleStrictRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
           2 ( 0.15% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleStrictRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this

Top method improvements (percentages):
         -63 (-13.91% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorHelper:ForEach(Vector`1,Action`2) (6 methods)
         -20 (-9.13% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Cross(Vector,Vector):Vector
         -52 (-6.18% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - Camera:Create(Vector,Vector):Camera
          -5 (-4.76% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - <>c:<.cctor>b__3_2(Vector):double:this
          -4 (-3.15% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:FindByte(byref):int
          -5 (-2.18% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - <>c:<.cctor>b__3_0(Vector):Color:this
          -4 (-2.00% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:InnerLoop(byref,byref)
         -13 (-1.51% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedNoADT>b__1(int):this
         -13 (-1.50% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedWithADT>b__1(int):this
         -14 (-0.93% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
         -14 (-0.92% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
          -4 (-0.70% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:ManualTimerLoop(byref,byref)
          -4 (-0.65% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:XunitBenchmarkLoop(byref,byref)
          -4 (-0.56% of base) : SIMD\SeekUnroll\SeekUnroll\SeekUnroll.dasm - SeekUnroll:Test(int,bool):bool
          -1 (-0.50% of base) : BenchmarksGame\mandelbrot\mandelbrot-7\mandelbrot-7.dasm - MandelBrot_7:GetByte(long,double,int,int):ubyte
          -3 (-0.44% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass6_0:<RenderMultiThreadedWithADT>b__1(int):this
          -9 (-0.41% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedNoADT>b__1(int):this (3 methods)
          -6 (-0.40% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedWithADT>b__1(int):this (2 methods)
          -1 (-0.34% of base) : BenchmarksGame\mandelbrot\mandelbrot-7\mandelbrot-7.dasm - <>c__DisplayClass4_0:<DoBench>b__0(int):this
          -4 (-0.30% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this

23 total methods with Code Size differences (21 improved, 2 regressed), 1878 unchanged.

@tannergooding tannergooding force-pushed the improved-numerics-vector branch 2 times, most recently from f8788e5 to 57576b9 Compare May 8, 2021 18:26
@tannergooding tannergooding force-pushed the improved-numerics-vector branch from 57576b9 to cc9c3dd Compare May 8, 2021 20:47
@tannergooding
Copy link
Member Author

CC. @echesakovMSFT, @dotnet/jit-contrib

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some questions that I would like to get answered before approving?

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for the contribution @tannergooding !

@ghost
Copy link

ghost commented May 18, 2021

Hello @tannergooding!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

clamp03 added a commit to clamp03/runtime that referenced this pull request May 20, 2021
PR dotnet#52288 throws ArgumentOutOfRangeException instead of IndexOutOfRangeException.
The test is not updated. It still use IndexOutOfRangeException.
@clamp03 clamp03 mentioned this pull request May 20, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
@tannergooding tannergooding deleted the improved-numerics-vector branch July 1, 2025 14:39
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants