[X86] Remove extra MOV after widening atomic store #197619
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-x86

Author: jofrn

Changes

This change adds patterns to optimize out an extra MOV present after widening the atomic store. Store-side counterpart to #148898. Stacked on top of #197618.

2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td
index 6ab6f870f1bb8..b2a7bce8d7571 100644
--- a/llvm/lib/Target/X86/X86InstrCompiler.td
+++ b/llvm/lib/Target/X86/X86InstrCompiler.td
@@ -1242,6 +1242,105 @@ def : Pat<(v4i32 (atomic_load_128_v4i32 addr:$src)),
def : Pat<(v4i32 (atomic_load_128_v4i32 addr:$src)),
(VMOVAPDZ128rm addr:$src)>, Requires<[HasAVX512]>;
+// store atomic <2 x i8>
+def : Pat<(atomic_store_16
+ (i16 (trunc (i32 (extractelt
+ (v4i32 (bitconvert (v16i8 VR128:$src))),
+ (iPTR 0))))),
+ addr:$dst),
+ (PEXTRWmri addr:$dst, VR128:$src, 0)>, Requires<[UseSSE41]>;
+def : Pat<(atomic_store_16
+ (i16 (trunc (i32 (extractelt
+ (v4i32 (bitconvert (v16i8 VR128:$src))),
+ (iPTR 0))))),
+ addr:$dst),
+ (VPEXTRWmri addr:$dst, VR128:$src, 0)>, Requires<[HasAVX, NoBWI]>;
+def : Pat<(atomic_store_16
+ (i16 (trunc (i32 (extractelt
+ (v4i32 (bitconvert (v16i8 VR128X:$src))),
+ (iPTR 0))))),
+ addr:$dst),
+ (VPEXTRWZmri addr:$dst, VR128X:$src, 0)>, Requires<[HasAVX512]>;
+
+// store atomic <2 x i16>, <4 x i8>
+def : Pat<(atomic_store_32
+ (i32 (extractelt
+ (v4i32 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_32
+ (i32 (extractelt
+ (v4i32 (bitconvert (v16i8 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_32
+ (i32 (extractelt
+ (v4i32 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_32
+ (i32 (extractelt
+ (v4i32 (bitconvert (v16i8 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_32
+ (i32 (extractelt
+ (v4i32 (bitconvert (v8i16 VR128X:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPDI2DIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_32
+ (i32 (extractelt
+ (v4i32 (bitconvert (v16i8 VR128X:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPDI2DIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+
+// store atomic <2 x i32,float>, <4 x i16>, <2 x ptr addrspace(270)>
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v4i32 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v4f32 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v4i32 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v4f32 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v4i32 VR128X:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v4f32 VR128X:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_64
+ (i64 (extractelt
+ (v2i64 (bitconvert (v8i16 VR128X:$src))), (iPTR 0))),
+ addr:$dst),
+ (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+
// Floating point loads/stores.
def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst),
(MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>;
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 659cdec91d3e7..91c4d0a3d8c1c 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -353,30 +353,37 @@ define void @store_atomic_vec1_double_align(ptr %x, <1 x double> %v) nounwind {
}
define void @store_atomic_vec2_i8(ptr %x, <2 x i8> %v) {
-; CHECK-SSE-O3-LABEL: store_atomic_vec2_i8:
-; CHECK-SSE-O3: # %bb.0:
-; CHECK-SSE-O3-NEXT: movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT: movw %ax, (%rdi)
-; CHECK-SSE-O3-NEXT: retq
+; CHECK-SSE2-O3-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE2-O3: # %bb.0:
+; CHECK-SSE2-O3-NEXT: movd %xmm0, %eax
+; CHECK-SSE2-O3-NEXT: movw %ax, (%rdi)
+; CHECK-SSE2-O3-NEXT: retq
+;
+; CHECK-SSE4-O3-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE4-O3: # %bb.0:
+; CHECK-SSE4-O3-NEXT: pextrw $0, %xmm0, (%rdi)
+; CHECK-SSE4-O3-NEXT: retq
;
; CHECK-AVX-O3-LABEL: store_atomic_vec2_i8:
; CHECK-AVX-O3: # %bb.0:
-; CHECK-AVX-O3-NEXT: vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT: movw %ax, (%rdi)
+; CHECK-AVX-O3-NEXT: vpextrw $0, %xmm0, (%rdi)
; CHECK-AVX-O3-NEXT: retq
;
-; CHECK-SSE-O0-LABEL: store_atomic_vec2_i8:
-; CHECK-SSE-O0: # %bb.0:
-; CHECK-SSE-O0-NEXT: movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT: # kill: def $ax killed $ax killed $eax
-; CHECK-SSE-O0-NEXT: movw %ax, (%rdi)
-; CHECK-SSE-O0-NEXT: retq
+; CHECK-SSE2-O0-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE2-O0: # %bb.0:
+; CHECK-SSE2-O0-NEXT: movd %xmm0, %eax
+; CHECK-SSE2-O0-NEXT: # kill: def $ax killed $ax killed $eax
+; CHECK-SSE2-O0-NEXT: movw %ax, (%rdi)
+; CHECK-SSE2-O0-NEXT: retq
+;
+; CHECK-SSE4-O0-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE4-O0: # %bb.0:
+; CHECK-SSE4-O0-NEXT: pextrw $0, %xmm0, (%rdi)
+; CHECK-SSE4-O0-NEXT: retq
;
; CHECK-AVX-O0-LABEL: store_atomic_vec2_i8:
; CHECK-AVX-O0: # %bb.0:
-; CHECK-AVX-O0-NEXT: vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT: # kill: def $ax killed $ax killed $eax
-; CHECK-AVX-O0-NEXT: movw %ax, (%rdi)
+; CHECK-AVX-O0-NEXT: vpextrw $0, %xmm0, (%rdi)
; CHECK-AVX-O0-NEXT: retq
store atomic <2 x i8> %v, ptr %x release, align 4
ret void
@@ -385,26 +392,22 @@ define void @store_atomic_vec2_i8(ptr %x, <2 x i8> %v) {
define void @store_atomic_vec2_i16(ptr %x, <2 x i16> %v) {
; CHECK-SSE-O3-LABEL: store_atomic_vec2_i16:
; CHECK-SSE-O3: # %bb.0:
-; CHECK-SSE-O3-NEXT: movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT: movl %eax, (%rdi)
+; CHECK-SSE-O3-NEXT: movss %xmm0, (%rdi)
; CHECK-SSE-O3-NEXT: retq
;
; CHECK-AVX-O3-LABEL: store_atomic_vec2_i16:
; CHECK-AVX-O3: # %bb.0:
-; CHECK-AVX-O3-NEXT: vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT: movl %eax, (%rdi)
+; CHECK-AVX-O3-NEXT: vmovss %xmm0, (%rdi)
; CHECK-AVX-O3-NEXT: retq
;
; CHECK-SSE-O0-LABEL: store_atomic_vec2_i16:
; CHECK-SSE-O0: # %bb.0:
-; CHECK-SSE-O0-NEXT: movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT: movl %eax, (%rdi)
+; CHECK-SSE-O0-NEXT: movd %xmm0, (%rdi)
; CHECK-SSE-O0-NEXT: retq
;
; CHECK-AVX-O0-LABEL: store_atomic_vec2_i16:
; CHECK-AVX-O0: # %bb.0:
-; CHECK-AVX-O0-NEXT: vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT: movl %eax, (%rdi)
+; CHECK-AVX-O0-NEXT: vmovd %xmm0, (%rdi)
; CHECK-AVX-O0-NEXT: retq
store atomic <2 x i16> %v, ptr %x release, align 4
ret void
@@ -413,26 +416,22 @@ define void @store_atomic_vec2_i16(ptr %x, <2 x i16> %v) {
define void @store_atomic_vec2_ptr270(ptr %x, <2 x ptr addrspace(270)> %v) {
; CHECK-SSE-O3-LABEL: store_atomic_vec2_ptr270:
; CHECK-SSE-O3: # %bb.0:
-; CHECK-SSE-O3-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT: movlps %xmm0, (%rdi)
; CHECK-SSE-O3-NEXT: retq
;
; CHECK-AVX-O3-LABEL: store_atomic_vec2_ptr270:
; CHECK-AVX-O3: # %bb.0:
-; CHECK-AVX-O3-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT: vmovlps %xmm0, (%rdi)
; CHECK-AVX-O3-NEXT: retq
;
; CHECK-SSE-O0-LABEL: store_atomic_vec2_ptr270:
; CHECK-SSE-O0: # %bb.0:
-; CHECK-SSE-O0-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT: movq %xmm0, (%rdi)
; CHECK-SSE-O0-NEXT: retq
;
; CHECK-AVX-O0-LABEL: store_atomic_vec2_ptr270:
; CHECK-AVX-O0: # %bb.0:
-; CHECK-AVX-O0-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT: vmovq %xmm0, (%rdi)
; CHECK-AVX-O0-NEXT: retq
store atomic <2 x ptr addrspace(270)> %v, ptr %x release, align 8
ret void
@@ -441,26 +440,22 @@ define void @store_atomic_vec2_ptr270(ptr %x, <2 x ptr addrspace(270)> %v) {
define void @store_atomic_vec2_i32_align(ptr %x, <2 x i32> %v) {
; CHECK-SSE-O3-LABEL: store_atomic_vec2_i32_align:
; CHECK-SSE-O3: # %bb.0:
-; CHECK-SSE-O3-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT: movlps %xmm0, (%rdi)
; CHECK-SSE-O3-NEXT: retq
;
; CHECK-AVX-O3-LABEL: store_atomic_vec2_i32_align:
; CHECK-AVX-O3: # %bb.0:
-; CHECK-AVX-O3-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT: vmovlps %xmm0, (%rdi)
; CHECK-AVX-O3-NEXT: retq
;
; CHECK-SSE-O0-LABEL: store_atomic_vec2_i32_align:
; CHECK-SSE-O0: # %bb.0:
-; CHECK-SSE-O0-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT: movq %xmm0, (%rdi)
; CHECK-SSE-O0-NEXT: retq
;
; CHECK-AVX-O0-LABEL: store_atomic_vec2_i32_align:
; CHECK-AVX-O0: # %bb.0:
-; CHECK-AVX-O0-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT: vmovq %xmm0, (%rdi)
; CHECK-AVX-O0-NEXT: retq
store atomic <2 x i32> %v, ptr %x release, align 8
ret void
@@ -469,26 +464,22 @@ define void @store_atomic_vec2_i32_align(ptr %x, <2 x i32> %v) {
define void @store_atomic_vec2_float_align(ptr %x, <2 x float> %v) {
; CHECK-SSE-O3-LABEL: store_atomic_vec2_float_align:
; CHECK-SSE-O3: # %bb.0:
-; CHECK-SSE-O3-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT: movlps %xmm0, (%rdi)
; CHECK-SSE-O3-NEXT: retq
;
; CHECK-AVX-O3-LABEL: store_atomic_vec2_float_align:
; CHECK-AVX-O3: # %bb.0:
-; CHECK-AVX-O3-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT: vmovlps %xmm0, (%rdi)
; CHECK-AVX-O3-NEXT: retq
;
; CHECK-SSE-O0-LABEL: store_atomic_vec2_float_align:
; CHECK-SSE-O0: # %bb.0:
-; CHECK-SSE-O0-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT: movq %xmm0, (%rdi)
; CHECK-SSE-O0-NEXT: retq
;
; CHECK-AVX-O0-LABEL: store_atomic_vec2_float_align:
; CHECK-AVX-O0: # %bb.0:
-; CHECK-AVX-O0-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT: vmovq %xmm0, (%rdi)
; CHECK-AVX-O0-NEXT: retq
store atomic <2 x float> %v, ptr %x release, align 8
ret void
@@ -497,26 +488,22 @@ define void @store_atomic_vec2_float_align(ptr %x, <2 x float> %v) {
define void @store_atomic_vec4_i8(ptr %x, <4 x i8> %v) nounwind {
; CHECK-SSE-O3-LABEL: store_atomic_vec4_i8:
; CHECK-SSE-O3: # %bb.0:
-; CHECK-SSE-O3-NEXT: movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT: movl %eax, (%rdi)
+; CHECK-SSE-O3-NEXT: movss %xmm0, (%rdi)
; CHECK-SSE-O3-NEXT: retq
;
; CHECK-AVX-O3-LABEL: store_atomic_vec4_i8:
; CHECK-AVX-O3: # %bb.0:
-; CHECK-AVX-O3-NEXT: vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT: movl %eax, (%rdi)
+; CHECK-AVX-O3-NEXT: vmovss %xmm0, (%rdi)
; CHECK-AVX-O3-NEXT: retq
;
; CHECK-SSE-O0-LABEL: store_atomic_vec4_i8:
; CHECK-SSE-O0: # %bb.0:
-; CHECK-SSE-O0-NEXT: movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT: movl %eax, (%rdi)
+; CHECK-SSE-O0-NEXT: movd %xmm0, (%rdi)
; CHECK-SSE-O0-NEXT: retq
;
; CHECK-AVX-O0-LABEL: store_atomic_vec4_i8:
; CHECK-AVX-O0: # %bb.0:
-; CHECK-AVX-O0-NEXT: vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT: movl %eax, (%rdi)
+; CHECK-AVX-O0-NEXT: vmovd %xmm0, (%rdi)
; CHECK-AVX-O0-NEXT: retq
store atomic <4 x i8> %v, ptr %x release, align 4
ret void
@@ -525,26 +512,22 @@ define void @store_atomic_vec4_i8(ptr %x, <4 x i8> %v) nounwind {
define void @store_atomic_vec4_i16(ptr %x, <4 x i16> %v) nounwind {
; CHECK-SSE-O3-LABEL: store_atomic_vec4_i16:
; CHECK-SSE-O3: # %bb.0:
-; CHECK-SSE-O3-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT: movlps %xmm0, (%rdi)
; CHECK-SSE-O3-NEXT: retq
;
; CHECK-AVX-O3-LABEL: store_atomic_vec4_i16:
; CHECK-AVX-O3: # %bb.0:
-; CHECK-AVX-O3-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT: vmovlps %xmm0, (%rdi)
; CHECK-AVX-O3-NEXT: retq
;
; CHECK-SSE-O0-LABEL: store_atomic_vec4_i16:
; CHECK-SSE-O0: # %bb.0:
-; CHECK-SSE-O0-NEXT: movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT: movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT: movq %xmm0, (%rdi)
; CHECK-SSE-O0-NEXT: retq
;
; CHECK-AVX-O0-LABEL: store_atomic_vec4_i16:
; CHECK-AVX-O0: # %bb.0:
-; CHECK-AVX-O0-NEXT: vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT: movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT: vmovq %xmm0, (%rdi)
; CHECK-AVX-O0-NEXT: retq
store atomic <4 x i16> %v, ptr %x release, align 8
ret void
    (VMOVAPDZ128rm addr:$src)>, Requires<[HasAVX512]>;

// store atomic <2 x i8>
def : Pat<(atomic_store_16
Are there existing non-atomic store patterns for this? This could do a better job of avoiding duplication (i.e., there should be a PatFrag that covers both the atomic and non-atomic cases).
It should be good now.
876a35f to 740f199
ac8361d to a730eaf
def extloadv16f16 : PatFrag<(ops node:$ptr), (extloadvf16 node:$ptr)>;

// Matches either 'store' or 'atomic_store' (any alignment, any ordering).
def memstore : PatFrags<(ops node:$val, node:$ptr),
If you're adding atomic store support to alignedstore, would unalignedstore be better here for equivalence?
unalignedstore sounds good.
The name isn't clear, and this can go in the generic PatFrags (though that goes for a number of the PatFrags here).
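For illustration, here is a minimal sketch of the kind of combined fragment the review asks for, built with TableGen's PatFrags alternatives (the same mechanism TargetSelectionDAG.td uses to define any_fadd over fadd and strict_fadd). The name store_or_atomic_store_64 is hypothetical and is not the fragment this PR ends up defining. The sketch covers only the 64-bit case, assumes the value-first operand order that atomic_store_64 takes in the patterns above, and has not been checked for overlap with the existing non-atomic MOVPQI2QImr patterns in X86InstrSSE.td.

```
// Hypothetical fragment (illustration only): matches either a plain
// store or a 64-bit atomic store of any ordering. Operands are
// (value, pointer), the order atomic_store_64 takes in the patterns above.
def store_or_atomic_store_64 : PatFrags<(ops node:$val, node:$ptr),
                                        [(store node:$val, node:$ptr),
                                         (atomic_store_64 node:$val, node:$ptr)]>;

// One pattern then covers both the non-atomic and the atomic case,
// instead of writing a separate Pat for each kind of store.
def : Pat<(store_or_atomic_store_64
            (i64 (extractelt (v2i64 (bitconvert (v4i32 VR128:$src))),
                             (iPTR 0))),
            addr:$dst),
          (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
```

Each alternative keeps the predicates of the fragment it names (store stays non-truncating and unindexed, atomic_store_64 keeps its i64 memory type), so a single Pat can stand in for a pair of near-identical store and atomic-store patterns.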
a730eaf to b20b8d4
740f199 to 7637943
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.
b20b8d4 to 16bfe6a
7637943 to f6ebebc
16bfe6a to 7b09891
7b09891 to 1613d11
e0ef9b7 to 63ef83c
1613d11 to 7fb4fcf
63ef83c to fc66de1
RKSimon left a comment:
Please fix merge conflicts.
7fb4fcf to 3681145
3681145 to 149e8c0
149e8c0 to 27aec96
Vector types that aren't widened are split so that a single ATOMIC_STORE is issued for the entire vector at once. This enables SelectionDAG to translate vectors with bfloat and half element types. Store-side counterpart to llvm#165818. Stacked on top of llvm#197619 and below llvm#197861.
This change adds patterns to optimize out an extra MOV present after widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>, <4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, and <2 x ptr addrspace(270)>. Store-side counterpart to #148898. Stacked on top of #197618 and below #197860.