[X86] Remove extra MOV after widening atomic store by jofrn · Pull Request #197619 · llvm/llvm-project

jofrn · 2026-05-14T05:29:19Z

This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>,
<2 x ptr addrspace(270)>.

Store-side counterpart to #148898. Stacked on top of #197618 and below #197860.

llvmorg-github-actions · 2026-05-14T05:29:52Z

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: jofrn

Full diff: https://github.com/llvm/llvm-project/pull/197619.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86InstrCompiler.td (+99)
(modified) llvm/test/CodeGen/X86/atomic-load-store.ll (+47-64)

diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td
index 6ab6f870f1bb8..b2a7bce8d7571 100644
--- a/llvm/lib/Target/X86/X86InstrCompiler.td
+++ b/llvm/lib/Target/X86/X86InstrCompiler.td
@@ -1242,6 +1242,105 @@ def : Pat<(v4i32 (atomic_load_128_v4i32 addr:$src)),
 def : Pat<(v4i32 (atomic_load_128_v4i32 addr:$src)),
           (VMOVAPDZ128rm addr:$src)>, Requires<[HasAVX512]>;
 
+// store atomic <2 x i8>
+def : Pat<(atomic_store_16
+            (i16 (trunc (i32 (extractelt
+                               (v4i32 (bitconvert (v16i8 VR128:$src))),
+                               (iPTR 0))))),
+            addr:$dst),
+          (PEXTRWmri addr:$dst, VR128:$src, 0)>, Requires<[UseSSE41]>;
+def : Pat<(atomic_store_16
+            (i16 (trunc (i32 (extractelt
+                               (v4i32 (bitconvert (v16i8 VR128:$src))),
+                               (iPTR 0))))),
+            addr:$dst),
+          (VPEXTRWmri addr:$dst, VR128:$src, 0)>, Requires<[HasAVX, NoBWI]>;
+def : Pat<(atomic_store_16
+            (i16 (trunc (i32 (extractelt
+                               (v4i32 (bitconvert (v16i8 VR128X:$src))),
+                               (iPTR 0))))),
+            addr:$dst),
+          (VPEXTRWZmri addr:$dst, VR128X:$src, 0)>, Requires<[HasAVX512]>;
+
+// store atomic <2 x i16>, <4 x i8>
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v16i8 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v16i8 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v8i16 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v16i8 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+
+// store atomic <2 x i32,float>, <4 x i16>, <2 x ptr addrspace(270)>
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4i32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4f32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4i32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4f32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4i32 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4f32 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v8i16 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+
 // Floating point loads/stores.
 def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst),
           (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>;
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 659cdec91d3e7..91c4d0a3d8c1c 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -353,30 +353,37 @@ define void @store_atomic_vec1_double_align(ptr %x, <1 x double> %v) nounwind {
 }
 
 define void @store_atomic_vec2_i8(ptr %x, <2 x i8> %v) {
-; CHECK-SSE-O3-LABEL: store_atomic_vec2_i8:
-; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT:    movw %ax, (%rdi)
-; CHECK-SSE-O3-NEXT:    retq
+; CHECK-SSE2-O3-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE2-O3:       # %bb.0:
+; CHECK-SSE2-O3-NEXT:    movd %xmm0, %eax
+; CHECK-SSE2-O3-NEXT:    movw %ax, (%rdi)
+; CHECK-SSE2-O3-NEXT:    retq
+;
+; CHECK-SSE4-O3-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE4-O3:       # %bb.0:
+; CHECK-SSE4-O3-NEXT:    pextrw $0, %xmm0, (%rdi)
+; CHECK-SSE4-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_i8:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT:    movw %ax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vpextrw $0, %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
-; CHECK-SSE-O0-LABEL: store_atomic_vec2_i8:
-; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT:    # kill: def $ax killed $ax killed $eax
-; CHECK-SSE-O0-NEXT:    movw %ax, (%rdi)
-; CHECK-SSE-O0-NEXT:    retq
+; CHECK-SSE2-O0-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE2-O0:       # %bb.0:
+; CHECK-SSE2-O0-NEXT:    movd %xmm0, %eax
+; CHECK-SSE2-O0-NEXT:    # kill: def $ax killed $ax killed $eax
+; CHECK-SSE2-O0-NEXT:    movw %ax, (%rdi)
+; CHECK-SSE2-O0-NEXT:    retq
+;
+; CHECK-SSE4-O0-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE4-O0:       # %bb.0:
+; CHECK-SSE4-O0-NEXT:    pextrw $0, %xmm0, (%rdi)
+; CHECK-SSE4-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_i8:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT:    # kill: def $ax killed $ax killed $eax
-; CHECK-AVX-O0-NEXT:    movw %ax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vpextrw $0, %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x i8> %v, ptr %x release, align 4
   ret void
@@ -385,26 +392,22 @@ define void @store_atomic_vec2_i8(ptr %x, <2 x i8> %v) {
 define void @store_atomic_vec2_i16(ptr %x, <2 x i16> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_i16:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movss %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_i16:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovss %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_i16:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movd %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_i16:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovd %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x i16> %v, ptr %x release, align 4
   ret void
@@ -413,26 +416,22 @@ define void @store_atomic_vec2_i16(ptr %x, <2 x i16> %v) {
 define void @store_atomic_vec2_ptr270(ptr %x, <2 x ptr addrspace(270)> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x ptr addrspace(270)> %v, ptr %x release, align 8
   ret void
@@ -441,26 +440,22 @@ define void @store_atomic_vec2_ptr270(ptr %x, <2 x ptr addrspace(270)> %v) {
 define void @store_atomic_vec2_i32_align(ptr %x, <2 x i32> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x i32> %v, ptr %x release, align 8
   ret void
@@ -469,26 +464,22 @@ define void @store_atomic_vec2_i32_align(ptr %x, <2 x i32> %v) {
 define void @store_atomic_vec2_float_align(ptr %x, <2 x float> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_float_align:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_float_align:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_float_align:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_float_align:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x float> %v, ptr %x release, align 8
   ret void
@@ -497,26 +488,22 @@ define void @store_atomic_vec2_float_align(ptr %x, <2 x float> %v) {
 define void @store_atomic_vec4_i8(ptr %x, <4 x i8> %v) nounwind {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec4_i8:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movss %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec4_i8:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovss %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec4_i8:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movd %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec4_i8:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovd %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <4 x i8> %v, ptr %x release, align 4
   ret void
@@ -525,26 +512,22 @@ define void @store_atomic_vec4_i8(ptr %x, <4 x i8> %v) nounwind {
 define void @store_atomic_vec4_i16(ptr %x, <4 x i16> %v) nounwind {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec4_i16:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec4_i16:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec4_i16:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec4_i16:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <4 x i16> %v, ptr %x release, align 8
   ret void

github-actions · 2026-05-14T05:30:23Z

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.

arsenm · 2026-05-14T10:04:32Z

          (VMOVAPDZ128rm addr:$src)>, Requires<[HasAVX512]>;

+// store atomic <2 x i8>
+def : Pat<(atomic_store_16


Are there existing non-atomic store patterns for this? We can do a better job of avoiding duplication (i.e., there should be a PatFrag that covers both the atomic and non-atomic cases).

jofrn · 2026-05-20

It should be good now.

RKSimon · 2026-05-17T21:46:38Z

 def extloadv16f16 : PatFrag<(ops node:$ptr), (extloadvf16 node:$ptr)>;

+// Matches either 'store' or 'atomic_store' (any alignment, any ordering).
+def memstore : PatFrags<(ops node:$val, node:$ptr),


If you're adding atomic store support to alignedstore, would unalignedstore be better here for equivalence?

jofrn · 2026-05-18

unalignedstore sounds good.

arsenm · 2026-05-18T11:01:00Z

 def extloadv16f16 : PatFrag<(ops node:$ptr), (extloadvf16 node:$ptr)>;

+// Matches either 'store' or 'atomic_store' (any alignment, any ordering).
+def memstore : PatFrags<(ops node:$val, node:$ptr),


The name isn't clear, and this can go in the generic PatFrags (though that goes for a number of the PatFrags here).

jofrn · 2026-05-18

#197619 (comment) okie.

arsenm · 2026-05-18

Maybe `any_store_<size>`?
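
As a rough illustration of the suggestion above, and not the definition that landed in this PR, a combined fragment might look like the TableGen sketch below. The name `any_store_32_sketch` is hypothetical. `PatFrags`, `store`, `atomic_store_32`, and `MOVPDI2DImr` are existing LLVM names, but the form merged upstream may differ, and the `bitconvert` wrappers from the diff are omitted here for brevity.

```tablegen
// Hypothetical sketch only; the name is not taken from the PR.
// Matches either a plain 'store' or a 32-bit 'atomic_store' of a value.
def any_store_32_sketch : PatFrags<(ops node:$val, node:$ptr),
                                   [(store node:$val, node:$ptr),
                                    (atomic_store_32 node:$val, node:$ptr)]>;

// One pattern then selects a direct XMM-to-memory store for both the
// atomic and the non-atomic case, instead of two near-identical patterns.
def : Pat<(any_store_32_sketch
            (i32 (extractelt (v4i32 VR128:$src), (iPTR 0))),
            addr:$dst),
          (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
```

With a fragment of this shape, each store width needs one `Pat` per instruction variant rather than separate atomic and non-atomic copies, which is the deduplication the review asks for.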

github-actions · 2026-05-19T05:17:16Z

🐧 Linux x64 Test Results

195610 tests passed
5239 tests skipped

✅ The build succeeded and all tests passed.

github-actions · 2026-05-19T05:17:16Z

🪟 Windows x64 Test Results

134936 tests passed
3303 tests skipped

✅ The build succeeded and all tests passed.

RKSimon

Please fix merge conflicts.

Description of #197618 ([SelectionDAG] Widen <2 x T> vector types for atomic store): Vector types of 2 elements must be widened. This change widens such vector types for atomic store in SelectionDAG so that aligned vectors with more than one element can be translated. Store-side counterpart to #148897. Stacked on top of #197166 and below #197619.

Description of #197860 ([SelectionDAG] Split vector types for atomic store): Vector types that aren't widened are split so that a single ATOMIC_STORE is issued for the entire vector at once. This enables SelectionDAG to translate vectors of bfloat and half. Store-side counterpart to #165818. Stacked on top of #197619 and below #197861.

llvmorg-github-actions Bot added the backend:X86 label May 14, 2026

jofrn requested review from RKSimon and arsenm May 14, 2026 05:30

jofrn mentioned this pull request May 14, 2026

[SelectionDAG] Widen <2 x T> vector types for atomic store #197618

Merged

arsenm reviewed May 14, 2026

jofrn mentioned this pull request May 15, 2026

[SelectionDAG] Split vector types for atomic store #197860

Merged

jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 876a35f to 740f199 Compare May 16, 2026 13:00

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from ac8361d to a730eaf Compare May 16, 2026 13:00

RKSimon reviewed May 17, 2026

arsenm reviewed May 18, 2026

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from a730eaf to b20b8d4 Compare May 19, 2026 04:46

jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 740f199 to 7637943 Compare May 19, 2026 04:46

llvmorg-github-actions Bot added the llvm:SelectionDAG label May 19, 2026

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from b20b8d4 to 16bfe6a Compare May 19, 2026 08:08

jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 7637943 to f6ebebc Compare May 19, 2026 13:28

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 16bfe6a to 7b09891 Compare May 19, 2026 13:28

jofrn requested review from RKSimon and arsenm May 20, 2026 21:40

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 7b09891 to 1613d11 Compare May 20, 2026 21:49

jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from e0ef9b7 to 63ef83c Compare May 20, 2026 21:59

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 1613d11 to 7fb4fcf Compare May 20, 2026 21:59

jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 63ef83c to fc66de1 Compare May 28, 2026 02:10

RKSimon requested changes May 28, 2026

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 7fb4fcf to 3681145 Compare May 28, 2026 21:36

jofrn requested a review from RKSimon May 29, 2026 06:12

arsenm approved these changes May 29, 2026

RKSimon approved these changes May 29, 2026

Base automatically changed from users/jofrn/widen-vec-atomic-store to main June 1, 2026 20:41

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 3681145 to 149e8c0 Compare June 1, 2026 20:49

[X86] Remove extra MOV after widening atomic store

27aec96

jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 149e8c0 to 27aec96 Compare June 1, 2026 20:51

jofrn enabled auto-merge (squash) June 1, 2026 20:55

jofrn merged commit 9e29f7d into main Jun 1, 2026
9 of 10 checks passed

jofrn deleted the users/jofrn/x86-remove-extra-mov-atomic-store branch June 1, 2026 21:33

adams381 mentioned this pull request Jun 4, 2026

[CIR] Lower constant block addresses for goto #201644

Open
