
[X86] Remove extra MOV after widening atomic store #197619

Merged
jofrn merged 1 commit into main from users/jofrn/x86-remove-extra-mov-atomic-store
Jun 1, 2026


@jofrn
Contributor

@jofrn jofrn commented May 14, 2026

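This change adds patterns to optimize out an extra MOV present after
widening the atomic store: rather than copying the low lanes of the XMM
register into a general-purpose register and storing from there, the
value is stored straight from the XMM register (the patterns select
PEXTRW, MOVD, or MOVQ with a memory destination, or their VEX/EVEX
forms). Covers <2 x i8> (SSE4.1+), <2 x i16>, <4 x i8>, <2 x i32>,
<2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.

Store-side counterpart to #148898. Stacked on top of #197618 and below #197860.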

@llvmorg-github-actions

llvmorg-github-actions Bot commented May 14, 2026

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: jofrn

Changes

This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers `<2 x i8>` (SSE4.1+), `<2 x i16>`,
`<4 x i8>`, `<2 x i32>`, `<2 x float>`, `<4 x i16>`,
`<2 x ptr addrspace(270)>`.

Store-side counterpart to #148898. Stacked on top of #197618.


Full diff: https://github.com/llvm/llvm-project/pull/197619.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrCompiler.td (+99)
  • (modified) llvm/test/CodeGen/X86/atomic-load-store.ll (+47-64)
diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td
index 6ab6f870f1bb8..b2a7bce8d7571 100644
--- a/llvm/lib/Target/X86/X86InstrCompiler.td
+++ b/llvm/lib/Target/X86/X86InstrCompiler.td
@@ -1242,6 +1242,105 @@ def : Pat<(v4i32 (atomic_load_128_v4i32 addr:$src)),
 def : Pat<(v4i32 (atomic_load_128_v4i32 addr:$src)),
           (VMOVAPDZ128rm addr:$src)>, Requires<[HasAVX512]>;
 
+// store atomic <2 x i8>
+def : Pat<(atomic_store_16
+            (i16 (trunc (i32 (extractelt
+                               (v4i32 (bitconvert (v16i8 VR128:$src))),
+                               (iPTR 0))))),
+            addr:$dst),
+          (PEXTRWmri addr:$dst, VR128:$src, 0)>, Requires<[UseSSE41]>;
+def : Pat<(atomic_store_16
+            (i16 (trunc (i32 (extractelt
+                               (v4i32 (bitconvert (v16i8 VR128:$src))),
+                               (iPTR 0))))),
+            addr:$dst),
+          (VPEXTRWmri addr:$dst, VR128:$src, 0)>, Requires<[HasAVX, NoBWI]>;
+def : Pat<(atomic_store_16
+            (i16 (trunc (i32 (extractelt
+                               (v4i32 (bitconvert (v16i8 VR128X:$src))),
+                               (iPTR 0))))),
+            addr:$dst),
+          (VPEXTRWZmri addr:$dst, VR128X:$src, 0)>, Requires<[HasAVX512]>;
+
+// store atomic <2 x i16>, <4 x i8>
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v16i8 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v16i8 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v8i16 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_32
+            (i32 (extractelt
+                   (v4i32 (bitconvert (v16i8 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPDI2DIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+
+// store atomic <2 x i32,float>, <4 x i16>, <2 x ptr addrspace(270)>
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4i32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4f32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (MOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4i32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4f32 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QImr addr:$dst, VR128:$src)>, Requires<[UseAVX]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4i32 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v4f32 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+def : Pat<(atomic_store_64
+            (i64 (extractelt
+                   (v2i64 (bitconvert (v8i16 VR128X:$src))), (iPTR 0))),
+            addr:$dst),
+          (VMOVPQI2QIZmr addr:$dst, VR128X:$src)>, Requires<[HasAVX512]>;
+
 // Floating point loads/stores.
 def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst),
           (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>;
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 659cdec91d3e7..91c4d0a3d8c1c 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -353,30 +353,37 @@ define void @store_atomic_vec1_double_align(ptr %x, <1 x double> %v) nounwind {
 }
 
 define void @store_atomic_vec2_i8(ptr %x, <2 x i8> %v) {
-; CHECK-SSE-O3-LABEL: store_atomic_vec2_i8:
-; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT:    movw %ax, (%rdi)
-; CHECK-SSE-O3-NEXT:    retq
+; CHECK-SSE2-O3-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE2-O3:       # %bb.0:
+; CHECK-SSE2-O3-NEXT:    movd %xmm0, %eax
+; CHECK-SSE2-O3-NEXT:    movw %ax, (%rdi)
+; CHECK-SSE2-O3-NEXT:    retq
+;
+; CHECK-SSE4-O3-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE4-O3:       # %bb.0:
+; CHECK-SSE4-O3-NEXT:    pextrw $0, %xmm0, (%rdi)
+; CHECK-SSE4-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_i8:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT:    movw %ax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vpextrw $0, %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
-; CHECK-SSE-O0-LABEL: store_atomic_vec2_i8:
-; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT:    # kill: def $ax killed $ax killed $eax
-; CHECK-SSE-O0-NEXT:    movw %ax, (%rdi)
-; CHECK-SSE-O0-NEXT:    retq
+; CHECK-SSE2-O0-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE2-O0:       # %bb.0:
+; CHECK-SSE2-O0-NEXT:    movd %xmm0, %eax
+; CHECK-SSE2-O0-NEXT:    # kill: def $ax killed $ax killed $eax
+; CHECK-SSE2-O0-NEXT:    movw %ax, (%rdi)
+; CHECK-SSE2-O0-NEXT:    retq
+;
+; CHECK-SSE4-O0-LABEL: store_atomic_vec2_i8:
+; CHECK-SSE4-O0:       # %bb.0:
+; CHECK-SSE4-O0-NEXT:    pextrw $0, %xmm0, (%rdi)
+; CHECK-SSE4-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_i8:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT:    # kill: def $ax killed $ax killed $eax
-; CHECK-AVX-O0-NEXT:    movw %ax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vpextrw $0, %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x i8> %v, ptr %x release, align 4
   ret void
@@ -385,26 +392,22 @@ define void @store_atomic_vec2_i8(ptr %x, <2 x i8> %v) {
 define void @store_atomic_vec2_i16(ptr %x, <2 x i16> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_i16:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movss %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_i16:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovss %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_i16:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movd %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_i16:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovd %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x i16> %v, ptr %x release, align 4
   ret void
@@ -413,26 +416,22 @@ define void @store_atomic_vec2_i16(ptr %x, <2 x i16> %v) {
 define void @store_atomic_vec2_ptr270(ptr %x, <2 x ptr addrspace(270)> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_ptr270:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x ptr addrspace(270)> %v, ptr %x release, align 8
   ret void
@@ -441,26 +440,22 @@ define void @store_atomic_vec2_ptr270(ptr %x, <2 x ptr addrspace(270)> %v) {
 define void @store_atomic_vec2_i32_align(ptr %x, <2 x i32> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_i32_align:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x i32> %v, ptr %x release, align 8
   ret void
@@ -469,26 +464,22 @@ define void @store_atomic_vec2_i32_align(ptr %x, <2 x i32> %v) {
 define void @store_atomic_vec2_float_align(ptr %x, <2 x float> %v) {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec2_float_align:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec2_float_align:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec2_float_align:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec2_float_align:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <2 x float> %v, ptr %x release, align 8
   ret void
@@ -497,26 +488,22 @@ define void @store_atomic_vec2_float_align(ptr %x, <2 x float> %v) {
 define void @store_atomic_vec4_i8(ptr %x, <4 x i8> %v) nounwind {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec4_i8:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movss %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec4_i8:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O3-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovss %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec4_i8:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movd %xmm0, %eax
-; CHECK-SSE-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movd %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec4_i8:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovd %xmm0, %eax
-; CHECK-AVX-O0-NEXT:    movl %eax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovd %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <4 x i8> %v, ptr %x release, align 4
   ret void
@@ -525,26 +512,22 @@ define void @store_atomic_vec4_i8(ptr %x, <4 x i8> %v) nounwind {
 define void @store_atomic_vec4_i16(ptr %x, <4 x i16> %v) nounwind {
 ; CHECK-SSE-O3-LABEL: store_atomic_vec4_i16:
 ; CHECK-SSE-O3:       # %bb.0:
-; CHECK-SSE-O3-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O3-NEXT:    movlps %xmm0, (%rdi)
 ; CHECK-SSE-O3-NEXT:    retq
 ;
 ; CHECK-AVX-O3-LABEL: store_atomic_vec4_i16:
 ; CHECK-AVX-O3:       # %bb.0:
-; CHECK-AVX-O3-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O3-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O3-NEXT:    vmovlps %xmm0, (%rdi)
 ; CHECK-AVX-O3-NEXT:    retq
 ;
 ; CHECK-SSE-O0-LABEL: store_atomic_vec4_i16:
 ; CHECK-SSE-O0:       # %bb.0:
-; CHECK-SSE-O0-NEXT:    movq %xmm0, %rax
-; CHECK-SSE-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-SSE-O0-NEXT:    movq %xmm0, (%rdi)
 ; CHECK-SSE-O0-NEXT:    retq
 ;
 ; CHECK-AVX-O0-LABEL: store_atomic_vec4_i16:
 ; CHECK-AVX-O0:       # %bb.0:
-; CHECK-AVX-O0-NEXT:    vmovq %xmm0, %rax
-; CHECK-AVX-O0-NEXT:    movq %rax, (%rdi)
+; CHECK-AVX-O0-NEXT:    vmovq %xmm0, (%rdi)
 ; CHECK-AVX-O0-NEXT:    retq
   store atomic <4 x i16> %v, ptr %x release, align 8
   ret void

@github-actions

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.

Review thread on llvm/lib/Target/X86/X86InstrCompiler.td (outdated)
(VMOVAPDZ128rm addr:$src)>, Requires<[HasAVX512]>;

// store atomic <2 x i8>
def : Pat<(atomic_store_16
Contributor


Are there existing non-store patterns for this? We can do a better job of avoiding duplication (i.e., there should be a PatFrag that covers the atomic and non-atomic cases).

Contributor Author


It should be good now.

@jofrn jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 876a35f to 740f199 May 16, 2026 13:00
@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from ac8361d to a730eaf May 16, 2026 13:00
def extloadv16f16 : PatFrag<(ops node:$ptr), (extloadvf16 node:$ptr)>;

// Matches either 'store' or 'atomic_store' (any alignment, any ordering).
def memstore : PatFrags<(ops node:$val, node:$ptr),
Contributor


If you're adding atomic store support to alignedstore, would unalignedstore be better here for equivalence?

Contributor Author


unalignedstore sounds good.

def extloadv16f16 : PatFrag<(ops node:$ptr), (extloadvf16 node:$ptr)>;

// Matches either 'store' or 'atomic_store' (any alignment, any ordering).
def memstore : PatFrags<(ops node:$val, node:$ptr),
Contributor


The name isn't clear, and this can go in the generic PatFrags (though that goes for a number of the PatFrags here).

Contributor Author


Contributor


maybe any_store_<size>?
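A fragment along the lines of the suggested any_store_<size> could look roughly like the following. This is a minimal sketch, not the code that was merged: the name any_store_32 is hypothetical, and the definition assumes the value-first operand order that the atomic_store_N fragments use in the X86InstrCompiler.td diff. It follows the usual PatFrags idiom of listing alternative patterns so that one Pat matches either form.

```tablegen
// Hypothetical sketch (not the merged code): match either a plain 32-bit
// store or a 32-bit atomic store of the same value, so a single Pat can
// cover both. Operand order (value, then pointer) follows atomic_store_32
// as used in the X86InstrCompiler.td diff.
def any_store_32 : PatFrags<(ops node:$val, node:$ptr),
                            [(store node:$val, node:$ptr),
                             (atomic_store_32 node:$val, node:$ptr)]>;

// Example use, mirroring the <2 x i16> pattern from the diff: a single Pat
// would then match both the plain and the atomic form of the store.
def : Pat<(any_store_32
            (i32 (extractelt
                   (v4i32 (bitconvert (v8i16 VR128:$src))), (iPTR 0))),
            addr:$dst),
          (MOVPDI2DImr addr:$dst, VR128:$src)>, Requires<[UseSSE2]>;
```

With a fragment like this, the plain-store and atomic-store variants of each extract-and-store pattern would not need to be written out separately.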

@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from a730eaf to b20b8d4 May 19, 2026 04:46
@jofrn jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 740f199 to 7637943 May 19, 2026 04:46
@llvmorg-github-actions llvmorg-github-actions Bot added llvm:SelectionDAG SelectionDAGISel as well labels May 19, 2026
@github-actions

github-actions Bot commented May 19, 2026

🐧 Linux x64 Test Results

  • 195610 tests passed
  • 5239 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions Bot commented May 19, 2026

🪟 Windows x64 Test Results

  • 134936 tests passed
  • 3303 tests skipped

✅ The build succeeded and all tests passed.

@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from b20b8d4 to 16bfe6a May 19, 2026 08:08
@jofrn jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 7637943 to f6ebebc May 19, 2026 13:28
@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 16bfe6a to 7b09891 May 19, 2026 13:28
@jofrn jofrn requested review from RKSimon and arsenm May 20, 2026 21:40
@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 7b09891 to 1613d11 May 20, 2026 21:49
@jofrn jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from e0ef9b7 to 63ef83c May 20, 2026 21:59
@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 1613d11 to 7fb4fcf May 20, 2026 21:59
@jofrn jofrn force-pushed the users/jofrn/widen-vec-atomic-store branch from 63ef83c to fc66de1 May 28, 2026 02:10
Contributor

@RKSimon RKSimon left a comment


please fix merge conflicts

@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 7fb4fcf to 3681145 May 28, 2026 21:36
@jofrn jofrn requested a review from RKSimon May 29, 2026 06:12
Base automatically changed from users/jofrn/widen-vec-atomic-store to main June 1, 2026 20:41
jofrn added a commit that referenced this pull request Jun 1, 2026
Vector types of 2 elements must be widened. This change does so for
atomic stores of vector type in SelectionDAG so that it can translate
aligned vectors of size greater than 1.

Store-side counterpart to #148897. Stacked on top of #197166 and below
#197619.
@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 3681145 to 149e8c0 June 1, 2026 20:49
This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
@jofrn jofrn force-pushed the users/jofrn/x86-remove-extra-mov-atomic-store branch from 149e8c0 to 27aec96 June 1, 2026 20:51
@jofrn jofrn enabled auto-merge (squash) June 1, 2026 20:55
@jofrn jofrn merged commit 9e29f7d into main Jun 1, 2026
9 of 10 checks passed
@jofrn jofrn deleted the users/jofrn/x86-remove-extra-mov-atomic-store branch June 1, 2026 21:33
jofrn added a commit that referenced this pull request Jun 2, 2026
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with bfloat and half element types.

Store-side counterpart to #165818. Stacked on top of #197619 and below
#197861.
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Jun 5, 2026
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with bfloat and half element types.

Store-side counterpart to llvm#165818. Stacked on top of llvm#197619 and below
llvm#197861.

Labels

backend:X86 llvm:SelectionDAG SelectionDAGISel as well

3 participants