[x86 & wasm] Split up double saturating-narrows from i32 #7280

rootjalex · 2023-01-16T23:46:36Z

We can get much better codegen for double saturating narrows from i32 on x86 and wasm by splitting them into two single-step saturating narrows. The HVX and ARM backends already do this. Also added tests to simd_op_check.

Fixes #7069

Here's an example from x86 (wasm is similar):

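#include "Halide.h"

using namespace Halide;
using namespace Halide::ConciseCasts;  // provides u8_sat

int main() {
    Var x("x");
    ImageParam in(Int(32), 1);
    Func f("f");

    // Saturating cast straight from i32 to u8 (a "double" narrow).
    f(x) = u8_sat(in(x));
    f.vectorize(x, 32);

    Target x86("x86-64-linux-avx-avx2-fma-sse41");
    f.compile_to_assembly("vpack.asm", f.infer_arguments(), x86);
    return 0;
}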

Previously:

	vpminsd	-96(%rax,%rcx,4), %ymm0, %ymm2
	vpminsd	-64(%rax,%rcx,4), %ymm0, %ymm3
	vpminsd	-32(%rax,%rcx,4), %ymm0, %ymm4
	vpminsd	(%rax,%rcx,4), %ymm0, %ymm5
	vpmaxsd	%ymm1, %ymm2, %ymm2
	vpmaxsd	%ymm1, %ymm3, %ymm3
	vpackusdw	%ymm3, %ymm2, %ymm2
	vpmaxsd	%ymm1, %ymm4, %ymm3
	vpmaxsd	%ymm1, %ymm5, %ymm4
	vpackusdw	%ymm4, %ymm3, %ymm3
	vpermq	$216, %ymm3, %ymm3              # ymm3 = ymm3[0,2,1,3]
	vpermq	$216, %ymm2, %ymm2              # ymm2 = ymm2[0,2,1,3]
	vpackuswb	%ymm3, %ymm2, %ymm2
	vpermq	$216, %ymm2, %ymm2              # ymm2 = ymm2[0,2,1,3]
	vmovdqu	%ymm2, (%r8,%rcx)

Now:

	vmovdqu	-96(%rax,%rcx,4), %ymm0
	vmovdqu	-32(%rax,%rcx,4), %ymm1
	vpackssdw	-64(%rax,%rcx,4), %ymm0, %ymm0
	vpackssdw	(%rax,%rcx,4), %ymm1, %ymm1
	vpermq	$216, %ymm0, %ymm0              # ymm0 = ymm0[0,2,1,3]
	vpermq	$216, %ymm1, %ymm1              # ymm1 = ymm1[0,2,1,3]
	vpackuswb	%ymm1, %ymm0, %ymm0
	vpermq	$216, %ymm0, %ymm0              # ymm0 = ymm0[0,2,1,3]
	vmovdqu	%ymm0, (%r8,%rcx)
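
For intuition on why the split is safe, here is a minimal scalar sketch (not the code in this PR). It models one lane of vpackssdw as a signed i32 -> i16 saturation and one lane of vpackuswb as an i16 -> u8 unsigned saturation, then checks that the composition agrees with a direct clamp to [0, 255]. The function names are hypothetical and chosen only for illustration; std::clamp requires C++17.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Reference: saturate i32 directly to u8 by clamping to [0, 255].
inline uint8_t u8_sat_direct(int32_t v) {
    return static_cast<uint8_t>(std::clamp<int32_t>(v, 0, 255));
}

// Scalar model of one vpackssdw lane: signed saturation i32 -> i16.
inline int16_t i16_sat_from_i32(int32_t v) {
    return static_cast<int16_t>(std::clamp<int32_t>(
        v, std::numeric_limits<int16_t>::min(), std::numeric_limits<int16_t>::max()));
}

// Scalar model of one vpackuswb lane: unsigned saturation i16 -> u8.
inline uint8_t u8_sat_from_i16(int16_t v) {
    return static_cast<uint8_t>(std::clamp<int>(v, 0, 255));
}

// The split form: two single-step saturating narrows.
inline uint8_t u8_sat_two_step(int32_t v) {
    return u8_sat_from_i16(i16_sat_from_i32(v));
}

// Compares both forms over a range that covers every saturation boundary,
// plus the i32 extremes. Returns true if they always agree.
inline bool two_step_matches_direct() {
    for (int32_t v = -70000; v <= 70000; ++v) {
        if (u8_sat_direct(v) != u8_sat_two_step(v)) {
            return false;
        }
    }
    const int32_t extremes[] = {std::numeric_limits<int32_t>::min(),
                                std::numeric_limits<int32_t>::max()};
    for (int32_t v : extremes) {
        if (u8_sat_direct(v) != u8_sat_two_step(v)) {
            return false;
        }
    }
    return true;
}
```

The two forms agree because [0, 255] lies inside the i16 range: the first clamp never moves a value across 0 or 255, so the second clamp sees the same ordering relative to those bounds.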

steven-johnson · 2023-01-17T17:26:22Z

https://buildbot.halide-lang.org/master/#/builders/42/builds/693 looks like perhaps a real failure

steven-johnson

https://buildbot.halide-lang.org/master/#/builders/42/builds/693 looks like a problem

rootjalex · 2023-01-17T17:28:20Z

@steven-johnson I see the same failure on #7279, so I don't think either PR is responsible for it:
https://buildbot.halide-lang.org/master/#/builders/42/builds/692

steven-johnson · 2023-01-17T17:32:48Z

> @steven-johnson I see the same failure on #7279, so I don't think either PR is responsible for the failure https://buildbot.halide-lang.org/master/#/builders/42/builds/692

great, it's probably another LLVM injection :-/

Let me try to confirm that first

steven-johnson · 2023-01-17T18:09:11Z

The predicated-load failure isn't happening for me locally with top-of-tree LLVM, so maybe it's a temporary flake; I'm forcing rebuilds on the x64 bots to see if it recurs

rootjalex · 2023-01-19T16:19:20Z

Only failing test appears unrelated. @steven-johnson think it’s good to go?

steven-johnson · 2023-01-19T17:57:53Z

Failure is vectorized_gpu_allocation, which I've never seen before as a flake or even an ordinary failure, so let me retry it just a bit first.

steven-johnson · 2023-01-19T20:22:34Z

The failure is now in our old friend, correctness_atomics, aka "Mr. Flaky", so I think we're good to go

* better x86 double sat-cast + add test
* fix wasm too + test

Co-authored-by: Steven Johnson <[email protected]>

rootjalex added 2 commits January 16, 2023 12:54

better x86 double sat-cast + add test

2270685

fix wasm too + test

c400ecb

rootjalex requested review from abadams and steven-johnson January 16, 2023 23:46

rootjalex added the performance label Jan 16, 2023

abadams approved these changes Jan 17, 2023

steven-johnson requested changes Jan 17, 2023

steven-johnson and others added 4 commits January 17, 2023 10:24

trigger buildbots

739f8f6

trigger buildbots

97f9d39

trigger buildbots

404e1b5

Merge branch 'rootjalex/x86-double-sat' of github.com:halide/Halide into rootjalex/x86-double-sat

ee2894d

rootjalex requested a review from steven-johnson January 19, 2023 16:19

steven-johnson approved these changes Jan 19, 2023

Merge branch 'main' into rootjalex/x86-double-sat

52d349e

rootjalex merged commit bafd60f into main Jan 20, 2023

rootjalex deleted the rootjalex/x86-double-sat branch January 20, 2023 18:03

ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024

[x86 & wasm] Split up double saturating-narrows from i32 (halide#7280)

3ed06f7
