-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Cranelift: Big performance difference between two cases with the same semantics #7085
Description
Hi, I use the Emscripten to compile the attached test case(flops.fast.c) and run it on Wasmtime. And then I do a very small change(flops.slow.c, only move the location of two printf statements) in this case which don't affect the program semantics. However, the execution time of flops.slow.c is almost 4 slower than that of flops.fast.c (measured by time tool).
flops.fast.c: 0.615sflops.slow.c: 2.395s
And then I run these two cases in WasmEdge and Wamr with AOT mode, and there is almost not difference between the execution time, so I think this bug is not caused by Emscripten.
WasmEdge
flops.fast.c: 0.43sflops.slow.c: 0.43s
Wamr
flops.fast.c: 0.42sflops.slow.c: 0.42s
Try to Debug
I try to increase the number of iterations (the case only contains one loop) from 156250000 to 1562500000, and I find that the execution time will also increase by the same multiple(10), so I guess such small changes will affect the execution time of the loop.
Since Wasmtime will compile the wasm code to binary using Cranelift before execution, I dump the binary using wasmtime compile flops.fast.wasm -o flops.fast.o and check the difference of the loop between such two binary code but I don't find any semantics difference in my eyes.
# flops.fast.c
2f2: 8d 71 01 lea esi,[rcx+0x1]
2f5: c5 4b 59 25 73 04 00 vmulsd xmm12,xmm6,QWORD PTR [rip+0x473] # 770 <wasm[0]::function[3]+0x730>
2fc: 00
2fd: c4 41 53 5e fc vdivsd xmm15,xmm5,xmm12
302: c5 1b 2a e1 vcvtsi2sd xmm12,xmm12,ecx
306: c4 41 03 59 e4 vmulsd xmm12,xmm15,xmm12
30b: c4 c1 1b 59 fc vmulsd xmm7,xmm12,xmm12
310: c5 63 59 e7 vmulsd xmm12,xmm3,xmm7
314: c5 1b 58 e2 vaddsd xmm12,xmm12,xmm2
318: c4 41 43 59 e4 vmulsd xmm12,xmm7,xmm12
31d: c5 1b 58 e0 vaddsd xmm12,xmm12,xmm0
321: c4 41 43 59 e4 vmulsd xmm12,xmm7,xmm12
326: c4 41 1b 58 e6 vaddsd xmm12,xmm12,xmm14
32b: c4 41 43 59 e4 vmulsd xmm12,xmm7,xmm12
330: c4 41 1b 58 e5 vaddsd xmm12,xmm12,xmm13
335: c4 41 43 59 e4 vmulsd xmm12,xmm7,xmm12
33a: f3 44 0f 6f 44 24 10 movdqu xmm8,XMMWORD PTR [rsp+0x10]
341: c4 41 1b 58 e0 vaddsd xmm12,xmm12,xmm8
346: c4 41 43 59 e4 vmulsd xmm12,xmm7,xmm12
34b: c4 41 1b 58 e1 vaddsd xmm12,xmm12,xmm9
350: c4 41 23 58 cc vaddsd xmm9,xmm11,xmm12
355: 81 fe 90 2f 50 09 cmp esi,0x9502f90
35b: 0f 84 08 00 00 00 je 369 <wasm[0]::function[3]+0x329>
361: 48 89 f1 mov rcx,rsi
364: e9 89 ff ff ff jmp 2f2 <wasm[0]::function[3]+0x2b2>
# flops.slow.c
2c5: 8d 7e 01 lea edi,[rsi+0x1]
2c8: c5 1b 2a e6 vcvtsi2sd xmm12,xmm12,esi
2cc: c5 1b 59 25 8c 04 00 vmulsd xmm12,xmm12,QWORD PTR [rip+0x48c] # 760 <wasm[0]::function[3]+0x720>
2d3: 00
2d4: c4 41 1b 59 e4 vmulsd xmm12,xmm12,xmm12
2d9: c4 41 33 59 ec vmulsd xmm13,xmm9,xmm12
2de: c4 41 13 58 e8 vaddsd xmm13,xmm13,xmm8
2e3: c4 41 1b 59 ed vmulsd xmm13,xmm12,xmm13
2e8: c5 13 58 ee vaddsd xmm13,xmm13,xmm6
2ec: c4 41 1b 59 ed vmulsd xmm13,xmm12,xmm13
2f1: c5 13 58 ed vaddsd xmm13,xmm13,xmm5
2f5: c4 41 1b 59 ed vmulsd xmm13,xmm12,xmm13
2fa: c5 13 58 ec vaddsd xmm13,xmm13,xmm4
2fe: c4 41 1b 59 ed vmulsd xmm13,xmm12,xmm13
303: c5 13 58 eb vaddsd xmm13,xmm13,xmm3
307: c4 41 1b 59 e5 vmulsd xmm12,xmm12,xmm13
30c: c4 41 1b 58 d2 vaddsd xmm10,xmm12,xmm10
311: c5 2b 58 15 57 04 00 vaddsd xmm10,xmm10,QWORD PTR [rip+0x457] # 770 <wasm[0]::function[3]+0x730>
318: 00
319: 81 ff 90 2f 50 09 cmp edi,0x9502f90
31f: 0f 84 08 00 00 00 je 32d <wasm[0]::function[3]+0x2ed>
325: 48 89 fe mov rsi,rdi
328: e9 98 ff ff ff jmp 2c5 <wasm[0]::function[3]+0x285>
Steps to Reproduce
- Compile the two C cases into Wasm:
emcc -O2 -s WASM=1 -s TOTAL_MEMORY=512MB flops.fast.c -o flops.fast.wasm
emcc -O2 -s WASM=1 -s TOTAL_MEMORY=512MB flops.slow.c -o flops.slow.wasm
- Run wasm in
Wasmtimeand collect execution time usingtime
time wasmtime flops.fast.wasm
time wasmtime flops.slow.wasm
Versions and Environment
-
Wasmtime: f10d665 -
WasmEdge: 777d8b4ba50567b3b39aa8c9d81a6dfade355af8 -
Wamr: 1ff41ebdc257b45c9f574d8d1f7ee6df4be79645 -
Emscripten:
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.45 (ef3e4e3b044de98e1811546e0bc605c65d3412f4)
clang version 18.0.0 (https://github.com/llvm/llvm-project d1e685df45dc5944b43d2547d0138cd4a3ee4efe)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/ringzzz/emsdk/upstream/bin -
kernel version: Linux 5.15.0-83-generic
