Win64 AVX misalignment bug patch #1209#8317
Conversation
|
You might want to check your git settings, it seems you might not have it setup properly as it is picking up |
|
Another note, your issue tag in the commit message is not pointing to the right repository, it's pointing to issue 1209 in this repo instead of msys2/MSYS2-packages#1209 you might want to move that to the commit message as well |
|
@1480c1 , I used the wrong link for the issue in the comment line of the commit. No idea how to correct that. |
|
you can run |
68b88cd to
6563d8a
Compare
|
This needs to go through upstream (binutils or gcc).I don't know enough to merge this. If you propose this upstream and get feedback we might be able to backport it before the next release though. |
|
It is highly unlikely, that this patch will be accepted be GNU binutils. This patch is a pragmatic approch to cirumvent the unfortunate mismatch between the the Windows 64 ABI and gcc/binutils. It is also unlikely that this mismatch will be handled anytime soon by the gcc developers. The corresonding issues are open for quite some time now. Up to now it is not possible to safely use -mavx for gcc 64-bit on windows due to misalignment segfaults. |
|
I understand that upstreaming things can be frustrating, but we have to make sure that our packages stay maintainable and don't diverge too much from upstream. And with this being a core package we have to be extra careful. |
|
I have asked kai teitz for his opinion on this PR. IMHO not completely broken is better than broken. But let's see. |
|
I tested patching these opcodes from binutils 2.34 on. Practice has shown that using sed scripts is much more stable than using fragile patch files in this specific case. It can also shown, that the binary output compared to the default settings exclusively replaces aligned with unaligned instructions. The questions which should be discussed is: do we want to perform this kind of hack? |
|
Maybe behind an option? |
|
An option for PKBUILD? It would be nice to have an option for ld.exe. However, in this case I have no ideo how to accomplish that. |
|
We can ask people if they agree to the idea for advice. There are enough toolchain experts around here to help with the implementation too, i hope. |
|
Hmm, I thought there were recently patches done to gcc for better support of aligned variables for AVX. |
|
Are there GCC commits for better support of aligned or for unaligned variables? It would be interesting to hear, that unaligned addresses don't segfault with newer gcc's with aligned instructions, but I could not find anything about that. It is said, that there is no significant performance loss on aligned adresses with unaligned compared to aligned instructions. And in case of unaligned addresses bad performance is better than segfaults. So in principle patching gcc is the better approach I agree. Would not help for written assembler code though. |
|
well, it might be that fix at bugzilla/PR target/99234 helped here |
|
I am not sure if it address it in all aspects, I doubt it to some extend, but it should bring things here in a better state also for alignment. Anyway I can guide you through producing a proper binutils/gcc patch, if you like. I have some experience in that area |
|
Since i have no idea how to achieve this goal with GCC patches, i will need all the help i can get. |
|
Here is a quote from Intel: This PR was created according to this recommendation (last paragraph). When GCC creates no longer misaligned code by itself this patch could be removed again. |
|
Having said this, if this PR will not be accepted as it is I will close it. Patches of different kind to GCC and/or binutils should land in another PR IMHO. |
|
Hey Carl,
I attached a patch, which should address this issue for mingw targets. The
patch is against current head of gcc, and it is just for testing purposes.
For finalizing this patch, we will need to adjust tests about mova/movu
issues for mingw (maybe cygwin too?), and add general tests for the new
options -mno-align-vector-insn and -malign-vector-insn.
Regards,
Kai
…On Tue, Apr 20, 2021 at 11:16 AM carlkl ***@***.***> wrote:
Having said this, if this PR will not be accepted as it is I will close
it. Alternative patches to GCC should land in another PR IMHO.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#8317 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ALJLMJCLIHO5C5G7A4JXWR3TJVA6RANCNFSM42U3S7SA>
.
|
|
@katietz, did I missed something? There is no attachment as far I can see. |
|
Patch file for gcc |
|
Hmm |
|
Could you send me an Email address... I can't attach the Patch file |
|
Perhaps you could try uploading it to gist.github.com and sending a link to it? |
|
The patch works with a small change (see comment in the gist) |
|
Applying this patch doesn't fix the bug at all. |
|
@MehdiChinoune, could you explain this a little bit more specific? Diid you apply the alternative @katietz gcc patch or the binutils opcode patch? Can you give an example where your bug applies? In theory the binutils patch should avoid Segfaults, not introduce new ones. |
I have applied the binutils opcode. nothing has changed still segfaults with avx and avx2 |
|
@MehdiChinoune, you first applied this patch to binutils, then (2) compiled binutils with this patch and then (3) installed your newly created binutils package via pacman to your Msys2 environment, right? |
Yes, Of course. I am not new to MSYS2/MinGW64. |
|
However, without further informations it will not be possible to investigate further into the bug you found. |
// compile command: gcc -O0 -m64 -mavx -o bug.exe bug.c
typedef double v4d __attribute__ ((vector_size (32), aligned (32)));
v4d f (v4d x)
{
return x;
}
int main ()
{
v4d x = { 1.0, 2.0, 3.0, 4.0, };
v4d r = f (x);
}
Is this difficult to do! |
|
My test with the binutils patch shows this: I didn't encounter segfaults with bug.exe in case of patched binutils. Unpatched I get: with segfaults |
I think you are still using binutils 2.36, right! - sed -i '/vmovapd/s/0x6628/0x6610/g;/vmovaps/s/0x28/0x10/g;/vmovdqa/s/0x666f/0xf36f/g;/vmovdqa/s/0x666F/0xF36F/g' opcodes/i386-tbl.h
+ sed -i '/vmovapd/s/0x28/0x10/g;/vmovaps/s/0x28/0x10/g;/vmovdqa/s/0x666f/0xf36f/g;/vmovdqa/s/0x666F/0xF36F/g' opcodes/i386-tbl.hbinutils have |
|
It is correct, the patch should be updated to the recent version of binutils that is used right now by Msys2. I test this I soon I can. Feel free to test yourself and try again.
EDIT: binutils-2.3.7 i386-opc.tbl: The |
Did you read my comment? The line that should be fixed is the one for opcodes/i386-tbl.h Anyway, After applying the modified lines, Things look good. the example no longer segfaults. |
|
I see; I will update the PR for binutils-2.3.7. Good to here that it works for you with this simple example. I am looking forward for measures for real live examples. |
|
While the binutils patch fix that little example, other applications segfaults. |
|
Qt5/Qt6 Applications no longer crashes after applying the gcc patch. |
|
@MehdiChinoune, congrats to this success. As you tested Kai's patch do you want to create a new PR with reference to the original gist? Did you test with Kai's patch only or with the binutils patch as well? |
I don't want to mess with something I don't understand. The compiler is a critical component of the subsystem.
without binutils patch. |
|
If you want to maintain Kai's patch please let me know. (EDIT: Please ignore, I didn't read your last message before sending this comment.) |
Fell free to close it. |
|
patching opcodes in GCC's code generation is the better approach. |
FYI, this code still segfaults when built/run under msys2. |
MINGW64 and UCRT64: msys2/MSYS2-packages#1209
Unconditionally replaces the opcodes for aligned AVX instructions with the opcodes of matching unaligned instructions in the assembler stage. There should be no loss of performance for aligned memory accesses on moderate modern processors. Unaligned memory access will not segfault anymore.
So-called not-temporal streaming instructions are not handled in this PR as they have no unaligned counterpart. These instructions are used much less commonly used. Falling back to SSE2 streaming instructions is the recommended approach for this use case.