[ARM64] Clobbering x0-x15 Breaks Down on Clang #421
Closed
Dear BLIS developers,
This seems to be a duplicate of #207. Sorry for opening another issue.
The current arm64 kernels and the new armsve kernel work well with GCC but break under Clang with:
kernels/armv8a/3/bli_gemm_armv8a_opt_4x4.c:73:1: fatal error: inline assembly requires more registers than available.
Toolchain:
- LLVM 6.0 + Clang, or
- ARM Allinea Studio 20.1
Possible Reason:
Perhaps the number of input registers plus the number of clobbered registers becomes larger than 30?
Workaround:
Comment out the clobbering of x0-x15. Under the AArch64 calling convention these general-purpose registers are caller-saved scratch registers, and the results of the C code between function entry and the __asm__ volatile block are saved to memory and passed in as input registers. There should be no hazard in doing this.
e.g.
kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c:1068:
:// Register clobber list
> // "x0", "x1", "x2","x3","x4",
> // "x5", "x6", "x7", "x8",
> // "x9", "x10","x11","x12",
> // "x13","x14","x15",
"x16","x17","x18","x19",
"x20","x21","x22","x23",
"x24","x25","x26","x27",
"v0", "v1", "v2", "v3",
"v4", "v5", "v6", "v7",
"v8", "v9", "v10","v11",
"v12","v13","v14","v15",
"v16","v17","v18","v19",
"v20","v21","v22","v23",
"v24","v25","v26","v27",
"v28","v29","v30","v31"
...
kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c:2061:
:// Register clobber list
> //"x0","x1","x2","x3",
> //"x4","x5","x6",
> //"x7","x8","x9",
> //"x10","x11","x12","x13","x14",
"x16","x17",
"x20","x21","x22","x23","x24","x25","x26",
"x27",
"v0","v1","v2",
"v3","v4","v5",
"v6","v7","v8",
"v9","v10","v11",
"v12","v13","v14",
"v15","v16","v17","v18","v19",
"v20","v21","v22","v23",
"v24","v25","v26","v27",
"v28","v29","v30","v31"
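As a rough sanity check of the register-count hypothesis above, the arithmetic can be sketched in C. This is only an illustration under stated assumptions: AArch64 has 31 general-purpose registers (x0 through x30), the first clobber list quoted above originally named x0 through x27 (28 registers) and names 12 after the workaround, and the operand count of 10 used in the usage note is hypothetical, not taken from the kernel. The helper names are made up for this sketch, and the real compiler budget may be smaller because some registers are reserved.

```c
#include <assert.h>

/* AArch64 exposes 31 general-purpose registers, x0 through x30. */
enum { AARCH64_GPR_COUNT = 31 };

/* General-purpose registers left for the compiler to place register
 * operands in once the clobber list has reserved `gpr_clobbers` of them.
 * Assumes every clobber names a distinct register and ignores registers
 * the compiler keeps for itself (frame pointer, link register, ...),
 * so this is an optimistic upper bound. */
static int gprs_left_for_operands(int gpr_clobbers)
{
    assert(gpr_clobbers >= 0 && gpr_clobbers <= AARCH64_GPR_COUNT);
    return AARCH64_GPR_COUNT - gpr_clobbers;
}

/* Returns 1 if `register_operands` operands fit in what remains,
 * 0 otherwise. */
static int operands_fit(int gpr_clobbers, int register_operands)
{
    return register_operands <= gprs_left_for_operands(gpr_clobbers);
}
```

With the original list, `gprs_left_for_operands(28)` leaves only 3 registers, so a hypothetical 10 register operands would not fit; with x0-x15 commented out, `gprs_left_for_operands(12)` leaves 19 and the same operands fit. This is consistent with the error message, but it does not prove that this is how Clang accounts for registers.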