Conversation
See the GCC doc concerning priorities.
This only affects compiling with cpu-specific flags, not anything in the micro-kernel.
power8 follows power9 in just using the reference micro-kernel. Also fix an unused variable warning.
This needs fixing properly somehow, but using -O3 (at least with gcc 8.3),
we get this:
Program received signal SIGILL, Illegal instruction.
0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0)
at ref_kernels/bli_cntx_ref.c:456
456 for ( i = 0; i < BLIS_NUM_LEVEL3_OPS; ++i ) vfuncs[ i ] = NULL;
(gdb) bt
#0 0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0)
at ref_kernels/bli_cntx_ref.c:456
flame#1 0x000000001004c0a8 in bli_cntx_init_power9 (cntx=<optimized out>)
at config/power9/bli_cntx_init_power9.c:42
flame#2 0x000000001003c85c in bli_gks_register_cntx (id=BLIS_ARCH_POWER9,
nat_fp=0x1004c090 <bli_cntx_init_power9>,
ref_fp=0x1004c0d0 <bli_cntx_init_power9_ref>, ind_fp=<optimized out>)
at frame/base/bli_gks.c:373
flame#3 0x000000001003c97c in bli_gks_init () at frame/base/bli_gks.c:155
flame#4 0x000000001003cfe8 in bli_init_apis () at frame/base/bli_init.c:78
flame#5 0x00007ffff7e045a8 in __pthread_once_slow () from /lib64/libpthread.so.0
flame#6 0x00000000100492e8 in bli_pthread_once (once=<optimized out>,
init=<optimized out>) at frame/thread/bli_pthread.c:314
flame#7 0x000000001003d138 in bli_init_once () at frame/base/bli_init.c:104
flame#8 bli_init_auto () at frame/base/bli_init.c:54
flame#9 0x0000000010011300 in cdotc_ (n=<optimized out>, x=<optimized out>,
incx=<optimized out>, y=<optimized out>, incy=<optimized out>)
at frame/compat/bla_dot.c:89
flame#10 0x0000000010002a48 in check2_ (sfac=0x103d14dc <sfac>)
at blastest/src/cblat1.c:529
flame#11 0x0000000010001ef4 in main () at blastest/src/cblat1.c:112
|
I should have referenced #322. |
Environment variable BLIS_CORETYPE is examined for a value matching an element of the relevant family. This overrides cpuid-style selection (similarly to the mechanism in OpenBLAS). This needs to be documented. It could be used for explicit runtime selection without a cpuid-type implementation.
Details: - Environment variable BLIS_CORETYPE is examined for a value matching an element of the relevant family (e.g. "skx"). This overrides cpuid-style micro-architecture selection (similarly to the mechanism in OpenBLAS). It needs to be documented, possibly for explicit runtime selection without a cpuid-type implementation, or for testing.
See the GCC doc about -march, -mtune, and -mpu and maybe https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu
Tighten the pattern to avoid generating garbage in the compilation command from auto configuration with the POWER code, and sort to avoid the environment-checking code duplicating the definitions.
|
@loveshack Thanks for your efforts, Dave. This is an unusually busy time for us, so it will be a bit before I can take a look at these modifications. In the meantime, if you aren't already doing so, I would suggest looking into the AppVeyor failures to see if those are related to the commits on this PR branch or if (perhaps less likely) they are being caused by something else (environment/hardware changes at AppVeyor, for example). |
|
I fixed the failure, after confusion why it seemed MS Windows-specific.
I don't know how the export spec got lost.
|
Also remove unnecessary headers.
Mainly for BLIS_CORETYPE architecture selection.
Avoids test numerical errors (with gcc 9).
This makes a significant difference for dgemm built with -march=native on haswell. Assume that loop nests don't need restructuring, but do need unwinding the inner loop, and we also want to unroll non-nested loops.
Uses the generic kernel, but with configuration for z13 and z14+ which can be used dynamically to use vx (vector instructions) -- double for z13, double and single for z14+.
It can't be necessary, and broke on f31 390x due to a gcc bug. Also it might not work with other compilers that may support -fopenmp-simd.
Set BLIS_NUM_ARCHS automatically by inserting it as the last member in the enum. Also remove `=0` on first enum member since it is guaranteed by the standard. This make is easier and less error-prone to reorder enum entries. This change was also included in changes I made to #344 and #345 in remerging them, but I thought it important to make this change earlier rather than later.
devinamatthews
left a comment
There was a problem hiding this comment.
Lots of ARM stuff, please separate this out into a separate PR.
| // NOTE: This value must be updated to reflect the number of enum values | ||
| // listed above for arch_t! | ||
| #define BLIS_NUM_ARCHS 25 | ||
|
|
There was a problem hiding this comment.
This isn't needed anymore, see last entry in arch_t.
There was a problem hiding this comment.
Can you explain this comment, Devin?
There was a problem hiding this comment.
In another PR, I made BLIS_NUM_ARCHS a member of the arch_t enum (the last one) so that it will always be up-to-date.
There was a problem hiding this comment.
Gotcha. I hadn't seen that yet.
| COPTFLAGS := -O0 | ||
| else | ||
| COPTFLAGS := -O2 | ||
| COPTFLAGS := -O2 -mcpu=cortrex-a15 |
There was a problem hiding this comment.
This is a bit off-topic. Any ARM-related stuff should be moved to its own PR.
| +endif | ||
|
|
||
| # Store all of the variables here to new variables containing the | ||
| # configuration name. |
There was a problem hiding this comment.
I think this file is "junk", right?
There was a problem hiding this comment.
Looks junky to me. What is a .rej file, anyway?
| @@ -0,0 +1 @@ | |||
| #define BLIS_SIMD_SIZE 16 // 128-bit SIMD | |||
There was a problem hiding this comment.
@fgvanzee's gonna want the whole copyright header in here, maybe some additional boilerplate.
There was a problem hiding this comment.
Yes, license header please kthnx.
| @@ -0,0 +1 @@ | |||
| #define BLIS_SIMD_SIZE 16 // 128-bit SIMD | |||
| bli_gks_register_cntx( BLIS_ARCH_A64FX, bli_cntx_init_a64fx, | ||
| bli_cntx_init_a64fx_ref, | ||
| bli_cntx_init_a64fx_ind ); | ||
| #endif |
| }; | ||
| enum | ||
| { | ||
| MODEL_Z900 = 0, |
|
|
||
| #ifdef BLIS_CONFIG_A64FX | ||
| CNTX_INIT_PROTS( a64fx ) | ||
| #endif |
| #ifdef BLIS_FAMILY_ARM32 | ||
| #include "bli_family_arm32.h" | ||
| #endif | ||
|
|
| BLIS_ARCH_CORTEXA53, | ||
| BLIS_ARCH_CORTEXA15, | ||
| BLIS_ARCH_CORTEXA9, | ||
| BLIS_ARCH_A64FX, |
|
Somehow, notification of Devin's reviews made it into my inbox. Anyhow, while I'm here... |
This implements a Linux-specific, compiler-neutral selection, unlike the (possibly?) gcc-specific simple one you didn't like. It currently only affects the compilation of generic code.
Note that there's a strange problem I've had to kludge round by lowering the optimization levels noted in a log message, but it seems worth recording this now. I can only try with Fedora 28's gcc 8.3 (on power8) currently, but will try to get the system updated to a supported Fedora. I might be able to get access to power9, but haven't tried.
This was branched from the arm64 equivalent as there originally was a merge clash with bits I removed.