Skip to content

POWER micro-architecture dispatch#345

Open
loveshack wants to merge 97 commits intoflame:masterfrom
loveshack:power
Open

POWER micro-architecture dispatch#345
loveshack wants to merge 97 commits intoflame:masterfrom
loveshack:power

Conversation

@loveshack
Copy link
Copy Markdown
Contributor

This implements a Linux-specific, compiler-neutral selection, unlike the (possibly?) gcc-specific simple one you didn't like. It currently only affects the compilation of generic code.

Note that there's a strange problem I've had to kludge round by lowering the optimization levels noted in a log message, but it seems worth recording this now. I can only try with Fedora 28's gcc 8.3 (on power8) currently, but will try to get the system updated to a supported Fedora. I might be able to get access to power9, but haven't tried.

This was branched from the arm64 equivalent as there originally was a merge clash with bits I removed.

loveshack and others added 9 commits September 23, 2019 11:08
See the GCC doc concerning priorities.
This only affects compiling with cpu-specific flags, not anything in
the micro-kernel.
power8 follows power9 in just using the reference micro-kernel.

Also fix an unused variable warning.
This needs fixing properly somehow, but using -O3 (at least with gcc 8.3),
we get this:

Program received signal SIGILL, Illegal instruction.
0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0)
    at ref_kernels/bli_cntx_ref.c:456
456             for ( i = 0; i < BLIS_NUM_LEVEL3_OPS; ++i ) vfuncs[ i ] = NULL;
(gdb) bt
#0  0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0)
    at ref_kernels/bli_cntx_ref.c:456
flame#1  0x000000001004c0a8 in bli_cntx_init_power9 (cntx=<optimized out>)
    at config/power9/bli_cntx_init_power9.c:42
flame#2  0x000000001003c85c in bli_gks_register_cntx (id=BLIS_ARCH_POWER9,
    nat_fp=0x1004c090 <bli_cntx_init_power9>,
    ref_fp=0x1004c0d0 <bli_cntx_init_power9_ref>, ind_fp=<optimized out>)
    at frame/base/bli_gks.c:373
flame#3  0x000000001003c97c in bli_gks_init () at frame/base/bli_gks.c:155
flame#4  0x000000001003cfe8 in bli_init_apis () at frame/base/bli_init.c:78
flame#5  0x00007ffff7e045a8 in __pthread_once_slow () from /lib64/libpthread.so.0
flame#6  0x00000000100492e8 in bli_pthread_once (once=<optimized out>,
    init=<optimized out>) at frame/thread/bli_pthread.c:314
flame#7  0x000000001003d138 in bli_init_once () at frame/base/bli_init.c:104
flame#8  bli_init_auto () at frame/base/bli_init.c:54
flame#9  0x0000000010011300 in cdotc_ (n=<optimized out>, x=<optimized out>,
    incx=<optimized out>, y=<optimized out>, incy=<optimized out>)
    at frame/compat/bla_dot.c:89
flame#10 0x0000000010002a48 in check2_ (sfac=0x103d14dc <sfac>)
    at blastest/src/cblat1.c:529
flame#11 0x0000000010001ef4 in main () at blastest/src/cblat1.c:112
@loveshack
Copy link
Copy Markdown
Contributor Author

I should have referenced #322.

Environment variable BLIS_CORETYPE is examined for a value matching an
element of the relevant family.  This overrides cpuid-style selection
(similarly to the mechanism in OpenBLAS).

This needs to be documented.  It could be used for explicit runtime
selection without a cpuid-type implementation.
Details:
- Environment variable BLIS_CORETYPE is examined for a value matching
  an element of the relevant family (e.g. "skx").  This overrides
  cpuid-style micro-architecture selection (similarly to the mechanism
  in OpenBLAS).

  It needs to be documented, possibly for explicit runtime selection
  without a cpuid-type implementation, or for testing.
Tighten the pattern to avoid generating garbage in the compilation
command from auto configuration with the POWER code, and sort to avoid
the environment-checking code duplicating the definitions.
@fgvanzee
Copy link
Copy Markdown
Member

fgvanzee commented Oct 4, 2019

@loveshack Thanks for your efforts, Dave. This is an unusually busy time for us, so it will be a bit before I can take a look at these modifications.

In the meantime, if you aren't already doing so, I would suggest looking into the AppVeyor failures to see if those are related to the commits on this PR branch or if (perhaps less likely) they are being caused by something else (environment/hardware changes at AppVeyor, for example).

@loveshack
Copy link
Copy Markdown
Contributor Author

loveshack commented Oct 6, 2019 via email

Avoids test numerical errors (with gcc 9).
This makes a significant difference for dgemm built with -march=native
on haswell.  Assume that loop nests don't need restructuring, but do
need unwinding the inner loop, and we also want to unroll non-nested
loops.
Uses the generic kernel, but with configuration for z13 and z14+ which
can be used dynamically to use vx (vector instructions) -- double for
z13, double and single for z14+.
It can't be necessary, and broke on f31 390x due to a gcc bug.  Also
it might not work with other compilers that may support -fopenmp-simd.
devinamatthews added a commit that referenced this pull request Sep 7, 2020
Set BLIS_NUM_ARCHS automatically by inserting it as the last member in the enum. Also remove `=0` on first enum member since it is guaranteed by the standard. This make is easier and less error-prone to reorder enum entries.

This change was also included in changes I made to #344 and #345 in remerging them, but I thought it important to make this change earlier rather than later.
Copy link
Copy Markdown
Member

@devinamatthews devinamatthews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of ARM stuff, please separate this out into a separate PR.

// NOTE: This value must be updated to reflect the number of enum values
// listed above for arch_t!
#define BLIS_NUM_ARCHS 25

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't needed anymore, see last entry in arch_t.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this comment, Devin?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another PR, I made BLIS_NUM_ARCHS a member of the arch_t enum (the last one) so that it will always be up-to-date.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I hadn't seen that yet.

COPTFLAGS := -O0
else
COPTFLAGS := -O2
COPTFLAGS := -O2 -mcpu=cortrex-a15
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit off-topic. Any ARM-related stuff should be moved to its own PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

+endif

# Store all of the variables here to new variables containing the
# configuration name.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file is "junk", right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks junky to me. What is a .rej file, anyway?

@@ -0,0 +1 @@
#define BLIS_SIMD_SIZE 16 // 128-bit SIMD
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fgvanzee's gonna want the whole copyright header in here, maybe some additional boilerplate.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, license header please kthnx.

@@ -0,0 +1 @@
#define BLIS_SIMD_SIZE 16 // 128-bit SIMD
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

bli_gks_register_cntx( BLIS_ARCH_A64FX, bli_cntx_init_a64fx,
bli_cntx_init_a64fx_ref,
bli_cntx_init_a64fx_ind );
#endif
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic.

};
enum
{
MODEL_Z900 = 0,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is MODEL_Z900 handled?


#ifdef BLIS_CONFIG_A64FX
CNTX_INIT_PROTS( a64fx )
#endif
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic.

#ifdef BLIS_FAMILY_ARM32
#include "bli_family_arm32.h"
#endif

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic.

BLIS_ARCH_CORTEXA53,
BLIS_ARCH_CORTEXA15,
BLIS_ARCH_CORTEXA9,
BLIS_ARCH_A64FX,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic.

@fgvanzee
Copy link
Copy Markdown
Member

Somehow, notification of Devin's reviews made it into my inbox. Anyhow, while I'm here...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants