Arm micro-architecture dispatch by loveshack · Pull Request #344 · flame/blis

loveshack · 2019-09-24T09:49:27Z

This provides a cpuid-like mechanism for arm64, per #299.
After doing it, I realized it presumably has limited effect because it doesn't affect block sizes, though it does affect the ISA for ThunderX2, which I haven't been able to try yet. (OpenBLAS does have different kernel configurations for some of the micro-arches, but I don't know if significantly different.)
There are also comments about arm32 implementation if anyone is interested.

This only affects compiling with cpu-specific flags, not anything in the micro-kernel.

Minor changes, as it was mostly there already. Fixme: Find out whether or not there's a way to avoid reading /proc, as for arm64. Otherwise, select the cpuinfo entry for the core we're actually running on in case of big.little etc.

devinamatthews · 2020-08-07T15:04:06Z

@fgvanzee we should revisit this.

fgvanzee · 2020-08-07T20:14:28Z

@loveshack What is the status of this PR? Is it ready to go as far as you know?

loveshack · 2020-08-17T11:24:22Z

@loveshack What is the status of this PR? Is it ready to go as far as you know?

Probably not now. It was all tested to the extent I had virtual hardware, but needs merging and testing again. It also needs an addition for a64fx/sve, presumably, and other micro-arches, like neoverse. It's unclear to me to what extent aarch64 variants should be configured individually.

devinamatthews · 2020-08-17T14:48:42Z

@loveshack Leaving out some uarches for now is fine. Getting the base code in is what is important from this PR I think. I did a remerge recently but you should check my changes.

Not exposed as family yet. Also treat ThunderX3 as ThunderX2.

Set BLIS_NUM_ARCHS automatically by inserting it as the last member in the enum. Also remove `=0` on first enum member since it is guaranteed by the standard. This make is easier and less error-prone to reorder enum entries. This change was also included in changes I made to #344 and #345 in remerging them, but I thought it important to make this change earlier rather than later.

devinamatthews · 2020-10-20T14:57:31Z

@loveshack is this mergeable? I know you had wanted to add some stuff like neoverse but I think getting those can be handled in smaller PRs later.

c.f. flame/blis#344

"a64fx" was included twice and some other entries were mis-ordered.

- Remove stray SystemZ prototype. - Add a "stub" for Apple Silicon support (and Neoverse N1). - Slightly rework the detection logic. `make check` passes on qemu-aarch64 with `configure auto` and `configure arm64`.

devinamatthews · 2021-08-21T21:11:12Z

@fgvanzee can you give this PR a final look? I think everything should pass now, except maybe the ARMSVE QEMU test (I can check this out if it does fail).

devinamatthews · 2021-08-21T21:13:20Z

?? @fgvanzee all the Travis builds seem to be getting canceled?

fgvanzee · 2021-08-21T21:18:33Z

?? @fgvanzee all the Travis builds seem to be getting canceled?

Actually, the first one failed after all of the jobs were in progress and running for a while. Then the second one was cancelled.

But yeah, no idea.

Only gcc for ARMSVE as of now due to the clang inline assembly register usage issue.

Minor change to trigger Travis.

xrq-phys · 2021-08-23T08:36:16Z

frame/base/bli_cpuid.c

+
+   that just yields "v7l" on cortexa9.
+
+ */


I'm wondering if these comments are sort of duplicating with L994.

Maybe it'd be also good to add a comment here mentioning it as a fallback method for ARMv7?

I actually hadn't looked at the ARMv7 part. I will give it a once-over.

xrq-phys · 2021-08-23T08:48:15Z

Additionally, may I propose the following changes so that feature is correctly set under the new detection method?
(Although the variable seems not used at all.)

diff --git a/frame/base/bli_cpuid.c b/frame/base/bli_cpuid.c
index 3626f9bb..d704d825 100644
--- a/frame/base/bli_cpuid.c
+++ b/frame/base/bli_cpuid.c
@@ -956,7 +956,10 @@ et al
 #include <sys/sysctl.h>
 #endif
 
-static uint32_t get_coretype()
+static uint32_t get_coretype
+     (
+       uint32_t* features
+     )
 {
        int implementer = 0x00, part = 0x000;
 
@@ -983,7 +986,10 @@ static uint32_t get_coretype()
        }
        
        bool has_sve = getauxval( AT_HWCAP ) & HWCAP_SVE;
+       if (has_sve)
+         *features |= FEATURE_SVE;
 #endif //__linux__
+       *features |= FEATURE_NEON;
 
 #ifdef __APPLE__
        // Better values could be obtained from sysctlbyname()
@@ -1135,8 +1141,7 @@ uint32_t bli_cpuid_query
      )
 {
        *model    = MODEL_ARMV8;
-       *features = 0;
-       *part     = get_coretype();
+       *part     = get_coretype(features);
 
        return VENDOR_ARM;
 }

devinamatthews · 2021-08-23T15:26:35Z

Additionally, may I propose the following changes so that feature is correctly set under the new detection method?
(Although the variable seems not used at all.)

I can add that, but at least for now the features aren't used in selecting the configuration (that is actually happening before get_coretype() returns...). has_sve is used there, but it really only kicks in for emulated SVE (QEMU) or some "unknown" SVE chip. In principle, I suppose the code should also check the SVE vector length?

I'll make the above changes for now and save these other things for PRs later, I just want to finally get this code in to master.

devinamatthews · 2021-09-10T17:54:01Z

@fgvanzee can you give this a final blessing?

devinamatthews · 2021-10-04T15:16:35Z

@fgvanzee pinging again, this should be ready to merge.

fgvanzee · 2021-10-04T17:51:53Z

Yes, sorry for the delay. This looks fine to me.

(We may need to resolve conflicts since the PR is now a few weeks old.)

fgvanzee · 2021-10-04T18:03:35Z

No conflicts after all. Yay. 🎆

@rpath

* commit '81e10346': Alloc at least 1 elem in pool_t block_ptrs. (flame#560) Fix insufficient pool-growing logic in bli_pool.c. (flame#559) Arm SVE C/ZGEMM Fix FMOV 0 Mistake SH Kernel Unused Eigher Arm SVE C/ZGEMM Support *beta==0 Arm SVE Config armsve Use ZGEMM/CGEMM Arm SVE: Update Perf. Graph Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0 Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0 A64FX Config Use ZGEMM/CGEMM Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg Arm SVE Add SGEMM 2Vx10 Unindexed Arm SVE ZGEMM Support Gather Load / Scatt. St. Arm SVE Add ZGEMM 2Vx10 Unindexed Arm SVE Add ZGEMM 2Vx7 Unindexed Arm SVE Add ZGEMM 2Vx8 Unindexed Update Travis CI badge Armv8 Trash New Bulk Kernels Enable testing 1m in `make check`. Config ArmSVE Unregister 12xk. Move 12xk to Old Revert __has_include(). Distinguish w/ BLIS_FAMILY_** Register firestorm into arm64 Metaconfig Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo Add test for Apple M1 (firestorm) Firestorm CPUID Dispatcher Armv8 GEMMSUP Edge Cases Require Signed Ints Make error checking level a thread-local variable. Fix data race in testsuite. Update .appveyor.yml Firestorm Block Size Fixes Armv8 Handle *beta == 0 for GEMMSUP ??r Case. Move unused ARM SVE kernels to "old" directory. Add an option to control whether or not to use @rpath. Fix $ORIGIN usage on linux. Arm micro-architecture dispatch (flame#344) Use @path-based install name on MacOS and use relocatable RPATH entries for testsuite inaries. Armv8 Handle *beta == 0 for GEMMSUP ?rc Case. Armv8 Fix 6x8 Row-Maj Ukr Apply patch from @xrq-phys. Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs. bli_error: more cleanup on the error strings array Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9 Arm SVE: Correct PACKM Ker Name: Intrinsic Kers Fix config_name in bli_arch.c Arm Whole GEMMSUP Call Route is Asm/Int Optimized Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref Header Typo Arm: DGEMMSUP ??r(rv) Invoke Edge Size Arm: DGEMMSUP ?rc(rd) Invoke Edge Size Arm: Implement GEMMSUP Fallback Method Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin Added Apple Firestorm (A14/M1) Subconfig Arm64 8x4 Kernel Use Less Regs Armv8-A Supplimentary GEMMSUP Sizes for RD Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm Armv8-A Adjust Types for PACKM Kernels Armv8-A GEMMSUP-RD 6x8m Armv8-A GEMMSUP-RD 6x8n Armv8-A s/d Packing Kernels Fix Typo Armv8-A Introduced s/d Packing Kernels Armv8-A DGEMMSUP 6x8m Kernel Armv8-A DGEMMSUP Adjustments Armv8-A Add More DGEMMSUP Armv8-A Add GEMMSUP 4x8n Kernel Armv8-A Add Part of GEMMSUP 8x4m Kernel Armv8A DGEMM 4x4 Kernel WIP. Slow Armv8-A Add 8x4 Kernel WIP AMD-Internal: [CPUPL-2698] Change-Id: I194ff69356740bb36ca189fd1bf9fef02eec3803

loveshack added 3 commits September 23, 2019 12:36

Add arm64 micro-arch selection logic (flame#299)

2958b14

This only affects compiling with cpu-specific flags, not anything in the micro-kernel.

Document ThunderX2 hardware support

eaed681

Add comments about arm32 selection

509c4d7

loveshack mentioned this pull request Sep 25, 2019

don't use popen #335

Closed

loveshack added 5 commits September 26, 2019 16:19

arm64: Conditionalize Linux-specific code

8ffdb0c

Remove old comment and fix indentation

92fa38b

Include ARM family headers

85e1e5a

Add Fixme comments

e0cccb5

Add arm32 micro-arch selection logic

7f93c80

Minor changes, as it was mostly there already. Fixme: Find out whether or not there's a way to avoid reading /proc, as for arm64. Otherwise, select the cpuinfo entry for the core we're actually running on in case of big.little etc.

loveshack changed the title ~~Arm64 micro-architecture dispatch~~ Arm micro-architecture dispatch Nov 4, 2019

devinamatthews mentioned this pull request Aug 6, 2020

arm64 family support from OpenBLAS? #299

Closed

Merge branch 'master' into arm64-cpuid

80e27c7

loveshack added 3 commits August 28, 2020 20:08

Add A64FX dynamic architecture support

b526841

Not exposed as family yet. Also treat ThunderX3 as ThunderX2.

Comment

19705bb

Fix bool_t instances

04d6f47

devinamatthews mentioned this pull request Sep 7, 2020

Update bli_type_defs.h #446

Closed

devinamatthews mentioned this pull request Oct 20, 2020

A64FX not recognised by ./configure auto, wants to switch to generic #452

Open

Comment fixes

9595a07

xrq-phys pushed a commit to xrq-phys/tblis that referenced this pull request Apr 15, 2021

Improved Arm's CPUID

f583980

c.f. flame/blis#344

xrq-phys mentioned this pull request Apr 15, 2021

Arm Kernels & Configurations MatthewsResearchGroup/tblis#39

Closed

3 tasks

xrq-phys mentioned this pull request May 18, 2021

New GEMM Assembly & Configuration Set for Arm SVE #424

Merged

Update aarch64 part list

32cb8b2

devinamatthews mentioned this pull request Aug 20, 2021

ARMv8 PACKM and GEMMSUP Kernels + Apple Firestorm Subconfig #533

Merged

Merge branch 'master' into arm64-cpuid

d494af5

devinamatthews added 4 commits August 20, 2021 16:17

Remove duplicate BLIS_ARCH_A64FX

a9a770d

Allow clang for all aarch64 configs.

30c419d

Fix architecture strings.

cd40cfd

"a64fx" was included twice and some other entries were mis-ordered.

Final clean-up of aarch64 cpuid code.

dd87347

- Remove stray SystemZ prototype. - Add a "stub" for Apple Silicon support (and Neoverse N1). - Slightly rework the detection logic. `make check` passes on qemu-aarch64 with `configure auto` and `configure arm64`.

devinamatthews added 2 commits August 22, 2021 04:31

"Generic" ARMSVE auto-detected correctly.

de156bd

Only gcc for ARMSVE as of now due to the clang inline assembly register usage issue.

Update HardwareSupport.md

a7fa9ce

Minor change to trigger Travis.

xrq-phys reviewed Aug 23, 2021

View reviewed changes

devinamatthews added 2 commits August 23, 2021 10:47

Assign features in get_coretype() for ARMv8

6e07b90

Whitespace change to trigger CI build

26d66c6

fgvanzee merged commit d0a0b4b into flame:master Oct 4, 2021

fgvanzee mentioned this pull request Oct 4, 2022

"configure auto" results in an error for users with a Mac M1 Pro processor #672

Closed

Conversation

loveshack commented Sep 24, 2019

Uh oh!

devinamatthews commented Aug 7, 2020

Uh oh!

fgvanzee commented Aug 7, 2020

Uh oh!

loveshack commented Aug 17, 2020 via email

Uh oh!

devinamatthews commented Aug 17, 2020

Uh oh!

devinamatthews commented Oct 20, 2020

Uh oh!

devinamatthews commented Aug 21, 2021

Uh oh!

devinamatthews commented Aug 21, 2021

Uh oh!

fgvanzee commented Aug 21, 2021

Uh oh!

xrq-phys Aug 23, 2021

Choose a reason for hiding this comment

Uh oh!

devinamatthews Aug 23, 2021

Choose a reason for hiding this comment

Uh oh!

xrq-phys commented Aug 23, 2021

Uh oh!

devinamatthews commented Aug 23, 2021

Uh oh!

devinamatthews commented Sep 10, 2021

Uh oh!

devinamatthews commented Oct 4, 2021

Uh oh!

fgvanzee commented Oct 4, 2021

Uh oh!

fgvanzee commented Oct 4, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants