
Add vzeroupper to Haswell gemm kernels. #524

Merged
fgvanzee merged 3 commits into master from haswell-vzeroupper on Jul 9, 2021

@devinamatthews (Member)

Fixes #523.

@devinamatthews (Member, Author)

@fgvanzee, if you know of a specific reason that vzeroupper was removed, let me know; otherwise I'll merge.

@devinamatthews (Member, Author)

Looks like the reason is that the zen kernels were copy-pasted from the haswell kernels before vzeroupper was added, and were later co-opted to replace the existing haswell kernels. I also fixed an issue with insufficient prefetching of A, which I've fixed at least once before.

@fgvanzee (Member)

fgvanzee commented Jul 9, 2021

I'm making a few whitespace adjustments. Then I'll merge.

fgvanzee merged commit 17729cf into master on Jul 9, 2021
devinamatthews deleted the haswell-vzeroupper branch on Jul 9, 2021 at 20:11
pradeeptrgit pushed a commit to amd/blis that referenced this pull request Nov 13, 2022
Details:
- Added vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm'
  microkernels so as to avoid a performance penalty when mixing AVX
  and SSE instructions. These vzeroupper instructions were once part
  of the haswell kernels, but were inadvertently removed during a source
  code shuffle some time ago when we were managing duplicate 'haswell'
  and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down
  and re-inserting the missing instructions.

Change-Id: I418fea9fed27ba3ad7d395cf96d1be507955d8e9
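To illustrate the pattern the commit message describes, here is a minimal sketch, not BLIS code: `toy_daxpy4`, `toy_daxpy4_avx`, and `toy_daxpy4_scalar` are hypothetical names, and the real Haswell gemm microkernels are far larger. It assumes GCC or Clang on x86-64 and falls back to plain C elsewhere or when the CPU lacks AVX. The AVX path is written in inline assembly, so the compiler cannot see that 256-bit YMM registers were used; the kernel therefore ends with `vzeroupper` itself, leaving the upper halves clean before control returns to code that may execute legacy SSE instructions. Compilers generally insert `vzeroupper` on their own for code generated from AVX intrinsics, but may not do so for instructions hidden in inline assembly, which is why hand-written kernels have to emit it explicitly.

```c
#include <stddef.h>

/* Hypothetical toy kernel (not BLIS code): y[0..3] += alpha * x[0..3]. */

#if defined(__x86_64__) && defined(__GNUC__)
static void toy_daxpy4_avx(double alpha, const double *x, double *y)
{
    __asm__ volatile(
        "vbroadcastsd %[a], %%ymm0\n\t"     /* ymm0 = {alpha, alpha, alpha, alpha} */
        "vmovupd (%[x]), %%ymm1\n\t"        /* ymm1 = x[0..3] */
        "vmovupd (%[y]), %%ymm2\n\t"        /* ymm2 = y[0..3] */
        "vmulpd %%ymm0, %%ymm1, %%ymm1\n\t" /* ymm1 = alpha * x */
        "vaddpd %%ymm1, %%ymm2, %%ymm2\n\t" /* ymm2 = y + alpha * x */
        "vmovupd %%ymm2, (%[y])\n\t"        /* store the result back to y */
        "vzeroupper\n\t"                    /* zero the upper 128 bits of every YMM
                                               register to avoid AVX/SSE transition
                                               penalties in the caller */
        :
        : [a] "m"(alpha), [x] "r"(x), [y] "r"(y)
        /* vzeroupper touches all vector registers, so clobber all of them. */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
          "memory");
}
#endif

/* Portable fallback with the same semantics. */
static void toy_daxpy4_scalar(double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < 4; ++i)
        y[i] += alpha * x[i];
}

/* Dispatch to the AVX kernel only when the CPU (and OS) support AVX. */
void toy_daxpy4(double alpha, const double *x, double *y)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx")) {
        toy_daxpy4_avx(alpha, x, y);
        return;
    }
#endif
    toy_daxpy4_scalar(alpha, x, y);
}
```

For example, calling `toy_daxpy4(2.0, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 10, 10, 10}` leaves `y = {12, 14, 16, 18}` on either path. Placing `vzeroupper` as the last instruction of the asm block mirrors where the commit adds it: at the end of each microkernel, after the final stores.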
sireeshasanga pushed a commit to amd/blis that referenced this pull request Feb 28, 2024
* commit 'e366665c':
  Fixed stale API calls to membrk API in gemmlike.
  Fixed bli_init.c compile-time error on OSX clang.
  Fixed configure breakage on OSX clang.
  Fixed one-time use property of bli_init() (flame#525).
  CREDITS file update.
  Added Graviton2 Neoverse N1 performance results.
  Remove unnecessary windows/zen2 directory.
  Add vzeroupper to Haswell microkernels. (flame#524)
  Fix Win64 AVX512 bug.
  Add comment about make checkblas on Windows
  CREDITS file update.
  Test installation in Travis CI
  Add symlink to blis.pc.in for out-of-tree builds
  Revert "Always run `make check`."
  Always run `make check`.
  Fixed configure script bug. Details: Fixed a kernel-list string substitution error by adding a substitute_words function to the configure script. Previously, if the string contained both zen and zen2 and zen needed to be replaced with another string, zen2 would also be incorrectly replaced.
  Update POWER10.md
  Rework POWER10 sandbox
  Skip clearing temp microtile in gemmlike sandbox.
  Fix asm warning
  Sandbox header edits trigger full library rebuild.
  Add vhsubpd/vhsubpd.
  Fixed bugs in cpackm kernels, gemmlike code.
  Armv8A Rename Regs for Safe Darwin Compile
  Armv8A Rename Regs for Clang Compile: FP32 Part
  Armv8A Rename Regs for Clang Compile: FP64 Part
  Asm Flag Mingling for Darwin_Aarch64
  Added a new 'gemmlike' sandbox.
  Updated Fugaku (a64fx) performance results.
  Add explicit compiler check for Windows.
  Remove `rm-dupls` function in common.mk.
  Travis CI Revert Unnecessary Extras from 91d3636
  Adjust TravisCI
  Travis Support Arm SVE
  Added 512b SVE-based a64fx subconfig + SVE kernels.
  Replace bli_dlamch with something less archaic (flame#498)
  Allow clang for ThunderX2 config

AMD-Internal: [CPUPL-2698]
Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4
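The "Fixed configure script bug" entry above describes a classic pitfall: plain substring replacement of zen also rewrites zen2. The actual fix is a shell function, substitute_words, in the configure script; the following is only a hypothetical C illustration of the same whole-word idea, not a port of that function. It assumes the kernel list is a single space-separated string, and `substitute_words_demo` is an invented name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the space-separated words of `in` into `out`,
 * replacing every word that exactly equals `from` with `to`. A word that
 * merely contains `from` (e.g. "zen2" when from == "zen") is left alone.
 * Runs of spaces collapse to single spaces; output is truncated if full.
 */
static void substitute_words_demo(const char *in, const char *from,
                                  const char *to, char *out, size_t outsz)
{
    size_t used = 0;
    size_t from_len = strlen(from);

    if (outsz == 0)
        return;
    out[0] = '\0';

    while (*in != '\0') {
        in += strspn(in, " ");           /* skip separators */
        if (*in == '\0')
            break;
        size_t len = strcspn(in, " ");   /* length of the current word */
        int match = (len == from_len && strncmp(in, from, len) == 0);
        const char *sep = used ? " " : "";
        int n = match
            ? snprintf(out + used, outsz - used, "%s%s", sep, to)
            : snprintf(out + used, outsz - used, "%s%.*s", sep, (int)len, in);
        if (n < 0 || (size_t)n >= outsz - used)
            break;                       /* out of space: stop (output truncated) */
        used += (size_t)n;
        in += len;
    }
}
```

For example, replacing zen with zen3 in "haswell zen zen2" yields "haswell zen3 zen2", whereas a naive strstr-based replacement would also turn zen2 into zen32. Comparing whole tokens rather than searching for substrings is what keeps zen2 intact.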


Development

Successfully merging this pull request may close these issues.

Haswell kernels lost vzeroupper at some point
