Conversation

Contributor

@alugorey alugorey commented Mar 22, 2023

Enables the hipSolver backend for ROCm builds

  • Minimum ROCm version requirement: 5.3
  • Introduces a new macro, USE_LINALG_SOLVER, that controls enablement of both cuSOLVER and hipSOLVER
  • Adds the hipSOLVER API to the hipification process
  • Combines hipSOLVER and hipSPARSE mappings into a single SPECIAL map that takes priority over the normal mappings
  • Torch APIs moved to the hipSOLVER backend (as opposed to MAGMA) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr()
  • Will enable 100+ linalg unit tests for ROCm
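
The priority behavior of the combined SPECIAL map can be sketched roughly as follows (a simplified model for illustration; the map contents and the hipify function here are stand-ins, not the actual hipify implementation):

```python
import re

# Illustrative entries only; the real hipify tables are much larger.
SPECIAL_MAP = {"cusolverDnHandle_t": "hipsolverHandle_t"}
REGULAR_MAP = {"cudaMalloc": "hipMalloc",
               "cusolverDnHandle_t": "hipsolverDnHandle_t"}  # overridden by SPECIAL_MAP

def hipify(source):
    """Replace CUDA identifiers, consulting the SPECIAL map before the regular one."""
    def substitute(match):
        token = match.group(0)
        # SPECIAL mappings take priority over normal mappings.
        return SPECIAL_MAP.get(token, REGULAR_MAP.get(token, token))
    return re.sub(r"\w+", substitute, source)
```

For example, hipify("cudaMalloc(&p, n); cusolverDnHandle_t h;") yields "hipMalloc(&p, n); hipsolverHandle_t h;", with the SPECIAL entry winning over the regular one.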

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10

@pytorch-bot

pytorch-bot bot commented Mar 22, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97370

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit f508f9c:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm and release notes: linalg_frontend labels Mar 22, 2023
@lezcano lezcano removed their request for review March 22, 2023 22:38
import scipy
from functools import wraps

def setLinalgBackendsToDefaultFinally(fn):
    @wraps(fn)
Collaborator

@jithunnair-amd jithunnair-amd Mar 23, 2023

Since this decorator definition moved to a different file, the lint error about the import statement for wraps is legitimate.
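
For reference, a minimal illustration of why the wraps import matters (illustrative example, not the PR's code): without it, the wrapper function would shadow the decorated test's name, which test discovery and reporting rely on.

```python
from functools import wraps

def passthrough(fn):
    @wraps(fn)  # copies fn.__name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

@passthrough
def test_example():
    return 42
```

Here test_example.__name__ stays "test_example" instead of becoming "wrapper".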

@ngimel ngimel added the triaged label Mar 24, 2023
@jithunnair-amd jithunnair-amd added the rocm priority label Mar 27, 2023
@jithunnair-amd
Collaborator

@malfet @ngimel Please review this PR with priority, if possible. It adds hipSolver support for ROCm.

Contributor

@malfet malfet left a comment

Instead of introducing another call (which is mutually exclusive with hasCuSolver), wouldn't it be better to just reuse the same call (and perhaps rename it to something vendor-agnostic, say hasAcceleratedLapack())?

Collaborator

We don't have anything specific in Python tests for hipBLAS or rocBLAS. Why should we have it for hipSOLVER? Why can't it be so that cuSOLVER == hipSOLVER on the ROCm platform?

Contributor Author

Wrapped the logic from skipCUDAIfNoCusolverAndNoHipsolver into skipCUDAIfNoCusolver.
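
A minimal sketch of that consolidation (the availability flags here are illustrative stand-ins for the real build and runtime probes, not PyTorch's actual helpers):

```python
from functools import wraps
import unittest

HAS_CUSOLVER = False    # illustrative flag: a CUDA build with cuSOLVER
HAS_HIPSOLVER = True    # illustrative flag: a ROCm build with hipSOLVER

def skipCUDAIfNoCusolverAndNoHipsolver(fn):
    """Skip the test when neither GPU solver backend is available."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if not (HAS_CUSOLVER or HAS_HIPSOLVER):
            raise unittest.SkipTest("neither cuSOLVER nor hipSOLVER is available")
        return fn(*args, **kwargs)
    return wrapper

# The old decorator name reuses the combined check, so existing tests
# pick up hipSOLVER support without any edits.
skipCUDAIfNoCusolver = skipCUDAIfNoCusolverAndNoHipsolver
```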

Collaborator

@IvanYashchuk IvanYashchuk left a comment

I think hipSPARSE required a considerably smaller number of intrusive code changes.

Collaborator

The copy-pasted comment should be modified.

Collaborator

Why is a separate PYTORCH_SOLVER_MAP needed here?
PYTORCH_SPARSE_MAP and PYTORCH_SOLVER_MAP should be unified, and the comment describing this part of the code should be updated.

Collaborator

Why is this needed in a file that tests meta tensors?

Comment on lines 122 to 127
Collaborator

Why is rocSOLVER added here? We don't add rocSPARSE for example.

Collaborator

Is rocSOLVER necessary here?

Collaborator

This is extremely confusing: you are returning a non-contiguous tensor if the make_contiguous argument is true.

Contributor Author

@ngimel Could you please elaborate? The stanza starting at line 2185 is only entered when make_contiguous is true. As such, we set the memory format to at::MemoryFormat::Contiguous on line 2189. Am I misunderstanding how this API works?

Contributor

@alugorey, why are the .mT() calls needed there?

Contributor Author

@malfet After investigating and talking amongst the team, we discovered this change to be an artifact left over from an earlier version of development. I have pushed up a new commit removing this unnecessary transposition.

Contributor

And I think this addresses @ngimel's previous comment.

Contributor

I believe I've left this comment already, but why are two defines needed here? Why not #ifdef USE_GPU_SOLVER, which in the common header file is defined for both CUDA and ROCm platforms?

Contributor Author

@malfet We keep these two defines separate because hipSOLVER still hasn't implemented all of the features cuSOLVER supports. If you look in BatchLinearAlgebra.cpp, you'll see instances where the existing #ifdef USE_CUSOLVER was left alone without adding a check for USE_HIPSOLVER. Keeping these two separate is how we control feature enablement for hipSOLVER. Once it is 1-to-1, we can consolidate.
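
A rough Python model of the gating described above (flag and function names are illustrative, not PyTorch's actual code): ops at feature parity consult a combined flag, while cuSOLVER-only ops keep their narrower guard until hipSOLVER catches up.

```python
# Illustrative model of keeping the two defines separate (names hypothetical).
USE_CUSOLVER = False    # would be set by a CUDA build
USE_HIPSOLVER = True    # would be set by a ROCm build

# Ops that both backends implement check the combined condition,
# mirroring "#if defined(USE_CUSOLVER) || defined(USE_HIPSOLVER)".
USE_LINALG_SOLVER = USE_CUSOLVER or USE_HIPSOLVER

def dispatch_svd():
    # At parity: either GPU solver backend can serve this op.
    return "gpu_solver" if USE_LINALG_SOLVER else "magma"

def dispatch_cusolver_only_op():
    # Not yet implemented by hipSOLVER: stays gated on USE_CUSOLVER alone,
    # so ROCm builds fall back to MAGMA here.
    return "cusolver" if USE_CUSOLVER else "magma"
```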

@jeffdaily jeffdaily mentioned this pull request Apr 19, 2023
@linux-foundation-easycla

linux-foundation-easycla bot commented Apr 20, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: alugorey / name: Andres Lugo (05a6da1a9af5e836cda80bfca16f446c377f167f)

@alugorey alugorey changed the base branch from main to viable/strict April 21, 2023 21:22
@alugorey alugorey force-pushed the hipsolver_enablement branch 2 times, most recently from 05a6da1 to dd0c046 Compare April 21, 2023 21:33
@alugorey
Contributor Author

@malfet Had to rebase onto viable/strict and squash due to an administrative issue. Ready for review again.

@alugorey alugorey force-pushed the hipsolver_enablement branch 3 times, most recently from 617934f to 809fd70 Compare April 25, 2023 15:31
@jeffdaily jeffdaily requested a review from ngimel May 22, 2023 15:54
@alugorey alugorey force-pushed the hipsolver_enablement branch from 6a0d3b1 to b8e6ae3 Compare May 25, 2023 17:32
@facebook-github-bot
Contributor

@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jithunnair-amd
Collaborator

@alugorey Looking at the latest CI runs, I see two failures that seem to be related to this PR, based on history.
inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_nuc_cuda_float32
test_linalg.py::TestLinalgCUDA::test_pca_lowrank_cuda

Did you already check these?

@jithunnair-amd
Collaborator

@pytorchbot merge -f "CI failures are unrelated to PR"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

shaoyf42 pushed a commit to shaoyf42/pytorch that referenced this pull request Jun 1, 2023
Enables the hipSolver backend for ROCm builds

Pull Request resolved: pytorch#97370
Approved by: https://github.com/malfet
jeffdaily added a commit to ROCm/pytorch that referenced this pull request Oct 11, 2023

Labels

  • ciflow/trunk: Trigger trunk jobs on your pull request
  • ciflow/unstable: Run all experimental or flaky jobs on PyTorch unstable workflow
  • Merged
  • module: inductor
  • module: rocm: AMD GPU support for PyTorch
  • open source
  • release notes: linalg_frontend: release notes category
  • rocm priority: high priority ROCm PRs from performance or other aspects
  • rocm: This tag is for PRs from the ROCm team
  • triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

8 participants