Enable hipSOLVER in ROCm builds #97370
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97370
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit f508f9c: the following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
    import scipy
    …
    def setLinalgBackendsToDefaultFinally(fn):
        @wraps(fn)
Since this decorator definition moved to a different file, the lint error about the missing import statement for wraps is legitimate.
malfet left a comment:
Instead of introducing another call (which is mutually exclusive with hasCuSolver), wouldn't it be better to just reuse the same call, perhaps renaming it to something vendor-agnostic, say hasAcceleratedLapack()?
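A hypothetical sketch of the suggested consolidation; hasAcceleratedLapack is the reviewer's proposed name, not an existing PyTorch function, and the per-vendor checks below are stand-ins:

```python
# Stand-ins for the real per-vendor capability checks; on a ROCm
# build only the hipSOLVER one would be true.
def has_cusolver():
    return False

def has_hipsolver():
    return True

def has_accelerated_lapack():
    # Single vendor-agnostic entry point: the two backends are
    # mutually exclusive, so "or" answers the only question callers
    # care about -- is any GPU LAPACK backend available?
    return has_cusolver() or has_hipsolver()
```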
test/test_linalg.py (outdated)
We don't have anything specific in Python tests for hipBLAS or rocBLAS. Why should we have it for hipSOLVER? Why can't it be so that cuSOLVER == hipSOLVER on the ROCm platform?
Addressed by wrapping the logic from skipCUDAIfNoCusolverAndNoHipsolver into skipCUDAIfNoCusolver.
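A minimal sketch of that wrapping, assuming stand-in capability checks; unittest.skipIf is the standard mechanism such skip decorators use, though the real PyTorch helpers differ in detail:

```python
import unittest

# Stand-ins for the real capability checks.
def has_cusolver():
    return False

def has_hipsolver():
    return True

def skipCUDAIfNoCusolver(fn):
    # One decorator covers both platforms: skip only when neither
    # GPU solver backend is available, so on ROCm hipSOLVER plays
    # the role cuSOLVER plays on CUDA.
    return unittest.skipIf(
        not (has_cusolver() or has_hipsolver()),
        "cuSOLVER/hipSOLVER not available",
    )(fn)

@skipCUDAIfNoCusolver
def test_svd():
    return "ran"
```

With either backend present, the decorator is a no-op and the test runs normally.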
IvanYashchuk left a comment:
I think hipSPARSE required a considerably smaller number of intrusive code changes.
torch/utils/hipify/hipify_python.py (outdated)
The copy-pasted comment should be modified.
torch/utils/hipify/hipify_python.py (outdated)
Why is a separate PYTORCH_SOLVER_MAP needed here?
PYTORCH_SPARSE_MAP and PYTORCH_SOLVER_MAP should be unified, and the comment describing this part of the code updated.
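A sketch of what such a unification might look like; the table name, entries, and helper below are illustrative, modeled loosely on hipify_python.py rather than copied from it:

```python
# One "special" mapping table consulted before the generic CUDA->HIP
# rewrites, covering both the sparse and solver libraries.
PYTORCH_SPECIAL_MAP = {
    "cusparse.h": "hipsparse.h",    # formerly PYTORCH_SPARSE_MAP
    "cusolverDn.h": "hipsolver.h",  # formerly a separate PYTORCH_SOLVER_MAP
}

def hipify(token):
    # Special-cased names win over the naive prefix substitution,
    # which is why a priority map is needed at all: hipSOLVER headers
    # do not follow the mechanical cu->hip renaming.
    if token in PYTORCH_SPECIAL_MAP:
        return PYTORCH_SPECIAL_MAP[token]
    if token.startswith("cu"):
        return "hip" + token[2:]
    return token
```

Keeping a single priority map avoids documenting and ordering two separate tables with identical semantics.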
test/test_meta.py (outdated)
Why is this needed in a file that tests meta tensors?
cmake/public/LoadHIP.cmake (outdated)
Why is rocSOLVER added here? We don't add rocSPARSE, for example.
cmake/Dependencies.cmake (outdated)
Is rocsolver necessary here?
This is extremely confusing: you are returning a non-contiguous tensor if the make_contiguous argument is true.
@ngimel Could you please elaborate? The stanza starting at line 2185 is only entered when make_contiguous is true. As such, we set the memory format to at::MemoryFormat::Contiguous on line 2189. Am I misunderstanding how this API works?
@alugorey, why are the .mT() calls needed there?
@malfet After investigating and talking amongst the team, we discovered this change to be an artifact left over from an earlier version of development. I have pushed up a new commit removing this unnecessary transposition.
And I think this addresses @ngimel's previous comment.
I believe I've left this comment already, but why are two defines needed here? Why not #ifdef USE_GPU_SOLVER, which in the common header file is defined for both CUDA and ROCm platforms?
@malfet We keep these two defines separate because hipSOLVER still hasn't implemented all of the features cuSOLVER supports. If you look in BatchLinearAlgebra.cpp, you'll see instances where the existing #ifdef USE_CUSOLVER was left alone without adding a check for USE_HIPSOLVER. Keeping these two separate is how we control feature enablement for hipSOLVER. Once it is one-to-one, we can consolidate.
Force-pushed 05a6da1 to dd0c046.
@malfet Had to rebase onto viable/strict and squash due to an administrative issue. Ready for review again.
Force-pushed 617934f to 809fd70.
Reverses the CUDA-only requirement from the following comment: https://github.com/pytorch/pytorch/pull/100130/files#r1179722035
Force-pushed 6a0d3b1 to b8e6ae3.
@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@alugorey Looking at the latest CI runs, I see two failures that seem to be related to this PR, based on history. Did you already check these?
@pytorchbot merge -f "CI failures are unrelated to PR"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Enables the hipSOLVER backend for ROCm builds:
- Minimum ROCm version requirement: 5.3
- Introduces a new macro, USE_LINALG_SOLVER, that controls enablement of both cuSOLVER and hipSOLVER
- Adds the hipSOLVER API to the hipification process; combines the hipSOLVER and hipSPARSE mappings into a single SPECIAL map that takes priority over the normal mappings
- Torch APIs to be moved to the hipSOLVER backend (as opposed to MAGMA) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr()
- Will enable 100+ linalg unit tests for ROCm

Pull Request resolved: pytorch#97370
Approved by: https://github.com/malfet
This reverts commit eaffd98.
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10