[Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU #67980
Conversation
💊 CI failures summary and remediations
As of commit 3a81c30 (more details on the Dr. CI page):
🕵️ 4 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
cc @ptrblck
I like this idea! Now, my only question is: should we also add a similar one for MAGMA, as otherwise it feels a bit lacking. I think these are good points to discuss in tomorrow's meeting.
@lezcano Thanks for the comments. I had a similar idea to also add a flag for MAGMA, but there are some technical obstacles, so I first made a POC with only one bool flag to see how that works.
Notes from linalg meeting:
Follow-ups:
albanD left a comment:
The binding code is much simpler! Thanks for taking the time to update it!
# The main purpose of this test is to make sure these "backend" calls work normally without raising exceptions.
x = torch.randint(2, 5, (2, 4, 4), device='cuda', dtype=torch.double)

torch.backends.cuda.preferred_linalg_library('cusolver')
Is this sending a warning? If so, it can be nice to catch it to avoid spam.
Also, we have something to make warn-once warnings always fire, so you can actually check that you do get the warning.
It's using TORCH_WARN_ONCE in Context.cpp. It raises Warning instead of UserWarning and doesn't work well with assertWarnsOnceRegex.
[W Context.cpp:157] Warning: torch.backends.cuda.preferred_linalg_library is an experimental feature. If you see any error or regression when this flag is set please file an issue on GitHub. (function operator())
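A hedged sketch of how a test could still assert on this message: torch.set_warn_always(True) makes once-only warnings fire on every call, so the standard warnings machinery can catch it (the base Warning category follows the observation above; the exact test structure is an assumption, not code from this PR):

import warnings
import torch

# Sketch only: set_warn_always makes TORCH_WARN_ONCE fire on every call,
# so the message is observable even if an earlier call already warned once.
torch.set_warn_always(True)
try:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        torch.backends.cuda.preferred_linalg_library('cusolver')
    # Per the comment above, this surfaces as Warning rather than UserWarning.
    assert any('experimental feature' in str(w.message) for w in caught)
finally:
    torch.set_warn_always(False)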
ngimel left a comment:
Please fix the warning message and ping me when it's done.
aten/src/ATen/Context.cpp (Outdated)
if (b != at::LinalgBackend::Default) {
  TORCH_WARN_ONCE(
      "torch.backends.cuda.preferred_linalg_library is an experimental feature. "
      "If you see any error or regression when this flag is set "
Perf regressions are expected if you pick the wrong backend, right? So the warning message shouldn't suggest filing an issue in this case. It could suggest filing an issue for errors or unexpected behaviors.
Thanks for the suggestion. I've changed the warning message.
Hi @ngimel, I've fixed the warning message. The CI failures seem irrelevant. Should I rebase and trigger the CI again, or is this ready to go?

That's fine, you don't have to rebase.

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Per title.
This PR introduces a global flag that lets PyTorch prefer one of several backend implementations when calling linear algebra functions on GPU.
Usage:
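A minimal sketch of the intended usage, based on the call exercised in the test above (the no-argument query form is an assumption):

import torch

# Prefer the cuSOLVER backend for subsequent linalg calls on GPU.
torch.backends.cuda.preferred_linalg_library('cusolver')

# Presumably, calling with no argument returns the current setting.
current = torch.backends.cuda.preferred_linalg_library()

# Restore the built-in heuristic.
torch.backends.cuda.preferred_linalg_library('default')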
Available options (str): 'default', 'cusolver', 'magma'.

Issue #63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.
Performance of linear algebra operators after this PR should be no worse than before. The flag is set to 'default' by default, which makes everything behave the same as before this PR. The implementation of this PR basically follows that of #67790.
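As a concrete illustration of the runtime-switching motivation above, a rough benchmarking sketch; the workload, shapes, and timing harness are illustrative assumptions, not part of the PR:

import time
import torch

# Build a batch of symmetric positive-definite matrices as a sample workload.
a = torch.randn(64, 256, 256, device='cuda')
spd = a @ a.transpose(-2, -1) + 256 * torch.eye(256, device='cuda')

# 'magma' requires a PyTorch build with MAGMA support.
for backend in ('default', 'cusolver', 'magma'):
    torch.backends.cuda.preferred_linalg_library(backend)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    torch.linalg.cholesky(spd)
    torch.cuda.synchronize()
    print(f'{backend}: {time.perf_counter() - t0:.4f}s')

Whichever backend wins for a given shape and device can then be set once at program start.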