
Conversation

@xwang233
Collaborator

@xwang233 xwang233 commented Nov 8, 2021

Per title.

This PR introduces a global flag that lets PyTorch prefer one of several backend implementations when calling linear algebra functions on GPU.

Usage:

torch.backends.cuda.preferred_linalg_library('cusolver')

Available options (str): 'default', 'cusolver', 'magma'.

Issue #63992 inspired me to write this PR. No heuristic is perfect across all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.

Performance of linear algebra operators after this PR should be no worse than before. The flag is set to 'default' by default, which keeps behavior the same as before this PR.

The implementation of this PR largely follows that of #67790.
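
For illustration, here is a minimal sketch of how the flag could be used to pick the faster backend for a particular workload on a particular machine (the shapes and timing code below are only an example, not part of this PR):

import torch

# Build a batch of symmetric positive-definite matrices for Cholesky.
a = torch.randn(64, 512, 512, device='cuda')
a = a @ a.transpose(-2, -1) + 1e-3 * torch.eye(512, device='cuda')

for backend in ('default', 'cusolver', 'magma'):
    torch.backends.cuda.preferred_linalg_library(backend)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    torch.linalg.cholesky(a)
    end.record()
    torch.cuda.synchronize()
    print(backend, start.elapsed_time(end), 'ms')

# Restore the built-in heuristic.
torch.backends.cuda.preferred_linalg_library('default')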

@pytorch-probot

pytorch-probot bot commented Nov 8, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/xwang233/pytorch/blob/3a81c30d0d306fbd71229d72ff2602b30a7bf8e3/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default,ciflow/all

Workflows Labels (bold enabled) Status
Triggered Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux ✅ triggered
docker-builds ciflow/all ✅ triggered
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos ✅ triggered
libtorch-linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux ✅ triggered
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux ✅ triggered
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux ✅ triggered
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow ✅ triggered
linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
macos-10-15-py3-arm64 ciflow/all, ciflow/macos ✅ triggered
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos ✅ triggered
macos-11-py3-x86-64 ciflow/all, ciflow/macos ✅ triggered
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux ✅ triggered
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled ✅ triggered
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck ✅ triggered
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled ✅ triggered
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Nov 8, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 3a81c30 (more details on the Dr. CI page):


  • 5/6 failures possibly* introduced in this PR
    • 1/5 non-scanned failure(s)
  • 1/6 broken upstream at merge base 97750e0 since Dec 02

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-02T23:23:27.8130927Z Build left local git repository checkout dirty
2021-12-02T23:23:27.1194787Z real	54m17.964s
2021-12-02T23:23:27.1195179Z user	69m5.677s
2021-12-02T23:23:27.1195609Z sys	7m59.447s
2021-12-02T23:23:27.1196010Z + assert_git_not_dirty
2021-12-02T23:23:27.1197355Z + [[ linux-xenial-cuda11.3-py3.6-gcc7-default != *rocm* ]]
2021-12-02T23:23:27.1198662Z + [[ linux-xenial-cuda11.3-py3.6-gcc7-default != *xla* ]]
2021-12-02T23:23:27.1199607Z ++ git status --porcelain
2021-12-02T23:23:27.8128493Z + git_status='?? third_party/flatbuffers/'
2021-12-02T23:23:27.8129428Z + [[ -n ?? third_party/flatbuffers/ ]]
2021-12-02T23:23:27.8130217Z + echo 'Build left local git repository checkout dirty'
2021-12-02T23:23:27.8130927Z Build left local git repository checkout dirty
2021-12-02T23:23:27.8131657Z + echo 'git status --porcelain:'
2021-12-02T23:23:27.8132535Z git status --porcelain:
2021-12-02T23:23:27.8133267Z + echo '?? third_party/flatbuffers/'
2021-12-02T23:23:27.8133809Z ?? third_party/flatbuffers/
2021-12-02T23:23:27.8134265Z + exit 1
2021-12-02T23:23:27.8134623Z + cleanup
2021-12-02T23:23:27.8135013Z + retcode=1
2021-12-02T23:23:27.8135371Z + set +x
2021-12-02T23:23:27.8176855Z ##[error]Process completed with exit code 1.
2021-12-02T23:23:27.8234027Z ##[group]Run # Ensure the working directory gets chowned back to the current user

See GitHub Actions build periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (2/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-02T23:17:06.7867853Z Build left local git repository checkout dirty
2021-12-02T23:17:06.0844182Z real	48m50.779s
2021-12-02T23:17:06.0844590Z user	40m59.629s
2021-12-02T23:17:06.0844968Z sys	5m57.464s
2021-12-02T23:17:06.0845379Z + assert_git_not_dirty
2021-12-02T23:17:06.0847177Z + [[ periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug-default != *rocm* ]]
2021-12-02T23:17:06.0849093Z + [[ periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug-default != *xla* ]]
2021-12-02T23:17:06.0850372Z ++ git status --porcelain
2021-12-02T23:17:06.7865473Z + git_status='?? third_party/flatbuffers/'
2021-12-02T23:17:06.7866334Z + [[ -n ?? third_party/flatbuffers/ ]]
2021-12-02T23:17:06.7867128Z + echo 'Build left local git repository checkout dirty'
2021-12-02T23:17:06.7867853Z Build left local git repository checkout dirty
2021-12-02T23:17:06.7868607Z + echo 'git status --porcelain:'
2021-12-02T23:17:06.7869248Z git status --porcelain:
2021-12-02T23:17:06.7870063Z + echo '?? third_party/flatbuffers/'
2021-12-02T23:17:06.7870676Z ?? third_party/flatbuffers/
2021-12-02T23:17:06.7871241Z + exit 1
2021-12-02T23:17:06.7871726Z + cleanup
2021-12-02T23:17:06.7872118Z + retcode=1
2021-12-02T23:17:06.7872477Z + set +x
2021-12-02T23:17:06.7918411Z ##[error]Process completed with exit code 1.
2021-12-02T23:17:06.7976371Z ##[group]Run # Ensure the working directory gets chowned back to the current user

See GitHub Actions build linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-02T23:02:56.8383328Z Build left local git repository checkout dirty
2021-12-02T23:02:55.7932487Z real	37m30.478s
2021-12-02T23:02:55.7933304Z user	36m9.412s
2021-12-02T23:02:55.7933708Z sys	4m1.395s
2021-12-02T23:02:55.7934094Z + assert_git_not_dirty
2021-12-02T23:02:55.7936001Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-default != *rocm* ]]
2021-12-02T23:02:55.7937624Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-default != *xla* ]]
2021-12-02T23:02:55.7938578Z ++ git status --porcelain
2021-12-02T23:02:56.8380896Z + git_status='?? third_party/flatbuffers/'
2021-12-02T23:02:56.8381834Z + [[ -n ?? third_party/flatbuffers/ ]]
2021-12-02T23:02:56.8382623Z + echo 'Build left local git repository checkout dirty'
2021-12-02T23:02:56.8383328Z Build left local git repository checkout dirty
2021-12-02T23:02:56.8384051Z + echo 'git status --porcelain:'
2021-12-02T23:02:56.8384693Z git status --porcelain:
2021-12-02T23:02:56.8385338Z + echo '?? third_party/flatbuffers/'
2021-12-02T23:02:56.8385921Z ?? third_party/flatbuffers/
2021-12-02T23:02:56.8386352Z + exit 1
2021-12-02T23:02:56.8386708Z + cleanup
2021-12-02T23:02:56.8387118Z + retcode=1
2021-12-02T23:02:56.8387487Z + set +x
2021-12-02T23:02:56.8428069Z ##[error]Process completed with exit code 1.
2021-12-02T23:02:56.8495221Z ##[group]Run # Ensure the working directory gets chowned back to the current user

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (4/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-02T23:14:16.4796214Z Build left local git repository checkout dirty
2021-12-02T23:14:15.7860705Z real	45m2.587s
2021-12-02T23:14:15.7861262Z user	43m3.580s
2021-12-02T23:14:15.7861631Z sys	5m9.731s
2021-12-02T23:14:15.7862050Z + assert_git_not_dirty
2021-12-02T23:14:15.7863378Z + [[ linux-xenial-cuda11.3-py3.6-gcc7-default != *rocm* ]]
2021-12-02T23:14:15.7864990Z + [[ linux-xenial-cuda11.3-py3.6-gcc7-default != *xla* ]]
2021-12-02T23:14:15.7868632Z ++ git status --porcelain
2021-12-02T23:14:16.4793055Z + git_status='?? third_party/flatbuffers/'
2021-12-02T23:14:16.4793994Z + [[ -n ?? third_party/flatbuffers/ ]]
2021-12-02T23:14:16.4794981Z + echo 'Build left local git repository checkout dirty'
2021-12-02T23:14:16.4796214Z Build left local git repository checkout dirty
2021-12-02T23:14:16.4797555Z + echo 'git status --porcelain:'
2021-12-02T23:14:16.4798390Z git status --porcelain:
2021-12-02T23:14:16.4799064Z + echo '?? third_party/flatbuffers/'
2021-12-02T23:14:16.4799611Z ?? third_party/flatbuffers/
2021-12-02T23:14:16.4800068Z + exit 1
2021-12-02T23:14:16.4800431Z + cleanup
2021-12-02T23:14:16.4800826Z + retcode=1
2021-12-02T23:14:16.4801242Z + set +x
2021-12-02T23:14:16.4841207Z ##[error]Process completed with exit code 1.
2021-12-02T23:14:16.4898741Z ##[group]Run # Ensure the working directory gets chowned back to the current user

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet.


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@xwang233 xwang233 requested review from mruberry and ngimel November 8, 2021 04:06
@xwang233
Collaborator Author

xwang233 commented Nov 8, 2021

cc @ptrblck

@lezcano
Collaborator

lezcano commented Nov 8, 2021

I like this idea! Now, my only question is: should we also add a similar option for MAGMA? Otherwise it feels a bit lacking.
It also makes testing different backends way easier, which is nice.

I think these are good points to discuss in tomorrow's meeting.

@xwang233
Collaborator Author

xwang233 commented Nov 8, 2021

@lezcano Thanks for the comments. I had a similar idea to also add a flag for MAGMA, but there are some technical constraints. ATen/Context.h is a heavily included header, and most global flags in it are primitive types, specifically bool. I was thinking of adding an enum type in a standalone file and including it in the Context.h header, but I'm not sure if that's the right approach. Yeah, let's discuss these technical details as well.

So I first made a POC with only one bool flag to see how that works.

@H-Huang H-Huang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 8, 2021
@mruberry
Collaborator

mruberry commented Nov 9, 2021

Notes from linalg meeting:

  • look at similar switch changing behavior of reductions
  • a similar option to prefer MAGMA would be interesting
  • we could consider adding more algo/driver arguments to ops
  • this would let users override heuristics (see the sketch after these notes)
  • this would facilitate debugging/development
  • let's document which functions can be affected by this flag

Follow-ups:

  • decide whether this should be an enum now, to allow for MAGMA expansion later without a BC-breaking change.
  • Loss reductions probably live in ATen and could be a good model to follow.
  • Name needs bikeshedding. "preferred" sounds right
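
As an illustration of the "override heuristics" point above, temporarily applying a backend preference around a block of code could look like this hypothetical wrapper (not part of this PR; it assumes that calling preferred_linalg_library() with no argument returns the current preference and that the returned value can be passed back in to restore it):

import contextlib
import torch

@contextlib.contextmanager
def linalg_backend(name):
    # Assumed: a no-argument call queries the current preference.
    previous = torch.backends.cuda.preferred_linalg_library()
    torch.backends.cuda.preferred_linalg_library(name)
    try:
        yield
    finally:
        # Assumed: the previously returned value is accepted here to restore it.
        torch.backends.cuda.preferred_linalg_library(previous)

# Force cuSOLVER just for one decomposition while debugging.
a = torch.randn(128, 128, device='cuda')
with linalg_backend('cusolver'):
    q, r = torch.linalg.qr(a)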

@xwang233 xwang233 changed the title [Linalg] Add a runtime switch to let pytorch prefer cuSolver impl in linalg functions on GPU [Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU Nov 11, 2021
@mruberry
Collaborator

Hey @xwang233! @ngimel and I had a chance to look together and we left some inline comments. See #67946 and Kurt's work on the deterministic flag for what may be a better, simpler guide to the process of adding a new enum and a flag for it.

Collaborator

@albanD albanD left a comment


The binding code is much simpler! Thanks for taking the time to update it!

# The main purpose of this test is to make sure these "backend" calls work normally without raising exceptions.
x = torch.randint(2, 5, (2, 4, 4), device='cuda', dtype=torch.double)

torch.backends.cuda.preferred_linalg_library('cusolver')
Collaborator


Is this sending a warning? If so, it would be nice to catch it to avoid spam.
Also, we have something to make warn-once warnings always warn, and you can actually check that you do get the warning.

Collaborator Author


It's using TORCH_WARN_ONCE in Context.cpp. It raises Warning instead of UserWarning and doesn't work well with assertWarnsOnceRegex.

[W Context.cpp:157] Warning: torch.backends.cuda.preferred_linalg_library is an experimental feature. If you see any error or regression when this flag is set please file an issue on GitHub. (function operator())
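
For reference, catching and checking the warning in a test could look roughly like this (a sketch only; it assumes torch.set_warn_always makes the warn-once warning fire on every call and that the C++ warning surfaces through Python's warnings module):

import warnings
import torch

torch.set_warn_always(True)  # make TORCH_WARN_ONCE warnings fire every time
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    torch.backends.cuda.preferred_linalg_library('cusolver')
torch.set_warn_always(False)

# The warning was recorded instead of printed, and we can assert on its message.
assert any('experimental feature' in str(w.message) for w in caught)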

Collaborator

@ngimel ngimel left a comment


Please fix the warning message and ping me when it's done.

if (b != at::LinalgBackend::Default) {
  TORCH_WARN_ONCE(
      "torch.backends.cuda.preferred_linalg_library is an experimental feature. "
      "If you see any error or regression when this flag is set "
Collaborator


Perf regressions are expected if you pick the wrong backend, right? So the error message shouldn't suggest filing an issue in that case. It could suggest filing an issue for errors or unexpected behaviors.

Collaborator Author


Thanks for the suggestion. I've changed the warning message.

@xwang233
Collaborator Author

xwang233 commented Dec 3, 2021

Hi @ngimel, I've fixed the warning message. The CI failures seem unrelated. Should I rebase and trigger the CI again, or is this ready to go?

@ngimel
Collaborator

ngimel commented Dec 3, 2021

That's fine, you don't have to rebase

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


Labels

cla signed open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


9 participants