Conversation

@jbschlosser (Contributor) commented Nov 3, 2021

This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params.

The function and enum are exposed to Python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this).

A new set of tests validates that the correct backend is selected for several sets of inputs + params (see the usage sketch after the flowcharts below). Some backends aren't tested yet:

  • nnpack (for mobile)
  • xnnpack (for mobile)
  • winograd 3x3 (for mobile)

Some flowcharts for reference:
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/140828957-1135b400-38c0-4c9f-87ef-4f33ceebeeae.png)
![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/140828977-ed223a4e-aa86-49f1-9925-c0f6b9ab36af.png)
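
For reference, a minimal sketch of querying the routing decision from Python (a hypothetical usage example; the keyword names are assumed to mirror `convolution`'s signature):

```python
import torch

input = torch.randn(1, 3, 32, 32)
weight = torch.randn(8, 3, 3, 3)
bias = torch.randn(8)

# Ask the routing logic which backend it would select for these inputs/params.
# The keywords below are assumed to mirror torch.convolution's signature.
backend = torch._C._select_conv_backend(
    input, weight, bias,
    stride=[1, 1], padding=[0, 0], dilation=[1, 1],
    transposed=False, output_padding=[0, 0], groups=1)

print(backend)  # e.g. a native/slow CPU path on a CPU-only build
```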

@pytorch-probot bot commented Nov 3, 2021

⚛️ CI Flow Status

Ruleset - Version: v1
Ruleset - File: https://github.com/jbschlosser/pytorch/blob/a8c7cc89775d87fe8823920045d7c29bf03f2a56/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

**Triggered Workflows**

| Workflow | Labels | Status |
| --- | --- | --- |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-dynamic | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/win | ✅ triggered |

**Skipped Workflows**

| Workflow | Labels | Status |
| --- | --- | --- |
| caffe2-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| docker-builds | ciflow/all | 🚫 skipped |
| ios-12-5-1-arm64 | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| ios-12-5-1-arm64-coreml | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| ios-12-5-1-arm64-custom-ops | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| ios-12-5-1-arm64-full-jit | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| ios-12-5-1-arm64-metal | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| ios-12-5-1-x86-64 | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| ios-12-5-1-x86-64-coreml | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| ios-12-5-1-x86-64-full-jit | ciflow/all, ciflow/ios, ciflow/macos | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-py3-clang5-mobile-code-analysis | ciflow/all, ciflow/linux, ciflow/mobile | 🚫 skipped |
| macos-10-15-py3-arm64 | ciflow/all, ciflow/macos | 🚫 skipped |
| macos-10-15-py3-lite-interpreter-x86-64 | ciflow/all, ciflow/macos | 🚫 skipped |
| macos-10-15-py3-x86-64 | ciflow/all, ciflow/macos | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Nov 3, 2021

💊 CI failures summary and remediations

As of commit a8c7cc8 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


Comment on lines +8 to +38
@jbschlosser (Author):

Moved here from `Convolution.cpp`.

Comment on lines 63 to 76
@jbschlosser (Author):

There are two overloads right now, which isn't great. The one exposed to Python purposefully matches the signature of `convolution`.

Comment on lines 952 to 998
@jbschlosser (Author):

Exposes `ConvBackend` and `_select_conv_backend` to Python.
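
For illustration, a quick way to poke at the exposed enum from Python (a sketch; the binding name `torch._C._ConvBackend` is an assumption here):

```python
import torch

# pybind11-bound enums expose a __members__ mapping of name -> value,
# so the available backends can be listed without constructing any inputs.
print(list(torch._C._ConvBackend.__members__))
```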

@jbschlosser requested review from albanD and zou3519 on November 4, 2021 18:11
@jbschlosser force-pushed the conv_forward_refactor branch from 06571a1 to 6c777d6 on November 4, 2021 20:22
@albanD (Collaborator) left a comment:

Sounds good. Only some comments and spurious public API.

@jbschlosser force-pushed the conv_forward_refactor branch from 5dd4a97 to e53eff3 on November 8, 2021 22:33
@jbschlosser marked this pull request as ready for review on November 8, 2021 22:33
@jbschlosser changed the title from "[WIP] Factor backend routing logic out of convolution forward" to "Factor backend routing logic out of convolution forward" on Nov 8, 2021
@jbschlosser requested a review from albanD on November 8, 2021 22:43
@zou3519 (Contributor) commented Nov 8, 2021

> Some backends aren't tested yet:
>
> * miopen (for AMD ROCm)
> * nnpack (for mobile)
> * xnnpack (for mobile)
> * winograd 3x3 (for mobile)

I just started reading through this, but what is the plan for those? Is it fine to ignore them in this PR or do they need to be handled here?

@zou3519 (Contributor):

nit: You don't need to fix this because this is a move, but this looks like a bug: `groups` seems like it should be `int64_t`?

@jbschlosser (Author):

Yeah I agree with you - seems like a bug. Think it's worth opening an issue for?

`test/test_nn.py` (outdated)
@zou3519 (Contributor):

There appears to be `torch.backends.xnnpack.enabled`, not sure if that helps...

@jbschlosser (Author):

True, looks like I can get a test in for that backend by disabling cuDNN/mkldnn and checking `torch.backends.xnnpack.enabled`. My plan was to punt on mobile testing for this PR, but I'll open an issue suggesting this.
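
A rough sketch of that idea (the helper below is hypothetical; it leans on the `torch.backends.cudnn.flags` and `torch.backends.mkldnn.flags` context managers):

```python
import torch

def xnnpack_route_possible() -> bool:
    # Disable cuDNN and mkldnn so convolution routing can fall through,
    # then check whether this build ships with xnnpack at all.
    with torch.backends.cudnn.flags(enabled=False), \
         torch.backends.mkldnn.flags(enabled=False):
        return torch.backends.xnnpack.enabled

print(xnnpack_route_possible())
```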

@zou3519 (Contributor) left a comment:

Seems reasonable to me.

@albanD (Collaborator) left a comment:

SGTM, only Richard's nit.

@jbschlosser (Author) commented:

> Some backends aren't tested yet:
>
> * miopen (for AMD ROCm)
> * nnpack (for mobile)
> * xnnpack (for mobile)
> * winograd 3x3 (for mobile)
>
> I just started reading through this, but what is the plan for those? Is it fine to ignore them in this PR or do they need to be handled here?

Current plan is to ignore them for this PR but add a note that they should be handled at some point. I was able to get miopen tested, so that just leaves the mobile backends. I can open a mobile issue indicating that they're not tested and leave it to those folks to deal with it.

@facebook-github-bot (Contributor):
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@jbschlosser force-pushed the conv_forward_refactor branch from 61973c8 to c4a9816 on November 9, 2021 22:40
@facebook-github-bot (Contributor):
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:
This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params.

The function and enum are exposed to Python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this).

A new set of tests validates that the correct backend is selected for several sets of inputs + params. Some backends aren't tested yet:
* nnpack (for mobile)
* xnnpack (for mobile)
* winograd 3x3 (for mobile)

Some flowcharts for reference:
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/140828957-1135b400-38c0-4c9f-87ef-4f33ceebeeae.png)
![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/140828977-ed223a4e-aa86-49f1-9925-c0f6b9ab36af.png)

Pull Request resolved: pytorch#67790

Reviewed By: zou3519

Differential Revision: D32280878

Pulled By: jbschlosser

fbshipit-source-id: 4b9946f09411993b1c1d2f6bca8b485be823aebe
@facebook-github-bot (Contributor):
This pull request was exported from Phabricator. Differential Revision: D32280878

@jbschlosser force-pushed the conv_forward_refactor branch from c4a9816 to a8c7cc8 on November 10, 2021 15:48
@facebook-github-bot (Contributor):
@jbschlosser merged this pull request in 9a2db6f.

facebook-github-bot pushed a commit that referenced this pull request Dec 4, 2021
… linalg functions on GPU (#67980)

Summary:
Per title.

This PR introduces a global flag that lets PyTorch prefer one of the many backend implementations while calling linear algebra functions on GPU.

Usage:
```python
torch.backends.cuda.preferred_linalg_library('cusolver')
```

Available options (str): `'default'`, `'cusolver'`, `'magma'`.

Issue #63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.

Performance of linear algebra operators after this PR should be no worse than before. The flag is set to **`'default'`** by default, which makes everything the same as before this PR.

The implementation of this PR is basically following that of #67790.

Pull Request resolved: #67980

Reviewed By: mruberry

Differential Revision: D32849457

Pulled By: ngimel

fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6
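
As an aside, a minimal sketch of round-tripping that flag at runtime (assuming, per the commit's description, that the setting is a switchable global; the no-argument call returning the current value is an assumption here):

```python
import torch

prev = torch.backends.cuda.preferred_linalg_library()     # assumed: query current value
torch.backends.cuda.preferred_linalg_library('cusolver')  # prefer cuSOLVER on GPU

a = torch.randn(512, 512, device='cuda')
spd = a @ a.mT + 512 * torch.eye(512, device='cuda')      # symmetric positive-definite
chol = torch.linalg.cholesky(spd)                         # runs via the preferred backend

torch.backends.cuda.preferred_linalg_library(prev)        # restore the previous choice
```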