
Conversation

@colesbury (Member) commented Jul 20, 2018

This is a modification of the strategy from #8919 and #9579.

Previously, the CPU architecture-specific kernels self-registered with
the DispatchStub. When linking as part of a static library, this required
passing the --whole-archive flag to the linker to ensure that the
object files for the kernels were included. Caffe2 and TensorFlow use that
strategy.

We ran into some issues with --whole-archive blowing up the binary size
of some downstream projects in Facebook. This PR avoids --whole-archive
for CPU kernels. The downside is that the generic code needs to be aware
of whether kernels were compiled with AVX and AVX2 support (via
HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION).
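
As an illustration only (the real DispatchStub code differs, and the helper names here are made up), the dispatch decision roughly amounts to guarding each capability-specific function pointer behind the corresponding build-time macro:

```cpp
// Hypothetical sketch, not the actual ATen implementation.
enum class CPUCapability { DEFAULT = 0, AVX = 1, AVX2 = 2 };

// Placeholder standing in for the runtime cpuid-based detection (assumption).
CPUCapability detect_cpu_capability() { return CPUCapability::DEFAULT; }

template <typename FnPtr>
FnPtr choose_cpu_impl(FnPtr default_impl, FnPtr avx_impl, FnPtr avx2_impl) {
  CPUCapability capability = detect_cpu_capability();
#ifdef HAVE_AVX2_CPU_DEFINITION
  // Only referenced when the AVX2 kernel object file was actually compiled.
  if (capability >= CPUCapability::AVX2) return avx2_impl;
#endif
#ifdef HAVE_AVX_CPU_DEFINITION
  if (capability >= CPUCapability::AVX) return avx_impl;
#endif
  return default_impl;
}
```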

The CUDA kernels still self-register with DispatchStub because the CPU
library is not aware of whether the CUDA library will be available at
runtime.
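
For contrast, a self-registration scheme can be sketched like this (simplified, with made-up names; it only illustrates why the CPU side needs no compile-time knowledge of the CUDA kernel):

```cpp
// Simplified sketch of self-registration; names are illustrative only.
using KernelFn = void (*)();

struct ExampleStub {
  // Stays null unless a CUDA object file is linked in and registers itself.
  static KernelFn cuda_impl;
};
KernelFn ExampleStub::cuda_impl = nullptr;

// A static object whose constructor runs at load time in the CUDA library.
struct RegisterCudaKernel {
  explicit RegisterCudaKernel(KernelFn fn) { ExampleStub::cuda_impl = fn; }
};

// In the CUDA translation unit:
void example_cuda_launcher() { /* launch the kernel */ }
static RegisterCudaKernel register_example(&example_cuda_launcher);
```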

There are a few major changes to DispatchStub:

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (previously ATEN_DISABLE_AVX/AVX2); a sketch
   of this override follows the list.

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
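
A rough sketch of the environment-variable override (the accepted values and parsing shown here are assumptions, not the exact ATen behavior):

```cpp
#include <cstdlib>
#include <cstring>

enum class CPUCapability { DEFAULT = 0, AVX = 1, AVX2 = 2 };

CPUCapability compute_cpu_capability() {
  // If ATEN_CPU_CAPABILITY is set, trust it instead of runtime detection.
  const char* env = std::getenv("ATEN_CPU_CAPABILITY");
  if (env != nullptr) {
    if (std::strcmp(env, "avx2") == 0) return CPUCapability::AVX2;
    if (std::strcmp(env, "avx") == 0) return CPUCapability::AVX;
    if (std::strcmp(env, "default") == 0) return CPUCapability::DEFAULT;
  }
  // Otherwise fall back to cpuid-based detection (omitted here).
  return CPUCapability::DEFAULT;
}
```

Under that assumption, running with ATEN_CPU_CAPABILITY=default should force the generic kernels regardless of what the CPU supports.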

@facebook-github-bot (Contributor) left a comment

@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@soumith (Contributor) commented Jul 21, 2018

@pytorchbot retest this please

@ezyang (Contributor) commented Jul 23, 2018

Build error looks real:

05:21:39 /private/var/lib/jenkins/workspace/pytorch-builds/pytorch-macos-10.13-py3-test2/aten/src/ATen/native/DispatchStub.h:72:18: error: instantiation of variable 'at::native::DispatchStub<void (*)(at::Tensor &, const at::Tensor &), at::native::softmax_lastdim_kernel>::AVX' required here, but no definition is available [-Werror,-Wundefined-var-template]
05:21:39       AT_ASSERTM(AVX, "DispatchStub: missing AVX kernel");
05:21:39                  ^
05:21:39 /private/var/lib/jenkins/workspace/pytorch-builds/pytorch-macos-10.13-py3-test2/aten/src/ATen/native/DispatchStub.h:50:28: note: in instantiation of member function 'at::native::DispatchStub<void (*)(at::Tensor &, const at::Tensor &), at::native::softmax_lastdim_kernel>::choose_cpu_impl' requested here
05:21:39         cpu_dispatch_ptr = choose_cpu_impl();
05:21:39                            ^
05:21:39 /private/var/lib/jenkins/workspace/pytorch-builds/pytorch-macos-10.13-py3-test2/aten/src/ATen/native/SoftMax.cpp:131:27: note: in instantiation of function template specialization 'at::native::DispatchStub<void (*)(at::Tensor &, const at::Tensor &), at::native::softmax_lastdim_kernel>::operator()<at::Tensor, at::Tensor>' requested here
05:21:39     softmax_lastdim_kernel(kCPU, output, input);
05:21:39                           ^
05:21:39 /private/var/lib/jenkins/workspace/pytorch-builds/pytorch-macos-10.13-py3-test2/aten/src/ATen/native/DispatchStub.h:84:16: note: forward declaration of template entity is here
05:21:39   static FnPtr AVX;
05:21:39                ^
05:21:39 /private/var/lib/jenkins/workspace/pytorch-builds/pytorch-macos-10.13-py3-test2/aten/src/ATen/native/DispatchStub.h:72:18: note: add an explicit instantiation declaration to suppress this warning if 'at::native::DispatchStub<void (*)(at::Tensor &, const at::Tensor &), at::native::softmax_lastdim_kernel>::AVX' is explicitly instantiated in another translation unit
05:21:39       AT_ASSERTM(AVX, "DispatchStub: missing AVX kernel");
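
For reference, clang's -Wundefined-var-template fires when a static data member of a class template is odr-used in a translation unit where no definition of it is visible. A minimal, self-contained illustration of the pattern and two ways to silence it (generic C++, not the actual fix applied in this PR):

```cpp
// The pattern the warning is about (names are illustrative).
template <typename T>
struct Stub {
  static int AVX;                  // declared, but no definition visible yet
  int get() const { return AVX; }  // odr-use requires an instantiation of AVX
};

// Option 1: define the member in the header so every TU sees a definition.
template <typename T>
int Stub<T>::AVX = 0;

// Option 2 (alternative): add an explicit instantiation declaration here,
//   extern template struct Stub<int>;
// and explicitly instantiate it in exactly one .cpp file.

int main() {
  Stub<int> s;
  return s.get();  // 0
}
```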

@facebook-github-bot (Contributor) left a comment

@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan (Contributor) left a comment

lgtm.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 23, 2018
Summary: same as the PR description above.
Pull Request resolved: pytorch/pytorch#9664

Differential Revision: D8943350

Pulled By: colesbury

fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
Summary: same as the PR description above.
Pull Request resolved: pytorch#9664

Differential Revision: D8943350

Pulled By: colesbury

fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary: same as the PR description above.
Pull Request resolved: pytorch#9664

Differential Revision: D8943350

Pulled By: colesbury

fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
@ezyang added the merged label on Jun 26, 2019