Conversation

@vishwakftw (Contributor)

Changelog:

  • Migrate _dirichlet_grad implementation from TH to ATen
  • Add CUDA support for _dirichlet_grad

Closes #11030.
Closes #15773.

@pytorchbot added labels module: cpu, module: cuda, module: internals, module: operators on May 31, 2019
@vishwakftw (Contributor Author)

cc: @fritzo

@vishwakftw (Contributor Author)

@pytorchbot rebase this please

@vishwakftw vishwakftw force-pushed the dirichlet_grad-cuda-support branch from f971b43 to fd088b3 Compare May 31, 2019 12:25
@fritzo (Collaborator) commented May 31, 2019

Great. Can you confirm that

  • This PR moves dirichlet_grad math to a new file so that the code can be used by both CPU and CUDA codepaths, and therefore the CUDA implementation is covered by existing CPU tests.
  • The CUDA implementation uses double precision for the function approximations (which require higher precision).

@vishwakftw (Contributor Author)

_dirichlet_grad uses the _dirichlet_grad_one function defined in Distributions.h for both the CPU and CUDA paths, which addresses your first point.

Regarding your second point, I followed the same approach as standard_gamma_grad. On CUDA, I believe the acc_type (the data type used for the function approximations) is float for half and float inputs, and double for double inputs. On CPU, however, the acc_type is universally double.

@vishwakftw vishwakftw requested review from ezyang and gchanan and removed request for gchanan June 1, 2019 04:52
@ezyang (Contributor) left a comment

I'm assuming there were no substantive algorithmic changes.

Tensor _dirichlet_grad_cpu(const Tensor& x, const Tensor& alpha, const Tensor& total) {
  Tensor ret = at::empty(x.sizes(), x.options());
  AT_DISPATCH_FLOATING_TYPES(x.scalar_type(), "_dirichlet_grad_cpu", [&] {
    CPU_tensor_apply4<scalar_t, scalar_t, scalar_t, scalar_t>(ret, x, alpha, total,
@ezyang (Contributor)
Doesn't have to be this PR, but at some point it would be good to port this into using TensorIterator (which I think should be doable here.) CC @VitalyFedyunin

@vishwakftw (Contributor Author) commented Jun 3, 2019

There are a lot of dispatches in this file that would need to be ported to TensorIterator.

@ezyang (Contributor) commented Jun 3, 2019

Windows failures look related to this diff:

14:29:13          C:/Jenkins/workspace/caffe2-builds/py2-cuda9.0-cudnn7-windows-build/aten/src\ATen/native/Distributions.h(271): error : calling a __host__ function("isnan< ::c10::Half> ") from a __device__ function("_NV_ANON_NAMESPACE::_beta_grad_alpha_small< ::c10::Half, float> ") is not allowed [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\caffe2\caffe2_gpu.vcxproj]
14:29:13          C:/Jenkins/workspace/caffe2-builds/py2-cuda9.0-cudnn7-windows-build/aten/src\ATen/native/Distributions.h(271): error : identifier "isnan< ::c10::Half> " is undefined in device code [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\caffe2\caffe2_gpu.vcxproj]
14:29:13          C:/Jenkins/workspace/caffe2-builds/py2-cuda9.0-cudnn7-windows-build/aten/src\ATen/native/Distributions.h(288): error : calling a __host__ function("isnan< ::c10::Half> ") from a __device__ function("_NV_ANON_NAMESPACE::_beta_grad_beta_small< ::c10::Half, float> ") is not allowed [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\caffe2\caffe2_gpu.vcxproj]
14:29:13          C:/Jenkins/workspace/caffe2-builds/py2-cuda9.0-cudnn7-windows-build/aten/src\ATen/native/Distributions.h(288): error : identifier "isnan< ::c10::Half> " is undefined in device code [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\caffe2\caffe2_gpu.vcxproj]

@ezyang (Contributor) commented Jun 3, 2019

I guess you accidentally turned on half precision support for this operation?

@vishwakftw (Contributor Author) commented Jun 3, 2019

Yes, I'm sorry. I'll revert that change - it's in Distributions.cu.

@vishwakftw (Contributor Author)

Windows failure is unrelated to this PR.

@ezyang is this good to go?

@vishwakftw (Contributor Author)

@pytorchbot rebase this please

@vishwakftw (Contributor Author)

@pytorchbot merge this please

@pytorchbot added the merge-this-please label on Jun 5, 2019
@facebook-github-bot (Contributor) left a comment
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw vishwakftw deleted the dirichlet_grad-cuda-support branch June 5, 2019 13:41
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 5, 2019
Summary:
Changelog:
- Migrate _dirichlet_grad implementation from TH to ATen
- Add CUDA support for _dirichlet_grad

Closes #11030.
Closes #15773.
Pull Request resolved: pytorch/pytorch#21191

Differential Revision: D15660330

Pulled By: ezyang

fbshipit-source-id: c8ad5b80366e5348139ce9be10400f22fc430344
@facebook-github-bot (Contributor)

@ezyang merged this pull request in 6251c56.



Development

Successfully merging this pull request may close these issues:

  • Need GPU implementation of dirichlet_grad (originally: Reparameterized gradient on GPU for beta / Dirichlet)
  • implement dirichlet / beta GPU grad
