
Conversation

@iotamudelta
Contributor

Missing __device__ and __host__ annotations in the complex case. Make it less UB.

Note that this is still rather unsavory code: std::real is only constexpr from C++14 onwards ( https://en.cppreference.com/w/cpp/numeric/complex/real2 ), which is the requirement for calling it from __device__ code.

What I am trying to say is: this particular piece of code should not have passed review and should not have been merged, IMHO, as it codifies UB.

Also note that the benchmarks referenced in the source were CPU- and CUDA-only.
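To illustrate the shape of the problem, a minimal sketch (with a hypothetical helper name, not the actual PyTorch source): once a function is compiled for the device, every call inside it must itself be usable from device code, which for std::real means the C++14 constexpr version.

    #include <complex>

    // Minimal sketch, assuming a hypothetical helper. Under nvcc/hipcc this
    // function is compiled for both host and device; the std::real call is
    // only usable from the device side once std::real is constexpr (C++14,
    // with relaxed-constexpr in effect). Against C++11 headers this is
    // exactly the UB discussed above.
    __host__ __device__ inline float real_part(std::complex<float> v) {
      return std::real(v);
    }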

@iotamudelta added the module: rocm (AMD GPU support for PyTorch) and open source labels Nov 11, 2019
@iotamudelta requested review from bddppq and ezyang November 11, 2019 14:21
@ezyang
Contributor

ezyang commented Nov 11, 2019

cc @dylanbespalko

@ezyang
Contributor

ezyang commented Nov 11, 2019

cc @zasdfgbnm

@ezyang
Contributor

ezyang commented Nov 11, 2019

waiting on CI

@iotamudelta
Contributor Author

Looks like this is already failing CI.

@bddppq
Contributor

bddppq commented Nov 11, 2019

master doesn't seem broken. The CentOS build failure looks to be an infra issue:

14:48:11 error: failed to execute compile
14:48:11 caused by: error reading compile response from server
14:48:11 caused by: Failed to read response header
14:48:11 caused by: failed to fill whole buffer

I have kicked off another run.

Contributor

@facebook-github-bot left a comment

@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dylanbespalko mentioned this pull request Nov 11, 2019
@dylanbespalko
Contributor

@iotamudelta,

I'm working on adding complex number support to PyTorch. I'm sorry about the issue; I have added it to the discussion on complex numbers, #755. Please provide your feedback there.


  template<>
- inline void cast_and_store<std::complex<float>>(const ScalarType dest_type, void *ptr, std::complex<float> value_) {
+ C10_HOST_DEVICE inline void cast_and_store<std::complex<float>>(const ScalarType dest_type, void *ptr, std::complex<float> value_) {
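For readers without the file open, a rough sketch of what a specialization of this shape does (simplified body, not the actual PyTorch source): it switches on the runtime destination dtype, and the real-valued branches go through std::real, the constexpr-since-C++14 call discussed above.

    // Hedged sketch, not the actual body; assumes at::ScalarType and the
    // c10 C10_HOST_DEVICE macro are in scope.
    template <>
    C10_HOST_DEVICE inline void cast_and_store<std::complex<float>>(
        const ScalarType dest_type, void* ptr, std::complex<float> value_) {
      switch (dest_type) {
        case ScalarType::Float:
          // Storing into a real dtype keeps only the real part.
          *static_cast<float*>(ptr) = std::real(value_);
          break;
        // ... cases for the other destination dtypes ...
        default:
          break; // unsupported cast
      }
    }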
Collaborator

@zasdfgbnm Nov 11, 2019

Hi @iotamudelta, is there any specific warning you are seeing? IIRC, when I wrote this piece of code I did this intentionally, because this function shouldn't be called inside a __device__ or __device__ __host__ function (we didn't support complex on CUDA).

Contributor Author

We are seeing issues with this with clang (not hcc), which does instantiate the template for complex. I agree that it's not UB if it's never instantiated for complex.

Contributor Author

I should add: with a vanilla PyTorch code base.

Collaborator

Strange; this could only be a problem if, inside the kernel, the dispatch has scalar_t being one of the complex types, and I don't think any dispatch does.
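For concreteness, a hedged sketch of the dispatch shape in question (simplified; not the actual AT_DISPATCH macros): the generic kernel body, and anything it calls such as cast_and_store, is only instantiated for the scalar types that actually appear in the switch.

    #include <complex>

    enum class ScalarType { Float, Double, ComplexFloat /* ... */ };

    // Sketch of a type dispatch: f is instantiated once per case. If no
    // case passes a complex type, no complex instantiation ever reaches
    // device code -- which is the invariant being discussed here.
    template <typename F>
    void dispatch(ScalarType t, F&& f) {
      switch (t) {
        case ScalarType::Float:  f(float{});  break;
        case ScalarType::Double: f(double{}); break;
        // Adding this case is what would instantiate f (and any cast
        // helpers it calls) for std::complex<float>:
        // case ScalarType::ComplexFloat: f(std::complex<float>{}); break;
        default: break;
      }
    }

A caller would pass a generic lambda, e.g. dispatch(t, [](auto scalar) { /* kernel body, with decltype(scalar) playing the role of scalar_t */ });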

Collaborator

@zasdfgbnm left a comment

Please see my reply here: #755 (comment)

cast_and_store is not expected to be called anywhere inside device code in current PyTorch; that is why the C10_HOST_DEVICE qualifier was not there, and there is no UB in current PyTorch. But since we are switching to C++14 soon, it is nice to have the C10_HOST_DEVICE qualifier, which will be necessary once we enable complex on GPU.
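For context, roughly how a C10_HOST_DEVICE-style macro works (a simplified sketch; see c10/macros/Macros.h for the real definition):

    // Sketch: expands to the dual annotation under a CUDA/HIP compiler and
    // to nothing in a plain host build, so annotated functions remain
    // valid C++ in CPU-only builds.
    #if defined(__CUDACC__) || defined(__HIPCC__)
    #define C10_HOST_DEVICE __host__ __device__
    #else
    #define C10_HOST_DEVICE
    #endif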

@bddppq
Contributor

bddppq commented Nov 11, 2019

The build fails when compiling the hipified BinaryOpsKernel.cu (unfortunately I don't see any useful error message in the build log):

23:27:31 CMake Error at torch_generated_BinaryOpsKernel.hip.o.cmake:174 (message):
23:27:31   Error generating file
23:27:31   /var/lib/jenkins/workspace/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/hip/./torch_generated_BinaryOpsKernel.hip.o
23:27:31 
23:27:31 

This file has been split into smaller ones in #29428 (thanks to @zasdfgbnm) because the compilation unit was too big. @iotamudelta, could you rebase this one onto the current master to see if the build failure goes away?

facebook-github-bot pushed a commit that referenced this pull request Nov 12, 2019
Summary:
Currently, the dynamic casting mechanism is implemented assuming no support for complex on GPU. This will no longer be true in the near future.

#29547 could clear some clang warnings, but complex support on GPU is still not complete:
- fetch is not supported
- casting between complex64 and complex128 is not supported
- complex scalar types are not tested

This PR is what should be done for type promotion in order to add support for complex dtypes on GPU, as suggested in #755 (comment)
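For orientation, a minimal sketch of the dynamic-cast pair that this type promotion relies on (simplified signatures in the spirit of the helpers touched by this PR; the "fetch" in the list above likely refers to the load side of this pair):

    // Hedged sketch with simplified signatures. For type promotion a kernel
    // loads operands of a runtime-selected dtype, computes in the common
    // dtype, and stores back with a runtime-selected cast:
    template <typename dest_t>
    C10_HOST_DEVICE dest_t fetch_and_cast(ScalarType src_type, const void* ptr);

    template <typename src_t>
    C10_HOST_DEVICE void cast_and_store(ScalarType dest_type, void* ptr, src_t value);

    // Inside the loop, roughly:
    //   float a = fetch_and_cast<float>(input_dtype, in_ptr);
    //   cast_and_store<float>(output_dtype, out_ptr, op(a));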

Note that what is newly added in this PR is not tested, due to the lack of basic support for complex dtypes (I cannot construct a complex tensor). But this PR shouldn't break any existing part of PyTorch.

For merging this PR, consider two options:
- We could merge this PR now so that dylanbespalko can conveniently work on top of master; if there is something wrong here that code review missed, dylanbespalko would find it when adding complex integration.
- Or we could leave this PR open and not merge it, but then dylanbespalko might need to manually apply it to his branch in order to support type promotion for complex.
Pull Request resolved: #29612

Differential Revision: D18451061

Pulled By: ezyang

fbshipit-source-id: 6d4817e87f0cc2e844dc28c0355a7e53220933a6
@bddppq
Contributor

bddppq commented Nov 12, 2019

@iotamudelta Could you resolve merge conflicts?

@iotamudelta
Contributor Author

@bddppq should be resolved. Looks like the rebase minimized the required changes.

Contributor

@facebook-github-bot left a comment

@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@iotamudelta
Contributor Author

@bddppq any insight into the one failing internal test target?

@ezyang
Contributor

ezyang commented Nov 13, 2019

It's fake, you're good to go.

@facebook-github-bot
Contributor

@bddppq merged this pull request in 6d54c5d.

@iotamudelta deleted the missing_host_device branch November 13, 2019 21:45