[ROCm] Missing host device #29547
Conversation
cc @zasdfgbnm
waiting on CI
Looks like this is already failing CI.
master doesn't seem broken. The centos build failure looks to be an infra issue; I have kicked off another run.
facebook-github-bot left a comment
@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I'm working on adding complex number support to PyTorch. I'm sorry about the issue; I have added it to the discussion on complex numbers in #755. Please provide your feedback there.
c10/util/TypeCast.h (outdated)
  template<>
- inline void cast_and_store<std::complex<float>>(const ScalarType dest_type, void *ptr, std::complex<float> value_) {
+ C10_HOST_DEVICE inline void cast_and_store<std::complex<float>>(const ScalarType dest_type, void *ptr, std::complex<float> value_) {
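For context, here is a minimal sketch, not PyTorch's actual macro definition, of how a HOST_DEVICE-style qualifier such as C10_HOST_DEVICE is commonly set up: under a CUDA/HIP compiler it expands to __host__ __device__ so the function can be compiled for both sides, and to nothing in a plain host build. The macro and guard names below are illustrative.

```cpp
// Sketch only: the macro and guard names are illustrative, not c10's.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// With the qualifier, a specialization like the one in the diff above can be
// instantiated during device compilation without a host/device mismatch.
template <typename T>
HOST_DEVICE inline void store_sketch(void* ptr, T value) {
  *static_cast<T*>(ptr) = value;
}
```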
Hi @iotamudelta, is there any specific warning you are seeing? IIRC when I wrote this piece of code, I did this intentionally because this function shouldn't be called inside a __device__ or __device__ __host__ function (because we didn't support complex on cuda).
We are seeing issues with this when compiling with clang (not hcc), which does instantiate the complex specialization. I agree that it's not UB as long as complex is never used.
I should add: with a vanilla PyTorch code base.
Strange; this could only be a problem if, inside a kernel, the dispatch has scalar_t being one of the complex types, and I don't think that is the case anywhere.
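To make the disagreement above concrete, here is a hypothetical sketch (the enum, the dispatch function, and the names are mine, not PyTorch's AT_DISPATCH machinery) of how a dtype dispatch instantiates the complex branch at compile time even when that branch is never taken at runtime; during device compilation, a host-only call inside that instantiation is what can trigger the clang diagnostics mentioned.

```cpp
// Hypothetical sketch, not PyTorch code.
#include <complex>
#include <iostream>
#include <typeinfo>

// Stand-in for a lambda body that would call cast_and_store<scalar_t>(...).
template <typename scalar_t>
void kernel_body() {
  std::cout << typeid(scalar_t).name() << '\n';
}

enum class ScalarType { Float, ComplexFloat };

void dispatch(ScalarType t) {
  switch (t) {
    case ScalarType::Float:
      kernel_body<float>();
      break;
    case ScalarType::ComplexFloat:
      // Even if no tensor ever has this dtype at runtime, the compiler must
      // still instantiate kernel_body<std::complex<float>> here.
      kernel_body<std::complex<float>>();
      break;
  }
}
```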
zasdfgbnm left a comment
Please see my reply here: #755 (comment)
cast_and_store is not expected to be called anywhere inside device code in current PyTorch; that is the reason why the C10_HOST_DEVICE qualifier was not there, and there is no UB with current PyTorch. But since we are switching to C++14 soon, it is nice to have the C10_HOST_DEVICE qualifier, which is necessary when we need to enable complex on GPU.
The build fails when compiling the hipified file. This file has been split into smaller ones in #29428 (thanks to @zasdfgbnm) because the compilation unit was too big. @iotamudelta Could you rebase this one onto the current master to see if the build failure goes away?
Summary: Currently, the dynamic casting mechanism is implemented assuming no support for complex on GPU. This will no longer be true in the near future. #29547 could clear some clang warnings, but complex support on GPU is still not complete:
- fetch is not supported
- casting between complex64 and complex128 is not supported
- complex scalar types are not tested

This PR is what should be done for type promotion in order to add support for complex dtypes on GPU, as suggested in #755 (comment). Note that what is newly added in this PR is not tested due to the lack of basic support for complex dtypes (I cannot construct a complex tensor), but this PR shouldn't break any existing part of PyTorch.

For merging this PR, consider two options:
- We could merge this PR now so that dylanbespalko can conveniently work based on master; if there is something wrong here that code review did not catch, dylanbespalko would find it when adding complex integration.
- Or, we could leave this PR open and not merge it. But then dylanbespalko might need to manually apply this to his branch in order to support type promotion of complex.

Pull Request resolved: #29612
Differential Revision: D18451061
Pulled By: ezyang
fbshipit-source-id: 6d4817e87f0cc2e844dc28c0355a7e53220933a6
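For readers unfamiliar with the mechanism the summary refers to, here is a simplified sketch of the dynamic casting idea (the enum values and function name are illustrative, not the actual c10 implementation): the source type is known at compile time while the destination dtype is only known at runtime, so the store switches on a runtime ScalarType.

```cpp
// Sketch only: a toy ScalarType and cast_and_store, not c10's.
#include <cstdint>

enum class ScalarType { Float, Double, Int };

template <typename SrcT>
inline void cast_and_store_sketch(ScalarType dest_type, void* ptr, SrcT value) {
  switch (dest_type) {
    case ScalarType::Float:  *static_cast<float*>(ptr)   = static_cast<float>(value);   break;
    case ScalarType::Double: *static_cast<double*>(ptr)  = static_cast<double>(value);  break;
    case ScalarType::Int:    *static_cast<int32_t*>(ptr) = static_cast<int32_t>(value); break;
  }
}
```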
@iotamudelta Could you resolve merge conflicts?
@bddppq should be resolved. Looks like it minimized the required changes.
facebook-github-bot left a comment
@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@bddppq any insight into the one failing internal test target?
It's fake, you're good to go.
Missing __device__ and __host__ annotations in the complex case. Make it less UB. Note that this is still rather unsavory code: std::real is only constexpr from C++14 onwards (https://en.cppreference.com/w/cpp/numeric/complex/real2), which is the requirement for __device__. What I am trying to say is: this particular piece of code should not have passed review and should not have been merged, IMHO, as it tries to codify UB.
Also note that the benchmarks referenced in the source were CPU- and CUDA-only.
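As a hedged illustration of the constexpr point above (my own sketch, not code from this PR): in C++14 std::complex's accessors become constexpr, which is what makes them admissible in __device__ code under relaxed-constexpr compilation; before that there is no clean way to read the real part on the device. The function name and the pre-C++14 fallback below are assumptions for demonstration only.

```cpp
// Illustrative CUDA/HIP sketch only; compile with nvcc or hipcc.
#include <complex>

__host__ __device__ inline float real_part(const std::complex<float>& z) {
#if __cplusplus >= 201402L
  // real() is constexpr since C++14, so relaxed-constexpr compilation can
  // accept this call from device code.
  return z.real();
#else
  // Pre-C++14 fallback: rely on the standard's array-compatibility
  // guarantee for std::complex layout ([complex.numbers]).
  return reinterpret_cast<const float*>(&z)[0];
#endif
}
```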