
Conversation

@vadimkantorov (Contributor)

Following discussion with @ezyang in #26288

@pytorchbot added the module: cuda (Related to torch.cuda, and CUDA support in general) and module: operators labels on Oct 9, 2019
@vadimkantorov (Contributor, Author)

Yes, a compilation error, the same as I got locally.

@ezyang (Contributor)

Return a Tensor& here and I think your problem will go away. You might need to adjust your inner code to directly return output.

@vadimkantorov (Contributor, Author)

Is returning a reference to a local variable (tensor) safe?

@vadimkantorov (Contributor, Author), Oct 10, 2019

I'm not sure I understand your suggestion. Are you suggesting replacing Tensor with Tensor& as the return type of batch_norm_elemt_cuda_out and batch_norm_elemt_cuda?

@vadimkantorov (Contributor, Author)

Even if we change the return type, in theory, the error should persist:

Oct 09 19:39:07 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Normalization.cu(67): error: cannot overload functions distinguished by return type alone

@ezyang (Contributor)

> Is returning a reference to a local variable (tensor) safe?

No, it is not. However, the convention is that _out functions return the reference to the out tensor that was passed in. It should look something like:

Tensor& foo_out(Tensor& output, const Tensor& input) {
  // ...
  return output;
}
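
For contrast, here is a sketch of the usual ATen convention (using the same made-up foo/foo_out names as above, not code from this PR): only the out variant returns a Tensor&, and it returns the reference it was given, so no reference to a local ever escapes.

#include <ATen/ATen.h>

// Hypothetical out variant: writes into the preallocated `output` and
// returns the same reference that was passed in.
at::Tensor& foo_out(at::Tensor& output, const at::Tensor& input) {
  output.copy_(input);  // stand-in for the real computation
  return output;
}

// Hypothetical functional variant: allocates the result itself, delegates to
// the out variant, and returns the Tensor by value.
at::Tensor foo(const at::Tensor& input) {
  at::Tensor output = at::empty_like(input);
  foo_out(output, input);
  return output;  // returned by value; no reference to a local escapes
}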

@ezyang (Contributor)

> Even if we change the return type, in theory, the error should persist

I'm not sure I follow. There might be other errors in your code, but your error message is consistent with this C++ code:

// Auto-generated in NativeFunctions.h
Tensor& foo_out(Tensor& output, const Tensor& input);

// Your implementation
Tensor foo_out(Tensor& output, const Tensor& input) {
  ...
}

This is exactly the situation where a C++ compiler will complain about an overload distinguished only by return type.

By the way, you can find out exactly what signature ATen is expecting by inspecting the generated native functions header.

@vadimkantorov (Contributor, Author)

@ezyang Thanks for the explanation! I understand now; I'll try the suggested return type for the _out variant.

@vadimkantorov (Contributor, Author) commented on Oct 12, 2019

The clang-tidy check error seems unrelated.

And compilation fails with:

HEAD is now at 3e7b27a2be Tensor -> Tensor& as out-variant return type
+ git merge --no-edit --no-ff fcb6dd079eb01b594aeea9f7c83ea1b36aa65793
fatal: refusing to merge unrelated histories
Exited with code 128

Do I need to rebase?

Also, https://github.com/vadimkantorov/pytorch/blob/master/aten/src/ATen/native/cuda/Normalization.cuh/#L678-L679 in an extreme case could reallocate, and then returning a reference to output_ (before reshape) is incorrect. In practice this should never happen. Is there a simple way to ensure that reshape does not reallocate?
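
As a quick standalone check of that behavior (a sketch assuming libtorch, not part of this PR): reshape silently falls back to a copy when the strides do not permit a view, which comparing data_ptr() reveals, while view never reallocates and throws instead.

#include <torch/torch.h>
#include <iostream>

int main() {
  auto out  = torch::rand({2, 3, 4});
  auto outT = out.transpose(-1, -2);  // non-contiguous view, shape (2, 4, 3)

  // reshape() returns a view when the strides allow it and otherwise copies;
  // comparing data_ptr() shows which path was taken.
  auto reshaped = outT.reshape({2, -1});
  std::cout << "reshape copied: " << std::boolalpha
            << (reshaped.data_ptr() != outT.data_ptr()) << "\n";

  // view() never reallocates: it throws if the requested shape cannot be
  // produced without a copy.
  try {
    auto viewed = outT.view({2, -1});
  } catch (const c10::Error& e) {
    std::cout << "view refused to copy: " << e.what() << "\n";
  }
}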

@ezyang (Contributor) commented on Oct 14, 2019

> Do I need to rebase?

Yep.

> Also, https://github.com/vadimkantorov/pytorch/blob/master/aten/src/ATen/native/cuda/Normalization.cuh/#L678-L679 in an extreme case could reallocate, and then returning a reference to output_ (before reshape) is incorrect

You should error in that case. Replace this with a view, which is guaranteed not to reallocate?

@vadimkantorov (Contributor, Author) commented on Oct 14, 2019

> You should error in that case. Replace this with a view, which is guaranteed not to reallocate?

OK, I'll replace the output_.reshape with output_.view. On a side note, there is currently no easy way to do the following:

a = torch.rand(2,3,4)
a_ = a.transpose(-1, -2)
b_ = someltwisefunc(a_.reshape(2, -1)) # reshape seems to reallocate (checked by data_ptr), view will error out
b = b_.view_as(a)

@vadimkantorov (Contributor, Author) commented on Oct 15, 2019

Squashed and rebased. I hope it's done correctly (I added the upstream remote, fetched it, squashed the commits, and then rebased; it's my first time using git rebase, so sorry for the verbosity):

git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream master
git rebase -i HEAD~2 # pick + squash
git rebase upstream/master
git push -f

If compilation works, I'll replace output_.reshape with output_.view.

@ezyang (Contributor) commented on Oct 15, 2019

Build failure is real


Oct 15 13:15:06 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Normalization.cu(70): error: initial value of reference to non-const must be an lvalue
Oct 15 13:15:06 

@ezyang (Contributor)

view here, to avoid realloc

@ezyang (Contributor) commented on Oct 15, 2019

The rest of the code looks reasonable!

@vadimkantorov (Contributor, Author)

> Build failure is real
>
> Oct 15 13:15:06 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Normalization.cu(70): error: initial value of reference to non-const must be an lvalue

Is it because AT_DISPATCH_FLOATING_TYPES_AND_HALF and the lambda passed in are implicitly returning a Tensor, not a Tensor&? Would you have any advice on how to fix that?
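
For what it's worth, that diagnostic can be reproduced outside the dispatch macro in a few lines (a standalone sketch assuming libtorch; make_result is a made-up name): a Tensor returned by value is a temporary, and a temporary cannot initialize a non-const Tensor&.

#include <ATen/ATen.h>

// A Tensor returned by value is a temporary (prvalue) at the call site.
at::Tensor make_result() { return at::empty({1}); }

int main() {
  // at::Tensor& r = make_result();   // error: initial value of reference to
  //                                  // non-const must be an lvalue
  const at::Tensor& cr = make_result();  // OK: a const reference may bind to a temporary
  (void)cr;
}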

@vadimkantorov (Contributor, Author)

One way would be to remove the returns here and replace them with a single return output at the end. Should I do that, @ezyang?

@ezyang (Contributor)

Yeah, that's what I would do.
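
For illustration, a minimal sketch of that pattern (assuming libtorch; scale_out_sketch is a made-up stand-in, not this PR's batch-norm code): the dispatch lambda returns nothing and only writes into output, and the out variant returns the Tensor& it was given.

#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

// Hypothetical out variant; assumes `output` is preallocated, contiguous,
// and has the same dtype and number of elements as `input`.
at::Tensor& scale_out_sketch(at::Tensor& output, const at::Tensor& input, double alpha) {
  auto input_c = input.contiguous();  // keep the indexing below simple
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(input_c.scalar_type(), "scale_out_sketch", [&] {
    const auto* in = input_c.data_ptr<scalar_t>();
    auto* out = output.data_ptr<scalar_t>();
    for (int64_t i = 0; i < input_c.numel(); ++i) {
      out[i] = in[i] * static_cast<scalar_t>(alpha);  // write in place; the lambda returns void
    }
  });
  return output;  // the reference that was passed in, not a local temporary
}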

Commits pushed:

Tensor -> Tensor& as out-variant return type
batch_norm_elemt_cuda_template: reshape -> view for output
batch_norm_elemt_cuda_template return type: Tensor& -> void

@vadimkantorov (Contributor, Author)

Finally, it compiled :)

@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 17, 2019
Summary:
Following discussion with ezyang in pytorch/pytorch#26288
Pull Request resolved: pytorch/pytorch#27621

Differential Revision: D17978858

Pulled By: ezyang

fbshipit-source-id: f843b691a67f1dc48b87ed6a633007d193150cf7

@facebook-github-bot (Contributor)

@ezyang merged this pull request in e1be08f.

thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020