-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Change calling convention of ATenDispatch from getOp to callUnboxed. #26361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> [ghstack-poisoned]
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 940f460 Pull Request resolved: #26361
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
| void operator()(const at::Tensor& x) { | ||
| ts = ts | x.type_set(); | ||
| } | ||
| void operator()(TensorOptions x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not also take this by const&?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you could. TensorOptions is a pass by value object so it doesn't really matter one way or another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TensorOptions is quite a few words large, right? Why is it a "pass by value object"? For things above 1-2 words, passing by reference is usually faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's two words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// We should aspire to fit in one machine-size word; but a size greater than two
// words is too much. (We are doing terribly on 32-bit archs, where we require
// three machine size words to store tensor options. Eek!)
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
"TensorOptions must fit in 128-bits" );
| } | ||
| } | ||
| template <typename T> | ||
| void operator()(const T& x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fragile. Let's assert that these are only the types we expect, either by static_assert or by only overloading the concrete types. Example: c10 takes tensor lists not as ArrayRef but as torch::List. We would probably miss that once we start using this for c10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a "cure is worse than the disease" situation. We support a lot of input types for kernels, easily dozens, and what you've proposed would effectively require us to maintain yet another list of every possible input type in the codebase, and as you've seen, we add types all the time. Whereas I feel pretty comfortable assuming that the number of Tensor-containing argument types will change very slowly in the future.
Re torch::List, I don't understand why this type exists at all in the arguments of kernels (cc @dzhulgakov, original reviewer of the PR #21164 -- I'm marked as the reviewer but I only reapproved the diff for administrative reasons). As best I can tell, List is an owning type, which means that if I have some sort of type that cannot be directly moved into List, e.g., an initializer_list, I have to copy the elements. But for arguments to kernels, this doesn't really make any sense: if I write:
convolution(x, {2, 3} /* padding */);
There is no reason to copy the {2, 3} initializer list into an owning list. I want to pass it in by reference, and that's exactly what ArrayRef does. Is there a use case that I'm not aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For dicts the reasoning was that we want to hide the underlying implementation type. For Lists one could make the same argument, though it becomes more far-fetched. @smessmer - correct me if I'm wrong - we support both types now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for torch::List as an argument and return type was mostly consistency. You can't return ArrayRef for example, so you'd need different types if you take a list as argument vs when you return it. Also, ArrayRef doesn't work with book, so you'd need a different non-ArrayRef type for BoolList arguments. See the whole reasoning here: https://quip.com/UWUpAoaPsipL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the original point, you got me convinced that we shouldn't introduce a new list of supported types here. We'll hopefully notice if something is missing. Alternatively, we could reuse the already existing list of supported types in the c10 dispatcher, but I'll leave that up to you.
Btw talking of missing types, do you have a plan on how to handle Dicts with Tensor values? Nested things like List of List of Dict of ... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for torch::List as an argument and return type was mostly consistency
Foolish consistency? I mean, I guess if people write kernels that take in List as an argument, I won't stop them. But it's a sort of fact of life in C++ that argument types and return types are going to look different. The best practice is going to be to use ArrayRef on input. The consistency argument also feels even weaker because List is our own home grown type and so you have to tell the user it exists.
I don't find the bool case particularly convincing either. We're not building a general purpose operator library: we're building a tensor library. I have not seen a single compelling use case for passing a variable-length list of booleans as an argument to an operator. (torch.tensor doesn't count; you don't want to bind that in native anyway.)
Btw talking of missing types, do you have a plan on how to handle Dicts with Tensor values? Nested things like List of List of Dict of ... ?
We've got to draw the line somewhere. I'm pretty OK with saying, "Only Tensor, ArrayRef, TensorOptions" participate in multiple dispatch. We have similar limitations with autograd tracking for user defined functions: if you smuggle a Variable in, inside some complex struct, it's not going to be counted as an input for the purposes of the autograd function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you can't use lazy tensors in Dicts for example?
aten/src/ATen/core/ATenDispatch.h
Outdated
| }; | ||
|
|
||
| template <typename... Args> | ||
| TensorTypeSet multi_dispatch_tensor_type_set(Args&&... args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably change the arguments to const Args&... args.
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: de77fc6 Pull Request resolved: pytorch/pytorch#26361
Stack from ghstack:
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle). Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code). This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.
Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.
Signed-off-by: Edward Z. Yang [email protected]
Differential Revision: D17429162