Skip to content

Conversation

@ezyang
Copy link
Contributor

@ezyang ezyang commented Sep 17, 2019

Stack from ghstack:

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle). Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code). This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang [email protected]

Differential Revision: D17429162

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
@pytorchbot pytorchbot added the module: internals Related to internal abstractions in c10 and ATen label Sep 17, 2019
ezyang added a commit that referenced this pull request Sep 17, 2019
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 940f460
Pull Request resolved: #26361
@ezyang ezyang requested a review from smessmer September 17, 2019 19:00
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
void operator()(const at::Tensor& x) {
ts = ts | x.type_set();
}
void operator()(TensorOptions x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also take this by const&?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could. TensorOptions is a pass by value object so it doesn't really matter one way or another.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TensorOptions is quite a few words large, right? Why is it a "pass by value object"? For things above 1-2 words, passing by reference is usually faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's two words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// We should aspire to fit in one machine-size word; but a size greater than two
// words is too much.  (We are doing terribly on 32-bit archs, where we require
// three machine size words to store tensor options.  Eek!)
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
               "TensorOptions must fit in 128-bits" );

}
}
template <typename T>
void operator()(const T& x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fragile. Let's assert that these are only the types we expect, either by static_assert or by only overloading the concrete types. Example: c10 takes tensor lists not as ArrayRef but as torch::List. We would probably miss that once we start using this for c10.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a "cure is worse than the disease" situation. We support a lot of input types for kernels, easily dozens, and what you've proposed would effectively require us to maintain yet another list of every possible input type in the codebase, and as you've seen, we add types all the time. Whereas I feel pretty comfortable assuming that the number of Tensor-containing argument types will change very slowly in the future.

Re torch::List, I don't understand why this type exists at all in the arguments of kernels (cc @dzhulgakov, original reviewer of the PR #21164 -- I'm marked as the reviewer but I only reapproved the diff for administrative reasons). As best I can tell, List is an owning type, which means that if I have some sort of type that cannot be directly moved into List, e.g., an initializer_list, I have to copy the elements. But for arguments to kernels, this doesn't really make any sense: if I write:

convolution(x, {2, 3} /* padding */);

There is no reason to copy the {2, 3} initializer list into an owning list. I want to pass it in by reference, and that's exactly what ArrayRef does. Is there a use case that I'm not aware of?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For dicts the reasoning was that we want to hide the underlying implementation type. For Lists one could make the same argument, though it becomes more far-fetched. @smessmer - correct me if I'm wrong - we support both types now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for torch::List as an argument and return type was mostly consistency. You can't return ArrayRef for example, so you'd need different types if you take a list as argument vs when you return it. Also, ArrayRef doesn't work with book, so you'd need a different non-ArrayRef type for BoolList arguments. See the whole reasoning here: https://quip.com/UWUpAoaPsipL

Copy link
Contributor

@smessmer smessmer Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the original point, you got me convinced that we shouldn't introduce a new list of supported types here. We'll hopefully notice if something is missing. Alternatively, we could reuse the already existing list of supported types in the c10 dispatcher, but I'll leave that up to you.

Btw talking of missing types, do you have a plan on how to handle Dicts with Tensor values? Nested things like List of List of Dict of ... ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for torch::List as an argument and return type was mostly consistency

Foolish consistency? I mean, I guess if people write kernels that take in List as an argument, I won't stop them. But it's a sort of fact of life in C++ that argument types and return types are going to look different. The best practice is going to be to use ArrayRef on input. The consistency argument also feels even weaker because List is our own home grown type and so you have to tell the user it exists.

I don't find the bool case particularly convincing either. We're not building a general purpose operator library: we're building a tensor library. I have not seen a single compelling use case for passing a variable-length list of booleans as an argument to an operator. (torch.tensor doesn't count; you don't want to bind that in native anyway.)

Btw talking of missing types, do you have a plan on how to handle Dicts with Tensor values? Nested things like List of List of Dict of ... ?

We've got to draw the line somewhere. I'm pretty OK with saying, "Only Tensor, ArrayRef, TensorOptions" participate in multiple dispatch. We have similar limitations with autograd tracking for user defined functions: if you smuggle a Variable in, inside some complex struct, it's not going to be counted as an input for the purposes of the autograd function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you can't use lazy tensors in Dicts for example?

};

template <typename... Args>
TensorTypeSet multi_dispatch_tensor_type_set(Args&&... args) {
Copy link
Contributor

@smessmer smessmer Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably change the arguments to const Args&... args.

…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
This was referenced Sep 25, 2019
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
…llUnboxed."

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162)

[ghstack-poisoned]
@ezyang ezyang merged commit 39e8913 into gh/ezyang/366/base Sep 25, 2019
xxtEchjovs44 pushed a commit to xxtEchjovs44/pytorch that referenced this pull request Jan 29, 2020
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: de77fc6
Pull Request resolved: pytorch/pytorch#26361
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module: internals Related to internal abstractions in c10 and ATen

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants