Change calling convention of ATenDispatch from getOp to callUnboxed. #26361

ezyang · 2019-09-17T18:57:58Z

Stack from ghstack:

Tests for fallback boxed dispatch (including TLS mode) #26719 Tests for fallback boxed dispatch.
Add boxed fallback #26368 Add boxed fallback
Refactor dispatch structure so fallback code lives inline. #26367 Refactor dispatch structure so fallback code lives inline.
Change calling convention of ATenDispatch from getOp to callUnboxed. #26361 Change calling convention of ATenDispatch from getOp to callUnboxed.
Add some missing constructors to IValue. #26806 Add some missing constructors to IValue.
Make resize_as_ generic, so XLA works. #26809 Make resize_as_ generic, so XLA works.

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle). Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code). This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang [email protected]

Differential Revision: D17429162

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> [ghstack-poisoned]

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 940f460 Pull Request resolved: #26361

…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]

smessmer · 2019-09-19T18:14:50Z

aten/src/ATen/core/ATenDispatch.h

+    void operator()(const at::Tensor& x) {
+      ts = ts | x.type_set();
+    }
+    void operator()(TensorOptions x) {


Why not also take this by const&?

I suppose you could. TensorOptions is a pass by value object so it doesn't really matter one way or another.

TensorOptions is quite a few words large, right? Why is it a "pass by value object"? For things above 1-2 words, passing by reference is usually faster.

It's two words.

// We should aspire to fit in one machine-size word; but a size greater than two // words is too much. (We are doing terribly on 32-bit archs, where we require // three machine size words to store tensor options. Eek!) static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2, "TensorOptions must fit in 128-bits" );

smessmer · 2019-09-19T18:16:27Z

aten/src/ATen/core/ATenDispatch.h

+      }
+    }
+    template <typename T>
+    void operator()(const T& x) {


This seems fragile. Let's assert that these are only the types we expect, either by static_assert or by only overloading the concrete types. Example: c10 takes tensor lists not as ArrayRef but as torch::List. We would probably miss that once we start using this for c10.

This feels like a "cure is worse than the disease" situation. We support a lot of input types for kernels, easily dozens, and what you've proposed would effectively require us to maintain yet another list of every possible input type in the codebase, and as you've seen, we add types all the time. Whereas I feel pretty comfortable assuming that the number of Tensor-containing argument types will change very slowly in the future.

Re torch::List, I don't understand why this type exists at all in the arguments of kernels (cc @dzhulgakov, original reviewer of the PR #21164 -- I'm marked as the reviewer but I only reapproved the diff for administrative reasons). As best I can tell, List is an owning type, which means that if I have some sort of type that cannot be directly moved into List, e.g., an initializer_list, I have to copy the elements. But for arguments to kernels, this doesn't really make any sense: if I write:

convolution(x, {2, 3} /* padding */);

There is no reason to copy the {2, 3} initializer list into an owning list. I want to pass it in by reference, and that's exactly what ArrayRef does. Is there a use case that I'm not aware of?

For dicts the reasoning was that we want to hide the underlying implementation type. For Lists one could make the same argument, though it becomes more far-fetched. @smessmer - correct me if I'm wrong - we support both types now

The reason for torch::List as an argument and return type was mostly consistency. You can't return ArrayRef for example, so you'd need different types if you take a list as argument vs when you return it. Also, ArrayRef doesn't work with book, so you'd need a different non-ArrayRef type for BoolList arguments. See the whole reasoning here: https://quip.com/UWUpAoaPsipL

About the original point, you got me convinced that we shouldn't introduce a new list of supported types here. We'll hopefully notice if something is missing. Alternatively, we could reuse the already existing list of supported types in the c10 dispatcher, but I'll leave that up to you.

Btw talking of missing types, do you have a plan on how to handle Dicts with Tensor values? Nested things like List of List of Dict of ... ?

The reason for torch::List as an argument and return type was mostly consistency

Foolish consistency? I mean, I guess if people write kernels that take in List as an argument, I won't stop them. But it's a sort of fact of life in C++ that argument types and return types are going to look different. The best practice is going to be to use ArrayRef on input. The consistency argument also feels even weaker because List is our own home grown type and so you have to tell the user it exists.

I don't find the bool case particularly convincing either. We're not building a general purpose operator library: we're building a tensor library. I have not seen a single compelling use case for passing a variable-length list of booleans as an argument to an operator. (torch.tensor doesn't count; you don't want to bind that in native anyway.)

Btw talking of missing types, do you have a plan on how to handle Dicts with Tensor values? Nested things like List of List of Dict of ... ?

We've got to draw the line somewhere. I'm pretty OK with saying, "Only Tensor, ArrayRef, TensorOptions" participate in multiple dispatch. We have similar limitations with autograd tracking for user defined functions: if you smuggle a Variable in, inside some complex struct, it's not going to be counted as an input for the purposes of the autograd function.

so you can't use lazy tensors in Dicts for example?

smessmer · 2019-09-19T18:17:37Z

aten/src/ATen/core/ATenDispatch.h

+  };
+
+  template <typename... Args>
+  TensorTypeSet multi_dispatch_tensor_type_set(Args&&... args) {


You should probably change the arguments to const Args&... args.

…llUnboxed." Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17429162](https://our.internmc.facebook.com/intern/diff/D17429162) [ghstack-poisoned]

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to avoid requiring a direct dependence on Tensor (which would have caused a header cycle). Thanks to the work of Sebastian, it is now possible to include TensorBody.h without inducing a cycle; so we can now replace this indirect implementation with a more direct implementation of unboxedCall and move most of the implementation details into ATenDispatch (simplifying generated code). This is a necessary prerequisite for boxed fallback work I want to do, as I want to handle generation of boxing from inside ATenDispatch, not generated code. Unfortunately, we still need to generate the multidispatch list in function_wrapper.py to accommodate c10 dispatcher. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: de77fc6 Pull Request resolved: pytorch/pytorch#26361

pytorchbot added the module: internals Related to internal abstractions in c10 and ATen label Sep 17, 2019

This was referenced Sep 17, 2019

Implement multiple dispatch #25653

Closed

Delete backwards compatibility Backend overload for registerOp #25914

Closed

Implement multiple dispatch in boxed c10 dispatcher #26118

Closed

Remove unnecessary include from TensorBody #26360

Closed

ezyang requested a review from smessmer September 17, 2019 19:00

This was referenced Sep 17, 2019

Refactor dispatch structure so fallback code lives inline. #26367

Closed

Add boxed fallback #26368

Closed

ezyang added 4 commits September 18, 2019 10:05

ezyang mentioned this pull request Sep 19, 2019

Implement multiple dispatch #26468

Closed

smessmer approved these changes Sep 19, 2019

View reviewed changes

ezyang added 3 commits September 19, 2019 17:31

ezyang mentioned this pull request Sep 19, 2019

Implement multiple dispatch (#26468) #26501

Closed

ezyang mentioned this pull request Sep 20, 2019

Remove TensorOptions logic from generated code #26428

Open

ezyang added 2 commits September 24, 2019 10:34

This was referenced Sep 24, 2019

Add some missing constructors to IValue. #26718

Closed

Tests for fallback boxed dispatch (including TLS mode) #26719

Closed

ezyang mentioned this pull request Sep 25, 2019

Add some missing constructors to IValue. #26806

Closed

This was referenced Sep 25, 2019

Make resize_as_ generic, so XLA works. #26809

Closed

fixup #26810

Closed

ezyang added 3 commits September 25, 2019 11:35

ezyang merged commit 39e8913 into gh/ezyang/366/base Sep 25, 2019

Change calling convention of ATenDispatch from getOp to callUnboxed. #26361

Change calling convention of ATenDispatch from getOp to callUnboxed. #26361

Uh oh!

Conversation

ezyang commented Sep 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smessmer Sep 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smessmer Sep 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ezyang commented Sep 17, 2019 •

edited

Loading

smessmer Sep 25, 2019 •

edited

Loading

smessmer Sep 19, 2019 •

edited

Loading