Conversation

@smessmer (Contributor) commented Aug 1, 2019

Stack from ghstack:

Changes in this PR:

  • For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
  • This doesn't change globalATenDispatch yet; the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
  • Enable the use_c10_dispatcher: True flag for about 70% of operators. This also changes the c10->jit operator export, because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops as well, since they would clash.
  • For this, we need a way to recognize whether a certain operator has already been moved from ATen to c10. This is done by generating an OpsAlreadyMovedToC10.cpp file with the list. A diff further up the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the use_c10_dispatcher flag set to true):

  • Tensor?(a!) (i.e. optional tensor with annotations) is not supported in the C++ function schema parser yet.
  • -> void in native_functions.yaml vs -> () expected by the function schema parser.
  • out functions have a different argument order in C++ than in the jit schema.
  • Tensor? (i.e. optional tensor) doesn't work nicely because an undefined tensor is sometimes treated as an undefined tensor and sometimes as None.
  • Fixed-size arrays like int[3] are not supported in c10 yet.

These will be fixed in separate diffs, and then the exclusion tag will be removed.
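The per-operator decision described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual codegen: the operator data and the emitted registration string are made up, but the shape of the decision (emit an extra c10 registration line only for flagged ops) matches the description.

```python
# Rough sketch of the codegen decision described in this PR: operators
# flagged use_c10_dispatcher: True get an additional c10 registration
# line emitted into the generated backend files. The data and the
# output strings here are hypothetical, not the real codegen templates.
ops = [
    {"name": "aten::add", "use_c10_dispatcher": True},
    # excluded: Tensor(a!) annotation not supported by the parser yet
    {"name": "aten::abs_", "use_c10_dispatcher": False},
]

def c10_registration_lines(ops):
    lines = []
    for op in ops:
        if op.get("use_c10_dispatcher"):
            lines.append(
                'static auto registry = c10::RegisterOperators()'
                '.op("{}", ...);'.format(op["name"]))
    return lines

for line in c10_registration_lines(ops):
    print(line)
```

Excluded operators simply produce no c10 registration line; their existing globalATenDispatch registration is unchanged.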

Differential Revision: [D16603131](https://our.internmc.facebook.com/intern/diff/D16603131/)

@gchanan (Contributor) commented Aug 6, 2019

What's the benefit of the registration? Are you going to try to move over the eager ops? Is it so the JIT can universally use the c10 dispatcher?

When are you planning to address the above limitations?

@gchanan (Contributor) commented Aug 6, 2019

What happens if someone adds a function with one of the above limitations? Do they get a compiler error? An error when they try to dispatch to the function (via JIT)?

There is no documentation in the README.

Why was the choice made to exclude rather than include via the flag?

@smessmer (Contributor, Author) commented Aug 6, 2019

Yes, I'm planning to move over all of native_functions.yaml to c10 so that we have one dispatch mechanism for everything instead of multiple ones.

Most of the above limitations will be addressed very soon. The argument order for out functions probably needs a bit more discussion with you on how to do it first.

Adding a function that violates the limitations is a compiler error.

The choice to exclude rather than include means that, over time, the code base can get better PR by PR, with fewer and fewer uses of the flag, until the flag is gone entirely. This follows the same pattern as ops annotated with matches_jit_signature: False.

I added a section to the README.

@pytorchbot pytorchbot added the module: docs Related to our documentation, both in docs/ and docblocks label Aug 6, 2019
@gchanan (Contributor) commented Aug 6, 2019

That sounds generally reasonable. Do I understand correctly: if a user adds a function that violates the limitations but doesn't add the exclusion, they will get a compiler error? If so, we shouldn't do that. We should either detect the error and print a message, or make the annotation opt-in.

@@ -105,13 +112,15 @@
named_guard: False

- func: abs_(Tensor(a!) self) -> Tensor(a!)
exclude_from_c10_dispatcher: True
Contributor:

I think you forgot to mention in the description that inplace and _out ops can't be registered at the moment because of Tensor& in their signature. Did you decide on an approach for this yet, btw?

Contributor Author:

No. out functions don't work anyhow because jit and C++ have a different argument order. Not sure yet how to fix this.

Collaborator:

According to @gchanan, the reason is that _out variants of functions have the out argument as kwarg_only in Python, but in C++ it's logical to always provide it while still having other args with default values. Thus the only option is to make it the first argument.

Relevant code seems to be:

is_out_fn = True

decl['jit_argument_order'] = [nargs - 1] + list(range(nargs - 1))

Note that all C++ APIs today (the public API, the kernel implementations) use the "reordered" variant with the out argument first. TorchScript parsing and the Stack in JIT use the "original" order.
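The reordering computed by jit_argument_order above can be illustrated with a small self-contained sketch (the argument names are hypothetical; the index formula is the one quoted from the codegen):

```python
# The jit schema puts the out argument last; the generated C++ API puts
# it first. jit_argument_order maps each position in the C++
# ("reordered") signature to the corresponding position in the jit
# ("original") signature.
jit_args = ["self", "other", "out"]  # original jit order, out last
nargs = len(jit_args)

# formula quoted from the codegen: move the last (out) argument first
jit_argument_order = [nargs - 1] + list(range(nargs - 1))  # [2, 0, 1]

cpp_args = [jit_args[i] for i in jit_argument_order]
print(cpp_args)  # ['out', 'self', 'other']
```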

If we try to introduce reordering logic in c10 it'd cause a mess when switching between boxed and unboxed APIs.

I see several options:

  1. switch kernel definitions to "original" order (giant codemod) and keep reordering only in generated C++ API exposed to user
  2. try mirroring reordering logic in c10 such that all boxing/unboxing functions do reordering
  3. give up the idea of "no codegen" (which I think is reasonable in the mid-term). Then for native_functions we can provide the ability to codegen boxing/unboxing wrappers and just register them with c10 as hooks (utilizing the current codegen). The default hook would be the regular templated wrapper generated automatically based on the C++ signature, for cases where there's no reordering.

Contributor:

I'd view this problem as analogous to the TensorOptions issue. There, we do some codegen to make the C++ API nice (good!), but then need to reimplement the crazy logic in multiple places (bad!). None of these endpoints need to use the fancy C++ signature; they aren't user endpoints, so a dumb signature is totally fine.

Now, you may or may not want a single way of telling downstream components that they should actually map to a different C++ function call so we could have private implementation calls (which would allow us to e.g. only expose a single "public" function for factories and out functions, as we do now), but that logic can be super simple.

@smessmer (Contributor, Author) commented Aug 7, 2019

@gchanan I changed it to be an opt-in flag instead of opt-out.

@dzhulgakov (Collaborator) commented:

I think you need to patch some more tests - RPC now uses overload signatures too, it seems


// TODO Once all ATen ops are moved to c10, this file should be removed

const std::unordered_set<c10::OperatorName>& aten_ops_already_moved_to_c10() {
Contributor:

IIUC, @gchanan is trying to get rid of the checked-in autogenerated files. So I'm pretty unhappy that we're adding another checked-in autogenerated file in this patch. Is it absolutely necessary that this file lives in core? In particular, I find it hard to believe that mobile actually needs to depend on this file (and it really shouldn't, because you're going to pay binary size to hold the unordered set and strings). If mobile doesn't depend on this file, there is no reason for the autogenerated file to be checked in.

@smessmer (Contributor, Author) commented Sep 10, 2019:

I've tried several other alternatives, like adding this code to VariableType.cpp or TypeDefault.cpp, but they didn't work. ATenDispatch.h/cpp depends on this and is part of the ATen/core build, and not having this file in ATen/core caused linker errors.

I don't really like this either but I think it's ok as an intermediate solution. There's a comment in the file that it ought to be deleted as soon as we can.

Contributor:

My comment at #23668 (comment) is applicable here

// setting cache creator to nullptr so calling the kernel doesn't need to call it, which would be expensive
// This, however, only works if there are no constructor parameters (i.e. no runtime function pointer)
// Backend extensions use runtime function pointers, so we need a cache if sizeof...(ConstructorParameters) != 0
(sizeof...(ConstructorParameters) == 0) ? KernelCacheCreatorFunction(nullptr) : detail::KernelFactory<KernelFunctor, guts::decay_t<ConstructorParameters>...>(std::forward<ConstructorParameters>(constructorParameters)...),
Contributor:

So, this patch also adds support for some operators which didn't work before?

Contributor Author:

This is not for new operators but for backend extensions. All ATen operators in native_functions.yaml get codegenned registrations that don't need this. But if you look at "Additional concern: Extension backends" in #24132, you'll see the plan is to keep globalATenDispatch().registerOp() working until we've moved all ops to c10, and only then break all ops at once for extension backends. To keep globalATenDispatch().registerOp() working, we need this.
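The redirection described above can be sketched as follows. Everything here is hypothetical (set contents, kernel names, and the dictionaries standing in for the two dispatchers); the point is only the shape of the decision that keeps the old registration API working.

```python
# Hypothetical sketch of the transition plan described above: a backend
# extension calling the old registration API gets redirected to c10 for
# ops that have already been moved, so the old API keeps working until
# everything is on c10 and can be switched over at once.
aten_ops_already_moved_to_c10 = {"aten::add"}

c10_registrations = {}   # stands in for the c10 dispatcher
aten_registrations = {}  # stands in for globalATenDispatch

def register_op(name, kernel):
    if name in aten_ops_already_moved_to_c10:
        c10_registrations[name] = kernel   # redirect to c10
    else:
        aten_registrations[name] = kernel  # stays on globalATenDispatch

register_op("aten::add", "add_kernel")    # already moved: goes to c10
register_op("aten::abs_", "abs__kernel")  # not moved yet: stays on ATen
```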

@ezyang (Contributor) commented Sep 10, 2019

It looks like the patch should work. My primary objection is procedural: adding another checked in autogenerated file.

@smessmer smessmer requested a review from ezyang September 10, 2019 16:26
CAFFE2_API const std::unordered_set<c10::OperatorName>& aten_ops_already_moved_to_c10();

// list of ATen ops that are still on the globalATenDispatch dispatcher.
CAFFE2_API const std::unordered_set<c10::OperatorName>& aten_ops_not_moved_to_c10_yet();
Contributor:

Two sets?! :O

@smessmer (Contributor, Author) commented Sep 12, 2019:

There's semantically three sets of operators:

  • aten_ops_already_moved_to_c10
  • aten_ops_not_moved_to_c10_yet
  • non_aten_ops (e.g. custom ops)

In the previous approach:

  • register_c10_ops.cpp needed to decide if an op is in aten_ops_already_moved_to_c10 or in union(aten_ops_not_moved_to_c10_yet, non_aten_ops) because it needs to exclude the former.
  • ATenDispatch needed to decide if an op is in aten_ops_already_moved_to_c10 or in union(aten_ops_not_moved_to_c10_yet, non_aten_ops) so it can register them with c10 instead.

Since both just needed the very same decision, it was enough to store aten_ops_already_moved_to_c10 and base the decision on that.

Now, since in the new approach it is the c10 dispatcher forwarding to ATenDispatch, it needs to decide if an op is in aten_ops_not_moved_to_c10_yet or in union(aten_ops_already_moved_to_c10, non_aten_ops), which is different from what register_c10_ops.cpp needs. I need to store two sets to be able to make both decisions.
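The two decisions above can be sketched as predicates over the three semantic sets (the set contents and the custom op name are hypothetical; only the membership logic reflects the explanation):

```python
# Three semantic sets of operators, per the explanation above.
aten_ops_already_moved_to_c10 = {"aten::add"}
aten_ops_not_moved_to_c10_yet = {"aten::abs_"}
# non-ATen ops (e.g. custom ops) are in neither set

def export_to_jit(op):
    # register_c10_ops.cpp: skip ATen ops already exported to JIT directly
    return op not in aten_ops_already_moved_to_c10

def forward_to_aten_dispatch(op):
    # c10 dispatcher: forward only ops still on globalATenDispatch
    return op in aten_ops_not_moved_to_c10_yet

# A custom op shows why one set is not enough: it must be exported to
# JIT, but must not be forwarded to ATenDispatch. Neither predicate can
# be derived from the other set alone.
print(export_to_jit("my::custom_op"),
      forward_to_aten_dispatch("my::custom_op"))  # True False
```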

Contributor:

^-- this should be in the source code

@pytorchbot pytorchbot added the module: cpp-extensions Related to torch.utils.cpp_extension label Sep 12, 2019
@ezyang (Contributor) left a comment:

Reiterating from the closed PR: I think the RegisterOperators call is bad (having to specify the schema string twice is a big no-no) and needs to be fixed.

@smessmer (Contributor, Author) commented:
Don't know what exactly happened, but somehow ghexport closed this. Reopening in #26131

Labels

module: build (Build system issues) · module: cpp-extensions (Related to torch.utils.cpp_extension) · module: docs (Related to our documentation, both in docs/ and docblocks) · module: internals (Related to internal abstractions in c10 and ATen) · oncall: jit (Add this issue/PR to JIT oncall triage queue)


7 participants