We currently have two separate dispatchers for ATen.
- `globalATenDispatch`, which replaced the vtable dispatch mechanism and is used for ops in `native_functions.yaml`
- the c10 dispatcher, which is used for custom ops and caffe2 ops that are exported to PyTorch.
The c10 dispatcher was designed to be "one dispatcher to rule them all", and we are now planning to cash in on that promise and remove `globalATenDispatch`.
Overview: the `native_functions.yaml` codegen will generate operator registrations for the c10 dispatcher instead of for `globalATenDispatch`. While we are ultimately considering switching these operators to be called in a boxed fashion, that is not part of this plan yet. For now, the c10 dispatcher will have two types of operators - boxed ones and unboxed ones - and store them separately. This simplifies the migration and allows us to tackle the boxing issue separately.
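As a rough mental model of what "storing boxed and unboxed operators separately" means (this is a toy sketch, not the actual c10 implementation; all names below are invented for illustration), think of the dispatcher as one operator table where each entry can hold a boxed kernel operating on a stack of type-erased values, an unboxed kernel with a plain typed signature, or both:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins, for illustration only; these are not the real c10 types.
struct IValueLike {};                                               // type-erased argument
using BoxedKernel = std::function<void(std::vector<IValueLike>&)>;  // works on a stack of boxed values
using UnboxedKernel = void (*)();                                   // type-erased typed function pointer

// Each operator entry can hold a boxed kernel, an unboxed kernel, or both.
struct OperatorEntry {
  BoxedKernel boxed;
  UnboxedKernel unboxed = nullptr;
};

// One table ("one dispatcher to rule them all") that stores both kinds of kernels.
class ToyDispatcher {
 public:
  void registerBoxed(const std::string& name, BoxedKernel k) { table_[name].boxed = std::move(k); }

  template <class Result, class... Args>
  void registerUnboxed(const std::string& name, Result (*fn)(Args...)) {
    table_[name].unboxed = reinterpret_cast<UnboxedKernel>(fn);
  }

  // Unboxed call path: cast the stored pointer back to its typed signature and call it directly,
  // without going through a boxed stack.
  template <class Result, class... Args>
  Result callUnboxed(const std::string& name, Args... args) {
    auto fn = reinterpret_cast<Result (*)(Args...)>(table_.at(name).unboxed);
    return fn(args...);
  }

 private:
  std::unordered_map<std::string, OperatorEntry> table_;
};
```

In this model, the registrations generated from `native_functions.yaml` would go through the unboxed path, while existing boxed custom/caffe2 ops keep using the boxed path.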
Steps:
- The c10 dispatcher needs to support storing and calling operators in an unboxed way. (done in Unboxed kernels in c10 #23447 and Allow kernels that don't have a boxed version #23665)
- The c10 dispatcher allows registering autograd kernels that are called instead of the device kernels if the tensor's `is_variable() == True` (done in c10 dispatcher stores autograd kernels #23666)
- `native_functions.yaml` gets a `use_c10: True` flag that can be added to functions. If the flag is true, the function will be registered to and called through the c10 dispatcher instead of `globalATenDispatch` (Register ATen ops with c10 #23667, Call aten ops through c10 dispatcher #23668, Move more ops to c10 #26255). In the beginning, we can enable this flag for 56% of ops.
- The other 44% use features that c10 doesn't support yet. Most of them are inconsistencies between the JIT and ATen function schemas. We're going to fix them and then enable the flag for the corresponding ops step by step. The missing features I found so far are (percentages are the share of ops using the feature):
  - (1%) `-> void` in `native_functions.yaml` vs. `-> ()` expected by the function schema parser. We should change `native_functions.yaml` to also use `-> ()`. ([ATEN->C10] Migrate return type void to () for native functions. #28290)
  - (17%) out functions don't work:
    - they have a different argument order in C++ than in the JIT schema
    - they use `Tensor&` instead of `const Tensor&`
  - (9%) `Tensor?` (i.e. optional tensor) doesn't work nicely because "no tensor" is sometimes represented as an undefined tensor and sometimes as `None` (a toy sketch of the two encodings follows after this list).
  - (1%) `Tensor?(a!)` (i.e. optional tensor with annotations) is not supported by the C++ function schema parser yet. Either switch ops to use `Tensor(a!)?` instead or add that functionality to the parser. (Fix overload names #28182)
  - Types not supported in the c10 dispatcher yet:
    - (7%) fixed-size arrays like `int[3]` (Fixed size arrays #23695)
    - (8%) `ScalarType`
    - (6%) `Device`
    - (6%) `Layout`
    - (1%) `Storage`
    - (4%) `Generator` (Move Generator ops to c10 #26434)
    - (4%) `Dimname`
    - (1%) `DimnameList` (Add unsupported types to schema type parser #28181)
    - (1%) `MemoryFormat`
    - (1%) `ConstQuantizerPtr`
    - (1%) `QScheme` (Move QScheme ops to c10 #30134)
    - (1%)
- After all functions have the flag enabled, we can remove `globalATenDispatch` and remove the flag from `native_functions.yaml`.
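To make the `Tensor?` item above concrete, here is a toy illustration of the two encodings of "no tensor" that currently coexist. The `Tensor` struct below is a stand-in for illustration, not the real `at::Tensor`, and `std::optional` stands in for the optional type used in the real schemas:

```cpp
#include <optional>

// Stand-in for at::Tensor, for illustration only.
struct Tensor {
  bool has_storage = false;
  bool defined() const { return has_storage; }
};

// Encoding (1): a plain Tensor argument where an *undefined* tensor means "no tensor".
void kernel_taking_plain_tensor(const Tensor& t) {
  if (!t.defined()) {
    // caller passed "no tensor" as an undefined Tensor object
  }
}

// Encoding (2): an optional argument where nullopt / None means "no tensor".
void kernel_taking_optional(const std::optional<Tensor>& t) {
  if (!t.has_value()) {
    // caller passed "no tensor" as None
  }
}

// The Tensor? migration issue is that callers and kernels don't agree on a single
// encoding, so something would have to translate between (1) and (2) consistently.
```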
Additional concern: Registering ops with JIT
The c10 dispatcher registers its operators with JIT in `register_c10_ops.cpp`, and there is a codegen'ed `register_aten_ops.cpp` for registering ATen ops with JIT. Once an ATen operator is added to c10, it would be registered twice. We plan to:
- at first, we're going to codegen a global `std::unordered_set` with the names of the ATen operators that are added to c10, and use this set as a blacklist for c10 ops that are exported to JIT. ATen operators, even when on c10, will keep using the `register_aten_ops.cpp` mechanism (Register ATen ops with c10 #23667). A sketch of this set follows after this list.
- later, we're planning to change that so that `register_aten_ops.cpp` ignores these ops and they use the generic non-codegen code from `register_c10_ops.cpp` instead ([wip] Remove manual boxing wrappers #26865)
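A minimal sketch of the first step, assuming the set is codegen'ed from `native_functions.yaml` (the function name and the entries below are made up; the real generated code may look different):

```cpp
#include <string>
#include <unordered_set>

// Hypothetical codegen'ed set: names of ATen operators that are already registered
// with the c10 dispatcher. The actual contents would be generated, not written by hand.
const std::unordered_set<std::string>& atenOpsRegisteredWithC10() {
  static const std::unordered_set<std::string> ops = {
      "aten::add", "aten::relu",  // illustrative entries only
  };
  return ops;
}

// register_c10_ops.cpp would skip these names when exporting c10 operators to JIT,
// so ATen ops keep using the codegen'ed register_aten_ops.cpp path and are not
// registered with JIT twice.
bool shouldExportC10OpToJIT(const std::string& opName) {
  return atenOpsRegisteredWithC10().count(opName) == 0;
}
```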
Additional concern: Extension backends
Extension backends use `globalATenDispatch` to override ops, and it would be very bad UX to ask them to use `globalATenDispatch` for some ops but the c10 `torch::RegisterOperators` for others, and to have that list constantly change as we move ops to c10. We will, instead:
- change `torch::RegisterOperators` registration in a way that forwards registration to `globalATenDispatch` for the ops that don't have `use_c10: True`, using the same global `std::unordered_set` mentioned above (Call aten ops through c10 dispatcher #23668); a sketch of this forwarding follows after this list
- migrate all extension backends to use `torch::RegisterOperators` instead of `globalATenDispatch`. After this, we can start moving ops to c10 without breaking backend extensions.
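A minimal sketch of the forwarding idea, reusing the hypothetical set from the previous sketch; the kernel type and registration functions below are toy stand-ins, not the real torch/ATen APIs:

```cpp
#include <string>
#include <unordered_set>

// Toy kernel type and toy registries, for illustration only.
using KernelFn = void (*)();

// Stand-in for the codegen'ed set from the previous sketch: ops already moved to c10.
const std::unordered_set<std::string>& atenOpsRegisteredWithC10() {
  static const std::unordered_set<std::string> ops = {"aten::add", "aten::relu"};  // illustrative only
  return ops;
}

void registerWithC10Dispatcher(const std::string& /*op*/, KernelFn /*k*/) { /* stand-in */ }
void registerWithGlobalATenDispatch(const std::string& /*op*/, KernelFn /*k*/) { /* stand-in */ }

// Backend extensions always call one entry point (think torch::RegisterOperators).
// Internally it forwards to the legacy dispatcher for ops that don't have use_c10: True,
// so extensions never need to track which ops have already been moved to c10.
void registerBackendKernel(const std::string& op, KernelFn kernel) {
  if (atenOpsRegisteredWithC10().count(op)) {
    registerWithC10Dispatcher(op, kernel);
  } else {
    registerWithGlobalATenDispatch(op, kernel);
  }
}
```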
Additional concern: Benchmarks
Benchmarks show no relevant regression, see here: https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit?usp=sharing
Out of scope: Boxing
This will get us to a world where we have only one dispatcher, but the dispatcher has two kinds of operators: boxed and unboxed. We need to auto-generate wrappers in both directions so that boxed/unboxed kernels can be called using the unboxed/boxed API as well. This is, however, out of scope for this plan.
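To illustrate what such a wrapper does in one direction (giving an unboxed kernel a boxed calling convention), here is a small self-contained sketch; the stack type and names are made up and far simpler than the real IValue machinery, and real codegen or templates would produce one such wrapper per operator:

```cpp
#include <cstdint>
#include <vector>

// Toy "boxed" value and stack, standing in for c10::IValue and the operand stack.
struct BoxedValue { std::int64_t payload = 0; };
using Stack = std::vector<BoxedValue>;

// An unboxed kernel with a plain typed signature.
std::int64_t add_unboxed(std::int64_t a, std::int64_t b) { return a + b; }

// A wrapper that gives the unboxed kernel a boxed calling convention:
// pop typed arguments off the stack, call the typed kernel, push the result back.
void add_boxed(Stack& stack) {
  std::int64_t b = stack.back().payload; stack.pop_back();
  std::int64_t a = stack.back().payload; stack.pop_back();
  stack.push_back(BoxedValue{add_unboxed(a, b)});
}
```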
Out of scope: Getting rid of Codegen
While this plan will get rid of some codegen (namely register_aten_ops.cpp), most things codegen'ed today will still be codegen'ed after this. We're just planning to change the codegen'ed registrations to register the functions with a different dispatcher. Getting rid of codegen is out of scope and will potentially be tackled later on.
Out of scope: Redesigning autograd
There are some ideas on how autograd could be done better, for example the generic dispatch proposal currently in flight. This is orthogonal to and out of scope for this plan. In this plan, we will handle autograd exactly as `globalATenDispatch` does it, i.e. each op can store one autograd kernel that will be called if `is_variable() == True`; we just move that functionality to the c10 dispatcher.
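As a toy model of the behavior being preserved (the types and names below are invented for illustration, not the real ATen types), the per-op kernel selection looks roughly like this:

```cpp
// Toy model of per-operator kernel storage with one optional autograd kernel.
struct ToyTensor {
  bool is_variable_ = false;
  bool is_variable() const { return is_variable_; }
};

using Kernel = void (*)(const ToyTensor&);

struct OpEntry {
  Kernel device_kernel = nullptr;    // e.g. the CPU/CUDA implementation
  Kernel autograd_kernel = nullptr;  // called instead when is_variable() == true
};

// The dispatch rule this plan keeps unchanged, just moved into the c10 dispatcher:
void callOp(const OpEntry& op, const ToyTensor& t) {
  if (t.is_variable() && op.autograd_kernel) {
    op.autograd_kernel(t);
  } else {
    op.device_kernel(t);
  }
}
```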