We currently have two separate dispatchers for ATen.
- `globalATenDispatch`, which replaced the vtable dispatch mechanism and is used for ops in `native_functions.yaml`
- the c10 dispatcher, which is used for custom ops and caffe2 ops that are exported to PyTorch.
The c10 dispatcher was designed to be "one dispatcher to rule them all", and we are now planning to cash in on that promise and remove `globalATenDispatch`.
Overview: the `native_functions.yaml` codegen will generate operator registrations for the c10 dispatcher instead of for `globalATenDispatch`. While we are ultimately considering switching these operators to be called in a boxed fashion, that is not part of this plan yet. For now, the c10 dispatcher will have two types of operators - boxed ones and unboxed ones - and store them separately. This simplifies the migration and allows us to tackle the boxing issue separately.
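As a rough mental model of what "storing boxed and unboxed operators separately" means (this is a toy sketch, not the actual c10 implementation; all names below are invented for illustration), think of the dispatcher as one operator table where each entry can hold a boxed kernel operating on a stack of type-erased values, an unboxed kernel with a plain typed signature, or both:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins, for illustration only; these are not the real c10 types.
struct IValueLike {};                                               // type-erased argument
using BoxedKernel = std::function<void(std::vector<IValueLike>&)>;  // works on a stack of boxed values
using UnboxedKernel = void (*)();                                   // type-erased typed function pointer

// Each operator entry can hold a boxed kernel, an unboxed kernel, or both.
struct OperatorEntry {
  BoxedKernel boxed;
  UnboxedKernel unboxed = nullptr;
};

// One table ("one dispatcher to rule them all") that stores both kinds of kernels.
class ToyDispatcher {
 public:
  void registerBoxed(const std::string& name, BoxedKernel k) { table_[name].boxed = std::move(k); }

  template <class Result, class... Args>
  void registerUnboxed(const std::string& name, Result (*fn)(Args...)) {
    table_[name].unboxed = reinterpret_cast<UnboxedKernel>(fn);
  }

  // Unboxed call path: cast the stored pointer back to its typed signature and call it directly,
  // without going through a boxed stack.
  template <class Result, class... Args>
  Result callUnboxed(const std::string& name, Args... args) {
    auto fn = reinterpret_cast<Result (*)(Args...)>(table_.at(name).unboxed);
    return fn(args...);
  }

 private:
  std::unordered_map<std::string, OperatorEntry> table_;
};
```

In this model, the registrations generated from `native_functions.yaml` would go through the unboxed path, while existing boxed custom/caffe2 ops keep using the boxed path.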
Steps:
- The c10 dispatcher needs to support storing and calling operators in an unboxed way. (done in Unboxed kernels in c10 #23447 and Allow kernels that don't have a boxed version #23665)
- The c10 dispatcher allows registering autograd kernels that are called instead of the device kernels if the tensor's `is_variable() == True` (done in c10 dispatcher stores autograd kernels #23666)
- `native_functions.yaml` gets a `use_c10: True` flag that can be added to functions. If the flag is true, the function will be registered to and called through the c10 dispatcher instead of `globalATenDispatch` (Register ATen ops with c10 #23667, Call aten ops through c10 dispatcher #23668, Move more ops to c10 #26255). In the beginning, we can enable this flag for 56% of ops.
- The other 44% use features that c10 doesn't support yet. Most of them are inconsistencies between the JIT and ATen function schemas. We're going to fix them and then enable the flag for the corresponding ops step by step. The missing features I found so far are (percentages are the share of ops using the feature):
  - (1%) `-> void` in `native_functions.yaml` vs. `-> ()` expected by the function schema parser. We should change `native_functions.yaml` to also use `-> ()`. ([ATEN->C10] Migrate return type void to () for native functions. #28290)
  - (17%) out functions don't work:
    - they have a different argument order in C++ than in the JIT schema
    - they use `Tensor&` instead of `const Tensor&`
  - (9%) `Tensor?` (i.e. optional tensor) doesn't work nicely because "no tensor" is sometimes represented as an undefined tensor and sometimes as `None` (a toy sketch of the two encodings follows after this list).
  - (1%) `Tensor?(a!)` (i.e. optional tensor with annotations) is not supported by the C++ function schema parser yet. Either switch ops to use `Tensor(a!)?` instead or add that functionality to the parser. (Fix overload names #28182)
  - Types not supported in the c10 dispatcher yet:
    - (7%) fixed-size arrays like `int[3]` (Fixed size arrays #23695)
    - (8%) `ScalarType`
    - (6%) `Device`
    - (6%) `Layout`
    - (1%) `Storage`
    - (4%) `Generator` (Move Generator ops to c10 #26434)
    - (4%) `Dimname`
    - (1%) `DimnameList` (Add unsupported types to schema type parser #28181)
    - (1%) `MemoryFormat`
    - (1%) `ConstQuantizerPtr`
    - (1%) `QScheme` (Move QScheme ops to c10 #30134)
    - (1%)
- After all functions have the flag enabled, we can remove `globalATenDispatch` and remove the flag from `native_functions.yaml`.
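To make the `Tensor?` item above concrete, here is a toy illustration of the two encodings of "no tensor" that currently coexist. The `Tensor` struct below is a stand-in for illustration, not the real `at::Tensor`, and `std::optional` stands in for the optional type used in the real schemas:

```cpp
#include <optional>

// Stand-in for at::Tensor, for illustration only.
struct Tensor {
  bool has_storage = false;
  bool defined() const { return has_storage; }
};

// Encoding (1): a plain Tensor argument where an *undefined* tensor means "no tensor".
void kernel_taking_plain_tensor(const Tensor& t) {
  if (!t.defined()) {
    // caller passed "no tensor" as an undefined Tensor object
  }
}

// Encoding (2): an optional argument where nullopt / None means "no tensor".
void kernel_taking_optional(const std::optional<Tensor>& t) {
  if (!t.has_value()) {
    // caller passed "no tensor" as None
  }
}

// The Tensor? migration issue is that callers and kernels don't agree on a single
// encoding, so something would have to translate between (1) and (2) consistently.
```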
Additional concern: Registering ops with JIT
The c10 dispatcher registers its operators with JIT in `register_c10_ops.cpp`, and there is a codegen'ed `register_aten_ops.cpp` for registering ATen ops with JIT. Once an ATen operator is added to c10, it would be registered twice. We plan to:
- at first, we're going to codegen a global `std::unordered_set` with the names of the ATen operators that are added to c10, and use this set as a blacklist for c10 ops that are exported to JIT. ATen operators, even when on c10, will keep using the `register_aten_ops.cpp` mechanism (Register ATen ops with c10 #23667). A sketch of this set follows after this list.
- later, we're planning to change that so that `register_aten_ops.cpp` ignores these ops and they use the generic non-codegen code from `register_c10_ops.cpp` instead ([wip] Remove manual boxing wrappers #26865)
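A minimal sketch of the first step, assuming the set is codegen'ed from `native_functions.yaml` (the function name and the entries below are made up; the real generated code may look different):

```cpp
#include <string>
#include <unordered_set>

// Hypothetical codegen'ed set: names of ATen operators that are already registered
// with the c10 dispatcher. The actual contents would be generated, not written by hand.
const std::unordered_set<std::string>& atenOpsRegisteredWithC10() {
  static const std::unordered_set<std::string> ops = {
      "aten::add", "aten::relu",  // illustrative entries only
  };
  return ops;
}

// register_c10_ops.cpp would skip these names when exporting c10 operators to JIT,
// so ATen ops keep using the codegen'ed register_aten_ops.cpp path and are not
// registered with JIT twice.
bool shouldExportC10OpToJIT(const std::string& opName) {
  return atenOpsRegisteredWithC10().count(opName) == 0;
}
```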
Additional concern: Extension backends
Extension backends use `globalATenDispatch` to override ops, and it would be very bad UX to ask them to use `globalATenDispatch` for some ops but the c10 `torch::RegisterOperators` for others, and to have that list constantly change as we move ops to c10. We will, instead:
- change `torch::RegisterOperators` registration in a way that forwards registration to `globalATenDispatch` for the ops that don't have `use_c10: True`, using the same global `std::unordered_set` mentioned above (Call aten ops through c10 dispatcher #23668); a sketch of this forwarding follows after this list
- migrate all extension backends to use `torch::RegisterOperators` instead of `globalATenDispatch`. After this, we can start moving ops to c10 without breaking backend extensions.
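A minimal sketch of the forwarding idea, reusing the hypothetical set from the previous sketch; the kernel type and registration functions below are toy stand-ins, not the real torch/ATen APIs:

```cpp
#include <string>
#include <unordered_set>

// Toy kernel type and toy registries, for illustration only.
using KernelFn = void (*)();

// Stand-in for the codegen'ed set from the previous sketch: ops already moved to c10.
const std::unordered_set<std::string>& atenOpsRegisteredWithC10() {
  static const std::unordered_set<std::string> ops = {"aten::add", "aten::relu"};  // illustrative only
  return ops;
}

void registerWithC10Dispatcher(const std::string& /*op*/, KernelFn /*k*/) { /* stand-in */ }
void registerWithGlobalATenDispatch(const std::string& /*op*/, KernelFn /*k*/) { /* stand-in */ }

// Backend extensions always call one entry point (think torch::RegisterOperators).
// Internally it forwards to the legacy dispatcher for ops that don't have use_c10: True,
// so extensions never need to track which ops have already been moved to c10.
void registerBackendKernel(const std::string& op, KernelFn kernel) {
  if (atenOpsRegisteredWithC10().count(op)) {
    registerWithC10Dispatcher(op, kernel);
  } else {
    registerWithGlobalATenDispatch(op, kernel);
  }
}
```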
Additional concern: Benchmarks
Benchmarks show no relevant regression, see here: https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit?usp=sharing
Out of scope: Boxing
This will get us to a world where we have only one dispatcher, but the dispatcher has two kinds of operators: boxed and unboxed. We need to auto-generate wrappers in both directions so that boxed/unboxed kernels can be called using the unboxed/boxed API as well. This is, however, out of scope for this plan.
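To illustrate what such a wrapper does in one direction (giving an unboxed kernel a boxed calling convention), here is a small self-contained sketch; the stack type and names are made up and far simpler than the real IValue machinery, and real codegen or templates would produce one such wrapper per operator:

```cpp
#include <cstdint>
#include <vector>

// Toy "boxed" value and stack, standing in for c10::IValue and the operand stack.
struct BoxedValue { std::int64_t payload = 0; };
using Stack = std::vector<BoxedValue>;

// An unboxed kernel with a plain typed signature.
std::int64_t add_unboxed(std::int64_t a, std::int64_t b) { return a + b; }

// A wrapper that gives the unboxed kernel a boxed calling convention:
// pop typed arguments off the stack, call the typed kernel, push the result back.
void add_boxed(Stack& stack) {
  std::int64_t b = stack.back().payload; stack.pop_back();
  std::int64_t a = stack.back().payload; stack.pop_back();
  stack.push_back(BoxedValue{add_unboxed(a, b)});
}
```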
Out of scope: Getting rid of Codegen
While this plan will get rid of some codegen (namely register_aten_ops.cpp), most things codegen'ed today will still be codegen'ed after this. We're just planning to change the codegen'ed registrations to register the functions with a different dispatcher. Getting rid of codegen is out of scope and will potentially be tackled later on.
Out of scope: Redesigning autograd
There are some ideas on how autograd could be done better, for example the generic dispatch proposal currently in flight. This is orthogonal to and out of scope for this plan. In this plan, we will handle autograd exactly as `globalATenDispatch` does it, i.e. each op can store one autograd kernel that will be called if `is_variable() == True`; we just move that functionality to the c10 dispatcher.
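As a toy model of the behavior being preserved (the types and names below are invented for illustration, not the real ATen types), the per-op kernel selection looks roughly like this:

```cpp
// Toy model of per-operator kernel storage with one optional autograd kernel.
struct ToyTensor {
  bool is_variable_ = false;
  bool is_variable() const { return is_variable_; }
};

using Kernel = void (*)(const ToyTensor&);

struct OpEntry {
  Kernel device_kernel = nullptr;    // e.g. the CPU/CUDA implementation
  Kernel autograd_kernel = nullptr;  // called instead when is_variable() == true
};

// The dispatch rule this plan keeps unchanged, just moved into the c10 dispatcher:
void callOp(const OpEntry& op, const ToyTensor& t) {
  if (t.is_variable() && op.autograd_kernel) {
    op.autograd_kernel(t);
  } else {
    op.device_kernel(t);
  }
}
```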