Tests for fallback boxed dispatch (including TLS mode) #26719
Conversation
Signed-off-by: Edward Z. Yang <[email protected]> [ghstack-poisoned]
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 6d0ec71 Pull Request resolved: #26719
Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17549624](https://our.internmc.facebook.com/intern/diff/D17549624) [ghstack-poisoned]
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: c52f6f9 Pull Request resolved: #26719
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 92d0ced Pull Request resolved: #26719
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 3da260e Pull Request resolved: #26719
…dispatch." Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17549624](https://our.internmc.facebook.com/intern/diff/D17549624) [ghstack-poisoned]
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: d792770 Pull Request resolved: #26719
This PR adds a pair of tests for fallback boxed dispatch, exercising two different ways you might use it: (1) to implement a "wrapper" tensor type (e.g., LazyTensor, NestedTensor), and (2) to implement a toggleable "mode" (e.g., Profiling, Tracing). Both are the most trivial possible implementations of their type: they "wrap" a real tensor and simply forward along to the real implementation.

This PR also adds the necessary feature support for toggleable mode, which was in the original generic dispatch abstraction design but was not previously implemented. I had not originally intended to add this, but it turns out writing a new "mode" is a lot simpler than writing a "wrapper" type, so I ended up writing the mode version first.

General structure of the PR:

* Add two new testing tensor type ids, `TESTING_ONLY_GenericWrapperTensorId` and `TESTING_ONLY_GenericModeTensorId`, which our tests use. They might find use in other tests if necessary.
* Add support for toggling the availability of `TESTING_ONLY_GenericModeTensorId`. Introduces a new thread-local variable, accessible via `tls_included_tensor_type_set()`, which is considered as part of dispatch. (It is a little goofy that we now have to do two TLS accesses; I'm not sure there's a way to avoid this.)
* There is a screed about how modes (Profiling/Tracing) are different from hybrid wrapper-modes (Variable).
* The mode fallback is very simple: it increments a counter and then passes the call on to the underlying kernel by invoking the JIT.
* The wrapper fallback is more complex: it parses the arguments, unwrapping any wrapped tensor arguments, then invokes the JIT, and then rewraps the outputs.

The examples here are somewhat simplistic; there are a number of engineering improvements that could be applied. We could save these for later (landing this patch to get immediate testing), or incorporate them into this patch:

* `getOperator` is horrible. Bram Wasti and I discussed a plan for how to make this easier, by simply refactoring the JIT interface.
* `GenericWrapperTensorImpl` doesn't populate all of its fields accurately. Most notably, size is not set up correctly.
* `generic_wrapper_fallback` should handle tensor lists in arguments and returns properly.

One pitfall: fallback dispatch only works with non-c10 code. That's why I test using `batch_norm`.

Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D17549624](https://our.internmc.facebook.com/intern/diff/D17549624) [ghstack-poisoned]
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 512eb4b Pull Request resolved: #26719
zdevito
left a comment
Generally looks good. I have a few medium-level comments about the API and the use of thread locals.
c10/core/impl/LocalTensorTypeSet.cpp
Outdated
#endif

void set_enabled_via_excluded(TensorTypeId tid, bool enabled) {
The names of these functions are confusing to me. The caller still needs to understand that the final set will be `(tensor_properties | included) - excluded`, but then there is also a second translation between how this function affects included/excluded. Wouldn't it make more sense to just have it follow the set operations directly: `excluded.add`, `excluded.remove`, `included.add`, `included.remove`?
c10/core/impl/LocalTensorTypeSet.h
Outdated
C10_API void tls_variable_set_enabled(bool enabled);
C10_API TensorTypeSet tls_excluded_tensor_type_set();

C10_API bool TESTING_ONLY_tls_generic_mode_is_enabled();
Is it the best idea to have separate functions here for every key? I am concerned that over time non-simple operations on the tensor set will end up here. I'd prefer the opposite: that the only thing you can do to a key is to enable it in a thread or in tensor properties.
Solves my giant comment problem too.
c10/core/impl/LocalTensorTypeSet.cpp
Outdated
// NB: Zero initialized!
thread_local uint64_t raw_excluded;
thread_local uint64_t raw_included;
These are used adjacent to each other in dispatch, so it would be more efficient if they were in a single thread_local struct. Otherwise there is a sequence of like 4 loads to get to each one. I would imagine this struct would grow into the generic thread-local dispatch state that the chain of responsibility would need to call super as well.
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 1b16e64 Pull Request resolved: #26719
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: e8b7d08 Pull Request resolved: #26719
This PR introduces a dirt-simple registration API to allow custom Tensors. The exposed API aims to minimize code complexity and compilation times, making an explicit trade-off that favors developer productivity over compute efficiency. A couple of reasons to have this:

- Tests are easier to write (see #26719)
- It addresses the limit of 64 possible TensorTypeIds we can have with TensorTypeSet
- Drastically reduces mental overhead for folks working in research wanting to add a Tensor
- Allows entirely decoupled static registration for tensors defined in separate repos (planned usage in pytorch/sparse)

[ghstack-poisoned]
Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 9e3ebf7 Pull Request resolved: #26719
Summary: Pull Request resolved: pytorch/pytorch#26719
Signed-off-by: Edward Z. Yang <[email protected]>
Differential Revision: D17549624
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 57dbdd8d6812a66082aa6db2934c8edcda340ea6
Stack from ghstack:
Differential Revision: D17549624