
Conversation

Contributor

@ezyang ezyang commented Sep 24, 2019

Stack from ghstack:

This PR adds a pair of tests for fallback boxed dispatch, exercising two different ways you might use it: (1) to implement a "wrapper" tensor type (e.g., LazyTensor, NestedTensor), and (2) to implement a toggleable "mode" (e.g., Profiling, Tracing). Both are the most trivial possible implementations of their type: they "wrap" a real tensor and simply forward along to the real implementation. This PR also adds the necessary feature support for toggleable mode, which is in the original generic dispatch abstraction design but was not previously implemented. I had not originally intended to add this, but it turns out writing a new "mode" is a lot simpler than writing a "wrapper" type, so I ended up writing the mode version first.

General structure of the PR:

  • Add two new testing tensor type ids, TESTING_ONLY_GenericWrapperTensorId and TESTING_ONLY_GenericModeTensorId, which our tests use. They might find other use in other tests if necessary.
  • Add support for toggling the availability of TESTING_ONLY_GenericModeTensorId. Introduces a new thread local variable accessible by tls_local_tensor_type_set() which is considered as part of dispatch.
  • The mode fallback is very simple: it increments a counter and then passes on the call to the underlying kernel by invoking the JIT.
  • The wrapper fallback is more complex: it parses the arguments, unwrapping any wrapped tensor arguments, then invokes the JIT, and then rewraps the outputs.
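
The toggleable-mode mechanism above can be sketched in a few lines. The names here (`set_mode_enabled`, `dispatch_add`, `kGenericModeBit`) are illustrative stand-ins for this sketch, not the actual c10 API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the dispatcher machinery; only the shape of
// the idea is shown.
constexpr uint8_t kGenericModeBit = 1;

// Thread-local "included" set, consulted as part of dispatch.
thread_local uint64_t tls_included_set = 0;  // NB: zero initialized

void set_mode_enabled(uint8_t bit, bool enabled) {
  const uint64_t mask = 1ull << bit;
  if (enabled) tls_included_set |= mask;
  else         tls_included_set &= ~mask;
}

int mode_call_count = 0;  // the "mode" fallback just bumps this counter

// Toy operator: if the mode bit is set, run the mode fallback (count the
// call), then forward to the underlying kernel either way.
int dispatch_add(int a, int b) {
  if (tls_included_set & (1ull << kGenericModeBit)) {
    ++mode_call_count;  // mode side effect
  }
  return a + b;  // the "real" kernel
}
```

A thread toggles the mode on, calls operators (each call is observed by the fallback), then toggles it off; other threads are unaffected because the set is thread-local.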

The examples here are somewhat simplistic; there are a number of engineering improvements that could be applied. We could save these for later (landing this patch to get immediate testing), or incorporate them into this patch:

  • getOperator is horrible. Bram Wasti and I discussed a plan for how to make this easier, by simply refactoring the JIT interface.
  • GenericWrapperTensorImpl doesn't populate all of its fields accurately. Most notably, size is not set up correctly.
  • generic_wrapper_fallback should handle tensor lists in arguments and returns properly.

One pitfall: fallback dispatch only works with non-c10 code. That's why I test using batch_norm.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: D17549624

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 25, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: c52f6f9
Pull Request resolved: #26719
ezyang added a commit that referenced this pull request Sep 26, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 92d0ced
Pull Request resolved: #26719
ezyang added a commit that referenced this pull request Sep 26, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 3da260e
Pull Request resolved: #26719
ezyang added a commit that referenced this pull request Sep 27, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: d792770
Pull Request resolved: #26719
@ezyang ezyang changed the title Tests for fallback boxed dispatch. Tests for fallback boxed dispatch (including TLS mode) Sep 27, 2019
This PR adds a pair of tests for fallback boxed dispatch, exercising two different ways you might use it: (1) to implement a "wrapper" tensor type (e.g., LazyTensor, NestedTensor), and (2) to implement a toggleable "mode" (e.g., Profiling, Tracing). Both are the most trivial possible implementations of their type: they "wrap" a real tensor and simply forward along to the real implementation. This PR also adds the necessary feature support for toggleable mode, which is in the original generic dispatch abstraction design but was not previously implemented. I had not originally intended to add this, but it turns out writing a new "mode" is a lot simpler than writing a "wrapper" type, so I ended up writing the mode version first.

General structure of the PR:

* Add two new testing tensor type ids, `TESTING_ONLY_GenericWrapperTensorId` and `TESTING_ONLY_GenericModeTensorId`, which our tests use. They might find other use in other tests if necessary.
* Add support for toggling the availability of `TESTING_ONLY_GenericModeTensorId`. Introduces a new thread local variable accessible by `tls_included_tensor_type_set()` which is considered as part of dispatch. (It is a little goofy that we now have to do two TLS accesses; I'm not too sure if there's a way to avoid this.)
* There is a screed about how modes (Profiling/Tracing) are different from hybrid wrapper-modes (Variable).
* The mode fallback is very simple: it increments a counter and then passes on the call to the underlying kernel by invoking the JIT.
* The wrapper fallback is more complex: it parses the arguments, unwrapping any wrapped tensor arguments, then invokes the JIT, and then rewraps the outputs.

The examples here are somewhat simplistic; there are a number of engineering improvements that could be applied. We could save these for later (landing this patch to get immediate testing), or incorporate them into this patch:

* `getOperator` is horrible. Bram Wasti and I discussed a plan for how to make this easier, by simply refactoring the JIT interface.
* `GenericWrapperTensorImpl` doesn't populate all of its fields accurately. Most notably, size is not set up correctly.
* `generic_wrapper_fallback` should handle tensor lists in arguments and returns properly.

One pitfall: fallback dispatch only works with non-c10 code. That's why I test using `batch_norm`.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D17549624](https://our.internmc.facebook.com/intern/diff/D17549624)

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 30, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 512eb4b
Pull Request resolved: #26719
Contributor

@zdevito zdevito left a comment

Generally looks good. I have a few medium level comments about the API and use of thread locals.


#endif

void set_enabled_via_excluded(TensorTypeId tid, bool enabled) {
Contributor

The names of these functions are confusing to me. The caller still needs to understand that the final set will be (tensor_properties | included) - excluded, but then there is also a second translation between how this function affects included/excluded. Wouldn't it make more sense to just have it follow the set operations directly: excluded.add, excluded.remove, included.add, included.remove?
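
The suggestion reads roughly as follows; `TensorTypeSet` here is a minimal mock whose member names mirror the proposed `add`/`remove` operations, not any existing c10 signature:

```cpp
#include <cassert>
#include <cstdint>

// Minimal mock of a tensor type set exposing the set operations directly.
struct TensorTypeSet {
  uint64_t repr = 0;
  void add(uint8_t tid)       { repr |= (1ull << tid); }
  void remove(uint8_t tid)    { repr &= ~(1ull << tid); }
  bool has(uint8_t tid) const { return repr & (1ull << tid); }
};

thread_local TensorTypeSet included;
thread_local TensorTypeSet excluded;

// The final dispatch set: (tensor_properties | included) - excluded.
uint64_t final_set(uint64_t tensor_properties) {
  return (tensor_properties | included.repr) & ~excluded.repr;
}
```

With this shape, a caller reasons about exactly one translation (the final-set formula) instead of two.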

C10_API void tls_variable_set_enabled(bool enabled);
C10_API TensorTypeSet tls_excluded_tensor_type_set();

C10_API bool TESTING_ONLY_tls_generic_mode_is_enabled();
Contributor

Is it the best idea to have separate functions here for every key? I am concerned that over time non-simple operations on the tensor set can end up here, but I'd prefer the opposite: that the only thing you can do to a key is to enable it in a thread or in tensors properties.

Contributor Author

Solves my giant comment problem too.


// NB: Zero initialized!
thread_local uint64_t raw_excluded;
thread_local uint64_t raw_included;
Contributor

These are used adjacent to each other in dispatch, so it would be more efficient if they were in a single thread_local struct. Otherwise there is a sequence of like 4 loads to get to each one. I would imagine this struct would grow into the generic thread-local dispatch state that the chain of responsibility would need to call super as well.
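
The single-struct layout being suggested might look like this (names hypothetical); both masks then live at one TLS address, so dispatch does a single thread-local lookup:

```cpp
#include <cassert>
#include <cstdint>

// One thread_local struct instead of two separate thread_local variables,
// so both masks come from a single TLS access.
struct LocalDispatchState {
  uint64_t included = 0;  // zero initialized, like the raw variables above
  uint64_t excluded = 0;
};
thread_local LocalDispatchState tls_dispatch_state;

// Compute the effective set with one access to tls_dispatch_state.
uint64_t effective_set(uint64_t tensor_type_set) {
  const LocalDispatchState& s = tls_dispatch_state;
  return (tensor_type_set | s.included) & ~s.excluded;
}
```

The struct is also a natural home for any future per-thread dispatch state the chain-of-responsibility "call super" pattern might need.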

ezyang added a commit that referenced this pull request Oct 2, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 1b16e64
Pull Request resolved: #26719
ezyang added a commit that referenced this pull request Oct 3, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: e8b7d08
Pull Request resolved: #26719
bwasti added a commit that referenced this pull request Oct 3, 2019
This PR introduces a dirt-simple registration API to allow custom Tensors. The API aims to avoid code complexity and long compilation times, making an explicit trade-off that favors developer productivity over compute efficiency.

Couple of reasons to have this:
- Tests are easier to write (see #26719)
- It addresses the limit of 64 possible TensorTypeIds we can have with TensorTypeSet
- Drastically reduces mental overhead for folks working in research who want to add a Tensor
- Allows entirely decoupled static registration for tensors defined in separate repos (planned usage in pytorch/sparse)





[ghstack-poisoned]
ezyang added a commit that referenced this pull request Oct 8, 2019
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 9e3ebf7
Pull Request resolved: #26719
@facebook-github-bot
Contributor

@ezyang merged this pull request in d70f8dd.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 9, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26719

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: D17549624

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 57dbdd8d6812a66082aa6db2934c8edcda340ea6
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/375/head branch October 28, 2019 22:11
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#26719

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: D17549624

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 57dbdd8d6812a66082aa6db2934c8edcda340ea6

Labels

Merged · module: build (Build system issues) · module: internals (Related to internal abstractions in c10 and ATen)
