WIP: add __torch_function__ API override mechanism#25629

Closed
rgommers wants to merge 35 commits into pytorch:master from Quansight:torch_function

Conversation

@rgommers
Collaborator

@rgommers rgommers commented Sep 4, 2019

This is still a draft: the Python implementation is complete, but the C++ part still needs to be added.

This mechanism allows Tensor-like objects (including Tensor subclasses) to override torch functions with their own implementations.

Closes gh-24015 (see description of that issue for more details).

For a toy example, see the DiagonalTensor class in test/test_overrides.py. The __torch_function__ method and implements decorator there are what a package with a Tensor-like class should implement. It can then override a PyTorch function with its own function decorated with @implements(torch.<funcname>).
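The registration pattern described above can be sketched without PyTorch as follows. This is a minimal illustration in the spirit of the DiagonalTensor toy example; `mock_unique` is a made-up stand-in for an overridable torch function, and the exact `__torch_function__` protocol is defined by the PR itself, not by this sketch:

```python
# Minimal sketch of the DiagonalTensor pattern from test/test_overrides.py,
# written against a stand-in function so it runs without torch.
# All names here are illustrative.
HANDLED_FUNCTIONS = {}

def implements(torch_function):
    """Register an override implementation for `torch_function`."""
    def decorator(func):
        HANDLED_FUNCTIONS[torch_function] = func
        return func
    return decorator

class DiagonalTensor:
    """Tensor-like object: an n x n matrix with `value` on the diagonal."""
    def __init__(self, n, value):
        self._n = n
        self._value = value

    def __torch_function__(self, func, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

def mock_unique(input):
    """Stand-in for torch.unique: checks __torch_function__ before running."""
    if hasattr(type(input), '__torch_function__'):
        result = type(input).__torch_function__(input, mock_unique, (input,))
        if result is not NotImplemented:
            return result
    return sorted(set(input))  # plain-sequence fallback

@implements(mock_unique)
def diagonal_unique(input):
    # A diagonal matrix holds at most two distinct values: 0 and the diagonal.
    return {0, input._value} if input._n > 1 else {input._value}

print(mock_unique(DiagonalTensor(5, 2)))  # {0, 2}
```

A library shipping a Tensor-like class would only need the `__torch_function__` method and its own registry; the dispatch check lives inside the overridable function.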

The current Python implementation of the override mechanism adds on the order of 1-2 µs of overhead for regular use (a small number of input parameters to the function). Benchmark results:

```
$ asv run --python=same
· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_rgommers_anaconda3_envs_pytorch-gcc91_bin_python
[  8.33%] ··· Running (bench_overrides.TorchFunction.time_mock_broadcast_tensors_duck--)......
[ 58.33%] ··· ....TorchFunction.time_mock_broadcast_tensors_duck           958±70ns
[ 66.67%] ··· ...TorchFunction.time_mock_broadcast_tensors_torch           962±90ns
[ 75.00%] ··· ...rrides.TorchFunction.time_mock_concatenate_duck        1.59±0.09μs
[ 83.33%] ··· ...rrides.TorchFunction.time_mock_concatenate_many           93.8±8μs
[ 91.67%] ··· ...rides.TorchFunction.time_mock_concatenate_mixed        2.66±0.02μs
[100.00%] ··· ...rides.TorchFunction.time_mock_concatenate_torch           990±10ns
```

As discussed in the Performance considerations section of gh-22402, the goal is that once the dispatch mechanism is moved to C++, the overhead when inputs are Tensor instances is zero, and the overhead for Tensor-like objects is sub-microsecond.

This feature is inspired by and analogous to NumPy's __array_function__ protocol (see NumPy Enhancement Proposal 18).

This PR currently contains:

  • the Tensor.__torch_function__ method
  • tests for __torch_function__ behavior
  • overrides for unique, tensordot, lu, and broadcast_tensors (no complete coverage of even torch.functional, but enough to get a first idea)
  • benchmarks that measure the overhead for a range of scenarios, based on the airspeed velocity benchmarking framework
  • documentation for the override/dispatch mechanism as docstrings

Missing from the PR:

  • C++ implementation of the performance-critical parts of the dispatch-mechanism
  • C++ equivalent of torch_function_dispatch decorator to make a single function overridable
  • overrides for all public functions in the torch and torch.<public_submodule> namespaces
  • higher-level documentation for library authors and end users

prasunanand and others added 30 commits September 3, 2019 16:51
Remove utility code that we can simply import from NumPy for now.
Things import again and can be tested.
Note, it does not get called normally (there's just a check it exists),
the dispatcher calls the function implementation directly.
Manual check of current performance:
```
In [10]: %timeit mock_concatenate([Tensor(1), Tensor(2)])
2.58 µs ± 7.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: %timeit mock_broadcast_tensors(Tensor(1))
1.65 µs ± 3.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Run with ASV:
```
$ asv run --python=same --dry-run
· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_rgommers_anaconda3_envs_pytorch_bin_python
[  8.33%] ··· Running (bench_overrides.TorchFunction.time_mock_broadcast_tensors_duck--)......
[ 58.33%] ··· ...orchFunction.time_mock_broadcast_tensors_duck            793±5ns
[ 66.67%] ··· ...rchFunction.time_mock_broadcast_tensors_torch           867±70ns
[ 75.00%] ··· ...ides.TorchFunction.time_mock_concatenate_duck         1.44±0.1μs
[ 83.33%] ··· ...ides.TorchFunction.time_mock_concatenate_many           86.2±7μs
[ 91.67%] ··· ...des.TorchFunction.time_mock_concatenate_mixed        2.33±0.01μs
[100.00%] ··· ...des.TorchFunction.time_mock_concatenate_torch            902±9ns
```

So performance is as expected for a pure-Python implementation.
That behavior of forwarding the sum() function to a sum() method is
specific to NumPy.
The removed test checked handling of >32 input parameters.
NumPy limits this to 32, with NPY_MAXARGS. PyTorch doesn't have that
limitation.
This way, the ASV benchmarks can be run on master. Individual benchmarks
will fail, but not the ASV run itself.  This is an ASV feature; you
can go back in time and run a benchmark suite on older commits.
@ezyang
Contributor

ezyang commented Sep 4, 2019

cc @cpuhrsch

@rgommers
Collaborator Author

rgommers commented Sep 4, 2019

Hi @cpuhrsch, I just read your NestedTensor RFC 0.0.2 and I guess @ezyang Cc'd you because of this Prototype Dispatch section in that document:

We define an explicit monkey_patch function within torch.nested that requires an explicit user opt-in to use this prototype. The dispatch mechanism is implemented via isinstance and used to overwrite some of the existing torch functions. The explicit opt-in is deemed necessary because a) some torch functions might slightly differ in signature b) the dispatch mechanism is very slow.

I still have to read the other NestedTensor docs and code. It'd be great to have a chat soon to make sure this __torch_function__ is useful for your design. Also I'd like to understand why you need signatures that differ, and perhaps include those as test cases here.

-----

Airspeed Velocity manages building and Python virtualenvs or conda envs by
itself, unless told otherwise (e.g. with `--python=same`).
Contributor

A feature that I've always found extremely irritating about asv XD

Collaborator Author

@rgommers rgommers Sep 4, 2019

Yep, the default is quite annoying. It makes sense for CI or running it on a dedicated server, but not for typical use when you're developing. I've never complained about it, but maybe I should go open an issue now (EDIT: I did do that).

Collaborator Author

@rgommers rgommers Sep 7, 2019

Fixed now in ASV master, asv dev does this now. (airspeed-velocity/asv#872)

@@ -0,0 +1,85 @@
{
// The version of the config file format. Do not change, unless
// you know what you are doing.
Contributor

MY EYEEEESS

@ezyang
Contributor

ezyang commented Sep 4, 2019

cc @apaszke since there are some benchmark bits here

@cpuhrsch
Contributor

cpuhrsch commented Sep 4, 2019

@rgommers: Thanks for reading the RFC!

Some torch functions that are currently implemented for torch.Tensor might require various extensions when used with NestedTensor. For example, we might want torch.narrow(input, dim, start, length) to accept tuples for dim, start, length if the input is a NestedTensor. In essence, if using a NestedTensor the semantics of an operation generalize, which might slightly change the way we want to parse or check the arguments. We could have torch.nested_narrow of course, but that'll quickly lead to a massive API surface for small one-off changes.
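To make the generalization concrete, here is a hedged sketch of how an override could accept tuples while plain inputs keep the scalar form. `NestedList` and `mock_narrow` are made-up stand-ins, not the actual NestedTensor or torch.narrow:

```python
# Hypothetical sketch of generalized narrow() semantics via an override.
# NestedList stores plain Python lists; dim is kept for signature parity
# but ignored in this 1-D illustration.
class NestedList:
    def __init__(self, tensors):
        self._tensors = list(tensors)

    def __torch_function__(self, func, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is mock_narrow:
            return self._narrow(*args[1:], **kwargs)
        return NotImplemented

    def _narrow(self, dim, start, length):
        # Generalized semantics: tuples apply entry-wise to the nested
        # lists, while scalars broadcast to every entry.
        n = len(self._tensors)
        starts = start if isinstance(start, tuple) else (start,) * n
        lengths = length if isinstance(length, tuple) else (length,) * n
        return NestedList(t[s:s + l] for t, s, l in
                          zip(self._tensors, starts, lengths))

def mock_narrow(input, dim, start, length):
    """Stand-in for torch.narrow; checks __torch_function__ before parsing."""
    if hasattr(type(input), '__torch_function__'):
        result = type(input).__torch_function__(
            input, mock_narrow, (input, dim, start, length))
        if result is not NotImplemented:
            return result
    return input[start:start + length]  # plain-sequence fallback

nested = NestedList([[1, 2, 3, 4], [5, 6, 7]])
print(mock_narrow(nested, 0, (1, 0), (2, 2))._tensors)  # [[2, 3], [5, 6]]
print(mock_narrow([1, 2, 3, 4], 0, 1, 2))               # [2, 3]
```

Because the override runs before any argument checking, the tuple-accepting variant never has to exist on the plain-Tensor path.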

Let's get together and talk about this in person whenever you want!

@@ -0,0 +1 @@
from __future__ import absolute_import, division, print_function
Contributor

In the terminal PR, we'll probably ask you to move these benchmarks to https://github.com/pytorch/benchmark since it makes it easier to run benchmarks across versions if they live out of line.

Collaborator Author

Ah, that's where they live - I was wondering why there were so few in this repo. Sounds good to move these at the end.

@ezyang
Contributor

ezyang commented Sep 4, 2019

@cpuhrsch I believe this is fine, because __torch_function__ interposes prior to argument parsing.

# We only collect arguments if they have a unique type, which ensures
# reasonable performance even with a long list of possibly overloaded
# arguments.
if (arg_type not in overloaded_types and
Contributor

I always find it interesting when an O(n) lookup is used over O(1). overloaded_types is probably always small so this should not be a problem.

Contributor

(from reading below: order matters!)
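The ordering point is the key reason a list is used rather than a set: dispatch must try subclasses before their parents, so insertion order is significant. A sketch of the collection loop, following the NumPy-style logic this PR mirrors (simplified, with illustrative names):

```python
# Sketch of the type-collection loop under discussion: a list preserves
# dispatch priority, with subclass arguments inserted ahead of their
# parent classes. Simplified and illustrative, not the PR's exact code.
def get_overloaded_types_and_args(relevant_args):
    overloaded_types = []
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        # Only collect arguments with a unique type; the O(n) membership
        # test on a short list is cheap for the small n seen in practice.
        if (arg_type not in overloaded_types and
                hasattr(arg_type, '__torch_function__')):
            overloaded_types.append(arg_type)
            # Subclasses are dispatched to before superclasses, so insert
            # this arg before any of its parents already collected.
            index = len(overloaded_args)
            for i, old_arg in enumerate(overloaded_args):
                if issubclass(arg_type, type(old_arg)):
                    index = i
                    break
            overloaded_args.insert(index, arg)
    return overloaded_types, overloaded_args

class A:
    def __torch_function__(self, func, args=(), kwargs=None):
        return NotImplemented

class B(A):  # subclass: must be tried before A
    pass

a, b = A(), B()
types, args = get_overloaded_types_and_args([a, b, A()])
print(types)          # A collected first, then B; the duplicate A() is skipped
print(args[0] is b)   # True: the subclass instance was moved ahead of its parent
```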

# exec. This version has the advantage of giving the helper function a
# more interpretable name. Otherwise, the original function does not
# show up at all in many cases, e.g., if it's written in C++ or if the
# dispatcher gets an invalid keyword argument.
Contributor

Hmm. Does overriding __repr__ on the decorator type work?

Collaborator Author

There's a test that Tensor.__repr__ works as expected, and the DiagonalTensor test has a custom __repr__. And overriding:

```
In [1]: import torch

In [2]: torch.unique.__repr__()
Out[2]: '<function unique at 0x7fb935427378>'

In [3]: torch.unique.__repr__ = lambda: 'aahhhh'

In [4]: torch.unique.__repr__()
Out[4]: 'aahhhh'

In [5]: torch.unique is torch.unique._implementation  # checking the decorator was active
Out[5]: False
```

def implement_torch_function(
implementation, public_api, relevant_args, args, kwargs):
"""Implement a function with checks for __torch_function__ overrides.
Contributor

Though not a decorator, intriguingly.

Collaborator Author

torch_function_dispatch is the decorator; this generates the function that the decorator returns, and it is only used from within torch_function_dispatch.
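For context, the decorator/helper split can be sketched like so. This is a simplified `functools.wraps` version of the pattern (the PR uses exec to give the helper a more interpretable name), modeled on NumPy's array_function_dispatch; all function names here are illustrative:

```python
import functools

def implement_torch_function(implementation, public_api,
                             relevant_args, args, kwargs):
    """Check relevant_args for __torch_function__ overrides, else call through."""
    for arg in relevant_args:
        overload = getattr(type(arg), '__torch_function__', None)
        if overload is not None:
            result = overload(arg, public_api, args, kwargs)
            if result is not NotImplemented:
                return result
    return implementation(*args, **kwargs)

def torch_function_dispatch(dispatcher):
    """Decorator making `implementation` overridable; `dispatcher` returns
    the arguments that may carry a __torch_function__ hook."""
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            relevant_args = dispatcher(*args, **kwargs)
            return implement_torch_function(
                implementation, public_api, relevant_args, args, kwargs)
        public_api._implementation = implementation
        return public_api
    return decorator

# Demo with a stand-in for an overridable concatenation function.
def _cat_dispatcher(tensors):
    return tensors

@torch_function_dispatch(_cat_dispatcher)
def mock_cat(tensors):
    return [x for t in tensors for x in t]

class Always42:
    def __torch_function__(self, func, args, kwargs):
        return 42

print(mock_cat([[1, 2], [3]]))         # [1, 2, 3] -- no overrides present
print(mock_cat([[1, 2], Always42()]))  # 42 -- the override wins
print(mock_cat is mock_cat._implementation)  # False: public_api wraps it
```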

# (directly or with subclasses that do not override __torch_function__).
if (not overloaded_args or types == _TENSOR_ONLY or
all(type(arg).__torch_function__ is _TORCH_FUNCTION
for arg in overloaded_args)):
Contributor

Hmm. I suppose the benchmarks for the C++ will eventually show up, but the short cut doesn't seem all that short-cutty to me. In particular, get_overloaded_types_and_args will treat Tensor as an "overload", so if __torch_function__ is defined on Tensor (which it is), then you will never actually be in a situation where overloaded_args is falsish. (If this code is just meant to be semantics, as opposed to code that will be directly transliterated to C++, you can disregard this comment)

Collaborator Author

You're right, this is not meant for direct translation to C++. There we want to hook in only after the check that inputs are tensor instances. In the torch.functional Python functions there's no such check though, so the inputs are checked upfront.
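The fast-path identity check being discussed can be illustrated in pure Python; everything below is made up for the sketch and is not the eventual C++ hook:

```python
# Illustrative fast-path check: if every overloaded argument still uses
# the default Tensor.__torch_function__, the implementation can be
# called directly and dispatch skipped. All names are made up.
class Tensor:
    def __torch_function__(self, func, args=(), kwargs=None):
        # Default hook; in the real mechanism this means "use the normal path".
        return NotImplemented

_TORCH_FUNCTION = Tensor.__torch_function__

def all_default_overrides(overloaded_args):
    """True when every overloaded arg still uses Tensor's default hook."""
    return all(type(arg).__torch_function__ is _TORCH_FUNCTION
               for arg in overloaded_args)

class MyTensor(Tensor):        # subclass that does NOT override the hook
    pass

class DuckTensor:              # Tensor-like that DOES override it
    def __torch_function__(self, func, args=(), kwargs=None):
        return "overridden"

print(all_default_overrides([Tensor(), MyTensor()]))    # True: fast path
print(all_default_overrides([Tensor(), DuckTensor()]))  # False: must dispatch
```

The `is` comparison works because a subclass that doesn't override the method inherits the very same function object, so the check stays O(number of overloaded args) with no string or MRO inspection.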

@rgommers
Collaborator Author

rgommers commented Sep 5, 2019

@cpuhrsch I believe this is fine, because __torch_function__ interposes prior to argument parsing.

Indeed, there's no fundamental problem there; mismatching signatures can be handled. It sounds, though, like the signatures match but the semantics change, at least in the case of torch.narrow.

Some torch functions that are currently implemented for torch.Tensor might require various extensions when used with NestedTensor. For example, we might want torch.narrow(input, dim, start, length) to accept tuples for dim, start, length if the input is a NestedTensor. In essence, if using a NestedTensor the semantics of an operation generalize which might slightly change the way we want to parse or check the arguments. We could have torch.nested_narrow of course, but that'll quickly yield to a massive API surface for small one-off changes.

Generalized semantics sound like a good thing, as long as the semantics for inputs that also work with torch.Tensor don't change to the point that code written for both can no longer be relied on to be correct (probably obvious, but I won't easily forget the damage that numpy.matrix has done ...).

Let's get together and talk about this in person whenever you want!

Are you on the PyTorch Slack? I'm Ralf Gommers there. Or otherwise [email protected].

@rgommers
Collaborator Author

rgommers commented Sep 5, 2019

The caffe2-py2-devtoolset7-rocmrpm-centos7.5-test CI failure is due to that config using Python < 2.7.9, and exec having a bug there (https://bugs.python.org/issue21591). I have a fix or workaround for that one.

The other failure is:

```
FAIL: test_unique (__main__.TestOperators)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/onnx/test_operators.py", line 729, in test_unique
    opset_version=11)
  File "test/onnx/test_operators.py", line 67, in assertONNX
    self.assertExpected(onnx_model_pbtxt, subname)
  File "/home/rgommers/code/pytorch/test/common_utils.py", line 901, in assertExpected
    self.assertMultiLineEqual(expected, s)
AssertionError: 'ir_v[88 chars]ut: "x"\n    output: "1"\n    output: "2"\n   [1164 chars]n}\n' != 'ir_v[88 chars]ut: "0"\n    output: "1"\n    output: "2"\n   [1164 chars]n}\n'
  ir_version: 4
  producer_name: "pytorch"
  producer_version: "1.2"
  graph {
    node {
-     input: "x"
?             ^
+     input: "0"
?             ^
      output: "1"
      output: "2"
      output: "3"
      output: "4"
      op_type: "Unique"
      attribute {
        name: "axis"
        i: 0
        type: INT
      }
      attribute {
        name: "sorted"
        i: 1
        type: INT
      }
    }
    name: "torch-jit-export"
    input {
-     name: "x"
?            ^
+     name: "0"
?            ^
      type {
        tensor_type {
```
I have the impression that the new version is actually more consistent (if outputs are labelled "1", "2", etc., it makes sense for the input positional parameter to be labelled "0" rather than "x"). It's fixable by regenerating test/onnx/expect/, but I'm not sure if that's desired here?

Docstring and signature of unique are unchanged, as is introspection result:

```
>>> inspect.signature(torch.unique)
<Signature (input, sorted=True, return_inverse=False, return_counts=False, dim=None)>
```

So it looks like something subtle in how the ONNX output is generated.

@ezyang
Contributor

ezyang commented Sep 5, 2019

You can just accept the new output.

"""
def decorator(implementation):
if verify:
verify_matching_signatures(implementation, dispatcher)
Contributor

Nice, I would not have thought to implement this.

Collaborator Author

Thanks. I can't take the credit for that one though, stolen from NumPy :)

This change seems to be due to regenerating the ONNX export now
that unique() was decorated with `torch_function_dispatch`.
The same will need to be done for other expected values once we
add overrides to them.
Most py27 CI builds passed, but one failed with:
```
SyntaxError: unqualified exec is not allowed in function 'decorator' it
is a nested function (_overrides.py, line 231)
```

This is https://bugs.python.org/issue21591, which was fixed in
Python 2.7.9, looks like
`caffe2-py2-devtoolset7-rocmrpm-centos7.5-test` uses an older version.
@pytorchbot pytorchbot added the module: onnx Related to torch.onnx label Sep 5, 2019
@rgommers
Collaborator Author

@pytorchbot rebase this please

@pytorchbot
Collaborator

Sorry, I can't merge this because there are conflicts. To merge this yourself, run the commands below:

```
git fetch origin master
git fetch [email protected]:Quansight/pytorch.git torch_function
git checkout FETCH_HEAD
git merge origin/master
git push [email protected]:Quansight/pytorch.git HEAD:torch_function
```

(To learn more about this bot, see Bot commands.)

@jph00

jph00 commented Sep 26, 2019

@rgommers we're very interested in this work for stuff we're writing for fastai v2 - do you have a sense of how far this is from being merged and available in nightlies?

@ezyang
Contributor

ezyang commented Sep 27, 2019

This branch is not merged yet, so it's not available in nightlies.

@rgommers
Collaborator Author

@jph00 moving this into C++ is getting there; it's taken a little longer than expected because touching the central signature/argument parsing and codegen is tricky. But I think we have something that works now for all functions that go through the tools/autograd/gen_python_functions.py machinery (which is most of what we need).

Timeline wise I hope to update this PR next week and have it merged within 3-4 weeks.

we're very interested in this work for stuff we're writing for fastai v2

That sounds very interesting. I'd love to make sure that what we do covers your needs for fastai v2 and is in time. I'll comment on gh-22402 in more detail.

@rgommers
Collaborator Author

Continued in gh-27064, which is ready for review/testing. So closing this PR.

@rgommers rgommers closed this Oct 24, 2019

Labels

caffe2 module: docs Related to our documentation, both in docs/ and docblocks module: onnx Related to torch.onnx open source


Development

Successfully merging this pull request may close these issues.

Implement __torch_function__ to let Tensor-like objects override torch functions

6 participants