WIP: add `__torch_function__` API override mechanism #25629
rgommers wants to merge 35 commits into `pytorch:master`
Conversation
Remove utility code that we can simply import from NumPy for now.
Things import again and can be tested.
Note: it does not get called normally (there's just a check that it exists); the dispatcher calls the function implementation directly.
Manual check of current performance:

```
In [10]: %timeit mock_concatenate([Tensor(1), Tensor(2)])
2.58 µs ± 7.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: %timeit mock_broadcast_tensors(Tensor(1))
1.65 µs ± 3.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Run with ASV:

```
$ asv run --python=same --dry-run
· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_rgommers_anaconda3_envs_pytorch_bin_python
[  8.33%] ··· Running (bench_overrides.TorchFunction.time_mock_broadcast_tensors_duck--)......
[ 58.33%] ··· ...orchFunction.time_mock_broadcast_tensors_duck       793±5ns
[ 66.67%] ··· ...rchFunction.time_mock_broadcast_tensors_torch      867±70ns
[ 75.00%] ··· ...ides.TorchFunction.time_mock_concatenate_duck    1.44±0.1μs
[ 83.33%] ··· ...ides.TorchFunction.time_mock_concatenate_many     86.2±7μs
[ 91.67%] ··· ...des.TorchFunction.time_mock_concatenate_mixed   2.33±0.01μs
[100.00%] ··· ...des.TorchFunction.time_mock_concatenate_torch      902±9ns
```

So performance is as expected for a pure-Python implementation.
That behavior of forwarding the `sum()` function to a `sum()` method is specific to NumPy.
The removed test checked handling of >32 input parameters. NumPy limits this to 32 via `NPY_MAXARGS`; PyTorch doesn't have that limitation.
This way, the ASV benchmarks can be run on master. Individual benchmarks will fail, but not the ASV run itself. This is an ASV feature; you can go back in time and run a benchmark suite on older commits.
cc @cpuhrsch
Hi @cpuhrsch, I just read your NestedTensor RFC 0.0.2 and I guess @ezyang Cc'd you because of the Prototype Dispatch section in that document ("We define an explicit ..."). I still have to read the other ...
> Airspeed Velocity manages building and Python virtualenvs or conda envs by
> itself, unless told otherwise (e.g. with `--python=same`).
A feature that I've always found extremely irritating about asv XD
Yep, the default is quite annoying. It makes sense for CI or running it on a dedicated server, but not for typical use when you're developing. I've never complained about it, but maybe I should go open an issue now (EDIT: I did do that).
Fixed now in ASV master; `asv dev` does this now (airspeed-velocity/asv#872).
```
@@ -0,0 +1,85 @@
{
    // The version of the config file format. Do not change, unless
    // you know what you are doing.
```
cc @apaszke since there are some benchmark bits here
@rgommers: Thanks for reading the RFC! Some torch functions that are currently implemented for `torch.Tensor` might require various extensions when used with NestedTensor. For example, we might want `torch.narrow(input, dim, start, length)` to accept tuples for `dim`, `start`, `length` if the input is a NestedTensor. In essence, when using a NestedTensor the semantics of an operation generalize, which might slightly change the way we want to parse or check the arguments. We could have `torch.nested_narrow` of course, but that'll quickly lead to a massive API surface for small one-off changes. Let's get together and talk about this in person whenever you want!
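That kind of generalization is exactly what the override protocol enables without growing the `torch` API surface. A hypothetical sketch, where `narrow` is a stand-in for `torch.narrow`, `NestedTensor` is a toy container (not the real one), and the `__torch_function__` signature is simplified:

```python
def narrow(data, dim, start, length):
    # Minimal dispatch check, imitating the protocol in this PR:
    # let the argument's type override the whole function.
    if hasattr(type(data), '__torch_function__'):
        return type(data).__torch_function__(narrow, (data, dim, start, length), {})
    # Plain-list fallback (ignores dim, for illustration only).
    return data[start:start + length]

class NestedTensor:
    """Toy nested container: a list of lists of scalars."""
    def __init__(self, entries):
        self.entries = entries

    @classmethod
    def __torch_function__(cls, func, args, kwargs):
        if func is narrow:
            data, dim, start, length = args
            # Generalized semantics: start/length may be tuples,
            # one value per nested entry.
            n = len(data.entries)
            starts = start if isinstance(start, tuple) else (start,) * n
            lengths = length if isinstance(length, tuple) else (length,) * n
            return NestedTensor([e[s:s + l] for e, s, l in
                                 zip(data.entries, starts, lengths)])
        return NotImplemented

nt = NestedTensor([[1, 2, 3, 4], [5, 6, 7]])
out = narrow(nt, 0, (1, 0), (2, 2))
print(out.entries)  # [[2, 3], [5, 6]]
```

The point is that `torch.narrow` itself stays unchanged; the tuple-accepting semantics live entirely in the NestedTensor package.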
```
@@ -0,0 +1 @@
from __future__ import absolute_import, division, print_function
```
In the terminal PR, we'll probably ask you to move these benchmarks to https://github.com/pytorch/benchmark since it makes it easier to run benchmarks across versions if they live out of line.
Ah, that's where they live - I was wondering why there were so few in this repo. Sounds good to move these at the end.
@cpuhrsch I believe this is fine, because ...
```python
# We only collect arguments if they have a unique type, which ensures
# reasonable performance even with a long list of possibly overloaded
# arguments.
if (arg_type not in overloaded_types and
```
I always find it interesting when an O(n) lookup is used over O(1). `overloaded_types` is probably always small, so this should not be a problem.
(from reading below: order matters!)
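For reference, the collection pass under discussion can be sketched as below. This mirrors the NumPy `__array_function__` logic that the PR is modeled on; the body is illustrative, not the PR's actual code. The order constraint is the key point: subclasses are inserted before their superclasses so that their `__torch_function__` gets first crack, which is why an ordered list (not a set) is used.

```python
def get_overloaded_types_and_args(relevant_args):
    """Sketch: collect each distinct type with __torch_function__ once,
    keeping subclass arguments ahead of superclass arguments."""
    overloaded_types = []
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        # Linear membership test: fine in practice, since the number of
        # distinct overloaded types in one call is tiny.
        if (arg_type not in overloaded_types and
                hasattr(arg_type, '__torch_function__')):
            overloaded_types.append(arg_type)
            # Insert this argument before any argument of a superclass type.
            index = len(overloaded_args)
            for i, old_arg in enumerate(overloaded_args):
                if issubclass(arg_type, type(old_arg)):
                    index = i
                    break
            overloaded_args.insert(index, arg)
    return overloaded_types, overloaded_args

class Base:
    @classmethod
    def __torch_function__(cls, func, args, kwargs):
        return NotImplemented

class Sub(Base):
    pass

types, args = get_overloaded_types_and_args([Base(), Sub()])
print([type(a).__name__ for a in args])  # ['Sub', 'Base']
```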
```python
# exec. This version has the advantage of giving the helper function a
# more interpretable name. Otherwise, the original function does not
# show up at all in many cases, e.g., if it's written in C++ or if the
# dispatcher gets an invalid keyword argument.
```
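The technique the comment refers to, generating the wrapper with `exec` so that the public function carries its real name in tracebacks and `repr()`, can be sketched like this (a simplified, hypothetical version of the NumPy-derived code, not the PR's actual implementation):

```python
def make_named_wrapper(name, implementation):
    """Build a wrapper via exec so its __name__ (and hence tracebacks,
    repr, and error messages) show the public name, not a generic one."""
    source = """
def {name}(*args, **kwargs):
    return _implementation(*args, **kwargs)
""".format(name=name)
    scope = {'_implementation': implementation}
    exec(compile(source, '<generated>', 'exec'), scope)
    return scope[name]

# A plain closure would be named e.g. 'public_api'; this one is 'unique'.
unique = make_named_wrapper('unique', lambda x: sorted(set(x)))
print(unique.__name__)    # 'unique'
print(unique([3, 1, 3]))  # [1, 3]
```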
Hmm. Does overriding __repr__ on the decorator type work?
There's a test that `Tensor.__repr__` works as expected, and the `DiagonalTensor` test has a custom `__repr__`. And overriding:

```python
In [1]: import torch

In [2]: torch.unique.__repr__()
Out[2]: '<function unique at 0x7fb935427378>'

In [3]: torch.unique.__repr__ = lambda: 'aahhhh'

In [4]: torch.unique.__repr__()
Out[4]: 'aahhhh'

In [5]: torch.unique is torch.unique._implementation  # checking the decorator was active
Out[5]: False
```
```python
def implement_torch_function(
        implementation, public_api, relevant_args, args, kwargs):
    """Implement a function with checks for __torch_function__ overrides.
```
Though not a decorator, intriguingly.
`torch_function_dispatch` is the decorator. This generates the function that the decorator returns, and is used from within `torch_function_dispatch` only.
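A simplified sketch of how the two pieces fit together; the names `torch_function_dispatch` and `implement_torch_function` come from this PR, but the bodies here are illustrative (the `__torch_function__` signature is simplified and no fast paths are shown):

```python
import functools

def implement_torch_function(implementation, public_api, relevant_args,
                             args, kwargs):
    """Try each overloaded argument's __torch_function__ in turn; fall
    back to the plain implementation if none claims the call."""
    overloaded_args = [a for a in relevant_args
                       if hasattr(type(a), '__torch_function__')]
    for arg in overloaded_args:
        result = type(arg).__torch_function__(public_api, args, kwargs)
        if result is not NotImplemented:
            return result
    return implementation(*args, **kwargs)

def torch_function_dispatch(dispatcher):
    """The dispatcher extracts the arguments that may carry overrides."""
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            relevant_args = dispatcher(*args, **kwargs)
            return implement_torch_function(
                implementation, public_api, relevant_args, args, kwargs)
        return public_api
    return decorator

def _sum_dispatcher(x):
    return (x,)

@torch_function_dispatch(_sum_dispatcher)
def my_sum(x):
    return sum(x)

class Always42:
    @classmethod
    def __torch_function__(cls, func, args, kwargs):
        return 42 if func is my_sum else NotImplemented

print(my_sum([1, 2, 3]))   # 6
print(my_sum(Always42()))  # 42
```

Note that `public_api` (not the raw `implementation`) is what gets passed to `__torch_function__`, so overriders can identify the public function they are replacing.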
```python
# (directly or with subclasses that do not override __torch_function__).
if (not overloaded_args or types == _TENSOR_ONLY or
        all(type(arg).__torch_function__ is _TORCH_FUNCTION
            for arg in overloaded_args)):
```
Hmm. I suppose the benchmarks for the C++ will eventually show up, but the short cut doesn't seem all that short-cutty to me. In particular, `get_overloaded_types_and_args` will treat `Tensor` as an "overload", so if `__torch_function__` is defined on `Tensor` (which it is), then you will never actually be in a situation where `overloaded_args` is falsish. (If this code is just meant to be semantics, as opposed to code that will be directly transliterated to C++, you can disregard this comment.)
You're right, this is not meant for direct translation to C++. There we want to hook in only after the check that inputs are tensor instances. In the `torch.functional` Python functions there's no such check though, so the inputs are checked upfront.
Indeed, there's no fundamental problem there; mismatching signatures can be handled. It sounds though like the signatures match but the semantics change, at least in the case of ...

Generalized semantics sounds like a good thing. As long as the semantics for inputs that also work with ...

Are you on the PyTorch Slack? I'm ...
The other failure is: I have the impression that the new version is actually more consistent (if outputs are labelled "1", "2", etc., it makes sense for the input positional parameter to be labelled "0" rather than "x"). It's fixable by regenerating ... The docstring and signature of ... So it looks like something subtle in how the ONNX output is generated.
You can just accept the new output.
```python
"""
def decorator(implementation):
    if verify:
        verify_matching_signatures(implementation, dispatcher)
```
Nice, I would not have thought to implement this.
Thanks. I can't take the credit for that one though, it's stolen from NumPy :)
This change seems to be due to regenerating the ONNX export now that `unique()` was decorated with `torch_function_dispatch`. The same will need to be done for other expected values once we add overrides to them.
Most py27 CI builds passed, but one failed with:

```
SyntaxError: unqualified exec is not allowed in function 'decorator' it is a nested function (_overrides.py, line 231)
```

This is https://bugs.python.org/issue21591, which was fixed in Python 2.7.9; looks like `caffe2-py2-devtoolset7-rocmrpm-centos7.5-test` uses an older version.
@pytorchbot rebase this please
Sorry, I can't merge this because there are conflicts. To merge this yourself, run the commands below: ... (To learn more about this bot, see Bot commands.)
@rgommers we're very interested in this work for stuff we're writing for fastai v2 - do you have a sense of how far this is from being merged and available in nightlies?
This branch is not merged yet, so it's not available in nightlies.
@jph00 moving this into C++ is getting there; it's taken a little longer than expected because touching the central signature/argument parsing and codegen is tricky. But I think we have something that works now for all functions that go through the ... Timeline-wise, I hope to update this PR next week and have it merged within 3-4 weeks.
That sounds very interesting. I'd love to make sure that what we do covers your needs for fastai v2 and is in time. I'll comment on gh-22402 in more detail.
Continued in gh-27064, which is ready for review/testing. So closing this PR. |
This is still a draft: the Python implementation is complete, but the C++ part still needs to be added.
This mechanism allows `Tensor`-like objects (including `Tensor` subclasses) to override `torch` functions with their own implementations. Closes gh-24015 (see the description of that issue for more details).
For a toy example, see the `DiagonalTensor` class in `test/test_overrides.py`. The `__torch_function__` method and `implements` decorator there are what a package with a `Tensor`-like class should implement. It can then override a PyTorch function with its own function decorated with `@implements(torch.<funcname>)`.

Performance of the current Python implementation of the override mechanism is O(1-2 µs) overhead for regular use (a small number of input parameters to the function). Benchmark results:
As discussed in the Performance considerations section of gh-22402, the goal is that once the dispatch mechanism is moved to C++, the overhead when inputs are `Tensor` instances is zero, and the overhead for `Tensor`-like objects sub-microsecond.

This feature is inspired by and analogous to NumPy's `__array_function__` protocol (see NumPy Enhancement Proposal 18).

This PR currently contains:
- a `Tensor.__torch_function__` method
- `__torch_function__` behavior for `unique`, `tensordot`, `lu`, and `broadcast_tensors` (no complete coverage of even `torch.functional`, but enough to get a first idea)

Missing from the PR:
- `torch_function_dispatch` decorator to make a single function overridable
- `torch` and `torch.<public_submodule>` namespaces
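The `DiagonalTensor`/`implements` usage pattern described above can be sketched as follows. The dispatch here is hand-rolled with a simplified `__torch_function__` signature, and `mean` stands in for a `torch` function; only the shape of the pattern is taken from the PR, not the code:

```python
# Registry mapping overridable functions to this type's implementations.
HANDLED_FUNCTIONS = {}

def implements(torch_function):
    """Register an override, mirroring the @implements(torch.<funcname>)
    decorator pattern from test/test_overrides.py."""
    def decorator(func):
        HANDLED_FUNCTIONS[torch_function] = func
        return func
    return decorator

def mean(x):
    # Stand-in for torch.mean, with a manual __torch_function__ check.
    if hasattr(type(x), '__torch_function__'):
        return type(x).__torch_function__(mean, (x,), {})
    return sum(x) / len(x)

class DiagonalTensor:
    """Toy n x n diagonal matrix with a constant diagonal value, stored
    as two scalars instead of a full tensor."""
    def __init__(self, n, value):
        self._n = n
        self._value = value

    @classmethod
    def __torch_function__(cls, func, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(mean)
def diagonal_mean(x):
    # Only the n diagonal entries are nonzero out of n*n total.
    return x._value * x._n / (x._n * x._n)

print(mean([1.0, 2.0, 3.0]))         # 2.0
print(mean(DiagonalTensor(5, 2.0)))  # 0.4
```

Returning `NotImplemented` for unhandled functions lets the dispatcher fall through to other overloaded arguments (or raise a sensible error), which is the same convention NumPy's `__array_function__` uses.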