
Conversation

@ezyang ezyang commented Jun 13, 2018

The basic structure of the patch:

  • All kernels in aten/src/THS got rewritten as native
    functions in aten/src/ATen/native/sparse

    I took the liberty of renaming some of the kernels,
    opting for longer, more transparent names than
    things like 'spaddcmul'.

  • Instead of holding the fields of a sparse tensor in the TH
    C struct THSTensor, they are now held in a C++ class
    SparseTensorImpl (this explains why I had to do this
    all in one go; I can't have two reps for sparse
    tensors!)

    Along the way, we change a key internal representation
    invariant: an "empty" sparse tensor has dimI == 1 and
    dimV == 0 (different from the dimI == 0 and dimV == 0
    we had before); this ensures that we maintain the invariant
    that dim == dimI + dimV (see the sketch after this list).
    "Scalar" sparse tensors are made illegal, because there
    really is no way to properly express them in COO format.

  • Because we haven't ported THCS or any of the traditional
    dense TH implementations, there is a new set of adapter
    functions in native/LegacyBridge.cpp devoted exclusively
    to deciding whether to dispatch to the new native
    implementation or fall back to the legacy TH binding
    (prefixed with th_). The intent is that once everything
    gets ported, we can delete this file.

  • I've kept the stubs for all the THS functions, but they now all
    error if you try to actually call them. Eventually, we should
    replace these with calls to ATen so that everything keeps
    working.

  • I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.
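
To make the new invariant concrete, here is a minimal sketch. It assumes
the sparse metadata accessors of this era (_dimI, _dimV, which were later
renamed); the checking function itself is hypothetical:

```cpp
#include <cassert>
#include <ATen/ATen.h>

// Hypothetical invariant check for the new SparseTensorImpl layout.
void check_sparse_invariants(const at::Tensor& t) {
  assert(t.type().is_sparse());
  // The key invariant this patch enforces:
  assert(t.dim() == t._dimI() + t._dimV());
  // An "empty" sparse tensor now reports dimI == 1 and dimV == 0
  // (rather than the old dimI == 0, dimV == 0), so even the empty
  // case satisfies dim == dimI + dimV, with dim() == 1.
}
```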

There are some miscellaneous improvements which were needed for other
changes in this patch:

  • There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
    it says on the tin.

  • The templated axpy function moved to BlasUtils.h, and there's
    a new macro which lets you easily forward to all of the TH
    functions. We also expose THBlas_(copy). I'm not terribly
    pleased with these functions, but they serve a purpose for now.

  • New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

  • accessor() is now this-const, since const-correctness on Tensor is a lie

  • New toSparse()/toDense() methods on Type; now you can call these
    directly without having to manually apply at::toSparse/toDense
    to the Backend and then call toBackend yourself (see the
    sketch after this list).
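
As an illustration of the Type convenience, a minimal before/after sketch
(the helper function names are made up):

```cpp
#include <ATen/ATen.h>

// Before: convert the Backend by hand, then round-trip through toBackend.
at::Type& sparse_type_old(const at::Tensor& t) {
  return t.type().toBackend(at::toSparse(t.type().backend()));
}

// After: ask the Type directly.
at::Type& sparse_type_new(const at::Tensor& t) {
  return t.type().toSparse();
}
```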

Changes to the kernels:

  • Previously, the whole body of every kernel was compiled for
    every supported scalar type. In the new implementation,
    the scalar dispatch has been pushed down to the smallest
    extent of code which (1) is not inside a loop and (2)
    requires statically knowing the scalar type. These sites
    all use AT_DISPATCH_ALL_TYPES (see the sketch after this
    list). I tried to use lambdas as much as possible, but
    sometimes this was not possible when an OpenMP pragma
    was involved.

  • Anywhere we tested whether the nDimension of a tensor was zero,
    we now test whether numel is zero instead. Because, as we
    know, the nDimension of zero-size tensors in TH is zero, and
    that's wrong wrong wrong (and not done this way in ATen).
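
A minimal sketch of the dispatch pattern; the kernel body (scaling a
values tensor in place) is a made-up stand-in for the real kernels:

```cpp
#include <ATen/ATen.h>

void scale_values_inplace(at::Tensor& values, double factor) {
  // The type-agnostic skeleton stays outside; only the scalar-typed
  // inner work sits under the dispatch macro.
  AT_DISPATCH_ALL_TYPES(values.type(), "scale_values_inplace", [&] {
    scalar_t* data = values.data<scalar_t>();
    auto s = static_cast<scalar_t>(factor);
    for (int64_t i = 0; i < values.numel(); i++) {
      data[i] *= s;
    }
  });
}
```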

Some subtleties:

  • Places where fastget1d was previously used now use a
    TensorAccessor. However, you have to be careful about grabbing
    the accessor, because sometimes you will be accessor'ing
    indices/values when they are empty, which means they will
    be 1D ("oh, aren't indices always 2D?" Nope. Nyet.)
    So, essentially, it is only safe to grab an accessor after
    you have checked that nnz != 0 (see the first sketch after
    this list). All of these shenanigans will go away when we
    properly support zero-size dimensions.

    In a few places, we test for this case just by wrapping the
    loop in a conditional on nnz. In some other places this is not
    so easy, so we instead short-circuit the function with a special
    case for nnz == 0 (usually, these implementations are degenerate).

  • There is a very subtle but important difference between
    _sparse_get_impl(self)->indices() and self._indices():
    the latter may return a view! This is because nnz is
    not guaranteed to match the dimensions of indices/values;
    you can "truncate" a sparse tensor by setting the nnz.
    Actually, I think this is not a good idea and we should
    enforce a stronger invariant, but for this patch I slavishly
    adhere to the old ways; as such, I have to be very careful:
    if I want to resize something, I had better use the former
    and not the latter.

  • I had to reimplement broadcasting by hand (thus the s_
    and non-s_ functions in the sparse native files; see the
    second sketch after this list). There is a very important
    distinction between foo_out and foo_, so it is important
    that a LegacyBridge function always call down to the lower
    layer, and not try to avoid boilerplate by calling another
    LegacyBridge function first. I did NOT put broadcasting in
    LegacyBridge (even though, ultimately, that's where it must
    live), because the th_ functions which are invoked from
    LegacyBridge handle broadcasting themselves, and I don't
    want to broadcast twice.

  • Sparse functions MUST explicitly specify the Type they
    dispatch from, otherwise Variable wrapping/unwrapping will
    not work correctly. If you use _get_sparse_impl, that is
    sufficient to impose this requirement.

  • The "has native" tests in LegacyBridge.cpp are not 100%,
    because some of the functions are mixed dense-sparse functions,
    and so you can't just say, "Oh, if it's sparse and CPU, call
    the native sparse implementation." This is handled on a
    case by case basis. There is some especially complex
    logic for add(), which has dense-dense, sparse-sparse
    and dense-sparse implementations.

  • I added some uses of SparseTensorRef in native_functions.yaml,
    but you will notice that these are all on native_* functions,
    and not the actual, top-level functions. So the SparseTensorRef
    is purely documentary (helping you not call the wrong overload),
    but there is no magic; we do the wrapping ourselves the hard
    way. (This is in contrast to the TH binding code, which is
    magical.) Except for _sparse_mask; _sparse_mask is magical.

  • There is a raw_copy_sparse_ method, which is really my way of
    getting around the fact that copy_ has never been implemented
    for sparse tensors (even before this patch). There IS, however,
    a super secret, internal way of doing these copies that the THS
    code used, and which I needed to get my hands on for this port.
    We should refactor so that either (a) copy_ supports
    sparse-sparse copy natively, or (b) we do this some other way.

  • Irritatingly, I must explicitly resize_as_ before copy_-ing into
    a tensor. This was not the case with THTensor_(copy), but I don't
    have any direct binding that lacks this requirement.

  • For some reason, the sparse tensor constructor accepts a scalar
    tensor for the values tensor. This is kind of weird, because you
    always need an nnz dimension. However, the old code supported
    this and just expanded it into a 1D size-0 tensor, so we need
    some explicit code to do the same.
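
First, a sketch of the accessor subtlety above: only grab the accessor
once nnz is known to be nonzero (the surrounding function is hypothetical):

```cpp
#include <ATen/ATen.h>

void walk_rows(const at::Tensor& self) {
  int64_t nnz = self._nnz();
  if (nnz > 0) {  // accessor<int64_t, 2> on empty (1D) indices would fail
    auto indices = self._indices();
    auto idx = indices.accessor<int64_t, 2>();  // dimI x nnz
    for (int64_t i = 0; i < nnz; i++) {
      int64_t row = idx[0][i];  // what fastget1d used to fetch
      (void)row;  // ... per-entry work goes here ...
    }
  }
}
```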
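
Second, the hand-rolled broadcasting layer: the public entry point expands
its arguments exactly once and then calls the s_ ("already broadcast")
worker. The names add_example and s_native_add are illustrative, not the
patch's exact identifiers:

```cpp
#include <tuple>
#include <ATen/ATen.h>
#include <ATen/ExpandUtils.h>

// Assumed s_ worker that expects pre-broadcast arguments.
at::Tensor s_native_add(const at::Tensor& a, const at::Tensor& b);

at::Tensor add_example(const at::Tensor& self, const at::Tensor& other) {
  at::Tensor b_self, b_other;
  // Broadcast by hand, exactly once, before hitting the s_ worker.
  std::tie(b_self, b_other) = at::expand_outplace(self, other, "add_example");
  return s_native_add(b_self, b_other);
}
```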
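
Third, the shape of the LegacyBridge routing, much simplified; the real
add() logic also has to pick between the mixed dense-sparse overloads case
by case. native_add_sparse and th_add are assumed names:

```cpp
#include <ATen/ATen.h>

at::Tensor native_add_sparse(const at::Tensor&, const at::Tensor&); // assumed
at::Tensor th_add(const at::Tensor&, const at::Tensor&);            // legacy TH binding

at::Tensor add_bridge(const at::Tensor& self, const at::Tensor& other) {
  // "Has native" test: only sparse CPU tensors take the new path.
  if (self.type().is_sparse() && other.type().is_sparse()
      && self.type().backend() == at::Backend::SparseCPU) {
    return native_add_sparse(self, other);
  }
  return th_add(self, other);  // everything else falls back to TH
}
```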

There are maybe a few more AT_ASSERTs in some of the kernels
than is wise. I added them all while debugging and was
loath to remove them.

Signed-off-by: Edward Z. Yang [email protected]

TODO: I claim above that I have "error"ed out all of the old THS functions, but so far this has actually been done for only one function.

@ezyang ezyang requested a review from gchanan June 13, 2018 02:00

ezyang commented Jun 13, 2018

I know why the broadcast tests are failing (the broadcast calls are behind a "dispatch: SparseCPU" entry, so they aren't interpreted as pieces I need to autograd through), but there is some special-casing for this situation in TH which I can't seem to quite replicate when some overloads are native while others are TH. I'll try to work it out tomorrow.


ezyang commented Jun 13, 2018

There are some very uncomfortable hacks in the last set of commits. Please review with care.

ezyang added 16 commits June 15, 2018 07:23
Some other fixes too:

- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)

Signed-off-by: Edward Z. Yang <[email protected]>
ezyang added 5 commits June 15, 2018 07:29

@gchanan gchanan left a comment

I don't think I fully understand what this will look like once everything is ported over, but this looks good to merge if you want.

ezyang added 2 commits June 15, 2018 08:01

gchanan commented Jun 15, 2018

This is great btw :).

@ezyang ezyang merged commit 711e5a6 into pytorch:master Jun 15, 2018