
Conversation

@ezyang ezyang commented Jun 13, 2018

The basic structure of the patch:

  • All kernels in aten/src/THS got rewritten as native
    functions in aten/src/ATen/native/sparse

    I took the liberty of renaming some of the kernels,
    opting for longer, more transparent names than
    things like 'spaddcmul'.

  • Instead of holding the fields of a sparse tensor in the TH
    C struct THSTensor, they are now held in a C++ class
    SparseTensorImpl (this explains why I had to do this
    all in one go; I can't have two reps for sparse
    tensors!)

    Along the way, we change a key internal representation
    invariant: an "empty" sparse tensor has dimI == 1 and
    dimV == 0 (different from the dimI == 0 and dimV == 0
    we had before); this ensures that we maintain the invariant
    that dim == dimI + dimV (see the sketch after this list).
    "Scalar" sparse tensors are made illegal, because there
    really is no way to properly express them in COO format.

  • Because we haven't ported THCS or any of the traditional
    dense TH implementations, there is a new set of adapter
    functions in native/LegacyBridge.cpp devoted exclusively
    to deciding whether to dispatch to the new native
    implementation or fall back to the legacy TH binding
    (prefixed with th_). The intent is that once everything
    gets ported, we can delete this file.

  • I've kept the stubs for all the THS functions, but they now all
    error if you try to actually call them. Eventually, we should
    replace these with calls to ATen so that everything keeps
    working.

  • I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.
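
To make the new invariant concrete, here is a minimal sketch. It assumes
the sparse metadata accessors of this era (_dimI, _dimV, which were later
renamed); the checking function itself is hypothetical:

```cpp
#include <cassert>
#include <ATen/ATen.h>

// Hypothetical invariant check for the new SparseTensorImpl layout.
void check_sparse_invariants(const at::Tensor& t) {
  assert(t.type().is_sparse());
  // The key invariant this patch enforces:
  assert(t.dim() == t._dimI() + t._dimV());
  // An "empty" sparse tensor now reports dimI == 1 and dimV == 0
  // (rather than the old dimI == 0, dimV == 0), so even the empty
  // case satisfies dim == dimI + dimV, with dim() == 1.
}
```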

There are some miscellaneous improvements which were needed for other
changes in this patch:

  • There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
    it says on the tin.

  • The templated axpy function moved to BlasUtils.h, and there's
    a new macro which lets you easily forward to all of the TH
    functions. We also expose THBlas_(copy). I'm not terribly
    pleased with these functions, but they serve a purpose for now.

  • New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

  • accessor() is now this-const, since const-correctness on Tensor is a lie

  • New toSparse()/toDense() methods on Type; now you can call these
    directly without having to manually apply at::toSparse/toDense
    to the Backend and then call toBackend yourself (see the
    sketch after this list).
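
As an illustration of the Type convenience, a minimal before/after sketch
(the helper function names are made up):

```cpp
#include <ATen/ATen.h>

// Before: convert the Backend by hand, then round-trip through toBackend.
at::Type& sparse_type_old(const at::Tensor& t) {
  return t.type().toBackend(at::toSparse(t.type().backend()));
}

// After: ask the Type directly.
at::Type& sparse_type_new(const at::Tensor& t) {
  return t.type().toSparse();
}
```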

Changes to the kernels:

  • Previously, the whole body of every kernel was compiled for
    every supported scalar type. In the new implementation,
    the scalar dispatch has been pushed down to the smallest
    extent of code which (1) is not inside a loop and (2)
    requires statically knowing the scalar type. These sites
    all use AT_DISPATCH_ALL_TYPES (see the sketch after this
    list). I tried to use lambdas as much as possible, but
    sometimes this was not possible when an OpenMP pragma
    was involved.

  • Anywhere we tested whether the nDimension of a tensor was zero,
    we now test whether numel is zero instead. Because, as we
    know, the nDimension of zero-size tensors in TH is zero, and
    that's wrong wrong wrong (and not done this way in ATen).
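
A minimal sketch of the dispatch pattern; the kernel body (scaling a
values tensor in place) is a made-up stand-in for the real kernels:

```cpp
#include <ATen/ATen.h>

void scale_values_inplace(at::Tensor& values, double factor) {
  // The type-agnostic skeleton stays outside; only the scalar-typed
  // inner work sits under the dispatch macro.
  AT_DISPATCH_ALL_TYPES(values.type(), "scale_values_inplace", [&] {
    scalar_t* data = values.data<scalar_t>();
    auto s = static_cast<scalar_t>(factor);
    for (int64_t i = 0; i < values.numel(); i++) {
      data[i] *= s;
    }
  });
}
```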

Some subtleties:

  • Places where fastget1d was previously used now use a
    TensorAccessor. However, you have to be careful about grabbing
    the accessor, because sometimes you will be accessor'ing
    indices/values when they are empty, which means they will
    be 1D ("oh, aren't indices always 2D?" Nope. Nyet.)
    So, essentially, it is only safe to grab an accessor after
    you have checked that nnz != 0 (see the first sketch after
    this list). All of these shenanigans will go away when we
    properly support zero-size dimensions.

    In a few places, we test for this case just by wrapping the
    loop in a conditional on nnz. In some other places this is not
    so easy, so we instead short-circuit the function with a special
    case for nnz == 0 (usually, these implementations are degenerate).

  • There is a very subtle but important difference between
    _sparse_get_impl(self)->indices() and self._indices():
    the latter may return a view! This is because nnz is
    not guaranteed to match the dimensions of indices/values;
    you can "truncate" a sparse tensor by setting the nnz.
    Actually, I think this is not a good idea and we should
    enforce a stronger invariant, but for this patch I slavishly
    adhere to the old ways; as such, I have to be very careful:
    if I want to resize something, I had better use the former
    and not the latter.

  • I had to reimplement broadcasting by hand (thus the s_
    and non-s_ functions in the sparse native files; see the
    second sketch after this list). There is a very important
    distinction between foo_out and foo_, so it is important
    that a LegacyBridge function always call down to the lower
    layer, and not try to avoid boilerplate by calling another
    LegacyBridge function first. I did NOT put broadcasting in
    LegacyBridge (even though, ultimately, that's where it must
    live), because the th_ functions which are invoked from
    LegacyBridge handle broadcasting themselves, and I don't
    want to broadcast twice.

  • Sparse functions MUST explicitly specify the Type they
    dispatch from, otherwise Variable wrapping/unwrapping will
    not work correctly. If you use _get_sparse_impl, that is
    sufficient to impose this requirement.

  • The "has native" tests in LegacyBridge.cpp are not 100%,
    because some of the functions are mixed dense-sparse functions,
    and so you can't just say, "Oh, if it's sparse and CPU, call
    the native sparse implementation." This is handled on a
    case by case basis. There is some especially complex
    logic for add(), which has dense-dense, sparse-sparse
    and dense-sparse implementations.

  • I added some uses of SparseTensorRef in native_functions.yaml,
    but you will notice that these are all on native_* functions,
    and not the actual, top-level functions. So the SparseTensorRef
    is purely documentary (helping you not call the wrong overload),
    but there is no magic; we do the wrapping ourselves the hard
    way. (This is in contrast to the TH binding code, which is
    magical.) Except for _sparse_mask; _sparse_mask is magical.

  • There is a raw_copy_sparse_ method, which is really my way of
    getting around the fact that copy_ has never been implemented
    for sparse tensors (even before this patch). There IS, however,
    a super secret, internal way of doing these copies that the THS
    code used, and which I needed to get my hands on for this port.
    We should refactor so that either (a) copy_ supports
    sparse-sparse copy natively, or (b) we do this some other way.

  • Irritatingly, I must explicitly resize_as_ before copy_-ing into
    a tensor. This was not the case with THTensor_(copy), but I don't
    have any direct binding that lacks this requirement.

  • For some reason, the sparse tensor constructor accepts a scalar
    tensor for the values tensor. This is kind of weird, because you
    always need an nnz dimension. However, the old code supported
    this and just expanded it into a 1D size-0 tensor, so we need
    some explicit code to do the same.
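
First, a sketch of the accessor subtlety above: only grab the accessor
once nnz is known to be nonzero (the surrounding function is hypothetical):

```cpp
#include <ATen/ATen.h>

void walk_rows(const at::Tensor& self) {
  int64_t nnz = self._nnz();
  if (nnz > 0) {  // accessor<int64_t, 2> on empty (1D) indices would fail
    auto indices = self._indices();
    auto idx = indices.accessor<int64_t, 2>();  // dimI x nnz
    for (int64_t i = 0; i < nnz; i++) {
      int64_t row = idx[0][i];  // what fastget1d used to fetch
      (void)row;  // ... per-entry work goes here ...
    }
  }
}
```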
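
Second, the hand-rolled broadcasting layer: the public entry point expands
its arguments exactly once and then calls the s_ ("already broadcast")
worker. The names add_example and s_native_add are illustrative, not the
patch's exact identifiers:

```cpp
#include <tuple>
#include <ATen/ATen.h>
#include <ATen/ExpandUtils.h>

// Assumed s_ worker that expects pre-broadcast arguments.
at::Tensor s_native_add(const at::Tensor& a, const at::Tensor& b);

at::Tensor add_example(const at::Tensor& self, const at::Tensor& other) {
  at::Tensor b_self, b_other;
  // Broadcast by hand, exactly once, before hitting the s_ worker.
  std::tie(b_self, b_other) = at::expand_outplace(self, other, "add_example");
  return s_native_add(b_self, b_other);
}
```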
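
Third, the shape of the LegacyBridge routing, much simplified; the real
add() logic also has to pick between the mixed dense-sparse overloads case
by case. native_add_sparse and th_add are assumed names:

```cpp
#include <ATen/ATen.h>

at::Tensor native_add_sparse(const at::Tensor&, const at::Tensor&); // assumed
at::Tensor th_add(const at::Tensor&, const at::Tensor&);            // legacy TH binding

at::Tensor add_bridge(const at::Tensor& self, const at::Tensor& other) {
  // "Has native" test: only sparse CPU tensors take the new path.
  if (self.type().is_sparse() && other.type().is_sparse()
      && self.type().backend() == at::Backend::SparseCPU) {
    return native_add_sparse(self, other);
  }
  return th_add(self, other);  // everything else falls back to TH
}
```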

There are maybe a few more AT_ASSERTs in some of the kernels
than is wise. I added them all while debugging and was
loath to remove them.

Signed-off-by: Edward Z. Yang [email protected]

TODO: I claim above that I have "error"ed out all of the old THS functions, but so far this has actually been done for only one function.

@ezyang ezyang requested a review from gchanan June 13, 2018 02:00

ezyang commented Jun 13, 2018

I know why the broadcast tests are failing (the broadcast calls are behind a "dispatch: SparseCPU" entry, so they aren't interpreted as pieces I need to autograd through), but there is some special-casing for this situation in TH which I can't seem to quite replicate when some overloads are native while others are TH. I'll try to work it out tomorrow.


ezyang commented Jun 13, 2018

There are some very uncomfortable hacks in the last set of commits. Please review with care.

ezyang added 16 commits June 15, 2018 07:23
Some other fixes too:

- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)

Signed-off-by: Edward Z. Yang <[email protected]>
ezyang added 5 commits June 15, 2018 07:29

@gchanan gchanan left a comment

I don't think I fully understand what this will look like once everything is ported over, but this looks good to merge if you want.

ezyang added 2 commits June 15, 2018 08:01

gchanan commented Jun 15, 2018

This is great btw :).

@ezyang ezyang merged commit 711e5a6 into pytorch:master Jun 15, 2018