Conversation

ezyang (Contributor) commented May 4, 2017

Fixes #1462

ezyang changed the title from "[WIP] sparse_select" to "Polymorphic sparse_mask and sparse_copy" on May 8, 2017
ezyang (Contributor, Author) commented May 8, 2017

This is ready for review!

apaszke (Contributor) left a comment

I'm not sure what this buys you: you end up with as many _sparse_select functions defined as the _sparse_mask variants you'd need, and now we have cyclic dependencies between TH(C) and TH(C)S.

ezyang (Contributor, Author) commented May 9, 2017

I'm not sure what this buys you: you end up with as many _sparse_select functions defined as the _sparse_mask variants you'd need

Sorry, I wasn't clear about how the savings work.

Recall that the type of sparse_mask is: THSTensor sparseMask(THTensor dense, THSTensor mask). Via our macro magic, this means I get one implementation of sparseMask per tensor type, O(n) in total; e.g., I have:

  • THSFloatTensor sparseMask(THFloatTensor dense, THSFloatTensor mask)
  • THSLongTensor sparseMask(THLongTensor dense, THSLongTensor mask)
  • etc

Now, the problem is that I want to be able to call sparseMask with a Float dense tensor but, say, a Long sparse mask. Intuitively, there's no problem with this sort of operation, since at the end of the day we only care about the indices of the sparse tensor. However, if I do this with macros, I end up with O(n^2) copies of sparseMask:

  • THSFloatTensor sparseMask(THFloatTensor dense, THSFloatTensor mask)
  • THSFloatTensor sparseMask(THFloatTensor dense, THSLongTensor mask)
  • THSFloatTensor sparseMask(THFloatTensor dense, ... mask)
  • THSLongTensor sparseMask(THLongTensor dense, THSFloatTensor mask)
  • THSLongTensor sparseMask(THLongTensor dense, THSLongTensor mask)
  • THSLongTensor sparseMask(THLongTensor dense, ... mask)
  • etc

In this patch, I get a polymorphic sparseMask with only O(n) functions, since they accept the index tensor directly (of which there is only one type). That's the benefit.
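(Concretely: with on the order of eight TH scalar types, that works out to roughly eight instantiations instead of sixty-four. The exact counts aren't spelled out in this thread, so treat these figures as illustrative.)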

ezyang (Contributor, Author) commented May 9, 2017

OK, nailed all of the review comments that I agreed with. (I also accidentally slipped in my ref-counting docs patch, oops.)

ezyang added 18 commits May 15, 2017 10:31
adamlerer (Contributor) left a comment

I think this PR is mostly okay functionally, but the whole sparseMask/sparseSelect/sparseCopy interface is the wrong level of abstraction, as these are not logically sensible operations (they don't represent any mathematical operation). The fact that you have to add the polymorphism is a sign of that (since you should be working directly on the indices).

Here's an alternative THS API that can achieve the same thing... I think it's simpler and more general.

sparse.linear_indices()  # returns a 1D tensor with the linearized indices, same as THSTensor_(flattenedIndices)
sparse.nelemI()  # product of nDimI dimensions
sparse.nelemV()  # product of nDimV dimensions

Now you can write sparseSelect and sparseCopy as:

dense.view(sparse.nelemI(), sparse.nelemV()).index_select(0, sparse.linear_indices())  # sparse_select
dense.view(sparse.nelemI(), sparse.nelemV()).index_copy(0, sparse.linear_indices(), sparse.values())  # sparse_copy

You could also do the index_add variant etc. with a Python one-liner (which means you could reduce the THCSTensor_(spadd) function to just use these function calls).
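For concreteness, here is a hedged sketch of that index_add variant under the same proposed API (linear_indices/nelemI/nelemV are the hypothetical methods above; index_add_ is the existing in-place Tensor method, and the values are flattened to match the two-dimensional view of dense):

dense.view(sparse.nelemI(), sparse.nelemV()).index_add_(0, sparse.linear_indices(), sparse.values().view(-1, sparse.nelemV()))  # spadd: dense += sparse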

ezyang (Contributor, Author) commented Jun 7, 2017

I'm cool with rewriting all of the interfaces along the lines of Adam's plan (it sounds good to me). I'd like to hear what other people think, though! CC @martinraison @apaszke @soumith @ebetica

We can of course bikeshed the names.

martinraison (Contributor) commented

I like @adamlerer's suggestion! We may still want to provide higher-level abstractions like sparse_mask, but implemented directly in Python using linear_indices/view/index_select.

I'm not sure what the best names would be. I'm just giving some additional suggestions for nelemI/nelemV: isize/vsize, sizeI/sizeV, inumel/vnumel, numelI/numelV (we just want to make sure these are not confused with nnz, and I think something close to size or numel would fit well with the Tensor API).

ezyang (Contributor, Author) commented Jun 9, 2017

This PR will be obsoleted.

ezyang (Contributor, Author) commented Jun 11, 2017

Quick update from the implementation world. Here is a more correct definition of sparse_mask, using the numelI/numelV nomenclature:

    def _sparse_mask(self, s):
        v = self.view(s.numelI(), *s._values().size()[1:]).index_select(0, s._linear_indices())
        return type(s)(s._indices(), v, s.size())

Notice that to get the correct dimensionality of v, we want to preserve the sizes of the V dimensions. I think there should be a more convenient way to refer to this, and I think sizeV is a good match. Then we have invariants:

  • s.sizeI() + s.sizeV() == s.size()
  • s.numelI() * s.numelV() == s.to_dense().numel() (Arguably we should add numel natively for sparse)
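
As a concrete illustration of those invariants (the sizes here are hypothetical, and sizeI/sizeV/numelI/numelV are the proposed methods, not an existing API): for a sparse tensor s of size (2, 3, 4, 5) with nDimI = 2 and nDimV = 2, we would expect

    s.sizeI()  == (2, 3)      # index dimensions
    s.sizeV()  == (4, 5)      # value dimensions
    s.sizeI() + s.sizeV() == (2, 3, 4, 5) == s.size()
    s.numelI() == 6           # 2 * 3
    s.numelV() == 20          # 4 * 5
    s.numelI() * s.numelV() == 120 == s.to_dense().numel()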

I guess I should also add new() support, so I don't have to use type(s) to hack the constructor.

CC @adamlerer and @martinraison

ezyang deleted the sparse-select branch on September 7, 2017 at 20:23
jjsjann123 added a commit to jjsjann123/pytorch that referenced this pull request Mar 2, 2022
Supports alias on cast inputs. This allows us to handle fp16 running stats (fp16 norm layers).

Arguably we would want to do this in the parser code instead of inside Fusion::aliasOutputToInput. But that means we'll duplicate this code in a few places, which is ugly as well.

TODO:
properly handle segment io alias cpp test in PR pytorch#1471
jjsjann123 added a commit to jjsjann123/pytorch that referenced this pull request Mar 2, 2022
fix pytorch#67610
drop aliased output from pushing to stack

Currently we don't support marking aliased outputs as real outputs; this is tracked in pytorch#1488
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Sep 5, 2024
… sync (pytorch#1455) (pytorch#1471)

* [SWDEV-469514] hipGraphExecDestroy requires an explicit sync

There is a new hip feature where they do not free hipGraph memory
as soon as hipGraphExecDestroy is called. This is to support async
work on the GPU. See this for more details:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-user-objects

We noticed this issue when an allreduce op inside a hipGraph hung.
Essentially, ncclCommAbort was waiting for all GPU activity to finish.
However, since hipGraph memory was technically still in use, we had an
infinite hang. So, I added an extra hipDeviceSynchronize in CUDAGraph's
destructor to ensure that memory is freed and got
test_allreduce_in_cudagraph UT to pass.

However, when I ran this on CUDA machine, I noticed that they did not
require this extra sync in order to successfully run the UT. It seems
that they were calling cudaGraphInstantiateWithFlags with
cudaGraphInstantiateFlagAutoFreeOnLaunch, which aggressively frees
memory after graph launch. There is support for this API in our ROCm
stack, but we were missing cuda to hip mappings in PyTorch. So, I
brought them in and added the necessary conditions to call this API in
HIP case also.

* Update comments

* Use USE_ROCM in keeping with convention

* Use USE_ROCM to match convention

---------

Co-authored-by: Jithun Nair <[email protected]>
(cherry picked from commit e752b4f)