Polymorphic sparse_mask and sparse_copy #1471
Conversation
This is ready for review!
apaszke left a comment:
I'm not sure what this bought you: you have as many _sparse_select functions defined as _sparse_masks you'd need, and now we have cyclic dependencies between TH(C) and TH(C)S.
Sorry, I wasn't clear about how the savings work. Recall that the type of
Now, the problem is that I want to be able to call sparseMask with a Float dense tensor but a, say, Long sparse mask. Intuitively, there's no problem with this sort of operation, since at the end of the day we only care about the indices of the sparse tensor. However, this means that if I do it with macros expanding over both element types, I end up with O(n^2) copies of sparseMask.
In this patch, I get a polymorphic sparseMask with only O(n) functions, since they accept the index tensor (of which there is only one type). That's the benefit.
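The counting argument above can be sketched in a few lines of Python (hypothetical names, not the actual TH C code): because the mask contributes only an integral index tensor, the kernel needs to be specialized per dense element type only, not per (dense, mask) pair.

```python
# Hypothetical sketch of the polymorphism argument. The mask's *value*
# dtype never matters; only its integer index tensor does. Writing the
# kernel against the index tensor alone means one function per dense
# dtype (O(n)) instead of one per (dense dtype, mask dtype) pair (O(n^2)).

def sparse_select(dense, indices):
    """Gather dense[i] for each index i; `indices` is always integral,
    so the same function works for any dense element type."""
    return [dense[i] for i in indices]

float_dense = [1.5, 2.5, 3.5, 4.5]
mask_indices = [0, 2]  # indices taken from a Long-valued sparse mask
print(sparse_select(float_dense, mask_indices))  # [1.5, 3.5]
```

In the C macro world, the same effect is obtained by fixing the index argument to a single concrete type (e.g. a Long index tensor) while generating one instantiation per real element type.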
OK, nailed all of the review comments that I agreed with. (I also accidentally slipped in my ref counting docs patch, oops.)
Signed-off-by: Edward Z. Yang <[email protected]>
Fixes #1462. Signed-off-by: Edward Z. Yang <[email protected]>
I think that this PR is mostly okay functionally, but the whole sparseMask/sparseSelect/sparseCopy interface is the wrong level of abstraction, as these are not logically sensible operations (they don't represent any mathematical operation). The fact that you have to add the polymorphism is a sign of that (since you should be working directly on the indices).
Here's an alternative THS API that can achieve the same thing. I think it's simpler and more general:
sparse.linear_indices()  # returns a 1D tensor with the linearized indices, same as THSTensor_(flattenedIndices)
sparse.nelemI()  # product of the nDimI dimensions
sparse.nelemV()  # product of the nDimV dimensions
Now you can write sparseSelect and sparseCopy as:
dense.view(sparse.nelemI(), sparse.nelemV()).index_select(sparse.linear_indices())  # sparse_select
dense.view(sparse.nelemI(), sparse.nelemV()).index_copy(sparse.linear_indices(), sparse.values())  # sparse_copy
You could also do an index_add variant etc. as a Python one-liner (which means you could reduce the THCSTensor_(spadd) function to just these function calls).
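To make the proposal concrete, here is a pure-Python model of the two one-liners (names are from this thread; the list-of-rows representation stands in for a tensor viewed as (nelemI, nelemV)): sparse_select is a row gather at the linearized indices, and sparse_copy is a row scatter.

```python
# Pure-Python model of the proposed API. `dense_rows` plays the role of
# dense.view(nelemI, nelemV); `linear_indices` plays the role of
# sparse.linear_indices().

def sparse_select(dense_rows, linear_indices):
    # dense.view(nelemI, nelemV).index_select(0, linear_indices)
    return [dense_rows[i] for i in linear_indices]

def sparse_copy(dense_rows, linear_indices, values):
    # dense.view(nelemI, nelemV).index_copy_(0, linear_indices, values),
    # modeled non-destructively: copy, then overwrite the selected rows.
    out = [row[:] for row in dense_rows]
    for i, v in zip(linear_indices, values):
        out[i] = v[:]
    return out

rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # nelemI = 3, nelemV = 2
idx = [2, 0]                                  # linearized sparse indices
print(sparse_select(rows, idx))               # [[2.0, 2.0], [0.0, 0.0]]
print(sparse_copy(rows, idx, [[9.0, 9.0], [8.0, 8.0]]))
# [[8.0, 8.0], [1.0, 1.0], [9.0, 9.0]]
```

The point of the design is that once the indices are linearized, both operations reduce to stock index_select/index_copy, so no sparse-specific polymorphism is needed.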
long *stride;
int nDimension;
// Note: storage->size may be greater than the recorded size
for (long d = 0; d < mask->nDimensionI; d++) {
  THCudaLongTensor_mul(state, indices, indices, mask->size[d]);
  THCudaLongTensor_select(state, indicesBuffer, maskIndices, 0, d);
for (long d = 0; d < nDimI; d++) {
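The loop in that diff is the index-flattening step. A pure-Python model of its assumed semantics: fold each dimension's indices into a single linear index Horner-style, so that linear = ((i0 * s1 + i1) * s2 + i2) * ... for a row-major layout.

```python
# Pure-Python model of index linearization (assumed semantics of the
# CUDA loop above): per-dimension index rows are folded into one linear
# index per nonzero, Horner-style.

def flatten_indices(indices, sizes):
    """indices: one row of per-nonzero indices per dimension;
    sizes: the size of each of those dimensions."""
    nnz = len(indices[0])
    linear = [0] * nnz
    for d, size in enumerate(sizes):
        for k in range(nnz):
            linear[k] = linear[k] * size + indices[d][k]
    return linear

# two nonzeros at (0,1) and (2,3) in a 4x5 tensor:
# 0*5 + 1 = 1 and 2*5 + 3 = 13
print(flatten_indices([[0, 2], [1, 3]], [4, 5]))  # [1, 13]
```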
I'm cool with rewriting all of the interfaces per Adam's plan (it sounds good to me). I'd like to hear what other people think, though! CC @martinraison @apaszke @soumith @ebetica. We can of course bikeshed the names.
I like @adamlerer's suggestion! We may still want to provide higher-level abstractions on top of it. I'm not sure what the best names would be; I'm just giving some additional suggestions.
This PR will be obsoleted.
Quick update from the implementation world. Here is a more correct definition of sparse_mask, using the numelI/numelV nomenclature. Notice that to get the correct dimensionality of v, we want to preserve the sizes of the V dimensions; I think there should be a more convenient way to refer to this.
I guess I should also add new() support, so I don't have to use ... CC @adamlerer and @martinraison
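To illustrate the point about V dimensions (a hedged sketch in pure Python, with hypothetical helper names; the nested lists stand in for tensors): the values gathered by sparse_mask should come out shaped (nnz, *v_sizes), not flattened to (nnz, numelV).

```python
# Hypothetical sketch: gather a numelV-sized block per masked index,
# then reshape each block back to the original V-dimension sizes, so the
# values come out shaped (nnz, *v_sizes).

def reshape(flat, sizes):
    """Recursively nest a flat list into the given sizes."""
    if len(sizes) == 1:
        return list(flat)
    step = len(flat) // sizes[0]
    return [reshape(flat[i * step:(i + 1) * step], sizes[1:])
            for i in range(sizes[0])]

def sparse_mask_values(dense_flat, v_sizes, flat_indices):
    """dense_flat: dense tensor flattened to numelI * numelV entries."""
    numel_v = 1
    for s in v_sizes:
        numel_v *= s
    return [reshape(dense_flat[i * numel_v:(i + 1) * numel_v], v_sizes)
            for i in flat_indices]

dense_flat = list(range(12))  # numelI = 3, v_sizes = (2, 2)
print(sparse_mask_values(dense_flat, (2, 2), [2, 0]))
# [[[8, 9], [10, 11]], [[0, 1], [2, 3]]]
```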
Fixes #1462