Add return_counts to torch.unique #15495
Conversation
torch/functional.py (outdated)
```diff
- def unique(input, sorted=False, return_inverse=False, dim=None):
+ def unique(input, sorted=False, return_inverse=False, return_counts=False, dim=None):
```
Documentation below needs updating
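For reference, a minimal sketch of how the wrapper in torch/functional.py and its docstring might be extended; the docstring wording here is illustrative, not the text merged in this PR:

```python
def unique(input, sorted=False, return_inverse=False, return_counts=False, dim=None):
    r"""Returns the unique elements of the input tensor.

    Arguments:
        input (Tensor): the input tensor
        sorted (bool): whether to sort the unique elements before returning them
        return_inverse (bool): whether to also return the indices mapping each
            element of the input to its position in the unique output
        return_counts (bool): whether to also return the number of occurrences
            of each unique element
        dim (int, optional): the dimension to apply unique over

    Returns:
        (Tensor, Tensor (optional), Tensor (optional)): the unique elements,
        optionally followed by the inverse indices and the counts
    """
    ...
```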
```diff
  Tensor inverse_indices = at::empty({0}, self.options().dtype(kLong));
- if (return_inverse) {
+ Tensor counts = at::empty({0}, self.options().dtype(kLong));
+ if (return_inverse || return_counts) {
```
This looks a bit questionable. You're computing both the inverse and the counts if either of them is requested. But if only one is requested, you're doing wasted work for the other.
Thanks for the comment! Yeah, I thought the same, but decided to keep it this way for code clarity, since counts is computed in basically the same loop as inverse_indices and I wanted to avoid code duplication.
EDIT: I was wrong on this point. I'm using the inverse map for the counts so that the counts correspond to the sorted/unsorted unique tensors. What do you think?
```cpp
if (return_counts) {
  counts.resize_(output.sizes()).fill_(0);
  Tensor val = at::ones({num_inp}, self.options().dtype(kLong));
  counts.scatter_add_(0, inverse_indices.view(-1), val);
```
Hmm, a broadcasting scatter_add_ would have been very useful here :)
Yeah, you mean such that we can just pass a single 1 as the value?
Is there a more efficient version of this code?
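At the Python level, the counting logic in this hunk corresponds to roughly the following sketch; `torch.bincount` is mentioned only as a possible alternative formulation, not as what the PR implements:

```python
import torch

x = torch.tensor([1, 3, 2, 3, 1, 1])
output, inverse_indices = torch.unique(x, sorted=True, return_inverse=True)

# Scatter-add a 1 into the slot of each element's unique value,
# mirroring the ATen code above.
counts = torch.zeros(output.numel(), dtype=torch.long)
counts.scatter_add_(0, inverse_indices.view(-1),
                    torch.ones(x.numel(), dtype=torch.long))
# counts -> tensor([3, 1, 2]) for output -> tensor([1, 2, 3])

# bincount over the inverse indices gives the same result in one call;
# whether it is faster (especially on CUDA) is a separate question.
assert torch.equal(counts, torch.bincount(inverse_indices.view(-1)))
```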
PR looks basically reasonable (even if the CUDA implementation is a little inefficient), but docs must be updated before we can merge this.

pushed mergefix
facebook-github-bot left a comment:
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This appears to be BC-breaking in ONNX, in a way that is not caught by current OSS tests. In internal tests we see a failure; I think the problem is probably that we need to adjust the symbolic definition for unique to handle having three args.

@ezyang Could you give me a hint on adjusting the symbolic definition, or is it being worked on in another PR?

Unfortunately I don't think anyone is taking point on this. The steps to resolve would be to first reproduce the error (probably it should be sufficient to just try exporting unique in the first place), and then adjust the symbolic to handle the new keyword argument. I can take a look (eventually) if you don't have the bandwidth for it.
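A rough sketch of the kind of change described here, assuming the ATen-fallback export path used for ops without a native ONNX equivalent; the exact module, helper, and op names have shifted between releases, so treat this as illustrative only:

```python
from torch.onnx.symbolic_helper import parse_args

@parse_args('v', 'i', 'i', 'i')
def unique(g, input, sorted, return_inverse, return_counts):
    # Forward the new flag and declare three outputs
    # (values, inverse indices, counts).
    return g.op("ATen", input, operator_s="unique", sorted_i=sorted,
                return_inverse_i=return_inverse,
                return_counts_i=return_counts, outputs=3)
```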
@ezyang

Thanks!

@ezyang

I'd suggest merging with master (there are some conflicts) and getting another run of CI.
```cpp
  output.resize_(num_out);

  Tensor counts = at::empty({0}, self.options().dtype(kLong));
  if (return_counts) {
```
You'd need to condition the computation of inverse_indices on return_counts too; otherwise, if return_counts is true and return_inverse is false, this won't work.
Alternatively, you can do thrust::unique_by_key with a counting iterator as the key, and thrust::adjacent_difference to get the counts. That will probably have better worst-case performance when there are many equal elements, which would otherwise cause a lot of atomic collisions in scatter_add_, and you also won't compute inverse_indices that you might not need.
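The Thrust-based approach lives in the CUDA kernel; the run-length idea behind it can be sketched at the Python level like this (illustrative only, not the proposed Thrust implementation):

```python
import torch

x = torch.tensor([3, 1, 2, 1, 3, 3])
sorted_x, _ = torch.sort(x.flatten())

# Mark positions where a new run of equal values begins
# (the role of unique_by_key with a counting iterator as the key).
is_run_start = torch.ones(sorted_x.numel(), dtype=torch.bool)
is_run_start[1:] = sorted_x[1:] != sorted_x[:-1]
run_starts = is_run_start.nonzero().flatten()

unique_vals = sorted_x[run_starts]

# Differences between consecutive run-start positions are the counts
# (the role of adjacent_difference).
ends = torch.cat([run_starts[1:], torch.tensor([sorted_x.numel()])])
counts = ends - run_starts
# unique_vals -> tensor([1, 2, 3]), counts -> tensor([2, 1, 3])
```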
ezyang left a comment:
Requesting changes as per @ngimel's comment
Summary: Fixes: #12598. This PR was originally authored by ptrblck in #15495, but since there was no update for months after the requested changes, I cloned that branch and resolved the code reviews here. Hope everything is good now. In particular, the implementation of counts is changed from ptrblck's original algorithm to the one ngimel suggested, i.e. using `unique_by_key` and `adjacent_difference`. The current implementation of `_unique_dim` is VERY slow for computing the inverse index and counts, see #18405. I will refactor `_unique_dim` in a later PR. For this PR, please allow me to keep the implementation as is. cc: ptrblck ezyang ngimel colesbury Pull Request resolved: #18391 Reviewed By: soumith Differential Revision: D14605905 Pulled By: VitalyFedyunin fbshipit-source-id: 555f5a12a8e28c38b10dfccf1b6bb16c030bfdce
Add `return_counts` argument to `torch.unique`. Original feature request: #12598

The performance is approx. the same as using the `dim` argument here.

In addition to the added tests in `test_torch.py`, I've used these tests to make sure all combinations yield the same result as numpy's implementation.

Sorry again for the delay of this PR, and thanks @t-vi for providing me access to his server and GPU while I was building mine.

CC @ezyang @colesbury

Best,
ptrblck
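The cross-check against numpy mentioned above would look roughly like this (a sketch; the actual tests are the ones added to test_torch.py):

```python
import numpy as np
import torch

x = torch.randint(0, 5, (100,))

t_vals, t_inv, t_counts = torch.unique(
    x, sorted=True, return_inverse=True, return_counts=True)
n_vals, n_inv, n_counts = np.unique(
    x.numpy(), return_inverse=True, return_counts=True)

assert np.array_equal(t_vals.numpy(), n_vals)
assert np.array_equal(t_inv.numpy(), n_inv)
assert np.array_equal(t_counts.numpy(), n_counts)
```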