Skip to content

_unique_dim needs refactor #18405

@zasdfgbnm

Description

@zasdfgbnm

I am profiling the training of my model which uses
tensor.unique(dim=0, sorted=True, return_inverse=True, return_counts=True) for a tensor of size ~(15388, 2), with a manually merge of #18391. There is only one call of unique for each forward pass on my model. Timings for unique_dim is really bad:

--------------------------------------------------------------------------------
  autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

        Because the autograd profiler uses the CUDA event API,
        the CUDA time column reports approximately max(cuda_time, cpu_time).
        Please ignore this output if your code does not use CUDA.

----------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                     CPU time        CUDA time            Calls        CPU total       CUDA total
----------------  ---------------  ---------------  ---------------  ---------------  ---------------
_unique_dim         1127002.344us    1126993.000us                1    1127002.344us    1126993.000us
_unique_dim         1125635.745us    1125626.000us                1    1125635.745us    1125626.000us
_unique_dim         1120160.511us    1120151.000us                1    1120160.511us    1120151.000us
_unique_dim         1116257.000us    1116249.000us                1    1116257.000us    1116249.000us
_unique_dim         1099370.667us    1099362.000us                1    1099370.667us    1099362.000us
_unique_dim         1085741.313us    1085733.875us                1    1085741.313us    1085733.875us
_unique_dim         1083343.180us    1083333.750us                1    1083343.180us    1083333.750us
_unique_dim         1083208.457us    1083199.000us                1    1083208.457us    1083199.000us
_unique_dim         1082070.081us    1082061.500us                1    1082070.081us    1082061.500us
_unique_dim         1061865.168us    1061857.500us                1    1061865.168us    1061857.500us
_unique_dim          606158.761us     606154.000us                1     606158.761us     606154.000us
index_select           3095.179us       3106.500us                1       3095.179us       3106.500us
index                  2425.817us       2427.875us                1       2425.817us       2427.875us
sum                    1642.662us       1644.250us                1       1642.662us       1644.250us
EluBackward            1629.843us       1630.000us                1       1629.843us       1630.000us

_unique and _unique_dim actually need some refactor, https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/Unique.cu
Currently these two operators are implemented seperately, but the logic of these operators should be more or less the same, so they should be combined into a single function such that improves in _unqiue (#16145) could also benefit _unique_dim.

See also: #15804

@ptrblck @ngimel If you don't mind, I will work on this. But I'm planning to get #18391 merged as is without any refactor and base the refactor on that issue.

Also: cc @ezyang

Metadata

Metadata

Assignees

No one assigned

    Labels

    module: performanceIssues related to performance, either of kernel code or framework glue

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions