-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Description
I am profiling the training of my model which uses
tensor.unique(dim=0, sorted=True, return_inverse=True, return_counts=True) for a tensor of size ~(15388, 2), with a manually merge of #18391. There is only one call of unique for each forward pass on my model. Timings for unique_dim is really bad:
--------------------------------------------------------------------------------
autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
top 15 events sorted by cpu_time_total
Because the autograd profiler uses the CUDA event API,
the CUDA time column reports approximately max(cuda_time, cpu_time).
Please ignore this output if your code does not use CUDA.
---------------- --------------- --------------- --------------- --------------- ---------------
Name CPU time CUDA time Calls CPU total CUDA total
---------------- --------------- --------------- --------------- --------------- ---------------
_unique_dim 1127002.344us 1126993.000us 1 1127002.344us 1126993.000us
_unique_dim 1125635.745us 1125626.000us 1 1125635.745us 1125626.000us
_unique_dim 1120160.511us 1120151.000us 1 1120160.511us 1120151.000us
_unique_dim 1116257.000us 1116249.000us 1 1116257.000us 1116249.000us
_unique_dim 1099370.667us 1099362.000us 1 1099370.667us 1099362.000us
_unique_dim 1085741.313us 1085733.875us 1 1085741.313us 1085733.875us
_unique_dim 1083343.180us 1083333.750us 1 1083343.180us 1083333.750us
_unique_dim 1083208.457us 1083199.000us 1 1083208.457us 1083199.000us
_unique_dim 1082070.081us 1082061.500us 1 1082070.081us 1082061.500us
_unique_dim 1061865.168us 1061857.500us 1 1061865.168us 1061857.500us
_unique_dim 606158.761us 606154.000us 1 606158.761us 606154.000us
index_select 3095.179us 3106.500us 1 3095.179us 3106.500us
index 2425.817us 2427.875us 1 2425.817us 2427.875us
sum 1642.662us 1644.250us 1 1642.662us 1644.250us
EluBackward 1629.843us 1630.000us 1 1629.843us 1630.000us
_unique and _unique_dim actually need some refactor, https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/Unique.cu
Currently these two operators are implemented seperately, but the logic of these operators should be more or less the same, so they should be combined into a single function such that improves in _unqiue (#16145) could also benefit _unique_dim.
See also: #15804
@ptrblck @ngimel If you don't mind, I will work on this. But I'm planning to get #18391 merged as is without any refactor and base the refactor on that issue.
Also: cc @ezyang