Optimize CPU version performance of the nonzero function. by VitalyFedyunin · Pull Request #15190 · pytorch/pytorch

VitalyFedyunin · 2018-12-13T22:31:48Z

Optimized the CPU version of nonzero. It is now about 2x faster than numpy on average.

Can be further optimized for 1D tensors and boolean tensors.

VitalyFedyunin · 2018-12-13T22:44:42Z

Related to #14848

VitalyFedyunin · 2018-12-14T18:16:55Z

The new version, which uses pointer arithmetic, is faster than numpy in all cases (2x faster on average).

The GPU version still needs to be changed.
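
For context, nonzero returns one coordinate tuple per nonzero element, in row-major order. The sketch below is plain Python, not the TH code from this PR. It assumes a contiguous row-major buffer and shows one way to avoid a division per element: keep a running coordinate and advance it like an odometer while walking the buffer. The names `nonzero_contiguous`, `flat` and `shape` are hypothetical.

```python
def nonzero_contiguous(flat, shape):
    """Return coordinates of the nonzero entries of a row-major buffer.

    Illustrative only: `flat` is a flat sequence holding the elements of an
    array with the given `shape`, stored contiguously in row-major order.
    """
    ndim = len(shape)
    coord = [0] * ndim          # running coordinate of the current element
    out = []
    for value in flat:
        if value != 0:
            out.append(tuple(coord))
        # Advance the coordinate like an odometer: bump the last dimension
        # and carry into earlier dimensions on overflow.
        d = ndim - 1
        while d >= 0:
            coord[d] += 1
            if coord[d] < shape[d]:
                break
            coord[d] = 0
            d -= 1
    return out


if __name__ == "__main__":
    # A 2x3 matrix [[0, 3, 0], [0, 5, 7]] stored row by row.
    print(nonzero_contiguous([0, 3, 0, 0, 5, 7], (2, 3)))
```

Whether the actual change uses this exact scheme is not stated in the thread; the sketch only shows the general idea of replacing index arithmetic with incremental updates.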

facebook-github-bot

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

VitalyFedyunin · 2018-12-19T21:22:03Z

@gchanan bug fixed

gchanan

A test for the non-contiguous case would be a nice addition.

This looks good to go though, feel free to either commit or address the comments then commit.
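
A non-contiguous test along these lines could compare the result on a strided view with the result on a contiguous copy of the same data. The sketch below models that check in plain Python under stated assumptions: `nonzero_strided` is a hypothetical reference helper, and the storage, shape and strides are hand-built stand-ins for a transposed tensor rather than real PyTorch objects.

```python
from itertools import product


def nonzero_strided(storage, shape, strides, offset=0):
    """Reference nonzero for a strided view over a flat storage list."""
    out = []
    # product() yields coordinates in row-major order of the *view*.
    for coord in product(*(range(n) for n in shape)):
        pos = offset + sum(c * s for c, s in zip(coord, strides))
        if storage[pos] != 0:
            out.append(coord)
    return out


def transposed_view_matches_contiguous_copy():
    # 2x3 matrix [[0, 3, 0], [0, 5, 7]] stored row by row.
    storage = [0, 3, 0, 0, 5, 7]
    # Transposed view: shape (3, 2), strides (1, 3), same storage.
    from_view = nonzero_strided(storage, (3, 2), (1, 3))
    # Contiguous copy of the transpose: shape (3, 2), strides (2, 1).
    copy = [storage[j * 3 + i] for i in range(3) for j in range(2)]
    from_copy = nonzero_strided(copy, (3, 2), (2, 1))
    return from_view == from_copy


if __name__ == "__main__":
    print(transposed_view_matches_contiguous_copy())
```

In an actual test suite the same comparison would be made between a transposed or sliced tensor and its contiguous copy.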

Summary: Optimized CPU version of the nonzero. Now 2x faster (in avg.) than numpy. Can be further optimized for 1D tensors and boolean tensors. Pull Request resolved: pytorch#15190 Differential Revision: D13468570 fbshipit-source-id: 31e155c5ef247a8983b4c1c12f25b0aafb315e43

facebook-github-bot

@VitalyFedyunin is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: Optimized CPU version of the nonzero. Now 2x faster (in avg.) than numpy. Can be further optimized for 1D tensors and boolean tensors. Pull Request resolved: pytorch/pytorch#15190 Differential Revision: D13468570 Pulled By: VitalyFedyunin fbshipit-source-id: e55ce54d60626a42d9a10a02e407856458b8055e

Summary: Same as #15190 but compatible with MSVS compiler Pull Request resolved: #15925 Differential Revision: D13623473 Pulled By: VitalyFedyunin fbshipit-source-id: d0db9dbc1a0d8fc9bda08348cb1d3763ae9f8679

botcs · 2019-08-05T11:02:09Z

Hi @gchanan
This still seems to be a problem in 1.1.0 for some applications.
If you only need the number of nonzero elements, it is faster by an order of magnitude to clamp your tensor and sum over it, on both GPU (I used torch.cuda.synchronize for benchmarking) and CPU.

Also, using uint8 as the dtype instead of int64 doubles the computing time, which is really annoying.
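
To make the comparison concrete, here is a plain-Python sketch of the two ways of counting. It is an illustration of the idea, not the benchmark that was run: in PyTorch the first approach would look roughly like `x.nonzero().size(0)` and the second like `x.clamp(max=1).sum()`, which is an assumption about what was measured. The clamp trick is only valid for non-negative integer data; the function names are hypothetical.

```python
def count_via_nonzero(values):
    """Count nonzero entries by first materializing all their indices."""
    indices = [i for i, v in enumerate(values) if v != 0]
    return len(indices)


def count_via_clamp_sum(values):
    """Count nonzero entries by clamping each value to at most 1 and summing.

    Assumes non-negative integers (for example uint8 masks); negative or
    fractional values would give a wrong count.
    """
    return sum(min(v, 1) for v in values)


if __name__ == "__main__":
    data = [0, 2, 0, 255, 1]
    print(count_via_nonzero(data), count_via_clamp_sum(data))
```

The second form never builds the index list, which is why a reduction can be cheaper when only the count is needed.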

VitalyFedyunin · 2019-08-05T16:48:23Z

@botcs are you talking about the CPU or the GPU implementation?

gchanan · 2019-08-05T18:13:52Z

@botcs please file an issue.

botcs · 2019-08-06T22:16:15Z

I have to correct myself: the GPU time does not double. I had other stuck processes running. Sorry...

@gchanan I filed issue #23907.

makslevental · 2020-04-18T20:25:57Z

@VitalyFedyunin how can this be optimized for Boolean tensors?

VitalyFedyunin · 2020-04-22T22:28:06Z

To be honest, the first step here should be migrating from TH to ATen code, as that might help with vectorization.

VitalyFedyunin changed the title ~~Optimize performance of CPU version of the nonzero function.~~ Optimize CPU version performance of the nonzero function. Dec 13, 2018

gchanan reviewed Dec 13, 2018

Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated

facebook-github-bot reviewed Dec 14, 2018

gchanan reviewed Dec 17, 2018

facebook-github-bot reviewed Dec 19, 2018

gchanan approved these changes Jan 8, 2019

VitalyFedyunin force-pushed the perf_nonzero branch from c859e9d to 9cbfe30 Compare January 9, 2019 16:09

facebook-github-bot reviewed Jan 9, 2019

facebook-github-bot closed this in 5838b59 Jan 9, 2019

VitalyFedyunin mentioned this pull request Jan 10, 2019

Optimize CPU version performance of the nonzero function. #15925

Closed

ezyang added the merged label Jun 25, 2019

gchanan mentioned this pull request Jul 29, 2019

torch.nonzero slower than np.nonzero #14848

Closed

botcs mentioned this pull request Aug 6, 2019

count_nonzero #23907

Closed
