Skip to content

Optimize CPU version performance of the nonzero function.#15190

Closed
VitalyFedyunin wants to merge 1 commit intopytorch:masterfrom
VitalyFedyunin:perf_nonzero
Closed

Optimize CPU version performance of the nonzero function.#15190
VitalyFedyunin wants to merge 1 commit intopytorch:masterfrom
VitalyFedyunin:perf_nonzero

Conversation

@VitalyFedyunin
Copy link
Copy Markdown
Contributor

@VitalyFedyunin VitalyFedyunin commented Dec 13, 2018

Optimized CPU version of the nonzero. Now 2x faster (in avg.) than numpy.

Can be further optimized for 1D tensors and boolean tensors.

@VitalyFedyunin VitalyFedyunin changed the title Optimize performance of CPU version of the nonzero function. Optimize CPU version performance of the nonzero function. Dec 13, 2018
@VitalyFedyunin
Copy link
Copy Markdown
Contributor Author

Related to #14848

Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
@VitalyFedyunin
Copy link
Copy Markdown
Contributor Author

screen shot 2018-12-14 at 1 01 35 pm

New version with pointers math works faster than numpy in all cases (2x faster in avg.).

Still need to change GPU version.

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
@VitalyFedyunin
Copy link
Copy Markdown
Contributor Author

@gchanan bug fixed

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Copy Markdown
Contributor

@gchanan gchanan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test for the non-contiguous case would be a nice addition.

This looks good to go though, feel free to either commit or address the comments then commit.

Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
Comment thread aten/src/TH/generic/THTensorEvenMoreMath.cpp Outdated
)

Summary:
Optimized CPU version of the nonzero. Now 2x faster (in avg.) than numpy.

Can be further optimized for 1D tensors and boolean tensors.
Pull Request resolved: pytorch#15190

Differential Revision: D13468570

fbshipit-source-id: 31e155c5ef247a8983b4c1c12f25b0aafb315e43
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VitalyFedyunin is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jan 9, 2019
Summary:
Optimized CPU version of the nonzero. Now 2x faster (in avg.) than numpy.

Can be further optimized for 1D tensors and boolean tensors.
Pull Request resolved: pytorch/pytorch#15190

Differential Revision: D13468570

Pulled By: VitalyFedyunin

fbshipit-source-id: e55ce54d60626a42d9a10a02e407856458b8055e
facebook-github-bot pushed a commit that referenced this pull request Jan 11, 2019
Summary:
Same as #15190 but compatible with MSVS compiler
Pull Request resolved: #15925

Differential Revision: D13623473

Pulled By: VitalyFedyunin

fbshipit-source-id: d0db9dbc1a0d8fc9bda08348cb1d3763ae9f8679
@ezyang ezyang added the merged label Jun 25, 2019
@botcs
Copy link
Copy Markdown

botcs commented Aug 5, 2019

Hi @gchanan
This still seems to be a problem in 1.1.0 for some applications.
If you want to have only the sum of nonzero elements it is faster by an order of magnitude to clamp your tensor and do a sum over it. Both on GPU (I used torch.cuda.synchronize for benchmarking) and CPU.

also using uint8 as dtype instead of int64 will double the computing time, that is really annoying

@VitalyFedyunin
Copy link
Copy Markdown
Contributor Author

@botcs are you talking about CPU or GPU implementations

@gchanan
Copy link
Copy Markdown
Contributor

gchanan commented Aug 5, 2019

@botcs please file an issue.

@botcs botcs mentioned this pull request Aug 6, 2019
@botcs
Copy link
Copy Markdown

botcs commented Aug 6, 2019

I have to correct myself, the GPU time does not double, I had other processes stuck in. Sorry...

@gchanan filed an issue #23907

@makslevental
Copy link
Copy Markdown
Contributor

@VitalyFedyunin how can this be optimized for Boolean tensors?

@VitalyFedyunin
Copy link
Copy Markdown
Contributor Author

To be honest first step here should be migrating from TH to Aten code, as it might help with vectorization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants