
@xuhdev (Collaborator) commented Oct 1, 2019

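This is a small fix, but the runtime improvement is consistent: every dtype and size in the benchmark below gets faster, by roughly 0.7% to 6.3%.

Benchmark setup: Turbo Boost disabled, Release build, gcc 8.3, RHEL 7.7, Intel(R) Core(TM) i7-8850H. The script times `torch.nn.Threshold(0.1, 20)` on `torch.arange` inputs of two sizes for each dtype: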

```python
import timeit

# Time torch.nn.Threshold(0.1, 20) on torch.arange(n) inputs for each dtype.
# Both (n, t) pairs process the same total number of elements (n * t).
for dtype in ('torch.double', 'torch.float', 'torch.int16', 'torch.int32', 'torch.int64'):
    print(f'dtype={dtype}')
    for n, t in [(70_000, 200000),
                 (700_000, 20000)]:
        print(f'torch.nn.Threshold(0.1, 20)(a), numel() == {n} for {t} times')
        print(timeit.timeit('m(a)', setup=f'import torch; m=torch.nn.Threshold(0.1, 20); a = torch.arange({n}, dtype={dtype})', number=t))
```
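For reference, `torch.nn.Threshold(threshold, value)` is documented to return `x` where `x > threshold` and `value` otherwise. The pure-Python sketch below only illustrates those element-wise semantics on a tiny `arange`-like input; it is not the ATen CPU kernel this PR changes, and the helper name `threshold_reference` is hypothetical.

```python
def threshold_reference(xs, threshold, value):
    """Element-wise semantics of torch.nn.Threshold: keep x if x > threshold, else use value."""
    return [x if x > threshold else value for x in xs]


# Mirrors the benchmark input torch.arange(n), with a tiny n.
a = list(range(5))
print(threshold_reference(a, 0.1, 20))  # [20, 1, 2, 3, 4]
```

With an `arange` input, only the leading `0` is at or below `0.1`, so every other element passes through unchanged.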

Before:

```
dtype=torch.double
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.88117562699972
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.525143070000013
dtype=torch.float
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.673380930000349
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.677610996000112
dtype=torch.int16
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
3.957677209999929
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
1.8512293700005102
dtype=torch.int32
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.624350482999944
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.670380037000541
dtype=torch.int64
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.86375758200029
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.468234717999621
```
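Both configurations process the same total number of elements (70,000 × 200,000 = 700,000 × 20,000 = 1.4 × 10^10), so the totals can be compared directly as throughput. A small sketch that converts the `torch.double` "Before" totals into time per call and per element (numbers copied from the output above; the helper name is hypothetical):

```python
def per_call_and_per_element(total_seconds, numel, calls):
    """Return (microseconds per call, nanoseconds per element) for a timeit total."""
    per_call_s = total_seconds / calls
    return per_call_s * 1e6, per_call_s / numel * 1e9


# "Before" totals for torch.double, keyed by (numel, number of calls).
before_double = {
    (70_000, 200_000): 8.88117562699972,
    (700_000, 20_000): 9.525143070000013,
}

for (numel, calls), total in before_double.items():
    us_call, ns_elem = per_call_and_per_element(total, numel, calls)
    print(f"numel={numel}: {us_call:.1f} us/call, {ns_elem:.3f} ns/element")
```

For `torch.double` this works out to about 0.63 ns per element at `numel() == 70000` and about 0.68 ns per element at `numel() == 700000`.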

After:

```
dtype=torch.double
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.64173036200009
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.456986365000375
dtype=torch.float
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.431988049000211
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.446968590000324
dtype=torch.int16
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
3.743787463999979
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
1.823233144000369
dtype=torch.int32
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.42801834400052
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.4600211680008215
dtype=torch.int64
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.562551314000302
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.37924196699987
```
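The relative runtime reduction for each case is `(before - after) / before`. A short sketch that computes it from the totals copied out of the "Before" and "After" outputs (the dictionary and function names are illustrative, not part of the PR):

```python
# Timings in seconds, copied from the benchmark outputs, keyed by (dtype, numel).
BEFORE = {
    ("double", 70_000): 8.88117562699972,
    ("double", 700_000): 9.525143070000013,
    ("float", 70_000): 5.673380930000349,
    ("float", 700_000): 3.677610996000112,
    ("int16", 70_000): 3.957677209999929,
    ("int16", 700_000): 1.8512293700005102,
    ("int32", 70_000): 5.624350482999944,
    ("int32", 700_000): 3.670380037000541,
    ("int64", 70_000): 8.86375758200029,
    ("int64", 700_000): 9.468234717999621,
}
AFTER = {
    ("double", 70_000): 8.64173036200009,
    ("double", 700_000): 9.456986365000375,
    ("float", 70_000): 5.431988049000211,
    ("float", 700_000): 3.446968590000324,
    ("int16", 70_000): 3.743787463999979,
    ("int16", 700_000): 1.823233144000369,
    ("int32", 70_000): 5.42801834400052,
    ("int32", 700_000): 3.4600211680008215,
    ("int64", 70_000): 8.562551314000302,
    ("int64", 700_000): 9.37924196699987,
}


def relative_reduction(before, after):
    """Fraction of runtime saved: (before - after) / before."""
    return (before - after) / before


reductions = {key: relative_reduction(BEFORE[key], AFTER[key]) for key in BEFORE}

for (dtype, numel), r in sorted(reductions.items()):
    print(f"{dtype:>6} numel={numel:>7}: {r:.1%} less time")

print(f"min {min(reductions.values()):.1%}, max {max(reductions.values()):.1%}")
```

Every case gets faster, with reductions ranging from about 0.7% (`torch.double`, `numel() == 700000`) to about 6.3% (`torch.float`, `numel() == 700000`).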

@xuhdev xuhdev requested review from colesbury and xiaomengy October 1, 2019 19:14
@pytorchbot pytorchbot added module: cpu CPU specific problem (e.g., perf, algorithm) module: operators labels Oct 1, 2019
@soumith soumith requested a review from VitalyFedyunin October 4, 2019 17:27
@facebook-github-bot (Contributor) left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@xuhdev (Collaborator, Author) commented Oct 22, 2019

@pytorchbot merge this please

@pytorchbot pytorchbot added the merge-this-please Was marked for merge with @pytorchbot merge this please label Oct 22, 2019
zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 1, 2019
Pull Request resolved: pytorch/pytorch#27155

Differential Revision: D17790768

Pulled By: VitalyFedyunin

fbshipit-source-id: 3281eaff77ddddd658048c9e73824dd68c548591
@facebook-github-bot (Contributor): @VitalyFedyunin merged this pull request in 8a1f42b.


Labels

- merge-this-please (was marked for merge with @pytorchbot merge this please)
- Merged
- module: cpu (CPU-specific problem, e.g., perf, algorithm)
- open source


6 participants