@VitalyFedyunin VitalyFedyunin commented Jun 20, 2019

Take advantage of compile-time vectorization and multi-threading.

Before:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

After:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

After with multi-threading:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
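
To reproduce this kind of comparison outside IPython, a minimal sketch along the following lines may help. It assumes the last measurement above was taken with more than one intra-op thread, and it controls that with `torch.set_num_threads`. The helpers `lerp_reference` and `time_lerp` are hypothetical names introduced here for illustration; they are not part of PyTorch or of this change. Absolute timings will depend on the machine and build.

```python
import timeit

import torch


def lerp_reference(start, end, weight):
    # Elementwise linear interpolation written out by hand:
    # start + weight * (end - start).
    return start + weight * (end - start)


def time_lerp(num_threads, n=1_000_000, repeats=100):
    """Return mean seconds per torch.lerp call with `num_threads` intra-op threads."""
    previous = torch.get_num_threads()
    torch.set_num_threads(num_threads)
    try:
        x = torch.randn(n)
        y = torch.randn(n)
        w = 0.7
        torch.lerp(x, y, w)  # warm-up call, excluded from the timing
        total = timeit.timeit(lambda: torch.lerp(x, y, w), number=repeats)
        return total / repeats
    finally:
        # Restore the thread count so later code is unaffected.
        torch.set_num_threads(previous)


if __name__ == "__main__":
    # Sanity check: torch.lerp agrees with the hand-written formula
    # (loose absolute tolerance to allow for float32 rounding).
    x, y = torch.randn(1000), torch.randn(1000)
    assert torch.allclose(torch.lerp(x, y, 0.7), lerp_reference(x, y, 0.7), atol=1e-6)

    for threads in (1, torch.get_num_threads()):
        print(f"{threads} thread(s): {time_lerp(threads) * 1e6:.1f} µs per call")
```

Timing the single-threaded case separately helps attribute how much of the speedup comes from vectorization and how much from threading.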

@pytorchbot pytorchbot added the module: cpu (CPU specific problem, e.g., perf, algorithm) and module: operators labels Jun 20, 2019
@cpuhrsch

@pytorchbot retest this please

@VitalyFedyunin VitalyFedyunin requested a review from cpuhrsch June 21, 2019 15:06

@facebook-github-bot facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang ezyang left a comment

Thanks!

@cpuhrsch cpuhrsch left a comment

For bonus points we'd add this to more benchmark infra.

@facebook-github-bot

@VitalyFedyunin merged this pull request in fe580e8.

iotamudelta pushed a commit to ROCm/pytorch that referenced this pull request Jun 21, 2019
…vectorization. (pytorch#22038)

Summary:
Get benefit from the compile time vectorization and multi-threading.

Before:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After with multi-processing:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: pytorch#22038

Differential Revision: D15941468

Pulled By: VitalyFedyunin

fbshipit-source-id: fa8a5126187df4e6c849452e035b00b22be25739
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 21, 2019
…vectorization. (#22038)