
Conversation

@dskhudia
Contributor

@dskhudia commented Sep 24, 2019

Stack from ghstack:

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

~~TODO: Performance numbers.~~
qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance across thread counts are groupwise convolutions; parallelism for groupwise convolution is not yet supported.
![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)
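
For reference, here is a minimal timing sketch (not part of this PR) of how the effect of the intra-op thread count can be checked from Python. It only reuses calls that appear in the follow-up benchmark quoted later in this thread (torch.set_num_threads, torch.quantize_per_tensor, torch.ops.quantized.linear_prepack, torch.ops.quantized.linear_dynamic); the shapes and iteration count are arbitrary, and torch.set_num_threads is the Python-side counterpart of at::set_num_threads.

```python
# Hypothetical sketch: measure quantized linear_dynamic throughput at several
# intra-op thread counts. Shapes and NITER are arbitrary illustration choices.
import time
import torch

M, K, N = 1024, 1024, 1024
x = torch.rand(M, K)
w = torch.rand(K, N)

# Quantize and prepack the weight once, outside the timing loop.
q_w = torch.quantize_per_tensor(w, 0.01, 0, dtype=torch.qint8)
packed_w = torch.ops.quantized.linear_prepack(q_w, None)

NITER = 20
for num_threads in (1, 2, 4):
    torch.set_num_threads(num_threads)  # Python equivalent of at::set_num_threads
    start = time.time()
    for _ in range(NITER):
        torch.ops.quantized.linear_dynamic(x, packed_w)
    per_iter = (time.time() - start) / NITER
    # 2*M*N*K ops per matmul, divided by seconds per iteration and 1e9 -> GOPS
    print(f"threads={num_threads}: {2.0 * M * N * K / per_iter / 1e9:.2f} GOPS")
```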

Differential Revision: D17540567

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.

Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

[ghstack-poisoned]
@pytorchbot added the module: operators and oncall: quantization (Quantization support in PyTorch) labels Sep 24, 2019
dskhudia added a commit that referenced this pull request Sep 24, 2019
Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.

Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

ghstack-source-id: 90635816
Pull Request resolved: #26692
@dskhudia requested review from dzhulgakov, jianyuh and raghuramank100 and removed the request for jianyuh September 24, 2019 00:07
@jamesr66a
Collaborator

Yeah seems reasonable, but definitely need perf numbers before approval

@dskhudia requested a review from ilia-cher September 24, 2019 00:25
@dskhudia added this to the 1.3 milestone Sep 24, 2019
Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

~~TODO: Performance numbers.~~
qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance across thread counts are groupwise convolutions; parallelism for groupwise convolution is not yet supported.
![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)


Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

[ghstack-poisoned]
dskhudia added a commit that referenced this pull request Sep 24, 2019
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 90712827

Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)
@dskhudia reopened this Sep 25, 2019
Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

~~TODO: Performance numbers.~~
qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance across thread counts are groupwise convolutions; parallelism for groupwise convolution is not yet supported.
![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)


Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

[ghstack-poisoned]
dskhudia added a commit that referenced this pull request Sep 25, 2019
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 90776466

Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)
@dskhudia removed the merged label Sep 25, 2019
Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

~~TODO: Performance numbers.~~
qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance across thread counts are groupwise convolutions; parallelism for groupwise convolution is not yet supported.
![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)


Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

[ghstack-poisoned]
dskhudia added a commit that referenced this pull request Sep 28, 2019
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 90976812

Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)
Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

~~TODO: Performance numbers.~~
qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance across thread counts are groupwise convolutions; parallelism for groupwise convolution is not yet supported.
![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)


Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

[ghstack-poisoned]
Member

@jianyuh left a comment

LGTM!

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

~~TODO: Performance numbers.~~
qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance across thread counts are groupwise convolutions; parallelism for groupwise convolution is not yet supported.
![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)


Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

[ghstack-poisoned]
dskhudia added a commit that referenced this pull request Sep 30, 2019
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91050435

Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)
Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

~~TODO: Performance numbers.~~
qconv performance (I had to set the thread count manually with at::set_num_threads(4)). The op indices that show the same performance across thread counts are groupwise convolutions; parallelism for groupwise convolution is not yet supported.
![Resnext101-32x4d shapes](https://user-images.githubusercontent.com/37562707/65555703-2ff3e900-dee2-11e9-8e27-d61d05ebf9b9.png)


Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)

[ghstack-poisoned]
dskhudia added a commit that referenced this pull request Oct 1, 2019
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Differential Revision: [D17540567](https://our.internmc.facebook.com/intern/diff/D17540567/)
@facebook-github-bot
Contributor

This pull request has been merged in 3eefc54.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 2, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
jamesr66a pushed a commit that referenced this pull request Oct 3, 2019
Summary:
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
jamesr66a pushed a commit that referenced this pull request Oct 3, 2019
Summary:
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
jamesr66a pushed a commit that referenced this pull request Oct 3, 2019
Summary:
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
jamesr66a pushed a commit that referenced this pull request Oct 4, 2019
Summary:
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
jamesr66a pushed a commit that referenced this pull request Oct 4, 2019
Summary:
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
soumith pushed a commit that referenced this pull request Oct 7, 2019
Summary:
Pull Request resolved: #26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
jianyuh added a commit that referenced this pull request Oct 22, 2019
Similar to #26692, we would like to enable the intra-op parallelism for dynamic Linear op.

Differential Revision: [D18074757](https://our.internmc.facebook.com/intern/diff/D18074757/)

[ghstack-poisoned]
jianyuh added a commit that referenced this pull request Oct 22, 2019
… operator"

Similar to #26692, we would like to enable the intra-op parallelism for dynamic Linear op.

Differential Revision: [D18074757](https://our.internmc.facebook.com/intern/diff/D18074757/)

[ghstack-poisoned]
jianyuh added a commit that referenced this pull request Oct 22, 2019
Pull Request resolved: #28477

Similar to #26692, we would like to enable the intra-op parallelism for dynamic Linear op.
ghstack-source-id: 92419573

Differential Revision: [D18074757](https://our.internmc.facebook.com/intern/diff/D18074757/)
facebook-github-bot pushed a commit that referenced this pull request Oct 28, 2019
…28477)

Summary:
Pull Request resolved: #28477

Similar to #26692, we would like to enable the intra-op parallelism for dynamic Linear op.
ghstack-source-id: 92419573

Test Plan:
CI

Test Benchmark:
```
import time
import torch

K, N = 1024, 1024

print('M', 'nthread=1', 'nthread=2', 'nthread=4', 'nthread=8', 'nthread=16', sep=', ')

for M in range(512, 2049, 512):
    print(M, sep=',', end=', ')
    for num_threads in (1, 2, 4, 8, 16,):

        torch.set_num_threads(num_threads)

        x = torch.rand(M, K)
        w = torch.rand(K, N)

        NITER = 20

        # Test dynamic quantized
        q_w = torch.quantize_per_tensor(w, 0.01, 0, dtype=torch.qint8)
        packed_w = torch.ops.quantized.linear_prepack(q_w, None)

        s = time.time()
        for i in range(NITER):
            torch.ops.quantized.linear_dynamic(x, packed_w)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER
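        # Effective throughput in GOPS: 2*M*N*K ops per matmul / seconds per iteration / 1e9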

        print("{:0.2f}".format(2.0*M*N*K/elapsed_per_iter_dyn_quant/1E9), end=', ')
    print("\n", end='')
```
Before this Diff:
```
(base) [[email protected] ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 119.28, 139.50, 141.66, 141.58, 141.42,
1024, 122.42, 141.21, 123.09, 141.85, 123.03,
1536, 122.80, 122.18, 141.39, 123.25, 141.35,
2048, 123.41, 141.34, 123.62, 140.55, 123.76,
```

After this Diff:
```
(base) [[email protected] ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 123.29, 271.99, 508.66, 882.83, 1295.07,
1024, 126.05, 273.15, 515.42, 914.11, 877.63,
1536, 142.48, 236.85, 524.10, 481.32, 970.81,
2048, 124.76, 279.03, 433.73, 958.67, 1045.82,
```

Differential Revision: D18074757

fbshipit-source-id: ad5b43477d2187c818c137093c6d6af02d5ca1d5
zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 28, 2019
…#28477)

Summary:
Pull Request resolved: pytorch/pytorch#28477

Similar to pytorch/pytorch#26692, we would like to enable the intra-op parallelism for dynamic Linear op.
ghstack-source-id: 92419573

Test Plan:
CI

Test Benchmark:
```
import time
import torch

K, N = 1024, 1024

print('M', 'nthread=1', 'nthread=2', 'nthread=4', 'nthread=8', 'nthread=16', sep=', ')

for M in range(512, 2049, 512):
    print(M, sep=',', end=', ')
    for num_threads in (1, 2, 4, 8, 16,):

        torch.set_num_threads(num_threads)

        x = torch.rand(M, K)
        w = torch.rand(K, N)

        NITER = 20

        # Test dynamic quantized
        q_w = torch.quantize_per_tensor(w, 0.01, 0, dtype=torch.qint8)
        packed_w = torch.ops.quantized.linear_prepack(q_w, None)

        s = time.time()
        for i in range(NITER):
            torch.ops.quantized.linear_dynamic(x, packed_w)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER
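        # Effective throughput in GOPS: 2*M*N*K ops per matmul / seconds per iteration / 1e9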

        print("{:0.2f}".format(2.0*M*N*K/elapsed_per_iter_dyn_quant/1E9), end=', ')
    print("\n", end='')
```
Before this Diff:
```
(base) [[email protected] ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 119.28, 139.50, 141.66, 141.58, 141.42,
1024, 122.42, 141.21, 123.09, 141.85, 123.03,
1536, 122.80, 122.18, 141.39, 123.25, 141.35,
2048, 123.41, 141.34, 123.62, 140.55, 123.76,
```

After this Diff:
```
(base) [[email protected] ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 123.29, 271.99, 508.66, 882.83, 1295.07,
1024, 126.05, 273.15, 515.42, 914.11, 877.63,
1536, 142.48, 236.85, 524.10, 481.32, 970.81,
2048, 124.76, 279.03, 433.73, 958.67, 1045.82,
```

Differential Revision: D18074757

fbshipit-source-id: ad5b43477d2187c818c137093c6d6af02d5ca1d5
@facebook-github-bot deleted the gh/dskhudia/11/head branch October 28, 2019 22:08
pdlive215 pushed a commit to pdlive215/pytorch that referenced this pull request Nov 27, 2019
Summary:
Pull Request resolved: pytorch#26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406

Labels: Merged, oncall: quantization (Quantization support in PyTorch)
