Fast histogram observer #29790
NVM, it seems you already added it.
hx89 left a comment:
This is a great idea and a huge speedup!
torch/quantization/observer.py (outdated)
        dtype=torch.double)[downsample_rate - 1 :: downsample_rate]
    # Finally perform interpolation
    shifted_integral_histogram = torch.zeros((Nbins))
    shifted_integral_histogram[0] = 0
nit: Seems this is not needed since it's initialized to zero in line 739?
torch/quantization/observer.py (outdated)
    self.min_val = None
    self.max_val = None
    self.dst_nbins = 2 ** torch.iinfo(self.dtype).bits
    self.upsample_rate = 128
Maybe add a comment on why 128 was chosen here? Also, upsample_rate could be made an input argument.
hx89 left a comment:
LGTM!
This pull request has been merged in 67b77af.
Stack from ghstack:
Summary:
Roughly 10x speedup of histogram observers from rewriting the histogram
combination routine.
Previously this was done as explicit bilinear interpolation.
Now it is done as a sample rate conversion operation: resampling is achieved
by upsampling (zero-order hold) followed by box filtering and downsampling.
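For readers unfamiliar with the technique, here is a minimal sketch of the resampling idea in plain Python. It is not the code from this PR, which operates on torch tensors. The function name `resample_histogram` and the `start_idx` parameter are hypothetical, and the sketch assumes the output grid has the same number of bins as the input and fully covers the input range.

```python
from itertools import accumulate


def resample_histogram(hist, upsample_rate, downsample_rate, start_idx):
    """Resample `hist` onto a grid with wider bins (illustrative sketch only).

    Output bins are downsample_rate / upsample_rate times as wide as input
    bins. `start_idx` is the offset of the input, in fine-grid units, inside
    the output range.
    """
    n_bins = len(hist)
    fine_len = n_bins * downsample_rate
    if start_idx < 0 or start_idx + n_bins * upsample_rate > fine_len:
        raise ValueError("input histogram does not fit inside the output range")

    # 1. Upsample with a zero-order hold: repeat every count upsample_rate times.
    upsampled = [count for count in hist for _ in range(upsample_rate)]

    # 2. Place the upsampled histogram on the fine grid of the output range.
    fine = [0.0] * fine_len
    fine[start_idx:start_idx + len(upsampled)] = upsampled

    # 3. Box filter and downsample in one pass: take a running sum, keep every
    #    downsample_rate-th value, and difference neighbouring values.
    integral = list(accumulate(fine))[downsample_rate - 1::downsample_rate]
    shifted = [0.0] + integral[:-1]

    # Dividing by upsample_rate undoes the repetition, so total mass is kept.
    return [(hi - lo) / upsample_rate for hi, lo in zip(integral, shifted)]


# Output bins are twice as wide as input bins, shifted by half an input bin.
print(resample_histogram([4, 0, 2, 2], upsample_rate=2, downsample_rate=4, start_idx=1))
# prints [4.0, 3.0, 1.0, 0.0]
```

The running sum turns the box filter into one cumulative pass plus a strided difference, which avoids a per-bin interpolation loop. The total count is preserved (8 in, 8 out in the example above).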
Test Plan:
import torch
import time
import numpy as np
from torch.quantization.observer import HistogramObserver

X = torch.randn(1,1,224,224)
obs = HistogramObserver(2048)
acc_time = 0
# Accumulate only the time spent inside the observer over 100 batches
for i in range(100):
    X = torch.randn(10,1,320,320)
    start = time.time()
    obs(X)
    #obs.forward_new(X)
    acc_time = acc_time + time.time()-start
print(acc_time)
Before change:
6.9 seconds (total over the 100 iterations)
After change:
0.6 seconds (total over the 100 iterations)
Resnet-18 accuracy is unchanged with the faster histogram observer.
Acc1 = 69.4
Differential Revision: D18508562