🐛 Bug
The first 10-20 calls to forward are very slow for a torch.jit.script version of the pre-packaged Inception v3 model from torchvision.
To Reproduce
Code:
import sys
import time

import psutil
import torch
from torch import nn
from torchvision import models


def get_vms():
    # Virtual memory size of the current process, in bytes.
    return psutil.Process().memory_info().vms


def sync_cuda():
    t0 = time.time()
    torch.cuda.synchronize('cuda')
    print(f"cuda sync took {time.time() - t0:.3f} second(s)")


if __name__ == '__main__':
    device = sys.argv[1]  # 'cpu' or 'cuda'
    net = models.inception_v3(pretrained=True, aux_logits=True)
    net = torch.jit.script(net)
    if device == 'cuda':
        sync_cuda()
    net = net.to(device).eval()
    input = torch.randn(1, 3, 299, 299, requires_grad=False).to(device)
    if device == 'cuda':
        sync_cuda()
    out0 = None
    with torch.no_grad():
        for i in range(30):
            t0 = time.time()
            pred = net(input)
            # Compare against the previous iteration's logits to spot drift.
            if out0 is not None and not torch.allclose(out0.logits, pred.logits):
                print(f"{i}: logits aren't allclose; abs-sum={(out0.logits - pred.logits).abs().sum()}")
            print(f"{i} vms:{get_vms() / 1024 / 1024 / 1024:.3f}Gb seconds/iter={time.time() - t0:.3f}")
            out0 = pred
CPU-based run:
(pt-nightly) igor@w ~/playground$ python memory_leakage_test.py cpu
0 vms:4.635Gb seconds/iter=2.548
1 vms:4.910Gb seconds/iter=10.017
2 vms:5.153Gb seconds/iter=10.036
3 vms:5.386Gb seconds/iter=9.321
4 vms:5.620Gb seconds/iter=8.975
5 vms:5.844Gb seconds/iter=7.905
6 vms:6.058Gb seconds/iter=7.819
7 vms:6.262Gb seconds/iter=6.870
8 vms:6.457Gb seconds/iter=6.727
9 vms:6.642Gb seconds/iter=5.774
10 vms:6.824Gb seconds/iter=5.617
11 vms:7.002Gb seconds/iter=5.067
12 vms:7.162Gb seconds/iter=5.008
13 vms:7.310Gb seconds/iter=3.942
14 vms:7.443Gb seconds/iter=3.695
15 vms:7.565Gb seconds/iter=2.950
16 vms:7.672Gb seconds/iter=2.746
17 vms:7.767Gb seconds/iter=2.042
18 vms:7.846Gb seconds/iter=1.871
19 vms:7.911Gb seconds/iter=1.318
20 vms:7.966Gb seconds/iter=1.048
21 vms:8.014Gb seconds/iter=0.841
22 vms:8.048Gb seconds/iter=0.813
23 vms:8.070Gb seconds/iter=0.561
24 vms:8.082Gb seconds/iter=0.468
25 vms:8.095Gb seconds/iter=0.370
26 vms:8.099Gb seconds/iter=0.327
CUDA run:
(pt-nightly) igor@w ~/playground$ python memory_leakage_test.py cuda
cuda sync took 2.181 second(s)
cuda sync took 0.000 second(s)
0 vms:10.510Gb seconds/iter=2.357
1: logits aren't allclose; abs-sum=0.0007971674203872681
1 vms:10.769Gb seconds/iter=9.826
2 vms:11.018Gb seconds/iter=9.541
3 vms:11.261Gb seconds/iter=8.995
4 vms:11.493Gb seconds/iter=8.586
5 vms:11.717Gb seconds/iter=7.597
6 vms:11.929Gb seconds/iter=7.358
7 vms:12.135Gb seconds/iter=6.468
8 vms:12.329Gb seconds/iter=6.244
9 vms:12.516Gb seconds/iter=5.422
10 vms:12.697Gb seconds/iter=5.233
11 vms:12.874Gb seconds/iter=4.771
12 vms:13.034Gb seconds/iter=4.647
13 vms:13.185Gb seconds/iter=3.569
14 vms:13.316Gb seconds/iter=3.374
15 vms:13.439Gb seconds/iter=2.519
16 vms:13.545Gb seconds/iter=2.359
17 vms:13.642Gb seconds/iter=1.663
18 vms:13.721Gb seconds/iter=1.508
19 vms:13.787Gb seconds/iter=0.950
20 vms:13.841Gb seconds/iter=0.739
21 vms:13.889Gb seconds/iter=0.504
22 vms:13.922Gb seconds/iter=0.440
23 vms:13.946Gb seconds/iter=0.201
24 vms:13.954Gb seconds/iter=0.141
25 vms:13.955Gb seconds/iter=0.017
26 vms:13.956Gb seconds/iter=0.010
27 vms:13.956Gb seconds/iter=0.008
28 vms:13.956Gb seconds/iter=0.007
29 vms:13.956Gb seconds/iter=0.007
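For reference (not part of the original report), a minimal eager-mode baseline under the same setup; if the slow warm-up is specific to the scripted module, per-iteration times for the plain (non-scripted) model should be roughly flat from the first call:

import time

import torch
from torchvision import models

# Eager (non-scripted) Inception v3 on CPU, timed the same way as above.
net = models.inception_v3(pretrained=True, aux_logits=True).eval()
x = torch.randn(1, 3, 299, 299)

with torch.no_grad():
    for i in range(10):
        t0 = time.time()
        net(x)
        print(f"{i}: seconds/iter={time.time() - t0:.3f}")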
Expected behavior
All forward calls are expected to take roughly the same amount of time.
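One diagnostic worth trying (a sketch, not something verified in this report) is to disable the JIT profiling executor before scripting and re-run the same timing loop. The flags below are internal, non-public APIs and are assumed to be available in this nightly:

import time

import torch
from torchvision import models

# Internal flags; assumed present in this build (not a stable API).
torch._C._jit_set_profiling_executor(False)
torch._C._jit_set_profiling_mode(False)

net = torch.jit.script(models.inception_v3(pretrained=True, aux_logits=True)).eval()
x = torch.randn(1, 3, 299, 299)

with torch.no_grad():
    for i in range(10):
        t0 = time.time()
        net(x)
        print(f"{i}: seconds/iter={time.time() - t0:.3f}")

If the early iterations flatten out with these flags set, that would point at the profiling/optimizing executor respecializing the graph over the first several calls.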
Environment
PyTorch version: 1.4.0.dev20191205
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 19.10
GCC version: (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
CMake version: version 3.13.4
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2080 SUPER
Nvidia driver version: 440.36
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.16.2
[conda] mkl 2019.4 243
[conda] pytorch 1.4.0.dev20191205 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch-nightly
[conda] torchvision 0.5.0.dev20191205 py37_cu101 pytorch-nightly
cc @suo