Description
Basic network block:
(0): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1))
(1): LeakyReLU (0.333, inplace)
(2): FractionalMaxPool2d (
)
The following stack traces occur every 1-2 epochs. I randomly pool the input set so that it fits in memory.
Sometimes it throws the following stack trace:
THCudaCheck FAIL file=/home/ubuntu/src/pytorch/torch/lib/THCUNN/im2col.h line=60 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 502, in <module>
    best_model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=200)
  File "train.py", line 184, in train_model
    outputs = model(inputs)
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/projects/tutorial-pytorch/model_zoo.py", line 30, in forward
    x = self.features(x)
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 43, in conv2d
    return f(input, weight, bias)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /home/ubuntu/src/pytorch/torch/lib/THCUNN/im2col.h:60
Sometimes it throws a different one:
Traceback (most recent call last):
  File "train.py", line 502, in <module>
    best_model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=200)
  File "train.py", line 204, in train_model
    loss.backward()
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 145, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cublas runtime error : the GPU program failed to execute at /home/ubuntu/src/pytorch/torch/lib/THC/THCBlas.cu:105
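CUDA kernels launch asynchronously, so a device-side assert is usually reported at a later call than the one that triggered it. That may be why the two traces point at different places. A minimal sketch of forcing synchronous launches before re-running train.py, assuming the standard CUDA_LAUNCH_BLOCKING environment variable; the helper name enable_synchronous_cuda is hypothetical and only for illustration:

```python
import os


def enable_synchronous_cuda(env=None):
    """Request synchronous CUDA kernel launches (hypothetical helper).

    With CUDA_LAUNCH_BLOCKING=1 each kernel launch blocks until it
    finishes, so an error is raised at the call that caused it.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Must run before the first CUDA call, so call it at the very top of
# the training script, before importing torch or building the model.
enable_synchronous_cuda()
```

Exporting the variable in the shell before launching train.py has the same effect. Synchronous launches slow training down, so this is only meant for localizing the failing call.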