Status: Closed
Labels: needs reproduction
Description
I am trying to train this model with:
python train.py --use-cuda True --dropout 0.25 --batch-size 7 --num-iterations 2000000
After roughly 16k iterations on my 8 GB GPU, I get the following error:
THCudaCheck FAIL file=/data/users/soumith/builder/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    print(cdvae.translate(input, ['ru', 'ru'], batch_loader))
  File "/home/py36/CDVAE/model/cdvae.py", line 97, in translate
    return model_to.sample(bl, encoder_input.size()[1], encoder_input.is_cuda, z)
  File "/home/py36/CDVAE/model/vae/vae.py", line 115, in sample
    x, state, _, _ = self(0., None, x, z, state)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/py36/CDVAE/model/vae/vae.py", line 63, in forward
    out, final_state = self.generate(decoder_input, z, drop_prob, initial_state)
  File "/home/py36/CDVAE/model/vae/vae.py", line 79, in generate
    return self.decoder(decoder_input, z, initial_state)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/py36/CDVAE/model/vae/decoder.py", line 48, in forward
    result, final_state = self.rnn(decoder_input, initial_state)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/rnn.py", line 91, in forward
    output, hidden = func(input, self.all_weights, hx)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 327, in forward
    return func(input, *fargs, **fkwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/function.py", line 201, in _do_forward
    flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/function.py", line 223, in forward
    result = self.forward_extended(*nested_tensors)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
    cudnn.rnn.forward(self, input, hx, weight, output, hy)
  File "/usr/local/lib/python3.5/dist-packages/torch/backends/cudnn/rnn.py", line 247, in forward
    fn.weight_buf = x.new(num_weights)
RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/builder/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66
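The traceback shows the allocation failing in the sampling path (`train.py` line 84 calls `cdvae.translate`, which steps the decoder RNN inside `vae.sample`), not in the training step itself. One frequent cause of this pattern, offered here as a hypothesis rather than a confirmed diagnosis, is that autograd records a graph for every sampling step, so activations for the whole sampled sequence are kept alive. A minimal sketch, assuming PyTorch 0.4 or later, where `torch.no_grad()` disables graph recording (older releases such as the one in the traceback used `volatile=True` Variables for the same purpose). The function `sample_sequence` and the toy GRU are hypothetical stand-ins, not the CDVAE code.

```python
import torch
import torch.nn as nn


def sample_sequence(rnn: nn.GRU, x: torch.Tensor, steps: int) -> torch.Tensor:
    """Hypothetical autoregressive loop: each step's output becomes the next input.

    Running under no_grad means no autograd graph is recorded, so activations
    from earlier steps are not retained for a backward pass that never happens.
    """
    state = None
    outputs = []
    with torch.no_grad():
        for _ in range(steps):
            out, state = rnn(x, state)  # out: (batch, 1, hidden) with batch_first=True
            outputs.append(out)
            x = out  # feed the output back in (input_size == hidden_size here)
    return torch.cat(outputs, dim=1)  # (batch, steps, hidden)


# Toy setup (assumed sizes); uses the GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
rnn = nn.GRU(input_size=16, hidden_size=16, batch_first=True).to(device)
rnn.eval()
x0 = torch.randn(2, 1, 16, device=device)
samples = sample_sequence(rnn, x0, steps=5)
print(samples.shape, samples.requires_grad)  # torch.Size([2, 5, 16]) False
```

Because the result has `requires_grad=False`, nothing from the sampling loop is tied to the training graph; if the out-of-memory error persists with sampling wrapped this way, the cause is more likely elsewhere (for example, tensors accumulated across training iterations).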