Memory leak? #1846

@kefirski

I am trying to train this model with:
python train.py --use-cuda True --dropout 0.25 --batch-size 7 --num-iterations 2000000

After about 16k iterations on my 8 GB GPU, I got:

THCudaCheck FAIL file=/data/users/soumith/builder/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    print(cdvae.translate(input, ['ru', 'ru'], batch_loader))
  File "/home/py36/CDVAE/model/cdvae.py", line 97, in translate
    return model_to.sample(bl, encoder_input.size()[1], encoder_input.is_cuda, z)
  File "/home/py36/CDVAE/model/vae/vae.py", line 115, in sample
    x, state, _, _ = self(0., None, x, z, state)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/py36/CDVAE/model/vae/vae.py", line 63, in forward
    out, final_state = self.generate(decoder_input, z, drop_prob, initial_state)
  File "/home/py36/CDVAE/model/vae/vae.py", line 79, in generate
    return self.decoder(decoder_input, z, initial_state)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/py36/CDVAE/model/vae/decoder.py", line 48, in forward
    result, final_state = self.rnn(decoder_input, initial_state)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/rnn.py", line 91, in forward
    output, hidden = func(input, self.all_weights, hx)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 327, in forward
    return func(input, *fargs, **fkwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/function.py", line 201, in _do_forward
    flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/function.py", line 223, in forward
    result = self.forward_extended(*nested_tensors)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
    cudnn.rnn.forward(self, input, hx, weight, output, hy)
  File "/usr/local/lib/python3.5/dist-packages/torch/backends/cudnn/rnn.py", line 247, in forward
    fn.weight_buf = x.new(num_weights)
RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/builder/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66
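
The traceback ends in the sampling path (`translate` → `sample` → decoder RNN) rather than in a backward pass, so one thing that may be worth ruling out is autograd history being kept alive while sampling. Below is a minimal sketch of sampling with gradient tracking disabled. It assumes a current PyTorch release (`torch.no_grad()` is not available in the version shown in the traceback), and `TinyDecoder` and `sample` are hypothetical stand-ins, not the CDVAE code. Whether this is the actual cause of the error above is not established.

```python
import torch
import torch.nn as nn


class TinyDecoder(nn.Module):
    """Hypothetical stand-in for the decoder in the traceback (not the CDVAE code)."""

    def __init__(self, vocab_size=20, hidden_size=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, state=None):
        output, state = self.rnn(self.embed(tokens), state)
        return self.out(output), state


def sample(decoder, seq_len, start_token=0):
    """Greedy sampling with autograd disabled, so no graph is built per step."""
    was_training = decoder.training
    decoder.eval()  # switch off dropout while sampling
    tokens = torch.full((1, 1), start_token, dtype=torch.long)
    state = None
    result = []
    with torch.no_grad():
        for _ in range(seq_len):
            logits, state = decoder(tokens, state)
            tokens = logits.argmax(dim=-1)  # shape (1, 1): next input token
            result.append(tokens.item())
    decoder.train(was_training)  # restore the previous mode
    return result, state
```

Called from inside a training loop, `sample(decoder, 5)` returns the token ids and a final state that carries no autograd graph. Logging `torch.cuda.memory_allocated()` every few hundred iterations would show whether memory use actually grows over time or whether a single long sampled sequence is what exceeds the 8 GB.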

    Labels

    needs reproduction: Ensure you have actionable steps to reproduce the issue. Someone else needs to confirm the repro.
