Memory Leak & Performance Decrease on AMD CPU with PyTorch >1.3 #32008

@milutter

🐛 Bug

It seems there is a memory leak and a performance decrease within torch.matmul on AMD processors with PyTorch >1.3. When I multiply two 'big' zero matrices 100000 times, the computation takes roughly four times as long as with PyTorch 1.3.1 and leaks around 600 MB. Interestingly, I can only observe this behaviour on AMD processors running PyTorch 1.4.0 or newer. With PyTorch 1.3.1 everything works fine.

On Intel processors (Ubuntu and macOS) running PyTorch 1.4.0, I observed neither the memory leak nor the performance decrease.

To Reproduce

import os
import numpy as np
import psutil
import torch
from tqdm import trange

if __name__ == "__main__":
    print("\n\n\n")
    n_samples = 100000

    # Handle to the current process, used to read its resident set size (RSS).
    py = psutil.Process(os.getpid())
    init_mem = py.memory_info().rss / 2. ** 30  # RSS before the loop, bytes / 2**30
    print(f"PyTorch Version {torch.__version__}")

    # Repeatedly multiply two freshly allocated zero tensors on the CPU.
    for i in trange(n_samples, desc="Torch Memory:", ncols=100):
        tmp = torch.matmul(torch.zeros((1, 256, 256)), torch.zeros((1, 256, 2)))

    memoryUse = py.memory_info().rss / 2. ** 30  # RSS after the loop
    print(f"Torch Memory: Memory = {memoryUse:.3e}Gb \t Delta Memory = {memoryUse - init_mem:+.3e}Gb")
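To tell a steady per-call leak apart from a one-time allocation, it can help to sample memory at several points during the loop instead of only before and after it. The following is a minimal sketch and not part of the original reproduction: `rss_gib`, `profile_loop`, and `run_matmul_profile` are hypothetical helper names, and it assumes `psutil` and `torch` are installed as in the script above.

```python
import os
import time


def rss_gib():
    """Resident set size of this process, in bytes / 2**30 (needs psutil)."""
    import psutil  # third-party; the same package the script above uses
    return psutil.Process(os.getpid()).memory_info().rss / 2.0 ** 30


def profile_loop(fn, n_iters, n_checkpoints=10, get_mem=rss_gib):
    """Call fn() n_iters times and sample memory at regular checkpoints.

    Returns (elapsed_seconds, samples), where samples is a list of
    (iteration, memory) pairs starting with iteration 0.
    """
    step = max(1, n_iters // n_checkpoints)
    samples = [(0, get_mem())]
    start = time.perf_counter()
    for i in range(1, n_iters + 1):
        fn()
        # Sample at every checkpoint and always after the final iteration.
        if i % step == 0 or i == n_iters:
            samples.append((i, get_mem()))
    elapsed = time.perf_counter() - start
    return elapsed, samples


def run_matmul_profile(n_iters=100000):
    """Profile the same matmul call as the reproduction script (needs torch)."""
    import torch

    def matmul_once():
        torch.matmul(torch.zeros((1, 256, 256)), torch.zeros((1, 256, 2)))

    elapsed, samples = profile_loop(matmul_once, n_iters)
    print(f"PyTorch Version {torch.__version__}: {n_iters / elapsed:.1f} it/s")
    for i, mem in samples:
        print(f"iter {i:>7d}: RSS = {mem:.3e}")
```

Calling `run_matmul_profile()` prints one memory sample per checkpoint. If the samples grow roughly linearly with the iteration count, that would point to memory being retained on every call; a single jump followed by a plateau would point to a one-time allocation instead.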

Expected behavior

600 MB Memory Leak and 4x Computation Time with PyTorch 1.4.0 & 1.5.0 on AMD

AMD 3900X -> PyTorch Version 1.4.0 / Conda Installation / Ubuntu 18.04:
Torch Memory:: 100%|█████████████| 100000/100000 [00:19<00:00, 5091.66it/s]
Torch Memory: Memory = 7.318e-01Gb Delta Memory = +5.883e-01Gb

AMD 3900X -> PyTorch Nightly Version 1.5.0.dev20200109 / Pip Installation / Ubuntu 18.04:
PyTorch Version 1.5.0.dev20200109
Torch Memory:: 100%|█████████████| 100000/100000 [00:19<00:00, 5105.87it/s]
Torch Memory: Memory = 6.871e-01Gb Delta Memory = +5.464e-01Gb

Correct Behaviour with PyTorch 1.3.1 & AMD

AMD 3900X -> PyTorch Version 1.3.1 / Conda Installation / Ubuntu 18.04:
Torch Memory:: 100%|█████████████| 100000/100000 [00:05<00:00, 17323.61it/s]
Torch Memory: Memory = 1.603e-01Gb Delta Memory = +4.295e-03Gb

AMD 3900X -> PyTorch Version 1.3.1 / Pip Installation / Ubuntu 18.04:
Torch Memory:: 100%|█████████████| 100000/100000 [00:05<00:00, 17804.70it/s]
Torch Memory: Memory = 1.531e-01Gb Delta Memory = +3.162e-03Gb
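As a rough cross-check of the AMD numbers above, the slowdown and the memory growth per call can be computed directly from the reported figures. This is only a sketch of the arithmetic: the values are copied from the two conda runs on the 3900X, and the variable names are made up for illustration.

```python
GIB = 2 ** 30        # the script divides bytes by 2**30
N_ITERS = 100_000    # n_samples in the reproduction script

# Figures copied from the conda runs on the AMD 3900X reported above.
runs = {
    "1.4.0": {"it_per_s": 5091.66, "delta_gib": 5.883e-01},
    "1.3.1": {"it_per_s": 17323.61, "delta_gib": 4.295e-03},
}

# How much faster 1.3.1 iterates than 1.4.0.
slowdown = runs["1.3.1"]["it_per_s"] / runs["1.4.0"]["it_per_s"]

# Memory growth of 1.4.0 beyond what 1.3.1 also shows.
extra_gib = runs["1.4.0"]["delta_gib"] - runs["1.3.1"]["delta_gib"]
extra_bytes_per_iter = extra_gib * GIB / N_ITERS

print(f"throughput ratio (1.3.1 / 1.4.0): {slowdown:.2f}x")
print(f"extra memory growth: {extra_gib * 1024:.0f} MiB")
print(f"extra growth per matmul call: {extra_bytes_per_iter:.0f} bytes")
```

By throughput the slowdown comes out at about 3.4x, and by the progress-bar wall times (19 s versus 5 s) at about 3.8x. The extra memory growth works out to roughly 6 KB per matmul call.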

Correct Behaviour with PyTorch 1.4 and Intel CPU

Intel i7 (8th generation) -> PyTorch Version 1.4.0 / Conda Installation
Torch Memory:: 100%|█████████████| 100000/100000 [00:03<00:00, 27692.34it/s]
Torch Memory: Memory = 1.705e-01Gb Delta Memory = +3.803e-03Gb

Intel i7 (9th generation) -> PyTorch Version 1.4.0 / Conda Installation
Torch Memory:: 100%|█████████████| 100000/100000 [00:03<00:00, 26194.08it/s]
Torch Memory: Memory = 1.562e-01Gb Delta Memory = +4.036e-03Gb

MacBook Pro 2018 i7 -> PyTorch Version 1.4.0 / Conda Installation
Torch Memory:: 100%|█████████████| 100000/100000 [00:05<00:00, 19769.36it/s]
Torch Memory: Memory = 8.743e-02Gb Delta Memory = +1.949e-03Gb

Environment

  • PyTorch Version: various, see the results under "Expected behavior" above
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): various, see the results under "Expected behavior" above
  • Python version: 3.7
  • CUDA/cuDNN version: CPU only
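Since the behaviour differs by CPU vendor and PyTorch build, it may help to record these details next to each measurement. The sketch below is one possible way to do that; `describe_environment` is a hypothetical helper name, and nothing here is taken from the original report beyond the idea of listing the environment.

```python
import platform
import sys


def describe_environment():
    """Collect basic details about the interpreter, machine and PyTorch build."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be empty on some systems
    }
    try:
        import torch
    except ImportError:
        # PyTorch not installed: report that instead of failing.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["torch_threads"] = torch.get_num_threads()
    # Build configuration string, including the libraries the build was compiled against.
    info["torch_build_config"] = torch.__config__.show()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

PyTorch also ships a `torch.utils.collect_env` module, runnable as `python -m torch.utils.collect_env`, which gathers similar information in the format used for bug reports.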

cc @ezyang @gchanan @zou3519 @VitalyFedyunin @ngimel @mruberry

Labels: high priority, module: memory usage, module: performance, triaged
