
pytorch nondeterministically hangs on CUDA tensor creation (Tegra X1) #1907

@dimatura

Description


Hello all, great work with pytorch. I'm trying to use pytorch on a Jetson TX1, and it mostly works. One issue I'm running into is that it sporadically hangs when using CUDA.

Some details:

  • Jetson TX1 (64-bit ARM, Tegra X1 SoC)
  • Ubuntu 16.04 aarch64
  • CUDA 8.0, cuDNN 5105
  • Python 2.7
  • Built from source (conda packages aren't available for this platform yet), both from git commit 7c24a (Jun 23) and from tag 1.12; both builds show the same issue.

Example program:

import torch
a = torch.FloatTensor(2).cuda()
print(a)

This code works maybe 8 out of 10 times (a rough impression, not actually measured); the rest of the time it hangs on the second line. top shows the process using ~18% CPU, and it does not respond to Ctrl-C. Suspecting some kind of concurrency issue, I tried OMP_NUM_THREADS=1 and torch.set_num_threads(1), but the hang still occurs (a sketch of how I invoked these is below).
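
For reference, this is roughly how I applied the two mitigations around the repro (a sketch; the script name repro.py is just for illustration, and OMP_NUM_THREADS has to be set in the shell before Python starts):

# Run as: OMP_NUM_THREADS=1 python repro.py
import torch

torch.set_num_threads(1)  # limit intra-op CPU threads; did not help

a = torch.FloatTensor(2).cuda()  # the hang, when it happens, is here
print(a)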

When not using .cuda() it never hangs (and in fact the script starts up relatively quickly compared to the CUDA version; the .cuda() call itself seems surprisingly slow, and a rough way to time it is sketched below).
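
To put a rough number on that startup cost, a timing sketch like the following could be used (assuming the slow part is the first CUDA call, which has to initialize the CUDA context; the second call should be much faster if that assumption holds):

import time
import torch

t0 = time.time()
a = torch.FloatTensor(2).cuda()   # first CUDA call; initializes the CUDA context
t1 = time.time()
b = torch.FloatTensor(2).cuda()   # second call reuses the already-created context
t2 = time.time()

print("first  .cuda(): %.2f s" % (t1 - t0))
print("second .cuda(): %.4f s" % (t2 - t1))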


    Labels

    • module: crash (problem manifests as a hard crash, as opposed to a RuntimeError)
    • module: cuda (related to torch.cuda and CUDA support in general)
    • needs reproduction (actionable steps to reproduce are needed; someone else needs to confirm the repro)
    • triaged (this issue has been looked at by a team member and prioritized into an appropriate module)
