
PyTorch v0.4.1 fails to build on NVIDIA Drive PX2 #11518

@jeff-hawke


Issue description

Building PyTorch 0.4.1 from source on an NVIDIA Drive PX2 (DriveWorks 0.6) currently fails because of an unusual NVIDIA print statement that breaks CUDA architecture detection. Every CUDA process prints the following line to stdout:
nvrm_gpu: Bug 200215060 workaround enabled.
Unfortunately, I haven't found any way to work around or suppress this message. It comes from the CUDA 9.0 installation bundled with the DriveWorks SDK on these PX2s.

This breaks CUDA architecture detection in the CUDA_DETECT_INSTALLED_GPUS function of <pytorch_root>/cmake/Modules_CUDA_fix/upstream/FindCUDA/select_compute_arch.cmake:

  • This function writes and compiles a short C++ program that prints the compute capabilities of the installed CUDA devices, then caches the program's output in CMakeCache.txt.
  • Instead of the expected 6.1 6.2 on this device, the output also contains the extra line, so CUDA_GPU_DETECT_OUTPUT is set to: nvrm_gpu: Bug 200215060 workaround enabled.\n6.1 6.2
  • The embedded newline breaks CMakeCache.txt, which does not handle newlines in cached variables.
  • In addition, every parsed string from the output triggers its own message(SEND_ERROR ...), stating (understandably) that 'nvrm_gpu:', 'Bug', '200215060', ... are not valid architectures.
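To make the failure mode concrete without a PX2, here is a minimal Python sketch. It is an illustration only, not code from PyTorch or CMake: it takes the captured output quoted above, splits it on whitespace as an approximation of how the tokens reach the architecture check, and uses a hypothetical is_valid_arch helper that accepts only major.minor tokens.

```python
import re

# Output captured from the detection program on the PX2, as quoted in this issue.
CUDA_GPU_DETECT_OUTPUT = "nvrm_gpu: Bug 200215060 workaround enabled.\n6.1 6.2"

# Hypothetical validity check: accept only "major.minor" tokens such as "6.1".
# This approximates, but is not copied from, the check in select_compute_arch.cmake.
ARCH_RE = re.compile(r"[0-9]+\.[0-9]+")


def is_valid_arch(token):
    # fullmatch requires the whole token to be digits, a dot, digits.
    return ARCH_RE.fullmatch(token) is not None


def split_tokens(output):
    # Split on any whitespace, including the embedded newline.
    return output.split()


tokens = split_tokens(CUDA_GPU_DETECT_OUTPUT)
invalid = [t for t in tokens if not is_valid_arch(t)]
valid = [t for t in tokens if is_valid_arch(t)]

print("contains newline:", "\n" in CUDA_GPU_DETECT_OUTPUT)
print("rejected tokens:", invalid)
print("accepted tokens:", valid)
```

The rejected tokens line up with the SEND_ERROR messages described above, and the newline in the raw string is what corrupts the cache entry.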

This can be fixed with a one-line addition at line 90 of this .cmake file. It filters the program output so that the compute_capabilities variable contains only plausible architecture versions (major.minor numbers such as 6.1 and 6.2):
string(REGEX MATCHALL "[0-9]+\\.[0-9]+" compute_capabilities "${compute_capabilities}")
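The filter can be checked outside CMake. The Python sketch below is an illustrative equivalent, not part of the patch: re.findall with the same pattern returns every non-overlapping match, which corresponds to the list that string(REGEX MATCHALL ...) stores in its output variable. The function name filter_compute_capabilities is hypothetical.

```python
import re

# Same pattern as the CMake patch: the CMake string "[0-9]+\\.[0-9]+" is the
# regex [0-9]+\.[0-9]+ (one or more digits, a literal dot, one or more digits).
ARCH_PATTERN = re.compile(r"[0-9]+\.[0-9]+")


def filter_compute_capabilities(raw_output):
    """Return every major.minor match in raw_output, mirroring
    string(REGEX MATCHALL ...), which stores all matches as a CMake list."""
    return ARCH_PATTERN.findall(raw_output)


raw = "nvrm_gpu: Bug 200215060 workaround enabled.\n6.1 6.2"
caps = filter_compute_capabilities(raw)

# CMake stores lists as semicolon-separated strings.
cmake_list = ";".join(caps)
print(caps)        # ['6.1', '6.2']
print(cmake_list)  # 6.1;6.2
```

One consequence of this pattern: any other dotted number appearing in the captured output (for example, a version string in some other banner) would also be accepted, so the filter relies on the extra line containing no such token, as is the case here.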

With this patch, PyTorch 0.4.1 builds successfully.

If you have any other suggestions for a fix or workaround, I'd be happy to try them.

Code example

Reproducible on multiple PX2s with this DriveWorks version, Python 3.5, a fresh checkout of v0.4.1, and the following install command:

MAX_JOBS=1 python3 setup.py install --user

System Info

CUDA used to build PyTorch: 9.0.225
OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.5
CUDA runtime version: 9.0.225
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.7.0.4
/usr/lib/aarch64-linux-gnu/libcudnn_static_v7.a
