Build from source: CMake not using compilers installed by Anaconda, fails with stl_pair.h errors #47717

@wpm

🐛 Bug

I tried to build PyTorch from source in an Anaconda environment after using Anaconda to install newer C and C++ compilers. The build failed because it ignored the upgraded compilers. When I forced it to use them, it failed while building the Caffe2 components with errors in stl_pair.h.

To Reproduce

  1. conda create --name pytorch -y
  2. conda activate pytorch
  3. conda install -c anaconda gcc_linux-64 gxx_linux-64 -y
  4. conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
  5. conda install -c pytorch magma-cuda110 -y
  6. git clone --recursive https://github.com/pytorch/pytorch
  7. cd pytorch
  8. git submodule sync
  9. git submodule update --init --recursive
  10. export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
  11. python setup.py install

Everything except step (3) is taken from the PyTorch instructions for installing from source. Step (3) uses Anaconda to upgrade the C and C++ compilers.

As described in the Anaconda compiler tools documentation, step (3) installs compilers under names decorated with host and target platform names and updates the $CC, $CXX, $GCC, and $GXX environment variables accordingly. On my machine these are as follows.

(pytorch) $ echo $CC
/home/wmcneill/anaconda3/envs/pytorch/bin/x86_64-conda_cos6-linux-gnu-cc
(pytorch) $ echo $CXX
/home/wmcneill/anaconda3/envs/pytorch/bin/x86_64-conda_cos6-linux-gnu-c++
(pytorch) $ echo $GCC
/home/wmcneill/anaconda3/envs/pytorch/bin/x86_64-conda_cos6-linux-gnu-gcc
(pytorch) $ echo $GXX
/home/wmcneill/anaconda3/envs/pytorch/bin/x86_64-conda_cos6-linux-gnu-g++

These are all version 7.3.0.
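For anyone reproducing this, the paths and versions above can be collected in one pass. This is only a sketch: `report_compilers` is a hypothetical helper name, and it assumes each compiler accepts `--version`, as GCC does.

```shell
# Hypothetical helper: print the path and first version line for each
# compiler variable that the Anaconda activation scripts set.
report_compilers() {
  for var in CC CXX GCC GXX; do
    # Indirect variable lookup that also works in plain POSIX sh.
    eval "compiler=\${$var:-}"
    if [ -z "$compiler" ]; then
      echo "$var: (unset)"
    elif command -v "$compiler" >/dev/null 2>&1; then
      echo "$var: $compiler: $("$compiler" --version | head -n 1)"
    else
      echo "$var: $compiler (not found or not executable)"
    fi
  done
}

report_compilers
```

Running it once inside the activated environment and once outside it shows whether activation is what sets the variables.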

The build fails immediately when checking compiler versions. The compiler check runs against /home/wmcneill/anaconda3/bin/c++, the compiler in the Anaconda base installation, instead of the newer compilers installed in the pytorch environment. That compiler is version 5.2.0.

-- Check for working CXX compiler: /home/wmcneill/anaconda3/bin/c++
-- Check for working CXX compiler: /home/wmcneill/anaconda3/bin/c++ - broken
CMake Error at /home/wmcneill/anaconda3/envs/pytorch/share/cmake-3.18/Modules/CMakeTestCXXCompiler.cmake:59 (message):
  The C++ compiler

    "/home/wmcneill/anaconda3/bin/c++"

  is not able to compile a simple test program.

The full output is here: pytorch-compilation-error.txt.
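One thing that may be worth ruling out: CMake reads $CC and $CXX only on the first configure of a build tree and afterwards reuses the compiler recorded in CMakeCache.txt. I do not know whether that is what happened here. The sketch below just compares the cached value with the environment. `cached_compiler` is a hypothetical helper, and the `build/CMakeCache.txt` path is assumed from the build directory shown in the logs.

```shell
# Hypothetical helper: print the compiler path recorded in a CMake cache.
#   $1: path to CMakeCache.txt
#   $2: language, C or CXX
cached_compiler() {
  # Cache entries look like CMAKE_CXX_COMPILER:FILEPATH=/path/to/c++
  sed -n "s/^CMAKE_${2}_COMPILER:[A-Z]*=//p" "$1"
}

# Run from the top of the pytorch checkout.
if [ -f build/CMakeCache.txt ]; then
  echo "cached CXX: $(cached_compiler build/CMakeCache.txt CXX)"
  echo "env CXX:    ${CXX:-(unset)}"
fi
```

If the two differ, removing the build directory forces CMake to detect the compiler again on the next run.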

If I manually change the compilers in /home/wmcneill/anaconda3/bin to point to the newly upgraded ones, I get past the compiler checks but then get 91 error messages when compiling the Caffe2 component. They appear to occur for most of the *.cu source files in pytorch/aten/src/ATen/native/cuda. A typical error looks like this.

[4517/5577] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/torch_cuda_generated_THCTensorIndex.cu.o
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/torch_cuda_generated_THCTensorIndex.cu.o 
cd /home/wmcneill/src/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC && /home/wmcneill/anaconda3/envs/pytorch/bin/cmake -E make_directory /home/wmcneill/src/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/. && /home/wmcneill/anaconda3/envs/pytorch/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/home/wmcneill/src/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/./torch_cuda_generated_THCTensorIndex.cu.o -D generated_cubin_file:STRING=/home/wmcneill/src/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/./torch_cuda_generated_THCTensorIndex.cu.o.cubin.txt -P /home/wmcneill/src/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/torch_cuda_generated_THCTensorIndex.cu.o.Release.cmake
/home/wmcneill/anaconda3/envs/pytorch/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: argument list for class template "std::pair" is missing

/home/wmcneill/anaconda3/envs/pytorch/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ")"

/home/wmcneill/anaconda3/envs/pytorch/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: template parameter "_T1" may not be redeclared in this scope

/home/wmcneill/anaconda3/envs/pytorch/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ";"
[...]
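The failing step is an NVCC device compile, so these errors appear to be raised while nvcc processes the GCC 7.3.0 headers. Each CUDA release documents which host compilers it supports, so it may help to record the nvcc release next to the host compiler version (the environment report could not collect the CUDA runtime version). A minimal sketch, assuming nvcc is on PATH; `tool_version` is a hypothetical helper.

```shell
# Hypothetical helper: print a tool's --version output, or say it is missing.
tool_version() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version 2>&1
  else
    echo "$1: not found"
  fi
}

tool_version nvcc            # CUDA compiler release
tool_version "${CXX:-c++}"   # host C++ compiler; falls back to c++ on PATH
```

Comparing the two outputs against the supported-compiler table for that CUDA release would show whether the combination is one that nvcc is expected to handle.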

Expected behavior

I expect the PyTorch build system to use the $CC, $CXX, $GCC, and $GXX environment variables which point to the newer compiler versions.

When I manually change the compilers to the new versions I expect PyTorch to build and install.

Environment

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
Clang version: Could not collect
CMake version: version 3.18.4

Python version: 3.9 (64-bit runtime)
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: Tesla P100-PCIE-16GB
GPU 1: Tesla P100-PCIE-16GB

Nvidia driver version: 455.32.00
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[conda] magma-cuda110             2.5.2                         1    pytorch
[conda] mkl                       2020.4             h726a3e6_304    conda-forge
[conda] mkl-include               2020.4             h726a3e6_304    conda-forge
[conda] numpy                     1.19.4           py39h57d35e7_1    conda-forge

The GCC version comes from the default system compiler installed at /usr/bin/gcc.

Additional context

I first encountered build problems in #47529. That involved git errors which have since been resolved. Following a request from @ptrblck, I am opening this new issue to focus exclusively on the compilation problems.

cc @malfet @seemethere @walterddr


Labels

module: build (Build system issues)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
