/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

I am seeing the following linker error

```
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/torch_cuda_generated_THCStorage.cu.o: in function `__device_stub__ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_6__fill7functorINS_10device_ptrIN3c107complexIdEEEESA_EElEESC_lEEvT0_T1_(thrust::cuda_cub::__fill::functor<thrust::device_ptr<c10::complex<double> >, c10::complex<double> >&, long)':
/home/gaoxiang/cuda11/include/cuda_runtime.h:209:(.text+0x7d4): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
```
when building with CUDA 11 on my Arch Linux machine.

The problem does not occur with CUDA 10.2 on the same Arch Linux machine, or with CUDA 11 inside NVIDIA's NGC container.

After some searching, it seems the cause is that the generated code is too large for the default x86-64 code model (see [code model](https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models); thanks to @mcarilli for pointing this out). MXNet hit the same issue: https://github.com/apache/incubator-mxnet/issues/17045

One possible cause is that we generate code for too many architectures by default:

```
--     CUDA include path   : /home/gaoxiang/cuda11/include
--     NVCC executable     : /home/gaoxiang/cuda11/bin/nvcc
--     NVCC flags          : -DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-Xcudafe;--diag_suppress=cc_clobber_ignored;-Xcudafe;--diag_suppress=integer_sign_change;-Xcudafe;--diag_suppress=useless_using_declaration;-Xcudafe;--diag_suppress=set_but_not_used;-Xcudafe;--diag_suppress=field_without_dll_interface;-Xcudafe;--diag_suppress=base_class_has_different_dll_interface;-Xcudafe;--diag_suppress=dll_interface_conflict_none_assumed;-Xcudafe;--diag_suppress=dll_interface_conflict_dllexport_assumed;-Xcudafe;--diag_suppress=implicit_return_from_non_void_function;-Xcudafe;--diag_suppress=unsigned_compare_with_zero;-Xcudafe;--diag_suppress=declared_but_not_referenced;-Xcudafe;--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
--     CUDA host compiler  : /usr/bin/gcc-8
--     NVCC --device-c     : OFF
--     USE_TENSORRT        : OFF
```
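If the number of target architectures is indeed what pushes the binary over the limit, one way to test that would be to restrict the build to a single architecture through the `TORCH_CUDA_ARCH_LIST` environment variable. This is only a sketch: the value `7.5` is an assumption about the local GPU, not something taken from the log above, and I have not confirmed that it makes the link succeed.

```shell
# Assumption: the local GPU has compute capability 7.5; change this to match your card.
export TORCH_CUDA_ARCH_LIST="7.5"
echo "Building only for compute capability: ${TORCH_CUDA_ARCH_LIST}"

# Then rebuild from a clean build directory so CMake regenerates the NVCC flags, e.g.:
#   python setup.py develop
```

After reconfiguring, the `NVCC flags` line in the CMake summary should list far fewer `-gencode` entries than the output above.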

I am wondering whether it is possible to add `-mcmodel=medium` when linking. I tried for a few hours but could not figure out how.
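For reference, the generic way to hand extra flags to a CMake-based build is through the `CFLAGS`, `CXXFLAGS` and `LDFLAGS` environment variables, which CMake reads on the first configure. The sketch below is untested on this build and is not a confirmed fix: it assumes the flags actually reach the host compiler that nvcc invokes and the final link step, and that `-mcmodel=medium` is compatible with how the objects are compiled here.

```shell
# Untested sketch: append -mcmodel=medium to whatever flags are already set.
# Assumption: the build forwards these variables to the host compiler and the linker driver.
export CFLAGS="${CFLAGS:-} -mcmodel=medium"
export CXXFLAGS="${CXXFLAGS:-} -mcmodel=medium"
export LDFLAGS="${LDFLAGS:-} -mcmodel=medium"

echo "CXXFLAGS=${CXXFLAGS}"
# CMake only picks these up on a fresh configure, so start from a clean build directory.
```

Since the code model affects how each object file is generated, the flag would have to be applied at compile time for every object, not only at the link step.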

cc @ezyang @gchanan @zou3519 @malfet @ngimel
