Closed
Labels
high priority
module: build (Build system issues)
module: cuda (Related to torch.cuda, and CUDA support in general)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
I am seeing the following linker error
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/THC/torch_cuda_generated_THCStorage.cu.o: in function `__device_stub__ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_6__fill7functorINS_10device_ptrIN3c107complexIdEEEESA_EElEESC_lEEvT0_T1_(thrust::cuda_cub::__fill::functor<thrust::device_ptr<c10::complex<double> >, c10::complex<double> >&, long)':
/home/gaoxiang/cuda11/include/cuda_runtime.h:209:(.text+0x7d4): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
when building with CUDA 11 on my Arch Linux box.
There was no problem with either of two other combinations: CUDA 10.2 on the same Arch Linux box, and CUDA 11 inside NVIDIA's NGC container.
After searching, it seems that this is due to the code size being too large (see code model; thanks to @mcarilli for pointing this out). MXNet had the same issue: apache/mxnet#17045
The cause could be that we are generating code for too many architectures by default:
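To illustrate why code size matters here: x86-64 PC-relative relocations store a signed 32-bit displacement, so a reference only resolves if the target lies within roughly 2 GiB of the referencing instruction. The sketch below just does that range check; the function name and the addresses are hypothetical and chosen for illustration, not taken from the failing link.

```python
# Range of a signed 32-bit displacement, as stored by x86-64
# PC-relative relocations.
REL32_MIN = -(2**31)
REL32_MAX = 2**31 - 1


def fits_rel32(site: int, target: int, addend: int = -4) -> bool:
    """Return True if `target` is reachable from `site` with a 32-bit displacement.

    The displacement is computed as target + addend - site; an addend of -4
    is typical for a 4-byte field at the end of an instruction.
    """
    displacement = target + addend - site
    return REL32_MIN <= displacement <= REL32_MAX


# Hypothetical addresses, for illustration only.
near = fits_rel32(site=0x1000, target=0x1000 + 100 * 2**20)  # 100 MiB apart
far = fits_rel32(site=0x1000, target=0x1000 + 3 * 2**30)     # 3 GiB apart
print(near, far)  # True False
```

Once the linked image grows past that range, some references can no longer be encoded, which is consistent with the "relocation overflows" message above.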
-- CUDA include path : /home/gaoxiang/cuda11/include
-- NVCC executable : /home/gaoxiang/cuda11/bin/nvcc
-- NVCC flags : -DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-Xcudafe;--diag_suppress=cc_clobber_ignored;-Xcudafe;--diag_suppress=integer_sign_change;-Xcudafe;--diag_suppress=useless_using_declaration;-Xcudafe;--diag_suppress=set_but_not_used;-Xcudafe;--diag_suppress=field_without_dll_interface;-Xcudafe;--diag_suppress=base_class_has_different_dll_interface;-Xcudafe;--diag_suppress=dll_interface_conflict_none_assumed;-Xcudafe;--diag_suppress=dll_interface_conflict_dllexport_assumed;-Xcudafe;--diag_suppress=implicit_return_from_non_void_function;-Xcudafe;--diag_suppress=unsigned_compare_with_zero;-Xcudafe;--diag_suppress=declared_but_not_referenced;-Xcudafe;--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
-- CUDA host compiler : /usr/bin/gcc-8
-- NVCC --device-c : OFF
-- USE_TENSORRT : OFF
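As a rough way to see how many targets the build is generating code for, the `-gencode` entries in the NVCC flags line can be counted. This is a minimal sketch: `gencode_targets` is a hypothetical helper, and the `flags` string is a shortened reconstruction of the line above that keeps only the `-gencode` entries (which appear twice in the log) and one unrelated flag.

```python
import re
from collections import Counter


def gencode_targets(nvcc_flags: str) -> Counter:
    """Count each (arch, code) pair that appears in a -gencode entry."""
    pairs = re.findall(
        r"arch=(compute_\d+),code=((?:sm|compute)_\d+)", nvcc_flags
    )
    return Counter(pairs)


# Shortened reconstruction of the NVCC flags line from the log above.
sass = [f"-gencode;arch=compute_{a},code=sm_{a}" for a in (35, 50, 52, 60, 61, 70, 75)]
ptx = [f"-gencode;arch=compute_{a},code=compute_{a}" for a in (70, 75)]
block = ";".join(sass + ptx)
flags = f"-DONNX_NAMESPACE=onnx_torch;{block};-std=c++14;{block}"

counts = gencode_targets(flags)
n_sass = sum(1 for (_, code) in counts if code.startswith("sm_"))
n_ptx = sum(1 for (_, code) in counts if code.startswith("compute_"))
print(f"{len(counts)} distinct targets: {n_sass} SASS, {n_ptx} PTX")
print("each listed", set(counts.values()), "times")
```

If the number of architectures is indeed the cause, narrowing the list would be one thing to try (PyTorch's build reads a `TORCH_CUDA_ARCH_LIST` environment variable). That this avoids the overflow is an assumption, not something confirmed in this report.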
I am wondering if it is possible to add -mcmodel=medium during linking (I tried for a few hours but could not figure out how).
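One possible way to experiment with this is sketched below, under stated assumptions: CMake initializes its C and C++ flags from the `CFLAGS` and `CXXFLAGS` environment variables on the first configure, so appending the flag there may reach the host compiler. Whether the PyTorch build propagates it to every target (in particular to host code compiled through nvcc, which normally needs `-Xcompiler`), and whether the medium code model actually resolves this overflow, is not established here. `with_code_model` is a hypothetical helper.

```python
def with_code_model(env: dict, model: str = "medium") -> dict:
    """Return a copy of `env` with -mcmodel=<model> appended to CFLAGS and CXXFLAGS.

    Assumption: the build system picks these variables up at configure time.
    """
    new_env = dict(env)
    flag = f"-mcmodel={model}"
    for var in ("CFLAGS", "CXXFLAGS"):
        existing = new_env.get(var, "")
        # Avoid adding the flag twice if it is already present.
        if flag not in existing.split():
            new_env[var] = f"{existing} {flag}".strip()
    return new_env


# Illustrative starting environment; a real run would start from os.environ.
build_env = with_code_model({"CFLAGS": "-O2"})
print(build_env["CFLAGS"])    # -O2 -mcmodel=medium
print(build_env["CXXFLAGS"])  # -mcmodel=medium
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when launching a fresh configure and build.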