Conversation
@tianleiwu, there is a strange error from the CUDNN frontend, which was caused by upgrading CUDNN from 9.5 to 9.6. Could you please take a look?
I tried upgrading both cudnn-frontend and cudnn, and submitted a test build: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1579205&view=results In the worst case, we may add a fallback that calls the cudnn backend directly, as before, for the cases that the cudnn frontend cannot handle.
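The fallback idea mentioned above could look roughly like the following. This is only a minimal sketch of the control flow, not ONNX Runtime's actual code: `Status`, `ConvPath`, `ConvResult`, and `RunConvWithFallback` are hypothetical names introduced for illustration, and the two callables stand in for the cudnn frontend path and the legacy cudnn backend path.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a status type; not onnxruntime::common::Status.
struct Status {
  bool ok;
  std::string message;
};

// Records which path ended up handling the convolution.
enum class ConvPath { kFrontend, kLegacyBackend, kFailed };

struct ConvResult {
  ConvPath path;
  Status status;
};

// Tries the frontend path first. If it reports a failure (for example,
// no execution plan could be created), tries the legacy backend path.
// Both callables are assumptions that stand in for the real kernels.
inline ConvResult RunConvWithFallback(
    const std::function<Status()>& try_frontend,
    const std::function<Status()>& run_legacy) {
  Status fe = try_frontend();
  if (fe.ok) {
    return {ConvPath::kFrontend, fe};
  }
  Status legacy = run_legacy();
  if (legacy.ok) {
    return {ConvPath::kLegacyBackend, legacy};
  }
  // Both paths failed: keep both messages so neither error is lost.
  return {ConvPath::kFailed,
          Status{false, fe.message + "; fallback: " + legacy.message}};
}
```

A caller would pass two lambdas, one wrapping the frontend graph build and one wrapping the legacy call, and could log `path` to see how often the fallback is taken.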
The error was:
[E:onnxruntime:yolov3, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'conv2d_2_0' Status Message: Failed to initialize CUDNN Frontend/onnxruntime_src/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, common::Status> = void]
CUDNN_FE failure 8: HEURISTIC_QUERY_FAILED ; GPU=0 ; hostname=98d137446008 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=225 ; expr=s_.cudnn_fe_graph->create_execution_plans({heur_mode});
@gedoensmax, @JTischbein, is it a known issue that cudnn 9.6 has a regression in convolution support for yolo v3? Here is the cudnn 9.6 debug log:
I downgraded the CUDA 12 image's CUDNN version back to 9.5, and then the test passed. This means we cannot use the same cudnn version for both CUDA 11 and 12, but that's OK.
The new images contain the following updates:
1. Added Git, Ninja and VCPKG to all docker images
2. Updated CPU containers' GCC version from 12 to 14
3. Pinned CUDA 12 images' CUDNN version to 9.5 (the latest one is 9.6)
4. Addressed container supply chain warnings by building CUDA 12 images from scratch (avoid using Nvidia's prebuilt images)
5. Updated manylinux commit id to 75aeda9d18eafb323b00620537c8b4097d4bef48

Also, this PR updated some source code to make the CPU EP's source code compatible with GCC 14.