TRT EP failed to create model session with CUDA custom op #12282

@toothache

Description

Describe the bug
The TensorRT EP fails to create an inference session for a model that contains a CUDA custom op.

Urgency
none.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04, Win10
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: 1.10.0
  • Python version: 3.8
  • Visual Studio version (if applicable): VS2019
  • GCC/Compiler version (if compiling from source): 8.4
  • CUDA/cuDNN version: 11.4 / 8.2
  • GPU model and memory: V100

To Reproduce

  • Steps/code to reproduce the behavior: the following code creates an ORT session for the model with the custom op.
#include <onnxruntime/core/session/onnxruntime_cxx_api.h>

#include <array>
#include <iostream>
#include <memory>
#include <string>

using namespace std;

namespace Ort
{
    // Implements the PackImageToSeqOp operator kernel
    struct PackImageToSeqKernel
    {
        inline PackImageToSeqKernel(CustomOpApi ort, const OrtKernelInfo* info) : ort_(ort)
        {
            margin_width_ = ort_.KernelInfoGetAttribute<int64_t>(info, "margin_width");
        }

        inline void GetOutputShape(OrtKernelContext* /*context*/, size_t /*output_index*/, OrtTensorTypeAndShapeInfo* /*info*/) {}

        void Compute(OrtKernelContext* /*context*/) { }

    private:
        CustomOpApi ort_;
        int64_t margin_width_;
    };

    // Implements the PackImageToSeqOp operator
    struct PackImageToSeqOp : CustomOpBase<PackImageToSeqOp, PackImageToSeqKernel>
    {
        PackImageToSeqOp(const char* provider) : provider_(provider) {}

        inline void* CreateKernel(CustomOpApi api, const OrtKernelInfo* info) const { return new PackImageToSeqKernel(api, info); }
        inline const char* GetName() const { return "PackImageToSeq"; }
        const char* GetExecutionProviderType() const { return provider_; }

        static constexpr std::array<ONNXTensorElementDataType, 2> c_inputTypes =
        {
            // input type might be float/double/half precision values
            ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED, // input[0]: input feature, 1*C*H*W
            ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32,     // input[1]: sequence lengths for each image
        };

        inline size_t GetInputTypeCount() const { return c_inputTypes.size(); }
        inline ONNXTensorElementDataType GetInputType(size_t index) const { return c_inputTypes[index]; }

        static constexpr std::array<ONNXTensorElementDataType, 1> c_outputTypes =
        {
            // same with input type
            ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED, // output[0]: output feature, N*C*H*MaxSeqW
        };

        inline size_t GetOutputTypeCount() const { return c_outputTypes.size(); }
        inline ONNXTensorElementDataType GetOutputType(size_t index) const { return c_outputTypes[index]; }

    private:
        const char* provider_;
    };
}  // namespace Ort

int main()
{
    Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
    Ort::SessionOptions sessionOptions;

    sessionOptions.AppendExecutionProvider_TensorRT({});
    sessionOptions.AppendExecutionProvider_CUDA({});

    auto packImageToSeqOp = std::make_unique<Ort::PackImageToSeqOp>("CUDAExecutionProvider");

    Ort::CustomOpDomain customOpDomain("com.microsoft.oneocr");
    customOpDomain.Add(packImageToSeqOp.get());
    sessionOptions.Add(customOpDomain);

#ifdef _MSC_VER
    Ort::Session session(env, L"model.onnx", sessionOptions);
#else
    Ort::Session session(env, "model.onnx", sessionOptions);
#endif

    cout << "Create session successfully" << endl;

    return 0;
}
  • The ONNX model is attached to expedite investigation: model.zip

Expected behavior
We expect session creation to succeed, with the custom op falling back to the CUDA EP and all other operators running on the TensorRT EP.
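
As a sanity check (not part of the original report), it may help to verify that the custom op registers and the session builds when only the CUDA EP is appended; if this succeeds, the failure is isolated to the TensorRT EP's handling of the custom op during sub-graph Resolve(). A minimal sketch, assuming the same model.onnx and the PackImageToSeqOp definition from the repro above:

```cpp
// Diagnostic sketch: build the session with only the CUDA EP appended,
// so no node is offered to the TensorRT EP. Assumes the PackImageToSeqOp
// struct and model.onnx from the repro above are available.
#include <onnxruntime/core/session/onnxruntime_cxx_api.h>
#include <memory>

int main()
{
    Ort::Env env{ORT_LOGGING_LEVEL_ERROR, "Default"};
    Ort::SessionOptions sessionOptions;

    // No TensorRT EP here: every node, including the custom op,
    // is assigned to the CUDA EP (or the CPU fallback).
    sessionOptions.AppendExecutionProvider_CUDA({});

    auto op = std::make_unique<Ort::PackImageToSeqOp>("CUDAExecutionProvider");
    Ort::CustomOpDomain domain("com.microsoft.oneocr");
    domain.Add(op.get());
    sessionOptions.Add(domain);

#ifdef _MSC_VER
    Ort::Session session(env, L"model.onnx", sessionOptions);
#else
    Ort::Session session(env, "model.onnx", sessionOptions);
#endif
    return 0;
}
```

If this CUDA-only session initializes cleanly, the crash is specific to the TensorRT EP's GetSupportedList path shown in the stack trace below.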

Screenshots
The application crashes with the following stack trace.

2022-07-14 00:03:57.6057213 [E:onnxruntime:, inference_session.cc:1449 onnxruntime::InferenceSession::Initialize::<lambda_7d2791fa692be73b9f61f97a634cdaa3>::operator ()] Exception during initialization: D:\Source\cognition\onnxruntime\onnxruntime\core\providers\tensorrt\tensorrt_execution_provider.cc:890 onnxruntime::TensorrtExecutionProvider::GetSupportedList graph_build.Resolve().IsOK() was false.
Stacktrace:
D:\Source\cognition\onnxruntime\onnxruntime\core\providers\shared_library\provider_bridge_provider.cc(414): onnxruntime::GetStackTrace
D:\Source\cognition\onnxruntime\onnxruntime\core\providers\tensorrt\tensorrt_execution_provider.cc(890): onnxruntime::TensorrtExecutionProvider::GetSupportedList
D:\Source\cognition\onnxruntime\onnxruntime\core\providers\tensorrt\tensorrt_execution_provider.cc(1100): onnxruntime::TensorrtExecutionProvider::GetCapability
D:\Source\cognition\onnxruntime\onnxruntime\core\framework\graph_partitioner.cc(200): onnxruntime::PartitionOnnxFormatModelImpl
D:\Source\cognition\onnxruntime\onnxruntime\core\framework\graph_partitioner.cc(373): onnxruntime::GraphPartitioner::PartitionOnnxFormatModel
D:\Source\cognition\onnxruntime\onnxruntime\core\framework\graph_partitioner.cc(539): onnxruntime::GraphPartitioner::Partition
D:\Source\cognition\onnxruntime\onnxruntime\core\session\inference_session.cc(918): onnxruntime::InferenceSession::TransformGraph
D:\Source\cognition\onnxruntime\onnxruntime\core\session\inference_session.cc(1352): onnxruntime::InferenceSession::Initialize
D:\Source\cognition\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc(688): `anonymous namespace'::InitializeSession
D:\Source\cognition\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc(704): OrtApis::CreateSession
D:\Source\cognition\onnxruntime\build\install\include\onnxruntime\core\session\onnxruntime_cxx_inline.h(542): Ort::Session::Session
C:\Users\tooth\Desktop\ort_trt\main.cpp(52): main
D:\agent\_work\10\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(79): invoke_main
D:\agent\_work\10\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(288): __scrt_common_main_seh
D:\agent\_work\10\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(331): __scrt_common_main
D:\agent\_work\10\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp(17): mainCRTStartup
???: BaseThreadInitThunk
???: RtlUserThreadStart


Metadata

Assignees

No one assigned

    Labels

    ep:TensorRT (issues related to TensorRT execution provider)
    stale (issues that have not been addressed in a while; categorized by a bot)
