pranavsharma
left a comment
Looks good. A few minor comments.
/azp run Android CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@jslhcl FYI, it looks like this PR broke the CANN EP build. See https://github.com/Ascend/onnxruntime/actions/runs/5322372781/jobs/9638730400#step:3:3026
/root/Git.d/onnxruntime/onnxruntime/core/providers/cann/cann_execution_provider.cc:1450:7: error: ‘cann_device’ was not declared in this scope
+@fffrog, @zhangsibo1129 FYI, this change broke the CANN EP build. Can you advise or help @jslhcl fix it, since he doesn't have an environment to build/test?
I fixed the CANN EP build for this #15833 (comment). @jywu-msft, would you take a look at this PR? Thanks a lot!
Thank you, @zhangsibo1129!
### Description
Remove the AllocatorManager class.
### Motivation and Context
After the refactor PR #15833 was merged, the AllocatorManager class is no longer referenced.

Fix a memory leak caused by the TRT EP's allocator object not
being released upon destruction.
The valgrind log follows:
```
==1911860== 100,272 (56 direct, 100,216 indirect) bytes in 1 blocks are definitely lost in loss record 1,751 of 1,832
==1911860== at 0x483CFA3: operator new(unsigned long) (vg_replace_malloc.c:472)
==1911860== by 0x315DC2: std::_MakeUniq<onnxruntime::OrtAllocatorImplWrappingIAllocator>::__single_object std::make_unique<onnxruntime::OrtAllocatorImplWrappingIAllocator, std::shared_ptr<onnxruntime::IAllocator> >(std::shared_ptr<onnxruntime::IAllocator>&&) (unique_ptr.h:857)
==1911860== by 0x30EE7B: OrtApis::KernelContext_GetAllocator(OrtKernelContext const*, OrtMemoryInfo const*, OrtAllocator**) (custom_ops.cc:121)
==1911860== by 0x660D115: onnxruntime::TensorrtExecutionProvider::Compile(std::vector<onnxruntime::IExecutionProvider::FusedNodeAndGraph, std::allocator<onnxruntime::IExecutionProvider::FusedNodeAndGraph> > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocator<onnxruntime::NodeComputeInfo> >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}::operator()(void*, OrtApi const*, OrtKernelContext*) const (tensorrt_execution_provider.cc:2223)
```
This issue appeared after the [EP allocator refactor](#15833).

Modify CANN EP `CANNExecutionProvider::CreatePreferredAllocators`, `CANNExecutionProvider::CreateCannAllocator` to align with the EP API refactor and fix CANN CI for #15833 (comment) in this [PR](#15833)
### Description
Clean up unused parameters in ORT_UNUSED_PARAMETER.
### Motivation and Context
Clean up the unused parameters in ORT_UNUSED_PARAMETER that were introduced by #15833.
…t allocation in ORT API (#17030) This addresses a DML performance regression from the following PR resulting in allocations not being rounded and pooled in the DML execution provider. #15833 This also fixes a pre-existing limitation that allocations during session initialization (primarily large weights and persistent resources) only bypassed rounding and pooling while using the Winml API. The allocator now also respects a caller's rounding mode parameter when provided.
### Description
Fixed a TRT context memory sharing bug where the context memory was assigned to a unique_ptr that was destroyed immediately upon leaving scope.
### Motivation and Context
The bug seems to have been introduced by the refactor in #15833.
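To illustrate the kind of lifetime bug described above, here is a minimal sketch. It is not the TRT EP code: `ContextState` and `EnsureContextMemory` are hypothetical names invented for this example. The buggy pattern would hold the buffer in a function-local `std::unique_ptr`, so the memory is freed as soon as the function returns. The sketch shows one plausible fix, which is to store the owner in state that outlives the call.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical long-lived state; stands in for whatever object
// outlives a single compute call. Not an onnxruntime type.
struct ContextState {
  std::unique_ptr<char[]> context_memory;  // owner lives as long as the state
  std::size_t size = 0;
};

// Buggy pattern (shown as a comment only, never executed):
//   char* Ensure(std::size_t n) {
//     std::unique_ptr<char[]> local = std::make_unique<char[]>(n);
//     return local.get();  // dangling: `local` is destroyed on return
//   }

// Fixed pattern: the owning unique_ptr is stored in `state`, so the
// returned raw pointer stays valid until the buffer is grown again
// or `state` itself is destroyed.
inline char* EnsureContextMemory(ContextState& state, std::size_t required) {
  if (required > state.size) {
    state.context_memory = std::make_unique<char[]>(required);
    state.size = required;
  }
  return state.context_memory.get();
}
```

Calling `EnsureContextMemory` twice with a smaller second size returns the same pointer, because the buffer is only reallocated when more space is required.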
Description
This PR refactors the ExecutionProvider API for memory management: allocators move from the EP level to the SessionState level and are indexed by OrtDevice.
Motivation and Context
With this change, EPs no longer carry the burden of maintaining allocators, which makes the API friendlier for EP developers.
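The idea of session-level allocators indexed by device can be sketched as follows. This is an illustrative sketch only, not the onnxruntime implementation: `DeviceKey`, `Allocator`, `CpuAllocator`, and `SessionAllocators` are hypothetical stand-ins, and the real OrtDevice and IAllocator types carry more information than shown here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical stand-in for OrtDevice: a device type plus a device id.
struct DeviceKey {
  int type;
  int id;
  bool operator<(const DeviceKey& other) const {
    return std::tie(type, id) < std::tie(other.type, other.id);
  }
};

// Hypothetical stand-in for an allocator interface.
struct Allocator {
  virtual ~Allocator() = default;
  virtual void* Alloc(std::size_t size) = 0;
  virtual void Free(void* p) = 0;
};

// Simple malloc-backed allocator used only for this example.
struct CpuAllocator : Allocator {
  void* Alloc(std::size_t size) override { return std::malloc(size); }
  void Free(void* p) override { std::free(p); }
};

// Hypothetical session-level registry: one allocator per device,
// owned by the session rather than by each execution provider.
class SessionAllocators {
 public:
  // Returns false if an allocator is already registered for the device.
  bool Register(DeviceKey device, std::shared_ptr<Allocator> alloc) {
    return allocators_.emplace(device, std::move(alloc)).second;
  }

  // Returns nullptr if no allocator is registered for the device.
  std::shared_ptr<Allocator> Get(DeviceKey device) const {
    auto it = allocators_.find(device);
    return it == allocators_.end() ? nullptr : it->second;
  }

 private:
  std::map<DeviceKey, std::shared_ptr<Allocator>> allocators_;
};
```

In this sketch an EP would only supply its preferred allocators once; lookups afterwards go through the session-level map keyed by device, so the EP does not need to keep its own allocator table.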