Conversation
Looks like this PR needs to be cherry-picked as well.

OK, I fixed the errors. So an EP may not register an allocator with type
    const DebugGraphFn& debug_graph_fn) {
  // sub graph recurse will be added later
  auto api_graph = MakeApiGraph(graph, execution_provider.GetAllocator(OrtMemTypeDefault), nullptr);
  auto cpu_allocator = execution_provider.GetAllocator(OrtMemTypeDefault);
Would it be simpler for the caller to pass in the value returned by ExecutionProviders.GetDefaultCpuAllocator()?
It's used this way:
Status PartitionOrtFormatModel(onnxruntime::Graph& graph,
                               const ExecutionProviders& providers,
                               KernelRegistryManager& kernel_registry_manager,
                               SessionState& session_state) {
  layout_transformer::TransformLayoutFunction transform_layout_fn = layout_transformer::IsSupportedOpset(graph)
                                                                        ? layout_transformer::TransformLayoutForEP
                                                                        : nullptr;
  GraphPartitioner partitioner(kernel_registry_manager, providers);
  ORT_RETURN_IF_ERROR(partitioner.Partition(graph,
                                            session_state.GetMutableFuncMgr(),
                                            transform_layout_fn,
                                            GraphPartitioner::Mode::kOrtFormatLoad));
  return Status::OK();
}
So this would change the signature of GraphPartitioner::Partition() and some callback type definitions, which is not a trivial change.
In inference_session.cc a lambda is used, so technically you'd only need to update TransformLayoutForEP and not the entire path through GraphPartitioner.
There are two places where TransformLayoutForEP() is called, and neither of them has a context with an existing allocator. If I make this change, I still need code to get the allocator from the EP, and I need to do this twice.
Edit: I see you mentioned ExecutionProviders.GetDefaultCpuAllocator(). Let me check whether I can figure out how to do this.
Is there an easier way to create the lambdas in session_state_test.cc? Is there a simpler way to do the currying?
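One way to read the "currying" question: bind the CPU allocator once in a small factory and hand the partitioner a callback whose signature does not change. The sketch below is only an illustration under that assumption. `Allocator`, `Graph`, `TransformFn`, `TransformLayout`, and `BindCpuAllocator` are hypothetical stand-ins, not the real onnxruntime types or functions.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these are the real onnxruntime types.
struct Allocator { std::string device; };
using AllocatorPtr = std::shared_ptr<Allocator>;
struct Graph { std::string read_with; };

// Callback type the partitioner is assumed to accept already (left unchanged).
using TransformFn = std::function<bool(Graph&)>;

// The transform itself takes the allocator as an explicit argument.
bool TransformLayout(Graph& graph, const AllocatorPtr& cpu_allocator) {
  assert(cpu_allocator != nullptr);
  graph.read_with = cpu_allocator->device;  // record which allocator was used
  return cpu_allocator->device == "cpu";
}

// "Currying" by capture: bind the allocator once and return the narrower callback.
TransformFn BindCpuAllocator(AllocatorPtr cpu_allocator) {
  return [alloc = std::move(cpu_allocator)](Graph& graph) {
    return TransformLayout(graph, alloc);
  };
}
```

A test could then build every callback through the one factory, for example `auto fn = BindCpuAllocator(std::make_shared<Allocator>(Allocator{"cpu"}));`, instead of writing a separate lambda at each call site.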
Description
Because of #15618, the default allocator changed to the device allocator, which will be a GPU allocator instead of a CPU one. The transpose optimizer expects to read data from initializers, so a CPU allocator is required here.
This change fixes the transpose optimizer on the GPU EP.
Fixes the issue referred to in #15869, #15796
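To illustrate the description above, here is a minimal sketch of selecting a host-readable allocator when an EP's default allocator may live on the device. It is not the code in this PR: `Device`, `Allocator`, `Provider`, and `PickHostAllocator` are hypothetical names, and the fallback policy shown is an assumption.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; not the real onnxruntime types.
enum class Device { kCpu, kGpu };
struct Allocator { Device device; };
using AllocatorPtr = std::shared_ptr<Allocator>;

// A provider whose default allocator may be a device (GPU) allocator.
struct Provider { AllocatorPtr default_allocator; };

// Return an allocator whose memory can be read on the host:
// use the provider's default only if it is a CPU allocator,
// otherwise fall back to a shared CPU allocator supplied by the caller.
AllocatorPtr PickHostAllocator(const Provider& ep, const AllocatorPtr& cpu_fallback) {
  assert(cpu_fallback != nullptr);
  if (ep.default_allocator && ep.default_allocator->device == Device::kCpu) {
    return ep.default_allocator;
  }
  return cpu_fallback;
}
```

With a policy like this, an optimizer that reads initializer data on the host would not receive device memory, whichever EP is being partitioned.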