onnxruntime/core/optimizer/transpose_optimizer/layout_tranformer_dev_notes.md
> ## Overview
>
> The ONNX standard assumes NCHW tensor layout. However, NCHW is not the most performance-efficient layout for all hardware types. Depending on the underlying hardware, getting the best performance may require converting the model, or the parts of the model that will execute on that hardware, from NCHW to NHWC. The Layout Transformer enables exactly this. It works with both ONNX and ORT format models.
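A quick illustration of what the NCHW-to-NHWC conversion means for tensor data, using NumPy. This is only a sketch of the layout relationship; the Layout Transformer works at the graph level (inserting Transpose nodes) rather than eagerly permuting data like this:

```python
import numpy as np

# A batch of 2 images, 3 channels, 4x5 spatial dims, in NCHW layout.
x_nchw = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

# NCHW -> NHWC: move the channel axis (axis 1) to the end.
x_nhwc = x_nchw.transpose(0, 2, 3, 1)

print(x_nhwc.shape)  # (2, 4, 5, 3)

# The same element is addressed with permuted indices in the two layouts:
# nhwc[n, h, w, c] == nchw[n, c, h, w]
assert x_nchw[1, 2, 3, 4] == x_nhwc[1, 3, 4, 2]
```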
> *Note: Currently the Layout Transformer works only with compiling EPs such as NNAPI and QNN. More work is needed to make it compatible with EPs like CPU and CUDA, which use static kernel registration.*
FWIW this is largely figured out in #11912
GetCapability is always called twice if the EP asks for NHWC. The first call to GetCapability is purely for layout changes, so we're only dealing with ONNX operators that the transpose optimizer knows about. The second call does things like convert a QDQ node group to a supported quantized op, or fusions such as Conv+activation.
It's a little obtuse as to what should be done in the first call to GetCapability vs. the second, so ideally we can refine that a bit.
I added more clarification around GetCapability based on the behavior on the master branch.
> GetCapability returns a set of IndexedSubGraphs that the given execution provider can run. During layout transformation, new nodes (Transpose, Gather, etc.) can be added within these subgraphs. Calling GetCapability a second time therefore ensures that the EP can claim these new nodes as well and fuse the entire subgraphs. This is important for performance: without it, execution would unnecessarily switch to a fallback EP (in most cases the CPU EP).
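The two-pass flow described above can be sketched with a toy model. Everything here is hypothetical and illustrative (plain lists standing in for graphs and subgraphs); it is not ONNX Runtime's actual partitioning API:

```python
# Hypothetical toy model of calling GetCapability twice around layout
# transformation. Node and function names are illustrative only.

def get_capability(graph, supported_op_types):
    """Return the indices of nodes this toy 'EP' claims."""
    return [i for i, (op_type, _) in enumerate(graph) if op_type in supported_op_types]

# Original graph: (op_type, node_name) pairs.
graph = [("Conv", "conv1"), ("Relu", "relu1")]

# First call: the EP claims Conv and Relu and asks for NHWC layout.
first = get_capability(graph, {"Conv", "Relu"})
assert first == [0, 1]

# Layout transformation wraps the claimed nodes with new Transpose nodes.
graph = [("Transpose", "to_nhwc"), ("Conv", "conv1"),
         ("Relu", "relu1"), ("Transpose", "to_nchw")]

# Second call: the EP must also claim the new Transpose nodes so the whole
# subgraph stays fused on one EP instead of falling back to the CPU EP.
second = get_capability(graph, {"Conv", "Relu", "Transpose"})
assert second == [0, 1, 2, 3]
```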
> *IMPORTANT NOTE*: After layout transformation is done, graph resolve cannot be called for the graph. This is because graph resolve validates node shapes by calling the ONNX TypeAndShapeInferenceFunction; these type and shape inference functions can ONLY infer shapes for NCHW format inputs. Therefore, when passed a graph with NHWC nodes, the inferred shape validation fails and graph resolve throws. This is the very reason layout transformation is *NOT ENABLED* when the Graph Partitioner mode is kAssignOnly.
Suggested change (replace "graph resolve" with "Graph::Resolve" and "shape inf functions" with "shape inferencing functions"):

> *IMPORTANT NOTE* After layout transformation is done, Graph::Resolve cannot be called for the graph. This is because graph resolve validates the shape of the nodes by calling ONNX TypeAndShapeInferenceFunction, these type and shape inferencing functions can ONLY infer shapes for NCHW format inputs. Therefore, when passed a graph with NHWC nodes the inferred shape validation fails and hence graph resolve throws. This is the very reason layout transformation is *NOT ENABLED* when Graph Partitioner Mode is kAssignOnly.
Double checking why Graph::Resolve fails. I would have expected that the new nodes may not have an Op() given they're in the internal domain, and an EP won't necessarily have a static kernel for the operator, so you'd potentially hit a nullptr rather than the ONNX type/shape inferencing failing (which I'd expect to apply only to nodes in the ONNX domain).
Updated this paragraph. Please re-review and let me know if it makes sense now.
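The failure mode under discussion can be illustrated with a small hypothetical sketch. This mimics ONNX-style Conv shape inference in plain Python; the function below is an illustrative stand-in, not ORT's or ONNX's actual inference code:

```python
# Hypothetical sketch of why NCHW-only shape inference rejects NHWC graphs.

def infer_conv_shape_nchw(input_shape, num_filters, kernel):
    """Infer a Conv output shape assuming NCHW input, 'valid' padding, stride 1."""
    n, c, h, w = input_shape  # the inference function hard-codes this axis order
    kh, kw = kernel
    return (n, num_filters, h - kh + 1, w - kw + 1)

# NCHW input: the inferred shape matches the actual runtime output shape.
assert infer_conv_shape_nchw((1, 3, 224, 224), 8, (3, 3)) == (1, 8, 222, 222)

# After layout transformation, the same tensor is NHWC: (1, 224, 224, 3).
# The NCHW-assuming function misreads H as C and C as W, so its answer
# disagrees with the real NHWC output (1, 222, 222, 8) -- and a resolve step
# that validates inferred shapes against actual shapes would throw.
inferred = infer_conv_shape_nchw((1, 224, 224, 3), 8, (3, 3))
assert inferred != (1, 222, 222, 8)
```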
Description: Adding dev notes for layout transformer
Motivation and Context