onnxruntime/core/optimizer/transpose_optimizer/layout_tranformer_dev_notes.md
> ## Overview
>
> The ONNX standard assumes NCHW tensor layout. However, NCHW is not the most performance-efficient layout for all hardware types. Depending on the underlying hardware, getting the best performance may require converting the model, or the parts of the model that will execute on that hardware, from NCHW to NHWC. The Layout Transformer enables exactly this. It works with both ONNX and ORT format models.
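A quick illustration of what the NCHW-to-NHWC conversion means for tensor data, using NumPy. This is only a sketch of the layout relationship; the Layout Transformer works at the graph level (inserting Transpose nodes) rather than eagerly permuting data like this:

```python
import numpy as np

# A batch of 2 images, 3 channels, 4x5 spatial dims, in NCHW layout.
x_nchw = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

# NCHW -> NHWC: move the channel axis (axis 1) to the end.
x_nhwc = x_nchw.transpose(0, 2, 3, 1)

print(x_nhwc.shape)  # (2, 4, 5, 3)

# The same element is addressed with permuted indices in the two layouts:
# nhwc[n, h, w, c] == nchw[n, c, h, w]
assert x_nchw[1, 2, 3, 4] == x_nhwc[1, 3, 4, 2]
```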
> *Note: Currently the Layout Transformer works only with compiling EPs such as NNAPI and QNN. More work is needed to make it compatible with EPs like CPU and CUDA, which use static kernel registration.*
FWIW this is largely figured out in #11912
GetCapability is always called twice if the EP asks for NHWC. The first call to GetCapability is purely for layout changes, so we're only dealing with ONNX operators that the transpose optimizer knows about. The second call does things like convert a QDQ node group to a supported quantized op, or fusions such as Conv+activation.
It's a little obtuse as to what should be done in the first call to GetCapability vs. the second, so ideally we can refine that a bit.
I added more clarification around GetCapability based on the behavior on the master branch.
> GetCapability returns a set of IndexedSubGraphs that the given execution provider can run. During layout transformation, new nodes (Transpose, Gather, etc.) can be added within these subgraphs. Calling GetCapability a second time therefore ensures that the EP can claim these new nodes as well and fuse the entire subgraphs. This is important for performance: without it, execution would unnecessarily switch to a fallback EP (in most cases the CPU EP).
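The two-pass flow described above can be sketched with a toy model. Everything here is hypothetical and illustrative (plain lists standing in for graphs and subgraphs); it is not ONNX Runtime's actual partitioning API:

```python
# Hypothetical toy model of calling GetCapability twice around layout
# transformation. Node and function names are illustrative only.

def get_capability(graph, supported_op_types):
    """Return the indices of nodes this toy 'EP' claims."""
    return [i for i, (op_type, _) in enumerate(graph) if op_type in supported_op_types]

# Original graph: (op_type, node_name) pairs.
graph = [("Conv", "conv1"), ("Relu", "relu1")]

# First call: the EP claims Conv and Relu and asks for NHWC layout.
first = get_capability(graph, {"Conv", "Relu"})
assert first == [0, 1]

# Layout transformation wraps the claimed nodes with new Transpose nodes.
graph = [("Transpose", "to_nhwc"), ("Conv", "conv1"),
         ("Relu", "relu1"), ("Transpose", "to_nchw")]

# Second call: the EP must also claim the new Transpose nodes so the whole
# subgraph stays fused on one EP instead of falling back to the CPU EP.
second = get_capability(graph, {"Conv", "Relu", "Transpose"})
assert second == [0, 1, 2, 3]
```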
> *IMPORTANT NOTE*: After layout transformation is done, graph resolve cannot be called for the graph. This is because graph resolve validates node shapes by calling the ONNX TypeAndShapeInferenceFunction; these type and shape inference functions can ONLY infer shapes for NCHW format inputs. Therefore, when passed a graph with NHWC nodes, the inferred shape validation fails and graph resolve throws. This is the very reason layout transformation is *NOT ENABLED* when the Graph Partitioner mode is kAssignOnly.
Suggested change (replace "graph resolve" with "Graph::Resolve" and "shape inf functions" with "shape inferencing functions"):

> *IMPORTANT NOTE* After layout transformation is done, Graph::Resolve cannot be called for the graph. This is because graph resolve validates the shape of the nodes by calling ONNX TypeAndShapeInferenceFunction, these type and shape inferencing functions can ONLY infer shapes for NCHW format inputs. Therefore, when passed a graph with NHWC nodes the inferred shape validation fails and hence graph resolve throws. This is the very reason layout transformation is *NOT ENABLED* when Graph Partitioner Mode is kAssignOnly.
Double checking why Graph::Resolve fails. I would have expected that the new nodes may not have an Op() given they're in the internal domain, and an EP won't necessarily have a static kernel for the operator, so you'd potentially hit a nullptr rather than the ONNX type/shape inferencing failing (which I'd expect to apply only to nodes in the ONNX domain).
Updated this paragraph. Please re-review and let me know if it makes sense now.
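The failure mode under discussion can be illustrated with a small hypothetical sketch. This mimics ONNX-style Conv shape inference in plain Python; the function below is an illustrative stand-in, not ORT's or ONNX's actual inference code:

```python
# Hypothetical sketch of why NCHW-only shape inference rejects NHWC graphs.

def infer_conv_shape_nchw(input_shape, num_filters, kernel):
    """Infer a Conv output shape assuming NCHW input, 'valid' padding, stride 1."""
    n, c, h, w = input_shape  # the inference function hard-codes this axis order
    kh, kw = kernel
    return (n, num_filters, h - kh + 1, w - kw + 1)

# NCHW input: the inferred shape matches the actual runtime output shape.
assert infer_conv_shape_nchw((1, 3, 224, 224), 8, (3, 3)) == (1, 8, 222, 222)

# After layout transformation, the same tensor is NHWC: (1, 224, 224, 3).
# The NCHW-assuming function misreads H as C and C as W, so its answer
# disagrees with the real NHWC output (1, 222, 222, 8) -- and a resolve step
# that validates inferred shapes against actual shapes would throw.
inferred = infer_conv_shape_nchw((1, 224, 224, 3), 8, (3, 3))
assert inferred != (1, 222, 222, 8)
```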
Description: Adding dev notes for layout transformer
Motivation and Context