dev notes for layout transformer#12396

Merged
askhade merged 8 commits into master from askhade/doc_layout_transformer
Aug 3, 2022
Conversation

@askhade
Contributor

@askhade askhade commented Jul 31, 2022

Description: Adding dev notes for layout transformer

Motivation and Context

  • Why is this change required? What problem does it solve?
  • If it fixes an open issue, please link to the issue here.

## Overview
The ONNX standard assumes NCHW format for tensor layout. However, NCHW is not the most performant format for all hardware types. Depending on the underlying hardware, getting the best performance may require converting the model, or in some cases the part of the model that will be executed on that hardware, from NCHW to NHWC. The Layout Transformer enables exactly this. It works with both ONNX and ORT format models.
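As a minimal illustration of the layout conversion itself (using NumPy, not ORT's internal implementation), an NCHW tensor becomes NHWC by permuting its axes so the channel dimension moves to the end:

```python
import numpy as np

# A batch of 2 images, 3 channels, 4x5 spatial dims, in NCHW order.
nchw = np.random.rand(2, 3, 4, 5)

# NCHW -> NHWC: move the channel axis (1) to the end.
nhwc = nchw.transpose(0, 2, 3, 1)

print(nhwc.shape)  # (2, 4, 5, 3)

# The inverse permutation restores the original layout.
assert np.array_equal(nhwc.transpose(0, 3, 1, 2), nchw)
```

In the real transformer this permutation is expressed by inserting Transpose nodes into the graph rather than by eagerly moving data.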

*Note: Currently the Layout Transformer works only for compiling EPs such as the NNAPI and QNN EPs. More work is needed for it to be compatible with EPs such as the CPU and CUDA EPs, which use static kernel registration.*
Contributor

FWIW this is largely figured out in #11912

GetCapability is always called twice if the EP asks for NHWC. The first call to GetCapability is purely for layout changes so we're only dealing with ONNX operators that the transpose operator knows about. The second call does things like convert a QDQ node group to a supported quantized op, or fusions such as Conv+activation.

It's a little obtuse as to what should be done in the first call to GetCapability vs. the second, so ideally we can refine that a bit.
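A toy sketch of the two-phase flow described above (all names and data structures here are hypothetical and simplified, not ORT's actual internals):

```python
# Hypothetical sketch: graphs are lists of op names, and an EP's
# GetCapability simply returns the ops it claims it can run.

class FakeNHWCEP:
    """Toy EP that supports Conv and Transpose and prefers NHWC."""
    prefers_nhwc = True
    supported_ops = {"Conv", "Transpose"}

    def get_capability(self, graph):
        return [op for op in graph if op in self.supported_ops]

def convert_layout(graph, claimed):
    # Layout transformation inserts Transpose nodes around claimed
    # nodes (simplified here to one before and one after each).
    out = []
    for op in graph:
        if op in claimed:
            out += ["Transpose", op, "Transpose"]
        else:
            out.append(op)
    return out

def partition(graph, ep):
    # Phase 1: layout-only call, dealing purely with ONNX operators.
    claimed = ep.get_capability(graph)
    if ep.prefers_nhwc:
        graph = convert_layout(graph, claimed)
        # Phase 2: the EP can now also claim the inserted Transpose
        # nodes, keeping the whole subgraph on one EP.
        claimed = ep.get_capability(graph)
    return graph, claimed

graph, claimed = partition(["Conv", "Relu"], FakeNHWCEP())
print(graph)    # ['Transpose', 'Conv', 'Transpose', 'Relu']
print(claimed)  # ['Transpose', 'Conv', 'Transpose']
```

The second call is what lets the EP absorb the newly inserted nodes instead of leaving them to a fallback EP.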

Contributor Author

I added more clarification around GetCapability based on the behavior on the master branch.

@askhade askhade requested a review from skottmckay August 1, 2022 21:46

GetCapability returns a set of IndexedSubGraphs that the given execution provider can run. During layout transformation, new nodes (Transpose, Gather, etc.) can be added within these subgraphs. Calling GetCapability a second time therefore ensures that the EP can claim these new nodes as well and fuse the entire subgraph. This is important for performance: without it, execution would unnecessarily switch to a fallback EP (in most cases the CPU EP).

*IMPORTANT NOTE* After layout transformation is done, graph resolve cannot be called for the graph. This is because graph resolve validates the shapes of the nodes by calling the ONNX TypeAndShapeInferenceFunction; these type and shape inference functions can ONLY infer shapes for NCHW format inputs. Therefore, when passed a graph with NHWC nodes, the inferred shape validation fails and graph resolve throws. This is the very reason layout transformation is *NOT ENABLED* when the Graph Partitioner Mode is kAssignOnly.
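A toy illustration of why NCHW-only shape inference breaks on NHWC inputs (the helper below is hypothetical, not ORT or ONNX code): an inference function that assumes channels live at axis 1 reads the wrong dimension once the tensor has been transformed to NHWC.

```python
def infer_conv_channels(input_shape):
    # ONNX-style shape inference assumes NCHW: channels at axis 1.
    return input_shape[1]

nchw_shape = (1, 3, 224, 224)   # 3-channel image, NCHW
nhwc_shape = (1, 224, 224, 3)   # same tensor after layout transform

print(infer_conv_channels(nchw_shape))  # 3: correct
print(infer_conv_channels(nhwc_shape))  # 224: wrong; validation fails
```

Validating the NHWC graph against shapes inferred this way produces mismatches, which is why graph resolve throws.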
Contributor

Suggested change
*IMPORTANT NOTE* After layout transformation is done, graph resolve cannot be called for the graph. This is because graph resolve validates the shape of the nodes by calling ONNX TypeAndShapeInferenceFunction, these type and shape inf functions can ONLY infer shapes for NCHW format inputs. Therefore, when passed a graph with NHWC nodes the inferred shape validation fails and hence graph resolve throws. This is the very reason layout transformation is *NOT ENABLED* when Graph Partitioner Mode is kAssignOnly.
*IMPORTANT NOTE* After layout transformation is done, Graph::Resolve cannot be called for the graph. This is because graph resolve validates the shape of the nodes by calling ONNX TypeAndShapeInferenceFunction, these type and shape inferencing functions can ONLY infer shapes for NCHW format inputs. Therefore, when passed a graph with NHWC nodes the inferred shape validation fails and hence graph resolve throws. This is the very reason layout transformation is *NOT ENABLED* when Graph Partitioner Mode is kAssignOnly.

@skottmckay skottmckay Aug 1, 2022


Double checking why Graph::Resolve fails. I would have expected the new nodes may not have an Op() given they're in the internal domain and an EP won't necessarily have a static kernel for the operator so you'd potentially hit a nullptr vs. the ONNX type/shape inferencing failing (which I'd expect to only apply to nodes in the ONNX domain).

Contributor Author

Updated this paragraph. Please re-review and let me know if it makes sense now.

@askhade askhade requested a review from skottmckay August 2, 2022 16:26
@askhade askhade requested a review from edgchen1 August 2, 2022 20:43
@skottmckay skottmckay left a comment


:shipit:

@askhade askhade merged commit 97268e0 into master Aug 3, 2022
@askhade askhade deleted the askhade/doc_layout_transformer branch August 3, 2022 22:16