
[xnnpack] basic QDQ operators support#11912

Merged
wejoncy merged 20 commits intomainfrom
jicwen/xnnpack_QDQConv
Aug 11, 2022

Conversation

@wejoncy (Contributor) commented Jun 20, 2022

Description:

  • QDQConv qu8 qs8 qc8
  • QDQSoftmax qu8
  • QDQAveragepool qu8
  • QMaxpool qu8 qs8

@wejoncy wejoncy requested a review from a team as a code owner June 24, 2022 08:15
@wejoncy wejoncy changed the title [xnnpack] QDQConv [xnnpack] basic QDQ operators support Jun 24, 2022
@skottmckay (Contributor):

Can you also please address the cpplint warnings.

Amadeuy previously approved these changes Jul 4, 2022
@wejoncy wejoncy requested a review from skottmckay July 8, 2022 03:17
@wejoncy wejoncy force-pushed the jicwen/xnnpack_QDQConv branch 2 times, most recently from a539394 to 03a54be Compare July 8, 2022 05:21
namespace internal_nhwc_onnx {

void RegisterInternalNHWCOpset() {
ONNX_CONTRIB_OPERATOR_SCHEMA(QLinearSoftmax)
Contributor:

Not sure about the cost of this vs. using function_utils::CreateSchema or something similar to dynamically create a schema as needed.

It makes more sense to me if we also have a CPU EP implementation of the operator.

It makes less sense to me if it's XNNPACK specific, as you're going to have to create/maintain a schema for every quantized operator (including multiple opsets) that we add XNNPACK support for.

What value does this static schema provide that a dynamically created schema does not?

Contributor Author (wejoncy):

A CPU implementation is here. #12177
I will refactor it after that PR is finished.

Contributor:

OK.

Softmax isn't a layout sensitive operator though so the domain should not be onnxruntime::kMSInternalNHWCDomain. We only want to use that for operators that specifically take NCHW input.

The transpose optimizer has a list of layout sensitive ops if you ever need to check.

static std::unordered_set<std::string_view> layout_sensitive_ops = {
    "Conv", "QLinearConv", "BatchNormalization",
    "AveragePool", "GlobalAveragePool", "MaxPool",
    "GlobalMaxPool", "LRN", "GridSample",
    "DepthToSpace", "SpaceToDepth"};

@skottmckay (Contributor) commented Jul 27, 2022

Some extra info. When the layout transformer runs (TransformLayoutForEP) it will replace each layout sensitive node with a node in the NHWC domain, and wrap Transpose operators around that node to convert from the original NCHW to NHWC. That limits the new nodes in the kMSInternalNHWCDomain domain to this list, along with a few ORT specific ones listed here.

Once that process finishes, the transpose optimizer will move the Transpose nodes and cancel as many out as possible, but that process does not involve changing the domain of the other nodes. Due to this the Softmax should stay in the ONNX domain.

Say the EP asked for Conv -> Conv -> Softmax. The first step would create Transpose(to NHWC) -> Conv(kMSInternalNHWCDomain) -> Transpose(to NCHW) -> Transpose(to NHWC) -> Conv(kMSInternalNHWCDomain) -> Transpose(to NCHW) -> Softmax.

Second step would cancel out most of the Transpose ops, and push the last one past the Softmax so it's at the boundary of the nodes requested by the EP to maximize the nodes the EP can run in NHWC layout. That results in Transpose(to NHWC) -> Conv(kMSInternalNHWCDomain) -> Conv(kMSInternalNHWCDomain) -> Softmax -> Transpose(to NCHW).

Contributor Author (wejoncy):

Thanks for explaining.
On top of that, it should be fine to register only layout-sensitive ops in kMSInternalNHWCDomain; output shape inferencing is the same as the ONNX ops.

xnn_compute_type_qu8_to_fp32,*/
};

struct QuantParam {
Contributor:

Sorry - still not understanding this. A NodeUnit has QuantParam info in the input/output defs so why do we need to duplicate it here instead of using NodeUnit?

Also not understanding 'all ops will have at least X or W or Y'. Softmax only has one input. MatMul has 'A' and 'B', not 'X' and 'W'. The quant param info in NodeUnit is attached to each input/output and will match exactly what the node uses.

Jicheng Wen added 2 commits July 25, 2022 20:02
update Xnnpack to latest

unit test

adress comments
@wejoncy wejoncy force-pushed the jicwen/xnnpack_QDQConv branch from 10be23c to 3c7ede3 Compare July 25, 2022 12:05
@lgtm-com (bot) commented Jul 25, 2022

This pull request introduces 5 alerts when merging 9c522c0 into 8d0e86d - view on LGTM.com

new alerts:

  • 5 for Uncontrolled data used in path expression

const GraphViewer& graph,
const std::unordered_set<const Node*>& supported_nodes) {
const Node* fuse_with{nullptr};
static const std::unordered_set<std::string> node_to_be_fuse = {"Conv", "MaxPool", "AveragePool"};
Contributor:

We can do it in a separate PR.

I think we need to try and support both the input and activation nodes being a QDQ group. We should think about things in terms of NodeUnit and whether we can fuse one NodeUnit to another. A supported NodeUnit will have an entry in the map to the ComputeCapability. When fusing we're updating that ComputeCapability. It's just that there may be more steps if they're QDQ groups (e.g. use the zp/scale from the activation node output in the ComputeCapability).

Jicheng Wen added 2 commits July 26, 2022 16:40
@wejoncy wejoncy requested a review from skottmckay August 3, 2022 07:44
@skottmckay (Contributor) left a comment

:shipit:

@wejoncy wejoncy merged commit 819c367 into main Aug 11, 2022
@wejoncy wejoncy deleted the jicwen/xnnpack_QDQConv branch August 11, 2022 02:12
pengwa pushed a commit that referenced this pull request Aug 16, 2022
* basic ops for mobilenet,qconv,qsoftmax,qavgpool

update Xnnpack to latest

unit test

* NodeUnit: use outputedge to replace output-node

* qdq model e2e test

* use inlinedvector to replace vector

* conv bias check

* tensorshape helpers

* Refactor xnn_op minmax

* Qlinearsoftmax schema update

* Remove qlinearsoftmax registration

Co-authored-by: Jicheng Wen <[email protected]>

3 participants