[xnnpack] basic QDQ operators support#11912
Conversation
Can you also please address the cpplint warnings.
Force-pushed from a539394 to 03a54be.
```cpp
namespace internal_nhwc_onnx {

void RegisterInternalNHWCOpset() {
  ONNX_CONTRIB_OPERATOR_SCHEMA(QLinearSoftmax)
```
Not sure about the cost of this vs. using function_utils::CreateSchema or something similar to dynamically create a schema as needed.
It makes more sense to me if we also have a CPU EP implementation of the operator.
It makes less sense to me if it's XNNPACK specific, as you're going to have to create/maintain a schema for every quantized operator (including multiple opsets) that we add XNNPACK support for.
What value does this static schema provide that a dynamically created schema does not?
A CPU implementation is here: #12177.
I will refactor this after that PR is finished.
OK.
Softmax isn't a layout sensitive operator though so the domain should not be onnxruntime::kMSInternalNHWCDomain. We only want to use that for operators that specifically take NCHW input.
The transpose optimizer has a list of layout sensitive ops if you ever need to check.
onnxruntime/onnxruntime/core/optimizer/transpose_optimizer/transpose_optimizer.cc, lines 2022 to 2025 (at c40f73a)
Some extra info. When the layout transformer runs (TransformLayoutForEP) it will replace each layout sensitive node with a node in the NHWC domain, and wrap Transpose operators around that node to convert from the original NCHW to NHWC. That limits the new nodes in the kMSInternalNHWCDomain domain to this list, along with a few ORT specific ones listed here.
Once that process finishes, the transpose optimizer will move the Transpose nodes and cancel as many out as possible, but that process does not involve changing the domain of the other nodes. Due to this the Softmax should stay in the ONNX domain.
Say the EP asked for Conv -> Conv -> Softmax. The first step would create Transpose(to NHWC) -> Conv(kMSInternalNHWCDomain) -> Transpose(to NCHW) -> Transpose(to NHWC) -> Conv(kMSInternalNHWCDomain) -> Transpose(to NCHW) -> Softmax.
Second step would cancel out most of the Transpose ops, and push the last one past the Softmax so it's at the boundary of the nodes requested by the EP to maximize the nodes the EP can run in NHWC layout. That results in Transpose(to NHWC) -> Conv(kMSInternalNHWCDomain) -> Conv(kMSInternalNHWCDomain) -> Softmax -> Transpose(to NCHW).
Thanks for explaining.
On top of that, it should be fine to register only layout-sensitive ops in kMSInternalNHWCDomain; their output shape inferencing is the same as the ONNX ops.
```cpp
  xnn_compute_type_qu8_to_fp32,*/
};

struct QuantParam {
```
Sorry - still not understanding this. A NodeUnit has QuantParam info in the input/output defs, so why do we need to duplicate it here instead of using NodeUnit?
Also not understanding 'all ops will have at least X or W or Y'. Softmax only has one input, and MatMul has 'A' and 'B', not 'X' and 'W'. The quant param info in NodeUnit is attached to each input/output and will match exactly what the node uses.
Force-pushed from 10be23c to 3c7ede3.
This pull request introduces 5 alerts when merging 9c522c0 into 8d0e86d - view the new alerts on LGTM.com.
```cpp
    const GraphViewer& graph,
    const std::unordered_set<const Node*>& supported_nodes) {
  const Node* fuse_with{nullptr};
  static const std::unordered_set<std::string> node_to_be_fuse = {"Conv", "MaxPool", "AveragePool"};
```
We can do this in a separate PR.
I think we need to try and support both the input and the activation node being a QDQ group. We should think about things in terms of NodeUnits and whether we can fuse one NodeUnit into another. A supported NodeUnit will have an entry in the map to the ComputeCapability; when fusing, we update that ComputeCapability. There may just be more steps if they're QDQ groups (e.g. use the zp/scale from the activation node's output in the ComputeCapability).
* basic ops for mobilenet: qconv, qsoftmax, qavgpool; update XNNPACK to latest; unit tests
* NodeUnit: use output edge to replace output node
* qdq model e2e test
* use InlinedVector to replace std::vector
* conv bias check
* TensorShape helpers
* refactor xnn_op minmax
* QLinearSoftmax schema update
* remove QLinearSoftmax registration

Co-authored-by: Jicheng Wen <[email protected]>