Add a non-normative table of operators by category #868
fdwr
left a comment
Thanks Anssi. Yes, I think it's useful to have an upfront table that categorizes operators like this.
*This section is non-normative.*

The WebNN API defines a set of operators required by well-known CNN and RNN, transformer and generative models that address key [[#usecases-application]]. The details of each operator are defined in the normative sections of this specification, in alphabetical order by the operator name. These operators are grouped into categories based on their functionality in the following non-normative table to give a functional overview of the API surface.
The hardest challenge with putting them all into mutually distinct groups is that some operators belong to multiple sets. For example, if we were to add sinh & cosh to WebNN, then surely they would go along with all the other math operators, but sinh/cosh/tanh are a trio, and so it would be weird to have tanh in a completely different category. Similarly, clamp is definitely a math function along with its siblings min and max, but quite rarely it's also useable for activation. So is it more a math function or an activation function? Neither - it's both. I tried putting operators into distinct categories in my listing here and eventually realized it was futile 😉, instead giving them multiple tags.
I have to admit I struggled with this part too. Some of the groupings were inherited from another exercise with @huningxin where we had to simplify things for another audience.
For web specs, dynamic elements might not be appropriate for a11y, printing and other reasons. Otherwise I'd happily port over your dynamic table to the spec (I had forgotten about this fantastic resource!).
Would a reasonable tradeoff for a static representation be to add those operators that are clearly multipurpose to multiple categories? I experimented with this idea using clamp. For those operators that are primarily associated with a particular category e.g. for historical reasons, say math functions, but are also used for other purposes, say for masking, I'd still keep them in maths. Not perfect, but...
Yes, that's what I was thinking - just include tanh and clamp under math but also activation.
OK, let's go with this simple solution. Updated to include tanh and clamp under math too.
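A minimal sketch of the "multiple categories" idea, written in Python purely for illustration. The category names and operator lists below are a small hypothetical excerpt, not the table from this PR; the point is only that a static category-to-operators table can list tanh and clamp twice, and that per-operator tags can still be derived from it.

```python
from collections import defaultdict

# Hypothetical excerpt of a category table; NOT the actual table in the PR.
CATEGORIES = {
    "Mathematics": ["abs", "clamp", "max", "min", "tanh"],
    "Activation": ["clamp", "relu", "sigmoid", "tanh"],
}


def tags_by_operator(categories):
    """Invert a category -> operators table into operator -> sorted category tags."""
    tags = defaultdict(list)
    for category, operators in categories.items():
        for op in operators:
            tags[op].append(category)
    return {op: sorted(cats) for op, cats in tags.items()}


TAGS = tags_by_operator(CATEGORIES)
```

Here `TAGS["tanh"]` and `TAGS["clamp"]` each carry both tags, while an operator such as `relu` carries one.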
index.bs
Outdated
| {{MLGraphBuilder/lesser()}}, | ||
| {{MLGraphBuilder/lesserOrEqual()}}, | ||
| {{MLGraphBuilder/logicalNot()}}, | ||
| {{MLGraphBuilder/where()}}, |
🤔 I suppose the selection operator (aka TOSA select) could be thought of as a logical operator, given it takes boolean input (all the others produce boolean output), but I always thought of it as more of a data reorganization operator like gather, gathering data respectively from input a or b, depending on the boolean indices (e.g. pseudocode kinda like gather(concat([a, b], axis:rank), cast(indices, "uint32"), axis:rank)), and where isn't under the Element-wise logical operations list either.
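To make the gather analogy concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions: `where` and `where_via_gather` are hypothetical helpers over flat lists, not WebNN calls, and the stacking order (index 0 selects from b, index 1 selects from a) is one possible choice.

```python
def where(condition, a, b):
    """Element-wise select: a[i] where condition[i] is true, otherwise b[i]."""
    return [x if c else y for c, x, y in zip(condition, a, b)]


def where_via_gather(condition, a, b):
    """Same result, phrased as a gather along a new trailing axis."""
    # Stack the inputs so that index 0 selects from b and index 1 selects from a.
    stacked = [(y, x) for x, y in zip(a, b)]
    # Cast the boolean condition to 0/1 gather indices.
    indices = [int(bool(c)) for c in condition]
    return [pair[i] for pair, i in zip(stacked, indices)]


condition = [True, False, True, False]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
```

Both helpers read every element of both inputs; only the index decides which value lands in the output.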
I parked where to "Tensor rearrangement" for now.
FYI, CoreML puts select into "control flow" category:
I can see why someone might think that, given it feels like a C ternary operator (a ? b : c), but functionally it's more like a gatherElements with 0/1 indices, as it doesn't cause any control flow change in the graph execution (unlike say ONNX If/While/Scan or CoreML cond/while_loop), and WebNN doesn't support any true control flow operators currently, meaning it would be a weirdly isolated category.
WebNN doesn't support any true control flow operators currently, meaning it would be a weirdly isolated category.
That's true.
TOSA places select into "Elementwise Ternary Operators" category: https://www.mlplatform.org/tosa/tosa_spec.html#_elementwise_ternary_operators
Should we use "element-wise binary", "element-wise logical" and "element-wise unary" categories? They are already used in the normative text.
This table currently proposes a higher-level functional grouping, and as such abstracts out "element-wise" from "logical" ops and groups both "element-wise binary" and "element-wise unary" ops under "Mathematics".
I'd propose we try to go with a higher-level grouping first, and let the normative text offer more detail where appropriate.
index.bs
Outdated
{{MLGraphBuilder/softplus()}},
{{MLGraphBuilder/softsign()}},
{{MLGraphBuilder/tanh()}},
{{MLGraphBuilder/triangular()}}
The triangular matrix masking function doesn't feel like activation, but I don't know what category it fits in either 🤔. Are there cases where trilu acts as activation that I'm unaware of?
Does it make sense to add a new "Tensor masking" / "Matrix masking" category for operators that are primarily(?) used for masking?
Experimented with this idea using triangular.
Also "tensor rearrangement" op? FYI, CoreML puts band_part into "tensor operation" category: https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.tensor_operation.band_part
Does it make sense to add a new "Tensor masking"
Also "tensor rearrangement" op?
Content with either.
Moved triangular to "Tensor manipulation" (was "Tensor rearrangement") so we can drop the "Tensor masking" category, which I did.
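For readers unfamiliar with the operator, a rough sketch of why triangular reads as masking rather than activation: whether a value survives depends only on its position, never on the value itself. The helper below is a hypothetical plain-Python stand-in that assumes the usual triu/tril semantics with `upper` and `diagonal` parameters; it is not the spec's algorithm.

```python
def triangular(matrix, upper=True, diagonal=0):
    """Keep the upper (or lower) triangle of a 2-D list and zero everything else."""
    result = []
    for i, row in enumerate(matrix):
        if upper:
            # Keep entries on or above the chosen diagonal.
            result.append([v if j - i >= diagonal else 0 for j, v in enumerate(row)])
        else:
            # Keep entries on or below the chosen diagonal.
            result.append([v if j - i <= diagonal else 0 for j, v in enumerate(row)])
    return result


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
```

An activation function, by contrast, maps each element through the same value-dependent function regardless of where it sits in the tensor.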
Incorporate review feedback and suggestions.
@fdwr, much thanks for your suggestions. I pushed an update with a new iteration to test some new ideas based on your feedback.
{{MLGraphBuilder/slice()}},
{{MLGraphBuilder/split()}},
{{MLGraphBuilder/transpose()}},
{{MLGraphBuilder/resample2d()}},
I feel resample2d is not a tensor rearrangement op, because it doesn't just rearrange the existing values, but generates new values. FYI, CoreML puts "resample" into the "image resizing" category: https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS16.image_resizing.resample
because it doesn't just rearrange the existing values, but generate new values
That is true for linear interpolation.
Notice that for nearest neighbor, resample is functionally identical to integer-factor upsampling (or blockwise expansion, like that used in dequantizeLinear). One can even implement dequantizeLinear's blockwise expansion step via a backend's resampling operator in nearest-neighbor mode (and I was thinking of augmenting expand to accept block sizes for better functional decomposition of Q/DQ). So, there is clearly a familial relationship between spatial expansion and resampling, along with other spatial interpolation functions like ROI alignment and warped grid sampling.
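A small 1D sketch of that equivalence, in plain Python. Both helpers are hypothetical illustrations rather than WebNN or backend calls, and the nearest-neighbor mapping assumed here is input index = floor(output index / scale); for integer scale factors a half-pixel-center mapping picks the same source element, but real backends may differ for non-integer scales.

```python
def resample_nearest_1d(values, scale):
    """Nearest-neighbor resampling by an integer factor (source index = i // scale)."""
    return [values[i // scale] for i in range(len(values) * scale)]


def expand_blockwise_1d(values, block_size):
    """Repeat each element block_size times, as in blockwise scale expansion."""
    return [v for v in values for _ in range(block_size)]


# Hypothetical per-block scales, one per block of 3 elements.
scales = [0.5, 2.0, 4.0]
```

With a block size of 3, both helpers turn the three per-block scales into nine per-element scales, and no new values are generated.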
into "image resizing" category:
It's true that these spatial interpolation functions are often used with imagery, but 1D resampling is useful for audio rescaling too (so I wouldn't limit it to just "image resizing"). Maybe there's a category name that better encompasses all these spatial remappings/reprojections? Anssi originally had the category name of "Tensor manipulation", which he changed per my comment because I found it a little too generic, but would you find that amenable? "Spatial modification"? "Spatial manipulation"? "Tensor reprojection"?
but 1D resampling is useful for audio rescaling too (so I wouldn't limit it to just "image resizing")
Agreed. For 2D resample/resize, TOSA also places it under the "image operators" category: https://www.mlplatform.org/tosa/tosa_spec.html#_image_operators
Switched back to the more generic "Tensor manipulation" name for now so that resample feels welcome.
fdwr
left a comment
Thanks for the updates AK. Looks good to me after tanh/clamp dual citizenship and deciding on a name for the spatial remapping category.
{{MLGraphBuilder/gatherElements()}},
{{MLGraphBuilder/scatterElements()}},
{{MLGraphBuilder/gatherND()}},
{{MLGraphBuilder/scatterND()}},
Both CoreML and TOSA have a dedicated "gather/scatter" category:
https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#module-coremltools.converters.mil.mil.ops.defs.iOS15.scatter_gather
https://www.mlplatform.org/tosa/tosa_spec.html#_scattergather_operators
</td>
</tr>
<tr>
<td>Tensor casting</td>
FYI, TOSA groups cast and scale (equivalent to quantizeLinear/dequantizeLinear) to "Type Conversion" category: https://www.mlplatform.org/tosa/tosa_spec.html#_type_conversion
I'm happy either way on this: having Q/DQ in their own category, or putting them as part of data type conversion.
Incorporate more review feedback and suggestions.
@fdwr @huningxin thanks for your suggestions. I pushed another update. PTAL if I missed any feedback: fed3ae8 (This has been a fun exercise and a reminder that naming in this field is hard and does not always agree with other fields: statistics, physics, information theory... While "Web API for chains of differentiable, parameterized geometric functions" would be a more accurate name, we call this spec the "Web Neural Network API" ;-))

I will merge this first stab to shorten the open PR queue. Thank you for your insights @fdwr and @huningxin! Please feel free to continue adjusting this table as you see fit. A natural check point for adjustment is when new ops are being added or existing ones removed, i.e. the moment when this table needs to be updated manually. I hope that won't become a chore but is actually a useful exercise to reason about our op coverage.
SHA: f2f5f93 Reason: push, by anssiko Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…g#868) With contributions from Ningxin and Dwayne.
This non-normative table with functional grouping of operators complements the normative definition of operators that are currently in alphabetical order.
I tried to check we're covering all the operators, but I have likely missed some.
I proactively added roundEven #859. Feedback welcome whether this makes sense and what would make for a good categorization. The caveat is this table needs to be manually maintained alongside the normative definition, but that'd be just one extra line.
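Since the table is maintained by hand, a coverage check along these lines could help catch missed operators. This is only a sketch in Python under stated assumptions: the operator list and categories are hypothetical sample data, and nothing here reflects tooling that exists in the repository.

```python
def uncategorized(all_operators, categories):
    """Return operators that appear in no category, sorted by name."""
    categorized = {op for ops in categories.values() for op in ops}
    return sorted(set(all_operators) - categorized)


def unknown(all_operators, categories):
    """Return categorized names that are not known operators (e.g. typos)."""
    categorized = {op for ops in categories.values() for op in ops}
    return sorted(categorized - set(all_operators))


# Hypothetical sample data, not extracted from index.bs.
ALL_OPERATORS = ["clamp", "tanh", "transpose", "where"]
SAMPLE_CATEGORIES = {
    "Mathematics": ["clamp", "tanh"],
    "Activation": ["clamp", "tanh"],
    "Tensor manipulation": ["where", "wheer"],
}
```

With this sample data, `uncategorized` reports `transpose` as missing from the table and `unknown` flags the misspelled `wheer`; operators listed under several categories are counted once.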