Add a non-normative table of operators by category #868

Merged: anssiko merged 4 commits into main from ops-by-category on Jun 17, 2025
Member

@anssiko anssiko commented Jun 11, 2025

This non-normative table with functional grouping of operators complements the normative definition of operators that are currently in alphabetical order.

I tried to check we're covering all the operators, but it is very likely I may have missed some. I proactively added roundEven #859. Feedback welcome whether this makes sense and what would make for a good categorization.

The caveat is this table needs to be manually maintained alongside the normative definition, but that'd be just one extra line.
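
The manual-maintenance caveat could be eased with a small consistency check. The following is a minimal sketch and not part of this PR: the operator names are an illustrative subset taken from this thread, the category assignments are placeholders, and the helpers `missing_from_table` and `unknown_in_table` are hypothetical.

```python
# Hypothetical subset of operators that have a normative definition.
NORMATIVE_OPS = {"cast", "clamp", "slice", "tanh", "transpose", "where"}

# Hypothetical category table; names and assignments are placeholders.
CATEGORY_TABLE = {
    "Mathematics": ["clamp", "roundEven", "tanh"],
    "Tensor manipulation": ["transpose", "where"],
    "Tensor casting": ["cast"],
}


def _categorized(table):
    """Collect every operator name that appears in any category."""
    return {op for ops in table.values() for op in ops}


def missing_from_table(normative_ops, table):
    """Normative operators that no category lists, sorted by name."""
    return sorted(set(normative_ops) - _categorized(table))


def unknown_in_table(normative_ops, table):
    """Table entries without a normative definition, sorted by name."""
    return sorted(_categorized(table) - set(normative_ops))
```

With this sample data, the first helper would flag `slice` as uncategorized and the second would flag `roundEven` as listed ahead of its normative definition.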



@anssiko anssiko requested review from fdwr and huningxin June 11, 2025 17:04
Collaborator

@fdwr fdwr left a comment

Thanks Anssi. Yes, I think it's useful to have an upfront table that categorizes operators like this.


*This section is non-normative.*

The WebNN API defines a set of operators required by well-known CNN and RNN, transformer and generative models that address key [[#usecases-application]]. The details of each operator are defined in the normative sections of this specification, in alphabetical order by the operator name. These operators are grouped into categories based on their functionality in the following non-normative table to give a functional overview of the API surface.
Collaborator

The hardest challenge with putting them all into mutually distinct groups is that some operators belong to multiple sets. For example, if we were to add sinh & cosh to WebNN, then surely they would go along with all the other math operators, but sinh/cosh/tanh are a trio, and so it would be weird to have tanh in a completely different category. Similarly, clamp is definitely a math function along with its siblings min and max, but quite rarely it's also useable for activation. So is it more a math function or an activation function? Neither - it's both. I tried putting operators into distinct categories in my listing here and eventually realized it was futile 😉, instead giving them multiple tags.

Member Author

I have to admit I struggled with this part too. Some of the groupings were inherited from another exercise with @huningxin where we had to simplify things for another audience.

For web specs, dynamic elements might not be appropriate for a11y, printing and other reasons. Otherwise I'd happily port over your dynamic table to the spec (I had forgotten about this fantastic resource!).

Would a reasonable tradeoff for a static representation be to add those operators that are clearly multipurpose to multiple categories? I experimented with this idea using clamp. For those operators that are primarily associated with a particular category e.g. for historical reasons, say math functions, but are also used for other purposes, say for masking, I'd still keep them in maths. Not perfect, but...

Collaborator

Yes, that's what I was thinking - just include tanh and clamp under math but also activation.

Member Author

OK, let's go with this simple solution. Updated to include tanh and clamp under math too.
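
The "list multipurpose operators under every category they belong to" approach can be sketched as a plain mapping that is then inverted. This is only an illustration: the operator lists are a small subset mentioned in this thread, the category names are placeholders rather than the table's final wording, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Illustrative subset only; category names are placeholders.
CATEGORIES = {
    "Activation": ["clamp", "softplus", "softsign", "tanh"],
    "Mathematics": ["clamp", "max", "min", "tanh"],
}


def categories_by_operator(table):
    """Invert category -> operators into operator -> sorted categories."""
    inverted = defaultdict(list)
    for category, ops in table.items():
        for op in ops:
            inverted[op].append(category)
    return {op: sorted(cats) for op, cats in inverted.items()}


def multipurpose(table):
    """Operators listed under more than one category, sorted by name."""
    by_op = categories_by_operator(table)
    return sorted(op for op, cats in by_op.items() if len(cats) > 1)
```

Here `multipurpose(CATEGORIES)` would report `clamp` and `tanh`, the two operators given dual membership in this discussion.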

index.bs Outdated
{{MLGraphBuilder/lesser()}},
{{MLGraphBuilder/lesserOrEqual()}},
{{MLGraphBuilder/logicalNot()}},
{{MLGraphBuilder/where()}},
Collaborator

@fdwr fdwr Jun 11, 2025

🤔 I suppose the selection operator (aka TOSA select) could be thought of as a logical operator, given it takes boolean input (all the others produce boolean output), but I always thought of it as more of a data reorganization operator like gather, gathering data respectively from input a or b, depending on the boolean indices (e.g. pseudocode kinda like gather(concat([a, b], axis:rank), cast(indices, "uint32"), axis:rank)), and where isn't under the Element-wise logical operations list either.
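
The gather analogy in the pseudocode above can be sketched in plain Python for flat, equally sized lists (no broadcasting). This is an illustration under assumptions, not WebNN code: both function names are hypothetical, and the false value is stacked first so that a condition of 1 indexes the true value, which differs from the `[a, b]` order in the pseudocode.

```python
def where(condition, true_value, false_value):
    """Element-wise select: take true_value where condition is non-zero."""
    return [t if c else f for c, t, f in zip(condition, true_value, false_value)]


def where_via_gather(condition, true_value, false_value):
    """Same result, phrased as a gather along a new trailing axis.

    Stacking places (false, true) pairs on the last axis, so the condition
    cast to an integer 0/1 serves as the index along that axis.
    """
    stacked = [[f, t] for t, f in zip(true_value, false_value)]
    indices = [int(bool(c)) for c in condition]
    return [pair[i] for pair, i in zip(stacked, indices)]
```

Both functions return the same list for any inputs of equal length, which is the sense in which selection reorganizes data rather than altering control flow.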

Member Author

I parked where to "Tensor rearrangement" for now.

Contributor

Collaborator

> FYI, CoreML puts select into "control flow" category:

I can see why someone might think that, given it feels like a C ternary operator (a ? b : c), but functionally it's more like a gatherElements with 0/1 indices, as it doesn't cause any control flow change in the graph execution (unlike say ONNX If/While/Scan or CoreML cond/while_loop), and WebNN doesn't support any true control flow operators currently, meaning it would be a weirdly isolated category.

Contributor

@huningxin huningxin Jun 13, 2025

> WebNN doesn't support any true control flow operators currently, meaning it would be a weirdly isolated category.

That's true.

TOSA places select into "Elementwise Ternary Operators" category: https://www.mlplatform.org/tosa/tosa_spec.html#_elementwise_ternary_operators

Should we use "element-wise binary", "element-wise logical" and "element-wise unary" categories? They are already used in the normative text.

Member Author

This table currently proposes a higher-level functional grouping, and as such abstracts out "element-wise" from "logical" ops and groups both "element-wise binary" and "element-wise unary" ops under "Mathematics".

I'd propose we try to go with a higher-level grouping first, and let the normative text offer more detail where appropriate.

index.bs Outdated
{{MLGraphBuilder/softplus()}},
{{MLGraphBuilder/softsign()}},
{{MLGraphBuilder/tanh()}},
{{MLGraphBuilder/triangular()}}
Collaborator

The triangular matrix masking function doesn't feel like activation, but I don't know what category it fits in either 🤔. Are there cases where trilu acts as activation that I'm unaware of?

Member Author

Does it make sense to add a new "Tensor masking" / "Matrix masking" category for operators that are primarily(?) used for masking?

Experimented with this idea using triangular.

Contributor

Collaborator

> Does it make sense to add a new "Tensor masking"

> Also "tensor rearrangement" op?

Content with either.

Member Author

Moved triangular to "Tensor manipulation" (was "Tensor rearrangement") so we can drop the "Tensor masking" category, which I did.

Incorporate review feedback and suggestions.
@anssiko
Member Author

anssiko commented Jun 12, 2025

@fdwr, much thanks for your suggestions. I pushed an update with a new iteration to test some new ideas based on your feedback.

{{MLGraphBuilder/slice()}},
{{MLGraphBuilder/split()}},
{{MLGraphBuilder/transpose()}},
{{MLGraphBuilder/resample2d()}},
Contributor

I feel resample2d is not a tensor rearrangement op, because it doesn't just rearrange the existing values, but generates new values. FYI, CoreML puts "resample" into "image resizing" category: https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS16.image_resizing.resample

Collaborator

@fdwr fdwr Jun 12, 2025

> because it doesn't just rearrange the existing values, but generates new values

That is true for linear interpolation.

Notice for nearest neighbor that resample is functionally identical to integral upsampling (or blockwise expansion, like that used in dequantizeLinear). One can even implement dequantizeLinear's blockwise expansion step via a backend's resampling operator in nearest-neighbor (NN) mode (and I was thinking to augment expand to accept block sizes for better functional decomposition of Q/DQ). So, there is clearly a familial relationship between spatial expansion and resampling, along with other spatial interpolation functions like ROI alignment and warped grid sampling.
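
The equivalence claimed here for integer scale factors can be sketched in one dimension. This is an illustration only, not backend code: both function names are hypothetical, and the `i // scale` coordinate mapping is one common nearest-neighbor convention assumed for the sketch.

```python
def resample_nearest_1d(values, scale):
    """Nearest-neighbor upsampling by an integer scale factor.

    Output index i reads input index i // scale (assumed mapping; other
    rounding conventions exist for non-integer scales).
    """
    return [values[i // scale] for i in range(len(values) * scale)]


def expand_blockwise_1d(values, block_size):
    """Repeat each element block_size times, as in blockwise expansion."""
    return [v for v in values for _ in range(block_size)]
```

For any integer factor the two functions produce the same list, for example when a per-block scale vector is stretched to the length of the tensor it applies to.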

> into "image resizing" category:

It's true that these spatial interpolation functions are often used with imagery, but 1D resampling is useful for audio rescaling too (so I wouldn't limit it to just "image resizing"). Maybe there's a category name that better encompasses all these spatial remappings/reprojections? Anssi originally had the category name of "Tensor manipulation", which he changed per my comment because I found it a little too generic, but would you find that amenable? "Spatial modification"? "Spatial manipulation"? "Tensor reprojection"?

Contributor

> but 1D resampling is useful for audio rescaling too (so I wouldn't limit it to just "image resizing")

Agreed. For 2D resample/resize, TOSA also places it under "image operators" category: https://www.mlplatform.org/tosa/tosa_spec.html#_image_operators

Member Author

Switched back to the more generic "Tensor manipulation" name for now so that resample feels welcome.

Collaborator

@fdwr fdwr left a comment

Thanks for the updates AK. Looks good to me after tanh/clamp dual citizenship and deciding on a name for the spatial remapping category.


{{MLGraphBuilder/gatherElements()}},
{{MLGraphBuilder/scatterElements()}},
{{MLGraphBuilder/gatherND()}},
{{MLGraphBuilder/scatterND()}},
Contributor

</td>
</tr>
<tr>
<td>Tensor casting</td>
Contributor

FYI, TOSA groups cast and scale (equivalent to quantizeLinear/dequantizeLinear) to "Type Conversion" category: https://www.mlplatform.org/tosa/tosa_spec.html#_type_conversion

Collaborator

I'm happy either way on this, having Q/DQ in their own category or putting them as part of data type conversion.

Incorporate more review feedback and suggestions.
@anssiko
Member Author

anssiko commented Jun 13, 2025

@fdwr @huningxin thanks for your suggestions. I pushed another update. PTAL if I missed any feedback: fed3ae8

(This has been a fun exercise and a reminder that naming in this field is hard and does not always agree with other fields: statistics, physics, information theory... While "Web API for chains of differentiable, parameterized geometric functions" would be a more accurate name, we call this spec the "Web Neural Network API" ;-))

Collaborator

@fdwr fdwr left a comment

👍

Member Author

anssiko commented Jun 17, 2025

I will merge this first stab to shorten the open PR queue. Thank you for your insights @fdwr and @huningxin!

Please feel free to continue adjusting this table as you see fit. A natural checkpoint for adjustment is when new ops are being added or existing ones removed, i.e. the moment when this table needs to be updated manually. I hope that won't become a chore but is actually a useful exercise to reason about our op coverage.

@anssiko anssiko merged commit f2f5f93 into main Jun 17, 2025
2 checks passed
@anssiko anssiko deleted the ops-by-category branch June 17, 2025 09:54
github-actions bot added a commit that referenced this pull request Jun 17, 2025
SHA: f2f5f93
Reason: push, by anssiko

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
zolkis pushed a commit to zolkis/webnn that referenced this pull request Jun 17, 2025