`MLActivation` currently has two uses:
1. With operators like `batchNormalization` and `conv2d`, as a hint that this activation may be fusable with the given operator
2. With recurrent operators like `lstm` and `gru`, as activations applied to gates within the given recurrent unit
| Operator | Recurrent? | Purpose |
|---|---|---|
| `batchNormalization` | No | Maybe fusable |
| `conv2d` | No | Maybe fusable |
| `convTranspose2d` | No | Maybe fusable |
| `gru` | Yes | Applied to gate |
| `gruCell` | Yes | Applied to gate |
| `lstm` | Yes | Applied to gate |
| `lstmCell` | Yes | Applied to gate |
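For reference, the two uses look something like this in script. This is a sketch against the WebNN API shape at the time of this issue — the `activation`/`activations` option names and the `gru()` signature are taken from the then-current spec, the parameter names are mine, and it only runs in a WebNN-enabled browser:

```javascript
// Sketch only: `builder` is an MLGraphBuilder from a WebNN-enabled browser.
function buildBoth(builder, convInput, filter, gruInput, weight, recurrentWeight, steps, hiddenSize) {
  // Use (1): an MLActivation passed to conv2d as a "maybe fusable" hint.
  const conv = builder.conv2d(convInput, filter, { activation: builder.relu() });
  // Use (2): MLActivations applied to the gates inside each GRU step
  // (sigmoid/tanh are also the spec defaults).
  const outputs = builder.gru(gruInput, weight, recurrentWeight, steps, hiddenSize, {
    activations: [builder.sigmoid(), builder.tanh()],
  });
  return { conv, outputs };
}
```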
I have some thoughts on (2), but I'll leave that to another issue. Let's focus on (1) here :)
My claim is that there's no benefit to passing `MLActivation`s as parameters to `MLOperand`s for the purpose of op fusion. Here's why:
- False positives: For any given permutation of operator, activation, and backend, op fusion may not be possible (in fact, fusion is quite unlikely for most combinations)
  - CoreML does not natively support fusing activations with any of these operators
  - DirectML only supports fusing some activations with some operators. There's an existing Chromium bug to un-fuse `MLActivation`s from their respective `MLOperand` if the combo is not supported by the given version of DML
- False negatives: For any given operator which does not currently take an `MLActivation` in the WebNN spec, it may in fact be fusable with its input or output
  - DML claims to support op fusion for `DML_OPERATOR_ELEMENT_WISE_ADD1` and `DML_OPERATOR_GEMM`, which roughly map to WebNN's `add` and `gemm` operators
  - TFLite supports fusing various operators as well
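In other words, a backend that can fuse will do so in its own lowering pass, regardless of what WebNN's operator signatures say. A minimal sketch of such a pass over a made-up graph representation — the node shape, the IDs, and the `canFuse` predicate are all hypothetical, not Chromium's actual code:

```javascript
// Hypothetical graph IR: each node is { id, op, inputs }, and a fused node
// additionally carries an `activation` field. Illustrative only.
function fuseActivations(nodes, canFuse) {
  // Collect the consumers of each node's output.
  const consumers = new Map();
  for (const node of nodes) {
    for (const input of node.inputs) {
      consumers.set(input, (consumers.get(input) ?? []).concat([node]));
    }
  }
  const removed = new Set();
  for (const node of nodes) {
    const users = consumers.get(node.id) ?? [];
    // Fuse only when the activation is the node's sole consumer and the
    // backend reports this (operator, activation) combo as supported.
    if (users.length === 1 && canFuse(node.op, users[0].op)) {
      const act = users[0];
      node.activation = act.op;
      node.id = act.id; // fused node now produces the activation's output
      removed.add(act);
    }
  }
  return nodes.filter((n) => !removed.has(n));
}

// Example: fuse a WebNN-style `add` with a following `relu`, even though
// WebNN's `add` takes no MLActivation parameter.
const fused = fuseActivations(
  [
    { id: 'c', op: 'add', inputs: ['a', 'b'] },
    { id: 'd', op: 'relu', inputs: ['c'] },
  ],
  (op, activation) => op === 'add' && activation === 'relu'
);
// fused: [{ id: 'd', op: 'add', inputs: ['a', 'b'], activation: 'relu' }]
```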
What this means in practice is:
- Implementations wrapping backends which can't fuse operators, or can't fuse some operators with some activations, must trivially break apart the `MLOperand` and its `MLActivation` into what's effectively just two `MLOperand`s connected to each other (as Chromium's CoreML backend currently does): input → operator → activation → output
- Implementations wrapping backends which can fuse operators must do an optimization pass anyways to fuse operators which do not have `MLActivation` parameters in WebNN (as Chromium's DML backend currently does)
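The trivial break-apart step in the first case could look like this sketch, using a made-up graph representation (hypothetical, not Chromium's actual CoreML code): each node carrying a fused activation is rewritten into two plain nodes chained as input → operator → activation → output.

```javascript
// Hypothetical inverse pass for backends that can't fuse: split each node
// carrying an `activation` field back into two chained nodes.
function unfuseActivations(nodes) {
  const result = [];
  for (const node of nodes) {
    if (node.activation === undefined) {
      result.push(node);
      continue;
    }
    const { activation, ...rest } = node;
    const innerId = `${node.id}_prefused`; // fresh edge between the two nodes
    result.push({ ...rest, id: innerId }); // the original operator
    result.push({ id: node.id, op: activation, inputs: [innerId] }); // the activation
  }
  return result;
}

const split = unfuseActivations([
  { id: 'd', op: 'conv', inputs: ['x', 'w'], activation: 'relu' },
]);
// split: [{ id: 'd_prefused', op: 'conv', inputs: ['x', 'w'] },
//         { id: 'd', op: 'relu', inputs: ['d_prefused'] }]
```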
Whether a given operator can be fused with a given activation is a very backend-specific quirk. Presumably we don't want to plumb through a new `MLActivation` parameter to the web for every operator which any backend decides it can now fuse with some activation! This seems best left as an implementation detail, handled either by the user agent (as described above) or by the framework (who knows how much op fusion is happening in Core ML under the hood?! 🤔)
Thoughts?