
Consider removing MLActivation parameters used for op fusion #658

@a-sully

MLActivation currently has two uses:

  1. With operators like batchNormalization and conv2d, as a hint that this activation may be fusable with the given operator
  2. With recurrent operators like lstm and gru, as activations applied to gates within the given recurrent unit
| Operator | Recurrent? | Purpose |
| --- | --- | --- |
| batchNormalization | No | Maybe fusable |
| conv2d | No | Maybe fusable |
| convTranspose2d | No | Maybe fusable |
| gru | Yes | Applied to gate |
| gruCell | Yes | Applied to gate |
| lstm | Yes | Applied to gate |
| lstmCell | Yes | Applied to gate |

I have some thoughts on (2), but I'll leave that to another issue. Let's focus on (1) here :)

My claim is that there's no benefit to passing MLActivations as parameters to MLOperands for the purpose of op fusion. Here's why:

  1. False positives: For any given combination of operator, activation, and backend, op fusion may not be possible (in fact, support is quite unlikely)
    • CoreML does not natively support fusing activations with any of these operators
    • DirectML only supports fusing some activations with some operators. There's an existing Chromium bug to un-fuse MLActivations from their respective MLOperand if the combo is not supported for the given version of DML
  2. False negatives: Conversely, any given operator which does not currently take an MLActivation in the WebNN spec may in fact be fusable with its input or output
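To make the mismatch concrete, here's a minimal sketch. The capability table below is entirely hypothetical (real support sets are backend- and version-specific, per the Chromium DML bug mentioned above), but it shows how the spec-level hint can both over-promise and under-promise fusion:

```python
# Hypothetical per-backend fusion support table. The (op, activation) pairs
# here are illustrative only, not any real backend's actual capabilities.
FUSABLE = {
    "directml": {("conv2d", "relu"), ("add", "relu")},
    "coreml": set(),  # CoreML does not natively fuse activations with these ops
}

def can_fuse(backend: str, op: str, activation: str) -> bool:
    """Whether this backend can actually fuse `activation` into `op`."""
    return (op, activation) in FUSABLE.get(backend, set())

# False positive: the spec lets callers attach an activation to conv2d,
# but a CoreML-style backend cannot honor the hint and must split the nodes.
assert not can_fuse("coreml", "conv2d", "relu")

# False negative: WebNN's `add` takes no MLActivation parameter, yet a
# DML-style backend may fuse `add` + `relu` anyway during its own pass.
assert can_fuse("directml", "add", "relu")
```

The spec-level MLActivation parameter can't anticipate either column of this table, which is the crux of both failure modes.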

What this means in practice is:

  • Implementations wrapping backends which can't fuse operators, or can't fuse some operators with some activations, must trivially break apart the MLOperand and its MLActivation into what's effectively just two MLOperands connected to each other (as Chromium's CoreML backend currently does):
       input
         |
      operator
         |
     activation
         |
       output
    
  • Implementations wrapping backends which can fuse operators must do an optimization pass anyway to fuse operators which do not have MLActivation parameters in WebNN (as Chromium's DML backend currently does)

Whether a given operator can be fused with a given activation is a very backend-specific quirk. Presumably we don't want to plumb a new MLActivation parameter through to the web for every operator which any backend decides it can now fuse with some activation! This seems best left as an implementation detail, handled either by the user agent (as described above) or by the framework (who knows how much op fusion is happening in Core ML under the hood?! 🤔)

Thoughts?
