Skip to content

Operation-specific APIs #2

@jbingham

Description

@jbingham

Operation-specific APIs

This is a proposal to define and implement a small number of standalone APIs for individual compute-intensive operations (like convolution 2D and matrix multiplication) that are often the target of hardware acceleration. The APIs would be atomic, and would not be tied to a graph or model loader implementation. It would be up to javascript libraries or WASM to call into these low-level APIs.

Short description

Across many common machine learning models, there are a handful of compute-intensive operations that may account for 90-99% of inference time, based on the benchmarking done for Web NN. If these few operations were offered as standalone APIs, hardware acceleration could give much of the performance benefit with a small simple API surface, without needing to define all of the many other instructions and graph topology needed for a higher-level API like a graph or model loader. As a benefit, it ought to be faster to get this handful of APIs shipped.

JavaScript ML libraries would need to be updated to take advantage of the APIs, just like they can take advantage of Web GL today.

Example use cases

Image classification typically uses convolution and matrix multiplication. With hardware accelerated versions of these two operations, the performance boost would be close to the optimal that could be achieved with a complete graph or model execution API.

A rough idea or two about implementation

Maybe the closest example is Web GL compute shaders, except that these operations would be much simpler.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions