Operation-specific APIs
This is a proposal to define and implement a small number of standalone APIs for individual compute-intensive operations (such as 2D convolution and matrix multiplication) that are often the target of hardware acceleration. The APIs would be atomic and would not be tied to a graph or model loader implementation. It would be up to JavaScript libraries or WebAssembly code to call into these low-level APIs.
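To make the idea of an atomic, graph-free operation concrete, here is a minimal sketch of what one such API could look like to a caller. The interface name `MatMulOp`, the method signature, and the row-major `Float32Array` layout are all assumptions made for illustration; the proposal does not define them. The plain CPU loop stands in for whatever accelerated implementation a browser might provide.

```typescript
// Hypothetical shape of a standalone operation API (not defined by the proposal).
interface MatMulOp {
  // Multiplies an m x k matrix by a k x n matrix; both are row-major.
  matmul(a: Float32Array, b: Float32Array, m: number, k: number, n: number): Float32Array;
}

// Reference CPU implementation, standing in for a hardware-accelerated one.
const referenceOps: MatMulOp = {
  matmul(a, b, m, k, n) {
    if (a.length !== m * k || b.length !== k * n) {
      throw new RangeError("matrix data does not match the given shape");
    }
    const out = new Float32Array(m * n); // zero-initialized
    for (let i = 0; i < m; i++) {
      for (let p = 0; p < k; p++) {
        const aip = a[i * k + p];
        for (let j = 0; j < n; j++) {
          out[i * n + j] += aip * b[p * n + j];
        }
      }
    }
    return out;
  },
};

// A single call with no graph, model, or session object involved.
const product = referenceOps.matmul(
  new Float32Array([1, 2, 3, 4]),
  new Float32Array([5, 6, 7, 8]),
  2, 2, 2,
);
```

The point of the sketch is the call shape: one function call per operation, with inputs and outputs as plain typed arrays.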
Short description
Across many common machine learning models, a handful of compute-intensive operations may account for 90-99% of inference time, based on the benchmarking done for WebNN. If these few operations were offered as standalone APIs, hardware acceleration could deliver much of the performance benefit through a small, simple API surface, without the need to define the many other instructions and the graph topology required by a higher-level API such as a graph or model loader. A further benefit is that this handful of APIs ought to be faster to ship.
JavaScript ML libraries would need to be updated to take advantage of the APIs, just as they can take advantage of WebGL today.
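One way a library might adopt such an API is by feature detection with a fallback, much as libraries probe for WebGL today. The following is a rough sketch only: the `acceleratedOps` property and its `matmul` member are hypothetical names, not part of any existing browser API.

```typescript
type MatMulFn = (
  a: Float32Array, b: Float32Array, m: number, k: number, n: number,
) => Float32Array;

// Plain JavaScript fallback used when no accelerated operation is exposed.
const fallbackMatMul: MatMulFn = (a, b, m, k, n) => {
  const out = new Float32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let p = 0; p < k; p++) sum += a[i * k + p] * b[p * n + j];
      out[i * n + j] = sum;
    }
  }
  return out;
};

// `acceleratedOps` is a hypothetical host object, assumed for illustration.
interface HostWithOps {
  acceleratedOps?: { matmul?: MatMulFn };
}

// Picks the accelerated operation when the host exposes it, else the fallback.
function selectMatMul(host: HostWithOps): MatMulFn {
  return host.acceleratedOps?.matmul ?? fallbackMatMul;
}
```

In a browser, a library could call `selectMatMul(globalThis as HostWithOps)` once at start-up and route every matrix multiplication through the returned function.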
Example use cases
Image classification typically relies on convolution and matrix multiplication. With hardware-accelerated versions of these two operations, the performance boost would be close to the best that could be achieved with a complete graph or model execution API.
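For reference, the convolution used in image models is a small, self-contained computation, which is part of why it suits a standalone API. Below is a minimal sketch of a single-channel 2D convolution with stride 1 and no padding, computed as cross-correlation, as is conventional in ML frameworks. The function name `conv2dReference` and its parameter layout are assumptions for illustration; a real API would also need to cover channels, batches, strides, and padding.

```typescript
// Single-channel 2D convolution (cross-correlation), stride 1, no padding.
// `input` is height x width and `kernel` is kh x kw, both row-major.
function conv2dReference(
  input: Float32Array, height: number, width: number,
  kernel: Float32Array, kh: number, kw: number,
): Float32Array {
  if (input.length !== height * width || kernel.length !== kh * kw) {
    throw new RangeError("data does not match the given shape");
  }
  if (kh > height || kw > width) {
    throw new RangeError("kernel is larger than the input");
  }
  const outH = height - kh + 1;
  const outW = width - kw + 1;
  const out = new Float32Array(outH * outW);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let sum = 0;
      // Slide the kernel window over the input at position (y, x).
      for (let ky = 0; ky < kh; ky++) {
        for (let kx = 0; kx < kw; kx++) {
          sum += input[(y + ky) * width + (x + kx)] * kernel[ky * kw + kx];
        }
      }
      out[y * outW + x] = sum;
    }
  }
  return out;
}

// 3 x 3 input with a 2 x 2 kernel that sums the two main-diagonal entries.
const featureMap = conv2dReference(
  new Float32Array([1, 2, 3, 4, 5, 6, 7, 8, 9]), 3, 3,
  new Float32Array([1, 0, 0, 1]), 2, 2,
);
```

The four nested loops are the part a hardware-accelerated implementation would replace; the caller-facing contract stays the same.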
A rough idea or two about implementation
Maybe the closest existing example is WebGL compute shaders, except that these operations would be much simpler.