Description
Opening this issue to follow up on the operation-specific APIs discussion of the 3/18 WebML CG call. @pyu10055 @wchao1115 @anssiko @jbingham, please take a look.
Use case
This is one scenario of the framework op-level execution use case (more details can be found in the operation-specific API proposal). A JavaScript ML framework executes ops on the CPU device with WebAssembly. For compute-intensive ops, such as conv2d or matmul, the framework also wants to use the WebNN API to execute the op (as a single-op MLGraph) with ML-specific instructions, such as Vector Neural Network Instructions (VNNI), on the same CPU device.
Requirements
WebNN should allow frameworks to create an MLContext for the CPU device. This would avoid unnecessary data copying across devices when frameworks use WebAssembly on the CPU to execute other ops.
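For illustration, requesting a CPU-bound context might look like the following sketch. It assumes the `navigator.ml.createContext({ deviceType })` shape from the WebNN spec draft; exact option names may differ across spec versions:

```javascript
// Sketch: request an MLContext bound to the CPU device so that buffers can
// be shared with WebAssembly-executed ops without cross-device copies.
// Assumes the navigator.ml.createContext({ deviceType }) shape from the
// WebNN spec draft; names may differ in other spec versions.
function createCpuContext() {
  if (typeof navigator === 'undefined' || !('ml' in navigator)) {
    return null; // WebNN not available in this environment
  }
  return navigator.ml.createContext({ deviceType: 'cpu' });
}
```

A framework would create this context once and build all of its single-op graphs against it, so tensors stay on the CPU device alongside the WebAssembly heap.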
WebNN should allow frameworks to control when the output data is available for access. This would avoid unnecessary tensor layout conversions between the native ML API and WebNN. Some background:
- Some native ML APIs use hardware dependent memory layout for acceleration, for example oneDNN uses different blocked memory layouts for better vectorization and cache reuse on different platforms.
- The memory layout conversions are expensive.
- Frameworks may use the WebNN API to execute multiple ops (via multiple single-op MLGraphs) without accessing the intermediate results between them.
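To make the first bullet concrete, here is a sketch of converting a plain NCHW tensor into a oneDNN-style blocked layout (nChw8c: channels grouped into blocks of 8 so that one SIMD instruction can load 8 channel values for the same spatial position). The block size and naming follow oneDNN's convention, but the code itself is only illustrative:

```javascript
// Illustrative sketch of an NCHW -> blocked (nChw8c-style) conversion.
// In the blocked layout, `block` channel values for the same (h, w)
// position sit contiguously, which is what enables vectorization.
// This per-element copy is exactly the kind of cost frameworks want
// to avoid paying around every op.
function nchwToBlocked(src, N, C, H, W, block = 8) {
  const cBlocks = Math.ceil(C / block);
  // Padded channel slots (when C % block != 0) stay zero.
  const dst = new Float32Array(N * cBlocks * H * W * block);
  for (let n = 0; n < N; n++) {
    for (let c = 0; c < C; c++) {
      const cb = (c / block) | 0; // which channel block
      const ci = c % block;       // position inside the block
      for (let h = 0; h < H; h++) {
        for (let w = 0; w < W; w++) {
          const srcIdx = ((n * C + c) * H + h) * W + w;
          const dstIdx = (((n * cBlocks + cb) * H + h) * W + w) * block + ci;
          dst[dstIdx] = src[srcIdx];
        }
      }
    }
  }
  return dst;
}
```

Every element is touched once on the way in and once on the way out, so doing this round trip per op roughly doubles the memory traffic of a memory-bound op.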
For example, a user of TensorFlow.js may execute three conv2d ops but only access the output of the last one:

```js
c = tf.conv2d(a, b);
e = tf.conv2d(c, d);
h = tf.conv2d(f, g);
output = await h.data();
```

A potential WebNN implementation would only need to do the memory layout conversion and put the data into an ArrayBufferView when `h.data()` is invoked.
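This deferred materialization can be sketched as follows. `LazyTensor` and `runOp` are illustrative stand-ins, not WebNN or TensorFlow.js API, and the conversion counter exists only to make the deferral observable:

```javascript
// Sketch: each op returns a handle to a (simulated) device-resident result;
// the expensive layout conversion into an ArrayBufferView happens only when
// data() is called, never for intermediate results.
// LazyTensor and runOp are illustrative, not part of WebNN or TensorFlow.js.
let conversions = 0; // counts device -> ArrayBufferView conversions

class LazyTensor {
  constructor(deviceResult) {
    this.deviceResult = deviceResult; // stays in device layout until read
  }
  data() { // synchronous here for simplicity; tf.js's data() is async
    conversions++; // the layout conversion would happen here
    return Float32Array.from(this.deviceResult);
  }
}

// Stand-in for dispatching one single-op graph to the device.
function runOp(a, b) {
  const aVals = a instanceof LazyTensor ? a.deviceResult : a;
  const bVals = b instanceof LazyTensor ? b.deviceResult : b;
  return new LazyTensor(aVals.map((x, i) => x * bVals[i]));
}

// Two chained ops, but only the final output is materialized:
const c = runOp([1, 2], [3, 4]);
const e = runOp(c, [5, 6]);
const out = e.data();
// conversions === 1: the intermediate c never paid the conversion cost
```

The key design point is that the handle returned by each op keeps the data in whatever layout the device prefers, and the framework decides when (and whether) to pay for the conversion back.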