Conversation
… WebGPU device.
- Make power preference part of context creation options.
- Constant operands can be created from either WebGL or WebGPU buffers.
- Model inputs and outputs can be bound with WebGL or WebGPU textures.
- Prefix all types with "ML"; simplify "NeuralNetworkContext" to just "MLContext".
- Switch to using a constructor for MLModelBuilder instead of a factory method.
anssiko
left a comment
Thanks @wchao1115!
Added some comments on how to make Bikeshed link to externally defined IDL that is not yet in the autolinking database.
index.bs
Outdated
  // Create a model that encapsulates the composition of operands by identifying the output operands.
- Model createModel(NamedOperands outputs);
+ MLModel createModel(MLNamedOperands outputs);
Should we still keep MLModel building device-agnostic and leave the device association to compile time? Say, to support model.compile(glContext) and model.compile(gpuDevice). With that, a developer can build the same model once and compile it for different devices.
There are interop use cases where the model constants (e.g. weights) are already uploaded to GPU buffers before the model is created. In that case it would be inefficient to require the caller to round-trip that data back to CPU memory just to construct a device-agnostic model.
As we position WebNN also as a backend API for frameworks, the notion of a device-agnostic graph could get in the way of interop efficiency: by the time the backend graph is initiated (by the framework), the model constants may already be mapped to memory hosted by the framework, and for frameworks already dealing with device resources, those constants may already be uploaded to the device.
It makes sense, thanks for the explanation. Given that MLModel is now device-specific, would we consider merging it into MLCompilation? Say, replace MLModel createModel(MLNamedOperands outputs) with Promise<MLCompilation> compile(MLNamedOperands outputs).
I can see the logic to it. I think it makes sense.
@huningxin, the #1 use case is a good one. I plan to address it by adding a device preference option in the MLContextOptions. This option will be complementary to the existing power preference. As for your #2, I'm still a bit confused. I looked at the scenario that was raised in #156 (copied as follows):
c = tf.conv2d(a, b);
e = tf.conv2d(c, d);
h = tf.conv2d(f, g);
output = await h.data();

If the webnn API is going to be used this way, what prevents the caller from compiling all 3 conv2d ops into a single compiled graph before executing it in one go? I.e., if the caller has a graph of A + B and wants the platform-specific intermediate result after A to stay intact before it enters B during execution, then that's exactly what a compiled graph of A + B is for. That's precisely why graph execution is most likely more efficient than op-by-op execution.
If the graph is A + b + C, where only A and C are defined as webnn ops but b is something the caller implements themselves, and the caller insists that A and C must be executed separately and sequentially, then the end result will be no worse than having A and C defined elsewhere as standalone API calls outside of the webnn graph definition.
The point is that the end result of executing a publicly defined API must not be platform-specific, i.e. the resulting tensor of the operation must be in a standard layout, whether that public API is implemented as a compiled webnn graph or as a standalone API outside of webnn. Hope this helps clarify my point of view.
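To make the op-by-op vs. compiled-graph trade-off concrete, here is a toy JavaScript simulation (not the WebNN API; the function names and the conversion counter are invented for illustration). Every result that crosses a public API boundary must be converted to the standard layout, so op-by-op execution pays one conversion per op, while a compiled graph pays only one for its final output.

```javascript
// Toy simulation (NOT the WebNN API): every tensor that crosses a public
// API boundary must be converted to a standard layout. The counter below
// tracks how many such conversions each execution style incurs.
let conversions = 0;

// A stand-in for a conv2d-like op; it produces a result in a
// platform-specific layout and does not convert anything itself.
function convOp(input) {
  return { layout: 'platform', value: input.value + 1 };
}

// Crossing the public boundary: convert to the standard layout.
function toStandard(tensor) {
  conversions += 1;
  return { layout: 'standard', value: tensor.value };
}

// Op-by-op execution: each op result is returned through the public API,
// so every intermediate is converted to the standard layout.
function runOpByOp(input) {
  conversions = 0;
  let t = toStandard(convOp(input));
  t = toStandard(convOp(t));
  t = toStandard(convOp(t));
  return { result: t, conversions };
}

// Compiled-graph execution: intermediates stay in the platform layout;
// only the final graph output is converted.
function runCompiledGraph(input) {
  conversions = 0;
  const out = convOp(convOp(convOp(input)));
  return { result: toStandard(out), conversions };
}
```

The absolute numbers are meaningless; the point is only that the number of layout conversions grows with the number of public-API boundary crossings.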
adding a device preference option in the MLContextOptions. This option will be complementary to the existing power preference.
I like it!
what prevents the caller from compiling all the 3 conv2d ops into a single compiled graph before executing it in one go?
That's because the framework provides the op API. When executing a graph, the user code just calls the op API to execute the operations of the graph one by one. The framework doesn't have knowledge of the graph; it just executes ops. cc @pyu10055
For this scenario, the framework doesn't know the user code will execute 3 conv2d ops. When the user code calls a conv2d, the framework just computes that conv2d. The framework may use a platform-specific tensor layout in the conv2d implementation. Only when the user code calls h.data() does the framework start to convert and copy the tensor data into an ArrayBufferView in the plain layout.
So if the framework uses WebNN to compute conv2d, it would hurt performance if WebNN always converted and copied the output tensor data to an ArrayBufferView for each compute.
The point is that the end result of executing a publicly defined API must not be platform-specific
I agree with you; at the interop points, it should use the plain/standard tensor layout.
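As a sketch of what a "plain/standard layout at the interop point" can mean in practice: a framework may keep a 4-D tensor internally in NCHW order and convert to a standard NHWC order only when data crosses the public boundary. The class and function names below are hypothetical, not part of WebNN or TensorFlow.js.

```javascript
// Hypothetical sketch (names are NOT WebNN or TensorFlow.js API): a
// framework keeps a 4-D tensor internally in NCHW order and converts to
// the standard NHWC order only when data crosses the public boundary.

// Reorder a flat NCHW buffer into NHWC.
function nchwToNhwc(data, [n, c, h, w]) {
  const out = new Float32Array(data.length);
  for (let ni = 0; ni < n; ni++)
    for (let ci = 0; ci < c; ci++)
      for (let hi = 0; hi < h; hi++)
        for (let wi = 0; wi < w; wi++) {
          const src = ((ni * c + ci) * h + hi) * w + wi; // NCHW index
          const dst = ((ni * h + hi) * w + wi) * c + ci; // NHWC index
          out[dst] = data[src];
        }
  return out;
}

class InternalTensor {
  constructor(nchwData, dims) {
    this.nchwData = nchwData; // platform-specific layout, never exposed
    this.dims = dims;         // [N, C, H, W]
  }
  // The layout conversion happens only here, at the interop point.
  async data() {
    return nchwToNhwc(this.nchwData, this.dims);
  }
}
```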
That's because the framework provides the op API. When executing a graph, the user code just calls the op API to execute the operations of the graph one-by-one.
That doesn't sound like what a framework does. Are you saying that there are frameworks that never compile a graph internally and just rely on individual op execution?
A framework may provide both an op API and a model API; for example, TensorFlow.js provides the Operations API and the Models/Loading API. It depends on which API the user code uses. If the user code uses the op API to execute a graph, the framework won't get the graph, only individual ops.
Actually, the webnn-polyfill uses the TensorFlow.js op API. I suppose it is a common scenario. cc @pyu10055.
@wchao1115 @huningxin Thank you Ningxin for bringing up this point. Op-level execution is common for eager execution, especially in a JS environment where users can create graphs on the fly. There will be no specific graph boundary, so the framework would not be able to know which intermediate tensors users will not need to access.
index.bs
Outdated
- Promise<MLNamedOutputs> compute(MLNamedInputs inputs,
-                                 optional MLNamedOutputs outputs = {});
+ Promise<MLNamedWebGLOutputs> compute(MLNamedWebGLInputs glInputs,
Since the glInputs (same for gpuInputs) are device-specific, what would happen if this MLCompilation is compiled for another device? Would the browser implementation handle the data transfer or throw an error?
Yes, it will fail. Cross-device, cross-adapter support should not occur at this level. If the context is created with one device/adapter and the input is a resource from another, the API should fail. The callers of WebNN need to ensure that all the device-dependent resources come from the device that is bound to the context.
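A minimal sketch of the validation this behavior implies (validateInputs, the device field, and the choice of TypeError are assumptions for illustration, not spec text): before compute, each device-backed input is checked against the device bound to the context, and a mismatch fails.

```javascript
// Hypothetical validation sketch: reject any input resource whose device
// differs from the device the context was created with.
function validateInputs(contextDevice, namedInputs) {
  for (const [name, resource] of Object.entries(namedInputs)) {
    if (resource.device !== contextDevice) {
      // The spec would pick the exact error type; TypeError is a guess.
      throw new TypeError(
        `Input "${name}" was created on a different device than the context`);
    }
  }
}
```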
It makes sense. We may need to specify this behavior in the following PR.
…upport for accepting GPU and WebGLBuffer as inputs.
…pdate examples accordingly.
And there is a Bikeshed build error: the spec still uses {{MLModel}} in the Compilation section.
I like that!
We may need to define which resource types are acceptable for a specific type of device. For example, if the device is CPU, the typedef (WebGLBuffer or GPUBuffer or WebGLTexture or GPUTexture) GPUResource;
@wchao1115, I put together the above table with two open questions. Please take a look and feel free to add anything I missed.
… instead of the view in the buffer union type.
… and the GPU buffer resource view.
huningxin
left a comment
LGTM. Thanks much for the great work.
@anssiko @huningxin It looks like
I am not aware of a manual way to trigger a GitHub action. Hopefully PR #151 will fix it. Please take a look.
The merge of #151 triggered a rebuild that deployed properly. @wchao1115 @huningxin, great work with this PR. This is on our agenda for this week's call (note: it is 1 hour later than usual in the US/Canada!), and that's a good opportunity to summarize the key changes and design considerations discussed in this PR.