Support for device-based tensor storage objects #482

@bbernhar

Description

This issue proposes a new opaque, device-specific storage type in WebNN: MLBuffer. MLBuffer is a backend-agnostic storage type (CPU, GPU, NPU, etc.) which can be used in WebNN operations.

MLBuffer would:

  1. Give WebNN developers control of device storage, avoiding round-trips to/from the CPU.
  2. Be extensible with export/import to support WebNN interop with other web APIs.

Construction/Destruction

typedef [EnforceRange] unsigned long long MLSize64;

dictionary MLBufferDescriptor {
    required MLSize64 size;
};

[Exposed=(Window, DedicatedWorker), SecureContext]
partial interface MLContext {
    MLBuffer createBuffer(MLBufferDescriptor descriptor);
};
  • Layout of MLBuffer is always known (and linear access is assumed).
typedef unsigned long long MLSize64Out;

[Exposed=(Window, DedicatedWorker)]
interface MLBuffer {
  [CallWith=Isolate] undefined destroy();

  readonly attribute MLSize64Out size;
};
  • WebNN developers should prefer calling destroy(), rather than relying on GC, for predictable device memory usage.
  • destroy() is called on the context timeline but does not actually release memory until the device signals completion.
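A minimal sketch of this lifecycle, assuming createBuffer() and destroy() behave as proposed above (this is illustrative, not a shipped API; the `run` helper is hypothetical):

```javascript
// Sketch of the proposed lifecycle. MLContext.createBuffer() and
// MLBuffer.destroy() are the shapes proposed in this issue, not a shipped API.
function run(context) {
  // MLBufferDescriptor.size is a byte length: 4 float32 elements = 16 bytes.
  const buffer = context.createBuffer({ size: 4 * Float32Array.BYTES_PER_ELEMENT });
  // ... bind `buffer` to graph inputs/outputs and dispatch work here ...
  // Release device memory explicitly rather than waiting for GC.
  buffer.destroy();
}
```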

Upload/Download tensor data

[Exposed=(Window, DedicatedWorker), SecureContext]
partial interface MLContext {

   undefined writeBuffer(
        MLBuffer dstBuffer, 
        MLSize64 dstOffset, 
        AllowSharedBufferSource srcData,
        optional MLSize64 srcOffset = 0,
        optional MLSize64 srcSize);

  [Exposed=(Window)]
  Promise<ArrayBuffer> readBuffer(
        MLBuffer srcBuffer,
        MLSize64 srcOffset,
        MLSize64 srcSize);
  
  [Exposed=(DedicatedWorker)]
  undefined readBufferSync(
        MLBuffer srcBuffer, 
        MLSize64 srcOffset,
        MLSize64 srcSize,
       AllowSharedBufferSource dstData);
};
  • Transfer operations will execute on the device timeline in the same order they were enqueued on the context timeline.
  • A copy of srcData is always made, so control returns to the web developer immediately.
  • For read-back after compute, use readBuffer() (async, window) or readBufferSync() (sync, dedicated workers), respectively.
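A hedged sketch of the upload/download path, assuming writeBuffer() and readBuffer() behave as declared above (the `upload`/`download` helper names are illustrative, not part of the proposal):

```javascript
// writeBuffer copies srcData immediately, so the source array can be
// reused as soon as the call returns.
function upload(context, buffer, data) {
  context.writeBuffer(buffer, /*dstOffset*/ 0, data);
}

// readBuffer resolves with an ArrayBuffer once the device-side copy
// completes; wrap it in a typed array to interpret the bytes.
async function download(context, buffer, byteLength) {
  const bytes = await context.readBuffer(buffer, /*srcOffset*/ 0, byteLength);
  return new Float32Array(bytes);
}
```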

Binding to graphs

dictionary MLBufferView {
  required MLBuffer buffer;
  MLSize64 offset = 0;
  MLSize64 size;
};

typedef record<DOMString, MLBufferView> MLNamedMLBufferViews;

partial interface MLContext {
  undefined dispatch(
      MLGraph graph, MLNamedMLBufferViews inputs, MLNamedMLBufferViews outputs);
};
  • Buffer usage is always assumed on first access (e.g. a buffer passed as an output assumes output usage).
  • The WebNN developer must call readBuffer() to get the resulting output data back after dispatch().
const inputData = new Float32Array(4).fill(1.0);
const bufferA = context.createBuffer({size: inputData.byteLength});
const bufferB = context.createBuffer({size: inputData.byteLength});
context.writeBuffer(bufferA, 0, inputData);
const inputs = {'A': {buffer: bufferA}};
const outputs = {'B': {buffer: bufferB}};
context.dispatch(graph, inputs, outputs);
const result = await context.readBuffer(bufferB, 0, inputData.byteLength);
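In a dedicated worker, the same result could instead be read back synchronously, assuming readBufferSync() as declared above (the `readResultSync` helper is illustrative, not part of the proposal):

```javascript
// readBufferSync blocks the worker until the device copy completes and
// writes the bytes into a caller-provided destination buffer.
function readResultSync(context, srcBuffer, elementCount) {
  const dst = new Float32Array(elementCount);
  context.readBufferSync(srcBuffer, /*srcOffset*/ 0, dst.byteLength, dst);
  return dst;
}
```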
