Integration with real-time video processing #226

@dontcallmedom

The WebRTC Working Group is working on an API to allow fast processing of real-time video. Two proposals are under discussion and expected to converge: https://github.com/alvestrand/mediacapture-transform and https://github.com/jan-ivar/mediacapture-transform (see also the relevant issues on convergence). Chromium has started shipping an implementation based on the first proposal, which should allow for initial experimentation with the overall approach.

Since we can expect much of this real-time processing to rely on Machine Learning models, and as suggested by the Web Machine Learning Working Group charter, we should ensure that models loaded via WebNN-backed JS frameworks can be used in the context of that API. In particular, that means a WHATWG Streams-based API running in a worker context, with video frames coming from a webcam and likely stored in GPU memory. We should also ensure that the combination delivers actual performance improvements, in particular that any boost from the hardware acceleration provided by WebNN is not cancelled out by costs associated with, e.g., memory copies.
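
As a rough illustration of the shape such an integration might take, the sketch below builds the transformer object that a worker could hand to a WHATWG `TransformStream`. It is not taken from either proposal: `FrameLike`, `ControllerLike`, `RunModel` and `makeFrameTransformer` are illustrative names, and `runModel` is a hypothetical stand-in for inference through a WebNN-backed framework. The only assumption made about frames is that they expose `close()`, as `VideoFrame` does.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface FrameLike {
  close(): void;
}
interface ControllerLike<T> {
  enqueue(chunk: T): void;
}

// Hypothetical inference callback: takes an input frame, resolves to the
// processed frame (which may be the same object if nothing was changed).
type RunModel<F extends FrameLike> = (frame: F) => Promise<F>;

function makeFrameTransformer<F extends FrameLike>(runModel: RunModel<F>) {
  return {
    async transform(frame: F, controller: ControllerLike<F>): Promise<void> {
      let output: F;
      try {
        output = await runModel(frame);
      } catch (err) {
        // Release the input frame even when inference fails.
        frame.close();
        throw err;
      }
      controller.enqueue(output);
      // Close the input only when the model produced a distinct frame;
      // frames that are never closed keep their backing memory alive.
      if (output !== frame) frame.close();
    },
  };
}

// Assumed wiring in a worker, with `readable` and `writable` transferred
// from the page (as the Chromium implementation allows):
//   readable
//     .pipeThrough(new TransformStream(makeFrameTransformer(blurBackground)))
//     .pipeTo(writable);
// `blurBackground` is a hypothetical RunModel implementation.
```

Keeping the model behind a single callback means the same pipeline can be reused unchanged while swapping the backend that performs inference.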

My sense is that the best way to determine this would be:

  • to build a prototype that integrates the mediacapture-transform API (in a worker context) with, e.g., a TF.js model that performs background blur
  • to measure the performance of that prototype across various TF.js backends, including a WebNN-native one; ideally this would include specific measurements of memory copies, although the raw FPS results may already give sufficient hints
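
The measurement step could start from something as simple as the sketch below, which accumulates per-frame timestamps and reports FPS and mean processing time per backend. `FrameStats` and `summarize` are illustrative names, and the backend labels are whatever the prototype chooses to call them (a `"webnn"` label is an assumption, not an existing TF.js backend name). Memory copies are not measured here; this only covers the raw FPS side.

```typescript
// Accumulates timing for processed frames. Timestamps are in milliseconds,
// e.g. taken from performance.now() before and after processing a frame.
class FrameStats {
  private count = 0;
  private firstStartMs = 0;
  private lastEndMs = 0;
  private totalProcessingMs = 0;

  record(startMs: number, endMs: number): void {
    if (this.count === 0) this.firstStartMs = startMs;
    this.lastEndMs = endMs;
    this.totalProcessingMs += endMs - startMs;
    this.count += 1;
  }

  // Frames per second over the wall-clock span from the first frame's
  // start to the last frame's end.
  fps(): number {
    const spanMs = this.lastEndMs - this.firstStartMs;
    return spanMs > 0 ? (this.count * 1000) / spanMs : 0;
  }

  // Mean time spent processing a single frame, in milliseconds.
  meanProcessingMs(): number {
    return this.count > 0 ? this.totalProcessingMs / this.count : 0;
  }
}

// Formats one line per backend label for side-by-side comparison.
function summarize(results: Record<string, FrameStats>): string[] {
  return Object.keys(results).map((label) => {
    const stats = results[label];
    return `${label}: ${stats.fps().toFixed(1)} fps, ` +
      `${stats.meanProcessingMs().toFixed(1)} ms/frame`;
  });
}
```

In a prototype, `record` would be called around the per-frame processing step in the worker, with one `FrameStats` instance per backend under test.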

While the real-time video processing framework in WebRTC is still somewhat in flux, I think we have enough convergence on the overall picture, and a good enough basis for experimentation in the Chromium implementation, to get started on this work. The WebRTC Samples repo has a few examples of that API in action (video-crop in particular exercises it in a worker context).

/cc @aboba @alvestrand @jan-ivar
