Background
The Metal backend's use of multi-threaded encoding dramatically improved raster thread throughput by allowing the engine to better utilize N cores when given multiple render passes (flutter/engine#42028).
We should do the same thing with the Vulkan backend. Unlike with Metal, we'll have substantially more work to do, which is documented here! 👿
Overview
At a high level, there are a few changes we need to make to the Vulkan backend to support this:
- Prevent the raster thread from blocking on workloads that can be handled by background tasks.
- Correctly apply backpressure to respect frames in flight.
- Ensure thread safety and coordination between encoding tasks and presentation tasks, especially layout transitions for sampled textures.
Swapchain management
Originally, the Vulkan swapchain implementation blocked synchronously on presentation. Since presentKHR can take upwards of 1ms, this pessimized rendering, especially when there were multiple swapchains due to platform views. In flutter/engine#43976, I moved presentation to a background worker; however, this introduced #131610. To fix this, we'll also need to track frames in flight in the swapchain.
While in theory we could encode multiple frames at the same time, based on the swapchain image count, it would simplify resource management to block on queue submission before presenting. Otherwise, we would need to handle multiple glyph atlases, et cetera.
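A minimal sketch of the frames-in-flight backpressure described above, assuming a fixed maximum (for example, derived from the swapchain image count). `FramesInFlightLimiter` and `SimulateFrames` are hypothetical names, not Impeller types, and the background thread's sleep only stands in for a slow presentKHR call. The raster thread blocks in `AcquireFrame` once the limit is reached, and each background present task releases its slot when it finishes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical limiter for frames that were submitted but not yet presented.
class FramesInFlightLimiter {
 public:
  explicit FramesInFlightLimiter(std::size_t max_frames)
      : max_frames_(max_frames) {}

  // Raster thread: blocks while `max_frames_` frames are still unpresented.
  void AcquireFrame() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return in_flight_ < max_frames_; });
    ++in_flight_;
    peak_ = std::max(peak_, in_flight_);
  }

  // Background present task: called once presentation has returned.
  void ReleaseFrame() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(in_flight_ > 0 && "release without a matching acquire");
      --in_flight_;
    }
    cv_.notify_one();
  }

  // Highest number of frames that were ever in flight simultaneously.
  std::size_t peak() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return peak_;
  }

 private:
  const std::size_t max_frames_;
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t in_flight_ = 0;
  std::size_t peak_ = 0;
};

// Hypothetical driver: renders `frame_count` frames, handing each one's
// presentation to a background thread, and returns the observed peak.
std::size_t SimulateFrames(std::size_t max_frames, std::size_t frame_count) {
  FramesInFlightLimiter limiter(max_frames);
  std::vector<std::thread> presenters;
  for (std::size_t i = 0; i < frame_count; ++i) {
    limiter.AcquireFrame();  // backpressure point on the raster thread
    // ... encode and submit the frame here ...
    presenters.emplace_back([&limiter] {
      // Stand-in for a slow present call on the background worker.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      limiter.ReleaseFrame();
    });
  }
  for (auto& presenter : presenters) presenter.join();
  return limiter.peak();
}
```

For example, `SimulateFrames(2, 8)` never lets more than two frames be in flight at once, no matter how slow the simulated present is.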
Layout transitions
When there are multiple render passes, async encoding runs into the issue of images potentially being in the wrong layout. Today, layout transitions work roughly as follows:
1. Start building the entity pass.
2. Hit a subpass and begin encoding it.
3. Finish encoding the subpass and transition the output texture to an optimal layout.
4. Finish encoding the entity pass, sampling from the texture produced in step 3.
With async encoding, we may start encoding the entity pass before we've finished encoding the subpass, leaving the texture in the wrong layout. To fix this, we may need to lift all layout transitions into blocking calls, or change how we structure encoding so that the order matters less. I'm not sure what the right solution is; it needs more investigation.
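One way to make encoding order irrelevant can be sketched as follows, under the assumption that each pass can declare up front which textures it renders to and which it samples from. Worker threads record only the draw work, and the raster thread inserts every layout transition afterwards while walking the passes in submission order, so it always knows each texture's current layout. `PassDesc`, `EncodePass`, and `EncodeFrame` are hypothetical; strings stand in for command buffers and barriers, and the layout names only loosely mirror Vulkan image layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <string>
#include <vector>

// Hypothetical layouts, loosely named after Vulkan image layouts.
enum class Layout { kUndefined, kColorAttachment, kShaderReadOnly };

struct PassDesc {
  std::string name;
  std::vector<int> writes;   // texture ids rendered to
  std::vector<int> samples;  // texture ids sampled from
};

struct EncodedPass {
  std::string name;
  std::vector<std::string> commands;  // stand-in for a command buffer
};

const char* LayoutName(Layout layout) {
  switch (layout) {
    case Layout::kColorAttachment: return "color_attachment";
    case Layout::kShaderReadOnly: return "shader_read";
    default: return "undefined";
  }
}

// Worker thread: records draw work only, never layout transitions, so it
// does not need to know the current state of any texture.
EncodedPass EncodePass(const PassDesc& pass) {
  return {pass.name, {"draw:" + pass.name}};
}

// Encodes all passes concurrently, then stitches them together in
// submission order, emitting a transition whenever a texture's tracked
// layout differs from what the pass needs.
std::vector<std::string> EncodeFrame(const std::vector<PassDesc>& passes) {
  std::vector<std::future<EncodedPass>> futures;
  for (const PassDesc& pass : passes) {
    futures.push_back(
        std::async(std::launch::async, [&pass] { return EncodePass(pass); }));
  }

  std::map<int, Layout> layouts;  // missing entries default to kUndefined
  std::vector<std::string> stream;
  auto transition = [&](int tex, Layout target) {
    Layout& current = layouts[tex];
    if (current != target) {
      stream.push_back("barrier:tex" + std::to_string(tex) + "->" +
                       LayoutName(target));
      current = target;
    }
  };

  for (std::size_t i = 0; i < passes.size(); ++i) {
    for (int tex : passes[i].samples) transition(tex, Layout::kShaderReadOnly);
    for (int tex : passes[i].writes) transition(tex, Layout::kColorAttachment);
    EncodedPass encoded = futures[i].get();  // waits only for this pass
    assert(encoded.name == passes[i].name);
    stream.insert(stream.end(), encoded.commands.begin(),
                  encoded.commands.end());
  }
  return stream;
}
```

With a subpass that writes texture 1 followed by an entity pass that samples it, the transition of texture 1 to a shader-read layout always lands between the two passes' draws, even if the entity pass happens to finish encoding first.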
Queue submission order
Currently, we submit each render pass as soon as it finishes encoding on the raster thread, relying on implicit queue ordering. If we move encoding to worker threads, we can't rely on this order anymore. That leaves two potential solutions:
- Use timeline semaphores to track the optimal dependency processing order (see [Impeller] Track per-resource synchronization timelines #120406).
- Join all queue submissions into one. This is fairly easy if we already have to block on submission, though it may hurt latency.
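The timeline-semaphore option can be modeled on the CPU to show why it makes encoding order irrelevant. In this sketch, which is an illustration rather than Impeller code, each thread stands in for one pass whose encoding finishes at an arbitrary time; pass `i` waits until the timeline reaches `i` and signals `i + 1` once it has run. In real Vulkan the wait would be attached to the queue submission and resolved on the device rather than by blocking a CPU thread. `TimelineModel` and `RunInDependencyOrder` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// CPU-side model of a timeline semaphore: a counter that only increases.
class TimelineModel {
 public:
  // Blocks until the counter reaches at least `value`.
  void Wait(std::uint64_t value) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return value_ >= value; });
  }

  // Advances the counter; signaled values must strictly increase.
  void Signal(std::uint64_t value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(value > value_ && "timeline values must increase");
      value_ = value;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::uint64_t value_ = 0;
};

// Later passes deliberately finish "encoding" first; the timeline still
// forces them to execute in dependency order 0, 1, 2, ...
std::vector<int> RunInDependencyOrder(int pass_count) {
  TimelineModel timeline;
  std::mutex order_mutex;
  std::vector<int> execution_order;
  std::vector<std::thread> workers;
  for (int i = 0; i < pass_count; ++i) {
    workers.emplace_back([&, i] {
      // Stand-in for encoding time: higher indices finish sooner.
      std::this_thread::sleep_for(std::chrono::milliseconds(pass_count - i));
      timeline.Wait(static_cast<std::uint64_t>(i));
      {
        std::lock_guard<std::mutex> lock(order_mutex);
        execution_order.push_back(i);  // stand-in for the pass executing
      }
      timeline.Signal(static_cast<std::uint64_t>(i) + 1);
    });
  }
  for (auto& worker : workers) worker.join();
  return execution_order;
}
```

The second option avoids this machinery entirely: collect every encoded pass, then submit them together in order, at the cost of waiting for the slowest pass before anything reaches the queue.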
