Description
Previously explored in flutter/engine#48848
Background
Impeller has a deferred command recording design. That is, given a command like "draw rectangle":
- SolidColorContents::Render is executed. This constructs an impeller::Command object which contains the buffer and uniform bindings, and is stored in the impeller::RenderPass
... More commands are recorded and the render pass is finished...
- The impeller::RenderPass dispatches to construct a vk::RenderPass (or the equivalent MTL structure), which constructs the actual command buffer. The impeller::Command and bindings objects are converted to real bindings at this point.
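The flow above can be sketched with a toy model. This is only an illustration of the deferred pattern, not Impeller's actual code: `ToyCommand`, `DeferredPass`, and `FakeNativeEncoder` are hypothetical names, and the "native" encoder just logs the calls it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a native command buffer; it only logs calls.
struct FakeNativeEncoder {
  std::vector<std::string> log;
  void BindPipeline(const std::string& name) { log.push_back("pipeline:" + name); }
  void BindBuffer(const std::string& name) { log.push_back("buffer:" + name); }
  void Draw() { log.push_back("draw"); }
};

// Simplified analogue of impeller::Command (assumption: the real type holds
// far more state). Each command owns its bindings, which is where the
// per-command heap allocations come from.
struct ToyCommand {
  std::string pipeline;
  std::vector<std::string> buffers;
};

// Simplified analogue of a render pass with deferred recording.
class DeferredPass {
 public:
  // Recording only stores the command; nothing reaches the encoder yet.
  void AddCommand(ToyCommand cmd) {
    assert(!cmd.pipeline.empty() && "a command needs a pipeline");
    commands_.push_back(std::move(cmd));
  }

  // At the end of the pass, every stored command is replayed into the
  // native encoder and its bindings become real bindings.
  void EncodeCommands(FakeNativeEncoder& encoder) const {
    for (const ToyCommand& cmd : commands_) {
      encoder.BindPipeline(cmd.pipeline);
      for (const std::string& buffer : cmd.buffers) {
        encoder.BindBuffer(buffer);
      }
      encoder.Draw();
    }
  }

  std::size_t PendingCommandCount() const { return commands_.size(); }

 private:
  std::vector<ToyCommand> commands_;
};
```

In this model the intermediate `commands_` vector, and the per-command `buffers` vectors inside it, exist only to be walked once and discarded.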
Overview
This process works reasonably well, but it adds some measurable overhead. In theory, we could record directly to the native Metal and Vulkan command buffers instead of to an intermediate Impeller structure, which would remove the allocations that structure requires.
Problems:
- The workload requires a non-trivial number of heap allocations per command. These allocations tend to be fairly fragmented and don't work well with the Scudo allocator on Android.
- Extra state: all commands currently carry a stencil rect despite the fact that it is completely unused by Impeller. We may find that we're unwilling to add features that would be useful for Flutter GPU because of the additional cost they would impose on Impeller.
- We need to add more state setting commands, such as barriers ([Impeller] HAL needs barriers for both Metal/Vulkan for compute usage. #140798), for correct rendering with compute. Adding these to an intermediate structure requires even more allocation.
In contrast, if we record directly to the native cmd buffer, then we remove all extra allocation for Metal/Vulkan. This leaves us with two general problems and one Vulkan specific problem to fix:
- HostBuffer allocation happens once at the end of render pass recording: the current HostBuffer strategy works by flushing to a device buffer at that point. Recording directly would require us to implement "[Impeller] Change the transient buffer to be a per-frame arena. #138161" or similar.
- Vulkan will need to guess how many descriptor sets to create, but we can handle this with recycling.
- We need to make changes to the cmd state setting API to be stateful instead of creating a command object.
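As a rough illustration of the per-frame arena idea mentioned in the first point, here is a minimal bump allocator sketch. It is not the design from #138161: `FrameArena` and `BufferRange` are hypothetical names, the storage is plain host memory rather than a device buffer, and the capacity is fixed where a real arena would presumably grow or chain blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view into the arena, usable as soon as Emplace returns.
struct BufferRange {
  std::size_t offset;
  std::size_t length;
};

// Hypothetical per-frame arena: data is copied in at record time, so a
// bind call does not have to wait for an end-of-pass flush.
class FrameArena {
 public:
  explicit FrameArena(std::size_t capacity) : storage_(capacity) {}

  // Copies `length` bytes to the next offset that satisfies `alignment`
  // (which must be a power of two) and returns where they landed.
  BufferRange Emplace(const void* data, std::size_t length, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t offset = (cursor_ + alignment - 1) & ~(alignment - 1);
    assert(offset + length <= storage_.size() && "arena exhausted");
    std::memcpy(storage_.data() + offset, data, length);
    cursor_ = offset + length;
    return BufferRange{offset, length};
  }

  // Called once per frame; all earlier ranges become invalid.
  void Reset() { cursor_ = 0; }

  std::size_t BytesUsed() const { return cursor_; }

 private:
  std::vector<std::uint8_t> storage_;
  std::size_t cursor_ = 0;
};
```

The point of the sketch is that each `Emplace` call yields a usable offset immediately, which is what direct recording needs, while `Reset` keeps the cost of reclaiming memory to one operation per frame.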
Example
We need to change the intermediate state setting so that it maps directly to the underlying cmd state. Something like this:
Before
Command cmd;
cmd.pipeline = context.getPipeline();
BindData(cmd, data);
pass.addCommand(std::move(cmd));
After
pass.setPipeline(context.getPipeline());
pass.bindData(metadata, data);
pass.draw();
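A self-contained sketch of what the "After" shape could look like is below. The names (`DirectPass`, `LoggingEncoder`) and signatures are assumptions made for illustration, not the proposed Impeller API; the encoder is a mock that records the calls forwarded to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a native command buffer; it only logs calls.
struct LoggingEncoder {
  std::vector<std::string> log;
  void BindPipeline(const std::string& name) { log.push_back("pipeline:" + name); }
  void BindBuffer(const std::string& name) { log.push_back("buffer:" + name); }
  void Draw() { log.push_back("draw"); }
};

// Hypothetical stateful pass: each call is forwarded to the encoder right
// away, so no intermediate command object is allocated or stored.
class DirectPass {
 public:
  explicit DirectPass(LoggingEncoder& encoder) : encoder_(encoder) {}

  void SetPipeline(const std::string& name) {
    encoder_.BindPipeline(name);
    has_pipeline_ = true;
  }

  void BindData(const std::string& name) { encoder_.BindBuffer(name); }

  void Draw() {
    // The pass itself now tracks state that used to live on the command.
    assert(has_pipeline_ && "SetPipeline must be called before Draw");
    encoder_.Draw();
  }

 private:
  LoggingEncoder& encoder_;
  bool has_pipeline_ = false;
};
```

With this shape the encoder sees each call as soon as it is made, and the only state kept by the pass is what it needs for validation.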