
[Impeller] More efficient command encoding by removing deferred encoding. #140804

@jonahwilliams

Description


Previously explored in flutter/engine#48848

Background

Impeller has a deferred command recording design. That is, given a command like "draw rectangle":

  1. SolidColorContents::Render is executed. It constructs an impeller::Command object that holds the buffer and uniform bindings and is stored in the impeller::RenderPass.

... More commands are recorded and the render pass is finished...

  2. The impeller::RenderPass dispatches to construct a vk::RenderPass (or the equivalent MTL structure), which builds the actual command buffer, and the impeller::Command and binding objects are converted into real bindings.
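The two phases above can be modeled in a few lines. This is a minimal, self-contained sketch, not Impeller code: `DeferredCommand`, `DeferredPass`, and `NativeCommandBuffer` are hypothetical stand-ins, and the "native" buffer only logs the calls it would forward to Vulkan or Metal. It shows where the per-command allocation comes from: every recorded draw owns its own binding list until the pass is encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Vulkan/Metal command buffer: it only records
// the names of the calls it receives.
struct NativeCommandBuffer {
  std::vector<std::string> calls;
};

// Hypothetical analogue of impeller::Command: each recorded draw owns its
// own heap-allocated binding list.
struct DeferredCommand {
  std::string pipeline;
  std::vector<uint32_t> buffer_bindings;
  uint32_t vertex_count = 0;
};

// Phase 1 (AddCommand) stores commands; phase 2 (Encode) replays them into
// the native buffer once the pass is finished.
class DeferredPass {
 public:
  void AddCommand(DeferredCommand cmd) { commands_.push_back(std::move(cmd)); }

  void Encode(NativeCommandBuffer& native) const {
    for (const DeferredCommand& cmd : commands_) {
      native.calls.push_back("bindPipeline:" + cmd.pipeline);
      for (uint32_t binding : cmd.buffer_bindings) {
        native.calls.push_back("bindBuffer:" + std::to_string(binding));
      }
      native.calls.push_back("draw:" + std::to_string(cmd.vertex_count));
    }
  }

  std::size_t command_count() const { return commands_.size(); }

 private:
  std::vector<DeferredCommand> commands_;
};
```

Nothing reaches the native buffer until `Encode` runs, so all recorded state has to live in the intermediate objects in the meantime.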

Overview

This process works reasonably well, but it adds measurable overhead. In theory, we could record directly into the native Metal and Vulkan command buffers instead of an intermediate Impeller structure, which would remove all of this intermediate allocation.

Problems:

  • The recording workload requires a non-trivial number of heap allocations per command. These allocations tend to be fairly fragmented and don't work well with the Scudo allocator on Android.
  • Extra state: all commands currently carry a stencil rect, even though Impeller never uses it. We may find that we're unwilling to add features that would be useful for Flutter GPU because of the additional per-command cost they would impose on Impeller.
  • We need to add more state-setting commands, such as barriers ([Impeller] HAL needs barriers for both Metal/Vulkan for compute usage. #140798), for correct rendering with compute. Adding them to an intermediate representation requires even more allocation.

In contrast, if we record directly to the native command buffer, we remove all extra allocation for Metal/Vulkan. This leaves two general problems and one Vulkan-specific problem to fix:

  • The current HostBuffer strategy flushes to a device buffer once, at the end of render pass recording. Recording directly would require us to implement [Impeller] Change the transient buffer to be a per-frame arena. #138161 or something similar.
  • Vulkan will need to guess how many descriptor sets to create, but the existing recycling can handle this.
  • The command state-setting API must become stateful instead of building a command object.
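With direct recording, each draw's transient data needs a stable offset at the moment the draw is encoded, not after the pass ends. A per-frame bump arena, roughly in the spirit of #138161, is one way to get that. The sketch below is hypothetical: `FrameArena`, `BufferView`, and the alignment value are assumptions, and a real implementation would back the storage with device-visible memory rather than a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical view into the arena: the offset is what a native bind call
// would receive alongside the arena's device buffer.
struct BufferView {
  std::size_t offset;
  std::size_t length;
};

// A bump allocator reset once per frame. Emplace copies data in and returns
// a view immediately, so the draw that uses it can be encoded right away.
class FrameArena {
 public:
  // `alignment` must be non-zero (e.g. a uniform buffer offset alignment).
  FrameArena(std::size_t capacity, std::size_t alignment)
      : storage_(capacity), alignment_(alignment) {}

  std::optional<BufferView> Emplace(const void* data, std::size_t length) {
    // Round the cursor up to the next multiple of the alignment.
    std::size_t offset = (cursor_ + alignment_ - 1) / alignment_ * alignment_;
    if (offset + length > storage_.size()) {
      return std::nullopt;  // Out of space; a real arena would grow or chain.
    }
    std::memcpy(storage_.data() + offset, data, length);
    cursor_ = offset + length;
    return BufferView{offset, length};
  }

  // Called at frame start, once the GPU has finished reading last frame's data.
  void Reset() { cursor_ = 0; }

  const uint8_t* data() const { return storage_.data(); }

 private:
  std::vector<uint8_t> storage_;
  std::size_t alignment_;
  std::size_t cursor_ = 0;
};
```

Because `Emplace` returns an offset immediately, there is no end-of-pass flush that every draw has to wait on; the cost is that the arena must outlive the GPU's use of the frame.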

Example

We need to change the intermediate state setting so that it maps directly to the underlying command buffer state. Something like this:

Before

// Build an intermediate Command; it is converted to native bindings later.
Command cmd;
cmd.pipeline = context.getPipeline();
BindData(cmd, data);
pass.addCommand(std::move(cmd));

After

// Each call sets state on (or draws into) the native command buffer directly.
pass.setPipeline(context.getPipeline());
pass.bindData(metadata, data);
pass.draw();
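A minimal sketch of how the "After" shape could forward straight to a native command buffer. `DirectPass`, `NativeCommandBuffer`, and the method names are hypothetical and only loosely mirror `setPipeline`/`bindData`/`draw` above; the native buffer is again a logging stand-in. No per-draw object is built or stored: the pipeline is current state, and `Draw` consumes whatever is bound. Skipping redundant pipeline binds is an added illustration of what a stateful pass makes cheap, not something the proposal specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Logging stand-in for a Vulkan/Metal command buffer.
struct NativeCommandBuffer {
  std::vector<std::string> calls;
};

// Hypothetical stateful pass: each setter encodes immediately, the way
// vkCmdBindPipeline or setRenderPipelineState: behave natively.
class DirectPass {
 public:
  explicit DirectPass(NativeCommandBuffer& native) : native_(native) {}

  void SetPipeline(const std::string& pipeline) {
    // The pass only remembers the last pipeline, so redundant binds are free to skip.
    if (pipeline == current_pipeline_) {
      return;
    }
    current_pipeline_ = pipeline;
    native_.calls.push_back("bindPipeline:" + pipeline);
  }

  // Binds a buffer slot to an offset, e.g. one returned by a transient arena.
  void BindBuffer(uint32_t slot, uint32_t offset) {
    native_.calls.push_back("bindBuffer:" + std::to_string(slot) + "@" +
                            std::to_string(offset));
  }

  void Draw(uint32_t vertex_count) {
    native_.calls.push_back("draw:" + std::to_string(vertex_count));
  }

 private:
  NativeCommandBuffer& native_;
  std::string current_pipeline_;
};
```

Compared with the deferred model, the only state the pass keeps is what the native API already tracks, so there is nothing left to allocate per draw.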

Labels: P1, e: impeller, team-engine, triaged-engine
