[Impeller] Storing MSAA stencil texture has disproportionate impact on performance on Vulkan backend.

Animated Advanced Blend performance is 50x worse on a Pixel 7 Pro than on an iPhone 11, with GPU times in excess of 50ms (against an 8ms budget). Below are two Mali Streamline traces recorded on a Pixel 7 Pro, one with MSAA enabled and one with MSAA disabled. The top left corner of each trace shows the total for each counter per time period, which for these graphs was 100ms.


## Some notable differences:

Texture External Read Beats:
  * "The number of read beats received by the texture unit that required an external memory access due to an L2 cache miss."
  * ~11.5x higher with MSAA enabled.

Texture L2 Read Beats:
  * "The number of read beats received by the texture unit." Possibly number of _not_ missed reads?
  * 4x higher with MSAA enabled.

The other external bus/read metrics all seem to indicate that we are reading and writing far too much texture data. So why is this happening?

### MSAA Enabled

![image](https://github.com/flutter/flutter/assets/8975114/467801f0-439b-4545-9e2d-31ef5d0b4d90)

### MSAA Disabled

![image](https://github.com/flutter/flutter/assets/8975114/8ec00074-73cd-4584-9ff3-3ebe52f04720)




## Is this due to stencil storage?

Right now we multisample the stencil buffer and store it across backdrop filters (to avoid repopulating it). Since this test app has no clips, we can experiment by disabling the stencil storage. This was done by changing the stencil store and load actions to `StoreAction::kDontCare` and `LoadAction::kClear`, respectively, and by always using `StorageMode::kDeviceTransient` for the stencil.


Diff

```
diff --git a/impeller/entity/entity_pass.cc b/impeller/entity/entity_pass.cc
index 4d6897b5b3..980c3fe583 100644
--- a/impeller/entity/entity_pass.cc
+++ b/impeller/entity/entity_pass.cc
@@ -258,8 +258,7 @@ void EntityPass::AddSubpassInline(std::unique_ptr<EntityPass> pass) {
 
 static RenderTarget::AttachmentConfig GetDefaultStencilConfig(bool readable) {
   return RenderTarget::AttachmentConfig{
-      .storage_mode = readable ? StorageMode::kDevicePrivate
-                               : StorageMode::kDeviceTransient,
+      .storage_mode = StorageMode::kDeviceTransient,
       .load_action = LoadAction::kDontCare,
       .store_action = StoreAction::kDontCare,
   };
diff --git a/impeller/entity/inline_pass_context.cc b/impeller/entity/inline_pass_context.cc
index 4253e75987..f83247a1ad 100644
--- a/impeller/entity/inline_pass_context.cc
+++ b/impeller/entity/inline_pass_context.cc
@@ -142,13 +142,12 @@ InlinePassContext::RenderPassResult InlinePassContext::GetRenderPass(
 
   // Only clear the stencil if this is the very first pass of the
   // layer.
-  stencil->load_action =
-      pass_count_ > 0 ? LoadAction::kLoad : LoadAction::kClear;
+  stencil->load_action = LoadAction::kClear;
   // If we're on the last pass of the layer, there's no need to store the
   // stencil because nothing needs to read it.
   stencil->store_action = pass_count_ == total_pass_reads_
                               ? StoreAction::kDontCare
-                              : StoreAction::kStore;
+                              : StoreAction::kDontCare;
   pass_target_.target_.SetStencilAttachment(stencil.value());
 
   pass_target_.target_.SetColorAttachment(color0, 0);
```

### Results

With stencil storage disabled, performance is immediately much better than in the original MSAA-enabled case: ~80 FPS, up from ~4 FPS.

The results are much closer to the MSAA-disabled case, but why? The stencil buffer is only 8 bits per sample, so a 4x multisampled stencil is 32 bits per pixel, roughly the same size as the offscreen texture we use for each backdrop. Naively, I'd expect something like a 2x difference from removing the stencil storage. Given the substantial overhead, it's possible that storing the stencil causes some other form of deoptimization, or that the implementation decides to back the S8 stencil with a full depth/stencil format without telling us. That would seem contrary to the spirit of Vulkan, but it's just a guess.

![image](https://github.com/flutter/flutter/assets/8975114/26b48b93-ab78-47b3-aeb5-62cad3f097eb)
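To make the size comparison above concrete, here is a rough sketch of the arithmetic. It assumes 4x MSAA, a 32-bit-per-pixel resolved offscreen texture, and a full-screen 1440x3120 attachment; the actual attachment sizes in the test app may differ. `AttachmentBytes` is a hypothetical helper written for this illustration, not an Impeller function.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed to hold one attachment, given its
// dimensions, bits per sample, and sample count.
constexpr std::uint64_t AttachmentBytes(std::uint64_t width,
                                        std::uint64_t height,
                                        std::uint64_t bits_per_sample,
                                        std::uint64_t sample_count) {
  return width * height * bits_per_sample * sample_count / 8;
}

// Assumed full-screen size; not taken from the traces.
constexpr std::uint64_t kWidth = 1440;
constexpr std::uint64_t kHeight = 3120;

// S8 stencil at 4 samples per pixel: 8 bits * 4 = 32 bits per pixel.
constexpr std::uint64_t kMsaaStencilBytes =
    AttachmentBytes(kWidth, kHeight, 8, 4);

// Resolved 32-bit color texture at 1 sample per pixel.
constexpr std::uint64_t kOffscreenColorBytes =
    AttachmentBytes(kWidth, kHeight, 32, 1);
```

Under these assumptions both attachments come to the same number of bytes, which is why storing the stencil would naively be expected to cost about 2x rather than the much larger gap seen in the traces.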


### How to avoid storing the stencil:

I detailed an approximate solution to this in the StC discussion, under "Depth Buffer for Clipping": [Stencil Then Cover for Impeller](). The gist is that, rather than storing the stencil, we replay all stencil-affecting commands when recreating the root pass (with an easy no-op if there were none). A further benefit of this approach is that it opens a route to depth/stencil usage for StC!
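A minimal sketch of the replay idea, to show its shape only. Every name here (`FakePass`, `StencilReplayList`, `Record`, `ReplayInto`) is hypothetical and is not part of Impeller; a real implementation would record actual clip commands and encode them into a real render pass.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a render pass. It only counts stencil writes so
// the replay can be observed; it is not an Impeller type.
struct FakePass {
  int stencil_writes = 0;
};

// Records stencil-affecting commands (e.g. clips) so they can be replayed
// into a freshly cleared stencil instead of loading a stored one.
class StencilReplayList {
 public:
  using Command = std::function<void(FakePass&)>;

  void Record(Command command) { commands_.push_back(std::move(command)); }

  // Replays every recorded command into `pass`. With no recorded commands
  // this is a no-op. Returns the number of commands replayed.
  std::size_t ReplayInto(FakePass& pass) const {
    for (const Command& command : commands_) {
      command(pass);
    }
    return commands_.size();
  }

 private:
  std::vector<Command> commands_;
};
```

With a list like this, a recreated root pass would start from a cleared stencil and call `ReplayInto`, so the stencil attachment never needs to be stored. An app with no clips pays nothing.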

[Impeller] Storing MSAA stencil texture has disproportionate impact on performance on Vulkan backend. #137302
