Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@jonahwilliams
Copy link
Contributor

@jonahwilliams jonahwilliams commented Dec 8, 2023

Rewrites binding to avoid lots of intermediate and per-command allocations.

Today each Command object holds 4 vectors (previously 4 maps) that contain all of the binding data per command. This leads to a large number of fragmented heap allocations, which is both slow to allocate and slow to deallocate on Android devices.

There are also secondary issues like:

  • Command/Compute command needing a vtable to implement ResourceBinder
  • the render_pass logic needing to do more work to access binding information
  • TrackedObjects vk needing to copy bindings, despite the fact that we could just move the binding vectors into the tracked objects.
  • Command is a big object.

And finally, the problem that I just really don't like the current binding API, where we seem to just sort of litter the bind calls across the Contents::Render method.

image

New Binding API

The binding methods have been changed to return the texture/sampler or buffer wrapped in the binding metadata. RenderPass/ComputePass addCommand has been given two new arguments that take a vector/initializer_list of these bindings which emplaces them into per-render pass pre-allocated storage.

When the command is recorded, the offsets into this render pass allocated storage are recorded back on the command object.

Example

 pass.AddCommand(
     std::move(cmd),
     {
         VS::BindFrameInfo(host_buffer.EmplaceUniform(frame_info)),
         FS::BindFragInfo(host_buffer.EmplaceUniform(frag_info)),
     },
     {
         FS::BindTextureSampler(
             gradient_texture,
             renderer.GetContext()->GetSamplerLibrary()->GetSampler(
                 sampler_desc)),
     }));

Follow up changes

This patch still leaves some inefficiencies, in that we are still continaully copying the binding metadata despite the fact that most of it is constants, just to support runtime effect.

@jonahwilliams jonahwilliams marked this pull request as ready for review December 8, 2023 22:07
@gaaclarke
Copy link
Member

So the before and after is:

before

  VS::BindFrameInfo(cmd, host_buffer.EmplaceUniform(frame_info)),
  FS::BindFrameInfo(cmd, host_buffer.EmplaceUniform(frame_info)),
  FS::BindTextureSampler(
             cmd
             gradient_texture,
             renderer.GetContext()->GetSamplerLibrary()->GetSampler(
                 sampler_desc)),
  pass.AddCommand(std::move(cmd));

after

  pass.AddCommand(
     std::move(cmd),
     {
         VS::BindFrameInfo(host_buffer.EmplaceUniform(frame_info)),
         FS::BindFragInfo(host_buffer.EmplaceUniform(frag_info)),
     },
     {
         FS::BindTextureSampler(
             gradient_texture,
             renderer.GetContext()->GetSamplerLibrary()->GetSampler(
                 sampler_desc)),
     }));

The ResourceBinder thing is weird. I'm not sure why it's polymorphic, just seems to be a shortcut to not have to type in which collection is going to hold the bindings, which we know at compile-time anyways.

I think we could solve most of the problems listed less invasively by getting rid of ResourceBinder and stealing ownership of the bindings from the Commands. It is a bit weird for a Command to need bindings to execute, but having no reference to them.

@jonahwilliams
Copy link
Contributor Author

You can't steal ownership of the heap allocated vector though, that space will be allocated and must be freed.

It is a bit weird for a Command to need bindings to execute, but having no reference to them.

But commands themselves cannot execute, they must be added to a render pass or compute pass.

Copy link
Member

@gaaclarke gaaclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is sounding good (edit: I didn't do a code review). It solves the listed issues with performance and the oddity with ResourceBinder. I'm not 100% sold on the change to Command since it's kind of changing what it means. I'd like other people to consider it too. I don't have a good alternative after staring at it for a bit.

Comment on lines +77 to +79
bool AddCommand(Command&& command,
std::vector<BoundBuffer> buffers,
std::vector<BoundTexture> textures);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this option? There is a lot of duplicated code here between these implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh this one is a good question for you 😄

For some of the contents (like runtime effect) we do need to manage a vector allocation because the binding is very dynamic. I had thought that if I only had the method that accepted vectors, then the initializer lists would always be converted to vectors first, whereas I'm trying to avoid extra heap allocations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't know the size at compile-time we can't avoid the heap allocation (safely). If you make the method take in iterators you can share the same implementation and have the std::initializer_list and std::vector versions just call into that for convenience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, i'm not worried about avoiding that for runtime effect - but for things like solid color contents I am worried.

I like the idea of using iterators, I will try that out.

@chinmaygarde
Copy link
Member

Just an initial comment, I really dislike the ergonomics of the new API. Command now reads like a poor name for what is being added to the render pass now. Though I suppose I can get on board with the approach and massage the names a bit.

I am curious to see the after results from this change. Either in a benchmark or a flamegraph like the one you pasted in the original post.

Also please link an issue detailing the problem you are trying to solve.

@jonahwilliams
Copy link
Contributor Author

Thanks for taking a look at this @chinmaygarde !

Can you be a bit more specific about what you like dislike about the new API? There is certainly a trade off here in that I've made Command less monolithic. This isn't the first place we've thought about it, see also: flutter/flutter#133179

But binding can be thought of both as per command and even per-group of commands or per render pass. The current approach isn't necessarily the most accurate reflection of the driver APIs as it (without post-processing) disallows shared bindings.

I am curious to see the after results from this change. Either in a benchmark or a flamegraph like the one you pasted in the original post.

For changes focused on the efficiency of the HAL, I've been using the sample app from qr.flutter: https://github.com/theyakka/qr.flutter/tree/master/example . Nevertheless, in the past month we've seen some good CPU reduction on the complex_layout_scroll_perf application:

https://flutter-flutter-perf.skia.org/e/?queries=sub_result%3D90th_percentile_frame_rasterizer_time_millis%26sub_result%3Daverage_frame_rasterizer_time_millis%26test%3Dcomplex_layout_scroll_perf_ios__timeline_summary&selected=commit%3D38334%26name%3D%252Carch%253Dintel%252Cbranch%253Dmaster%252Cconfig%253Ddefault%252Cdevice_type%253DiPhone_11%252Cdevice_version%253Dnone%252Chost_type%253Dmac%252Csub_result%253Daverage_frame_rasterizer_time_millis%252Ctest%253Dcomplex_layout_scroll_perf_ios__timeline_summary%252C

@chinmaygarde chinmaygarde marked this pull request as draft December 14, 2023 21:12
@flutter-dashboard
Copy link

This pull request has been changed to a draft. The currently pending flutter-gold status will not be able to resolve until a new commit is pushed or the change is marked ready for review again.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

No open projects
Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

3 participants