-
Notifications
You must be signed in to change notification settings - Fork 6k
[Impeller] do not allocate bindings per-command object. #48848
Conversation
|
So the before and after is: before VS::BindFrameInfo(cmd, host_buffer.EmplaceUniform(frame_info)),
FS::BindFrameInfo(cmd, host_buffer.EmplaceUniform(frame_info)),
FS::BindTextureSampler(
cmd
gradient_texture,
renderer.GetContext()->GetSamplerLibrary()->GetSampler(
sampler_desc)),
pass.AddCommand(std::move(cmd));after pass.AddCommand(
std::move(cmd),
{
VS::BindFrameInfo(host_buffer.EmplaceUniform(frame_info)),
FS::BindFragInfo(host_buffer.EmplaceUniform(frag_info)),
},
{
FS::BindTextureSampler(
gradient_texture,
renderer.GetContext()->GetSamplerLibrary()->GetSampler(
sampler_desc)),
}));The I think we could solve most of the problems listed less invasively by getting rid of ResourceBinder and stealing ownership of the bindings from the Commands. It is a bit weird for a Command to need bindings to execute, but having no reference to them. |
|
You can't steal ownership of the heap allocated vector though, that space will be allocated and must be freed.
But commands themselves cannot execute, they must be added to a render pass or compute pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is sounding good (edit: I didn't do a code review). It solves the listed issues with performance and the oddity with ResourceBinder. I'm not 100% sold on the change to Command since it's kind of changing what it means. I'd like other people to consider it too. I don't have a good alternative after staring at it for a bit.
| bool AddCommand(Command&& command, | ||
| std::vector<BoundBuffer> buffers, | ||
| std::vector<BoundTexture> textures); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this option? There is a lot of duplicated code here between these implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh this one is a good question for you 😄
For some of the contents (like runtime effect) we do need to manage a vector allocation because the binding is very dynamic. I had thought that if I only had the method that accepted vectors, then the initializer lists would always be converted to vectors first, whereas I'm trying to avoid extra heap allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't know the size at compile-time we can't avoid the heap allocation (safely). If you make the method take in iterators you can share the same implementation and have the std::initializer_list and std::vector versions just call into that for convenience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, i'm not worried about avoiding that for runtime effect - but for things like solid color contents I am worried.
I like the idea of using iterators, I will try that out.
|
Just an initial comment, I really dislike the ergonomics of the new API. I am curious to see the after results from this change. Either in a benchmark or a flamegraph like the one you pasted in the original post. Also please link an issue detailing the problem you are trying to solve. |
|
Thanks for taking a look at this @chinmaygarde ! Can you be a bit more specific about what you like dislike about the new API? There is certainly a trade off here in that I've made Command less monolithic. This isn't the first place we've thought about it, see also: flutter/flutter#133179 But binding can be thought of both as per command and even per-group of commands or per render pass. The current approach isn't necessarily the most accurate reflection of the driver APIs as it (without post-processing) disallows shared bindings.
For changes focused on the efficiency of the HAL, I've been using the sample app from qr.flutter: https://github.com/theyakka/qr.flutter/tree/master/example . Nevertheless, in the past month we've seen some good CPU reduction on the complex_layout_scroll_perf application: |
|
This pull request has been changed to a draft. The currently pending flutter-gold status will not be able to resolve until a new commit is pushed or the change is marked ready for review again. |
Rewrites binding to avoid lots of intermediate and per-command allocations.
Today each Command object holds 4 vectors (previously 4 maps) that contain all of the binding data per command. This leads to a large number of fragmented heap allocations, which is both slow to allocate and slow to deallocate on Android devices.
There are also secondary issues like:
And finally, the problem that I just really don't like the current binding API, where we seem to just sort of litter the bind calls across the Contents::Render method.
New Binding API
The binding methods have been changed to return the texture/sampler or buffer wrapped in the binding metadata. RenderPass/ComputePass addCommand has been given two new arguments that take a vector/initializer_list of these bindings which emplaces them into per-render pass pre-allocated storage.
When the command is recorded, the offsets into this render pass allocated storage are recorded back on the command object.
Example
pass.AddCommand( std::move(cmd), { VS::BindFrameInfo(host_buffer.EmplaceUniform(frame_info)), FS::BindFragInfo(host_buffer.EmplaceUniform(frag_info)), }, { FS::BindTextureSampler( gradient_texture, renderer.GetContext()->GetSamplerLibrary()->GetSampler( sampler_desc)), }));Follow up changes
This patch still leaves some inefficiencies, in that we are still continaully copying the binding metadata despite the fact that most of it is constants, just to support runtime effect.