[Impeller] Gives ownership of the worker concurrent loop to the context. #41527
Conversation
```cpp
// Don't just drop tasks on the floor in case of shutdown.
if (shutdown_) {
  FML_DLOG(WARNING)
      << "Tried to post a task to shutdown concurrent message "
         "loop. The task will be executed on the callers thread.";
  lock.unlock();
  task();
  return;
}
```
Tasks could always be dropped when shutting down a ConcurrentMessageLoop; there was no guarantee that they would all be executed, despite a few places that attempted to make that a thing. Specifically, if more than N tasks were queued up and the loop was deleted, tasks [N+1, ...] would be dropped.
Dropping late tasks matches the behavior elsewhere in the engine (for example platform channel messages).
Mainly notes for myself:
- This is ok because we don't give these workers to Dart. We only use them internally for SKP warmup and for image decoding.
```cpp
loop->PostTask(task);
return;
{
  std::scoped_lock lock(weak_loop_mutex_);
```
This adds a small amount of overhead compared to the shared_ptr, but we could share the tasks_mutex_ if we wanted to add extra complexity.
jason-simmons left a comment
I'd like to avoid the complexity of managing the circular reference between the ConcurrentTaskLoop and the ConcurrentTaskRunner and the associated object lifetime issues.
Would it be feasible to have a design where the ContextVK provides an interface to the PipelineLibraryVK for queueing tasks so the PipelineLibraryVK does not need to directly hold a reference to the ConcurrentTaskRunner?
With that, the ContextVK would hold the only reference to the ConcurrentTaskRunner and could safely shut down the ConcurrentTaskRunner before deleting the Vulkan device.
```cpp
{
  std::scoped_lock lock(weak_loop_mutex_);
  if (weak_loop_) {
    weak_loop_->PostTask(task);
```
There may be undefined behavior if a method like ConcurrentMessageLoop::PostTask is called on one thread after the object has begun deletion on a different thread.
At the point where this call happens, another thread may have already dropped the last reference to the ConcurrentTaskLoop, invoked delete, and called into the ConcurrentTaskLoop destructor.
The ConcurrentTaskLoop destructor will then try to acquire the weak_loop_mutex_, so the destructor will not make any further progress before the ConcurrentMessageLoop::PostTask call completes. However, I don't think it's guaranteed to be safe to execute the ConcurrentMessageLoop::PostTask call while the ConcurrentMessageLoop is in that state.
> There may be undefined behavior if a method like ConcurrentMessageLoop::PostTask is called on one thread after the object has begun deletion on a different thread.
Yep, good observation. In order for it to get undefined behavior, ConcurrentTaskLoop would need virtual methods and a subclass. That would mean that the subclass could already have been deallocated while another thread is executing PostTask. ConcurrentTaskLoop doesn't have virtual methods, but that's easy to add and introduce undefined behavior by accident, so I've made the class final to avoid this.
The order of operations of deallocation is well defined. At the point where we are clearing out the reference, the ConcurrentMessageLoop instance variables will still be valid and the memory for this will still be around. Since it is final, we can know this is the very first thing that will be called.
Let me know if you think we are still missing something.
Thanks, Jason. Responses are inline.
That's how the existing design worked. I think the lifetime issues should be resolved by making the class final (as noted above). The fact that it is cyclical doesn't bother me because these classes are
That wouldn't work unfortunately because the
Sorry - got confused between As you noted, Embedders that create a Some other potential ideas:
> Remember that the ConcurrentMessageLoop is a unique_ptr, so it can't be used after it's given to the ContextVK.

I thought about that. The only reason to pass the
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).
If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?
Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
dnfield left a comment
> Remember that the ConcurrentMessageLoop is a unique_ptr, so it can't be used after it's given to the ContextVK.

This is true as long as ContextVK takes a unique_ptr. There are other implementations in the engine using it as a shared_ptr, specifically dart_vm.h/cc. Why aren't we changing that to also use a unique_ptr?
fml/concurrent_message_loop.h
```diff
- std::weak_ptr<ConcurrentMessageLoop> weak_loop_;
+ // Raw pointer that is cleared out in ~ConcurrentMessageLoop.
+ ConcurrentMessageLoop* weak_loop_;
```
Consider adding a macro like IPLR_GUARDED_BY to fml/macros.h and annotating this.
(and defining the mutex first)
IPLR_GUARDED_BY is impeller only, it's in //impeller/base/thread_safety.h
Yes, which is why I'm saying to add something like FML_GUARDED_BY to //fml/macros.h
Done
Sorry, now it's done (I had added it but forgot to commit the change where it is used).
```cpp
// Shut down the worker message loop before the context starts getting torn
// down.
```
I'm not sure how to read this comment correctly.
The context is currently getting torn down right?
The extra reset here is guarding us against what? If someone adds another ivar after worker_message_loop_ that could somehow ref it?
I've reworded the comment, let me know if that is more clear.
> The context is currently getting torn down right?

Right, but once we hit the closing bracket the ivars will start getting deleted in an undefined order (or at least a non-obvious order). We want the threads all joined before we start doing that.

> The extra reset here is guarding us against what?

Guarding us from an unclear order of operations and adding more deterministic destruction. This, for example, makes sure that the shader_library_ is destructed on the raster thread instead of on some worker thread.
Using it as a

The linked issue had clear reproduction steps and is demonstrably broken. So, that is what is addressed here, all while trying to make sure the changes make sense from ConcurrentMessageLoop's perspective. I didn't reason about dart_vm's usage since there was no filed issue that I know of. If there was a problem I'd expect it to manifest in its tests.
dnfield left a comment
LGTM with the nit about a new macro for FML.
Would appreciate a review from @chinmaygarde
@chinmaygarde @dnfield I'd like to get consensus on the general design for how to avoid races between Impeller Vulkan background tasks and the lifetime of the Vulkan device. In particular I'm still concerned about the potential risk of calling a method on

Of the options discussed here I think the simplest would be having the
Won't that be prevented by the mutex usage?
I can't think of a good reason not to do this either though.
Can you explain more about how this is a risk? Maybe it's not a risk or maybe you're seeing something I don't. You agree that calling a non-virtual method on
That's what we have except we use a

Is the idea that if we shut things down in the correct order we wouldn't need to clear out
Calling a non-virtual method within a destructor is safe. But in this case thread A is calling the non-virtual method while thread B has invoked the delete operator and called into the destructor. I don't know if there is any explicit guarantee that thread A's concurrent usage of the object is safe.

The current design of

If

There may also need to be a different interface between

With something like this, the
Having a synchronization on deleting the
Dan, Jason and I had a chat offline about this and what we decided is we'd really like to get @chinmaygarde's opinion about this. So, this PR gives us the ability to shut down the

It is demonstrably sufficient in solving the flake linked in the issue, and the CommandPoolVK leak, but we'd like to make sure there isn't more work that needs to be done to manage the lifetime of the device. I would really like to see a deterministic

cc @jason-simmons @dnfield (let me know if I missed anything)
I tried this patch:

diff --git a/impeller/renderer/backend/vulkan/context_vk.cc b/impeller/renderer/backend/vulkan/context_vk.cc
index d41ccd0a5f..2d175d6489 100644
--- a/impeller/renderer/backend/vulkan/context_vk.cc
+++ b/impeller/renderer/backend/vulkan/context_vk.cc
@@ -108,7 +108,7 @@ ContextVK::ContextVK() : hash_(CalculateHash(this)) {}
ContextVK::~ContextVK() {
if (device_) {
- [[maybe_unused]] auto result = device_->waitIdle();
+ [[maybe_unused]] auto result = device_.waitIdle();
}
CommandPoolVK::ClearAllPools(this);
}
@@ -273,7 +273,7 @@ void ContextVK::Setup(Settings settings) {
device_info.setPEnabledFeatures(&required_features.value());
// Device layers are deprecated and ignored.
- auto device = physical_device->createDeviceUnique(device_info);
+ auto device = physical_device->createDevice(device_info);
if (device.result != vk::Result::eSuccess) {
VALIDATION_LOG << "Could not create logical device.";
return;
@@ -291,7 +291,7 @@ void ContextVK::Setup(Settings settings) {
weak_from_this(), //
application_info.apiVersion, //
physical_device.value(), //
- device.value.get(), //
+ device.value, //
instance.value.get(), //
dispatcher.vkGetInstanceProcAddr, //
dispatcher.vkGetDeviceProcAddr //
@@ -306,7 +306,7 @@ void ContextVK::Setup(Settings settings) {
/// Setup the pipeline library.
///
auto pipeline_library = std::shared_ptr<PipelineLibraryVK>(
- new PipelineLibraryVK(device.value.get(), //
+ new PipelineLibraryVK(device.value, //
caps, //
std::move(settings.cache_directory), //
settings.worker_task_runner //
@@ -317,11 +317,11 @@ void ContextVK::Setup(Settings settings) {
return;
}
- auto sampler_library = std::shared_ptr<SamplerLibraryVK>(
- new SamplerLibraryVK(device.value.get()));
+ auto sampler_library =
+ std::shared_ptr<SamplerLibraryVK>(new SamplerLibraryVK(device.value));
auto shader_library = std::shared_ptr<ShaderLibraryVK>(
- new ShaderLibraryVK(device.value.get(), //
+ new ShaderLibraryVK(device.value, //
settings.shader_libraries_data) //
);
@@ -334,7 +334,7 @@ void ContextVK::Setup(Settings settings) {
/// Create the fence waiter.
///
auto fence_waiter =
- std::shared_ptr<FenceWaiterVK>(new FenceWaiterVK(device.value.get()));
+ std::shared_ptr<FenceWaiterVK>(new FenceWaiterVK(device.value));
if (!fence_waiter->IsValid()) {
VALIDATION_LOG << "Could not create fence waiter.";
return;
@@ -343,7 +343,7 @@ void ContextVK::Setup(Settings settings) {
//----------------------------------------------------------------------------
/// Fetch the queues.
///
- QueuesVK queues(device.value.get(), //
+ QueuesVK queues(device.value, //
graphics_queue.value(), //
compute_queue.value(), //
transfer_queue.value() //
@@ -363,7 +363,7 @@ void ContextVK::Setup(Settings settings) {
instance_ = std::move(instance.value);
debug_report_ = std::move(debug_report);
physical_device_ = physical_device.value();
- device_ = std::move(device.value);
+ device_ = device.value;
allocator_ = std::move(allocator);
shader_library_ = std::move(shader_library);
sampler_library_ = std::move(sampler_library);
@@ -378,7 +378,7 @@ void ContextVK::Setup(Settings settings) {
/// Label all the relevant objects. This happens after setup so that the debug
/// messengers have had a chance to be setup.
///
- SetDebugName(device_.get(), device_.get(), "ImpellerDevice");
+ SetDebugName(device_, device_, "ImpellerDevice");
}
// |Context|
@@ -422,7 +422,7 @@ vk::Instance ContextVK::GetInstance() const {
}
vk::Device ContextVK::GetDevice() const {
- return *device_;
+ return device_;
}
std::unique_ptr<Surface> ContextVK::AcquireNextSurface() {
@@ -489,7 +489,7 @@ std::unique_ptr<CommandEncoderVK> ContextVK::CreateGraphicsCommandEncoder()
return nullptr;
}
auto encoder = std::unique_ptr<CommandEncoderVK>(new CommandEncoderVK(
- *device_, //
+ device_, //
queues_.graphics_queue, //
tls_pool, //
fence_waiter_ //
diff --git a/impeller/renderer/backend/vulkan/context_vk.h b/impeller/renderer/backend/vulkan/context_vk.h
index cc1376649b..e36a689894 100644
--- a/impeller/renderer/backend/vulkan/context_vk.h
+++ b/impeller/renderer/backend/vulkan/context_vk.h
@@ -77,7 +77,7 @@ class ContextVK final : public Context, public BackendCast<ContextVK, Context> {
template <typename T>
bool SetDebugName(T handle, std::string_view label) const {
- return SetDebugName(*device_, handle, label);
+ return SetDebugName(device_, handle, label);
}
template <typename T>
@@ -126,7 +126,7 @@ class ContextVK final : public Context, public BackendCast<ContextVK, Context> {
vk::UniqueInstance instance_;
std::unique_ptr<DebugReportVK> debug_report_;
vk::PhysicalDevice physical_device_;
- vk::UniqueDevice device_;
+ vk::Device device_;
std::shared_ptr<Allocator> allocator_;
std::shared_ptr<ShaderLibraryVK> shader_library_;
std::shared_ptr<SamplerLibraryVK> sampler_library_;

On top of
(The theory with that patch was maybe we shouldn't use the
Here's a BT with that change:
Ah yea. Looking at the source code for vk::Device, it just has a pointer to a

FWIW I see how the device reference is managed as a separate issue to this PR. Even if we did migrate the device to a shared pointer, we'd still want to synchronize on the worker threads to avoid odd behaviors and to give us the ability to clean up the thread_local global variable. For example, something like:

```cpp
ContextVK::~ContextVK() {
  device_.reset();
  worker_message_loop_.reset();
  // Cleans up the thread_local global variable.
  CommandPoolVK::ClearAllPools(this);
  // Assert that there are no lingering references to the device.
  FML_DCHECK(!weak_device_.lock());
}
```

Adopting this PR doesn't stop us from changing how device references are tracked.
We had another offline meeting with Chinmay and he made the point that having a shared_ptr to the concurrent message loop and passing it into the ContextVK was intentional so that it could be shared across the whole process, limiting the number of threads. This PR doesn't allow that. If we do that, that means there needs to be a shutoff mechanism so that tasks that are queued up for execution after the ContextVK is deleted have a way to be aborted.

Chinmay was also asked if it would be safe to delete the VkDevice from an arbitrary thread and he believed so. It sounds like the easiest thing to do will be to eliminate all instance variable storage of the VkDevice and instead store a

I'm uncertain if this approach will easily allow us to clean up our cached CommandPoolVKs, but Chinmay was of the opinion that we could probably switch back to creating them on demand. I made the point that maybe we could cache them by sending them through the stack instead of relying on a thread_local map.
…41748)

An alternative to #41527

This introduces a new protocol called `DeviceHolder` whose only purpose is to return a pointer to a device. That way people can hold onto weak_ptrs to the `DeviceHolder` and know to abort further work once the device is deleted (like loading pipelines on a background thread).

This PR also changes the return type of ContextVK::GetDevice from returning a reference to a pointer as a hope to better communicate what is happening and discourage copying of the vk::Device. I would actually recommend abolishing holding onto instance variables of the vk::Device since it isn't clear that that is just a raw pointer. That's sort of how we got into this problem.

fixes flutter/flutter#125522
fixes flutter/flutter#125519
~~fixes flutter/flutter#125571~~

Testing: Fixes flake in existing tests.

[C++, Objective-C, Java style guides]: https://github.com/flutter/engine/blob/main/CONTRIBUTING.md#style
I believe this is obsoleted by #41748. Closing.
This gives ownership of the worker concurrent loop to the ContextVK. This guarantees that the concurrent message loop is destroyed before the ContextVK is destroyed. Without it there are random crashes when tasks on the concurrent message loop are executing while or after the ContextVK is destroyed.
fixes flutter/flutter#125522
fixes flutter/flutter#125519
Test coverage: Covered by existing tests (see linked issue)
Pre-launch Checklist
If you need help, consider asking for advice on the #hackers-new channel on Discord.