Conversation
… implementations.
| @@ -1208,841 +1209,841 @@ KernelCreateInfo BuildKernelCreateInfo<void>() { | |||
|
|
|||
| static Status RegisterRocmKernels(KernelRegistry& kernel_registry) { | |||
| static const BuildKernelCreateInfoFn function_table[] = { | |||
| BuildKernelCreateInfo<void>, //default entry to avoid the list become empty after ops-reducing | |||
There was a problem hiding this comment.
lots of formatting changes in this file, can ignore whitespace when viewing diff
| @@ -0,0 +1,46 @@ | |||
| // Copyright (c) Microsoft Corporation. All rights reserved. | |||
| // Licensed under the MIT License. | |||
|
|
|||
There was a problem hiding this comment.
Or you may follow CPU EP's implementation. That one is thread-safe.
| return registry; | ||
| } | ||
|
|
||
| static SharedPtrThreadSafeWrapper<KernelRegistry> s_kernel_registry{&CreateCudaKernelRegistry}; |
There was a problem hiding this comment.
I would suggest not having global initializers. If you change it to function local, things will be much simpler.
There was a problem hiding this comment.
Function local static would be nice, but it needs to be reset by another function. @RyanUnderhill knows more about why.
There was a problem hiding this comment.
Yep, in shared providers function local statics whose destructors depend on core code will be destroyed too late. They'll crash as the core code it depends on is already uninitialized. So we have to make them explicit.
There was a problem hiding this comment.
Even it is function local, you still can call Reset() manually. Function local delays initiation to run time.
There was a problem hiding this comment.
I guess you mean doing something like:
static std::shared_ptr<KernelRegistry>& CudaKernelRegistry() {
static std::shared_ptr<KernelRegistry> registry = ...
return registry;
}
std::shared_ptr<KernelRegistry> CUDAExecutionProvider::GetKernelRegistry() const {
return CudaKernelRegistry();
}
void Shutdown_DeleteRegistry() {
CudaKernelRegistry().reset();
}But that has different behavior when calls to GetKernelRegistry() and Shutdown_DeleteRegistry() are interleaved. E.g., GetKernelRegistry() will return an empty shared_ptr after Shutdown_DeleteRegistry() is called.
There was a problem hiding this comment.
@Craigacp We can match the same behavior as C# for Java as well. Which part won't be a singleton like C#? C# does not allow the user to configure it on instantiation.
There was a problem hiding this comment.
I'd prefer not to remove useful functionality like configuring a threadpool and setting the logging level. Switching it over to a singleton will mean the environment is constructed at an unpredictable time (as it's based on class initialization) and there's no potential for a user to close it to free up resources while keeping the JVM alive. A shutdown hook will clean up the environment most of the time (apart from when the JVM gets SIGKILLed or similar) and we'd need that for the singleton anyway. Allowing user controlled construction avoids the first problem while giving more flexibility.
What kind of things live inside the environment object? Is it worth allowing users to close it explicitly if they are finished with ORT, under the assumption that it throws some kind of exception if they try to make another one in the same process?
There was a problem hiding this comment.
I've filed #10670 which switches the environment over so that only one can be created in the lifetime of the JVM (unless the user starts messing with class loaders), and its closed by a JVM shutdown hook. I left the close method in place for compatibility though it is now a no-op. If we want the environment to be closeable to release resources then I'll need to do some more refactoring as the current close idiom will be very confusing.
There was a problem hiding this comment.
In terms of resources the most important one that the Environment contains is the threadpool and that too only if you create the env with the global threadpool option. See here. Besides this there are global registrations for various schemas that can't be undone.
The OrtEnv obj exposed in the C API wraps the internal Environment class. Each time you request an OrtEnv object the refcount is incremented for the same instance in the process. We do provide a Release method that decrements the refcount and finally gets rid of the instance allowing another instance to be created in the same process. The Release method calls the OrtEnv destructor which unloads the shared libs.
The C# wrapper doesn't provide any of this.
There was a problem hiding this comment.
I used to have a similar refcount in the Java side and it would hand out the same Java reference to the native OrtEnv, but it would call the destructor when the count gets to zero and allow users to build a fresh environment after that, which is the source of the trouble. It doesn't look like there are too many resources held by it, so it's probably ok without an explicit close.
onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
Outdated
Show resolved
Hide resolved
| thread = std::thread{load_model_thread_fn}; | ||
| } | ||
|
|
||
| for (auto& thread : threads) { |
There was a problem hiding this comment.
I am concerned that in the event of the test failure, we won't get a good diagnostics since we are going to destroy unjoined and undetached threads. And that would call std::terminate
There was a problem hiding this comment.
I tried to catch exceptions in the thread function before, but it turns out that wasn't the only way it was failing. For example, sometimes debug assertions from the standard library would be hit which also stop the process. I didn't make any further attempt to fail gracefully. Do you have any suggestions?
This reverts commit 0a68c6a.
|
How is it going? I hope it will be in the upcoming release. |
…nel_registry_thread_safety
This reverts commit cc12013.
Description
Make GetKernelRegistry() kernel registry initialization thread-safe.
Motivation and Context
Fix #10179