@chwarr chwarr commented Aug 13, 2025

Description

Delay the call to OrtGetApiBase() until the first call to Ort::GetApi() so that OrtGetApiBase() is typically called after dynamic library loading.

Motivation and Context

When ORT_API_MANUAL_INIT is not defined (which is the default), the static Ort::Global<void>::api_ has a dynamic initializer that calls OrtGetApiBase()->GetApi(ORT_API_VERSION). This dynamic initialization can cause problems when it interacts with other global/static initialization. On Windows in particular, it can also cause deadlocks when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to load any other libraries.

  • Replace the templated Global<void>::api_ with an inline static initialized to nullptr.
  • Ort::GetApi() now calls detail::Global::GetApi() which calls detail::Global::DefaultInit() if initialization is needed.
    • When ORT_API_MANUAL_INIT is defined, DefaultInit() returns nullptr, which will eventually cause the program to crash. The callers have violated the initialization contract by not calling one of the Ort::InitApi overloads.
    • When ORT_API_MANUAL_INIT is not defined, DefaultInit() uses a function-level static to compute the result of OrtGetApiBase()->GetApi(ORT_API_VERSION) once and return it.
  • Ort::Global<void> has been replaced with a non-templated type and moved inside a detail namespace. Since the Global<void> object was documented as being used internally, it is believed that these changes here are non-breaking, as they do not impact a public API. The public APIs, Ort::InitApi() and Ort::InitApi(const OrtApi*) remain unchanged.
  • Add #pragma detect_mismatch to surface issues with compilation units that disagree on how ORT_API_MANUAL_INIT is defined. (MSVC only.)
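
The lazy-initialization shape described in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual header: `demo::Api`, `demo::LookupApi`, `demo::kApiVersion`, and `DEMO_API_MANUAL_INIT` are hypothetical stand-ins for `OrtApi`, `OrtGetApiBase()->GetApi(...)`, `ORT_API_VERSION`, and `ORT_API_MANUAL_INIT`, and the real `detail::Global` may differ in its details. It assumes C++17 for the `inline static` member.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {

// Hypothetical stand-in for OrtApi.
struct Api {
  std::uint32_t version;
};

// Hypothetical stand-in for ORT_API_VERSION.
constexpr std::uint32_t kApiVersion = 23;

// Hypothetical stand-in for OrtGetApiBase()->GetApi(version).
inline const Api* LookupApi(std::uint32_t version) {
  static const Api api{kApiVersion};
  return (version >= 1 && version <= kApiVersion) ? &api : nullptr;
}

namespace detail {

struct Global {
  // Constant-initialized: no code runs for this member at load time.
  inline static const Api* api_ = nullptr;

  static const Api* DefaultInit() {
#if defined(DEMO_API_MANUAL_INIT)
    // Manual init was requested, so there is no default to fall back on.
    return nullptr;
#else
    // Function-level static: the lookup runs once, on the first call.
    static const Api* cached = LookupApi(kApiVersion);
    return cached;
#endif
  }

  static const Api* GetApi() {
    // Note: this sketch does not synchronize the write to api_.
    if (api_ == nullptr) api_ = DefaultInit();
    return api_;
  }
};

}  // namespace detail

// Public-facing accessors in the spirit of Ort::GetApi() / Ort::InitApi().
inline const Api* GetApi() { return detail::Global::GetApi(); }
inline void InitApi(const Api* api) { detail::Global::api_ = api; }
inline void InitApi() { detail::Global::api_ = LookupApi(kApiVersion); }

}  // namespace demo
```

In this sketch the first `demo::GetApi()` call performs the lookup and every later call returns the cached pointer. With `DEMO_API_MANUAL_INIT` defined and no `InitApi` call, `GetApi()` would return `nullptr`, which mirrors the contract violation described above.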

chwarr added 3 commits August 13, 2025 15:26
When ORT_API_MANUAL_INIT is not defined (which is the default), the
static `Ort::Global<void>::api_` has a dynamic initializer that calls
`OrtGetApiBase()->GetApi(ORT_API_VERSION)`. This dynamic initialization
can cause problems when it interacts with other global/static
initialization. On Windows in particular, it can also cause deadlocks
when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to
load any other libraries.

Delay the call to `OrtGetApiBase()` until the first call to
`Ort::GetApi()` so that `OrtGetApiBase()` is typically called after
dynamic library loading.

* Replace the templated `Global<void>::api_` with an inline static
  initialized to nullptr.
* `Ort::GetApi()` now calls `detail::Global::GetApi()` which calls
  `detail::Global::DefaultInit()` if initialization is needed.
  * When `ORT_API_MANUAL_INIT` is defined, `DefaultInit()` returns
    nullptr, which will eventually cause the program to crash. The
    callers have violated the initialization contract by not calling one
    of the `Ort::InitApi` overloads.
  * When `ORT_API_MANUAL_INIT` is not defined, `DefaultInit()` uses a
    function-level static to compute the result of
    `OrtGetApiBase()->GetApi(ORT_API_VERSION)` once and return it.
* `Ort::Global<void>` has been replaced with a non-templated type and
  moved inside a `detail` namespace. Since the `Global<void>` object was
  documented as being used internally, it is believed that these changes
  here are non-breaking, as they do not impact a public API. The public
  APIs, `Ort::InitApi()` and `Ort::InitApi(const OrtApi*)` remain
  unchanged.
* Add `#pragma detect_mismatch` to surface issues with compilation units
  that disagree on how ORT_API_MANUAL_INIT is defined. (Clang and MSVC
  only.)
Some of the Clang docs implied that detect_mismatch was supported by
Clang, but the CI build produced errors about an unknown pragma.
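
As a rough illustration of the `#pragma detect_mismatch` idea, the sketch below records whether a manual-init macro was defined in each translation unit, guarded so that only MSVC-compatible compilers see the pragma. `DEMO_API_MANUAL_INIT` and `mismatch::ManualInitEnabled` are hypothetical names chosen for this example; the strings used in the real header are not shown in this conversation.

```cpp
#include <cassert>

namespace mismatch {

#if defined(DEMO_API_MANUAL_INIT)
#if defined(_MSC_VER)
// The linker reports an error if another object file recorded "0" for this name.
#pragma detect_mismatch("DEMO_API_MANUAL_INIT", "1")
#endif
constexpr bool ManualInitEnabled() { return true; }
#else
#if defined(_MSC_VER)
// The linker reports an error if another object file recorded "1" for this name.
#pragma detect_mismatch("DEMO_API_MANUAL_INIT", "0")
#endif
constexpr bool ManualInitEnabled() { return false; }
#endif

}  // namespace mismatch
```

On compilers that do not define `_MSC_VER`, the pragma is skipped entirely and the sketch only exposes the compile-time flag.
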
The Node.js and Vitis EPs were accessing internal implementation details
for the C++ API.

Switch them to use public APIs instead.


@skottmckay
Contributor

> On Windows in particular, it can also cause deadlocks when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to load any other libraries.

When does this happen? AFAIK the ORT implementation simply returns a struct from OrtGetApiBase and GetApi(version) does a version check. What is triggering other library loads as part of that process?

const OrtApiBase* ORT_API_CALL OrtGetApiBase(void) NO_EXCEPTION {
  return &ort_api_base;
}

ORT_API(const OrtApi*, OrtApis::GetApi, uint32_t version) {
  if (version >= 1 && version <= ORT_API_VERSION)
    return &ort_api_1_to_23;
  fprintf(stderr,
          "The requested API version [%u] is not available, only API versions [1, %u] are supported in this build."
          " Current ORT Version is: %s\n",
          version, ORT_API_VERSION, ORT_VERSION);
  return nullptr;  // Unsupported version
}

@chwarr
Member Author

chwarr commented Aug 15, 2025

> On Windows in particular, it can also cause deadlocks when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to load any other libraries.
>
> When does this happen? AFAIK the ORT implementation simply returns a struct from OrtGetApiBase and GetApi(version) does a version check. What is triggering other library loads as part of that process?

@skottmckay, yeah, I could have made that sentence clearer. :-) How's this? "On Windows in particular, it can also cause deadlocks when used in a dynamic library if calling OrtGetApiBase()->GetApi() causes other libraries to be loaded."

With the current implementation, if a .cpp that is part of foo.dll has #include <onnxruntime_cxx_api.h>, the Global<void>::api_ = OrtGetApiBase()->GetApi(ORT_API_VERSION); static initializer means that OrtGetApiBase() must be called while foo.dll is being loaded, that is, while the loader lock is held. If OrtGetApiBase() is provided by another DLL like onnxruntime.dll and that DLL isn't loaded yet (e.g., it's delay-loaded), then foo.dll ends up attempting to call LoadLibrary while holding the loader lock.

The guidance on Windows is to do as little as possible when the DLL is loaded. In C++, this means that your global and static initializers effectively need to be 0/constant values.

This change does that: Ort::Global::api_ is now initialized to 0. The interesting initialization logic will happen the first time Ort::GetApi() is called, hopefully outside of the loader lock.
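
The difference between the two shapes can be seen in a small, platform-neutral sketch. Everything here is hypothetical (`loadorder::Api`, `ExpensiveLookup`, the counter, and both globals are invented for illustration); `ExpensiveLookup()` stands in for a call that might trigger a library load, and the counter only makes visible when it runs.

```cpp
#include <cassert>

namespace loadorder {

// Hypothetical stand-in for OrtApi.
struct Api {
  int version;
};

// Constant-initialized counter, so it is already 0 before any dynamic initializer runs.
int g_lookup_calls = 0;

// Stand-in for a lookup that could end up loading another library.
const Api* ExpensiveLookup() {
  ++g_lookup_calls;
  static const Api api{1};
  return &api;
}

// Old shape: a dynamic initializer. ExpensiveLookup() runs during static
// initialization, which for a DLL on Windows means while the module is loading.
const Api* g_eager_api = ExpensiveLookup();

// New shape: constant-initialized to nullptr, so nothing runs at load time.
const Api* g_lazy_api = nullptr;

// The lookup is deferred until the first caller asks for the API.
// Note: this sketch does not synchronize access to g_lazy_api.
const Api* GetLazyApi() {
  if (g_lazy_api == nullptr) g_lazy_api = ExpensiveLookup();
  return g_lazy_api;
}

}  // namespace loadorder
```

By the time `main` starts, the eager global has already triggered one lookup, while the lazy one triggers its lookup only when `GetLazyApi()` is first called.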

@chwarr chwarr requested review from Copilot and yuslepukhin August 18, 2025 21:24
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the ONNX Runtime C++ API to eliminate dynamic initialization of the static global API pointer, replacing it with lazy initialization to avoid deadlocks and static initialization order issues when used in dynamic libraries.

  • Replaces templated Global<void>::api_ with a non-templated detail::Global class that uses lazy initialization
  • Introduces function-level static initialization in detail::Global::Api() to delay API loading until first use
  • Updates all direct usages of Ort::Global<void>::api_ to use proper API functions

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| include/onnxruntime/core/session/onnxruntime_cxx_api.h | Main refactoring of API initialization from templated global to lazy-initialized detail class |
| onnxruntime/test/testdata/custom_op_library/custom_op_library.cc | Updates direct API assignment to use Ort::InitApi() |
| onnxruntime/test/autoep/library/ep_arena.h | Adds manual init define/undef block |
| onnxruntime/core/providers/vitisai/imp/global_api.cc | Replaces direct API access with Ort::GetApi() calls |
| onnxruntime/core/providers/shared_library/provider_ort_api_init.cc | Updates provider initialization to use Ort::InitApi() |
| js/node/src/inference_session_wrap.cc | Changes API null check to use Ort::GetApi() |

yuslepukhin
yuslepukhin previously approved these changes Aug 22, 2025
* Fix spelling error
* Fix assert code that doesn't appear to ever be compiled.
Member

@yuslepukhin yuslepukhin left a comment

:shipit:

@chwarr
Member Author

chwarr commented Aug 28, 2025

Five of the pending checks have been "Waiting for status to be reported" for the past two days. How can I kick them again? I don't see any GitHub UX to do this. I assume I'm lacking the right permissions.

I'm unsure how to handle this build issue from the CUDA build. It looks like vcpkg install failed to process telemetry. Is there a way to re-run this check? I don't see any GitHub UX to do this. I assume I'm lacking the right permissions.

 -- Running vcpkg install
warning: feature cuda-ep was passed, but that is not a feature supported by onnxruntime supports.
Fetching registry information from https://github.com/Microsoft/vcpkg (HEAD)...
Fatal error. System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
   at System.Collections.Generic.Dictionary`2[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Generic.KeyValuePair`2[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Double, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(Int32)
   at Microsoft.ApplicationInsights.Extensibility.Implementation.Property.SanitizeMeasurements(System.Collections.Generic.IDictionary`2<System.String,Double>)
   at Microsoft.ApplicationInsights.Extensibility.Implementation.JsonSerializer.SerializeToStream(System.Collections.Generic.IEnumerable`1<Microsoft.ApplicationInsights.Channel.ITelemetry>, System.IO.TextWriter)
   at Microsoft.ApplicationInsights.Extensibility.Implementation.JsonSerializer.Serialize(System.Collections.Generic.IEnumerable`1<Microsoft.ApplicationInsights.Channel.ITelemetry>, Boolean)
   at Microsoft.ApplicationInsights.Channel.InMemoryTransmitter.Send(System.Collections.Generic.IEnumerable`1<Microsoft.ApplicationInsights.Channel.ITelemetry>, System.TimeSpan)
   at Microsoft.ApplicationInsights.Channel.InMemoryTransmitter.DequeueAndSend(System.TimeSpan)
   at Microsoft.ApplicationInsights.Channel.InMemoryTransmitter.Runner()
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)

@yuslepukhin
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chwarr
Member Author

chwarr commented Aug 28, 2025

Thanks for the help re-running, @yuslepukhin.

Are you able to merge this too?

The PR guidelines say that I should be the one to do the merge, but I only see "Close with comment" and "Comment" buttons. No "Merge" button. I assume this is a permissions issue too.

@yuslepukhin yuslepukhin merged commit 179f371 into microsoft:main Aug 28, 2025
87 of 88 checks passed
preetha-intel added a commit to intel/onnxruntime that referenced this pull request Sep 1, 2025
* [CPU] Optimize GQA attention bias application for FP16 (microsoft#25871)

### Description

When using attention bias input for GQA op with FP16, on the platforms
that don't natively support FP16 math a cast to fp32 needs to be
performed, and thus a temporary buffer needs to be created to store the
fp32 values. The issue is that this temporary buffer was being allocated
/ deallocated inside of a loop for every token being processed.
Refactored the implementation so that the allocation takes place only
once.

Phi model throughput increased by 15%.

* Fixes for DynamicQuantizeMatMul and Attention3D tests (microsoft#25814)

### Description
This change fixes correctness issues in two areas that were causing
failures in onnxruntime_test_all:

- DynamicQuantizeMatMul.WithConstantBInputs
- AttentionTest.Attention3DDefault
- AttentionTest.Attention3DWithPastAndPresentQkMatmul

What was wrong and how it’s fixed
1) DynamicQuantizeMatMul.WithConstantBInputs
- Root cause: The Kleidi dynamic quantization GEMM path could be
selected even when the B scales contained invalid values (zero,
negative, or non-finite). That violates kernel assumptions and can lead
to incorrect results.
- Fix: In
`onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc`,
we now explicitly validate that all B scales are finite and strictly
positive before enabling the Kleidi/MLAS dynamic path. If any scale is
invalid, we disable that path.
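
The kind of scale check described in this fix might look roughly like the sketch below. It is an illustration only: `scales::AllScalesUsable` is a hypothetical helper name, and the actual code in `dynamic_quantize_matmul.cc` is not shown in this conversation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

namespace scales {

// Returns true only if every scale is finite and strictly positive.
// A caller would enable the specialized GEMM path only when this holds.
inline bool AllScalesUsable(const float* scales, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    const float s = scales[i];
    // Rejects NaN and +/-infinity first, then zero and negative values.
    if (!std::isfinite(s) || s <= 0.0f) return false;
  }
  return true;
}

}  // namespace scales
```

A single invalid entry is enough to reject the whole set, which matches the "if any scale is invalid, we disable that path" behavior described above.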

2) Attention tests (Attention3DDefault,
Attention3DWithPastAndPresentQkMatmul)
- Root causes in
`onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp`:
- Incorrect handling of GEMM corner cases for alpha/beta and K==0 (e.g.,
not respecting C = beta*C when alpha==0 or K==0).
  - Unnecessary or premature fallbacks for small shapes.
- Fixes:
- Add early-outs for degenerate sizes: if M==0 or N==0, return handled.
  - Correctly implement alpha/beta semantics:

---------

Signed-off-by: Jonathan Clohessy <[email protected]>

* Fix MoE CPP tests (microsoft#25877)

This change skips the QMoE CPU tests when running on the TensorRT
or CUDA EP.
It also fixes a memory overwrite bug in the accumulate part of the QMoE
kernel, which makes the Python tests pass again.

* [c++] Eliminate dynamic initialization of static Ort::Global<void>::api_ (microsoft#25741)

### Description

Delay the call to `OrtGetApiBase()` until the first call to
`Ort::GetApi()` so that `OrtGetApiBase()` is typically called after
dynamic library loading.

### Motivation and Context

When ORT_API_MANUAL_INIT is not defined (which is the default), the
static `Ort::Global<void>::api_` has a dynamic initializer that calls
`OrtGetApiBase()->GetApi(ORT_API_VERSION)`. This dynamic initialization
can cause problems when it interacts with other global/static
initialization. On Windows in particular, it can also cause deadlocks
when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to
load any other libraries.

* Replace the templated `Global<void>::api_` with an inline static
initialized to nullptr.
* `Ort::GetApi()` now calls `detail::Global::GetApi()` which calls
`detail::Global::DefaultInit()` if initialization is needed.
* When `ORT_API_MANUAL_INIT` is defined, `DefaultInit()` returns
nullptr, which will eventually cause the program to crash. The callers
have violated the initialization contract by not calling one of the
`Ort::InitApi` overloads.
* When `ORT_API_MANUAL_INIT` is not defined, `DefaultInit()` uses a
function-level static to compute the result of
`OrtGetApiBase()->GetApi(ORT_API_VERSION)` once and return it.
* `Ort::Global<void>` has been replaced with a non-templated type and
moved inside a `detail` namespace. Since the `Global<void>` object was
documented as being used internally, it is believed that these changes
here are non-breaking, as they do not impact a public API. The public
APIs, `Ort::InitApi()` and `Ort::InitApi(const OrtApi*)` remain
unchanged.
* Add `#pragma detect_mismatch` to surface issues with compilation units
that disagree on how ORT_API_MANUAL_INIT is defined. (MSVC only.)

---------

Co-authored-by: Copilot <[email protected]>

* python GPU IO Bindings for NVIDIA  (microsoft#25776)

### Description
<!-- Describe your changes. -->
1. A small change to use the shared allocator in the Python binding.
2. Remove FP64 support from the EP.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

The Python GPU IO binding is necessary for performance. The change will
enable the shared allocator for GPU allocation.
The FP64 was using the FP32 inference—aligned WRT TRT RTX support.

---------

Co-authored-by: Gaurav Garg <[email protected]>

* [CANN] Add a `enable_cann_subgraph` feature parameter (microsoft#25867)

### Description

Add an `enable_cann_subgraph` feature parameter. This parameter controls
whether graph splitting is performed and can help quickly identify
issues in certain scenarios.

* [EP ABI] Add OpAttr_GetTensorAttributeAsOrtValue and replace the existing Node_GetTensorAttributeAsOrtValue (microsoft#25886)

### Description
Replace `Node_GetTensorAttributeAsOrtValue` with
`OpAttr_GetTensorAttributeAsOrtValue`.
Change the API signature to make it one of the `OpAttr` interfaces
instead of the `OrtNode` interface.

The original API was added
[here](microsoft#25566).

* Language bindings for model compatibility API (microsoft#25878)

### Description
This change builds on top of microsoft#25841 , and adds the scaffolding necessary
to call into this API from C++ / C# / Python.

### Motivation and Context
microsoft#25454 talks more about the broader notion of precompiled model
compatibility. This change is directed at app developers whose apps may
want to determine if a particular precompiled model (e.g. on a server
somewhere) is compatible with the device where the application is
running. There is functionality in `OrtEpFactory` for making this
determination, which was exposed as a C API in microsoft#25841, and this change
makes the API more broadly available in other languages.

### Testing and Validation
Introduced new unit test cases across each language, and verified that
the API was being called and returned the correct result for the default
CPU EP.

---------

Co-authored-by: Aditya Rastogi <[email protected]>

* [QNN-EP] Introduce Level1 Transformer into qnn.preprocess (microsoft#25883)

### Description
- Introduce Level1 Transformer into qnn.preprocess to support various optimizations.

### Motivation and Context
- This change brings in several useful optimizations such as `ConvBnFusion` and `ConstantFolding`, which are part of
`TransformerLevel::Level1` and can benefit QNNEP.
- The goal is to optimize the ONNX model before quantization by integrating these passes into the Python tooling workflow.

* [QNN EP] Minor fix weight name missing when not valid QDQ node group (microsoft#25887)

### Description
Minor fix weight name missing when not valid QDQ node group

### Motivation and Context
Some quantized models fail QDQ node group validation, so their weights are not folded as initializers. QNN EP failed to handle these dynamic weights because of the transpose op input name lookup. This change makes sure the weights tensor is processed before transposes are added.

* Add custom ops library_path to EP metadata (microsoft#25830)

## Summary
Adds EP metadata library path support to enable custom ops DLL
registration with proper path resolution.

## Changes
- Added `library_path` metadata key to EP metadata infrastructure
- Pass resolved library path directly to `EpLibraryProviderBridge`
constructor
- Simplified implementation per reviewer feedback (removed virtual
method complexity)
- Added `#include <utility>` for std::move compliance

## Purpose
Enables downstream applications (like onnxruntime-genai) to resolve
relative custom ops library paths using EP metadata, improving DLL
registration reliability.

## Files Modified
- `plugin_ep/ep_factory_provider_bridge.h`
- `plugin_ep/ep_library.h` 
- `plugin_ep/ep_library_plugin.h`
- `plugin_ep/ep_library_provider_bridge.cc`
- `plugin_ep/ep_library_provider_bridge.h`
- `utils.cc`

* [OVEP] OpenVINO EP Features and bug-fixes for ORT-1.23  (microsoft#25884)

### Description
This update introduces multiple improvements, fixes, and feature enhancements to the OpenVINO Execution Provider (OVEP) and related components in ONNX Runtime:

#### Configuration & Properties

- Updated load_config mapping to act as a passthrough to OpenVINO properties.
- Added support for providing layout information to inputs/outputs in OpenVINO.

#### Inference & Tensor Handling

- Improved OVInferRequest::SetTensor to correctly handle cached binding shape mismatches.
- Added support for self-detecting on-the-fly bfloat16 → float16 conversion.
- Fixed issues with input ONNX models when used with shared execution contexts.

#### Model Handling & Operator Support

- Fixed model copying behavior for QDQ stripping.
- Updated operator support status for OpenVINO 2025.2.

#### Platform & Integration Fixes

- Applied multiple PSU Lora fixes and related updates.
- Resolved filename confusion issues with wrapped OVIRs in EPCtx.
- Enabled memory-mapped native binaries for OpenVINO 2025.3.

#### Quality & Maintenance

- Addressed linting issues.
- Fixed coverage gaps in OVEP.
- Added a new test script for OpenVINO with ORT ABI integration.

---------

Co-authored-by: Ankit Maheshkar <[email protected]>
Co-authored-by: Ryan Metcalfe <[email protected]>
Co-authored-by: Klimenko, Mikhail <[email protected]>
Co-authored-by: sfatimar <[email protected]>
Co-authored-by: Garth Long <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
Co-authored-by: Eric Crawford <[email protected]>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: Vishnudas Thaniel S <[email protected]>
Co-authored-by: Javier Martinez <[email protected]>

* [java] Auto EP and compile model support (microsoft#25131)

### Description
Java API for compile model and EP discovery APIs. Roughly equivalent to
the C# version in microsoft#24604.

cc: @skottmckay.

I haven't quite got the CMake configured so the Java tests for the ep
registration only run when the ONNX Runtime shared provider support is
built, but everything else works. I expect that to be a quick fix, but
I'm not sure in what conditions it should be built and how we should
handle it so I don't know where/when to plumb it through.

### Motivation and Context
API parity for Java.

* Add error handling to extract_nuget_files.ps1 (microsoft#25866)

### Description
1. Check the process exit code when running 7z.exe. Previously, errors
were silently ignored.
2. Add the snld20 flag to the 7z.exe commands, which is needed for
compatibility with the latest 7z release.

* [Fix] illegal memory access in GetInputIndices with optional inputs (microsoft#25881)

### Description
Fix illegal memory access in GetInputIndices with optional inputs

### Motivation and Context
When an input is optional, its ValueInfo may be nullptr.
The current implementation directly calls InputValueInfo->GetName(), leading to illegal memory access.

Update the logic to skip optional inputs when valueInfo is nullptr.
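
The null check described here can be sketched as follows. This is not the actual `GetInputIndices` implementation: `inputs::ValueInfo` and `inputs::PresentInputIndices` are hypothetical names, and the sketch only shows the pattern of skipping absent optional inputs before calling `GetName()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace inputs {

// Hypothetical stand-in for the value-info type; only GetName() matters here.
struct ValueInfo {
  std::string name;
  const std::string& GetName() const { return name; }
};

// Returns the indices of inputs that are present and named. An absent optional
// input is represented by nullptr and is skipped instead of being dereferenced.
inline std::vector<std::size_t> PresentInputIndices(
    const std::vector<const ValueInfo*>& infos) {
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < infos.size(); ++i) {
    const ValueInfo* info = infos[i];
    if (info == nullptr) continue;  // optional input not provided
    if (!info->GetName().empty()) indices.push_back(i);
  }
  return indices;
}

}  // namespace inputs
```
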

* Re-enable cpuinfo for ARM64EC (microsoft#25863)

### Description
<!-- Describe your changes. -->

Re-enable cpuinfo for ARM64EC build and fix `CPUIDINFO_ARCH_ARM` so it
is actually used.

Patch cpuinfo to support vcpkg ARM64EC build. See
pytorch/cpuinfo#324.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix for workaround in microsoft#25831.

---------

Signed-off-by: Jonathan Clohessy <[email protected]>
Co-authored-by: derdeljan-msft <[email protected]>
Co-authored-by: Jonathan Clohessy <[email protected]>
Co-authored-by: Akshay Sonawane <[email protected]>
Co-authored-by: Christopher Warrington <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Ishwar Raut <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: Xinpeng Dou <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: adrastogi <[email protected]>
Co-authored-by: Aditya Rastogi <[email protected]>
Co-authored-by: qti-hungjuiw <[email protected]>
Co-authored-by: qti-yuduo <[email protected]>
Co-authored-by: Pradeep Sakhamoori <[email protected]>
Co-authored-by: Preetha Veeramalai <[email protected]>
Co-authored-by: Ankit Maheshkar <[email protected]>
Co-authored-by: Ryan Metcalfe <[email protected]>
Co-authored-by: Klimenko, Mikhail <[email protected]>
Co-authored-by: sfatimar <[email protected]>
Co-authored-by: Garth Long <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
Co-authored-by: Eric Crawford <[email protected]>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: Vishnudas Thaniel S <[email protected]>
Co-authored-by: Javier Martinez <[email protected]>
Co-authored-by: Adam Pocock <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: mingyue <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025
…pi_ (microsoft#25741)

Jaswanth51 pushed a commit to intel/onnxruntime that referenced this pull request Sep 3, 2025
…pi_ (microsoft#25741)

@chwarr chwarr deleted the no-global-dynamic-init branch October 8, 2025 22:34