@adrianlizarraga
Contributor

Description

Adds the following commits to the rel-1.23.1 branch for ORT 1.23.1:

xieofxie and others added 10 commits September 26, 2025 14:11
…25590)

### Description

Use the session ID to correlate the evaluation log events with `LogSessionCreation`.

If `Run` is called from different threads, the calls can be told apart by
thread ID, given that `Run` is not async.
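
As a rough illustration of the idea (not ORT's actual telemetry code), the sketch below tags every event with a session ID and the calling thread ID. `log_event`, `FakeSession`, and the event name strings are hypothetical names made up for this example.

```python
import threading

events = []  # collected (event_name, session_id, thread_id) tuples
events_lock = threading.Lock()


def log_event(name, session_id):
    """Hypothetical logger: records the event with session and thread IDs."""
    with events_lock:
        events.append((name, session_id, threading.get_ident()))


class FakeSession:
    """Hypothetical stand-in for an inference session with its own ID."""

    _next_id = 0
    _id_lock = threading.Lock()

    def __init__(self):
        with FakeSession._id_lock:
            FakeSession._next_id += 1
            self.session_id = FakeSession._next_id
        log_event("SessionCreation", self.session_id)

    def run(self):
        # run() is synchronous, so start and stop are logged on the same thread.
        log_event("EvaluationStart", self.session_id)
        log_event("EvaluationStop", self.session_id)


session = FakeSession()
threads = [threading.Thread(target=session.run) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every entry in `events` carries the same session ID as the creation event, and each start/stop pair shares a thread ID.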

### Motivation and Context

---------

Co-authored-by: hualxie <[email protected]>
### Description

Fix the WebAssembly build on macOS/arm64 by no longer appending
`-Donnxruntime_USE_KLEIDIAI=ON` to `cmake_args`.

KleidiAI should not be enabled for WebAssembly builds.
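
A minimal sketch of the kind of guard this describes, assuming a build helper that assembles CMake arguments. `generate_cmake_args` and its parameters are hypothetical and do not reproduce the real build script; only the `-Donnxruntime_USE_KLEIDIAI=ON` flag comes from the description above.

```python
def generate_cmake_args(target_arch, build_wasm):
    """Hypothetical helper that assembles CMake arguments for a build."""
    cmake_args = ["-DCMAKE_BUILD_TYPE=Release"]
    # Assumption: KleidiAI is only considered for native arm64 builds,
    # so the flag is skipped whenever the target is WebAssembly.
    if target_arch == "arm64" and not build_wasm:
        cmake_args.append("-Donnxruntime_USE_KLEIDIAI=ON")
    return cmake_args


wasm_args = generate_cmake_args("arm64", build_wasm=True)
native_args = generate_cmake_args("arm64", build_wasm=False)
```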
CPU MoE Kernel
```
name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 16, max_diff: 2.682209014892578e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 32, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 16, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 32, max_diff: 4.172325134277344e-07
.MoE CPU kernel time: 15.721677541732786 ms
.
----------------------------------------------------------------------
Ran 5 tests in 30.217s
```
This PR adds a block-wise quantization kernel for QMoE on CPU.
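
For readers unfamiliar with the term, block-wise quantization means that each fixed-size block of weights shares its own quantization parameters. The pure-Python sketch below shows the general idea only. The symmetric 4-bit scheme, the max-abs scale, and all function names are assumptions for illustration, not the kernel added by this PR.

```python
def quantize_blockwise(row, block_size, bits=4):
    """Symmetric block-wise quantization of one row of weights.

    Returns (quantized ints, per-block scales). Each block of
    `block_size` values shares one scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit symmetric quantization
    quantized, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the representable range.
        quantized.extend(max(-qmax, min(qmax, round(v / scale))) for v in block)
    return quantized, scales


def dequantize_blockwise(quantized, scales, block_size):
    """Map quantized ints back to floats using each block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


weights = [0.1, -0.2, 0.05, 0.4, 8.0, -6.0, 2.0, 1.0]
q, s = quantize_blockwise(weights, block_size=4)
restored = dequantize_blockwise(q, s, block_size=4)
max_diff = max(abs(a - b) for a, b in zip(weights, restored))
```

Because the small-valued first block gets its own scale, its reconstruction error is bounded by half of that block's scale rather than by the much larger scale of the second block.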
This pull request adds new APIs and updates existing ones to improve
memory and device information handling in the ONNX Runtime C# bindings.
The most significant changes introduce methods for fetching memory info
and device info for session inputs/outputs, and add support for shared
allocators and synchronization streams. There are also several updates
and renamings for LoraAdapter delegates and related APIs.

### Memory and Device Info APIs

* Added `GetMemoryInfosForInputs`, `GetMemoryInfosForOutputs`, and
`GetEpDeviceForInputs` methods to `InferenceSession.shared.cs` to fetch
memory info and device info for session inputs/outputs. These methods
utilize new native delegates for retrieving memory and device
information.
* Introduced native delegates in `NativeMethods.shared.cs` for
`OrtSessionGetMemoryInfoForInputs`, `OrtSessionGetMemoryInfoForOutputs`,
and `OrtSessionGetEpDeviceForInputs`, and wired them up in the static
constructor.

### Shared Allocator and Synchronization Stream Support

* Added delegates and static fields for creating, getting, and releasing
shared allocators, as well as for creating and managing synchronization
streams (`OrtCreateSharedAllocator`, `OrtGetSharedAllocator`,
`OrtReleaseSharedAllocator`, `OrtCreateSyncStreamForEpDevice`,
`OrtSyncStream_GetHandle`, `OrtReleaseSyncStream`).
* Added delegate for copying tensors (`OrtCopyTensors`).

### LoraAdapter API Updates

* Renamed LoraAdapter-related delegates to use the `Ort` prefix
(`OrtCreateLoraAdapter`, `OrtCreateLoraAdapterFromArray`,
`OrtReleaseLoraAdapter`) and updated their usage throughout the
codebase.

### MemoryInfo Enhancements

* Added new delegates for creating memory info with more parameters
(`OrtCreateMemoryInfoV2`), and for querying device memory type and
vendor ID (`OrtMemoryInfoGetDeviceMemType`, `OrtMemoryInfoGetVendorId`).

### Minor API Documentation Update

* Clarified the lifetime of allocators in the documentation, noting they
can be explicitly unregistered.

### Description



### Motivation and Context
### Description
- Regenerates the `input_propagate_to_output.onnx` model used in [this unit test](https://github.com/microsoft/onnxruntime/blob/35dcab5088118117acc6086c9b6dd6dd92c7060f/onnxruntime/test/shared_lib/test_inference.cc#L497-L506)
so that it uses an ONNX IR version compatible with ONNX 1.18.0 (i.e., IR
version < 12).
- Adds the script `input_propagate_to_output.py`, which can be used to
regenerate the `input_propagate_to_output.onnx` model.
- Embeds the missing weight values needed to run the existing
`test_dangling_input_segment_ids.py` script.



### Motivation and Context
The main branch uses ONNX 1.19. However, this unit test also needs
to pass in the `rel-1.23.1` branch, which still uses ONNX 1.18.0.
Downgrading the model's IR version lets the unit test run in both
branches.

See original PR that added the test models:
#26021
…n assigned (#26156)

### Description
Fixes a segfault in `PluginExecutionProvider::GetCapability()` that occurred
when the underlying `OrtEp` tried to claim nodes that had already been
assigned to another EP.


### Motivation and Context
ORT should log a warning (instead of crashing or throwing an exception) when
a plugin EP tries to claim a node that is already assigned to another
EP.
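
The intended behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the C++ change itself: `filter_claimed_nodes`, the node names, and the EP names are hypothetical, and skipping the conflicting node after the warning is an assumption about what happens to the claim.

```python
import logging

logger = logging.getLogger("ep_assignment_sketch")


def filter_claimed_nodes(ep_name, claimed_nodes, assignments):
    """Return the claimed nodes that are not owned by another EP.

    `assignments` maps node name -> EP name for nodes already taken.
    A conflicting claim produces a warning instead of an exception.
    """
    accepted = []
    for node in claimed_nodes:
        owner = assignments.get(node)
        if owner is not None and owner != ep_name:
            logger.warning(
                "EP %s tried to claim node %s, already assigned to %s; skipping",
                ep_name, node, owner,
            )
            continue  # Assumption: the conflicting claim is simply ignored.
        accepted.append(node)
    return accepted


assignments = {"Conv_0": "OtherEP"}
accepted = filter_claimed_nodes("PluginEP", ["Conv_0", "Relu_1"], assignments)
```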

---------

Co-authored-by: Edward Chen <[email protected]>
### Description
Adds a new dynamic EP option to set the HTP performance mode after session creation.

---------

Co-authored-by: quic-ashwshan <[email protected]>
@snnn snnn merged commit d9b2048 into rel-1.23.1 Sep 27, 2025
74 of 75 checks passed
@snnn snnn deleted the adrianl/rel-1.23.1-cherrypick-2 branch September 27, 2025 03:28
TedThemistokleous added a commit to ROCm/onnxruntime that referenced this pull request Oct 17, 2025
* ORT 1.23.1 cherrypick 1 [REDO] (microsoft#26140)

### Description
Cherry-pick the following PRs into the ORT 1.23.1 branch:

- Fix Attention GQA implementation on CPU
  - **MANUAL MERGE**: see microsoft#26057
  - main merge date: Sept 15, 11:33am
  - pr: microsoft#25966
  - commit: d530b29
- Address edge GetMemInfo edge cases
  - main merge date: Sept 16, 10:32am
  - pr: microsoft#26021
  - commit: d251f3a
- Implement new Python APIs
  - main merge date: Sept 17, 11:44am
  - pr: microsoft#25999
  - commit: abc63e8
- MemcpyFromHost and MemcpyToHost support for plugin EPs
  - **MERGE CONFLICT** on file
    `onnxruntime/test/optimizer/transpose_optimizer_test.cc`. Conflicts with
    microsoft#25689
  - main merge date: Sept 23, 10:42am
  - pr: microsoft#26088
  - commit: 4545732
- [TRT RTX EP] Fix bug for generating the correct subgraph in
GetCapability microsoft#26132
  - main merge date: Sept 23, 8:54pm
  - pr: microsoft#26132
  - commit: 72e56e7


### Motivation and Context

---------

Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Chi Lo <[email protected]>

* ORT 1.23.1 cherrypick 2 (microsoft#26182)

### Description
Adds the following commits to the `rel-1.23.1` branch for ORT 1.23.1:


- add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart
  - main merge date: July 31, 1:05am
  - pr: microsoft#25590
  - commit: e753643
- [build] fix WebAssembly build on macOS/arm64
  - main merge date: Aug 5, 8:07am
  - pr: microsoft#25653
  - commit: 53f152b
- [CPU] MoE Kernel (microsoft#25958)
  - main merge date: Sept 10, 4:54pm
  - pr: microsoft#25958
  - commit: 930e640
- [CPU] Block-wise QMoE kernel for CPU
  - main merge date: Sept 15, 8:32am
  - pr: microsoft#26009
  - commit: 5d17734
- [C#] Implement missing APIs
  - main merge date: Sept 24, 10:50am
  - pr: microsoft#26101
  - commit: 35dcab5
- Regenerate test model with ONNX IR < 12
  - main merge date: Sept 24, 2:50pm
  - pr: microsoft#26149
  - commit: 88f2652
- [CPU] Fix compilation errors because of unused variables
  - main merge date: Sept 25, 1:21pm
  - pr: microsoft#26147
  - commit: 42fcd71
- [EP ABI] Check if nodes specified in GetCapability() have already been
assigned
  - main merge date: Sept 26, 1:24am
  - pr: microsoft#26156
  - commit: 67d3ba0
- [QNN EP] Add dynamic option to set HTP performance mode
  - main merge date: Sept 26, 11:55am
  - pr: microsoft#26135
  - commit: 6cc40fd

---------

Co-authored-by: xieofxie <[email protected]>
Co-authored-by: hualxie <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Akshay Sonawane <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: quic-tirupath <[email protected]>
Co-authored-by: quic-ashwshan <[email protected]>

---------

Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: xieofxie <[email protected]>
Co-authored-by: hualxie <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Akshay Sonawane <[email protected]>
Co-authored-by: quic-tirupath <[email protected]>
Co-authored-by: quic-ashwshan <[email protected]>