@yuslepukhin yuslepukhin commented Mar 10, 2025

Description

This PR converts TensorProto graph initializers to TensorProto/OrtValue pairs.
Currently, only the output of some optimizers is split into such pairs.
Eventually, we should be able to convert all initializers to OrtValues on load.
Small weights will continue to be an exception, as they are sometimes required by ONNX inference functions.
Some of the graph API leaks to EPs, so we cannot remove it at present, and this constrains our ability to convert everything at once.
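
As a rough illustration of the pairing described above, the sketch below keeps a serialized-style initializer next to an optional runtime value. `FakeTensorProto`, `FakeOrtValue`, `InitializerStore`, and the size threshold are hypothetical stand-ins chosen for this example; they are not ONNX Runtime types or constants, and the PR's actual bookkeeping may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serialized initializer (not onnx::TensorProto).
struct FakeTensorProto {
  std::string name;
  std::vector<std::uint8_t> raw_data;
};

// Hypothetical stand-in for a runtime value (not the real OrtValue).
// The buffer is shared, so handing the value to a consumer does not copy it.
struct FakeOrtValue {
  std::shared_ptr<const std::vector<std::uint8_t>> buffer;
};

class InitializerStore {
 public:
  // Assumed cut-off for this sketch: tensors at or below it stay proto-only.
  static constexpr std::size_t kSmallTensorBytes = 127;

  void Add(FakeTensorProto proto) {
    std::optional<FakeOrtValue> value;
    if (proto.raw_data.size() > kSmallTensorBytes) {
      // Move the bytes into a shared buffer; the proto keeps only metadata.
      auto buffer = std::make_shared<std::vector<std::uint8_t>>(std::move(proto.raw_data));
      proto.raw_data.clear();
      value = FakeOrtValue{std::move(buffer)};
    }
    std::string key = proto.name;
    entries_.insert_or_assign(std::move(key), Entry{std::move(proto), std::move(value)});
  }

  // Returns nullptr when the name is unknown.
  const FakeTensorProto* GetProto(const std::string& name) const {
    auto it = entries_.find(name);
    return it == entries_.end() ? nullptr : &it->second.proto;
  }

  // Returns an empty optional for unknown names and for small, proto-only tensors.
  std::optional<FakeOrtValue> GetValue(const std::string& name) const {
    auto it = entries_.find(name);
    if (it == entries_.end()) return std::nullopt;
    return it->second.value;
  }

 private:
  struct Entry {
    FakeTensorProto proto;
    std::optional<FakeOrtValue> value;
  };
  std::unordered_map<std::string, Entry> entries_;
};
```

In this sketch a consumer that only needs metadata reads the proto, while a consumer that needs the bytes takes the shared value without copying.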

Motivation and Context

Lay the groundwork for proper layer separation. Eventually eliminate weight copies in the EPs.

// can not trace string tensor
ORT_ENFORCE(entry->second->data_type() != ONNX_NAMESPACE::TensorProto_DataType_STRING, "Can not trace string tensor");
ORT_RETURN_IF_ERROR(planner.Trace(entry->first, entry->second));
const auto [_, tensor_proto] = *entry;

Check notice

Code scanning / CodeQL

Unused local variable

Variable _ is not used.

Copilot Autofix

AI 10 months ago

To fix the problem, we need to remove the unused variable _ from the code. This can be done by modifying the line where _ is declared and removing it. The best way to fix this without changing existing functionality is to directly destructure the entry variable to only extract the tensor_proto part, which is actually used.
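
For readers comparing the two forms, here is a small standalone C++17 sketch of the difference between a by-value structured binding and a reference to `entry->second`. `Payload` and both helper functions are hypothetical names for this example. In the PR the mapped value appears to be pointer-like (it is dereferenced with `->`), so the copy made there would be cheap; the sketch uses a heavier type only to make the copy visible.

```cpp
#include <map>
#include <string>

// Hypothetical payload type standing in for the mapped value.
struct Payload {
  std::string name;
};

// Binds a reference to the element stored in the map; no copy is made.
inline const Payload* ViaReference(const std::map<int, Payload>& m, int key) {
  auto entry = m.find(key);
  if (entry == m.end()) return nullptr;
  const auto& payload = entry->second;
  return &payload;
}

// `const auto [k, v] = *entry;` copies the whole pair into a hidden object,
// so `v` names a member of that copy rather than the element in the map.
inline bool BindingByValueCopies(const std::map<int, Payload>& m, int key) {
  auto entry = m.find(key);
  if (entry == m.end()) return false;
  const auto [unused_key, payload_copy] = *entry;
  (void)unused_key;  // the key is intentionally unused here
  return &payload_copy != &entry->second;  // distinct objects
}
```

Writing `const auto& [k, v] = *entry;` would also avoid the copy, but it still introduces the unused name that the scanner flagged.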

Suggested changeset 1
onnxruntime/core/framework/session_state_utils.cc

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/core/framework/session_state_utils.cc b/onnxruntime/core/framework/session_state_utils.cc
--- a/onnxruntime/core/framework/session_state_utils.cc
+++ b/onnxruntime/core/framework/session_state_utils.cc
@@ -269,3 +269,3 @@
                 "OrtValue index: ", ort_value_index, " from initializer_allocation_order not found among initialized tensors");
-    const auto [_, tensor_proto] = *entry;
+    const auto& tensor_proto = entry->second;
 
EOF
@yuslepukhin yuslepukhin force-pushed the yuslepukhin/ort_initializers branch 8 times, most recently from 650c07f to 2c4f38f Compare March 18, 2025 02:26
@yuslepukhin yuslepukhin force-pushed the yuslepukhin/ort_initializers branch 3 times, most recently from dfcbe93 to ac5bed7 Compare March 21, 2025 01:28
Contributor

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@yuslepukhin
Member Author

Cc: @ranjitshs pls, take a look and perhaps build as well. Thx!

@yuslepukhin yuslepukhin changed the title [DRAFT do not review] Convert graph initializers into OrtValue [DRAFT] Convert graph initializers into OrtValue Apr 28, 2025
@yuslepukhin yuslepukhin changed the title [DRAFT] Convert graph initializers into OrtValue [DRAFT] Convert graph initializers into OrtValue Phase I Apr 28, 2025
@yuslepukhin yuslepukhin requested a review from skottmckay April 28, 2025 20:44
@ranjitshs
Contributor

> Cc: @ranjitshs pls, take a look and perhaps build as well. Thx!

Yes. I will try to build and let you know in 1-2 days.

@yuslepukhin yuslepukhin marked this pull request as ready for review May 5, 2025 23:04
@yuslepukhin yuslepukhin marked this pull request as draft May 5, 2025 23:05
@yuslepukhin yuslepukhin force-pushed the yuslepukhin/ort_initializers branch from 0610388 to 0bc68a1 Compare May 7, 2025 01:05
@yuslepukhin yuslepukhin marked this pull request as ready for review May 7, 2025 01:05
@yuslepukhin yuslepukhin changed the title [DRAFT] Convert graph initializers into OrtValue Phase I Convert graph initializers into OrtValue Phase I May 7, 2025
@ranjitshs
Contributor

ranjitshs commented May 12, 2025

Hi @yuslepukhin
I have tried out this branch on AIX and currently see failures in the test suites.
As CI for Linux is passing, I will debug these AIX failures and let you know.

The following tests FAILED:
          1 - onnxruntime_test_all (SEGFAULT)
          4 - onnxruntime_shared_lib_test (SEGFAULT)

@yuslepukhin
Member Author

> Hi @yuslepukhin I have tried out this branch on AIX and currently I see failure in test suites. As CI for linux is passing, I will debug on these AIX failures and let you know.
>
> The following tests FAILED:
>           1 - onnxruntime_test_all (SEGFAULT)
>           4 - onnxruntime_shared_lib_test (SEGFAULT)

Which commit did you try?

@ranjitshs
Contributor

@yuslepukhin
I used the latest commit.

bash-5.2$ git status
On branch yuslepukhin/ort_initializers
Your branch is up to date with 'origin/yuslepukhin/ort_initializers'.

nothing to commit, working tree clean
bash-5.2$ git log | head -10
commit 22eade88e9f4f789cf7dcf6daa1124c3843eaaef
Author: Dmitri Smirnov <[email protected]>
Date:   Thu May 8 14:12:16 2025 -0700

    Fix improper iterator usage. 

@ranjitshs
Contributor

@yuslepukhin

Below are some stack traces for the failures:

  1. ./onnxruntime_test_all "--gtest_filter=EmbedLayerNormTest*"
    This crashes at `tensor_proto.set_name(data.def.Name());` in onnxruntime/test/providers/base_tester.cc
(gdb) bt
#0  0x09000000002ede6c in malloc_y () from /usr/lib/libc.a(_shr_64.o)
#1  0x090000000024bed8 in malloc_common@AF119_102 () from /usr/lib/libc.a(_shr_64.o)
#2  0x09000000031ab064 in std::bad_alloc::what() const () from /usr/lib/libc++abi.a(libc++abi.so.1)
#3  0x00000001000e77dc in google::protobuf::internal::(anonymous namespace)::CreateString (value=...)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/arenastring.cc:102
#4  google::protobuf::internal::ArenaStringPtr::Set (
    this=0x18 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)+24>, value=..., arena=0x291b4)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/arenastring.cc:125
#5  0x000000010034ce14 in onnx::TensorProto::set_name<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&> (this=<optimized out>, 
    arg0=...) at _deps/onnx-build/onnx/onnx-ml.pb.h:11582
#6  onnxruntime::test::BaseTester::AddInitializers (this=0xfffffffffffe890, graph=...) at /home/buildusr/onnxruntime/onnxruntime/test/providers/base_tester.cc:80
#7  0x000000010034b9fc in onnxruntime::test::OpTester::BuildModel (this=0xfffffffffffe890, extra_domain_to_version=..., model_options=...)
    at /home/buildusr/onnxruntime/onnxruntime/test/providers/op_tester.cc:64
#8  0x000000010034aef4 in onnxruntime::test::OpTester::CreateModelToTest (this=0xfffffffffffe890, model_options=..., 
    model=@0x111b39090: 0x11001b9f8 <vtable for onnxruntime::Graph+16>) at /home/buildusr/onnxruntime/onnxruntime/test/providers/op_tester.cc:72
  2. ./onnxruntime_test_all "--gtest_filter=GatherOpTest*"

This crashes in onnxruntime::Graph::AddInitializedTensor.

(gdb) bt
#0  0x09000000002edef0 in malloc_y () from /usr/lib/libc.a(_shr_64.o)
#1  0x090000000024bed8 in malloc_common@AF119_102 () from /usr/lib/libc.a(_shr_64.o)
#2  0x09000000031ab064 in std::bad_alloc::what() const () from /usr/lib/libc++abi.a(libc++abi.so.1)
#3  0x00000001000ec414 in google::protobuf::internal::RepeatedPtrFieldBase::InternalExtend (
    this=0x10 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)+16>, extend_amount=<optimized out>)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/repeated_ptr_field.cc:69
#4  0x00000001000eec04 in google::protobuf::internal::RepeatedPtrFieldBase::AddOutOfLineHelper (this=warning: (Internal error: pc 0x0 in read in CU, but not in symtab.)
 
    0x0 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)>, obj=0x111b39850)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/repeated_ptr_field.cc:116
#5  0x000000010021c66c in google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<onnx::TensorProto>::TypeHandler> (this=0x111b38d80, 
    prototype=warning: (Internal error: pc 0x0 in read in CU, but not in symtab.)
0x0 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)>)
    at _deps/protobuf-src/src/google/protobuf/repeated_ptr_field.h:218
#6  google::protobuf::RepeatedPtrField<onnx::TensorProto>::Add (this=0x111b38d80) at _deps/protobuf-src/src/google/protobuf/repeated_ptr_field.h:1274
#7  onnx::GraphProto::_internal_add_initializer (this=<optimized out>) at _deps/onnx-build/onnx/onnx-ml.pb.h:10796
#8  onnx::GraphProto::add_initializer (this=<optimized out>) at _deps/onnx-build/onnx/onnx-ml.pb.h:10799
#9  onnxruntime::Graph::AddInitializedTensor (this=0x111b38ed0, tensor=...) at /home/buildusr/onnxruntime/onnxruntime/core/graph/graph.cc:3438
#10 0x000000010034d52c in onnxruntime::test::BaseTester::AddInitializers (this=<optimized out>, graph=...)
    at /home/buildusr/onnxruntime/onnxruntime/test/providers/base_tester.cc:89
#11 0x000000010034bd7c in onnxruntime::test::OpTester::BuildModel (this=0xfffffffffffe6f8, extra_domain_to_version=..., model_options=...)
    at /home/buildusr/onnxruntime/onnxruntime/test/providers/op_tester.cc:64

Let me know if you find any useful info in this. I will try to understand the changes in this PR. :)

@yuslepukhin yuslepukhin requested a review from fs-eire June 10, 2025 00:13
@yuslepukhin yuslepukhin merged commit 11f0a0a into main Jun 11, 2025
90 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/ort_initializers branch June 11, 2025 23:32
@ranjitshs
Contributor

@yuslepukhin
I was busy with other work.
I see this is merged to main, and on AIX the local CI is reporting the failures below.
I have yet to check the Python test suites. Will keep you posted.
Also, I will create a defect to track the AIX fix.

BTW, any idea when the next release is planned?

1: [----------] Global test environment tear-down
1: [==========] 4807 tests from 314 test suites ran. (97696 ms total)
1: [  PASSED  ] 4779 tests.
1: [  SKIPPED ] 2 tests, listed below:
1: [  SKIPPED ] MatMulFpQ4.MatMul2DSym
1: [  SKIPPED ] MatMulFpQ4.MatMul2DBlkZp
1: [  FAILED  ] 26 tests, listed below:
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Float16InputScaleOutput_Initializers
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Bias_Broadcast_Fp16
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Bias_Float16InputScaleBiasOutput
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Bias_Float16InputScaleBiasOutput_Initializers
1: [  FAILED  ] OptimizerInitializerTest.RawData
1: [  FAILED  ] QDQTransformerTests.Clip
1: [  FAILED  ] DequantizeLinearOpTest.DequantizeLinear_per_tensor_float_int16_cpu
1: [  FAILED  ] DequantizeLinearOpTest.DequantizeLinear_per_tensor_float_uint16_cpu
1: [  FAILED  ] DequantizeLinearOpTest.Int16
1: [  FAILED  ] DequantizeLinearOpTest.Uint16
1: [  FAILED  ] QuantizeLinearOpTest.Uint16
1: [  FAILED  ] QuantizeLinearOpTest.Int16
1: [  FAILED  ] MathOpTest.Clip_MLFloat16
1: [  FAILED  ] GraphTransformationTests.ReluClip11Fusion
1: [  FAILED  ] GraphTransformationTests.QuickGelu
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_ShareFloatOrHalfTypedInitializer
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_Share2DFloatOrHalfTypedInitializer
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_ShareFloatAndHalfTypedInitializer
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_Share2DFloatAndHalfTypedInitializer
1: [  FAILED  ] ConvertRawDataInTensorProtoTest.FloatData
1: [  FAILED  ] ConvertRawDataInTensorProtoTest.Int32Data
1: [  FAILED  ] FlatbufferUtilsTest.ExternalWriteReadWithLoadInitializers
1: [  FAILED  ] QuantizeLinearContribOpTest.QuantizeLinear_per_tensor_float_uint16
1: [  FAILED  ] QuantizeLinearContribOpTest.QuantizeLinear_per_tensor_float_int16
1: [  FAILED  ] SparseTensorConversionTests.TestConstantNodeConversion
1: [  FAILED  ] InternalTestingEP.TestSaveAndLoadOrtModel

adrianlizarraga added a commit that referenced this pull request Jun 26, 2025
#25159)

### Description
Updates the `OrtGraph` implementation to take advantage of the work done
in PR #23979, which sets up
the infrastructure to store initializers as `OrtValue` instances in the
`onnxruntime::Graph`.

There still needs to be a second part to the [aforementioned
PR](#23979) to ensure that
all initializers are stored as `OrtValue`s in the Graph.



### Motivation and Context
adrianlizarraga pushed a commit that referenced this pull request Jul 23, 2025
### Description

- Make protobuf weights refer to OrtValues on load.
- Create OrtValues for initializers that are loaded from the ORT format,
  for uniformity.
- Adjust exporting in Graph::ToGraphProto() so that it does not export
  in-memory references in external data.
- Make CoreML process external data, including in-memory references, so
  that it can copy it.

### Motivation and Context
Follow up for #23979
wcy123 pushed a commit to wcy123/onnxruntime that referenced this pull request Aug 1, 2025
carzh pushed a commit that referenced this pull request Aug 7, 2025
adrianlizarraga pushed a commit that referenced this pull request Aug 8, 2025
### Description

This is related to #25320 and #23979. It enables tensor raw data sharing for
externalized tensor protos tagged with kTensorProtoMemoryAddressTag.

### Motivation and Context

With #25320 and #23979, all initialized tensor protos are associated with
an OrtValue, and the VitisAI EP needs to adapt to this change.
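
To make the idea of an "in-memory reference" in external data more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: `ExternalDataRef`, the tag string, and the choice to carry the address in the offset field are hypothetical and are not taken from this PR or from the ONNX Runtime sources. Only the constant name `kTensorProtoMemoryAddressTag` appears in the text above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed marker value for this sketch only; the real value of
// kTensorProtoMemoryAddressTag is not given in the text above.
inline const std::string kInMemoryTag = "<in-memory>";

// Hypothetical external-data descriptor: normally a file location plus an
// offset and length; here the location can also be the in-memory marker.
struct ExternalDataRef {
  std::string location;
  std::uintptr_t offset;
  std::size_t length;
};

// Builds a descriptor that refers to bytes already resident in memory.
inline ExternalDataRef MakeInMemoryRef(const void* data, std::size_t length) {
  return {kInMemoryTag, reinterpret_cast<std::uintptr_t>(data), length};
}

inline bool IsInMemoryRef(const ExternalDataRef& ref) {
  return ref.location == kInMemoryTag;
}

// Returns the referenced bytes, or nullptr for ordinary file-backed entries.
inline const std::uint8_t* ResolveInMemory(const ExternalDataRef& ref) {
  return IsInMemoryRef(ref) ? reinterpret_cast<const std::uint8_t*>(ref.offset) : nullptr;
}
```

Under this scheme, a consumer that serializes the graph has to detect such entries and copy the bytes out, since a raw address is meaningless outside the current process.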

Co-authored-by: mingyue <[email protected]>
adrianlizarraga pushed a commit that referenced this pull request Aug 9, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025
yuslepukhin added a commit that referenced this pull request Oct 30, 2025
…lues early (#26345)

### Description
Converts weights early and reverts "Properly remove in-memory references
(#25652)".
This reverts commit 3ca49d8 and makes
appropriate adjustments for the current state of the code.

This PR is made possible by, and follows on the heels of:
#26263
#25833.

Previous history:
#23979
#25320
#25626
#25652

The first change (#26263)
allows us to convert initializers to OrtValues early and save lots of
memory at model loading time.

Specifically, for the Phi-4-mini-instruct-INT4 model, memory usage before
and after looks like this:

**Before**
<img width="1204" height="124" alt="Before change DEBUG 2025-10-16
144819"
src="https://github.com/user-attachments/assets/674ff75b-057f-498a-a906-0140d59d46e6"
/>

**After**

<img width="997" height="114" alt="After change DEBUG 2025-10-16 144819"
src="https://github.com/user-attachments/assets/df1783af-7f50-4cd2-b3ad-6868f23be53f"
/>

The two peaks represent memory usage at optimization time (8.1 GB before)
and after weights memory mapping (6.5 GB).
After this change, the corresponding numbers are 3.5 GB and 4.7 GB,
respectively.
Most of the savings during optimization phase come from
`ConstantFolding` where we are able to reuse the resulting OrtValues
directly for the new initializers.
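
The effect of reusing a folded result can be sketched with plain buffers. The types and functions below are hypothetical and do not mirror the `ConstantFolding` code; they only contrast copying a result into a new initializer with sharing it.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Hypothetical initializer that owns a private copy of the folded result.
struct CopiedInitializer {
  Buffer bytes;
};

// Hypothetical initializer that shares the folded result.
struct SharedInitializer {
  std::shared_ptr<const Buffer> bytes;
};

inline CopiedInitializer AddByCopy(const std::shared_ptr<const Buffer>& folded) {
  return {*folded};  // duplicates the bytes
}

inline SharedInitializer AddByShare(const std::shared_ptr<const Buffer>& folded) {
  return {folded};  // only bumps the reference count
}

// Bytes alive while both the folded result and the copied initializer exist.
inline std::size_t LiveBytesAfterCopy(const std::shared_ptr<const Buffer>& folded,
                                      const CopiedInitializer& init) {
  return folded->size() + init.bytes.size();
}

// Bytes alive when the initializer shares the folded result.
inline std::size_t LiveBytesAfterShare(const SharedInitializer& init) {
  return init.bytes->size();
}
```

With a copy, the peak is roughly twice the tensor size until the original is released; with sharing, it stays at one tensor size.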

This PR concludes a series of PRs converting initializers to OrtValues.

Memory consumption before the conversion began was 9.3 GB and 6.7 GB,
respectively. We are saving almost 6 GB during optimization and 2 GB in
the steady state.
 
 
<img width="1175" height="139" alt="image"
src="https://github.com/user-attachments/assets/80e7d228-8a8e-4316-8e04-b02c2be30f04"
/>

The model also loads about 12 seconds faster.

Below is an example of ConstantFolding being one of the top contributors:
memory is duplicated, producing a higher peak, before Resolve cleans up
the initializers that are no longer used.
<img width="1100" height="558" alt="Sanpshot 3 Peak on ConstantFolding
Transpose Optimizer"
src="https://github.com/user-attachments/assets/95545abd-3f99-46d9-862e-bbf27cbb5b40"
/>

<img width="1060" height="600" alt="Snapshot 4 Peak AddInitializer from
ConstantFolding"
src="https://github.com/user-attachments/assets/dd457ec6-23ee-4efd-8c60-625d5faad61e"
/>

<img width="325" height="160" alt="image"
src="https://github.com/user-attachments/assets/37c1194d-f683-49a7-afb1-073dfbb9bbfc"
/>


### Motivation and Context
Reduce memory usage.
quic-ankus pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Nov 25, 2025
quic-ankus pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Nov 25, 2025
microsoft#25159)
