Convert graph initializers into OrtValue Phase I #23979

yuslepukhin · 2025-03-10T22:19:41Z

Description

This PR converts TensorProto graph initializers to TensorProto/OrtValue pairs.
Currently, only the output of some optimizers is split into such pairs.
Eventually, we should be able to convert all initializers to OrtValues on load.
Small weights will continue to be an exception, as they are sometimes required by ONNX inference functions.
Some of the graph API leaks into EPs, so we cannot remove it at present, and this constrains our ability to convert everything at once.
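The pairing described above can be pictured with a minimal, self-contained sketch. It does not use the real onnxruntime types: `ProtoStub`, `ValueStub`, `InitializerEntry`, `SplitInitializer`, and the 128-byte `kSmallWeightThreshold` are all hypothetical stand-ins chosen for illustration. Large weights move their bytes into a reference-counted buffer that plays the role of an `OrtValue`, while small weights keep their bytes inline in the proto so that ONNX inference functions can still read them.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for onnx::TensorProto: metadata plus optional inline bytes.
struct ProtoStub {
  std::string name;
  std::vector<int64_t> dims;
  std::vector<std::byte> raw_data;  // emptied once the bytes live in a ValueStub
};

// Hypothetical stand-in for OrtValue: shared ownership of the tensor bytes.
struct ValueStub {
  std::shared_ptr<const std::vector<std::byte>> data;
};

// One graph initializer held as a proto/value pair.
struct InitializerEntry {
  ProtoStub proto;
  std::optional<ValueStub> value;  // left empty for small weights kept inline
};

// Hypothetical cut-off; the real threshold in onnxruntime may differ.
constexpr std::size_t kSmallWeightThreshold = 128;

InitializerEntry SplitInitializer(ProtoStub proto) {
  InitializerEntry entry;
  if (proto.raw_data.size() < kSmallWeightThreshold) {
    // Small weights stay in the proto so inference functions can read them.
    entry.proto = std::move(proto);
    return entry;
  }
  // Large weights: move the bytes into the value; the proto keeps only metadata.
  std::shared_ptr<const std::vector<std::byte>> bytes =
      std::make_shared<std::vector<std::byte>>(std::move(proto.raw_data));
  proto.raw_data.clear();  // moved-from vector is valid but unspecified; make it empty
  entry.proto = std::move(proto);
  entry.value = ValueStub{std::move(bytes)};
  return entry;
}
```

Moving rather than copying the bytes is the point of the design: the data exists once, owned by the value, and the proto becomes a lightweight description of it.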

Motivation and Context

Lay the groundwork for proper layer separation. Eventually, eliminate weight copies in the EPs.

onnxruntime/core/framework/session_state_utils.cc

-      // can not trace string tensor
-      ORT_ENFORCE(entry->second->data_type() != ONNX_NAMESPACE::TensorProto_DataType_STRING, "Can not trace string tensor");
-      ORT_RETURN_IF_ERROR(planner.Trace(entry->first, entry->second));
+    const auto [_, tensor_proto] = *entry;


Copilot Autofix suggestion: to fix the problem, remove the unused variable `_`. The simplest way to do this without changing existing functionality is to bind `tensor_proto` directly to `entry->second`, the only part of the entry that is actually used, instead of destructuring the whole entry.
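A minimal sketch of the two spellings, assuming a `std::unordered_map<std::string, const Proto*>` as a hypothetical stand-in for the initialized-tensor map in `session_state_utils.cc` (the real key type may differ). Besides declaring a name that is never read, `const auto [_, tensor_proto] = *entry;` first copies the whole pair, key included, into a hidden local. Binding a reference to `entry->second` avoids both.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for ONNX_NAMESPACE::TensorProto.
struct Proto {
  std::string name;
};

using InitializerMap = std::unordered_map<std::string, const Proto*>;

// Flagged form: `_` is never used, and *entry (key and value) is copied
// into a hidden local before the names are bound.
const Proto* LookupWithBinding(const InitializerMap& m, const std::string& key) {
  auto entry = m.find(key);
  if (entry == m.end()) return nullptr;
  const auto [_, tensor_proto] = *entry;
  return tensor_proto;
}

// Suggested form: bind only the mapped value, with no copy of the pair.
const Proto* LookupWithReference(const InitializerMap& m, const std::string& key) {
  auto entry = m.find(key);
  if (entry == m.end()) return nullptr;
  const auto& tensor_proto = entry->second;
  return tensor_proto;
}
```

Both functions return the same pointer; only the second avoids the unused name and the pair copy.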

onnxruntime/core/graph/graph_utils.cc

github-actions

You can commit the suggested changes from lintrunner.

onnxruntime/core/framework/allocator.cc

yuslepukhin · 2025-04-28T20:39:02Z

Cc: @ranjitshs pls, take a look and perhaps build as well. Thx!

ranjitshs · 2025-04-30T11:58:47Z

Yes. I will try to build and let you know in 1-2 days.

onnxruntime/core/framework/endian_utils.cc

onnxruntime/core/providers/shared_library/provider_wrappedtypes.h

ranjitshs · 2025-05-12T13:41:59Z

Hi @yuslepukhin
I have tried out this branch on AIX and currently see failures in the test suites.
As the Linux CI is passing, I will debug these AIX failures and let you know.

The following tests FAILED:
          1 - onnxruntime_test_all (SEGFAULT)
          4 - onnxruntime_shared_lib_test (SEGFAULT)

yuslepukhin · 2025-05-12T17:06:38Z

Which commit did you try?

ranjitshs · 2025-05-13T09:09:50Z

@yuslepukhin
I used the latest commit.

bash-5.2$ git status
On branch yuslepukhin/ort_initializers
Your branch is up to date with 'origin/yuslepukhin/ort_initializers'.

nothing to commit, working tree clean
bash-5.2$ git log | head -10
commit 22eade88e9f4f789cf7dcf6daa1124c3843eaaef
Author: Dmitri Smirnov <[email protected]>
Date:   Thu May 8 14:12:16 2025 -0700

    Fix improper iterator usage.

ranjitshs · 2025-05-15T13:58:23Z

@yuslepukhin

Below are some stack traces for the failures:

./onnxruntime_test_all "--gtest_filter=EmbedLayerNormTest*"
This crashed in `tensor_proto.set_name(data.def.Name());` in onnxruntime/test/providers/base_tester.cc.

(gdb) bt
#0  0x09000000002ede6c in malloc_y () from /usr/lib/libc.a(_shr_64.o)
#1  0x090000000024bed8 in malloc_common@AF119_102 () from /usr/lib/libc.a(_shr_64.o)
#2  0x09000000031ab064 in std::bad_alloc::what() const () from /usr/lib/libc++abi.a(libc++abi.so.1)
#3  0x00000001000e77dc in google::protobuf::internal::(anonymous namespace)::CreateString (value=...)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/arenastring.cc:102
#4  google::protobuf::internal::ArenaStringPtr::Set (
    this=0x18 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)+24>, value=..., arena=0x291b4)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/arenastring.cc:125
#5  0x000000010034ce14 in onnx::TensorProto::set_name<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&> (this=<optimized out>, 
    arg0=...) at _deps/onnx-build/onnx/onnx-ml.pb.h:11582
#6  onnxruntime::test::BaseTester::AddInitializers (this=0xfffffffffffe890, graph=...) at /home/buildusr/onnxruntime/onnxruntime/test/providers/base_tester.cc:80
#7  0x000000010034b9fc in onnxruntime::test::OpTester::BuildModel (this=0xfffffffffffe890, extra_domain_to_version=..., model_options=...)
    at /home/buildusr/onnxruntime/onnxruntime/test/providers/op_tester.cc:64
#8  0x000000010034aef4 in onnxruntime::test::OpTester::CreateModelToTest (this=0xfffffffffffe890, model_options=..., 
    model=@0x111b39090: 0x11001b9f8 <vtable for onnxruntime::Graph+16>) at /home/buildusr/onnxruntime/onnxruntime/test/providers/op_tester.cc:72

./onnxruntime_test_all "--gtest_filter=GatherOpTest*"

This crashed in onnxruntime::Graph::AddInitializedTensor.

(gdb) bt
#0  0x09000000002edef0 in malloc_y () from /usr/lib/libc.a(_shr_64.o)
#1  0x090000000024bed8 in malloc_common@AF119_102 () from /usr/lib/libc.a(_shr_64.o)
#2  0x09000000031ab064 in std::bad_alloc::what() const () from /usr/lib/libc++abi.a(libc++abi.so.1)
#3  0x00000001000ec414 in google::protobuf::internal::RepeatedPtrFieldBase::InternalExtend (
    this=0x10 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)+16>, extend_amount=<optimized out>)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/repeated_ptr_field.cc:69
#4  0x00000001000eec04 in google::protobuf::internal::RepeatedPtrFieldBase::AddOutOfLineHelper (this=warning: (Internal error: pc 0x0 in read in CU, but not in symtab.)
 
    0x0 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)>, obj=0x111b39850)
    at /home/buildusr/onnxruntime/build/Linux/RelWithDebInfo/_deps/protobuf-src/src/google/protobuf/repeated_ptr_field.cc:116
#5  0x000000010021c66c in google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<onnx::TensorProto>::TypeHandler> (this=0x111b38d80, 
    prototype=warning: (Internal error: pc 0x0 in read in CU, but not in symtab.)
0x0 <std::__1::__call_once_proxy[abi:v15007]<std::__1::tuple<onnxruntime::test::DnnlHasBF16Support()::$_0&&> >(void*)>)
    at _deps/protobuf-src/src/google/protobuf/repeated_ptr_field.h:218
#6  google::protobuf::RepeatedPtrField<onnx::TensorProto>::Add (this=0x111b38d80) at _deps/protobuf-src/src/google/protobuf/repeated_ptr_field.h:1274
#7  onnx::GraphProto::_internal_add_initializer (this=<optimized out>) at _deps/onnx-build/onnx/onnx-ml.pb.h:10796
#8  onnx::GraphProto::add_initializer (this=<optimized out>) at _deps/onnx-build/onnx/onnx-ml.pb.h:10799
#9  onnxruntime::Graph::AddInitializedTensor (this=0x111b38ed0, tensor=...) at /home/buildusr/onnxruntime/onnxruntime/core/graph/graph.cc:3438
#10 0x000000010034d52c in onnxruntime::test::BaseTester::AddInitializers (this=<optimized out>, graph=...)
    at /home/buildusr/onnxruntime/onnxruntime/test/providers/base_tester.cc:89
#11 0x000000010034bd7c in onnxruntime::test::OpTester::BuildModel (this=0xfffffffffffe6f8, extra_domain_to_version=..., model_options=...)
    at /home/buildusr/onnxruntime/onnxruntime/test/providers/op_tester.cc:64

Let me know if you find any useful info in this. I will try to understand the changes in this PR. :)

include/onnxruntime/core/framework/allocator.h

ranjitshs · 2025-06-17T13:22:32Z

@yuslepukhin
I was busy with other activities.
I see this has been merged to main, and on AIX our local CI reports the failures below.
I have yet to check the Python test suites. Will keep you posted.
Also, I will create a defect to track the AIX fix.

BTW, any idea when the next release is planned?

1: [----------] Global test environment tear-down
1: [==========] 4807 tests from 314 test suites ran. (97696 ms total)
1: [  PASSED  ] 4779 tests.
1: [  SKIPPED ] 2 tests, listed below:
1: [  SKIPPED ] MatMulFpQ4.MatMul2DSym
1: [  SKIPPED ] MatMulFpQ4.MatMul2DBlkZp
1: [  FAILED  ] 26 tests, listed below:
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Float16InputScaleOutput_Initializers
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Bias_Broadcast_Fp16
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Bias_Float16InputScaleBiasOutput
1: [  FAILED  ] LayerNormTest.LayerNorm_Scale_Bias_Float16InputScaleBiasOutput_Initializers
1: [  FAILED  ] OptimizerInitializerTest.RawData
1: [  FAILED  ] QDQTransformerTests.Clip
1: [  FAILED  ] DequantizeLinearOpTest.DequantizeLinear_per_tensor_float_int16_cpu
1: [  FAILED  ] DequantizeLinearOpTest.DequantizeLinear_per_tensor_float_uint16_cpu
1: [  FAILED  ] DequantizeLinearOpTest.Int16
1: [  FAILED  ] DequantizeLinearOpTest.Uint16
1: [  FAILED  ] QuantizeLinearOpTest.Uint16
1: [  FAILED  ] QuantizeLinearOpTest.Int16
1: [  FAILED  ] MathOpTest.Clip_MLFloat16
1: [  FAILED  ] GraphTransformationTests.ReluClip11Fusion
1: [  FAILED  ] GraphTransformationTests.QuickGelu
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_ShareFloatOrHalfTypedInitializer
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_Share2DFloatOrHalfTypedInitializer
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_ShareFloatAndHalfTypedInitializer
1: [  FAILED  ] GraphTransformationTests.ConstantSharing_Share2DFloatAndHalfTypedInitializer
1: [  FAILED  ] ConvertRawDataInTensorProtoTest.FloatData
1: [  FAILED  ] ConvertRawDataInTensorProtoTest.Int32Data
1: [  FAILED  ] FlatbufferUtilsTest.ExternalWriteReadWithLoadInitializers
1: [  FAILED  ] QuantizeLinearContribOpTest.QuantizeLinear_per_tensor_float_uint16
1: [  FAILED  ] QuantizeLinearContribOpTest.QuantizeLinear_per_tensor_float_int16
1: [  FAILED  ] SparseTensorConversionTests.TestConstantNodeConversion
1: [  FAILED  ] InternalTestingEP.TestSaveAndLoadOrtModel

#25159) ### Description Updates the `OrtGraph` implementation to take advantage of the work done in PR #23979, which sets up the infrastructure to store initializers as `OrtValue` instances in the `onnxruntime::Graph`. There still needs to be a second part to the aforementioned PR (#23979) to ensure that all initializers are stored as `OrtValue`s in the Graph. ### Motivation and Context

### Description Make protobuf weights refer to OrtValues on load. Create OrtValues for initializers that are loaded from ORT format for uniformity. Create OrtValues for ORT format initializers. Adjust the Graph::ToGraphProto() export so that it does not emit in-memory references in external data. Make CoreML process external data, including in-memory references, so that it can copy it. ### Motivation and Context Follow-up for #23979
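A hedged sketch of the two halves of that change: on load a proto points at bytes owned elsewhere (for example by an `OrtValue`), and on export that in-memory reference is replaced by inline bytes, because a process address means nothing in a saved model. Every name here (`kMemAddrTagStub`, `ExternalRef`, `TensorProtoStub`, `MakeInMemoryRef`, `ExportProto`) is hypothetical. onnxruntime's real marker is `kTensorProtoMemoryAddressTag`, but its value and which field carries the address are assumptions in this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical marker meaning "location is a memory address, not a file".
const std::string kMemAddrTagStub = "<in-memory>";

struct ExternalRef {
  std::string location;   // file path, or kMemAddrTagStub
  std::uintptr_t offset;  // file offset, or buffer address for in-memory refs
  std::size_t length;
};

struct TensorProtoStub {
  std::string name;
  std::vector<std::byte> raw_data;
  std::optional<ExternalRef> external;
};

// On load: reference bytes owned elsewhere instead of copying them.
// The referenced buffer must outlive the proto.
TensorProtoStub MakeInMemoryRef(const std::string& name,
                                const std::vector<std::byte>& owned) {
  return TensorProtoStub{
      name, {},
      ExternalRef{kMemAddrTagStub, reinterpret_cast<std::uintptr_t>(owned.data()),
                  owned.size()}};
}

// On export: inline the bytes and drop the in-memory reference.
TensorProtoStub ExportProto(const TensorProtoStub& in) {
  TensorProtoStub out = in;
  if (in.external && in.external->location == kMemAddrTagStub) {
    const auto* src = reinterpret_cast<const std::byte*>(in.external->offset);
    out.raw_data.assign(src, src + in.external->length);
    out.external.reset();
  }
  return out;
}
```

File-backed external references pass through `ExportProto` unchanged. Only the process-local form is rewritten.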

it is related to microsoft#25320 microsoft#23979

### Description It is related to #25320 #23979. Enable tensor raw data sharing for externalized tensor protos with kTensorProtoMemoryAddressTag. ### Motivation and Context With #25320 #23979, all initialized tensor protos are associated with an OrtValue, and the VitisAI EP needs to adapt to this change. Co-authored-by: mingyue <[email protected]>
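What "raw data sharing" can mean for an EP is sketched below. Instead of copying bytes out of a proto, the EP takes a non-owning view of them: inline `raw_data` when present, or the buffer addressed by an in-memory reference. The types and the `kMemAddrTagStub` marker are hypothetical stand-ins for `kTensorProtoMemoryAddressTag` and the real proto fields, and file-backed data is deliberately left unhandled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for kTensorProtoMemoryAddressTag.
const std::string kMemAddrTagStub = "<in-memory>";

struct ExternalRef {
  std::string location;
  std::uintptr_t offset;  // buffer address when location == kMemAddrTagStub
  std::size_t length;
};

struct TensorProtoStub {
  std::vector<std::byte> raw_data;
  std::optional<ExternalRef> external;
};

// Non-owning view of tensor bytes.
struct ByteView {
  const std::byte* data;
  std::size_t size;
};

// Returns a view over the tensor bytes without copying them, or std::nullopt
// for file-backed external data, which this sketch does not load.
std::optional<ByteView> ViewRawData(const TensorProtoStub& proto) {
  if (!proto.external) {
    return ByteView{proto.raw_data.data(), proto.raw_data.size()};
  }
  if (proto.external->location == kMemAddrTagStub) {
    // Share the bytes the OrtValue already owns.
    return ByteView{reinterpret_cast<const std::byte*>(proto.external->offset),
                    proto.external->length};
  }
  return std::nullopt;
}
```

The view is only valid while the owner of the bytes is alive, which is the trade-off an EP accepts in exchange for skipping the copy.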

…lues early (#26345)

### Description

Converts weights early and reverts "Properly remove in-memory references (#25652)". This reverts commit 3ca49d8 and makes appropriate adjustments for the current state of the code. This PR is made possible by, and follows on the heels of, #26263 and #25833. Previous history: #23979 #25320 #25626 #25652.

The first change (#26263) allows us to convert initializers to OrtValues early and save a lot of memory at model loading time. Specifically, for the Phi-4-mini-instruct-INT4 model, before and after look like this:

**Before** [Image: Before change DEBUG 2025-10-16 144819]

**After** [Image: After change DEBUG 2025-10-16 144819]

The two peaks represent memory usage at optimization time (8.1 GB before) and after weights memory mapping (6.5 GB before). After this change, the corresponding numbers are 3.5 GB and 4.7 GB, respectively. Most of the savings during the optimization phase come from `ConstantFolding`, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues. Memory consumption before the conversion began was 9.3 GB and 6.7 GB, respectively. We are saving almost 6 GB during optimization and 2 GB for the steady state. The model also loads about 12 seconds faster.

Example of ConstantFolding being one of the top contributors: we duplicate memory, producing a higher peak, before Resolve takes care of initializers that are no longer used.

[Image: Snapshot 3 Peak on ConstantFolding Transpose Optimizer]
[Image: Snapshot 4 Peak AddInitializer from ConstantFolding]

### Motivation and Context

Reduce memory usage.
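The savings quoted in this commit message follow directly from the reported peaks. The first figure in each pair is the optimization-time peak and the second is the steady state after weight memory mapping. The first two lines compare against the state before the conversion series began, and the last two against the state just before this PR:

```latex
\begin{aligned}
\Delta_{\text{opt}}^{\text{series}}    &= 9.3\,\text{GB} - 3.5\,\text{GB} = 5.8\,\text{GB} \approx 6\,\text{GB} \\
\Delta_{\text{steady}}^{\text{series}} &= 6.7\,\text{GB} - 4.7\,\text{GB} = 2.0\,\text{GB} \\
\Delta_{\text{opt}}^{\text{this PR}}    &= 8.1\,\text{GB} - 3.5\,\text{GB} = 4.6\,\text{GB} \\
\Delta_{\text{steady}}^{\text{this PR}} &= 6.5\,\text{GB} - 4.7\,\text{GB} = 1.8\,\text{GB}
\end{aligned}
```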

github-advanced-security bot found potential problems Mar 10, 2025

yuslepukhin force-pushed the yuslepukhin/ort_initializers branch 8 times, most recently from 650c07f to 2c4f38f March 18, 2025 02:26

yuslepukhin force-pushed the yuslepukhin/ort_initializers branch 3 times, most recently from dfcbe93 to ac5bed7 March 21, 2025 01:28

github-actions bot reviewed Apr 28, 2025

onnxruntime/core/framework/allocator.cc

yuslepukhin changed the title ~~[DRAFT do not review] Convert graph initializers into OrtValue~~ [DRAFT] Convert graph initializers into OrtValue Apr 28, 2025

yuslepukhin requested a review from adrianlizarraga April 28, 2025 20:43

yuslepukhin changed the title ~~[DRAFT] Convert graph initializers into OrtValue~~ [DRAFT] Convert graph initializers into OrtValue Phase I Apr 28, 2025

yuslepukhin requested a review from skottmckay April 28, 2025 20:44

yuslepukhin marked this pull request as ready for review May 5, 2025 23:04

yuslepukhin marked this pull request as draft May 5, 2025 23:05

yuslepukhin commented May 5, 2025

onnxruntime/core/framework/endian_utils.cc (outdated)

yuslepukhin commented May 6, 2025

onnxruntime/core/providers/shared_library/provider_wrappedtypes.h (outdated)

yuslepukhin force-pushed the yuslepukhin/ort_initializers branch from 0610388 to 0bc68a1 May 7, 2025 01:05

yuslepukhin marked this pull request as ready for review May 7, 2025 01:05

yuslepukhin changed the title ~~[DRAFT] Convert graph initializers into OrtValue Phase I~~ Convert graph initializers into OrtValue Phase I May 7, 2025

fs-eire mentioned this pull request Jun 9, 2025

Fix in-memory initializer handling for non-CPU device #24978

Closed

yuslepukhin added 2 commits June 9, 2025 17:09

Address review comments

7bb91ff

Merge branch 'main' into yuslepukhin/ort_initializers

d66dc6d

yuslepukhin requested a review from fs-eire June 10, 2025 00:13

snnn reviewed Jun 10, 2025

include/onnxruntime/core/framework/allocator.h

fs-eire mentioned this pull request Jun 11, 2025

Native WebGPU EP fails to run model with in-memory external data #24768

Closed

skottmckay approved these changes Jun 11, 2025

yuslepukhin merged commit 11f0a0a into main Jun 11, 2025
90 checks passed

yuslepukhin deleted the yuslepukhin/ort_initializers branch June 11, 2025 23:32

Honry mentioned this pull request Jun 16, 2025

[WebNN EP] Fail to run some models with in-memory external data #25078

Closed

adrianlizarraga mentioned this pull request Jun 24, 2025

[EP ABI] Update OrtGraph to use new OrtValues stored in internal Graph #25159

Merged

yuslepukhin mentioned this pull request Jul 8, 2025

Convert Initializers to OrtValues Phase 2 #25320

Merged

wcy123 pushed a commit to wcy123/onnxruntime that referenced this pull request Aug 1, 2025

[VitisAI] bugfix model_clone optimization

1109d03

it is related to microsoft#25320 microsoft#23979

This was referenced Aug 1, 2025

[VitisAI] bugfix model_clone optimization #25629

Merged

Bugfix vitisai ep model clone with 23979 25320 #25654

Closed

yifei410 mentioned this pull request Aug 8, 2025

[VitisAI] bugfix model_clone optimization #25707

Closed

ranjitshs mentioned this pull request Aug 19, 2025

[AIX] Test failures fixes #25790

Closed

yuslepukhin mentioned this pull request Oct 17, 2025

Save much memory at model loading time by converting weights to OrtValues early #26345

Merged

@@ -269,3 +269,3 @@
                             "OrtValue index: ", ort_value_index, " from initializer_allocation_order not found among initialized tensors");
-                const auto [_, tensor_proto] = *entry;
+                const auto& tensor_proto = entry->second;

5 participants