[DML EP] Add frequent upload heap flushing by PatriceVignola · Pull Request #15960 · microsoft/onnxruntime

PatriceVignola · 2023-05-16T07:06:50Z

This reduces peak nonlocal memory consumption when uploading large weights for big models (e.g. LLMs), while at the same time trying to keep the GPU as busy as possible. This change could be more sophisticated, but at this stage it is the most minimal and least risky change required to support LLMs.
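To make the trade-off concrete, here is a minimal sketch of time-based flushing, assuming a toy model in which staged upload bytes accumulate until a flush releases them. `UploadBatcher` and all of its members are hypothetical names for illustration, not the DML EP's actual classes. The clock value is passed in so the behavior is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical toy model (not the DML EP implementation): tracks how many
// staged upload bytes are outstanding and flushes them once a time interval
// has elapsed since the previous flush.
class UploadBatcher {
public:
    using Clock = std::chrono::steady_clock;

    UploadBatcher(std::chrono::milliseconds flushInterval, Clock::time_point start)
        : m_flushInterval(flushInterval), m_lastFlushTime(start) {}

    // Stages `bytes` at time `now`, then flushes if the interval was exceeded.
    void Upload(std::size_t bytes, Clock::time_point now) {
        assert(bytes > 0);
        m_pendingBytes += bytes;
        m_peakPendingBytes = std::max(m_peakPendingBytes, m_pendingBytes);
        if (now - m_lastFlushTime > m_flushInterval) {
            Flush(now);
        }
    }

    // Stands in for submitting the batched copies; staged memory can then be reused.
    void Flush(Clock::time_point now) {
        m_pendingBytes = 0;
        m_lastFlushTime = now;
        ++m_flushCount;
    }

    std::size_t PendingBytes() const { return m_pendingBytes; }
    std::size_t PeakPendingBytes() const { return m_peakPendingBytes; }
    int FlushCount() const { return m_flushCount; }

private:
    std::chrono::milliseconds m_flushInterval;
    Clock::time_point m_lastFlushTime;
    std::size_t m_pendingBytes = 0;
    std::size_t m_peakPendingBytes = 0;
    int m_flushCount = 0;
};
```

In this model, a shorter interval lowers the peak number of staged bytes at the cost of more frequent flushes, which is the balance the description refers to.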

jstoecker · 2023-05-16T07:17:40Z

onnxruntime/core/providers/dml/DmlExecutionProvider/src/ExecutionProvider.cpp

+    {
+        // Periodically flush uploads to make sure the GPU is not idle for too long
+        std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - m_lastUploadFlushTime;
+        auto elapsedMicroSeconds = elapsed.count() * 1e6;


Nitpick (just slightly cleaner):

Set interval with strong type:

static constexpr std::chrono::milliseconds m_batchFlushInterval = std::chrono::milliseconds(10);

Check if exceeded:

if (std::chrono::steady_clock::now() - m_lastUploadFlushTime > m_batchFlushInterval) { ... }
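As a rough illustration of the suggestion, the sketch below places the strongly typed comparison next to a manual microsecond conversion like the one in the diff. The function and constant names are hypothetical, and the 10 ms value is taken from the comment above.

```cpp
#include <chrono>

using SteadyClock = std::chrono::steady_clock;

// Hypothetical constant mirroring the 10 ms interval from the review comment.
constexpr std::chrono::milliseconds kBatchFlushInterval{10};

// Strongly typed check: std::chrono converts between duration units itself,
// so no hand-written scale factor is needed.
inline bool FlushIntervalExceeded(SteadyClock::time_point now,
                                  SteadyClock::time_point lastFlush,
                                  std::chrono::milliseconds interval = kBatchFlushInterval) {
    return now - lastFlush > interval;
}

// Manual variant for comparison: converts to floating-point seconds, then
// scales to microseconds by hand before comparing against a raw number.
inline bool FlushIntervalExceededManual(SteadyClock::time_point now,
                                        SteadyClock::time_point lastFlush,
                                        double intervalMicroSeconds) {
    std::chrono::duration<double> elapsed = now - lastFlush;
    return elapsed.count() * 1e6 > intervalMicroSeconds;
}
```

Both functions answer the same question. The typed version keeps the unit in the type, so a mismatch between milliseconds and microseconds cannot slip in through a bare number.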

### Description

Cherry-picks 26 commits to the release branch. Most cherry-picks are clean merges, with these exceptions:

1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.

PR list:

- [js/webgpu] fix Transpose with non-float tensor (#15819)
- [js/web] fix terser reserved symbols for worker (#15864)
- [JSEP] fix constructor for OrtDevice (#15805)
- Bump engine.io from 6.4.1 to 6.4.2 in /js/web (#15799)
- Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (#15798)
- [wasm] revert emsdk to v3.1.19 (#15793)
- [wasm/JSEP] add threaded build to artifacts (#15777)
- [js/web] add target ort.webgpu.min.js (#15780)
- update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#15688)
- fix: setting builder optimization level to TRT 8.6 default (#15897)
- Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (#15921)
- Fix segfault for multiple GPU run (regression) (#15823)
- android package fix (#15999)
- [CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (#15993)
- Adding support for conv fp16 fusion on Resnet50v1 (#15474)
- update onnx release 1.14 for docker files (#15680)
- Avoid generating training documentation during packaging (#15795)
- Update Conv-Add-Relu Fusion Transformation (#15834)
- Fix symbolic shape infer empty value_info (#15842)
- NhwcFusedConv: Add before Activation (#15837)
- use __hmul2 instead of __hmul2_rn (#15852)
- change the EP device to default OrtDevice() for memoryType equals CPU Input (#15903)
- Fixing NhwcFusedConv fp16 (#15950)
- fix topo sort in quantization tool (#16003)
- [doc] add LeakyRelu to coreml supported ops (#15944)
- [DML EP] Add frequent upload heap flushing (#15960)

Co-authored-by: Yulong Wang
Co-authored-by: dependabot[bot]
Co-authored-by: Guenther Schmuelling
Co-authored-by: Shalva Mist
Co-authored-by: Maximilian Müller
Co-authored-by: Dmitri Smirnov
Co-authored-by: pengwa
Co-authored-by: Ashwini Khade
Co-authored-by: Edward Chen
Co-authored-by: Jian Chen
Co-authored-by: liqun Fu
Co-authored-by: Baiju Meswani
Co-authored-by: Tianlei Wu
Co-authored-by: Chen Fu
Co-authored-by: Ye Wang
Co-authored-by: cao lei
Co-authored-by: Yufeng Li
Co-authored-by: Rachel Guo
Co-authored-by: Patrice Vignola

[DML EP] Add frequent upload heap flushing

82cbdae

PatriceVignola requested a review from jstoecker May 16, 2023 07:06

jstoecker previously approved these changes May 16, 2023

Address PR comments

0ffd060

PatriceVignola dismissed jstoecker’s stale review via 0ffd060 May 16, 2023 08:11

PatriceVignola requested a review from jstoecker May 16, 2023 08:11

jstoecker approved these changes May 16, 2023

PatriceVignola merged commit 0ff915e into main May 17, 2023

PatriceVignola deleted the user/pavignol/add-frequent-upload-heap-flushing branch May 17, 2023 05:35

PatriceVignola added the release:1.15 label May 18, 2023

snnn added the triage:approved (Approved for cherrypicks for release) label May 18, 2023

snnn removed the triage:approved (Approved for cherrypicks for release) and release:1.15 labels May 19, 2023

PatriceVignola merged 2 commits into main from user/pavignol/add-frequent-upload-heap-flushing

3 participants
