[DML EP] Add frequent upload heap flushing#15960

Merged
PatriceVignola merged 2 commits into main from
user/pavignol/add-frequent-upload-heap-flushing
May 17, 2023

@PatriceVignola

This reduces peak nonlocal memory consumption when uploading large weights for big models (e.g. LLMs), while at the same time trying to keep the GPU as busy as possible. This change could be more sophisticated, but at this stage it is the most minimal and least risky change required to support LLMs.
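
The trade-off described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the DML EP code: `SimulateUploads`, `UploadStats`, the fixed time per chunk, and the simulated clock are all hypothetical names and simplifications introduced here. It tracks how many bytes sit in a pending upload batch, and compares flushing only at the end with flushing whenever a time interval has elapsed.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type for this illustration only.
struct UploadStats {
    std::size_t peakPendingBytes;  // largest amount of data waiting to be flushed
    int flushCount;                // number of flushes issued
};

// Toy model: every chunk is appended to a pending batch and costs a fixed
// amount of simulated time. With periodicFlush enabled, the batch is flushed
// as soon as more than flushInterval has passed since the previous flush.
// Any remaining data is flushed once at the end in both modes.
inline UploadStats SimulateUploads(const std::vector<std::size_t>& chunkSizes,
                                   std::chrono::microseconds timePerChunk,
                                   std::chrono::microseconds flushInterval,
                                   bool periodicFlush)
{
    std::chrono::microseconds now{0};
    std::chrono::microseconds lastFlush{0};
    std::size_t pendingBytes = 0;
    UploadStats stats{0, 0};

    for (std::size_t size : chunkSizes) {
        pendingBytes += size;
        stats.peakPendingBytes = std::max(stats.peakPendingBytes, pendingBytes);
        now += timePerChunk;

        if (periodicFlush && now - lastFlush > flushInterval) {
            pendingBytes = 0;  // the batch is handed off, its memory can be reused
            ++stats.flushCount;
            lastFlush = now;
        }
    }

    if (pendingBytes > 0) {
        ++stats.flushCount;  // final flush of whatever is left
    }
    return stats;
}
```

In this model, eight 100-byte chunks at 4 ms each with a 10 ms interval peak at 300 pending bytes when flushing periodically, compared with 800 bytes when everything is flushed at the end. The real numbers depend on the allocator and the GPU, which this sketch does not model.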

@PatriceVignola PatriceVignola requested a review from jstoecker May 16, 2023 07:06
jstoecker
jstoecker previously approved these changes May 16, 2023
{
// Periodically flush uploads to make sure the GPU is not idle for too long
std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - m_lastUploadFlushTime;
auto elapsedMicroSeconds = elapsed.count() * 1e6;

Nitpick (just slightly cleaner):

Set interval with strong type:

static constexpr std::chrono::milliseconds m_batchFlushInterval = std::chrono::milliseconds(10);

Check if exceeded:

if (std::chrono::steady_clock::now() - m_lastUploadFlushTime > m_batchFlushInterval)
{
...
}
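
A self-contained version of this suggestion might look like the sketch below. The `FlushTimer` class, its method names, and `kBatchFlushInterval` are hypothetical and are not taken from the merged change; only the 10 ms interval and the `steady_clock` comparison come from the comment above. Passing the current time in as a parameter is a choice made here so the check can be exercised without sleeping.

```cpp
#include <chrono>

// Interval kept as a strongly typed duration, so no manual unit conversion
// (such as multiplying seconds by 1e6) is needed at the comparison site.
constexpr std::chrono::milliseconds kBatchFlushInterval{10};

// Hypothetical helper for illustration; not part of onnxruntime.
class FlushTimer {
public:
    using Clock = std::chrono::steady_clock;

    explicit FlushTimer(Clock::time_point start) : m_lastUploadFlushTime(start) {}

    // True once strictly more than the interval has elapsed since the last flush.
    bool ShouldFlush(Clock::time_point now) const {
        return now - m_lastUploadFlushTime > kBatchFlushInterval;
    }

    // Record that a flush happened at the given time.
    void MarkFlushed(Clock::time_point now) { m_lastUploadFlushTime = now; }

private:
    Clock::time_point m_lastUploadFlushTime;
};
```

In real code the caller would pass `FlushTimer::Clock::now()` to both methods. Because `std::chrono` compares durations of different units directly, the nanosecond-resolution difference on the left can be compared against the millisecond interval without an explicit cast.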

@PatriceVignola PatriceVignola merged commit 0ff915e into main May 17, 2023
@PatriceVignola PatriceVignola deleted the user/pavignol/add-frequent-upload-heap-flushing branch May 17, 2023 05:35
@snnn snnn added the triage:approved (Approved for cherrypicks for release) label May 18, 2023
snnn pushed a commit that referenced this pull request May 19, 2023
snnn pushed a commit that referenced this pull request May 19, 2023
PatriceVignola added a commit that referenced this pull request May 19, 2023
snnn pushed a commit that referenced this pull request May 19, 2023
### Description
Cherry-picks 26 commits to the release branch.
Most cherry-picks are clean merges, except:

1. When I got conflicts in cgmanifest.json and download-deps.yml, I
chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.


PR list:

[js/webgpu] fix Transpose with non-float tensor (#15819)
[js/web] fix terser reserved symbols for worker (#15864)
[JSEP] fix constructor for OrtDevice (#15805)
Bump engine.io from 6.4.1 to 6.4.2 in /js/web (#15799)
Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (#15798)
[wasm] revert emsdk to v3.1.19 (#15793)
[wasm/JSEP] add threaded build to artifacts (#15777)
[js/web] add target ort.webgpu.min.js (#15780)
update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#15688)
fix: setting builder optimization level to TRT 8.6 default (#15897)
Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (#15921)
Fix segfault for multiple GPU run (regression) (#15823)
android package fix (#15999)
[CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (#15993)
Adding support for conv fp16 fusion on Resnet50v1 (#15474)
update onnx release 1.14 for docker files (#15680)
Avoid generating training documentation during packaging (#15795)
Update Conv-Add-Relu Fusion Transformation (#15834)
Fix symbolic shape infer empty value_info (#15842)
NhwcFusedConv: Add before Activation (#15837)
use __hmul2 instead of __hmul2_rn (#15852)
change the EP device to default OrtDevice() for memoryType equals CPU Input (#15903)
Fixing NhwcFusedConv fp16 (#15950)
fix topo sort in quantization tool (#16003)
[doc] add LeakyRelu to coreml supported ops (#15944)
[DML EP] Add frequent upload heap flushing (#15960)

Co-authored-by: Yulong Wang 
Co-authored-by: dependabot[bot] 
Co-authored-by: Guenther Schmuelling 
Co-authored-by: Shalva Mist 
Co-authored-by: Maximilian Müller 
Co-authored-by: Dmitri Smirnov 
Co-authored-by: pengwa 
Co-authored-by: Ashwini Khade 
Co-authored-by: Edward Chen 
Co-authored-by: Jian Chen 
Co-authored-by: liqun Fu 
Co-authored-by: Baiju Meswani 
Co-authored-by: Tianlei Wu 
Co-authored-by: Chen Fu 
Co-authored-by: Ye Wang 
Co-authored-by: cao lei 
Co-authored-by: Yufeng Li 
Co-authored-by: Rachel Guo 
Co-authored-by: Patrice Vignola
@snnn snnn removed the triage:approved (Approved for cherrypicks for release) and release:1.15 labels May 19, 2023
preetha-intel pushed a commit to intel/onnxruntime that referenced this pull request Jun 7, 2023