change the EP device to default OrtDevice() for memoryType equals CPUInput #15903
Merged
Conversation
…Input for CUDA, ROCm, MIGraphX and TensorRT EPs
Contributor
Author
/azp run orttraining-linux-gpu-ci-pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
Verified this PR also fixes a perf regression issue in the rel1.5 RC introduced by #15618.
Contributor
Author
/azp run ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
Author
/azp run orttraining-amd-gpu-ci-pipeline
Azure Pipelines successfully started running 1 pipeline(s).
PeixuanZuo added a commit that referenced this pull request on May 15, 2023:
update ROCm/MIGraphX CI to ROCm 5.5. TODO: two PRs to fix failures in orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py:
- test_gradient_correctness_minmax / test_gradient_correctness_argmax_unfold / test_gradient_correctness_argmax_diagonal (#15903)
- test_ortmodule_attribute_name_collision_warning (#15884)
souptc reviewed on May 15, 2023:
if (mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput) {
  return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, 0 /*CPU device id always be 0*/);
}
if (mem_type == OrtMemTypeCPUInput) return OrtDevice();
Member
Contributor
Author
Will do in the next PR
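For context, the hunk above lives in each GPU EP's device-mapping helper. Below is a minimal sketch of the post-fix behavior, modeled on the CUDA EP; treat the method name `GetOrtDeviceByMemType` and the `default_device_` member as assumptions about the surrounding code rather than part of this diff:

```cpp
// Sketch of the per-EP mapping after this PR (CUDA EP shown as an example).
OrtDevice CUDAExecutionProvider::GetOrtDeviceByMemType(OrtMemType mem_type) const {
  // CPU *inputs* map back to the plain default OrtDevice(), restoring pre-#15618 behavior.
  if (mem_type == OrtMemTypeCPUInput) return OrtDevice();
  // CPU *outputs* keep using CUDA pinned host memory for faster device-to-host copies.
  if (mem_type == OrtMemTypeCPUOutput)
    return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, 0 /*CPU device id always be 0*/);
  // Everything else stays on the EP's own device.
  return default_device_;
}
```

Per the PR description, the ROCm, MIGraphX and TensorRT EPs receive the analogous treatment, with their own pinned memory types in place of CUDA_PINNED.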
prathikr pushed a commit that referenced this pull request on May 16, 2023.
prathikr pushed a commit that referenced this pull request on May 16, 2023:
…Input (#15903)
snnn pushed a commit that referenced this pull request on May 19, 2023:
…Input (#15903)
snnn pushed a commit that referenced this pull request on May 19, 2023:
…Input (#15903)
snnn pushed a commit that referenced this pull request on May 19, 2023:
### Description
Cherry-picks 26 commits to the release branch. Most cherry-picks are clean merges, except:
1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.

PR list:
- [js/webgpu] fix Transpose with non-float tensor (#15819)
- [js/web] fix terser reserved symbols for worker (#15864)
- [JSEP] fix constructor for OrtDevice (#15805)
- Bump engine.io from 6.4.1 to 6.4.2 in /js/web (#15799)
- Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (#15798)
- [wasm] revert emsdk to v3.1.19 (#15793)
- [wasm/JSEP] add threaded build to artifacts (#15777)
- [js/web] add target ort.webgpu.min.js (#15780)
- update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#15688)
- fix: setting builder optimization level to TRT 8.6 default (#15897)
- Adjust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (#15921)
- Fix segfault for multiple GPU run (regression) (#15823)
- android package fix (#15999)
- [CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (#15993)
- Adding support for conv fp16 fusion on Resnet50v1 (#15474)
- update onnx release 1.14 for docker files (#15680)
- Avoid generating training documentation during packaging (#15795)
- Update Conv-Add-Relu Fusion Transformation (#15834)
- Fix symbolic shape infer empty value_info (#15842)
- NhwcFusedConv: Add before Activation (#15837)
- use __hmul2 instead of __hmul2_rn (#15852)
- change the EP device to default OrtDevice() for memoryType equals CPUInput (#15903)
- Fixing NhwcFusedConv fp16 (#15950)
- fix topo sort in quantization tool (#16003)
- [doc] add LeakyRelu to coreml supported ops (#15944)
- [DML EP] Add frequent upload heap flushing (#15960)

Co-authored-by: Yulong Wang
Co-authored-by: dependabot[bot]
Co-authored-by: Guenther Schmuelling
Co-authored-by: Shalva Mist
Co-authored-by: Maximilian Müller
Co-authored-by: Dmitri Smirnov
Co-authored-by: pengwa
Co-authored-by: Ashwini Khade
Co-authored-by: Edward Chen
Co-authored-by: Jian Chen
Co-authored-by: liqun Fu
Co-authored-by: Baiju Meswani
Co-authored-by: Tianlei Wu
Co-authored-by: Chen Fu
Co-authored-by: Ye Wang
Co-authored-by: cao lei
Co-authored-by: Yufeng Li
Co-authored-by: Rachel Guo
Co-authored-by: Patrice Vignola
preetha-intel pushed a commit to intel/onnxruntime that referenced this pull request on Jun 7, 2023.
Description
Change the EP device to the default OrtDevice() when memoryType equals CPUInput, for the CUDA, ROCm, MIGraphX and TensorRT EPs.
Motivation and Context
My previous PR (#15618) caused random failures in the CUDA training test GradientCheckerTest.TileGrad (see build https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=986784&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=a3824a7c-2162-5e3d-3fdd-8cf808834fbb) and in the ROCm test:
root@a59558217e53:/workspace# pytest orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax
...
E RuntimeError: Error in backward pass execution: Non-zero status code returned while running ATen node. Name:'/_original_module/ATen_Grad/ATen_1' Status Message: Storage size calculation overflowed with sizes=[72340172838076673, 72340172838076673, 128]
The likely reason: when the memType of the CUDA/TensorRT/ROCm/MIGraphX EP is CPUInput, the corresponding device in the IAllocator's memoryInfo used to be the default OrtDevice(), but after my change it became OrtDevice(CPU, xx_PINNED, 0). (The overflowed sizes above also look like an uninitialized-memory bit pattern, which is consistent with tensors being read through the wrong allocator.)
Changing it back fixed GradientCheckerTest.TileGrad in the Windows GPU training build.
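To make the mechanism concrete, here is a hedged, self-contained illustration of the two devices in question. The OrtDevice constructor shape and the Type()/MemType() accessors follow the public ortdevice.h header; the include path is an assumption about the ORT source layout:

```cpp
// Illustration only (not part of the PR): the two devices an allocator's
// OrtMemoryInfo could carry for a CPUInput tensor.
#include "core/framework/ortdevice.h"  // assumed include path within the ORT source tree
#include <iostream>

int main() {
  OrtDevice default_dev;  // what CPUInput maps to after this fix: plain CPU, default memory type
  OrtDevice pinned_dev(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, 0);  // what #15618 made it

  // Both report the CPU device type, but the memory type differs, so any code
  // that resolves allocators or plans copies by exact device match can pick a
  // pinned-memory allocator for what callers treat as ordinary CPU input memory.
  std::cout << "same device type: " << (default_dev.Type() == pinned_dev.Type()) << "\n";       // prints 1
  std::cout << "same memory type: " << (default_dev.MemType() == pinned_dev.MemType()) << "\n"; // prints 0
  return 0;
}
```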