Cherry-picks to the release branch #16017
Merged
snnn merged 27 commits into rel-1.15.0 on May 19, 2023
### Description
Fix Transpose with non-float tensors: only register the float type for Transpose.
### Description
Due to the change in emscripten-core/emscripten@3935cdc, our minimizer needs to be updated to add "startWorker" to the reserved symbols.
### Description Add the missing `OrtDevice` initialization in JSEP introduced by #15618
Bumps [engine.io](https://github.com/socketio/engine.io) from 6.4.1 to 6.4.2.

Release notes (sourced from [engine.io's releases](https://github.com/socketio/engine.io/releases) and [changelog](https://github.com/socketio/engine.io/blob/main/CHANGELOG.md)):

[6.4.2](https://github.com/socketio/engine.io/compare/6.4.1...6.4.2) (2023-05-02)

:warning: This release contains an important security fix :warning:

A malicious client could send a specially crafted HTTP request, triggering an uncaught exception and killing the Node.js process: `TypeError: Cannot read properties of undefined (reading 'handlesUpgrades')` at `Server.onWebSocket (build/server.js:515:67)`

Please upgrade as soon as possible.

Bug Fixes
- include error handling for Express middlewares ([#674](https://redirect.github.com/socketio/engine.io/issues/674)) ([9395782](https://github.com/socketio/engine.io/commit/93957828be1252c83275b56f0c7c0bd145a0ceb9))
- prevent crash when provided with an invalid query param ([fc480b4](https://github.com/socketio/engine.io/commit/fc480b4f305e16fe5972cf337d055e598372dc44))
- **typings:** make clientsCount public ([#675](https://redirect.github.com/socketio/engine.io/issues/675)) ([bd6d471](https://github.com/socketio/engine.io/commit/bd6d4713b02ff646c581872cd9ffe753acff0d73))
- **uws:** prevent crash when using with middlewares ([8b22162](https://github.com/socketio/engine.io/commit/8b2216290330b174c9e67be32765bec0c74769f9))

Credits

Huge thanks to [`@tyilo`](https://github.com/tyilo) and [`@cieldeville`](https://github.com/cieldeville) for helping!

Dependencies
- [`ws@~8.11.0`](https://github.com/websockets/ws/releases/tag/8.11.0) (no change)

Commits
- [`95e2153`](https://github.com/socketio/engine.io/commit/95e215387c589025dde3982865bf8c862d049469) chore(release): 6.4.2
- [`fc480b4`](https://github.com/socketio/engine.io/commit/fc480b4f305e16fe5972cf337d055e598372dc44) fix: prevent crash when provided with an invalid query param
- [`0141951`](https://github.com/socketio/engine.io/commit/014195118535669af0ad3bde38a76601dafa4d81) refactor(types): ensure compatibility with Express middlewares
- [`8b22162`](https://github.com/socketio/engine.io/commit/8b2216290330b174c9e67be32765bec0c74769f9) fix(uws): prevent crash when using with middlewares
- [`9395782`](https://github.com/socketio/engine.io/commit/93957828be1252c83275b56f0c7c0bd145a0ceb9) fix: include error handling for Express middlewares ([#674](https://redirect.github.com/socketio/engine.io/issues/674))
- [`911d0e3`](https://github.com/socketio/engine.io/commit/911d0e35757ea9ee93d1807c401c734661615e96) refactor: return HTTP 400 upon invalid request overlap
- [`bd6d471`](https://github.com/socketio/engine.io/commit/bd6d4713b02ff646c581872cd9ffe753acff0d73) fix(typings): make clientsCount public ([#675](https://redirect.github.com/socketio/engine.io/issues/675))
- See full diff in [compare view](https://github.com/socketio/engine.io/compare/6.4.1...6.4.2)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
The multi-threaded build generated by the latest emsdk sometimes crashes for an unknown reason (error: memory access out of bounds). We don't want to break existing ort-web users, so revert emsdk to 3.1.19 (the same version that ort v1.14.0 uses).
### Description
This is the first part of creating WebAssembly artifacts for the ort-web webgpu EP (wasm build). Follow-up steps will consume the artifacts in the web build.
### Description
Add target ort.webgpu.min.js.

WebGPU is an experimental feature, so I don't want to put webgpu into the ort.min.js file. This change adds 2 ways for users to access ort-web with webgpu:
- using a script tag: by URL `https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.min.js` (this URL is not ready yet)
- using `import()`: use `import { Tensor, InferenceSession } from 'onnxruntime-web/webgpu';`, that is, 'onnxruntime-web/webgpu' instead of 'onnxruntime-web'
…5688) needed to get tokenizers/decode for whisper --------- Co-authored-by: Shalva Mist <[email protected]>
The default level in the actual release is 3, not the previously used 2. Just a small sample of the effects:
…m to OrtApi (#15921) This PR partially reverts changes introduced in #15643. We make the two APIs always return a `std::string` in UTF-8. We also move the entry points from OrtApiBase to OrtApi to make them versioned. `GetVersionString` always returns x.y.z numbers that are not subject to internationalization. `GetBuildInfoString` can hold international characters, but UTF-8 should be fine to contain those. We prefix them with u8"" in case the compiler's default charset is not UTF-8. Furthermore, creating platform-dependent APIs is discouraged. `ORTCHAR_T` is platform dependent and was created for paths only. On non-Unix platforms it would still produce a `std::string` that can only contain UTF-8. The API was introduced after the latest release and can still be adjusted.
### Fix segfault for multiple GPU run
#15618 introduced `GetOrtDeviceByMemType`. The intention was to handle the CPU device differently in the `if` branch, but the branch may mistakenly pass the unique default non-CPU device id:

```
OrtDevice CUDAExecutionProvider::GetOrtDeviceByMemType(OrtMemType mem_type) const {
  if (mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput) {
    return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, default_device_.Id());
  }
  return default_device_;
}
```

We observed a segmentation fault when running multi-GPU training:

`CUDA_LAUNCH_BLOCKING=1 python -m torch.distributed.launch --nproc_per_node=2 examples/onnxruntime/training/language-modeling/run_mlm.py --model_name_or_path distilbert-base-uncased --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --num_train_epochs 10 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --do_train --do_eval --overwrite_output_dir --output_dir ./outputs222/ --seed 1137 --fp16 --report_to none --optim adamw_ort_fused --max_steps 400 --logging_steps 1`

GPU0 works fine, but GPU1 throws a segmentation fault. Looking further, a Shape node tries to allocate its output tensor and fetches the corresponding allocator with ORTDevice(Device:[DeviceType:0 MemoryType:1 DeviceId:1]). The CPU device has no device id = 1, so no allocator is returned. When we then call `AsStreamBasedAllocator` on that allocator, the segmentation fault happens because no null check is done there.
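The failure mode described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Device`, `Allocator`, `AllocatorMap`, `FindAllocator`, `IsStreamBased` and `PinnedCpuDevice` are hypothetical stand-ins, not ONNX Runtime types or functions, and using id 0 for the pinned CPU device is one possible remedy assumed here, not necessarily what the actual fix does.

```cpp
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-ins for illustration only; NOT the ONNX Runtime types.
struct Device {
  int type;         // 0 = CPU, 1 = GPU
  int memory_type;  // 0 = default, 1 = pinned
  int id;
  bool operator<(const Device& o) const {
    return std::tie(type, memory_type, id) < std::tie(o.type, o.memory_type, o.id);
  }
};

struct Allocator {
  bool stream_based = false;
};

using AllocatorMap = std::map<Device, std::shared_ptr<Allocator>>;

// Returns nullptr when no allocator is registered for the requested device.
std::shared_ptr<Allocator> FindAllocator(const AllocatorMap& allocators, const Device& device) {
  auto it = allocators.find(device);
  return it == allocators.end() ? nullptr : it->second;
}

// Null-checked query: a missing allocator is reported as "not stream based"
// instead of being dereferenced.
bool IsStreamBased(const std::shared_ptr<Allocator>& alloc) {
  return alloc != nullptr && alloc->stream_based;
}

// Builds the pinned CPU device for a GPU. Forwarding the GPU id reproduces the
// mismatch described above; using id 0 is the remedy assumed in this sketch.
Device PinnedCpuDevice(const Device& gpu, bool forward_gpu_id) {
  return Device{0, 1, forward_gpu_id ? gpu.id : 0};
}
```

With a single pinned CPU allocator registered under id 0, a lookup for GPU 1 that forwards the GPU id finds nothing, which is why the second process fails while the first one works.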
### Description
This PR adds the training headers to the training Android packages.
### Motivation and Context
Training headers need to be added as part of the training Android packages. However, because of a typo in the cmake, these headers were not being added. This PR fixes the issue.
… models. (#15993)
### Description
Minor changes to allow CoreML EP to handle more nodes and models.
- Remove graph input dynamic shape check from coreml::GetSupportedNodes(). Each node input is still checked.
- Add check for optional input in coreml::IsInputSupported(). If an input does not exist it should not be considered unsupported.
### Motivation and Context
Some CoreML EP checks seem too strict now.
### Description
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act. Specifically tested on Resnet50v1.
### Motivation and Context
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act. Specifically tested on Resnet50v1.
This is for the ort 1.15 release to work with onnx 1.14. It shall be merged after the onnx 1.14 release and before the ort 1.15 release. --------- Signed-off-by: Liqun Fu <[email protected]>
…n name to make it more intuitive. ### Description Update Conv-Add-Relu Fusion Transformation to handle additional case where NhwcFusedConv is present. ### Motivation and Context Handle additional case where NhwcFusedConv is present.
### Description
When a node output is optional, symbolic shape infer might add an empty value_info item. Add some checking to avoid this.
### Motivation and Context
- Stable diffusion optimized model reported invalid data type 0 during inference.
### Description
For fp16 FusedConv and NhwcFusedConv, the fused Add operator should be performed BEFORE the activation operator.
### Motivation and Context
The previous understanding of fused conv was incorrect.
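To make the ordering concrete, here is a minimal scalar sketch. It assumes Relu as the activation and works on a single value; `Relu`, `FusedAddThenActivate` and `ActivateThenAdd` are hypothetical names for illustration, not the kernels touched by this change, and the second function is only a guess at what the earlier interpretation computed.

```cpp
#include <algorithm>

// Relu stands in for the activation; the fused op may use other activations.
inline float Relu(float x) { return std::max(x, 0.0f); }

// Order described in the PR: add the fused input first, then activate.
inline float FusedAddThenActivate(float conv_out, float add_in) {
  return Relu(conv_out + add_in);
}

// Presumed earlier interpretation (an assumption): activate first, then add.
inline float ActivateThenAdd(float conv_out, float add_in) {
  return Relu(conv_out) + add_in;
}
```

The two orders agree whenever the convolution output is non-negative and diverge otherwise. For example, with a convolution output of -2 and an added value of 1, adding first gives 0 while activating first gives 1.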
### Description
### Motivation and Context
#15840
…Input (#15903)
### Description
Change the EP device to the default OrtDevice() when memoryType equals CPUInput for the CUDA, ROCm, MIGraphX and TensorRT EPs.
### Motivation and Context
My previous PR (#15618) caused random failures on the cuda training test GradientCheckerTest.TileGrad (see build https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=986784&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=a3824a7c-2162-5e3d-3fdd-8cf808834fbb) and on the rocm test:

root@a59558217e53:/workspace# pytest orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax ... E RuntimeError: Error in backward pass execution: Non-zero status code returned while running ATen node. Name:'/_original_module/ATen_Grad/ATen_1' Status Message: Storage size calculation overflowed with sizes=[72340172838076673, 72340172838076673, 128]

A potential reason: when the memType of the cuda/tensorRT/rocm/migraphx EP is CPUInput, the corresponding device in the IAllocator's memoryInfo was previously the default OrtDevice(), while after my change it becomes OrtDevice(CPU, xx_PINNED, 0). Changing it back fixed GradientCheckerTest.TileGrad in the Win GPU training build.
### Description
This should produce a fused Resnet50.fp16.onnx.
### Motivation and Context
### Description
The dependent node list should not be set up for an empty ('') input.
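The idea can be sketched with a small topological sort that skips empty input names when building the dependent lists. This is an illustration written in C++ under assumptions, not the quantization tool's code: `Node` and `TopoSort` are hypothetical, and an empty string is assumed to mark an omitted optional input.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical minimal node for illustration; not the tool's data model.
struct Node {
  std::string name;
  std::vector<std::string> inputs;   // "" marks an omitted optional input
  std::vector<std::string> outputs;
};

// Kahn's algorithm over tensor names; returns node names in topological order.
std::vector<std::string> TopoSort(const std::vector<Node>& nodes) {
  // Map each produced tensor name to the index of the node producing it.
  std::map<std::string, std::size_t> producer;
  for (std::size_t i = 0; i < nodes.size(); ++i)
    for (const auto& out : nodes[i].outputs)
      if (!out.empty()) producer[out] = i;

  std::vector<int> pending(nodes.size(), 0);
  std::vector<std::vector<std::size_t>> dependents(nodes.size());
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    for (const auto& in : nodes[i].inputs) {
      if (in.empty()) continue;            // no dependent entry for '' inputs
      auto it = producer.find(in);
      if (it == producer.end()) continue;  // graph input or initializer
      dependents[it->second].push_back(i);
      ++pending[i];
    }
  }

  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < nodes.size(); ++i)
    if (pending[i] == 0) ready.push(i);

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::size_t n = ready.front();
    ready.pop();
    order.push_back(nodes[n].name);
    for (std::size_t d : dependents[n])
      if (--pending[d] == 0) ready.push(d);
  }
  return order;
}
```

Skipping the empty name keeps unrelated nodes that merely omit an optional input from being linked to each other through a shared '' key.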
This reduces peak nonlocal memory consumption when uploading large weights for big models (e.g. LLMs), while at the same time trying to keep the GPU as busy as possible. This change could be more sophisticated, but at this stage it is the most minimal and least risky change required to support LLMs.
Changes that I requested look good.
pranavsharma approved these changes on May 19, 2023
yufenglee approved these changes on May 19, 2023
jchen351 approved these changes on May 19, 2023
preetha-intel pushed a commit to intel/onnxruntime that referenced this pull request on Jun 7, 2023
### Description
Cherry-picks 26 commits to the release branch. Most cherry-picks are clean merges, except:
1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.

PR list:
- [js/webgpu] fix Transpose with non-float tensor (microsoft#15819)
- [js/web] fix terser reserved symbols for worker (microsoft#15864)
- [JSEP] fix constructor for OrtDevice (microsoft#15805)
- Bump engine.io from 6.4.1 to 6.4.2 in /js/web (microsoft#15799)
- Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (microsoft#15798)
- [wasm] revert emsdk to v3.1.19 (microsoft#15793)
- [wasm/JSEP] add threaded build to artifacts (microsoft#15777)
- [js/web] add target ort.webgpu.min.js (microsoft#15780)
- update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (microsoft#15688)
- fix: setting builder optimization level to TRT 8.6 default (microsoft#15897)
- Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (microsoft#15921)
- Fix segfault for multiple GPU run (regression) (microsoft#15823)
- android package fix (microsoft#15999)
- [CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (microsoft#15993)
- Adding support for conv fp16 fusion on Resnet50v1 (microsoft#15474)
- update onnx release 1.14 for docker files (microsoft#15680)
- Avoid generating training documentation during packaging (microsoft#15795)
- Update Conv-Add-Relu Fusion Transformation (microsoft#15834)
- Fix symbolic shape infer empty value_info (microsoft#15842)
- NhwcFusedConv: Add before Activation (microsoft#15837)
- use __hmul2 instead of __hmul2_rn (microsoft#15852)
- change the EP device to default OrtDevice() for memoryType equals CPU Input (microsoft#15903)
- Fixing NhwcFusedConv fp16 (microsoft#15950)
- fix topo sort in quantization tool (microsoft#16003)
- [doc] add LeakyRelu to coreml supported ops (microsoft#15944)
- [DML EP] Add frequent upload heap flushing (microsoft#15960)

Co-authored-by: Yulong Wang
Co-authored-by: dependabot[bot]
Co-authored-by: Guenther Schmuelling
Co-authored-by: Shalva Mist
Co-authored-by: Maximilian Müller
Co-authored-by: Dmitri Smirnov
Co-authored-by: pengwa
Co-authored-by: Ashwini Khade
Co-authored-by: Edward Chen
Co-authored-by: Jian Chen
Co-authored-by: liqun Fu
Co-authored-by: Baiju Meswani
Co-authored-by: Tianlei Wu
Co-authored-by: Chen Fu
Co-authored-by: Ye Wang
Co-authored-by: cao lei
Co-authored-by: Yufeng Li
Co-authored-by: Rachel Guo
Co-authored-by: Patrice Vignola
Description
Cherry-picks to the release branch. The biggest batch.
Most cherry-picks are clean merges. Except:
Motivation and Context