Cherry-picks to the release branch#16017

Merged
snnn merged 27 commits intorel-1.15.0from
user/snnn/cr1
May 19, 2023
Conversation

@snnn
Contributor

@snnn snnn commented May 19, 2023

Description

Cherry-picks to the release branch. The biggest batch.

Most cherry-picks are clean merges. Except:

  1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files.
  2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.

Motivation and Context

fs-eire and others added 27 commits May 18, 2023 17:31
### Description
Fix Transpose with non-float tensors.

Only register the float type for Transpose.
### Description
Due to the change in
emscripten-core/emscripten@3935cdc,
our minimizer needs to be updated to add "startWorker" to the reserved
symbols.
### Description
Add the missing `OrtDevice` initialization in JSEP introduced by #15618
Bumps [engine.io](https://github.com/socketio/engine.io) from 6.4.1 to
6.4.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/engine.io/releases">engine.io's
releases</a>.</em></p>
<blockquote>
<h2>6.4.2</h2>
<p>:warning: This release contains an important security fix
:warning:</p>
<p>A malicious client could send a specially crafted HTTP request,
triggering an uncaught exception and killing the Node.js process:</p>
<pre><code>TypeError: Cannot read properties of undefined (reading
'handlesUpgrades')
  at Server.onWebSocket (build/server.js:515:67)
</code></pre>
<p>Please upgrade as soon as possible.</p>
<h3>Bug Fixes</h3>
<ul>
<li>include error handling for Express middlewares (<a
href="https://redirect.github.com/socketio/engine.io/issues/674">#674</a>)
(<a
href="https://github.com/socketio/engine.io/commit/93957828be1252c83275b56f0c7c0bd145a0ceb9">9395782</a>)</li>
<li>prevent crash when provided with an invalid query param (<a
href="https://github.com/socketio/engine.io/commit/fc480b4f305e16fe5972cf337d055e598372dc44">fc480b4</a>)</li>
<li><strong>typings:</strong> make clientsCount public (<a
href="https://redirect.github.com/socketio/engine.io/issues/675">#675</a>)
(<a
href="https://github.com/socketio/engine.io/commit/bd6d4713b02ff646c581872cd9ffe753acff0d73">bd6d471</a>)</li>
<li><strong>uws:</strong> prevent crash when using with middlewares (<a
href="https://github.com/socketio/engine.io/commit/8b2216290330b174c9e67be32765bec0c74769f9">8b22162</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/tyilo"><code>@​tyilo</code></a> and <a
href="https://github.com/cieldeville"><code>@​cieldeville</code></a> for
helping!</p>
<h4>Links</h4>
<ul>
<li>Diff: <a
href="https://github.com/socketio/engine.io/compare/6.4.1...6.4.2">https://github.com/socketio/engine.io/compare/6.4.1...6.4.2</a></li>
<li>Client release: -</li>
<li>ws version: <a
href="https://github.com/websockets/ws/releases/tag/8.11.0">~8.11.0</a>
(no change)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/engine.io/blob/main/CHANGELOG.md">engine.io's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/socketio/engine.io/compare/6.4.1...6.4.2">6.4.2</a>
(2023-05-02)</h2>
<p>:warning: This release contains an important security fix
:warning:</p>
<p>A malicious client could send a specially crafted HTTP request,
triggering an uncaught exception and killing the Node.js process:</p>
<pre><code>TypeError: Cannot read properties of undefined (reading
'handlesUpgrades')
  at Server.onWebSocket (build/server.js:515:67)
</code></pre>
<p>Please upgrade as soon as possible.</p>
<h3>Bug Fixes</h3>
<ul>
<li>include error handling for Express middlewares (<a
href="https://redirect.github.com/socketio/engine.io/issues/674">#674</a>)
(<a
href="https://github.com/socketio/engine.io/commit/93957828be1252c83275b56f0c7c0bd145a0ceb9">9395782</a>)</li>
<li>prevent crash when provided with an invalid query param (<a
href="https://github.com/socketio/engine.io/commit/fc480b4f305e16fe5972cf337d055e598372dc44">fc480b4</a>)</li>
<li><strong>typings:</strong> make clientsCount public (<a
href="https://redirect.github.com/socketio/engine.io/issues/675">#675</a>)
(<a
href="https://github.com/socketio/engine.io/commit/bd6d4713b02ff646c581872cd9ffe753acff0d73">bd6d471</a>)</li>
<li><strong>uws:</strong> prevent crash when using with middlewares (<a
href="https://github.com/socketio/engine.io/commit/8b2216290330b174c9e67be32765bec0c74769f9">8b22162</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/tyilo"><code>@​tyilo</code></a> and <a
href="https://github.com/cieldeville"><code>@​cieldeville</code></a> for
helping!</p>
<h3>Dependencies</h3>
<ul>
<li><a
href="https://github.com/websockets/ws/releases/tag/8.11.0"><code>ws@~8.11.0</code></a>
(no change)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/socketio/engine.io/commit/95e215387c589025dde3982865bf8c862d049469"><code>95e2153</code></a>
chore(release): 6.4.2</li>
<li><a
href="https://github.com/socketio/engine.io/commit/fc480b4f305e16fe5972cf337d055e598372dc44"><code>fc480b4</code></a>
fix: prevent crash when provided with an invalid query param</li>
<li><a
href="https://github.com/socketio/engine.io/commit/014195118535669af0ad3bde38a76601dafa4d81"><code>0141951</code></a>
refactor(types): ensure compatibility with Express middlewares</li>
<li><a
href="https://github.com/socketio/engine.io/commit/8b2216290330b174c9e67be32765bec0c74769f9"><code>8b22162</code></a>
fix(uws): prevent crash when using with middlewares</li>
<li><a
href="https://github.com/socketio/engine.io/commit/93957828be1252c83275b56f0c7c0bd145a0ceb9"><code>9395782</code></a>
fix: include error handling for Express middlewares (<a
href="https://redirect.github.com/socketio/engine.io/issues/674">#674</a>)</li>
<li><a
href="https://github.com/socketio/engine.io/commit/911d0e35757ea9ee93d1807c401c734661615e96"><code>911d0e3</code></a>
refactor: return HTTP 400 upon invalid request overlap</li>
<li><a
href="https://github.com/socketio/engine.io/commit/bd6d4713b02ff646c581872cd9ffe753acff0d73"><code>bd6d471</code></a>
fix(typings): make clientsCount public (<a
href="https://redirect.github.com/socketio/engine.io/issues/675">#675</a>)</li>
<li>See full diff in <a
href="https://github.com/socketio/engine.io/compare/6.4.1...6.4.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=engine.io&package-manager=npm_and_yarn&previous-version=6.4.1&new-version=6.4.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
The multi-threaded build generated by the latest emsdk sometimes crashes
for an unknown reason (error: memory access out of bounds).

We don't want to break existing ort-web users, so revert emsdk to
3.1.19 (the same version that ort v1.14.0 uses).
### Description
This is the first part of creating WebAssembly artifacts for the ort-web
webgpu EP (wasm build).

Follow-up steps will consume the artifacts in the web build.
### Description
Add target ort.webgpu.min.js

WebGPU is an experimental feature, so I don't want to put webgpu into the
ort.min.js file. This change adds 2 ways for users to access ort-web
with webgpu:
- using script tag: by URL
`https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.min.js`
( this URL is not ready yet )
- using `import()`: use `import { Tensor, InferenceSession } from
'onnxruntime-web/webgpu';` - 'onnxruntime-web/webgpu' instead of
'onnxruntime-web'
…5688)

needed to get tokenizers/decode for whisper

---------

Co-authored-by: Shalva Mist <[email protected]>
…m to OrtApi (#15921)

This PR partially reverts changes introduced in
#15643

We make the two APIs always return std::string in UTF-8.

We also move the entry points from OrtApiBase to OrtApi to make them
versioned.

`GetVersionString` always returns x.y.z numbers that are not subject to
internationalization.
`GetBuildInfoString` can hold international chars, but UTF-8 should be
fine to contain those.
We prefix them with u8"" in case the compiler default charset is not
UTF-8.
Furthermore, creating platform dependent APIs is discouraged.
`ORTCHAR_T` is platform dependent and was created for paths only.
On non-Unix platforms, the APIs would still produce a `std::string` that
can only contain UTF-8.

The API was introduced after the latest release, and can still be
adjusted.
### Fix segfault for multiple GPU run

#15618 introduced
`GetOrtDeviceByMemType`. The intention appears to be to handle the CPU
device differently in the if branch, but the branch mistakenly passes the
default non-CPU device id.


```
OrtDevice CUDAExecutionProvider::GetOrtDeviceByMemType(OrtMemType mem_type) const {
  if (mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput) {
    return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, default_device_.Id());
  }
  return default_device_;
}
```

We observed a segmentation fault when running multi-GPU training:

`
CUDA_LAUNCH_BLOCKING=1 python -m torch.distributed.launch
--nproc_per_node=2
examples/onnxruntime/training/language-modeling/run_mlm.py
--model_name_or_path distilbert-base-uncased --dataset_name wikitext
--dataset_config_name wikitext-2-raw-v1 --num_train_epochs 10
--per_device_train_batch_size 8 --per_device_eval_batch_size 8
--do_train --do_eval --overwrite_output_dir --output_dir ./outputs222/
--seed 1137 --fp16 --report_to none --optim adamw_ort_fused --max_steps
400 --logging_steps 1
`

GPU0 works fine, but GPU1 throws a segmentation fault. Looking further,
a Shape node tries to allocate its output tensor and fetches the
corresponding allocator with ORTDevice(Device:[DeviceType:0 MemoryType:1
DeviceId:1]). The CPU device has no device id = 1, so no allocator is
returned. When we then call `AsStreamBasedAllocator` on the allocator,
the segmentation fault happens because no null check was done there.
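
The failure mode described above (a lookup that returns no allocator, followed by an unchecked use) can be sketched in isolation. This is a minimal illustration only: `Allocator`, `AllocatorMap`, `FindAllocator` and `AllocatorNameOrThrow` are hypothetical stand-ins, not ONNX Runtime types or functions, and the actual fix in the PR may look different.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not ONNX Runtime types.
struct Allocator {
  std::string name;
};

// Key: (device type, device id). Device type 0 stands for CPU in this sketch.
using DeviceKey = std::pair<int, int>;
using AllocatorMap = std::map<DeviceKey, std::shared_ptr<Allocator>>;

// Returns nullptr when no allocator is registered for the device,
// mirroring the "no allocator returned" case described above.
std::shared_ptr<Allocator> FindAllocator(const AllocatorMap& allocators,
                                         int device_type, int device_id) {
  auto it = allocators.find(DeviceKey{device_type, device_id});
  return it == allocators.end() ? nullptr : it->second;
}

// Guarded use: report a clear error instead of dereferencing a null pointer.
std::string AllocatorNameOrThrow(const AllocatorMap& allocators,
                                 int device_type, int device_id) {
  std::shared_ptr<Allocator> alloc = FindAllocator(allocators, device_type, device_id);
  if (!alloc) {
    throw std::runtime_error("no allocator registered for device type " +
                             std::to_string(device_type) + ", id " +
                             std::to_string(device_id));
  }
  return alloc->name;
}
```

With only a CPU allocator registered under id 0, a lookup for (CPU, id 1) yields `nullptr`, and the guarded helper throws instead of crashing.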



### Motivation and Context
### Description
This PR adds the training headers to the training android packages.


### Motivation and Context
Training headers need to be added as part of the training android
packages. However, because of a typo in the cmake file, these headers were
not being added. This PR fixes the issue.
… models. (#15993)

### Description

Minor changes to allow CoreML EP to handle more nodes and models.
- Remove graph input dynamic shape check from
coreml::GetSupportedNodes(). Each node input is still checked.
- Add check for optional input in coreml::IsInputSupported(). If an
input does not exist it should not be considered unsupported.

### Motivation and Context

Some CoreML EP checks seem too strict now.
### Description
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act.
Specifically tested on Resnet50v1.



### Motivation and Context
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act.
Specifically tested on Resnet50v1.
This is for the ort 1.15 release to work with onnx 1.14.
It should be merged after the onnx 1.14 release and before the ort 1.15 release.

---------

Signed-off-by: Liqun Fu <[email protected]>
…n name to make it more intuitive.

### Description
Update Conv-Add-Relu Fusion Transformation to handle additional case
where NhwcFusedConv is present.



### Motivation and Context
Handle additional case where NhwcFusedConv is present.
### Description
When a node output is optional, symbolic shape inference might add an
empty value_info item. Add a check to avoid this.

### Motivation and Context
The optimized Stable Diffusion model reported invalid data type 0 during
inference.
### Description

For fp16 FusedConv and NhwcFusedConv, the fused Add operator should be
performed BEFORE the activation operator.
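
A scalar sketch of why the order matters, assuming Relu as the activation. The function names below are hypothetical and chosen for illustration; they are not the kernel's actual code.

```cpp
#include <algorithm>

// Relu stands in for the fused activation in this sketch.
inline float Relu(float x) { return std::max(x, 0.0f); }

// Order described above: add the fused tensor first, then activate.
inline float AddThenActivate(float conv_out, float addend) {
  return Relu(conv_out + addend);
}

// The other order, shown only for contrast.
inline float ActivateThenAdd(float conv_out, float addend) {
  return Relu(conv_out) + addend;
}
```

For a convolution output of -1 and an addend of 2, the two orders give 1 and 2 respectively, so they are not interchangeable whenever the convolution output is negative.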


### Motivation and Context

The previous understanding of fused conv was incorrect.
### Description



### Motivation and Context

#15840
…Input (#15903)

### Description
Change the EP device to the default OrtDevice() when the memory type is
CPUInput for the cuda, rocm, migraphx and tensorRT EPs.


### Motivation and Context
My previous PR (#15618)
caused random failures on cuda training test
GradientCheckerTest.TileGrad (see build
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=986784&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=a3824a7c-2162-5e3d-3fdd-8cf808834fbb)
and rocm test:

root@a59558217e53:/workspace# pytest
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax
... 
E RuntimeError: Error in backward pass execution: Non-zero status code
returned while running ATen node.
Name:'/_original_module/ATen_Grad/ATen_1' Status Message: Storage size
calculation overflowed with sizes=[72340172838076673, 72340172838076673,
128]

A potential reason is that when the memType of the cuda/tensorRT/rocm/migraphx
EP is CPUInput, the corresponding device in the IAllocator's memoryInfo
was previously the default OrtDevice(), while after my change it becomes
OrtDevice(CPU, xx_PINNED, 0).

Changing it back fixed GradientCheckerTest.TileGrad in Win GPU training
build.
### Description

This should produce a fused Resnet50.fp16.onnx.

### Motivation and Context
### Description
Do not set up the dependent node list for an empty ('') input.
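
The idea can be sketched with a small Kahn-style topological sort that ignores empty tensor names when building dependency edges. This is an illustration under assumptions: the `Node` struct and `TopoSort` function are hypothetical, and the quantization tool's own code is not reproduced here.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical minimal node: a name plus input and output tensor names.
// An empty name ("") marks an omitted optional input or output.
struct Node {
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Kahn-style topological sort that skips empty tensor names when
// building dependency edges.
std::vector<std::string> TopoSort(const std::vector<Node>& nodes) {
  // Map each non-empty output tensor name to the node that produces it.
  std::map<std::string, std::size_t> producer;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    for (const std::string& out : nodes[i].outputs) {
      if (!out.empty()) producer[out] = i;
    }
  }

  std::vector<std::vector<std::size_t>> dependents(nodes.size());
  std::vector<int> pending(nodes.size(), 0);
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    for (const std::string& in : nodes[i].inputs) {
      if (in.empty()) continue;  // omitted optional input: no edge
      auto it = producer.find(in);
      if (it == producer.end()) continue;  // graph input or initializer
      dependents[it->second].push_back(i);
      ++pending[i];
    }
  }

  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (pending[i] == 0) ready.push(i);
  }

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::size_t i = ready.front();
    ready.pop();
    order.push_back(nodes[i].name);
    for (std::size_t d : dependents[i]) {
      if (--pending[d] == 0) ready.push(d);
    }
  }
  return order;
}
```

If empty names were treated as real tensors, a node with an omitted optional input could appear to depend on an unrelated node with an omitted optional output, which can create spurious edges or even a cycle.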


### Motivation and Context
### Description



### Motivation and Context
This reduces peak nonlocal memory consumption when uploading large
weights for big models (e.g. LLMs), while at the same time trying to
keep the GPU as busy as possible. This change could be more
sophisticated, but at this stage it is the most minimal and least risky
change required to support LLMs.
@snnn snnn requested a review from a team as a code owner May 19, 2023 00:39
@snnn snnn requested a review from a team May 19, 2023 00:39
@snnn snnn requested a review from a team as a code owner May 19, 2023 00:39
@snnn snnn marked this pull request as draft May 19, 2023 00:40
@snnn snnn marked this pull request as ready for review May 19, 2023 05:06
@skottmckay
Contributor

2 CoreML changes look good (#15944 and #15993)

@pranavsharma
Contributor

Changes that I requested look good.

@yufenglee
Member

#15474, #15950, #15837, #16003 look good to me.

@snnn snnn merged commit 6cdf071 into rel-1.15.0 May 19, 2023
@snnn snnn deleted the user/snnn/cr1 branch May 19, 2023 21:04
preetha-intel pushed a commit to intel/onnxruntime that referenced this pull request Jun 7, 2023
### Description
Cherry-picks 26 commits to the release branch. 
Most cherry-picks are clean merges. Except:

1. When I got conflicts in cgmanifest.json and download-deps.yml, I
chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.


PR list:

[js/webgpu] fix Transpose with non-float tensor (microsoft#15819)
[js/web] fix terser reserved symbols for worker (microsoft#15864)
[JSEP] fix constructor for OrtDevice (microsoft#15805)
Bump engine.io from 6.4.1 to 6.4.2 in /js/web (microsoft#15799)
Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (microsoft#15798)
[wasm] revert emsdk to v3.1.19 (microsoft#15793)
[wasm/JSEP] add threaded build to artifacts (microsoft#15777)
[js/web] add target ort.webgpu.min.js (microsoft#15780)
update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (microsoft#15688)
fix: setting builder optimization level to TRT 8.6 default (microsoft#15897)
Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (microsoft#15921)
Fix segfault for multiple GPU run (regression) (microsoft#15823)
android package fix (microsoft#15999)
[CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (microsoft#15993)
Adding support for conv fp16 fusion on Resnet50v1 (microsoft#15474)
update onnx release 1.14 for docker files (microsoft#15680)
Avoid generating training documentation during packaging (microsoft#15795)
Update Conv-Add-Relu Fusion Transformation (microsoft#15834)
Fix symbolic shape infer empty value_info (microsoft#15842)
NhwcFusedConv: Add before Activation (microsoft#15837)
use __hmul2 instead of __hmul2_rn (microsoft#15852)
change the EP device to default OrtDevice() for memoryType equals CPU Input (microsoft#15903)
Fixing NhwcFusedConv fp16 (microsoft#15950)
fix topo sort in quantization tool (microsoft#16003)
[doc] add LeakyRelu to coreml supported ops (microsoft#15944)
[DML EP] Add frequent upload heap flushing (microsoft#15960)

Co-authored-by: Yulong Wang 
Co-authored-by: dependabot[bot] 
Co-authored-by: Guenther Schmuelling 
Co-authored-by: Shalva Mist 
Co-authored-by: Maximilian Müller 
Co-authored-by: Dmitri Smirnov 
Co-authored-by: pengwa 
Co-authored-by: Ashwini Khade 
Co-authored-by: Edward Chen 
Co-authored-by: Jian Chen 
Co-authored-by: liqun Fu 
Co-authored-by: Baiju Meswani 
Co-authored-by: Tianlei Wu 
Co-authored-by: Chen Fu 
Co-authored-by: Ye Wang 
Co-authored-by: cao lei 
Co-authored-by: Yufeng Li 
Co-authored-by: Rachel Guo 
Co-authored-by: Patrice Vignola