Cherry-picks to the release branch by snnn · Pull Request #16017 · microsoft/onnxruntime

snnn · 2023-05-19T00:39:48Z

Description

Cherry-picks to the release branch. The biggest batch.

Most cherry-picks are clean merges, with two exceptions:

1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.

Motivation and Context

### Description
Fix Transpose with non-float tensor. Only register float type for Transpose.

### Description
Due to a change in emscripten-core/emscripten@3935cdc, our minimizer needs to be updated to add "startWorker" to the reserved symbols.

### Description
Add the missing `OrtDevice` initialization in JSEP introduced by #15618.

### Description
The multi-threaded build generated by the latest emsdk sometimes crashes for an unknown reason (error: memory access out of bounds). We don't want to break existing ort-web users, so revert emsdk back to 3.1.19 (the same version that ort v1.14.0 uses).

### Description
This is the first part of creating WebAssembly artifacts for the ort-web WebGPU EP (wasm build). Follow-up steps will consume the artifacts in the web build.

### Description
Add target ort.webgpu.min.js.

WebGPU is an experimental feature, so I don't want to put WebGPU into the ort.min.js file. This change adds 2 ways for users to access ort-web with WebGPU:
- using a script tag: by URL `https://cdn.jsdelivr.net/npm/onnxruntime-web@<version>/dist/ort.webgpu.min.js` (this URL is not ready yet)
- using `import()`: use `import { Tensor, InferenceSession } from 'onnxruntime-web/webgpu';`, that is, 'onnxruntime-web/webgpu' instead of 'onnxruntime-web'

…5688)
Needed to get tokenizers/decode for whisper.

Co-authored-by: Shalva Mist

The actual released default level is 3 and not the previously used 2. Just a small sample of the effects: ![Screenshot 2023-05-10 at 15 49 55](https://github.com/microsoft/onnxruntime/assets/44298237/5a694446-22c0-4943-9ddf-80670781878f)

…m to OrtApi (#15921)

This PR partially reverts changes introduced in #15643. We make the two APIs always return `std::string` in UTF-8. We also move the entry points from OrtApiBase to OrtApi to make them versioned.

`GetVersionString` always returns x.y.z numbers that are not subject to internationalization. `GetBuildInfoString` can hold international characters, but UTF-8 should be fine to contain those. We prefix them with u8"" in case the compiler default charset is not UTF-8.

Furthermore, creating platform-dependent APIs is discouraged. `ORTCHAR_T` is platform dependent and was created for paths only. On non-Unix platforms it would still produce a `std::string`, which can only contain UTF-8.

The API was introduced after the latest release and can still be adjusted.

### Fix segfault for multiple GPU run

#15618 introduced `GetOrtDeviceByMemType`. The intention was to handle the CPU device differently in the if branch, but the code mistakenly passes the unique default non-CPU device id.

```
OrtDevice CUDAExecutionProvider::GetOrtDeviceByMemType(OrtMemType mem_type) const {
  if (mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput) {
    return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, default_device_.Id());
  }
  return default_device_;
}
```

We observed a segmentation fault when running multi-GPU training:

`CUDA_LAUNCH_BLOCKING=1 python -m torch.distributed.launch --nproc_per_node=2 examples/onnxruntime/training/language-modeling/run_mlm.py --model_name_or_path distilbert-base-uncased --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --num_train_epochs 10 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --do_train --do_eval --overwrite_output_dir --output_dir ./outputs222/ --seed 1137 --fp16 --report_to none --optim adamw_ort_fused --max_steps 400 --logging_steps 1`

GPU0 works fine, while GPU1 throws a segmentation fault. Looking further, a Shape node tries to allocate its output tensor and fetches the corresponding allocator with ORTDevice(Device:[DeviceType:0 MemoryType:1 DeviceId:1]). The CPU device does not have device id = 1, so no allocator is returned. When we then call `AsStreamBasedAllocator` on that allocator, the segmentation fault happens because no null check is done there.

### Motivation and Context
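The failure mode described above can be modelled in a few lines of Python. This is only an illustrative sketch, not ONNX Runtime code: the `Device` tuple, the allocator table, and all function names are hypothetical, and it assumes that the pinned CPU allocator is registered under device id 0 only. It shows why reusing the GPU's device id for the CPU-side lookup finds nothing on GPU1, and how an explicit check turns a silent null into a clear error.

```python
from typing import NamedTuple


class Device(NamedTuple):
    """Hypothetical stand-in for OrtDevice: (type, memory type, id)."""
    device_type: str
    mem_type: str
    device_id: int


def build_allocators(gpu_id: int) -> dict:
    # Assumption: each process registers its own GPU allocator, while the
    # pinned CPU allocator is always registered under device id 0.
    return {
        Device("GPU", "DEFAULT", gpu_id): f"cuda-allocator-{gpu_id}",
        Device("CPU", "CUDA_PINNED", 0): "pinned-allocator",
    }


def cpu_device_buggy(default_device: Device) -> Device:
    # Mirrors the snippet above: the GPU's id leaks into the CPU device.
    return Device("CPU", "CUDA_PINNED", default_device.device_id)


def cpu_device_fixed(default_device: Device) -> Device:
    # One plausible correction under the assumption above: CPU id is always 0.
    return Device("CPU", "CUDA_PINNED", 0)


def get_allocator(allocators: dict, device: Device) -> str:
    allocator = allocators.get(device)
    # The null check that was missing before the allocator was used.
    if allocator is None:
        raise LookupError(f"no allocator registered for {device}")
    return allocator


gpu1 = Device("GPU", "DEFAULT", 1)
allocators = build_allocators(gpu_id=1)
print(allocators.get(cpu_device_buggy(gpu1)))          # lookup misses on GPU1
print(get_allocator(allocators, cpu_device_fixed(gpu1)))
```

With GPU0 the buggy and the fixed lookups coincide, which matches the observation that only the second process crashed.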

### Description
This PR adds the training headers to the training Android packages.

### Motivation and Context
Training headers need to be added as part of the training Android packages. However, because of a typo in the CMake files, these headers were not being added. This PR fixes the issue.

… models. (#15993)

### Description
Minor changes to allow CoreML EP to handle more nodes and models.
- Remove the graph input dynamic shape check from coreml::GetSupportedNodes(). Each node input is still checked.
- Add a check for optional inputs in coreml::IsInputSupported(). If an input does not exist, it should not be considered unsupported.

### Motivation and Context
Some CoreML EP checks seem too strict now.

### Description
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act. Specifically tested on Resnet50v1.

### Motivation and Context
Adding support for conv fp16 fusion with Conv-Add and Conv-Add-act. Specifically tested on Resnet50v1.

This is for the ort 1.15 release to work with onnx 1.14. It shall be merged after the onnx 1.14 release and before the ort 1.15 release.

Signed-off-by: Liqun Fu

…n name to make it more intuitive.

### Description
Update the Conv-Add-Relu fusion transformation to handle the additional case where NhwcFusedConv is present.

### Motivation and Context
Handle the additional case where NhwcFusedConv is present.

### Description
When a node output is optional, symbolic shape inference might add an empty value_info item. Add some checks to avoid this.

### Motivation and Context
- A Stable Diffusion optimized model reported invalid data type 0 during inference.
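The kind of check described here can be sketched as follows. This is a hedged illustration, not the code of the symbolic shape inference script: `ValueInfo`, `collect_value_info`, and the dictionary of inferred types are hypothetical names, and the sketch assumes that an omitted optional output appears as an empty name in the node's output list. The constants follow ONNX's `TensorProto.DataType`, where 0 is UNDEFINED and 1 is FLOAT, which is why an empty entry surfaces later as "invalid data type 0".

```python
from dataclasses import dataclass

UNDEFINED = 0  # TensorProto.DataType value reported as "invalid data type 0"
FLOAT = 1


@dataclass
class ValueInfo:
    """Hypothetical stand-in for an ONNX value_info entry."""
    name: str
    elem_type: int


def collect_value_info(node_outputs, inferred_types):
    """Build value_info entries, skipping omitted optional outputs."""
    value_info = []
    for name in node_outputs:
        # An omitted optional output has an empty name: emit nothing for it,
        # otherwise an entry with no name and an UNDEFINED type is created.
        if not name:
            continue
        value_info.append(ValueInfo(name, inferred_types.get(name, UNDEFINED)))
    return value_info


# A node whose second (optional) output is omitted.
print(collect_value_info(["y", ""], {"y": FLOAT}))
```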

### Description
Fp16 FusedConv and NhwcFusedConv: the fused Add operator should be performed BEFORE the activation operator.

### Motivation and Context
The previous understanding of fused conv was incorrect.
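A tiny numeric sketch shows why the order matters. It uses scalars and Relu as a stand-in activation; the function names are hypothetical and this is not the kernel code. The two orders disagree whenever the convolution result is negative and the added value brings the sum back above zero.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def add_then_activate(conv_result: float, add_input: float) -> float:
    # Order stated in the description: Add first, activation second.
    return relu(conv_result + add_input)


def activate_then_add(conv_result: float, add_input: float) -> float:
    # The order based on the earlier, incorrect understanding.
    return relu(conv_result) + add_input


conv_result, add_input = -1.0, 2.0
print(add_then_activate(conv_result, add_input))  # 1.0
print(activate_then_add(conv_result, add_input))  # 2.0
```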

### Description

### Motivation and Context
#15840

…Input (#15903)

### Description
Change the EP device to the default OrtDevice() when the memory type equals CPUInput, for the CUDA, ROCm, MIGraphX and TensorRT EPs.

### Motivation and Context
My previous PR (#15618) caused random failures in the CUDA training test GradientCheckerTest.TileGrad (see build https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=986784&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=a3824a7c-2162-5e3d-3fdd-8cf808834fbb) and in a ROCm test:

root@a59558217e53:/workspace# pytest orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax
...
E RuntimeError: Error in backward pass execution: Non-zero status code returned while running ATen node. Name:'/_original_module/ATen_Grad/ATen_1' Status Message: Storage size calculation overflowed with sizes=[72340172838076673, 72340172838076673, 128]

A potential reason is that when the memType of the CUDA/TensorRT/ROCm/MIGraphX EP is CPUInput, the corresponding device in the IAllocator's memoryInfo was previously the default OrtDevice(), while after my change it became OrtDevice(CPU, xx_PINNED, 0). Changing it back fixed GradientCheckerTest.TileGrad in the Win GPU training build.

### Description
This should produce a fused Resnet50.fp16.onnx.

### Motivation and Context

### Description
Should not set up the dependent node list for an empty ('') input.

### Motivation and Context
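The idea can be sketched with a small Kahn-style topological sort. This is an assumption-laden illustration rather than the quantization tool's implementation: nodes are modelled as hypothetical `(name, inputs, outputs)` tuples, and an empty string stands for an omitted optional input or output. If empty names were treated as real tensors, a node that omits an optional output would appear to feed every node that omits an optional input, which can create false dependencies or even a false cycle.

```python
from collections import defaultdict, deque


def topological_sort(nodes):
    """Sort (name, inputs, outputs) tuples so producers come before consumers."""
    # Map each real tensor name to the node producing it; '' is not a tensor.
    producer = {out: name for name, _, outputs in nodes for out in outputs if out}

    pending = {}                     # node name -> number of unmet dependencies
    dependents = defaultdict(list)   # node name -> nodes waiting on it
    for name, inputs, _ in nodes:
        # Skip empty ('') inputs: they must not create a dependency edge.
        upstream = {producer[i] for i in inputs if i and i in producer}
        pending[name] = len(upstream)
        for up in upstream:
            dependents[up].append(name)

    ready = deque(name for name, _, _ in nodes if pending[name] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in dependents[current]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# 'clip' omits its optional min input; 'dropout' omits its optional mask output.
graph = [
    ("clip", ["x", "", "max"], ["y"]),
    ("const_max", [], ["max"]),
    ("dropout", ["y"], ["z", ""]),
]
print(topological_sort(graph))
```

In this example, treating `''` as a tensor would make `clip` depend on `dropout` while `dropout` already depends on `clip`, so the sort would fail.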

### Description

### Motivation and Context

This reduces peak nonlocal memory consumption when uploading large weights for big models (e.g. LLMs), while at the same time trying to keep the GPU as busy as possible. This change could be more sophisticated, but at this stage it is the most minimal and least risky change required to support LLMs.
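The effect on peak memory can be illustrated with a simplified model. This is a rough sketch and not the DML EP's logic: it assumes that staged upload bytes accumulate until a flush submits the queued copies and lets the staging memory be reclaimed, and the function name and threshold parameter are hypothetical.

```python
def peak_pending_bytes(upload_sizes, flush_threshold=None):
    """Return the peak number of staged bytes while uploading weights in order."""
    pending = 0
    peak = 0
    for size in upload_sizes:
        pending += size
        peak = max(peak, pending)
        # Assumption: flushing submits the queued copies so that the staging
        # memory can be reclaimed before further uploads are queued.
        if flush_threshold is not None and pending >= flush_threshold:
            pending = 0
    return peak


MIB = 1024 * 1024
weights = [64 * MIB] * 8  # eight 64 MiB weight tensors

print(peak_pending_bytes(weights) // MIB)                             # 512
print(peak_pending_bytes(weights, flush_threshold=128 * MIB) // MIB)  # 128
```

A lower threshold reduces the peak further but flushes more often, which is the trade-off against keeping the GPU busy that the description mentions.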

skottmckay · 2023-05-19T06:14:40Z

2 CoreML changes look good (#15944 and #15993)

pranavsharma · 2023-05-19T06:17:31Z

Changes that I requested look good.

yufenglee · 2023-05-19T20:51:23Z

#15474, #15950, #15837, #16003 look good to me.

### Description
Cherry-picks 26 commits to the release branch. Most cherry-picks are clean merges. Except:

1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files
2. There were some conflicts in cmake/deps.txt, onnxruntime_c_api.cc

PR list:

- [js/webgpu] fix Transpose with non-float tensor (microsoft#15819)
- [js/web] fix terser reserved symbols for worker (microsoft#15864)
- [JSEP] fix constructor for OrtDevice (microsoft#15805)
- Bump engine.io from 6.4.1 to 6.4.2 in /js/web (microsoft#15799)
- Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (microsoft#15798)
- [wasm] revert emsdk to v3.1.19 (microsoft#15793)
- [wasm/JSEP] add threaded build to artifacts (microsoft#15777)
- [js/web] add target ort.webgpu.min.js (microsoft#15780)
- update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (microsoft#15688)
- fix: setting builder optimization level to TRT 8.6 default (microsoft#15897)
- Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (microsoft#15921)
- Fix segfault for multiple GPU run (regression) (microsoft#15823)
- android package fix (microsoft#15999)
- [CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (microsoft#15993)
- Adding support for conv fp16 fusion on Resnet50v1 (microsoft#15474)
- update onnx release 1.14 for docker files (microsoft#15680)
- Avoid generating training documentation during packaging (microsoft#15795)
- Update Conv-Add-Relu Fusion Transformation (microsoft#15834)
- Fix symbolic shape infer empty value_info (microsoft#15842)
- NhwcFusedConv: Add before Activation (microsoft#15837)
- use __hmul2 instead of __hmul2_rn (microsoft#15852)
- change the EP device to default OrtDevice() for memoryType equals CPU Input (microsoft#15903)
- Fixing NhwcFusedConv fp16 (microsoft#15950)
- fix topo sort in quantization tool (microsoft#16003)
- [doc] add LeakyRelu to coreml supported ops (microsoft#15944)
- [DML EP] Add frequent upload heap flushing (microsoft#15960)

Co-authored-by: Yulong Wang
Co-authored-by: dependabot[bot]
Co-authored-by: Guenther Schmuelling
Co-authored-by: Shalva Mist
Co-authored-by: Maximilian Müller
Co-authored-by: Dmitri Smirnov
Co-authored-by: pengwa
Co-authored-by: Ashwini Khade
Co-authored-by: Edward Chen
Co-authored-by: Jian Chen
Co-authored-by: liqun Fu
Co-authored-by: Baiju Meswani
Co-authored-by: Tianlei Wu
Co-authored-by: Chen Fu
Co-authored-by: Ye Wang
Co-authored-by: cao lei
Co-authored-by: Yufeng Li
Co-authored-by: Rachel Guo
Co-authored-by: Patrice Vignola

fs-eire and others added 27 commits May 18, 2023 17:31

[js/webgpu] fix Transpose with non-float tensor (#15819)

7d0c105


[js/web] fix terser reserved symbols for worker (#15864)

05c66b8


[JSEP] fix constructor for OrtDevice (#15805)

2ff7fa2


Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (#15798)

6f6229b

[wasm] revert emsdk to v3.1.19 (#15793)

2dd30f0


[wasm/JSEP] add threaded build to artifacts (#15777)

80ec485


update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#1…

83a61b1

…5688)
Needed to get tokenizers/decode for whisper.

Co-authored-by: Shalva Mist

update onnx release 1.14 for docker files (#15680)

e61c08a


Avoid generating training documentation during packaging (#15795)

a1a8caf

NhwcFusedConv: Add before Activation (#15837)

3e3df44


use __hmul2 instead of __hmul2_rn (#15852)

a1780c5


Fixing NhwcFusedConv fp16 (#15950)

2d3f294


[doc] add LeakyRelu to coreml supported ops (#15944)

2414aff


Regenerate cgmanifest.json and download-deps.yml

5aa4cd3

snnn requested a review from a team as a code owner May 19, 2023 00:39

snnn requested a review from a team May 19, 2023 00:39

snnn requested a review from a team as a code owner May 19, 2023 00:39

snnn marked this pull request as draft May 19, 2023 00:40

snnn marked this pull request as ready for review May 19, 2023 05:06

pranavsharma approved these changes May 19, 2023


yufenglee approved these changes May 19, 2023


jchen351 approved these changes May 19, 2023


snnn merged commit 6cdf071 into rel-1.15.0 May 19, 2023

snnn deleted the user/snnn/cr1 branch May 19, 2023 21:04
